Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irmin-pack: Fix data race in control file reads #2100

Merged
merged 3 commits into from
Oct 4, 2022

Conversation

Ngoguey42
Copy link
Contributor

No description provided.

Comment on lines -72 to -74
let* len = Io.read_size io in
let len = Int63.to_int len in
let* string = Io.read_to_string io ~off:Int63.zero ~len in
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the RW instance updates the file between the read_size and the read_to_string we would have a problem

let len = page_size in
let bytes = Bytes.create len in
let rec aux ~off =
let nread = Util.really_read t.fd off len bytes in
Copy link
Member

@samoht samoht Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have a data race here if the RW instance updates the file between two short reads, don't you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two thoughts:

  1. I don't think a loop is needed here since Util.really_read also loops until len is read. But --
  2. Regardless of location of the loop, it does seem to me that a race can happen if you need multiple reads. What is the best (or even a good) way to protect against this? The simplest I can think of is to check last updated timestamp before each read and start over if it changes, but maybe there is some other trick?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is how my patch functions:

  • I've added a read_all_to_string function that just reads a file. It doesn't guarantee that the full content of the file is read atomically. It only guarantees that pages are read atomically (i.e. a page never spans on two iterations of the loop)
  • I've changed the read function in the control file to rely on this new read_all_to_string function. Since control files are always contained within a page, that read is atomic. The assertion ensures so.

Following your reviews I've changed my code in the following manner:

Copy link
Member

@samoht samoht Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since control files are always contained within a page, that read is atomic

If you assume an underlyingext2 filesystem, that's probably mostly true (the kernel can still decide to interrupt file-system reads). If you run this under NFS (and other network-based filesystems), that's unfortunately not true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we will need to add a checksum in the control file (cc @art-w). At least this PR fixes the issue for some file systems.

Copy link
Member

@samoht samoht Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea it's great to improve the current situation - it's also nice to be clear about the limitations.

Are the RO instance re-reading the full files on every "sync"?

The only way to get atomic writes in POSIX is via rename. I don't have enough context to know if this could work in this case but we are doing this in various other places of irmin, for instance in ocaml-git when we update references: https://github.com/mirage/ocaml-git/blob/master/src/git-unix/git_unix.ml#L529-L547

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the RO instance re-reading the full files on every "sync"?

Yes.

The only way to get atomic writes in POSIX is via rename.

I see, thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the comments in the code to reflect that reads might not be atomic - this can be misleading for critical pieces of code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #2107

@codecov-commenter
Copy link

codecov-commenter commented Oct 3, 2022

Codecov Report

Merging #2100 (2b68dc9) into main (4936dec) will increase coverage by 0.05%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##             main    #2100      +/-   ##
==========================================
+ Coverage   64.63%   64.69%   +0.05%     
==========================================
  Files         132      132              
  Lines       15697    15726      +29     
==========================================
+ Hits        10146    10174      +28     
- Misses       5551     5552       +1     
Impacted Files Coverage Δ
src/irmin-pack/unix/control_file.ml 86.48% <66.66%> (+1.12%) ⬆️
src/irmin-pack/unix/io.ml 65.71% <72.72%> (+0.82%) ⬆️
src/irmin-fs/irmin_fs.ml 81.21% <0.00%> (-1.22%) ⬇️
src/irmin-pack/unix/file_manager.ml 84.06% <0.00%> (-0.69%) ⬇️
src/irmin-fs/unix/irmin_fs_unix.ml 67.74% <0.00%> (-0.65%) ⬇️
src/irmin-pack/unix/ext.ml 61.69% <0.00%> (-0.25%) ⬇️
src/irmin-pack/unix/async.ml 53.84% <0.00%> (-0.16%) ⬇️
src/irmin-pack/unix/errors.ml 28.88% <0.00%> (ø)
src/irmin-graphql/server.ml 62.03% <0.00%> (+0.35%) ⬆️
... and 3 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

in
try
aux ~off:Int63.zero;
Ok (Buffer.contents buf)
Copy link
Contributor

@icristescu icristescu Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, just to be very clear, add a comment about really_read and the unix_buffer_size (the function only loops for reading buffers with length greater than the page size, hence atomicity guaranteed for one page).

@icristescu
Copy link
Contributor

Thanks!

@icristescu icristescu merged commit 10d4703 into mirage:main Oct 4, 2022
metanivek added a commit to metanivek/opam-repository that referenced this pull request Oct 6, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2)

CHANGES:

### Added

- **irmin**
  - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris)

- **irmin-graphql**
  -  Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris)
  -  Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris)

- **irmin-pack**
  - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w)

- **irmin-tezos-utils**
  - Add package `irmin-tezos-utils` containing a graphical tool for manual pack
    files analysis. (mirage/irmin#1939, @clecat)

### Changed

- **irmin-pack**
  - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081,
  @icristescu)
  - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack`
    to `irmin-pack.unix` (mirage/irmin#2084, @metanivek)
  - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w)

### Fixed

- **irmin-pack**
  - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42)
  - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106,
    @icristescu)

### Fixed

- **irmin-pack**
  - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with
    gced stores. (mirage/irmin#1919, @icristescu)
metanivek added a commit to metanivek/opam-repository that referenced this pull request Oct 6, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2)

CHANGES:

### Added

- **irmin**
  - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris)

- **irmin-graphql**
  -  Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris)
  -  Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris)

- **irmin-pack**
  - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w)

- **irmin-tezos-utils**
  - Add package `irmin-tezos-utils` containing a graphical tool for manual pack
    files analysis. (mirage/irmin#1939, @clecat)

### Changed

- **irmin-pack**
  - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081,
  @icristescu)
  - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack`
    to `irmin-pack.unix` (mirage/irmin#2084, @metanivek)
  - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w)

### Fixed

- **irmin-pack**
  - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42)
  - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106,
    @icristescu)
  - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with
    gced stores. (mirage/irmin#1919, @icristescu)
metanivek added a commit to metanivek/opam-repository that referenced this pull request Oct 6, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2)

CHANGES:

### Added

- **irmin**
  - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris)

- **irmin-graphql**
  -  Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris)
  -  Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris)

- **irmin-pack**
  - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w)

- **irmin-tezos-utils**
  - Add package `irmin-tezos-utils` containing a graphical tool for manual pack
    files analysis. (mirage/irmin#1939, @clecat)

### Changed

- **irmin-pack**
  - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081,
  @icristescu)
  - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack`
    to `irmin-pack.unix` (mirage/irmin#2084, @metanivek)
  - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w)

### Fixed

- **irmin-pack**
  - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42)
  - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106,
    @icristescu)
  - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with
    gced stores. (mirage/irmin#1919, @icristescu)
icristescu pushed a commit to metanivek/opam-repository that referenced this pull request Oct 7, 2022
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.4.2)

CHANGES:

### Added

- **irmin**
  - Add `test_set_and_get*` functions to retrieve commit associated with an update to the store (mirage/irmin#2075, @patricoferris)

- **irmin-graphql**
  -  Expose `test_set_and_get` function as a new mutation (mirage/irmin#2075, @patricoferris)
  -  Add `contents_hash` function to get a value's hash (mirage/irmin#2099, @patricoferris)

- **irmin-pack**
  - Expose `Gc.cancel` to abort a running GC (mirage/irmin#2101, @art-w)

- **irmin-tezos-utils**
  - Add package `irmin-tezos-utils` containing a graphical tool for manual pack
    files analysis. (mirage/irmin#1939, @clecat)

### Changed

- **irmin-pack**
  - `irmin_pack_mem` no longer exposes disk specifics functions (mirage/irmin#2081,
  @icristescu)
  - Move unix specific details for `Pack_key` and `Pack_value` from `irmin-pack`
    to `irmin-pack.unix` (mirage/irmin#2084, @metanivek)
  - Remove unnecessary files at `open_rw` and after a failed GC (mirage/irmin#2095, @art-w)

### Fixed

- **irmin-pack**
  - Fix data race in RO instances when reading control file (mirage/irmin#2100, @Ngoguey42)
  - Fix bugs in gc related to commits that share the same tree. (mirage/irmin#2106,
    @icristescu)
  - Fix the traverse pack files commands in the `irmin-tezos` CLI to work with
    gced stores. (mirage/irmin#1919, @icristescu)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants