irmin-pack: add V5 version #2184

Merged
merged 2 commits into from
Feb 13, 2023

metanivek
Member

This PR makes changes for the upper control file needed for the lower layer and bumps the irmin-pack version to V5.

  • Add volume_num to upper control file.
  • Change version of volume control file to V5.
  • Refactor checksum code to model valid and invalid payloads and allow checking checksums when upgrading.
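One way to picture the checksum refactor in the last bullet is a type that distinguishes valid from invalid payloads, so that upgrade code can inspect the result instead of failing immediately. The sketch below is only an illustration: the `checked` type, `toy_checksum`, `check`, and `upgrade_payload` are hypothetical names, and the checksum is a toy stand-in, not the one irmin-pack uses.

```ocaml
(* Hypothetical sketch: none of these names come from irmin-pack. *)

(* A payload read from disk is either trusted or carries the mismatch. *)
type 'a checked =
  | Valid of 'a
  | Invalid of { payload : 'a; expected : int; actual : int }

(* Toy checksum standing in for the real one: sum of bytes modulo 2^16. *)
let toy_checksum (s : string) : int =
  let acc = ref 0 in
  String.iter (fun c -> acc := (!acc + Char.code c) land 0xFFFF) s;
  !acc

(* Recompute the checksum and compare it with the stored value. *)
let check ~(stored : int) (payload : string) : string checked =
  let actual = toy_checksum payload in
  if actual = stored then Valid payload
  else Invalid { payload; expected = stored; actual }

(* During an upgrade, the caller decides how to treat an invalid payload. *)
let upgrade_payload = function
  | Valid p -> Ok p
  | Invalid { expected; actual; _ } ->
      Error
        (Printf.sprintf "checksum mismatch: stored %d, computed %d" expected
           actual)
```

Keeping the invalid case as data rather than raising lets the same check serve both normal opens and upgrades.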

I ran into one small issue: I wanted to use Version.t option as a type in the control file (for tracking upgrades), but the tests failed with an error that seems to come from repr (or maybe the version.ml code; my brief investigation didn't shed light on what was going on). To get around this, I changed the type to int option and use Version.to_int V4 when setting the value during an upgrade. It's not ideal, but not the worst thing either. If anybody wants to try to dig further... 👼
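A minimal sketch of the workaround, assuming a simplified stand-in for the version module and a hypothetical payload record. Only `Version.to_int` and the V4/V5 versions are named in this PR; `of_int`, the `upgraded_from` field name, and the record layout are assumptions made for illustration.

```ocaml
(* Hypothetical stand-in for version.ml, using polymorphic variants. *)
module Version = struct
  type t = [ `V1 | `V2 | `V3 | `V4 | `V5 ]

  let to_int : t -> int = function
    | `V1 -> 1
    | `V2 -> 2
    | `V3 -> 3
    | `V4 -> 4
    | `V5 -> 5

  (* Assumed helper for reading the integer back; not mentioned in the PR. *)
  let of_int : int -> t option = function
    | 1 -> Some `V1
    | 2 -> Some `V2
    | 3 -> Some `V3
    | 4 -> Some `V4
    | 5 -> Some `V5
    | _ -> None
end

(* Hypothetical control-file payload: the previous version is stored as an
   int option instead of a Version.t option. *)
type payload = { volume_num : int; upgraded_from : int option }

(* A store created directly in the new format has nothing to record. *)
let fresh = { volume_num = 0; upgraded_from = None }

(* A store upgraded from V4 records the integer form of the old version. *)
let upgraded_from_v4 =
  { volume_num = 0; upgraded_from = Some (Version.to_int `V4) }

(* Recover a typed version when reading the payload back. *)
let previous_version (p : payload) : Version.t option =
  Option.bind p.upgraded_from Version.of_int
```

The integer keeps the serialized payload independent of how the variant type is encoded, at the cost of a conversion on read.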

Add `volume_num` to upper control file. Change version of volume control
file to V5. Refactor checksum code to model valid and invalid payloads
and allow checking checksums when upgrading.
Member

@Firobe Firobe left a comment

As far as I can tell, this seems to make sense!
Just to make sure I have a good understanding of the invariants around volume_num though, is it right to say that (in an ideal world when everything is implemented):

  • if volume_num > 0, then the store definitely has a lower layer
  • if volume_num = 0 after opening it, then the store definitely doesn't have a lower layer during this session (but one may be added at a later open), since even a newly created lower would have volume_num = 1 after migration/creation

?

@metanivek
Member Author

@Firobe thanks for the review!

For definitional purposes, a lower layer contains 0 or more volumes, so volume_num = 0 doesn't necessarily mean there is not a lower. You need more information to determine that. Said the other way, having a lower layer does not mean you have any volumes.

I'm currently working on the code to load the lower and it is modeled as an option type on the file manager. If a lower root is configured, it is Some; otherwise, it is None. This will be the primary way of "knowing" if a lower exists or not.
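As a rough sketch of that modeling, assuming hypothetical record and function names (the real file manager is considerably richer), the presence of a lower is decided by the configured root alone, and the volume count is a separate piece of information:

```ocaml
(* Hypothetical, simplified types; not the actual irmin-pack file manager. *)
type lower = { root : string; volume_num : int }

type file_manager = { lower : lower option }

(* The lower is [Some] exactly when a lower root is configured. *)
let open_fm ?lower_root ~volume_num () : file_manager =
  { lower = Option.map (fun root -> { root; volume_num }) lower_root }

(* Primary signal: is a lower configured at all? *)
let has_lower (fm : file_manager) : bool = Option.is_some fm.lower

(* Secondary information: a lower may legitimately hold zero volumes. *)
let volume_count (fm : file_manager) : int =
  match fm.lower with None -> 0 | Some l -> l.volume_num
```

With this shape, `has_lower` can be true while `volume_count` is 0, which matches the definition of a lower layer as containing 0 or more volumes.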

Here is how I am currently thinking about when to create the first volume in a lower (I think this agrees with what you said but open to discussion!):

  • When creating a store that has configured a lower root, we could create the first volume at that time. (this is code I am currently working on)
  • When opening a store in rw that has a configured lower root, this store needs a migration, so we create the first volume and move the suffix into it. (aside: we should discuss the nuances of the signal to migrate to make sure we handle all scenarios)

All other increments of volume_num will be through an explicit call to add_volume (code I am also working on).
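The rules above can be sketched as a small decision function. Everything here is hypothetical: the `event` and `action` types and the `plan` function are illustrative names, and using `volume_num = 0` as the migration signal is an assumption, since the exact signal is still to be discussed.

```ocaml
(* Hypothetical sketch of when the first volume is created. *)
type event = Create | Open_rw

type action =
  | No_lower (* no lower root configured: nothing to do with volumes *)
  | Create_first_volume (* new store with a lower root *)
  | Migrate_suffix_to_first_volume (* existing store gaining a lower *)
  | Nothing_to_do (* lower already populated *)

let plan ~(lower_root : string option) ~(volume_num : int) (event : event) :
    action =
  match (lower_root, event) with
  | None, _ -> No_lower
  | Some _, Create -> Create_first_volume
  (* Assumption: zero volumes on rw open means a migration is needed. *)
  | Some _, Open_rw when volume_num = 0 -> Migrate_suffix_to_first_volume
  | Some _, Open_rw -> Nothing_to_do

(* Any later increment happens only through an explicit add_volume call,
   modeled here as a pure counter update. *)
let add_volume (volume_num : int) : int = volume_num + 1
```

For example, `plan ~lower_root:(Some "lower") ~volume_num:0 Open_rw` selects the migration path, while the same call with `~volume_num:1` does nothing.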

So, to loop around to where we began: I think in practice volume_num > 0 does indicate the presence of a lower layer since we create the first volume when the store is created/opened, but it is not the primary signal. The primary way of knowing is the lower_root configuration parameter. This does raise the question of what we do if lower_root is changed, since we are not directly correlating the two values (we don't currently record any config information on disk). The current expectation is that a user who changes the lower_root value has also moved the volumes that were in the previous path; otherwise, the store will not open correctly.

@Firobe
Member

Firobe commented Feb 10, 2023

Thank you for the explanation! That was more or less what I had in mind: in practice yes, but not the primary signal. In what I'm working on I've used a placeholder optional field in file manager as well to track that, so that'll be easy to merge!

Contributor

@art-w art-w left a comment

It looks good! Regarding the version issue, I'm guessing that either polymorphic variants don't play too well with repr, or the encoding changes depending on the number of constructors (which would explain why status has some reserved T1...T15 constructors)? In any case, an integer version makes sense :)

@metanivek metanivek merged commit 96a1f71 into mirage:main Feb 13, 2023
@metanivek metanivek deleted the v5_control_file branch February 13, 2023 20:29
metanivek added a commit to metanivek/opam-repository that referenced this pull request Apr 21, 2023
…min-pack, irmin-pack-tools, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.7.0)

CHANGES:

### Added

- **irmin**
  - Add `Conf.pp` and `Conf.equal` to print and compare configuration values
    (mirage/irmin#2227, @samoht)
  - Add a `clear` optional argument to all functions that add a new commit:
    `Commit.v`, `set`, `set_tree`, `remove`, `test_and_set`,
    `test_and_set_tree`, `test_set_and_get`, `test_set_and_get_tree`, `merge`,
    `merge_tree` and `with_tree`. This new argument controls whether
    the tree caches are cleared after objects are exported to disk during
    the commit. (mirage/irmin#2225, @samoht)

- **irmin-pack**
  - Add configuration option, `lower_root`, to specify a path for archiving data
    during a GC. (mirage/irmin#2177, @metanivek)
  - Add `is_split_allowed` to check if a store allows split. (mirage/irmin#2175, @metanivek)
  - Add `add_volume` to allow creating a new empty volume in the lower layer. (mirage/irmin#2188,
    @metanivek)
  - Add a `behaviour` function to the GC to check whether the GC will archive or
    delete data. (mirage/irmin#2190, @Firobe)
  - Add a migration on `open_rw` to move the data to the `lower_root` if
    the configuration was enabled (mirage/irmin#2205, @art-w)

### Changed

- **irmin**
  - Expose type equality for `Schema.Info` to avoid defining the `info` function
    multiple times when using similar stores (mirage/irmin#2189, mirage/irmin#2193, @samoht)
- **irmin-pack**
  - GC now changes its behaviour depending on the presence of a lower layer.
    (mirage/irmin#2190, @Firobe)
  - Split now raises an exception if it is not allowed. It is not allowed on
    stores that do not allow GC. (mirage/irmin#2175, @metanivek)
  - GC now supports imported V1/V2 stores, in the presence of a lower layer
    only. (mirage/irmin#2190, @art-w, @Firobe)
  - Upgrade on-disk format to version 5. (mirage/irmin#2184, @metanivek)
  - Archive to lower volume does not copy orphaned commits. (mirage/irmin#2215, @art-w)

### Fixed
- **irmin-pack**
  - Unhandled exceptions in GC worker process are now reported as a failure
    (mirage/irmin#2163, @metanivek)
  - Fix the silent mode for the integrity checks. (mirage/irmin#2179, @icristescu)
  - Fix file descriptor leak caused by `mmap`. (mirage/irmin#2232, @art-w)
3 participants