Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve File Modes #107

Merged
merged 10 commits into from
Aug 20, 2023
Merged

Preserve File Modes #107

merged 10 commits into from
Aug 20, 2023

Conversation

peterwaller-arm
Copy link
Contributor

@peterwaller-arm peterwaller-arm commented Jan 28, 2023

Add a notion of FileMetadata (describing file state), as distinct from
ObjectMetadata (which describes objects in the store) which is stored in
the FileHandle and therefore on disk as part of snapshot_deltas.
FileMetadata is intended to be extensible but in the first instance
describes file modes bits. In the future it may also describe mtime and
I can imagine further uses arising.

[Note, I've edited this body to reflect the new approach, much of it
remains true from before and stays the same]

Users of the main branch, please note I forsee further potential version
bumps for additional functionality in the near future. If you're using
elfshaker in anger to make long term packs you may want to hold off
making those packs so you get the extra benefits, though it should work
aside from the missing as-yet-unimplemented features.

Compatibility notes:

  • This elfshaker commit will still be able to read old elfshaker pack
    index files, because the default of an optional is None, which gives
    us exactly the old behaviour of elfshaker prior to this commit.

  • elfshaker after this commit will produce packfiles that cannot be read
    by older commits of elfshaker. However, those old versions will report
    to the user that the version they have is too old and point them to
    where they can obtain a newer version.

The biggest risk here is that someone using the main branch makes
packfiles, and distributes those packfiles to a user who is not able to
build elfshaker, and we have not yet gotten around to releasing a new
version of elfshaker. So we should aim for a release of a new version
reasonably soon so that users don't find themselves in this situation
where they can't get an elfshaker which can read the packfiles they have
been provided.

Using an old elfshaker to produce packfiles and ship them to users of
any version of elfshaker is intended to be supported in principle.

Implementation:

We make use of the fact that a default Option is None, and that
Serde deserializers can cope with missing fields at the end of a struct
with the default attribute. Therefore to make elfshaker able to read old
files while implementing the new field, it's only necessary to add
#[serde(default)].

The version bump ensures that users of an old elfshaker trying to read a
pack written with the newer elfshaker see an informative message rather
than a deserialization failure.

Review notes:

Please see the individual commits which should make it reasonably clear how this works.

TODO

  • Write additional tests. I intend to do this before merging, so
    please consider this draft until then. Feel free to take an early
    look or defer your look until I declare this ready.
    • Check that file modes changing across snapshots has the correct effect; I recall there being a potential gotcha here that changes in metadata without changes in content might be a problem. I'll need to look into that.

src/packidx.rs Outdated Show resolved Hide resolved
@peterwaller-arm peterwaller-arm changed the title Modes Preserve File Modes Jan 28, 2023
@peterwaller-arm
Copy link
Contributor Author

There is a significant problem here I've spotted while considering the test for changing the metadata for a path. With the method, we can't represent:

  • Changes in file modes.
  • Files with the same checksum but differing metadata.

The reason is that ObjectMetadatarefers to the metadata of the object (where it live in the pack and its size), and not the metadata of the file. This is designed to be a 1-1 mapping between a checksum and ObjectMetadata, which is fair so long as it represents information about an object (which is uniquely identified by its hash), but not OK for files, whose identity does not relate to this hash.

It looks like a better place to put the file metadata would be in the FileHandle. We can represent mode changes then in the index snapshot deltas as removing the file with the old metadata and inserting a new one with the new data, just as we do for any other file change. I think we can use the same serde default trick here.

@peterwaller-arm
Copy link
Contributor Author

I've added tests, switched the implementation and ticked the TODOs and edited the body of the PR so it reflects current reality.

The new implementation functions as described above, storing the metadata as part of the FileHandle.

I'm happy with this implementation at this point aside from the need to manually test it further.

@peterwaller-arm peterwaller-arm self-assigned this Jan 28, 2023
@peterwaller-arm
Copy link
Contributor Author

I consider this ready for review and testing but please leave the merging to me, I want to test it a bit more first.

@peterwaller-arm peterwaller-arm marked this pull request as ready for review January 28, 2023 21:16
Copy link
Member

@veselink1 veselink1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like the approach, some comments.

src/repo/fs.rs Show resolved Hide resolved
src/packidx.rs Outdated
pub object: Handle, // ofset into object_pool
pub object: Handle, // offset into object_pool
#[serde(default)] // New in PackIndex version 2, default None.
pub file_metadata: Option<FileMetadata>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might have to update PackIdx::compute_snapshot_checksum to consider file_metadata too. We use compute_snapshot_checksum to dedup snapshots and logically, two snapshots where that files have different mode bits should be considered different.

The same is true for Repository::compute_entry_diff which is used to calculate the changes we need to make when we extract a snapshot. I believe we will currently miss changes to file mode bits if the objects haven't changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great spot. I believe I found and fixed some variants of this problem, I'll dig in here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we will currently miss changes to file mode bits if the objects haven't changed.

I think that's covered already, in compute_entry_diff, the e.file_metadata already forms part of the key in the hash map:

 let from_lookup: HashMap<_, _> = from_entries
            .iter()
            .map(|e| ((&e.path, &e.checksum, &e.file_metadata), e))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, including the file mode in the checksum is making the gc test fail. Any bright ideas? I've run out of time for now to dig deeper but I'll take another look some other time.

// Max supported version of the pack index by this version of elfshaker.
// It's intended to remain backwards compatible with previous packs.
// When changing the on-disk format it's necessary to bump this integer.
const MAX_VERSION: u32 = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably consider adding another test pack in test-scripts/artifacts for every version bump.

src/repo/repository.rs Outdated Show resolved Hide resolved
test-scripts/check.sh Show resolved Hide resolved
Base automatically changed from general-maintenance to main February 3, 2023 15:04
peterwaller-arm added a commit that referenced this pull request Aug 8, 2023
@veselink1
Copy link
Member

Change looks good, but the gc test is failing (as you mentioned).

I believe it is because of e4620da as we're now considering file metadata.

The check.sh script tests against the old format verification.pack.idx -- which doesn't have file mode. And the gc test checks that we dedup a loose snapshot against one packed in verification.pack (which doesn't have file mode bits).

To fix it, we could consider no file mode bits in the snapshot to mean 644, which is what elfshaker extract would extract as anyway. This would allow dedup against existing snapshots.

Alternatively, we could change the gc test to create a new .pack against which to dedup.

veselink1 and others added 9 commits August 19, 2023 17:37
As find -d cannot always parse the default output format of date.
This is a nop change to introduce a parameter which enables us to
control the permissions of the created file.

Since it defaults to the old behaviour, no functional changes are
intended (NFC).
Soon introducing file metadata as distinct from object metadata.

Object metadata: That which relates to the object as it appears in the
pack.

File metadata: That which relates to the path, such as file modes.
This adds a FileMetadata as a part of the FileHandle.

This does make me question if FileHandle is now appropriately named, but
logically, the file metadata is a part of the file state which should
be considered changed along with the other properties of the FileHandle,
so this seems like the correct place to put this information for now.

Co-authored-by: Vesko Karaganev <vesko.karaganev@gmail.com>
Prior to this commit, modes did nothing since they were not populated.
After this commit, they are populated in the index, and that information
is used to specify modes on the created files.

Closes #13.

Co-authored-by: Vesko Karaganev <vesko.karaganev@gmail.com>
Co-authored-by: Vesko Karaganev <vesko.karaganev@gmail.com>
Co-authored-by: Vesko Karaganev <vesko.karaganev@gmail.com>
It appears to be optional on an old Ubuntu installation, but in more
recent versions of nc, -p is rejected with -l.
@peterwaller-arm peterwaller-arm merged commit 51edd5c into main Aug 20, 2023
4 checks passed
@peterwaller-arm peterwaller-arm deleted the modes branch August 20, 2023 09:13
peterwaller-arm added a commit that referenced this pull request Aug 20, 2023
veselink1 pushed a commit that referenced this pull request Aug 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants