-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve File Modes #107
Preserve File Modes #107
Conversation
bbf0602
to
57aee2f
Compare
There is a significant problem here I've spotted while considering the test for changing the metadata for a path. With the method, we can't represent:
The reason is that It looks like a better place to put the file metadata would be in the |
1dd2fb2
to
7b5882f
Compare
I've added tests, switched the implementation and ticked the TODOs and edited the body of the PR so it reflects current reality. The new implementation functions as described above, storing the metadata as part of the FileHandle. I'm happy with this implementation at this point aside from the need to manually test it further. |
I consider this ready for review and testing but please leave the merging to me, I want to test it a bit more first. |
d8242b8
to
dbcff1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I like the approach, some comments.
src/packidx.rs
Outdated
pub object: Handle, // ofset into object_pool | ||
pub object: Handle, // offset into object_pool | ||
#[serde(default)] // New in PackIndex version 2, default None. | ||
pub file_metadata: Option<FileMetadata>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might have to update PackIdx::compute_snapshot_checksum
to consider file_metadata
too. We use compute_snapshot_checksum to dedup snapshots and logically, two snapshots where that files have different mode bits should be considered different.
The same is true for Repository::compute_entry_diff
which is used to calculate the changes we need to make when we extract a snapshot. I believe we will currently miss changes to file mode bits if the objects haven't changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great spot. I believe I found and fixed some variants of this problem, I'll dig in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we will currently miss changes to file mode bits if the objects haven't changed.
I think that's covered already, in compute_entry_diff
, the e.file_metadata
already forms part of the key in the hash map:
let from_lookup: HashMap<_, _> = from_entries
.iter()
.map(|e| ((&e.path, &e.checksum, &e.file_metadata), e))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, including the file mode in the checksum is making the gc test fail. Any bright ideas? I've run out of time for now to dig deeper but I'll take another look some other time.
// Max supported version of the pack index by this version of elfshaker. | ||
// It's intended to remain backwards compatible with previous packs. | ||
// When changing the on-disk format it's necessary to bump this integer. | ||
const MAX_VERSION: u32 = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably consider adding another test pack in test-scripts/artifacts
for every version bump.
Addresses a review comment in #107.
Change looks good, but the gc test is failing (as you mentioned). I believe it is because of e4620da as we're now considering file metadata. The check.sh script tests against the old format To fix it, we could consider no file mode bits in the snapshot to mean Alternatively, we could change the gc test to create a new .pack against which to dedup. |
As find -d cannot always parse the default output format of date.
This is a nop change to introduce a parameter which enables us to control the permissions of the created file. Since it defaults to the old behaviour, no functional changes are intended (NFC).
Soon introducing file metadata as distinct from object metadata. Object metadata: That which relates to the object as it appears in the pack. File metadata: That which relates to the path, such as file modes.
This adds a FileMetadata as a part of the FileHandle. This does make me question if FileHandle is now appropriately named, but logically, the file metadata is a part of the file state which should be considered changed along with the other properties of the FileHandle, so this seems like the correct place to put this information for now. Co-authored-by: Vesko Karaganev <vesko.karaganev@gmail.com>
Prior to this commit, modes did nothing since they were not populated. After this commit, they are populated in the index, and that information is used to specify modes on the created files. Closes #13. Co-authored-by: Vesko Karaganev <vesko.karaganev@gmail.com>
Co-authored-by: Vesko Karaganev <vesko.karaganev@gmail.com>
Addresses a review comment in #107.
Co-authored-by: Vesko Karaganev <vesko.karaganev@gmail.com>
It appears to be optional on an old Ubuntu installation, but in more recent versions of nc, -p is rejected with -l.
Addresses a review comment in #107.
Addresses a review comment in #107.
Add a notion of FileMetadata (describing file state), as distinct from
ObjectMetadata (which describes objects in the store) which is stored in
the FileHandle and therefore on disk as part of snapshot_deltas.
FileMetadata is intended to be extensible but in the first instance
describes file modes bits. In the future it may also describe mtime and
I can imagine further uses arising.
[Note, I've edited this body to reflect the new approach, much of it
remains true from before and stays the same]
Users of the main branch, please note I forsee further potential version
bumps for additional functionality in the near future. If you're using
elfshaker in anger to make long term packs you may want to hold off
making those packs so you get the extra benefits, though it should work
aside from the missing as-yet-unimplemented features.
Compatibility notes:
This elfshaker commit will still be able to read old elfshaker pack
index files, because the default of an optional is None, which gives
us exactly the old behaviour of elfshaker prior to this commit.
elfshaker after this commit will produce packfiles that cannot be read
by older commits of elfshaker. However, those old versions will report
to the user that the version they have is too old and point them to
where they can obtain a newer version.
The biggest risk here is that someone using the main branch makes
packfiles, and distributes those packfiles to a user who is not able to
build elfshaker, and we have not yet gotten around to releasing a new
version of elfshaker. So we should aim for a release of a new version
reasonably soon so that users don't find themselves in this situation
where they can't get an elfshaker which can read the packfiles they have
been provided.
Using an old elfshaker to produce packfiles and ship them to users of
any version of elfshaker is intended to be supported in principle.
Implementation:
We make use of the fact that a default
Option
isNone
, and thatSerde deserializers can cope with missing fields at the end of a struct
with the default attribute. Therefore to make elfshaker able to read old
files while implementing the new field, it's only necessary to add
#[serde(default)]
.The version bump ensures that users of an old elfshaker trying to read a
pack written with the newer elfshaker see an informative message rather
than a deserialization failure.
Review notes:
Please see the individual commits which should make it reasonably clear how this works.
TODO
please consider this draft until then. Feel free to take an early
look or defer your look until I declare this ready.