Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DirTrack: check both size and mtime for cached entries #4038

Merged
merged 1 commit into from Dec 11, 2019

Conversation

@hannesm
Copy link
Member

hannesm commented Dec 7, 2019

the earlier code checked size OR mtime for equality, whereas it should be size AND mtime to avoid bad cache lookups

@hannesm hannesm mentioned this pull request Dec 9, 2019
3 of 9 tasks complete
@hannesm

This comment has been minimized.

Copy link
Member Author

hannesm commented Dec 9, 2019

I ended up using an experimental branch of orb (rjbou/orb#1) which:

  • enables dirtrack
  • creates a switch /tmp/orb-YYY-ZZZ
  • installs an opam package
  • exports the switch packages
  • recods checksums0 from dirtrack for the given opam packages
  • removes the switch
  • create the same switch /tmp/orb-YYY-ZZZ
  • import the above exported packages
  • records checksums1 from dirtrack

now, the checksum1 and checksum0 were equal, although manually checking the md5 of the files in the old and new lead to different results (checksum1 contained wrong data). the byte size of the binaries was equal, but not their mtime.

in general, I'm not entirely convinced of this cache (i can generate a equally-sized and equal mtime file at any point in time, whose data is different), does it really lead to better performance in the general setting? in my experiments, I use each digest only once anyways.

@rjbou

This comment has been minimized.

Copy link
Collaborator

rjbou commented Dec 9, 2019

I'm wondering if for orb, we shouldn't use a digest cache, but compute every time the digest?

@rjbou

This comment has been minimized.

Copy link
Collaborator

rjbou commented Dec 9, 2019

Not that orb have in its opam file opam-* pinned to rbou/orb branch. For those specific ongoing work, maybe better do pull requests on this branch, like that it is quickly merged/avail, and later it will be one PR to merge to finalise first orb release.

@hannesm

This comment has been minimized.

Copy link
Member Author

hannesm commented Dec 9, 2019

@rjbou as commented above, if the cache is not useful, I'd be in favour of removing it entirely. If the cache is useful, this is a fix for it.

it is only this PR and #4040 on top of your orb branch (see rjbou/opam@orb...hannesm:orb). imho this PR is worth to review and merge separately from the other(s).

@AltGr

This comment has been minimized.

Copy link
Member

AltGr commented Dec 11, 2019

Hm, I don't remember precisely, but indeed this is very suspicious (in particular the || that you fixed !)

For the use that opam has of this module, it would repeatedly scan the whole prefix between each package installs, so having such a cache made sense (assuming that no package install modified files while forging their mtime to go unnoticed...) ; re-computing all the hashes of all the prefix after each install would just not be practical.

Of course, now opam by default goes the full way of only using the size+mtime (with a correct &&, thankfully !), since it was still too slow (the first scan is after all the most costly, in particular on a HDD).

So the question is, does this middle-way "precise tracking" still make sense ? I don't think anyone uses it (@rjbou found out it had been broken for months, if not years, a while ago).
If not, it may be removed (together with the cache). If it still does, we should see if it is usable or not at all without the cache, and accordingly remove just the cache, or keep it as is and add an option to disable the cache for orb.

@AltGr

This comment has been minimized.

Copy link
Member

AltGr commented Dec 11, 2019

In any case, thanks for the fix :)

@AltGr AltGr merged commit a22d748 into ocaml:master Dec 11, 2019
1 of 2 checks passed
1 of 2 checks passed
continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@hannesm

This comment has been minimized.

Copy link
Member Author

hannesm commented Dec 11, 2019

thanks. for (my branch of) orb, checking mtime and size solves the issue. :)

@rjbou rjbou added this to the 2.0.6 milestone Dec 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.