Some packages have duplicated x-revision fields #779
Comments
It looks like a bug: http://hackage.haskell.org/package/cookie-0.4.1.5/revisions/
Is that official documentation somewhere? If so, I'd love to see it. I'm not sure if this is intended to be the official documented spec right here.
And to be a bit more clear: how are clients supposed to determine the revision number? By counting previous files of the same name? Is that guaranteed to be reliable?
Why doesn't the x-revision field line up with reality? Perhaps it was added after the revision feature?
The only reliable, efficient and sensible way to determine the revision count is in fact by counting the occurrences of the tar entry for the respective pkg-id inside the …
PS: The revision counting can be done incrementally, without rescanning the whole index on each update, because the Hackage index only ever grows (that's part of hackage-security's incremental index update). The tricky part is detecting the very unlikely, but still possible, case that the index got rebased. I'm not sure whether hackage-security exposes this information yet, but internally it is able to detect when the index couldn't be updated incrementally. Still, it's probably not worth the added complexity yet for the current size of the index, as you'd also need to handle the decompression incrementally. That's certainly possible but tricky, since you'd have to store the compression state and so on... but I don't want to bore you with technical details.
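A minimal Python sketch of the incremental counting described above. It assumes (this thread does not spell it out) that every revision appears in the uncompressed index tarball as a ustar entry named `<pkg>/<version>/<pkg>.cabal`, and it resumes from the byte offset of the end-of-archive marker left by the previous scan. `scan_index`, `revision` and `tar_entry` are hypothetical names; decompression and rebase detection are deliberately left out.

```python
import io
import tarfile

BLOCK = 512  # tar archives are made of 512-byte blocks


def _octal(field: bytes) -> int:
    # Numeric header fields are NUL- or space-terminated octal ASCII.
    digits = field.split(b"\0", 1)[0].strip()
    return int(digits, 8) if digits else 0


def _entry_name(header: bytes) -> str:
    name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
    # ustar headers may split a long path into a prefix field plus the name.
    if header[257:262] == b"ustar":
        prefix = header[345:500].split(b"\0", 1)[0].decode("utf-8")
        if prefix:
            name = prefix + "/" + name
    return name


def scan_index(f, start_offset: int, counts: dict) -> int:
    """Count <pkg>/<ver>/<pkg>.cabal entries from start_offset onward.

    `counts` maps (pkg, ver) to the number of .cabal entries seen so far and
    is updated in place. Returns the offset of the end-of-archive marker,
    which is where the next incremental scan should resume, because newly
    appended entries overwrite that marker.
    """
    offset = start_offset
    while True:
        f.seek(offset)
        header = f.read(BLOCK)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return offset  # end-of-archive marker (or end of file)
        size = _octal(header[124:136])
        parts = _entry_name(header).split("/")
        if len(parts) == 3 and parts[2] == parts[0] + ".cabal":
            key = (parts[0], parts[1])
            counts[key] = counts.get(key, 0) + 1
        # Skip the header and the data, rounded up to whole blocks.
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK


def revision(counts: dict, pkg: str, ver: str) -> int:
    # The original upload is revision 0, so revision = occurrences - 1.
    return counts[(pkg, ver)] - 1


def tar_entry(name: str, data: bytes) -> bytes:
    # Demo helper: one ustar header followed by block-padded data.
    info = tarfile.TarInfo(name)
    info.size = len(data)
    return (info.tobuf(format=tarfile.USTAR_FORMAT)
            + data + b"\0" * ((-len(data)) % BLOCK))


if __name__ == "__main__":
    end = b"\0" * (2 * BLOCK)
    old = tar_entry("foo/1.0/foo.cabal", b"name: foo\n") * 2
    counts: dict = {}
    resume_at = scan_index(io.BytesIO(old + end), 0, counts)
    new = old + tar_entry("foo/1.0/foo.cabal", b"name: foo\n") + end
    scan_index(io.BytesIO(new), resume_at, counts)
    print(revision(counts, "foo", "1.0"))  # 2
```

Persisting `counts` together with the returned offset is what lets the next update skip everything already seen.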
Given that the Hackage interface only provides revision number information to users, it's currently the only reasonable specification format to expect from them. I encourage people to use
As I see it, the only thing to be done today is what Stack does right now, essentially:
This is slow and wasteful (recalculating information which shouldn't have changed). The two alternatives I can think of are (1) parsing the
This will be a new library for storing package information. This first bit overhauls the Hackage index update code, and stores information in a SQLite database instead of the old caches. This turns out to be significantly faster for `stack update` calls. Fixes #3586 Note that it would be nicer to just resume the caching from where we'd last left off, or to parse the revision numbers from the cabal files themselves. See the discussion in haskell/hackage-server#779 to see why that isn't possible.
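The commit message above mentions storing index information in SQLite. One possible layout is sketched below using Python's `sqlite3`; the schema, table names and helper functions are hypothetical and are not the actual schema of that library. It keeps one row per `.cabal` tar entry plus a single-row table holding the offset where the next incremental scan should resume.

```python
import sqlite3

# Hypothetical schema: one row per .cabal tar entry, plus the resume point.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cabal_revision (
    name       TEXT    NOT NULL,
    version    TEXT    NOT NULL,
    revision   INTEGER NOT NULL,
    tar_offset INTEGER NOT NULL,  -- offset of the entry's tar header
    PRIMARY KEY (name, version, revision)
);
CREATE TABLE IF NOT EXISTS index_state (
    id         INTEGER PRIMARY KEY CHECK (id = 0),  -- single-row table
    end_offset INTEGER NOT NULL  -- where the next incremental scan resumes
);
"""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_cabal(conn, name: str, version: str, tar_offset: int) -> int:
    """Store one .cabal entry; its revision is the number already stored."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM cabal_revision WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    conn.execute(
        "INSERT INTO cabal_revision VALUES (?, ?, ?, ?)",
        (name, version, n, tar_offset),
    )
    return n


def set_end_offset(conn, end_offset: int) -> None:
    conn.execute("INSERT OR REPLACE INTO index_state VALUES (0, ?)", (end_offset,))


def get_end_offset(conn) -> int:
    row = conn.execute("SELECT end_offset FROM index_state WHERE id = 0").fetchone()
    return row[0] if row else 0  # 0 means "scan from the beginning"


def latest_revision(conn, name: str, version: str):
    (rev,) = conn.execute(
        "SELECT MAX(revision) FROM cabal_revision WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return rev  # None if the package version is unknown
```

Wrapping one scan's `record_cabal` calls and the final `set_end_offset` in a single transaction (`with conn:`) keeps the stored offset consistent with the stored rows if an update is interrupted.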
This is my best understanding:
There are no hard guarantees that Also exactly same
For the main Hackage index the possibility is there, but it's a very exceptional case. From https://www.well-typed.com/blog/2015/08/hackage-security-beta/
It's safe to assume that given If I'd build any cache based on
Or even checksum the whole contents of the tar up to N. On my machine, SHA512 (which is slow and overkill for this purpose) of 01-index.tar takes 1.5 seconds (I have an SSD).
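The checksum idea above can be sketched as follows, assuming the cache stores the length N it has already processed together with a digest of those N bytes. The function names are hypothetical, and SHA-256 is swapped in for SHA512 purely as an example; any collision-resistant hash would do. If the prefix still matches, scanning can resume at N; otherwise the index was rebased and the cache has to be rebuilt from scratch.

```python
import hashlib
import io


def prefix_digest(f, length: int, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of the first `length` bytes of a binary file."""
    f.seek(0)
    h = hashlib.sha256()
    remaining = length
    while remaining > 0:
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:
            raise ValueError("file is shorter than the recorded prefix")
        h.update(chunk)
        remaining -= len(chunk)
    return h.hexdigest()


def can_resume(f, saved_length: int, saved_digest: str) -> bool:
    """True if the file still begins with the bytes processed last time."""
    try:
        return prefix_digest(f, saved_length) == saved_digest
    except ValueError:
        return False  # the index shrank, so it must have been rewritten


if __name__ == "__main__":
    old = b"entries processed last time"
    digest = prefix_digest(io.BytesIO(old), len(old))
    print(can_resume(io.BytesIO(old + b" + appended"), len(old), digest))  # True
    print(can_resume(io.BytesIO(b"rebased " + old), len(old), digest))  # False
```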
Thanks @phadej, I think that clarifies my concerns. Your proposed workaround for the potential rebase situation should work; I'll see if I can make it happen.
Thanks to @phadej for the inspiration for this in his comment: haskell/hackage-server#779 (comment)
Alright, that seems to have worked: commercialhaskell/stack@33ef253. To summarize my understanding:
Thanks for the help @hvr and @phadej. If my summary above is correct, I believe this issue can be closed. If there's some documentation that would be an appropriate place for this summary, let me know and I'll send a PR.
One final note: the last 1024 bytes (two 512-byte blocks) of a tar file are all null bytes. Each time the tarball is updated, those last two blocks are overwritten with new data. Therefore, when calculating hashes, you need to ignore the trailing 1024 bytes.
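To illustrate the point about the trailer, the sketch below assumes the index ends with exactly the two zero blocks described above. Some tar writers (GNU tar and Python's `tarfile`, for example) additionally pad the archive to a larger record size, which this does not handle. `content_length` is a hypothetical helper, and the "entries" are placeholder bytes rather than real tar headers.

```python
import hashlib

END_MARKER = 2 * 512  # two all-zero 512-byte blocks end a tar archive


def content_length(archive: bytes) -> int:
    """Length of the archive without its trailing end-of-archive marker."""
    if len(archive) < END_MARKER or archive[-END_MARKER:] != bytes(END_MARKER):
        raise ValueError("archive does not end with two zero blocks")
    return len(archive) - END_MARKER


if __name__ == "__main__":
    entries = b"\x01" * 1024  # stand-in for real entries (headers + data)
    old = entries + bytes(END_MARKER)
    new = entries + b"\x02" * 512 + bytes(END_MARKER)  # one appended entry
    n = content_length(old)
    # Hashing the whole old file fails: its zero blocks were overwritten.
    print(hashlib.sha256(old).digest() == hashlib.sha256(new[: len(old)]).digest())  # False
    # Hashing only the first n bytes gives a prefix that survives the append.
    print(hashlib.sha256(old[:n]).digest() == hashlib.sha256(new[:n]).digest())  # True
```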
Based on the discussion above, closing this issue.
I have discovered four cases of package revisions where the `x-revision:` field within the cabal file does not match the revision specified by Hackage. Note the URLs below (interestingly, all for revision 3), and the fact that all of these examples contain `x-revision: 2` in their contents:
It's unclear, as a consumer of metadata from Hackage, whether to trust the `x-revision` fields or not. The alternative is to count how many previous cabal files are in the tarball, but I haven't seen any guarantee written down that cabal file contents will not be repeated (though that seems to be the case today).
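For cases like the ones reported above, a consumer could cross-check the two sources. The sketch assumes the `.cabal` contents for one pkg-id have already been collected in index order (for example by counting tar entries as discussed in this thread), and that `x-revision` appears as a top-level field at the start of a line. `declared_revision` and `mismatches` are hypothetical names, and the sample data is made up to mirror the reported pattern (revision 3 containing `x-revision: 2`).

```python
import re

# Assumption: x-revision is a top-level field starting at column 0.
# Field names are matched case-insensitively here, as Cabal does.
_X_REVISION = re.compile(r"^x-revision[ \t]*:[ \t]*(\d+)[ \t]*\r?$",
                         re.IGNORECASE | re.MULTILINE)


def declared_revision(cabal_text: str) -> int:
    """Value of the first x-revision field, or 0 if the field is absent."""
    m = _X_REVISION.search(cabal_text)
    return int(m.group(1)) if m else 0


def mismatches(revisions):
    """Compare counted and declared revisions for one pkg-id.

    `revisions` holds the .cabal contents in index order, so the entry at
    position i is revision i. Returns (counted, declared) pairs that differ.
    """
    result = []
    for counted, text in enumerate(revisions):
        declared = declared_revision(text)
        if declared != counted:
            result.append((counted, declared))
    return result


if __name__ == "__main__":
    revs = ["name: cookie\n",
            "name: cookie\nx-revision: 1\n",
            "name: cookie\nx-revision: 2\n",
            "name: cookie\nx-revision: 2\n"]  # revision 3 claims to be 2
    print(mismatches(revs))  # [(3, 2)]
```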