Check the original checksums on the fallback archives from Software Heritage #5720

Open
kit-ty-kate opened this issue Nov 8, 2023 · 4 comments

@kit-ty-kate
Member

The Software Heritage fallback added in #4859 gives opam the ability to fetch archives from Software Heritage.

Currently these archives are (for reasons that escape me [1]) not backups of the original archives but backups of the untarred contents, which are re-tarred on request. As a result, the archives lose their original checksums, and reproducing those checksums deterministically is close to impossible because file ordering and metadata have changed.
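To illustrate why re-tarring loses the original checksum, here is a minimal Python sketch. It does not use opam or Software Heritage code; the file names, contents, and helper names (make_tar, sha256_hex, unpacked_contents) are made up for the example. It packs the same files twice with a different order and timestamp, the kind of change described above.

```python
import hashlib
import io
import tarfile


def make_tar(files, mtime):
    """Build an uncompressed tar in memory from (name, bytes) pairs, in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256_hex(data):
    """Hex SHA-256 of a byte string, standing in for the checksum in an opam file."""
    return hashlib.sha256(data).hexdigest()


def unpacked_contents(archive):
    """Return {name: bytes} for the regular files stored in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return {
            member.name: tar.extractfile(member).read()
            for member in tar.getmembers()
            if member.isfile()
        }


# Hypothetical package contents, used only for illustration.
files = [
    ("pkg/dune-project", b"(lang dune 3.0)\n"),
    ("pkg/src/main.ml", b"let () = ()\n"),
]

original = make_tar(files, mtime=1_600_000_000)
# Same files, different member order and timestamp: a stand-in for a re-tarred archive.
retarred = make_tar(list(reversed(files)), mtime=1_700_000_000)

same_checksum = sha256_hex(original) == sha256_hex(retarred)
same_contents = unpacked_contents(original) == unpacked_contents(retarred)
```

Here same_checksum is False while same_contents is True: the unpacked files are byte-for-byte identical, but the archive checksum recorded at release time can no longer be matched.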

There is a long-standing upstream issue that aims to fix this in the medium to long term: https://gitlab.softwareheritage.org/swh/devel/swh-model/-/issues/2430

I personally think we should:

  • make sure, when prompting to use the fallback, that users understand that the checksum is not checked and that the content might not be the same
  • make the fallback require --confirm-level=unsafe-yes, as currently only --yes is required:
    if OpamConsole.confirm ~default:false
  • wait for the proper fix upstream and use it whenever possible

[1]: I’m guessing it’s for space efficiency, but still...

@hannesm
Member

hannesm commented Nov 8, 2023

Thank you for opening this issue. I was not aware that these archives are used as a fallback without verifying the checksum. Would it be possible to guard this behaviour with yet another command-line option (i.e. not unsafe-yes, but something like no-checksum-for-software-heritage)? I ask because unsafe-yes is AFAICT needed for interactive usage of opam, while I really have no interest in using source code whose checksum wasn't verified (I prefer installation to fail in that case).

Thanks a lot.

@rjbou
Collaborator

rjbou commented Nov 9, 2023

Regarding validation: checksums are not checked because they can't be used. A different mechanism is in place for the SWH fallback. We rely on the swhid given in the opam file to download the archive. That swhid is a unique identifier computed from the content of the archive, and it is provided by the maintainer. So when we download the SWH archive, we recompute the swhid on the untarred archive in order to validate it (no corruption).
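As a rough illustration of "recompute the swhid on the untarred archive", here is a Python sketch of a directory identifier of the form swh:1:dir:<hex>. It assumes the identifier is computed the same way as a Git tree hash, and it simplifies heavily: the unpacked tree is modelled as a nested dict, every file is treated as a regular non-executable file, and symlinks are ignored. It is not opam's implementation, and the function names are invented for the example.

```python
import hashlib


def git_object_id(kind, payload):
    """SHA-1 over '<kind> <length>' + NUL + payload, the Git object hashing scheme."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def directory_id(tree):
    """Hash a nested dict: bytes values are regular files, dict values are subdirectories."""
    entries = []
    for name, node in tree.items():
        raw_name = name.encode()
        if isinstance(node, dict):
            # Directories sort as if their name ended with '/', as in Git trees.
            entries.append((raw_name + b"/", b"40000", raw_name, directory_id(node)))
        else:
            # Simplification: every file is hashed as a non-executable regular file.
            entries.append((raw_name, b"100644", raw_name, git_object_id("blob", node)))
    entries.sort(key=lambda entry: entry[0])
    payload = b"".join(
        mode + b" " + raw_name + b"\0" + oid for _, mode, raw_name, oid in entries
    )
    return git_object_id("tree", payload)


def directory_swhid(tree):
    """Format the directory hash as a swh:1:dir: identifier."""
    return "swh:1:dir:" + directory_id(tree).hex()


def validate(tree, expected_swhid):
    """True when the unpacked tree matches the identifier given by the maintainer."""
    return directory_swhid(tree) == expected_swhid
```

Because entries are sorted before hashing and no timestamps are involved, the identifier depends only on names and contents. That is why it can still be checked after the archive has been re-tarred, whereas the original archive checksum cannot.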

As for the fallback itself, it can be disabled using opam option swh-fallback=false.

@rjbou
Collaborator

rjbou commented Nov 14, 2023

Some clarification, after a long discussion :)

There was a misunderstanding about how Software Heritage is used and about the fallback implemented in opam.
The fallback in opam is triggered if and only if a swhid is already present in the opam file. That swhid was added by a maintainer, usually by computing it from the archive used for the release. That's why we rely on the swhid present in the opam file (and we check it to be sure that the archive matches the swhid); on the opam side, it is safe to use. Opam does not retrieve archives from SWH on its own.

But that safe-to-use guarantee is not fully met today: no check is done in opam repo ci, in publication tools, etc. At the beginning, the Software Heritage & OCaml story contained:

  1. addition of the opam repo to SWH
  2. some tooling to ease generating/checking swhids
  3. fallback in opam
  4. addition to opam repo ci for checks & proposals
  5. addition to publication tools

But it was done (and funded) only up to point 2. So at the moment there is no support in the opam repo, nor in publication tools. As a result, 0 packages in the opam repo contain a swhid.

That said, swhid fallback retrieval still relies heavily on the repo and the maintainers: maintainers need to provide the correct swhid, repos need to check it, and tooling needs to be written to help with these tasks.

Until the opam repo & publication tools are upgraded, we propose to change the default by deactivating the SWH fallback, and to display a note when an opam file contains a swhid and the archive is missing, informing the user that it is possible to enable the SWH fallback at their own risk.
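The trigger condition and the proposed default can be summarised as a small decision function. This is only a sketch of the behaviour described in this comment, not opam code: the function name, its parameters, and the returned labels are all hypothetical.

```python
def fallback_action(archive_available, swhid, swh_fallback_enabled=False):
    """Return what a client following the proposal would do for one package source.

    archive_available: whether the original archive could be downloaded.
    swhid: identifier from the opam file, or None when the file has none.
    swh_fallback_enabled: proposed to default to False.
    """
    if archive_available:
        # Normal path: the original archive is used and its checksum is verified.
        return "use-archive"
    if swhid is None:
        # The fallback only exists when a maintainer put a swhid in the opam file.
        return "fail"
    if not swh_fallback_enabled:
        # Proposed default: fail, but tell the user the fallback can be enabled.
        return "fail-with-note"
    # Opt-in path: fetch from SWH, then recompute the swhid and compare it.
    return "fetch-from-swh"
```

With the proposed default, fallback_action(False, "swh:1:dir:...") returns "fail-with-note", and the SWH download only happens once the user has enabled the option.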

@kit-ty-kate
Member Author

The Software Heritage fallback was disabled by default in #5899, so I am moving this issue off the 2.2 milestone.

3 participants