Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cope with invalid hash algorithms in RECORD #179

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dimbleby
Copy link
Contributor

@dimbleby dimbleby commented Apr 8, 2023

Fixes #178

I said there that I didn't have a fix to offer; but perhaps even a wrong fix is a better starting point - or anyway has more momentum - than no fix at all

- raise an InvalidRecordEntry
- catch such things during wheel validation and include them in the
  issues reported
@dimbleby
Copy link
Contributor Author

dimbleby commented May 3, 2023

I can't tell what the test failure at https://github.com/pypa/installer/actions/runs/4646834563/jobs/8680555289?pr=179 was complaining about

I pushed again in the hope of kicking a retry but apparently that needs manual approval

@dimbleby
Copy link
Contributor Author

dimbleby commented May 3, 2023

oh this

FAIL Required test coverage of 100% not reached. Total coverage: 99.82%

I find it hard to believe that windows 3.9 uniquely fails to get to the line in question, so not sure what to do about that

@dimbleby
Copy link
Contributor Author

dimbleby commented May 3, 2023

I pushed a commit that adds # pragma: no cover to that line, and notes that it is a false positive reported only on windows and python < 3.10. Say if you have a better idea!

Comment on lines +181 to +184
hash_value = Hash.parse(hash_)
if hash_value.name not in hashlib.algorithms_available:
issues.append(f"invalid hash algorithm '{hash_value.name}'")
hash_value = None
Copy link
Member

@pradyunsg pradyunsg May 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be performing this validation in the validate method somehow, since this would prevent a RecordEntry from being created with a hash that isn't in algorithms_available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that it's desirable to be unable to create a RecordEntry with a bad hash algorithm, but I'm happy to be guided in another direction if you have a preference.

Yet another approach, doubling down on "invalid hash algorithms are invalid" would be to raise a ValueError from Hash.parse(), though the error message from that path isn't currently very informative

@pradyunsg
Copy link
Member

Thanks for filing this PR @dimbleby, and appreicate your patience on this as I went away from OSS for a month.

Say if you have a better idea!

The only reason I can think of is that sha is a valid algorithm name on those (the only check being undertaken is in algorithms_available so it has to be in that list/set). Consider renaming the algorithm to notavailable= or invalid= instead?

@dimbleby
Copy link
Contributor Author

No worries, take your time!

if sha was somehow a valid algorithm name then the new test wouldn't pass at all, right?

@pradyunsg
Copy link
Member

It would, if it's conditionally available on only certain platforms. 😬

if hash_:
try:
hash_value: Optional[Hash] = Hash.parse(hash_)
hash_value = Hash.parse(hash_)
if hash_value.name not in hashlib.algorithms_available:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 376 / https://packaging.python.org/en/latest/specifications/recording-installed-packages/ requires that RECORD must be in hashlib.algorithms_guaranteed, using _available is not good enough when producing installed databases in site-packages.

But the wheel format itself (https://packaging.python.org/en/latest/specifications/binary-distribution-format/) doesn't say anything one way or another about it...

RECORD is a list of (almost) all the files in the wheel and their secure hashes. Unlike PEP 376, every file except RECORD, which cannot contain a hash of itself, must include its hash. The hash algorithm must be sha256 or better; specifically, md5 and sha1 are not permitted, as signed wheel files rely on the strong hashes in RECORD to validate the integrity of the archive.

So it disagrees with PEP 376 in at least two points:

  • hash all files, even pyc
  • md5 and sha1 are algorithms_guaranteed, but still not allowed

And therefore the PEP cannot be taken as authoritative on algorithms_guaranteed. But only for wheels. Once e extracted it does need to be in _guaranteed.

Fun.

...

In theory this would solve the platform-conditional concern.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me what the intended availability of installer.records is. Can I use it to parse the RECORD in a wheel file, or one in site-packages? Is the appropriate guard here to enforce this when creating new RECORD files only?

Experimental patch to validate this as part of destinations, where we know that any algorithm passed is definitely intended for creating a new installed RECORD file: 67da026.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what conclusions, if any, you reach from the above.

I am therefore politely saying "noted"... but not doing anything about it!

Please say if you have opinions about how this MR ought to look (or submit an MR, I'm fine with this one being overtaken by something better)

@dimbleby
Copy link
Contributor Author

@pradyunsg if sha was conditionally available only on certain platforms, then the new test would fail on those certain platforms - yes?

The test verifies that we get an error complaining that the algorithm name is invalid: I don't see how that test can pass without hitting the line of code in question. I really don't see how this can be anything other than coverage weirdness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Record validation fails on invalid hash algorithm names
3 participants