Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SHA256 hash of .whl as info output #5908

Merged
merged 5 commits into from Jun 26, 2019
Merged

Conversation

@ianw
Copy link
Contributor

@ianw ianw commented Oct 22, 2018

Currently I'm trying to debug some issues with what appear to be
corrupt wheels. It would be very useful to see what pip thought the
state of things was as it wrote the wheel output; if a final corrupt
distributed file is then different to what pip has saved in its build
logs, you know the problem is somewhere after pip but before
distribution.

Currently we get a log of the initial creation, then the stamp when it
gets moved in the final output location, e.g.:

 creating '/tmp/pip-wheel-71CpBe/foo-1.2.3-py2.py3-none-any.whl
 ...
 Stored in directory: /opt/wheel/workspace

A lot happens in between this, so my suggestion is we add the final
output file and it's hash before the "Stored in directory:", e.g. you
now see:

 Building wheels for collected packages: simple
   Running setup.py bdist_wheel for simple: started
   Running setup.py bdist_wheel for simple: finished with status 'done'
   Finished: simple-3.0-py3-none-any.whl sha256=39005a57a6327972575072af82e11d0817439fe6a069381f6f2a123a8c0bf1cf
   Stored in directory: /tmp/pytest-of-iwienand/pytest-18/test_pip_wheel_success0/workspace/scratch
 Successfully built simple

Despite the hash being fairly important for things like
--require-hashes, AFAICS the final hash is not put in the logs at all
currently, so I think this is generically helpful.

@ianw ianw force-pushed the path-and-hash branch 3 times, most recently from bb8f03d to 8be73da Oct 23, 2018
@ianw ianw changed the title Add more info to built wheel output Add SHA256 hash of .whl as info output Oct 23, 2018
@ianw ianw force-pushed the path-and-hash branch from 8be73da to 247f80d Oct 23, 2018
@cjerdonek cjerdonek added this to the Print Better Error Messages milestone Oct 24, 2018
@BrownTruck
Copy link
Contributor

@BrownTruck BrownTruck commented Dec 16, 2018

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck
Copy link
Contributor

@BrownTruck BrownTruck commented Jan 25, 2019

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck
Copy link
Contributor

@BrownTruck BrownTruck commented Feb 5, 2019

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

Currently I'm trying to debug some issues with what appear to be
corrupt wheels.  It would be very useful to see what pip thought the
state of things was as it wrote the wheel output; if a final corrupt
distributed file is then different to what pip has saved in its build
logs, you know the problem is somewhere after pip but before
distribution.

Currently we get a log of the initial creation, then the stamp when it
gets moved in the final output location, e.g.:

 creating '/tmp/pip-wheel-71CpBe/foo-1.2.3-py2.py3-none-any.whl
 ...
 Stored in directory: /opt/wheel/workspace

A lot happens in between this, so my suggestion is we add the final
output file and it's hash before the "Stored in directory:", e.g. you
now see:

 Building wheels for collected packages: simple
   Running setup.py bdist_wheel for simple: started
   Running setup.py bdist_wheel for simple: finished with status 'done'
   Finished: simple-3.0-py3-none-any.whl sha256=39005a57a6327972575072af82e11d0817439fe6a069381f6f2a123a8c0bf1cf
   Stored in directory: /tmp/pytest-of-iwienand/pytest-18/test_pip_wheel_success0/workspace/scratch
 Successfully built simple

Despite the hash being fairly important for things like
--require-hashes, AFAICS the final hash is not put in the logs at all
currently, so I think this is generically helpful.
@ianw ianw force-pushed the path-and-hash branch from b995b1e to 4e8b6e9 May 6, 2019
@ianw
Copy link
Contributor Author

@ianw ianw commented May 6, 2019

Any chance I could get a review of this? Either way? We keep a local copy of pip going to have this in it, which I would certainly rather dispense with if possible.

Copy link
Member

@cjerdonek cjerdonek left a comment

Thanks for this PR, and thanks for your great patience. It would be great to get this into the next release (next month). I'll add it to the 19.2 milestone now.

@@ -0,0 +1,2 @@
Wheel builds will output the final filename and SHA256 hash of .whl
Copy link
Member

@cjerdonek cjerdonek Jun 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the imperative tone (see https://pip.pypa.io/en/stable/development/contributing/#news-entries ). Maybe something like this?

Log the final filename and SHA256 hash of a .whl file when done building a wheel.

Copy link
Contributor Author

@ianw ianw Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed this with 09e3184

src/pip/_internal/wheel.py Show resolved Hide resolved
@@ -885,7 +892,10 @@ def _build_one_inside_env(self, req, output_dir, python_tag=None):
wheel_name = os.path.basename(wheel_path)
dest_path = os.path.join(output_dir, wheel_name)
try:
wheel_hash, _ = hash_file(wheel_path)
Copy link
Member

@cjerdonek cjerdonek Jun 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful for readers of the code to give a name to the second variable in the return value. You could add a code comment above the line saying, "We don't need to use the second value." if you would like to communicate that.

Copy link
Member

@xavfernandez xavfernandez Jun 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_length should do, or also output the file size :)

Copy link
Contributor Author

@ianw ianw Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have put the length into the output string with a8e382b, so this value is now used

shutil.move(wheel_path, dest_path)
logger.info('Finished: %s sha256=%s',
wheel_name, wheel_hash.hexdigest())
Copy link
Member

@cjerdonek cjerdonek Jun 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other log messages in your post have the form "Running setup.py bdist_wheel for simple: finished ..." So maybe this could be "Finished wheel for <name>: ..."

Copy link
Member

@cjerdonek cjerdonek Jun 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this info() message can be merged with the one right after. Maybe the directory path can go on a separate line like it is now.

Copy link
Contributor Author

@ianw ianw Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the format and have changed that with a8e382b

I did leave the following info message (Stored in directory:) somewhat on purpose, because I felt like it might be the type of thing that people had greps for in various ways in existing code. I have made this point explicit with a comment in a8e382b, and additionally added a match for it to the functional test.

The line will become quite long with the full path too; however if you feel strongly the backwards compatibility issues aren't a concern I can change it.

Copy link
Member

@cjerdonek cjerdonek Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't feel strongly about merging the log messages. However, I'd rather not create any expectation among users (or future maintainers for that matter) that log messages won't or shouldn't change. One reason is that I think we should be open to improving these messages when we can as a matter of UX. Thus, my preference would be for the comment not to be there.

Copy link
Member

@cjerdonek cjerdonek Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line will become quite long with the full path too;

By the way, to clarify, my suggestion was for the second log message to be in the same log message as the first but still be on a separate line (i.e. a newline character would go in the message). Thus, the lengths of lines needn't increase, and in fact the same greps could even work if the text was kept the same.

Copy link
Contributor Author

@ianw ianw Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus, my preference would be for the comment not to be there.

Removed with 45a6415

By the way, to clarify, my suggestion was for the second log message to be in the same log message as the first but still be on a separate line (i.e. a newline character would go in the message).

Ahh ... yes I wasn't sure about multi-line log strings. This probably comes from some ancient history of mine where I was dealing with syslog() and /dev/msg a lot where multi-line strings are not handled well at all :) Again if you feel strongly happy to convert it to a suggested format

@cjerdonek cjerdonek removed this from the Print Better Error Messages milestone Jun 22, 2019
@cjerdonek cjerdonek added this to the 19.2 milestone Jun 22, 2019
"""Return (hash, length) for path using hashlib.sha256()"""
h = hashlib.sha256()
length = 0
with open(path, 'rb') as f:
for block in read_chunks(f, size=blocksize):
length += len(block)
h.update(block)
return (h, str(length)) # type: ignore
Copy link
Member

@xavfernandez xavfernandez Jun 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could return the length as int since rehash uses str(length) and _build_one_inside_env ignores it

Copy link
Member

@cjerdonek cjerdonek Jun 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation. For that matter (but not for this PR), rehash() should probably also be made to return an int! (And the caller can convert to a string as needed.)

Copy link
Contributor Author

@ianw ianw Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made hash_file return a int with 53c38d8. I will leave rehash() for now to avoid further confusion :)

@@ -885,7 +892,10 @@ def _build_one_inside_env(self, req, output_dir, python_tag=None):
wheel_name = os.path.basename(wheel_path)
dest_path = os.path.join(output_dir, wheel_name)
try:
wheel_hash, _ = hash_file(wheel_path)
Copy link
Member

@xavfernandez xavfernandez Jun 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_length should do, or also output the file size :)

Copy link
Contributor Author

@ianw ianw left a comment

Thank you for review; I'm not sure if you prefer a squashed change or multiple commits, I'm happy to do either.

shutil.move(wheel_path, dest_path)
logger.info('Finished: %s sha256=%s',
wheel_name, wheel_hash.hexdigest())
Copy link
Contributor Author

@ianw ianw Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the format and have changed that with a8e382b

I did leave the following info message (Stored in directory:) somewhat on purpose, because I felt like it might be the type of thing that people had greps for in various ways in existing code. I have made this point explicit with a comment in a8e382b, and additionally added a match for it to the functional test.

The line will become quite long with the full path too; however if you feel strongly the backwards compatibility issues aren't a concern I can change it.

@ianw ianw force-pushed the path-and-hash branch from a8e382b to 8cfd581 Jun 25, 2019
@cjerdonek
Copy link
Member

@cjerdonek cjerdonek commented Jun 25, 2019

Thank you for review; I'm not sure if you prefer a squashed change or multiple commits, I'm happy to do either.

Thanks! Either way is fine. We tend to squash when committing anyways using the button (unless the proposer has made deliberately clean and crafted commits, which usually requires rebasing after addressing review comments).

Returning an int for the length makes for a better interface for this
new function.
@ianw ianw force-pushed the path-and-hash branch 2 times, most recently from 81762b3 to 45a6415 Jun 25, 2019
This rewords the output to be more like the form of the preceeding
messages.  Additionally the size is added, since we have calculated it
anyway.  The output will now look like:

 Collecting simple==3.0
 Building wheels for collected packages: simple
   Building wheel for simple (setup.py): started
   Building wheel for simple (setup.py): finished with status 'done'
   Created wheel for simple: filename=simple-3.0-py3-none-any.whl size=1138 sha256=2a980a802c9d38a24d29aded2dc2df2b080e58370902e5fdf950090ff67aec10
   Stored in directory: /tmp/pytest-of-iwienand/pytest-0/test_pip_wheel_success0/workspace/scratch
 Successfully built simple

A match for the following line is added to the functional test too.
@ianw ianw force-pushed the path-and-hash branch from 45a6415 to 8d4e17c Jun 25, 2019
@cjerdonek cjerdonek merged commit 0dbab23 into pypa:master Jun 26, 2019
29 checks passed
@cjerdonek
Copy link
Member

@cjerdonek cjerdonek commented Jun 26, 2019

Thanks again, @ianw! 🎉

@lock lock bot added the S: auto-locked label Jul 26, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jul 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants