Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SHA256 hash of .whl as info output #5908

Merged
merged 5 commits into from Jun 26, 2019

Conversation

@ianw
Copy link
Contributor

commented Oct 22, 2018

Currently I'm trying to debug some issues with what appear to be
corrupt wheels. It would be very useful to see what pip thought the
state of things was as it wrote the wheel output; if a final corrupt
distributed file is then different to what pip has saved in its build
logs, you know the problem is somewhere after pip but before
distribution.

Currently we get a log of the initial creation, then the stamp when it
gets moved in the final output location, e.g.:

 creating '/tmp/pip-wheel-71CpBe/foo-1.2.3-py2.py3-none-any.whl
 ...
 Stored in directory: /opt/wheel/workspace

A lot happens in between this, so my suggestion is we add the final
output file and it's hash before the "Stored in directory:", e.g. you
now see:

 Building wheels for collected packages: simple
   Running setup.py bdist_wheel for simple: started
   Running setup.py bdist_wheel for simple: finished with status 'done'
   Finished: simple-3.0-py3-none-any.whl sha256=39005a57a6327972575072af82e11d0817439fe6a069381f6f2a123a8c0bf1cf
   Stored in directory: /tmp/pytest-of-iwienand/pytest-18/test_pip_wheel_success0/workspace/scratch
 Successfully built simple

Despite the hash being fairly important for things like
--require-hashes, AFAICS the final hash is not put in the logs at all
currently, so I think this is generically helpful.

@ianw ianw force-pushed the ianw:path-and-hash branch 3 times, most recently from bb8f03d to 8be73da Oct 22, 2018

@ianw ianw changed the title Add more info to built wheel output Add SHA256 hash of .whl as info output Oct 23, 2018

@ianw ianw force-pushed the ianw:path-and-hash branch from 8be73da to 247f80d Oct 23, 2018

@cjerdonek cjerdonek added this to the Print Better Error Messages milestone Oct 24, 2018

@BrownTruck

This comment has been minimized.

Copy link
Contributor

commented Dec 16, 2018

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@ianw ianw force-pushed the ianw:path-and-hash branch from 247f80d to 929a6b8 Dec 17, 2018

@ianw ianw force-pushed the ianw:path-and-hash branch from 929a6b8 to 8d0c708 Dec 17, 2018

@BrownTruck

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2019

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@ianw ianw force-pushed the ianw:path-and-hash branch from 8d0c708 to 45a0faa Feb 4, 2019

@BrownTruck

This comment has been minimized.

Copy link
Contributor

commented Feb 5, 2019

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@ianw ianw force-pushed the ianw:path-and-hash branch from 45a0faa to b995b1e Feb 7, 2019

Add SHA256 hash of .whl as info output
Currently I'm trying to debug some issues with what appear to be
corrupt wheels.  It would be very useful to see what pip thought the
state of things was as it wrote the wheel output; if a final corrupt
distributed file is then different to what pip has saved in its build
logs, you know the problem is somewhere after pip but before
distribution.

Currently we get a log of the initial creation, then the stamp when it
gets moved in the final output location, e.g.:

 creating '/tmp/pip-wheel-71CpBe/foo-1.2.3-py2.py3-none-any.whl
 ...
 Stored in directory: /opt/wheel/workspace

A lot happens in between this, so my suggestion is we add the final
output file and it's hash before the "Stored in directory:", e.g. you
now see:

 Building wheels for collected packages: simple
   Running setup.py bdist_wheel for simple: started
   Running setup.py bdist_wheel for simple: finished with status 'done'
   Finished: simple-3.0-py3-none-any.whl sha256=39005a57a6327972575072af82e11d0817439fe6a069381f6f2a123a8c0bf1cf
   Stored in directory: /tmp/pytest-of-iwienand/pytest-18/test_pip_wheel_success0/workspace/scratch
 Successfully built simple

Despite the hash being fairly important for things like
--require-hashes, AFAICS the final hash is not put in the logs at all
currently, so I think this is generically helpful.

@ianw ianw force-pushed the ianw:path-and-hash branch from b995b1e to 4e8b6e9 May 6, 2019

@ianw

This comment has been minimized.

Copy link
Contributor Author

commented May 6, 2019

Any chance I could get a review of this? Either way? We keep a local copy of pip going to have this in it, which I would certainly rather dispense with if possible.

@cjerdonek
Copy link
Member

left a comment

Thanks for this PR, and thanks for your great patience. It would be great to get this into the next release (next month). I'll add it to the 19.2 milestone now.

@@ -0,0 +1,2 @@
Wheel builds will output the final filename and SHA256 hash of .whl

This comment has been minimized.

Copy link
@cjerdonek

cjerdonek Jun 22, 2019

Member

You should use the imperative tone (see https://pip.pypa.io/en/stable/development/contributing/#news-entries ). Maybe something like this?

Log the final filename and SHA256 hash of a .whl file when done building a wheel.

This comment has been minimized.

Copy link
@ianw

ianw Jun 25, 2019

Author Contributor

I have addressed this with 09e3184

src/pip/_internal/wheel.py Show resolved Hide resolved
@@ -885,7 +892,10 @@ def _build_one_inside_env(self, req, output_dir, python_tag=None):
wheel_name = os.path.basename(wheel_path)
dest_path = os.path.join(output_dir, wheel_name)
try:
wheel_hash, _ = hash_file(wheel_path)

This comment has been minimized.

Copy link
@cjerdonek

cjerdonek Jun 22, 2019

Member

I think it would be helpful for readers of the code to give a name to the second variable in the return value. You could add a code comment above the line saying, "We don't need to use the second value." if you would like to communicate that.

This comment has been minimized.

Copy link
@xavfernandez

xavfernandez Jun 24, 2019

Contributor

_length should do, or also output the file size :)

This comment has been minimized.

Copy link
@ianw

ianw Jun 25, 2019

Author Contributor

I have put the length into the output string with a8e382b, so this value is now used

shutil.move(wheel_path, dest_path)
logger.info('Finished: %s sha256=%s',
wheel_name, wheel_hash.hexdigest())

This comment has been minimized.

Copy link
@cjerdonek

cjerdonek Jun 22, 2019

Member

The other log messages in your post have the form "Running setup.py bdist_wheel for simple: finished ..." So maybe this could be "Finished wheel for <name>: ..."

This comment has been minimized.

Copy link
@cjerdonek

cjerdonek Jun 22, 2019

Member

Also, this info() message can be merged with the one right after. Maybe the directory path can go on a separate line like it is now.

This comment has been minimized.

Copy link
@ianw

ianw Jun 25, 2019

Author Contributor

I agree on the format and have changed that with a8e382b

I did leave the following info message (Stored in directory:) somewhat on purpose, because I felt like it might be the type of thing that people had greps for in various ways in existing code. I have made this point explicit with a comment in a8e382b, and additionally added a match for it to the functional test.

The line will become quite long with the full path too; however if you feel strongly the backwards compatibility issues aren't a concern I can change it.

This comment has been minimized.

Copy link
@cjerdonek

cjerdonek Jun 25, 2019

Member

I guess I don't feel strongly about merging the log messages. However, I'd rather not create any expectation among users (or future maintainers for that matter) that log messages won't or shouldn't change. One reason is that I think we should be open to improving these messages when we can as a matter of UX. Thus, my preference would be for the comment not to be there.

This comment has been minimized.

Copy link
@cjerdonek

cjerdonek Jun 25, 2019

Member

The line will become quite long with the full path too;

By the way, to clarify, my suggestion was for the second log message to be in the same log message as the first but still be on a separate line (i.e. a newline character would go in the message). Thus, the lengths of lines needn't increase, and in fact the same greps could even work if the text was kept the same.

This comment has been minimized.

Copy link
@ianw

ianw Jun 25, 2019

Author Contributor

Thus, my preference would be for the comment not to be there.

Removed with 45a6415

By the way, to clarify, my suggestion was for the second log message to be in the same log message as the first but still be on a separate line (i.e. a newline character would go in the message).

Ahh ... yes I wasn't sure about multi-line log strings. This probably comes from some ancient history of mine where I was dealing with syslog() and /dev/msg a lot where multi-line strings are not handled well at all :) Again if you feel strongly happy to convert it to a suggested format

"""Return (hash, length) for path using hashlib.sha256()"""
h = hashlib.sha256()
length = 0
with open(path, 'rb') as f:
for block in read_chunks(f, size=blocksize):
length += len(block)
h.update(block)
return (h, str(length)) # type: ignore

This comment has been minimized.

Copy link
@xavfernandez

xavfernandez Jun 24, 2019

Contributor

This could return the length as int since rehash uses str(length) and _build_one_inside_env ignores it

This comment has been minimized.

Copy link
@cjerdonek

cjerdonek Jun 24, 2019

Member

Good observation. For that matter (but not for this PR), rehash() should probably also be made to return an int! (And the caller can convert to a string as needed.)

This comment has been minimized.

Copy link
@ianw

ianw Jun 25, 2019

Author Contributor

I have made hash_file return a int with 53c38d8. I will leave rehash() for now to avoid further confusion :)

@@ -885,7 +892,10 @@ def _build_one_inside_env(self, req, output_dir, python_tag=None):
wheel_name = os.path.basename(wheel_path)
dest_path = os.path.join(output_dir, wheel_name)
try:
wheel_hash, _ = hash_file(wheel_path)

This comment has been minimized.

Copy link
@xavfernandez

xavfernandez Jun 24, 2019

Contributor

_length should do, or also output the file size :)

@ianw
Copy link
Contributor Author

left a comment

Thank you for review; I'm not sure if you prefer a squashed change or multiple commits, I'm happy to do either.

shutil.move(wheel_path, dest_path)
logger.info('Finished: %s sha256=%s',
wheel_name, wheel_hash.hexdigest())

This comment has been minimized.

Copy link
@ianw

ianw Jun 25, 2019

Author Contributor

I agree on the format and have changed that with a8e382b

I did leave the following info message (Stored in directory:) somewhat on purpose, because I felt like it might be the type of thing that people had greps for in various ways in existing code. I have made this point explicit with a comment in a8e382b, and additionally added a match for it to the functional test.

The line will become quite long with the full path too; however if you feel strongly the backwards compatibility issues aren't a concern I can change it.

@ianw ianw force-pushed the ianw:path-and-hash branch from a8e382b to 8cfd581 Jun 25, 2019

@cjerdonek

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

Thank you for review; I'm not sure if you prefer a squashed change or multiple commits, I'm happy to do either.

Thanks! Either way is fine. We tend to squash when committing anyways using the button (unless the proposer has made deliberately clean and crafted commits, which usually requires rebasing after addressing review comments).

Make hash_file return a int length
Returning an int for the length makes for a better interface for this
new function.

@ianw ianw force-pushed the ianw:path-and-hash branch 2 times, most recently from 81762b3 to 45a6415 Jun 25, 2019

Reword wheel hash details output
This rewords the output to be more like the form of the preceeding
messages.  Additionally the size is added, since we have calculated it
anyway.  The output will now look like:

 Collecting simple==3.0
 Building wheels for collected packages: simple
   Building wheel for simple (setup.py): started
   Building wheel for simple (setup.py): finished with status 'done'
   Created wheel for simple: filename=simple-3.0-py3-none-any.whl size=1138 sha256=2a980a802c9d38a24d29aded2dc2df2b080e58370902e5fdf950090ff67aec10
   Stored in directory: /tmp/pytest-of-iwienand/pytest-0/test_pip_wheel_success0/workspace/scratch
 Successfully built simple

A match for the following line is added to the functional test too.

@ianw ianw force-pushed the ianw:path-and-hash branch from 45a6415 to 8d4e17c Jun 25, 2019

@cjerdonek cjerdonek merged commit 0dbab23 into pypa:master Jun 26, 2019

29 checks passed

Linux Build #20190625.9 succeeded
Details
Linux (Package) Package succeeded
Details
Linux (Test Primary Python27) Test Primary Python27 succeeded
Details
Linux (Test Primary Python36) Test Primary Python36 succeeded
Details
Linux (Test Secondary Python34) Test Secondary Python34 succeeded
Details
Linux (Test Secondary Python35) Test Secondary Python35 succeeded
Details
Linux (Test Secondary Python37) Test Secondary Python37 succeeded
Details
Windows Build #20190625.9 succeeded
Details
Windows (Package) Package succeeded
Details
Windows (Test Primary Python27-x64) Test Primary Python27-x64 succeeded
Details
Windows (Test Primary Python36-x64) Test Primary Python36-x64 succeeded
Details
Windows (Test Secondary Python27-x86) Test Secondary Python27-x86 succeeded
Details
Windows (Test Secondary Python34-x64) Test Secondary Python34-x64 succeeded
Details
Windows (Test Secondary Python34-x86) Test Secondary Python34-x86 succeeded
Details
Windows (Test Secondary Python35-x64) Test Secondary Python35-x64 succeeded
Details
Windows (Test Secondary Python35-x86) Test Secondary Python35-x86 succeeded
Details
Windows (Test Secondary Python36-x86) Test Secondary Python36-x86 succeeded
Details
Windows (Test Secondary Python37-x64) Test Secondary Python37-x64 succeeded
Details
Windows (Test Secondary Python37-x86) Test Secondary Python37-x86 succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
macOS Build #20190625.9 succeeded
Details
macOS (Package) Package succeeded
Details
macOS (Test Primary Python27) Test Primary Python27 succeeded
Details
macOS (Test Primary Python36) Test Primary Python36 succeeded
Details
macOS (Test Secondary Python34) Test Secondary Python34 succeeded
Details
macOS (Test Secondary Python35) Test Secondary Python35 succeeded
Details
macOS (Test Secondary Python37) Test Secondary Python37 succeeded
Details
news-file/pr News files updated and/or change is trivial.
Details
@cjerdonek

This comment has been minimized.

Copy link
Member

commented Jun 26, 2019

Thanks again, @ianw! 🎉

@lock lock bot added the S: auto-locked label Jul 26, 2019

@lock lock bot locked as resolved and limited conversation to collaborators Jul 26, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
5 participants
You can’t perform that action at this time.