New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sources: add optional "source-checksum" property #980
sources: add optional "source-checksum" property #980
Conversation
Thanks for your contribution. The timing is a bit unfortunate in that we just finished refactoring that source package. It is unfortunate to not have sha3 in our current version of python which we might just be sticking with for a while. For this reason one of our colleagues is SRUing pysha3 into xenial. Maybe @squidsoup can give us an update on that. |
pysha3 is currently in the proposed queue for zesty (https://launchpad.net/ubuntu/zesty/+queue?queue_state=0&queue_text=pysha3), after which it will be SRU'd to xenial. Unfortunately I can't provide any timeframes at the moment. |
I would be interested to see what this looks like once the conflicts are resolved. Thanks for finishing that up, and sorry for going MIA! |
So what should I do @sergiusens ? Rebase the PR so it can be merged? Thanks a lot to you @tsimonq2 ! |
I'd say yes.
I'd also say yes to this. |
…nonical#619 The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm). The format of the property is <algorithm>/<digest>. For example: sha2/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56 The currently supported algorithms are: md5, sha1, sha2. The size of the digest for sha2 is automatically detected. The support for the sha3 is not implemented, as it requires python 3.6: https://docs.python.org/3.6/whatsnew/3.6.html#hashlib or a library not present yet in the Ubuntu repositories (pysha3). Once ready it will be easy to add it.
8b2254b
to
0cc9f13
Compare
@sergiusens @kyrofa rebase done! Additionally I:
Waiting for your comments! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pachulo! I have a few suggestions, but I like where this is heading.
|
||
Snapcraft will use the digest specified to verify the integrity of the | ||
source. The source-type needs to be a file (tar, zip, deb or rpm) and | ||
the algorithm either md5, sha1, sha2 or sha3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sha3 right now, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed!
snapcraft/internal/sources/_base.py
Outdated
@@ -62,3 +65,35 @@ def download(self): | |||
request.raise_for_status() | |||
|
|||
download_requests_stream(request, self.file) | |||
|
|||
def verify_checksum(source_checksum, checkfile): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an enormous fan of relying on the length to tell us what hash we think it is (a lot of magic numbers), and I don't think it's particularly helpful to our users. I say we just calculate the hash using the digest they told us to use and compare, simple as that. Something like this:
def verify_checksum(source_checksum, checkfile):
try:
digest, checksum = source_checksum.split('/', 1)
catch ValueError:
raise ValueError('invalid checksum format: {!r}'.format(source_checksum))
with open(checkfile, 'rb') as f:
# This will raise a ValueError if digest is unsupported
checksum = hashlib.new(digest, f.read())
digest = checksum.hexdigest()
if digest != source_checksum_digest:
# Make sure this exception has a good error message, printing both digests.
raise errors.ChecksumDoesNotMatchError(digest, source_checksum_digest)
It also immediately expands the number of hashing algorithms we support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is this intended to be a method on FileBase
? It shouldn't be, and, well, it's not (no self
), but then it's used as if it is. It should probably be in sources/__init__.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I didn't really know where to put it, so I put it in FileBase
, but it makes much more sense in sources/__init__.py
Moved and modified the implementation to one similar to your example.
snapcraft/internal/sources/_base.py
Outdated
command=None): | ||
def __init__(self, source, source_dir, source_checksum=None, | ||
source_tag=None, source_commit=None, source_branch=None, | ||
source_depth=None, command=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why source_checksum
is added at the beginning of named arguments instead of the end, where it'd cause a smaller diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really, modified!
snapcraft/internal/sources/errors.py
Outdated
fmt = '{message}' | ||
|
||
def __init__(self, message): | ||
super().__init__(message=message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would do more for you if it accepted an expected and actual digest and printed a decent formatted message on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it makes a lot of sense to do it in the the error init. Modified.
Codecov Report
@@ Coverage Diff @@
## master #980 +/- ##
=========================================
Coverage ? 96.35%
=========================================
Files ? 195
Lines ? 17903
Branches ? 1379
=========================================
Hits ? 17250
Misses ? 441
Partials ? 212
Continue to review full report at Codecov.
|
algorithm, digest = source_checksum.split('/', 1) | ||
|
||
except ValueError: | ||
raise ValueError('invalid checksum format: {!r}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is a little weird through here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified!
@@ -23,3 +23,13 @@ class IncompatibleOptionsError(errors.SnapcraftError): | |||
|
|||
def __init__(self, message): | |||
super().__init__(message=message) | |||
|
|||
|
|||
class DigestDoesNotMatchError(errors.SnapcraftError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SnapcraftError
class is a little smarter than that:
class DigestDoesNotMatchError(errors.SnapcraftError):
fmt = 'Expected the digest for source to be {expected}, but it was {calculated}'
def __init__(self, expected, calculated):
super().__init__(expected=expected, calculated=calculated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't really know how to do it...thanks for the tip!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember writing special exceptions, didn't know we got a new error interface since I initially wrote this, cool!
…chulo/snapcraft into downloaded-files-checksum-bug-1585913
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I'm happy with this, thanks @pachulo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your contribution!
I've left some minor comments. The big issue here is that there are no unit tests.
It needs unit tests for sources that don't allow digest, for sources that allow digest and the digest is right and wrong.
I can guide you with the tests, if you need a hand.
@@ -10,3 +10,39 @@ parts: | |||
source: simple.zip | |||
files: | |||
"*": . | |||
checksum-md5: | |||
plugin: copy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use dump instead of copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -40,3 +40,27 @@ parts: | |||
plugin: tar-content | |||
source: simple.tar.bz2 | |||
destination: destdir1/destdir2 | |||
checksum-md5: | |||
plugin: tar-content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use dump instead of tar-content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -37,6 +37,9 @@ def __init__(self, source, source_dir, source_tag=None, source_commit=None, | |||
raise errors.IncompatibleOptionsError( | |||
'can\'t specify both source-tag and source-commit for ' | |||
'a bzr source') | |||
if source_checksum: | |||
raise errors.IncompatibleOptionsError( | |||
'can\'t specify a source-checksum for a bzr source') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny detail, it's more readable like:
"can't specify a source ..."
It's the only exception where we like double quotes. I know the lines above are not following it, we can fix them later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
snapcraft/internal/sources/errors.py
Outdated
|
||
class DigestDoesNotMatchError(errors.SnapcraftError): | ||
|
||
fmt = 'Expected the digest for source to be {expected}, '\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the \ in there. You can just continue strings in the following line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
Modified with the commented stuff, but I will definitely need guidance with those unit tests @ElOpio ! |
@ElOpio I added the unit tests for the sources that don't allow digests, but I would need help to create the ones for the sources that do admit them: Can you give me an example? |
thanks @pachulo I triggered the cla test to rerun. There are however, some conflicts to be taken care of. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking very good @pachulo.
One small detail, please check that all the files you touched have 2017 in the copyright date at the top.
In order to test the integration for invalid checksums, you could copy checksum-algorithms to checksum-algorithms-invalid, and change one character on each checksum.
Then on test_checksum_algorithms you can use test scenarios to run snapcraft pull {part-name} for each part, and use assertRaises to check that they all fail.
I'm in rocket.ubuntu.com if you need a hand.
@@ -0,0 +1,82 @@ | |||
# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*- | |||
# | |||
# Copyright (C) 2016 Canonical Ltd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update the copyright year.
…chulo/snapcraft into downloaded-files-checksum-bug-1585913
Good night! I think that now everything is in it's place! I'm waiting for your comments (or even approval)! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks very good to me. I just left a few more comments. Thanks!
self.assertRaises(subprocess.CalledProcessError, | ||
self.run_snapcraft, | ||
['pull', 'checksum-sha3-512'], | ||
project_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would look better as scenarios, don't you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know how scenarios look like, but I guess that you are right. Any hint or example to follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grep for scenarios
in the tests directories. You'll find plenty. You need to move this test to it's own test class, and put the parts names in the scenarios list.
Something like:
scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]
Then, during the test will be copied for every part, and self.part will have the value.
class ChecksumAlgorithmsTestCase(integration_tests.TestCase): | ||
|
||
def test_checksum_algorithms(self): | ||
project_dir = 'checksum-algorithms' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could add deb-with-checksum and rpm-with-checksum as scenarios here.
@@ -84,13 +93,19 @@ | |||
from ._subversion import Subversion # noqa | |||
from ._tar import Tar # noqa | |||
from ._zip import Zip # noqa | |||
from . import errors | |||
|
|||
if sys.version_info < (3, 6): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this here? What happens on 3.6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the example from PR#940 snapcraft/file_utils.py: In python 3.6 sha3 support is upstreamed in hashlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. Please leave that as a comment in the code.
open(file_to_zip, 'w').close() | ||
zip_file = zipfile.ZipFile(os.path.join('src', 'test.zip'), 'w') | ||
zip_file.write(file_to_zip) | ||
zip_file.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on this one, there's no need to make a zip file, right?
The error should happen before verify_checksum tries to open the file, so you can just pass a string like 'dummy.zip', and you test the same code path faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to make a file. That was my first attempt, but I found out that the error happens after verify_checksum opens the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And does it need to be a valid zip file, or any file will do?
I see that this is not your code, this is the library following the python rule of ask forgiveness, not permission, so it's ok.
self.assertRaises(subprocess.CalledProcessError, | ||
self.run_snapcraft, | ||
['pull', 'checksum-sha3-512'], | ||
project_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grep for scenarios
in the tests directories. You'll find plenty. You need to move this test to it's own test class, and put the parts names in the scenarios list.
Something like:
scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]
Then, during the test will be copied for every part, and self.part will have the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have nothing else to add here. Thanks @pachulo!
I will trigger the autopkgtests.
\o/ |
Note that the CLA check is timing out (there are too many commits, haha). I think we can ignore it at this point. I'm going to update the branch and trigger the autopkgtests. |
The format of the property is `<algorithm>/<digest>`. For example: parts: with-checksum: plugin: dump source: my/tarball.tar.gz source-checksum: md5/d9210476aac5f367b14e513bdefdee09 LP: #1585913
The format of the property is `<algorithm>/<digest>`. For example: parts: with-checksum: plugin: dump source: my/tarball.tar.gz source-checksum: md5/d9210476aac5f367b14e513bdefdee09 LP: #1585913
Features
The size of sha2 digests is automatically detected, supporting the following bit sizes:224256384512Failsafes
Testing coverage