Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
sources: add optional "source-checksum" property #980
Conversation
|
Thanks for your contribution. The timing is a bit unfortunate in that we just finished refactoring that source package. It is unfortunate to not have sha3 in our current version of python which we might just be sticking with for a while. For this reason one of our colleagues is SRUing pysha3 into xenial. Maybe @squidsoup can give us an update on that. |
|
pysha3 is currently in the proposed queue for zesty (https://launchpad.net/ubuntu/zesty/+queue?queue_state=0&queue_text=pysha3), after which it will be SRU'd to xenial. Unfortunately I can't provide any timeframes at the moment. |
|
I would be interested to see what this looks like once the conflicts are resolved. Thanks for finishing that up, and sorry for going MIA! |
|
So what should I do @sergiusens ? Rebase the PR so it can be merged? Thanks a lot to you @tsimonq2 ! |
I'd say yes.
I'd also say yes to this. |
|
@sergiusens @kyrofa rebase done! Additionally I:
Waiting for your comments! |
kyrofa
requested changes
Dec 22, 2016
Thanks @pachulo! I have a few suggestions, but I like where this is heading.
| + | ||
| + Snapcraft will use the digest specified to verify the integrity of the | ||
| + source. The source-type needs to be a file (tar, zip, deb or rpm) and | ||
| + the algorithm either md5, sha1, sha2 or sha3. |
| - command=None): | ||
| + def __init__(self, source, source_dir, source_checksum=None, | ||
| + source_tag=None, source_commit=None, source_branch=None, | ||
| + source_depth=None, command=None): |
kyrofa
Dec 22, 2016
Member
Is there a reason why source_checksum is added at the beginning of named arguments instead of the end, where it'd cause a smaller diff?
| @@ -62,3 +65,35 @@ def download(self): | ||
| request.raise_for_status() | ||
| download_requests_stream(request, self.file) | ||
| + | ||
| + def verify_checksum(source_checksum, checkfile): |
kyrofa
Dec 22, 2016
•
Member
I'm not an enormous fan of relying on the length to tell us what hash we think it is (a lot of magic numbers), and I don't think it's particularly helpful to our users. I say we just calculate the hash using the digest they told us to use and compare, simple as that. Something like this:
def verify_checksum(source_checksum, checkfile):
try:
digest, checksum = source_checksum.split('/', 1)
catch ValueError:
raise ValueError('invalid checksum format: {!r}'.format(source_checksum))
with open(checkfile, 'rb') as f:
# This will raise a ValueError if digest is unsupported
checksum = hashlib.new(digest, f.read())
digest = checksum.hexdigest()
if digest != source_checksum_digest:
# Make sure this exception has a good error message, printing both digests.
raise errors.ChecksumDoesNotMatchError(digest, source_checksum_digest)It also immediately expands the number of hashing algorithms we support.
kyrofa
Dec 22, 2016
Member
Also, is this intended to be a method on FileBase? It shouldn't be, and, well, it's not (no self), but then it's used as if it is. It should probably be in sources/__init__.py.
pachulo
Dec 23, 2016
Contributor
No, I didn't really know where to put it, so I put it in FileBase, but it makes much more sense in sources/__init__.py
Moved and modified the implementation to one similar to your example.
| + fmt = '{message}' | ||
| + | ||
| + def __init__(self, message): | ||
| + super().__init__(message=message) |
kyrofa
Dec 22, 2016
Member
This would do more for you if it accepted an expected and actual digest and printed a decent formatted message on its own.
pachulo
Dec 23, 2016
•
Contributor
Yeah, it makes a lot of sense to do it in the the error init. Modified.
pachulo
added some commits
Dec 23, 2016
codecov-io
commented
Dec 23, 2016
•
Codecov Report
@@ Coverage Diff @@
## master #980 +/- ##
=========================================
Coverage ? 96.35%
=========================================
Files ? 195
Lines ? 17903
Branches ? 1379
=========================================
Hits ? 17250
Misses ? 441
Partials ? 212
Continue to review full report at Codecov.
|
sergiusens
changed the title from
Add optional "source-checksum" property
to
sources: add optional "source-checksum" property
Jan 3, 2017
| + algorithm, digest = source_checksum.split('/', 1) | ||
| + | ||
| + except ValueError: | ||
| + raise ValueError('invalid checksum format: {!r}' |
| @@ -23,3 +23,13 @@ class IncompatibleOptionsError(errors.SnapcraftError): | ||
| def __init__(self, message): | ||
| super().__init__(message=message) | ||
| + | ||
| + | ||
| +class DigestDoesNotMatchError(errors.SnapcraftError): |
kyrofa
Jan 3, 2017
Member
The SnapcraftError class is a little smarter than that:
class DigestDoesNotMatchError(errors.SnapcraftError):
fmt = 'Expected the digest for source to be {expected}, but it was {calculated}'
def __init__(self, expected, calculated):
super().__init__(expected=expected, calculated=calculated)
tsimonq2
Jan 20, 2017
Contributor
I remember writing special exceptions, didn't know we got a new error interface since I initially wrote this, cool!
pachulo
added some commits
Jan 4, 2017
kyrofa
approved these changes
Jan 6, 2017
Alright I'm happy with this, thanks @pachulo!
elopio
requested changes
Jan 11, 2017
Thanks a lot for your contribution!
I've left some minor comments. The big issue here is that there are no unit tests.
It needs unit tests for sources that don't allow digest, for sources that allow digest and the digest is right and wrong.
I can guide you with the tests, if you need a hand.
| @@ -40,3 +40,27 @@ parts: | ||
| plugin: tar-content | ||
| source: simple.tar.bz2 | ||
| destination: destdir1/destdir2 | ||
| + checksum-md5: | ||
| + plugin: tar-content |
| @@ -10,3 +10,39 @@ parts: | ||
| source: simple.zip | ||
| files: | ||
| "*": . | ||
| + checksum-md5: | ||
| + plugin: copy |
| @@ -37,6 +37,9 @@ def __init__(self, source, source_dir, source_tag=None, source_commit=None, | ||
| raise errors.IncompatibleOptionsError( | ||
| 'can\'t specify both source-tag and source-commit for ' | ||
| 'a bzr source') | ||
| + if source_checksum: | ||
| + raise errors.IncompatibleOptionsError( | ||
| + 'can\'t specify a source-checksum for a bzr source') |
elopio
Jan 11, 2017
Member
tiny detail, it's more readable like:
"can't specify a source ..."
It's the only exception where we like double quotes. I know the lines above are not following it, we can fix them later.
| + | ||
| +class DigestDoesNotMatchError(errors.SnapcraftError): | ||
| + | ||
| + fmt = 'Expected the digest for source to be {expected}, '\ |
elopio
Jan 11, 2017
Member
no need for the \ in there. You can just continue strings in the following line.
pachulo
added some commits
Jan 12, 2017
|
Modified with the commented stuff, but I will definitely need guidance with those unit tests @elopio ! |
pachulo
added some commits
Jan 18, 2017
|
@elopio I added the unit tests for the sources that don't allow digests, but I would need help to create the ones for the sources that do admit them: Can you give me an example? |
pachulo
added some commits
Jan 26, 2017
|
@elopio I added most of the unit tests for the sources that allow digests in snapcraft/tests/sources/test_checksum.py: I think that this is what you asked for, but I'm not really sure... Besides that, I would need help to create the unit test for a deb with a correct checksum: if I understood well, I need a real deb file for this one, but I don't really know how to create a dummy one from python. |
| + | ||
| +from snapcraft import tests | ||
| + | ||
| +#from snapcraft.tests.sources import SourceTestCase |
|
nice work @pachulo! Your tests are correct and nicely coded, but now lets take into account the integration tests so we don't have the same tests twice at different levels. Please, let me know what do you think about this proposal. Make a new snap in integration_tests/snaps/checksum-algorithms, and move there all the checksums you have added to simple-tar, and add a test integration_tests/test_checksum_algorithms.py and run snapcraft pull in there. Then simple tar can keep simple, and it will be easier to find the checksum tests. This integration test will be checking the pull step with all the combinations of algorithms + valid/invalid. This way the integration tests cover the pull step. In the unit tests, we can have a few that directly call verify_checksum. One for correct, one for incorrect, one for invalid algorithm. After that, the codecov report will tell us which lines we are missing. I think we will only be missing that when a source is instantiated with source_checksum, the provision step will call verify_checksum. But as that is already covered by the integration tests, we can do use mock.patch to inject a fake instead of the real verification. As I said above, this is just a proposal. Feel free to disagree or to deviated. |
|
pysha3 is now in the repos, can you re add the support? |
|
Can you remove |
|
Hi @elopio ! I think that what you say makes a lot of sense! I've made the first refactor and I have some comments/doubts:
I think that right now the integration tests only check for algorithm+valid, am I wrong?
As for now only the unit test for invalid algorithm is working, I'm still working on it... I'm getting a weird error for the one with the correct checksum:
|
@sergiusens I think that after the changes proposed by @kyrofa a month ago, I will only need to change the documentation and the tests after this PR is accepted: #940 |
pachulo
added some commits
Feb 1, 2017
|
@elopio I think that all tests are working and that they (mostly) adhere to what you proposed!
If so, how should I create an integration test to check algorithm+invalid? |
|
@pachulo I am not sure how this has not come up earlier, but we require a CLA to be signed. Have you done so? I ask as the only check that fails is the one to check the CLA |
|
@sergiusens I thought that I did, but I did it again just in case! |
|
thanks @pachulo I triggered the cla test to rerun. There are however, some conflicts to be taken care of. |
elopio
requested changes
Feb 9, 2017
This is looking very good @pachulo.
One small detail, please check that all the files you touched have 2017 in the copyright date at the top.
In order to test the integration for invalid checksums, you could copy checksum-algorithms to checksum-algorithms-invalid, and change one character on each checksum.
Then on test_checksum_algorithms you can use test scenarios to run snapcraft pull {part-name} for each part, and use assertRaises to check that they all fail.
I'm in rocket.ubuntu.com if you need a hand.
| @@ -0,0 +1,82 @@ | ||
| +# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*- | ||
| +# | ||
| +# Copyright (C) 2016 Canonical Ltd |
pachulo
added some commits
Feb 10, 2017
|
Good night! I think that now everything is in it's place! I'm waiting for your comments (or even approval)! |
pachulo
added some commits
Feb 12, 2017
elopio
requested changes
Feb 14, 2017
looks very good to me. I just left a few more comments. Thanks!
| +class ChecksumAlgorithmsTestCase(integration_tests.TestCase): | ||
| + | ||
| + def test_checksum_algorithms(self): | ||
| + project_dir = 'checksum-algorithms' |
| + self.assertRaises(subprocess.CalledProcessError, | ||
| + self.run_snapcraft, | ||
| + ['pull', 'checksum-sha3-512'], | ||
| + project_dir) |
pachulo
Feb 19, 2017
•
Contributor
I don't really know how scenarios look like, but I guess that you are right. Any hint or example to follow?
elopio
Feb 19, 2017
Member
grep for scenarios in the tests directories. You'll find plenty. You need to move this test to it's own test class, and put the parts names in the scenarios list.
Something like:
scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]
Then, during the test will be copied for every part, and self.part will have the value.
| @@ -84,13 +93,19 @@ | ||
| from ._subversion import Subversion # noqa | ||
| from ._tar import Tar # noqa | ||
| from ._zip import Zip # noqa | ||
| +from . import errors | ||
| + | ||
| +if sys.version_info < (3, 6): |
pachulo
Feb 19, 2017
Contributor
I took the example from PR#940 snapcraft/file_utils.py: In python 3.6 sha3 support is upstreamed in hashlib.
| + open(file_to_zip, 'w').close() | ||
| + zip_file = zipfile.ZipFile(os.path.join('src', 'test.zip'), 'w') | ||
| + zip_file.write(file_to_zip) | ||
| + zip_file.close() |
elopio
Feb 14, 2017
Member
on this one, there's no need to make a zip file, right?
The error should happen before verify_checksum tries to open the file, so you can just pass a string like 'dummy.zip', and you test the same code path faster.
pachulo
Feb 19, 2017
Contributor
Yes, we need to make a file. That was my first attempt, but I found out that the error happens after verify_checksum opens the file.
elopio
Feb 19, 2017
Member
And does it need to be a valid zip file, or any file will do?
I see that this is not your code, this is the library following the python rule of ask forgiveness, not permission, so it's ok.
| + self.assertRaises(subprocess.CalledProcessError, | ||
| + self.run_snapcraft, | ||
| + ['pull', 'checksum-sha3-512'], | ||
| + project_dir) |
pachulo
Feb 19, 2017
•
Contributor
I don't really know how scenarios look like, but I guess that you are right. Any hint or example to follow?
elopio
Feb 19, 2017
Member
grep for scenarios in the tests directories. You'll find plenty. You need to move this test to it's own test class, and put the parts names in the scenarios list.
Something like:
scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]
Then, during the test will be copied for every part, and self.part will have the value.
pachulo
added some commits
Feb 28, 2017
elopio
approved these changes
Mar 1, 2017
I have nothing else to add here. Thanks @pachulo!
I will trigger the autopkgtests.
|
\o/ |
pachulo
added some commits
Mar 4, 2017
|
Note that the CLA check is timing out (there are too many commits, haha). I think we can ignore it at this point. I'm going to update the branch and trigger the autopkgtests. |
pachulo commentedDec 15, 2016
•
Edited 1 time
-
pachulo
Dec 23, 2016
Features
The size of sha2 digests is automatically detected, supporting the following bit sizes:224256384512Failsafes
Testing coverage