sources: add optional "source-checksum" property #980

Merged
merged 40 commits into from Mar 8, 2017

Conversation

Projects
None yet
7 participants
Contributor

pachulo commented Dec 15, 2016

Features

  • Solution for LP: #1585913 bug based on work done by @tsimonq2 in PR #619 but with support only for checksums specified in the snapcraft.yml file.
  • The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm).
  • The format of the property is <algorithm>/<digest>. For example:
    • sha256/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56
  • Currently, the supported algorithms are: md5, sha1, sha224, sha256, sha384, sha512, sha3_256, sha3_384 & sha3_512
  • The sha3 support comes from pysha3 in systems where python version < 3.6.
  • The size of sha2 digests is automatically detected, supporting the following bit sizes:
    • 224
    • 256
    • 384
    • 512

Failsafes

  • It has the following failsafes to ensure the code always works as intended:
    • Special exception for a checksum that doesn't match
    • Raises an exception for a checksum that is invalid, or incompatible with the checksums supported

Testing coverage

  • I have the following testing in place:
    • Integration tests:
      • Testing of each supported algorithm in raw form with a created file (tar) that has a static checksum in the snapcraft.yaml file.
      • A single test in simple-zip.
      • Tests deb-with-checksum and rpm-with-checksum created.
    • Unit tests:
      • Testing for sources that don't allow digest.
      • Testing for sources that allow digest and where the digest:
        • is correct
        • is wrong
        • uses an invalid algorithm
Collaborator

sergiusens commented Dec 15, 2016

Thanks for your contribution. The timing is a bit unfortunate in that we just finished refactoring that source package. It is unfortunate to not have sha3 in our current version of python which we might just be sticking with for a while. For this reason one of our colleagues is SRUing pysha3 into xenial. Maybe @squidsoup can give us an update on that.

Contributor

squidsoup commented Dec 15, 2016

pysha3 is currently in the proposed queue for zesty (https://launchpad.net/ubuntu/zesty/+queue?queue_state=0&queue_text=pysha3), after which it will be SRU'd to xenial. Unfortunately I can't provide any timeframes at the moment.

Contributor

tsimonq2 commented Dec 16, 2016

I would be interested to see what this looks like once the conflicts are resolved.

Thanks for finishing that up, and sorry for going MIA!

Contributor

pachulo commented Dec 16, 2016

So what should I do @sergiusens ? Rebase the PR so it can be merged?
Can we go on without the SHA3 support for now and add it when the library is available in the repos?
I think that this feature is a good one to have in snapcraft as soon as possible.

Thanks a lot to you @tsimonq2 !

Member

kyrofa commented Dec 21, 2016

So what should I do @sergiusens ? Rebase the PR so it can be merged?

I'd say yes.

Can we go on without the SHA3 support for now and add it when the library is available in the repos?

I'd also say yes to this.

Solution for LP: #1585913 bug based on work done by tsimonq2 in PR #619
The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm).
The format of the property is <algorithm>/<digest>. For example:
sha2/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56
The currently supported algorithms are: md5, sha1, sha2.
The size of the digest for sha2 is automatically detected.
The support for the sha3 is not implemented, as it requires python 3.6: https://docs.python.org/3.6/whatsnew/3.6.html#hashlib or a library not present yet in the Ubuntu repositories (pysha3). Once ready it will be easy to add it.
Contributor

pachulo commented Dec 22, 2016

@sergiusens @kyrofa rebase done!

Additionally I:

  • Tried to put everything in it's place: the new error handle, the function that verifies the checksum, etc.
  • Deleted the code to support sha3 digests (instead of leaving it commented): once the library is added to the repos, it should be a breeze to add it.

Waiting for your comments!

Thanks @pachulo! I have a few suggestions, but I like where this is heading.

snapcraft/internal/sources/__init__.py
+
+ Snapcraft will use the digest specified to verify the integrity of the
+ source. The source-type needs to be a file (tar, zip, deb or rpm) and
+ the algorithm either md5, sha1, sha2 or sha3.
@kyrofa

kyrofa Dec 22, 2016

Member

Not sha3 right now, correct?

@pachulo

pachulo Dec 23, 2016

Contributor

Yes indeed!

snapcraft/internal/sources/_base.py
- command=None):
+ def __init__(self, source, source_dir, source_checksum=None,
+ source_tag=None, source_commit=None, source_branch=None,
+ source_depth=None, command=None):
@kyrofa

kyrofa Dec 22, 2016

Member

Is there a reason why source_checksum is added at the beginning of named arguments instead of the end, where it'd cause a smaller diff?

@pachulo

pachulo Dec 23, 2016

Contributor

not really, modified!

snapcraft/internal/sources/_base.py
@@ -62,3 +65,35 @@ def download(self):
request.raise_for_status()
download_requests_stream(request, self.file)
+
+ def verify_checksum(source_checksum, checkfile):
@kyrofa

kyrofa Dec 22, 2016

Member

I'm not an enormous fan of relying on the length to tell us what hash we think it is (a lot of magic numbers), and I don't think it's particularly helpful to our users. I say we just calculate the hash using the digest they told us to use and compare, simple as that. Something like this:

def verify_checksum(source_checksum, checkfile):
    try:
        digest, checksum = source_checksum.split('/', 1)
    catch ValueError:
        raise ValueError('invalid checksum format: {!r}'.format(source_checksum))

    with open(checkfile, 'rb') as f:
        # This will raise a ValueError if digest is unsupported
        checksum = hashlib.new(digest, f.read())

    digest = checksum.hexdigest()
    if digest != source_checksum_digest:
        # Make sure this exception has a good error message, printing both digests.
        raise errors.ChecksumDoesNotMatchError(digest, source_checksum_digest)

It also immediately expands the number of hashing algorithms we support.

@kyrofa

kyrofa Dec 22, 2016

Member

Also, is this intended to be a method on FileBase? It shouldn't be, and, well, it's not (no self), but then it's used as if it is. It should probably be in sources/__init__.py.

@pachulo

pachulo Dec 23, 2016

Contributor

No, I didn't really know where to put it, so I put it in FileBase, but it makes much more sense in sources/__init__.py
Moved and modified the implementation to one similar to your example.

snapcraft/internal/sources/errors.py
+ fmt = '{message}'
+
+ def __init__(self, message):
+ super().__init__(message=message)
@kyrofa

kyrofa Dec 22, 2016

Member

This would do more for you if it accepted an expected and actual digest and printed a decent formatted message on its own.

@pachulo

pachulo Dec 23, 2016

Contributor

Yeah, it makes a lot of sense to do it in the the error init. Modified.

pachulo added some commits Dec 23, 2016

codecov-io commented Dec 23, 2016

Codecov Report

❗️ No coverage uploaded for pull request base (master@2426177). Click here to learn what that means.
The diff coverage is 89.13%.

@@            Coverage Diff            @@
##             master     #980   +/-   ##
=========================================
  Coverage          ?   96.35%           
=========================================
  Files             ?      195           
  Lines             ?    17903           
  Branches          ?     1379           
=========================================
  Hits              ?    17250           
  Misses            ?      441           
  Partials          ?      212
Impacted Files Coverage Δ
snapcraft/tests/init.py 90.74% <ø> (ø)
snapcraft/internal/pluginhandler/init.py 92.12% <ø> (ø)
snapcraft/internal/sources/_subversion.py 100% <100%> (ø)
snapcraft/internal/sources/_base.py 100% <100%> (ø)
snapcraft/internal/sources/_bazaar.py 100% <100%> (ø)
snapcraft/tests/sources/test_git.py 100% <100%> (ø)
snapcraft/tests/sources/test_subversion.py 100% <100%> (ø)
snapcraft/internal/sources/_mercurial.py 100% <100%> (ø)
snapcraft/internal/sources/errors.py 100% <100%> (ø)
snapcraft/tests/sources/test_mercurial.py 100% <100%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2426177...fbf2812. Read the comment docs.

@sergiusens sergiusens changed the title from Add optional "source-checksum" property to sources: add optional "source-checksum" property Jan 3, 2017

snapcraft/internal/sources/__init__.py
+ algorithm, digest = source_checksum.split('/', 1)
+
+ except ValueError:
+ raise ValueError('invalid checksum format: {!r}'
@kyrofa

kyrofa Jan 3, 2017

Member

Indentation is a little weird through here.

@pachulo

pachulo Jan 4, 2017

Contributor

Modified!

@@ -23,3 +23,13 @@ class IncompatibleOptionsError(errors.SnapcraftError):
def __init__(self, message):
super().__init__(message=message)
+
+
+class DigestDoesNotMatchError(errors.SnapcraftError):
@kyrofa

kyrofa Jan 3, 2017

Member

The SnapcraftError class is a little smarter than that:

class DigestDoesNotMatchError(errors.SnapcraftError):
    fmt = 'Expected the digest for source to be {expected}, but it was {calculated}'

    def __init__(self, expected, calculated):
        super().__init__(expected=expected, calculated=calculated)
@pachulo

pachulo Jan 4, 2017

Contributor

Didn't really know how to do it...thanks for the tip!

@tsimonq2

tsimonq2 Jan 20, 2017

Contributor

I remember writing special exceptions, didn't know we got a new error interface since I initially wrote this, cool!

pachulo added some commits Jan 4, 2017

Merge branch 'downloaded-files-checksum-bug-1585913' of github.com:pa…
…chulo/snapcraft into downloaded-files-checksum-bug-1585913

kyrofa approved these changes Jan 6, 2017

Alright I'm happy with this, thanks @pachulo!

Thanks a lot for your contribution!
I've left some minor comments. The big issue here is that there are no unit tests.
It needs unit tests for sources that don't allow digest, for sources that allow digest and the digest is right and wrong.

I can guide you with the tests, if you need a hand.

@@ -40,3 +40,27 @@ parts:
plugin: tar-content
source: simple.tar.bz2
destination: destdir1/destdir2
+ checksum-md5:
+ plugin: tar-content
@elopio

elopio Jan 11, 2017

Member

please use dump instead of tar-content

@pachulo

pachulo Jan 12, 2017

Contributor

Done!

@@ -10,3 +10,39 @@ parts:
source: simple.zip
files:
"*": .
+ checksum-md5:
+ plugin: copy
@elopio

elopio Jan 11, 2017

Member

please use dump instead of copy

@pachulo

pachulo Jan 12, 2017

Contributor

Done!

snapcraft/internal/sources/_bazaar.py
@@ -37,6 +37,9 @@ def __init__(self, source, source_dir, source_tag=None, source_commit=None,
raise errors.IncompatibleOptionsError(
'can\'t specify both source-tag and source-commit for '
'a bzr source')
+ if source_checksum:
+ raise errors.IncompatibleOptionsError(
+ 'can\'t specify a source-checksum for a bzr source')
@elopio

elopio Jan 11, 2017

Member

tiny detail, it's more readable like:
"can't specify a source ..."
It's the only exception where we like double quotes. I know the lines above are not following it, we can fix them later.

@pachulo

pachulo Jan 12, 2017

Contributor

OK!

snapcraft/internal/sources/errors.py
+
+class DigestDoesNotMatchError(errors.SnapcraftError):
+
+ fmt = 'Expected the digest for source to be {expected}, '\
@elopio

elopio Jan 11, 2017

Member

no need for the \ in there. You can just continue strings in the following line.

@pachulo

pachulo Jan 12, 2017

Contributor

OK!

pachulo added some commits Jan 12, 2017

Contributor

pachulo commented Jan 12, 2017

Modified with the commented stuff, but I will definitely need guidance with those unit tests @elopio !

pachulo added some commits Jan 18, 2017

Contributor

pachulo commented Jan 19, 2017

@elopio I added the unit tests for the sources that don't allow digests, but I would need help to create the ones for the sources that do admit them: Can you give me an example?

pachulo added some commits Jan 26, 2017

Add unit tests for almost all sources that allow digests (unit test f…
…or deb with correct checksum still missing).
Contributor

pachulo commented Jan 31, 2017

@elopio I added most of the unit tests for the sources that allow digests in snapcraft/tests/sources/test_checksum.py: I think that this is what you asked for, but I'm not really sure...

Besides that, I would need help to create the unit test for a deb with a correct checksum: if I understood well, I need a real deb file for this one, but I don't really know how to create a dummy one from python.
I've tried to use small_0.1-1.deb (from integration_tests/snaps/simple-deb) but couldn't get the path right...

snapcraft/tests/sources/test_checksum.py
+
+from snapcraft import tests
+
+#from snapcraft.tests.sources import SourceTestCase
@elopio

elopio Jan 31, 2017

Member

please remove this commented line.

Member

elopio commented Jan 31, 2017

nice work @pachulo! Your tests are correct and nicely coded, but now lets take into account the integration tests so we don't have the same tests twice at different levels.

Please, let me know what do you think about this proposal.

Make a new snap in integration_tests/snaps/checksum-algorithms, and move there all the checksums you have added to simple-tar, and add a test integration_tests/test_checksum_algorithms.py and run snapcraft pull in there. Then simple tar can keep simple, and it will be easier to find the checksum tests.

This integration test will be checking the pull step with all the combinations of algorithms + valid/invalid.
So we could add a valid checksum to simple-deb, simple-zip, simple-rpm, to check the other sources.

This way the integration tests cover the pull step. In the unit tests, we can have a few that directly call verify_checksum. One for correct, one for incorrect, one for invalid algorithm.

After that, the codecov report will tell us which lines we are missing. I think we will only be missing that when a source is instantiated with source_checksum, the provision step will call verify_checksum. But as that is already covered by the integration tests, we can do use mock.patch to inject a fake instead of the real verification.

As I said above, this is just a proposal. Feel free to disagree or to deviated.

Collaborator

sergiusens commented Feb 1, 2017

pysha3 is now in the repos, can you re add the support?

Collaborator

sergiusens commented Feb 1, 2017

Can you remove integration_tests/snaps/checksum-algorithms/icon.png and its icon entry in snapcraft.yaml ?

Contributor

pachulo commented Feb 1, 2017

Hi @elopio !

I think that what you say makes a lot of sense!

I've made the first refactor and I have some comments/doubts:

This integration test will be checking the pull step with all the combinations of algorithms + valid/invalid.

I think that right now the integration tests only check for algorithm+valid, am I wrong?

In the unit tests, we can have a few that directly call verify_checksum. One for correct, one for incorrect, one for invalid algorithm.

As for now only the unit test for invalid algorithm is working, I'm still working on it...

I'm getting a weird error for the one with the correct checksum:

Traceback (most recent call last):
  File "/home/ubuntu/snapcraft/pachulo-snapcraft/snapcraft/tests/sources/test_checksum.py", line 64, in test_correct_checksum
    sources.verify_checksum(source_checksum, 'src/test.zip')
  File "/home/ubuntu/snapcraft/pachulo-snapcraft/snapcraft/internal/sources/__init__.py", line 187, in verify_checksum
    algorithm, digest = source_checksum.split('/', 1)
AttributeError: 'NoneType' object has no attribute 'split'
Contributor

pachulo commented Feb 1, 2017

pysha3 is now in the repos, can you re add the support?

@sergiusens I think that after the changes proposed by @kyrofa a month ago, I will only need to change the documentation and the tests after this PR is accepted: #940

pachulo added some commits Feb 1, 2017

Contributor

pachulo commented Feb 1, 2017

@elopio I think that all tests are working and that they (mostly) adhere to what you proposed!
Can you check my previous comment?

I think that right now the integration tests only check for algorithm+valid, am I wrong?

If so, how should I create an integration test to check algorithm+invalid?

Collaborator

sergiusens commented Feb 6, 2017

@pachulo I am not sure how this has not come up earlier, but we require a CLA to be signed. Have you done so? I ask as the only check that fails is the one to check the CLA

Contributor

pachulo commented Feb 6, 2017

@sergiusens I thought that I did, but I did it again just in case!

Collaborator

sergiusens commented Feb 6, 2017

thanks @pachulo I triggered the cla test to rerun. There are however, some conflicts to be taken care of.

This is looking very good @pachulo.
One small detail, please check that all the files you touched have 2017 in the copyright date at the top.

In order to test the integration for invalid checksums, you could copy checksum-algorithms to checksum-algorithms-invalid, and change one character on each checksum.

Then on test_checksum_algorithms you can use test scenarios to run snapcraft pull {part-name} for each part, and use assertRaises to check that they all fail.

I'm in rocket.ubuntu.com if you need a hand.

snapcraft/tests/sources/test_checksum.py
@@ -0,0 +1,82 @@
+# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
+#
+# Copyright (C) 2016 Canonical Ltd
@elopio

elopio Feb 9, 2017

Member

please update the copyright year.

pachulo added some commits Feb 10, 2017

Merge branch 'downloaded-files-checksum-bug-1585913' of github.com:pa…
…chulo/snapcraft into downloaded-files-checksum-bug-1585913
Contributor

pachulo commented Feb 12, 2017

Good night! I think that now everything is in it's place! I'm waiting for your comments (or even approval)!

pachulo added some commits Feb 12, 2017

looks very good to me. I just left a few more comments. Thanks!

+class ChecksumAlgorithmsTestCase(integration_tests.TestCase):
+
+ def test_checksum_algorithms(self):
+ project_dir = 'checksum-algorithms'
@elopio

elopio Feb 14, 2017

Member

you could add deb-with-checksum and rpm-with-checksum as scenarios here.

+ self.assertRaises(subprocess.CalledProcessError,
+ self.run_snapcraft,
+ ['pull', 'checksum-sha3-512'],
+ project_dir)
@elopio

elopio Feb 14, 2017

Member

This would look better as scenarios, don't you think?

@pachulo

pachulo Feb 19, 2017

Contributor

I don't really know how scenarios look like, but I guess that you are right. Any hint or example to follow?

@elopio

elopio Feb 19, 2017

Member

grep for scenarios in the tests directories. You'll find plenty. You need to move this test to it's own test class, and put the parts names in the scenarios list.

Something like:

scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]

Then, during the test will be copied for every part, and self.part will have the value.

snapcraft/internal/sources/__init__.py
@@ -84,13 +93,19 @@
from ._subversion import Subversion # noqa
from ._tar import Tar # noqa
from ._zip import Zip # noqa
+from . import errors
+
+if sys.version_info < (3, 6):
@elopio

elopio Feb 14, 2017

Member

why is this here? What happens on 3.6?

@pachulo

pachulo Feb 19, 2017

Contributor

I took the example from PR#940 snapcraft/file_utils.py: In python 3.6 sha3 support is upstreamed in hashlib.

@elopio

elopio Feb 19, 2017

Member

got it. Please leave that as a comment in the code.

+ open(file_to_zip, 'w').close()
+ zip_file = zipfile.ZipFile(os.path.join('src', 'test.zip'), 'w')
+ zip_file.write(file_to_zip)
+ zip_file.close()
@elopio

elopio Feb 14, 2017

Member

on this one, there's no need to make a zip file, right?
The error should happen before verify_checksum tries to open the file, so you can just pass a string like 'dummy.zip', and you test the same code path faster.

@pachulo

pachulo Feb 19, 2017

Contributor

Yes, we need to make a file. That was my first attempt, but I found out that the error happens after verify_checksum opens the file.

@elopio

elopio Feb 19, 2017

Member

And does it need to be a valid zip file, or any file will do?

I see that this is not your code, this is the library following the python rule of ask forgiveness, not permission, so it's ok.

+ self.assertRaises(subprocess.CalledProcessError,
+ self.run_snapcraft,
+ ['pull', 'checksum-sha3-512'],
+ project_dir)
@elopio

elopio Feb 14, 2017

Member

This would look better as scenarios, don't you think?

@pachulo

pachulo Feb 19, 2017

Contributor

I don't really know how scenarios look like, but I guess that you are right. Any hint or example to follow?

@elopio

elopio Feb 19, 2017

Member

grep for scenarios in the tests directories. You'll find plenty. You need to move this test to it's own test class, and put the parts names in the scenarios list.

Something like:

scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]

Then, during the test will be copied for every part, and self.part will have the value.

elopio approved these changes Mar 1, 2017

I have nothing else to add here. Thanks @pachulo!
I will trigger the autopkgtests.

Contributor

tsimonq2 commented Mar 1, 2017

\o/

Member

kyrofa commented Mar 8, 2017

Note that the CLA check is timing out (there are too many commits, haha). I think we can ignore it at this point. I'm going to update the branch and trigger the autopkgtests.

kyrofa added some commits Mar 8, 2017

@kyrofa kyrofa merged commit 3e9c4d3 into snapcore:master Mar 8, 2017

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
xenial-amd64 autopkgtest finished (success)
Details

elopio added a commit that referenced this pull request Mar 27, 2017

sources: add optional "source-checksum" property (#980)
The format of the property is `<algorithm>/<digest>`. For example:

    parts:
      with-checksum:
        plugin: dump
        source: my/tarball.tar.gz
        source-checksum: md5/d9210476aac5f367b14e513bdefdee09

LP: #1585913

kalikiana pushed a commit to kalikiana/snapcraft that referenced this pull request Apr 6, 2017

sources: add optional "source-checksum" property (#980)
The format of the property is `<algorithm>/<digest>`. For example:

    parts:
      with-checksum:
        plugin: dump
        source: my/tarball.tar.gz
        source-checksum: md5/d9210476aac5f367b14e513bdefdee09

LP: #1585913
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment