sources: add optional "source-checksum" property #980

pachulo · 2016-12-15T11:48:12Z

Features

Solution for bug LP: #1585913, based on the work done by @tsimonq2 in PR #619 (Add source-checksum option), but supporting only checksums specified in the snapcraft.yaml file.
The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm).
The format of the property is <algorithm>/<digest>. For example:
- sha256/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56
Currently, the supported algorithms are: md5, sha1, sha224, sha256, sha384, sha512, sha3_256, sha3_384 and sha3_512.
The sha3 support comes from pysha3 on systems where the Python version is < 3.6.
~~The size of sha2 digests is automatically detected, supporting the following bit sizes:~~
- ~~224~~
- ~~256~~
- ~~384~~
- ~~512~~
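
As an illustration of the `<algorithm>/<digest>` format described above, here is a minimal sketch of how a value for the property could be computed for a local file. The helper name `format_source_checksum` is hypothetical and not part of this PR; it only relies on the standard `hashlib` module.

```python
import hashlib


def format_source_checksum(path, algorithm='sha256'):
    """Return '<algorithm>/<digest>' for the file at path (hypothetical helper)."""
    checksum = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        # Read in chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b''):
            checksum.update(chunk)
    return '{}/{}'.format(algorithm, checksum.hexdigest())
```

The returned string could then be pasted as the `source-checksum` value of a part.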

Failsafes

It has the following failsafes to ensure the code always works as intended:
- Special exception for a checksum that doesn't match
- Raises an exception for a checksum that is invalid, or incompatible with the checksums supported

Testing coverage

I have the following testing in place:
- Integration tests:
  - Testing of each supported algorithm in raw form with a created file (tar) that has a static checksum in the snapcraft.yaml file.
  - A single test in simple-zip.
  - Tests deb-with-checksum and rpm-with-checksum created.
- Unit tests:
  - Testing for sources that don't allow digest.
  - Testing for sources that allow digest and where the digest:
    - is correct
    - is wrong
    - uses an invalid algorithm

sergiusens · 2016-12-15T14:12:06Z

Thanks for your contribution. The timing is a bit unfortunate in that we just finished refactoring that source package. It is unfortunate to not have sha3 in our current version of python which we might just be sticking with for a while. For this reason one of our colleagues is SRUing pysha3 into xenial. Maybe @squidsoup can give us an update on that.

squidsoup · 2016-12-15T23:54:00Z

pysha3 is currently in the proposed queue for zesty (https://launchpad.net/ubuntu/zesty/+queue?queue_state=0&queue_text=pysha3), after which it will be SRU'd to xenial. Unfortunately I can't provide any timeframes at the moment.

tsimonq2 · 2016-12-16T01:43:51Z

I would be interested to see what this looks like once the conflicts are resolved.

Thanks for finishing that up, and sorry for going MIA!

pachulo · 2016-12-16T09:44:23Z

So what should I do @sergiusens ? Rebase the PR so it can be merged?
Can we go on without the SHA3 support for now and add it when the library is available in the repos?
I think that this feature is a good one to have in snapcraft as soon as possible.

Thanks a lot to you @tsimonq2 !

kyrofa · 2016-12-21T22:19:34Z

> So what should I do @sergiusens ? Rebase the PR so it can be merged?

I'd say yes.

> Can we go on without the SHA3 support for now and add it when the library is available in the repos?

I'd also say yes to this.

…nonical#619

The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm).
The format of the property is <algorithm>/<digest>. For example:
- sha2/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56
The currently supported algorithms are: md5, sha1, sha2. The size of the digest for sha2 is automatically detected.
Support for sha3 is not implemented, as it requires Python 3.6 (https://docs.python.org/3.6/whatsnew/3.6.html#hashlib) or a library not yet present in the Ubuntu repositories (pysha3). Once ready, it will be easy to add.

pachulo · 2016-12-22T15:53:49Z

@sergiusens @kyrofa rebase done!

Additionally I:

- Tried to put everything in its place: the new error handling, the function that verifies the checksum, etc.
- Deleted the code to support sha3 digests (instead of leaving it commented out): once the library is added to the repos, it should be a breeze to add it.

Waiting for your comments!

kyrofa

Thanks @pachulo! I have a few suggestions, but I like where this is heading.

kyrofa · 2016-12-22T21:15:29Z

snapcraft/internal/sources/__init__.py

+
+    Snapcraft will use the digest specified to verify the integrity of the
+    source. The source-type needs to be a file (tar, zip, deb or rpm) and
+    the algorithm either md5, sha1, sha2 or sha3.


Not sha3 right now, correct?

Yes indeed!

kyrofa · 2016-12-22T21:55:15Z

snapcraft/internal/sources/_base.py

@@ -62,3 +65,35 @@ def download(self):
            request.raise_for_status()

            download_requests_stream(request, self.file)
+
+    def verify_checksum(source_checksum, checkfile):


I'm not an enormous fan of relying on the length to tell us what hash we think it is (a lot of magic numbers), and I don't think it's particularly helpful to our users. I say we just calculate the hash using the digest they told us to use and compare, simple as that. Something like this:

import hashlib

def verify_checksum(source_checksum, checkfile):
    try:
        algorithm, expected = source_checksum.split('/', 1)
    except ValueError:
        raise ValueError(
            'invalid checksum format: {!r}'.format(source_checksum))

    with open(checkfile, 'rb') as f:
        # This will raise a ValueError if the algorithm is unsupported
        calculated = hashlib.new(algorithm, f.read()).hexdigest()

    if calculated != expected:
        # Make sure this exception has a good error message, printing both digests.
        raise errors.ChecksumDoesNotMatchError(calculated, expected)

It also immediately expands the number of hashing algorithms we support.

Also, is this intended to be a method on FileBase? It shouldn't be, and, well, it's not (no self), but then it's used as if it is. It should probably be in sources/__init__.py.

No, I didn't really know where to put it, so I put it in FileBase, but it makes much more sense in sources/__init__.py
Moved and modified the implementation to one similar to your example.
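
For readers who want to try the approach discussed in this thread, here is a self-contained sketch. It is not the code that was merged: the `DigestDoesNotMatchError` class below is a plain-`Exception` stand-in for the project's error class, and the function lives at module level as suggested rather than on `FileBase`.

```python
import hashlib


class DigestDoesNotMatchError(Exception):
    """Stand-in for the project's error class (hypothetical, for illustration)."""

    def __init__(self, expected, calculated):
        super().__init__(
            'Expected the digest for source to be {}, but it was {}'.format(
                expected, calculated))


def verify_checksum(source_checksum, checkfile):
    try:
        algorithm, expected = source_checksum.split('/', 1)
    except ValueError:
        raise ValueError(
            'invalid checksum format: {!r}'.format(source_checksum))

    with open(checkfile, 'rb') as f:
        # hashlib.new() raises ValueError for an unsupported algorithm.
        calculated = hashlib.new(algorithm, f.read()).hexdigest()

    if calculated != expected:
        raise DigestDoesNotMatchError(expected, calculated)
```

Because the algorithm name is passed straight to `hashlib.new()`, any algorithm the local `hashlib` knows about is accepted without extra code, which is the point made in the review comment.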

kyrofa · 2016-12-22T21:57:59Z

snapcraft/internal/sources/_base.py

-                 command=None):
+    def __init__(self, source, source_dir, source_checksum=None,
+                 source_tag=None, source_commit=None, source_branch=None,
+                 source_depth=None, command=None):


Is there a reason why source_checksum is added at the beginning of named arguments instead of the end, where it'd cause a smaller diff?

not really, modified!

kyrofa · 2016-12-22T22:05:32Z

snapcraft/internal/sources/errors.py

+    fmt = '{message}'
+
+    def __init__(self, message):
+        super().__init__(message=message)


This would do more for you if it accepted an expected and actual digest and printed a decent formatted message on its own.

Yeah, it makes a lot of sense to do it in the error init. Modified.

codecov-io · 2016-12-23T23:19:18Z

Codecov Report

❗ No coverage uploaded for pull request base (master@2426177).
The diff coverage is 89.13%.

@@            Coverage Diff            @@
##             master     #980   +/-   ##
=========================================
  Coverage          ?   96.35%           
=========================================
  Files             ?      195           
  Lines             ?    17903           
  Branches          ?     1379           
=========================================
  Hits              ?    17250           
  Misses            ?      441           
  Partials          ?      212

Impacted Files	Coverage Δ
snapcraft/tests/__init__.py	`90.74% <ø> (ø)`
snapcraft/internal/pluginhandler/__init__.py	`92.12% <ø> (ø)`
snapcraft/internal/sources/_subversion.py	`100% <100%> (ø)`
snapcraft/internal/sources/_base.py	`100% <100%> (ø)`
snapcraft/internal/sources/_bazaar.py	`100% <100%> (ø)`
snapcraft/tests/sources/test_git.py	`100% <100%> (ø)`
snapcraft/tests/sources/test_subversion.py	`100% <100%> (ø)`
snapcraft/internal/sources/_mercurial.py	`100% <100%> (ø)`
snapcraft/internal/sources/errors.py	`100% <100%> (ø)`
snapcraft/tests/sources/test_mercurial.py	`100% <100%> (ø)`
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2426177...fbf2812.

kyrofa · 2017-01-03T18:38:49Z

snapcraft/internal/sources/__init__.py

+        algorithm, digest = source_checksum.split('/', 1)
+
+    except ValueError:
+            raise ValueError('invalid checksum format: {!r}'


Indentation is a little weird through here.

kyrofa · 2017-01-03T18:52:09Z

snapcraft/internal/sources/errors.py

@@ -23,3 +23,13 @@ class IncompatibleOptionsError(errors.SnapcraftError):

    def __init__(self, message):
        super().__init__(message=message)
+
+
+class DigestDoesNotMatchError(errors.SnapcraftError):


The SnapcraftError class is a little smarter than that:

class DigestDoesNotMatchError(errors.SnapcraftError):

    fmt = ('Expected the digest for source to be {expected}, '
           'but it was {calculated}')

    def __init__(self, expected, calculated):
        super().__init__(expected=expected, calculated=calculated)

Didn't really know how to do it...thanks for the tip!

I remember writing special exceptions, didn't know we got a new error interface since I initially wrote this, cool!
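
To make the `fmt` pattern above concrete, here is a rough sketch of how such an error interface can work. `FormattedError` is a hypothetical stand-in written for this example; it is an assumption about the general idea, not a copy of the real `SnapcraftError`.

```python
class FormattedError(Exception):
    """Hypothetical stand-in for a base error that renders `fmt` from kwargs."""

    fmt = 'An error occurred'

    def __init__(self, **kwargs):
        super().__init__()
        # Store the keyword arguments as attributes so __str__ can use them.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __str__(self):
        return self.fmt.format(**self.__dict__)


class DigestDoesNotMatchError(FormattedError):

    fmt = ('Expected the digest for source to be {expected}, '
           'but it was {calculated}')

    def __init__(self, expected, calculated):
        super().__init__(expected=expected, calculated=calculated)


print(DigestDoesNotMatchError('abc', 'def'))
```

With this shape, each subclass only declares its message template and the names of the values it needs.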

kyrofa

Alright I'm happy with this, thanks @pachulo!

come-maiz

Thanks a lot for your contribution!
I've left some minor comments. The big issue here is that there are no unit tests.
It needs unit tests for sources that don't allow digest, for sources that allow digest and the digest is right and wrong.

I can guide you with the tests, if you need a hand.

come-maiz · 2017-01-11T17:50:19Z

integration_tests/snaps/simple-zip/snapcraft.yaml

@@ -10,3 +10,39 @@ parts:
    source: simple.zip
    files:
        "*": .
+  checksum-md5:
+    plugin: copy


please use dump instead of copy

come-maiz · 2017-01-11T17:51:14Z

integration_tests/snaps/simple-tar/snapcraft.yaml

@@ -40,3 +40,27 @@ parts:
    plugin: tar-content
    source: simple.tar.bz2
    destination: destdir1/destdir2
+  checksum-md5:
+    plugin: tar-content


please use dump instead of tar-content

come-maiz · 2017-01-11T17:54:47Z

snapcraft/internal/sources/_bazaar.py

@@ -37,6 +37,9 @@ def __init__(self, source, source_dir, source_tag=None, source_commit=None,
            raise errors.IncompatibleOptionsError(
                'can\'t specify both source-tag and source-commit for '
                'a bzr source')
+        if source_checksum:
+            raise errors.IncompatibleOptionsError(
+                'can\'t specify a source-checksum for a bzr source')


tiny detail, it's more readable like:
"can't specify a source ..."
It's the only exception where we like double quotes. I know the lines above are not following it, we can fix them later.

come-maiz · 2017-01-11T18:01:35Z

snapcraft/internal/sources/errors.py

+
+class DigestDoesNotMatchError(errors.SnapcraftError):
+
+    fmt = 'Expected the digest for source to be {expected}, '\


no need for the \ in there. You can just continue strings in the following line.
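
A small sketch of the point above: adjacent string literals are joined automatically, but in an assignment they need to sit inside parentheses for the statement to continue onto the next line. The second half of the message is taken from the earlier suggestion in this PR.

```python
# Adjacent string literals inside parentheses are concatenated by the parser,
# so no trailing backslash is needed.
fmt = ('Expected the digest for source to be {expected}, '
       'but it was {calculated}')

print(fmt.format(expected='abc', calculated='def'))
```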

pachulo · 2017-01-12T02:31:50Z

Modified with the commented stuff, but I will definitely need guidance with those unit tests @ElOpio !

pachulo · 2017-01-19T12:13:28Z

@ElOpio I added the unit tests for the sources that don't allow digests, but I would need help to create the ones for the sources that do admit them: Can you give me an example?

sergiusens · 2017-02-06T18:26:32Z

thanks @pachulo I triggered the cla test to rerun. There are however, some conflicts to be taken care of.

come-maiz

This is looking very good @pachulo.
One small detail, please check that all the files you touched have 2017 in the copyright date at the top.

In order to test the integration for invalid checksums, you could copy checksum-algorithms to checksum-algorithms-invalid, and change one character on each checksum.

Then on test_checksum_algorithms you can use test scenarios to run snapcraft pull {part-name} for each part, and use assertRaises to check that they all fail.

I'm in rocket.ubuntu.com if you need a hand.

come-maiz · 2017-02-09T00:17:11Z

snapcraft/tests/sources/test_checksum.py

@@ -0,0 +1,82 @@
+# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
+#
+# Copyright (C) 2016 Canonical Ltd


please update the copyright year.

pachulo · 2017-02-12T21:55:22Z

Good night! I think that now everything is in its place! I'm waiting for your comments (or even approval)!

come-maiz

looks very good to me. I just left a few more comments. Thanks!

come-maiz · 2017-02-14T04:01:39Z

integration_tests/test_checksum_algorithms.py

+        self.assertRaises(subprocess.CalledProcessError,
+                          self.run_snapcraft,
+                          ['pull', 'checksum-sha3-512'],
+                          project_dir)


This would look better as scenarios, don't you think?

I don't really know what scenarios look like, but I guess that you are right. Any hint or example to follow?

grep for scenarios in the tests directories. You'll find plenty. You need to move this test to its own test class, and put the part names in the scenarios list.

Something like:

scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]

Then, during the run, the test will be copied for every part, and self.part will hold that part's name.
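
To show the shape of the scenarios list without depending on the scenarios machinery of the test suite, here is a standard-library sketch. It swaps in `unittest`'s `subTest` to get one reported result per part, and `fake_pull` is a hypothetical stand-in for running `snapcraft pull <part>` against a part with a wrong checksum.

```python
import unittest

PARTS = ['checksum-md5', 'checksum-sha1', 'checksum-sha256']

# Each scenario is a (name, attributes) pair, as in the suggestion above.
scenarios = [(part, {'part': part}) for part in PARTS]


def fake_pull(part):
    """Hypothetical stand-in for `snapcraft pull <part>`; always fails."""
    raise RuntimeError('checksum mismatch for {}'.format(part))


class InvalidChecksumTestCase(unittest.TestCase):

    def test_invalid_checksum(self):
        for name, attributes in scenarios:
            # subTest reports each part separately, roughly like a scenario.
            with self.subTest(name):
                self.assertRaises(
                    RuntimeError, fake_pull, attributes['part'])
```

With real scenarios the loop disappears: the test body runs once per entry and reads the part name from `self.part`.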

come-maiz · 2017-02-14T04:02:43Z

integration_tests/test_checksum_algorithms.py

+class ChecksumAlgorithmsTestCase(integration_tests.TestCase):
+
+    def test_checksum_algorithms(self):
+        project_dir = 'checksum-algorithms'


you could add deb-with-checksum and rpm-with-checksum as scenarios here.

come-maiz · 2017-02-14T04:03:46Z

snapcraft/internal/sources/__init__.py

@@ -84,13 +93,19 @@
 from ._subversion import Subversion  # noqa
 from ._tar import Tar                # noqa
 from ._zip import Zip                # noqa
+from . import errors
+
+if sys.version_info < (3, 6):


why is this here? What happens on 3.6?

I took the example from PR#940 snapcraft/file_utils.py: In python 3.6 sha3 support is upstreamed in hashlib.

got it. Please leave that as a comment in the code.
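
A sketch of what the commented version check could look like. It assumes that pysha3 is installed on interpreters older than 3.6 and that importing its `sha3` module makes the SHA-3 algorithms available through `hashlib`; on Python 3.6 and later the branch is skipped because `hashlib` already provides them.

```python
import hashlib
import sys

if sys.version_info < (3, 6):
    # Assumption: pysha3 is installed; importing its `sha3` module registers
    # the SHA-3 constructors with hashlib on older interpreters.
    # From Python 3.6 on, SHA-3 support is part of hashlib itself.
    import sha3  # noqa

digest = hashlib.new('sha3_256', b'snapcraft').hexdigest()
print(len(digest))  # 64 hex characters for a 256-bit digest
```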

come-maiz · 2017-02-14T04:06:06Z

snapcraft/tests/sources/test_checksum.py

+        open(file_to_zip, 'w').close()
+        zip_file = zipfile.ZipFile(os.path.join('src', 'test.zip'), 'w')
+        zip_file.write(file_to_zip)
+        zip_file.close()


on this one, there's no need to make a zip file, right?
The error should happen before verify_checksum tries to open the file, so you can just pass a string like 'dummy.zip', and you test the same code path faster.

Yes, we need to make a file. That was my first attempt, but I found out that the error happens after verify_checksum opens the file.

And does it need to be a valid zip file, or any file will do?

I see that this is not your code, this is the library following the python rule of ask forgiveness, not permission, so it's ok.
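
The ordering discussed above can be reproduced with a simplified sketch. It assumes, as in the earlier suggestion, that the file is opened before `hashlib.new()` is called; the helper `error_name` and the file names are only for illustration.

```python
import hashlib
import os
import tempfile


def verify_checksum(source_checksum, checkfile):
    """Simplified sketch: open the file first, then build the hash object."""
    algorithm, expected = source_checksum.split('/', 1)
    with open(checkfile, 'rb') as f:
        calculated = hashlib.new(algorithm, f.read()).hexdigest()
    return calculated == expected


def error_name(checksum, path):
    """Return the name of the exception raised, or None if there was none."""
    try:
        verify_checksum(checksum, path)
    except (OSError, ValueError) as e:
        return type(e).__name__
    return None


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, 'dummy.zip')
    existing = os.path.join(tmp, 'test.zip')
    # In this sketch the file is only hashed, never unzipped, so an empty
    # file is enough.
    open(existing, 'w').close()

    missing_result = error_name('invalid/digest', missing)
    existing_result = error_name('invalid/digest', existing)
```

With a path that does not exist, the missing file is reported before the invalid algorithm is ever looked at, which is why the unit test needs a real file on disk.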

come-maiz

I have nothing else to add here. Thanks @pachulo!
I will trigger the autopkgtests.

tsimonq2 · 2017-03-01T23:06:41Z

\o/

kyrofa · 2017-03-08T00:48:00Z

Note that the CLA check is timing out (there are too many commits, haha). I think we can ignore it at this point. I'm going to update the branch and trigger the autopkgtests.

The format of the property is `<algorithm>/<digest>`. For example:

    parts:
      with-checksum:
        plugin: dump
        source: my/tarball.tar.gz
        source-checksum: md5/d9210476aac5f367b14e513bdefdee09

LP: #1585913

pachulo force-pushed the downloaded-files-checksum-bug-1585913 branch from 8b2254b to 0cc9f13 Compare December 22, 2016 15:46

kyrofa suggested changes Dec 22, 2016

pachulo added 2 commits December 23, 2016 19:42

Modified after kyrofa comments.

7b29f0c

Change order of named arguments to make the diff smaller.

e869408

Merge branch 'master' into downloaded-files-checksum-bug-1585913

d1a01bb

sergiusens changed the title ~~Add optional "source-checksum" property~~ sources: add optional "source-checksum" property Jan 3, 2017

kyrofa reviewed Jan 3, 2017

pachulo added 3 commits January 4, 2017 10:52

Modified after kyrofa comments.

f11861f

Merge branch 'downloaded-files-checksum-bug-1585913' of github.com:pachulo/snapcraft into downloaded-files-checksum-bug-1585913

542794e

Merge branch 'master' into downloaded-files-checksum-bug-1585913

224c30e

kyrofa approved these changes Jan 6, 2017

come-maiz suggested changes Jan 11, 2017

pachulo added 2 commits January 12, 2017 02:59

Merge branch 'master' into downloaded-files-checksum-bug-1585913

53ffa26

Modified after elopio comments.

7e4f8b6

pachulo added 3 commits January 18, 2017 10:31

Merge branch 'master' into downloaded-files-checksum-bug-1585913

cca7c19

Merge branch 'master' into downloaded-files-checksum-bug-1585913

0150b31

Add unit tests for sources that don't allow digests.

195582f

pachulo added 2 commits January 26, 2017 19:02

Merge branch 'master' into downloaded-files-checksum-bug-1585913

6e5d09e

Merge branch 'master' into downloaded-files-checksum-bug-1585913

a797f38

Conflicts resolved.

e372835

come-maiz suggested changes Feb 9, 2017

pachulo added 8 commits February 10, 2017 16:06

Resolve conflicts and update copyright year.

d5ffa72

First (failed) attempt to create integration tests for invalid checksums.

ecbe895

Change the copyright year in all the files I've touched.

c0054bb

Merge branch 'master' into downloaded-files-checksum-bug-1585913

ee15bd2

Restore silent=False in _git.py

d9d6a5d

Merge branch 'downloaded-files-checksum-bug-1585913' of github.com:pachulo/snapcraft into downloaded-files-checksum-bug-1585913

2e901c9

Finish integration tests for invalid checksums.

219cded

Add sha3 support & other fixes.

72b3ab7

pachulo added 2 commits February 12, 2017 22:10

Remove build-packages for checksum integration_tests snaps.

d343232

Merge branch 'master' into downloaded-files-checksum-bug-1585913

42befec

come-maiz suggested changes Feb 14, 2017

Merge branch 'master' into downloaded-files-checksum-bug-1585913

ee7d5eb

come-maiz suggested changes Feb 19, 2017

pachulo added 3 commits February 28, 2017 22:15

Merge branch 'master' into downloaded-files-checksum-bug-1585913

b05380a

Changes after elopio comments (testing with scenarios mainly).

a6d44d9

Create a dummy file for the zip test.

3283f0d

come-maiz approved these changes Mar 1, 2017

pachulo added 2 commits March 5, 2017 00:38

Merge branch 'master' into downloaded-files-checksum-bug-1585913

012f7e4

Merge branch 'master' into downloaded-files-checksum-bug-1585913

fbf2812

Kyle Fazzari added 2 commits March 7, 2017 16:48

Merge branch 'master' into downloaded-files-checksum-bug-1585913

70066d8

Merge branch 'master' into downloaded-files-checksum-bug-1585913

5936297

kyrofa merged commit 3e9c4d3 into canonical:master Mar 8, 2017

