Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sources: add optional "source-checksum" property #980

Merged

Conversation

pachulo
Copy link
Contributor

@pachulo pachulo commented Dec 15, 2016

Features

  • Solution for LP: #1585913 bug based on work done by @tsimonq2 in PR Add source-checksum option #619 but with support only for checksums specified in the snapcraft.yml file.
  • The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm).
  • The format of the property is <algorithm>/<digest>. For example:
    • sha256/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56
  • Currently, the supported algorithms are: md5, sha1, sha224, sha256, sha384, sha512, sha3_256, sha3_384 & sha3_512
  • The sha3 support comes from pysha3 in systems where python version < 3.6.
  • The size of sha2 digests is automatically detected, supporting the following bit sizes:
    • 224
    • 256
    • 384
    • 512

Failsafes

  • It has the following failsafes to ensure the code always works as intended:
    • Special exception for a checksum that doesn't match
    • Raises an exception for a checksum that is invalid, or incompatible with the checksums supported

Testing coverage

  • I have the following testing in place:
    • Integration tests:
      • Testing of each supported algorithm in raw form with a created file (tar) that has a static checksum in the snapcraft.yaml file.
      • A single test in simple-zip.
      • Tests deb-with-checksum and rpm-with-checksum created.
    • Unit tests:
      • Testing for sources that don't allow digest.
      • Testing for sources that allow digest and where the digest:
        • is correct
        • is wrong
        • uses an invalid algorithm

@sergiusens
Copy link
Collaborator

Thanks for your contribution. The timing is a bit unfortunate in that we just finished refactoring that source package. It is unfortunate to not have sha3 in our current version of python which we might just be sticking with for a while. For this reason one of our colleagues is SRUing pysha3 into xenial. Maybe @squidsoup can give us an update on that.

@squidsoup
Copy link
Contributor

squidsoup commented Dec 15, 2016

pysha3 is currently in the proposed queue for zesty (https://launchpad.net/ubuntu/zesty/+queue?queue_state=0&queue_text=pysha3), after which it will be SRU'd to xenial. Unfortunately I can't provide any timeframes at the moment.

@tsimonq2
Copy link
Contributor

I would be interested to see what this looks like once the conflicts are resolved.

Thanks for finishing that up, and sorry for going MIA!

@pachulo
Copy link
Contributor Author

pachulo commented Dec 16, 2016

So what should I do @sergiusens ? Rebase the PR so it can be merged?
Can we go on without the SHA3 support for now and add it when the library is available in the repos?
I think that this feature is a good one to have in snapcraft as soon as possible.

Thanks a lot to you @tsimonq2 !

@kyrofa
Copy link
Contributor

kyrofa commented Dec 21, 2016

So what should I do @sergiusens ? Rebase the PR so it can be merged?

I'd say yes.

Can we go on without the SHA3 support for now and add it when the library is available in the repos?

I'd also say yes to this.

…nonical#619

The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm).
The format of the property is <algorithm>/<digest>. For example:
sha2/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56
The currently supported algorithms are: md5, sha1, sha2.
The size of the digest for sha2 is automatically detected.
The support for the sha3 is not implemented, as it requires python 3.6: https://docs.python.org/3.6/whatsnew/3.6.html#hashlib or a library not present yet in the Ubuntu repositories (pysha3). Once ready it will be easy to add it.
@pachulo pachulo force-pushed the downloaded-files-checksum-bug-1585913 branch from 8b2254b to 0cc9f13 Compare December 22, 2016 15:46
@pachulo
Copy link
Contributor Author

pachulo commented Dec 22, 2016

@sergiusens @kyrofa rebase done!

Additionally I:

  • Tried to put everything in it's place: the new error handle, the function that verifies the checksum, etc.
  • Deleted the code to support sha3 digests (instead of leaving it commented): once the library is added to the repos, it should be a breeze to add it.

Waiting for your comments!

Copy link
Contributor

@kyrofa kyrofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pachulo! I have a few suggestions, but I like where this is heading.


Snapcraft will use the digest specified to verify the integrity of the
source. The source-type needs to be a file (tar, zip, deb or rpm) and
the algorithm either md5, sha1, sha2 or sha3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sha3 right now, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed!

@@ -62,3 +65,35 @@ def download(self):
request.raise_for_status()

download_requests_stream(request, self.file)

def verify_checksum(source_checksum, checkfile):
Copy link
Contributor

@kyrofa kyrofa Dec 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an enormous fan of relying on the length to tell us what hash we think it is (a lot of magic numbers), and I don't think it's particularly helpful to our users. I say we just calculate the hash using the digest they told us to use and compare, simple as that. Something like this:

def verify_checksum(source_checksum, checkfile):
    try:
        digest, checksum = source_checksum.split('/', 1)
    catch ValueError:
        raise ValueError('invalid checksum format: {!r}'.format(source_checksum))

    with open(checkfile, 'rb') as f:
        # This will raise a ValueError if digest is unsupported
        checksum = hashlib.new(digest, f.read())

    digest = checksum.hexdigest()
    if digest != source_checksum_digest:
        # Make sure this exception has a good error message, printing both digests.
        raise errors.ChecksumDoesNotMatchError(digest, source_checksum_digest)

It also immediately expands the number of hashing algorithms we support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is this intended to be a method on FileBase? It shouldn't be, and, well, it's not (no self), but then it's used as if it is. It should probably be in sources/__init__.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I didn't really know where to put it, so I put it in FileBase, but it makes much more sense in sources/__init__.py
Moved and modified the implementation to one similar to your example.

command=None):
def __init__(self, source, source_dir, source_checksum=None,
source_tag=None, source_commit=None, source_branch=None,
source_depth=None, command=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why source_checksum is added at the beginning of named arguments instead of the end, where it'd cause a smaller diff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, modified!

fmt = '{message}'

def __init__(self, message):
super().__init__(message=message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would do more for you if it accepted an expected and actual digest and printed a decent formatted message on its own.

Copy link
Contributor Author

@pachulo pachulo Dec 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it makes a lot of sense to do it in the the error init. Modified.

@codecov-io
Copy link

codecov-io commented Dec 23, 2016

Codecov Report

❗ No coverage uploaded for pull request base (master@2426177). Click here to learn what that means.
The diff coverage is 89.13%.

@@            Coverage Diff            @@
##             master     #980   +/-   ##
=========================================
  Coverage          ?   96.35%           
=========================================
  Files             ?      195           
  Lines             ?    17903           
  Branches          ?     1379           
=========================================
  Hits              ?    17250           
  Misses            ?      441           
  Partials          ?      212
Impacted Files Coverage Δ
snapcraft/tests/init.py 90.74% <ø> (ø)
snapcraft/internal/pluginhandler/init.py 92.12% <ø> (ø)
snapcraft/internal/sources/_subversion.py 100% <100%> (ø)
snapcraft/internal/sources/_base.py 100% <100%> (ø)
snapcraft/internal/sources/_bazaar.py 100% <100%> (ø)
snapcraft/tests/sources/test_git.py 100% <100%> (ø)
snapcraft/tests/sources/test_subversion.py 100% <100%> (ø)
snapcraft/internal/sources/_mercurial.py 100% <100%> (ø)
snapcraft/internal/sources/errors.py 100% <100%> (ø)
snapcraft/tests/sources/test_mercurial.py 100% <100%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2426177...fbf2812. Read the comment docs.

@sergiusens sergiusens changed the title Add optional "source-checksum" property sources: add optional "source-checksum" property Jan 3, 2017
algorithm, digest = source_checksum.split('/', 1)

except ValueError:
raise ValueError('invalid checksum format: {!r}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation is a little weird through here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified!

@@ -23,3 +23,13 @@ class IncompatibleOptionsError(errors.SnapcraftError):

def __init__(self, message):
super().__init__(message=message)


class DigestDoesNotMatchError(errors.SnapcraftError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SnapcraftError class is a little smarter than that:

class DigestDoesNotMatchError(errors.SnapcraftError):
    fmt = 'Expected the digest for source to be {expected}, but it was {calculated}'

    def __init__(self, expected, calculated):
        super().__init__(expected=expected, calculated=calculated)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't really know how to do it...thanks for the tip!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember writing special exceptions, didn't know we got a new error interface since I initially wrote this, cool!

Copy link
Contributor

@kyrofa kyrofa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I'm happy with this, thanks @pachulo!

Copy link
Contributor

@come-maiz come-maiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your contribution!
I've left some minor comments. The big issue here is that there are no unit tests.
It needs unit tests for sources that don't allow digest, for sources that allow digest and the digest is right and wrong.

I can guide you with the tests, if you need a hand.

@@ -10,3 +10,39 @@ parts:
source: simple.zip
files:
"*": .
checksum-md5:
plugin: copy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use dump instead of copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -40,3 +40,27 @@ parts:
plugin: tar-content
source: simple.tar.bz2
destination: destdir1/destdir2
checksum-md5:
plugin: tar-content
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use dump instead of tar-content

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -37,6 +37,9 @@ def __init__(self, source, source_dir, source_tag=None, source_commit=None,
raise errors.IncompatibleOptionsError(
'can\'t specify both source-tag and source-commit for '
'a bzr source')
if source_checksum:
raise errors.IncompatibleOptionsError(
'can\'t specify a source-checksum for a bzr source')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny detail, it's more readable like:
"can't specify a source ..."
It's the only exception where we like double quotes. I know the lines above are not following it, we can fix them later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK!


class DigestDoesNotMatchError(errors.SnapcraftError):

fmt = 'Expected the digest for source to be {expected}, '\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the \ in there. You can just continue strings in the following line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK!

@pachulo
Copy link
Contributor Author

pachulo commented Jan 12, 2017

Modified with the commented stuff, but I will definitely need guidance with those unit tests @ElOpio !

@pachulo
Copy link
Contributor Author

pachulo commented Jan 19, 2017

@ElOpio I added the unit tests for the sources that don't allow digests, but I would need help to create the ones for the sources that do admit them: Can you give me an example?

@sergiusens
Copy link
Collaborator

thanks @pachulo I triggered the cla test to rerun. There are however, some conflicts to be taken care of.

Copy link
Contributor

@come-maiz come-maiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking very good @pachulo.
One small detail, please check that all the files you touched have 2017 in the copyright date at the top.

In order to test the integration for invalid checksums, you could copy checksum-algorithms to checksum-algorithms-invalid, and change one character on each checksum.

Then on test_checksum_algorithms you can use test scenarios to run snapcraft pull {part-name} for each part, and use assertRaises to check that they all fail.

I'm in rocket.ubuntu.com if you need a hand.

@@ -0,0 +1,82 @@
# -*- Mode:Python; indent-tabs-mode:nil; tab-width:4 -*-
#
# Copyright (C) 2016 Canonical Ltd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update the copyright year.

@pachulo
Copy link
Contributor Author

pachulo commented Feb 12, 2017

Good night! I think that now everything is in it's place! I'm waiting for your comments (or even approval)!

Copy link
Contributor

@come-maiz come-maiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks very good to me. I just left a few more comments. Thanks!

self.assertRaises(subprocess.CalledProcessError,
self.run_snapcraft,
['pull', 'checksum-sha3-512'],
project_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would look better as scenarios, don't you think?

Copy link
Contributor Author

@pachulo pachulo Feb 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know how scenarios look like, but I guess that you are right. Any hint or example to follow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grep for scenarios in the tests directories. You'll find plenty. You need to move this test to it's own test class, and put the parts names in the scenarios list.

Something like:

scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]

Then, during the test will be copied for every part, and self.part will have the value.

class ChecksumAlgorithmsTestCase(integration_tests.TestCase):

def test_checksum_algorithms(self):
project_dir = 'checksum-algorithms'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could add deb-with-checksum and rpm-with-checksum as scenarios here.

@@ -84,13 +93,19 @@
from ._subversion import Subversion # noqa
from ._tar import Tar # noqa
from ._zip import Zip # noqa
from . import errors

if sys.version_info < (3, 6):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here? What happens on 3.6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the example from PR#940 snapcraft/file_utils.py: In python 3.6 sha3 support is upstreamed in hashlib.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. Please leave that as a comment in the code.

open(file_to_zip, 'w').close()
zip_file = zipfile.ZipFile(os.path.join('src', 'test.zip'), 'w')
zip_file.write(file_to_zip)
zip_file.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on this one, there's no need to make a zip file, right?
The error should happen before verify_checksum tries to open the file, so you can just pass a string like 'dummy.zip', and you test the same code path faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to make a file. That was my first attempt, but I found out that the error happens after verify_checksum opens the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And does it need to be a valid zip file, or any file will do?

I see that this is not your code, this is the library following the python rule of ask forgiveness, not permission, so it's ok.

self.assertRaises(subprocess.CalledProcessError,
self.run_snapcraft,
['pull', 'checksum-sha3-512'],
project_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grep for scenarios in the tests directories. You'll find plenty. You need to move this test to it's own test class, and put the parts names in the scenarios list.

Something like:

scenarios = [(part, {'part': part}) for part in ['checksum-md5', 'checksum-sha1', ...]]

Then, during the test will be copied for every part, and self.part will have the value.

Copy link
Contributor

@come-maiz come-maiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have nothing else to add here. Thanks @pachulo!
I will trigger the autopkgtests.

@tsimonq2
Copy link
Contributor

tsimonq2 commented Mar 1, 2017

\o/

@kyrofa
Copy link
Contributor

kyrofa commented Mar 8, 2017

Note that the CLA check is timing out (there are too many commits, haha). I think we can ignore it at this point. I'm going to update the branch and trigger the autopkgtests.

@kyrofa kyrofa merged commit 3e9c4d3 into canonical:master Mar 8, 2017
come-maiz pushed a commit that referenced this pull request Mar 27, 2017
The format of the property is `<algorithm>/<digest>`. For example:

    parts:
      with-checksum:
        plugin: dump
        source: my/tarball.tar.gz
        source-checksum: md5/d9210476aac5f367b14e513bdefdee09

LP: #1585913
kalikiana pushed a commit to kalikiana/snapcraft that referenced this pull request Apr 6, 2017
The format of the property is `<algorithm>/<digest>`. For example:

    parts:
      with-checksum:
        plugin: dump
        source: my/tarball.tar.gz
        source-checksum: md5/d9210476aac5f367b14e513bdefdee09

LP: #1585913
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants