Add source-checksum option #619

Closed
wants to merge 67 commits into
from

4 participants
Contributor

tsimonq2 commented Jun 30, 2016

This adds an optional source-checksum option that verifies the integrity of a downloaded source by comparing the file's checksum with the expected value.

Features

  • It supports the following source-types:
    • source-type: tar
    • source-type: zip
  • It supports the following checksums:
    • md5
    • sha1
    • sha224
    • sha256
    • sha384
    • sha512
  • You can use it in the following ways:
    • source-checksum: HASH - raw hash
    • source-checksum: checksum.txt - local file
    • source-checksum: http(s)://example.com/checksum - remote URL

Failsafes

It has the following failsafes to ensure the code always works as intended:

  • Raises a dedicated exception when the checksum doesn't match the file
  • Raises an exception when the checksum is invalid or is not one of the supported checksum types

Testing coverage

I have the following testing in place:

  • Unit tests:
  • Integration tests:
    • Each checksum type is tested in raw form against a generated file (tar/zip) whose static checksum is set in the snapcraft.yaml files for simple-tar and simple-zip

Discrepancies between this specification and the source

If you find any difference between the code in this PR and the specification above, please comment, as that wasn't intended.

tsimonq2 added some commits Jun 29, 2016

Can one of the admins verify this patch?

snapcraft/internal/sources.py
from snapcraft.internal import common
-
-logging.getLogger('urllib3').setLevel(logging.CRITICAL)
+logging.getLogger('urllib').setLevel(logging.CRITICAL)
@sergiusens

sergiusens Jun 30, 2016

Collaborator

why this change?

snapcraft/internal/sources.py
# TODO add unit tests.
tarball = os.path.join(self.source_dir, os.path.basename(self.source))
+ self.pre_check_checksum(self, source_checksum, tarball)
snapcraft/internal/sources.py
+ raise IncompatibleOptionsError(
+ 'can\'t specify source-checksum for a zip source right now')
+
+ def pre_check_checksum(self, source_checksum, zip):
@sergiusens

sergiusens Jun 30, 2016

Collaborator

this looks like duplicate code, please consider adding it to FileBase (which also needs its init updated)

Collaborator

sergiusens commented Jun 30, 2016

Please merge with master or rebase

tsimonq2 added some commits Jun 30, 2016

Merge branch 'downloaded-files-checksum-bug-1585913' of github.com:ts…
…imonq2/snapcraft into downloaded-files-checksum-bug-1585913
Contributor

tsimonq2 commented Jun 30, 2016

@sergiusens Thanks for taking the time to look at this. I updated my code with your suggestions. Let me know what you think.

@elopio Can you please take a look?

Collaborator

sergiusens commented Jul 1, 2016

From a quick look at the test failures it seems you still need to update all constructors for the sources used in the unit tests.

tsimonq2 added some commits Jul 1, 2016

Contributor

tsimonq2 commented Jul 5, 2016

So right now, the Travis build has passed. My goal now is to do some more local testing and to get an integration test ready. I'll change the description accordingly.

tsimonq2 added some commits Jul 5, 2016

tsimonq2 added some commits Jul 7, 2016

Merge branch 'downloaded-files-checksum-bug-1585913' of github.com:ts…
…imonq2/snapcraft into downloaded-files-checksum-bug-1585913

tsimonq2 added some commits Jul 8, 2016

Changed logic behind file checksums, changed checksums in integration…
… tests to standard formats, and added the "Invalid checksum format" fallback

tsimonq2 added some commits Jul 8, 2016

Contributor

tsimonq2 commented Jul 8, 2016

I believe that all the intended testing and code is in place for this PR. Could this be evaluated please?

@tsimonq2 tsimonq2 changed the title from Add source-checksum option for tar and zip sources (bug 1585913) to Add source-checksum option Jul 8, 2016

Contributor

tsimonq2 commented Jul 8, 2016

I also added sort of a specification to refer to in my initial comment.

Contributor

tsimonq2 commented Jul 12, 2016

Bump

snapcraft/tests/test_sources.py
+ source_checksum, "checksum.tar")
+
+ # From remote file
+ source_checksum = 'http://tsimonq2.net/misc/snapcraft-checksum-tar'
@sergiusens

sergiusens Jul 12, 2016

Collaborator

this should use a fakeserver, check snapcraft/tests/fake_servers.py

@tsimonq2

tsimonq2 Jul 15, 2016

Contributor

@sergiusens I'm a little confused about how to use this. Where can I find an example of a test case that uses it?
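
The contents of snapcraft/tests/fake_servers.py are not shown in this thread, so the following is not that module's API. It is a generic standard-library sketch of the same idea: serve the checksum from a throwaway local HTTP server so the test does not depend on an external host. The handler class, the function name and the served body are all assumptions made for illustration.

```python
import http.server
import threading
import urllib.request

# Assumed body, in the "<digest>  <filename>" layout used by checksum files.
CHECKSUM_BODY = b'1276481102f218c981e0324180bafd9f  checksum.tar\n'


class ChecksumHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET with the fixed checksum body."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(CHECKSUM_BODY)))
        self.end_headers()
        self.wfile.write(CHECKSUM_BODY)

    def log_message(self, *args):
        # Keep the test output quiet.
        pass


def fetch_from_local_server():
    """Start a local server, fetch the checksum once, then shut it down."""
    # Port 0 asks the operating system for any free port.
    server = http.server.HTTPServer(('127.0.0.1', 0), ChecksumHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        url = 'http://127.0.0.1:{}/checksum'.format(server.server_address[1])
        # An empty ProxyHandler bypasses any proxy set in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.read().decode('utf-8')
    finally:
        server.shutdown()
        server.server_close()
```

A test could then point source_checksum at the local URL instead of a public one and assert on the digest that comes back.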

snapcraft/internal/sources.py
+
+def check_checksum_determine_type_and_read_file(source_checksum, checkfile):
+ if len(source_checksum) == 32:
+ chksum = hashlib.md5()
@sergiusens

sergiusens Jul 12, 2016

Collaborator

why the abbreviation?

@tsimonq2

tsimonq2 Jul 15, 2016

Contributor

I didn't have any specific reason, I'll change it

Collaborator

sergiusens commented Jul 12, 2016

Hi, looks good

One general comment is to use checksum instead of chksum

@@ -68,6 +71,89 @@ def test_pull_tarball_must_download_to_sourcedir(self, mock_prov):
with open(os.path.join(dest_dir, tar_file_name), 'r') as tar_file:
self.assertEqual('Test fake compressed file', tar_file.read())
+ def test_checksum(self):
@sergiusens

sergiusens Jul 12, 2016

Collaborator

what are you testing here, or in other words, what are your assertions to know it all worked as expected?

@sergiusens

sergiusens Jul 12, 2016

Collaborator

maybe I am confused by check_checksum_determine_format and a better method name is needed here.

@tsimonq2

tsimonq2 Jul 15, 2016

Contributor

@sergiusens this function uses the various supported checksum formats and checks the checksum with them. This ensures that the checksum can be successfully checked with the raw checksums.

I know this works because if it doesn't, the built-in errors will catch it. That's why I'm not doing a try/fail here.

The only reason why I had to split it into separate functions was because the static tests were complaining about check_checksum being too complex. This is now corrected to be in one function.
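
One way to make the expectations explicit, as the review asks, is to assert on both the success path and the failure path rather than relying on an uncaught exception to fail the test. The sketch below is hedged: verify_checksum here is a simplified hypothetical that hashes bytes with sha256, not the function from this PR, and ChecksumDoesNotMatch is a local stand-in for the PR's exception.

```python
import hashlib
import unittest


class ChecksumDoesNotMatch(Exception):
    """Local stand-in for the exception of the same name in the PR."""


def verify_checksum(expected, data):
    """Hypothetical helper: compare the sha256 of data with expected."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise ChecksumDoesNotMatch(
            'the checksum {!r} doesn\'t match the data'.format(expected))
    return actual


class VerifyChecksumTestCase(unittest.TestCase):

    def test_matching_checksum_is_accepted(self):
        data = b'Test fake compressed file'
        expected = hashlib.sha256(data).hexdigest()
        self.assertEqual(expected, verify_checksum(expected, data))

    def test_mismatching_checksum_raises(self):
        with self.assertRaises(ChecksumDoesNotMatch):
            verify_checksum('0' * 64, b'Test fake compressed file')
```

With both cases spelled out, a reader can see what "worked as expected" means without reading the implementation.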

snapcraft/tests/test_sources.py
@@ -114,6 +200,89 @@ def test_extract_and_keep_zipfile(self, mock_zip):
with open(zip_download, 'r') as zip_file:
self.assertEqual('Test fake compressed file', zip_file.read())
+ def test_checksum_of_zip(self):
@sergiusens

sergiusens Jul 12, 2016

Collaborator

ditto on assertions

snapcraft/internal/sources.py
+ check_checksum_determine_type_and_read_file(source_checksum, checkfile)
+
+
+def check_checksum_determine_type_and_read_file(source_checksum, checkfile):
@sergiusens

sergiusens Jul 12, 2016

Collaborator

if this is only used by check_checksum_determine_format then prefix the method name with _

snapcraft/internal/sources.py
+ elif len(source_checksum) == 128:
+ chksum = hashlib.sha512()
+ else:
+ raise IncompatibleOptionsError("Invalid checksum format")
@sergiusens

sergiusens Jul 12, 2016

Collaborator

This could be a dictionary instead

_HASH_FUNCTIONS = {
    32: hashlib.md5,
    40: hashlib.sha1,
    ...
    ...
}

try:
    checksumer = _HASH_FUNCTIONS[len(source_checksum)]()
except KeyError:
    raise IncompatibleOptionsError("Invalid checksum format")
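
The entries elided in the suggestion above follow from the hex digest lengths of the algorithms listed in the description. A runnable sketch of the lookup is given here, under the assumption that the source file is hashed in chunks; hash_file and its chunk_size parameter are illustrative names, and IncompatibleOptionsError is a local stand-in for snapcraft's exception.

```python
import hashlib

# Hex digest length -> hashlib constructor.
_HASH_FUNCTIONS = {
    32: hashlib.md5,
    40: hashlib.sha1,
    56: hashlib.sha224,
    64: hashlib.sha256,
    96: hashlib.sha384,
    128: hashlib.sha512,
}


class IncompatibleOptionsError(Exception):
    """Local stand-in for snapcraft's exception of the same name."""


def hash_file(source_checksum, path, chunk_size=65536):
    """Hash path with the algorithm implied by len(source_checksum)."""
    try:
        checksummer = _HASH_FUNCTIONS[len(source_checksum)]()
    except KeyError:
        raise IncompatibleOptionsError('Invalid checksum format')
    with open(path, 'rb') as source_file:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: source_file.read(chunk_size), b''):
            checksummer.update(chunk)
    return checksummer.hexdigest()
```

Note the trailing () on the lookup: the dictionary stores constructors, so each call needs a fresh hash object.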
Collaborator

sergiusens commented Jul 12, 2016

Another general comment,

you should name the method that the source classes call verify_checksum; verify_checksum should call _get_checksum, which returns the checksum after doing the http, file or raw checksum dance you do.

With that returned value, if you use the dict, just do the check there and finally check if they are a match or raise the exception.

This IMHO would read much simpler.

tsimonq2 added some commits Jul 13, 2016

Merge branch 'downloaded-files-checksum-bug-1585913' of github.com:ts…
…imonq2/snapcraft into downloaded-files-checksum-bug-1585913
Merge branch 'downloaded-files-checksum-bug-1585913' of github.com:ts…
…imonq2/snapcraft into downloaded-files-checksum-bug-1585913
Contributor

tsimonq2 commented Jul 21, 2016

@sergiusens I made some changes that address your comments, let me know if you have more.

snapcraft/internal/sources.py
+ data = response.read()
+ source_checksum = data.decode('utf-8')
+ if (" " in source_checksum):
+ source_checksum = source_checksum.split(" ", 1)[0]
@sergiusens

sergiusens Jul 30, 2016

Collaborator

use single quotes here

snapcraft/internal/sources.py
+ if (" " in source_checksum):
+ source_checksum = source_checksum.split(" ", 1)[0]
+ else:
+ print('No file name detected in the checksum file, perhaps an '
@sergiusens

sergiusens Jul 30, 2016

Collaborator

this should be an error, shouldn't it?

@tsimonq2

tsimonq2 Jul 31, 2016

Contributor

@sergiusens do you think it's always the case that the checksum file has a filename? If so, I'll correct it.
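
Both layouts can be handled without deciding the question up front. The sketch below assumes checksum files contain either a bare digest or a "<digest>  <filename>" line, possibly with a trailing newline; parse_checksum_file is a hypothetical helper, not part of this PR. Splitting on any whitespace also drops a trailing newline, which splitting only on a single space character would leave attached to a bare digest.

```python
def parse_checksum_file(content):
    """Return (digest, filename) from the text of a checksum file.

    filename is None when the content holds only a digest.
    """
    # split(None, 1) splits on any run of whitespace, including newlines.
    fields = content.split(None, 1)
    if not fields:
        raise ValueError('checksum file is empty')
    digest = fields[0]
    filename = None
    if len(fields) > 1:
        # Some tools mark binary mode with a leading '*' on the file name.
        filename = fields[1].strip().lstrip('*') or None
    return digest, filename
```

Whether a missing file name should then be an error, as suggested in the review, becomes a one-line check on the returned tuple.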

snapcraft/internal/sources.py
@@ -310,6 +346,10 @@ def __init__(self, source, source_dir, source_tag=None,
def provision(self, dst, clean_target=True, keep_zip=False):
zip = os.path.join(self.source_dir, os.path.basename(self.source))
+ if self.source_checksum:
+ verify_checksum(
+ self.source_checksum, zip)
@sergiusens

sergiusens Jul 30, 2016

Collaborator

these two lines certainly fit in one line, right?

@@ -310,6 +346,10 @@ def __init__(self, source, source_dir, source_tag=None,
def provision(self, dst, clean_target=True, keep_zip=False):
zip = os.path.join(self.source_dir, os.path.basename(self.source))
@sergiusens

sergiusens Jul 30, 2016

Collaborator

please use a different name than zip; zip is a Python built-in, and shadowing it might get confusing.

@tsimonq2

tsimonq2 Jul 31, 2016

Contributor

@sergiusens This applies to the original source as well and should be dealt with in a separate PR; I'm just using the variable names that were already there.
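
For context on why the name matters: zip is a built-in function, not a reserved word, so assigning to it is legal but hides the built-in for the rest of the scope. A small sketch of the effect, with hypothetical function names and paths:

```python
import os


def shadowed(source_dir, source):
    # Legal assignment, but from here on "zip" is a string in this scope.
    zip = os.path.join(source_dir, os.path.basename(source))
    try:
        return list(zip([zip], ['downloaded']))
    except TypeError as error:
        # The built-in zip() is no longer reachable by that name.
        return str(error)


def renamed(source_dir, source):
    # A distinct name leaves the built-in available.
    zip_path = os.path.join(source_dir, os.path.basename(source))
    return list(zip([zip_path], ['downloaded']))
```

The first function ends up reporting a TypeError, while the second pairs the path with its label as intended.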

snapcraft/internal/sources.py
+ filework = open(filename, 'r')
+ source_checksum = filework.read()
+ if (" " in source_checksum):
+ source_checksum = source_checksum.split(" ", 1)[0]
@sergiusens

sergiusens Jul 30, 2016

Collaborator

quotes

snapcraft/internal/sources.py
+ if (" " in source_checksum):
+ source_checksum = source_checksum.split(" ", 1)[0]
+ else:
+ print('No file name detected in the checksum file, perhaps an '
@sergiusens

sergiusens Jul 30, 2016

Collaborator

should probably be an error.

snapcraft/internal/sources.py
+
+ if checksum != source_checksum:
+ raise ChecksumDoesNotMatch(
+ "the checksum ( "+source_checksum+" ) doesn't match the file"
@sergiusens

sergiusens Jul 30, 2016

Collaborator

can we stick to using format and single quotes? (no +)

@tsimonq2

tsimonq2 Jul 31, 2016

Contributor

I'm not sure what you mean here, @sergiusens

@sergiusens

sergiusens Aug 2, 2016

Collaborator

Should be something like

raise ChecksumDoesNotMatch(
    'the checksum {!r} doesn\'t match the file'.format(source_checksum))
@tsimonq2

tsimonq2 Aug 3, 2016

Contributor

I can't use single quotes because "doesn't" contains an apostrophe.
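
Two standard ways around the apostrophe are shown below as a short sketch; the digest value is only sample data. Either form builds the same message, and {!r} adds the quotes around the checksum.

```python
source_checksum = '1276481102f218c981e0324180bafd9f'

# Option 1: keep single quotes and escape the apostrophe.
escaped = 'the checksum {!r} doesn\'t match the file'.format(source_checksum)

# Option 2: use double quotes for this one literal.
double_quoted = "the checksum {!r} doesn't match the file".format(
    source_checksum)
```

The sources.py diff earlier in this thread already uses the escaped form in 'can\'t specify source-checksum ...', so option 1 is the one that matches it.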

snapcraft/internal/sources.py
+ try:
+ checksum = _HASH_FUNCTIONS[len(source_checksum)]
+ except KeyError:
+ raise IncompatibleOptionsError("Invalid checksum format")
@sergiusens

sergiusens Jul 30, 2016

Collaborator

single quotes please

snapcraft/tests/test_sources.py
+ # md5
+ source_checksum = '1276481102f218c981e0324180bafd9f'
+ sources.verify_checksum(
+ source_checksum, "checksum.tar")
@sergiusens

sergiusens Jul 30, 2016

Collaborator

these statements fit in the same line

@tsimonq2

tsimonq2 Jul 31, 2016

Contributor

Pushing a fix to a bunch of lines that can be fixed by this, thanks for pointing this out.

tsimonq2 added some commits Jul 31, 2016

Collaborator

sergiusens commented Aug 2, 2016

OK, one final checkup, in the bug sabdfl mentions we should support sha-3-384; how do we differentiate sha-2 and sha-3 with the autodetection?
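
The concern can be shown directly: SHA-2 and SHA-3 digests of the same bit size have the same length, so length alone cannot tell them apart. The sketch assumes Python 3.6 or later, where hashlib provides sha3_384; the input bytes are sample data.

```python
import hashlib

data = b'Test fake compressed file'

sha2_digest = hashlib.sha384(data).hexdigest()
sha3_digest = hashlib.sha3_384(data).hexdigest()

# Both digests are 96 hex characters long, yet they differ in value.
same_length = len(sha2_digest) == len(sha3_digest)
same_value = sha2_digest == sha3_digest
```

Any scheme that supports both families therefore needs an explicit algorithm marker in addition to the digest.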

tsimonq2 added some commits Aug 3, 2016

Member

kyrofa commented Aug 12, 2016

@tsimonq2 did you see @sergiusens's question?

Collaborator

sergiusens commented Aug 24, 2016

Closing due to inactivity

@sergiusens sergiusens closed this Aug 24, 2016

pachulo added a commit to pachulo/snapcraft that referenced this pull request Dec 15, 2016

Solution for LP: #1585913 bug based on work done by tsimonq2 in PR #619
The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm).
The format of the property is <algorithm>/<digest>. For example:
sha2/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56
The currently supported algorithms are: md5, sha1, sha2.
The size of the digest for sha2 is automatically detected.
The support for the sha3 algorithm is implemented but commented out, as it requires python 3.6: https://docs.python.org/3.6/whatsnew/3.6.html#hashlib
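
A hedged sketch of how the <algorithm>/<digest> format described in this commit message could be parsed. It is not pachulo's code: parse_source_checksum, the two lookup tables and the use of ValueError are assumptions for illustration. The sha2 digest size is detected from the digest length, as the message describes.

```python
# Hex digest length -> hashlib name for the sha2 family.
_SHA2_BY_HEX_LENGTH = {56: 'sha224', 64: 'sha256', 96: 'sha384', 128: 'sha512'}
# Algorithms with a single fixed hex digest length.
_FIXED_HEX_LENGTH = {'md5': 32, 'sha1': 40}


def parse_source_checksum(value):
    """Split '<algorithm>/<digest>' and return (hashlib_name, digest)."""
    algorithm, separator, digest = value.partition('/')
    if not separator:
        raise ValueError('expected <algorithm>/<digest>')
    if algorithm == 'sha2':
        try:
            name = _SHA2_BY_HEX_LENGTH[len(digest)]
        except KeyError:
            raise ValueError('unexpected sha2 digest length')
    elif algorithm in _FIXED_HEX_LENGTH:
        if len(digest) != _FIXED_HEX_LENGTH[algorithm]:
            raise ValueError('unexpected {} digest length'.format(algorithm))
        name = algorithm
    else:
        raise ValueError('unsupported algorithm {!r}'.format(algorithm))
    return name, digest
```

The returned name can be passed to hashlib.new. Because the algorithm family is explicit, a later sha3 prefix could reuse the same digest lengths without ambiguity.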

pachulo added a commit to pachulo/snapcraft that referenced this pull request Dec 22, 2016

Solution for LP: #1585913 bug based on work done by tsimonq2 in PR #619
The code adds the optional property "source-checksum" to check the integrity of the source files (tar, zip, deb or rpm).
The format of the property is <algorithm>/<digest>. For example:
sha2/de2fb61252548af3c87c4aab17e82601691d19e37fd3d29ea6288e56
The currently supported algorithms are: md5, sha1, sha2.
The size of the digest for sha2 is automatically detected.
The support for the sha3 is not implemented, as it requires python 3.6: https://docs.python.org/3.6/whatsnew/3.6.html#hashlib or a library not present yet in the Ubuntu repositories (pysha3). Once ready it will be easy to add it.