New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checksums: use blocks to read the files #836
Conversation
When calculating the checksums of a large file, we need to read it into blocks (currently 16MB) to keep memory usage acceptable.
maxHitCount, | ||
stdoutErr[-500:] | ||
)) | ||
maxHitCount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be nicer if it was just on the next line...(starting at the ("
)
@@ -1083,7 +1100,7 @@ def mkdir(directory, parents=False): | |||
_log.debug("Directory %s already exitst" % directory) | |||
else: | |||
_log.error("Failed to create directory %s: %s" % (directory, err)) | |||
else:#not parrents | |||
else: # not parrents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"parents", while you are cleaning up stuff.
Looks OK to me. |
@@ -119,12 +120,12 @@ | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split off the hash functions from the dicts/lambda's. too much duplicated code
try:
import hashlib
md5_func=hashlib.md5
sha1_func=hashlib.sha1
except ImportError:
import md5, sha
md5_func=md5.md5
sha1_func=sha.sha
CHECKSUM_FUNCTIONS['md5'] = lambda p: calc_block_checksum(p, md5_func())
CHECKSUM_FUNCTIONS['sha1'] = lambda p: calc_block_checksum(p, sha1_func())
CHECKSUM_FUNCTIONS['sha1'] = lambda p: calc_block_checksum(p, sha1_func()) | ||
|
||
|
||
class zlib_checksum: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class names are camelcase
md5_func = md5.md5 | ||
sha1_func = sha.sha | ||
|
||
CHECKSUM_FUNCTIONS['md5'] = lambda p: calc_block_checksum(p, md5_func()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the try/except a bit higher, and all this in the dict above. this looks odd now.
'adler32': lambda p: '0x%s' % zlib.adler32(open(p, 'r').read()), | ||
'crc32': lambda p: '0x%s' % binascii.crc32(open(p, 'r').read()), | ||
'md5': lambda p: calc_block_checksum(p, md5_func()), | ||
'sha1': lambda p: calc_block_checksum(p, sha1_func()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure you need to call md5_func
and sha1_func
here? It seems like you just want to pass them along.
(did this pass the untit tests?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call them to have a md5 or sha1 object to work with. It has passed the unit tests (the last one failed because the disk was full, @boegel was going to fix it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so better to call the md5_class
and sha1_class
then? (and i wasn't going to make any more remarks...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, the would be more correct.
Represents more correct what the variable is/does.
Thanks @wpoely86, thanks for the reviewing @itkovian, @stdweird, @JensTimmerman! |
Checksums: use blocks to read the files
We need to read a file in blocks to calculate the checksum because else, the memory blows up if you try to install icc.
This PR fixes it for sha1 and md5 but not yet for addler32 or crc32. I will try to fix them when I find the time.