Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crc32 computation partly redundant? #281

Closed
ThomasWaldmann opened this issue Apr 11, 2015 · 5 comments
Closed

crc32 computation partly redundant? #281

ThomasWaldmann opened this issue Apr 11, 2015 · 5 comments

Comments

@ThomasWaldmann
Copy link
Contributor

While profiling attic, I found that crc32() calls make up 4.2s of a total of 60s "attic create" runtime.

It seems that attic computes a crc32 over some object metadata (id, length, tag) AND the payload data.

Isn't this redundant considering that the payload data already has a sha256 (unencrypted repo) or even a HMAC (encrypted repo) inside? So, could the payload data be excluded from the crc32 computation without losing anything?

@jborg
Copy link
Owner

jborg commented Apr 16, 2015

Is really 7% of all cpu time spent in crc32 during a normal "attic create", that sounds very high?

the crc32 is very useful for server side repository checks and repairs without needing access to any encryption keys.

@wscott
Copy link

wscott commented Apr 16, 2015

BTW I would hope that is really crc32c since that is supported in hardware
by current processors.

On Thu, Apr 16, 2015 at 6:12 PM, Jonas Borgström notifications@github.com
wrote:

Is really 7% of all cpu time spent in crc32 during a normal "attic
create", that sounds very high?

the crc32 is very useful for server side repository checks and repairs
without needing access to any encryption keys.

Reply to this email directly or view it on GitHub
#281 (comment).

@ThomasWaldmann
Copy link
Contributor Author

The numbers are from a cProfile run and ofc may vary depending on the hw and attic version used. I used my code which spends quite some less cycles overall, so the (relative) percentage of time spent for crc32 might be higher than in release attic while the (absolute) amount of cycles is likely similar.

diff --git a/attic/archiver.py b/attic/archiver.py
index 7387246..4114fa6 100644
--- a/attic/archiver.py
+++ b/attic/archiver.py
@@ -827,7 +827,7 @@ Type "Yes I am sure" if you understand this and want to continue.\n""")
         return args.func(args)


-def main():
+def main2():
     # Make sure stdout and stderr have errors='replace') to avoid unicode
     # issues when print()-ing unicode file names
     sys.stdout = io.TextIOWrapper(sys.stdout.buffer, sys.stdout.encoding, 'replace', line_buffering=True)
@@ -849,5 +849,11 @@ def main():
             archiver.print_error('Exiting with failure status due to previous errors')
     sys.exit(exit_code)

+
+import cProfile
+
+def main():
+    cProfile.runctx('main2()', globals(), locals(), 'attic.profile')
+
 if __name__ == '__main__':
     main()

@ThomasWaldmann
Copy link
Contributor Author

Related:
http://stackoverflow.com/questions/26429360/crc32-vs-crc32c
https://cloud.google.com/storage/docs/gsutil/addlhelp/CRC32CandInstallingcrcmod
https://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/api/files/crc32c.py?r=170

Note: i tried "crc-32c" from crcmod (with the compiled C extension). I wasn't faster than zlib.crc32. Guess it would need a python lib with C extension that actually uses the hw acceleration (if available).

@ThomasWaldmann
Copy link
Contributor Author

Guess we can close this one, (server side) repo checks w/o having to decrypt/decompress is a good thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants