Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add _GenerateCRCTable() to zipfile.py to pre-compute CRC Table #74661

Closed
shubharamani mannequin opened this issue May 25, 2017 · 5 comments
Closed

Add _GenerateCRCTable() to zipfile.py to pre-compute CRC Table #74661

shubharamani mannequin opened this issue May 25, 2017 · 5 comments
Labels
3.7 (EOL) end of life performance Performance or resource usage

Comments

@shubharamani
Copy link
Mannequin

shubharamani mannequin commented May 25, 2017

BPO 30476
Nosy @jkloth, @serhiy-storchaka, @shubharamani

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2017-05-27.00:58:01.802>
created_at = <Date 2017-05-25.20:25:14.664>
labels = ['3.7', 'performance']
title = 'Add   _GenerateCRCTable() to zipfile.py to pre-compute CRC Table'
updated_at = <Date 2017-05-27.05:02:59.635>
user = 'https://github.com/shubharamani'

bugs.python.org fields:

activity = <Date 2017-05-27.05:02:59.635>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2017-05-27.00:58:01.802>
closer = 'martin.panter'
components = []
creation = <Date 2017-05-25.20:25:14.664>
creator = 'shubhar'
dependencies = []
files = []
hgrepos = []
issue_num = 30476
keywords = []
message_count = 5.0
messages = ['294513', '294562', '294567', '294570', '294578']
nosy_count = 3.0
nosy_names = ['jkloth', 'serhiy.storchaka', 'shubhar']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue30476'
versions = ['Python 3.7']

@shubharamani
Copy link
Mannequin Author

shubharamani mannequin commented May 25, 2017

It is wasteful to generate the CRC Table every time, via _crctable = list(map(_gen_crc, range(256))). Better to have a pre-computed table.

I will submit the patch which incorporates this feature along with micro-benchmark results. Related issue:

http://bugs.python.org/issue30467

http://bugs.python.org/issue30468

@shubharamani shubharamani mannequin added 3.7 (EOL) end of life performance Performance or resource usage labels May 25, 2017
@shubharamani
Copy link
Mannequin Author

shubharamani mannequin commented May 26, 2017

It looks like _GenerateCRCTable() is already implemented here:

https://svn.python.org/projects/python/trunk/Lib/zipfile.py

The question is, why isn't it implemented here ?

https://github.com/python/cpython/blob/master/Lib/zipfile.py

@jkloth
Copy link
Contributor

jkloth commented May 26, 2017

See bpo-10030 (and pr #550). It removed that function in favor of the current approach.

Also, that current code does not generate *every* time, but just once. Note that the computed value is stored in a global cache.

@shubharamani
Copy link
Mannequin Author

shubharamani mannequin commented May 26, 2017

This is not needed after all jkloth explained. Please feel free to close.

@serhiy-storchaka
Copy link
Member

After some investigations this idea no longer looks good to me.

The table of 256 32-bit integers will take at least 44 lines. This is too large. It is easier to check the algorithm than all these numbers. The error even in one digit can break all, but for testing we should check every element of the table. In any case we would need a generating code, either for generating the sources, or for testing.

Current generating code is 25% faster than the code used in 2.7, and (surprisingly!) Python 3.7 is 50% faster than Python 2.7 in the computation of the same code. On my computer the generating takes 1 ms (the startup time of the interpreter is 50 ms), and the table is generated only if ZIP file decryption is used, and only once.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 (EOL) end of life performance Performance or resource usage
Projects
None yet
Development

No branches or pull requests

3 participants