[MRG+2] Improve gunzip performance for big files on Python 3 #3281
Copied from the discussion on #3270; partial fix for #2658.
Problem
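The `gunzip` method runs forever on big files in Python 3.
In Python 3, `bytes` does not get the in-place string concatenation optimization that Python 2 applies to `str`. As a result, repeatedly appending each decompressed chunk to the output with `+=` copies the whole accumulated output on every iteration, which makes decompressing a big file take forever.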
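A minimal sketch of the slow pattern, assuming the input is an in-memory gzip payload. `gunzip_quadratic` and `chunk_size` are hypothetical names for illustration, not Scrapy's actual code:

```python
import gzip
import io


def gunzip_quadratic(data: bytes, chunk_size: int = 8192) -> bytes:
    """Hypothetical slow variant: grows the output with += in a loop."""
    output = b""
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # bytes is immutable: each += allocates a new object and copies
            # everything accumulated so far, so total work is O(n^2).
            output += chunk
    return output
```

The result is correct; only the running time degrades as the decompressed size grows.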
Solution
Change the `gunzip` method so that it no longer copies the entire output in each iteration. Instead, it appends each chunk to a list and joins the list once at the end, improving performance from O(n^2) to O(n) in the size of the decompressed output.
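A sketch of the list + join approach under the same assumptions (hypothetical `gunzip_linear` name; the error handling for truncated or corrupt input that a real implementation needs is omitted):

```python
import gzip
import io


def gunzip_linear(data: bytes, chunk_size: int = 8192) -> bytes:
    """Hypothetical fast variant: collect chunks, join once at the end."""
    chunks = []
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # list.append is amortized O(1) and copies no earlier data.
            chunks.append(chunk)
    # A single join sizes the result once and copies each byte once: O(n).
    return b"".join(chunks)
```

Writing each chunk into an `io.BytesIO` buffer and calling `getvalue()` at the end is an equivalent linear-time alternative.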
Thanks @kmike for your help debugging!