-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a more appropriate compression level for exports #3884
Conversation
Pending discussion on https://bugzilla.redhat.com/show_bug.cgi?id=2188504 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holy cow - I love this idea. Addresses the problem, without requiring a whole new config/API/workflow change - brilliant!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the solution 🦭
bit it seems like the tests is failing |
@dralley did you ran any tests to come to the conclusion that compression level 1 is the most balanced? there is the whole range of 1-9 levels |
Oh bah - looks like compresslevel isn't supported on the streaming options. From the tarfile doc: "For modes 'w:gz', 'r:gz', 'w:bz2', 'r:bz2', 'x:gz', 'x:bz2', tarfile.open() accepts the keyword argument compresslevel (default 9) to specify the compression level of the file." |
I checked https://github.com/python/cpython/blob/main/Lib/tarfile.py#L1826 first, it looks like it should be supported... It looks like this was only merged a year ago and therefore might not be in the Python versions we're using python/cpython@50cd4b6 |
I didn't run tests but I did look over some general compression benchmarks that others have done. Level 3 was 30-40% slower but only about 10% smaller, and anything beyond that was an even worse tradeoff. My feeling is that a lot of the content that people want to export like RPMs, metadata, etc. are already compressed so we don't need to try hard to compress it again. |
Gotcha, thanks for explanation. It makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Daniel, might it be possible to subclass "just enough" of Tarfile to support this, and carry just that? As opposed to needing all 3K lines of tarfile.py?
@ggainey I agree and that is the intention - this is still in draft remember - but until then I just wanted a sense of whether it would work or not. |
53bebbf
to
efbbb80
Compare
5efa600
to
e372173
Compare
Well, it works. However, is this something we're actually open to doing? We have explicitly tried to avoid doing monkeypatches or this kind of vendoring in the past. I think this solution would yield the best outcome (relative to no compression at all, or something like that) but I still definitely feel discomfort about it at a technical / maintenance level. Am I missing an easier way to accomplish this? |
ddc442f
to
7b58832
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach is the only reasonable way to address the problem for the actually-affected users. It's definitely "black magic", and needs some Very Explicit Comments to make it clear what's going on and why it works - but I can't think of a better approach.
Experimenting with this has been successful, although expressing it in code is a bit more challenging, but I want to double check -- is |
I take it back, something about the data pipeline gets messed up and it's a real pain to debug. Combined with uncertainty about relying on the |
pulpcore/app/monkeypatch.py
Outdated
def __init__(self, *args, **kwargs): | ||
super().__init__(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly if we do not add any init code, then just don't add these two lines.
CHANGES/3869.bugfix
Outdated
@@ -0,0 +1 @@ | |||
Exports now use gzip compression level 1 rather than compression level 9. Exported archives will now be slightly larger, but exports should be much faster. This is considered to be a more optimal balance of space/time for the export operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make some of our teammates happy, this ought to be line broken at 98 characters.
(100 is the rule, but towncrier indents by another two.)
@@ -0,0 +1,69 @@ | |||
import sys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not how i learned python, but maybe django does some auto lookup?
Exports will be larger, but should be much faster. There were reports of large exports taking multiple days to complete due to the overhead incurred by compression. closes pulp#3869
Backport to 3.18: 💚 backport PR created✅ Backport PR branch: Backported as #3917 🤖 @patchback |
Backport to 3.21: 💚 backport PR created✅ Backport PR branch: Backported as #3918 🤖 @patchback |
Backport to 3.22: 💚 backport PR created✅ Backport PR branch: Backported as #3919 🤖 @patchback |
Backport to 3.23: 💚 backport PR created✅ Backport PR branch: Backported as #3920 🤖 @patchback |
I submitted the original Red Hat BZ relating to this issue and also for #3236 . Passing on my thanks to everyone who worked on this. It's going to be a huge help for some of the work we're doing. |
Backport to 3.28: 💚 backport PR created✅ Backport PR branch: Backported as #4220 🤖 @patchback |
Backport to 3.16: 💚 backport PR created✅ Backport PR branch: Backported as #4425 🤖 @patchback |
Exports will be larger, but should be much faster. There were reports of large exports taking multiple days to complete due to the overhead incurred by compression.
closes #3869