Exports are bottlenecked by gzip compression which cannot be disabled #3869

Closed
dralley opened this issue May 22, 2023 · 2 comments · Fixed by #3884
@dralley
Contributor

dralley commented May 22, 2023

Version
Any

Describe the bug

We identified a severe bottleneck in the way hammer exports work and found the code in upstream Pulp.

Line 406 of https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/export.py

"with tarfile.open(tarfile_fp, "w|gz", fileobj=split_process.stdin)"

The tarfile Python module creates tar files using the gzip Python module.

Data compression for the gzip Python module is provided by the zlib Python module.

The zlib Python module calls the zlib library.

If defaults are used the whole way through this series of events, the result is a single-threaded Pulp process compressing a tarball that contains a massive content library. This bottleneck can make large hammer exports take several days.

Modifying the call so that tarfile.open does NOT use compression (changing "w|gz" to "w") dramatically speeds up the hammer export. In our testing it reduced the time from days to just hours. The drawback is that the file size was significantly larger, but the trade-off is worthwhile given that we have tight timeframes and plentiful disk capacity.
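To make the trade-off described above concrete, here is a minimal sketch that writes the same payload to an in-memory tar stream with and without gzip. The helper name `build_tar`, the member name, and the sample payload are all hypothetical; this is not the Pulp export code, only an illustration of the two `tarfile` modes.

```python
import io
import os
import tarfile
import time


def build_tar(payload, mode):
    """Write one in-memory member to a tar stream; return (archive bytes, seconds)."""
    buf = io.BytesIO()
    start = time.perf_counter()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        info = tarfile.TarInfo(name="artifact.bin")  # hypothetical member name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue(), time.perf_counter() - start


# Hypothetical sample data: a compressible part plus an incompressible part.
payload = (b"pulp export sample line\n" * 20000) + os.urandom(200_000)

gz_bytes, gz_secs = build_tar(payload, "w|gz")  # gzip-compressed stream
raw_bytes, raw_secs = build_tar(payload, "w|")  # uncompressed stream

print(f"w|gz: {len(gz_bytes)} bytes in {gz_secs:.4f}s")
print(f"w|  : {len(raw_bytes)} bytes in {raw_secs:.4f}s")
```

The uncompressed archive is larger but skips the CPU-bound compression step entirely. Actual timings depend on the data and the machine.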

Can this bottleneck be addressed with multi-threaded gzip compression?

and/or

Can a hammer command line option for no compression be implemented?
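On the multi-threaded question above, one possible approach (an assumption for illustration, not something Pulp is known to do) relies on the fact that concatenated gzip members form a valid gzip stream. The sketch below compresses fixed-size chunks in a thread pool and joins the results; the function name `parallel_gzip` and its parameters are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data, chunk_size=1 << 20, workers=4, level=6):
    """Compress chunks independently and concatenate the gzip members."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so members are joined in sequence.
        members = list(pool.map(lambda c: gzip.compress(c, compresslevel=level), chunks))
    return b"".join(members)


# Hypothetical sample data.
sample = b"repository metadata line\n" * 200_000
compressed = parallel_gzip(sample, chunk_size=256 * 1024)

# gzip.decompress reads multi-member streams, so the round trip is lossless.
restored = gzip.decompress(compressed)
print(len(sample), len(compressed), restored == sample)
```

Compressing chunks independently usually costs a little compression ratio, since each member starts with an empty dictionary. Whether threads give a real speedup depends on the interpreter releasing the GIL during compression, which should be verified for the deployment in question.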

Run a hammer export and monitor the Pulp processes. One process will run at 100% CPU. Modify the abovementioned Python script to NOT use gzip compression, and an uncompressed tarball will be created instead, much more quickly and with multiple Pulp processes.

Steps to Reproduce:

  1. Run a "hammer export". Monitor the Pulp process CPU usage and time taken to complete export.
  2. Change the abovementioned Python code in Pulp.
  3. Run a "hammer export" again and note performance improvement.


Additional context
https://bugzilla.redhat.com/show_bug.cgi?id=2188504

@dralley
Contributor Author

dralley commented May 25, 2023

I don't know that disabling compression entirely is a good idea. However, the default compression level used is level 9, the most computationally expensive (slow) one. Given level 9 is (roughly) 6x slower than level 1, but only compresses about 20% better, this is probably a poor tradeoff.

We should look at using levels 1-3 instead.
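A quick way to check the level trade-off on representative data is to time `zlib.compress` at several levels. This is only a sketch: the `measure` helper and the sample data are hypothetical, and the ratios quoted above will vary with the content being exported.

```python
import time
import zlib


def measure(data, level):
    """Return (compressed size, seconds) for one zlib compression level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    return len(compressed), time.perf_counter() - start


# Hypothetical sample: text that loosely resembles repository metadata.
sample = b"".join(
    b"package-%06d.rpm sha256 checksum metadata\n" % i for i in range(50000)
)

results = {level: measure(sample, level) for level in (1, 3, 6, 9)}
for level, (size, secs) in results.items():
    print(f"level {level}: {size} bytes in {secs:.4f}s")
```

Running this against a sample of real export content would show whether the speed gain at levels 1-3 justifies the larger output for a given deployment.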

dralley added a commit to dralley/pulpcore that referenced this issue May 25, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes pulp#3869
@pulpbot pulpbot moved this from In Progress to Needs review in RH Pulp Kanban board May 25, 2023
dralley added a commit to dralley/pulpcore that referenced this issue May 30, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes pulp#3869
dralley added a commit to dralley/pulpcore that referenced this issue May 31, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes pulp#3869
dralley added a commit to dralley/pulpcore that referenced this issue Jun 9, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes pulp#3869
ipanova pushed a commit that referenced this issue Jun 13, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes #3869
@pulpbot pulpbot moved this from Needs review to Done in RH Pulp Kanban board Jun 13, 2023
patchback bot pushed a commit that referenced this issue Jun 13, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes #3869

(cherry picked from commit 34c5bab)
mdellweg pushed a commit that referenced this issue Jun 13, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes #3869

(cherry picked from commit 34c5bab)
patchback bot pushed a commit that referenced this issue Aug 1, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes #3869

(cherry picked from commit 34c5bab)
@pulpbot
Member

pulpbot commented Sep 11, 2023

patchback bot pushed a commit that referenced this issue Sep 13, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes #3869

(cherry picked from commit 34c5bab)
dralley added a commit that referenced this issue Sep 13, 2023
Exports will be larger, but should be much faster. There were reports of
large exports taking multiple days to complete due to the overhead
incurred by compression.

closes #3869

(cherry picked from commit 34c5bab)

3 participants