Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve zstd compression #469

Merged
merged 4 commits into from
Jan 8, 2024

Conversation

orhun
Copy link
Contributor

@orhun orhun commented Jan 5, 2024

This PR improves the zstd compression by:

@orhun orhun force-pushed the feat/improve_zstd_compression branch from 827aa17 to 4c70fe7 Compare January 5, 2024 16:22
@@ -195,19 +208,23 @@ pub fn write_conda_package<W: Write + Seek>(

let (info_paths, other_paths) = sort_paths(paths, base_path);

outer_archive.start_file(format!("pkg-{out_name}.tar.zst"), options)?;
let archive_path = format!("pkg-{out_name}.tar.zst");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call it just .tar or/and use tempfile here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 221873d

@wolfv
Copy link
Member

wolfv commented Jan 5, 2024

Looks good to me! Excellent improvement :)

let mut tar_file = File::open(&tar_path)?;
let compression_level = compression_level.to_zstd_level()?;
let mut zst_encoder = zstd::Encoder::new(writer, compression_level)?;
zst_encoder.multithread(num_cpus::get() as u32)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be configurable (default to this number, but users can choose lower number if they want).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added ZSTD_NUMTHREADS variable in 6ae2197

The inspiration came from: facebook/zstd#2241

I added a doc comment to the function but it is private thus not very visible. I think we can mention this somewhere, where would be a good place? I thought about putting it under a new section as "Environment Variables" in README.md where we list the other variables as well (I think there is only CONDA_PREFIX for now).

@wolfv
Copy link
Member

wolfv commented Jan 5, 2024

cc @dholth does this reflect what you are doing in conda-package-handling?

@wolfv
Copy link
Member

wolfv commented Jan 5, 2024

Actually, ChatGPT says that the zstd encoding is not deterministic if a differnet number of threads is used. For that reason alone we should make it configurable (and store it in the rattler-build output) so that we can recreate the exact packages :)

Turns out ChatGPT is wrong: facebook/zstd#3608 (comment)

The output is different from single threaded but should not be different with the number of cores...

@dholth
Copy link
Contributor

dholth commented Jan 6, 2024

I should increase the number of threads. I think the comment is wrong about the downsides. https://github.com/conda/conda-package-handling/blob/main/src/conda_package_handling/conda_fmt.py#L24-L27

@wolfv
Copy link
Member

wolfv commented Jan 8, 2024

Not sure that the env var is the right approach for rattler (maybe for rattler-build). We shouldl make it configurable via struct or argument (like the compression level).

@orhun
Copy link
Contributor Author

orhun commented Jan 8, 2024

I added an argument to the write_conda_package in 5aaf090

@wolfv wolfv merged commit 3b0f39b into mamba-org:main Jan 8, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use zstd multithreading for compression / package writing
4 participants