Migrate from python-zstandard to backports.zstd/compression.zstd #178

Merged
marcelm merged 1 commit into main from zstd on Jun 1, 2026
marcelm commented May 29, 2026

Since Python 3.14 comes with built-in Zstandard support via compression.zstd, which is available on earlier Python versions via backports.zstd, it makes sense to use that library instead. This allows us to make the Zstandard support non-optional. It was previously optional because python-zstandard is quite a large dependency, but backports.zstd is much smaller, so it’s ok to always pull it in.

Closes #172
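
As an illustration of the import pattern this migration relies on, here is a minimal sketch. The `load_zstd` helper is hypothetical and not taken from the PR; it only assumes that `backports.zstd` exposes the same `compress`/`decompress` functions as `compression.zstd`.

```python
import sys


def load_zstd():
    """Return a module offering the compression.zstd API, or None if unavailable."""
    try:
        if sys.version_info >= (3, 14):
            # Standard-library module, available from Python 3.14 on
            from compression import zstd
        else:
            # Backport of the same API for earlier Python versions
            from backports import zstd
    except ImportError:
        # Neither module is installed in this environment
        return None
    return zstd


zstd = load_zstd()
if zstd is not None:
    payload = b"ACGT" * 1000
    compressed = zstd.compress(payload)
    # Decompressing must give back the original bytes
    assert zstd.decompress(compressed) == payload
```

Returning `None` instead of raising keeps the sketch runnable where neither module is present. With `backports.zstd` as a hard dependency, as proposed here, that fallback would presumably not be needed.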

@rhpvorderman

Looks good to me! Any blockers why this is still in draft?

marcelm marked this pull request as ready for review June 1, 2026 07:39

marcelm commented Jun 1, 2026

Thanks! Sorry, I forgot to describe why I had this as "draft". There are two reasons, neither of them a blocker.

First, I noticed that backports.zstd isn’t actually used when you open a .zst file with the default settings:

python -c 'from xopen import xopen; print(xopen("tests/file.txt.zst", mode="rb"))'
_PipedCompressionProgram('tests/file.txt.zst', mode='rb', program='zstd --long=31 -T1 -c -d', threads=1)

You would have to write threads=0 to get it. This is how we designed it, and I am just wondering if this is still the best behavior. I’m not sure whether I would expect this when using the library. It is consistent with the other functions, though.

Second, compression.zstd seems to support multi-threaded decompression, and this isn’t implemented in this PR. But that can be done later.
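
To make the `threads=0` behavior mentioned above concrete, here is a small sketch. `roundtrip_in_process` is a hypothetical helper, not part of xopen. It assumes only that `xopen()` accepts a `threads` argument and that `threads=0` keeps (de)compression inside the Python process, as described in this comment.

```python
try:
    from xopen import xopen
except ImportError:
    # xopen is not installed; the helper below cannot be used
    xopen = None


def roundtrip_in_process(path, payload):
    """Write payload to path and read it back without an external program."""
    # threads=0 asks xopen not to spawn an external compression program
    with xopen(path, mode="wb", threads=0) as f:
        f.write(payload)
    with xopen(path, mode="rb", threads=0) as f:
        return f.read()
```

Called as `roundtrip_in_process("example.txt.zst", b"some data\n")`, this would be expected to go through the in-process Zstandard backend rather than `_PipedCompressionProgram`. Which backend is used depends on the installed xopen version.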


marcelm commented Jun 1, 2026

Let’s do this! Further improvements can wait for subsequent PRs.

@rhpvorderman Shall we release 2.1.0?

marcelm merged commit 2bd4f88 into main Jun 1, 2026
18 checks passed
marcelm deleted the zstd branch June 1, 2026 19:51

marcelm commented Jun 3, 2026

I’ve released 2.1.0.

@rhpvorderman

> You would have to write threads=0 to get it. This is how we designed it, and I am just wondering if this is still the best behavior. I’m not sure whether I would expect this when using the library. It is consistent with the other functions, though.
>
> Second, compression.zstd seems to support multi-threaded decompression, and this isn’t implemented in this PR. But that can be done later.

Oh yes, in that case let's implement the multithreading later, like we did for python-isal and python-zlib-ng. I have always felt that the intention of this library was to overcome compression bottlenecks (simply because these are unfortunately big in bioinformatics). External processes can't share memory, and pipes have some efficiency cost, so if threads and shared memory are an option, it is always best to go for that. It also eliminates a conda dependency, which is a nice bonus.

Development

Successfully merging this pull request may close these issues.

Support zstd from standard library
