As far as I understand, the GIL should be dropped when calling the underlying LZ4 C library during compression and decompression; see lz4.block.decompress.
However, I only see a minor speedup when using multiple threads for decompression with python-lz4 4.3.2. This holds on a 6-core Intel i5-8400 on Debian 11 and on an AMD Ryzen 5900X on Windows 10, with both Python 3.11.1 and 3.8.10.
Compression speed seems to increase almost linearly with the number of threads.
The following code gives me about 4500 MB/s decompression speed with 6 threads (a slight underestimate, due to some overhead from starting the threads) and ~4300 MB/s with 1 thread on an AMD Ryzen 5900X on Windows 10. Using lz4.frame yields similar results. Using py-lz4framed instead gives me about 13000 MB/s with 6 threads and ~8300 MB/s with 1 thread (I'm not sure the compression settings are the same, but at the very least there is some speedup from multithreading).
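The benchmark script itself is not reproduced in this thread. For anyone who wants to repeat the measurement, a minimal sketch could look like the following. It is not the original code: the helper names `make_chunks` and `decompress_throughput`, the chunk count and the chunk size are all made up for illustration, and it falls back to the standard library's `zlib` when python-lz4 is not installed, so absolute numbers will differ from the ones quoted above.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

try:
    # python-lz4, if installed (block format, uncompressed size stored by default)
    import lz4.block as codec
    compress, decompress = codec.compress, codec.decompress
except ImportError:
    # Assumption: stdlib fallback only so that the sketch still runs
    import zlib
    compress, decompress = zlib.compress, zlib.decompress


def make_chunks(n_chunks=24, chunk_size=1 << 20):
    """Build compressible test data; return (total_raw_size, compressed_chunks)."""
    # A repeated 1 KiB random pattern, so the data compresses well
    raw = os.urandom(1024) * (chunk_size // 1024)
    return n_chunks * len(raw), [compress(raw) for _ in range(n_chunks)]


def decompress_throughput(chunks, raw_size, n_threads):
    """Decompress every chunk on a pool of n_threads threads; return MB/s."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        start = time.perf_counter()
        total = sum(len(out) for out in pool.map(decompress, chunks))
        elapsed = time.perf_counter() - start
    # Sanity check: every byte came back
    assert total == raw_size
    return raw_size / elapsed / 1e6


if __name__ == "__main__":
    raw_size, chunks = make_chunks()
    for n in (1, 6):
        rate = decompress_throughput(chunks, raw_size, n)
        print(f"{n} thread(s): {rate:.0f} MB/s")
```

If the decompression call holds the GIL for its whole duration, the 1-thread and 6-thread figures should come out roughly equal; if it releases the GIL, the 6-thread figure should be noticeably higher on a multi-core machine.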
Yes. From a quick glance at lz4framed, and after wading through the macro definitions, the key difference is that lz4framed calls PyThread_acquire_lock and PyThread_release_lock from pythread.h before and after calling the lz4 library functions, which we're not doing in this library at the moment. The main challenge is knowing when to release the lock; it's not done consistently in lz4framed.
Shouldn't be too hard to do similarly here. Will have a look when I get time, unless you beat me to it.