
Low decompression speed in multithreading #269

Open
Dalbasar opened this issue Jan 2, 2023 · 2 comments
Dalbasar commented Jan 2, 2023

As far as I understand it, the GIL should be dropped when calling the underlying LZ4 C library during compression and decompression; see lz4.block.decompress.

I only see a minor speedup when using multiple threads for decompression with python-lz4 4.3.2, both on a 6-core Intel i5-8400 on Debian 11 and on an AMD Ryzen 5900X on Windows 10, with Python 3.11.1 as well as 3.8.10.

Compression speed seems to increase almost linearly with the number of threads.

The following code gives me about 4500 MB/s decompression speed (a slight underestimation due to some overhead from starting the threads) when using 6 threads and ~4300 MB/s when using 1 thread on an AMD Ryzen 5900X on Windows 10. Using lz4.frame yields similar results. Using py-lz4framed instead gives me about 13000 MB/s with 6 threads and ~8300 MB/s with 1 thread (I am not sure the compression settings are the same, but at the very least there is some speedup from multithreading).

import os
import threading
import time
import lz4.block

size_mb = 2000
n_threads = 6

input_data = size_mb * 1024 * os.urandom(1024)  # size_mb MiB: a 1 KiB random block repeated
compressed = lz4.block.compress(input_data)
input_data = None


def decompress(data):
    start_time = time.perf_counter()
    thread_start_time = time.thread_time()
    lz4.block.decompress(data)
    stop_time = time.perf_counter()
    thread_stop_time = time.thread_time()
    print(f"{threading.current_thread()} Decompression took {(stop_time-start_time)*1000:.3f}ms "
          f"(Thread time: {(thread_stop_time-thread_start_time)*1000:.3f}ms)\n")


threads = [threading.Thread(name=str(i), target=decompress, args=(compressed,)) for i in range(n_threads)]

start_thread_time = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
done_thread_time = time.perf_counter()
duration = done_thread_time - start_thread_time
MBs = n_threads*size_mb/duration
print(f"Total time: {duration}s : {MBs}MB/s")
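A quick way to check whether a given call actually drops the GIL is to time a pure-Python counter thread against it: the counter can only advance while the GIL is free. A minimal sketch, using zlib.decompress from the stdlib (which releases the GIL in CPython) as a stand-in so it runs without python-lz4; to test python-lz4, build the blob with lz4.block.compress and decompress it with lz4.block.decompress instead:

```python
import threading
import zlib

# ~64 MiB of zeros compresses to a tiny blob; decompressing it takes
# long enough to observe whether other threads can run in the meantime.
blob = zlib.compress(b"\x00" * (64 * 1024 * 1024), 1)

ticks = 0
stop = threading.Event()

def counter():
    # Pure-Python loop: it can only make progress while it holds the GIL.
    global ticks
    while not stop.is_set():
        ticks += 1

t = threading.Thread(target=counter)
t.start()
zlib.decompress(blob)  # stand-in for lz4.block.decompress
stop.set()
t.join()
print(f"counter advanced {ticks} times around the decompress call")
```

If the C call holds the GIL throughout, the counter barely moves while it runs; if the call releases the GIL, the count is large.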
@jonathanunderwood (Member)
Yes, a quick glance at lz4framed (wading through the macro definitions) shows that the key difference is that lz4framed calls
PyThread_acquire_lock and PyThread_release_lock from pythread.h before/after calling the lz4 library functions, which we're not doing in this library at the moment. The main challenge is knowing when to release the lock; it's not done consistently in lz4framed.

Shouldn't be too hard to do similarly here. Will have a look when I get time, unless you beat me to it.

@jonathanunderwood (Member)
For when I come back to this:
