The decoding works in two steps:

1. Decode with a bogus backreference buffer initialized to 16-bit indexes.
2. Replace those 16-bit indexes (markers) with the actual backreference contents.
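To make the second step concrete, here is a minimal sketch in Python. It is not pragzip's actual code: the marker encoding (`MARKER_BASE` plus an index into the preceding window) and the name `replace_markers` are assumptions made only for illustration.

```python
from array import array

WINDOW_SIZE = 32 * 1024  # maximum Deflate backreference distance
MARKER_BASE = 1 << 15    # assumption: values >= MARKER_BASE are markers


def replace_markers(symbols: array, window: bytes) -> bytes:
    """Resolve 16-bit symbols into bytes once the preceding window is known.

    Hypothetical encoding: values below 256 are literal bytes, values at or
    above MARKER_BASE are indexes (offset by MARKER_BASE) into the window.
    """
    if len(window) != WINDOW_SIZE:
        raise ValueError("expected a full window")
    out = bytearray(len(symbols))
    for i, symbol in enumerate(symbols):
        if symbol < 256:
            out[i] = symbol                        # already a real byte
        elif symbol >= MARKER_BASE:
            out[i] = window[symbol - MARKER_BASE]  # look up the placeholder
        else:
            raise ValueError(f"unexpected symbol {symbol}")
    return bytes(out)
```

The point of the sketch is that step 1 needs no knowledge of the preceding data, while step 2 is a plain lookup per element once the window is available.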
Currently, the second step is done on the orchestrator thread. This might limit performance. In benchmarks, marker replacement reaches 12 GB/s, and compacting buffers whose 16-bit storage type only contains 8-bit values reaches 4 GB/s.
This is already quite fast, so parallelizing it might effectively yield only a factor of 2 speedup. Furthermore, at this point, NUMA behavior might have to be considered when it comes to the ThreadPool.
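The compaction mentioned above can be pictured as narrowing each 16-bit element to a single byte. The following is only a sketch; the function name `compact_to_bytes` is hypothetical, and a real implementation would work on raw buffers rather than Python lists.

```python
from array import array


def compact_to_bytes(symbols: array) -> bytes:
    """Narrow 16-bit elements that all hold values below 256 to one byte each."""
    if symbols.typecode != "H":
        raise TypeError("expected an array of unsigned 16-bit values")
    # Go through a list of ints on purpose: bytes(array) would copy the raw
    # 2-byte elements instead of narrowing them. bytes() raises ValueError
    # if any value does not fit into 8 bits.
    return bytes(symbols.tolist())
```

The result occupies half the memory of the input, which is why the step is worth doing even though it costs a full pass over the data.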
Another problem is load balancing. Introducing yet another thread pool would either oversubscribe the processor or, if the decoding thread pool were limited to compensate, underutilize it. Therefore, it might be nice to also use the existing thread pool for marker replacement. That pool would then have to implement some kind of priority system because marker replacement should always have higher priority than decoding. We would also still have to ensure that at least one thread can always decode, or else it would still slow things down. Maybe the orchestrator thread can keep acting as the main marker replacer while also distributing further work to the thread pool. If, even with higher priority, no thread has begun the marker replacement by the time the orchestrator thread has finished its own work, then the orchestrator thread should be able to steal that work package back from the thread pool and process it itself. This would also require a kind of work package ID for querying work completion and for taking work back from the thread pool.
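The idea of a prioritized pool with work package IDs and stealing back could look roughly like the sketch below. It is a hypothetical Python illustration, not the existing ThreadPool: the class `PriorityPool` and its methods `submit`, `steal_back`, and `result` are invented names, and it does not implement the guarantee that one thread always stays free for decoding.

```python
import heapq
import itertools
import threading


class PriorityPool:
    """Sketch: lower priority number runs first; queued work can be taken back."""

    def __init__(self, thread_count: int):
        self._cond = threading.Condition()
        self._heap = []      # (priority, task_id), ordered by priority then age
        self._pending = {}   # task_id -> callable that no worker has started
        self._results = {}   # task_id -> return value of a finished task
        self._ids = itertools.count()
        self._closed = False
        self._threads = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(thread_count)]
        for thread in self._threads:
            thread.start()

    def submit(self, priority: int, fn) -> int:
        """Queue fn and return a work package ID."""
        with self._cond:
            task_id = next(self._ids)
            self._pending[task_id] = fn
            heapq.heappush(self._heap, (priority, task_id))
            self._cond.notify_all()
            return task_id

    def steal_back(self, task_id: int):
        """Return the callable if no worker has started it yet, else None."""
        with self._cond:
            return self._pending.pop(task_id, None)

    def result(self, task_id: int):
        """Block until a worker finished the task. Do not call for stolen tasks."""
        with self._cond:
            self._cond.wait_for(lambda: task_id in self._results)
            return self._results.pop(task_id)

    def close(self):
        """Let workers drain the queue, then join them."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        for thread in self._threads:
            thread.join()

    def _work(self):
        while True:
            with self._cond:
                self._cond.wait_for(lambda: self._heap or self._closed)
                if not self._heap:
                    return  # closed and nothing left to do
                _, task_id = heapq.heappop(self._heap)
                fn = self._pending.pop(task_id, None)
            if fn is None:
                continue  # the submitter stole this package back
            value = fn()  # sketch only: an exception here would end the worker
            with self._cond:
                self._results[task_id] = value
                self._cond.notify_all()
```

In this sketch, the orchestrator would submit marker replacement with priority 0 and decoding with priority 1, do its own share of the work, and then call `steal_back(task_id)` for each outstanding package: a non-`None` return value means no worker has started it and the orchestrator can run it directly, otherwise it waits with `result(task_id)`.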
All in all, this is slowly becoming an academic/high-performance computing issue rather than one of general ratarmount/pragzip usage, but it would still be nice to have.