Description
Describe the bug
Running hg clone on some repositories takes about 150% longer after this specific change (e.g. from 3 minutes to 7 minutes).
The slowdown does not affect the zstd compression stage itself as much as the other data-processing operations around it, such as sha1 computation, zstd decompression and binary-diff computation. So we suspect that the new code has a bad effect on the memory caches or some other hardware subsystem.
To Reproduce
Full reproduction was a bit hairy, as the source of a couple of projects had to be adapted while bisecting this. I'll try to build two simple changesets in Mercurial to reproduce this. However, I wanted to report this quickly, as zstd's core developers might have useful quick insights about this issue.
To give a bit more context: Mercurial uses zstd from two sources: a vendored python-zstandard version, and the zstd-rs Rust crate. The vendored version was silently very outdated (zstd 1.4.4) until a few months ago, when we upgraded it to python-zstandard 0.23.0 after security concerns brought the issue to our attention.
The zstd-rs version has always used a newer version of zstd and has always been unexpectedly slower for some specific clone workloads. These clones mostly involve compressing a large number of very small data chunks (deltas from one version to another).
The non-Rust version became much slower on these same workloads, matching the Rust performance. Profiling pointed to a huge inflation of the time spent doing sha1, bdiff and zstd decompression, and performance analysis points to much higher traffic with the L2 cache and RAM.
Bisecting this slowness pointed very clearly to our upgrade of python-zstandard. Bisecting the python-zstandard code itself pointed to the zstd 1.4.5 → 1.4.8 upgrade. Bisecting zstd itself pointed very clearly to 6004c11.
As said before, I can work to provide you with a simple way to run these operations with the two different versions of the zstd code on the specific workload, but I wanted to file this report first.
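Until a proper reproduction is available, the rough shape of the workload described above can be sketched as follows. This is a hypothetical harness, not Mercurial's code: the function names `make_chunks` and `run_workload`, the chunk count and the chunk size are all assumptions made for illustration. It uses zlib from the standard library as a stand-in codec so that it runs anywhere; to exercise zstd, the `compress` and `decompress` callables would be swapped for those of a given python-zstandard build. It only illustrates the interleaving of small compressions with sha1 and decompression, and is not known to reproduce the slowdown.

```python
import hashlib
import random
import time
import zlib


def make_chunks(count=2000, size=200, seed=0):
    """Build `count` small pseudo-random chunks, standing in for small deltas."""
    rng = random.Random(seed)
    # A narrow byte alphabet keeps the chunks somewhat compressible.
    return [bytes(rng.randrange(97, 105) for _ in range(size)) for _ in range(count)]


def run_workload(chunks, compress, decompress):
    """Interleave compression with sha1 and decompression, timing each stage."""
    timings = {"compress": 0.0, "sha1": 0.0, "decompress": 0.0}
    for chunk in chunks:
        t0 = time.perf_counter()
        blob = compress(chunk)
        t1 = time.perf_counter()
        hashlib.sha1(chunk).digest()
        t2 = time.perf_counter()
        restored = decompress(blob)
        t3 = time.perf_counter()
        if restored != chunk:
            raise ValueError("round trip mismatch")
        timings["compress"] += t1 - t0
        timings["sha1"] += t2 - t1
        timings["decompress"] += t3 - t2
    return timings


if __name__ == "__main__":
    # zlib is a stand-in here. With python-zstandard installed, one could pass
    # zstandard.ZstdCompressor().compress and zstandard.ZstdDecompressor().decompress.
    results = run_workload(make_chunks(), zlib.compress, zlib.decompress)
    for stage, seconds in results.items():
        print(f"{stage}: {seconds:.4f}s")
```

Running the same script against two builds of the compression library and comparing the sha1 and decompress totals, rather than only the compress total, would show whether the surrounding stages are the ones that get slower.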
Expected behavior
We would be happy to not have Mercurial become more than twice as slow on some workloads :-)
And it seems suspicious to see such a performance degradation of side operations from a commit that is expected to improve performance.
Screenshots and charts
- Execution profile with zstd 1.4.4 (focussed on the affected part)
- Execution profile with python-zstandard 0.23.0 (focussed on the affected part)
We can also provide heaptrack and perf data for before and after, but they total > 100MB
Desktop (please complete the following information):
- OS: Linux (various kernels)
- Version: from 6004c11 onward
- Compiler: both gcc and clang, since the Rust version seems affected too.
- Flags: nothing special (unless the library adds some)
- Other relevant hardware specs: reproduced on both Intel and AMD machines, both large desktops and laptops
- Build system: see library
Additional context
Let me know if you need any other details. I'll try to prepare an easy reproduction for you to run by the end of the week.