MUCH slower compression speeds using version 1.5.1 #2966
Tested on a local desktop:
I'm not sure what's going on in your case, but it's not a universal experience. There are too many unsaid variables that can result in such a large difference.
If you look at the attached source file, it's a pretty simple example of compressing a small string. Run that example and check for yourself what the difference is.
I've extracted the small string as a file and compressed it. No significant difference in this test. I'll try to analyze what could be the potential differences between these tests.
I think it is the way in which I am using zstd in my code that shows the huge difference in performance. It's not related to the string; you can put any string in my code and you will get a big difference in the program's duration output.
I have benchmark code in a test program I'm using, and you can see the output below comparing zlib and zstd at compression level 10.

Using zstd 1.5.1:
Using zstd 1.4.9:
Perhaps I'm using the API wrong or something, but the fact remains that there is a huge decrease in performance.
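The benchmark harness itself wasn't posted in this thread, so as a point of reference, here is a minimal sketch of the kind of per-message timing loop being described: compress the same small string many times at level 10 and report the total duration. The message text, iteration count, and the use of the one-shot `ZSTD_compress()` API are assumptions; the attached example may differ (later comments indicate the original uses the streaming API).

```c
/* Minimal per-message benchmark sketch (not the attached Source.zip):
 * times repeated compression of one small string at level 10.
 * Message text and iteration count are placeholders. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <zstd.h>

int main(void)
{
    const char* msg = "a small string standing in for one network message";
    size_t const msgSize = strlen(msg);
    char dst[1024];                /* comfortably above ZSTD_compressBound(msgSize) */
    int const iterations = 100000; /* arbitrary, just needs to be measurable */

    clock_t const start = clock();
    size_t csize = 0;
    for (int i = 0; i < iterations; i++) {
        csize = ZSTD_compress(dst, sizeof(dst), msg, msgSize, 10);
        if (ZSTD_isError(csize)) {
            fprintf(stderr, "error: %s\n", ZSTD_getErrorName(csize));
            return 1;
        }
    }
    printf("duration: %ld ms, output size: %zu bytes\n",
           (long)((clock() - start) * 1000 / CLOCKS_PER_SEC), csize);
    return 0;
}
```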
OK, I can observe a difference when trying to produce a similar scenario. The x2.5 difference can be explained, and mitigated:

However, an x2.5 difference pales in comparison to the huge differences you report (> x100).
Ok I see, is the
Ok, found the function. Now I get the following output:

for zstd 1.4.9: duration: 512 ms, output size: 422 bytes

Which is not as bad, but still a bit slower compared to 1.4.9.
Could you try to re-use the compression state across compression jobs?

edit: ah, and another good data point would be: what's the status regarding
Are you using zstdlib.vcxproj or compiling another way? The GitHub zstdlib.vcxproj got back our lost performance and
This is the benchmark result compared with zlib when using zstd 1.5.1 with the
With large input sizes, it starts to get slower than zlib when using compression level 10 for zstd. I'm not sure if it's possible to reuse compression state. The program I'm working on sends messages over the network, and each message is compressed separately/independently, because each message also needs to be decompressed separately/independently. For zlib, there is an API where you can specify your own memory allocation functions, and I use that to allocate once at the beginning and keep reusing the buffers for each message I compress.
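For what it's worth, zstd does allow exactly this pattern: a single `ZSTD_CCtx` can be created once and reused for every message, and each one-shot call on it still produces a complete, independently decompressible frame. A minimal sketch, where the wrapper type and function names are illustrative, not part of zstd:

```c
/* Sketch: keep one ZSTD_CCtx for the lifetime of the connection and
 * reuse it for every message. Each call still emits a complete,
 * independently decompressible frame; reuse only avoids re-allocating
 * the context's internal tables. */
#include <zstd.h>

typedef struct {
    ZSTD_CCtx* cctx;   /* allocated once, reused per message */
    int level;
} MsgCompressor;

int msgcomp_init(MsgCompressor* mc, int level)
{
    mc->cctx = ZSTD_createCCtx();
    mc->level = level;
    return mc->cctx ? 0 : -1;
}

size_t msgcomp_compress(MsgCompressor* mc,
                        void* dst, size_t dstCap,
                        const void* src, size_t srcSize)
{
    /* One-shot compression on a reused context: produces an
     * independent frame, like zlib's compress2() per message. */
    return ZSTD_compressCCtx(mc->cctx, dst, dstCap, src, srcSize, mc->level);
}

void msgcomp_free(MsgCompressor* mc) { ZSTD_freeCCtx(mc->cctx); }
```

zstd also has a custom-allocator hook comparable to zlib's `zalloc`/`zfree` (`ZSTD_customMem` with `ZSTD_createCCtx_advanced()`, in the static-linking-only section of `zstd.h`), though context reuse usually makes it unnecessary.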
I'm using
1.5.0 had the same issue as 1.5.1. The differences were introduced between 1.4.9 and 1.5.0.
I misinterpreted the example source code provided in Source.zip. This makes the huge performance difference reported even more difficult to explain. I wish I could reproduce it locally, in order to analyze it.
So, adding the

Maybe the remaining difference is related to the rebalancing work done for compression levels? I'm going to experiment with it some more tomorrow with other compression levels.
Are you using a 32-bit build?
In v1.5.1:
edit:
It seems that, for small input, one invocation sets 16 MiB of RAM to 0, and another invocation sets 8 MiB of RAM to 0. These two invocations are the major hot spots.
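The exact call sites referenced above were links that didn't survive extraction, but the magnitude argument is easy to reproduce: zeroing 16 MiB + 8 MiB per message is orders of magnitude more work than compressing a ~100-byte payload. A standalone illustration, not zstd's internal code:

```c
/* Illustration only: compare the cost of zeroing 16 MiB + 8 MiB
 * (as profiled above) against compressing a ~100-byte message.
 * This just shows the orders of magnitude involved. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <zstd.h>

int main(void)
{
    char* big = malloc((16 + 8) << 20);       /* 24 MiB, like the two memsets */
    char src[100] = "small message payload";  /* tiny input per message */
    char dst[1024];
    if (!big) return 1;

    clock_t const t0 = clock();
    memset(big, 0, (16 + 8) << 20);           /* the per-message init cost */
    clock_t const t1 = clock();
    size_t const csize = ZSTD_compress(dst, sizeof(dst), src, sizeof(src), 10);
    clock_t const t2 = clock();

    printf("memset 24 MiB : %ld us\n", (long)((t1 - t0) * 1000000 / CLOCKS_PER_SEC));
    printf("compress 100 B: %ld us (frame: %zu bytes)\n",
           (long)((t2 - t1) * 1000000 / CLOCKS_PER_SEC), csize);
    printf("check: %d\n", big[1 << 20]);      /* read back so the memset isn't optimized out */
    free(big);
    return 0;
}
```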
Oops, I should have been more specific. I'm using the 64-bit release build in my tests above.
So I did some more tests today, and it appears I can get back to (roughly) the 1.4.9 levels of performance I was used to by adjusting the compressionlevel.

Using zstd 1.5.1 with compressionlevel
Using zstd 1.4.9 with compressionlevel
In the application itself, benchmarked transfer speeds using zstd can be roughly 2.5–3 times faster compared to zlib when using random data which is difficult to compress. So I guess this issue has been resolved. If I didn't have benchmarks, I would not have noticed the performance regression going from version 1.4.9 to 1.5.1. The compressionlevel of 10 was also hardcoded in the app. If you guys change (rebalance) what a particular compressionlevel means in terms of performance and output size again, it would be good to make users aware, so they can check whether it affects their apps negatively and whether they need to change anything to match the previous behavior.
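Regarding that last point: since what a given level delivers can shift between releases, a quick sweep over levels against representative data is a cheap way to re-pick a level after upgrading. A sketch, with sample data and iteration count as placeholders:

```c
/* Quick level sweep to re-pick a compression level after upgrading,
 * since what a given level means can be rebalanced between releases. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <zstd.h>

int main(void)
{
    const char* sample = "replace this with a representative message from your app";
    size_t const srcSize = strlen(sample);
    char dst[4096];

    for (int level = 1; level <= ZSTD_maxCLevel(); level++) {
        clock_t const start = clock();
        size_t csize = 0;
        for (int i = 0; i < 10000; i++)  /* arbitrary repeat count */
            csize = ZSTD_compress(dst, sizeof(dst), sample, srcSize, level);
        long const ms = (long)((clock() - start) * 1000 / CLOCKS_PER_SEC);
        printf("level %2d: %5ld ms, %zu bytes\n", level, ms, csize);
    }
    return 0;
}
```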
When re-using a compression state across multiple successive compressions, the state should minimize the amount of allocation and initialization required. This mostly matters in situations where initialization is an overwhelming cost compared to compression itself, which can happen when the amount of data to compress is small while the compression state was given the impression that it would be much larger, i.e. streaming mode without providing a srcSize hint. This lean-initialization optimization was broken in 980f3bb. This commit fixes it, making this scenario once again on par with v1.4.9. Note that this does not completely fix #2966, since another heavy initialization, specific to row mode, is also happening (and was not present in v1.4.9). This will be fixed in a separate commit.
(note: this might break due to the need to also track the starting candidate nb per row)
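For context, here is what the "srcSize hint" scenario from the commit message looks like with the public streaming API: `ZSTD_CCtx_setPledgedSrcSize()` tells a reused context how much data the next frame will contain, so initialization can be sized for the real input rather than the "unknown, possibly huge" default. A minimal sketch, with error handling trimmed and the compression level assumed to be set on the context elsewhere:

```c
/* Sketch of the scenario the commit describes: streaming compression
 * of a small message on a reused context, with a pledged source size
 * so initialization matches the real input size. */
#include <zstd.h>

size_t compress_one_message(ZSTD_CCtx* cctx,
                            void* dst, size_t dstCap,
                            const void* src, size_t srcSize)
{
    /* Reset the session but keep parameters and allocations. */
    ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);

    /* The srcSize hint: without it, streaming mode must assume the
     * input could be large and initializes accordingly. */
    ZSTD_CCtx_setPledgedSrcSize(cctx, srcSize);

    ZSTD_outBuffer out = { dst, dstCap, 0 };
    ZSTD_inBuffer  in  = { src, srcSize, 0 };
    size_t const remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    if (ZSTD_isError(remaining)) return remaining;  /* zstd error code */
    return out.pos;                                 /* compressed frame size */
}
```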
If I add a time-stamp, run 1000 times, and sort the times:
fix performance issue in scenario #2966 (part 1)
According to the release notes (https://github.com/facebook/zstd/releases/tag/v1.5.2):

> This release also corrects a performance regression that was introduced in v1.5.0 that slows down compression of very small data when using the streaming API.

Issue facebook/zstd#2966 tracks that topic.
After updating to version 1.5.1 of the library I noticed that compression speeds were much worse compared to version 1.4.9 at the same compression level. In my benchmarks for my use case, which is compressing streaming network data, zlib now beats zstd with regard to speed and output size.
To reproduce, build the attached source file first with version 1.4.9 used as a DLL, check the output, and then do the same with version 1.5.1 and check the output after running the program.
On my PC running Windows 10 and using Visual Studio 2022 to build, I get the following results with compressionlevel 10:

with zstd 1.4.9: duration: 512 ms, output size: 422 bytes
with zstd 1.5.1: duration: 71266 ms, output size: 422 bytes

Even with compression level 1 there is a difference; 1.5.1 is slower than 1.4.9, albeit not by much compared to compression level 10.

Where does this huge difference in duration come from?
Source.zip