Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's the Weissman score for this? #1087

Closed
pjebs opened this issue Mar 31, 2018 · 3 comments
Closed

What's the Weissman score for this? #1087

pjebs opened this issue Mar 31, 2018 · 3 comments

Comments

@pjebs
Copy link

pjebs commented Mar 31, 2018

I want to use this algorithm for various use cases.
I only want to use it if the weissman score was better than the theoretical limit.
I don't see the weissman score listed on the readme.

@Cyan4973
Copy link
Contributor

@Cyan4973 Cyan4973 closed this as completed Apr 2, 2018
@yucolabjames
Copy link

Wondering the same thing, has anyone ran enough tests to publish an accurate weissman score?

@felixhandte
Copy link
Contributor

felixhandte commented Dec 17, 2018

Weissman scoring has a number of problems:

  1. It is a relative score. You need to pick a reference speed and ratio to compare against.
  2. It produces nonsense answers with T <= 1.
  3. It is sensitive to the time unit used. (log(T_ref / T_exp) would probably have been better than log(T_ref) / log(T_exp)). As it stands, scoring a compressor using minutes vs seconds produces different scores.
  4. It fails to capture the real-world trade-off between ratio and time. The possibility frontier of tradeoffs between speed and ratio in compression is not log-shaped.
  5. It doesn't factor in decompression speed at all.

Nonetheless, with the following parameters:

  • Using gzip (at its default level 6) as the reference compressor.
  • Benchmarking on the Silesia corpus.
  • Using tenths of a second as the time unit (since some of the faster compressors take less than a second, which would otherwise produce negative logs).
  • Using an alpha of one.

I get the following scores:

Algo Lvl Score
gzip 1 1.19
gzip 2 1.20
gzip 3 1.15
gzip 4 1.16
gzip 5 1.09
gzip 6 1.00
gzip 7 0.96
gzip 8 0.87
gzip 9 0.83
lz4 1 2.98
zstd -5 2.97
zstd -4 2.86
zstd -3 2.77
zstd -2 2.58
zstd -1 2.54
zstd 1 2.67
zstd 2 2.34
zstd 3 2.11
zstd 4 1.98
zstd 5 1.67
zstd 6 1.55
zstd 7 1.42
zstd 8 1.34
zstd 9 1.24
zstd 10 1.18
zstd 11 1.12
zstd 12 1.03
zstd 13 0.97
zstd 14 0.94
zstd 15 0.90
zstd 16 0.89
zstd 17 0.86
zstd 18 0.84
zstd 19 0.82
zstd 20 0.82
zstd 21 0.80
zstd 22 0.79

As you can see, both zstd and lz4 do break the theoretical limit of 2.9.

I hope this information is helpful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants