Standard names for compression levels #89

Closed
KrzysFR opened this issue Dec 7, 2015 · 8 comments

@KrzysFR
Contributor

KrzysFR commented Dec 7, 2015

I'm currently updating my wrapper for the new API introduced in 0.4.x, and I'm merging all the methods with and without compression levels into a single set of methods.

Asking the caller to pass an untyped integer value for the compression level seems a bit dangerous: most people will probably remember zlib, and expect a 1-9 scale with the default at 6. That's why I would like to have an enum with well-known names, and a clear default "if you don't know better, use that one" level.

Quick questions:

  • Is the default level still intended to be level 1? Or is the whole zstd vs zstd_hc a thing of the past?
  • What about level 0? The comment in zstd_static.h says that level 0 is "never used", but quick tests show that using compression level 0 works fine (and compresses about the same as level 1).
  • Are there plans to have standardized names for some compression levels, like "fast", "high", "ultra" and so on? And if yes, what would be the values?
  • Right now max level is 20 but it seems that it can change. If yes, do you plan to have some way to, at runtime, probe for the range of supported compression levels?
@Cyan4973
Contributor

Cyan4973 commented Dec 8, 2015

Hello Christophe

Is the default level still intended to be level 1?

Yes.
I have (unfulfilled) long-term plans for even faster modes;
that's what negative levels will be used for.

or is the whole zstd vs zstd_hc a thing of the past?

Well, I realized that having 2 separate functions, with zstd meaning the same as zstd_hc with a fixed level 1, did not help API readability.
So I decided to merge everything into a single ZSTD_compress().

What about level 0? The comment in zstd_static.h says that level 0 is "never used", but quick tests show that using compression level 0 works fine (and compresses about the same as level 1)

All levels <= 0 are simply remapped to 1 internally, for now.

Are there plans to have standardized names for some compression levels, like "fast", "high", "ultra" and so on?

Nope.
This could change in the future if there are reasons to. But every wrapper should be able to select whichever value it wants.
For example, folly selected fast : 1 , default : 1 , best : 19

Right now max level is 20 but it seems that it can change. If yes, do you plan to have some way to, at runtime, probe for the range of supported compression levels?

Yes.
The value ZSTD_MAX_CLEVEL within zstd_static.h is meant for this usage.

@KrzysFR
Contributor Author

KrzysFR commented Dec 8, 2015

For example, folly selected fast : 1 , default : 1 , best : 19

Do you mean this? https://github.com/facebook/folly/blob/master/folly/io/Compression.h#L152
The idea of having values -1, -2, -3 remapped to whatever the min/default/max points are in the range supported by the loaded library seems nice, but if you intend to introduce negative levels, these would need to be moved to some other values...

The value ZSTD_MAX_CLEVEL within zstd_static.h is meant for this usage.

It is currently a #define which is not visible when you are consuming a dll (at least from a managed language such as .NET). Or maybe there is a way but I'm not familiar with it. Currently, I have a constant in my .NET code which I synchronize with the latest version of the .h when I rebuild the dll internally, but this is brittle. I don't have this issue with ZSTD_VERSION_... because there is a ZSTD_versionNumber() method.

@Cyan4973
Contributor

Cyan4973 commented Dec 8, 2015

It is currently a #define which is not visible when you are consuming a dll

Good point.

@Cyan4973
Contributor

Cyan4973 commented Dec 9, 2015

Within latest "dev" branch update d608088, there is a new method ZSTD_maxCLevel() to retrieve that value.
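For wrapper authors, a minimal sketch of probing that value once at startup could look like the following. It assumes ZSTD_maxCLevel() takes no arguments and returns an int; `level_bounds`, `probe_level_bounds` and `fake_max_level` are hypothetical names used only for illustration, and the stand-in function exists only so the sketch compiles without linking zstd.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of ZSTD_maxCLevel(): no arguments, returns the max level. */
typedef int (*max_level_fn)(void);

/* Hypothetical wrapper state, not part of zstd. */
typedef struct {
    int max_level;  /* highest level reported by the loaded library */
} level_bounds;

/* Stand-in for ZSTD_maxCLevel, returning the maximum quoted in this thread. */
static int fake_max_level(void)
{
    return 20;
}

/* Query the library once and remember the answer.
 * In a real build, `probe` would be ZSTD_maxCLevel. */
static level_bounds probe_level_bounds(max_level_fn probe)
{
    level_bounds bounds;
    assert(probe != NULL);
    bounds.max_level = probe();
    return bounds;
}
```

Taking the probe as a function pointer keeps the wrapper independent of the header constant, which is the point of having a runtime method instead of a #define.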

@KrzysFR
Contributor Author

KrzysFR commented Dec 10, 2015

I have changed my code to query for ZSTD_maxCLevel() on start, and I'm currently using an enum with 4 levels:

  • Default = 0, but is converted to 1 internally before calling zstd.
  • Fastest = 1 (but could change?)
  • Medium = (1 + MAX) / 2
  • Highest = MAX

This gives Medium = 10 and Highest = 20 in the current version.
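The mapping above can be sketched in C roughly as follows. The enum and function names are hypothetical (my wrapper is in .NET, and zstd defines none of these); `max_level` is assumed to be the value obtained from ZSTD_maxCLevel().

```c
#include <assert.h>

/* Hypothetical preset names taken from the list above; not defined by zstd. */
typedef enum {
    LEVEL_DEFAULT = 0,
    LEVEL_FASTEST,
    LEVEL_MEDIUM,
    LEVEL_HIGHEST
} level_preset;

/* Translate a preset into a numeric level for a library whose
 * maximum supported level is max_level. */
static int preset_to_level(level_preset preset, int max_level)
{
    assert(max_level >= 1);
    switch (preset) {
    case LEVEL_FASTEST: return 1;
    case LEVEL_MEDIUM:  return (1 + max_level) / 2;  /* integer division */
    case LEVEL_HIGHEST: return max_level;
    case LEVEL_DEFAULT:
    default:            return 1;  /* Default is converted to 1 */
    }
}
```

With max_level = 20, the integer division gives (1 + 20) / 2 = 10 for Medium, matching the values quoted above.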

Having 0 for the default level helps in that it is also the default value for uninitialized variables and optional parameters in .NET, which works well in the API. This could work if you NEVER use 0 as a compression level (even if you start using negative values). If not then I could simply use a nullable integer in the API, which would also be acceptable.

If the minimum compression level can be less than 1 in the future, we would need a ZSTD_minCLevel() method to query the value. Or maybe a single method that returns the bounds?

@Cyan4973
Contributor

Having 0 for the default level helps

Agreed. This is the current convention, and is expected to remain like this. LZ4 uses the same convention.

If the minimum compression level can be less than 1 in the future, we would need a ZSTD_minCLevel() method to query the value.

Agreed, although this is just a "long term intention", with no precise plan.
Even if I introduce negative compression levels later on (which is not yet guaranteed), this won't change anything for "normal" existing compression levels, hence no interface break.
So we'll have time to handle this issue if it ever becomes a reality.

@Cyan4973
Contributor

Is there anything else to add, or could we close the topic?

@KrzysFR
Contributor Author

KrzysFR commented Dec 15, 2015

I think it's OK to close.

To sum up:

  • I'm using 0 to mean "the default" which currently is 1
  • I'm calling ZSTD_maxCLevel to get the max supported by the version of dll loaded at runtime.
  • I provide a set of Fastest/Medium/High/Highest helpers that can be used optionally, but with guidance in the comments that app code should rely on testing for the best level and use the numerical value instead... or use default if they don't really care.
  • I'm probably going to validate levels at the wrapper level, and throw if they are above the maximum, instead of capping them silently at the MAX, because it's more in line with what a .NET developer would expect (most APIs validate enum-like parameters and throw).
  • In the future, if compression levels can get negative, then maybe a ZSTD_minCLevel could be added to get the lower bound (but still keep 0 as the default).
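A rough sketch of the validation described in the summary, written in C rather than .NET. C has no exceptions, so the sketch returns a status code where the .NET wrapper would throw. All names are hypothetical, `max_level` is assumed to come from ZSTD_maxCLevel(), and treating every level <= 0 as the default is an assumption that mirrors the remapping to 1 mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for the wrapper; not part of zstd. */
typedef enum {
    LEVEL_OK = 0,
    LEVEL_OUT_OF_RANGE = -1
} level_status;

/* Validate a requested level against the maximum reported by the library.
 * Levels above max_level are rejected instead of being silently capped.
 * Levels <= 0 are treated as "default" and resolved to 1 (assumption). */
static level_status resolve_level(int requested, int max_level, int *resolved)
{
    assert(resolved != NULL);
    if (requested > max_level)
        return LEVEL_OUT_OF_RANGE;  /* *resolved is left untouched */
    *resolved = (requested <= 0) ? 1 : requested;
    return LEVEL_OK;
}
```

Rejecting an out-of-range level surfaces a stale hard-coded constant immediately, whereas silent capping would hide a mismatch between the wrapper and the loaded dll.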
