New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for 16 bit FSE compression #448
Comments
@pappuks While it is always interesting with more features, I am curious why you choose this route, since it is generally not used. For example ztsd uses a small FSE (35 symbols) to define literal lengths up to 128K. They have 16 that just output the immediate value and the rest are defined as an
The literals length is equal to the decoded You of course don't need the This allows the FSE table to be small and still quite efficient at representing a big range of numbers. Of course it requires your numbers to be as close to zero, but for delta-encoded images that is also reasonable, assuming you zigzag-encode your deltas. Of course you don't need the FSE codes to be close to 0. |
This is then used to transform a length to a code: const maxLLCode = 35
// llBitsTable translates from ll code to number of bits.
var llBitsTable = [maxLLCode + 1]byte{
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 2, 2, 3, 3,
4, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16}
var llCodeTable = [64]byte{0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 16, 17, 17, 18, 18, 19, 19,
20, 20, 20, 20, 21, 21, 21, 21,
22, 22, 22, 22, 22, 22, 22, 22,
23, 23, 23, 23, 23, 23, 23, 23,
24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24}
// llCode returns the code that represents the literal length requested.
func llCode(litLength uint32) uint8 {
const llDeltaCode = 19
if litLength <= 63 {
return llCodeTable[litLength]
}
return uint8(highBit(litLength)) + llDeltaCode
}
func highBit(val uint32) (n uint32) {
return uint32(bits.Len32(val) - 1)
} Once you have the code the FSE table can return the number of extra bits to write, it any. Note how |
Thanks @klauspost for your response. I am delta encoding but not using zigzag encoding, instead I am shifting the encoded value by Right now I am seeing thousands of unique values for some of the datasets even with delta and rle encoding. And hence I decided to extend the FSE dictionary range to 16 bit. I will try the approach suggested by you to encode |
ZigZag is fast and easy, basically interleaves positive and negative numbers: https://play.golang.org/p/Y79j8_3Ez0I |
I will not add it, since I see little benefit from it currently. It scales pretty badly and better alternatives are possible. |
Thanks @klauspost , appreciate you looking at this. |
I have created a 16 bit image compression codec in Go for medical images at https://github.com/pappuks/medical-image-codec which uses Delta encoding, RLE, FSE and/or Huffman coding. I have added 16 bit FSE implementation, which is based on the 8 bit FSE implementation from this repository and the 16 bit FSE implementation from https://github.com/Cyan4973/FiniteStateEntropy. The 16 bit FSE implementation in my repository has been updated to handle max values till the range of 65535. This is a deviation from the implementation in https://github.com/Cyan4973/FiniteStateEntropy where the max supported value is 4095.
I would like if the 16 bit FSE implementation can be added to this repository, that way it is available along with the 8 bit FSE implementation.
The files which have the 16 bit FSE implementation are:
https://github.com/pappuks/medical-image-codec/blob/main/fseu16.go
https://github.com/pappuks/medical-image-codec/blob/main/fsecompressu16.go
https://github.com/pappuks/medical-image-codec/blob/main/fsedecompressu16.go
If you agree with the feature request, I can then submit a PR to this branch.
The text was updated successfully, but these errors were encountered: