Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for 16 bit FSE compression #448

Closed
pappuks opened this issue Nov 5, 2021 · 6 comments
Closed

Add support for 16 bit FSE compression #448

pappuks opened this issue Nov 5, 2021 · 6 comments

Comments

@pappuks
Copy link

pappuks commented Nov 5, 2021

I have created a 16 bit image compression codec in Go for medical images at https://github.com/pappuks/medical-image-codec which uses Delta encoding, RLE, FSE and/or Huffman coding. I have added 16 bit FSE implementation, which is based on the 8 bit FSE implementation from this repository and the 16 bit FSE implementation from https://github.com/Cyan4973/FiniteStateEntropy. The 16 bit FSE implementation in my repository has been updated to handle max values till the range of 65535. This is a deviation from the implementation in https://github.com/Cyan4973/FiniteStateEntropy where the max supported value is 4095.

I would like if the 16 bit FSE implementation can be added to this repository, that way it is available along with the 8 bit FSE implementation.

The files which have the 16 bit FSE implementation are:
https://github.com/pappuks/medical-image-codec/blob/main/fseu16.go
https://github.com/pappuks/medical-image-codec/blob/main/fsecompressu16.go
https://github.com/pappuks/medical-image-codec/blob/main/fsedecompressu16.go

If you agree with the feature request, I can then submit a PR to this branch.

@klauspost
Copy link
Owner

klauspost commented Nov 5, 2021

@pappuks While it is always interesting with more features, I am curious why you choose this route, since it is generally not used.

For example ztsd uses a small FSE (35 symbols) to define literal lengths up to 128K. They have 16 that just output the immediate value and the rest are defined as an Baseline + read Number_of_Bits bits:

Literals_Length_Code 0-15
length Literals_Length_Code
Number_of_Bits 0
Literals_Length_Code 16 17 18 19 20 21 22 23
Baseline 16 18 20 22 24 28 32 40
Number_of_Bits 1 1 1 1 2 2 3 3
Literals_Length_Code 24 25 26 27 28 29 30 31
Baseline 48 64 128 256 512 1024 2048 4096
Number_of_Bits 4 6 7 8 9 10 11 12
Literals_Length_Code 32 33 34 35
Baseline 8192 16384 32768 65536
Number_of_Bits 13 14 15 16

The literals length is equal to the decoded Baseline plus the result of reading Number_of_Bits bits from the bitstream.

You of course don't need the 35 case, and the 16 immediate numbers can be as big as you want, or you could even add the 16 numbers you want, assuming you put them on the stream.

This allows the FSE table to be small and still quite efficient at representing a big range of numbers. Of course it requires your numbers to be as close to zero, but for delta-encoded images that is also reasonable, assuming you zigzag-encode your deltas. Of course you don't need the FSE codes to be close to 0.

@klauspost
Copy link
Owner

klauspost commented Nov 5, 2021

This is then used to transform a length to a code:

const maxLLCode = 35

// llBitsTable translates from ll code to number of bits.
var llBitsTable = [maxLLCode + 1]byte{
	0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0,
	1, 1, 1, 1, 2, 2, 3, 3,
	4, 6, 7, 8, 9, 10, 11, 12,
	13, 14, 15, 16}

var llCodeTable = [64]byte{0, 1, 2, 3, 4, 5, 6, 7,
	8, 9, 10, 11, 12, 13, 14, 15,
	16, 16, 17, 17, 18, 18, 19, 19,
	20, 20, 20, 20, 21, 21, 21, 21,
	22, 22, 22, 22, 22, 22, 22, 22,
	23, 23, 23, 23, 23, 23, 23, 23,
	24, 24, 24, 24, 24, 24, 24, 24,
	24, 24, 24, 24, 24, 24, 24, 24}
	
// llCode returns the code that represents the literal length requested.
func llCode(litLength uint32) uint8 {
	const llDeltaCode = 19
	if litLength <= 63 {
		return llCodeTable[litLength]
	}
	return uint8(highBit(litLength)) + llDeltaCode
}

func highBit(val uint32) (n uint32) {
	return uint32(bits.Len32(val) - 1)
}

Once you have the code the FSE table can return the number of extra bits to write, it any.

Note how Baseline == 1<<Number_of_Bits which means you just need to write the lower Number_of_Bits of your value,. ignoring the upper bit.

@pappuks
Copy link
Author

pappuks commented Nov 5, 2021

Thanks @klauspost for your response. I am delta encoding but not using zigzag encoding, instead I am shifting the encoded value by 1 << (pixeldepth - 1). I will try zigzag encoding and see if that makes a difference in the number of unique values, even though I doubt it will change by much.

Right now I am seeing thousands of unique values for some of the datasets even with delta and rle encoding. And hence I decided to extend the FSE dictionary range to 16 bit.

I will try the approach suggested by you to encode baseline + number of bits. Will keep you posted.

@klauspost
Copy link
Owner

ZigZag is fast and easy, basically interleaves positive and negative numbers: https://play.golang.org/p/Y79j8_3Ez0I

@klauspost
Copy link
Owner

I will not add it, since I see little benefit from it currently. It scales pretty badly and better alternatives are possible.

@pappuks
Copy link
Author

pappuks commented Sep 5, 2022

Thanks @klauspost , appreciate you looking at this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants