Add support for 16 bit FSE compression #448

pappuks · 2021-11-05T06:29:18Z

I have created a 16 bit image compression codec in Go for medical images at https://github.com/pappuks/medical-image-codec which uses Delta encoding, RLE, FSE and/or Huffman coding. I have added 16 bit FSE implementation, which is based on the 8 bit FSE implementation from this repository and the 16 bit FSE implementation from https://github.com/Cyan4973/FiniteStateEntropy. The 16 bit FSE implementation in my repository has been updated to handle max values till the range of 65535. This is a deviation from the implementation in https://github.com/Cyan4973/FiniteStateEntropy where the max supported value is 4095.

I would like if the 16 bit FSE implementation can be added to this repository, that way it is available along with the 8 bit FSE implementation.

The files which have the 16 bit FSE implementation are:
https://github.com/pappuks/medical-image-codec/blob/main/fseu16.go
https://github.com/pappuks/medical-image-codec/blob/main/fsecompressu16.go
https://github.com/pappuks/medical-image-codec/blob/main/fsedecompressu16.go

If you agree with the feature request, I can then submit a PR to this branch.

klauspost · 2021-11-05T09:18:11Z

@pappuks While it is always interesting with more features, I am curious why you choose this route, since it is generally not used.

For example ztsd uses a small FSE (35 symbols) to define literal lengths up to 128K. They have 16 that just output the immediate value and the rest are defined as an Baseline + read Number_of_Bits bits:

`Literals_Length_Code`	0-15
length	`Literals_Length_Code`
`Number_of_Bits`	0

`Literals_Length_Code`	16	17	18	19	20	21	22	23
`Baseline`	16	18	20	22	24	28	32	40
`Number_of_Bits`	1	1	1	1	2	2	3	3

`Literals_Length_Code`	24	25	26	27	28	29	30	31
`Baseline`	48	64	128	256	512	1024	2048	4096
`Number_of_Bits`	4	6	7	8	9	10	11	12

`Literals_Length_Code`	32	33	34	35
`Baseline`	8192	16384	32768	65536
`Number_of_Bits`	13	14	15	16

The literals length is equal to the decoded Baseline plus the result of reading Number_of_Bits bits from the bitstream.

You of course don't need the 35 case, and the 16 immediate numbers can be as big as you want, or you could even add the 16 numbers you want, assuming you put them on the stream.

This allows the FSE table to be small and still quite efficient at representing a big range of numbers. Of course it requires your numbers to be as close to zero, but for delta-encoded images that is also reasonable, assuming you zigzag-encode your deltas. Of course you don't need the FSE codes to be close to 0.

klauspost · 2021-11-05T09:30:03Z

This is then used to transform a length to a code:

const maxLLCode = 35

// llBitsTable translates from ll code to number of bits.
var llBitsTable = [maxLLCode + 1]byte{
	0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0,
	1, 1, 1, 1, 2, 2, 3, 3,
	4, 6, 7, 8, 9, 10, 11, 12,
	13, 14, 15, 16}

var llCodeTable = [64]byte{0, 1, 2, 3, 4, 5, 6, 7,
	8, 9, 10, 11, 12, 13, 14, 15,
	16, 16, 17, 17, 18, 18, 19, 19,
	20, 20, 20, 20, 21, 21, 21, 21,
	22, 22, 22, 22, 22, 22, 22, 22,
	23, 23, 23, 23, 23, 23, 23, 23,
	24, 24, 24, 24, 24, 24, 24, 24,
	24, 24, 24, 24, 24, 24, 24, 24}
	
// llCode returns the code that represents the literal length requested.
func llCode(litLength uint32) uint8 {
	const llDeltaCode = 19
	if litLength <= 63 {
		return llCodeTable[litLength]
	}
	return uint8(highBit(litLength)) + llDeltaCode
}

func highBit(val uint32) (n uint32) {
	return uint32(bits.Len32(val) - 1)
}

Once you have the code the FSE table can return the number of extra bits to write, it any.

Note how Baseline == 1<<Number_of_Bits which means you just need to write the lower Number_of_Bits of your value,. ignoring the upper bit.

pappuks · 2021-11-05T15:29:30Z

Thanks @klauspost for your response. I am delta encoding but not using zigzag encoding, instead I am shifting the encoded value by 1 << (pixeldepth - 1). I will try zigzag encoding and see if that makes a difference in the number of unique values, even though I doubt it will change by much.

Right now I am seeing thousands of unique values for some of the datasets even with delta and rle encoding. And hence I decided to extend the FSE dictionary range to 16 bit.

I will try the approach suggested by you to encode baseline + number of bits. Will keep you posted.

klauspost · 2021-11-05T16:20:23Z

ZigZag is fast and easy, basically interleaves positive and negative numbers: https://play.golang.org/p/Y79j8_3Ez0I

klauspost · 2022-09-05T07:31:29Z

I will not add it, since I see little benefit from it currently. It scales pretty badly and better alternatives are possible.

pappuks · 2022-09-05T14:23:56Z

Thanks @klauspost , appreciate you looking at this.

klauspost added the enhancement label Jan 28, 2022

klauspost closed this as completed Sep 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for 16 bit FSE compression #448

Add support for 16 bit FSE compression #448

pappuks commented Nov 5, 2021

klauspost commented Nov 5, 2021 •

edited

klauspost commented Nov 5, 2021 •

edited

pappuks commented Nov 5, 2021 •

edited

klauspost commented Nov 5, 2021

klauspost commented Sep 5, 2022

pappuks commented Sep 5, 2022

Navigation Menu

Add support for 16 bit FSE compression #448

Add support for 16 bit FSE compression #448

Comments

pappuks commented Nov 5, 2021

klauspost commented Nov 5, 2021 • edited

klauspost commented Nov 5, 2021 • edited

pappuks commented Nov 5, 2021 • edited

klauspost commented Nov 5, 2021

klauspost commented Sep 5, 2022

pappuks commented Sep 5, 2022

klauspost commented Nov 5, 2021 •

edited

klauspost commented Nov 5, 2021 •

edited

pappuks commented Nov 5, 2021 •

edited