Explanation of FSE-decoding-table population in format-documentation #1782

KillingSpark · 2019-09-11T08:54:12Z

I think the following section in doc/zstd_compression_format.md is a bit vague about what 'natural order' actually means. It could mean at least two things:

1. The order in which the states are inserted in the table while spreading symbols (which is what I assumed reading the document without looking at the code).
2. The order in which the states occur in the decoding table after the symbols are already spread (which is the actual meaning, if I understand the code correctly).
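To make the difference between the two readings concrete, here is a minimal Python sketch. It is an illustration only, not code taken from zstd: the function name `spread_symbols` and the symbol counts are made up, and it assumes the step `(tableSize>>1) + (tableSize>>3) + 3` described in the format document, with no "less than 1" probability symbols.

```python
def spread_symbols(counts, accuracy_log):
    """Spread symbols over a decoding table (hypothetical helper).

    counts[s] is the number of states given to symbol s; the counts
    must sum to the table size. "Less than 1" probabilities are not
    handled in this sketch.
    """
    table_size = 1 << accuracy_log
    assert sum(counts) == table_size
    step = (table_size >> 1) + (table_size >> 3) + 3
    mask = table_size - 1

    table = [None] * table_size
    insertion_order = {s: [] for s in range(len(counts))}
    pos = 0
    for symbol, count in enumerate(counts):
        for _ in range(count):
            table[pos] = symbol
            insertion_order[symbol].append(pos)
            pos = (pos + step) & mask
    return table, insertion_order


# Symbol 0 has probability 5 with Accuracy_Log 7, as in the quoted example.
table, insertion_order = spread_symbols([5, 123], accuracy_log=7)

# Reading 1: order in which the states were inserted while spreading.
print(insertion_order[0])   # [0, 83, 38, 121, 76]

# Reading 2: order in which the states occur when walking the table.
natural_order = [state for state, sym in enumerate(table) if sym == 0]
print(natural_order)        # [0, 38, 76, 83, 121]
```

The two lists hold the same states in a different order, so which one "natural order" refers to changes which states end up using one more bit.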

To get the Number_of_Bits and Baseline required for next state,
it's first necessary to sort all states in their natural order.
The lower states will need 1 more bit than higher ones.
The process is repeated for each symbol.

Example :
Presuming a symbol has a probability of 5.
It receives 5 state values. States are sorted in natural order.

Next power of 2 is 8.
Space of probabilities is divided into 8 equal parts.
Presuming the Accuracy_Log is 7, it defines 128 states.
Divided by 8, each share is 16 large.

In order to reach 8, 8-5=3 lowest states will count "double",
doubling the number of shares (32 in width),
requiring one more bit in the process.

Baseline is assigned starting from the higher states using fewer bits,
and proceeding naturally, then resuming at the first state,
each takes its allocated width from Baseline.

I think it would be clearer like this (added text bold):

To get the Number_of_Bits and Baseline required for next state,
it's first necessary to sort all states in their natural order. **This is given when the symbols have been spread in the table. Iterating from 0 to tableSize visits the states in their natural order.**
The lower states will need 1 more bit than higher ones.
The process is repeated for each symbol.

Example :
Presuming a symbol has a probability of 5.
It receives 5 state values. States are sorted in natural order.

Next power of 2 is 8.
Space of probabilities is divided into 8 equal parts.
Presuming the Accuracy_Log is 7, it defines 128 states.
Divided by 8, each share is 16 large.

In order to reach 8, 8-5=3 lowest states will count "double",
doubling the number of shares (32 in width),
requiring one more bit in the process.

Baseline is assigned starting from the higher states using fewer bits,
and proceeding naturally, then resuming at the first state,
each takes its allocated width from Baseline. **Iterating over the table, each state representing this symbol receives the Number_of_Bits
and Baseline according to the following table:**
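
The Number_of_Bits and Baseline values for the example can be recomputed from the rules quoted above. The following is a rough Python sketch under that reading of the text, not the zstd implementation; the function name `next_state_params` is hypothetical, and the list index is the position of a state in natural order (0 is the lowest state of the symbol).

```python
def next_state_params(state_count, accuracy_log):
    """Return (Number_of_Bits, Baseline) for each state of one symbol.

    state_count is the symbol's probability (number of states it
    received); results are indexed by natural order.
    """
    table_size = 1 << accuracy_log
    # Next power of 2 that is >= state_count (8 for a probability of 5).
    shares = 1 << (state_count - 1).bit_length()
    share_width = table_size // shares            # 128 / 8 = 16
    share_bits = accuracy_log - (shares.bit_length() - 1)
    # The lowest states count "double" so that all shares are used.
    num_double = shares - state_count             # 8 - 5 = 3

    params = [None] * state_count
    baseline = 0
    # Baseline starts at the higher states, then resumes at the first state.
    visit = list(range(num_double, state_count)) + list(range(num_double))
    for order in visit:
        is_double = order < num_double
        bits = share_bits + 1 if is_double else share_bits
        width = share_width * 2 if is_double else share_width
        params[order] = (bits, baseline)
        baseline += width
    assert baseline == table_size  # the widths cover the whole state space
    return params


for order, (bits, baseline) in enumerate(next_state_params(5, 7)):
    print(order, bits, baseline)
# 0 5 32
# 1 5 64
# 2 5 96
# 3 4 0
# 4 4 16
```

The three lowest states each read 5 bits and cover 32 values, while the two higher states read 4 bits and cover 16 values, which together account for all 128 states.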


Cyan4973 · 2019-09-11T16:18:54Z

That's a good point @KillingSpark.
The documentation will be updated following your suggestion.


Cyan4973 · 2019-10-22T20:17:18Z

Hopefully, the updated version of the format documentation should describe this topic better.

KillingSpark · 2019-10-25T08:40:42Z

I think this is clearer, thanks!

Cyan4973 added the documentation label Sep 11, 2019

Cyan4973 self-assigned this Sep 11, 2019

Cyan4973 added a commit that referenced this issue Oct 19, 2019: "clarifications for the FSE decoding table" (ff7bd16), requested in #1782

Cyan4973 mentioned this issue Oct 19, 2019 in "clarifications for the FSE decoding table" #1835 (merged)

Cyan4973 closed this as completed Oct 28, 2019
