Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explanation of FSE-decoding-table population in format-documentation #1782

Closed
KillingSpark opened this issue Sep 11, 2019 · 3 comments
Closed
Assignees

Comments

@KillingSpark
Copy link

I think the following section in doc/zstd_compression_format.md is a bit vague about what 'natural order' actually means. It could mean at least two things:

  1. The order in which the states are inserted in the table while spreading symbols (which is what I assumed reading the document without looking at the code)
  2. The order in which the states occur in the decoding table after the symbols are already spread (Which is the actual meaning, if I understand the code correctly)

To get the Number_of_Bits and Baseline required for next state,
it's first necessary to sort all states in their natural order.
The lower states will need 1 more bit than higher ones.
The process is repeated for each symbol.

Example :
Presuming a symbol has a probability of 5.
It receives 5 state values. States are sorted in natural order.

Next power of 2 is 8.
Space of probabilities is divided into 8 equal parts.
Presuming the Accuracy_Log is 7, it defines 128 states.
Divided by 8, each share is 16 large.

In order to reach 8, 8-5=3 lowest states will count "double",
doubling the number of shares (32 in width),
requiring one more bit in the process.

Baseline is assigned starting from the higher states using fewer bits,
and proceeding naturally, then resuming at the first state,
each takes its allocated width from Baseline.

I think it would be clearer like this (added text bold):

To get the Number_of_Bits and Baseline required for next state,
it's first necessary to sort all states in their natural order. This is given when the symbols have been spread in the table. Iterating from 0 to tableSize visits the states in their natural order
The lower states will need 1 more bit than higher ones.
The process is repeated for each symbol.

Example :
Presuming a symbol has a probability of 5.
It receives 5 state values. States are sorted in natural order.

Next power of 2 is 8.
Space of probabilities is divided into 8 equal parts.
Presuming the Accuracy_Log is 7, it defines 128 states.
Divided by 8, each share is 16 large.

In order to reach 8, 8-5=3 lowest states will count "double",
doubling the number of shares (32 in width),
requiring one more bit in the process.

Baseline is assigned starting from the higher states using fewer bits,
and proceeding naturally, then resuming at the first state,
each takes its allocated width from Baseline. Iterating over the table each state representing this symbol receives the Number_of_Bits
and Baseline according to the following table:

@Cyan4973 Cyan4973 self-assigned this Sep 11, 2019
@Cyan4973
Copy link
Contributor

That's a good point @KillingSpark.
The documentation will be updated following your suggestion.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Oct 22, 2019

Hopefully, the updated version of the format documentation should describe this topic better.

@KillingSpark
Copy link
Author

I thinks this is clearer, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants