Add appendix with example file decoding 'by hand' #120

Merged
merged 4 commits into from Feb 18, 2022


Collaborator

@ktmf01 ktmf01 commented Nov 25, 2021

In a previous PR (and also in the rewrite), I added an example to the coded residual section which felt out of place. Such an example didn't seem to fit in a formal specification, so the idea came to write an appendix with more thorough examples.

These examples can provide a more hands-on way to understand the FLAC specification for readers that need it. Also, it can be used by people proofreading this specification to do cross-referencing, as it is 'redundant'.

If this turns out to be a welcome addition, I would like to add more examples to this appendix in the future, specifically:

  • Example with a frame with a linear predictor (LPC)
  • Example involving all other types of metadata blocks
  • Example with variable blocksize frame
  • Example with a bit depth that is not a whole number of bytes, specifically to explain how MD5summing works in that case

Please provide feedback

ktmf01 added a commit to ktmf01/flac-specification that referenced this issue Dec 12, 2021
As an appendix with examples was suggested in ietf-wg-cellar#120 and the example
felt out of place here, it is removed. Wording of the remainder is
slightly improved
Collaborator Author

@ktmf01 ktmf01 commented Feb 15, 2022

I just added commit da39c32 to this PR involving a change for the same cause as #129. However, instead of removing, the UTF-8 characters are changed (as this example really needs UTF-8) so it renders clearer when fed through mmark and xml2rfc. See commit message for details


The vendor string is reference libFLAC 1.3.3 20190804, the field contents of the only field is title=Québec. The vorbis comment field is 13 bytes but only 12 characters in size, because it contains one character needing 2 bytes to represent.
The vendor string is reference libFLAC 1.3.3 20190804, the field contents of the only field is TITLE=Щелкунчик. The vorbis comment field is 24 bytes but only 15 characters in size, because it contains 9 characters needing 2 bytes to represent.
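
The byte and character counts quoted in the changed line can be checked with a few lines of Python. This is only a sanity check of the arithmetic, not part of the appendix text; the variable names are made up for illustration.

```python
# Compare the character count and the UTF-8 byte count of the comment field
# quoted above.
field = "TITLE=Щелкунчик"

num_chars = len(field)                  # number of Unicode code points
num_bytes = len(field.encode("utf-8"))  # number of bytes as stored in the file

# Count the characters that need exactly 2 bytes in UTF-8 (the Cyrillic letters).
two_byte_chars = sum(1 for ch in field if len(ch.encode("utf-8")) == 2)

print(num_chars, num_bytes, two_byte_chars)  # 15 24 9
```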
Collaborator

@retokromer retokromer Feb 16, 2022

TITLE= Матрёшка would be funnier (sorry, I could not resist ;-)

Collaborator Author

@ktmf01 ktmf01 Feb 16, 2022

Not a bad idea actually. I was looking for short words in scripts other than Latin that would not be considered offensive in any way, and that was rather hard as I don't speak or read any Russian, Greek, Chinese etc. Матрёшка is one character shorter than Щелкунчик so it is probably a better fit, as all those UTF-8 code points still make it look cluttered. Maybe you know something even shorter in Hebrew?

Now I think of it: If I shorten the vendor string, I don't have to change the whole example even with a longer vorbis comment field, so less room for error.

Collaborator

@retokromer retokromer Feb 16, 2022

Short, well known and anything but offensive would be: שלום

Collaborator Author

@ktmf01 ktmf01 Feb 16, 2022

Thanks, I'll add that in and force-push this PR branch to amend the last commit.


Start | Length | Contents | Description
:------|:--------|:-------------------|:-----------------
0x7e+0 | 1 bit | 0b0 | Last metadata block
Collaborator Author

@ktmf01 ktmf01 Feb 16, 2022

Another copy-paste error

Suggested change
0x7e+0 | 1 bit | 0b0 | Last metadata block
0x7e+0 | 1 bit | 0b1 | Last metadata block
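
To illustrate what this single bit controls, here is a minimal sketch of unpacking a 4-byte metadata block header. It assumes the usual FLAC layout (a 1-bit last-block flag, a 7-bit block type, then a 24-bit big-endian length). The function name and the example bytes are hypothetical and are not taken from the example file.

```python
def parse_metadata_block_header(header: bytes):
    """Split a 4-byte metadata block header into (is_last, block_type, length)."""
    if len(header) != 4:
        raise ValueError("a metadata block header is 4 bytes")
    is_last = bool(header[0] & 0x80)             # most significant bit of byte 0
    block_type = header[0] & 0x7F                # remaining 7 bits of byte 0
    length = int.from_bytes(header[1:4], "big")  # 24-bit block length in bytes
    return is_last, block_type, length


# Made-up header bytes: the same block type and length, with and without the flag.
print(parse_metadata_block_header(bytes([0x84, 0x00, 0x00, 0x28])))  # (True, 4, 40)
print(parse_metadata_block_header(bytes([0x04, 0x00, 0x00, 0x28])))  # (False, 4, 40)
```

With the flag left at 0b0, a decoder would try to read another metadata block header where the first audio frame starts, which is why the copy-paste error matters.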

0x15+4 | 36 bit | 0b0000, 0x00000001 | Total no. of samples 1
0x1a | 16 byte | (...) | MD5 signature

The minimum and maximum blocksize are both 4096. This was apparently the blocksize the encoder was intending to use for this audio, but as only 1 interchannel sample was provided, no frames with size 4096 are actually present in this file. This is because even in fixed blocksize streams, the size of the last frame can be smaller.
Collaborator Author

@ktmf01 ktmf01 Feb 16, 2022

Suggested change
The minimum and maximum blocksize are both 4096. This was apparently the blocksize the encoder was intending to use for this audio, but as only 1 interchannel sample was provided, no frames with size 4096 are actually present in this file. This is because even in fixed blocksize streams, the size of the last frame can be smaller.
The minimum and maximum blocksize are both 4096. This was apparently the blocksize the encoder planned to use, but as only 1 interchannel sample was provided, no frames with 4096 samples are actually present in this file.


The frame ends with 6 padding bits and a 2 byte frame CRC

To decode this subframe, 21 predictions have to calculated and added to their corresponding residuals. This is a sequential process: as each prediction uses previous samples, it is not possible to start this decoding halfway a subframe or decode a subframe with parallel threads.
Collaborator Author

@ktmf01 ktmf01 Feb 16, 2022

Suggested change
To decode this subframe, 21 predictions have to calculated and added to their corresponding residuals. This is a sequential process: as each prediction uses previous samples, it is not possible to start this decoding halfway a subframe or decode a subframe with parallel threads.
To decode this subframe, 21 predictions have to be calculated and added to their corresponding residuals. This is a sequential process: as each prediction uses previous samples, it is not possible to start this decoding halfway a subframe or decode a subframe with parallel threads.
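
The sequential nature of this step can be sketched as follows. This is a simplified illustration, assuming a linear predictor whose prediction is the coefficient-weighted sum of the preceding samples, shifted right by a fixed amount. The function name and the numbers in the usage example are hypothetical and do not come from the example file.

```python
def decode_predicted_subframe(warmup, residuals, coefficients, shift):
    """Rebuild samples from warm-up samples, residuals and predictor coefficients.

    coefficients[0] is applied to the most recent sample, coefficients[1] to
    the one before it, and so on.
    """
    if len(warmup) != len(coefficients):
        raise ValueError("need exactly one warm-up sample per coefficient")
    samples = list(warmup)
    for residual in residuals:
        # Each prediction depends on samples produced by earlier iterations,
        # so the loop cannot start halfway or be split across threads.
        prediction = sum(c * samples[-1 - j] for j, c in enumerate(coefficients)) >> shift
        samples.append(prediction + residual)
    return samples


# Hypothetical values: 2 warm-up samples, 3 residuals.
print(decode_predicted_subframe([10, 12], [1, 0, -3], [2, -1], 0))  # [10, 12, 15, 18, 18]
```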


This informational appendix contains short example FLAC files and short parts of FLAC files which are decoded step by step. These examples provide a more engaging way to understand the FLAC format than the formal specification. The text explaining these examples assumes the reader has at least cursorily read the specification and refers to it for explanations of the terminology used. These examples mostly focus on the layout of several metadata blocks and subframe types and the implications of certain aspects (for example wasted bits and stereo decorrelation) on this layout.

The examples feature (parts of) files generated by various FLAC encoders. These are presented in hexadecimal or binary format, followed by tables and text referring to various features by their starting bit positions in these representations. These starting positions (shortened to 'start' in the tables) are a hexadecimal byte position and a start bit within that byte, separated by a plus sign. Counts for these start at zero. For example, a feature starting at the 3rd bit of the 17th byte is referred to as starting at 0x10+2.
Collaborator Author

@ktmf01 ktmf01 Feb 17, 2022

Perhaps stress a little more that these examples are informational and could contain errors, despite thorough checking?

Suggested change
The examples feature (parts of) files generated by various FLAC encoders. These are presented in hexadecimal or binary format, followed by tables and text referring to various features by their starting bit positions in these representations. These starting positions (shortened to 'start' in the tables) are a hexadecimal byte position and a start bit within that byte, separated by a plus sign. Counts for these start at zero. For example, a feature starting at the 3rd bit of the 17th byte is referred to as starting at 0x10+2.
The examples feature (parts of) files generated by various FLAC encoders. These are presented in hexadecimal or binary format, followed by tables and text referring to various features by their starting bit positions in these representations. Each starting position (shortened to 'start' in the tables) is a hexadecimal byte position and a start bit within that byte, separated by a plus sign. Counts for these start at zero. For example, a feature starting at the 3rd bit of the 17th byte is referred to as starting at 0x10+2.
All data in this appendix has been thoroughly verified. However, as this appendix is informational, in case any information here conflicts with statements in the formal specification, the latter takes precedence.
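
The start notation described here maps directly onto a bit offset. The following sketch shows one way to convert it and to pull a field out of a byte string. It assumes bit 0 is the most significant bit of a byte; the helper names and the test buffer are hypothetical.

```python
def start_to_bit_offset(start: str) -> int:
    """Convert a start such as '0x10+2' (or '0x1a') to an absolute bit offset."""
    byte_part, _, bit_part = start.partition("+")
    return int(byte_part, 16) * 8 + int(bit_part or "0")


def read_field(data: bytes, start: str, length_bits: int) -> int:
    """Read an unsigned big-endian field of length_bits bits beginning at start."""
    offset = start_to_bit_offset(start)
    total_bits = len(data) * 8
    if offset + length_bits > total_bits:
        raise ValueError("field runs past the end of the data")
    as_int = int.from_bytes(data, "big")
    return (as_int >> (total_bits - offset - length_bits)) & ((1 << length_bits) - 1)


# The 3rd bit of the 17th byte, counting from zero: byte 0x10, bit 2.
print(start_to_bit_offset("0x10+2"))  # 130

# Made-up buffer: 16 zero bytes followed by a byte with only that bit set.
buffer = bytes(16) + bytes([0b00100000])
print(read_field(buffer, "0x10+2", 1))  # 1
```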


Anywhere a number of samples is mentioned (blocksize, total number of samples, sample rate), interchannel samples are meant.

The MD5 sum (starting at 0x1a) is 0x3e84 b418 07dc 6903 0758 6a3d ad1a 2e0f. This is validated after decoding the samples.
Collaborator Author

@ktmf01 ktmf01 Feb 17, 2022

Suggested change
The MD5 sum (starting at 0x1a) is 0x3e84 b418 07dc 6903 0758 6a3d ad1a 2e0f. This is validated after decoding the samples.
The MD5 sum (starting at 0x1a) is 0x3e84 b418 07dc 6903 0758 6a3d ad1a 2e0f. This will be validated after decoding the samples.
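
As a rough sketch of what that validation involves: the checksum is computed over the decoded audio rather than over the file bytes. The code below assumes the samples are interleaved by channel and written as signed little-endian integers padded to a whole number of bytes. That layout is an assumption stated here for illustration and should be checked against the specification. The function name and the sample values are hypothetical.

```python
import hashlib


def md5_of_decoded_audio(interchannel_samples, bits_per_sample):
    """Hash decoded audio given as a sequence of per-channel sample tuples."""
    bytes_per_sample = (bits_per_sample + 7) // 8  # round up to whole bytes
    digest = hashlib.md5()
    for channels in interchannel_samples:
        for sample in channels:
            # Assumed layout: signed, little-endian, channels interleaved.
            digest.update(sample.to_bytes(bytes_per_sample, "little", signed=True))
    return digest.hexdigest()


# Made-up single interchannel sample of a 16-bit stereo stream.
checksum = md5_of_decoded_audio([(1, -1)], 16)
print(checksum == hashlib.md5(bytes([0x01, 0x00, 0xFF, 0xFF])).hexdigest())  # True
```

A decoder would compare the resulting digest with the 16 bytes stored in the streaminfo block.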

0x34+1 | 6 bit | 0b000001 | verbatim subframe
0x34+7 | 1 bit | 0b1 | wasted bits present
0x35+0 | 4 bit | 0b0001 | 4 wasted bits
0x35+4 | 14 bit | 0b0010, 0x8b | 12-bit unencoded sample
Collaborator Author

@ktmf01 ktmf01 Feb 17, 2022

Suggested change
0x35+4 | 14 bit | 0b0010, 0x8b | 12-bit unencoded sample
0x35+4 | 12 bit | 0b0010, 0x8b | 12-bit unencoded sample
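
The corrected length follows from the contents column: 4 bits plus 8 bits is 12 bits. A small sketch of how such a stored value could be turned back into a full sample, assuming the stored value is a two's complement number that is shifted left by the number of wasted bits (the function name is hypothetical):

```python
def restore_wasted_bits(raw: int, stored_bits: int, wasted_bits: int) -> int:
    """Sign-extend a stored_bits-wide value and shift the wasted bits back in."""
    sign_bit = 1 << (stored_bits - 1)
    value = (raw ^ sign_bit) - sign_bit  # two's complement sign extension
    return value << wasted_bits


# Contents from the table row above: 0b0010 followed by 0x8b, i.e. 0x28b.
raw = (0b0010 << 8) | 0x8B
print(raw.bit_length() <= 12)            # True: the value fits in 12 bits
print(restore_wasted_bits(raw, 12, 4))   # 10416
```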

Collaborator

@retokromer retokromer Feb 17, 2022

good catch!

0 | 533 | 533 | -267
0 | 268 | 268 | 134
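
The last two columns of these rows appear to be the folded (unsigned) residual and the residual after unfolding. That reading is an assumption, since the table header is not shown here, but the numbers match the usual zigzag mapping, sketched below with hypothetical function names.

```python
def unfold(folded: int) -> int:
    """Map a folded value back to a signed residual: even -> n/2, odd -> -(n+1)/2."""
    if folded % 2 == 0:
        return folded >> 1
    return -((folded + 1) >> 1)


def fold(residual: int) -> int:
    """Inverse mapping: non-negative values become even, negative values odd."""
    return 2 * residual if residual >= 0 else -2 * residual - 1


print(unfold(533), unfold(268))  # -267 134
print(fold(-267), fold(134))     # 533 268
```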

It can be calculated that using a Rice code is more efficient than storing values unencoded. The rice code (excluding the partition order and parameter) takes 197 bits. Storing unencoded, the largest value (-13172) would need 15 bits for storing, so 15*15 = 225 which is larger.
Collaborator Author

@ktmf01 ktmf01 Feb 17, 2022

Suggested change
It can be calculated that using a Rice code is more efficient than storing values unencoded. The rice code (excluding the partition order and parameter) takes 197 bits. Storing unencoded, the largest value (-13172) would need 15 bits for storing, so 15*15 = 225 which is larger.
It can be calculated that using a Rice code is, in this case, more efficient than storing the values unencoded. The Rice code (excluding the partition order and parameter) is 199 bits in length. The largest residual value (-13172) would need 15 bits to be stored unencoded, so storing all 15 samples with 15 bits each results in a sequence with a length of 225 bits.
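
The unencoded side of this comparison can be reproduced with a short calculation. The sketch below takes the 199-bit Rice code length as given from the text and only derives the 15-bit width and the 225-bit total; the helper name is hypothetical.

```python
def signed_bit_width(value: int) -> int:
    """Smallest two's complement width, in bits, that can hold value."""
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1  # one extra bit for the sign


num_residuals = 15
largest_residual = -13172
rice_code_bits = 199  # taken from the text above, not recomputed here

width = signed_bit_width(largest_residual)
unencoded_bits = num_residuals * width

print(width, unencoded_bits, unencoded_bits > rice_code_bits)  # 15 225 True
```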

ktmf01 and others added 3 commits Feb 17, 2022
In a previous PR, I added an example to the coded residual section
which felt out of place. These examples can provide a more hands-on
way to understand the FLAC specification for readers that need it.
Also, it can be used by people proofreading this specification to
do cross-referencing, as it is 'redundant'.
In the second example, mmark and xml2rfc rendered the following

> the only field is title=Qué (U+00E9)bec

which seemed unclear to me. With this commit, it is rendered as

>  the only field is TITLE=שלום (U+05E9 U+05DC U+05D5 U+05DD)

which I think is clearer.
Collaborator Author

@ktmf01 ktmf01 commented Feb 17, 2022

I just force-pushed this PR branch to make it mergeable after #124 was merged. I also rebased all typos into the first commit, and added the 3 example files as FLAC files. I'll probably merge it after having another look at it tomorrow.

@ktmf01 ktmf01 merged commit 1e4058b into ietf-wg-cellar:master Feb 18, 2022