odd number of frames with an odd number of bytes-per-frame behaves odd #887

umlaeute · 2022-10-19T12:32:51Z

so I have a 24bit PCM WAV that consists of 101 frames, which gives me a data-chunk worth of 303 bytes.

such a file can be created as follows:

#include <sndfile.h>

#define NUMSAMPLES 101
static float samples_float[NUMSAMPLES];

int main() {
  SNDFILE*sf;
  SF_INFO sfinfo;
  sfinfo.frames = NUMSAMPLES;
  sfinfo.samplerate = 44100;
  sfinfo.channels = 1;
  sfinfo.format = SF_FORMAT_WAV | SF_FORMAT_PCM_24;

  sf = sf_open("303.wav", SFM_WRITE, &sfinfo);
  sf_write_float(sf, samples_float, NUMSAMPLES);
  sf_close(sf);
}

but if I read it back, like so:

  sf = sf_open("303.wav", SFM_READ, &sfinfo);
  sf_command(sf, SFC_GET_LOG_INFO, strbuffer, BUFFER_LEN);
  puts(strbuffer);
  sf_close(sf);

i get a nasty error:

*** 'data' chunk should be an even number of bytes in length.

so who is to blame for this?

should SFM_WRITE add a zero-padding byte, so the number of bytes in the data chunk is even?
should SFM_READ check whether the number of bytes equals bytesperframe*frames, (and not complain if its odd)?
is the WAV standard broken?

reading https://mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html, i see:

data Chunk

The data chunk contains the sampled data.

Field Length Contents

ckID 4 Chunk ID: "data"

cksize 4 Chunk size: n

sampled data n Samples

pad byte 0 or 1 Padding byte if n is odd

which indicates that the reported 'data' chunksize (as written into the chunk header) can indeed be odd (but the actual chunksize must be even, supposedly for alignment with the chunk that follows the 'data'-chunk which must always be last 🤔 )

(sorry for the pun in the title, but it was irresistible).

The text was updated successfully, but these errors were encountered:

umlaeute · 2022-10-19T12:36:37Z

btw, if i change the WAV-file (with some binary editor) to report a data chunk size of 304, the error goes away (even if now the RIFF-header indicates the wrong size)...

evpobr · 2022-10-20T08:28:39Z

Thanks for report @umlaeute .

arthurt · 2022-12-13T20:24:05Z

That warning message is purely informational, libsndfile is still able to read such a file. The bug is that we write such an odd-length file, that no padding byte is added to the end.

Another scenario to test would be mono formats of 8-bit or 4-bit sample size which could also potentially have an odd-length data-chunk given an odd sample length. Eg, wav with 1 channel and S8, U8, or ADPCM formats like MS-ADPCM, IMA-ADPCM, G726.

arthurt · 2022-12-14T07:38:13Z

So the problem is generic to any data section that is odd in length. 101 samples of unsigned 8-bit PCM have the same issue.

But it get weirder. Padding the data chunk was added way back in #61 (same error report too, odd number of mono 24-bit samples). So the chunks are padded, but the parser is incorrectly reporting the odd length. This log warning was added in the same commit 4e6c87b .

So the chunk parser has two bugs

It incorrectly reports about odd data chunk lengths
~~2) It does NOT skip the implied padding byte when a data chunk does have an odd length.~~

arthurt · 2022-12-14T07:45:38Z

Correction, the chunk parser DOES heed the padding byte. So the only bug really is the log message, which is in fact spurious.

umlaeute · 2022-12-14T13:52:03Z

The bug is that we write such an odd-length file, that no padding byte is added to the end.

i think this is the real bug.

So the only bug really is the log message, which is in fact spurious.

and i agree that this error message should go away iff WAV-files with odd data chunk lengths are really allowed.

it's surprisingly hard to find an "official standards document" for RIFF/WAV (and the link I gave in the original post can only be considered informal).
However, i have know found https://www.aelius.com/njh/wavemetatools/doc/riffmci.pdf, which seems to originate from IBM+Microsoft (albeit the file is hosted on some random page).

on page 11 it says:

Part	Description
`ckID`	A four-character code that identifies the representation of the chunk data data. A program reading a RIFF file can skip over any chunk whose chunk ID it doesn't recognize; it simply skips the number of bytes specified by `ckSize` plus the pad byte, if present.
`ckSize`	A 32-bit unsigned value identifying the size of `ckData`. This size value does not include the size of the `ckID` or `ckSize` fields or the pad byte at the end of `ckData`.
`ckData`	Binary data of fixed or variable size. The start of `ckData` is word-aligned with respect to the start of the RIFF file. If the chunk size is an odd number of bytes, a pad byte with value zero is written after `ckData`. Word aligning improves access speed (for chunks resident in memory) and maintains compatibility with EA IFF. The `ckSize` value does not include the pad byte.

So if we accept this document as a reference, the padding byte must be written, and it is therefore a "bug" if it is missing.
OTOH, given the prevalence of missing standards in the Windows world, there are probably enough WAV-files out there with a missing pad byte (at the EOF), so dropping the warning message might be considered anyhow.

arthurt · 2022-12-14T18:45:18Z

The bug is that we write such an odd-length file, that no padding byte is added to the end.

i think this is the real bug.

Except it's not a bug, we do currently write the padding byte. The first part of 4e6c87b added a line that writes a padding byte before any post-DATA chunks, if any. The commit doesn't have a comment, so it's otherwise a bit mysterious.

@@ -1188,6 +1191,9 @@ wav_write_tailer (SF_PRIVATE *psf)
	else
		psf->dataend = psf_fseek (psf, 0, SEEK_END) ;

+	if (psf->dataend & 1)
+		psf_binheader_writef (psf, "z", 1) ;

Confirmed with a test file with 101 samples of 8-bit mono pcm of 0x80 and a trailing INFO chunk:

00000000  52 49 46 46 ae 00 00 00  57 41 56 45 66 6d 74 20  |RIFF....WAVEfmt |
00000010  10 00 00 00 01 00 01 00  44 ac 00 00 44 ac 00 00  |........D...D...|
00000020  01 00 08 00 64 61 74 61  65 00 00 00 80 80 80 80  |....datae.......|
                      |^ ^^ ^^ ^^  |^ ^^ ^^ ^^ |^
                      |            |           Start of pcm data
                      |            chunk size = 101. Data is [0x2c-0x91)
                      data chunk

00000030  80 80 80 80 80 80 80 80  80 80 80 80 80 80 80 80  |................|
*
00000090  80 00 4c 49 53 54 1c 00  00 00 49 4e 46 4f 49 43  |..LIST....INFOIC|
          |^ |^ |^ ^^ ^^ ^^
          |  |  Next chunk 'LIST' at 0x92
          |  Padding Byte at 0x91
          Last byte of data at 0x90

000000a0  4d 54 10 00 00 00 4f 67  67 69 6c 79 20 77 6f 6f  |MT....Oggily woo|
000000b0  67 6c 69 79 00 00                                 |gliy..|
000000b6

Output of sndfile-info on the same:

========================================
File : 303.wav
Length : 182
RIFF : 174
WAVE
fmt  : 16
  Format        : 0x1 => WAVE_FORMAT_PCM
  Channels      : 1
  Sample Rate   : 44100
  Block Align   : 1
  Bit Width     : 8
  Bytes/sec     : 44100
*** 'data' chunk should be an even number of bytes in length.
data : 101
LIST : 28
  INFO
    ICMT : Oggily woogliy
End

----------------------------------------
Sample Rate : 44100
Frames      : 101
Channels    : 1
Format      : 0x00010005
Sections    : 1
Seekable    : TRUE
Duration    : 00:00:00.002
Signal Max  : 0 (-inf dB)

arthurt · 2022-12-14T18:51:48Z

And it writes the padding byte if there aren't any trailing chunks as well

00000000  52 49 46 46 8a 00 00 00  57 41 56 45 66 6d 74 20  |RIFF....WAVEfmt |
00000010  10 00 00 00 01 00 01 00  44 ac 00 00 44 ac 00 00  |........D...D...|
00000020  01 00 08 00 64 61 74 61  65 00 00 00 80 80 80 80  |....datae.......|
00000030  80 80 80 80 80 80 80 80  80 80 80 80 80 80 80 80  |................|
*
00000090  80 00                                             |..|
00000092

arthurt · 2022-12-14T19:07:18Z

So I think the bug is just that the message complaining about odd length data chunks is wrong.

The message makes no sense, it is triggered when the data chunk reports that it contains an odd length of data, not when the data chunk is an odd length and has no padding byte. Containing an odd length of data is a legitimate condition.

It's kinda the question of, does the padding byte belong "in" the data chunk, or is the padding byte "between" the data chunk and what comes next.

The chunk parser cannot detect if the padding byte is missing, so it can't usefully report the message. In fact, if you modify the file and remove the padding byte, the chunk parser falls on it's face, because it expects the next chunk to begin at the next 16-bit boundary.

So, the padding byte, while specified, is also kinda implied from the fact that in RIFF files, chunks should always start on 16-bit boundaries.

Where chunkID is a FOURCC that identifies the data contained in the chunk, chunkSize is a 4-byte value giving the size of the data section of the chunk, and data is zero or more bytes of data. The data is always padded to the nearest WORD boundary. chunkSize gives the size of the valid data in the chunk. It does not include the padding, the size of chunkID, or the size of chunkSize.

From https://learn.microsoft.com/en-us/windows/win32/xaudio2/resource-interchange-file-format--riff- , emphasis added.

umlaeute · 2022-12-14T21:09:18Z

Except it's not a bug, we do currently write the padding byte.

ah indeed. i confused myself.

It's kinda the question of, does the padding byte belong "in" the data chunk, or is the padding byte "between" the data chunk and what comes next.

i think it is quite clear by now that it is between the chunks (and the 2nd chunk is permitted to not exist), at least form the RIFF perspecive (or i misunderstood your question).

otoh, i have no idea why the chose a term like WORD boundary which is ambiguous by nature. probably because they wrote this when they still worked on DOS.

arthurt · 2022-12-14T22:49:01Z

otoh, i have no idea why the chose a term like WORD boundary

It's MS. The terms "WORD" (and "DWORD", "QWORD") are dogma there. They made the choice to use that name to mean a 16-bit integer everywhere because when they were starting out they were a company purely focused on the 16-bit microcomputer market, hence "micro"-soft, not that inventive when you realize it.

In hindsight, bad choice, because as their success and the platform they become most associated with grew, other word sizes became a thing they had to deal with, but they are anchored to terms locked in a 16-bit processor world. (Also not helped by the fact that both pre-NT Windows and the x86 architecture "start" in 16-bit, then "add" 32-bit, if you follow me.)

I too chuckle when I see modern MS docs reference WORD in their specs, reminds me of my time playing with MS-DOS. It's what system level MS does. ;-)

i think it is quite clear by now that it is between the chunks (and the 2nd chunk is permitted to not exist), at least form the RIFF perspecive (or i misunderstood your question).

I think I was trying to imagine the intent of the original author of the error message code. He misinterpreted the pad byte in one instance but not the other 🤷‍♂️

umlaeute mentioned this issue Oct 19, 2022

[readsf~] bug: reading some files never stops pure-data/pure-data#1786

Closed

evpobr added the Bug Something isn't working label Oct 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

odd number of frames with an odd number of bytes-per-frame behaves odd #887

odd number of frames with an odd number of bytes-per-frame behaves odd #887

umlaeute commented Oct 19, 2022 •

edited

Loading

data Chunk

umlaeute commented Oct 19, 2022 •

edited

Loading

evpobr commented Oct 20, 2022

arthurt commented Dec 13, 2022 •

edited

Loading

arthurt commented Dec 14, 2022 •

edited

Loading

arthurt commented Dec 14, 2022

umlaeute commented Dec 14, 2022

arthurt commented Dec 14, 2022 •

edited

Loading

arthurt commented Dec 14, 2022

arthurt commented Dec 14, 2022 •

edited

Loading

umlaeute commented Dec 14, 2022

arthurt commented Dec 14, 2022

odd number of frames with an odd number of bytes-per-frame behaves odd #887

odd number of frames with an odd number of bytes-per-frame behaves odd #887

Comments

umlaeute commented Oct 19, 2022 • edited Loading

data Chunk

umlaeute commented Oct 19, 2022 • edited Loading

evpobr commented Oct 20, 2022

arthurt commented Dec 13, 2022 • edited Loading

arthurt commented Dec 14, 2022 • edited Loading

arthurt commented Dec 14, 2022

umlaeute commented Dec 14, 2022

arthurt commented Dec 14, 2022 • edited Loading

arthurt commented Dec 14, 2022

arthurt commented Dec 14, 2022 • edited Loading

umlaeute commented Dec 14, 2022

arthurt commented Dec 14, 2022

umlaeute commented Oct 19, 2022 •

edited

Loading

umlaeute commented Oct 19, 2022 •

edited

Loading

arthurt commented Dec 13, 2022 •

edited

Loading

arthurt commented Dec 14, 2022 •

edited

Loading

arthurt commented Dec 14, 2022 •

edited

Loading

arthurt commented Dec 14, 2022 •

edited

Loading