
WIP: adding 24-bit support to io.wavfile #6852

Open
wants to merge 7 commits into master
Conversation


@perimosocordiae perimosocordiae commented Dec 12, 2016

See #6849, based on the code at: https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476

This is a first pass at converting the changes made by @josephernest to a form more likely to get merged. It still needs tests and a review of the API changes, as well as documentation updates. That said, comments are welcome!

result.append(cuelabels)
if return_pitch:
result.append(pitch)
return tuple(result)

pv (Member) commented Dec 12, 2016

A namedtuple would be better.

perimosocordiae (Author, Member) commented Dec 12, 2016

I agree. Is the current preference to use a namedtuple or a Bunch for this sort of thing?
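[Editor's note] For illustration, a minimal sketch of what a namedtuple-based return could look like. The `WavResult` name and its fields are hypothetical, not scipy API; the point is that a namedtuple stays backwards compatible with plain tuple unpacking:

```python
from collections import namedtuple

# Hypothetical result type; field names are illustrative, not scipy's API.
WavResult = namedtuple('WavResult', ['rate', 'data', 'cues'])

def make_result(rate, data, cues=None):
    # A namedtuple still unpacks like a plain tuple, so the existing
    # `rate, data = read(...)`-style calls keep working.
    return WavResult(rate, data, cues)

res = make_result(44100, [0.0, 0.1], cues=[4410])
assert res.rate == 44100            # attribute access
rate, data, cues = res              # tuple-style unpacking
assert rate == 44100 and cues == [4410]
```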

josephernest commented Dec 12, 2016

@pv @perimosocordiae I think this solution would be cleaner: #6852 (comment)

perimosocordiae (Author, Member) commented Dec 12, 2016

I'm not a big fan of lumping the extra info into a metadata dict. The sample rate is already a kind of metadata that we can't add to the dict, and it adds an extra step for users to get at the information they want.

josephernest commented Dec 12, 2016

@perimosocordiae There is in fact other metadata that can be read; see https://web.archive.org/web/20141226210234/http://www.sonicspot.com/guide/wavefiles.html.
So if we implement reading of a few more metadata fields, it will get annoying: return_metadata1=True, return_metadata2=True, return_metadata3=True, .... A single metadata=True is shorter.

It solves another problem too: the _metadata dict is only filled for the chunks that are actually present in the file.
And the code stays short and Pythonic:

result = [fs, data]
if metadata:
    result.append(_metadata)
return tuple(result)

(BTW, is it possible to put this URL somewhere in the file as a comment? It is really, really useful. In years of audio programming, it's the most precise document I have found about the WAV specification.)

@pv what do you think about this API?

josephernest commented Dec 12, 2016

Added an example of use of the proposed API here.

@josephernest josephernest commented Dec 12, 2016

After thinking about it, I suggest a cleaner API:

    Parameters
    ----------
    filename : string or open file handle
        Input wav file.
    mmap : bool, optional
        Whether to read data as memory-mapped.
        Only to be used on real files (Default: False).
    metadata : bool, optional
        Whether to return a dictionary containing metadata such as
        loops, cue markers, cue marker labels, pitch, bitrate (Default: False)

    Returns
    -------
    rate : int
        Sample rate of wav file.
    data : numpy array
        Data read from wav file.  Data-type is determined from the file;
        see Notes.
    metadata : dictionary 
        Possible keys are 'loops', 'markers', 'markerlabels', 'pitch', 'bitrate'.

def read(filename, mmap=False, metadata=False):
    _metadata = dict()

    ....

    if ... 'smpl':
        ...
        _metadata['pitch'] = ...
        ...
        _metadata['loops'] = ...

    if ... 'cue ':
        ...
        _metadata['markers'] = ...
        ...

    if ... 'labl':
        ...
        _metadata['markerlabels'] = ...
        ...

    if metadata:
        result.append(_metadata)

With this API,

read('test24bit.wav', metadata=True)

should return:

(44100, 
 array([[  ...  ]]), 
 {'loops': [[..., ...]], 
  'markers': [..., ....], 
  'markerlabels': ['...', '...'], 
  'pitch': 440.0, 
  'bitrate': 24})

I think it's far cleaner than the original dirty way I suggested here: https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476.

@josephernest josephernest commented on scipy/io/wavfile.py in b50e3b3 Dec 12, 2016

No, as far as I remember I wrote <iI on purpose; probably <ii didn't work (see https://docs.python.org/2/library/struct.html#format-characters).

@perimosocordiae perimosocordiae replied Dec 12, 2016

Oops, good catch!

@josephernest josephernest commented on scipy/io/tests/test_wavfile.py in 020992a Dec 12, 2016

Why such a syntax for cues? {1: 4410, 2: 8820} would be enough, or probably even just [4410, 8820].
And cuelabels = {1: 'Marker1', 2: 'Marker2'} would be fine.

Or do you want this, @perimosocordiae:

cues = {1: {'pos': 4410, 'label': 'Marker1'}, ... } 

If we authorize nested dicts like this here, we could also use a metadata dict that would be cleaner, as I suggested at the end of #6852 (comment) ;)

The problem for me is practical use. Let's say we use

cues = {1: {'pos': 4410, 'label': 'Marker1'}, ... } 

Example: how do you get a list of markers ordered by time, together with their labels?
It would probably require quite convoluted code.

@perimosocordiae perimosocordiae replied Dec 13, 2016

I'm not a big fan of this format either, and I agree that ease of use is an important consideration. I'm leaning toward something like

cues = [(4410, 'Marker1'), (8820, 'Marker2')]

where each tuple is actually a namedtuple with pos and label fields. Mostly, I want to try to keep the positions and labels matched up and don't care too much about exposing the internal cue ID numbers.
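[Editor's note] A quick sketch of how this namedtuple-based cue list would behave. The `Cue` name is hypothetical (not part of the PR); it just illustrates that positions and labels stay matched up and time-ordering is trivial:

```python
from collections import namedtuple

# Hypothetical cue type with the pos/label fields discussed above.
Cue = namedtuple('Cue', ['pos', 'label'])

cues = [Cue(8820, 'Marker2'), Cue(4410, 'Marker1')]

# Each cue still unpacks like a plain tuple:
pos, label = cues[0]
assert (pos, label) == (8820, 'Marker2')

# Markers ordered by time, with their labels kept matched up:
ordered = sorted(cues)  # namedtuples compare like tuples, i.e. by pos first
assert [c.label for c in ordered] == ['Marker1', 'Marker2']
```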



def write(filename, rate, data):
def write(filename, rate, data, cues=None, loops=None, bitrate=None):

matthiasha Dec 20, 2016

"bit_depth" would be more accurate. Bitrate usually refers to bits per second, while bit depth is bits per sample and channel.

@matthiasha matthiasha commented Dec 20, 2016

Does read() return an int32 array for 24bit data? In that case read() should IMO also return the bit depth, so that the signal can be converted to a float in the range [-1.0; +1.0] and dBFS can be calculated.
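[Editor's note] A minimal sketch of the conversion this comment asks for, assuming the LSB convention for 24-bit samples in an int32 (the `to_float` helper is illustrative, not scipy API; divide by 2**31 instead under the MSB convention):

```python
import numpy as np

def to_float(data, bit_depth):
    # Map integer samples to floats in [-1.0, 1.0] using the bit depth.
    # Assumes 24-bit samples occupy the low 24 bits of an int32 (LSB
    # convention), so full scale is +/- 2**(bit_depth - 1).
    return data.astype(np.float64) / 2 ** (bit_depth - 1)

x = np.array([-2**23, 0, 2**23 - 1], dtype=np.int32)  # full-scale 24-bit
f = to_float(x, 24)
assert f[0] == -1.0
assert abs(f[2] - 1.0) < 1e-6
```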

@josephernest josephernest commented Dec 20, 2016

@pv pv commented Dec 20, 2016

@josephernest josephernest commented Dec 20, 2016

@pv Sorry, I probably chose the wrong word: I'm not talking about 0 dBFS normalization for every sound file, I'm just talking about mapping

  • [- INT32_MAX, INT32_MAX] to [-1.0, 1.0] for a 32 bit soundfile

  • [- INT16_MAX, INT16_MAX] to [-1.0, 1.0] for a 16 bit soundfile

etc.

I should find a better word than "normalize", which is already used for something else in audio. Any ideas?


@larsoner larsoner commented Dec 20, 2016

We had a discussion about normalization before, but we should probably open a separate issue for it. If it's absolutely necessary for 24-bit support to be useful (?), then we should probably tackle that discussion and PR separately first.

@matthiasha matthiasha commented Dec 24, 2016

I've tested the wavfile.py provided by this PR (https://github.com/scipy/scipy/blob/020992a18b37a9c4991be19861d9871a9f94deb7/scipy/io/wavfile.py). To correctly represent 24bit data in an int32, we should use the 24 MSB, and have the 8 LSB equal zero (currently we use 24 LSB).

This is because audio signal level in the digital domain is measured relative to full scale, which cannot be reached if we don't use the MSBs.

I've attached 1000Hz_-10dBFS_24bit_48kHz.zip.

This file contains a sine tone with a level of -10dBFS (the level of a sine is measured by its peak value). Now my test:

sr, a = wavfile.read('1000Hz_-10dBFS_24bit_48kHz.wav')
20 * numpy.log10(float(a.max()) / 2**31)
-58.164798546035584

The resulting value should be -10. Because the 8 MSB are zero, the value is 6.02 dB * 8 = 48.16 dB lower. Note the normalization by 2**31, matching the int32 data type.

By fixing this issue, the output of the new code would fit my usecase (mostly measuring signal levels). I also really like the approach proposed in #6852 (comment), returning all metadata as a dict. This dict could be step-by-step extended without breaking existing API.

@josephernest josephernest commented Dec 24, 2016

@matthiasha It seems that your test .wav file is non-standard. Here is the result with good old SoundForge 8 (which I have used for years, and which can open nearly any kind of WAV).

@matthiasha matthiasha commented Dec 24, 2016

Sorry, I don't use SoundForge. Please try Adobe Audition or Audacity (which is free).

@josephernest josephernest commented Dec 24, 2016

@matthiasha matthiasha commented Dec 24, 2016

@matthiasha matthiasha commented Dec 25, 2016

The code in the proposed wavfile.py:

    if bit_depth == 24:
        a = numpy.empty((len(data)//3, 4), dtype='u1')
        a[:, :3] = data.reshape((-1, 3))
        a[:, 3:] = (a[:, 3 - 1:3] >> 7) * 255
        data = a.view('<i4').reshape(a.shape[:-1])

Can be changed to:

    if bit_depth == 24:
        a = numpy.empty((len(data)//3, 4), dtype='u1')
        a[:, 1:4] = data.reshape((-1, 3))
        data = a.view('<i4').reshape(a.shape[:-1])

Then:

sr, a = wavfile.read('1000Hz_-10dBFS_24bit_48kHz.wav')
assert abs(20 * numpy.log10(float(a.max()) / 2**31) + 10) < 0.1
print(20 * numpy.log10(float(a.max()) / 2**31))
-9.9999992398
@josephernest josephernest commented Dec 25, 2016

Thanks a lot @matthiasha for the sox commands, I'll try them just after the holidays.

Even without any modification, you can already get the right -10 dB by using / 2**23 instead of / 2**31 (I think it makes sense to divide by 2**23):

print(20 * numpy.log10(float(a.max()) / 2**23))
-9.99999596545

To correctly represent 24bit data in an int32, we should use the 24 MSB, and have the 8 LSB equal zero (currently we use 24 LSB).

I thought that representing our 24-bit data as an int32 in [-2**23, 2**23-1] makes sense, don't you think?
I'm not sure I understand correctly, but you would prefer to represent 24-bit data as an int32 in {k in [-2**31, 2**31-1] such that k = 256 * m}? (i.e. with the 8 LSB equal to 0)

@matthiasha matthiasha commented Dec 25, 2016

@josephernest josephernest commented on 9dba3f3 Jan 5, 2017

@perimosocordiae @pv @matthiasha @Eric89GXL I personally think it makes no sense to read 24-bit samples MSB-aligned.

If you open a wave file editor and look at a specific sample, you'll find the value 3413053.
With this MSB version of read(...), you would get x[i] = 873741568, which makes absolutely no sense.

It breaks the meaning of what x[i] is: a 24-bit int.

What is the advantage of MSB alignment that justifies this loss of coherence between the actual sample values and the values returned by read(...)?

@perimosocordiae perimosocordiae replied Jan 5, 2017

I personally don't have a preference, so I'll defer to whatever the consensus settles on.

@perimosocordiae perimosocordiae commented Jan 5, 2017

@matthiasha I added the dictionary of metadata approach. Let me know what you think.

@matthiasha matthiasha commented Jan 22, 2017

We don't store numerical data here, but audio samples. What IMO is important to keep consistent is the level (dB) of the signal, independent of the bit depth.

@josephernest: the data is not 24bit anymore, but 32bit. If you want to get the 24-bit equivalent, you can right-shift it: 873741568 >> 8 = 3413053. If you use your wave file editor and change the bit depth from 24 to 32 bits, the sample value will change from 3413053 to 873741568.

I haven't had a look at the write function yet. What I think it should do:

  • numpy datatype has more bits than output bit depth: right-shift the data (keep MSB, truncate LSB)
  • numpy data type has less bits than output bit depth: left-shift the data

If we go the other way and store the 24 bits in the LSB of an int32, the write function in contrast needs to be designed to write the LSB. With that approach, if we have real 32-bit data and choose to write it as 24-bit, the 8 MSB get truncated, which destroys the audio.
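[Editor's note] The shifting rule described in the two bullet points above can be sketched as follows, under the MSB convention. The `shift_for_depth` name is hypothetical, not the proposed scipy API:

```python
import numpy as np

def shift_for_depth(data, out_depth):
    # Sketch of the rule above (MSB convention): `data` is a signed
    # integer numpy array, `out_depth` the target bit depth.
    in_depth = data.dtype.itemsize * 8
    if in_depth > out_depth:
        # More input bits than output: keep the MSBs, truncate the LSBs.
        return data >> (in_depth - out_depth)
    elif in_depth < out_depth:
        # Fewer input bits than output: left-shift to preserve the level.
        return data.astype(np.int64) << (out_depth - in_depth)
    return data

x = np.array([873741568], dtype=np.int32)   # MSB-aligned 24-bit sample
assert shift_for_depth(x, 24)[0] == 3413053  # back to the editor's value
```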

@pv pv commented Jan 22, 2017

@X-Raym X-Raym commented Jun 8, 2018

I don't know what the state of this pull request is, but I made a few enhancements to @josephernest's code; there were some bugs (as far as I can tell from his last gist https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476).

here is my updated version if you are interested: https://github.com/X-Raym/wavfile.py

Cheers !

@X-Raym X-Raym commented Jun 10, 2018

Note: my version now also preserves unrecognized chunks, so that they can be rewritten to a new file without losing any metadata (bext, LIST INFO, etc.).

@v-iashin v-iashin commented Dec 15, 2018

Is it going to be released with 1.2.0?

@ilayn ilayn commented Dec 15, 2018

@v-iashin No, unfortunately this is still open.

@josephernest josephernest commented Dec 19, 2018

@v-iashin @ilayn After thinking about it for months, I believe the only way to do this while keeping backwards compatibility with the scipy.io.wavfile.read API is a metadata parameter defaulting to False. Example:

sr, x = read('test.wav')  # like it has always been before

sr, x, md = read('test.wav', metadata=True)

with md being a dict:

{'loops': [[..., ...]], 
 'markers': [..., ....], 
 'markerlabels': ['...', '...'], 
 'pitch': 440.0, 
 'bitrate': 24}

Who would like to do this? @perimosocordiae, shall we give it a try?

@endolith endolith commented May 2, 2019

Can this be broken up into smaller PRs to get them through? Since different features require different discussions and maybe some don't require any discussion? (I just want to be able to read markers, for instance.)

@f0k f0k commented Jul 8, 2019

Can this be broken up into smaller PRs to get them through?

+1 to break this up. And +1 for using the MSB for 24-bit samples returned as 32-bit ints, so they don't need to be distinguished from other 32-bit ints further down the line (and will not require adding the meta dictionary right away).

Note that instead of:

    if bit_depth == 24:
        a = numpy.empty((len(data)//3, 4), dtype='u1')
        a[:, 1:4] = data.reshape((-1, 3))
        data = a.view('<i4').reshape(a.shape[:-1])

we can also start reading the samples one byte earlier and use stride tricks:

    if bit_depth == 24:
        # view data as int32 with one byte of overlap between samples
        a = np.lib.stride_tricks.as_strided(
                data[:0].view(np.int32),
                shape=(len(data) // 3,),
                strides=(3,))
        # mask out the LSB
        data = a & np.int32(0xffffff00)

Not sure what's faster. The latter is harder to understand and will require a modification further above so that data[0] is the byte before the first 24-bit sample, to be used as the first throwaway LSB.

@endolith endolith commented Jul 8, 2019

(I modified my version of the wavfile.py repo to read markers and marker regions, by the way.)

@f0k f0k commented Jul 9, 2019

Not sure what's faster.

I was curious. The second option is faster.

In [1]: import numpy as np

In [2]: data = np.random.randint(255, size=int(3e8) + 1, dtype=np.uint8)

In [3]: def unpack_int24a(data):
   ...:     a = np.empty((len(data) // 3, 4), dtype='u1')
   ...:     a[:, 1:4] = data.reshape((-1, 3))
   ...:     return a.view('<i4').reshape(a.shape[:-1])
   ...: 

In [4]: def unpack_int24b(data):
   ...:     a = np.lib.stride_tricks.as_strided(
   ...:         data[:0].view(np.int32),
   ...:         shape=(len(data) // 3,),
   ...:         strides=(3,))
   ...:     return a & np.int32(0xffffff00)
   ...: 

In [5]: np.allclose(unpack_int24a(data[1:]), unpack_int24b(data))
Out[5]: True

In [6]: %timeit a = unpack_int24a(data[1:])
1 loop, best of 3: 561 ms per loop

In [7]: %timeit b = unpack_int24b(data[1:])
1 loop, best of 3: 195 ms per loop

That's for a hypothetical in-memory file of about 17 minutes of stereo at 48 kHz. The second option requires a contiguous input array that starts one byte before the first sample, but that's not a problem for a .wav file. And the second option should be changed to use a dtype with specified endianness.

@WarrenWeckesser WarrenWeckesser commented Jul 9, 2019

FYI: There is a bug in this version (originally proposed by @matthiasha back in #6852 (comment)):

def unpack_int24a(data):
    a = np.empty((len(data) // 3, 4), dtype='u1')
    a[:, 1:4] = data.reshape((-1, 3))
    return a.view('<i4').reshape(a.shape[:-1])

np.empty should be changed to np.zeros.

numpy.empty does not initialize the memory that it allocates, so the values in a are indeterminate. See how the return value of unpack_int24a(data) changes on each call:

In [338]: data                                                                                                                           
Out[338]: 
array([  1,   0,   0, 255, 255, 127, 255, 255, 255,   4,   0,   0],
      dtype=uint8)

In [339]: unpack_int24a(data)                                                                                                            
Out[339]: array([       261, 2147483618,       -256,       1024], dtype=int32)

In [340]: unpack_int24a(data)                                                                                                            
Out[340]: array([       256, 2147483392,       -256,       1024], dtype=int32)

In [341]: unpack_int24a(data)                                                                                                            
Out[341]: array([       366, 2147483508,       -206,       1076], dtype=int32)

In [342]: unpack_int24a(data)                                                                                                            
Out[342]: array([       261, 2147483618,       -256,       1024], dtype=int32)

In [343]: unpack_int24a(data)                                                                                                            
Out[343]: array([       366, 2147483508,       -206,       1076], dtype=int32)
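[Editor's note] With the np.zeros fix applied, the output becomes deterministic; a minimal corrected version (sketch):

```python
import numpy as np

def unpack_int24a_fixed(data):
    # np.zeros (not np.empty) guarantees the padding byte is 0,
    # so the result no longer changes between calls.
    a = np.zeros((len(data) // 3, 4), dtype='u1')
    a[:, 1:4] = data.reshape((-1, 3))
    return a.view('<i4').reshape(a.shape[:-1])

data = np.array([1, 0, 0, 255, 255, 127, 255, 255, 255, 4, 0, 0],
                dtype=np.uint8)
out = unpack_int24a_fixed(data)
# Deterministic now: repeated calls give identical results.
assert (out == unpack_int24a_fixed(data)).all()
assert list(out) == [256, 2147483392, -256, 1024]
```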
@adri123 adri123 commented Oct 26, 2019

Hi, is there any chance of resolving this issue in the next release?
(Thanks for your work!)

@endolith endolith commented Oct 26, 2019

Here's my (messy unfinished) changes for extracting cue regions in case it's helpful to anyone: https://github.com/X-Raym/wavfile.py/compare/master...endolith:metadata_develop?expand=1
