
WIP: adding 24-bit support to io.wavfile #6852

Open · wants to merge 7 commits (base: master)

7 participants
perimosocordiae (Member) commented Dec 12, 2016

See #6849, based on the code at: https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476

This is a first pass at converting the changes made by @josephernest to a form more likely to get merged. It still needs tests and a review of the API changes, as well as documentation updates. That said, comments are welcome!

result.append(cuelabels)
if return_pitch:
result.append(pitch)
return tuple(result)

pv (Member) Dec 12, 2016

A namedtuple would be better.

perimosocordiae (Member) Dec 12, 2016

I agree. Is the current preference to use a namedtuple or a Bunch for this sort of thing?

josephernest Dec 12, 2016

@pv @perimosocordiae I think this solution would be cleaner: #6852 (comment)

perimosocordiae (Member) Dec 12, 2016

I'm not a big fan of lumping the extra info into a metadata dict. The sample rate is already a kind of metadata that we can't add to the dict, and it adds an extra step for users to get at the information they want.

josephernest Dec 12, 2016

@perimosocordiae There is in fact other metadata that can be read; see https://web.archive.org/web/20141226210234/http://www.sonicspot.com/guide/wavefiles.html.
So if we implement reading a few more metadata fields, it would get annoying: return_metadata1=True, return_metadata2=True, return_metadata3=True, .... A single metadata=True is shorter.

It also solves another problem: the result _metadata dict is only filled when the corresponding chunks are present. And the code stays short and Pythonic:

result = [fs, data]
if metadata:
    result.append(_metadata)
return tuple(result)

(BTW, is it possible to put this URL somewhere in the file as a comment? It is really, really useful. Over years of audio programming, it's the most precise document I have found about the WAV specification.)

@pv what do you think about this API?

josephernest Dec 12, 2016

Added an example of use of the proposed API here.

josephernest commented Dec 12, 2016

After thinking about it, I suggest a cleaner API:

    Parameters
    ----------
    filename : string or open file handle
        Input wav file.
    mmap : bool, optional
        Whether to read data as memory-mapped.
        Only to be used on real files (Default: False).
    metadata : bool, optional
        Whether to return a dictionary containing metadata such as
        loops, cue markers, cue marker labels, pitch, bitrate (Default: False)

    Returns
    -------
    rate : int
        Sample rate of wav file.
    data : numpy array
        Data read from wav file.  Data-type is determined from the file;
        see Notes.
    metadata : dictionary
        Possible keys are 'loops', 'markers', 'markerlabels', 'pitch',
        'bitrate'. Only returned when metadata=True.

def read(filename, mmap=False, metadata=False):
    _metadata = dict()

    ....

    if ... 'smpl':
        ...
        _metadata['pitch'] = ...
        ...
        _metadata['loops'] = ...

    if ... 'cue ':
        ...
        _metadata['markers'] = ...
        ...

    if ... 'labl':
        ...
        _metadata['markerlabels'] = ...
        ...

    if metadata:
        result.append(_metadata)

With this API,

read('test24bit.wav', metadata=True)

should return:

(44100, 
 array([[  ...  ]]), 
 {'loops': [[..., ...]], 
  'markers': [..., ....], 
  'markerlabels': ['...', '...'], 
  'pitch': 440.0, 
  'bitrate': 24})

I think it's far cleaner than the original dirty way I suggested here: https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476.

josephernest commented on scipy/io/wavfile.py in b50e3b3 Dec 12, 2016

No, as far as I remember, I wrote <iI on purpose; probably because <ii didn't work (see https://docs.python.org/2/library/struct.html#format-characters).

perimosocordiae (Member) replied Dec 12, 2016

Oops, good catch!

josephernest commented on scipy/io/tests/test_wavfile.py in 020992a Dec 12, 2016

Why such a syntax for cues? {1: 4410, 2: 8820} would be enough, or probably even [4410, 8820].
And cuelabels = {1: 'Marker1', 2: 'Marker2'} would be fine.

Or do you want this, @perimosocordiae:

cues = {1: {'pos': 4410, 'label': 'Marker1'}, ... }

If we allow nested dicts like this here, we could also use a metadata dict, which would be cleaner, as I suggested at the end of #6852 (comment) ;)

The problem for me is practical use. Let's say we use

cues = {1: {'pos': 4410, 'label': 'Marker1'}, ... }

Example: how do you get a list of markers ordered by time, together with their labels? The code would probably be quite awkward.
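For comparison, extracting time-ordered markers with their labels from the nested-dict layout would look roughly like this (the dict contents here are made up for illustration):

```python
# hypothetical cues in the nested layout under discussion
cues = {1: {'pos': 8820, 'label': 'Marker2'},
        2: {'pos': 4410, 'label': 'Marker1'}}

# markers ordered by time, paired with their labels
ordered = sorted((c['pos'], c['label']) for c in cues.values())
# ordered == [(4410, 'Marker1'), (8820, 'Marker2')]
```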

perimosocordiae (Member) replied Dec 13, 2016

I'm not a big fan of this format either, and I agree that ease of use is an important consideration. I'm leaning toward something like

cues = [(4410, 'Marker1'), (8820, 'Marker2')]

where each tuple is actually a namedtuple with pos and label fields. Mostly, I want to keep the positions and labels matched up, and I don't care much about exposing the internal cue ID numbers.
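A minimal sketch of that shape, using a hypothetical CuePoint namedtuple (the name is illustrative, not part of the PR):

```python
from collections import namedtuple

# hypothetical record pairing a sample position with its label
CuePoint = namedtuple('CuePoint', ['pos', 'label'])

cues = [CuePoint(pos=4410, label='Marker1'),
        CuePoint(pos=8820, label='Marker2')]

# behaves like a plain tuple, but the fields are self-documenting
positions = [c.pos for c in cues]   # [4410, 8820]
```

Code that only expects plain tuples keeps working, while users get named access without having to track the internal cue IDs.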

def write(filename, rate, data):
def write(filename, rate, data, cues=None, loops=None, bitrate=None):

matthiasha Dec 20, 2016

"bit_depth" would be more accurate: "bitrate" usually refers to bits per second, while bit depth is bits per sample per channel.

matthiasha commented Dec 20, 2016

Does read() return an int32 array for 24bit data? In that case read() should IMO also return the bit depth, so that the signal can be converted to a float in the range [-1.0; +1.0] and dBFS can be calculated.
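The conversion matthiasha describes could look roughly like this; a minimal sketch with hypothetical helper names (pcm_to_float and peak_dbfs are not part of the PR):

```python
import numpy as np

def pcm_to_float(x, bit_depth):
    # hypothetical helper: map signed integer PCM samples (LSB-aligned)
    # to floats in [-1.0, 1.0)
    return x.astype(np.float64) / float(2 ** (bit_depth - 1))

def peak_dbfs(x, bit_depth):
    # peak level in dB relative to full scale
    return 20.0 * np.log10(np.max(np.abs(pcm_to_float(x, bit_depth))))
```

With the bit depth returned alongside the samples, a 24-bit file read into an int32 array would be scaled by 2**23 rather than 2**31.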

josephernest commented Dec 20, 2016

pv (Member) commented Dec 20, 2016

josephernest commented Dec 20, 2016

@pv Sorry, I probably chose the wrong word: I'm not talking about 0 dBFS normalization for every sound file, I'm just talking about mapping

  • [-INT32_MAX, INT32_MAX] to [-1.0, 1.0] for a 32-bit sound file

  • [-INT16_MAX, INT16_MAX] to [-1.0, 1.0] for a 16-bit sound file

etc.

I should find a better word than "normalize", which is already used for something else in audio. Any ideas?

larsoner (Member) commented Dec 20, 2016

We have had a discussion about normalization before, but we should probably open a separate issue for it. If it's absolutely necessary for 24-bit support to be useful (?), then we should probably tackle that discussion and PR separately first.

matthiasha commented Dec 24, 2016

I've tested the wavfile.py provided by this PR (https://github.com/scipy/scipy/blob/020992a18b37a9c4991be19861d9871a9f94deb7/scipy/io/wavfile.py). To correctly represent 24-bit data in an int32, we should use the 24 MSBs and have the 8 LSBs equal zero (currently we use the 24 LSBs).

This is because audio signal level in the digital domain is measured relative to full scale, which cannot be reached if we don't use the MSBs.

I've attached 1000Hz_-10dBFS_24bit_48kHz.zip.

This file contains a sine tone with a level of -10dBFS (the level of a sine is measured by its peak value). Now my test:

sr, a = wavfile.read('1000Hz_-10dBFS_24bit_48kHz.wav')
20 * numpy.log10(float(a.max()) / 2**31)
-58.164798546035584

The resulting value should be -10. Because the 8 MSBs are zero, the value is 6.02 dB * 8 = 48.16 dB lower. Note the normalization by 2**31, according to the data type int32.

Fixing this would make the new code fit my use case (mostly measuring signal levels). I also really like the approach proposed in #6852 (comment), returning all metadata as a dict. This dict could be extended step by step without breaking the existing API.

josephernest commented Dec 24, 2016

@matthiasha It seems that your test .wav file is non-standard. Here is the result with good old SoundForge 8 (which I have used for years, and which is able to open nearly any kind of WAV):

matthiasha commented Dec 24, 2016

Sorry, I don't use SoundForge. Please try Adobe Audition or Audacity (which is free).

josephernest commented Dec 24, 2016

matthiasha commented Dec 24, 2016

matthiasha commented Dec 25, 2016

The code in the proposed wavfile.py:

    if bit_depth == 24:
        a = numpy.empty((len(data)//3, 4), dtype='u1')
        a[:, :3] = data.reshape((-1, 3))
        a[:, 3:] = (a[:, 3 - 1:3] >> 7) * 255
        data = a.view('<i4').reshape(a.shape[:-1])

Can be changed to:

    if bit_depth == 24:
        a = numpy.zeros((len(data)//3, 4), dtype='u1')  # byte 0 (the 8 LSBs) must be zero
        a[:, 1:4] = data.reshape((-1, 3))
        data = a.view('<i4').reshape(a.shape[:-1])

Then:

sr, a = wavfile.read('1000Hz_-10dBFS_24bit_48kHz.wav')
assert abs(20 * numpy.log10(float(a.max()) / 2**31) + 10) < 0.1
print(20 * numpy.log10(float(a.max()) / 2**31))
-9.9999992398

josephernest commented Dec 25, 2016

Thanks a lot @matthiasha for the sox commands, I'll try them just after the holidays.

Even without any modification, you can already get the right -10 dB by dividing by 2**23 instead of 2**31 (I think it makes sense to divide by 2**23):

print(20 * numpy.log10(float(a.max()) / 2**23))
-9.99999596545

> To correctly represent 24bit data in an int32, we should use the 24 MSB, and have the 8 LSB equal zero (currently we use 24 LSB).

I thought that representing our 24-bit data as an int32 in [-2**23, 2**23 - 1] makes sense, don't you think?
I'm not sure I understand correctly, but you would prefer to represent 24-bit data as an int32 in {k in [-2**31, 2**31 - 1] such that k = 256 * m} (i.e. with the 8 LSBs equal to zero)?

matthiasha commented Dec 25, 2016

josephernest commented on 9dba3f3 Jan 5, 2017

@perimosocordiae @pv @matthiasha @Eric89GXL I personally think it makes no sense to read 24-bit data MSB-aligned.

If you open a wave file editor and look at a precise sample, you'll find the value 3413053.
With this MSB version of read(...), you would get x[i] = 873741568, which makes no sense at first glance.

It breaks the meaning of what x[i] is: a 24-bit int.

What is the advantage of MSB alignment that justifies this loss of coherence between the actual sample values and the values returned by read(...)?
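The two alignments can be made concrete with a small sketch, using the sample value 3413053 (0x34143D) from the comment above; this is illustration code, not part of the PR:

```python
import numpy as np

# the three little-endian bytes of the 24-bit sample 3413053 (0x34143D)
raw = np.array([0x3D, 0x14, 0x34], dtype=np.uint8)

# LSB alignment: sign-extend into the low 24 bits of an int32
a = np.zeros((1, 4), dtype=np.uint8)
a[:, :3] = raw.reshape(-1, 3)
a[:, 3] = (a[:, 2] >> 7) * 255       # 0x00 for positive, 0xFF for negative
lsb = int(a.view('<i4').ravel()[0])  # 3413053

# MSB alignment: shift into the high 24 bits, low byte stays zero
b = np.zeros((1, 4), dtype=np.uint8)
b[:, 1:4] = raw.reshape(-1, 3)
msb = int(b.view('<i4').ravel()[0])  # 873741568 == 3413053 * 256
```

Either way, the two representations differ only by a shift of 8 bits (msb == lsb << 8).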

perimosocordiae (Member) replied Jan 5, 2017

I personally don't have a preference, so I'll defer to whatever the consensus settles on.

perimosocordiae (Member) commented Jan 5, 2017

@matthiasha I added the dictionary of metadata approach. Let me know what you think.

matthiasha commented Jan 22, 2017

We don't store arbitrary numerical data here, but audio samples. What IMO is important to keep consistent is the level (dB) of the signal, independent of the bit depth.

@josephernest: the data is not 24-bit anymore, but 32-bit. If you want the 24-bit equivalent, you can right-shift it: 873741568 >> 8 = 3413053. If you use your wave file editor and change the bit depth from 24 to 32 bits, the sample value will change from 3413053 to 873741568.

I haven't had a look at the write function yet. What I think it should do:

  • numpy datatype has more bits than the output bit depth: right-shift the data (keep the MSBs, truncate the LSBs)
  • numpy datatype has fewer bits than the output bit depth: left-shift the data

If we go the other way and store 24-bit data in the LSBs of an int32, the write function in contrast needs to write the LSBs. With that approach, if we have real 32-bit data and choose to write it as 24-bit, the 8 MSBs get truncated, which destroys the audio.
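The shifting rule described above could be sketched like this (align_for_write is a hypothetical helper name, assuming the MSB-aligned convention; not PR code):

```python
import numpy as np

def align_for_write(data, out_bit_depth):
    # hypothetical sketch: shift between the numpy dtype's bit width
    # and the output bit depth, assuming MSB-aligned samples
    in_bits = data.dtype.itemsize * 8
    shift = in_bits - out_bit_depth
    if shift > 0:
        return data >> shift                     # wider dtype: keep MSBs, drop LSBs
    if shift < 0:
        return data.astype(np.int64) << -shift   # narrower dtype: pad LSBs with zeros
    return data
```

For example, MSB-aligned int32 data written as 24-bit is right-shifted by 8, while int16 data written as 24-bit is left-shifted by 8.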

pv (Member) commented Jan 22, 2017

X-Raym commented Jun 8, 2018

I don't know what the state of this pull request is, but I made a few enhancements to @josephernest's code; there were some bugs (as far as I can tell from his last gist, https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476).

Here is my updated version if you are interested: https://github.com/X-Raym/wavfile.py

Cheers!

X-Raym commented Jun 10, 2018

Note: my version now also preserves unrecognized chunks, so that they can be rewritten to a new file without losing any metadata (bext, LIST INFO, etc.).
