WIP: adding 24-bit support to io.wavfile #6852
scipy/io/wavfile.py
Outdated
        result.append(cuelabels)
    if return_pitch:
        result.append(pitch)
    return tuple(result)
Namedtuple would be better
I agree. Is the current preference to use a namedtuple or a Bunch for this sort of thing?
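For illustration, here is a minimal sketch of what a namedtuple return value could look like. The type name `WavData`, its fields, and the helper `build_result` are hypothetical and not part of `scipy.io.wavfile`; the point is only that a namedtuple still unpacks positionally while giving each optional item a name.

```python
from collections import namedtuple

# Hypothetical record type; the name and fields are assumptions for this sketch.
WavData = namedtuple("WavData", ["rate", "data", "cues", "pitch"])


def build_result(rate, data, cues=None, pitch=None):
    """Bundle the values a reader might return into one named record."""
    return WavData(rate=rate, data=data, cues=cues, pitch=pitch)


result = build_result(44100, [0, 1, -1], cues=[100, 200])

# Existing callers that unpack the first two items keep working...
rate, data = result[:2]

# ...while new callers can access the extras by name instead of by position.
cues = result.cues
```

One trade-off is that the tuple then always has the same length, so callers written as `rate, data = read(...)` would have to slice, as above.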
@pv @perimosocordiae I think this solution would be cleaner: #6852 (comment)
I'm not a big fan of lumping the extra info into a metadata dict. The sample rate is already a kind of metadata that we can't add to the dict, and it adds an extra step for users to get at the information they want.
@perimosocordiae There is in fact other metadata that can be read. See https://web.archive.org/web/20141226210234/http://www.sonicspot.com/guide/wavefiles.html.
If we add support for reading a few more kinds of metadata, the call becomes unwieldy: `return_metadata1=True, return_metadata2=True, return_metadata3=True, ...`. A single `metadata=True` is shorter.
It also solves another problem: the resulting `_metadata` dict is simply filled with whichever chunks are available.
And the code is short and Pythonic:
    result = [fs, data]
    if metadata:
        result.append(_metadata)
    return tuple(result)
(BTW, is it possible to put this URL somewhere in the file as a comment? It is really, really, really useful. In years of audio programming, it's the most precise document I have found about the WAV specification.)
Added an example of use of the proposed API here.
After thinking about it, I suggest a cleaner API:
With this API,
should return:
I think it's far cleaner than the original dirty way I suggested here https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476. |
scipy/io/wavfile.py
Outdated
-def write(filename, rate, data):
+def write(filename, rate, data, cues=None, loops=None, bitrate=None):
"bit_depth" would be more accurate. Bitrate usually refers to bits per second, while bit depth is bits per sample and channel.
Does read() return an int32 array for 24bit data? In that case read() should IMO also return the bit depth, so that the signal can be converted to a float in the range [-1.0; +1.0] and dBFS can be calculated. |
@matthiasha
I suggest adding a parameter
def read(...., normalized=False):
When normalized is True, it would return floats in [-1; +1] regardless of the
bit depth.
This is very useful in practice, because it allows loading many
different files with read(...) without having to care about low-level
questions like "should I divide by 2**23 or 2**31 or 2**15?".
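A minimal sketch of the mapping described above, assuming signed integer PCM and a known bit depth. The helper name `to_unit_float` and its `bit_depth` argument are hypothetical, not an existing `scipy.io.wavfile` API, and unsigned 8-bit WAV data would need an extra offset that is not handled here.

```python
import numpy as np


def to_unit_float(samples, bit_depth):
    """Map signed integer PCM samples to floats in [-1.0, 1.0).

    Assumes the integers hold the raw sample values, i.e. a 24-bit
    sample stored in an int32 is NOT shifted into the upper bits.
    """
    full_scale = 2.0 ** (bit_depth - 1)
    return np.asarray(samples, dtype=np.float64) / full_scale


# The same call covers different bit depths without the caller
# choosing between 2**15, 2**23 or 2**31 by hand.
int16_range = to_unit_float(np.array([-2**15, 2**15 - 1], dtype=np.int16), 16)
int24_range = to_unit_float(np.array([-2**23, 2**23 - 1], dtype=np.int32), 24)
```

Note that the dtype alone is not enough to pick the divisor when 24-bit samples arrive in an int32 array, which is why the sketch takes the bit depth explicitly.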
|
normalization is a somewhat separate question, and probably best
discussed in a separate issue.
|
@pv Sorry I probably chose a wrong word: I'm not speaking about 0dBfs normalization for every sound file, I'm just speaking about mapping
etc. I should find a better word than "normalize" that is already used for something else in audio. Any idea? |
We had a discussion about normalization before, but we should probably open a separate issue for it. If it's absolutely necessary for 24-bit to be useful (?) then we should probably tackle that discussion and PR separately first. |
I've tested the wavfile.py provided by this PR (https://github.com/scipy/scipy/blob/020992a18b37a9c4991be19861d9871a9f94deb7/scipy/io/wavfile.py). To correctly represent 24bit data in an int32, we should use the 24 MSB, and have the 8 LSB equal zero (currently we use 24 LSB). This is because audio signal level in the digital domain is measured relative to full scale, which cannot be reached if we don't use the MSBs. I've attached 1000Hz_-10dBFS_24bit_48kHz.zip. This file contains a sine tone with a level of -10dBFS (the level of a sine is measured by its peak value). Now my test:
The resulting value should be -10. Because the 8 MSB are zero, the value is 6.02 dB * 8 = 48.16 dB lower. Note the normalization by 2**31, matching the int32 data type. With this issue fixed, the output of the new code would fit my use case (mostly measuring signal levels). I also really like the approach proposed in #6852 (comment), returning all metadata as a dict. This dict could be extended step by step without breaking the existing API. |
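The 48.16 dB figure can be checked with a few lines of arithmetic. This is a sketch using plain Python integers rather than the attached file: `peak_24` is an assumed peak value for a -10 dBFS sine at 24 bits, and `dbfs` is a helper defined only for this example.

```python
import math


def dbfs(peak, full_scale):
    """Level of a peak value relative to full scale, in dB."""
    return 20 * math.log10(peak / full_scale)


# Assumed peak of a -10 dBFS sine, expressed as a 24-bit integer.
peak_24 = round(10 ** (-10 / 20) * 2 ** 23)

lsb_aligned = peak_24        # 24-bit value kept in the low 24 bits of an int32
msb_aligned = peak_24 << 8   # same value moved to the top 24 bits, low 8 bits zero

level_lsb = dbfs(lsb_aligned, 2 ** 31)   # about -58.16 dB
level_msb = dbfs(msb_aligned, 2 ** 31)   # about -10 dB
offset = level_msb - level_lsb           # 20 * log10(2**8), about 48.16 dB
```

Dividing the LSB-aligned value by 2**23 instead of 2**31 gives the same -10 dB, but only if the caller knows the original bit depth.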
@matthiasha It seems that your test .wav file is non-standard. Here is the result with good old SoundForge 8 (which I have used for years and which can open nearly any kind of WAV): |
Sorry, I don't use SoundForge. Please try Adobe Audition or Audacity (which is free). |
From a music producer's point of view, SoundForge has been the reference for
audio in many studios since the 90s, long before Audacity even existed, and
long before Adobe bought Cool Edit Pro and rebranded it as "Audition".
My point is that if the good old Swiss-army-knife SoundForge 8.0
cannot open this WAV file, the file probably doesn't respect the early
WAV standard (
https://web.archive.org/web/20141226210234/http://www.sonicspot.com/guide/wavefiles.html),
and I wouldn't use it as a test file. (This software can open any 8-bit,
16-bit, 24-bit, 32-bit, or 32-bit IEEE float WAV, big or little endian,
and all sorts of strange sound files.)
PS: you're right, I opened it with Audacity and Audition and it works, but
the fact that SF cannot open it is a red flag (at least for me).
PS2: can you give a few details about how you produced the file?
|
I used sox (linux command line tool). Probably the file could be saved in
Audition and then opened in soundforge? If that works, please upload the
new version. You could also cut it, for this test a few milliseconds of
mono audio would actually be sufficient.
UPDATE: That's how I created it:
```
sox -D -r 48000 -b 24 -n left.wav synth 10 sine 1000.0 gain -10
sox -D -r 48000 -b 24 -n right.wav synth 10 sine 1000.0 gain -10
sox -D left.wav right.wav --combine merge 1000Hz_-10dBFS_24bit_48kHz.wav
```
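For a shorter test signal, a comparable tone can be sketched directly in NumPy. This is an approximation under stated assumptions, not a reproduction of sox's output: the scaling to `2**23 - 1` and the 10 ms duration are choices made for this example, and the samples are kept as raw 24-bit values in an int32 array.

```python
import numpy as np

rate = 48000                    # sample rate in Hz, as in the sox commands
duration = 0.01                 # 10 ms is enough to contain several full periods
freq = 1000.0                   # tone frequency in Hz
amplitude = 10 ** (-10 / 20)    # -10 dBFS peak as a linear factor

t = np.arange(int(rate * duration)) / rate
mono = amplitude * np.sin(2 * np.pi * freq * t)

# Quantize to 24-bit integers (assumed scaling), stored LSB-aligned in int32.
samples_24 = np.round(mono * (2 ** 23 - 1)).astype(np.int32)

# Duplicate the channel to mimic the left/right merge.
stereo = np.column_stack([samples_24, samples_24])

# Peak level relative to 24-bit full scale.
peak_dbfs = 20 * np.log10(np.abs(samples_24).max() / 2 ** 23)
```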
|
The code in the proposed
Can be changed to:
Then:
|
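Thanks a lot @matthiasha for the sox commands, I'll try it just after the holidays. Even without any modification, you can already get the right -10 dB by using / 2**23 instead of / 2**31 (I think it makes sense to divide by 2**23):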
I thought that having our 24 bit data represented as an int32 in |
Right, but I can use 2**23 only if I have information about the bit depth.
I'm in favour of
* The metadata-dict approach proposed above, which would also give the bit
depth
* Offering normalization to float (-1;+1), defaulting to false as has been
proposed before
I think the integer representation of 24bit PCM in the upper bits of int32
would be more appropriate. Consider writing it as int32 data back to disk:
with the current (LSB) approach, the file becomes 48 dB too quiet.
josephernest wrote on Sun, 25 Dec 2016, 12:08:
> Thanks a lot @matthiasha for the sox commands, I'll try it just after holidays.
> Even without any modification, you can already have the right -10 dB by
> using / 2**23 instead of / 2**31 (I think it makes sense to divide by 2**23):
> print(20 * numpy.log10(float(a.max()) / 2**23))
> -9.99999596545
|
@matthiasha I added the dictionary of metadata approach. Let me know what you think. |
We don't store numerical data here, but audio samples. What IMO is important to have consistent is the level (dB) of the signal, independent of the bit depth. @josephernest: the data is not 24bit anymore, but 32bit. If you want to get the 24-bit equivalent, you can right-shift it: I haven't had a look at the
If we go the other way and store 24bit in the LSB of |
I would suggest looking at what other (non-Python) audio file libraries
do with this.
|
I don't know the state of this pull request, but I made a few enhancements to @josephernest's code, which had some bugs (as far as I can see from his last gist https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476). Here is my updated version if you are interested: https://github.com/X-Raym/wavfile.py Cheers! |
Note: my version now also preserves otherwise unsupported chunks, so that they can be rewritten to a new file without losing any metadata (bext, LIST INFO, etc.). |
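As a rough sketch of the idea of carrying unknown chunks through, the following walks the top-level chunks of a RIFF/WAVE stream and keeps the raw payload of any chunk it does not interpret. It is not the code from the linked repository; `iter_chunks`, `make_chunk`, and the `KNOWN` set are names invented for this example, and the `bext` payload is a dummy.

```python
import io
import struct


def iter_chunks(f):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF/WAVE stream."""
    riff, _size, wave = struct.unpack('<4sI4s', f.read(12))
    if riff != b'RIFF' or wave != b'WAVE':
        raise ValueError('not a RIFF/WAVE stream')
    while True:
        header = f.read(8)
        if len(header) < 8:
            break
        chunk_id, chunk_size = struct.unpack('<4sI', header)
        payload = f.read(chunk_size)
        if chunk_size % 2:          # chunks are padded to an even length
            f.read(1)
        yield chunk_id, payload


def make_chunk(chunk_id, payload):
    """Serialize one chunk, adding the pad byte for odd-sized payloads."""
    pad = b'\x00' if len(payload) % 2 else b''
    return chunk_id + struct.pack('<I', len(payload)) + payload + pad


# Build a tiny in-memory file: fmt (PCM, mono, 48 kHz, 24-bit), a dummy
# 'bext' chunk, and two silent 24-bit samples.
fmt = struct.pack('<HHIIHH', 1, 1, 48000, 48000 * 3, 3, 24)
body = (make_chunk(b'fmt ', fmt)
        + make_chunk(b'bext', b'abc')
        + make_chunk(b'data', b'\x00' * 6))
wav = b'RIFF' + struct.pack('<I', 4 + len(body)) + b'WAVE' + body

KNOWN = {b'fmt ', b'data'}          # chunks this sketch would interpret
chunk_ids = [cid for cid, _ in iter_chunks(io.BytesIO(wav))]
unknown = [(cid, p) for cid, p in iter_chunks(io.BytesIO(wav)) if cid not in KNOWN]

# Re-serializing the untouched payloads reproduces the original bytes.
rewritten = b''.join(make_chunk(cid, p) for cid, p in unknown)
```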
Is it going to be released with 1.2.0? |
@v-iashin No, unfortunately this is still open. |
@v-iashin @ilayn After thinking about it for months, I think the only way to do it, if we want to keep backwards compatibility with the
with
Who would like to do this? @perimosocordiae we could try to do it? |
Can this be broken up into smaller PRs to get them through? Since different features require different discussions and maybe some don't require any discussion? (I just want to be able to read markers, for instance.) |
+1 to break this up. And +1 for using the MSB for 24-bit samples returned as 32-bit ints, so they don't need to be distinguished from other 32-bit ints further down the line (and will not require adding the Note that instead of:
we can also start reading the samples one byte earlier and use stride tricks:
Not sure what's faster. The latter is harder to understand and will require a modification further above so that data[0] is the byte before the first 24-bit sample, to be used as the first throwaway LSB. |
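The two approaches can be sketched as follows. This is not the code from the original comment, only one possible reading of it: `decode_pad` copies each 3-byte sample into a 4-byte slot, while `decode_strided` builds an overlapping little-endian int32 view that starts one byte early and then clears the borrowed low byte. Both function names are hypothetical, and little-endian 24-bit PCM is assumed.

```python
import numpy as np


def decode_pad(raw):
    """Option 1: copy each 3-byte sample into a 4-byte slot with a zero LSB."""
    triples = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3)
    padded = np.zeros((triples.shape[0], 4), dtype=np.uint8)
    padded[:, 1:] = triples            # byte 0 stays zero: the new LSB
    return padded.view('<i4').ravel()


def decode_strided(buf, offset, n):
    """Option 2: overlapping int32 view starting one byte before the samples."""
    if offset < 1:
        raise ValueError("need one readable byte before the first sample")
    view = np.ndarray(shape=(n,), dtype='<i4', buffer=buf,
                      offset=offset - 1, strides=(3,))
    return view & np.int32(-256)       # clear the borrowed low byte


samples = [0, 1, -1, 2**23 - 1, -2**23]
raw = b''.join((s & 0xFFFFFF).to_bytes(3, 'little') for s in samples)
buf = b'\x7f' + raw                    # one stand-in header byte before the data

msb_pad = decode_pad(raw)
msb_strided = decode_strided(buf, 1, len(samples))

# An arithmetic right shift recovers the original 24-bit values.
recovered = (msb_pad >> 8).tolist()
```

In a real .wav file the byte before the first sample belongs to the `data` chunk header, so the strided view never reads outside the file.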
(I modified my version of the wavfile.py repo to read markers and marker regions, by the way.) |
I was curious. The second option is faster.
That's for a hypothetical in-memory file of about 17 minutes of stereo audio at 48 kHz. The second option requires a contiguous input array that starts one byte before the first sample, but that's not a problem for a .wav file. The second option should also be changed to use a dtype with explicit endianness. |
FYI: There is a bug in this version (originally proposed by @matthiasha back in #6852 (comment)):
|
Hi, is there any chance of resolving this issue in the next release? |
Here's my (messy unfinished) changes for extracting cue regions in case it's helpful to anyone: https://github.com/X-Raym/wavfile.py/compare/master...endolith:metadata_develop?expand=1 |
See #6849, based on the code at: https://gist.github.com/josephernest/3f22c5ed5dabf1815f16efa8fa53d476
This is a first pass at converting the changes made by @josephernest to a form more likely to get merged. It still needs tests and a review of the API changes, as well as documentation updates. That said, comments are welcome!