New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: improve dtype check in wavfile.write #18828
Conversation
data.dtype.itemsize == 1)): | ||
allowed_dtypes = ['float32', 'float64', | ||
'uint8', 'int8', 'int16', 'int32', 'int64'] | ||
if data.dtype.name not in allowed_dtypes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case the team wants a regression test I noticed that this fails on main
and passes here:
--- a/scipy/io/tests/test_wavfile.py
+++ b/scipy/io/tests/test_wavfile.py
@@ -414,3 +414,15 @@ def test_write_roundtrip(realfile, mmap, rate, channels, dt_str, tmpdir):
# in PyPy; since the filename gets reused in this test, clean this up
break_cycles()
break_cycles()
+
+
+@pytest.mark.parametrize("dtype", [
+ np.float16,
+ ])
+def test_wavfile_dtype_unsupported(tmpdir, dtype):
+ tmpfile = str(tmpdir.join('temp.wav'))
+ rng = np.random.default_rng(1234)
+ data = rng.random((100, 5)).astype(dtype)
+ rate = 8000
+ with pytest.raises(ValueError, match="Unsupported"):
+ wavfile.write(tmpfile, rate, data)
scipy/io/wavfile.py
Outdated
if not (dkind == 'i' or dkind == 'f' or (dkind == 'u' and | ||
data.dtype.itemsize == 1)): | ||
allowed_dtypes = ['float32', 'float64', | ||
'uint8', 'int8', 'int16', 'int32', 'int64'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know int8
was already allowed in the old check, but I'm wondering about it specifically because in scipy/io/tests/test_wavfile.py
above test_write_roundtrip
I see signed 8-bit integer PCM is not allowed
and a similar statement in the docs indicating that Note that 8-bit PCM is unsigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I don't know the intrinsecs but the docs only show common data types, not all data types.
The second table is extracted from the references provided in the docs:
IBM Corporation and Microsoft Corporation, “Multimedia Programming Interface and Data Specifications 1.0”, section “Data Format of the Samples”, August 1991 http://www.tactilemedia.com/info/MCI_Control_Info.html
I need to recheck but I think I did a quick test and ffprobe was accepting the output for int8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, according to ffprobe info it seems it's processed as a uint8. Fixed at 33b8a51
@JuanFMontesinos could you add the suggested regression test, please? |
Hey @JuanFMontesinos , would you mind re-triggering CI here? Pushing an empty commit will do it. |
Done! |
Okay, the docs build should pass if you merge main into this branch. Then a maintainer can approve the GHA workflows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is happy, the change is simple enough and the regression test is there, so pulling this in. Thanks for your first contribution to SciPy @JuanFMontesinos ! And thanks for the test @tylerjereddy
@lucascolley @tylerjereddy Thank you very much and thanks all the developers for such a nice work with SciPy. |
What does this implement/fix?
scipy.io.wavfile.write
has a wrong data type check. It checks a numpy array to be "float" or "integer" in a generic way. However float16 is not supported by any codec but accepted by the function, leading to a silent error.Likewise, exotic data types such as float128, int128 and so on are neither supported.
Additional information
While it's complicated to find what dtypes are exactly supported, I rely on ffmpeg formats as they are a superset of scipy's ones.
int64 is also supported despite not appearing.
From your references