Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: improve dtype check in wavfile.write #18828

Merged
merged 6 commits into from Feb 29, 2024
Merged

Conversation

JuanFMontesinos
Copy link
Contributor

@JuanFMontesinos JuanFMontesinos commented Jul 5, 2023

What does this implement/fix?

scipy.io.wavfile.write has a wrong data type check. It checks a numpy array to be "float" or "integer" in a generic way. However float16 is not supported by any codec but accepted by the function, leading to a silent error.
Likewise, exotic data types such as float128, int128 and so on are neither supported.

Additional information

While it's complicated to find what dtypes are exactly supported, I rely on ffmpeg formats as they are a superset of scipy's ones.
int64 is also supported despite not appearing.
image
From your references
image

@tupui tupui added the enhancement A new feature or improvement label Jul 5, 2023
@tupui tupui changed the title BUG: wrong dtype check in wavfile.write ENH: improve dtype check in wavfile.write Jul 5, 2023
data.dtype.itemsize == 1)):
allowed_dtypes = ['float32', 'float64',
'uint8', 'int8', 'int16', 'int32', 'int64']
if data.dtype.name not in allowed_dtypes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the team wants a regression test I noticed that this fails on main and passes here:

--- a/scipy/io/tests/test_wavfile.py
+++ b/scipy/io/tests/test_wavfile.py
@@ -414,3 +414,15 @@ def test_write_roundtrip(realfile, mmap, rate, channels, dt_str, tmpdir):
         # in PyPy; since the filename gets reused in this test, clean this up
         break_cycles()
         break_cycles()
+
+
+@pytest.mark.parametrize("dtype", [
+    np.float16,
+    ])
+def test_wavfile_dtype_unsupported(tmpdir, dtype):
+    tmpfile = str(tmpdir.join('temp.wav'))
+    rng = np.random.default_rng(1234)
+    data = rng.random((100, 5)).astype(dtype)
+    rate = 8000
+    with pytest.raises(ValueError, match="Unsupported"):
+        wavfile.write(tmpfile, rate, data)

if not (dkind == 'i' or dkind == 'f' or (dkind == 'u' and
data.dtype.itemsize == 1)):
allowed_dtypes = ['float32', 'float64',
'uint8', 'int8', 'int16', 'int32', 'int64']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know int8 was already allowed in the old check, but I'm wondering about it specifically because in scipy/io/tests/test_wavfile.py above test_write_roundtrip I see signed 8-bit integer PCM is not allowed and a similar statement in the docs indicating that Note that 8-bit PCM is unsigned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I don't know the intrinsecs but the docs only show common data types, not all data types.
The second table is extracted from the references provided in the docs:
IBM Corporation and Microsoft Corporation, “Multimedia Programming Interface and Data Specifications 1.0”, section “Data Format of the Samples”, August 1991 http://www.tactilemedia.com/info/MCI_Control_Info.html

I need to recheck but I think I did a quick test and ffprobe was accepting the output for int8.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
You are right, according to ffprobe info it seems it's processed as a uint8. Fixed at 33b8a51

@j-bowhay
Copy link
Member

@JuanFMontesinos could you add the suggested regression test, please?

@JuanFMontesinos
Copy link
Contributor Author

@j-bowhay

Done at 37f0d17! Although tests are failing due to an import error out of my control.

@lucascolley
Copy link
Member

Hey @JuanFMontesinos , would you mind re-triggering CI here? Pushing an empty commit will do it.

@JuanFMontesinos
Copy link
Contributor Author

Hey @JuanFMontesinos , would you mind re-triggering CI here? Pushing an empty commit will do it.

Done!

@lucascolley
Copy link
Member

Okay, the docs build should pass if you merge main into this branch. Then a maintainer can approve the GHA workflows.

@lucascolley lucascolley self-requested a review January 19, 2024 19:24
@lucascolley lucascolley added this to the 1.13.0 milestone Feb 29, 2024
Copy link
Member

@lucascolley lucascolley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is happy, the change is simple enough and the regression test is there, so pulling this in. Thanks for your first contribution to SciPy @JuanFMontesinos ! And thanks for the test @tylerjereddy

@lucascolley lucascolley merged commit b815895 into scipy:main Feb 29, 2024
27 checks passed
@JuanFMontesinos
Copy link
Contributor Author

@lucascolley @tylerjereddy Thank you very much and thanks all the developers for such a nice work with SciPy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.io
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants