
Read wavfiles of size > 4GB #8529

Closed · wants to merge 6 commits

Conversation

TimFelixBeyer
Contributor

Adds the ability to open a .wav file in the 'RF64' format, which is used for files with a size > 4 GB. I'm not sure how to write a unit test for this, since it would require uploading a huge .wav file.
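For context, an RF64 file keeps the usual RIFF/WAVE layout but sets the 32-bit size field to 0xFFFFFFFF and stores the real 64-bit sizes in a 'ds64' chunk directly after the 'WAVE' identifier. A minimal sketch of reading those sizes (the helper name and error handling are illustrative, not the code in this PR):

```python
import struct

def _read_rf64_sizes(fid):
    # Hypothetical helper, not scipy.io.wavfile code: recover the 64-bit
    # sizes of an RF64 WAV file from its 'ds64' chunk (EBU Tech 3306).
    chunk_id = fid.read(4)          # b'RF64' instead of b'RIFF'
    fid.read(4)                     # 32-bit size field, set to 0xFFFFFFFF
    fid.read(4)                     # b'WAVE'
    ds64_id = fid.read(4)           # b'ds64' must follow immediately
    if chunk_id != b'RF64' or ds64_id != b'ds64':
        raise ValueError("not an RF64 file")
    ds64_size = struct.unpack('<I', fid.read(4))[0]
    ds64 = fid.read(ds64_size)
    riff_size, data_size, sample_count = struct.unpack('<QQQ', ds64[:24])
    return riff_size, data_size, sample_count
```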

@larsoner
Member

larsoner commented Mar 7, 2018

Would it be a lot of extra work to add the equivalent writer? If not, then a round-trip test could work.

@pv
Member

pv commented Mar 7, 2018 via email

@TimFelixBeyer
Contributor Author

I've added the corresponding writer. However, I'm still not sure how to go about implementing the test: the format only really makes sense for data > 4 GB, so the writer should only use it when the data is too large for a plain RIFF container.

@rgommers added the enhancement (a new feature or improvement) and scipy.io labels on Mar 17, 2018
@larsoner
Member

> the format only really makes sense for data > 4 GB, so the writer should only use it when the data is too large for a plain RIFF container.

Although this might be true in practice, can you opt in to using the RF64 format even for files < 4 GB (even if just for the purpose of testing round-trip IO)?

@TimFelixBeyer
Contributor Author

According to the official spec, the standard file-size fields should be used if the file is smaller than 4 GB: https://tech.ebu.ch/docs/tech/tech3306-2009.pdf
It'd be possible to force the RF64 format for a smaller file, but then a large part of the changes (writing/reading the new 64-bit size fields) wouldn't be tested.
Alternatively, one could disregard the spec and write the file size into the 64-bit fields even if it's < 4 GB. Let me know which of these options is best and I'll try to implement it.
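As an illustration of the switch the spec implies (the helper name is hypothetical, not this PR's implementation), the writer would stay with plain RIFF whenever the total size still fits in the 32-bit field:

```python
_RIFF_MAX = 0xFFFFFFFF  # largest value a 32-bit RIFF size field can hold

def _container_for(total_nbytes):
    # Hypothetical sketch of the spec-compliant choice (EBU Tech 3306):
    # plain RIFF while the size fits in 32 bits, RF64 only beyond that.
    return b'RF64' if total_nbytes > _RIFF_MAX else b'RIFF'
```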

@larsoner
Member

> According to the official spec, the standard file-size fields should be used if the file is smaller than 4 GB

This makes sense, and seems like reasonable behavior for SciPy to follow.

> Alternatively, one could disregard the spec and write the file size into the 64-bit fields even if it's < 4 GB.

This is what I propose to do for testing purposes only, yes, especially if there is some way to do it that does not expose the option to the user (since it's non-standard). Maybe a private function that does the heavy lifting could be used for this purpose.
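A sketch of what such a private entry point could look like (the name, chunk-layout details, and the force_rf64 knob are assumptions for illustration, not this PR's code):

```python
import struct

def _write_header(fid, total_size, data_size, force_rf64=False):
    # Hypothetical private helper: the public writer would always call it
    # with force_rf64=False; tests could call it directly with
    # force_rf64=True to exercise the 64-bit path on a tiny file.
    if force_rf64 or total_size > 0xFFFFFFFF:
        fid.write(b'RF64')
        fid.write(struct.pack('<I', 0xFFFFFFFF))   # 32-bit field becomes a sentinel
        fid.write(b'WAVE')
        fid.write(b'ds64')
        fid.write(struct.pack('<I', 28))           # ds64 chunk size
        # riffSize, dataSize, sampleCount (left 0 in this sketch), table length 0
        fid.write(struct.pack('<QQQI', total_size, data_size, 0, 0))
    else:
        fid.write(b'RIFF')
        fid.write(struct.pack('<I', total_size))
        fid.write(b'WAVE')
```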

@777arc

777arc commented Dec 2, 2023

This would be a valuable addition!!

@lucascolley
Member

> > Alternatively, one could disregard the spec and write the file size into the 64-bit fields even if it's < 4 GB.
>
> This is what I propose to do for testing purposes only, yes, especially if there is some way to do it that does not expose the option to the user (since it's non-standard). Maybe a private function that does the heavy lifting could be used for this purpose.

@TimFelixBeyer are you interested in returning to do this? Sounds like it's all that is needed to test this properly and get this in. Judging by the recent comment, there is still some demand for this feature.

@lucascolley added the needs-work (items that are pending response from the author) label on Jan 16, 2024
@TimFelixBeyer
Contributor Author

Sure, I’m currently busy but can take another look in about 3 weeks if that’s ok.

@TimFelixBeyer
Contributor Author

@lucascolley I'd like some input regarding testing.
In addition to testing with small modified RF64 files, I have a working test on my local machine that performs a round-trip write of a large (> 4 GB) array, but it takes quite a while and requires a large amount of memory to complete.
However, I think a round-trip test would be valuable in principle.
Are there any guidelines regarding how long test cases are allowed to take?

@lucascolley
Member

We have a marker for extremely slow tests, which can be applied with pytest.mark.xslow, to avoid running them on every CI job. How about you add the test with that marker, and then we can see how long it takes?
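Something along these lines, perhaps (a sketch only: it assumes the reader/writer from this PR and the xslow marker mentioned above, and the array size is chosen just large enough to push the writer past the 32-bit limit):

```python
import numpy as np
import pytest

from scipy.io import wavfile


@pytest.mark.xslow
def test_rf64_roundtrip(tmp_path):
    # Write > 4 GiB of samples so the writer must take the RF64 path,
    # then read the file back (memory-mapped) and compare a slice.
    fname = str(tmp_path / "big.wav")
    n = 2**31 + 2**20                      # int16 samples -> a bit over 4 GiB
    data = np.zeros(n, dtype=np.int16)
    data[:1000] = np.arange(1000, dtype=np.int16)
    wavfile.write(fname, 44100, data)
    rate, read_back = wavfile.read(fname, mmap=True)
    assert rate == 44100
    np.testing.assert_array_equal(read_back[:1000], data[:1000])
```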

@TimFelixBeyer
Contributor Author

Opened #20079 to avoid having to rebase, etc. @lucascolley, feel free to check out the new PR.
