BUG: scipy.io.wavfile.read stays in infinite loop, warns on wav files converted from Sphere #6700

Closed
ghost opened this issue Oct 21, 2016 · 8 comments
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.io
Milestone

Comments

@ghost

ghost commented Oct 21, 2016

I converted some Sphere audio files using sph2pipe_v2.5, and some of the resulting wave files cannot be read using scipy.io.wavfile.

It shows this warning:
wavfile.py:267: WavFileWarning: Chunk (non-data) not understood, skipping it. WavFileWarning)

and keeps reading in an infinite loop which never ends.

Here is the wav file causing the problem.
problem.wav.tar.gz

Other libraries (such as wavio) are able to read this file without any problem.
I could also open the wav file and access its content in Sonic Visualiser with no issue.

My python version:
Python 2.7.12 (default, Jul 1 2016, 15:12:24)
My scipy version:
scipy.__version__ '0.18.0'
My OS:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial

@ghost ghost changed the title scipy.io.wavfile.read warns on Sphear wav files scipy.io.wavfile.read warns on Sphere wav files Oct 21, 2016
@ghost ghost changed the title scipy.io.wavfile.read warns on Sphere wav files scipy.io.wavfile.read warns on wav files converted from Sphere Oct 21, 2016
@ghost ghost changed the title scipy.io.wavfile.read warns on wav files converted from Sphere scipy.io.wavfile.read stays in infinite loop, warns on wav files converted from Sphere Oct 21, 2016
@ghost ghost changed the title scipy.io.wavfile.read stays in infinite loop, warns on wav files converted from Sphere [BUG?!] scipy.io.wavfile.read stays in infinite loop, warns on wav files converted from Sphere Oct 21, 2016
@WarrenWeckesser WarrenWeckesser added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.io labels Oct 21, 2016
@WarrenWeckesser
Member

WarrenWeckesser commented Oct 21, 2016

Thanks for reporting the problem and providing the test file. I can confirm that the problem also occurs with the latest version of scipy from github (dfaca96), using Python 3.5.2 with numpy 1.11.1 on a Mac running OS X 10.9.5.

@WarrenWeckesser WarrenWeckesser changed the title [BUG?!] scipy.io.wavfile.read stays in infinite loop, warns on wav files converted from Sphere BUG: scipy.io.wavfile.read stays in infinite loop, warns on wav files converted from Sphere Oct 21, 2016
@ghost
Author

ghost commented Oct 22, 2016

Good to see it has been confirmed on other versions and operating systems as well. I hope a fix for this bug can be found.

mp4096 added a commit to mp4096/scipy that referenced this issue Oct 23, 2016
If the RIFF chunk contains false information about the file size,
`scipy.io.wavfile.read()` gets stuck in an infinite loop as it tries
to read beyond the EOF.

Notice that the `mmap=True` call is still broken if the `data` chunk
has wrong size specification as well.

See scipy#6700
@mp4096
Contributor

mp4096 commented Oct 23, 2016

The problem seems to be that problem.wav has wrong size information in its RIFF and data chunks. The declared size is larger than the actual file size, so scipy.io.wavfile.read() tries to read beyond the EOF. Hence the while condition stays True and the loop never terminates.
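
To illustrate the failure mode described above, here is a minimal sketch. It assumes a loop that compares the file position with a declared size; the loop shape and the names declared_size and max_iterations are illustrative and are not SciPy's actual code. Once EOF is reached, read() returns an empty bytes object and the position stops advancing, so the condition never becomes false.

```python
import io

# Illustrative stand-in for a truncated file: 16 bytes of real content,
# while the header (hypothetically) declares 32 bytes.
fid = io.BytesIO(b"\x00" * 16)
declared_size = 32

iterations = 0
max_iterations = 1000  # safety cap so that this demo terminates
while fid.tell() < declared_size and iterations < max_iterations:
    chunk_id = fid.read(4)  # returns b"" once EOF has been reached
    iterations += 1

# The position is pinned at the real end of the data and further reads are empty.
stuck_at_eof = fid.tell() == 16 and fid.read(4) == b""
print(iterations, fid.tell(), stuck_at_eof)
```

Without the cap, the loop would spin forever at position 16, which matches the behaviour reported in this issue.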

You can run this small program:

from scipy.io.wavfile import _read_riff_chunk
from os.path import getsize

filename = "problem.wav"
with open(filename, 'rb') as f:
    riff_size, _ = _read_riff_chunk(f)

print('RIFF size: {}'.format(riff_size))
print('os size:   {}'.format(getsize(filename)))

It outputs

RIFF size: 5882284
os size:   4464640
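
The same comparison can probably be made without the private helper. The sketch below is an assumption-laden alternative, not part of SciPy: riff_sizes is a hypothetical function that parses the 12-byte RIFF header with struct and measures the real size by seeking to the end, so it also works for seekable file-like objects that have no filename.

```python
import io
import struct


def riff_sizes(fid):
    """Return (declared_size, actual_size) for a seekable binary WAV stream."""
    fid.seek(0)
    header = fid.read(12)
    if (len(header) < 12
            or header[:4] not in (b"RIFF", b"RIFX")
            or header[8:12] != b"WAVE"):
        raise ValueError("Not a WAV file.")
    # RIFX files store integers big-endian, RIFF files little-endian.
    fmt = ">I" if header[:4] == b"RIFX" else "<I"
    # The size field excludes the 8 bytes holding the ID and the field itself.
    declared_size = struct.unpack(fmt, header[4:8])[0] + 8
    actual_size = fid.seek(0, io.SEEK_END)  # seek() returns the new position
    return declared_size, actual_size


# Self-check on an in-memory stream whose header overstates the size by 100 bytes.
payload = b"WAVE" + b"\x00" * 32
stream = io.BytesIO(b"RIFF" + struct.pack("<I", len(payload) + 100) + payload)
declared, actual = riff_sizes(stream)
print("RIFF size: {}".format(declared))
print("real size: {}".format(actual))
```

For a file on disk, the stream would be something like open("problem.wav", "rb").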

I implemented a small patch that compares the file size in the RIFF chunk with the file size as reported by the OS (PR #6716). Then I read problem.wav and exported it to another file. The binary diff shows the problem is not only with the RIFF chunk, but also with the data chunk.

< 00000000: 5249 4646 a4c1 5900 5741 5645 666d 7420  RIFF..Y.WAVEfmt
---
> 00000000: 5249 4646 f81f 4400 5741 5645 666d 7420  RIFF..D.WAVEfmt
3c3
< 00000020: 0400 1000 6461 7461 80c1 5900 0000 0000  ....data..Y.....
---
> 00000020: 0400 1000 6461 7461 d41f 4400 0000 0000  ....data..D.....

This basically means that problem.wav cannot be read with mmap=True even with PR #6716 implemented. But at least it raises an exception.
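
As a worked check, the size fields in the hex dump above can be decoded by hand. This sketch only uses the bytes shown in the diff; the dictionary keys are labels chosen here for readability.

```python
import struct

# Size fields copied from the hex dump above (4 bytes each, little-endian).
fields = {
    "RIFF, problem.wav": "a4c15900",
    "RIFF, re-exported": "f81f4400",
    "data, problem.wav": "80c15900",
    "data, re-exported": "d41f4400",
}

decoded = {name: struct.unpack("<I", bytes.fromhex(h))[0]
           for name, h in fields.items()}
for name, value in decoded.items():
    print("{}: {}".format(name, value))

# The RIFF field plus the 8 bytes it excludes gives the implied total file size.
riff_total_declared = decoded["RIFF, problem.wav"] + 8
riff_total_actual = decoded["RIFF, re-exported"] + 8

# Both fields overstate the real size by the same number of bytes.
riff_excess = decoded["RIFF, problem.wav"] - decoded["RIFF, re-exported"]
data_excess = decoded["data, problem.wav"] - decoded["data, re-exported"]
```

The two implied totals reproduce the "RIFF size" and "os size" values printed earlier, and the RIFF and data fields are off by the same amount.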

@mp4096
Contributor

mp4096 commented Oct 23, 2016

Concerning PR #6716:

  • Is it ok to fall back on os.path.getsize()? Is there a reasonable use case when fid is not a file and thus has no filename?
  • An alternative implementation would be to check whether chunk_id is empty and then to raise an exception saying something like Attempted read beyond EOF. Which one do you prefer?
  • Should I write a unit test? We would need a small malformed wav file for it. Is it worth it though?

EDIT: Corrected the second sentence, added a question about unit tests.
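
The second option in the list above (raising when a chunk header comes back empty or short) could look roughly like the sketch below. It is only an illustration under stated assumptions: iter_chunks is a hypothetical helper, it handles little-endian RIFF only, and it is not the code from PR #6716.

```python
import io
import struct


def iter_chunks(fid):
    """Yield (chunk_id, body) for each chunk after the 12-byte RIFF header.

    Hypothetical helper: it raises ValueError instead of looping forever
    when the file ends before the declared size is reached.
    """
    header = fid.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("Not a little-endian WAV file.")
    declared_end = struct.unpack("<I", header[4:8])[0] + 8
    while fid.tell() < declared_end:
        chunk_header = fid.read(8)
        if len(chunk_header) < 8:
            raise ValueError("Attempted read beyond EOF.")
        chunk_id, size = struct.unpack("<4sI", chunk_header)
        body = fid.read(size)
        if len(body) < size:
            raise ValueError(
                "Chunk {!r} is shorter than its declared size.".format(chunk_id))
        if size & 1:
            fid.read(1)  # chunks are padded to an even number of bytes
        yield chunk_id, body


# Demo: one well-formed stream and one whose RIFF size is overstated by 50 bytes.
good = b"WAVE" + b"fmt " + struct.pack("<I", 16) + b"\x00" * 16
ok_stream = io.BytesIO(b"RIFF" + struct.pack("<I", len(good)) + good)
bad_stream = io.BytesIO(b"RIFF" + struct.pack("<I", len(good) + 50) + good)

ok_ids = [chunk_id for chunk_id, _ in iter_chunks(ok_stream)]
try:
    list(iter_chunks(bad_stream))
    bad_raised = False
except ValueError:
    bad_raised = True
print(ok_ids, bad_raised)
```

Because the check relies only on read() and tell(), it does not need a filename, which sidesteps the first question in the list.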

@ghost
Author

ghost commented Oct 23, 2016

I've investigated the cause and found that in some cases sph2pipe was failing, so it created shorter wav files than it was supposed to. The header probably reflects the length of the data in the Sphere format, but since the conversion failed partway through, some files end up with a wrong header.
The rest is, I guess, as @mp4096 described.
If SciPy let me know that the header information is wrong with a warning (or an exception) similar to the suggested one, that would work for me.

@pv pv added this to the 0.19.0 milestone Oct 24, 2016
@pv
Member

pv commented Oct 24, 2016

Changed to raise an exception, as the file is malformed.

@pv pv closed this as completed Oct 24, 2016
@hudsantos

My wave file was malformed too. It didn't happen with another file. Thanks!

@srikar-s

Hello,

I know this issue is closed, but I recently ran into a similar problem caused by a malformed header. In our group the problem was not uncovered earlier because all our other tools (MATLAB, Adobe Audition, Windows Media Player) successfully read the data chunk even though the chunkSize field is incorrect. Python's own standard library module wave also reads the data properly: it sizes the data chunk itself and uses that.
Finally, while debugging the issue, as a temporary workaround I just opened the wav file as binary and read the data from byte 44 into a numpy array using numpy.fromfile, without any EOF errors.
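
A workaround along these lines might look like the sketch below. It is an illustration only and rests on several assumptions: a canonical 44-byte header with no extra chunks, 16-bit little-endian PCM samples, and a hypothetical function name read_pcm16_tolerant. To stay dependency-free it uses the standard-library array module rather than numpy.fromfile, and it warns instead of raising when the data chunk is shorter than declared.

```python
import array
import io
import struct
import sys
import warnings


def read_pcm16_tolerant(fid, header_size=44):
    """Read 16-bit little-endian PCM samples that follow a canonical header.

    Assumes (without verifying the fmt chunk) a 44-byte header, no extra chunks.
    """
    header = fid.read(header_size)
    if len(header) < header_size or header[36:40] != b"data":
        raise ValueError("Expected a canonical 44-byte WAV header.")
    declared = struct.unpack("<I", header[40:44])[0]
    raw = fid.read()[:declared]  # take whatever is really there, up to declared
    if len(raw) < declared:
        warnings.warn("data chunk declares {} bytes but only {} are present"
                      .format(declared, len(raw)))
    raw = raw[:len(raw) - len(raw) % 2]  # drop a trailing partial sample
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the file data is little-endian
    return samples


# Demo: a header declaring 100 data bytes followed by only 3 samples (6 bytes).
fmt_body = struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16)
pcm = struct.pack("<3h", 1, -2, 3)
header = (b"RIFF" + struct.pack("<I", 36 + 100) + b"WAVE"
          + b"fmt " + struct.pack("<I", 16) + fmt_body
          + b"data" + struct.pack("<I", 100))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    samples = read_pcm16_tolerant(io.BytesIO(header + pcm))
print(samples.tolist(), len(caught))
```

With NumPy available, numpy.frombuffer(raw, dtype="<i2") would be a natural replacement for the array step.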

Given how many approaches can read the data successfully, it would be good for scipy.io.wavfile.read() to also return the correct data and to change the current ValueError to a warning.

Should I open a new ticket for this?

5 participants