torchaudio.load does not load webm bytesio or _io.BufferedReader, but loads the file from hard drive #2792

Closed
pooya-mohammadi opened this issue Oct 24, 2022 · 7 comments

Comments

@pooya-mohammadi

pooya-mohammadi commented Oct 24, 2022

🐛 Describe the bug

I have a websocket that receives chunks of data in a byte format. The browser encodes the data in audio/webm format. The code is like the following:

@app.websocket("/listen")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_bytes()
            with open('audio.wav', mode="wb") as f:
                f.write(data)
    except Exception as e:
        raise Exception(f'Could not process audio: {e}')
    finally:
        await websocket.close()

Manually writing the data to audio.wav and then reading the file using the following code works fine with no errors:

array, sr = torchaudio.load("audio.wav")

However, reading the file as a file object does not work:

with open("audio.wav", mode="rb") as f:
    torchaudio.load(f)

It raises the following error:

Exception: Could not process audio: Failed to open the input "<_io.BufferedReader name='audio.wav'>" (Invalid data found when processing input).
INFO:     connection closed

PS: Creating a BytesIO from the data and passing it to torchaudio.load results in the same error as above.

Versions

Versions of relevant libraries:
[pip3] numpy==1.23.4
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[conda] numpy 1.23.4 pypi_0 pypi

OS

Ubuntu: 22.04
torchaudio.backend: "sox_io"

PS

I tested the same process on a webm file which was converted from a wav file, and the result was the same:

  1. torchaudio.load can read the file from hard drive.
  2. torchaudio.load cannot read bytesio or _io.BufferedReader
@mthrok
Collaborator

mthrok commented Oct 25, 2022

Hi @pooya-mohammadi

This issue seems to depend on the data you are handling. Is it possible to share some sample data that does not contain any PII or copyrighted material? Complete silence is fine.

@pooya-mohammadi
Author

Hi @mthrok
audio.zip
It's a simple file (less than 1 second) that contains a little noise to make sure the mic was functioning correctly.

@pooya-mohammadi changed the title from "torchaudio.load does not load webm bytesio or _io.BufferedReader, but loads the from the file" to "torchaudio.load does not load webm bytesio or _io.BufferedReader, but loads the file from hard drive" on Oct 25, 2022
@mthrok
Collaborator

mthrok commented Oct 25, 2022

Hi @pooya-mohammadi

The audio you shared has a .wav extension but is, in fact, in WebM format.

with open("audio.wav", "rb") as f:
    print(f.read(50)[30:])

prints the following

b'\x84webmB\x87\x81\x02B\x85\x81\x02\x18S\x80g\x01\xff\xff'

and ffprobe audio.wav reports:

Input #0, matroska,webm, from 'audio.wav':
  Metadata:
    encoder         : QTmuxingAppLibWebM-0.0.1
  Duration: N/A, start: -0.001000, bitrate: N/A
  Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)
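
The header check above can be turned into a small helper. This is only a minimal sketch and not part of torchaudio: the name sniff_container is hypothetical, and it recognises just two signatures, the EBML magic number that opens Matroska/WebM files and the RIFF/WAVE header that opens WAV files.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file.

    Hypothetical helper for illustration; it only knows two signatures.
    """
    # Matroska and WebM files start with the EBML magic number.
    if header[:4] == b"\x1a\x45\xdf\xa3":
        # The EBML DocType string ("webm" or "matroska") follows shortly after.
        return "webm" if b"webm" in header[:64] else "matroska"
    # WAV files start with "RIFF", a 4-byte size field, then "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return "unknown"
```

Passing the first 64 bytes of the shared audio.wav to this helper should report WebM rather than WAV, since the DocType string shows up within the first 50 bytes in the output above.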

torchaudio.load first attempts to read the input with libsox, which fails because libsox does not support WebM. It retries with FFmpeg only when the source is a file path. It cannot retry when the input is a file-like object, because a seek method is not always available to rewind the input.
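
To illustrate why a retry needs seek, here is a rough sketch of a generic "try one decoder, then fall back" loop. It is not torchaudio's implementation: decode_with_fallback, wav_only and accepts_anything are hypothetical stand-ins, and the only point is that a failed first attempt has already consumed bytes, so a second attempt is possible only if the input can be rewound.

```python
import io


def decode_with_fallback(src, decoders):
    """Try each decoder in order, rewinding `src` between attempts.

    Illustrative stand-in, not torchaudio code. Each decoder takes a
    file-like object and returns a result or raises ValueError.
    """
    can_rewind = hasattr(src, "seek") and hasattr(src, "tell")
    start = src.tell() if can_rewind else None
    for i, decode in enumerate(decoders):
        try:
            return decode(src)
        except ValueError:
            # The failed attempt has already consumed bytes from `src`.
            if i == len(decoders) - 1 or not can_rewind:
                raise
            src.seek(start)


def wav_only(f):
    # Stand-in for a backend that rejects anything but RIFF/WAVE.
    if f.read(4) != b"RIFF":
        raise ValueError("unsupported format")
    return "wav_only"


def accepts_anything(f):
    # Stand-in for a more permissive backend; reports the bytes it saw first.
    return ("accepts_anything", f.read(4))


# BytesIO is seekable, so the second decoder sees the stream from the start.
result = decode_with_fallback(
    io.BytesIO(b"\x1a\x45\xdf\xa3..."), [wav_only, accepts_anything]
)
```

With an object that only offers read, the same call re-raises the first error instead of falling back, which mirrors the behaviour described above for file-like inputs.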

To handle WebM, you can use torchaudio.io.StreamReader. It works with both file paths and file-like objects, and it can also read iteratively.

import torchaudio

# Load from a path and read the entire audio in one go
s = torchaudio.io.StreamReader(path)
s.add_basic_audio_stream(-1)
s.process_all_packets()
waveform, = s.pop_chunks()

# Load from a file-like object and read the audio chunk by chunk
s = torchaudio.io.StreamReader(f)
s.add_basic_audio_stream(chunk_size)
for chunk, in s.stream():
    ...  # process the chunk (a waveform tensor)

For the detailed usage, please checkout tutorials like

@pooya-mohammadi
Author

pooya-mohammadi commented Oct 25, 2022

@mthrok
Thanks for the detailed answer. It works fine, and I could solve the issue with the following code snippet. However, it creates a new stream over data that already comes from another streaming tool, so I don't think this is the most efficient approach. Are you planning to add support to torchaudio.load, or something with similar functionality, for reading an entire file-like object in WebM format?

@app.websocket("/listen")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_bytes()
            f = BytesIO(data)
            s = torchaudio.io.StreamReader(f)
            s.add_basic_audio_stream(1000)
            tensor = torch.concat([chunk[0] for chunk in s.stream()])
            print(tensor.shape)
    except Exception as e:
        raise Exception(f'Could not process audio: {e}')
    finally:
        await websocket.close()

@mthrok
Collaborator

mthrok commented Oct 25, 2022

I think you could do chunk-by-chunk decoding, which is more efficient, but I am not sure if this is what you want, as I do not know what application you are building.

To do chunk-by-chunk decoding, you can wrap the socket object into a synchronous file-like object.

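import asyncio

class Wrapper:
    def __init__(self, socket):
        self.socket = socket
        # Construct this inside the async handler so the running loop owns the socket
        self.loop = asyncio.get_running_loop()
        self.buffer = b''

    def read(self, n):
        # `await` is not valid in a plain function, so schedule the coroutine on
        # the event loop and block this (worker) thread until it completes.
        while len(self.buffer) < n:
            new_data = asyncio.run_coroutine_threadsafe(
                self.socket.receive_bytes(), self.loop).result()
            if not new_data:
                break
            self.buffer += new_data
        data, self.buffer = self.buffer[:n], self.buffer[n:]
        return data

Then pass it to StreamReader and let StreamReader pull the data. Because read blocks, the decoding loop has to run in a worker thread rather than on the event loop itself.

def decode(wrapper):
    s = torchaudio.io.StreamReader(wrapper)
    s.add_basic_audio_stream(chunk_size)
    for chunk, in s.stream():
        print(chunk.shape)

try:
    wrapper = Wrapper(websocket)
    # Run the blocking decode loop off the event loop (Python 3.9+)
    await asyncio.to_thread(decode, wrapper)
except ...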
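
The buffering contract that StreamReader relies on, read(n) returning up to n bytes and an empty result at the end of the stream, can be checked without a websocket. The sketch below is an assumption-laden stand-in: ChunkReader is a hypothetical class that pulls from a plain iterator of byte chunks instead of a socket, but it uses the same buffer-and-slice logic as the wrapper above.

```python
class ChunkReader:
    """Synchronous file-like object over an iterator of byte chunks.

    Hypothetical stand-in for a socket wrapper, for illustration only.
    """

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""

    def read(self, n: int) -> bytes:
        # Pull chunks until n bytes are buffered or the source is exhausted.
        while len(self._buffer) < n:
            new_data = next(self._chunks, b"")
            if not new_data:
                break
            self._buffer += new_data
        data, self._buffer = self._buffer[:n], self._buffer[n:]
        return data


reader = ChunkReader([b"abc", b"de", b"fgh"])
first = reader.read(4)    # spans two incoming chunks
second = reader.read(10)  # fewer bytes than requested: the source ran dry
third = reader.read(1)    # empty bytes signals the end of the stream
```

Incoming message boundaries do not have to line up with the sizes the decoder asks for; the buffer absorbs the difference.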

@mthrok
Collaborator

mthrok commented Oct 25, 2022

Note that you can also read everything in one go from a file-like object. The type of the src argument and the way decoding is done are independent.

s = torchaudio.io.StreamReader(fileobj)
s.add_basic_audio_stream(-1)
s.process_all_packets()
waveform, = s.pop_chunks()

@mthrok
Collaborator

mthrok commented Oct 25, 2022

I am going to close the issue, as this is not a bug with torchaudio.
