Conversation

@marcuskbrandt
Contributor

I found that the function lets you pass numpy arrays and tensors. The VAD pipeline, however, does not accept them as-is: the audio has to be passed as a mapping that also contains the sample rate.

As of now you can't pass an mp3 file to the VAD pipeline. I fixed this by reading the file with Whisper's `load_audio` and then converting it to stereo. This is not a pretty solution, but it seems to work quite well.
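For illustration, the "mapping with the sample rate" could look roughly like the sketch below. It is not the code from this PR: `to_vad_input` is a hypothetical helper, the `"waveform"` and `"sample_rate"` keys and the `(channel, time)` layout are assumed from pyannote.audio's convention for in-memory audio, and 16 kHz is assumed because that is what Whisper's loader resamples to. The real pipeline would most likely also need the array converted to a torch tensor, which is left out here to keep the example dependency-light.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumption: audio was already decoded and resampled to 16 kHz


def to_vad_input(samples, sample_rate=SAMPLE_RATE):
    """Wrap decoded audio in a mapping with a (channel, time) waveform.

    Hypothetical helper: the key names are an assumption, not taken from the PR.
    """
    waveform = np.asarray(samples, dtype=np.float32)
    if waveform.ndim == 1:
        # Mono input: add a leading channel axis so the shape is (1, time).
        waveform = waveform[np.newaxis, :]
    elif waveform.ndim != 2:
        raise ValueError("expected a 1-D (time) or 2-D (channel, time) array")
    return {"waveform": waveform, "sample_rate": sample_rate}


# Usage: one second of silence standing in for a decoded mp3 file.
vad_input = to_vad_input(np.zeros(SAMPLE_RATE, dtype=np.float32))
print(vad_input["waveform"].shape, vad_input["sample_rate"])
```

The helper only adds a channel axis to mono input and leaves multi-channel input alone, so it does not take a position on whether the stereo conversion is needed.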

@sorgfresser
Contributor

I'm curious: why do you ensure it's stereo? Doesn't the VAD under the hood use mono as well?

@marcuskbrandt
Contributor Author

> I'm curious: why do you ensure it's stereo? Doesn't the VAD under the hood use mono as well?

As far as I remember, I had some problems when it was mono. However, that was almost two months ago, and I think Max Bain has already solved this issue with other code.

@matheusbach

Nice improvement. Not sure about the stereo part, though.

3 participants