Added a function for VAD-segments to handle mp3 files, numpy arrays and tensors #122

marcuskbrandt · 2023-03-06T09:49:03Z

I found that the function lets you pass numpy arrays and tensors. The VAD pipeline, however, does not support this directly: you need to pass the audio as a mapping that includes the sample rate.
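To illustrate the mapping mentioned above, here is a minimal sketch, not the code from this PR. It assumes the VAD pipeline follows the pyannote.audio convention for in-memory audio: a mapping with a "waveform" of shape (channel, time) and a "sample_rate". The helper name `prepare_vad_input` and the 16 kHz default are assumptions made for the example.

```python
import numpy as np

try:
    import torch
except ImportError:  # torch is optional for this sketch
    torch = None

SAMPLE_RATE = 16000  # assumption: audio was already resampled to 16 kHz


def prepare_vad_input(audio, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: wrap in-memory audio in a waveform/sample_rate mapping."""
    if isinstance(audio, str):
        # File paths are passed through unchanged.
        return audio
    if torch is not None and isinstance(audio, torch.Tensor):
        waveform = audio.detach().cpu().float()
    elif isinstance(audio, np.ndarray):
        waveform = audio.astype(np.float32)
        if torch is not None:
            # pyannote.audio expects a torch.Tensor waveform.
            waveform = torch.from_numpy(waveform)
    else:
        raise TypeError(f"unsupported audio type: {type(audio).__name__}")
    if waveform.ndim == 1:
        # (time,) -> (channel, time)
        waveform = waveform[None, :]
    return {"waveform": waveform, "sample_rate": sample_rate}


one_second = np.zeros(SAMPLE_RATE, dtype=np.float32)
vad_input = prepare_vad_input(one_second)
print(tuple(vad_input["waveform"].shape), vad_input["sample_rate"])
```

The resulting mapping could then be handed to the pipeline in place of a file path, provided the pipeline accepts that convention.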

As of now, you can't pass an mp3 file to the VAD pipeline. I fixed this by reading the file with Whisper's load_audio and then converting it to stereo. This is not a pretty solution, but it seems to work quite well.
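For readers wondering what the stereo step might involve, here is one possible sketch. It assumes the loader returns a 1-D float32 mono array, as Whisper's load_audio does. The name `mono_to_stereo` is hypothetical, and the PR's actual conversion may differ.

```python
import numpy as np


def mono_to_stereo(mono: np.ndarray) -> np.ndarray:
    """Hypothetical helper: duplicate a mono signal into two identical channels."""
    if mono.ndim != 1:
        raise ValueError("expected a 1-D mono signal")
    # Result has shape (2, time): channel 0 and channel 1 hold the same samples.
    return np.stack([mono, mono], axis=0)


# Stand-in for the array a loader would return for an mp3 file.
mono = np.linspace(-1.0, 1.0, num=8, dtype=np.float32)
stereo = mono_to_stereo(mono)
print(stereo.shape)
```

Duplicating the channel adds no information, so whether the VAD step actually needs it is an open question.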

sorgfresser · 2023-04-30T16:42:18Z

I'm curious: why do you ensure it's stereo? Doesn't the VAD under the hood use mono as well?

marcuskbrandt · 2023-04-30T16:54:53Z

> I'm curious: why do you ensure it's stereo? Doesn't the VAD under the hood use mono as well?

As far as I remember, I had some problems when it was mono. However, that was almost 2 months ago, and I think Max Bain has already solved this issue with other code.

matheusbach · 2023-06-21T00:14:10Z

Nice improvement. Not sure about the stereo conversion, though.

marcuskbrandt added 15 commits March 4, 2023 18:33

Update requirements.txt

324a9a3

don't add version to soundfile

b00b416

maybe made loading audio for transcribe_with_vad_parallel more stable

0d21519

tried to make transcribe more stable

4a0f775

maybe a fix

d1fe628

i deleted mel by mistake

1ae96bd

again mel

c90b718

Update requirements.txt

427327e

Merge branch 'm-bain:main' into main

ba2443c

created function to get vad, can handle more than str now

404a80b

forgot to remove merge

fed6b7b

fix tensor in vad

480274e

added back that you can parse numpy array and tensor

562a2af

removed danish alignment model from this pull request

95e2e74

added sph support

f717bb2
