long audio out of memory #659

Open
MyraBaba opened this issue Jan 6, 2024 · 1 comment

MyraBaba commented Jan 6, 2024

Hi,

When I try to diarize audio longer than 4-5 hours, memory consumption keeps growing during the diarization stage until the process gets killed.

whisperx 6hour.wav --lang tr --model large --model_dir MODELSx/ --device cuda --diarize

How can I prevent it from using so much CPU memory? Or is there no solution?

@simcop2387

I ran into this as well and ended up having to use ffmpeg to segment the audio. That said, I think I have more memory, so I wasn't hitting it until the audio was longer than 72 hours.

https://unix.stackexchange.com/questions/280767/how-do-i-split-an-audio-file-into-multiple

This Stack Exchange link explains it in more detail, but something like: ffmpeg -i somefile.mp3 -f segment -segment_time 3600 -c copy out%03d.mp3 would produce one-hour segment files automatically. It won't do anything to make sure words aren't cut off at the segment boundaries, so depending on what you're doing that might not be acceptable. In my case I'm pulling things apart for training data, so it won't make much of a difference overall.
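For anyone who wants to script the segment-then-transcribe workaround, here is a rough Python sketch. It only wraps the two commands already quoted in this thread (the ffmpeg segment call and the whisperx invocation from the original report). The function names, the `segments` output directory, and the `out%03d` naming are made up for illustration, and the offsets are only approximate because `-c copy` cuts on packet boundaries. Each segment is also diarized on its own, so speaker labels should not be expected to line up between segments.

```python
import subprocess
from pathlib import Path


def build_split_command(src: str, segment_seconds: int, out_pattern: str) -> list[str]:
    """Build the ffmpeg segment command quoted in the comment above."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", str(segment_seconds),
        "-c", "copy",  # stream copy: no re-encoding, cuts are not word-aware
        out_pattern,
    ]


def build_whisperx_command(segment: str) -> list[str]:
    """Reuse the flags from the original report for a single segment."""
    return [
        "whisperx", segment,
        "--lang", "tr", "--model", "large", "--model_dir", "MODELSx/",
        "--device", "cuda", "--diarize",
    ]


def segment_offset_seconds(index: int, segment_seconds: int) -> int:
    """Approximate start time of segment `index` within the original file.

    Add this to the timestamps of that segment's output to map them back
    onto the full recording (hypothetical helper, approximate only).
    """
    return index * segment_seconds


def run_pipeline(src: str, segment_seconds: int = 3600, out_dir: str = "segments") -> None:
    """Split `src` into fixed-length pieces, then run whisperx on each piece."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    suffix = Path(src).suffix
    pattern = str(out / f"out%03d{suffix}")
    subprocess.run(build_split_command(src, segment_seconds, pattern), check=True)
    # Zero-padded names sort in playback order.
    for segment in sorted(out.glob(f"out*{suffix}")):
        subprocess.run(build_whisperx_command(str(segment)), check=True)
```

Calling `run_pipeline("6hour.wav")` would split the file into one-hour pieces and process them one at a time, so only one hour of audio goes through the diarization stage per run. Whether that keeps memory low enough depends on the machine.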
