long audio out of memory #659

MyraBaba · 2024-01-06T15:36:29Z

Hi,

When I try to diarize audio longer than 4-5 hours, memory consumption keeps growing during the diarization stage until the process gets killed.

whisperx 6hour.wav --lang tr --model large --model_dir MODELSx/ --device cuda --diarize

How can I prevent it from using so much CPU memory? Or is there no solution?


simcop2387 · 2024-01-15T00:17:18Z

I ran into this too and ended up having to use ffmpeg to segment the audio. That said, I think I have more memory, so I wasn't hitting it until the audio was more than 72 hours long.

https://unix.stackexchange.com/questions/280767/how-do-i-split-an-audio-file-into-multiple

This Stack Exchange link explains it in more detail, but something like:

ffmpeg -i somefile.mp3 -f segment -segment_time 3600 -c copy out%03d.mp3

would produce one-hour segment files in an automated fashion. It won't do anything like making sure words aren't cut off at the segment boundaries, so depending on what you're doing that might not be acceptable. In my case I'm trying to pull things apart for some training data, so it's not going to make much of a difference for me overall.
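For anyone scripting this, here is a minimal sketch of the ffmpeg command above wrapped in Python, plus a helper to map per-chunk timestamps back onto the original file. The function names, the output naming scheme, and the segment dict shape (`start`/`end` in seconds) are assumptions for illustration, not part of whisperx. It also assumes every chunk is exactly `segment_seconds` long, which may be slightly off with `-c copy`, and speaker labels from separate diarization runs will probably not line up across chunks.

```python
import subprocess
from pathlib import Path


def build_split_command(src, segment_seconds=3600, pattern=None):
    """Return the ffmpeg argument list that splits `src` into fixed-length segments."""
    src = Path(src)
    if pattern is None:
        # Hypothetical naming scheme: 6hour.wav -> 6hour_000.wav, 6hour_001.wav, ...
        pattern = f"{src.stem}_%03d{src.suffix}"
    return [
        "ffmpeg", "-i", str(src),
        "-f", "segment",                        # use ffmpeg's segment muxer
        "-segment_time", str(segment_seconds),  # target length of each chunk
        "-c", "copy",                           # no re-encoding, just cut the stream
        pattern,
    ]


def split_audio(src, segment_seconds=3600):
    """Run ffmpeg to split `src`; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_split_command(src, segment_seconds), check=True)


def shift_segments(segments, chunk_index, segment_seconds=3600):
    """Shift chunk-relative start/end times to positions in the original file.

    Assumes each segment is a dict with "start" and "end" in seconds and that
    every earlier chunk is exactly `segment_seconds` long.
    """
    offset = chunk_index * segment_seconds
    return [
        {**seg, "start": seg["start"] + offset, "end": seg["end"] + offset}
        for seg in segments
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe without ffmpeg.
    print(" ".join(build_split_command("6hour.wav")))
```

Each resulting chunk can then be passed to whisperx separately, and `shift_segments` applied to that chunk's results with its index (0 for the first file) before merging.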
