Clients with slow networks blocking transcription cluster resources #1561
Comments
@lfcnassif, I am sorry if you already considered this, but what about sending the original audio files instead? I did a quick measurement with ~50K audio files collected from different cases and formats. Most of them (~90%) are OPUS. By the way, this is an additional comment about audio transcription, independent of the enhancement described in this issue.
Hi @tc-wleite. Actually, we started by sending WAVs (aiming to distribute part of the job, the WAV conversion), switched to your suggestion (because of bandwidth usage concerns), then rolled back after we looked at the stats and saw that the cluster was, surprisingly, spending half of its time on WAV conversion alone. After the rollback, the cluster became 2x faster. But WAV conversion is single-threaded (using mplayer), while transcription uses almost half a physical CPU (together with the GPU; surprisingly, it needs both). The issue is that the two steps were sequential. My changes will make audio transmission and transcription run more or less in parallel. Because of that, converting to WAV on the server again (in parallel with transcription) might not have that previous bottleneck. I'm just testing... The code change is simple: I didn't throw away the previous logic, it is just disabled.
I see. Sorry, I could have guessed that you had already tried that.
You're welcome @tc-wleite, please continue to share your ideas. The commit above fixed slow clients blocking cluster resources. I have already updated 4 of the 6 nodes. After I finish, I'll change the code to convert to WAV on the service side again, unplug 1 node from the cluster, and run some tests to see whether the WAV conversion overhead has decreased.
PS1: Each audio in this data set is duplicated, so the client-side total time should be higher for a similar data set with unique audios. PS2: I'll fix the stats on the server side; they need to be changed after commit c7dc056. The WAV real time would be a bit difficult to compute now.
@tc-wleite, just for your reference, the previous WAV conversion approach was changed here: #1400
Cluster nodes updated again. Clients should use a snapshot version to stop converting to WAV on their side. |
Currently, WAV transfer and transcription itself are synchronous and part of the same job. The number of jobs per node is 2x the number of GPUs. A colleague with a very slow connection ran some tests: it works correctly, but the WAV transfer took much longer than the transcription itself. Then I realized that the service nodes were blocked waiting too long for WAV transfers, refusing new connections from other clients while the GPUs were idle. We should make WAV transfer and transcription asynchronous with respect to each other. A simple workaround would be to allow a higher number of simultaneous client connections for WAV transfer and to restrict the number of simultaneous transcriptions, using a Semaphore, to the current number of jobs per node.
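The workaround described above could look roughly like the following sketch. It is not the project's actual code: `TranscriptionGate`, `SlowClientDemo`, `fakeTranscribe`, the pool sizes, and the sleep-based "transfer" are all hypothetical and only illustrate the idea. Many connection threads may receive audio at once, but a `java.util.concurrent.Semaphore` lets only `jobsPerNode` of them transcribe at the same time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from the project's code.
class TranscriptionGate {
    private final Semaphore transcriptionSlots;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    TranscriptionGate(int jobsPerNode) {
        // One permit per allowed simultaneous transcription.
        this.transcriptionSlots = new Semaphore(jobsPerNode);
    }

    // Called by a connection handler only after the whole WAV has arrived.
    String transcribe(byte[] wav) throws InterruptedException {
        transcriptionSlots.acquire();
        try {
            // Record the highest number of concurrent transcriptions seen.
            peak.accumulateAndGet(running.incrementAndGet(), Math::max);
            return fakeTranscribe(wav);
        } finally {
            running.decrementAndGet();
            transcriptionSlots.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    // Stand-in for the real GPU work (assumption: it just sleeps).
    private static String fakeTranscribe(byte[] wav) throws InterruptedException {
        Thread.sleep(20);
        return "text for " + wav.length + " bytes";
    }
}

class SlowClientDemo {
    // Simulates `clients` uploads on a pool of `maxConnections` handler threads
    // and returns the peak number of simultaneous transcriptions.
    static int run(int maxConnections, int jobsPerNode, int clients) {
        TranscriptionGate gate = new TranscriptionGate(jobsPerNode);
        ExecutorService connections = Executors.newFixedThreadPool(maxConnections);
        for (int i = 0; i < clients; i++) {
            final int transferMillis = (i % 2 == 0) ? 5 : 50; // fast and slow clients
            connections.submit(() -> {
                Thread.sleep(transferMillis); // simulated WAV transfer, no permit held
                return gate.transcribe(new byte[1024]);
            });
        }
        connections.shutdown();
        try {
            if (!connections.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("demo did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gate.peakConcurrency();
    }
}
```

For example, `SlowClientDemo.run(8, 2, 12)` accepts up to 8 simultaneous uploads while never running more than 2 transcriptions at once. Because the permit is taken only after the transfer completes, a slow client occupies a connection thread but not a transcription slot.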