Pyannote is 10 times slower than WhisperX with GPU utilization 10%: expected behavior or misconfiguration? #1652

chubin · 2024-02-19T07:46:32Z

Tested versions

pyannote.audio==3.1.1
pyannote.core==5.0.0
pyannote.database==5.0.1
pyannote.metrics==3.2.1
pyannote.pipeline==3.0.1

System information

Ubuntu 22.04, NVIDIA RTX A6000

Issue description

I am not sure if it is a bug, so please feel free to close it if it is expected behavior.

I am trying to diarize a large recording (approximately 60 minutes), and the
diarization process takes 8.5 minutes:

real    8m40,982s
user    8m12,687s
sys     1m21,703s

Here is my code:

import torch
from pyannote.audio import Pipeline

# hf_token holds a Hugging Face access token (defined elsewhere)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
)

# Run inference on the GPU
pipeline.to(torch.device("cuda"))

diarization = pipeline("audio.wav")

It uses the GPU during diarization, but only at a low utilization level (~10%),
and it keeps one CPU core at 100% the whole time.

When doing the diarization with whisperx, though, it takes just a minute,
and GPU utilization is at full capacity.

However, the diarization quality with whisperx is slightly worse (approximately 5% of the text
is attributed to wrong or non-existent speakers).

           duration   GPU-usage
pyannote   520.5s     10%
whisperx    75.0s     100%

Pyannote diarization quality is just brilliant, but it takes an order of magnitude more time.
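
As a quick check of the figures in the table above, the sketch below computes the slowdown ratio and a real-time factor for each tool. The 3600 s audio length is an assumption based on the "approximately 60 minutes" mentioned earlier, and the variable names are illustrative only.

```python
# Durations taken from the table above (seconds).
pyannote_seconds = 520.5
whisperx_seconds = 75.0

# Assumption: the recording is about 60 minutes long, as stated in the issue.
audio_seconds = 60 * 60

# How many times longer pyannote took than whisperx.
slowdown = pyannote_seconds / whisperx_seconds

# Real-time factor: processing time divided by audio duration (lower is faster).
pyannote_rtf = pyannote_seconds / audio_seconds
whisperx_rtf = whisperx_seconds / audio_seconds

print(f"slowdown: {slowdown:.1f}x")          # slowdown: 6.9x
print(f"pyannote RTF: {pyannote_rtf:.3f}")   # pyannote RTF: 0.145
print(f"whisperx RTF: {whisperx_rtf:.3f}")   # whisperx RTF: 0.021
```

With these numbers the measured gap is roughly 7x, which is close to, but somewhat below, the "10 times" in the title.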

I suppose that I am doing something wrong, but I don't know what exactly.

Could you please point me in the right direction,
or confirm that this behavior is expected?

[Screenshot: GPU utilization while using pure pyannote]

[Screenshot: GPU utilization when using whisperX]

Minimal reproduction example (MRE)

(not applicable)


hbredin · 2024-02-19T08:19:06Z

Would you mind sharing a link to a Google Colab that one can just click and run to reproduce the issue?

chubin · 2024-02-19T10:22:18Z

Unfortunately, I have no access to Google Colab from my Google Account (I can create a new account if needed),
but as you can see the code is trivial.

I noticed that the problem disappears when I load the audio file using Audio:

from pyannote.audio import Audio
io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io("audio.mp3")

diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

This replaces loading audio.wav directly. The WAV file (audio.wav) has the same sample rate (16000 Hz), though.
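
To make the comparison between the two input styles reproducible, a small timing harness along these lines could be used. This is a minimal sketch, not part of pyannote: `time_call` and `compare_inputs` are hypothetical helper names, and the harness only assumes that the pipeline is a callable taking a single input, as in the snippets above.

```python
import time
from typing import Any, Callable, Dict


def time_call(fn: Callable[[Any], Any], arg: Any) -> float:
    """Return the wall-clock seconds taken by a single call fn(arg)."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def compare_inputs(pipeline: Callable[[Any], Any],
                   inputs: Dict[str, Any]) -> Dict[str, float]:
    """Run the same pipeline on each named input and report seconds per input."""
    return {name: time_call(pipeline, value) for name, value in inputs.items()}


# Intended usage with the objects from this thread (not executed here):
#
#   timings = compare_inputs(pipeline, {
#       "file path": "audio.wav",
#       "in memory": {"waveform": waveform, "sample_rate": sample_rate},
#   })
#   for name, seconds in timings.items():
#       print(f"{name}: {seconds:.1f}s")
```

Running both variants on the same source file (rather than one .wav and one .mp3) would show whether the speedup comes from in-memory loading or from the file itself.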

hbredin · 2024-02-21T08:32:23Z

The code might be "trivial" but the whole point of sharing a Google Colab is for pyannote maintainers to avoid wasting time on problems that are not reproducible.

For instance, two files with two different extensions (.wav and .mp3) are mentioned here.
It is not clear which one works and which one fails.

Preparing a Google Colab will definitely increase your chances of having someone look at your issue. It might also happen that the mere preparation of the Google Colab makes you realize that the problem is on your side (I am not saying that this is the case here but it happened in the past).

DerEchteFeuerpfeil · 2024-02-22T16:13:37Z

+1 for this issue

Thanks for the note, @chubin. I used your solution with

io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io("audio.mp3")

diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})

and got much faster inference 👍

ahmetkipkip · 2024-05-06T14:56:30Z

chubin wrote:

> Unfortunately, I have no access to Google Colab from my Google Account (I can create a new account if needed), but as you can see the code is trivial.
>
> I noticed that the problem disappears, when I load the audio file using Audio:
>
> from pyannote.audio import Audio
> io = Audio(mono='downmix', sample_rate=16000)
> waveform, sample_rate = io("audio.mp3")
>
> diarization = pipeline({"waveform": waveform, "sample_rate": sample_rate})
>
> instead of loading audio.wav directly. The wav file (audio.wav) has the same sample rate (16000) though.

Wow, after updating from 2.x to 3.x I had performance issues. Now it's better than with the old code. I still don't really understand what caused it, though.

Thanks

hbredin added the cannot_reproduce label Feb 19, 2024
