Get rid of ONNX WeSpeaker in favor of its pytorch implementation #1537

Closed
hbredin opened this issue Nov 9, 2023 · 11 comments

Comments

@hbredin (Member) commented Nov 9, 2023

Since its introduction in pyannote.audio 3.x, the ONNX dependency seems to cause lots of problems for pyannote users: #1526 #1523 #1517 #1510 #1508 #1481 #1478 #1477 #1475

WeSpeaker does provide a pytorch implementation of its pretrained ResNet models.

Let's use this!

@hbredin (Member, Author) commented Nov 10, 2023

Among the people who gave this issue a thumbs up, does anyone want to take care of it?

@wsstriving commented:

Hi, I am the initiator of WeSpeaker, thanks for your interest in our toolkit!
We will update wespeaker very soon so that it can be installed as a package and can load the pytorch model via something like "model = wespeaker.load_pytorch_model" (currently we support wespeaker.load_model, but it's ONNX). Then I will open a PR to "pipelines/speaker_verification".

@hbredin (Member, Author) commented Nov 13, 2023

Thanks @wsstriving! I worked on this a few days ago and already have a working prototype.

Instead of adding one more dependency to pyannote.audio, I was planning to copy the relevant parts of WeSpeaker into a new pyannote.audio.models.embedding.wespeaker module.

I am just stuck on the fact that WeSpeaker uses the Apache-2.0 license, while pyannote uses the MIT license. Both are permissive, but I am not quite sure where and how to mention the WeSpeaker license in the pyannote codebase. Would putting it at the top of the pyannote.audio.models.embedding.wespeaker directory be enough?

Another option I am considering is adding an embedding entrypoint to pyannote.audio, so that any external library can provide embeddings usable in pyannote as long as it follows the API. What do you think?
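
For illustration, here is a rough sketch of what such an entrypoint contract could look like; the EmbeddingProtocol name and exact signature below are hypothetical, not an existing pyannote.audio API:

# Hypothetical sketch of an "embedding entrypoint" contract, NOT an existing
# pyannote.audio API: any external library exposing an object with this shape
# could plug its embeddings into the diarization pipeline.
from typing import Optional, Protocol

import torch


class EmbeddingProtocol(Protocol):
    sample_rate: int  # expected input sample rate (e.g. 16000)
    dimension: int    # size of the returned embedding vectors

    def __call__(
        self,
        waveforms: torch.Tensor,               # (batch, channel, num_samples)
        masks: Optional[torch.Tensor] = None,  # (batch, num_samples) speech masks
    ) -> torch.Tensor:                         # (batch, dimension)
        ...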

@wsstriving commented:

Hi Bredin, I think the first option is just fine. We have implemented CLI support; you can check it here: https://github.com/wenet-e2e/wespeaker/blob/master/docs/python_package.md

It is now easy to use the wespeaker model in pytorch:

import wespeaker

model = wespeaker.load_model('english')  # load the pretrained English model
model.set_gpu(0)                         # move it to GPU 0
print(model.model)                       # the underlying torch.nn.Module

# forward precomputed fbank features through the raw module:
# model.model(feats)

Check https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/cli/speaker.py#L63 for more details on how to use it.
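
For reference, the linked docs also describe higher-level helpers on top of the raw module; a minimal sketch (method names taken from the linked CLI documentation, so treat the exact signatures as assumptions to double-check in speaker.py):

import wespeaker

model = wespeaker.load_model('english')

# extract a speaker embedding from a wav file
embedding = model.extract_embedding('speaker1.wav')
print(embedding.shape)

# cosine similarity between two utterances
score = model.compute_similarity('speaker1.wav', 'speaker2.wav')
print(score)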

@hbredin (Member, Author) commented Nov 13, 2023

Quick update:

Could any of you (who gave this a thumbs up) try the following:

@stygmate commented Nov 14, 2023

@hbredin

I ran a quick test; I haven't checked the results and I'm unsure about the pipeline definition.

What I ran:

from pyannote.audio.pipelines import SpeakerDiarization
from pyannote.audio.pipelines.utils.hook import ProgressHook
import torch

pipeline = SpeakerDiarization(segmentation="pyannote/segmentation-3.0",embedding="pyannote/wespeaker-voxceleb-resnet34-LM")


pipeline.instantiate({
    "segmentation": {
        "min_duration_off": 0.0,
    },
    "clustering": {
        "method": "centroid",
        "min_cluster_size": 12,
        "threshold": 0.7045654963945799,
    },
})

pipeline.to(torch.device("mps"))

with ProgressHook() as hook:
    diarization = pipeline("./download/test.wav", hook=hook)

I got this warning: "Model was trained with pyannote.audio 2.1.1, yours is 3.0.1. Bad things might happen unless you revert pyannote.audio to 2.x."

Seems to work on CPU.

For GPU (Mac M1 Max) I got this error: NotImplementedError: The operator 'aten::upsample_linear1d.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
After setting the env var it works, but with a mix of GPU and CPU.
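
For anyone else hitting this, a minimal sketch of opting into the fallback from Python (the variable is read when torch initializes, so it has to be set before torch is imported):

import os

# Must be set before importing torch, otherwise it is ignored.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402

# Ops without an MPS kernel (e.g. aten::upsample_linear1d) now fall back to
# CPU instead of raising NotImplementedError; everything else stays on MPS.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")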

@hbredin (Member, Author) commented Nov 15, 2023

Thanks @stygmate for the feedback.

To use the same setup as pyannote/speaker-diarization-3.0, one should use the following:

from pyannote.audio.pipelines import SpeakerDiarization
from pyannote.audio.pipelines.utils.hook import ProgressHook
from pyannote.audio import Audio
import torch

pipeline = SpeakerDiarization(
    segmentation="pyannote/segmentation-3.0",
    segmentation_batch_size=32,
    embedding="pyannote/wespeaker-voxceleb-resnet34-LM",
    embedding_exclude_overlap=True,
    embedding_batch_size=32)

# other values of `*_batch_size` may lead to faster processing;
# larger is not necessarily faster.

pipeline.instantiate({
    "segmentation": {
        "min_duration_off": 0.0,
    },
    "clustering": {
        "method": "centroid",
        "min_cluster_size": 12,
        "threshold": 0.7045654963945799,
    },
})

# send the pipeline to your preferred device (keep only the line you need)
device = torch.device("cpu")
device = torch.device("cuda")
device = torch.device("mps")
pipeline.to(device)

# load audio in memory (usually leads to faster processing)
io = Audio(mono='downmix', sample_rate=16000)
waveform, sample_rate = io("audio.wav")  # path to your audio file
file = {"waveform": waveform, "sample_rate": sample_rate}

# process the audio 
with ProgressHook() as hook:
    diarization = pipeline(file, hook=hook)

I'd love to get feedback from you all regarding possible algorithmic or speed regressions.
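
In case it helps structure that feedback, here is a minimal sketch of measuring both aspects; it reuses `pipeline`, `file` and `hook` from the snippet above, and `reference` is a placeholder for a pyannote.core.Annotation you load yourself (e.g. from an RTTM file):

import time

from pyannote.metrics.diarization import DiarizationErrorRate

# speed: wall-clock time of a full pipeline run
start = time.perf_counter()
diarization = pipeline(file, hook=hook)
print(f"processing took {time.perf_counter() - start:.1f}s")

# accuracy: diarization error rate against a reference annotation
metric = DiarizationErrorRate()
print(f"DER = {100 * metric(reference, diarization):.1f}%")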

@stygmate commented:

@hbredin Give me a wav file to process and I will send you the results.

@hbredin (Member, Author) commented Nov 16, 2023

Closing, as the latest version no longer relies on the ONNX runtime.
Please update to pyannote.audio 3.1 and pyannote/speaker-diarization-3.1 (and open new issues if needed).

hbredin closed this as completed Nov 16, 2023
@magicse commented Nov 16, 2023

It works OK, but I use torch 1.xx:
segmentation ---------------------------------------- 100% 0:00:09
speaker_counting ---------------------------------------- 100% 0:00:00
embeddings ---------------------------------------- 100% 0:06:39
discrete_diarization ---------------------------------------- 100% 0:00:00

And I made some changes for compatibility with torch 1.xx and torch 2.xx
in pyannote/audio/models/embedding/wespeaker/__init__.py:

if torch.__version__ >= "2.0.0":
    # Use torch.vmap for torch 2.0 or newer
    from torch import vmap
else:
    # Use functorch.vmap for torch 1.12 or older
    from functorch import vmap

And change
features = torch.vmap(self._fbank)(waveforms.to(fft_device)).to(device)
to
features = vmap(self._fbank)(waveforms.to(fft_device)).to(device)

Same changes in pyannote/audio/models/blocks/pooling.py.
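
Side note: comparing torch.__version__ lexicographically happens to work for the 1.x/2.x split, but string comparison of versions is fragile in general (e.g. "1.10" sorts before "1.9" as a string). A more robust sketch using packaging (assumed to be installed, which it usually is alongside torch):

import torch
from packaging.version import Version

# Handles suffixes like "2.0.1+cu118" and pre-releases like "2.1.0a0" correctly.
if Version(torch.__version__) >= Version("2.0.0"):
    from torch import vmap
else:
    from functorch import vmap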

@hbredin (Member, Author) commented Nov 16, 2023

Thanks for the feedback (and the PR!).
However, I don't plan to support torch 1.x in the future.
