
Conversation

D4ve-R (Contributor) commented Feb 28, 2024

This adds support for the WavLMForAudioFrameClassification and Wav2Vec2ForAudioFrameClassification models.
The models can be used for speaker diarization tasks; a rough usage sketch follows the list below.

"Official model" microsoft/wavlm-base-plus-sd

  • Add AutoModelForAudioFrameClassification
  • Add Wav2Vec2ForAudioFrameClassification
  • Add WavLMForAudioFrameClassification
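
A rough usage sketch of the new classes (not part of this PR; whether ONNX weights exist for this checkpoint and the exact preprocessing call are assumptions):

```js
// Sketch: speaker diarization with the new AutoModelForAudioFrameClassification class.
// Assumes ONNX weights are available for the checkpoint and that the processor
// follows the usual Transformers.js feature-extractor API.
import { AutoProcessor, AutoModelForAudioFrameClassification } from '@xenova/transformers';

const model_id = 'microsoft/wavlm-base-plus-sd';
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForAudioFrameClassification.from_pretrained(model_id);

// `audio` is a Float32Array of mono samples at 16 kHz.
const inputs = await processor(audio);
const { logits } = await model(inputs); // shape: [batch_size, num_frames, num_speakers]
```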

xenova (Collaborator) commented Mar 4, 2024

Thanks for the PR! Could you maybe explain how the output of the audio frame classification model should be interpreted? The example code in the model card produces a one-hot array of shape (num_frames, num_speakers), but it would be nice to be able to turn that into a JSON output with timestamps.

D4ve-R (Contributor, Author) commented Mar 4, 2024

Yes, of course! The output logits have shape [num_batches, num_frames, num_speakers].
I can't tell you more yet, but I'm running experiments, because I had the same idea 😆.
Since there is no speaker-diarization task in transformers, how do you feel about implementing a new pipeline for that task in transformers.js? I think it would be really cool.
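
A minimal sketch of turning those logits into timestamped JSON segments, assuming ~20 ms per frame (the WavLM/Wav2Vec2 feature extractor downsamples 16 kHz audio by a factor of 320), non-overlapping speech, and a flat row-major logits buffer:

```js
// Sketch: collapse per-frame speaker predictions into { speaker, start, end } segments.
// Uses argmax per frame, so overlapping speech is not handled here.
function framesToSegments(logits, numFrames, numSpeakers, frameDuration = 0.02) {
  const segments = [];
  let current = null;
  for (let t = 0; t < numFrames; ++t) {
    // Most likely speaker for this frame.
    let best = 0;
    for (let s = 1; s < numSpeakers; ++s) {
      if (logits[t * numSpeakers + s] > logits[t * numSpeakers + best]) best = s;
    }
    if (current && current.speaker === best) {
      current.end = (t + 1) * frameDuration; // extend the running segment
    } else {
      if (current) segments.push(current);
      current = { speaker: best, start: t * frameDuration, end: (t + 1) * frameDuration };
    }
  }
  if (current) segments.push(current);
  return segments; // e.g. [{ speaker: 0, start: 0, end: 1.24 }, { speaker: 1, ... }]
}
```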

D4ve-R (Contributor, Author) commented Mar 4, 2024

I suspect the model output is not ready "out-of-the-box" for such a pipeline, because of possible overlapping speech, etc. (I still need to check the paper and code; I haven't had time yet.)
I'm working on porting this code for a speaker diarization pipeline to JS, with different models if needed.
It would be really awesome to be able to combine it with Whisper for accurate conversational transcription in the browser 🤯
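
One way the overlap could be handled (a sketch only, not necessarily what the paper or pyannote does): threshold each speaker channel independently instead of taking an argmax, so several speakers can be active in the same frame:

```js
// Sketch: per-speaker activity detection that allows overlapping speech.
// Each channel is thresholded on its sigmoid probability; 0.5 is an arbitrary choice.
function activeSpeakersPerFrame(logits, numFrames, numSpeakers, threshold = 0.5) {
  const sigmoid = (x) => 1 / (1 + Math.exp(-x));
  const active = [];
  for (let t = 0; t < numFrames; ++t) {
    const speakers = [];
    for (let s = 0; s < numSpeakers; ++s) {
      if (sigmoid(logits[t * numSpeakers + s]) >= threshold) speakers.push(s);
    }
    active.push(speakers); // empty = silence, length > 1 = overlapping speakers
  }
  return active;
}
```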

HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova merged commit 8eef154 into huggingface:main on Mar 7, 2024
flatsiedatsie (Contributor) commented

@D4ve-R Did you manage to get the model to classify three different speakers? The current implementation in Transformers.js seems to only split audio into two speakers.

D4ve-R (Contributor, Author) commented Sep 6, 2024

@flatsiedatsie Sorry, I can't remember.

Here is a working example that runs speaker diarization with Whisper + pyannote in Transformers.js:

https://huggingface.co/spaces/Xenova/whisper-speaker-diarization/blob/main/whisper-speaker-diarization/src/worker.js

Hope it helps
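
A rough sketch of the kind of post-processing that example does, attaching a speaker label to each Whisper chunk via timestamp overlap (the chunk/segment shapes here are assumptions, not the worker's exact format):

```js
// Sketch: label Whisper chunks with the diarization segment that overlaps them most.
// `chunks` come from a Whisper pipeline run with timestamps enabled and look like
// { text, timestamp: [start, end] }; `segments` are { speaker, start, end } objects.
function assignSpeakers(chunks, segments) {
  return chunks.map((chunk) => {
    const [start, end] = chunk.timestamp;
    let bestSpeaker = null;
    let bestOverlap = 0;
    for (const seg of segments) {
      const overlap = Math.min(end, seg.end) - Math.max(start, seg.start);
      if (overlap > bestOverlap) {
        bestOverlap = overlap;
        bestSpeaker = seg.speaker;
      }
    }
    return { ...chunk, speaker: bestSpeaker };
  });
}
```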

flatsiedatsie (Contributor) commented

Thanks, that's very kind.

I'm currently testing an implementation where I recursively re-segment long segments.
