Add WavLM- & Wav2Vec2ForAudioFrameClassification support #611
Conversation
Transformers docs:
Thanks for the PR! Could you maybe explain how the output of the audio frame classification model should be interpreted? The example code in the model card produces a one-hot array of shape
Yes of course! The output logits have shape [num_batches, num_frames, num_speakers].
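For anyone reading along, here is a minimal sketch (not part of this PR) of how those logits could be turned into per-frame speaker activity. The helper name and the 0.5 sigmoid threshold are assumptions of mine:

```js
// Hypothetical helper (not from this PR): convert frame-level logits of shape
// [num_batches, num_frames, num_speakers] into active speaker indices per frame.
// A sigmoid + 0.5 threshold is assumed; overlapping speech simply yields more
// than one active index for a frame.
function framesToSpeakers(logits /* tensor-like { dims, data } */, threshold = 0.5) {
  const [numBatches, numFrames, numSpeakers] = logits.dims;
  const sigmoid = (x) => 1 / (1 + Math.exp(-x));
  const batches = [];
  for (let b = 0; b < numBatches; ++b) {
    const frames = [];
    for (let f = 0; f < numFrames; ++f) {
      const offset = (b * numFrames + f) * numSpeakers;
      const active = [];
      for (let s = 0; s < numSpeakers; ++s) {
        if (sigmoid(logits.data[offset + s]) > threshold) active.push(s);
      }
      frames.push(active);
    }
    batches.push(frames);
  }
  return batches;
}
```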
I suspect the model output is not ready "out-of-the-box" for such a pipeline, because of possible overlap, etc. (I need to check the paper & code; haven't had time yet.)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
@D4ve-R Did you manage to get the model to classify into three different speakers? The current implementation in Transformers.js seems to only split into two speakers.
@flatsiedatsie sorry, I can't remember. Here is a working example of running speaker diarization with Whisper + pyannote in Transformers.js. Hope it helps.
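For later readers, a rough sketch of the pyannote half of such a pipeline. It follows the pyannote segmentation example from a later Transformers.js release; the model id onnx-community/pyannote-segmentation-3.0, the @huggingface/transformers package name, and the post_process_speaker_diarization call are assumptions here, not something introduced by this PR:

```js
import { AutoProcessor, AutoModelForAudioFrameClassification, read_audio } from '@huggingface/transformers';

// Segmentation model (assumption: taken from the later official pyannote example, not this PR).
const model_id = 'onnx-community/pyannote-segmentation-3.0';
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForAudioFrameClassification.from_pretrained(model_id);

// 'audio.wav' is a placeholder; resample to the feature extractor's expected rate.
const audio = await read_audio('audio.wav', processor.feature_extractor.config.sampling_rate);
const inputs = await processor(audio);

// Frame-level logits -> speaker segments ({ id, start, end, confidence } per segment).
const { logits } = await model(inputs);
const segments = processor.post_process_speaker_diarization(logits, audio.length);
console.table(segments[0], ['start', 'end', 'id', 'confidence']);
```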
Thanks, that's very kind. I'm currently testing an implementation where I recursively re-segment long segments. |
This adds support for WavLM- & Wav2Vec2ForAudioFrameClassification models.
The models can be used for speaker diarization tasks.
"Official model"
microsoft/wavlm-base-plus-sdAutoModelForAudioFrameClassificationWav2Vec2ForAudioFrameClassificationWavLMForAudioFrameClassification
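A minimal usage sketch of the classes added here, assuming the @xenova/transformers package and that the checkpoint has ONNX weights available; the audio file name is a placeholder:

```js
import { AutoProcessor, AutoModelForAudioFrameClassification, read_audio } from '@xenova/transformers';

// Speaker diarization checkpoint referenced above (assumes converted ONNX weights exist for it).
const model_id = 'microsoft/wavlm-base-plus-sd';
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await AutoModelForAudioFrameClassification.from_pretrained(model_id);

// 'audio.wav' is a placeholder; WavLM expects 16 kHz mono audio.
const audio = await read_audio('audio.wav', 16000);
const inputs = await processor(audio);

// logits: [num_batches, num_frames, num_speakers] frame-level speaker activity scores.
const { logits } = await model(inputs);
console.log(logits.dims);
```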