Commit 65f15af: Update README.md
update results and recent news of wavlm
MarkWuNLP committed Jan 10, 2022

Official PyTorch implementation and pretrained models of WavLM

- Dec 2021: An interactive speaker verification demo is available on [HuggingFace](https://huggingface.co/spaces/microsoft/wavlm-speaker-verification). Give it a try!
- Dec 2021: WavLM Large released, with [HuggingFace support](https://huggingface.co/models?other=wavlm)
- Nov 2021: code and pretrained models released (WavLM Base and WavLM Base+)
- Oct 2021: preprint released on [arXiv](https://arxiv.org/pdf/2110.13900.pdf)


## Pre-Trained Models
Model | Pre-training Dataset | Fine-tuning Dataset | Download
|---|---|---|---
WavLM Base | [960 hrs LibriSpeech](http://www.openslr.org/12)| - | [Azure Storage](https://msranlcmtteamdrive.blob.core.windows.net/share/wavlm/WavLM-Base.pt?sv=2020-04-08&st=2021-11-05T00%3A35%3A31Z&se=2022-11-06T00%3A35%3A00Z&sr=b&sp=r&sig=JljnRVzyHY6AjHzhVmHV5KyQQCvvGfgp9D2M02oGJBU%3D) <br> [Google Drive](https://drive.google.com/file/d/19-C7SMQvEFAYLG5uc47NX_MY03JCbI4x/view?usp=sharing)
WavLM Base+ | [60k hrs Libri-Light](https://github.com/facebookresearch/libri-light) + [10k hrs GigaSpeech](https://github.com/SpeechColab/GigaSpeech) + [24k hrs VoxPopuli](https://github.com/facebookresearch/voxpopuli/tree/main)| - | [Azure Storage](https://msranlcmtteamdrive.blob.core.windows.net/share/wavlm/WavLM-Base+.pt?sv=2020-04-08&st=2021-11-05T00%3A34%3A47Z&se=2022-10-06T00%3A34%3A00Z&sr=b&sp=r&sig=Gkf1IByHaIn1t%2FVEd9D6WHjZ3zu%2Fk5eSdoj21UytKro%3D) <br> [Google Drive](https://drive.google.com/file/d/1PlbT_9_B4F9BsD_ija84sUTVw7almNX8/view?usp=sharing)
WavLM Large | [60k hrs Libri-Light](https://github.com/facebookresearch/libri-light) + [10k hrs GigaSpeech](https://github.com/SpeechColab/GigaSpeech) + [24k hrs VoxPopuli](https://github.com/facebookresearch/voxpopuli/tree/main)| - | [Azure Storage](https://msranlcmtteamdrive.blob.core.windows.net/share/wavlm/WavLM-Large.pt?sv=2020-08-04&st=2021-11-22T10%3A03%3A53Z&se=2022-11-23T10%3A03%3A00Z&sr=b&sp=r&sig=3kB8dwTCyIS8YQ7gW5oXmDrXV%2FAaLmoxBS37oPpFsz4%3D) <br> [Google Drive](https://drive.google.com/file/d/1rMu6PQ9vz3qPz4oIm72JDuIr5AHIbCOb/view?usp=sharing)

## Load Pre-Trained Models

```python
import torch
from WavLM import WavLM, WavLMConfig

# load a pre-trained checkpoint (set the path to your downloaded file)
checkpoint = torch.load('/path/to/WavLM-Base.pt')
cfg = WavLMConfig(checkpoint['cfg'])
model = WavLM(cfg)
model.load_state_dict(checkpoint['model'])
model.eval()

# extract the representation of the last layer
wav_input_16khz = torch.randn(1,10000)
rep = model.extract_features(wav_input_16khz)[0]

# extract the representation of each layer
wav_input_16khz = torch.randn(1,10000)
rep, layer_results = model.extract_features(wav_input_16khz, output_layer=model.cfg.encoder_layers, ret_layer_results=True)[0]
layer_reps = [x.transpose(0, 1) for x, _ in layer_results]
```
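As a toy illustration of what the extracted representations can be used for (this is not the repo's verification recipe; real systems fine-tune the model with a speaker-verification head), one can mean-pool frames into an utterance embedding and compare two utterances with cosine similarity. The tensors below are random stand-ins for `extract_features` outputs:

```python
import torch
import torch.nn.functional as F

# stand-ins for model.extract_features(...)[0] outputs: [batch, frames, dim]
rep_a = torch.randn(1, 31, 768)
rep_b = torch.randn(1, 31, 768)

# mean-pool over time to get one embedding per utterance
emb_a = rep_a.mean(dim=1)
emb_b = rep_b.mean(dim=1)

# cosine similarity as a crude same-speaker score, in [-1, 1]
score = F.cosine_similarity(emb_a, emb_b).item()
print(f"similarity: {score:+.3f}")
```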


## Fine-Tuning
Both HuggingFace and [s3prl](https://github.com/s3prl/s3prl) support our models, so fine-tuning them on different downstream tasks is straightforward. We suggest extracting the representation of each layer and taking a weighted sum of them.
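The weighted sum over layers can be sketched as a small learnable module (a hypothetical helper written for illustration, not part of this repo; the random tensors stand in for the per-layer WavLM outputs):

```python
import torch
import torch.nn as nn

class WeightedLayerSum(nn.Module):
    """Learnable convex combination of per-layer representations
    (hypothetical helper, not part of the WavLM repo)."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_reps):
        # layer_reps: list of [batch, time, dim] tensors, one per layer
        stacked = torch.stack(layer_reps, dim=0)            # [layers, batch, time, dim]
        norm = torch.softmax(self.weights, dim=0)           # weights sum to 1
        return (norm.view(-1, 1, 1, 1) * stacked).sum(0)    # [batch, time, dim]

# toy usage: random tensors standing in for 13 layers of WavLM Base outputs
reps = [torch.randn(1, 31, 768) for _ in range(13)]
pooled = WeightedLayerSum(13)(reps)
print(pooled.shape)  # torch.Size([1, 31, 768])
```

The weights are trained jointly with the downstream head, letting each task pick out the layers it finds most useful.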

## Universal Representation Evaluation on SUPERB
![alt text](SUPERB_Results.png)
## Downstream Task Performance
We also evaluate our models on typical speech processing benchmarks.
### Speaker Verification

Fine-tune the model on the VoxCeleb2 dev set, then evaluate it on [VoxCeleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/#:~:text=VoxCeleb%20is%20an%20audio%2Dvisual,interview%20videos%20uploaded%20to%20YouTube). Results are equal error rates (EER, %).
| Model |Fix pre-train| Vox1-O | Vox1-E | Vox1-H |
| ------------- |------------- | ---------- | ---------- | ---------- |
| ECAPA-TDNN | - | 0.87 | 1.12 | 2.12 |
| HuBERT large | Yes| 0.888 |0.912| 1.853 |
| Wav2Vec2.0 (XLSR)| Yes | 0.915| 0.945 |1.895|
| UniSpeech-SAT large | Yes | 0.771 | 0.781| 1.669|
| WavLM large | Yes | 0.59 | 0.65| 1.328|
| WavLM large | No | 0.505 | 0.579| 1.176|
| *+ Large-Margin Fine-tuning and Score Calibration* | | | | |
| HuBERT large | No | 0.585 | 0.654 | 1.342 |
| Wav2Vec2.0 (XLSR) | No | 0.564 | 0.605 | 1.23 |
| UniSpeech-SAT large | No | 0.564 | 0.561 | 1.23 |
| **WavLM large** | No | **0.399** | **0.474** | **1.003** |
| **WavLM large (New)** | No | **0.33** | **0.477** | **0.984** |

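The EER numbers above can be reproduced from trial scores with the standard computation: sweep a decision threshold and find the point where the false-rejection rate equals the false-acceptance rate. A self-contained sketch on synthetic scores (this illustrates the metric only, not the paper's exact scoring pipeline):

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: threshold at which false-rejection
    and false-acceptance rates cross."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    order = np.argsort(scores)          # ascending candidate thresholds
    labels = labels[order]
    # as the threshold rises, FRR grows and FAR shrinks
    frr = np.cumsum(labels) / labels.sum()                     # targets rejected
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()     # nontargets accepted
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2

# synthetic cosine scores: same-speaker trials score higher on average
rng = np.random.default_rng(0)
tgt = rng.normal(1.0, 0.5, 1000)
non = rng.normal(0.0, 0.5, 1000)
eer = compute_eer(tgt, non)
print(f"EER: {100 * eer:.2f}%")
```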

### Speech Separation
