Commit 65f15af: Update README.md
update results and recent news of wavlm
MarkWuNLP committed Jan 10, 2022

Official PyTorch implementation and pretrained models of WavLM

- Dec 2021: An interactive speaker verification demo is available on [HuggingFace](https://huggingface.co/spaces/microsoft/wavlm-speaker-verification). Give it a try!
- Dec 2021: WavLM Large released, with [HuggingFace support](https://huggingface.co/models?other=wavlm)
- Nov 2021: code and pretrained models released (WavLM Base and WavLM Base+)
- Oct 2021: preprint released on [arXiv](https://arxiv.org/pdf/2110.13900.pdf)


## Pre-Trained Models
Model | Pre-training Dataset | Fine-tuning Dataset | Download
|---|---|---|---
WavLM Base | [960 hrs LibriSpeech](http://www.openslr.org/12)| - | [Azure Storage](https://msranlcmtteamdrive.blob.core.windows.net/share/wavlm/WavLM-Base.pt?sv=2020-04-08&st=2021-11-05T00%3A35%3A31Z&se=2022-11-06T00%3A35%3A00Z&sr=b&sp=r&sig=JljnRVzyHY6AjHzhVmHV5KyQQCvvGfgp9D2M02oGJBU%3D) <br> [Google Drive](https://drive.google.com/file/d/19-C7SMQvEFAYLG5uc47NX_MY03JCbI4x/view?usp=sharing)
WavLM Base+ | [60k hrs Libri-Light](https://github.com/facebookresearch/libri-light) + [10k hrs GigaSpeech](https://github.com/SpeechColab/GigaSpeech) + [24k hrs VoxPopuli](https://github.com/facebookresearch/voxpopuli/tree/main)| - | [Azure Storage](https://msranlcmtteamdrive.blob.core.windows.net/share/wavlm/WavLM-Base+.pt?sv=2020-04-08&st=2021-11-05T00%3A34%3A47Z&se=2022-10-06T00%3A34%3A00Z&sr=b&sp=r&sig=Gkf1IByHaIn1t%2FVEd9D6WHjZ3zu%2Fk5eSdoj21UytKro%3D) <br> [Google Drive](https://drive.google.com/file/d/1PlbT_9_B4F9BsD_ija84sUTVw7almNX8/view?usp=sharing)
WavLM Large | [60k hrs Libri-Light](https://github.com/facebookresearch/libri-light) + [10k hrs GigaSpeech](https://github.com/SpeechColab/GigaSpeech) + [24k hrs VoxPopuli](https://github.com/facebookresearch/voxpopuli/tree/main)| - | [Azure Storage](https://msranlcmtteamdrive.blob.core.windows.net/share/wavlm/WavLM-Large.pt?sv=2020-08-04&st=2021-11-22T10%3A03%3A53Z&se=2022-11-23T10%3A03%3A00Z&sr=b&sp=r&sig=3kB8dwTCyIS8YQ7gW5oXmDrXV%2FAaLmoxBS37oPpFsz4%3D) <br> [Google Drive](https://drive.google.com/file/d/1rMu6PQ9vz3qPz4oIm72JDuIr5AHIbCOb/view?usp=sharing)

## Load Pre-Trained Models

```python
import torch
from WavLM import WavLM, WavLMConfig

# load a pre-trained checkpoint (set the path to your downloaded file)
checkpoint = torch.load('/path/to/WavLM-Base.pt')
cfg = WavLMConfig(checkpoint['cfg'])
model = WavLM(cfg)
model.load_state_dict(checkpoint['model'])
model.eval()

# extract the representation of the last layer
wav_input_16khz = torch.randn(1,10000)
rep = model.extract_features(wav_input_16khz)[0]

# extract the representation of each layer
wav_input_16khz = torch.randn(1,10000)
rep, layer_results = model.extract_features(wav_input_16khz, output_layer=model.cfg.encoder_layers, ret_layer_results=True)[0]
layer_reps = [x.transpose(0, 1) for x, _ in layer_results]
```
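As a toy illustration of what the extracted representations can be used for (this is not the repo's verification recipe; real systems fine-tune the model with a speaker-verification head), one can mean-pool frames into an utterance embedding and compare two utterances with cosine similarity. The tensors below are random stand-ins for `extract_features` outputs:

```python
import torch
import torch.nn.functional as F

# stand-ins for model.extract_features(...)[0] outputs: [batch, frames, dim]
rep_a = torch.randn(1, 31, 768)
rep_b = torch.randn(1, 31, 768)

# mean-pool over time to get one embedding per utterance
emb_a = rep_a.mean(dim=1)
emb_b = rep_b.mean(dim=1)

# cosine similarity as a crude same-speaker score, in [-1, 1]
score = F.cosine_similarity(emb_a, emb_b).item()
print(f"similarity: {score:+.3f}")
```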


## Fine-Tuning
Both HuggingFace and [s3prl](https://github.com/s3prl/s3prl) support our models, so fine-tuning them on different downstream tasks is straightforward. We suggest extracting the representation of each layer and taking a weighted sum of them.
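The weighted sum over layers can be sketched as a small learnable module (a hypothetical helper written for illustration, not part of this repo; the random tensors stand in for the per-layer WavLM outputs):

```python
import torch
import torch.nn as nn

class WeightedLayerSum(nn.Module):
    """Learnable convex combination of per-layer representations
    (hypothetical helper, not part of the WavLM repo)."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_reps):
        # layer_reps: list of [batch, time, dim] tensors, one per layer
        stacked = torch.stack(layer_reps, dim=0)            # [layers, batch, time, dim]
        norm = torch.softmax(self.weights, dim=0)           # weights sum to 1
        return (norm.view(-1, 1, 1, 1) * stacked).sum(0)    # [batch, time, dim]

# toy usage: random tensors standing in for 13 layers of WavLM Base outputs
reps = [torch.randn(1, 31, 768) for _ in range(13)]
pooled = WeightedLayerSum(13)(reps)
print(pooled.shape)  # torch.Size([1, 31, 768])
```

The weights are trained jointly with the downstream head, letting each task pick out the layers it finds most useful.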

## Universal Representation Evaluation on SUPERB
![alt text](SUPERB_Results.png)
## Downstream Task Performance
We also evaluate our models on typical speech processing benchmarks.
### Speaker Verification

Fine-tune the model on the VoxCeleb2 dev set, then evaluate it on [VoxCeleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/#:~:text=VoxCeleb%20is%20an%20audio%2Dvisual,interview%20videos%20uploaded%20to%20YouTube). Results are equal error rates (EER, %).
| Model |Fix pre-train| Vox1-O | Vox1-E | Vox1-H |
| ------------- |------------- | ---------- | ---------- | ---------- |
| ECAPA-TDNN | - | 0.87 | 1.12 | 2.12 |
| HuBERT large | Yes| 0.888 |0.912| 1.853 |
| Wav2Vec2.0 (XLSR)| Yes | 0.915| 0.945 |1.895|
| UniSpeech-SAT large | Yes | 0.771 | 0.781| 1.669|
| WavLM large | Yes | 0.59 | 0.65| 1.328|
| WavLM large | No | 0.505 | 0.579| 1.176|
| *+ Large-Margin Fine-tuning and Score Calibration* | | | | |
| HuBERT large | No | 0.585 | 0.654 | 1.342 |
| Wav2Vec2.0 (XLSR) | No | 0.564 | 0.605 | 1.23 |
| UniSpeech-SAT large | No | 0.564 | 0.561 | 1.23 |
| **WavLM large** | No | **0.399** | **0.474** | **1.003** |
| **WavLM large (New)** | No | **0.33** | **0.477** | **0.984** |

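The EER numbers above can be reproduced from trial scores with the standard computation: sweep a decision threshold and find the point where the false-rejection rate equals the false-acceptance rate. A self-contained sketch on synthetic scores (this illustrates the metric only, not the paper's exact scoring pipeline):

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: threshold at which false-rejection
    and false-acceptance rates cross."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    order = np.argsort(scores)          # ascending candidate thresholds
    labels = labels[order]
    # as the threshold rises, FRR grows and FAR shrinks
    frr = np.cumsum(labels) / labels.sum()                     # targets rejected
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()     # nontargets accepted
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2

# synthetic cosine scores: same-speaker trials score higher on average
rng = np.random.default_rng(0)
tgt = rng.normal(1.0, 0.5, 1000)
non = rng.normal(0.0, 0.5, 1000)
eer = compute_eer(tgt, non)
print(f"EER: {100 * eer:.2f}%")
```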

### Speech Separation
