Integrating Pruned Fast RNNT with Transducer + new recipe for mTEDx dataset #1465

Open
wants to merge 36 commits into
base: develop
Changes from 28 commits
608e1de
added script to prepare mTEDx dataset
Jun 22, 2022
26dddf5
added an asr transducer training file for mTEDx recipe
Jun 22, 2022
1a6fbbe
added jointer network to be used with the pruned loss of Fast RNNT
Jun 22, 2022
e9e6241
added pruned-loss to the losses script
Jun 22, 2022
af2c3b3
created simple beam searcher for the pruned loss; just the same as Tr…
Jun 22, 2022
9e6e1ee
added a recipe for creating a tokenizer on mTEDx dataset
Jun 22, 2022
a2073fa
added a recipe for creating an RNN language model on mTEDx-French dat…
Jun 22, 2022
22ec024
added a recipe for creating an RNN language model on mTEDx-French dat…
Jun 22, 2022
b74a424
added yaml file for training ASR transducer on mTEDx
Jun 22, 2022
61ccae8
added yaml file for training ASR transducer on mTEDx
Jun 22, 2022
0ca6e7a
Merge remote-tracking branch 'upstream/develop' into 'pruned_fast_rnnt'
Jul 29, 2022
78c9008
added README file for mTEDx recipe
Anwarvic Aug 2, 2022
f9b9e03
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Aug 12, 2022
900c261
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Sep 16, 2022
4e38371
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Sep 18, 2022
2d60e5e
updated Transducer recipes + added README
Anwarvic Sep 19, 2022
bde66d5
updated Transducer recipes + added README
Anwarvic Sep 19, 2022
112b688
added CTC recipes
Anwarvic Sep 19, 2022
c5cbe1f
updated files with latest updates
Anwarvic Sep 19, 2022
2974d3a
Merge branch 'pruned_fast_rnnt' of https://github.com/Anwarvic/speech…
Anwarvic Sep 19, 2022
eb37ab2
updated scripts with latest updates
Anwarvic Sep 19, 2022
cddef0a
fixed pre-commit erorrs
Anwarvic Sep 19, 2022
6b2e8f5
fixed pre-commit erorrs
Anwarvic Sep 19, 2022
0ea78d7
added recipes yaml files to tests/recipes.csv
Anwarvic Sep 19, 2022
3960a4e
fixed the un-used dnn_neurons variable in train_wav2vec.yaml file
Anwarvic Sep 19, 2022
f76f0a1
pre-commit passed successfully
Anwarvic Sep 19, 2022
9f36769
updated transducer configs in the other dataset recipes to match the …
Anwarvic Sep 19, 2022
405bcee
updated transducer configs in the other dataset recipes to match the …
Anwarvic Sep 19, 2022
82d450b
added needed README files for mTEDx recipes
Anwarvic Sep 25, 2022
1bf6988
changed use_torchaudio flag in Transducer recipes README all across d…
Anwarvic Sep 25, 2022
b71c2d7
fixed wrong pths in tests/recipes.csv
Anwarvic Sep 25, 2022
9fd2564
added CTC models to CTC README of mTEDx recipe
Anwarvic Sep 26, 2022
dd41227
fixed merged issues in tests/recipes.csv
Anwarvic Sep 26, 2022
209a60a
minor changes in README file
Anwarvic Sep 26, 2022
f22f144
removed unused variables in conf files in mTEDx recipes
Anwarvic Sep 26, 2022
b2aedf5
fixed the naming issue for transducer recipe
Anwarvic Sep 26, 2022
2 changes: 1 addition & 1 deletion recipes/CommonVoice/ASR/transducer/hparams/train_fr.yaml
@@ -183,7 +183,7 @@ log_softmax: !new:speechbrain.nnet.activations.Softmax
     apply_log: True

 transducer_cost: !name:speechbrain.nnet.losses.transducer_loss
-    use_torchaudio: True
+    framework: torchaudio
     blank_index: !ref <blank_index>

 # for MTL
2 changes: 1 addition & 1 deletion recipes/LibriSpeech/ASR/transducer/hparams/train.yaml
@@ -190,7 +190,7 @@ log_softmax: !new:speechbrain.nnet.activations.Softmax

 transducer_cost: !name:speechbrain.nnet.losses.transducer_loss
     blank_index: !ref <blank_index>
-    use_torchaudio: True
+    framework: torchaudio

 # This is the RNNLM that is used according to the Huggingface repository
 # NB: It has to match the pre-trained RNNLM!!
4 changes: 2 additions & 2 deletions recipes/TIMIT/ASR/transducer/hparams/train.yaml
@@ -150,7 +150,7 @@ output: !new:speechbrain.nnet.linear.Linear
 # apply_log: True

 compute_cost: !name:speechbrain.nnet.losses.transducer_loss
-    use_torchaudio: True
+    framework: torchaudio
     blank_index: !ref <blank_index>

 model: !new:torch.nn.ModuleList [[
@@ -216,7 +216,7 @@ train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger

 transducer_stats: !name:speechbrain.utils.metric_stats.MetricStats
     metric: !name:speechbrain.nnet.losses.transducer_loss
-        use_torchaudio: True
+        framework: torchaudio
         blank_index: !ref <blank_index>
         reduction: none

4 changes: 2 additions & 2 deletions recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml
@@ -133,7 +133,7 @@ output: !new:speechbrain.nnet.linear.Linear
 # apply_log: True

 compute_cost: !name:speechbrain.nnet.losses.transducer_loss
-    use_torchaudio: True
+    framework: torchaudio
     blank_index: !ref <blank_index>

 model: !new:torch.nn.ModuleList [[
@@ -205,7 +205,7 @@ train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger

 transducer_stats: !name:speechbrain.utils.metric_stats.MetricStats
     metric: !name:speechbrain.nnet.losses.transducer_loss
-        use_torchaudio: True
+        framework: torchaudio
         blank_index: !ref <blank_index>
         reduction: none

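Every transducer hunk above makes the same change: the boolean `use_torchaudio` flag is replaced by a string-valued `framework` key. For anyone carrying older configs forward, a minimal migration sketch follows. It is illustrative only: the helper name `migrate_transducer_kwargs` is hypothetical, and mapping `use_torchaudio: False` to `"numba"` is an assumption, since `torchaudio` is the only `framework` value that appears in these diffs.

```python
def migrate_transducer_kwargs(kwargs):
    """Return a copy of `kwargs` with the legacy flag replaced by `framework`.

    Hypothetical helper for illustration; it is not part of SpeechBrain.
    """
    migrated = dict(kwargs)  # copy so the caller's dict is left untouched
    if "use_torchaudio" in migrated:
        legacy_flag = migrated.pop("use_torchaudio")
        # ASSUMPTION: "numba" names the non-torchaudio backend.
        # setdefault keeps an explicit `framework` value if one is already set.
        migrated.setdefault("framework", "torchaudio" if legacy_flag else "numba")
    return migrated


old_kwargs = {"use_torchaudio": True, "blank_index": 0}
new_kwargs = migrate_transducer_kwargs(old_kwargs)
print(new_kwargs)
```

Using a string key rather than a boolean leaves room for more than two loss backends, which fits the stated goal of this PR (adding the pruned Fast RNNT loss).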
47 changes: 47 additions & 0 deletions recipes/mTEDx/ASR/CTC/README.md
@@ -0,0 +1,47 @@
# mTEDx ASR with CTC models.
This folder contains the scripts to train a wav2vec-based system on mTEDx. You can train either a single-language wav2vec model or a multilingual
wav2vec model. Before running this recipe, read this [README](../../README.md) file first.

**Note:**\
The wav2vec model used in this recipe is pre-trained on French.
To use another language, change the `wav2vec2_hub` value
in the `hparams/train_wav2vec.yaml` file.


# How to run

To train a single-language wav2vec model, run:
```bash
$ python train.py hparams/train_wav2vec.yaml
```

To train a multilingual wav2vec model, run:
```bash
$ python train.py hparams/train_xlsr.yaml
```
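
Both `data_folder` and `output_folder` are declared as `!PLACEHOLDER` in `hparams/train_wav2vec.yaml`, so they have to be supplied at launch time. SpeechBrain recipes generally accept `--key=value` overrides after the YAML path. The sketch below only assembles such a command line; the helper name `build_train_command` and both paths are placeholders chosen for illustration, not values from this recipe.

```python
import shlex


def build_train_command(hparams_file, **overrides):
    """Assemble a `python train.py ...` command with `--key=value` overrides."""
    cmd = ["python", "train.py", hparams_file]
    for key, value in overrides.items():
        cmd.append(f"--{key}={value}")
    return cmd


cmd = build_train_command(
    "hparams/train_wav2vec.yaml",
    data_folder="/path/to/mTEDx",          # placeholder path
    output_folder="results/mtedx_ctc_fr",  # placeholder path
)
print(shlex.join(cmd))  # copy-paste-ready shell command
```

The same mechanism should work for `wav2vec2_hub` when switching to a model pre-trained on another language.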

# Results

TODO


# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/


# **Citing SpeechBrain**
Please cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}
```
175 changes: 175 additions & 0 deletions recipes/mTEDx/ASR/CTC/hparams/train_wav2vec.yaml
@@ -0,0 +1,175 @@
# ################################
# Model: wav2vec2 + DNN + CTC
# Authors: Titouan Parcollet 2021
# Mohamed Anwar 2022
# ################################

# Seed needs to be set at top of yaml, before objects with parameters are made
seed: 1234
__set_seed: !!python/object/apply:torch.manual_seed [!ref <seed>]
output_folder: !PLACEHOLDER
wer_file: !ref <output_folder>/wer.txt
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

# HuggingFace Hub ID of the biggest LeBenchmark French wav2vec model.
wav2vec2_hub: LeBenchmark/wav2vec2-FR-7K-large

# Data files
data_folder: !PLACEHOLDER
langs:
- fr
remove_punc_cap: True # remove punctuation & capitalization from text

train_json: !ref <data_folder>/train_fr.json
valid_json: !ref <data_folder>/valid_fr.json
test_json: !ref <data_folder>/test_fr.json

# We remove train utterances longer than 20s
avoid_if_longer_than: 20.0
test_lang: fr

# Training parameters
number_of_epochs: 30
number_of_ctc_epochs: 15
lr: 1.0
lr_wav2vec: 0.0001
ctc_weight: 0.3
sorting: descending # ascending, descending
auto_mix_prec: True
sample_rate: 16000
max_grad_norm: -1
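
Review note: `avoid_if_longer_than` and `sorting` together decide which training utterances are kept and in what order they are batched. A minimal stdlib sketch of that behaviour follows. It is not the recipe's data pipeline: the function name `filter_and_sort`, the toy utterance records, and the inclusive `<=` comparison are all assumptions made for illustration.

```python
def filter_and_sort(utterances, max_duration=20.0, sorting="descending"):
    """Drop utterances longer than `max_duration`, then order by duration."""
    # ASSUMPTION: an utterance exactly `max_duration` seconds long is kept.
    kept = [u for u in utterances if u["duration"] <= max_duration]
    if sorting in ("ascending", "descending"):
        kept.sort(key=lambda u: u["duration"], reverse=(sorting == "descending"))
    return kept  # any other `sorting` value leaves the original order


# Toy records; the ids and durations are made up.
utterances = [
    {"id": "utt_a", "duration": 3.2},
    {"id": "utt_b", "duration": 25.0},  # exceeds the 20 s cap, so it is dropped
    {"id": "utt_c", "duration": 12.5},
]
kept_ids = [u["id"] for u in filter_and_sort(utterances)]
print(kept_ids)
```

Descending order puts the longest utterances in the first batches, so an out-of-memory failure tends to show up early rather than hours into an epoch.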

batch_size: 4
batch_size_valid: 2 #for valid & test

train_dataloader_opts:
    batch_size: !ref <batch_size>
    num_workers: 4

valid_dataloader_opts:
    batch_size: !ref <batch_size_valid>
    num_workers: 4

test_dataloader_opts:
    batch_size: !ref <batch_size_valid>
    num_workers: 4

# BPE parameters
token_type: char # ["unigram", "bpe", "char"]
character_coverage: 1.0

# Model parameters
activation: !name:torch.nn.LeakyReLU
wav2vec_output_dim: 1024
dnn_neurons: 1024
freeze_wav2vec: False

# Outputs
output_neurons: 161 # BPE size, index(blank/eos/bos) = 0

# Decoding parameters
# Be sure that the bos and eos index match with the BPEs ones
blank_index: 0
bos_index: 1
eos_index: 2
min_decode_ratio: 0.0
max_decode_ratio: 1.0
beam_size: 80
eos_threshold: 1.5
using_max_attn_shift: True
max_attn_shift: 140
ctc_weight_decode: 0.0
temperature: 1.50

#
# Functions and classes
#
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
    limit: !ref <number_of_epochs>

enc: !new:speechbrain.nnet.containers.Sequential
    input_shape: [null, null, !ref <wav2vec_output_dim>]
    linear1: !name:speechbrain.nnet.linear.Linear
        n_neurons: !ref <dnn_neurons>
        bias: True
    bn1: !name:speechbrain.nnet.normalization.BatchNorm1d
    activation: !new:torch.nn.LeakyReLU
    drop: !new:torch.nn.Dropout
        p: 0.15
    linear2: !name:speechbrain.nnet.linear.Linear
        n_neurons: !ref <dnn_neurons>
        bias: True
    bn2: !name:speechbrain.nnet.normalization.BatchNorm1d
    activation2: !new:torch.nn.LeakyReLU
    drop2: !new:torch.nn.Dropout
        p: 0.15
    linear3: !name:speechbrain.nnet.linear.Linear
        n_neurons: !ref <dnn_neurons>
        bias: True
    bn3: !name:speechbrain.nnet.normalization.BatchNorm1d
    activation3: !new:torch.nn.LeakyReLU

wav2vec2: !new:speechbrain.lobes.models.huggingface_wav2vec.HuggingFaceWav2Vec2
    source: !ref <wav2vec2_hub>
    output_norm: True
    freeze: !ref <freeze_wav2vec>
    save_path: !ref <save_folder>/wav2vec2_checkpoint

ctc_lin: !new:speechbrain.nnet.linear.Linear
    input_size: !ref <dnn_neurons>
    n_neurons: !ref <output_neurons>

log_softmax: !new:speechbrain.nnet.activations.Softmax
    apply_log: True

ctc_cost: !name:speechbrain.nnet.losses.ctc_loss
    blank_index: !ref <blank_index>
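
Review note: `blank_index` matters at decoding time as well as in the loss, because CTC output is read by merging repeated frame labels and then dropping blanks. The sketch below shows that collapse rule in plain Python, assuming `blank_index = 0` as configured in this file. The function name `ctc_collapse` and the toy frame sequence are illustrative; this is not the decoder the recipe calls.

```python
def ctc_collapse(frame_ids, blank_index=0):
    """Merge consecutive repeats, then remove blanks (the standard CTC rule)."""
    collapsed = []
    previous = None
    for idx in frame_ids:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != previous and idx != blank_index:
            collapsed.append(idx)
        previous = idx
    return collapsed


# Toy per-frame argmax output: the blank between the two 3s keeps them distinct.
frames = [0, 3, 3, 0, 3, 4, 4, 0]
tokens = ctc_collapse(frames)
print(tokens)
```

If the blank index used here disagreed with the tokenizer's, a real token would be silently deleted from every hypothesis, which is why the YAML comment asks for the indices to match.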

modules:
    wav2vec2: !ref <wav2vec2>
    enc: !ref <enc>
    ctc_lin: !ref <ctc_lin>

model: !new:torch.nn.ModuleList
    - [!ref <enc>, !ref <ctc_lin>]

model_opt_class: !name:torch.optim.Adadelta
    lr: !ref <lr>
    rho: 0.95
    eps: 1.e-8

wav2vec_opt_class: !name:torch.optim.Adam
    lr: !ref <lr_wav2vec>

lr_annealing_model: !new:speechbrain.nnet.schedulers.NewBobScheduler
    initial_value: !ref <lr>
    improvement_threshold: 0.0025
    annealing_factor: 0.8
    patient: 0

lr_annealing_wav2vec: !new:speechbrain.nnet.schedulers.NewBobScheduler
    initial_value: !ref <lr_wav2vec>
    improvement_threshold: 0.0025
    annealing_factor: 0.9
    patient: 0
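
Review note: both schedulers use newbob-style annealing, where the learning rate is multiplied by `annealing_factor` whenever the relative improvement of the validation metric falls below `improvement_threshold`. The class below is a simplified stdlib sketch of that rule, written for illustration; `NewBobSketch` is a hypothetical name and the code is not copied from `speechbrain.nnet.schedulers`. The validation losses in the usage lines are made up.

```python
class NewBobSketch:
    """Simplified newbob-style annealing (illustrative, not the SpeechBrain class)."""

    def __init__(self, initial_value, annealing_factor, improvement_threshold, patient=0):
        self.value = initial_value
        self.annealing_factor = annealing_factor
        self.improvement_threshold = improvement_threshold
        self.patient = patient
        self.current_patient = patient
        self.previous_metric = None

    def step(self, metric):
        """Record a new (lower-is-better) metric; return (old_value, new_value)."""
        old_value = self.value
        if self.previous_metric is not None:
            if self.previous_metric == 0:
                improvement = 0.0  # avoid division by zero
            else:
                improvement = (self.previous_metric - metric) / self.previous_metric
            if improvement < self.improvement_threshold:
                if self.current_patient == 0:
                    self.value *= self.annealing_factor
                    self.current_patient = self.patient
                else:
                    self.current_patient -= 1
        self.previous_metric = metric
        return old_value, self.value


# Same settings as `lr_annealing_model` above; the losses are invented.
scheduler = NewBobSketch(1.0, annealing_factor=0.8, improvement_threshold=0.0025)
history = [scheduler.step(loss)[1] for loss in (10.0, 9.0, 8.99, 8.5)]
print(history)
```

With `patient: 0`, a single stalled epoch is enough to trigger annealing, so the 0.8 and 0.9 factors set how quickly each learning rate decays once validation loss plateaus.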

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        wav2vec2: !ref <wav2vec2>
        model: !ref <model>
        scheduler_model: !ref <lr_annealing_model>
        scheduler_wav2vec: !ref <lr_annealing_wav2vec>
        counter: !ref <epoch_counter>

train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

tensorboard_logger: !new:speechbrain.utils.train_logger.TensorboardLogger
    save_dir: !ref <output_folder>/logs

error_rate_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats

cer_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats
    split_tokens: True
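
Review note: the two metric objects differ only in `split_tokens`, which switches the unit of comparison from words (WER) to characters (CER). The sketch below illustrates the difference with a plain Levenshtein distance. It is a stand-in, not `ErrorRateStats`: the function names are hypothetical, and joining words with a `"_"` space token before splitting into characters is an assumption about how spaces are counted.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        current = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_tok == hyp_tok else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def error_rate(ref_words, hyp_words, split_tokens=False, space_token="_"):
    """Word error rate, or character error rate when `split_tokens` is True."""
    if not ref_words:
        raise ValueError("reference must not be empty")
    if split_tokens:
        # ASSUMPTION: spaces count as tokens, represented by `space_token`.
        ref = list(space_token.join(ref_words))
        hyp = list(space_token.join(hyp_words))
    else:
        ref, hyp = list(ref_words), list(hyp_words)
    return edit_distance(ref, hyp) / len(ref)


reference = "le chat dort".split()
hypothesis = "le chien dort".split()
wer = error_rate(reference, hypothesis)
cer = error_rate(reference, hypothesis, split_tokens=True)
print(f"WER={wer:.3f} CER={cer:.3f}")
```

One wrong word out of three costs a third of the WER but only a quarter of the CER here, which is why the two numbers are usually reported side by side.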