Add LSTM for the multi-dataset setup. #558

Merged
merged 4 commits into from Sep 16, 2022
8 changes: 8 additions & 0 deletions .flake8
@@ -22,3 +22,11 @@ exclude =
**/data/**,
icefall/shared/make_kn_lm.py,
icefall/__init__.py

ignore =
# E203 white space before ":"
E203,
# W503 line break before binary operator
W503,
# E226 missing whitespace around arithmetic operator
E226,
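The three newly ignored checks correspond to formatting that auto-formatters such as black commonly produce. A small illustrative snippet (not from the repo) showing the kind of code each rule would otherwise flag:

```python
x = [1, 2, 3, 4]

# E203: whitespace before ":", a style black can emit for slices.
head = x[1 : 3]

# W503: line break before a binary operator, now the PEP 8-preferred style.
total = (x[0]
         + x[1])

# E226: missing whitespace around an arithmetic operator.
area = 2*3
```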
1 change: 1 addition & 0 deletions egs/librispeech/ASR/README.md
@@ -26,6 +26,7 @@ The following table lists the differences among them.
| `conv_emformer_transducer_stateless` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model |
| `conv_emformer_transducer_stateless2` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer with simplified memory for streaming ASR + mechanisms in reworked model |
| `lstm_transducer_stateless` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model |
| `lstm_transducer_stateless2` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model + GigaSpeech (multi-dataset setup) |

The decoder in `transducer_stateless` is modified from the paper
[Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
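The `Embedding + Conv1d` decoders in the table carry no recurrent state: the prediction network only sees a small fixed window of previous tokens. A minimal NumPy sketch of that idea (toy dimensions and random weights, not the icefall implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, context_size = 500, 8, 2  # toy sizes

embedding = rng.standard_normal((vocab_size, embed_dim))
conv_w = rng.standard_normal((embed_dim, context_size))  # one kernel per channel

def stateless_decoder(prev_tokens):
    """Embed the last `context_size` tokens and mix them with a depthwise
    1-D convolution. No state is carried between calls, which is what
    makes the prediction network "stateless"."""
    emb = embedding[prev_tokens[-context_size:]]  # (context_size, embed_dim)
    return (emb.T * conv_w).sum(axis=1)           # (embed_dim,)

out = stateless_decoder([17, 42, 99])
```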
70 changes: 70 additions & 0 deletions egs/librispeech/ASR/RESULTS.md
@@ -1,5 +1,75 @@
## Results

#### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T + multi-dataset)

[lstm_transducer_stateless2](./lstm_transducer_stateless2)

See <https://github.com/k2-fsa/icefall/pull/558> for more details.


The WERs are:

| | test-clean | test-other | comment |
|-------------------------------------|------------|------------|-------------------------|
| greedy search (max sym per frame 1) | 2.78 | 7.36 | --iter 468000 --avg 16 |
| modified_beam_search | 2.73 | 7.15 | --iter 468000 --avg 16 |
| fast_beam_search | 2.76 | 7.31 | --iter 468000 --avg 16 |
| greedy search (max sym per frame 1) | 2.77 | 7.35 | --iter 472000 --avg 18 |
| modified_beam_search | 2.75 | 7.08 | --iter 472000 --avg 18 |
| fast_beam_search | 2.77 | 7.29 | --iter 472000 --avg 18 |

The training command is:

```bash
#!/usr/bin/env bash

export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./lstm_transducer_stateless2/train.py \
--world-size 8 \
--num-epochs 35 \
--start-epoch 1 \
--full-libri 1 \
--exp-dir lstm_transducer_stateless2/exp \
--max-duration 500 \
--use-fp16 0 \
--lr-epochs 10 \
--num-workers 2 \
--giga-prob 0.9
```
**Note**: Training was stopped manually after `epoch-18.pt` was saved. It was also resumed once, after `epoch-9.pt`.
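The `--giga-prob 0.9` flag controls how often a training batch comes from GigaSpeech rather than LibriSpeech. A toy sketch of that sampling scheme (the actual recipe interleaves lhotse samplers; the names here are illustrative):

```python
import random

def mixed_batches(libri, giga, giga_prob=0.9, seed=0):
    """Draw each batch from GigaSpeech with probability `giga_prob`,
    otherwise from LibriSpeech; stop when the chosen source runs out.
    A toy stand-in for the recipe's multi-dataset batching."""
    rng = random.Random(seed)
    libri_it, giga_it = iter(libri), iter(giga)
    while True:
        source = giga_it if rng.random() < giga_prob else libri_it
        try:
            yield next(source)
        except StopIteration:
            return

batches = list(mixed_batches(["L1", "L2"], [f"G{i}" for i in range(50)]))
```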

The tensorboard log can be found at
<https://tensorboard.dev/experiment/1ziQ2LFmQY2mt4dlUr5dyA/>

The decoding command is:
```bash
for m in greedy_search fast_beam_search modified_beam_search; do
for iter in 472000; do
for avg in 8 10 12 14 16 18; do
./lstm_transducer_stateless2/decode.py \
--iter $iter \
--avg $avg \
--exp-dir lstm_transducer_stateless2/exp \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method $m \
--use-averaged-model True \
--beam 4 \
--max-contexts 4 \
--max-states 8 \
--beam-size 4
done
done
done
```
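Decoding with `--avg N --use-averaged-model True` uses parameters averaged over several checkpoints. The core idea, sketched with plain dicts standing in for state dicts (a hypothetical helper, not the icefall API):

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Elementwise mean of parameter tensors across checkpoints,
    the idea behind decoding with `--avg N`."""
    return {k: np.mean([c[k] for c in checkpoints], axis=0)
            for k in checkpoints[0]}

ckpts = [{"w": np.array([1.0, 3.0])}, {"w": np.array([3.0, 5.0])}]
avg = average_checkpoints(ckpts)  # avg["w"] -> array([2., 4.])
```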

Pretrained models, training logs, decoding logs, and decoding results
are available at
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03>

#### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T)

[lstm_transducer_stateless](./lstm_transducer_stateless)
2 changes: 1 addition & 1 deletion egs/librispeech/ASR/lstm_transducer_stateless/lstm.py
@@ -773,7 +773,7 @@ def _get_random_mixed_weights(
"""
logprobs = (
torch.randn(num_frames, self.num_inputs, dtype=dtype, device=device)
* self.stddev
* self.stddev # noqa
)
logprobs[:, -1] += self.final_log_weight
return logprobs.softmax(dim=1)
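Standalone, the function above draws Gaussian logits scaled by a standard deviation, biases the final input, and normalizes each frame with a softmax. A NumPy sketch of the same computation (the `stddev` and `final_log_weight` values are illustrative, not the module's actual attributes):

```python
import numpy as np

def get_random_mixed_weights(num_frames, num_inputs,
                             stddev=2.0, final_log_weight=1.2):
    """Random per-frame mixing weights: Gaussian logits scaled by
    `stddev`, a bias added to the final input, then a row-wise softmax."""
    logprobs = np.random.randn(num_frames, num_inputs) * stddev
    logprobs[:, -1] += final_log_weight
    exp = np.exp(logprobs - logprobs.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

weights = get_random_mixed_weights(4, 3)
```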
Empty file.