Add LSTM for the multi-dataset setup. #558

Merged
merged 4 commits into from Sep 16, 2022
8 changes: 8 additions & 0 deletions .flake8
@@ -22,3 +22,11 @@ exclude =
**/data/**,
icefall/shared/make_kn_lm.py,
icefall/__init__.py

ignore =
# E203 white space before ":"
E203,
# W503 line break before binary operator
W503,
# E226 missing whitespace around arithmetic operator
E226,
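The three newly ignored checks correspond to formatting that auto-formatters such as black commonly produce. A small illustrative snippet (not from the repo) showing the kind of code each rule would otherwise flag:

```python
x = [1, 2, 3, 4]

# E203: whitespace before ":", a style black can emit for slices.
head = x[1 : 3]

# W503: line break before a binary operator, now the PEP 8-preferred style.
total = (x[0]
         + x[1])

# E226: missing whitespace around an arithmetic operator.
area = 2*3
```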
1 change: 1 addition & 0 deletions egs/librispeech/ASR/README.md
@@ -26,6 +26,7 @@ The following table lists the differences among them.
| `conv_emformer_transducer_stateless` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer for streaming ASR + mechanisms in reworked model |
| `conv_emformer_transducer_stateless2` | ConvEmformer | Embedding + Conv1d | Using ConvEmformer with simplified memory for streaming ASR + mechanisms in reworked model |
| `lstm_transducer_stateless` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model |
| `lstm_transducer_stateless2` | LSTM | Embedding + Conv1d | Using LSTM with mechanisms in reworked model + GigaSpeech (multi-dataset setup) |

The decoder in `transducer_stateless` is modified from the paper
[Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).
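The `Embedding + Conv1d` decoders in the table carry no recurrent state: the prediction network only sees a small fixed window of previous tokens. A minimal NumPy sketch of that idea (toy dimensions and random weights, not the icefall implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, context_size = 500, 8, 2  # toy sizes

embedding = rng.standard_normal((vocab_size, embed_dim))
conv_w = rng.standard_normal((embed_dim, context_size))  # one kernel per channel

def stateless_decoder(prev_tokens):
    """Embed the last `context_size` tokens and mix them with a depthwise
    1-D convolution. No state is carried between calls, which is what
    makes the prediction network "stateless"."""
    emb = embedding[prev_tokens[-context_size:]]  # (context_size, embed_dim)
    return (emb.T * conv_w).sum(axis=1)           # (embed_dim,)

out = stateless_decoder([17, 42, 99])
```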
70 changes: 70 additions & 0 deletions egs/librispeech/ASR/RESULTS.md
@@ -1,5 +1,75 @@
## Results

#### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T + multi-dataset)

[lstm_transducer_stateless2](./lstm_transducer_stateless2)

See <https://github.com/k2-fsa/icefall/pull/558> for more details.


The WERs are:

| | test-clean | test-other | comment |
|-------------------------------------|------------|------------|-------------------------|
| greedy search (max sym per frame 1) | 2.78 | 7.36 | --iter 468000 --avg 16 |
| modified_beam_search | 2.73 | 7.15 | --iter 468000 --avg 16 |
| fast_beam_search | 2.76 | 7.31 | --iter 468000 --avg 16 |
| greedy search (max sym per frame 1) | 2.77 | 7.35 | --iter 472000 --avg 18 |
| modified_beam_search | 2.75 | 7.08 | --iter 472000 --avg 18 |
| fast_beam_search | 2.77 | 7.29 | --iter 472000 --avg 18 |

The training command is:

```bash
#!/usr/bin/env bash

export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"

./lstm_transducer_stateless2/train.py \
--world-size 8 \
--num-epochs 35 \
--start-epoch 1 \
--full-libri 1 \
--exp-dir lstm_transducer_stateless2/exp \
--max-duration 500 \
--use-fp16 0 \
--lr-epochs 10 \
--num-workers 2 \
--giga-prob 0.9
```
**Note**: Training was stopped manually after `epoch-18.pt` was saved. It was also resumed once, after `epoch-9.pt`.
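The `--giga-prob 0.9` flag controls how often a training batch comes from GigaSpeech rather than LibriSpeech. A toy sketch of that sampling scheme (the actual recipe interleaves lhotse samplers; the names here are illustrative):

```python
import random

def mixed_batches(libri, giga, giga_prob=0.9, seed=0):
    """Draw each batch from GigaSpeech with probability `giga_prob`,
    otherwise from LibriSpeech; stop when the chosen source runs out.
    A toy stand-in for the recipe's multi-dataset batching."""
    rng = random.Random(seed)
    libri_it, giga_it = iter(libri), iter(giga)
    while True:
        source = giga_it if rng.random() < giga_prob else libri_it
        try:
            yield next(source)
        except StopIteration:
            return

batches = list(mixed_batches(["L1", "L2"], [f"G{i}" for i in range(50)]))
```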

The tensorboard log can be found at
<https://tensorboard.dev/experiment/1ziQ2LFmQY2mt4dlUr5dyA/>

The decoding command is:
```bash
for m in greedy_search fast_beam_search modified_beam_search; do
for iter in 472000; do
for avg in 8 10 12 14 16 18; do
./lstm_transducer_stateless2/decode.py \
--iter $iter \
--avg $avg \
--exp-dir lstm_transducer_stateless2/exp \
--max-duration 600 \
--num-encoder-layers 12 \
--rnn-hidden-size 1024 \
--decoding-method $m \
--use-averaged-model True \
--beam 4 \
--max-contexts 4 \
--max-states 8 \
--beam-size 4
done
done
done
```
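Decoding with `--avg N --use-averaged-model True` uses parameters averaged over several checkpoints. The core idea, sketched with plain dicts standing in for state dicts (a hypothetical helper, not the icefall API):

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Elementwise mean of parameter tensors across checkpoints,
    the idea behind decoding with `--avg N`."""
    return {k: np.mean([c[k] for c in checkpoints], axis=0)
            for k in checkpoints[0]}

ckpts = [{"w": np.array([1.0, 3.0])}, {"w": np.array([3.0, 5.0])}]
avg = average_checkpoints(ckpts)  # avg["w"] -> array([2., 4.])
```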

Pretrained models, training logs, decoding logs, and decoding results
are available at
<https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03>

#### LibriSpeech BPE training results (Pruned Stateless LSTM RNN-T)

[lstm_transducer_stateless](./lstm_transducer_stateless)
2 changes: 1 addition & 1 deletion egs/librispeech/ASR/lstm_transducer_stateless/lstm.py
@@ -773,7 +773,7 @@ def _get_random_mixed_weights(
"""
logprobs = (
torch.randn(num_frames, self.num_inputs, dtype=dtype, device=device)
* self.stddev
* self.stddev # noqa
)
logprobs[:, -1] += self.final_log_weight
return logprobs.softmax(dim=1)
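Standalone, the function above draws Gaussian logits scaled by a standard deviation, biases the final input, and normalizes each frame with a softmax. A NumPy sketch of the same computation (the `stddev` and `final_log_weight` values are illustrative, not the module's actual attributes):

```python
import numpy as np

def get_random_mixed_weights(num_frames, num_inputs,
                             stddev=2.0, final_log_weight=1.2):
    """Random per-frame mixing weights: Gaussian logits scaled by
    `stddev`, a bias added to the final input, then a row-wise softmax."""
    logprobs = np.random.randn(num_frames, num_inputs) * stddev
    logprobs[:, -1] += final_log_weight
    exp = np.exp(logprobs - logprobs.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

weights = get_random_mixed_weights(4, 3)
```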
Empty file.