diff --git a/README.md b/README.md
index 99eabe4..1191a78 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ WER are we? An attempt at tracking states of the art(s) and recent results on sp
 | 1.9% | 3.9% | [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/pdf/2005.08100.pdf) | May 2020 | Convolution-augmented-Transformer(Conformer) + 3-layer LSTM LM (data augmentation:SpecAugment) |
 | 1.9% | 4.1% | [ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context](https://arxiv.org/pdf/2005.03191.pdf) | May 2020 | CNN-RNN-Transducer(ContextNet) + 3-layer LSTM LM (data augmentation:SpecAugment) |
 | 2.0% | 4.1% | [End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures](https://arxiv.org/abs/1911.08460) | November 2019 | Conv+Transformer AM (10k word pieces) with ConvLM decoding and Transformer rescoring + 60k hours unlabeled |
+| 2.1% | 4.1% | [Efficient Training of Neural Transducer for Speech Recognition](https://arxiv.org/abs/2204.10586) | May 2022 | Conformer Transducer (efficient 3-stage progressive training: 35 epochs in total) + Transformer LM |
 | 2.3% | 4.9% | [Transformer-based Acoustic Modeling for Hybrid Speech Recognition](https://arxiv.org/abs/1910.09799) | October 2019 | Transformer AM (chenones) + 4-gram LM + Neural LM rescore (data augmentation:Speed perturbation and SpecAugment) |
 | 2.3% | 5.0% | [RWTH ASR Systems for LibriSpeech: Hybrid vs Attention](https://arxiv.org/abs/1905.03072) | September 2019, Interspeech | HMM-DNN + lattice-based sMBR + LSTM LM + Transformer LM rescoring (no data augmentation) |
 | 2.3% | 5.2% | [End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures](https://arxiv.org/abs/1911.08460) | November 2019 | Conv+Transformer AM (10k word pieces) with ConvLM decoding and Transformer rescoring |