My implementation of the Speech-Transformer model in PyTorch
- PyTorch 2.0
- Torchaudio
- Spectrograms (see the pipeline sketch after this list)
- Transformer
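For orientation, here is a minimal sketch, not the actual code of this repository, of how these pieces fit together: an 80-dimensional log-mel spectrogram computed with torchaudio and fed through a stock `torch.nn.Transformer`. The synthetic waveform, hyperparameters, and token ids below are illustrative assumptions.

```python
import torch
import torchaudio

# Synthetic 1-second waveform at 16 kHz; real audio would come from
# torchaudio.load("path/to/file.wav") instead.
sample_rate = 16000
waveform = torch.randn(1, sample_rate)

# Waveform -> 80-dim log-mel spectrogram.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=400,        # ~25 ms window at 16 kHz
    hop_length=160,   # ~10 ms hop at 16 kHz
    n_mels=80,
)
log_mel = torchaudio.transforms.AmplitudeToDB()(mel(waveform))
features = log_mel.squeeze(0).transpose(0, 1)      # (time, n_mels)

# Encoder-decoder over spectrogram frames and token embeddings.
# Positional encodings and attention masks are omitted for brevity.
d_model, vocab_size = 256, 1000                    # assumed sizes
input_proj = torch.nn.Linear(80, d_model)          # mel bins -> model dim
token_emb = torch.nn.Embedding(vocab_size, d_model)
transformer = torch.nn.Transformer(
    d_model=d_model,
    nhead=4,
    num_encoder_layers=6,
    num_decoder_layers=6,
    batch_first=True,
)
out_proj = torch.nn.Linear(d_model, vocab_size)

src = input_proj(features).unsqueeze(0)            # (1, time, d_model)
tgt = token_emb(torch.tensor([[1, 5, 7]]))         # dummy target tokens
logits = out_proj(transformer(src, tgt))           # (1, 3, vocab_size)
print(logits.shape)
```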
Based on the paper:

```bibtex
@INPROCEEDINGS{8462506,
  author    = {Dong, Linhao and Xu, Shuang and Xu, Bo},
  booktitle = {2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title     = {Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition},
  year      = {2018},
  pages     = {5884--5888},
  doi       = {10.1109/ICASSP.2018.8462506}
}
```
- Train on a large corpus and evaluate metrics
- Implement the 2D-attention proposed in the paper (left out so far, since it yields only minor metric changes in the paper's evaluation results)
- Implement beam search instead of greedy search (see the sketch below)
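A minimal sketch of the kind of beam search that could replace greedy decoding; the `step_fn` interface (token prefix -> log-probabilities over the vocabulary) is a hypothetical stand-in for the Speech-Transformer decoder, not the decoder of this repository.

```python
import torch

SOS, EOS = 1, 2  # assumed special token ids

def beam_search(step_fn, beam_size=4, max_len=50):
    # Each hypothesis is (token_list, cumulative_log_prob).
    beams = [([SOS], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            log_probs = step_fn(tokens)              # (vocab_size,) log-probs
            topk = torch.topk(log_probs, beam_size)
            for lp, idx in zip(topk.values, topk.indices):
                candidates.append((tokens + [idx.item()], score + lp.item()))
        # Keep the best expansions, moving hypotheses that emitted EOS aside.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == EOS else beams).append((tokens, score))
        if not beams:
            break
    # Fall back to unfinished beams if nothing emitted EOS in time.
    best = max(finished or beams, key=lambda c: c[1] / len(c[0]))  # length-normalized
    return best[0]

# Toy usage with a random "decoder" over a 10-token vocabulary.
if __name__ == "__main__":
    vocab_size = 10
    def dummy_step(tokens):
        return torch.log_softmax(torch.randn(vocab_size), dim=-1)
    print(beam_search(dummy_step))
```

Keeping several hypotheses per step and length-normalizing the final scores is what usually lets beam search recover transcriptions that greedy decoding prunes too early.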