This repository has been archived by the owner on May 2, 2023. It is now read-only.

paguseva/asr-homework


ASR project

Installation guide

  1. Install Python packages
pip install -r ./requirements.txt
  2. Download checkpoint and default config
mkdir default_test_model
gdown --id 1FMoIxP_rQA4gXQ9395FZupQ4juKOu7LZ -O default_test_model/checkpoint.pth  # checkpoint
gdown --id 1-VJb5kP2Pa7IL59WkQ38pLJkQLrxn6Bx -O default_test_model/config.json  # default config
  3. No other manual downloads are needed: any class that requires external resources downloads them itself. The classes that do so:
  • BackgroundNoise from hw_asr/augmentations/wave_augmentations downloads noise recordings for augmentation (line 22)
  • CTCBPETextEncoder from hw_asr/text_encoder downloads a pretrained BPE model (line 19)
  • CTCCharTextEncoder from hw_asr/text_encoder downloads a pretrained KenLM model and a vocabulary for shallow fusion (lines 42 and 68)
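The download-on-demand behaviour described in step 3 can be sketched roughly as follows. This is only an illustration of the general pattern, not the code used in hw_asr: the helper name ensure_resource, its arguments, and the ".part" temporary suffix are all assumptions made for this example.

```python
import urllib.request
from pathlib import Path


def ensure_resource(url: str, target: Path) -> Path:
    """Download `url` to `target` unless the file already exists.

    Hypothetical helper: the classes in hw_asr may fetch their
    resources differently.
    """
    target = Path(target)
    if target.exists():
        # Already downloaded earlier, so reuse the local copy.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so that an interrupted transfer
    # never leaves a truncated file at the final path.
    tmp = target.with_name(target.name + ".part")
    urllib.request.urlretrieve(url, str(tmp))
    tmp.replace(target)
    return target
```

A class following this pattern would call the helper once in its constructor and then open the returned path, so repeated runs do not download the file again.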

About

This repository includes implementations of:

  1. LSTM
  2. QuartzNet [1]
  3. Deep Speech 2 [2]

Several decoding strategies and vocabularies are also implemented. The language model is KenLM.
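As a point of reference, the simplest CTC decoding strategy is greedy (best-path) decoding. The sketch below is not taken from this repository: the function name, the alphabet, and the choice of index 0 for the blank symbol are assumptions made for the example.

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    blank: int = 0,
) -> str:
    """Greedy CTC decoding over per-frame class scores.

    `alphabet[i]` is the symbol for class index i; `blank` is the index
    of the CTC blank symbol (assumed to be 0 here).
    """
    # 1. Pick the highest-scoring class for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse runs of identical consecutive indices into one.
    collapsed = [idx for idx, _ in groupby(best)]
    # 3. Drop blanks; what remains is the decoded symbol sequence.
    return "".join(alphabet[idx] for idx in collapsed if idx != blank)


# Frames whose best classes are a, a, blank, a, b, b.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
print(ctc_greedy_decode(frames, ["^", "a", "b"]))  # prints "aab"
```

The blank between the two runs of "a" is what keeps the doubled letter. Beam search and shallow fusion with a language model such as KenLM replace step 1 with a search over several candidate prefixes.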

References

  1. Samuel Kriman et al. QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions. 2019
  2. Dario Amodei et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. 2015
