APAM - Adaptation of Pretrained Acoustic Models

The APAM toolkit is built on PyTorch and provides recipes to adapt pretrained acoustic models with a variety of sequence discriminative training criteria.


Table of Contents

  • Introduction
  • High-Level Library Structure
  • Installation
  • Pretrained Models Supported
  • Current Recipes
  • Citation
  • References


Introduction

The library structure is inspired by the S3PRL library. In keeping with the terminology of S3PRL, pretrained models are referred to as upstream models. A separate downstream model is added on top of the upstream model, and the combination is used as the acoustic model for ASR training.


High-Level Library Structure

The library provides various runners (trainers) that take care of training acoustic models.

The runner takes as input:

  • asr_config: defines experiment parameters such as the learning rate, optimizer, number of epochs, etc.
  • ckpt: path to the pretrained model checkpoint
  • upconfig: configuration of the pretrained upstream model
  • get_model: a function that creates the upstream and downstream models using the above parameters
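
To make these inputs concrete, here is a minimal sketch of how they might be assembled before being handed to a runner; the file names and the checkpoint path below are hypothetical and used only for illustration:

import yaml

# Experiment configuration: learning rate, optimizer, number of epochs, etc.
with open("asr_config.yaml") as f:        # hypothetical file name
    asr_config = yaml.safe_load(f)

# Configuration of the pretrained upstream model.
with open("upstream_config.yaml") as f:   # hypothetical file name
    upconfig = yaml.safe_load(f)

# Path to the pretrained model checkpoint (hypothetical path).
ckpt = "checkpoints/pretrained_upstream.pt"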

The idea is to reuse various pretrained models, such as TERA and wav2vec, through decoupled upstream and downstream models. This is enabled by writing simple scripts to load these pretrained models. Examples can be found in the pretrained folder of the source code.
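
For illustration, a get_model function along these lines could wrap a pretrained upstream encoder with a small downstream head. This is a hedged sketch: loading the full module with torch.load and the "hidden_dim"/"num_outputs" config keys are assumptions, not the toolkit's actual interfaces.

import torch
import torch.nn as nn

def get_model(ckpt, upconfig):
    # Restore the pretrained upstream encoder from the checkpoint.
    # Here the checkpoint is assumed to hold a fully serialized module;
    # the scripts in the pretrained folder handle this per model type.
    upstream = torch.load(ckpt, map_location="cpu")

    # Downstream acoustic model: a simple projection from upstream
    # features to the output units of the ASR objective.
    downstream = nn.Linear(upconfig["hidden_dim"], upconfig["num_outputs"])
    return upstream, downstream

A runner is then built from the four pieces above, e.g. runner = Runner(asr_config, ckpt, upconfig, get_model), where Runner stands in for whichever trainer the recipe uses.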


Installation

Dependencies

  • Python 3 or later
  • Required packages and their uses are listed below:
torch                        # deep neural networks
pytorch-fast-transformers    # fast clustered attention
pkwrap                       # lfmmi loss
librosa                      # audio file reading
yaml                         # config parser
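
Assuming the standard PyPI package names (the yaml module is provided by the pyyaml package), the common dependencies can be installed with pip, for example:

pip install torch librosa pyyaml

pytorch-fast-transformers and pkwrap are covered by the dedicated instructions below.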

We recommend installing the latest version of fast transformers using the following command:

pip install git+https://github.com/idiap/fast-transformers

To install pkwrap, follow the instructions in the pkwrap repository.


Pretrained Models Supported

At the moment, we support the following pretrained models:

Masked Acoustic Model

We provide a pretrained model trained with the masked language modeling objective described in "TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech".

The pretrained model is available here.


Current Recipes

At the moment, we only support flat-start lattice-free MMI (LF-MMI) training. The following recipes can be found in the examples folder. For more details on how to run them, follow the steps in the README files in examples.

Librispeech 100h

We provide recipes to train an acoustic model using the 100-hour subset of Librispeech data and pretrained acoustic models based on

  1. Masked Acoustic Model
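
Schematically, adapting a pretrained model with a sequence-discriminative criterion reduces to a standard PyTorch training loop. In the sketch below, lfmmi_loss is only a placeholder for the LF-MMI objective provided through pkwrap, and train_loader, the model objects, and the config key are carried over from the hypothetical sketches above:

import torch

optimizer = torch.optim.Adam(
    list(upstream.parameters()) + list(downstream.parameters()),
    lr=asr_config["lr"],  # assumed config key
)

for features, supervision in train_loader:    # hypothetical data loader
    outputs = downstream(upstream(features))  # upstream features -> outputs
    loss = lfmmi_loss(outputs, supervision)   # placeholder for the LF-MMI loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()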

Citation

If you find this library useful, please cite the relevant work(s) below:

@misc{vyas2020latticefree,
    title={Lattice-Free MMI Adaptation Of Self-Supervised Pretrained Acoustic Models}, 
    author={Apoorv Vyas and Srikanth Madikeri and Hervé Bourlard},
    year={2020},
    eprint={2012.14252},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

References

Please note that this list is not exhaustive; we only provide references to a few key works that this library uses. For a more complete list, please take a look at our published reports based on this library.

@inproceedings{paszke2019pytorch,
    title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
    author = {Paszke, Adam and others},
    booktitle = {Advances in Neural Information Processing Systems 32},
    year = {2019},
}
@article{hadian2018flat,
    author={Hossein Hadian and others},
    title={Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR},
    year={2018},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
}
@misc{madikeri2020pkwrap,
    title={Pkwrap: a PyTorch Package for LF-MMI Training of Acoustic Models}, 
    author={Srikanth Madikeri and Sibo Tong and Juan Zuluaga-Gomez and Apoorv Vyas and Petr Motlicek and Herv{\'e} Bourlard},
    year={2020},
    eprint={2010.03466},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}
@inproceedings{vyas2020fast,
    author = {Vyas, Apoorv and Katharopoulos, Angelos and Fleuret, Fran\c{c}ois},
    title = {Fast Transformers with Clustered Attention},
    booktitle = {Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS)},
    year = {2020}
}
@misc{s3prl,
    author = {Andy T. Liu and Shu-wen Yang},
    title = {S3PRL: The Self-Supervised Speech Pre-training and Representation Learning Toolkit},
    year = {2020},
    publisher = {GitHub},
    journal = {GitHub repository},
    url = {https://github.com/s3prl/s3prl}
}
