[WIP] Lhotse/K2 example #45

Open · wants to merge 119 commits into base: main
Changes from all commits (119 commits)
5d1001d
initial commit
freewym Dec 2, 2018
dbb20ad
asr models related
freewym Dec 15, 2018
d6033af
decoding related
freewym Dec 17, 2018
eadf2a8
code adaptation/changes according to the commits from Dec 24, 2018 to…
freewym Jan 13, 2019
6148a8a
wsj recipe and other fixes
freewym Jan 14, 2019
ee70584
code adaptation/changes according to the commits from Jan 24, 2019 to…
freewym Jan 26, 2019
93c1de4
fix
freewym Jan 27, 2019
e864565
validation on wer
freewym Feb 7, 2019
8782e5d
environment configurations
freewym Feb 8, 2019
9363a71
code adaptation/changes according to the commits from Jan 29, 2019 to…
freewym Feb 8, 2019
0802783
Add wsj data prep recipe from kaldi and espnet
freewym Feb 9, 2019
d276c82
code adaptation/changes according to the commits from Feb 12, 2019 to…
freewym Feb 22, 2019
c1a54da
code adaptation/changes according to the commits from Feb 23, 2019 to…
freewym Mar 3, 2019
47dc107
add lr scheduler that allows to set epoch from which lr starts to decay
freewym Mar 8, 2019
eb1d55f
refactor speech_tools.utils.Tokenizer
freewym Mar 9, 2019
f0930b1
lm training
freewym Mar 9, 2019
0e2926b
code adaptation/changes according to the commits from Apr 1, 2019 to …
freewym Apr 4, 2019
9e479ca
word lm related; add unigram/temporal label smoothing and update the wsj
freewym Apr 15, 2019
cd8803d
add scheduled sampling training support
freewym May 26, 2019
82a1175
add bpe support & librispeech recipe
freewym May 26, 2019
3ea6bb9
pure batch decoding with LookAhead Word LM
freewym May 27, 2019
36baed5
librispeech recipe fix; code adaptation/changes according to the comm…
freewym May 31, 2019
f4bcc7f
add Transformer and FConv model for ASR
freewym Jun 16, 2019
9eb1020
LM arch changes
freewym Jun 20, 2019
9745aa7
swbd
Jun 21, 2019
7ae0222
add coverage to librispeech recipe; make best metric for ASR configu…
freewym Jun 26, 2019
6ab3bd5
fix swbd recipe; code adaptation/changes according to the commits fro…
freewym Jul 4, 2019
236c927
revise speech_{transformer/fconv} code
freewym Jul 21, 2019
3b38acd
improve swbd recipe; continue training while lr is no less than --min…
freewym Jul 26, 2019
833ad03
Relicense Espresso under MIT license
freewym Aug 2, 2019
2777ac7
add --eos-factor for beam search to alleviate the problem of too shor…
freewym Aug 2, 2019
5cf1c41
tokenize each sentence such that it ends with <space>; modify look-ah…
freewym Aug 7, 2019
b2d3ef2
switch to pip install sentencepiece; modify Librispeech/SWBD recipes …
freewym Aug 20, 2019
1c19fe5
update tensorized tree implementation
ctongfei Sep 8, 2019
f21c428
switch to pip install kaldi_io
freewym Sep 11, 2019
bac7415
Update README.md; add logo; slightly change LM weight and beam size for
ctongfei Sep 16, 2019
91d9fc7
compensate for the removal of torch.rand() from distributed_init() re…
freewym Sep 20, 2019
6a8f4d3
add backoff smoothing for unigram label smoothing
freewym Sep 25, 2019
32825b5
a better LM for Librispeech yielding better WERs; code
freewym Sep 28, 2019
c63caa8
code adaptation/changes according to the commits on Sep 30, 2019
freewym Sep 30, 2019
0e610d5
set --distributed-port=-1 if ngpus=1; code adaptation/changes accordi…
freewym Oct 13, 2019
67fdff1
change warmup scheduling for ReduceLROnPlateauV2; code adaptation/cha…
freewym Oct 19, 2019
aab85e5
remove warmup code in ReduceLROnPlateauV2 as fairseq just added it; c…
freewym Oct 23, 2019
3018d20
add gpu.conf for SGE qsub
freewym Oct 30, 2019
a1b76df
Fixed error when using fp16
Shujian2015 Nov 2, 2019
75581ae
code adaptation/changes according to the commits on Nov 8, 2019
freewym Nov 9, 2019
0e91c3b
code adaptation/changes according to the commits on Nov 26, 2019
freewym Nov 27, 2019
7f3a0bc
code adaptation/changes according to the commits on Dec 11, 2019
freewym Dec 11, 2019
5744fb9
allows text2vocabulary.py to accept an existing vocabulary with its fi…
freewym Dec 15, 2019
4617036
re-organize the codebase to isolate espresso from fairseq
freewym Dec 16, 2019
9dd8992
remove coverage term for beam search decoding as it has been superced…
freewym Dec 17, 2019
69dbd15
fix bugs causing build failure; a bunch of lint changes; rename
freewym Dec 18, 2019
732ab08
scheduled sampling rate scheduler
ctongfei Dec 20, 2019
ad0157e
decouple scheduled sampling rate scheduler; rename all appearances of…
freewym Dec 21, 2019
13e2d62
code adaptation/changes according to the commits from Dec 21 to Dec 2…
freewym Dec 26, 2019
390d157
move the code of computing prob mask of temporal label smoothing into…
freewym Dec 26, 2019
3c5b3c9
isolate greedy search code from criterions (#19)
freewym Dec 26, 2019
54870cb
remove the need to pass tgt dataset to criterions by adding a raw tex…
freewym Dec 27, 2019
3677002
code adaptation/changes according to the commits from Jan 3 to Jan 6,…
freewym Jan 4, 2020
f1bed6f
code adaptation/changes according to the commits from Jan 11 to Jan 1…
freewym Jan 11, 2020
458828d
code adaptation/changes according to the commits from Jan 16 to Jan 1…
freewym Jan 17, 2020
293a068
move WER validation with greedy decoding code from criterion to task;…
freewym Jan 19, 2020
80f564d
code adaptation/changes according to the commits on Jan 20, 2020; cos…
freewym Jan 21, 2020
d488b92
isolate LSTMLanguageModel from speech_lstm.py and rename it to LSTMLa…
freewym Jan 25, 2020
ad102bd
code adaptation/changes according to the commits on Jan 30, 2020
freewym Jan 31, 2020
19ec973
code adaptation/changes according to the commits on Feb 12, 2020
freewym Feb 12, 2020
9c69029
add options to accept utt2num_frames files to speed up the data loadi…
freewym Feb 14, 2020
698305f
use json files to simplify the cli options for input data (#23)
freewym Feb 15, 2020
ea7732b
code adaptation/changes according to the commits on Feb 21, 2020
freewym Feb 22, 2020
7170edc
move duplicated network parsers to espresso/speech_tools/utils.py; re…
freewym Feb 25, 2020
95fcd90
code adaptation/changes according to the commits on Feb 27-29, 2020
freewym Feb 27, 2020
41c5725
code adaptation/changes according to the commits on Mar 3-10, 2020
freewym Mar 5, 2020
4b8d3be
SpecAugment (#21)
freewym Mar 11, 2020
11e106f
code adaptation/changes according to the commits on Mar 11, 2020; change
freewym Mar 11, 2020
c05a336
fix specaug indexing
freewym Mar 20, 2020
8772f69
code adaptation/changes according to the commits on Mar 24-Apr 3, 202…
freewym Mar 24, 2020
e40b033
code adaptation/changes according to the commits on Apr 7
freewym Apr 7, 2020
b9c19c2
update the qsub script for gpu jobs; code adaptation/changes accordin…
freewym Apr 14, 2020
b35d210
use EncoderOut for SpeechLSTMEncoder's output; code adaptation/change…
freewym Apr 23, 2020
cfa899c
Hybrid ASR code (E2E LF-MMI and cross-entropy) and WSJ examples (#29)
freewym May 2, 2020
be4fd6a
code adaptation/changes according to the commits on May 10
freewym May 10, 2020
967dd34
code adaptation/changes according to the commits on May 18
freewym May 19, 2020
af5a564
fix lf-mmi loss; code adaptation/changes according to the commits on …
freewym May 27, 2020
9d3b8b3
remove useless max_{source,target}_positions arguments
freewym Jun 17, 2020
c8c1dfb
code adaptation/changes according to the commits on Jun 18-23, 2020
freewym Jun 19, 2020
027595b
code adaptation/changes according to the commits on Jun 24-25, 2020; fix
freewym Jun 25, 2020
e42ec43
Update Transformer models (#31)
freewym Jul 2, 2020
8600f76
ignore flake8's FileNotFoundError for soft links to kaldi files; code…
freewym Jul 8, 2020
bc6901b
code adaptation/changes according to the commits on Jul 14, 2020
freewym Jul 15, 2020
7c21d3d
code adaptation/changes according to the commits on Jul 16, 2020
freewym Jul 17, 2020
4b253b1
code adaptation/changes according to the commits on Jul 20-25, 2020
freewym Jul 21, 2020
d2f264b
fix reorder_encoder_out in SpeechChunkTransformerEncoder; code adapta…
freewym Jul 28, 2020
a92409c
reorder the elements of the returned tuple of TdnnModel.forward();
freewym Jul 30, 2020
c73c4de
updates for new PyChain (#37)
freewym Aug 10, 2020
afaef92
code adaptation/changes according to the commits on Aug 10-18, 2020
freewym Aug 10, 2020
5351152
code adaptation/changes according to the commits on Aug 20-24, 2020
freewym Aug 21, 2020
9d71078
code adaptation/changes according to the commits on Aug 31, 2020
freewym Aug 31, 2020
e34b27d
code adaptation/changes according to the commits on Sep 9-11, 2020
freewym Sep 9, 2020
3fe02ae
code adaptation/changes according to the commits on Sep 17-26, 2020
freewym Sep 18, 2020
8c02b45
code adaptation/changes according to the commits on Oct 1, 2020
freewym Oct 1, 2020
707ce3a
code adaptation/changes according to the commits on Oct 2-15, 2020
freewym Oct 11, 2020
262c0a2
code adaptation/changes according to the commits on Oct 18-Nov 3, 202…
freewym Oct 26, 2020
513b171
code adaptation/changes according to the commits on Nov 4-9, 2020
freewym Nov 5, 2020
6c6ee41
code adaptation/changes according to the commits on Nov 11, 2020; obtain
freewym Nov 11, 2020
2b68caf
fix an error when more than one external LM is used for shallow fusion
freewym Nov 14, 2020
72fa596
code adaptation/changes according to the commits on Nov 16-20, 2020; …
freewym Nov 18, 2020
6b4e571
fix length tensor device issue in lf_mmi loss; code adaptation/change…
freewym Dec 4, 2020
1f812eb
code adaptation/changes according to the commits on Dec 22, 2020
freewym Dec 23, 2020
b7b8937
Lhotse/K2 support
freewym Nov 4, 2020
8926fa3
add a data prep example for lhotse
freewym Nov 5, 2020
083ea69
add random split of negatives
freewym Nov 7, 2020
ecf8423
misc fixes
freewym Nov 8, 2020
3713dc1
k2 training related (not yet done)
freewym Nov 13, 2020
4fb0d87
fixes
freewym Nov 14, 2020
dc3874a
f
freewym Nov 16, 2020
63ceaf2
decoding related
freewym Nov 27, 2020
66d84af
some changes
freewym Dec 26, 2020
8a345cb
fix negative loss
freewym Dec 27, 2020
1b05966
refactor code
freewym Dec 28, 2020
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -45,7 +45,7 @@ jobs:
run: |
pip install flake8
# stop the build if there are Python syntax errors or undefined names
-flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --extend-exclude fairseq/model_parallel/megatron
+flake8 . --count --select=E9,F63,F7,F82 --ignore=E902 --show-source --statistics --extend-exclude fairseq/model_parallel/megatron
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --extend-exclude fairseq/model_parallel/megatron

3 changes: 3 additions & 0 deletions .gitignore
@@ -134,3 +134,6 @@ experimental/*

# Weights and Biases logs
wandb/

# emacs saves
*~
4 changes: 3 additions & 1 deletion LICENSE
@@ -1,6 +1,8 @@
MIT License

-Copyright (c) Facebook, Inc. and its affiliates.
+Copyright for the original fairseq code are held by Facebook, Inc. and its
+affiliates as part of project Espresso. All other copyright for project Espresso
+are held by Espresso authors.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
219 changes: 40 additions & 179 deletions README.md

Large diffs are not rendered by default.

216 changes: 216 additions & 0 deletions README_fairseq.md
@@ -0,0 +1,216 @@
<p align="center">
<img src="docs/fairseq_logo.png" width="150">
<br />
<br />
<a href="https://github.com/pytorch/fairseq/blob/master/LICENSE"><img alt="MIT License" src="https://img.shields.io/badge/license-MIT-blue.svg" /></a>
<a href="https://github.com/pytorch/fairseq/releases"><img alt="Latest Release" src="https://img.shields.io/github/release/pytorch/fairseq.svg" /></a>
<a href="https://github.com/pytorch/fairseq/actions?query=workflow:build"><img alt="Build Status" src="https://github.com/pytorch/fairseq/workflows/build/badge.svg" /></a>
<a href="https://fairseq.readthedocs.io/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/fairseq/badge/?version=latest" /></a>
</p>

--------------------------------------------------------------------------------

Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks.

We provide reference implementations of various sequence modeling papers:

<details><summary>List of implemented papers</summary><p>

* **Convolutional Neural Networks (CNN)**
+ [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)
+ [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
+ [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
+ [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
+ [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
* **LightConv and DynamicConv models**
+ [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
* **Long Short-Term Memory (LSTM) networks**
+ Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
* **Transformer (self-attention) networks**
+ Attention Is All You Need (Vaswani et al., 2017)
+ [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
+ [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
+ [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/README.adaptive_inputs.md)
+ [Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)](examples/constrained_decoding/README.md)
+ [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)](examples/truncated_bptt/README.md)
+ [Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)](examples/adaptive_span/README.md)
+ [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
+ [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
+ [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
+ [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)
+ [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)](examples/mbart/README.md)
+ [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples/byte_level_bpe/README.md)
+ [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md)
+ [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples/wav2vec/README.md)
+ [Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)](examples/pointer_generator/README.md)
+ [Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)](examples/linformer/README.md)
+ [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)
+ [Deep Transformers with Latent Depth (Li et al., 2020)](examples/latent_depth/README.md)
* **Non-autoregressive Transformers**
+ Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
+ Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al. 2018)
+ Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al. 2019)
+ Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
+ [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
* **Finetuning**
+ [Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al. 2020)](examples/rxf/README.md)

</p></details>

### What's New:

* December 2020: [GottBERT model and code released](examples/gottbert/README.md)
* November 2020: Adopted the [Hydra](https://github.com/facebookresearch/hydra) configuration framework
* [see documentation explaining how to use it for new and existing projects](docs/hydra_integration.md)
* November 2020: [fairseq 0.10.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.10.0)
* October 2020: [Added R3F/R4F (Better Fine-Tuning) code](examples/rxf/README.md)
* October 2020: [Deep Transformer with Latent Depth code released](examples/latent_depth/README.md)
* October 2020: [Added CRISS models and code](examples/criss/README.md)
* September 2020: [Added Linformer code](examples/linformer/README.md)
* September 2020: [Added pointer-generator networks](examples/pointer_generator/README.md)
* August 2020: [Added lexically constrained decoding](examples/constrained_decoding/README.md)
* August 2020: [wav2vec2 models and code released](examples/wav2vec/README.md)
* July 2020: [Unsupervised Quality Estimation code released](examples/unsupervised_quality_estimation/README.md)

<details><summary>Previous updates</summary><p>

* May 2020: [Follow fairseq on Twitter](https://twitter.com/fairseq)
* April 2020: [Monotonic Multihead Attention code released](examples/simultaneous_translation/README.md)
* April 2020: [Quant-Noise code released](examples/quant_noise/README.md)
* April 2020: [Initial model parallel support and 11B parameters unidirectional LM released](examples/megatron_11b/README.md)
* March 2020: [Byte-level BPE code released](examples/byte_level_bpe/README.md)
* February 2020: [mBART model and code released](examples/mbart/README.md)
* February 2020: [Added tutorial for back-translation](https://github.com/pytorch/fairseq/tree/master/examples/backtranslation#training-your-own-model-wmt18-english-german)
* December 2019: [fairseq 0.9.0 released](https://github.com/pytorch/fairseq/releases/tag/v0.9.0)
* November 2019: [VizSeq released (a visual analysis toolkit for evaluating fairseq models)](https://facebookresearch.github.io/vizseq/docs/getting_started/fairseq_example)
* November 2019: [CamemBERT model and code released](examples/camembert/README.md)
* November 2019: [BART model and code released](examples/bart/README.md)
* November 2019: [XLM-R models and code released](examples/xlmr/README.md)
* September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md)
* August 2019: [WMT'19 models released](examples/wmt19/README.md)
* July 2019: fairseq relicensed under MIT license
* July 2019: [RoBERTa models and code released](examples/roberta/README.md)
* June 2019: [wav2vec models and code released](examples/wav2vec/README.md)

</p></details>

### Features:

* multi-GPU training on one machine or across multiple machines (data and model parallel)
* fast generation on both CPU and GPU with multiple search algorithms implemented:
+ beam search
+ Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
+ sampling (unconstrained, top-k and top-p/nucleus)
+ [lexically constrained decoding](examples/constrained_decoding/README.md) (Post & Vilar, 2018)
* [gradient accumulation](https://fairseq.readthedocs.io/en/latest/getting_started.html#large-mini-batch-training-with-delayed-updates) enables training with large mini-batches even on a single GPU
* [mixed precision training](https://fairseq.readthedocs.io/en/latest/getting_started.html#training-with-half-precision-floating-point-fp16) (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
* [extensible](https://fairseq.readthedocs.io/en/latest/overview.html): easily register new models, criterions, tasks, optimizers and learning rate schedulers
* [flexible configuration](docs/hydra_integration.md) based on [Hydra](https://github.com/facebookresearch/hydra) allowing a combination of code, command-line and file based configuration

We also provide [pre-trained models for translation and language modeling](#pre-trained-models-and-examples)
with a convenient `torch.hub` interface:

``` python
import torch

en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'
```

See the PyTorch Hub tutorials for [translation](https://pytorch.org/hub/pytorch_fairseq_translation/)
and [RoBERTa](https://pytorch.org/hub/pytorch_fairseq_roberta/) for more examples.

# Requirements and Installation

* [PyTorch](http://pytorch.org/) version >= 1.5.0
* Python version >= 3.6
* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)
* **To install fairseq** and develop locally:

``` bash
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./

# to install the latest stable release (0.10.0)
# pip install fairseq==0.10.0
```

* **For faster training** install NVIDIA's [apex](https://github.com/NVIDIA/apex) library:

``` bash
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--deprecated_fused_adam" --global-option="--xentropy" \
--global-option="--fast_multihead_attn" ./
```

* **For large datasets** install [PyArrow](https://arrow.apache.org/docs/python/install.html#using-pip): `pip install pyarrow`
* If you use Docker make sure to increase the shared memory size either with `--ipc=host` or `--shm-size`
as command line options to `nvidia-docker run`.

# Getting Started

The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.

# Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below,
as well as example training and evaluation commands.

* [Translation](examples/translation/README.md): convolutional and transformer models are available
* [Language Modeling](examples/language_model/README.md): convolutional and transformer models are available

We also have more detailed READMEs to reproduce results from specific papers:

* [Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)](examples/criss/README.md)
* [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](examples/wav2vec/README.md)
* [Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)](examples/unsupervised_quality_estimation/README.md)
* [Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020)](examples/quant_noise/README.md)
* [Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)](examples/byte_level_bpe/README.md)
* [Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)](examples/mbart/README.md)
* [Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)](examples/layerdrop/README.md)
* [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)
* [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
* [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
* [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
* [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
* [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
* [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
* [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
* [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
* [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
* [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
* [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
* [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/README.conv.md)

# Join the fairseq community

* Twitter: https://twitter.com/fairseq
* Facebook page: https://www.facebook.com/groups/fairseq.users
* Google group: https://groups.google.com/forum/#!forum/fairseq-users

# License

fairseq(-py) is MIT-licensed.
The license applies to the pre-trained models as well.

# Citation

Please cite as:

``` bibtex
@inproceedings{ott2019fairseq,
title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
year = {2019},
}
```
12 changes: 12 additions & 0 deletions espresso/__init__.py
@@ -0,0 +1,12 @@
# Copyright (c) Yiming Wang
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
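
# These imports are needed for their side effects: importing each sub-package
# registers Espresso's criterions, models, modules, optimizers, LR schedulers,
# and tasks with fairseq's corresponding registries.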

import espresso.tools # noqa
import espresso.criterions # noqa
import espresso.models # noqa
import espresso.modules # noqa
import espresso.optim # noqa
import espresso.optim.lr_scheduler # noqa
import espresso.tasks # noqa
14 changes: 14 additions & 0 deletions espresso/criterions/__init__.py
@@ -0,0 +1,14 @@
# Copyright (c) Yiming Wang
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.

import importlib
import os


# automatically import any Python files in the criterions/ directory
for file in os.listdir(os.path.dirname(__file__)):
if not file.startswith("_") and not file.startswith(".") and file.endswith(".py"):
file_name = file[: file.find(".py")]
importlib.import_module("espresso.criterions." + file_name)
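
This scan-and-import pattern pairs with fairseq's registration decorators: any criterion module dropped into espresso/criterions/ is imported at startup and registers itself. As a hypothetical illustration (the file name, criterion name, and loss below are not part of this PR), a module this loop would pick up might look like:

```python
# espresso/criterions/example_criterion.py -- hypothetical illustration
import torch.nn.functional as F

from fairseq.criterions import FairseqCriterion, register_criterion


@register_criterion("example_cross_entropy")
class ExampleCrossEntropyCriterion(FairseqCriterion):
    """Registered automatically when the loop above imports this module."""

    def forward(self, model, sample, reduce=True):
        # Run the model and compute token-level negative log-likelihood.
        net_output = model(**sample["net_input"])
        lprobs = model.get_normalized_probs(net_output, log_probs=True)
        target = model.get_targets(sample, net_output)
        loss = F.nll_loss(
            lprobs.view(-1, lprobs.size(-1)),
            target.view(-1),
            ignore_index=self.padding_idx,
            reduction="sum" if reduce else "none",
        )
        sample_size = sample["ntokens"]
        logging_output = {
            "loss": loss.data if reduce else loss,
            "ntokens": sample["ntokens"],
            "sample_size": sample_size,
        }
        return loss, sample_size, logging_output
```

Once registered, the criterion is selectable from the command line with `--criterion example_cross_entropy`, without touching this `__init__.py`.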