NMT Perceiver Encoder (NVIDIA#2621)
* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Correct Dockerfile

Signed-off-by: smajumdar <titu1994@gmail.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* update readmes

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* map_location instead of set_device

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* shallow fusion init commit

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* debug info removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* update version number

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* update package version

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh tests init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sparrowhawk container tests support added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove duplication

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <titu1994@gmail.com>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Guard scheduler for None

Signed-off-by: smajumdar <titu1994@gmail.com>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <complex451@gmail.com>

* Correctly log class that was restored

Signed-off-by: smajumdar <titu1994@gmail.com>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <titu1994@gmail.com>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add conda update for numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct order of numba minimum version, remove wrong flag from test

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Enable RNNT tests

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update whitelist, change roman weights

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings, space fix, init file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* lgtm

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fraction with measure class

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add to documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Improve documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct name of the dataset

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix syntax

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* check if data dir exists

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding pretrained model

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressed review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup Docs

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <complex451@gmail.com>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <complex451@gmail.com>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <complex451@gmail.com>

* add megatron encoder module

Signed-off-by: ericharper <complex451@gmail.com>

* fixed horrible typo

Signed-off-by: ericharper <complex451@gmail.com>

* fix typo and add default

Signed-off-by: ericharper <complex451@gmail.com>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <complex451@gmail.com>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <complex451@gmail.com>

* add checkpoint_file property

Signed-off-by: ericharper <complex451@gmail.com>

* fix property

Signed-off-by: ericharper <complex451@gmail.com>

* num_tokentypes=0

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* find_unused_parameters=True

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* get instead of pop

Signed-off-by: ericharper <complex451@gmail.com>

* remove token type ids from megatron input example

Signed-off-by: ericharper <complex451@gmail.com>

* pop vocab_size

Signed-off-by: ericharper <complex451@gmail.com>

* fix checkpointing for model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix bug in non model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* convert cfg.trainer to dict

Signed-off-by: ericharper <complex451@gmail.com>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <complex451@gmail.com>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <complex451@gmail.com>

* make vocab_file configurable

Signed-off-by: ericharper <complex451@gmail.com>

* dataclass can't have mutable default

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* unused imports

Signed-off-by: ericharper <complex451@gmail.com>

* revert input example

Signed-off-by: ericharper <complex451@gmail.com>

* check that checkpoint version is not None

Signed-off-by: ericharper <complex451@gmail.com>

* add mp jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* update docstring

Signed-off-by: ericharper <complex451@gmail.com>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Working on bottleneck transformers.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Working on bottleneck transformers.

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Done cleaning code of bottleneck transformers.
2. Ready to test.

* 1. Working on training script.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Working on training script.

* 1. Updated config class name.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated config class name.

* 1. Training script is ready to be tested.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Training script is ready to be tested.

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

* 1. Fixed bugs.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed bugs.

* 1. Fixed missing import.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing import.

* 1. Fixed support in seq2seq-br.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed support in seq2seq-br.

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added NLPDDPPlugin.

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated to support multi-node training.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added comments.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. MTBottleneckModel is in its own file mt_enc_dec_bottleneck_model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Switched loss annealing to rely on self.trainer.global_step

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added comments regarding the use of return_ortho_loss.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added detailed logging of loss during training (still need to do the same for eval).

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing a fix to import bug.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging wrong import issue.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added logging of results to validation step (not tested yet).

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing import.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing failing imports.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Disabling changes.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Enabled bottleneck architecture.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed indentation.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed import statement.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed typo.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed logging of arbitrary values.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed PyTorch Lightning logging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added a missing import.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added NLPDDPPlugin.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Cleaned style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated sign of computed loss.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed double import.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Moved logging of additional loss terms into MTBottleneckModel class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated permissions.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added initial perceiver package.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Working on encoder.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing perceiver.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Finished implementing Perceiver.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated default arch.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Ignoring independent perceiver implementation.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added latent transformer to perceiver

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added TransformerBottleneckDecoderNM.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added TransformerBottleneckEncoderNM.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated bottleneck perceiver.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated MTBottleneckModel.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added BridgeEncoder.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Cleaned code.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated architecture name.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added support in bridge encoder.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added support in hidden_init_method to BridgeEncoder.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Removed unneeded imports.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated comment in YAML

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated YAML comments.
2. hidden_blocks in bridge relates to post-processing after bridge (instead of hidden_blocks-1).

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Initial cross-attention in Perceiver with params init has independent parameters.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated Perceiver forward.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated TransformerEncoder to be a component as opposed to a parent class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated example command.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. forward method in MTBottleneckModel does not compute loss.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added label smoothing for per-sample loss.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated recon_only loss to nll.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Update yaml doc.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated default config to have 32 hidden steps.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated doc.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed type.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed unreachable code bug.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed wrong sign for reconstruction per sample (instead of per token).

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated comments.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
15 people authored and paarthneekhara committed Sep 17, 2021
1 parent 304ab8f commit 34ec867
Showing 11 changed files with 889 additions and 187 deletions.
40 changes: 26 additions & 14 deletions examples/nlp/machine_translation/conf/aayn_bottleneck.yaml
@@ -1,10 +1,20 @@
# The bottleneck architecture supports two architectures:
# 1) No bottleneck: model_type=seq2seq, the usual NMT model.
# 2) Fixed-size bottleneck: model_type in [seq2seq-br, seq2seq-mim, seq2seq-vae],
# where the output of the encoder is projected to a fixed number of steps.
# The loss of seq2seq-br is reconstruction only (like seq2seq).
# The losses of seq2seq-mim, seq2seq-vae are MIM (https://arxiv.org/pdf/2003.02645.pdf)
# and VAE (https://arxiv.org/pdf/1312.6114.pdf) respectively.
# The bottleneck architecture supports three learning frameworks (i.e., losses)
# via model.model_type:
# 1) nll - Conditional cross entropy (the usual NMT loss)
# 2) mim - MIM learning framework. A latent variable model with good
# reconstruction and compressed latent representation.
# https://arxiv.org/pdf/2003.02645.pdf
# 3) vae - VAE learning framework. A latent variable model which learns
# good probability estimation over observations and
# a regularized latent representation.
# https://arxiv.org/pdf/1312.6114.pdf
# The bottleneck architecture supports three encoder architectures via
# model.encoder.arch:
# 1) seq2seq - the usual NMT model without bottleneck
# 2) bridge - a bottleneck which projects the encoder output to a fixed
# number of steps using attention bridge (https://arxiv.org/pdf/1703.03130.pdf)
# 3) perceiver - a bottleneck by projecting inputs to a fixed
# number of steps using perceiver architecture (https://arxiv.org/pdf/2103.03206.pdf)
name: AttentionIsAllYouNeedBottleneck
do_training: True # set to False if only preprocessing data
do_testing: False # set to True to run evaluation on test data after training
@@ -19,13 +29,10 @@ model:
preproc_out_dir: null # path to store data preprocessing outputs
src_language: 'en'
tgt_language: 'de'
model_type: 'seq2seq-br' # supports seq2seq, seq2seq-br, seq2seq-mim, seq2seq-vae (see description above)
min_logv: -8 # minimal allowed logv for seq2seq-mim
ortho_loss_coef: 0.0 # orthogonality coefficient for attention bridge
att_bridge_size: 512 # dimension of a step in attention bridge
att_bridge_k: 16 # fixed number of steps in attention bridge
att_bridge_inner_size: 1024 # feedforward size in attention bridge
non_recon_warmup_batches: 200000 # warm-up steps for seq2seq-mim, seq2seq-vae
model_type: 'nll' # learning (i.e., loss) type: nll (i.e., cross-entropy/auto-encoder), mim, vae (see description above)
min_logv: -6 # minimal allowed log variance for mim
latent_size: -1 # dimension of latent (projected from hidden); -1 will take the value of hidden size
non_recon_warmup_batches: 200000 # warm-up steps for mim, and vae losses
recon_per_token: true # when false reconstruction is computed per sample, not per token

train_ds:
@@ -127,6 +134,10 @@ model:
mask_future: false
pre_ln: false
pre_ln_final_layer_norm: true
arch: seq2seq # seq2seq, bridge, perceiver (see description above)
hidden_steps: 32 # fixed number of hidden steps
hidden_blocks: 1 # number of repeat blocks (see classes for description)
hidden_init_method: default # see classes for available values

decoder:
library: nemo
@@ -146,6 +157,7 @@ model:
hidden_act: relu
pre_ln: false
pre_ln_final_layer_norm: true
arch: seq2seq # currently only seq2seq is supported

head:
num_layers: 1
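The bridge and perceiver encoder options above both compress a variable-length source sequence into a fixed number of hidden steps via attention against learned latent queries. A minimal plain-Python sketch of that core idea (illustrative only, not NeMo's actual implementation; all names here are made up for the example):

```python
import math
import random

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def latent_cross_attention(latents, inputs):
    """Map T x d encoder states to hidden_steps x d outputs via learned query vectors."""
    d = len(latents[0])
    out = []
    for q in latents:
        # scaled dot-product scores of one latent query against every input step
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in inputs]
        weights = softmax(scores)
        # each output step is an attention-weighted mix of the input states
        out.append([sum(w * v[j] for w, v in zip(weights, inputs)) for j in range(d)])
    return out

random.seed(0)
d, hidden_steps = 4, 2
latents = [[random.gauss(0, 1) for _ in range(d)] for _ in range(hidden_steps)]
# source length T varies, but the bottleneck output is always hidden_steps x d
outputs = {
    T: latent_cross_attention(
        latents, [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
    )
    for T in (5, 11, 23)
}
```

Whatever the source length, the encoder output has `hidden_steps` rows, which is what lets the decoder (and the MIM/VAE latent losses) work with a fixed-size representation.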
11 changes: 6 additions & 5 deletions examples/nlp/machine_translation/enc_dec_nmt-bottleneck.py
@@ -58,13 +58,14 @@
model.beam_size=4 \
model.max_generation_delta=256 \
model.label_smoothing=0.1 \
model.model_type='seq2seq-br' \
model.att_bridge_size=512 \
model.att_bridge_k=16 \
model.att_bridge_inner_size=1024 \
model.non_recon_warmup_batches=75000 \
model.model_type=nll \
model.non_recon_warmup_batches=7500 \
model.encoder_tokenizer.tokenizer_model=tokenizer.BPE.8192.model \
model.decoder_tokenizer.tokenizer_model=tokenizer.BPE.8192.model \
model.encoder.arch=perceiver \
model.encoder.hidden_steps=32 \
model.encoder.hidden_blocks=2 \
model.encoder.hidden_init_method=bridge \
model.encoder.num_layers=6 \
model.encoder.hidden_size=512 \
model.encoder.inner_size=2048 \
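The `non_recon_warmup_batches` override in the example command controls how quickly the non-reconstruction (MIM/VAE regularization) term is phased in during training. A hedged sketch of such a warm-up schedule (the linear ramp and all names are assumptions for illustration, not NeMo's exact code):

```python
def annealed_loss(recon_nll, latent_reg, global_step, non_recon_warmup_batches=200000):
    # ramp the regularization weight linearly from 0 to 1 over the warm-up batches,
    # so early training is dominated by the reconstruction loss
    scale = min(1.0, global_step / non_recon_warmup_batches)
    return recon_nll + scale * latent_reg

# early steps: reconstruction only; after warm-up: full regularized objective
losses = [annealed_loss(2.0, 4.0, step) for step in (0, 100_000, 300_000)]
```

Keying the schedule on the trainer's global step (rather than a manually tracked counter) keeps the annealing consistent across restarts and multi-node runs.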
12 changes: 10 additions & 2 deletions nemo/collections/common/losses/smoothed_cross_entropy.py
@@ -30,6 +30,7 @@ class SmoothedCrossEntropyLoss(Loss):
1) excludes padding tokens from loss calculation
2) allows to use label smoothing regularization
3) allows to calculate loss for the desired number of last tokens
4) per_token_reduction - if False disables reduction per token
Args:
label_smoothing (float): label smoothing regularization coefficient
@@ -65,12 +66,14 @@
label_smoothing: Optional[float] = 0.0,
predict_last_k: Optional[int] = 0,
eps: float = 1e-6,
per_token_reduction: bool = True,
):
super().__init__()
self._pad_id = pad_id
self._eps = eps
self._predict_last_k = predict_last_k
self._label_smoothing = label_smoothing
self._per_token_reduction = per_token_reduction

@typecheck()
def forward(self, log_probs, labels, output_mask=None):
@@ -97,7 +100,12 @@ def forward(self, log_probs, labels, output_mask=None):
neg_log_likelihood = (1.0 - smoothing) * target_log_probs + smoothing * smoothing_log_probs
neg_log_likelihood = neg_log_likelihood[:, -self._predict_last_k :]
output_mask = output_mask[:, -self._predict_last_k :]
neg_log_likelihood = -torch.sum(neg_log_likelihood * output_mask)
neg_log_likelihood = neg_log_likelihood / (output_mask.sum() + self._eps)

# when False, skip per-token reduction
if self._per_token_reduction:
neg_log_likelihood = -torch.sum(neg_log_likelihood * output_mask)
neg_log_likelihood = neg_log_likelihood / (output_mask.sum() + self._eps)
else:
neg_log_likelihood = -(neg_log_likelihood * output_mask)

return neg_log_likelihood
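The `per_token_reduction` branch above either averages the masked, label-smoothed negative log-likelihood over all real tokens (a scalar) or returns the unreduced per-token losses, which the bottleneck model can then aggregate per sample. A stripped-down plain-Python sketch of that behavior (illustrative; the smoothing term here averages log-probs over the vocabulary to mirror uniform smoothing, not necessarily NeMo's exact arithmetic):

```python
import math

def smoothed_nll(log_probs, labels, mask, smoothing=0.1, eps=1e-6, per_token_reduction=True):
    """log_probs: B x T x V; labels: B x T; mask: B x T with 1.0 for real tokens."""
    V = len(log_probs[0][0])
    nll = []  # B x T matrix of masked, smoothed negative log-likelihoods
    for b in range(len(labels)):
        row = []
        for t in range(len(labels[b])):
            target_lp = log_probs[b][t][labels[b][t]]
            uniform_lp = sum(log_probs[b][t]) / V  # uniform-smoothing component
            ll = (1.0 - smoothing) * target_lp + smoothing * uniform_lp
            row.append(-ll * mask[b][t])  # padding positions contribute 0
        nll.append(row)
    if per_token_reduction:
        total_tokens = sum(sum(r) for r in mask)
        return sum(sum(r) for r in nll) / (total_tokens + eps)  # scalar mean
    return nll  # unreduced B x T losses, for per-sample aggregation

# toy batch: 1 sample, 2 timesteps, vocab of 2, second position padded out
lp = [[[math.log(0.9), math.log(0.1)], [math.log(0.5), math.log(0.5)]]]
labels = [[0, 1]]
mask = [[1.0, 0.0]]
scalar = smoothed_nll(lp, labels, mask)
matrix = smoothed_nll(lp, labels, mask, per_token_reduction=False)
```

Summing the unreduced matrix and dividing by the token count recovers the per-token scalar, which is the consistency the `recon_per_token: true/false` config switch relies on.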
