Merge r1.1 bugfixes to main. Update dep versions. (NVIDIA#2437)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)
  * update branch
    Signed-off-by: ericharper <complex451@gmail.com>
  * update jenkinsfile
    Signed-off-by: ericharper <complex451@gmail.com>
* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)
  * fix property when not using model parallel
    Signed-off-by: ericharper <complex451@gmail.com>
  * fix property when not using model parallel
    Signed-off-by: ericharper <complex451@gmail.com>
  * add debug statement
    Signed-off-by: ericharper <complex451@gmail.com>
  * add debug statement
    Signed-off-by: ericharper <complex451@gmail.com>
  * instantiate with NLPDDPPlugin with num_nodes from trainer config
    Signed-off-by: ericharper <complex451@gmail.com>
* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)
  * Update ASR scripts for tokenizer building and tarred dataset building
    Signed-off-by: smajumdar <titu1994@gmail.com>
  * Update container
    Signed-off-by: smajumdar <titu1994@gmail.com>
  * Add STT Zh Citrinet 1024 Gamma 0.25 model
    Signed-off-by: smajumdar <titu1994@gmail.com>
* Update notebook (NVIDIA#2391)
  Signed-off-by: smajumdar <titu1994@gmail.com>
* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)
  * nb fix for spring clean
    Signed-off-by: fayejf <fayejf07@gmail.com>
  * remove outdated instruction
    Signed-off-by: fayejf <fayejf07@gmail.com>
* Mean normalization (NVIDIA#2397)
  * norm embeddings
    Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
  * move to utils
    Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Bugfix adaptive spec augment time masking (NVIDIA#2398)
  * bugfix adaptive spec augment
    Signed-off-by: smajumdar <titu1994@gmail.com>
  * Revert freq mask guard
    Signed-off-by: smajumdar <titu1994@gmail.com>
  * Revert freq mask guard
    Signed-off-by: smajumdar <titu1994@gmail.com>
  * Remove static time width clamping
    Signed-off-by: smajumdar <titu1994@gmail.com>
* Correct typos and issues with notebooks (NVIDIA#2402)
  * Fix Primer notebook
    Signed-off-by: smajumdar <titu1994@gmail.com>
  * Typo
    Signed-off-by: smajumdar <titu1994@gmail.com>
* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)
  Signed-off-by: Hoo Chang Shin <hshin@nvidia.com>
  Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
* [BUGFIX] Megatron in NMT was setting vocab_file to None (NVIDIA#2417)
  * make vocab_file configurable for megatron in nmt
    Signed-off-by: ericharper <complex451@gmail.com>
  * update docs
    Signed-off-by: ericharper <complex451@gmail.com>
  * update docs
    Signed-off-by: ericharper <complex451@gmail.com>
* Link updates in docs and notebooks and typo fix (NVIDIA#2416)
  * typo fix for notebooks
    Signed-off-by: fayejf <fayejf07@gmail.com>
  * tiny typo fix in docs
    Signed-off-by: fayejf <fayejf07@gmail.com>
  * docs branch->stable
    Signed-off-by: fayejf <fayejf07@gmail.com>
  * more docs branch -> stable
    Signed-off-by: fayejf <fayejf07@gmail.com>
  * tutorial links branch -> stable
    Signed-off-by: fayejf <fayejf07@gmail.com>
  * small fix
    Signed-off-by: fayejf <fayejf07@gmail.com>
  * add renamed 06
    Signed-off-by: fayejf <fayejf07@gmail.com>
  * more fixes
    Signed-off-by: fayejf <fayejf07@gmail.com>
* Update onnx (NVIDIA#2420)
  Signed-off-by: smajumdar <titu1994@gmail.com>
* Correct version of onnxruntime (NVIDIA#2422)
  Signed-off-by: smajumdar <titu1994@gmail.com>
* update deployment instructions (NVIDIA#2430)
  Signed-off-by: ericharper <complex451@gmail.com>
* Bumping version to 1.1.0
  Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
* update jenksinfile
  Signed-off-by: ericharper <complex451@gmail.com>
* add upper bounds
  Signed-off-by: ericharper <complex451@gmail.com>
* update readme
  Signed-off-by: ericharper <complex451@gmail.com>
* update requirements
  Signed-off-by: ericharper <complex451@gmail.com>
* update jenkinsfile
  Signed-off-by: ericharper <complex451@gmail.com>
* update version
  Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
paarthneekhara · Sep 17, 2021 · c131b57
1 parent 014672f
commit c131b57
Showing 51 changed files with 220 additions and 200 deletions.
diff --git a/Jenkinsfile b/Jenkinsfile
@@ -17,12 +17,6 @@ pipeline {
       }
     }
 
-    stage('Uninstall torchtext') {
-      steps {
-        sh 'pip uninstall -y torchtext'
-      }
-    }
-
     stage('Install test requirements') {
       steps {
         sh 'apt-get update && apt-get install -y bc && pip install -r requirements/requirements_test.txt'

diff --git a/README.rst b/README.rst
@@ -93,19 +93,17 @@ Documentation
   :scale: 100%
   :target:  https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/
 
-+---------+-------------+----------------------------------------------------------------------------------------------------------------------------------+
-| Version | Status      | Description                                                                                                                      |
-+=========+=============+==================================================================================================================================+
-| Latest  | |main|      | `Documentation of the latest (i.e. main) branch. <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/>`_          |
-+---------+-------------+----------------------------------------------------------------------------------------------------------------------------------+
-| Next    | |v1.0.2|    | `Documentation of the most recent release: v1.0.2 <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/v1.0.2/>`_       |
-+---------+-------------+----------------------------------------------------------------------------------------------------------------------------------+
-| Stable  | |stable|    | `Documentation of the stable (i.e. stable) branch. <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/>`_      |
-+---------+-------------+----------------------------------------------------------------------------------------------------------------------------------+
++---------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+
+| Version | Status      | Description                                                                                                                              |
++=========+=============+==========================================================================================================================================+
+| Latest  | |main|      | `Documentation of the latest (i.e. main) branch. <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/>`_                  |
++---------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+
+| Stable  | |stable|    | `Documentation of the stable (i.e. most recent release) branch. <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/>`_ |
++---------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+
 
 Tutorials
 ---------
-A great way to start with NeMo is by checking `one of our tutorials <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/v1.0.2/starthere/tutorials.html>`_.
+A great way to start with NeMo is by checking `one of our tutorials <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/tutorials.html>`_.
 
 Getting help with NeMo
 ----------------------
@@ -147,6 +145,16 @@ Use this installation mode if you are contributing to NeMo.
     cd NeMo
     ./reinstall.sh
 
+RNNT
+~~~~
+Note that RNNT requires numba to be installed from conda.
+
+.. code-block:: bash
+
+  conda remove numba
+  pip uninstall numba
+  conda install -c numba numba
+
 Docker containers:
 ~~~~~~~~~~~~~~~~~~
 
@@ -161,14 +169,14 @@ If you chose to work with main branch, we recommend using NVIDIA's PyTorch conta
 Examples
 --------
 
-Many example can be found under `"Examples" <https://github.com/NVIDIA/NeMo/tree/main/examples>`_ folder.
+Many example can be found under `"Examples" <https://github.com/NVIDIA/NeMo/tree/stable/examples>`_ folder.
 
 
 Contributing
 ------------
 
-We welcome community contributions! Please refer to the  `CONTRIBUTING.md <https://github.com/NVIDIA/NeMo/blob/main/CONTRIBUTING.md>`_ CONTRIBUTING.md for the process.
+We welcome community contributions! Please refer to the  `CONTRIBUTING.md <https://github.com/NVIDIA/NeMo/blob/stable/CONTRIBUTING.md>`_ CONTRIBUTING.md for the process.
 
 License
 -------
-NeMo is under `Apache 2.0 license <https://github.com/NVIDIA/NeMo/blob/main/LICENSE>`_.
+NeMo is under `Apache 2.0 license <https://github.com/NVIDIA/NeMo/blob/stable/LICENSE>`_.
diff --git a/docs/source/asr/asr_language_modeling.rst b/docs/source/asr/asr_language_modeling.rst
@@ -42,7 +42,7 @@ Train N-gram LM
 ===============
 
 The script to train an N-gram language model with KenLM can be found at
-`scripts/asr_language_modeling/ngram_lm/train_kenlm.py <https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/ngram_lm/train_kenlm.py>`__.
+`scripts/asr_language_modeling/ngram_lm/train_kenlm.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/train_kenlm.py>`__.
 
 This script would train an N-gram language model with KenLM library which can be used with the beam search decoders
 on top of the ASR models. This script supports both character level and BPE level encodings and models which is
@@ -95,7 +95,7 @@ Evaluate by Beam Search Decoding and N-gram LM
 
 NeMo's beam search decoders are capable of using the KenLM's N-gram models to find the best candidates.
 The script to evaluate an ASR model with beam search decoding and N-gram models can be found at
-`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py <https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py>`__.
+`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py>`__.
 
 You may evaluate an ASR model as the following:
 
@@ -169,7 +169,7 @@ Width of the beam search (`--beam_width`) specifies the number of top candidates
 would search for. Larger beams result in more accurate but slower predictions.
 
 There is also a tutorial to learn more about evaluating the ASR models with N-gram LM here:
-`Offline ASR Inference with Beam Search and External Language Model Rescoring <https://colab.research.google.com/github/NVIDIA/NeMo/blob/v1.0.2/tutorials/asr/Offline_ASR.ipynb>`_
+`Offline ASR Inference with Beam Search and External Language Model Rescoring <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Offline_ASR.ipynb>`_
 
 Hyperparameter Grid Search
 --------------------------
@@ -202,19 +202,19 @@ This score is usually combined with the scores from the beam search decoding to
 Train Neural Rescorer
 =====================
 
-An example script to train such a language model with Transformer can be found at `examples/nlp/language_modeling/transformer_lm.py <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/transformer_lm.py>`__.
+An example script to train such a language model with Transformer can be found at `examples/nlp/language_modeling/transformer_lm.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/language_modeling/transformer_lm.py>`__.
 It trains a TransformerLMModel which can be used as a neural rescorer for an ASR system.
 
 
 Evaluation
 ==========
 
 Given a trained TransformerLMModel `.nemo` file, the script available at
-`scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__
+`scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__
 can be used to re-score beams obtained with ASR model. You need the `.tsv` file containing the candidates produced
 by the acoustic model and the beam search decoding to use this script. The candidates can be the result of just the beam
 search decoding or the result of fusion with an N-gram LM. You may generate this file by specifying `--preds_output_folder' for
-`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py <https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py>`__.
+`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py>`__.
 
 The neural rescorer would rescore the beams/candidates by using two parameters of `rescorer_alpha` and `rescorer_beta` as the following:
 
@@ -231,9 +231,9 @@ You may follow the following steps to evaluate a neural LM:
 #. Obtain `.tsv` file with beams and their corresponding scores. Scores can be from a regular beam search decoder or
    in fusion with an N-gram LM scores. For a given beam size `beam_size` and a number of examples
    for evaluation `num_eval_examples`, it should contain (`num_eval_examples` x `beam_size`) lines of
-   form `beam_candidate_text \t score`. This file can be generated by `scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py <https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py>`__
+   form `beam_candidate_text \t score`. This file can be generated by `scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py>`__
 
-#. Rescore the candidates by `scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__.
+#. Rescore the candidates by `scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__.
 
 .. code::
     python eval_neural_rescorer.py
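
The `.tsv` layout described in the steps above (`num_eval_examples` x `beam_size` lines of the form `beam_candidate_text \t score`) can be illustrated with a minimal Python sketch. This is not code from the NeMo scripts: the helper name `read_beams` is hypothetical, and the sketch assumes that the candidates of one example occupy consecutive lines and that a higher score is better.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]


def read_beams(tsv_text: str, beam_size: int) -> List[List[Candidate]]:
    """Group `text<TAB>score` lines into consecutive beams of `beam_size` candidates.

    Hypothetical helper: assumes the candidates of each example are contiguous.
    """
    rows: List[Candidate] = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        # Split on the last tab so the candidate text may itself contain tabs.
        text, score = line.rsplit("\t", 1)
        rows.append((text, float(score)))
    if len(rows) % beam_size != 0:
        raise ValueError("number of lines is not a multiple of beam_size")
    return [rows[i:i + beam_size] for i in range(0, len(rows), beam_size)]


# Two examples with beam_size=2, i.e. 2 x 2 = 4 lines.
sample = (
    "hello world\t-1.5\n"
    "hello word\t-2.0\n"
    "good morning\t-0.7\n"
    "good mourning\t-3.1\n"
)
beams = read_beams(sample, beam_size=2)
# Pick the top candidate per example (assumption: higher score is better).
best = [max(beam, key=lambda cand: cand[1])[0] for beam in beams]
```

Checking that the line count is a multiple of `beam_size` catches truncated prediction files before any rescoring is attempted.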

diff --git a/docs/source/nemo_text_processing/intro.rst b/docs/source/nemo_text_processing/intro.rst
@@ -5,7 +5,7 @@ Text Processing
 
 See :doc:`NeMo Introduction <../starthere/intro>` for installation details.
 
-Additional requirements can be found in `setup.sh <https://github.com/NVIDIA/NeMo/blob/main/nemo_text_processing/setup.sh>`_.
+Additional requirements can be found in `setup.sh <https://github.com/NVIDIA/NeMo/blob/stable/nemo_text_processing/setup.sh>`_.
 
 .. toctree::
    :maxdepth: 1

diff --git a/docs/source/nemo_text_processing/inverse_text_normalization.rst b/docs/source/nemo_text_processing/inverse_text_normalization.rst
@@ -13,7 +13,7 @@ See :doc:`Text Procesing Deployment <../tools/text_processing_deployment>` for d
 
 .. note::
 
-    For more details, see the tutorial `NeMo/tutorials/text_processing/Inverse_Text_Normalization.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/text_processing/Inverse_Text_Normalization.ipynb>`__ in `Google's Colab <https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/text_processing/Inverse_Text_Normalization.ipynb>`_.
+    For more details, see the tutorial `NeMo/tutorials/text_processing/Inverse_Text_Normalization.ipynb <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Inverse_Text_Normalization.ipynb>`__ in `Google's Colab <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Inverse_Text_Normalization.ipynb>`_.
 
 
 

diff --git a/docs/source/nemo_text_processing/text_normalization.rst b/docs/source/nemo_text_processing/text_normalization.rst
@@ -15,7 +15,7 @@ See :doc:`Text Procesing Deployment <../tools/text_processing_deployment>` for d
 
 .. note::
 
-    For more details, see the tutorial `NeMo/tutorials/text_processing/Text_Normalization.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/text_processing/Text_Normalization.ipynb>`__ in `Google's Colab <https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/text_processing/Text_Normalization.ipynb>`_.
+    For more details, see the tutorial `NeMo/tutorials/text_processing/Text_Normalization.ipynb <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_Normalization.ipynb>`__ in `Google's Colab <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_Normalization.ipynb>`_.
 
 
 

diff --git a/docs/source/nlp/bert_pretraining.rst b/docs/source/nlp/bert_pretraining.rst
@@ -61,8 +61,8 @@ and specify the path to the created hd5f files.
 Training the BERT model
 -----------------------
 
-Example of model configuration for on-the-fly data preprocessing: `NeMo/examples/nlp/language_modeling/conf/bert_pretraining_from_text_config.yaml <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/bert_pretraining_from_text_config.yaml>`__.
-Example of model configuration for offline data preprocessing: `NeMo/examples/nlp/language_modeling/conf/bert_pretraining_from_preprocessed_config.yaml <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/bert_pretraining_from_preprocessed_config.yaml>`__.
+Example of model configuration for on-the-fly data preprocessing: `NeMo/examples/nlp/language_modeling/conf/bert_pretraining_from_text_config.yaml <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/language_modeling/conf/bert_pretraining_from_text_config.yaml>`__.
+Example of model configuration for offline data preprocessing: `NeMo/examples/nlp/language_modeling/conf/bert_pretraining_from_preprocessed_config.yaml <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/language_modeling/conf/bert_pretraining_from_preprocessed_config.yaml>`__.
 
 The specification can be grouped into three categories:
 

diff --git a/docs/source/nlp/glue_benchmark.rst b/docs/source/nlp/glue_benchmark.rst
@@ -3,8 +3,8 @@
 GLUE Benchmark
 ==============
 
-We recommend you try the GLUE Benchmark model in a Jupyter notebook (can run on `Google's Colab <https://colab.research.google.com/notebooks/intro.ipynb>`_): `NeMo/tutorials/nlp/GLUE_Benchmark.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/GLUE_Benchmark.ipynb>`__.
+We recommend you try the GLUE Benchmark model in a Jupyter notebook (can run on `Google's Colab <https://colab.research.google.com/notebooks/intro.ipynb>`_): `NeMo/tutorials/nlp/GLUE_Benchmark.ipynb <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/nlp/GLUE_Benchmark.ipynb>`__.
 
 Connect to an instance with a GPU (**Runtime** -> **Change runtime type** -> select **GPU** for the hardware accelerator).
 
-An example script on how to train the model can be found here: `NeMo/examples/nlp/glue_benchmark/glue_benchmark.py <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/glue_benchmark/glue_benchmark.py>`__.
+An example script on how to train the model can be found here: `NeMo/examples/nlp/glue_benchmark/glue_benchmark.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/glue_benchmark/glue_benchmark.py>`__.
diff --git a/docs/source/nlp/information_retrieval.rst b/docs/source/nlp/information_retrieval.rst
@@ -3,7 +3,7 @@
 Information Retrieval
 =====================
 
-We recommend you try the Information Retrieval model in a Jupyter notebook (can run on `Google's Colab <https://colab.research.google.com/notebooks/intro.ipynb>`_): `NeMo/tutorials/nlp/Information_Retrieval_MSMARCO.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Information_Retrieval_MSMARCO.ipynb>`__.
+We recommend you try the Information Retrieval model in a Jupyter notebook (can run on `Google's Colab <https://colab.research.google.com/notebooks/intro.ipynb>`_): `NeMo/tutorials/nlp/Information_Retrieval_MSMARCO.ipynb <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/nlp/Information_Retrieval_MSMARCO.ipynb>`__.
 
 Connect to an instance with a GPU (**Runtime** -> **Change runtime type** -> select **GPU** for hardware the accelerator),
 

diff --git a/docs/source/nlp/joint_intent_slot.rst b/docs/source/nlp/joint_intent_slot.rst
@@ -14,7 +14,7 @@ Our BERT-based model implementation allows you to train and detect both of these
 
 .. note::
 
-    We recommend you try the Joint Intent and Slot Classification model in a Jupyter notebook (can run on `Google's Colab <https://colab.research.google.com/notebooks/intro.ipynb>`_.): `NeMo/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb>`__.
+    We recommend you try the Joint Intent and Slot Classification model in a Jupyter notebook (can run on `Google's Colab <https://colab.research.google.com/notebooks/intro.ipynb>`_.): `NeMo/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/nlp/Joint_Intent_and_Slot_Classification.ipynb>`__.
 
     Connect to an instance with a GPU (**Runtime** -> **Change runtime type** -> select **GPU** for the hardware accelerator).
 
@@ -115,7 +115,7 @@ For each query, the model classifies it as one the intents from the intent dicti
 it as one of the slots from the slot dictionary, including out of scope slot for all the remaining words in the query which does not 
 fall in another slot category. Out of scope slot (``O``) is a part of slot dictionary that the model is trained on.
 
-Example of model configuration file for training the model can be found at: `NeMo/examples/nlp/intent_slot_classification/conf/intent_slot_classification.yaml <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/intent_slot_classification/conf/intent_slot_classification_config.yaml>`__.
+Example of model configuration file for training the model can be found at: `NeMo/examples/nlp/intent_slot_classification/conf/intent_slot_classification.yaml <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/intent_slot_classification/conf/intent_slot_classification_config.yaml>`__.
 In the configuration file, define the parameters of the training and the model, although most of the default values will work well.
 
 The specification can be roughly grouped into three categories:
@@ -152,7 +152,7 @@ More details about parameters in the spec file can be found below:
 | **test_ds.prefix**                        | string          | ``test``                                                                         | A prefix for the test file names.                                                                            |
 +-------------------------------------------+-----------------+----------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
 
-For additional config parameters common to all NLP models, refer to the `nlp_model doc <https://github.com/NVIDIA/NeMo/blob/main/docs/source/nlp/nlp_model.rst#model-nlp>`__.
+For additional config parameters common to all NLP models, refer to the `nlp_model doc <https://github.com/NVIDIA/NeMo/blob/stable/docs/source/nlp/nlp_model.rst#model-nlp>`__.
 
 The following is an example of the command for training the model:
 

diff --git a/docs/source/nlp/machine_translation.rst b/docs/source/nlp/machine_translation.rst
@@ -478,7 +478,7 @@ custom configuration under the ``encoder`` configuration.
 HuggingFace
 ^^^^^^^^^^^
 
-We have provided a `HuggingFace config file <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/machine_translation/conf/huggingface.yaml>`__
+We have provided a `HuggingFace config file <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/machine_translation/conf/huggingface.yaml>`__
 to use with HuggingFace encoders. 
 
 To use the config file from CLI:
@@ -508,7 +508,7 @@ Note the ``+`` symbol is needed if we're not adding the arguments to the YAML co
 Megatron
 ^^^^^^^^
 
-We have provided a `Megatron config file <https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/machine_translation/conf/megatron.yaml>`__
+We have provided a `Megatron config file <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/machine_translation/conf/megatron.yaml>`__
 to use with Megatron encoders. 
 
 To use the config file from CLI:
@@ -561,6 +561,17 @@ To train a Megatron 345M BERT, we would use
   model.encoder.num_layers=24 \
   model.encoder.max_position_embeddings=512 \
 
+If the pretrained megatron model used a custom vocab file, then set:
+
+.. code::
+
+  model.encoder_tokenizer.vocab_file=/path/to/your/megatron/vocab_file.txt
+  model.encoder.vocab_file=/path/to/your/megatron/vocab_file.txt
+
+
+Use ``encoder.model_name=megatron_bert_uncased`` for uncased models with custom vocabularies and
+use ``encoder.model_name=megatron_bert_cased`` for cased models with custom vocabularies.
+
 
 References
 ----------