Merge r1.10.0 main (NVIDIA#4486)

* update branch Signed-off-by: ericharper <complex451@gmail.com> * Fix ASR Typos in tutorials (NVIDIA#4384) * Fix typos Signed-off-by: smajumdar <smajumdar@nvidia.com> * Quick wav2vec fix. In-place operation adding convolutional positions to encoder was overwriting leaf history. Wasn't caught on previous torch versions. (NVIDIA#4383) Signed-off-by: tbartley94 <tbartley@nvidia.com> Co-authored-by: tbartley94 <tbartley@nvidia.com> (cherry picked from commit 0322b15) Co-authored-by: Travis Bartley <Travismbartley@gmail.com> * Fix tutorial typos and docs (NVIDIA#4415) * Fix typos Signed-off-by: smajumdar <smajumdar@nvidia.com> * Fix typos Signed-off-by: smajumdar <smajumdar@nvidia.com> * Add ASR Scores to Docs (NVIDIA#4412) * Fix link Signed-off-by: smajumdar <smajumdar@nvidia.com> * Correct model card Signed-off-by: smajumdar <smajumdar@nvidia.com> * Add ASR Results to Docs Signed-off-by: smajumdar <smajumdar@nvidia.com> * Update info Signed-off-by: smajumdar <smajumdar@nvidia.com> * Update info Signed-off-by: smajumdar <smajumdar@nvidia.com> * docs: add table overflow handling for nested sections (NVIDIA#4441) Co-authored-by: Nick Goncharenko <ngoncharenko@nvidia.com> * Docs: Decrease Font Size on Tables (NVIDIA#4444) * docs: add table overflow handling for nested sections * docs: set table font-size to small Co-authored-by: Nick Goncharenko <ngoncharenko@nvidia.com> * Updated notebook to fix batch configuration and precision bugs (NVIDIA#4447) * Updated notebook to fix batch configuration and precision bugs Signed-off-by: Virginia Adams <vadams@nvidia.com> * Deleted cell outputs Signed-off-by: Virginia Adams <vadams@nvidia.com> * Set datasets back to full dataset Signed-off-by: Virginia Adams <vadams@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * fix branch in link (NVIDIA#4454) Signed-off-by: ekmb <ebakhturina@nvidia.com> * [TTS] [bugfix] German FastPitch HiFi-GAN tutorial and lr (NVIDIA#4459) * [TN] Bug fix: expand serial coverage of unknown symbol, remove constraints from word graph (NVIDIA#4463) * remove constraints from word graph det Signed-off-by: ekmb <ebakhturina@nvidia.com> * add measure units to serial Signed-off-by: ekmb <ebakhturina@nvidia.com> * revert serial changes, update jenkins path Signed-off-by: ekmb <ebakhturina@nvidia.com> * fix test case Signed-off-by: ekmb <ebakhturina@nvidia.com> * update indentation (NVIDIA#4468) Signed-off-by: Akshit Arora <akshit.arora@colorado.edu> * t5-rpe-fix targeting r1.10.0; raise exception for PP>2. (NVIDIA#4469) Signed-off-by: Hoo Chang Shin <hshin@nvidia.com> Co-authored-by: Hoo Chang Shin <hshin@nvidia.com> * Fix some 's' cases for IPA G2P (NVIDIA#4460) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Refactor bias act fusion (NVIDIA#4376) * Refactor bias act fusion Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update NMT config Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update ci tests Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Empty Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Add kwargs to exact string match (NVIDIA#4479) Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Try fix (NVIDIA#4484) Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * update branch Signed-off-by: ericharper <complex451@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: Travis Bartley <Travismbartley@gmail.com> Co-authored-by: Nick Goncharenko <8766167+nickolyamba@users.noreply.github.com> Co-authored-by: Nick Goncharenko <ngoncharenko@nvidia.com> Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Akshit Arora <akshit.arora@colorado.edu> Co-authored-by: khcs <khcs@users.noreply.github.com> Co-authored-by: Hoo Chang Shin <hshin@nvidia.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Signed-off-by: Hainan Xu <hainanx@nvidia.com>
hainan-xv · Nov 29, 2022 · 2e04a40 · 2e04a40
1 parent a24e660
commit 2e04a40
Show file tree

Hide file tree

Showing 21 changed files with 484 additions and 482 deletions.
diff --git a/Jenkinsfile b/Jenkinsfile
@@ -137,18 +137,18 @@ pipeline {
       parallel {
         stage('En TN grammars') {
           steps {
-            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --text="1" --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22'
+            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --text="1" --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22'
           }
         }
         stage('En ITN grammars') {
           steps {
-            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --language en --text="twenty" --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22'
+            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --language en --text="twenty" --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22'
           }
         }
         stage('Test En non-deterministic TN & Run all En TN/ITN tests (restore grammars from cache)') {
           steps {
-            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize_with_audio.py --text "\$.01" --n_tagged 2 --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22'
-            sh 'CUDA_VISIBLE_DEVICES="" pytest tests/nemo_text_processing/en/ -m "not pleasefixme" --cpu --tn_cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22'
+            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize_with_audio.py --text "\$.01" --n_tagged 2 --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22'
+            sh 'CUDA_VISIBLE_DEVICES="" pytest tests/nemo_text_processing/en/ -m "not pleasefixme" --cpu --tn_cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22'
           }
         }
       }
@@ -165,7 +165,7 @@ pipeline {
       parallel {
         stage('L2: Eng TN') {
           steps {
-            sh 'cd tools/text_processing_deployment && python pynini_export.py --output=/home/TestData/nlp/text_norm/output/ --grammars=tn_grammars --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22 --language=en && ls -R /home/TestData/nlp/text_norm/output/ && echo ".far files created "|| exit 1'
+            sh 'cd tools/text_processing_deployment && python pynini_export.py --output=/home/TestData/nlp/text_norm/output/ --grammars=tn_grammars --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22 --language=en && ls -R /home/TestData/nlp/text_norm/output/ && echo ".far files created "|| exit 1'
             sh 'cd nemo_text_processing/text_normalization/ &&  python normalize.py --input_file=/home/TestData/nlp/text_norm/ci/test.txt --input_case="lower_cased" --language=en --output_file=/home/TestData/nlp/text_norm/output/test.pynini.txt --verbose'
             sh 'cat /home/TestData/nlp/text_norm/output/test.pynini.txt'
             sh 'cmp --silent /home/TestData/nlp/text_norm/output/test.pynini.txt /home/TestData/nlp/text_norm/ci/test_goal_py_05-25.txt || exit 1'
@@ -175,7 +175,7 @@ pipeline {
 
         stage('L2: Eng ITN export') {
           steps {
-            sh 'cd tools/text_processing_deployment && python pynini_export.py --output=/home/TestData/nlp/text_denorm/output/ --grammars=itn_grammars --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22 --language=en && ls -R /home/TestData/nlp/text_denorm/output/ && echo ".far files created "|| exit 1'
+            sh 'cd tools/text_processing_deployment && python pynini_export.py --output=/home/TestData/nlp/text_denorm/output/ --grammars=itn_grammars --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22 --language=en && ls -R /home/TestData/nlp/text_denorm/output/ && echo ".far files created "|| exit 1'
             sh 'cd nemo_text_processing/inverse_text_normalization/ &&  python inverse_normalize.py --input_file=/home/TestData/nlp/text_denorm/ci/test.txt --language=en --output_file=/home/TestData/nlp/text_denorm/output/test.pynini.txt --verbose'
             sh 'cmp --silent /home/TestData/nlp/text_denorm/output/test.pynini.txt /home/TestData/nlp/text_denorm/ci/test_goal_py.txt || exit 1'
             sh 'rm -rf /home/TestData/nlp/text_denorm/output/*'
@@ -184,23 +184,23 @@ pipeline {
         stage('L2: TN with Audio (audio and raw text)') {
           steps {
             sh 'cd nemo_text_processing/text_normalization && \
-            python normalize_with_audio.py --language=en --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22 --text "The total amounts to \\$4.76." \
+            python normalize_with_audio.py --language=en --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22 --text "The total amounts to \\$4.76." \
             --audio_data /home/TestData/nlp/text_norm/audio_based/audio.wav | tail -n2 | head -n1 > /tmp/out_raw.txt 2>&1 && \
             cmp --silent /tmp/out_raw.txt /home/TestData/nlp/text_norm/audio_based/result.txt || exit 1'
           }
         }
         stage('L2: TN with Audio (audio and text file)') {
           steps {
             sh 'cd nemo_text_processing/text_normalization && \
-            python normalize_with_audio.py --language=en --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22 --text /home/TestData/nlp/text_norm/audio_based/text.txt \
+            python normalize_with_audio.py --language=en --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22 --text /home/TestData/nlp/text_norm/audio_based/text.txt \
             --audio_data /home/TestData/nlp/text_norm/audio_based/audio.wav | tail -n2 | head -n1 > /tmp/out_file.txt 2>&1 && \
             cmp --silent /tmp/out_file.txt /home/TestData/nlp/text_norm/audio_based/result.txt || exit 1'
           }
         }
         stage('L2: TN with Audio (manifest)') {
           steps {
             sh 'cd nemo_text_processing/text_normalization && \
-            python normalize_with_audio.py --language=en --audio_data /home/TestData/nlp/text_norm/audio_based/manifest.json --n_tagged=120 --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-14-22'
+            python normalize_with_audio.py --language=en --audio_data /home/TestData/nlp/text_norm/audio_based/manifest.json --n_tagged=120 --cache_dir /home/TestData/nlp/text_norm/ci/grammars/6-28-22'
           }
         }
       }
@@ -2129,7 +2129,7 @@ pipeline {
         model.num_attention_heads=8 \
         model.activation='swiglu' \
         model.masked_softmax_fusion=False \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.micro_batch_size=2 \
@@ -2161,7 +2161,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='swiglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.masked_softmax_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
@@ -2893,7 +2893,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='swiglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.transformer_block_type='pre_ln' \
@@ -2918,7 +2918,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='swiglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.transformer_block_type='pre_ln' \
@@ -3015,7 +3015,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='swiglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.transformer_block_type='normformer' \
@@ -3040,7 +3040,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='swiglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.transformer_block_type='normformer' \
@@ -3094,7 +3094,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='reglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.data.data_prefix=[.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document,.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document]"
@@ -3116,7 +3116,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='reglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.data.data_prefix=[.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document,.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document]"
@@ -3150,7 +3150,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='geglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.data.data_prefix=[.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document,.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document]"
@@ -3173,7 +3173,7 @@ pipeline {
         model.hidden_size=64 \
         model.num_attention_heads=8 \
         model.activation='geglu' \
-        model.bias_gelu_fusion=False \
+        model.bias_activation_fusion=False \
         model.activations_checkpoint_method='block' \
         model.activations_checkpoint_num_layers=1 \
         model.data.data_prefix=[.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document,.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document]"