Skip to content

Commit

Permalink
Multimodal merge (NVIDIA#7728)
Browse files Browse the repository at this point in the history
* ControlNet TRT export

* Final MR before release

* SD2 update

* Fixed export issue

* Fix for instruct p2p and reformat

* Fix SD export issue

* Add nemo clip export for DB

* Fix ins pix2pix

* fix sd2 config

* [Mingyuan Ma] BF16 and SD conversion script

* [Imagen] NHWC Feature

* Fix .nemo loading issue for NeMo CLIP in SD

* NeMo r1.20.0 Multimodal Merge

* fix the inductor issue in inference

* Fix inductor loading .nemo issue

* Add Neva Model Support

* Imagen Optimizations

* Neva inference code

* NeMo TOT 1.21 to Internal/main

* Update neva_inference.yaml

* REBASING  for latest code changes

* Update internal/main to main tot

* Parallel DDIM implementation

* 1. Fixing indentation bug. (#7352)

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* NeMo MCore llama2 support + MCore PEFT adapters (#7299)

* start adding gpt from megatron core path

Signed-off-by: ericharper <complex451@gmail.com>

* set model parallel config

Signed-off-by: ericharper <complex451@gmail.com>

* use model parallel config object

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <complex451@gmail.com>

* set vp size to none if it is 1

Signed-off-by: ericharper <complex451@gmail.com>

* add TransformerConfig

Signed-off-by: ericharper <complex451@gmail.com>

* start updating to TransformerConfig

Signed-off-by: ericharper <complex451@gmail.com>

* add todo

Signed-off-by: ericharper <complex451@gmail.com>

* revert to model parallel config

Signed-off-by: ericharper <complex451@gmail.com>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove imports

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* remove import

Signed-off-by: ericharper <complex451@gmail.com>

* small clean up

Signed-off-by: ericharper <complex451@gmail.com>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <complex451@gmail.com>

* add config obj to flash attention tests

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove args

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* add config to self

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* add config to test

Signed-off-by: ericharper <complex451@gmail.com>

* get hidden_size from config

Signed-off-by: ericharper <complex451@gmail.com>

* add try except

Signed-off-by: ericharper <complex451@gmail.com>

* use default

Signed-off-by: ericharper <complex451@gmail.com>

* update config with hidden size

Signed-off-by: ericharper <complex451@gmail.com>

* remove arg

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* revert import

Signed-off-by: ericharper <complex451@gmail.com>

* build transformer config

Signed-off-by: ericharper <complex451@gmail.com>

* add model to provider func

Signed-off-by: ericharper <complex451@gmail.com>

* update forward and float16 wrapper

Signed-off-by: ericharper <complex451@gmail.com>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* set virtual rank

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <jasonwan@nvidia.com>

---------

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* mcore llama2 ckpt conversion & small fix

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* Add inference & sft config by Hongbin

Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* fix config

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* add inference param. update TP/PP script to support mcore gpt

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* p-tuning

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* modify ckpt conversion script (adding model cast)

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* ckpt conversion use relative path for config

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* start adding gpt from megatron core path

Signed-off-by: ericharper <complex451@gmail.com>

* set model parallel config

Signed-off-by: ericharper <complex451@gmail.com>

* use model parallel config object

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* set vp size to none if it is 1

Signed-off-by: ericharper <complex451@gmail.com>

* set vp size to none if it is 1

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add TransformerConfig

Signed-off-by: ericharper <complex451@gmail.com>

* start updating to TransformerConfig

Signed-off-by: ericharper <complex451@gmail.com>

* add todo

Signed-off-by: ericharper <complex451@gmail.com>

* revert to model parallel config

Signed-off-by: ericharper <complex451@gmail.com>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <complex451@gmail.com>

* remove imports

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove import

Signed-off-by: ericharper <complex451@gmail.com>

* small clean up

Signed-off-by: ericharper <complex451@gmail.com>

* update hidden size in peft base model, add mcore commit to jenkins

Signed-off-by: ericharper <complex451@gmail.com>

* update module args

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add config obj to flash attention tests

Signed-off-by: ericharper <complex451@gmail.com>

* remove args

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove sequence parallel arg

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* add config to self

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* add config to test

Signed-off-by: ericharper <complex451@gmail.com>

* get hidden_size from config

Signed-off-by: ericharper <complex451@gmail.com>

* add try except

Signed-off-by: ericharper <complex451@gmail.com>

* use default

Signed-off-by: ericharper <complex451@gmail.com>

* update config with hidden size

Signed-off-by: ericharper <complex451@gmail.com>

* remove arg

Signed-off-by: ericharper <complex451@gmail.com>

* comment out jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* revert import

Signed-off-by: ericharper <complex451@gmail.com>

* remove optimizer_idx

Signed-off-by: eharper <eharper@nvidia.com>

* prefetch num microbatches

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start adding gpt from megatron core path

Signed-off-by: ericharper <complex451@gmail.com>

* set model parallel config

Signed-off-by: ericharper <complex451@gmail.com>

* use model parallel config object

Signed-off-by: ericharper <complex451@gmail.com>

* update args

Signed-off-by: ericharper <complex451@gmail.com>

* fix for p-tuning sequence parallel

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support SFT/distOpt mcore (#7207)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* change layer names for SFT

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>

* fix bug in SFT

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>

---------

Signed-off-by: jasonwan <jasonwan@nvidia.com>
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>
Co-authored-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* start updating to TransformerConfig

Signed-off-by: ericharper <complex451@gmail.com>

* revert to model parallel config

Signed-off-by: ericharper <complex451@gmail.com>

* add hidden_size to model_parallel_config

Signed-off-by: ericharper <complex451@gmail.com>

* remove imports

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update module args

Signed-off-by: ericharper <complex451@gmail.com>

* add config to self

Signed-off-by: ericharper <complex451@gmail.com>

* build transformer config

Signed-off-by: ericharper <complex451@gmail.com>

* add model to provider func

Signed-off-by: ericharper <complex451@gmail.com>

* update forward and float16 wrapper

Signed-off-by: ericharper <complex451@gmail.com>

* instantiate model parallel config after init model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* set virtual rank

Signed-off-by: ericharper <complex451@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add GQA config to megatron gpt model (#7096)

* Add GQA config in gpt config file

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* Verify mcore is enabled when using GQA

Signed-off-by: jasonwan <jasonwan@nvidia.com>

---------

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

* remove import

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rollback model cast for p-tuning

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* update for dist adam

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use get_gpt_module_list

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion script

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* ptl2.0 patch for llama config

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* add plugins to trainer in scripts

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* fix activation checkpointing mcore

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* fix variable names

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* overwrite normalization type for mcore/te

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* Update megatron_llama_sft.yaml

Signed-off-by: Jason Wang <jasonwan@nvidia.com>

* add PEFT adapter support for mcore gpt path (#7276)

* implementation for mcore adapter/mxins

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* small fix for lora and ptuning

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support layerwise peft

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support multiple target layers

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support lora GQA

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support amp O2

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* revert & more O2 fix

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lora inject to attention

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support lora weight tying

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* add copyright header

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* rollback ptuning name change. full string match mcore target

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove comment

Signed-off-by: jasonwan <jasonwan@nvidia.com>

---------

Signed-off-by: jasonwan <jasonwan@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* clean up config

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* Sync llama branch (#7297)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* change layer names for SFT

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>

* fix bug in SFT

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>

* fix bug: cpu initialization is not really enabled

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>

* add use_cpu_initialization to TransformerConfig

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>

* fix bug: wrong config path when using relative cjpt path

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>

* revert mcore config change

Signed-off-by: Jason Wang <jasonwan@nvidia.com>

---------

Signed-off-by: jasonwan <jasonwan@nvidia.com>
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Signed-off-by: Jason Wang <jasonwan@nvidia.com>
Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>

* clean up ckpt conversion script

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* rollback git merge errors

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* update mcore, add check for mcore+te

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* formatting

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* make sft test dataset optional. fix indentation in config

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* one more fix for optional test set

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support merging lora weights in mcore

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* update mcore for cpu init

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update ckpt conversion for code llama

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add seq_len_interpolation_factor support for long-context llama ckpts (#7312)

* add inference param. update TP/PP script to support mcore gpt

* p-tuning

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* add seq_len_interpolation_factor

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>

---------

Signed-off-by: jasonwan <jasonwan@nvidia.com>
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Co-authored-by: jasonwan <jasonwan@nvidia.com>
Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>

* fix old ptuning model, update mcore to support seq_len_interpolation_factor

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support fused layernorm linear, fix ptuning O2

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* drop loss mask for mcore for now

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* disable dist ckpt in peft

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix loading non dist ckpt

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* add ckpt conversion to CI

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* update CI

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* mcore_mixin docstring

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* minor change in mcore peft error message

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* fix amp o2 in lora weight tying

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* correct mcore fp8 config

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* add TE installation

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* support mcore adapter tuning

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comment out new CI test. rollback docker image

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* ignore FA tests, try new CI on 23.08

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* mark new CI as L2, put to beginning to test

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* minor fix for prompt learning

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* rollback to 23.06. comment out CI

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* minor fix ckpt conversion script

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor rollback gpt model change

Signed-off-by: jasonwan <jasonwan@nvidia.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: jasonwan <jasonwan@nvidia.com>
Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Signed-off-by: Jason Wang <jasonwan@nvidia.com>
Co-authored-by: ericharper <complex451@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: eharper <eharper@nvidia.com>
Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>
Co-authored-by: Kelvin Liu <lhb8125@users.noreply.github.com>

* Hiddens modules documentation (#7303)

* 1. Changed hiddens transformations module from `transformations` to `hiddens`.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Finished doc.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com>

---------

Signed-off-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Support for flash attention 2.0 (#7063)

* Add flash attn 2

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add FA2 feature

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Remove debugging

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* lora merge fix for O2 names (#7325)

* wip

Signed-off-by: arendu <adithyare@nvidia.com>

* adjust key names based on O2

Signed-off-by: arendu <adithyare@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithyare@nvidia.com>

* minor

Signed-off-by: arendu <adithyare@nvidia.com>

---------

Signed-off-by: arendu <adithyare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* multiple fields can form a context (#7147)

* list of context fields and flexible prompt template

Signed-off-by: arendu <adithya.r@gmail.com>

* list of fields for context

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Add multiple truncation fields and middle truncation

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compatible to old ckpt

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tokenize detokenize issue

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove detokenization, add truncation augmentation

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Resolve comments

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Remove unused import

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert eos

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Add tokenizer space_sensitive attribute

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix error

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Fix erorr and use re

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Change assert logic

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Follow adi suggestion

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove merge function

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add example and comment

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Remove context_key and add comment

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Remove random truncation

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix template none

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>

* Load buffers in checkpoint (#7357)

Signed-off-by: Jason Wang <jasonwan@nvidia.com>

* Add migration guide for lightning 2.0 upgrade (#7360)

* Add lightning 2.0 migration guide in NeMo docs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add remaining guide for lightning 2.0 upgrade

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove line spill over and continue in next line

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing dataloader_iter in the guide

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix minor typo

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* adding bias_dropout_add_fusion option for BERT (#7332)

Signed-off-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>

* [TTS] Change audio codec token type to TokenIndex (#7356)

Signed-off-by: Ryan <rlangman@nvidia.com>

* enable selective unfreeze (#7326)

* wip

Signed-off-by: arendu <adithyare@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip

Signed-off-by: arendu <adithyare@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* avoid PTL method conflicts

Signed-off-by: arendu <adithyare@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithyare@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithyare@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <adithyare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix typos (#7361)

* fix typos

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* fix typo

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* fix typos

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* fix typos

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* fix typo

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* fix typos

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* fix typo

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* fix typo

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* fix typo

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

---------

Signed-off-by: omahs <73983677+omahs@users.noreply.github.com>

* pin numba=0.57.1 to fix reinstall.sh error (#7366)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Update new conversion script for converting safetensors.

* Upgrade pytorch container to 23.08 (#7353)

* upgrade pytorch container

Signed-off-by: eharper <eharper@nvidia.com>

* use mcore

Signed-off-by: eharper <eharper@nvidia.com>

* revert test change

Signed-off-by: eharper <eharper@nvidia.com>

* pleasefixme

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for ampere

Signed-off-by: eharper <eharper@nvidia.com>

* comment test temporarily

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* enable fp32 optimizer for output_layer in mcore (#7355)

Signed-off-by: lhb8125 <lhb8125@gmail.com>

* revert comment (#7368)

Signed-off-by: eharper <eharper@nvidia.com>

* Update to core 23.08 branch ToT (#7371)

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* upper bounding ptl (#7370)

Signed-off-by: eharper <eharper@nvidia.com>

* fix pipeline parallel inference (#7367)

* fix pp inference

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: jasonwan <jasonwan@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix for peft tied weights (#7372)

Signed-off-by: arendu <adithyare@nvidia.com>

* fixed trainer.strategy=auto from None. (#7369)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* add O2 option in gpt eval (#7358)

* add O2 option in eval

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add doc for O2 config

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* add to llama inference config

Signed-off-by: jasonwan <jasonwan@nvidia.com>

---------

Signed-off-by: jasonwan <jasonwan@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Move model precision copy (#7336)

* move cfg precision set to megatron base model

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* remove copy from other models

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* modify attribute not arg

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* fix gpt model test for ptl 2.0

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* rename function and add docstring

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* replace precision to dtype conditionals with func call

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* unnecessary function and cfg reset

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* set default value

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* fix precision lookup in a few more places

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* rename mapping function

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* ununsed import

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* save torch datatype to model

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* set weights precision wrt amp o2

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Revert "set weights precision wrt amp o2"

This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c.

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* revert half precision at inference attempt

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* move autocast dtype to base model

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* move params dtype to base model, enable fp16 O2 inf

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* unused imports

Signed-off-by: Maanu Grover <maanug@nvidia.com>

---------

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Fix PEFT checkpoint loading (#7388)

* Fix PEFT checkpoint loading

Signed-off-by: Jason Wang <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <jasonwan@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Use distributed optimizer support for multiple dtypes (#7359)

* Update distopt wrapper with multiple dtype support

Remove manual handling of separate FP32 optimizer.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Use distopt support for contiguous buffers with multiple dtypes

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix typo

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Separate distopt buckets for first GPT layer and non-overlapped params

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Add distopt logic for int dtypes

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Apex commit

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Remove unused variables

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Apex commit in README and Jenkensfile

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Debug Dockerfile and Jenkinsfile

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* minor fix for llama ckpt conversion script (#7387)

* minor fix for llama ckpt conversion script

Signed-off-by: Jason Wang <jasonwan@nvidia.com>

* Update Jenkinsfile

Signed-off-by: Jason Wang <jasonwan@nvidia.com>

* remove fast_swiglu configuration

Signed-off-by: Jason Wang <jasonwan@nvidia.com>

---------

Signed-off-by: Jason Wang <jasonwan@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Fix wrong calling of librosa.get_duration() in notebook (#7376)

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* [PATCH] PEFT import mcore (#7393)

* [PATCH] PEFT import mcore

Signed-off-by: Jason Wang <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jason Wang <jasonwan@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Added a callback for logging initial data (#7384)

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* Update Core Commit (#7402)

* Update Core Commit

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* update commit

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

---------

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* Use cfg attribute in bert (#7394)

* use cfg attribute instead of arg

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* use torch_dtype in place of cfg.precision

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* move precision copy before super constructor

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* use trainer arg

Signed-off-by: Maanu Grover <maanug@nvidia.com>

---------

Signed-off-by: Maanu Grover <maanug@nvidia.com>

* Add support for bias conversion in Swiglu models (#7386)

* Add support for bias conversion in Swiglu models

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add support for auto extracting tokenizer model

Signed-off-by: smajumdar <titu1994@gmail.com>

* Fix issue with missing tokenizer

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update save_to and restore_from for dist checkpointing (#7343)

* add dist ckpt to save to, in progress

Signed-off-by: eharper <eharper@nvidia.com>

* move dist ckpt

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update restore from, need to figure out how to initialize distributed

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* launch distrib if needed when restoring dist ckpt

Signed-off-by: eharper <eharper@nvidia.com>

* when using mcore we can change tp pp on the fly

Signed-off-by: eharper <eharper@nvidia.com>

* add load_from_checkpoint support for dist ckpt

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update llama convert script to save dist .nemo

Signed-off-by: eharper <eharper@nvidia.com>

* fix load dist ckpt

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup TE TP groups if needed

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* setup te tp groups if needed

Signed-off-by: eharper <eharper@nvidia.com>

* remove import

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
Signed-off-by: jasonwan <jasonwan@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: jasonwan <jasonwan@nvidia.com>

* fix forward for with mcore=false (#7403)

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>

* Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374)

* Add CustomProgressBar class to exp_manager and trainer callbacks

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix the progress bar to reflect total microbatch cnt

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify CustomProgressBar class

1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch
2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add CustomProgressBar callback to tuning files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Set Activation Checkpointing Defaults (#7404)

* Set Activation Checkpointing Defaults

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* check for None

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* make loss mask default to false (#7407)

Signed-off-by: eharper <eharper@nvidia.com>

* Add dummy userbuffer config files (#7408)

Signed-off-by: Sangkug Lym <slym@nvidia.com>

* add missing ubconf files (#7412)

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* New tutorial on Speech Data Explorer (#7405)

* Added Google Colab based tutorial on Speech Data Explorer

Signed-off-by: George Zelenfroynd <gzelenfroind@nvidia.com>

* Update ptl training ckpt conversion script to work with dist ckpt (#7416)

* update ptl convert script

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* don't break legacy

Signed-off-by: eharper <eharper@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Allow disabling sanity checking when num_sanity_val_steps=0 (#7413)

* Allow disabling sanity checking when num_sanity_val_steps=0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Update num_sanity_val_steps to be a multiple of num_microbatches

Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add comprehensive error messages (#7261)

Signed-off-by: Anton Peganov <apeganov@nvidia.com>

* check NEMO_PATH (#7418)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* layer selection for ia3 (#7417)

* layer selection for ia3

Signed-off-by: arendu <adithyare@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <adithyare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix missing pip package 'einops' (#7397)

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>

* Fix failure of pyaudio in Google Colab (#7396)

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>

* Update README.md: output_path --> output_manifest_filepath (#7442)

Signed-off-by: Samuele Cornell <cornellsamuele@gmail.com>

* Updating FlashAttention API to match FlashAttentionV2

* Multiple fixes for mm

* Fix CI inductor issue and update to torch compile

* Remove suppress error

* Fix when conversion config uses fp16 and it complains about precision plugin

* Fixing FAv2 API usage

* Initial release of content filtering model

* Added synthetic dataloader for precached and online mode

* Mingyuanm/dreambooth opt

* Add llama2 support in neva training

* Fix sampler length

* Fix all precision issues in nemo multimodal

* Add rope dynamic linear scaling (#7437)

* Add dynamic linear scaling

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

---------

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>

* Fix None dataloader issue in PTL2.0 (#7455)

* Fix None dataloader issue in PTL2.0

Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updating values of self._validation_dl and self._test_dl as well

Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com>

* updating values of self._validation_dl and self._test_dl as well

Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [ASR] Confidence measure -> method renames (#7434)

* measure -> method

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add steps for document of getting dataset 'SF Bilingual Speech' (#7378)

* Add steps for document of getting dataset 'SF Bilingual Speech'

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>

* Update datasets.rst

added a link from a tutorial demonstrating detailed data prep steps.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* RNN-T confidence and alignment bugfix (#7381)

* new frame_confidence and alignments lists are now always created after the while loop

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tests added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* Fix resume from checkpoint in exp_manager (#7424) (#7426)

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* Fix checking of cuda/cpu device for inputs of Decoder (#7444)

* Fix checking of cuda/cpu device for inputs of Decoder

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>

* Update tacotron2.py

Signed-off-by: Jason <jasoli@nvidia.com>

---------

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>

* Fix failure of ljspeech's get_data.py (#7430)

* Fix failure of ljspeech's get_data.py

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [TTS] Fix audio codec type checks (#7373)

* [TTS] Fix audio codec type checks

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix audio codec tests

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add dataset to path of logged artifacts (#7462)

* [TTS] Add dataset to path of logged artifacts

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Revert axis name back to Audio Frames

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>

* Fix sft dataset truncation (#7464)

* Add fix

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

---------

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330)

* striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* transpose conv1d inputs

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* Update subsampling.py

change striding_conv1d_k5 to striding_conv1d

Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com>

* cv branch

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* video manifest

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* add collection classes

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add test_step_outputs

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* correct manifest bug when having only audio or only videos

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* correct manifest bug when having only audio or only videos

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* clean references

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* freeze unfreeze transcribe cv models

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* correct manifest get_full_path bug

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* update for PR

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* guard torchvision

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update nemo/collections/cv/data/video_to_text_dataset.py

Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com>

* _video_speech_collate_fn in cv/data/video_to_text.py

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* add self.out = None to asr subsampling

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* Update nemo/collections/cv/data/video_to_text_dataset.py

Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com>

* cv -> multimodal/speech_cv branch

Signed-off-by: mburchi <maxime.burchi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: mburchi <maxime.burchi@gmail.com>
Signed-off-by: Maxime Burchi <60737204+burchim@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>

* HF StarCoder to NeMo conversion script (#7421)

* Script to convert HF StarCoder checkpoint to NeMo

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* StarCoder conversion test

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Fix test

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Catch up with save_to changes

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Don't abbreviate args for clarity

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Configurable precision: BF16 vs FP32

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix bug when loading dist ckpt in peft (#7452)

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>

* Fix adding positional embeddings in-place in transformer module (#7440)

Signed-off-by: Tamerlan Tabolov <tktabolov@gmail.com>
Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>

* Fix (#7478)

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* add sleep (#7498) (#7499)

* add sleep

* add sleep onto config instead

* add comment

---------

Signed-off-by: Gerald Shen <geshen@nvidia.com>
Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com>

* Fix exp manager check for sleep (#7503) (#7504)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* bugfix: trainer.accelerator=auto from None. (#7492) (#7493)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>

* [doc] fix broken link (#7481)

Signed-off-by: Stas Bekman <stas00@users.noreply.github.com>

* [TTS] Read audio as int32 to avoid flac read errors (#7477)

* [TTS] Read audio as int32 to avoid flac read errors

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add comment about read failures

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>

* Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409)

* Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS
* Train 'AISHELL-3' dataset with multi-speakers

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>

* Update get_data.py

update copyright header

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Update get_data.py

added a disclaimer

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add new configuration file for AISHELL3 with multispeaker of fastpitch

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>

---------

Signed-off-by: Robin Dong <robin.k.dong@gmail.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* dllogger - log on rank 0 only (#7513)

Signed-off-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix TTS FastPitch tutorial (#7494) (#7516)

* Fix

---------

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com>

* Fix get_dist() tensor dimension (#7506) (#7515)

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>

* bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512)

Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com>

* fix (#7511)

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [TTS] Fix FastPitch data prep tutorial (#7524)

Signed-off-by: Ryan <rlangman@nvidia.com>

* add italian tokenization (#7486)

* add italian tokenization

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more ipa lexicon it

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix error deletion

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* add test

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Replace None strategy with auto in tutorial notebooks (#7521) (#7527)

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>

* unpin setuptools (#7534) (#7535)

Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>

* remove auto generated examples (#7510)

* explicitly remove autogenerated examples for data parallel evaluation

Signed-off-by: arendu <adithyare@nvidia.com>

* mark autogenrated and remove it for test

Signed-off-by: arendu <adithyare@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <adithyare@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264)

It is passed as an explicit argument rather than through
`**strategy_args` so as to ensure someone cannot accidentally pass other
arguments that would end up being ignored.

It is a keyword-only argument to ensure that if in the future we want to
update the signature to `**strategy_args`, we can do it without breaking
code.

Signed-off-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>

* Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533)

* fix none dataloader issue ptl2

* ptl2.0 logging fixes for rnnt_models

---------

Signed-off-by: KunalDhawan <kunaldhawan97@gmail.com>
Co-authored-by: Kunal Dhawan <kunaldhawan97@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>

* gpus -> devices (#7542) (#7545)

Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>

* Update FFMPEG version to fix issue with torchaudio (#7551) (#7553)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* PEFT GPT & T5 Refactor (#7308)

* initial implementation of add_adapters API

* correct type hint

* Add config in add_adapters for save and load (@author bobchen)

* Remove AdapterConfig to avoid import error

* Add AdaterConfig back and move adaptermixin to sft model

* Add NLPSaveRestoreConnector as default in NLPModel.restore_from

* Add restore_from_nemo_with_adapter and test script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename t5 file and classes to be consistent with GPT

* add t5 sft dataset

* add support for single-file format with T5SFTDataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Various small changes to make T5 SFT work like GPT SFT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add adapter evaluation test script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MultiAdaterConfig for ia3 and fix builder issue

* Make ptuning for T5SFTModel work using mixin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add IA3_Adapter for AdapterName

* Add adapter name for ptuning and attention adapter

* Make test script GPT/T5 agnostic

* Add layer selection feature

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Integrate adapter name and config

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update gpt peft tuning script to new API

* add t5 peft tuning script with new API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix IA3 layer selection issue

* Override state_dict on SFT model instead of mixin

* Add load adapter by adapter config

* move peft config map away from example script

* auto get config from nemo adapter

* Move PEFTConfig to new file

* fix ckpt save/load for t5

* name change: add_adapters -> add_adapter

* variable name change

* update t5 script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix t5 issues

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add weight tying

* update gpt tuning script

* PEFT-API proposal

* Fix according to comments

* update tuning scripts

* move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore

* Add mcore_gpt support for NLPAdapterMixin

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

* variable name change to distinguish "peft" and "adapter"

* override `load_adapters` to support `add_adapter` name change

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update tuning and eval script for adapter save/load

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add Ptuning on first stage only

* add lora tutorial for review

* Fix layer selection for mcore

* add landing page

* fix resume training

Signed-off-by: jasonwan <jasonwan@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add mcore condition in sharded_state_dict to make sft work

* Update lora_tutorial.md

First edit of this file for PEFT documentation for NeMO

Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com>

* rename Adapter to AttentionAdapter to avoid confusion in doc

* Change load_adapters to load .nemo

* add quick start guide

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add load_adapters with .ckpt

* Remove setup_complete changes in load_adapters

* update landing page

* remove typo

* Updated quick_start.md per Chen Cui

Signed-off-by: hkelly33 <58792115+hkelly33@users.noreply.github.com>

* Add inference config merger and tutorial

* Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel

* add supported_methods.md and update other documentations

* Update supported_methods.md

minor updates.

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>

* Update landing_page.md

minor update.

Signed-off-by: Adi Renduchintala <adithyare@nvidia.com>

* Modify doc string for NLPAdapterModelMixin

* Add doc string add_adapters in NLPAdapterModelMixin

* rename canonical adapters

* remove mcore hard dependency

* [PATCH] move microbatch calculator to nemo from apex

* remove apex dependency in gpt and t5 sft models

* remove apex dependency in gpt model

* render doc strings

* fix

* Add missing virtual_tokens on ptuning

* fix docstrings

* update gpt-style model coverage in docs

* update docstring

* Remove pdb

* add lightning_fabric to make docstring rendering work

* Add Ptuning missing key

* try docstring rendering

* Fix ptuning issue

* update gpt t5 peft tuning and eval scripts

* typos

* update eval config

* fix bug relating to apex dependency removal

* typo

* make predict step behave the same as test step

* make lora tutorial work in notebook

* cosmetics

* update yaml scripts

* mcore_gpt attribute optional

* typo

* update eval scripts and fix T5 eval bugs

* add NLPDDPStrategyNotebook and trainer builder logic to use it

* update lora notebook to use new trainer builder

* fix microbatch calculator bug for inference after training

* Convert markdown files to RST and incorporate with doc

* typo

* revise language

* remove extra cell

* remove unnecessary inheritance

* remove old tests

* move layer selection default so logging messages make sense

* remove `save_adapters` as adapter weights are saved automatically during training

* initialize weights from a checkpoint instead of randomly

* multiple fields can form a context (#7147)

* list of context fields and flexible prompt template

Signed-off-by: arendu <adithya.r@gmail.com>

* list of fields for context

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Fix bug

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Add multiple truncation fields and middle truncation

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Compatible to old ckpt

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix tokenize detokenize issue

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* [pre-commit.ci] auto fix…
  • Loading branch information
Show file tree
Hide file tree
Showing 54 changed files with 1,511 additions and 111 deletions.
4 changes: 2 additions & 2 deletions examples/nlp/language_modeling/megatron_bart_pretraining.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True,
Expand All @@ -56,7 +56,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
with_distributed_adam = cfg.model.optim.get('name', 'fused_adam') == 'distributed_fused_adam'
plugins = []
strategy = NLPDDPStrategy(
Expand All @@ -152,7 +152,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2 and not with_distributed_adam:
if megatron_amp_O2 and not with_distributed_adam:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)

plugins = []
strategy = NLPDDPStrategy(no_ddp_communication_hook=True, find_unused_parameters=False,)
Expand All @@ -67,7 +67,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
4 changes: 2 additions & 2 deletions examples/nlp/language_modeling/megatron_gpt_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ def main(cfg) -> None:
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

trainer = None
if cfg.trainer.precision == 16:
if cfg.trainer.precision in [16, '16', '16-mixed']:
trainer = Trainer(
plugins=[
NLPMixedPrecisionPlugin(
Expand All @@ -45,7 +45,7 @@ def main(cfg) -> None:
strategy=NLPDDPStrategy(),
**cfg.trainer,
)
elif cfg.trainer.precision == 'bf16':
elif cfg.trainer.precision in ['bf16', 'bf16-mixed']:
trainer = Trainer(plugins=[NLPNativeBfloat16PrecisionPlugin(),], strategy=NLPDDPStrategy(), **cfg.trainer,)
else:
trainer = Trainer(plugins=[NLPPrecisionPlugin()], strategy=NLPDDPStrategy(), **cfg.trainer)
Expand Down
4 changes: 2 additions & 2 deletions examples/nlp/language_modeling/megatron_lm_ckpt_to_nemo.py
Original file line number Diff line number Diff line change
Expand Up @@ -171,7 +171,7 @@ def parse_weights(weight_dict: OrderedDict, parent_key: str, total: list, conver
converted[final_key] = weight_dict[key]


def add_optimizer_state(lm_checkpoint, new_checkpoint, megatron_amp_o2=True):
def add_optimizer_state(lm_checkpoint, new_checkpoint, megatron_amp_O2=True):
# this method is to convert lm_checkpoint optimizer states for nemo checkpoint
OPTIMIZER_KEY = 'optimizer'
FP32_FP16_KEY = 'fp32_from_fp16_params'
Expand All @@ -182,7 +182,7 @@ def add_optimizer_state(lm_checkpoint, new_checkpoint, megatron_amp_o2=True):
NEW_LR_SCHEDULER = 'lr_schedulers'
if OPTIMIZER_KEY in lm_checkpoint and OPTIMIZER_KEY in lm_checkpoint[OPTIMIZER_KEY]:
opt_state = lm_checkpoint[OPTIMIZER_KEY][OPTIMIZER_KEY]
if megatron_amp_o2:
if megatron_amp_O2:
opt_dict = dict()
if LR_SCHEDULER in lm_checkpoint:
sched = lm_checkpoint[LR_SCHEDULER]
Expand Down
6 changes: 3 additions & 3 deletions examples/nlp/language_modeling/megatron_retro_cal_shape.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,10 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True if megatron_amp_o2 else False,
no_ddp_communication_hook=True if megatron_amp_O2 else False,
gradient_as_bucket_view=cfg.model.gradient_as_bucket_view,
find_unused_parameters=False,
)
Expand All @@ -53,7 +53,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(MixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
6 changes: 3 additions & 3 deletions examples/nlp/language_modeling/megatron_retro_fine_tune.py
Original file line number Diff line number Diff line change
Expand Up @@ -74,10 +74,10 @@ def main(cfg) -> None:
# import torch.multiprocessing as mp
# mp.set_start_method("spawn", force=True)
#####################################################
megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True if megatron_amp_o2 else False,
no_ddp_communication_hook=True if megatron_amp_O2 else False,
gradient_as_bucket_view=cfg.model.gradient_as_bucket_view,
find_unused_parameters=False,
timeout=datetime.timedelta(seconds=18000),
Expand All @@ -95,7 +95,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(MixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,10 +40,10 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True if megatron_amp_o2 else False,
no_ddp_communication_hook=True if megatron_amp_O2 else False,
gradient_as_bucket_view=cfg.model.gradient_as_bucket_view,
find_unused_parameters=False,
)
Expand All @@ -59,7 +59,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(MixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
6 changes: 3 additions & 3 deletions examples/nlp/language_modeling/megatron_retro_pretraining.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,10 +39,10 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True if megatron_amp_o2 else False,
no_ddp_communication_hook=True if megatron_amp_O2 else False,
gradient_as_bucket_view=cfg.model.gradient_as_bucket_view,
find_unused_parameters=False,
)
Expand All @@ -58,7 +58,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(MixedPrecisionPlugin(plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True, # we don't use DDP for async grad allreduce
Expand All @@ -57,7 +57,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
4 changes: 2 additions & 2 deletions examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True,
Expand All @@ -90,7 +90,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(MixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -151,7 +151,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True,
Expand All @@ -170,7 +170,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
with_distributed_adam = cfg.model.optim.get('name') == 'distributed_fused_adam'

plugins = []
Expand All @@ -88,7 +88,7 @@ def main(cfg) -> None:
else:
plugin_precision = 'bf16-mixed'

if megatron_amp_o2 and not with_distributed_adam:
if megatron_amp_O2 and not with_distributed_adam:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
with_distributed_adam = cfg.model.optim.get('name') == 'distributed_fused_adam'

plugins = []
Expand All @@ -87,7 +87,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2 and not with_distributed_adam:
if megatron_amp_O2 and not with_distributed_adam:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
4 changes: 2 additions & 2 deletions examples/nlp/language_modeling/tuning/megatron_gpt_sft.py
Original file line number Diff line number Diff line change
Expand Up @@ -165,7 +165,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
with_distributed_adam = cfg.model.optim.get('name', 'fused_adam') == 'distributed_fused_adam'
plugins = []
strategy = NLPDDPStrategy(
Expand All @@ -185,7 +185,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2 and not with_distributed_adam:
if megatron_amp_O2 and not with_distributed_adam:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
with_distributed_adam = cfg.model.optim.get('name') == 'distributed_fused_adam'

plugins = []
Expand All @@ -87,7 +87,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2 and not with_distributed_adam:
if megatron_amp_O2 and not with_distributed_adam:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
with_distributed_adam = cfg.model.optim.get('name') == 'distributed_fused_adam'

plugins = []
Expand All @@ -88,7 +88,7 @@ def main(cfg) -> None:
else:
plugin_precision = 'bf16-mixed'

if megatron_amp_o2 and not with_distributed_adam:
if megatron_amp_O2 and not with_distributed_adam:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
with_distributed_adam = cfg.model.optim.get('name') == 'distributed_fused_adam'

plugins = []
Expand All @@ -87,7 +87,7 @@ def main(cfg) -> None:
plugin_precision = '16-mixed'
else:
plugin_precision = 'bf16-mixed'
if megatron_amp_o2 and not with_distributed_adam:
if megatron_amp_O2 and not with_distributed_adam:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
4 changes: 2 additions & 2 deletions examples/nlp/machine_translation/megatron_nmt_training.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ def main(cfg) -> None:
logging.info("\n\n************** Experiment configuration ***********")
logging.info(f'\n{OmegaConf.to_yaml(cfg)}')

megatron_amp_o2 = cfg.model.get('megatron_amp_O2', False)
megatron_amp_O2 = cfg.model.get('megatron_amp_O2', False)
plugins = []
strategy = NLPDDPStrategy(
no_ddp_communication_hook=True,
Expand All @@ -62,7 +62,7 @@ def main(cfg) -> None:
else:
plugin_precision = 'bf16-mixed'

if megatron_amp_o2:
if megatron_amp_O2:
plugins.append(MegatronHalfPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
else:
plugins.append(PipelineMixedPrecisionPlugin(precision=plugin_precision, device='cuda', scaler=scaler))
Expand Down
7 changes: 5 additions & 2 deletions nemo/collections/common/callbacks/ema.py
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,7 @@ def __init__(
def all_parameters(self) -> Iterable[torch.Tensor]:
return (param for group in self.param_groups for param in group['params'])

def step(self, closure=None, **kwargs):
def step(self, closure=None, grad_scaler=None, **kwargs):
self.join()

if self.first_iteration:
Expand All @@ -244,7 +244,10 @@ def step(self, closure=None, **kwargs):
)
self.rebuild_ema_params = False

loss = self.optimizer.step(closure)
if getattr(self.optimizer, "_step_supports_amp_scaling", False) and grad_scaler is not None:
loss = self.optimizer.step(closure=closure, grad_scaler=grad_scaler)
else:
loss = self.optimizer.step(closure)

if self._should_update_at_step():
self.update()
Expand Down

0 comments on commit 8770689

Please sign in to comment.