# Prompt Learning with NeMo

In this example, we use NeMo's [prompt learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html)
feature to show how to adapt a large language model (LLM) to
a downstream task, such as financial sentiment prediction.

The prompt learning technique shown in the example is [p-tuning](https://arxiv.org/abs/2103.10385), which adds a small prompt encoder network to the LLM
to produce virtual token embeddings that guide the model toward the desired output of the downstream task.
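
To make the idea concrete, here is a toy sketch in plain Python that uses neither NeMo nor PyTorch. It assumes a tiny prompt encoder made of one trainable seed vector per virtual token and a single linear layer with a `tanh`, and it simply prepends the resulting virtual token embeddings to the text embeddings. All names (`make_prompt_encoder`, `encode_virtual_tokens`, `prepend_virtual_tokens`) are illustrative and not part of NeMo; the real encoder architecture and the positions of the virtual tokens are set by the NeMo configuration.

```python
import math
import random


def make_prompt_encoder(num_virtual_tokens, hidden_size, seed=0):
    """Create toy trainable parameters: one seed vector per virtual token plus a linear layer."""
    rng = random.Random(seed)
    seeds = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(num_virtual_tokens)]
    weights = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]
    return {"seeds": seeds, "weights": weights}


def encode_virtual_tokens(encoder):
    """Pass each seed vector through the linear layer and a tanh to get virtual token embeddings."""
    embeddings = []
    for seed_vec in encoder["seeds"]:
        row = [math.tanh(sum(w * x for w, x in zip(w_row, seed_vec))) for w_row in encoder["weights"]]
        embeddings.append(row)
    return embeddings


def prepend_virtual_tokens(virtual_embeddings, input_embeddings):
    """Place the virtual token embeddings in front of the (frozen) text embeddings."""
    return virtual_embeddings + input_embeddings


encoder = make_prompt_encoder(num_virtual_tokens=3, hidden_size=4)
text_embeddings = [[0.0] * 4 for _ in range(5)]  # stand-in for 5 embedded text tokens
model_input = prepend_virtual_tokens(encode_virtual_tokens(encoder), text_embeddings)
print(len(model_input), len(model_input[0]))  # 8 4
```

In this picture, only the encoder parameters would be trained while the LLM weights stay frozen, which is what keeps the number of trainable parameters small.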

For more details on how to change hyperparameters for prompt learning in NeMo, see this [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Multitask_Prompt_and_PTuning.ipynb), which is also the basis for this NVFlare tutorial.

## Dependencies
We assume you followed the instructions [here](../../README.md#requirements) 
to install the NeMo framework and the NeMo-NVFlare package. 

## Download the pre-trained LLM
In this example, we use a `MegatronGPTModel`, a transformer-based language model based on the GPT architecture.

In [1]:
# Check what GPT .nemo models we have available on NGC
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
MegatronGPTModel.list_available_models()

[NeMo W 2023-05-03 21:53:35 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-03 21:53:35 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-03 21:53:37 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.


[PretrainedModelInfo(
 	pretrained_model_name=megatron_gpt_345m,
 	description=345M parameter GPT generative Megatron model.,
 	location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/megatron_gpt_345m.nemo
 )]

In [2]:
# Download the model from NGC
import os
model_file = "megatron_gpt_345m.nemo"
if not os.path.isfile(model_file):
    !wget "https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/$model_file"
else:
    print(f"{model_file} already downloaded.")

megatron_gpt_345m.nemo already downloaded.
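
As an optional sanity check, a `.nemo` file is a tar archive, so its contents can be listed with Python's standard `tarfile` module. The helper name `list_nemo_archive` is our own, and the sketch makes no assumption about which files the archive contains.

```python
import os
import tarfile


def list_nemo_archive(path, limit=10):
    """Return up to `limit` member names from a .nemo archive (a tar file)."""
    # "r:*" lets tarfile detect compression (or the lack of it) automatically.
    with tarfile.open(path, "r:*") as archive:
        return archive.getnames()[:limit]


model_file = "megatron_gpt_345m.nemo"
if os.path.isfile(model_file):
    print(list_nemo_archive(model_file))
else:
    print(f"{model_file} not found; download it first.")
```

If the archive cannot be opened, the download was probably interrupted and should be repeated.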


## Data preprocessing
As our downstream task, we will use the [Financial PhraseBank dataset](https://huggingface.co/datasets/financial_phrasebank) for sentiment analysis.

The Financial PhraseBank dataset contains the sentiments for financial news headlines from a retail investor's perspective. Further details about the dataset can be found in Malo et al.'s ["Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts"](https://asistdl.onlinelibrary.wiley.com/doi/abs/10.1002/asi.23062).


#### 1. Download the preprocessing scripts
We use the preprocessing scripts provided by NeMo, which can be downloaded from GitHub.

In [3]:
script_name = "prompt_learning_financial_phrase_bank_preprocessing.py"
if not os.path.isfile(script_name):
    !wget -N "https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/dataset_processing/nlp/financial_phrase_bank/$script_name"
else:
    print(f"{script_name} already downloaded.")

prompt_learning_financial_phrase_bank_preprocessing.py already downloaded.


#### 2. Download the Financial PhraseBank Dataset

Download the `FinancialPhraseBank-v1.0.zip` dataset from [here](https://www.researchgate.net/profile/Pekka_Malo/publication/251231364_FinancialPhraseBank-v1.0/data/0c96051eee4fb1d56e000000/FinancialPhraseBank-v1.0.zip).

Then extract it under `./data`.
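
The extraction step can also be done from Python with the standard `zipfile` module. This is a minimal sketch that assumes the downloaded zip file was saved in the current directory under its original name; `extract_dataset` is a helper name introduced here for illustration.

```python
import os
import zipfile


def extract_dataset(zip_path, out_dir="data"):
    """Extract the archive into `out_dir` and return the entries now present there."""
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return sorted(os.listdir(out_dir))


zip_path = "FinancialPhraseBank-v1.0.zip"  # assumed download location
if os.path.isfile(zip_path):
    print(extract_dataset(zip_path))
else:
    print(f"{zip_path} not found; download it first.")
```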

#### 3. Preprocess the dataset

In [4]:
!python3 prompt_learning_financial_phrase_bank_preprocessing.py

Saving train split to data/FinancialPhraseBank-v1.0/financial_phrase_bank_train.jsonl
100%|███████████████████████████████████| 1811/1811 [00:00<00:00, 112901.27it/s]
Saving val split to data/FinancialPhraseBank-v1.0/financial_phrase_bank_val.jsonl
100%|█████████████████████████████████████| 226/226 [00:00<00:00, 113468.12it/s]
Saving test split to data/FinancialPhraseBank-v1.0/financial_phrase_bank_test.jsonl
100%|█████████████████████████████████████| 227/227 [00:00<00:00, 122457.49it/s]
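
To see what the preprocessing produced, you can peek at the first few records of a split. The sketch below only assumes that each non-empty line of the `.jsonl` file is one JSON object; it does not assume particular field names, and `peek_jsonl` is a helper introduced here.

```python
import json
import os


def peek_jsonl(path, n=2):
    """Return the first `n` JSON records from a JSON Lines file."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
            if len(records) >= n:
                break
    return records


train_file = "data/FinancialPhraseBank-v1.0/financial_phrase_bank_train.jsonl"
if os.path.isfile(train_file):
    for record in peek_jsonl(train_file):
        print(record)
else:
    print(f"{train_file} not found; run the preprocessing script first.")
```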


#### 4. Split the dataset to simulate clients
Next, we use three clients to simulate federated learning for p-tuning with NeMo.

In [5]:
!python3 data/split_financial_phrase_data.py --data_path data/FinancialPhraseBank-v1.0/financial_phrase_bank_train.jsonl --num_clients 3 --out_dir data/FinancialPhraseBank-v1.0_split

Loaded training data with 1811 entries
Save split 1 of 3 with 604 entries to data/FinancialPhraseBank-v1.0_split/site-1.jsonl
Save split 2 of 3 with 604 entries to data/FinancialPhraseBank-v1.0_split/site-2.jsonl
Save split 3 of 3 with 603 entries to data/FinancialPhraseBank-v1.0_split/site-3.jsonl
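
The split sizes above are what an even partition of 1811 entries into three parts gives. The following sketch reproduces only those sizes; it is not the code of `split_financial_phrase_data.py`, which may shuffle or order the data differently, and `split_evenly` is a name chosen for illustration.

```python
def split_evenly(entries, num_clients):
    """Split `entries` into `num_clients` contiguous parts whose sizes differ by at most one."""
    base, remainder = divmod(len(entries), num_clients)
    splits, start = [], 0
    for i in range(num_clients):
        # The first `remainder` clients each receive one extra entry.
        size = base + (1 if i < remainder else 0)
        splits.append(entries[start:start + size])
        start += size
    return splits


sizes = [len(part) for part in split_evenly(list(range(1811)), 3)]
print(sizes)  # [604, 604, 603]
```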


## Federated learning simulations
Next, we use NVFlare's [simulator](https://nvflare.readthedocs.io/en/latest/user_guide/fl_simulator.html) to simulate two scenarios: each client training locally on its own dataset, and all three clients training together using the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm implemented in NVFlare.

This setting requires a GPU with at least 16 GB of memory to run all three clients in parallel on one GPU.
If you have multiple GPUs in your system, you can use the `gpu` argument to assign clients to specific GPUs, e.g., `gpu="0,1"`.
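
FedAvg combines the client updates into a global model by averaging them, weighted by each client's number of training samples. The sketch below shows that arithmetic in plain Python. It is not NVFlare's aggregator, and the parameter name `prompt_encoder.weight` and its values are made up for illustration; only the sample counts match the three splits created earlier.

```python
def fedavg(client_updates):
    """Average parameters across clients, weighted by sample count.

    `client_updates` is a list of (num_samples, {param_name: [floats]}) pairs.
    """
    total = sum(num_samples for num_samples, _ in client_updates)
    averaged = {}
    for num_samples, params in client_updates:
        for name, values in params.items():
            acc = averaged.setdefault(name, [0.0] * len(values))
            for i, value in enumerate(values):
                acc[i] += value * num_samples / total
    return averaged


# Hypothetical updates from three clients holding 604, 604, and 603 samples.
updates = [
    (604, {"prompt_encoder.weight": [1.0, 2.0]}),
    (604, {"prompt_encoder.weight": [3.0, 4.0]}),
    (603, {"prompt_encoder.weight": [5.0, 6.0]}),
]
global_params = fedavg(updates)
print(global_params)
```

Because the splits are almost equal in size, the result is very close to a plain unweighted mean of the three updates.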

#### 1. Local P-Tuning
First, we modify the configuration files to include the current directory path to access the dataset and pre-trained LLM.

In [1]:
!python3 modify_configs.py --job_folder "jobs/gpt_p-tuning_local"

Set ROOT_DIR to /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning
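
The internals of `modify_configs.py` are not shown here. One plausible way to perform this kind of update, assuming a JSON configuration file with a top-level `ROOT_DIR` key (a hypothetical layout that may differ from the actual job configs), is sketched below; `set_root_dir` is a name introduced for illustration.

```python
import json
import os


def set_root_dir(config_path, root_dir=None):
    """Write `root_dir` (default: current directory) into a JSON config under "ROOT_DIR"."""
    root_dir = root_dir or os.getcwd()
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    config["ROOT_DIR"] = root_dir  # hypothetical key, assumed for this sketch
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return root_dir
```

Storing an absolute path this way lets every simulated client locate the dataset and the pre-trained model regardless of its own working directory.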


Next, simulate each client p-tuning on its local dataset using the FL simulator. To do this, we run only one round of FL, with each client running 50 p-tuning epochs on its local dataset.
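
The simulator logs for this run report `global_batch_size 8`, `micro_batch_size 4`, and a single data-parallel rank. As a rough sketch of what these numbers imply, the arithmetic below derives the number of micro-batches per optimizer step and an approximate step count for a client with 604 entries. The step count assumes that an incomplete final batch is dropped, which is an assumption and not something the logs confirm.

```python
def num_micro_batches(global_batch_size, micro_batch_size, data_parallel_size=1):
    """Micro-batches accumulated per optimizer step on each data-parallel rank."""
    per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % per_pass != 0:
        raise ValueError("global_batch_size must be divisible by micro_batch_size * data_parallel_size")
    return global_batch_size // per_pass


print(num_micro_batches(8, 4))  # 2

entries, epochs, global_batch_size = 604, 50, 8
steps_per_epoch = entries // global_batch_size  # assumes the last partial batch is dropped
print(steps_per_epoch * epochs)  # 3750
```

The first result agrees with the `setting number of micro-batches to constant 2` message in the log output.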

In [None]:
from nvflare import SimulatorRunner

simulator = SimulatorRunner(
    job_folder="jobs/gpt_p-tuning_local",
    workspace="/tmp/nvflare/nemo/gpt_p-tuning_local",
    n_clients=3,
    threads=3
)
run_status = simulator.run()
print("Simulator finished with run_status", run_status)

2023-05-03 21:53:41,258 - SimulatorRunner - INFO - Create the Simulator Server.
2023-05-03 21:53:41,263 - Cell - INFO - server: creating listener on tcp://0:59755
2023-05-03 21:53:41,264 - Cell - INFO - server: created backbone external listener for tcp://0:59755
2023-05-03 21:53:41,265 - ConnectorManager - INFO - 25901: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2023-05-03 21:53:41,267 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:2793] is starting
2023-05-03 21:53:41,769 - Cell - INFO - server: created backbone internal listener for tcp://localhost:2793
2023-05-03 21:53:41,771 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 PASSIVE tcp://0:59755] is starting
2023-05-03 21:53:41,969 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server localhost on Port 44111
2023-05-03 21:53:41,970 - SimulatorRunner - INFO - Deploy the Apps.
2023-05-03 21:53:41,980 - SimulatorRunner - INFO - Create the simulate 

[NeMo W 2023-05-03 21:53:59 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-03 21:53:59 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-03 21:53:59 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-05-03 21:53:59 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully

NEMO version 1.17.0
2023-05-03 21:54:00,880 - PromptLearner - INFO - [identity=site-3, run=simulate_job]: Initializing the Learner...
2023-05-03 21:54:00,880 - PromptLearner - INFO - [identity=site-3, run=simulate_job]: Using `MASTER_ADDR`: localhost and `MASTER_PORT`: None
2023-05-03 21:54:00,931 - PromptLearner - INFO - [identity=site-3, run=simulate_job]: Load model configuration from /tmp/nvflare/nemo/gpt_p-tuning_local/simulate_job/app_site-3/config/megatron_gpt_prompt_learning_config.yaml
2023-05-03 21:54:00,938 - PromptLearner - INFO - [identity=site-3, run=simulate_job]: Training with global_batch_size 8 and micro_batch_size 4
2023-05-03 21:54:00,942 - pytorch_lightning.utilities.rank_zero - INFO - Using 16bit None Automatic Mixed Precision (AMP)
2023-05-03 21:54:00,977 - pytorch_lightning.utilities.rank_zero - INFO - GPU available: True (cuda), used: True
2023-05-03 21:54:00,978 - pytorch_lightning.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
2023-05-0

I0503 21:54:00.880515 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job]: Initializing the Learner...
I0503 21:54:00.880946 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job]: Using `MASTER_ADDR`: localhost and `MASTER_PORT`: None
I0503 21:54:00.931094 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job]: Load model configuration from /tmp/nvflare/nemo/gpt_p-tuning_local/simulate_job/app_site-3/config/megatron_gpt_prompt_learning_config.yaml
I0503 21:54:00.938791 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job]: Training with global_batch_size 8 and micro_batch_size 4
I0503 21:54:00.942872 140649257482048 accelerator_connector.py:758] Using 16bit None Automatic Mixed Precision (AMP)
I0503 21:54:00.977674 140649257482048 setup.py:163] GPU available: True (cuda), used: True
I0503 21:54:00.978206 140649257482048 setup.py:166] TPU available: False, using: 0 TPU cores
I0503 21:54:00.978397 14064925748204

2023-05-03 21:54:01,096 - PromptLearner - INFO - [identity=site-1, run=simulate_job]: Load model configuration from /tmp/nvflare/nemo/gpt_p-tuning_local/simulate_job/app_site-1/config/megatron_gpt_prompt_learning_config.yaml
2023-05-03 21:54:01,104 - PromptLearner - INFO - [identity=site-1, run=simulate_job]: Training with global_batch_size 8 and micro_batch_size 4
NEMO version 1.17.0
2023-05-03 21:54:01,111 - PromptLearner - INFO - [identity=site-2, run=simulate_job]: Initializing the Learner...
2023-05-03 21:54:01,112 - PromptLearner - INFO - [identity=site-2, run=simulate_job]: Using `MASTER_ADDR`: localhost and `MASTER_PORT`: None
2023-05-03 21:54:01,165 - PromptLearner - INFO - [identity=site-2, run=simulate_job]: Load model configuration from /tmp/nvflare/nemo/gpt_p-tuning_local/simulate_job/app_site-2/config/megatron_gpt_prompt_learning_config.yaml
2023-05-03 21:54:01,173 - PromptLearner - INFO - [identity=site-2, run=simulate_job]: Training with global_batch_size 8 and micro_ba

I0503 21:54:01.096453 140522389899072 fl_component.py:134] [identity=site-1, run=simulate_job]: Load model configuration from /tmp/nvflare/nemo/gpt_p-tuning_local/simulate_job/app_site-1/config/megatron_gpt_prompt_learning_config.yaml
I0503 21:54:01.104016 140522389899072 fl_component.py:134] [identity=site-1, run=simulate_job]: Training with global_batch_size 8 and micro_batch_size 4
I0503 21:54:01.111409 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job]: Initializing the Learner...
I0503 21:54:01.112121 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job]: Using `MASTER_ADDR`: localhost and `MASTER_PORT`: None
I0503 21:54:01.165990 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job]: Load model configuration from /tmp/nvflare/nemo/gpt_p-tuning_local/simulate_job/app_site-2/config/megatron_gpt_prompt_learning_config.yaml
I0503 21:54:01.173627 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job]: Traini

2023-05-03 21:54:01,309 - pytorch_lightning.utilities.rank_zero - INFO - GPU available: True (cuda), used: True
2023-05-03 21:54:01,310 - pytorch_lightning.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
2023-05-03 21:54:01,310 - pytorch_lightning.utilities.rank_zero - INFO - IPU available: False, using: 0 IPUs
2023-05-03 21:54:01,310 - pytorch_lightning.utilities.rank_zero - INFO - HPU available: False, using: 0 HPUs
2023-05-03 21:54:01,311 - pytorch_lightning.utilities.rank_zero - INFO - GPU available: True (cuda), used: True
2023-05-03 21:54:01,311 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
2023-05-03 21:54:01,311 - pytorch_lightning.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
2023-05-03 21:54:01,312 - pytorch_lightning.utilities.rank_zero - INFO - IPU available: False, using: 0 IPUs
2023-05-03 21:54:01,312 - pytorch_lightni

I0503 21:54:01.309562 140522389899072 setup.py:163] GPU available: True (cuda), used: True
I0503 21:54:01.310195 140522389899072 setup.py:166] TPU available: False, using: 0 TPU cores
I0503 21:54:01.310412 140522389899072 setup.py:169] IPU available: False, using: 0 IPUs
I0503 21:54:01.310610 140522389899072 setup.py:172] HPU available: False, using: 0 HPUs
I0503 21:54:01.311291 139755592410944 setup.py:163] GPU available: True (cuda), used: True
I0503 21:54:01.311563 140522389899072 setup.py:121] `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
I0503 21:54:01.311942 139755592410944 setup.py:166] TPU available: False, using: 0 TPU cores
I0503 21:54:01.312154 139755592410944 setup.py:169] IPU available: False, using: 0 IPUs
I0503 21:54:01.312330 139755592410944 setup.py:172] HPU available: False, using: 0 HPUs
I0503 21:54:01.313292 139755592410944 setup.py:121] `Trainer(val_check_interval=1.0)` was configured so validation will r

[NeMo I 2023-05-03 21:54:01 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-05-03 21:54:01 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:01 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-05-03 21:54:01 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-05-03 21:54:01 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:01 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-05-03 21:54:01 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:01 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-05-03 21:54:01 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-05-03 21:54:01 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-05-03 21:54:01 megatron_init:285] All pipeline model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:01 megatron_init:286]

23-05-03 21:54:01 - PID:26006 - rank:(0, 0, 0, 0) - microbatches.py:39 - INFO - setting number of micro-batches to constant 2


[NeMo I 2023-05-03 21:54:02 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-05-03 21:54:02 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:02 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-05-03 21:54:02 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-05-03 21:54:02 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:02 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-05-03 21:54:02 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:02 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-05-03 21:54:02 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-05-03 21:54:02 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-05-03 21:54:02 megatron_init:285] All pipeline model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:02 megatron_init:286]

23-05-03 21:54:02 - PID:26002 - rank:(0, 0, 0, 0) - microbatches.py:39 - INFO - setting number of micro-batches to constant 2
23-05-03 21:54:02 - PID:26005 - rank:(0, 0, 0, 0) - microbatches.py:39 - INFO - setting number of micro-batches to constant 2


[NeMo I 2023-05-03 21:54:47 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-05-03 21:54:47 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:47 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-05-03 21:54:47 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-05-03 21:54:47 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:47 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-05-03 21:54:47 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:47 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-05-03 21:54:47 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-05-03 21:54:47 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-05-03 21:54:47 megatron_init:285] All pipeline model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:47 megatron_init:286]

[NeMo W 2023-05-03 21:54:47 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
[NeMo W 2023-05-03 21:54:51 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.


[NeMo I 2023-05-03 21:54:51 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-05-03 21:54:51 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-05-03 21:54:51 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:51 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-05-03 21:54:51 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-05-03 21:54:51 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:51 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-05-03 21:54:51 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:51 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-05-03 21:54:51 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-05-03 21:54:51 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-05

Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-05-03 21:54:52 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-05-03 21:54:53 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-05-03 21:54:53 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:53 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-05-03 21:54:53 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-05-03 21:54:53 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:53 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-05-03 21:54:53 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:54:53 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-05-03 21:54:53 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-05-03 21:54:53 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-05

[NeMo W 2023-05-03 21:54:53 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-05-03 21:54:54 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-05-03 21:54:55 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-05-03 21:54:55 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-05-03 21:54:55 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.
[NeMo I 2023-05-03 21:54:55 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.


Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-05-03 21:54:56 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-05-03 21:54:56 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.


Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-05-03 21:55:47 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-05-03 21:55:47 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:55:47 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-05-03 21:55:47 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-05-03 21:55:47 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:55:47 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-05-03 21:55:47 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:55:47 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-05-03 21:55:47 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-05-03 21:55:47 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-05-03 21:55:47 megatron_init:285] All pipeline model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:55:47 megatron_init:286]

[NeMo W 2023-05-03 21:55:47 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
[NeMo W 2023-05-03 21:55:47 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-05-03 21:55:48 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-05-03 21:55:48 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-05-03 21:55:50 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-05-03 21:55:50 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:55:50 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-05-03 21:55:50 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-05-03 21:55:50 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:55:50 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-05-03 21:55:50 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-05-03 21:55:50 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-05-03 21:55:50 megatron_init:267] Rank 0 has pipe

[NeMo W 2023-05-03 21:55:50 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.


2023-05-03 21:55:50,572 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-3, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: assigned task to client site-3: name=train, id=372f6920-d35f-4934-bae0-efd9db3fbb7f
2023-05-03 21:55:50,574 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-3, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: sent task assignment to client. client_name:site-3 task_id:372f6920-d35f-4934-bae0-efd9db3fbb7f
2023-05-03 21:55:50,575 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-1, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: assigned task to client site-1: name=train, id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff
2023-05-03 21:55:50,630 - GetTaskCommand - INFO - return task to client.  client_name

Using pad_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
Using mask_token, but it is not set yet.
I0503 21:55:50.563429 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
I0503 21:55:50.564302 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job]: client runner started
I0503 21:55:50.564096 140522389899072 fl_component.py:134] [identity=site-1, run=simulate_job]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
I0503 21:55:50.564533 140649257482048 simulator_worker.py:85] Initialize ClientRunner for client: site-3
I0503 2

2023-05-03 21:55:50,713 - GetTaskCommand - INFO - return task to client.  client_name: site-1  task_name: train   task_id: c9c177bd-ced9-4dc3-a7c0-39683ab84cff  sharable_header_task_id: c9c177bd-ced9-4dc3-a7c0-39683ab84cff
2023-05-03 21:55:50,709 - Communicator - INFO - Received from simulator_server server  (16873478 Bytes). getTask: train time: 0.11954808235168457 seconds
2023-05-03 21:55:50,710 - FederatedClient - INFO - pull_task completed. Task name:train Status:True 
2023-05-03 21:55:50,711 - ClientRunner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task assignment: name=train, id=372f6920-d35f-4934-bae0-efd9db3fbb7f
2023-05-03 21:55:50,711 - ClientRunner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: invoking task executor <class 'nvflare.app_common.executors.learner_executor.LearnerExecutor'>
2023-05-03 21:55:50,712 - Lea

I0503 21:55:50.709390 140635516434176 communicator.py:200] Received from simulator_server server  (16873478 Bytes). getTask: train time: 0.11954808235168457 seconds
I0503 21:55:50.710824 140649257482048 fed_client.py:91] pull_task completed. Task name:train Status:True 
I0503 21:55:50.711157 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task assignment: name=train, id=372f6920-d35f-4934-bae0-efd9db3fbb7f
I0503 21:55:50.711871 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: invoking task executor <class 'nvflare.app_common.executors.learner_executor.LearnerExecutor'>
I0503 21:55:50.712136 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Client tr

[NeMo I 2023-05-03 21:55:51 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 
[NeMo I 2023-05-03 21:55:51 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 
[NeMo I 2023-05-03 21:55:51 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-05-03 21:55:52 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.
2023-05-03 21:55:52,065 - PromptLearner - INFO - [identity=site-2, run=simulate_job]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
2023-05-03 21:55:52,066 - ClientRunner - INFO - [identity=site-2, run=simulate_job]: client runner started
2023-05-03 21:55:52,066 - ClientTaskWorker - INFO - Initialize ClientRunner for client: site-2
2023

70it [00:00, 687.54it/s]Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
62it [00:00, 615.65it/s]I0503 21:55:52.065337 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
I0503 21:55:52.066189 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job]: client runner started
I0503 21:55:52.066437 139755592410944 simulator_worker.py:85] Initialize ClientRunner for client: site-2


2023-05-03 21:55:52,128 - GetTaskCommand - INFO - return task to client.  client_name: site-2  task_name: train   task_id: f5aa2c14-c507-4451-8d91-362c9d433102  sharable_header_task_id: f5aa2c14-c507-4451-8d91-362c9d433102


218it [00:00, 726.46it/s]I0503 21:55:52.204099 139737599833856 communicator.py:200] Received from simulator_server server  (16873478 Bytes). getTask: train time: 0.1115117073059082 seconds
I0503 21:55:52.206012 139755592410944 fed_client.py:91] pull_task completed. Task name:train Status:True 
I0503 21:55:52.206760 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task assignment: name=train, id=f5aa2c14-c507-4451-8d91-362c9d433102
I0503 21:55:52.207579 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: invoking task executor <class 'nvflare.app_common.executors.learner_executor.LearnerExecutor'>
I0503 21:55:52.207908 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-

2023-05-03 21:55:52,204 - Communicator - INFO - Received from simulator_server server  (16873478 Bytes). getTask: train time: 0.1115117073059082 seconds
2023-05-03 21:55:52,206 - FederatedClient - INFO - pull_task completed. Task name:train Status:True 
2023-05-03 21:55:52,206 - ClientRunner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task assignment: name=train, id=f5aa2c14-c507-4451-8d91-362c9d433102
2023-05-03 21:55:52,207 - ClientRunner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: invoking task executor <class 'nvflare.app_common.executors.learner_executor.LearnerExecutor'>
2023-05-03 21:55:52,207 - LearnerExecutor - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: Client trainer got task: train
2023-05-03 21:55:52,214 -

603it [00:00, 804.07it/s]
604it [00:00, 796.94it/s]
85it [00:00, 847.39it/s]

[NeMo I 2023-05-03 21:55:52 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
[NeMo I 2023-05-03 21:55:52 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 
[NeMo I 2023-05-03 21:55:52 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
[NeMo I 2023-05-03 21:55:52 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 


226it [00:00, 859.33it/s]
I0503 21:55:52.904081 140649257482048 cuda.py:58] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
226it [00:00, 878.10it/s]
I0503 21:55:52.989405 140522389899072 cuda.py:58] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    


[NeMo I 2023-05-03 21:55:52 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
2023-05-03 21:55:52,904 - pytorch_lightning.accelerators.cuda - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo I 2023-05-03 21:55:52 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
2023-05-03 21:55:52,989 - pytorch_lightning.accelerators.cuda - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Validation: 0it [00:00, ?it/s]

    


Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
[NeMo I 2023-05-03 21:55:53 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 


79it [00:00, 434.33it/s]

Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]

604it [00:00, 742.96it/s]
90it [00:00, 887.41it/s]

[NeMo I 2023-05-03 21:55:54 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
[NeMo I 2023-05-03 21:55:54 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 


226it [00:00, 907.89it/s]
I0503 21:55:55.066386 139755592410944 cuda.py:58] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    


[NeMo I 2023-05-03 21:55:55 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
2023-05-03 21:55:55,066 - pytorch_lightning.accelerators.cuda - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Validation DataLoader 0: 100%|██████████| 29/29 [00:06<00:00,  4.22it/s]
2023-05-03 21:56:00,705 - root - INFO - global_model_val_loss: 9.124295234680176
Validation DataLoader 0: 100%|██████████| 29/29 [00:06<00:00,  4.21it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│   global_model_val_loss   │     9.124295234680176     │
└───────────────────────────┴───────────────────────────┘


I0503 21:56:00.705590 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] global_model_val_loss: 9.124295234680176


Validation DataLoader 0:  62%|██████▏   | 18/29 [00:05<00:03,  3.32it/s]
2023-05-03 21:56:01,259 - PromptLearner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Global_model global_model_val_loss: 9.124295234680176
2023-05-03 21:56:01,259 - PromptLearner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Current/Total Round: 1/1
2023-05-03 21:56:01,260 - PromptLearner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Client identity: site-3
2023-05-03 21:56:01,268 - PromptLearner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Loaded 7 of 7 weights
2023-05-03 21:56:01,269

I0503 21:56:01.259375 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Global_model global_model_val_loss: 9.124295234680176
I0503 21:56:01.259870 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Current/Total Round: 1/1
I0503 21:56:01.260066 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Client identity: site-3
I0503 21:56:01.268986 140649257482048 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=372f6920-d35f-4934-bae0-efd9db3fbb7f]: Loaded 7 of 7 weights
I0503 21:56:01.269212 140649257482048 fl_component.py:1

[NeMo I 2023-05-03 21:56:01 nlp_overrides:105] Configuring DDP for model parallelism.
Validation DataLoader 0:  69%|██████▉   | 20/29 [00:05<00:02,  3.54it/s]
[NeMo I 2023-05-03 21:56:01 modelPT:722] Optimizer config = FusedAdam (
    Parameter Group 0
        betas: [0.9, 0.98]
        bias_correction: True
        eps: 1e-08
        lr: 0.0001
        weight_decay: 0.01
    )
[NeMo I 2023-05-03 21:56:01 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7feb31484040>" 
    will be used during training (effective maximum steps = 3750) - 
    Parameters : 
    (warmup_steps: 50
    min_lr: 0.0
    constant_steps: 0
    max_steps: 3750
    )
2023-05-03 21:56:01,507 - pytorch_lightning.callbacks.model_summary - INFO - 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encod

I0503 21:56:01.507802 140649257482048 model_summary.py:83] 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encoder  | PromptEncoder          | 4.2 M 
-----------------------------------------------------------
4.2 M     Trainable params
354 M     Non-trainable params
359 M     Total params
718.178   Total estimated model params size (MB)


Validation DataLoader 0:  83%|████████▎ | 24/29 [00:06<00:01,  3.92it/s]
2023-05-03 21:56:01,964 - PromptLearner - INFO - [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: Global_model global_model_val_loss: 9.124295234680176
2023-05-03 21:56:01,965 - PromptLearner - INFO - [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: Current/Total Round: 1/1
2023-05-03 21:56:01,965 - PromptLearner - INFO - [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: Client identity: site-1
2023-05-03 21:56:01,973 - PromptLearner - INFO - [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: Loaded 7 of 7 weights
2023-05-03 21:56:01,973

I0503 21:56:01.964596 140522389899072 fl_component.py:134] [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: Global_model global_model_val_loss: 9.124295234680176
I0503 21:56:01.965180 140522389899072 fl_component.py:134] [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: Current/Total Round: 1/1
I0503 21:56:01.965396 140522389899072 fl_component.py:134] [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: Client identity: site-1
I0503 21:56:01.973066 140522389899072 fl_component.py:134] [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=c9c177bd-ced9-4dc3-a7c0-39683ab84cff]: Loaded 7 of 7 weights
I0503 21:56:01.973309 140522389899072 fl_component.py:1

Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00,  6.25it/s]
[NeMo I 2023-05-03 21:56:02 nlp_overrides:105] Configuring DDP for model parallelism.
2023-05-03 21:56:02,177 - root - INFO - val_loss: 6.063295364379883
Epoch 0:   0%|          | 0/104 [00:00<?, ?it/s]
[NeMo I 2023-05-03 21:56:02 modelPT:722] Optimizer config = FusedAdam (
    Parameter Group 0
        betas: [0.9, 0.98]
        bias_correction: True
        eps: 1e-08
        lr: 0.0001
        weight_decay: 0.01
    )
[NeMo I 2023-05-03 21:56:02 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7fcda7630580>" 
    will be used during training (effective maximum steps = 3750) - 
    Parameters : 
    (warmup_steps: 50
    min_lr: 0.0
    constant_steps: 0
    max_steps: 3750
    )
2023-05-03 21:56:02,238 - pytorch_lightning.callbacks.model_summary - INFO - 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | 

I0503 21:56:02.177571 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 6.063295364379883
I0503 21:56:02.238523 140522389899072 model_summary.py:83] 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encoder  | PromptEncoder          | 4.2 M 
-----------------------------------------------------------
4.2 M     Trainable params
354 M     Non-trainable params
359 M     Total params
718.178   Total estimated model params size (MB)


Validation DataLoader 0: 100%|██████████| 29/29 [00:06<00:00,  4.36it/s]
2023-05-03 21:56:02,484 - root - INFO - global_model_val_loss: 9.124295234680176
Validation DataLoader 0: 100%|██████████| 29/29 [00:06<00:00,  4.35it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│   global_model_val_loss   │     9.124295234680176     │
└───────────────────────────┴───────────────────────────┘


I0503 21:56:02.484618 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] global_model_val_loss: 9.124295234680176


Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00,  6.29it/s]
2023-05-03 21:56:02,977 - root - INFO - val_loss: 6.063295364379883
2023-05-03 21:56:02,979 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: Global_model global_model_val_loss: 9.124295234680176
2023-05-03 21:56:02,980 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: Current/Total Round: 1/1
2023-05-03 21:56:02,980 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: Client identity: site-2
2023-05-03 21:56:02,987 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simu

I0503 21:56:02.977072 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 6.063295364379883
I0503 21:56:02.979708 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: Global_model global_model_val_loss: 9.124295234680176
I0503 21:56:02.980204 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: Current/Total Round: 1/1
I0503 21:56:02.980409 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa2c14-c507-4451-8d91-362c9d433102]: Client identity: site-2
I0503 21:56:02.987103 139755592410944 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=f5aa

[NeMo I 2023-05-03 21:56:03 modelPT:722] Optimizer config = FusedAdam (
    Parameter Group 0
        betas: [0.9, 0.98]
        bias_correction: True
        eps: 1e-08
        lr: 0.0001
        weight_decay: 0.01
    )
[NeMo I 2023-05-03 21:56:03 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7f1b1eb14700>" 
    will be used during training (effective maximum steps = 3750) - 
    Parameters : 
    (warmup_steps: 50
    min_lr: 0.0
    constant_steps: 0
    max_steps: 3750
    )
2023-05-03 21:56:03,207 - pytorch_lightning.callbacks.model_summary - INFO - 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encoder  | PromptEncoder          | 4.2 M 
-----------------------------------------------------------
4.2 M     Trainable params
354 M     Non-trainable params
35

I0503 21:56:03.207578 139755592410944 model_summary.py:83] 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encoder  | PromptEncoder          | 4.2 M 
-----------------------------------------------------------
4.2 M     Trainable params
354 M     Non-trainable params
359 M     Total params
718.178   Total estimated model params size (MB)


Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:00<00:00,  6.37it/s]
2023-05-03 21:56:03,917 - root - INFO - val_loss: 6.063295364379883
Epoch 0:   0%|          | 0/104 [00:00<?, ?it/s]                           

I0503 21:56:03.917259 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 6.063295364379883
    
    


Epoch 0:   1%|          | 1/104 [00:03<06:20,  3.70s/it, loss=11.8, v_num=0, reduced_train_loss=11.80, global_step=0.000]

    
    


Epoch 0:   5%|▍         | 5/104 [00:04<01:30,  1.09it/s, loss=10.2, v_num=0, reduced_train_loss=9.090, global_step=4.000]

    
    


Epoch 0:  72%|███████▏  | 75/104 [00:20<00:07,  3.69it/s, loss=0.513, v_num=0, reduced_train_loss=0.689, global_step=74.00]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 0:  66%|██████▋   | 69/104 [00:19<00:09,  3.52it/s, loss=0.544, v_num=0, reduced_train_loss=0.660, global_step=68.00]
Epoch 0:  64%|██████▍   | 67/104 [00:18<00:10,  3.55it/s, loss=1.67, v_num=0, reduced_train_loss=1.200, global_step=66.00]
Epoch 0:  67%|██████▋   | 70/104 [00:19<00:09,  3.53it/s, loss=0.53, v_num=0, reduced_train_loss=0.394, global_step=69.00] 
Epoch 0:  68%|██████▊   | 71/104 [00:20<00:09,  3.53it/s, loss=0.509, v_num=0, reduced_train_loss=0.276, global_step=70.00]
Epoch 0:  76%|███████▌  | 79/104 [00:20<00:06,  3.78it/s, loss=0.513, v_num=0, reduced_train_loss=0.689, global_step=74.00]
Epoch 0:  69%|██████▉   | 72/104 [00:20<00:09,  3.54it/s, loss=0.496, v_num=0, reduced_train_loss=0.293, global_step=71.00]
Epoch 0:  67%|██████▋   | 70/104 [00:19<00:0

I0503 21:56:26.156550 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.41569310426712036



Epoch 0:  89%|████████▉ | 93/104 [00:23<00:02,  4.00it/s, loss=0.506, v_num=0, reduced_train_loss=0.608, global_step=74.00]
Epoch 0:  85%|████████▍ | 88/104 [00:22<00:04,  3.94it/s, loss=1.36, v_num=0, reduced_train_loss=1.140, global_step=74.00]
Epoch 0:  90%|█████████ | 94/104 [00:23<00:02,  4.02it/s, loss=0.506, v_num=0, reduced_train_loss=0.608, global_step=74.00]
Epoch 1:   1%|          | 1/104 [00:00<00:28,  3.57it/s, loss=0.503, v_num=0, reduced_train_loss=0.597, global_step=75.00, val_loss=0.416]
Epoch 0:  91%|█████████▏| 95/104 [00:23<00:02,  4.05it/s, loss=0.506, v_num=0, reduced_train_loss=0.608, global_step=74.00]
Epoch 0:  87%|████████▋ | 90/104 [00:22<00:03,  3.99it/s, loss=1.36, v_num=0, reduced_train_loss=1.140, global_step=74.00]
Epoch 0:  92%|█████████▏| 96/104 [00:23<00:01,  4.07it/s, loss=0.506, v_num=0, reduced_train_loss=0.608, global_step=74.00]
Epoch 1:   2%|▏         | 2/104 [00:00<00:26,  3.86it/s, loss=0.507, v_num=0, reduced_train_loss=0.339, global_step=76

I0503 21:56:27.484981 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.3508816659450531



Epoch 1:   6%|▌         | 6/104 [00:01<00:23,  4.15it/s, loss=0.533, v_num=0, reduced_train_loss=0.888, global_step=80.00, val_loss=0.416]
Epoch 1:   1%|          | 1/104 [00:00<00:29,  3.47it/s, loss=0.568, v_num=0, reduced_train_loss=1.880, global_step=75.00, val_loss=0.351]
Epoch 1:   7%|▋         | 7/104 [00:01<00:23,  4.17it/s, loss=0.53, v_num=0, reduced_train_loss=0.828, global_step=81.00, val_loss=0.416] 
Epoch 1:   2%|▏         | 2/104 [00:00<00:26,  3.82it/s, loss=0.57, v_num=0, reduced_train_loss=0.489, global_step=76.00, val_loss=0.351] 
Epoch 1:   8%|▊         | 8/104 [00:01<00:22,  4.20it/s, loss=0.552, v_num=0, reduced_train_loss=0.874, global_step=82.00, val_loss=0.416]
Epoch 0: 100%|██████████| 104/104 [00:24<00:00,  4.30it/s, loss=1.36, v_num=0, reduced_train_loss=1.140, global_step=74.00]
2023-05-03 21:56:28,117 - root - INFO - val_loss: 0.44989368319511414
Epoch 0: 100%|██████████| 104/104 [00:24<00:00,  4.30it/s, loss=1.36, v_num=0, reduced_train_loss=1.140, global

I0503 21:56:28.117239 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.44989368319511414


Epoch 1:  72%|███████▏  | 75/104 [00:16<00:06,  4.53it/s, loss=0.375, v_num=0, reduced_train_loss=0.135, global_step=149.0, val_loss=0.351] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 1:  68%|██████▊   | 71/104 [00:16<00:07,  4.40it/s, loss=0.578, v_num=0, reduced_train_loss=0.648, global_step=145.0, val_loss=0.450]
Epoch 1:  74%|███████▍  | 77/104 [00:16<00:05,  4.57it/s, loss=0.375, v_num=0, reduced_train_loss=0.135, global_step=149.0, val_loss=0.351]
Epoch 1:  72%|███████▏  | 75/104 [00:18<00:07,  4.10it/s, loss=0.429, v_num=0, reduced_train_loss=0.216, global_step=149.0, val_loss=0.416]
Epoch 1:  69%|██████▉   | 72/104 [00:16<00:07,  4.40it/s, loss=0.581, v_num=0, reduced_train_loss=0.578, global_step=146.0, val_loss=0.450]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 1:  76%|███████▌ 

I0503 21:56:47.482541 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.14852215349674225



Epoch 2:   1%|          | 1/104 [00:00<00:27,  3.80it/s, loss=0.37, v_num=0, reduced_train_loss=0.0829, global_step=150.0, val_loss=0.149]
Epoch 1:  92%|█████████▏| 96/104 [00:19<00:01,  4.88it/s, loss=0.542, v_num=0, reduced_train_loss=0.443, global_step=149.0, val_loss=0.450]
Epoch 1:  98%|█████████▊| 102/104 [00:21<00:00,  4.70it/s, loss=0.429, v_num=0, reduced_train_loss=0.216, global_step=149.0, val_loss=0.416]
Epoch 2:   2%|▏         | 2/104 [00:00<00:24,  4.15it/s, loss=0.373, v_num=0, reduced_train_loss=0.214, global_step=151.0, val_loss=0.149]
Epoch 1:  99%|█████████▉| 103/104 [00:21<00:00,  4.72it/s, loss=0.429, v_num=0, reduced_train_loss=0.216, global_step=149.0, val_loss=0.416]
Epoch 1:  94%|█████████▍| 98/104 [00:19<00:01,  4.93it/s, loss=0.542, v_num=0, reduced_train_loss=0.443, global_step=149.0, val_loss=0.450]
Epoch 1: 100%|██████████| 104/104 [00:21<00:00,  4.75it/s, loss=0.429, v_num=0, reduced_train_loss=0.216, global_step=149.0, val_loss=0.416]2023-05-03 21:56

I0503 21:56:48.058345 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.2391786277294159


Epoch 2:   3%|▎         | 3/104 [00:00<00:23,  4.27it/s, loss=0.371, v_num=0, reduced_train_loss=0.302, global_step=152.0, val_loss=0.149]
Epoch 2:   1%|          | 1/104 [00:00<00:28,  3.62it/s, loss=0.431, v_num=0, reduced_train_loss=0.497, global_step=150.0, val_loss=0.239]
Epoch 2:   4%|▍         | 4/104 [00:00<00:23,  4.32it/s, loss=0.388, v_num=0, reduced_train_loss=0.683, global_step=153.0, val_loss=0.149]
Epoch 2:   2%|▏         | 2/104 [00:00<00:25,  3.95it/s, loss=0.436, v_num=0, reduced_train_loss=0.438, global_step=151.0, val_loss=0.239]
Epoch 2:   5%|▍         | 5/104 [00:01<00:22,  4.35it/s, loss=0.371, v_num=0, reduced_train_loss=0.402, global_step=154.0, val_loss=0.149]
Epoch 1: 100%|██████████| 104/104 [00:20<00:00,  5.06it/s, loss=0.542, v_num=0, reduced_train_loss=0.443, global_step=149.0, val_loss=0.450]
2023-05-03 21:56:48,685 - root - INFO - val_loss: 0.4168568253517151
Epoch 1: 100%|██████████| 104/104 [00:20<00:00,  5.06it/s, loss=0.542, v_num=0, reduced_

I0503 21:56:48.685668 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.4168568253517151


Epoch 2:  72%|███████▏  | 75/104 [00:16<00:06,  4.56it/s, loss=0.287, v_num=0, reduced_train_loss=0.546, global_step=224.0, val_loss=0.149]  
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 2:  67%|██████▋   | 70/104 [00:16<00:07,  4.37it/s, loss=0.22, v_num=0, reduced_train_loss=0.0774, global_step=219.0, val_loss=0.239]
Epoch 2:  65%|██████▌   | 68/104 [00:15<00:08,  4.39it/s, loss=0.487, v_num=0, reduced_train_loss=0.503, global_step=217.0, val_loss=0.417]
Epoch 2:  68%|██████▊   | 71/104 [00:16<00:07,  4.38it/s, loss=0.223, v_num=0, reduced_train_loss=0.383, global_step=220.0, val_loss=0.239]
Epoch 2:  66%|██████▋   | 69/104 [00:15<00:07,  4.39it/s, loss=0.492, v_num=0, reduced_train_loss=0.444, global_step=218.0, val_loss=0.417]
Epoch 2:  69%|██████▉   | 72/104 [00:16<00:07,  4.38it/s, loss=0.229, v_num=0, reduced_train_loss=0.267, global_step=221.0, val_loss=0.239]
Epoch 2:  67%|██████▋   | 70/104 [00:15<00:07,  4.39it/s, loss=0.472, 

I0503 21:57:07.361057 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.16108885407447815



Epoch 2:  84%|████████▎ | 87/104 [00:18<00:03,  4.65it/s, loss=0.469, v_num=0, reduced_train_loss=0.506, global_step=224.0, val_loss=0.417]
Epoch 2:  89%|████████▉ | 93/104 [00:19<00:02,  4.79it/s, loss=0.239, v_num=0, reduced_train_loss=0.173, global_step=224.0, val_loss=0.239]
Epoch 2:  85%|████████▍ | 88/104 [00:18<00:03,  4.67it/s, loss=0.469, v_num=0, reduced_train_loss=0.506, global_step=224.0, val_loss=0.417]
Epoch 3:   1%|          | 1/104 [00:00<00:27,  3.71it/s, loss=0.283, v_num=0, reduced_train_loss=0.142, global_step=225.0, val_loss=0.161]
Epoch 2:  86%|████████▌ | 89/104 [00:18<00:03,  4.70it/s, loss=0.469, v_num=0, reduced_train_loss=0.506, global_step=224.0, val_loss=0.417]
Epoch 2:  91%|█████████▏| 95/104 [00:19<00:01,  4.84it/s, loss=0.239, v_num=0, reduced_train_loss=0.173, global_step=224.0, val_loss=0.239]
Epoch 2:  87%|████████▋ | 90/104 [00:19<00:02,  4.72it/s, loss=0.469, v_num=0, reduced_train_loss=0.506, global_step=224.0, val_loss=0.417]
Epoch 3:   2%|▏    

I0503 21:57:08.755060 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10812219977378845


Epoch 3:   7%|▋         | 7/104 [00:01<00:22,  4.39it/s, loss=0.279, v_num=0, reduced_train_loss=0.321, global_step=231.0, val_loss=0.161]
Epoch 3:   1%|          | 1/104 [00:00<00:27,  3.78it/s, loss=0.23, v_num=0, reduced_train_loss=0.130, global_step=225.0, val_loss=0.108]
Epoch 3:   2%|▏         | 2/104 [00:00<00:25,  4.08it/s, loss=0.228, v_num=0, reduced_train_loss=0.221, global_step=226.0, val_loss=0.108]
Epoch 2:  98%|█████████▊| 102/104 [00:20<00:00,  4.96it/s, loss=0.469, v_num=0, reduced_train_loss=0.506, global_step=224.0, val_loss=0.417]
Epoch 3:   9%|▊         | 9/104 [00:02<00:21,  4.44it/s, loss=0.225, v_num=0, reduced_train_loss=0.394, global_step=233.0, val_loss=0.161]
Epoch 2: 100%|██████████| 104/104 [00:20<00:00,  5.01it/s, loss=0.469, v_num=0, reduced_train_loss=0.506, global_step=224.0, val_loss=0.417]
2023-05-03 21:57:09,456 - root - INFO - val_loss: 0.3517796993255615
Epoch 2: 100%|██████████| 104/104 [00:20<00:00,  5.01it/s, loss=0.469, v_num=0, reduced_

I0503 21:57:09.456427 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.3517796993255615


Epoch 3:  72%|███████▏  | 75/104 [00:16<00:06,  4.52it/s, loss=0.222, v_num=0, reduced_train_loss=0.218, global_step=299.0, val_loss=0.161]  
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 3:  66%|██████▋   | 69/104 [00:15<00:07,  4.50it/s, loss=0.184, v_num=0, reduced_train_loss=0.032, global_step=293.0, val_loss=0.108] 
Epoch 3:  62%|██████▏   | 64/104 [00:14<00:09,  4.35it/s, loss=0.543, v_num=0, reduced_train_loss=0.459, global_step=288.0, val_loss=0.352]
Epoch 3:  67%|██████▋   | 70/104 [00:15<00:07,  4.50it/s, loss=0.163, v_num=0, reduced_train_loss=0.143, global_step=294.0, val_loss=0.108]
Epoch 3:  62%|██████▎   | 65/104 [00:14<00:08,  4.35it/s, loss=0.553, v_num=0, reduced_train_loss=0.580, global_step=289.0, val_loss=0.352]
Epoch 3:  68%|██████▊   | 71/104 [00:15<00:07,  4.50it/s, loss=0.166, v_num=0, reduced_train_loss=0.178, global_step=295.0, val_loss=0.108]
Epoch 3:  69%|██████▉   | 72/104 [00:15<00:07,  4.50it/s, loss=0.153,

I0503 21:57:27.432579 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.11162658035755157



Epoch 3:  79%|███████▉  | 82/104 [00:18<00:04,  4.52it/s, loss=0.491, v_num=0, reduced_train_loss=0.454, global_step=299.0, val_loss=0.352]
Epoch 4:   1%|          | 1/104 [00:00<00:27,  3.71it/s, loss=0.219, v_num=0, reduced_train_loss=0.156, global_step=300.0, val_loss=0.112]
Epoch 3:  80%|███████▉  | 83/104 [00:18<00:04,  4.54it/s, loss=0.491, v_num=0, reduced_train_loss=0.454, global_step=299.0, val_loss=0.352]
Epoch 3:  90%|█████████ | 94/104 [00:19<00:02,  4.95it/s, loss=0.178, v_num=0, reduced_train_loss=0.0726, global_step=299.0, val_loss=0.108]
Epoch 3:  81%|████████  | 84/104 [00:18<00:04,  4.57it/s, loss=0.491, v_num=0, reduced_train_loss=0.454, global_step=299.0, val_loss=0.352]
Epoch 4:   2%|▏         | 2/104 [00:00<00:25,  4.03it/s, loss=0.234, v_num=0, reduced_train_loss=0.439, global_step=301.0, val_loss=0.112]
Epoch 3:  82%|████████▏ | 85/104 [00:18<00:04,  4.59it/s, loss=0.491, v_num=0, reduced_train_loss=0.454, global_step=299.0, val_loss=0.352]
Epoch 3:  92%|██

I0503 21:57:28.917787 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10436980426311493


Epoch 4:   7%|▋         | 7/104 [00:01<00:22,  4.34it/s, loss=0.204, v_num=0, reduced_train_loss=0.0288, global_step=306.0, val_loss=0.112]
Epoch 3:  90%|█████████ | 94/104 [00:19<00:02,  4.80it/s, loss=0.491, v_num=0, reduced_train_loss=0.454, global_step=299.0, val_loss=0.352]
Epoch 4:   8%|▊         | 8/104 [00:01<00:21,  4.37it/s, loss=0.204, v_num=0, reduced_train_loss=0.174, global_step=307.0, val_loss=0.112] 
Epoch 4:   2%|▏         | 2/104 [00:00<00:24,  4.18it/s, loss=0.185, v_num=0, reduced_train_loss=0.0523, global_step=301.0, val_loss=0.104]
Epoch 4:   9%|▊         | 9/104 [00:02<00:21,  4.39it/s, loss=0.173, v_num=0, reduced_train_loss=0.043, global_step=308.0, val_loss=0.112]
Epoch 4:   3%|▎         | 3/104 [00:00<00:23,  4.29it/s, loss=0.176, v_num=0, reduced_train_loss=0.125, global_step=302.0, val_loss=0.104] 
Epoch 4:  10%|▉         | 10/104 [00:02<00:21,  4.39it/s, loss=0.171, v_num=0, reduced_train_loss=0.153, global_step=309.0, val_loss=0.112]
Epoch 4:   4%|▍     

I0503 21:57:30.213440 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.3288504481315613


Epoch 4:  72%|███████▏  | 75/104 [00:16<00:06,  4.51it/s, loss=0.236, v_num=0, reduced_train_loss=0.677, global_step=374.0, val_loss=0.112]  
Epoch 4:  59%|█████▊    | 61/104 [00:13<00:09,  4.40it/s, loss=0.528, v_num=0, reduced_train_loss=0.734, global_step=360.0, val_loss=0.329]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 4:  66%|██████▋   | 69/104 [00:15<00:07,  4.52it/s, loss=0.212, v_num=0, reduced_train_loss=0.0749, global_step=368.0, val_loss=0.104]
Epoch 4:  60%|█████▉    | 62/104 [00:14<00:09,  4.40it/s, loss=0.503, v_num=0, reduced_train_loss=0.312, global_step=361.0, val_loss=0.329]
Epoch 4:  67%|██████▋   | 70/104 [00:15<00:07,  4.52it/s, loss=0.201, v_num=0, reduced_train_loss=0.0528, global_step=369.0, val_loss=0.104]
Epoch 4:  61%|██████    | 63/104 [00:14<00:09,  4.40it/s, loss=0.518, v_num=0, reduced_train_loss=0.559, global_step=362.0, val_loss=0.329]
Epoch 4:  68%|██████▊   | 71/104 [00:15<00:07,  4.52it/s, loss=0.199, v_num=0, reduced_train_loss=0.05

I0503 21:57:47.522151 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1115463525056839



Epoch 4:  75%|███████▌  | 78/104 [00:17<00:05,  4.46it/s, loss=0.519, v_num=0, reduced_train_loss=0.449, global_step=374.0, val_loss=0.329]
Epoch 5:   1%|          | 1/104 [00:00<00:27,  3.76it/s, loss=0.239, v_num=0, reduced_train_loss=0.0833, global_step=375.0, val_loss=0.112]
Epoch 4:  76%|███████▌  | 79/104 [00:17<00:05,  4.49it/s, loss=0.519, v_num=0, reduced_train_loss=0.449, global_step=374.0, val_loss=0.329]
Epoch 4:  90%|█████████ | 94/104 [00:18<00:02,  4.97it/s, loss=0.139, v_num=0, reduced_train_loss=0.146, global_step=374.0, val_loss=0.104]
Validation DataLoader 0:  69%|██████▉   | 20/29 [00:02<00:01,  8.17it/s]
Epoch 5:   2%|▏         | 2/104 [00:00<00:24,  4.10it/s, loss=0.231, v_num=0, reduced_train_loss=0.0683, global_step=376.0, val_loss=0.112]
Epoch 4:  92%|█████████▏| 96/104 [00:19<00:01,  5.01it/s, loss=0.139, v_num=0, reduced_train_loss=0.146, global_step=374.0, val_loss=0.104]
Epoch 4:  78%|███████▊  | 81/104 [00:17<00:05,  4.53it/s, loss=0.519, v_num=0, redu

I0503 21:57:48.981627 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07740198075771332


Epoch 5:   7%|▋         | 7/104 [00:01<00:22,  4.37it/s, loss=0.239, v_num=0, reduced_train_loss=0.198, global_step=381.0, val_loss=0.112]
Epoch 5:   1%|          | 1/104 [00:00<00:27,  3.78it/s, loss=0.13, v_num=0, reduced_train_loss=0.0104, global_step=375.0, val_loss=0.0774]
Epoch 5:   8%|▊         | 8/104 [00:01<00:21,  4.40it/s, loss=0.233, v_num=0, reduced_train_loss=0.0201, global_step=382.0, val_loss=0.112]
Epoch 5:   2%|▏         | 2/104 [00:00<00:24,  4.15it/s, loss=0.146, v_num=0, reduced_train_loss=0.461, global_step=376.0, val_loss=0.0774]
Epoch 5:   9%|▊         | 9/104 [00:02<00:21,  4.42it/s, loss=0.222, v_num=0, reduced_train_loss=0.0342, global_step=383.0, val_loss=0.112]
Epoch 5:  10%|▉         | 10/104 [00:02<00:21,  4.44it/s, loss=0.222, v_num=0, reduced_train_loss=0.0342, global_step=383.0, val_loss=0.112]
Epoch 5:   4%|▍         | 4/104 [00:00<00:23,  4.32it/s, loss=0.125, v_num=0, reduced_train_loss=0.0405, global_step=378.0, val_loss=0.0774]
Epoch 5:  11%|█    

I0503 21:57:50.845316 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.24179112911224365


Epoch 5:  72%|███████▏  | 75/104 [00:16<00:06,  4.50it/s, loss=0.156, v_num=0, reduced_train_loss=0.588, global_step=449.0, val_loss=0.112]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 5:  57%|█████▋    | 59/104 [00:13<00:10,  4.39it/s, loss=0.404, v_num=0, reduced_train_loss=0.402, global_step=433.0, val_loss=0.242]
Epoch 5:  66%|██████▋   | 69/104 [00:15<00:07,  4.49it/s, loss=0.171, v_num=0, reduced_train_loss=0.0223, global_step=443.0, val_loss=0.0774]
Epoch 5:  67%|██████▋   | 70/104 [00:15<00:07,  4.49it/s, loss=0.171, v_num=0, reduced_train_loss=0.0223, global_step=443.0, val_loss=0.0774]
Epoch 5:  67%|██████▋   | 70/104 [00:15<00:07,  4.49it/s, loss=0.157, v_num=0, reduced_train_loss=0.114, global_step=444.0, val_loss=0.0774] 
Epoch 5:  68%|██████▊   | 71/104 [00:15<00:07,  4.49it/s, loss=0.131, v_num=0, reduced_train_loss=0.0158, global_step=445.0, val_loss=0.0774]
Epoch 5:  77%|███████▋  | 80/104 [00:17<00:05,  4.63it/s, los

I0503 21:58:07.600023 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09811785817146301


Epoch 5:  71%|███████   | 74/104 [00:16<00:06,  4.40it/s, loss=0.404, v_num=0, reduced_train_loss=0.165, global_step=448.0, val_loss=0.242]
Epoch 5:  88%|████████▊ | 91/104 [00:18<00:02,  4.86it/s, loss=0.117, v_num=0, reduced_train_loss=0.00904, global_step=449.0, val_loss=0.0774]
Epoch 5:  72%|███████▏  | 75/104 [00:17<00:06,  4.40it/s, loss=0.406, v_num=0, reduced_train_loss=0.389, global_step=449.0, val_loss=0.242]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 5:  89%|████████▉ | 93/104 [00:18<00:02,  4.91it/s, loss=0.117, v_num=0, reduced_train_loss=0.00904, global_step=449.0, val_loss=0.0774]
Epoch 6:   2%|▏         | 2/104 [00:00<00:24,  4.17it/s, loss=0.149, v_num=0, reduced_train_loss=0.0163, global_step=451.0, val_loss=0.0981]
Epoch 5:  73%|███████▎  | 76/104 [00:17<00:06,  4.41it/s, loss=0.406, v_num=0, reduced_train_loss=0.389, global_step=449.0, val_loss=

I0503 21:58:09.201494 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08989561349153519



Epoch 6:   1%|          | 1/104 [00:00<00:27,  3.71it/s, loss=0.114, v_num=0, reduced_train_loss=0.00804, global_step=450.0, val_loss=0.0899]
Epoch 6:   9%|▊         | 9/104 [00:02<00:21,  4.48it/s, loss=0.12, v_num=0, reduced_train_loss=0.00215, global_step=458.0, val_loss=0.0981] 
Epoch 6:   2%|▏         | 2/104 [00:00<00:25,  4.01it/s, loss=0.117, v_num=0, reduced_train_loss=0.110, global_step=451.0, val_loss=0.0899]  
Epoch 6:  10%|▉         | 10/104 [00:02<00:20,  4.49it/s, loss=0.12, v_num=0, reduced_train_loss=0.00908, global_step=459.0, val_loss=0.0981]
Epoch 6:   3%|▎         | 3/104 [00:00<00:24,  4.15it/s, loss=0.103, v_num=0, reduced_train_loss=0.0734, global_step=452.0, val_loss=0.0899]
Epoch 6:  11%|█         | 11/104 [00:02<00:20,  4.49it/s, loss=0.128, v_num=0, reduced_train_loss=0.248, global_step=460.0, val_loss=0.0981] 
Epoch 6:   4%|▍         | 4/104 [00:00<00:23,  4.23it/s, loss=0.114, v_num=0, reduced_train_loss=0.291, global_step=453.0, val_loss=0.0899] 
Epoch 6

I0503 21:58:11.569724 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.23421069979667664


Epoch 6:  68%|██████▊   | 71/104 [00:15<00:07,  4.48it/s, loss=0.209, v_num=0, reduced_train_loss=0.0167, global_step=520.0, val_loss=0.0899]  
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 6:  69%|██████▉   | 72/104 [00:16<00:07,  4.48it/s, loss=0.208, v_num=0, reduced_train_loss=0.375, global_step=521.0, val_loss=0.0899] 
Epoch 6:  58%|█████▊    | 60/104 [00:13<00:10,  4.32it/s, loss=0.232, v_num=0, reduced_train_loss=0.128, global_step=509.0, val_loss=0.234]
Epoch 6:  70%|███████   | 73/104 [00:16<00:06,  4.48it/s, loss=0.183, v_num=0, reduced_train_loss=0.00416, global_step=522.0, val_loss=0.0899]
Epoch 6:  59%|█████▊    | 61/104 [00:14<00:09,  4.32it/s, loss=0.234, v_num=0, reduced_train_loss=0.159, global_step=510.0, val_loss=0.234]
Epoch 6:  71%|███████   | 74/104 [00:16<00:06,  4.48it/s, loss=0.186, v_num=0, reduced_train_loss=0.0563, global_step=523.0, val_loss=

I0503 21:58:28.492641 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06033783778548241


Epoch 6:  71%|███████   | 74/104 [00:17<00:06,  4.34it/s, loss=0.206, v_num=0, reduced_train_loss=0.247, global_step=523.0, val_loss=0.234]
Epoch 6:  93%|█████████▎| 97/104 [00:19<00:01,  4.99it/s, loss=0.179, v_num=0, reduced_train_loss=0.0785, global_step=524.0, val_loss=0.0899]
Epoch 6:  72%|███████▏  | 75/104 [00:17<00:06,  4.34it/s, loss=0.2, v_num=0, reduced_train_loss=0.274, global_step=524.0, val_loss=0.234]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 6:  95%|█████████▌| 99/104 [00:19<00:00,  5.03it/s, loss=0.179, v_num=0, reduced_train_loss=0.0785, global_step=524.0, val_loss=0.0899]
Epoch 7:   2%|▏         | 2/104 [00:00<00:26,  3.80it/s, loss=0.125, v_num=0, reduced_train_loss=0.0141, global_step=526.0, val_loss=0.0603]
Epoch 6:  73%|███████▎  | 76/104 [00:17<00:06,  4.35it/s, loss=0.2, v_num=0, reduced_train_loss=0.274, global_step=524.0, val_loss=0.234

I0503 21:58:29.437052 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07053378224372864



Epoch 7:   5%|▍         | 5/104 [00:01<00:25,  3.96it/s, loss=0.101, v_num=0, reduced_train_loss=0.113, global_step=529.0, val_loss=0.0603]
Epoch 7:   2%|▏         | 2/104 [00:00<00:25,  4.08it/s, loss=0.167, v_num=0, reduced_train_loss=0.293, global_step=526.0, val_loss=0.0705] 
Epoch 7:   6%|▌         | 6/104 [00:01<00:24,  3.96it/s, loss=0.104, v_num=0, reduced_train_loss=0.0709, global_step=530.0, val_loss=0.0603]
Epoch 7:   3%|▎         | 3/104 [00:00<00:24,  4.21it/s, loss=0.151, v_num=0, reduced_train_loss=0.0156, global_step=527.0, val_loss=0.0705]
Epoch 7:   7%|▋         | 7/104 [00:01<00:24,  3.97it/s, loss=0.107, v_num=0, reduced_train_loss=0.098, global_step=531.0, val_loss=0.0603] 
Epoch 7:   4%|▍         | 4/104 [00:00<00:23,  4.27it/s, loss=0.163, v_num=0, reduced_train_loss=0.321, global_step=528.0, val_loss=0.0705] 
Epoch 7:   8%|▊         | 8/104 [00:02<00:24,  3.97it/s, loss=0.113, v_num=0, reduced_train_loss=0.322, global_step=532.0, val_loss=0.0603]
Epoch 7:   5%

I0503 21:58:32.413452 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.2053402066230774


Epoch 7:  72%|███████▏  | 75/104 [00:16<00:06,  4.47it/s, loss=0.0647, v_num=0, reduced_train_loss=0.0173, global_step=599.0, val_loss=0.0705]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 7:  69%|██████▉   | 72/104 [00:17<00:07,  4.01it/s, loss=0.174, v_num=0, reduced_train_loss=0.0425, global_step=596.0, val_loss=0.0603]  
Epoch 7:  59%|█████▊    | 61/104 [00:14<00:10,  4.29it/s, loss=0.349, v_num=0, reduced_train_loss=0.408, global_step=585.0, val_loss=0.205]
Epoch 7:  70%|███████   | 73/104 [00:18<00:07,  4.01it/s, loss=0.17, v_num=0, reduced_train_loss=0.114, global_step=597.0, val_loss=0.0603]
Epoch 7:  60%|█████▉    | 62/104 [00:14<00:09,  4.30it/s, loss=0.349, v_num=0, reduced_train_loss=0.211, global_step=586.0, val_loss=0.205]
Epoch 7:  71%|███████   | 74/104 [00:18<00:07,  4.01it/s, loss=0.168, v_num=0, reduced_train_loss=0.0497, global_step=598.0, val

I0503 21:58:49.720328 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.061218589544296265


Epoch 7:  72%|███████▏  | 75/104 [00:17<00:06,  4.31it/s, loss=0.188, v_num=0, reduced_train_loss=0.148, global_step=599.0, val_loss=0.205] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 7:  93%|█████████▎| 97/104 [00:21<00:01,  4.54it/s, loss=0.161, v_num=0, reduced_train_loss=0.00395, global_step=599.0, val_loss=0.0603]
Epoch 8:   1%|          | 1/104 [00:00<00:27,  3.80it/s, loss=0.0688, v_num=0, reduced_train_loss=0.116, global_step=600.0, val_loss=0.0612]
Epoch 7:  73%|███████▎  | 76/104 [00:17<00:06,  4.33it/s, loss=0.188, v_num=0, reduced_train_loss=0.148, global_step=599.0, val_loss=0.205]
Epoch 7:  95%|█████████▌| 99/104 [00:21<00:01,  4.58it/s, loss=0.161, v_num=0, reduced_train_loss=0.00395, global_step=599.0, val_loss=0.0603]
Epoch 8:   2%|▏         | 2/104 [00:00<00:24,  4.14it/s, loss=0.0693, v_num=0, reduced_train_loss=0.0354, global_step=601.0, val_loss=0

I0503 21:58:50.643182 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12819087505340576



Epoch 8:   1%|          | 1/104 [00:00<00:27,  3.78it/s, loss=0.161, v_num=0, reduced_train_loss=0.0204, global_step=600.0, val_loss=0.128]  
Epoch 8:   6%|▌         | 6/104 [00:01<00:22,  4.38it/s, loss=0.0695, v_num=0, reduced_train_loss=0.0274, global_step=605.0, val_loss=0.0612]
Epoch 8:   2%|▏         | 2/104 [00:00<00:24,  4.12it/s, loss=0.161, v_num=0, reduced_train_loss=0.00965, global_step=601.0, val_loss=0.128]
Epoch 8:   3%|▎         | 3/104 [00:00<00:23,  4.25it/s, loss=0.147, v_num=0, reduced_train_loss=0.0137, global_step=602.0, val_loss=0.128]  
Epoch 7:  84%|████████▎ | 87/104 [00:18<00:03,  4.59it/s, loss=0.188, v_num=0, reduced_train_loss=0.148, global_step=599.0, val_loss=0.205]
Epoch 8:   4%|▍         | 4/104 [00:00<00:23,  4.31it/s, loss=0.146, v_num=0, reduced_train_loss=0.0208, global_step=603.0, val_loss=0.128]
Epoch 7:  86%|████████▌ | 89/104 [00:19<00:03,  4.63it/s, loss=0.188, v_num=0, reduced_train_loss=0.148, global_step=599.0, val_loss=0.205]
Epoch 8:   

I0503 21:58:53.434782 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.13637514412403107


Epoch 8:  72%|███████▏  | 75/104 [00:16<00:06,  4.48it/s, loss=0.037, v_num=0, reduced_train_loss=0.00306, global_step=674.0, val_loss=0.0612]  
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 8:  69%|██████▉   | 72/104 [00:15<00:07,  4.50it/s, loss=0.145, v_num=0, reduced_train_loss=0.547, global_step=671.0, val_loss=0.128] 
Epoch 8:  56%|█████▌    | 58/104 [00:13<00:10,  4.39it/s, loss=0.244, v_num=0, reduced_train_loss=0.433, global_step=657.0, val_loss=0.136]
Epoch 8:  57%|█████▋    | 59/104 [00:13<00:10,  4.39it/s, loss=0.238, v_num=0, reduced_train_loss=0.0277, global_step=658.0, val_loss=0.136]
Epoch 8:  75%|███████▌  | 78/104 [00:17<00:05,  4.54it/s, loss=0.037, v_num=0, reduced_train_loss=0.00306, global_step=674.0, val_loss=0.0612]
Epoch 8:  58%|█████▊    | 60/104 [00:13<00:10,  4.39it/s, loss=0.262, v_num=0, reduced_train_loss=0.730, global_step=659.0, val_loss=0.136]
Epoch 8:  77%|███████▋  | 80/104 [00:17<00:05,  4.59it

I0503 21:59:09.981894 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.055116742849349976


Epoch 9:   1%|          | 1/104 [00:00<00:27,  3.79it/s, loss=0.0399, v_num=0, reduced_train_loss=0.0692, global_step=675.0, val_loss=0.0551]
Epoch 8:  95%|█████████▌| 99/104 [00:19<00:00,  5.05it/s, loss=0.134, v_num=0, reduced_train_loss=0.0122, global_step=674.0, val_loss=0.128]
Epoch 8:  72%|███████▏  | 75/104 [00:16<00:06,  4.42it/s, loss=0.247, v_num=0, reduced_train_loss=0.0251, global_step=674.0, val_loss=0.136]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 9:   2%|▏         | 2/104 [00:00<00:25,  4.01it/s, loss=0.0349, v_num=0, reduced_train_loss=0.0259, global_step=676.0, val_loss=0.0551]
Epoch 8:  97%|█████████▋| 101/104 [00:19<00:00,  5.09it/s, loss=0.134, v_num=0, reduced_train_loss=0.0122, global_step=674.0, val_loss=0.128]
Epoch 8:  73%|███████▎  | 76/104 [00:17<00:06,  4.43it/s, loss=0.247, v_num=0, reduced_train_loss=0.0251, global_step=674.0, val_loss=0.136]
Epoch 9:   3%|▎         | 3/104 [00:00<00:24,  4.15it/s, loss=

I0503 21:59:10.802353 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0649709701538086



Epoch 9:   4%|▍         | 4/104 [00:00<00:23,  4.23it/s, loss=0.0479, v_num=0, reduced_train_loss=0.252, global_step=678.0, val_loss=0.0551] 
Epoch 9:   1%|          | 1/104 [00:00<00:27,  3.76it/s, loss=0.133, v_num=0, reduced_train_loss=0.00259, global_step=675.0, val_loss=0.065]
Epoch 9:   5%|▍         | 5/104 [00:01<00:23,  4.27it/s, loss=0.0473, v_num=0, reduced_train_loss=0.00105, global_step=679.0, val_loss=0.0551]
Epoch 9:   2%|▏         | 2/104 [00:00<00:24,  4.13it/s, loss=0.128, v_num=0, reduced_train_loss=0.179, global_step=676.0, val_loss=0.065]  
Epoch 9:   6%|▌         | 6/104 [00:01<00:22,  4.30it/s, loss=0.0469, v_num=0, reduced_train_loss=0.00424, global_step=680.0, val_loss=0.0551]
Epoch 9:   3%|▎         | 3/104 [00:00<00:23,  4.26it/s, loss=0.129, v_num=0, reduced_train_loss=0.0438, global_step=677.0, val_loss=0.065]
Epoch 9:   7%|▋         | 7/104 [00:01<00:22,  4.32it/s, loss=0.0419, v_num=0, reduced_train_loss=0.00393, global_step=681.0, val_loss=0.0551]
Epoch

I0503 21:59:13.943663 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.14293226599693298


Epoch 9:  72%|███████▏  | 75/104 [00:16<00:06,  4.48it/s, loss=0.0623, v_num=0, reduced_train_loss=0.0465, global_step=749.0, val_loss=0.0551] 
Epoch 9:  54%|█████▍    | 56/104 [00:12<00:10,  4.38it/s, loss=0.181, v_num=0, reduced_train_loss=0.167, global_step=730.0, val_loss=0.143]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 9:  69%|██████▉   | 72/104 [00:16<00:07,  4.49it/s, loss=0.0987, v_num=0, reduced_train_loss=0.00391, global_step=746.0, val_loss=0.065]
Epoch 9:  55%|█████▍    | 57/104 [00:13<00:10,  4.38it/s, loss=0.182, v_num=0, reduced_train_loss=0.112, global_step=731.0, val_loss=0.143]
Epoch 9:  70%|███████   | 73/104 [00:16<00:06,  4.49it/s, loss=0.0986, v_num=0, reduced_train_loss=0.0125, global_step=747.0, val_loss=0.065]
Epoch 9:  56%|█████▌    | 58/104 [00:13<00:10,  4.38it/s, loss=0.181, v_num=0, reduced_train_loss=0.0348, global_step=732.0, val_loss=0.143]
Epoch 9:  71%|███████   | 74/104 [00:16<00:06,  4.49it/s, loss=0.129, v_num=0, reduced_tra

I0503 21:59:30.183434 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05615731701254845



Epoch 10:   1%|          | 1/104 [00:00<00:28,  3.61it/s, loss=0.062, v_num=0, reduced_train_loss=0.000707, global_step=750.0, val_loss=0.0562]
Epoch 9:  70%|███████   | 73/104 [00:16<00:07,  4.38it/s, loss=0.228, v_num=0, reduced_train_loss=0.250, global_step=747.0, val_loss=0.143]
Epoch 10:   2%|▏         | 2/104 [00:00<00:25,  3.96it/s, loss=0.0653, v_num=0, reduced_train_loss=0.0704, global_step=751.0, val_loss=0.0562] 
Epoch 9:  71%|███████   | 74/104 [00:16<00:06,  4.38it/s, loss=0.231, v_num=0, reduced_train_loss=0.135, global_step=748.0, val_loss=0.143]
Epoch 10:   3%|▎         | 3/104 [00:00<00:24,  4.11it/s, loss=0.0615, v_num=0, reduced_train_loss=0.0108, global_step=752.0, val_loss=0.0562]
Epoch 9:  99%|█████████▉| 103/104 [00:20<00:00,  5.11it/s, loss=0.165, v_num=0, reduced_train_loss=0.781, global_step=749.0, val_loss=0.065]
Epoch 9: 100%|██████████| 104/104 [00:20<00:00,  5.14it/s, loss=0.165, v_num=0, reduced_train_loss=0.781, global_step=749.0, val_loss=0.065]2023-0

I0503 21:59:31.048685 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.14145277440547943


Epoch 10:   5%|▍         | 5/104 [00:01<00:23,  4.26it/s, loss=0.0765, v_num=0, reduced_train_loss=0.412, global_step=754.0, val_loss=0.0562]  
Epoch 10:   8%|▊         | 8/104 [00:01<00:21,  4.37it/s, loss=0.0633, v_num=0, reduced_train_loss=0.0524, global_step=757.0, val_loss=0.0562]  
Epoch 10:   9%|▊         | 9/104 [00:02<00:21,  4.39it/s, loss=0.0588, v_num=0, reduced_train_loss=0.027, global_step=758.0, val_loss=0.0562] 
Epoch 9:  76%|███████▌  | 79/104 [00:18<00:05,  4.32it/s, loss=0.231, v_num=0, reduced_train_loss=0.0496, global_step=749.0, val_loss=0.143]
Epoch 10:  10%|▉         | 10/104 [00:02<00:21,  4.41it/s, loss=0.0604, v_num=0, reduced_train_loss=0.0322, global_step=759.0, val_loss=0.0562]
Epoch 9:  78%|███████▊  | 81/104 [00:18<00:05,  4.37it/s, loss=0.231, v_num=0, reduced_train_loss=0.0496, global_step=749.0, val_loss=0.143]
Epoch 10:  11%|█         | 11/104 [00:02<00:21,  4.41it/s, loss=0.0667, v_num=0, reduced_train_loss=0.152, global_step=760.0, val_loss=0.0562]

I0503 21:59:35.337958 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.15198786556720734


Epoch 10:  72%|███████▏  | 75/104 [00:17<00:06,  4.38it/s, loss=0.0429, v_num=0, reduced_train_loss=0.0769, global_step=824.0, val_loss=0.0562]  
Epoch 10:  68%|██████▊   | 71/104 [00:16<00:07,  4.36it/s, loss=0.0727, v_num=0, reduced_train_loss=0.0121, global_step=820.0, val_loss=0.141] 
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 10:  50%|█████     | 52/104 [00:12<00:12,  4.32it/s, loss=0.166, v_num=0, reduced_train_loss=0.363, global_step=801.0, val_loss=0.152] 
Epoch 10:  51%|█████     | 53/104 [00:12<00:11,  4.32it/s, loss=0.18, v_num=0, reduced_train_loss=0.387, global_step=802.0, val_loss=0.152]
Epoch 10:  74%|███████▍  | 77/104 [00:17<00:06,  4.42it/s, loss=0.0429, v_num=0, reduced_train_loss=0.0769, global_step=824.0, val_loss=0.0562]
Epoch 10:  52%|█████▏    | 54/104 [00:12<00:11,  4.32it/s, loss=0.172, v_num=0, reduced_train_loss=0.026, global_step=803.0, val_loss=0.152]
Epoch 10:  71%|███████   | 74/104 [00:16<00:06,  4.37it/s, loss=0.0744, v_num=0, r

I0503 21:59:50.828982 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06780713051557541



Epoch 10:  65%|██████▌   | 68/104 [00:15<00:08,  4.35it/s, loss=0.164, v_num=0, reduced_train_loss=0.0321, global_step=817.0, val_loss=0.152]
Epoch 11:   1%|          | 1/104 [00:00<00:27,  3.79it/s, loss=0.0568, v_num=0, reduced_train_loss=0.383, global_step=825.0, val_loss=0.0678]
Epoch 10:  66%|██████▋   | 69/104 [00:15<00:08,  4.35it/s, loss=0.169, v_num=0, reduced_train_loss=0.104, global_step=818.0, val_loss=0.152]
Epoch 11:   2%|▏         | 2/104 [00:00<00:24,  4.15it/s, loss=0.0592, v_num=0, reduced_train_loss=0.0563, global_step=826.0, val_loss=0.0678]
Epoch 10:  67%|██████▋   | 70/104 [00:16<00:07,  4.35it/s, loss=0.191, v_num=0, reduced_train_loss=0.483, global_step=819.0, val_loss=0.152]
Epoch 11:   3%|▎         | 3/104 [00:00<00:23,  4.26it/s, loss=0.0591, v_num=0, reduced_train_loss=0.000503, global_step=827.0, val_loss=0.0678]
Epoch 10: 100%|██████████| 104/104 [00:20<00:00,  5.06it/s, loss=0.0753, v_num=0, reduced_train_loss=0.0335, global_step=824.0, val_loss

I0503 21:59:51.594717 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.11595574021339417


Epoch 10:  72%|███████▏  | 75/104 [00:17<00:06,  4.36it/s, loss=0.191, v_num=0, reduced_train_loss=0.196, global_step=824.0, val_loss=0.152]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 11:   8%|▊         | 8/104 [00:01<00:21,  4.41it/s, loss=0.0605, v_num=0, reduced_train_loss=0.00405, global_step=832.0, val_loss=0.0678]
Epoch 11:   5%|▍         | 5/104 [00:01<00:22,  4.42it/s, loss=0.0598, v_num=0, reduced_train_loss=0.00193, global_step=829.0, val_loss=0.116]
Epoch 11:   6%|▌         | 6/104 [00:01<00:22,  4.44it/s, loss=0.0598, v_num=0, reduced_train_loss=0.00364, global_step=830.0, val_loss=0.116] 
Epoch 10:  75%|███████▌  | 78/104 [00:17<00:05,  4.43it/s, loss=0.191, v_num=0, reduced_train_loss=0.196, global_step=824.0, val_loss=0.152]
Epoch 11:   7%|▋         | 7/104 [00:01<00:21,  4.47it/s, loss=0.0464, v_num=0, reduced_train_loss=0.0258, global_step=831.0, val_loss=0.116] 
Epoch 11:  11%|█         | 11/104 [00:02<00:21,  4.4

I0503 21:59:56.059919 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.20047898590564728


Epoch 11:  69%|██████▉   | 72/104 [00:15<00:07,  4.52it/s, loss=0.0528, v_num=0, reduced_train_loss=0.0083, global_step=895.0, val_loss=0.116]   
Epoch 11:  69%|██████▉   | 72/104 [00:15<00:07,  4.51it/s, loss=0.0725, v_num=0, reduced_train_loss=0.398, global_step=896.0, val_loss=0.116] 
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 11:  48%|████▊     | 50/104 [00:11<00:12,  4.32it/s, loss=0.205, v_num=0, reduced_train_loss=0.501, global_step=874.0, val_loss=0.200]
Epoch 11:  70%|███████   | 73/104 [00:16<00:06,  4.52it/s, loss=0.0904, v_num=0, reduced_train_loss=0.587, global_step=897.0, val_loss=0.116]
Epoch 11:  49%|████▉     | 51/104 [00:11<00:12,  4.32it/s, loss=0.221, v_num=0, reduced_train_loss=0.328, global_step=875.0, val_loss=0.200]
Epoch 11:  71%|███████   | 74/104 [00:16<00:06,  4.52it/s, loss=0.0926, v_num=0, reduced_train_loss=0.0501, global_step=898.0, val_loss=0.116]
Epoch 11:  72%|███████▏  | 75/104 [00:16<00:06,  4.52it/s, loss=0.128, v_num=0, reduced_t

I0503 22:00:11.100620 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04365125298500061


Epoch 11:  63%|██████▎   | 66/104 [00:15<00:08,  4.33it/s, loss=0.169, v_num=0, reduced_train_loss=0.0671, global_step=890.0, val_loss=0.200]
Epoch 12:   1%|          | 1/104 [00:00<00:28,  3.63it/s, loss=0.11, v_num=0, reduced_train_loss=0.00225, global_step=900.0, val_loss=0.0437]
Epoch 11:  64%|██████▍   | 67/104 [00:15<00:08,  4.34it/s, loss=0.191, v_num=0, reduced_train_loss=0.641, global_step=891.0, val_loss=0.200] 
Epoch 12:   2%|▏         | 2/104 [00:00<00:25,  3.97it/s, loss=0.128, v_num=0, reduced_train_loss=0.357, global_step=901.0, val_loss=0.0437] 
Epoch 11: 100%|██████████| 104/104 [00:20<00:00,  5.19it/s, loss=0.128, v_num=0, reduced_train_loss=0.710, global_step=899.0, val_loss=0.116]
2023-05-03 22:00:11,646 - root - INFO - val_loss: 0.10708499699831009
Epoch 11: 100%|██████████| 104/104 [00:20<00:00,  5.19it/s, loss=0.128, v_num=0, reduced_train_loss=0.710, global_step=899.0, val_loss=0.107]
Epoch 12:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.128, v_num=0, reduced_

I0503 22:00:11.646350 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10708499699831009


Epoch 11:  72%|███████▏  | 75/104 [00:17<00:06,  4.34it/s, loss=0.185, v_num=0, reduced_train_loss=1.110, global_step=899.0, val_loss=0.200]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 12:   8%|▊         | 8/104 [00:01<00:21,  4.37it/s, loss=0.157, v_num=0, reduced_train_loss=0.0548, global_step=907.0, val_loss=0.107]
Epoch 12:  11%|█         | 11/104 [00:02<00:21,  4.33it/s, loss=0.0722, v_num=0, reduced_train_loss=0.450, global_step=910.0, val_loss=0.0437]  
Epoch 12:   9%|▊         | 9/104 [00:02<00:21,  4.39it/s, loss=0.165, v_num=0, reduced_train_loss=0.185, global_step=908.0, val_loss=0.107] 
Epoch 12:  12%|█▏        | 12/104 [00:02<00:21,  4.34it/s, loss=0.0701, v_num=0, reduced_train_loss=0.00177, global_step=911.0, val_loss=0.0437]
Epoch 12:  10%|▉         | 10/104 [00:02<00:21,  4.40it/s, loss=0.166, v_num=0, reduced_train_loss=0.0123, global_step=909.0, val_loss=0.107]
Epoch 12:  11%|█         | 11/104 [00:02<00:21,  4

I0503 22:00:16.965550 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12795104086399078


Epoch 12:  72%|███████▏  | 75/104 [00:16<00:06,  4.44it/s, loss=0.0344, v_num=0, reduced_train_loss=0.0191, global_step=974.0, val_loss=0.0437]  
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 12:  46%|████▌     | 48/104 [00:11<00:12,  4.33it/s, loss=0.12, v_num=0, reduced_train_loss=0.0306, global_step=947.0, val_loss=0.128]
Epoch 12:  72%|███████▏  | 75/104 [00:16<00:06,  4.51it/s, loss=0.085, v_num=0, reduced_train_loss=0.00868, global_step=974.0, val_loss=0.107]
Epoch 12:  74%|███████▍  | 77/104 [00:17<00:06,  4.49it/s, loss=0.0344, v_num=0, reduced_train_loss=0.0191, global_step=974.0, val_loss=0.0437]
Epoch 12:  47%|████▋     | 49/104 [00:11<00:12,  4.33it/s, loss=0.123, v_num=0, reduced_train_loss=0.0718, global_step=948.0, val_loss=0.128]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 12:  75%|███████▌  | 78/104 [00:17<00:05,  4.51it/s, loss=0.0344

I0503 22:00:31.471611 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06056351959705353


Epoch 12:  62%|██████▏   | 64/104 [00:14<00:09,  4.35it/s, loss=0.275, v_num=0, reduced_train_loss=0.441, global_step=963.0, val_loss=0.128] 
Epoch 13:   1%|          | 1/104 [00:00<00:27,  3.77it/s, loss=0.0393, v_num=0, reduced_train_loss=0.101, global_step=975.0, val_loss=0.0606]
Epoch 12: 100%|██████████| 104/104 [00:20<00:00,  5.17it/s, loss=0.085, v_num=0, reduced_train_loss=0.00868, global_step=974.0, val_loss=0.107]
2023-05-03 22:00:31,762 - root - INFO - val_loss: 0.08079487085342407
Epoch 12: 100%|██████████| 104/104 [00:20<00:00,  5.17it/s, loss=0.085, v_num=0, reduced_train_loss=0.00868, global_step=974.0, val_loss=0.0808]
Epoch 13:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.085, v_num=0, reduced_train_loss=0.00868, global_step=974.0, val_loss=0.0808]          

I0503 22:00:31.762665 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08079487085342407


Epoch 12:  72%|███████▏  | 75/104 [00:17<00:06,  4.37it/s, loss=0.303, v_num=0, reduced_train_loss=0.0834, global_step=974.0, val_loss=0.128]
Epoch 13:  12%|█▏        | 12/104 [00:02<00:20,  4.47it/s, loss=0.0435, v_num=0, reduced_train_loss=0.0623, global_step=986.0, val_loss=0.0606]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 13:  11%|█         | 11/104 [00:02<00:21,  4.36it/s, loss=0.0441, v_num=0, reduced_train_loss=0.032, global_step=985.0, val_loss=0.0808]   
Epoch 13:  12%|█▎        | 13/104 [00:02<00:20,  4.48it/s, loss=0.0436, v_num=0, reduced_train_loss=0.0024, global_step=987.0, val_loss=0.0606]
Epoch 13:  12%|█▏        | 12/104 [00:02<00:21,  4.37it/s, loss=0.0494, v_num=0, reduced_train_loss=0.137, global_step=986.0, val_loss=0.0808]
Epoch 13:  13%|█▎        | 14/104 [00:03<00:20,  4.48it/s, loss=0.0402, v_num=0, reduced_train_loss=0.0113, global_step=988.0, val_loss=0.0606]
Epoch 13:  12%|█▎        | 13/104 [00:02<00:20,  4.37it/s, loss=0.0492, v_num=

I0503 22:00:37.661150 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.15789301693439484


Epoch 13:  72%|███████▏  | 75/104 [00:17<00:06,  4.22it/s, loss=0.108, v_num=0, reduced_train_loss=0.00187, global_step=1049.0, val_loss=0.0606]
Epoch 13:  72%|███████▏  | 75/104 [00:17<00:06,  4.28it/s, loss=0.0831, v_num=0, reduced_train_loss=0.0122, global_step=1049.0, val_loss=0.0808] 
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 13:  73%|███████▎  | 76/104 [00:17<00:06,  4.23it/s, loss=0.108, v_num=0, reduced_train_loss=0.00187, global_step=1049.0, val_loss=0.0606]
Epoch 13:  49%|████▉     | 51/104 [00:11<00:12,  4.32it/s, loss=0.164, v_num=0, reduced_train_loss=0.0667, global_step=1025.0, val_loss=0.158]
Epoch 13:  74%|███████▍  | 77/104 [00:18<00:06,  4.25it/s, loss=0.108, v_num=0, reduced_train_loss=0.00187, global_step=1049.0, val_loss=0.06

I0503 22:00:52.860242 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08299127221107483


Epoch 14:   1%|          | 1/104 [00:00<00:28,  3.57it/s, loss=0.083, v_num=0, reduced_train_loss=0.00112, global_step=1050.0, val_loss=0.083]
Epoch 13:  65%|██████▌   | 68/104 [00:15<00:08,  4.35it/s, loss=0.141, v_num=0, reduced_train_loss=0.196, global_step=1042.0, val_loss=0.158]
Epoch 14:   2%|▏         | 2/104 [00:00<00:25,  3.95it/s, loss=0.0869, v_num=0, reduced_train_loss=0.0929, global_step=1051.0, val_loss=0.083]
Epoch 13: 100%|██████████| 104/104 [00:21<00:00,  4.74it/s, loss=0.108, v_num=0, reduced_train_loss=0.00187, global_step=1049.0, val_loss=0.0606]
2023-05-03 22:00:53,418 - root - INFO - val_loss: 0.0606265589594841
Epoch 13: 100%|██████████| 104/104 [00:21<00:00,  4.74it/s, loss=0.108, v_num=0, reduced_train_loss=0.00187, global_step=1049.0, val_loss=0.0606]
Epoch 14:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.108, v_num=0, reduced_train_loss=0.00187, global_step=1049.0, val_loss=0.0606]          

I0503 22:00:53.418156 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0606265589594841


Epoch 13:  72%|███████▏  | 75/104 [00:17<00:06,  4.36it/s, loss=0.15, v_num=0, reduced_train_loss=0.461, global_step=1049.0, val_loss=0.158]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 14:   9%|▊         | 9/104 [00:02<00:22,  4.27it/s, loss=0.107, v_num=0, reduced_train_loss=0.0125, global_step=1058.0, val_loss=0.083]
Epoch 13:  73%|███████▎  | 76/104 [00:17<00:06,  4.37it/s, loss=0.15, v_num=0, reduced_train_loss=0.461, global_step=1049.0, val_loss=0.158]
Epoch 14:  10%|▉         | 10/104 [00:02<00:21,  4.29it/s, loss=0.105, v_num=0, reduced_train_loss=0.00193, global_step=1059.0, val_loss=0.083]
Epoch 13:  75%|███████▌  | 78/104 [00:17<00:05,  4.42it/s, loss=0.15, v_num=0, reduced_train_loss=0.461, global_step=1049.0, val_loss=0.158]
Epoch 14:   8%|▊         | 8/104 [00:02<00:24,  3.96it/s, loss=0.0601, v_num=0, reduced_train_loss=0.00483, global_step=1057.0, val_loss=0.0606]
Epoch 14:  12%|█▏        | 12/104 [00:02<00:21,  

I0503 22:00:58.406319 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1434473693370819


Epoch 14:  72%|███████▏  | 75/104 [00:17<00:06,  4.40it/s, loss=0.0653, v_num=0, reduced_train_loss=0.0919, global_step=1124.0, val_loss=0.083]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 14:  49%|████▉     | 51/104 [00:11<00:12,  4.36it/s, loss=0.126, v_num=0, reduced_train_loss=0.160, global_step=1100.0, val_loss=0.143]
Epoch 14:  50%|█████     | 52/104 [00:11<00:11,  4.36it/s, loss=0.126, v_num=0, reduced_train_loss=0.160, global_step=1100.0, val_loss=0.143]
Epoch 14:  75%|███████▌  | 78/104 [00:17<00:05,  4.47it/s, loss=0.0653, v_num=0, reduced_train_loss=0.0919, global_step=1124.0, val_loss=0.083]
Epoch 14:  72%|███████▏  | 75/104 [00:17<00:06,  4.38it/s, loss=0.0134, v_num=0, reduced_train_loss=0.0018, global_step=1124.0, val_loss=0.0606]
Epoch 14:  51%|█████     | 53/104 [00:12<00:11,  4.36it/s, loss=0.117, v_num=0, reduced_train_loss=0.00431, global_

I0503 22:01:13.383130 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05828892067074776


Epoch 14:  63%|██████▎   | 66/104 [00:15<00:08,  4.38it/s, loss=0.146, v_num=0, reduced_train_loss=0.160, global_step=1115.0, val_loss=0.143] 
Epoch 15:   1%|          | 1/104 [00:00<00:28,  3.64it/s, loss=0.0395, v_num=0, reduced_train_loss=0.0011, global_step=1125.0, val_loss=0.0583]
Epoch 14:  64%|██████▍   | 67/104 [00:15<00:08,  4.38it/s, loss=0.137, v_num=0, reduced_train_loss=0.0249, global_step=1116.0, val_loss=0.143]
Epoch 14:  65%|██████▌   | 68/104 [00:15<00:08,  4.38it/s, loss=0.141, v_num=0, reduced_train_loss=0.0873, global_step=1117.0, val_loss=0.143]
Epoch 14:  98%|█████████▊| 102/104 [00:20<00:00,  4.97it/s, loss=0.0134, v_num=0, reduced_train_loss=0.0018, global_step=1124.0, val_loss=0.0606]
Epoch 15:   3%|▎         | 3/104 [00:00<00:24,  4.14it/s, loss=0.0422, v_num=0, reduced_train_loss=0.00371, global_step=1127.0, val_loss=0.0583]
Epoch 14: 100%|██████████| 104/104 [00:20<00:00,  5.02it/s, loss=0.0134, v_num=0, reduced_train_loss=0.0018, global_step=1124.0,

I0503 22:01:14.142695 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.042552873492240906


Epoch 14:  72%|███████▏  | 75/104 [00:17<00:06,  4.39it/s, loss=0.121, v_num=0, reduced_train_loss=0.300, global_step=1124.0, val_loss=0.143]    
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 15:   6%|▌         | 6/104 [00:01<00:22,  4.32it/s, loss=0.0208, v_num=0, reduced_train_loss=0.00308, global_step=1130.0, val_loss=0.0426]
Epoch 15:   7%|▋         | 7/104 [00:01<00:22,  4.33it/s, loss=0.0205, v_num=0, reduced_train_loss=0.000535, global_step=1131.0, val_loss=0.0426]
Epoch 14:  74%|███████▍  | 77/104 [00:17<00:06,  4.43it/s, loss=0.121, v_num=0, reduced_train_loss=0.300, global_step=1124.0, val_loss=0.143]
Epoch 15:   8%|▊         | 8/104 [00:01<00:22,  4.35it/s, loss=0.0178, v_num=0, reduced_train_loss=0.000187, global_step=1132.0, val_loss=0.0426]
Epoch 14:  76%|███████▌  | 79/104 [00:17<00:05,  4.48it/s, loss=0.121, v_num=0, reduced_train_loss=0.300, global_step=1124.0, val_loss=0.143]
Epoch 15:   9%|▊         | 9/104 [00:02<00:21

I0503 22:01:19.088216 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.14755719900131226


Epoch 15:  72%|███████▏  | 75/104 [00:17<00:06,  4.41it/s, loss=0.055, v_num=0, reduced_train_loss=0.00802, global_step=1199.0, val_loss=0.0583]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 15:  71%|███████   | 74/104 [00:16<00:06,  4.51it/s, loss=0.0186, v_num=0, reduced_train_loss=0.045, global_step=1198.0, val_loss=0.0426]  
Epoch 15:  49%|████▉     | 51/104 [00:11<00:12,  4.39it/s, loss=0.133, v_num=0, reduced_train_loss=0.111, global_step=1175.0, val_loss=0.148]
Epoch 15:  72%|███████▏  | 75/104 [00:16<00:06,  4.51it/s, loss=0.0185, v_num=0, reduced_train_loss=0.0015, global_step=1199.0, val_loss=0.0426]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 15:  50%|█████     | 52/104 [00:11<00:11,  4.39it/s, loss=0.129, v_num=0, reduced_train_loss=0.185, global_step=1176.0, val_loss=0.148]
Epoch 15:  73%|███████▎  | 76

I0503 22:01:33.977322 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07636048644781113



Epoch 15:  63%|██████▎   | 66/104 [00:15<00:08,  4.39it/s, loss=0.132, v_num=0, reduced_train_loss=0.596, global_step=1190.0, val_loss=0.148]
Epoch 16:   1%|          | 1/104 [00:00<00:28,  3.66it/s, loss=0.0578, v_num=0, reduced_train_loss=0.0603, global_step=1200.0, val_loss=0.0764]
Epoch 15: 100%|██████████| 104/104 [00:20<00:00,  5.16it/s, loss=0.0185, v_num=0, reduced_train_loss=0.0015, global_step=1199.0, val_loss=0.0426]
2023-05-03 22:01:34,289 - root - INFO - val_loss: 0.05835936963558197
Epoch 15: 100%|██████████| 104/104 [00:20<00:00,  5.16it/s, loss=0.0185, v_num=0, reduced_train_loss=0.0015, global_step=1199.0, val_loss=0.0584]
Epoch 16:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.0185, v_num=0, reduced_train_loss=0.0015, global_step=1199.0, val_loss=0.0584]          

I0503 22:01:34.289492 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05835936963558197


Epoch 15:  72%|███████▏  | 75/104 [00:17<00:06,  4.40it/s, loss=0.118, v_num=0, reduced_train_loss=0.0754, global_step=1199.0, val_loss=0.148]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 16:  10%|▉         | 10/104 [00:02<00:21,  4.42it/s, loss=0.0509, v_num=0, reduced_train_loss=0.0263, global_step=1209.0, val_loss=0.0764]  
Epoch 16:   9%|▊         | 9/104 [00:02<00:21,  4.42it/s, loss=0.0396, v_num=0, reduced_train_loss=0.00135, global_step=1208.0, val_loss=0.0584]
Epoch 16:  10%|▉         | 10/104 [00:02<00:21,  4.43it/s, loss=0.0396, v_num=0, reduced_train_loss=0.000368, global_step=1209.0, val_loss=0.0584]
Epoch 15:  75%|███████▌  | 78/104 [00:17<00:05,  4.46it/s, loss=0.118, v_num=0, reduced_train_loss=0.0754, global_step=1199.0, val_loss=0.148]
Epoch 16:  11%|█         | 11/104 [00:02<00:21,  4.43it/s, loss=0.039, v_num=0, reduced_train_loss=0.000559, global_step=1210.0, val_loss=0.0584] 
Epoch 16:  12%|█▎        | 13/104 [00

I0503 22:01:39.757402 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.14052347838878632


Epoch 16:  72%|███████▏  | 75/104 [00:16<00:06,  4.51it/s, loss=0.1, v_num=0, reduced_train_loss=0.356, global_step=1274.0, val_loss=0.0764]      
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 16:  45%|████▌     | 47/104 [00:10<00:13,  4.31it/s, loss=0.0956, v_num=0, reduced_train_loss=0.112, global_step=1246.0, val_loss=0.141]
Epoch 16:  72%|███████▏  | 75/104 [00:16<00:06,  4.53it/s, loss=0.0397, v_num=0, reduced_train_loss=0.000733, global_step=1274.0, val_loss=0.0584]
Epoch 16:  46%|████▌     | 48/104 [00:11<00:12,  4.31it/s, loss=0.11, v_num=0, reduced_train_loss=0.305, global_step=1247.0, val_loss=0.141]  
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 16:  74%|███████▍  | 77/104 [00:16<00:05,  4.54it/s, loss=0.1, v_num=0, reduced_train_loss=0.356, global_step=1274.0, val_loss=0.0764]
Epoch 16:  73%|███████▎  | 76/104 [00:16<00:06,  4.54it/s, loss=0.0

I0503 22:01:54.092256 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1006506085395813


Epoch 16:  61%|██████    | 63/104 [00:14<00:09,  4.34it/s, loss=0.131, v_num=0, reduced_train_loss=0.198, global_step=1262.0, val_loss=0.141]  
Epoch 16:  99%|█████████▉| 103/104 [00:19<00:00,  5.15it/s, loss=0.0397, v_num=0, reduced_train_loss=0.000733, global_step=1274.0, val_loss=0.0584]
Epoch 16: 100%|██████████| 104/104 [00:20<00:00,  5.19it/s, loss=0.0397, v_num=0, reduced_train_loss=0.000733, global_step=1274.0, val_loss=0.0584]
2023-05-03 22:01:54,355 - root - INFO - val_loss: 0.05217134207487106
Epoch 16: 100%|██████████| 104/104 [00:20<00:00,  5.18it/s, loss=0.0397, v_num=0, reduced_train_loss=0.000733, global_step=1274.0, val_loss=0.0522]
Epoch 17:   1%|          | 1/104 [00:00<00:27,  3.78it/s, loss=0.1, v_num=0, reduced_train_loss=0.00291, global_step=1275.0, val_loss=0.101]       

I0503 22:01:54.355755 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05217134207487106


Epoch 16:  72%|███████▏  | 75/104 [00:17<00:06,  4.36it/s, loss=0.0955, v_num=0, reduced_train_loss=0.00445, global_step=1274.0, val_loss=0.141]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 17:  12%|█▏        | 12/104 [00:02<00:20,  4.49it/s, loss=0.0258, v_num=0, reduced_train_loss=0.00912, global_step=1286.0, val_loss=0.0522]
Epoch 17:  12%|█▎        | 13/104 [00:02<00:20,  4.50it/s, loss=0.026, v_num=0, reduced_train_loss=0.00341, global_step=1287.0, val_loss=0.0522] 
Epoch 16:  74%|███████▍  | 77/104 [00:17<00:06,  4.40it/s, loss=0.0955, v_num=0, reduced_train_loss=0.00445, global_step=1274.0, val_loss=0.141]
Epoch 17:  13%|█▎        | 14/104 [00:03<00:20,  4.49it/s, loss=0.0252, v_num=0, reduced_train_loss=0.00189, global_step=1288.0, val_loss=0.0522]
Epoch 16:  76%|███████▌  | 79/104 [00:17<00:05,  4.45it/s, loss=0.0955, v_num=0, reduced_train_loss=0.00445, global_step=1274.0, val_loss=0.141]
Epoch 17:  14%|█▍        | 15/104 [00

I0503 22:02:00.519162 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.16364814341068268


Epoch 17:  72%|███████▏  | 75/104 [00:16<00:06,  4.52it/s, loss=0.0312, v_num=0, reduced_train_loss=0.245, global_step=1349.0, val_loss=0.101]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 17:  43%|████▎     | 45/104 [00:10<00:13,  4.38it/s, loss=0.076, v_num=0, reduced_train_loss=0.0952, global_step=1319.0, val_loss=0.164] 
Epoch 17:  72%|███████▏  | 75/104 [00:16<00:06,  4.54it/s, loss=0.0217, v_num=0, reduced_train_loss=0.00551, global_step=1349.0, val_loss=0.0522]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 17:  44%|████▍     | 46/104 [00:10<00:13,  4.38it/s, loss=0.0593, v_num=0, reduced_train_loss=0.0161, global_step=1320.0, val_loss=0.164]
Epoch 17:  73%|███████▎  | 76/104 [00:16<00:06,  4.55it/s, loss=0.0217, v_num=0, reduced_train_loss=0.00551, global_step=1349.0, val_loss=0.0522]
Epoch 17:  75%|███████▌  | 78/

I0503 22:02:14.152065 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10755656659603119



Epoch 17:  58%|█████▊    | 60/104 [00:13<00:10,  4.39it/s, loss=0.132, v_num=0, reduced_train_loss=0.0855, global_step=1334.0, val_loss=0.164]
Epoch 17:  98%|█████████▊| 102/104 [00:19<00:00,  5.12it/s, loss=0.0217, v_num=0, reduced_train_loss=0.00551, global_step=1349.0, val_loss=0.0522]
Epoch 17:  59%|█████▊    | 61/104 [00:13<00:09,  4.39it/s, loss=0.134, v_num=0, reduced_train_loss=0.166, global_step=1335.0, val_loss=0.164]
Epoch 17: 100%|██████████| 104/104 [00:20<00:00,  5.17it/s, loss=0.0217, v_num=0, reduced_train_loss=0.00551, global_step=1349.0, val_loss=0.0522]
2023-05-03 22:02:14,486 - root - INFO - val_loss: 0.04125262051820755
Epoch 17: 100%|██████████| 104/104 [00:20<00:00,  5.17it/s, loss=0.0217, v_num=0, reduced_train_loss=0.00551, global_step=1349.0, val_loss=0.0413]
Epoch 18:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.0217, v_num=0, reduced_train_loss=0.00551, global_step=1349.0, val_loss=0.0413]          

I0503 22:02:14.486965 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04125262051820755


Epoch 17:  72%|███████▏  | 75/104 [00:17<00:06,  4.40it/s, loss=0.106, v_num=0, reduced_train_loss=0.0595, global_step=1349.0, val_loss=0.164]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 18:  15%|█▌        | 16/104 [00:03<00:19,  4.44it/s, loss=0.0306, v_num=0, reduced_train_loss=0.0776, global_step=1365.0, val_loss=0.108]
Epoch 18:  14%|█▍        | 15/104 [00:03<00:20,  4.30it/s, loss=0.0146, v_num=0, reduced_train_loss=0.003, global_step=1364.0, val_loss=0.0413]   
Epoch 18:  16%|█▋        | 17/104 [00:03<00:19,  4.44it/s, loss=0.0318, v_num=0, reduced_train_loss=0.025, global_step=1366.0, val_loss=0.108] 
Epoch 18:  17%|█▋        | 18/104 [00:04<00:19,  4.44it/s, loss=0.032, v_num=0, reduced_train_loss=0.00555, global_step=1367.0, val_loss=0.108]
Epoch 17:  77%|███████▋  | 80/104 [00:17<00:05,  4.52it/s, loss=0.106, v_num=0, reduced_train_loss=0.0595, global

I0503 22:02:21.170073 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1315620392560959


Epoch 18:  72%|███████▏  | 75/104 [00:17<00:06,  4.41it/s, loss=0.0175, v_num=0, reduced_train_loss=0.00296, global_step=1424.0, val_loss=0.108]  
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 18:  70%|███████   | 73/104 [00:16<00:07,  4.34it/s, loss=0.0255, v_num=0, reduced_train_loss=0.0055, global_step=1422.0, val_loss=0.0413]
Epoch 18:  41%|████▏     | 43/104 [00:10<00:14,  4.21it/s, loss=0.0912, v_num=0, reduced_train_loss=0.0201, global_step=1392.0, val_loss=0.132]
Epoch 18:  71%|███████   | 74/104 [00:17<00:06,  4.34it/s, loss=0.0199, v_num=0, reduced_train_loss=0.0471, global_step=1423.0, val_loss=0.0413]
Epoch 18:  42%|████▏     | 44/104 [00:10<00:14,  4.21it/s, loss=0.0981, v_num=0, reduced_train_loss=0.224, global_step=1393.0, val_loss=0.132]
Epoch 18:  72%|███████▏  | 75/104 [00:17<00:06,  4.34it/s, loss=0.0198, v_num=0, reduced_train_loss=0.00206, global_step=1424.0, val_loss=0.0413]
Validation: 0it [00:00, ?it/s]
Valid

I0503 22:02:34.619524 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10008056461811066


Epoch 18:  56%|█████▌    | 58/104 [00:13<00:10,  4.26it/s, loss=0.118, v_num=0, reduced_train_loss=0.00474, global_step=1407.0, val_loss=0.132]
Epoch 19:   1%|          | 1/104 [00:00<00:27,  3.72it/s, loss=0.015, v_num=0, reduced_train_loss=0.00986, global_step=1425.0, val_loss=0.100]
Epoch 18:  57%|█████▋    | 59/104 [00:13<00:10,  4.26it/s, loss=0.113, v_num=0, reduced_train_loss=0.00804, global_step=1408.0, val_loss=0.132]
Epoch 19:   2%|▏         | 2/104 [00:00<00:24,  4.09it/s, loss=0.0151, v_num=0, reduced_train_loss=0.0055, global_step=1426.0, val_loss=0.100]
Epoch 18:  58%|█████▊    | 60/104 [00:14<00:10,  4.27it/s, loss=0.111, v_num=0, reduced_train_loss=0.0202, global_step=1409.0, val_loss=0.132]
Epoch 19:   3%|▎         | 3/104 [00:00<00:23,  4.23it/s, loss=0.0151, v_num=0, reduced_train_loss=0.000636, global_step=1427.0, val_loss=0.100]
Epoch 18: 100%|██████████| 104/104 [00:20<00:00,  4.98it/s, loss=0.0198, v_num=0, reduced_train_loss=0.00206, global_step=

I0503 22:02:35.385702 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06310959160327911


Epoch 18:  72%|███████▏  | 75/104 [00:17<00:06,  4.29it/s, loss=0.0797, v_num=0, reduced_train_loss=0.0326, global_step=1424.0, val_loss=0.132]   
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 19:  18%|█▊        | 19/104 [00:04<00:19,  4.45it/s, loss=0.0204, v_num=0, reduced_train_loss=0.000392, global_step=1443.0, val_loss=0.100]
Epoch 19:  15%|█▌        | 16/104 [00:03<00:20,  4.31it/s, loss=0.0105, v_num=0, reduced_train_loss=0.00149, global_step=1439.0, val_loss=0.0631]
Epoch 19:  19%|█▉        | 20/104 [00:04<00:18,  4.46it/s, loss=0.0203, v_num=0, reduced_train_loss=0.000377, global_step=1444.0, val_loss=0.100]
Epoch 19:  20%|██        | 21/104 [00:04<00:18,  4.46it/s, loss=0.0199, v_num=0, reduced_train_loss=0.000814, global_step=1445.0, val_loss=0.100] 
Epoch 18:  77%|███████▋  | 80/104 [00:18<00:05,  4.40it/s, loss=0.0797, v_num=0, reduced_train_loss=0.0326, glob

I0503 22:02:42.309551 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1495768278837204


Epoch 19:  72%|███████▏  | 75/104 [00:16<00:06,  4.44it/s, loss=0.0612, v_num=0, reduced_train_loss=0.00342, global_step=1499.0, val_loss=0.100]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 19:  69%|██████▉   | 72/104 [00:16<00:07,  4.45it/s, loss=0.0254, v_num=0, reduced_train_loss=0.000933, global_step=1496.0, val_loss=0.0631]
Epoch 19:  70%|███████   | 73/104 [00:16<00:06,  4.45it/s, loss=0.0412, v_num=0, reduced_train_loss=0.322, global_step=1497.0, val_loss=0.0631]   
Epoch 19:  40%|████      | 42/104 [00:09<00:14,  4.36it/s, loss=0.09, v_num=0, reduced_train_loss=0.0494, global_step=1466.0, val_loss=0.150]
Epoch 19:  71%|███████   | 74/104 [00:16<00:06,  4.45it/s, loss=0.0414, v_num=0, reduced_train_loss=0.00311, global_step=1498.0, val_loss=0.0631]
Epoch 19:  41%|████▏     | 43/104 [00:09<00:13,  4.37it/s, loss=0.088, v_num=0, reduced_train_loss=0.00174, global_step=1467.0, val_loss=0.150]
Epoch 19:  72%|███████▏  | 75/104 

I0503 22:02:55.275534 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12376294285058975



Epoch 20:   1%|          | 1/104 [00:00<00:28,  3.57it/s, loss=0.0612, v_num=0, reduced_train_loss=0.00079, global_step=1500.0, val_loss=0.124]
Epoch 19:  98%|█████████▊| 102/104 [00:20<00:00,  5.04it/s, loss=0.053, v_num=0, reduced_train_loss=0.247, global_step=1499.0, val_loss=0.0631]
Epoch 20:   2%|▏         | 2/104 [00:00<00:25,  3.97it/s, loss=0.0611, v_num=0, reduced_train_loss=0.00253, global_step=1501.0, val_loss=0.124]
Epoch 19: 100%|██████████| 104/104 [00:20<00:00,  5.10it/s, loss=0.053, v_num=0, reduced_train_loss=0.247, global_step=1499.0, val_loss=0.0631]
2023-05-03 22:02:55,796 - root - INFO - val_loss: 0.06662103533744812
Epoch 19: 100%|██████████| 104/104 [00:20<00:00,  5.10it/s, loss=0.053, v_num=0, reduced_train_loss=0.247, global_step=1499.0, val_loss=0.0666]
Epoch 20:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.053, v_num=0, reduced_train_loss=0.247, global_step=1499.0, val_loss=0.0666]          

I0503 22:02:55.796039 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06662103533744812


Epoch 19:  72%|███████▏  | 75/104 [00:16<00:06,  4.42it/s, loss=0.0615, v_num=0, reduced_train_loss=0.00203, global_step=1499.0, val_loss=0.150]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 20:  17%|█▋        | 18/104 [00:04<00:19,  4.34it/s, loss=0.0222, v_num=0, reduced_train_loss=0.00198, global_step=1517.0, val_loss=0.124]
Epoch 19:  73%|███████▎  | 76/104 [00:17<00:06,  4.43it/s, loss=0.0615, v_num=0, reduced_train_loss=0.00203, global_step=1499.0, val_loss=0.150]
Epoch 20:  18%|█▊        | 19/104 [00:04<00:19,  4.34it/s, loss=0.0217, v_num=0, reduced_train_loss=0.00223, global_step=1518.0, val_loss=0.124]
Epoch 19:  75%|███████▌  | 78/104 [00:17<00:05,  4.49it/s, loss=0.0615, v_num=0, reduced_train_loss=0.00203, global_step=1499.0, val_loss=0.150]
Epoch 20:  19%|█▉        | 20/104 [00:04<00:19,  4.32it/s, loss=0.0215, v_num=0, reduced_train_loss=0.000222, global_step=1519.0, val_loss=0.124]
Epoch 20:  18%|█▊        | 19/104 [00

I0503 22:03:02.819531 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1222435012459755


Epoch 20:  72%|███████▏  | 75/104 [00:17<00:06,  4.36it/s, loss=0.0453, v_num=0, reduced_train_loss=0.00767, global_step=1574.0, val_loss=0.124]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 20:  72%|███████▏  | 75/104 [00:16<00:06,  4.46it/s, loss=0.0289, v_num=0, reduced_train_loss=0.00487, global_step=1574.0, val_loss=0.0666] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 20:  73%|███████▎  | 76/104 [00:17<00:06,  4.37it/s, loss=0.0453, v_num=0, reduced_train_loss=0.00767, global_step=1574.0, val_loss=0.124]
Epoch 20:  73%|███████▎  | 76/104 [00:17<00:06,  4.47it/s, loss=0.0289, v_num=0, reduced_train_loss=0.00487, global_step=1574.0, val_loss=0.0666]
Epoch 20:  41%|████▏     | 43/104 [00:10<00:14,  4.30it/s, loss=0.0489, v_num=0, reduced_train_loss=0.107, global_step=1542.0, val_loss=0.122]
Epoch 20:  74%|███████▍  | 

I0503 22:03:16.100631 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08515377342700958
I0503 22:03:16.223928 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.058985598385334015


Epoch 20:  72%|███████▏  | 75/104 [00:17<00:06,  4.28it/s, loss=0.0899, v_num=0, reduced_train_loss=0.00336, global_step=1574.0, val_loss=0.122]
Epoch 21:  17%|█▋        | 18/104 [00:04<00:19,  4.32it/s, loss=0.00879, v_num=0, reduced_train_loss=0.000545, global_step=1592.0, val_loss=0.059]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 21:  18%|█▊        | 19/104 [00:04<00:19,  4.33it/s, loss=0.0151, v_num=0, reduced_train_loss=0.0316, global_step=1593.0, val_loss=0.0852] 
Epoch 21:  18%|█▊        | 19/104 [00:04<00:19,  4.32it/s, loss=0.00878, v_num=0, reduced_train_loss=0.000244, global_step=1593.0, val_loss=0.059]
Epoch 21:  19%|█▉        | 20/104 [00:04<00:19,  4.33it/s, loss=0.0148, v_num=0, reduced_train_loss=0.00184, global_step=1594.0, val_loss=0.0852]
Epoch 21:  19%|█▉        | 20/104 [00:04<00:19,  4.32it/s, loss=0.00856, v_num=0, reduced_train_loss=0.000602, global_step=1594.0, val_loss=0.059]
Epoch 21:  20%|██        | 21/104 [00:04<00:19,  4.34it/s, loss=0.

I0503 22:03:24.043118 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09656734764575958


Epoch 21:  72%|███████▏  | 75/104 [00:17<00:06,  4.39it/s, loss=0.0282, v_num=0, reduced_train_loss=0.000323, global_step=1649.0, val_loss=0.059] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 21:  72%|███████▏  | 75/104 [00:17<00:06,  4.34it/s, loss=0.0453, v_num=0, reduced_train_loss=0.111, global_step=1649.0, val_loss=0.0852]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 21:  38%|███▊      | 40/104 [00:09<00:15,  4.21it/s, loss=0.0755, v_num=0, reduced_train_loss=0.0181, global_step=1614.0, val_loss=0.0966] 
Epoch 21:  73%|███████▎  | 76/104 [00:17<00:06,  4.35it/s, loss=0.0453, v_num=0, reduced_train_loss=0.111, global_step=1649.0, val_loss=0.0852]
Epoch 21:  74%|███████▍  | 77/104 [00:17<00:06,  4.43it/s, loss=0.0282, v_num=0, reduced_train_loss=0.000323, global_step=1649.0, val_loss=0.059]
Epoch 21:  74%|███████▍  | 77/

I0503 22:03:36.949040 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06703179329633713
I0503 22:03:36.967534 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12073282152414322


Epoch 21:  72%|███████▏  | 75/104 [00:17<00:06,  4.23it/s, loss=0.129, v_num=0, reduced_train_loss=0.00104, global_step=1649.0, val_loss=0.0966] 
Epoch 22:  20%|██        | 21/104 [00:04<00:19,  4.31it/s, loss=0.0151, v_num=0, reduced_train_loss=0.000223, global_step=1670.0, val_loss=0.067]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 22:  20%|██        | 21/104 [00:04<00:19,  4.22it/s, loss=0.018, v_num=0, reduced_train_loss=0.0169, global_step=1670.0, val_loss=0.121]
Epoch 22:  21%|██        | 22/104 [00:05<00:19,  4.31it/s, loss=0.0152, v_num=0, reduced_train_loss=0.00456, global_step=1671.0, val_loss=0.067] 
Epoch 22:  21%|██        | 22/104 [00:05<00:19,  4.21it/s, loss=0.018, v_num=0, reduced_train_loss=0.00143, global_step=1671.0, val_loss=0.121]
Epoch 22:  22%|██▏       | 23/104 [00:05<00:18,  4.32it/s, loss=0.0152, v_num=0, reduced_train_loss=0.000811, global_step=1672.0, val_loss=0.067]
Epoch 22:  22%|██▏       | 23/104 [00:05<00:19,  4.22it/s, loss=0.018, v_n

I0503 22:03:45.414415 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10958194732666016


Epoch 22:  72%|███████▏  | 75/104 [00:16<00:06,  4.46it/s, loss=0.0408, v_num=0, reduced_train_loss=0.000156, global_step=1724.0, val_loss=0.067]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 22:  35%|███▍      | 36/104 [00:08<00:15,  4.28it/s, loss=0.0619, v_num=0, reduced_train_loss=0.0695, global_step=1685.0, val_loss=0.110] 
Epoch 22:  36%|███▌      | 37/104 [00:08<00:15,  4.28it/s, loss=0.0618, v_num=0, reduced_train_loss=0.00377, global_step=1686.0, val_loss=0.110]
Epoch 22:  74%|███████▍  | 77/104 [00:17<00:06,  4.49it/s, loss=0.0408, v_num=0, reduced_train_loss=0.000156, global_step=1724.0, val_loss=0.067]
Epoch 22:  72%|███████▏  | 75/104 [00:17<00:06,  4.35it/s, loss=0.0373, v_num=0, reduced_train_loss=0.035, global_step=1724.0, val_loss=0.121]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 22:  37%|███▋      | 38/104 [00:08<00:15,  4.28it/s, loss=0.0602, v_num=0, reduced_train_

I0503 22:03:57.270401 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05105328932404518



Epoch 22:  50%|█████     | 52/104 [00:12<00:12,  4.29it/s, loss=0.0574, v_num=0, reduced_train_loss=0.00829, global_step=1701.0, val_loss=0.110] 
Epoch 22:  99%|█████████▉| 103/104 [00:20<00:00,  4.98it/s, loss=0.0373, v_num=0, reduced_train_loss=0.035, global_step=1724.0, val_loss=0.121]
Epoch 22: 100%|██████████| 104/104 [00:20<00:00,  5.02it/s, loss=0.0373, v_num=0, reduced_train_loss=0.035, global_step=1724.0, val_loss=0.121]
2023-05-03 22:03:57,709 - root - INFO - val_loss: 0.07210730016231537
Epoch 22: 100%|██████████| 104/104 [00:20<00:00,  5.01it/s, loss=0.0373, v_num=0, reduced_train_loss=0.035, global_step=1724.0, val_loss=0.0721]
Epoch 23:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.0373, v_num=0, reduced_train_loss=0.035, global_step=1724.0, val_loss=0.0721]          

I0503 22:03:57.709430 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07210730016231537


Epoch 22:  72%|███████▏  | 75/104 [00:17<00:06,  4.19it/s, loss=0.11, v_num=0, reduced_train_loss=0.0947, global_step=1724.0, val_loss=0.110]
Epoch 23:  24%|██▍       | 25/104 [00:06<00:19,  4.13it/s, loss=0.0297, v_num=0, reduced_train_loss=0.00012, global_step=1749.0, val_loss=0.0511]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 23:  25%|██▌       | 26/104 [00:06<00:18,  4.14it/s, loss=0.0297, v_num=0, reduced_train_loss=0.00146, global_step=1750.0, val_loss=0.0511]
Epoch 23:  24%|██▍       | 25/104 [00:06<00:18,  4.16it/s, loss=0.0319, v_num=0, reduced_train_loss=0.256, global_step=1749.0, val_loss=0.0721]   
Epoch 23:  26%|██▌       | 27/104 [00:06<00:18,  4.15it/s, loss=0.0298, v_num=0, reduced_train_loss=0.000693, global_step=1751.0, val_loss=0.0511]
Epoch 23:  25%|██▌       | 26/104 [00:06<00:18,  4.17it/s, loss=0.0318, v_num=0, reduced_train_loss=0.000581, global_step=1750.0, val_loss=0.0721

I0503 22:04:06.927616 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10228592157363892


Epoch 23:  72%|███████▏  | 75/104 [00:17<00:06,  4.29it/s, loss=0.0333, v_num=0, reduced_train_loss=0.000475, global_step=1799.0, val_loss=0.0511]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 23:  72%|███████▏  | 75/104 [00:17<00:06,  4.36it/s, loss=0.0405, v_num=0, reduced_train_loss=0.0434, global_step=1799.0, val_loss=0.0721]
Epoch 23:  33%|███▎      | 34/104 [00:08<00:16,  4.24it/s, loss=0.0345, v_num=0, reduced_train_loss=0.000503, global_step=1758.0, val_loss=0.102]
Epoch 23:  73%|███████▎  | 76/104 [00:17<00:06,  4.30it/s, loss=0.0333, v_num=0, reduced_train_loss=0.000475, global_step=1799.0, val_loss=0.0511]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 23:  74%|███████▍  | 77/104 [00:17<00:06,  4.32it/s, loss=0.0333, v_num=0, reduced_train_loss=0.000475, global_step=1799.0, val_loss=0.0511]
Epoch 23:  34%|███▎      | 35/104 [00:08<00:16,  4.25it/s, 

I0503 22:04:18.396652 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10322674363851547
I0503 22:04:18.452578 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06687038391828537


Epoch 23:  72%|███████▏  | 75/104 [00:17<00:06,  4.30it/s, loss=0.0575, v_num=0, reduced_train_loss=0.0495, global_step=1799.0, val_loss=0.102]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 24:  26%|██▌       | 27/104 [00:06<00:17,  4.43it/s, loss=0.0185, v_num=0, reduced_train_loss=0.000379, global_step=1826.0, val_loss=0.0669]
Epoch 24:  27%|██▋       | 28/104 [00:06<00:17,  4.46it/s, loss=0.0173, v_num=0, reduced_train_loss=0.00103, global_step=1827.0, val_loss=0.103] 
Epoch 24:  27%|██▋       | 28/104 [00:06<00:17,  4.43it/s, loss=0.0198, v_num=0, reduced_train_loss=0.0281, global_step=1827.0, val_loss=0.0669]  
Epoch 24:  28%|██▊       | 29/104 [00:06<00:16,  4.46it/s, loss=0.0173, v_num=0, reduced_train_loss=0.000424, global_step=1828.0, val_loss=0.103]
Epoch 24:  28%|██▊       | 29/104 [00:06<00:16,  4.43it/s, loss=0.0227, v_num=0, reduced_train_loss=0.0769, global_step=1828.0, val_loss=0.0669]
Epoch 24:  29%|██▉       | 30/104 

I0503 22:04:27.929568 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07757751643657684


Epoch 24:  72%|███████▏  | 75/104 [00:16<00:06,  4.42it/s, loss=0.0167, v_num=0, reduced_train_loss=0.000293, global_step=1874.0, val_loss=0.103] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 24:  72%|███████▏  | 75/104 [00:17<00:06,  4.39it/s, loss=0.127, v_num=0, reduced_train_loss=0.0132, global_step=1874.0, val_loss=0.0669]
Validation: 0it [00:00, ?it/s]
Epoch 24:  73%|███████▎  | 76/104 [00:17<00:06,  4.43it/s, loss=0.0167, v_num=0, reduced_train_loss=0.000293, global_step=1874.0, val_loss=0.103]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 24:  74%|███████▍  | 77/104 [00:17<00:06,  4.45it/s, loss=0.0167, v_num=0, reduced_train_loss=0.000293, global_step=1874.0, val_loss=0.103]
Epoch 24:  33%|███▎      | 34/104 [00:07<00:16,  4.36it/s, loss=0.0789, v_num=0, reduced_train_loss=0.0199, global_step=1833.0, val_loss=0.0776]
Epoch 24:  75%|███████▌  | 7

I0503 22:04:39.055444 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06608252972364426
I0503 22:04:39.119512 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10501304268836975



Epoch 24: 100%|██████████| 104/104 [00:20<00:00,  5.03it/s, loss=0.127, v_num=0, reduced_train_loss=0.0132, global_step=1874.0, val_loss=0.0669]
2023-05-03 22:04:39,119 - root - INFO - val_loss: 0.10501304268836975
Epoch 24: 100%|██████████| 104/104 [00:20<00:00,  5.03it/s, loss=0.127, v_num=0, reduced_train_loss=0.0132, global_step=1874.0, val_loss=0.105] 
Epoch 24:  72%|███████▏  | 75/104 [00:17<00:06,  4.35it/s, loss=0.0823, v_num=0, reduced_train_loss=0.213, global_step=1874.0, val_loss=0.0776]   
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 25:  26%|██▌       | 27/104 [00:06<00:17,  4.36it/s, loss=0.0317, v_num=0, reduced_train_loss=0.0254, global_step=1901.0, val_loss=0.105]  
Epoch 25:  26%|██▌       | 27/104 [00:06<00:18,  4.26it/s, loss=0.0735, v_num=0, reduced_train_loss=0.202, global_step=1901.0, val_loss=0.0661]  
Epoch 25:  27%|██▋       | 28/104 [00:06<00:17,  4.35it/s, loss=0.0438, v_num=0, reduced_train_loss=0.255, global

I0503 22:04:48.798007 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0865817442536354


Epoch 25:  31%|███       | 32/104 [00:07<00:16,  4.31it/s, loss=0.0562, v_num=0, reduced_train_loss=0.00227, global_step=1906.0, val_loss=0.0866]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 25:  72%|███████▏  | 75/104 [00:17<00:06,  4.33it/s, loss=0.0134, v_num=0, reduced_train_loss=0.00152, global_step=1949.0, val_loss=0.0661] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/29 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s]
Epoch 25:  32%|███▏      | 33/104 [00:07<00:16,  4.31it/s, loss=0.0506, v_num=0, reduced_train_loss=0.00055, global_step=1907.0, val_loss=0.0866]
Epoch 25:  74%|███████▍  | 77/104 [00:17<00:06,  4.42it/s, loss=0.0246, v_num=0, reduced_train_loss=0.00934, global_step=1949.0, val_loss=0.105]
Epoch 25:  73%|███████▎  | 76/104 [00:17<00:06,  4.34it/s, loss=0.0134, v_num=0, reduced_train_loss=0.00152, global_step=1949.0, val_loss=0.0661]
Epoch 25:  33%|███▎      |

I0503 22:04:59.836790 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08488143235445023


Epoch 25:  46%|████▌     | 48/104 [00:11<00:12,  4.32it/s, loss=0.0517, v_num=0, reduced_train_loss=0.0465, global_step=1922.0, val_loss=0.0866]
Epoch 25:  99%|█████████▉| 103/104 [00:20<00:00,  4.93it/s, loss=0.0134, v_num=0, reduced_train_loss=0.00152, global_step=1949.0, val_loss=0.0661]
Epoch 25: 100%|██████████| 104/104 [00:20<00:00,  4.96it/s, loss=0.0134, v_num=0, reduced_train_loss=0.00152, global_step=1949.0, val_loss=0.0661]2023-05-03 22:05:00,037 - root - INFO - val_loss: 0.07081040740013123
Epoch 25: 100%|██████████| 104/104 [00:20<00:00,  4.96it/s, loss=0.0134, v_num=0, reduced_train_loss=0.00152, global_step=1949.0, val_loss=0.0708]
Epoch 26:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.0134, v_num=0, reduced_train_loss=0.00152, global_step=1949.0, val_loss=0.0708]          

I0503 22:05:00.037827 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07081040740013123


Epoch 25:  72%|███████▏  | 75/104 [00:17<00:06,  4.32it/s, loss=0.1, v_num=0, reduced_train_loss=0.0249, global_step=1949.0, val_loss=0.0866]   ]  
Epoch 26:  26%|██▌       | 27/104 [00:06<00:18,  4.26it/s, loss=0.0242, v_num=0, reduced_train_loss=0.000109, global_step=1976.0, val_loss=0.0849]
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 26:  26%|██▌       | 27/104 [00:06<00:17,  4.37it/s, loss=0.0155, v_num=0, reduced_train_loss=0.00076, global_step=1976.0, val_loss=0.0708]
Epoch 26:  27%|██▋       | 28/104 [00:06<00:17,  4.37it/s, loss=0.0157, v_num=0, reduced_train_loss=0.00434, global_step=1977.0, val_loss=0.0708]]
Epoch 25:  74%|███████▍  | 77/104 [00:17<00:06,  4.36it/s, loss=0.1, v_num=0, reduced_train_loss=0.0249, global_step=1949.0, val_loss=0.0866]
Epoch 26:  28%|██▊       | 29/104 [00:06<00:17,  4.37it/s, loss=0.0154, v_num=0, reduced_train_loss=9.84e-5, global_step=1978.0, val_loss=0.0708]]
Epoch 25:  76%|███████▌  | 79/104 [00:17<00:05,  4.41it/s, loss=0.1, 

I0503 22:05:09.829492 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10879460722208023


Epoch 26:  72%|███████▏  | 75/104 [00:16<00:06,  4.43it/s, loss=0.0567, v_num=0, reduced_train_loss=0.0503, global_step=2024.0, val_loss=0.0849] ] 
Epoch 26:  29%|██▉       | 30/104 [00:06<00:17,  4.30it/s, loss=0.0467, v_num=0, reduced_train_loss=0.00522, global_step=1979.0, val_loss=0.109]
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 26:  71%|███████   | 74/104 [00:16<00:06,  4.39it/s, loss=0.0164, v_num=0, reduced_train_loss=0.000492, global_step=2023.0, val_loss=0.0708]
Epoch 26:  30%|██▉       | 31/104 [00:07<00:16,  4.31it/s, loss=0.0277, v_num=0, reduced_train_loss=0.0107, global_step=1980.0, val_loss=0.109] 
Epoch 26:  72%|███████▏  | 75/104 [00:17<00:06,  4.39it/s, loss=0.0164, v_num=0, reduced_train_loss=0.00115, global_step=2024.0, val_loss=0.0708] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 26:  31%|███       | 32/104 [00:07<00:16,  4.31it/s, 

I0503 22:05:20.230773 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09227147698402405



Epoch 27:   1%|          | 1/104 [00:00<00:27,  3.76it/s, loss=0.0419, v_num=0, reduced_train_loss=0.000283, global_step=2025.0, val_loss=0.0923]]
Epoch 26:  98%|█████████▊| 102/104 [00:20<00:00,  4.97it/s, loss=0.0164, v_num=0, reduced_train_loss=0.00115, global_step=2024.0, val_loss=0.0708]
Epoch 27:   2%|▏         | 2/104 [00:00<00:24,  4.13it/s, loss=0.0418, v_num=0, reduced_train_loss=0.000862, global_step=2026.0, val_loss=0.0923]]
Epoch 26: 100%|██████████| 104/104 [00:20<00:00,  5.02it/s, loss=0.0164, v_num=0, reduced_train_loss=0.00115, global_step=2024.0, val_loss=0.0708]2023-05-03 22:05:20,767 - root - INFO - val_loss: 0.08090215921401978
Epoch 26: 100%|██████████| 104/104 [00:20<00:00,  5.02it/s, loss=0.0164, v_num=0, reduced_train_loss=0.00115, global_step=2024.0, val_loss=0.0809]
Epoch 27:   0%|          | 0/104 [00:00<?, ?it/s, loss=0.0164, v_num=0, reduced_train_loss=0.00115, global_step=2024.0, val_loss=0.0809]          

I0503 22:05:20.767777 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08090215921401978


Epoch 26:  72%|███████▏  | 75/104 [00:17<00:06,  4.37it/s, loss=0.0426, v_num=0, reduced_train_loss=0.00393, global_step=2024.0, val_loss=0.109]9] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 27:  27%|██▋       | 28/104 [00:06<00:17,  4.42it/s, loss=0.00142, v_num=0, reduced_train_loss=0.00222, global_step=2052.0, val_loss=0.0809]
Epoch 27:  31%|███       | 32/104 [00:07<00:15,  4.52it/s, loss=0.0326, v_num=0, reduced_train_loss=0.000112, global_step=2056.0, val_loss=0.0923]
Epoch 27:  28%|██▊       | 29/104 [00:06<00:16,  4.42it/s, loss=0.00146, v_num=0, reduced_train_loss=0.00132, global_step=2053.0, val_loss=0.0809]
Epoch 27:  29%|██▉       | 30/104 [00:06<00:16,  4.42it/s, loss=0.00146, v_num=0, reduced_train_loss=0.000283, global_step=2054.0, val_loss=0.0809]
Epoch 26:  76%|███████▌  | 79/104 [00:17<00:05,  4.46it/s, loss=0.0426, v_num=0, reduced_train_loss=0.00393, global_step=2024.0, val_loss=0.109]
Epoch 27:  30%|██▉       | 31/

I0503 22:05:30.584585 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10075750201940536


Epoch 27:  72%|███████▏  | 75/104 [00:16<00:06,  4.54it/s, loss=0.0365, v_num=0, reduced_train_loss=0.0878, global_step=2099.0, val_loss=0.0923]   
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 27:  27%|██▋       | 28/104 [00:06<00:17,  4.47it/s, loss=0.0498, v_num=0, reduced_train_loss=0.00133, global_step=2052.0, val_loss=0.101]
Epoch 27:  68%|██████▊   | 71/104 [00:16<00:07,  4.40it/s, loss=0.0484, v_num=0, reduced_train_loss=0.000408, global_step=2095.0, val_loss=0.0809]
Epoch 27:  69%|██████▉   | 72/104 [00:16<00:07,  4.40it/s, loss=0.0235, v_num=0, reduced_train_loss=0.000211, global_step=2096.0, val_loss=0.0809]
Epoch 27:  75%|███████▌  | 78/104 [00:16<00:05,  4.61it/s, loss=0.0365, v_num=0, reduced_train_loss=0.0878, global_step=2099.0, val_loss=0.0923]
Epoch 27:  70%|███████   | 73/104 [00:16<00:07,  4.40it/s, loss=0.0225, v_num=0, reduced_train_loss=0.000209, global_step=2097.0, val_loss=0.0809]
Epoch 27:  77%|███████▋  | 80/104

I0503 22:05:40.205705 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1072986051440239



Epoch 28:   1%|          | 1/104 [00:00<00:27,  3.78it/s, loss=0.0309, v_num=0, reduced_train_loss=0.0362, global_step=2100.0, val_loss=0.107]] ]
Epoch 27:  93%|█████████▎| 97/104 [00:19<00:01,  4.91it/s, loss=0.022, v_num=0, reduced_train_loss=0.000745, global_step=2099.0, val_loss=0.0809]
Epoch 28:   2%|▏         | 2/104 [00:00<00:24,  4.15it/s, loss=0.0309, v_num=0, reduced_train_loss=0.000417, global_step=2101.0, val_loss=0.107]]
Epoch 27:  44%|████▍     | 46/104 [00:10<00:12,  4.47it/s, loss=0.0579, v_num=0, reduced_train_loss=0.0198, global_step=2070.0, val_loss=0.101]  
Epoch 28:   3%|▎         | 3/104 [00:00<00:23,  4.29it/s, loss=0.031, v_num=0, reduced_train_loss=0.00198, global_step=2102.0, val_loss=0.107]  9]
Epoch 27:  45%|████▌     | 47/104 [00:10<00:12,  4.47it/s, loss=0.0579, v_num=0, reduced_train_loss=0.00136, global_step=2071.0, val_loss=0.101]9]
Epoch 28:   4%|▍         | 4/104 [00:00<00:22,  4.36it/s, loss=0.0228, v_num=0, reduced_train_loss=0.00668, global_step=2

I0503 22:05:41.308671 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07239259779453278


Epoch 27:  72%|███████▏  | 75/104 [00:16<00:06,  4.44it/s, loss=0.0265, v_num=0, reduced_train_loss=0.0403, global_step=2099.0, val_loss=0.101]24] 
Epoch 28:  32%|███▏      | 33/104 [00:07<00:15,  4.52it/s, loss=0.00822, v_num=0, reduced_train_loss=0.000432, global_step=2132.0, val_loss=0.107]
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 28:  27%|██▋       | 28/104 [00:06<00:16,  4.48it/s, loss=0.00234, v_num=0, reduced_train_loss=0.00209, global_step=2127.0, val_loss=0.0724]
Epoch 28:  28%|██▊       | 29/104 [00:06<00:16,  4.48it/s, loss=0.00243, v_num=0, reduced_train_loss=0.00222, global_step=2128.0, val_loss=0.0724]
Epoch 27:  74%|███████▍  | 77/104 [00:17<00:06,  4.47it/s, loss=0.0265, v_num=0, reduced_train_loss=0.0403, global_step=2099.0, val_loss=0.101]
Epoch 28:  29%|██▉       | 30/104 [00:06<00:16,  4.48it/s, loss=0.0115, v_num=0, reduced_train_loss=0.182, global_step=2129.0, val_loss=0.0724]   
Epoch 28:  35%|███▍      | 36/104 [00:07<00:15,  4.52it/s, loss=0.

I0503 22:05:51.044439 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10182167589664459


Epoch 28:  66%|██████▋   | 69/104 [00:15<00:07,  4.49it/s, loss=0.0377, v_num=0, reduced_train_loss=0.00179, global_step=2168.0, val_loss=0.0724]  
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 28:  24%|██▍       | 25/104 [00:05<00:18,  4.31it/s, loss=0.072, v_num=0, reduced_train_loss=0.463, global_step=2123.0, val_loss=0.102]
Epoch 28:  67%|██████▋   | 70/104 [00:15<00:07,  4.49it/s, loss=0.0365, v_num=0, reduced_train_loss=0.00203, global_step=2169.0, val_loss=0.0724]
Epoch 28:  25%|██▌       | 26/104 [00:06<00:18,  4.31it/s, loss=0.075, v_num=0, reduced_train_loss=0.0482, global_step=2125.0, val_loss=0.102] ]
Epoch 28:  68%|██████▊   | 71/104 [00:15<00:07,  4.49it/s, loss=0.0421, v_num=0, reduced_train_loss=0.118, global_step=2170.0, val_loss=0.0724]  
Epoch 28:  69%|██████▉   | 72/104 [00:16<00:07,  4.49it/s, loss=0.0421, v_num=0, reduced_train_loss=0.000289, global_step=2171.0, val_loss=0.0724]
Epoch 28:  77%|███████▋  | 80/104 [00:

I0503 22:06:00.268648 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12369856238365173


Epoch 28:  38%|███▊      | 40/104 [00:09<00:14,  4.31it/s, loss=0.059, v_num=0, reduced_train_loss=0.021, global_step=2139.0, val_loss=0.102] 
Epoch 28:  89%|████████▉ | 93/104 [00:19<00:02,  4.89it/s, loss=0.0401, v_num=0, reduced_train_loss=0.000621, global_step=2174.0, val_loss=0.0724]
Epoch 28:  39%|███▉      | 41/104 [00:09<00:14,  4.31it/s, loss=0.0586, v_num=0, reduced_train_loss=0.000635, global_step=2140.0, val_loss=0.102]]
Epoch 28:  91%|█████████▏| 95/104 [00:19<00:01,  4.93it/s, loss=0.0401, v_num=0, reduced_train_loss=0.000621, global_step=2174.0, val_loss=0.0724]
Epoch 28:  40%|████      | 42/104 [00:09<00:14,  4.31it/s, loss=0.0706, v_num=0, reduced_train_loss=0.242, global_step=2141.0, val_loss=0.102]   ]
Epoch 28:  93%|█████████▎| 97/104 [00:19<00:01,  4.97it/s, loss=0.0401, v_num=0, reduced_train_loss=0.000621, global_step=2174.0, val_loss=0.0724]
Epoch 28:  41%|████▏     | 43/104 [00:09<00:14,  4.32it/s, loss=0.0694, v_num=0, reduced_train_loss=0.00593, global_step=2

I0503 22:06:01.631435 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.058132462203502655


Epoch 28:  72%|███████▏  | 75/104 [00:17<00:06,  4.35it/s, loss=0.0319, v_num=0, reduced_train_loss=0.00532, global_step=2174.0, val_loss=0.102]  
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 29:  34%|███▎      | 35/104 [00:08<00:16,  4.24it/s, loss=0.0242, v_num=0, reduced_train_loss=0.0258, global_step=2209.0, val_loss=0.124] 1]
Epoch 29:  30%|██▉       | 31/104 [00:07<00:16,  4.38it/s, loss=0.0139, v_num=0, reduced_train_loss=0.000729, global_step=2205.0, val_loss=0.0581]
Epoch 29:  35%|███▍      | 36/104 [00:08<00:16,  4.23it/s, loss=0.0332, v_num=0, reduced_train_loss=0.231, global_step=2210.0, val_loss=0.124] ]
Epoch 29:  31%|███       | 32/104 [00:07<00:16,  4.38it/s, loss=0.014, v_num=0, reduced_train_loss=0.00111, global_step=2206.0, val_loss=0.0581]  
Epoch 29:  36%|███▌      | 37/104 [00:08<00:15,  4.22it/s, loss=0.0332, v_num=0, reduced_train_loss=0.000456, gl

I0503 22:06:11.835184 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12704403698444366


Epoch 29:  72%|███████▏  | 75/104 [00:17<00:06,  4.29it/s, loss=0.0644, v_num=0, reduced_train_loss=0.00022, global_step=2249.0, val_loss=0.124] 1]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 29:  26%|██▌       | 27/104 [00:06<00:17,  4.46it/s, loss=0.0606, v_num=0, reduced_train_loss=0.00242, global_step=2201.0, val_loss=0.127]81]
Epoch 29:  73%|███████▎  | 76/104 [00:17<00:06,  4.30it/s, loss=0.0644, v_num=0, reduced_train_loss=0.00022, global_step=2249.0, val_loss=0.124]
Epoch 29:  27%|██▋       | 28/104 [00:06<00:17,  4.46it/s, loss=0.0605, v_num=0, reduced_train_loss=0.00174, global_step=2202.0, val_loss=0.127]1] 
Epoch 29:  75%|███████▌  | 78/104 [00:17<00:05,  4.35it/s, loss=0.0644, v_num=0, reduced_train_loss=0.00022, global_step=2249.0, val_loss=0.124]
Epoch 29:  28%|██▊       | 29/104 [00:06<00:16,  4.46it/s, loss=0.0607, v_num=0, reduced_train_loss=0.0384, global_step=2203.0, val_loss=0.127] 81]
Epoch 29:  72%|███████▏  | 75/

I0503 22:06:21.345245 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.13158762454986572



Epoch 29:  41%|████▏     | 43/104 [00:09<00:13,  4.47it/s, loss=0.0567, v_num=0, reduced_train_loss=0.326, global_step=2217.0, val_loss=0.127]]
Epoch 29:  42%|████▏     | 44/104 [00:09<00:13,  4.47it/s, loss=0.0571, v_num=0, reduced_train_loss=0.00784, global_step=2218.0, val_loss=0.127]
Epoch 29:  96%|█████████▌| 100/104 [00:20<00:00,  4.99it/s, loss=0.0226, v_num=0, reduced_train_loss=0.320, global_step=2249.0, val_loss=0.0581]
Epoch 29:  43%|████▎     | 45/104 [00:10<00:13,  4.47it/s, loss=0.0534, v_num=0, reduced_train_loss=0.178, global_step=2219.0, val_loss=0.127]  
Epoch 29:  98%|█████████▊| 102/104 [00:20<00:00,  5.03it/s, loss=0.0226, v_num=0, reduced_train_loss=0.320, global_step=2249.0, val_loss=0.0581]
Epoch 30:   3%|▎         | 3/104 [00:00<00:24,  4.19it/s, loss=0.0452, v_num=0, reduced_train_loss=0.00283, global_step=2252.0, val_loss=0.132]]
Epoch 29: 100%|██████████| 104/104 [00:20<00:00,  5.08it/s, loss=0.0226, v_num=0, reduced_train_loss=0.320, global_step=2249.0, va

I0503 22:06:22.121835 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06653852760791779


Epoch 29:  72%|███████▏  | 75/104 [00:16<00:06,  4.49it/s, loss=0.0912, v_num=0, reduced_train_loss=0.180, global_step=2249.0, val_loss=0.127]65]  
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 30:  28%|██▊       | 29/104 [00:06<00:16,  4.44it/s, loss=0.0186, v_num=0, reduced_train_loss=0.00183, global_step=2278.0, val_loss=0.0665]
Epoch 30:  32%|███▏      | 33/104 [00:07<00:15,  4.44it/s, loss=0.052, v_num=0, reduced_train_loss=0.00214, global_step=2282.0, val_loss=0.132]
Epoch 30:  29%|██▉       | 30/104 [00:06<00:16,  4.44it/s, loss=0.0186, v_num=0, reduced_train_loss=6.37e-5, global_step=2279.0, val_loss=0.0665]
Epoch 30:  33%|███▎      | 34/104 [00:07<00:15,  4.44it/s, loss=0.0522, v_num=0, reduced_train_loss=0.00413, global_step=2283.0, val_loss=0.132]
Epoch 30:  30%|██▉       | 31/104 [00:06<00:16,  4.44it/s, loss=0.0186, v_num=0, reduced_train_loss=0.00155, global_step=2280.0, val_loss=0.0665]
Epoch 30:  34%|███▎      | 35/104 [00

I0503 22:06:32.054435 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12139938771724701


Epoch 30:  72%|███████▏  | 75/104 [00:16<00:06,  4.47it/s, loss=0.0294, v_num=0, reduced_train_loss=0.00433, global_step=2324.0, val_loss=0.132]  
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 30:  26%|██▌       | 27/104 [00:06<00:17,  4.34it/s, loss=0.0419, v_num=0, reduced_train_loss=0.00396, global_step=2276.0, val_loss=0.121]5]
Epoch 30:  70%|███████   | 73/104 [00:16<00:06,  4.48it/s, loss=0.0128, v_num=0, reduced_train_loss=0.000194, global_step=2322.0, val_loss=0.0665]
Epoch 30:  27%|██▋       | 28/104 [00:06<00:17,  4.34it/s, loss=0.0485, v_num=0, reduced_train_loss=0.144, global_step=2277.0, val_loss=0.121]  
Epoch 30:  71%|███████   | 74/104 [00:16<00:06,  4.48it/s, loss=0.013, v_num=0, reduced_train_loss=0.00407, global_step=2323.0, val_loss=0.0665]  
Epoch 30:  28%|██▊       | 29/104 [00:06<00:17,  4.34it/s, loss=0.0488, v_num=0, reduced_train_loss=0.00594, global_step=2278.0, val_loss=0.121]
Epoch 30:  72%|███████▏  | 75/104 

I0503 22:06:41.707571 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.11343726515769958



Epoch 31:   1%|          | 1/104 [00:00<00:27,  3.75it/s, loss=0.0294, v_num=0, reduced_train_loss=0.000232, global_step=2325.0, val_loss=0.113] ]
Epoch 30:  97%|█████████▋| 101/104 [00:19<00:00,  5.07it/s, loss=0.013, v_num=0, reduced_train_loss=0.000626, global_step=2324.0, val_loss=0.0665]
Epoch 31:   2%|▏         | 2/104 [00:00<00:24,  4.11it/s, loss=0.0295, v_num=0, reduced_train_loss=0.0018, global_step=2326.0, val_loss=0.113]  5]
Epoch 30:  99%|█████████▉| 103/104 [00:20<00:00,  5.11it/s, loss=0.013, v_num=0, reduced_train_loss=0.000626, global_step=2324.0, val_loss=0.0665]
Epoch 30: 100%|██████████| 104/104 [00:20<00:00,  5.14it/s, loss=0.013, v_num=0, reduced_train_loss=0.000626, global_step=2324.0, val_loss=0.0665]2023-05-03 22:06:42,364 - root - INFO - val_loss: 0.07450435310602188
Epoch 30: 100%|██████████| 104/104 [00:20<00:00,  5.14it/s, loss=0.013, v_num=0, reduced_train_loss=0.000626, global_step=2324.0, val_loss=0.0745]
Epoch 31:   0%|          | 0/104 [00:00<?, ?it/s

I0503 22:06:42.364561 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07450435310602188


Epoch 30:  72%|███████▏  | 75/104 [00:17<00:06,  4.37it/s, loss=0.0502, v_num=0, reduced_train_loss=0.0728, global_step=2324.0, val_loss=0.121] ]] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 31:  33%|███▎      | 34/104 [00:07<00:15,  4.43it/s, loss=0.00779, v_num=0, reduced_train_loss=0.000661, global_step=2358.0, val_loss=0.113]
Epoch 31:  30%|██▉       | 31/104 [00:07<00:16,  4.36it/s, loss=0.0118, v_num=0, reduced_train_loss=0.00227, global_step=2355.0, val_loss=0.0745] 
Epoch 31:  34%|███▎      | 35/104 [00:07<00:15,  4.43it/s, loss=0.00777, v_num=0, reduced_train_loss=0.000491, global_step=2359.0, val_loss=0.113]
Epoch 31:  31%|███       | 32/104 [00:07<00:16,  4.36it/s, loss=0.012, v_num=0, reduced_train_loss=0.00262, global_step=2356.0, val_loss=0.0745] 
Epoch 31:  35%|███▍      | 36/104 [00:08<00:15,  4.44it/s, loss=0.00861, v_num=0, reduced_train_loss=0.0169, global_step=2360.0, val_loss=0.113]  
Epoch 31:  32%|███▏      | 33/

I0503 22:06:52.747304 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09422073513269424


Epoch 31:  25%|██▌       | 26/104 [00:05<00:17,  4.46it/s, loss=0.0327, v_num=0, reduced_train_loss=0.00106, global_step=2350.0, val_loss=0.0942]] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 31:  26%|██▌       | 27/104 [00:06<00:17,  4.46it/s, loss=0.0182, v_num=0, reduced_train_loss=0.000504, global_step=2351.0, val_loss=0.0942]
Epoch 31:  74%|███████▍  | 77/104 [00:17<00:06,  4.49it/s, loss=0.0223, v_num=0, reduced_train_loss=0.00135, global_step=2399.0, val_loss=0.113]
Epoch 31:  27%|██▋       | 28/104 [00:06<00:17,  4.46it/s, loss=0.0197, v_num=0, reduced_train_loss=0.0434, global_step=2352.0, val_loss=0.0942]  
Epoch 31:  68%|██████▊   | 71/104 [00:16<00:07,  4.21it/s, loss=0.0418, v_num=0, reduced_train_loss=0.000209, global_step=2395.0, val_loss=0.0745]
Epoch 31:  77%|███████▋  | 80/104 [00:17<00:05,  4.56it/s, loss=0.0223, v_num=0, reduced_train_loss=0.00135, gl

I0503 22:07:02.166963 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.11151871085166931



Epoch 32:   1%|          | 1/104 [00:00<00:27,  3.75it/s, loss=0.0212, v_num=0, reduced_train_loss=0.000181, global_step=2400.0, val_loss=0.112]]]
Epoch 31:  42%|████▏     | 44/104 [00:09<00:13,  4.48it/s, loss=0.0206, v_num=0, reduced_train_loss=0.0787, global_step=2368.0, val_loss=0.0942] ]
Epoch 32:   2%|▏         | 2/104 [00:00<00:25,  4.05it/s, loss=0.0138, v_num=0, reduced_train_loss=0.000512, global_step=2401.0, val_loss=0.112]5]
Epoch 32:   3%|▎         | 3/104 [00:00<00:24,  4.18it/s, loss=0.0138, v_num=0, reduced_train_loss=0.000192, global_step=2402.0, val_loss=0.112]5]
Epoch 31:  44%|████▍     | 46/104 [00:10<00:12,  4.48it/s, loss=0.0219, v_num=0, reduced_train_loss=0.000846, global_step=2370.0, val_loss=0.0942]
Epoch 32:   4%|▍         | 4/104 [00:00<00:23,  4.25it/s, loss=0.0082, v_num=0, reduced_train_loss=0.000328, global_step=2403.0, val_loss=0.112]5]
Epoch 31:  45%|████▌     | 47/104 [00:10<00:12,  4.48it/s, loss=0.022, v_num=0, reduced_train_loss=0.00415, global_st

I0503 22:07:04.414193 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05444490537047386


Epoch 31:  72%|███████▏  | 75/104 [00:16<00:06,  4.45it/s, loss=0.0422, v_num=0, reduced_train_loss=0.0292, global_step=2399.0, val_loss=0.0942]]  
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 32:  21%|██        | 22/104 [00:05<00:19,  4.17it/s, loss=0.0381, v_num=0, reduced_train_loss=0.00132, global_step=2421.0, val_loss=0.0544]
Epoch 32:  31%|███       | 32/104 [00:07<00:17,  4.14it/s, loss=0.0177, v_num=0, reduced_train_loss=0.00403, global_step=2431.0, val_loss=0.112] 
Epoch 32:  22%|██▏       | 23/104 [00:05<00:19,  4.19it/s, loss=0.000638, v_num=0, reduced_train_loss=0.000148, global_step=2422.0, val_loss=0.0544]
Epoch 32:  32%|███▏      | 33/104 [00:07<00:17,  4.14it/s, loss=0.0176, v_num=0, reduced_train_loss=0.0012, global_step=2432.0, val_loss=0.112] 44] 
Epoch 31:  76%|███████▌  | 79/104 [00:17<00:05,  4.54it/s, loss=0.0422, v_num=0, reduced_train_loss=0.0292, global_step=2399.0, val_loss=0.0942]
Epoch 32:  33%|███▎      | 34

I0503 22:07:13.227687 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.11029471457004547


Epoch 32:  72%|███████▏  | 75/104 [00:18<00:07,  4.10it/s, loss=0.00466, v_num=0, reduced_train_loss=0.000199, global_step=2474.0, val_loss=0.112]  
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 32:  31%|███       | 32/104 [00:07<00:16,  4.40it/s, loss=0.0467, v_num=0, reduced_train_loss=0.00111, global_step=2431.0, val_loss=0.110]
Epoch 32:  32%|███▏      | 33/104 [00:07<00:16,  4.40it/s, loss=0.0387, v_num=0, reduced_train_loss=0.00182, global_step=2432.0, val_loss=0.110]44]
Epoch 32:  69%|██████▉   | 72/104 [00:16<00:07,  4.37it/s, loss=0.00202, v_num=0, reduced_train_loss=0.000132, global_step=2471.0, val_loss=0.0544]
Epoch 32:  33%|███▎      | 34/104 [00:07<00:15,  4.40it/s, loss=0.0197, v_num=0, reduced_train_loss=0.0768, global_step=2433.0, val_loss=0.110] 2]
Epoch 32:  34%|███▎      | 35/104 [00:07<00:15,  4.40it/s, loss=0.0224, v_num=0, reduced_train_loss=0.055, global_step=2434.0, val_loss=0.110] 544]
Epoch 32:  71%|███████   | 

I0503 22:07:24.590756 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.11463280022144318



Epoch 33:   1%|          | 1/104 [00:00<00:29,  3.46it/s, loss=0.00463, v_num=0, reduced_train_loss=0.000308, global_step=2475.0, val_loss=0.115]44]
Epoch 32:  98%|█████████▊| 102/104 [00:20<00:00,  4.98it/s, loss=0.00201, v_num=0, reduced_train_loss=0.000289, global_step=2474.0, val_loss=0.0544]
Epoch 32:  50%|█████     | 52/104 [00:11<00:11,  4.40it/s, loss=0.0381, v_num=0, reduced_train_loss=0.000347, global_step=2451.0, val_loss=0.110]44]
Epoch 32: 100%|██████████| 104/104 [00:20<00:00,  5.03it/s, loss=0.00201, v_num=0, reduced_train_loss=0.000289, global_step=2474.0, val_loss=0.0544]2023-05-03 22:07:25,108 - root - INFO - val_loss: 0.06812835484743118
Epoch 32: 100%|██████████| 104/104 [00:20<00:00,  5.03it/s, loss=0.00201, v_num=0, reduced_train_loss=0.000289, global_step=2474.0, val_loss=0.0681]
Epoch 33:   2%|▏         | 2/104 [00:00<00:27,  3.69it/s, loss=0.00466, v_num=0, reduced_train_loss=0.000796, global_step=2476.0, val_loss=0.115]   

I0503 22:07:25.108293 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06812835484743118


Epoch 32:  72%|███████▏  | 75/104 [00:17<00:06,  4.41it/s, loss=0.0387, v_num=0, reduced_train_loss=0.000769, global_step=2474.0, val_loss=0.110]1]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 33:  22%|██▏       | 23/104 [00:05<00:18,  4.40it/s, loss=0.00188, v_num=0, reduced_train_loss=0.000954, global_step=2497.0, val_loss=0.0681]
Epoch 32:  73%|███████▎  | 76/104 [00:17<00:06,  4.42it/s, loss=0.0387, v_num=0, reduced_train_loss=0.000769, global_step=2474.0, val_loss=0.110]
Epoch 33:  23%|██▎       | 24/104 [00:05<00:19,  4.01it/s, loss=0.0104, v_num=0, reduced_train_loss=0.00362, global_step=2498.0, val_loss=0.115] 1]
Epoch 32:  75%|███████▌  | 78/104 [00:17<00:05,  4.47it/s, loss=0.0387, v_num=0, reduced_train_loss=0.000769, global_step=2474.0, val_loss=0.110]
Epoch 33:  24%|██▍       | 25/104 [00:06<00:19,  4.01it/s, loss=0.0105, v_num=0, reduced_train_loss=0.00647, global_step=2499.0, val_loss=0.115]81]
Epoch 33:  25%|██▌       | 2

I0503 22:07:33.842105 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.14431501924991608


Epoch 33:  72%|███████▏  | 75/104 [00:17<00:06,  4.38it/s, loss=0.00541, v_num=0, reduced_train_loss=0.00601, global_step=2549.0, val_loss=0.0681] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 33:  69%|██████▉   | 72/104 [00:17<00:07,  4.05it/s, loss=0.0169, v_num=0, reduced_train_loss=0.000106, global_step=2546.0, val_loss=0.115]
Epoch 33:  73%|███████▎  | 76/104 [00:17<00:06,  4.40it/s, loss=0.00541, v_num=0, reduced_train_loss=0.00601, global_step=2549.0, val_loss=0.0681]
Epoch 33:  70%|███████   | 73/104 [00:18<00:07,  4.05it/s, loss=0.0167, v_num=0, reduced_train_loss=0.000265, global_step=2547.0, val_loss=0.115]]
Epoch 33:  75%|███████▌  | 78/104 [00:17<00:05,  4.45it/s, loss=0.00541, v_num=0, reduced_train_loss=0.00601, global_step=2549.0, val_loss=0.0681]
Epoch 33:  71%|███████   | 74/104 [00:18<00:07,  4.05it/s, loss=0.017, v_num=0, reduced_train_loss=0.00493, global_step=2548.0, val_loss=0.115]  ]
Epoch 33:  77%|███████▋  | 80/

I0503 22:07:45.738073 140649257482048 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06158381327986717


Epoch 33:  50%|█████     | 52/104 [00:11<00:11,  4.34it/s, loss=0.0274, v_num=0, reduced_train_loss=0.000326, global_step=2526.0, val_loss=0.144]
Epoch 34:   1%|          | 1/104 [00:00<00:27,  3.72it/s, loss=0.00544, v_num=0, reduced_train_loss=0.000586, global_step=2550.0, val_loss=0.0616]
Epoch 33:  51%|█████     | 53/104 [00:12<00:11,  4.34it/s, loss=0.0263, v_num=0, reduced_train_loss=0.00525, global_step=2527.0, val_loss=0.144] 
Epoch 33:  52%|█████▏    | 54/104 [00:12<00:11,  4.34it/s, loss=0.0262, v_num=0, reduced_train_loss=0.000132, global_step=2528.0, val_loss=0.144]]
Epoch 33:  93%|█████████▎| 97/104 [00:21<00:01,  4.47it/s, loss=0.017, v_num=0, reduced_train_loss=0.000299, global_step=2549.0, val_loss=0.115]
Epoch 33:  53%|█████▎    | 55/104 [00:12<00:11,  4.34it/s, loss=0.0131, v_num=0, reduced_train_loss=0.00216, global_step=2529.0, val_loss=0.144] ]
Epoch 34:   4%|▍         | 4/104 [00:00<00:23,  4.25it/s, loss=0.00536, v_num=0, reduced_train_loss=0.000897, global_step=

I0503 22:07:47.232909 140522389899072 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10936349630355835


Epoch 33:  72%|███████▏  | 75/104 [00:17<00:06,  4.36it/s, loss=0.0527, v_num=0, reduced_train_loss=0.00284, global_step=2549.0, val_loss=0.144]]]  
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/29 [00:00<?, ?it/s][A
Epoch 34:  23%|██▎       | 24/104 [00:05<00:18,  4.44it/s, loss=0.00936, v_num=0, reduced_train_loss=2.09e-5, global_step=2573.0, val_loss=0.0616]
Epoch 34:  15%|█▌        | 16/104 [00:04<00:22,  3.94it/s, loss=0.00717, v_num=0, reduced_train_loss=0.000112, global_step=2565.0, val_loss=0.109]
Epoch 34:  24%|██▍       | 25/104 [00:05<00:17,  4.44it/s, loss=0.00936, v_num=0, reduced_train_loss=0.000156, global_step=2574.0, val_loss=0.0616]
Epoch 34:  25%|██▌       | 26/104 [00:05<00:17,  4.44it/s, loss=0.00938, v_num=0, reduced_train_loss=0.000485, global_step=2575.0, val_loss=0.0616]
Epoch 33:  76%|███████▌  | 79/104 [00:17<00:05,  4.45it/s, loss=0.0527, v_num=0, reduced_train_loss=0.00284, global_step=2549.0, val_loss=0.144]
Epoch 34:  26%|██▌       | 2

I0503 22:07:54.600974 139755592410944 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12539149820804596


Epoch 34:  33%|███▎      | 34/104 [00:08<00:17,  3.96it/s, loss=0.00691, v_num=0, reduced_train_loss=0.055, global_step=2583.0, val_loss=0.109]    

#### 2. Federated P-Tuning
We use the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm to p-tune the model in a federated scenario. First, modify the configuration files again, this time for the `jobs/gpt_p-tuning_fedavg` job folder.

In [2]:
!python3 modify_configs.py --job_folder "jobs/gpt_p-tuning_fedavg"

Set ROOT_DIR to /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning


Next, simulate the federated p-tuning using FedAvg. Here, each client p-tunes for one local epoch before sending its local model update to the server for aggregation. This is repeated for 50 FL rounds.

In [None]:
from nvflare import SimulatorRunner

# Simulate the FedAvg job with three clients, running them in parallel threads
simulator = SimulatorRunner(
    job_folder="jobs/gpt_p-tuning_fedavg",
    workspace="/tmp/nvflare/nemo/gpt_p-tuning_fedavg",
    n_clients=3,
    threads=3
)
run_status = simulator.run()
print("Simulator finished with run_status", run_status)
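To make the aggregation step more concrete, here is a minimal sketch of the weighted averaging at the heart of FedAvg. It is not NVFlare's aggregator: the function name `fedavg`, the toy parameter name `prompt_encoder.weight`, and the per-client sample counts are all assumptions made for illustration, and weights are plain Python lists rather than tensors.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], num_samples: List[int]) -> Weights:
    """Average client weights, weighting each client by its number of samples.

    Hypothetical helper for illustration; NVFlare's own aggregator may differ.
    """
    if not client_weights or len(client_weights) != len(num_samples):
        raise ValueError("Need exactly one sample count per client")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("Total number of samples must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        size = len(client_weights[0][name])
        # Weighted mean of each parameter entry across all clients
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, num_samples)) / total
            for i in range(size)
        ]
    return averaged


# Three hypothetical clients sharing one toy parameter
clients = [
    {"prompt_encoder.weight": [1.0, 2.0]},
    {"prompt_encoder.weight": [3.0, 4.0]},
    {"prompt_encoder.weight": [5.0, 6.0]},
]
global_weights = fedavg(clients, num_samples=[100, 100, 200])
print(global_weights)
```

In each FL round, the server would broadcast such an averaged result back to the clients as the starting point for the next local epoch.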

You can visualize the training process using TensorBoard:

In [None]:
!tensorboard --logdir /tmp/nvflare/nemo

## Results
In this scenario, all clients use the same validation set, which allows a direct comparison between the locally p-tuned models and the federated global model. As anticipated, the FedAvg-trained global model reaches a lower validation loss than the models trained solely on their local datasets. The global model aggregates updates learned from all client datasets and can therefore generalize better.

![validation loss](./val_loss.svg)

## Inference

We can use `model.generate()` to run inference after p-tuning the model. 
Let's define some test examples to feed to the p-tuned model to see its predictions.

In [None]:
test_examples = [
    {"taskname": "sentiment", "sentence": "The products have a low salt and fat content ."},
    {"taskname": "sentiment", "sentence": "The agreement is valid for four years ."},
    {"taskname": "sentiment", "sentence": "Diluted EPS rose to EUR3 .68 from EUR0 .50 ."},
    {"taskname": "sentiment", "sentence": "The company is well positioned in Brazil and Uruguay ."},
    {"taskname": "sentiment", "sentence": "Profit before taxes decreased by 9 % to EUR 187.8 mn in the first nine months of 2008 , compared to EUR 207.1 mn a year earlier ."},
]

Next, we will load the global model.

In [None]:
import os
import torch
import pytorch_lightning as pl
from nemo_nvflare.fed_megatron_gpt_prompt_learning_model import FedMegatronGPTPromptLearningModel
from nemo_nvflare.utils import load_weights
from omegaconf import OmegaConf
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy
from pytorch_lightning.plugins.environments import TorchElasticEnvironment

# Load model configuration used by one of the clients
config = OmegaConf.load("jobs/gpt_p-tuning_fedavg/app1/config/megatron_gpt_prompt_learning_config.yaml")

# Set GPT model path
config.model.language_model_path = "/workspace/Code/NeMo/upstream/tutorials/nlp/megatron_gpt_345m.nemo"

# Load task templates
config.model.task_templates = OmegaConf.load("jobs/gpt_p-tuning_fedavg/app1/config/task_templates.json")

# Set the tasks that were learned
config.model.new_tasks = ["sentiment"]

# Set up the cluster environment parameters.
# Use the torch elastic cluster environment so that `creates_processes_externally` is True
# and the launcher is set to None; the trainer will then not try to spawn new processes,
# which avoids the misconfiguration error raised in interactive sessions.
os.environ["LOCAL_RANK"] = '0'
os.environ["RANK"] = '0'
os.environ["WORLD_SIZE"] = '1'
strategy = NLPDDPStrategy(find_unused_parameters=False, no_ddp_communication_hook=True)
plugins = [TorchElasticEnvironment()]

# Set up the trainer and load the model that was used for p-tuning
trainer = pl.Trainer(plugins=plugins, strategy=strategy, **config.trainer)
model = FedMegatronGPTPromptLearningModel(cfg=config.model, trainer=trainer)
model.init_prompt_encoder()

print("Model initialized", type(model))

Overwrite the prompt encoder with the best global model:

In [None]:
ckpt = torch.load("/tmp/nvflare/nemo/gpt_p-tuning_fedavg/simulate_job/app_server/best_FL_global_model.pt")
global_weights = ckpt["model"]

n_loaded = load_weights(model, global_weights, device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
print(f"Loaded {n_loaded} of {len(global_weights)} weights")
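The printed count suggests that only some checkpoint entries may end up in the model. As a rough illustration of that idea, the sketch below copies only those entries whose names and sizes match the local state. This is an assumption about the general pattern, not the actual implementation of `nemo_nvflare.utils.load_weights`; the helper name `load_matching_weights` and the toy parameter names are hypothetical, and plain lists stand in for tensors.

```python
from typing import Dict, List


def load_matching_weights(
    local_state: Dict[str, List[float]],
    incoming: Dict[str, List[float]],
) -> int:
    """Copy incoming entries into local_state when name and size match.

    Returns the number of entries that were copied.
    Hypothetical helper; not the real `load_weights` from nemo_nvflare.
    """
    n_loaded = 0
    for name, values in incoming.items():
        # Skip entries the local model does not have or whose size differs
        if name not in local_state or len(local_state[name]) != len(values):
            continue
        local_state[name] = list(values)
        n_loaded += 1
    return n_loaded


local_state = {"prompt_encoder.weight": [0.0, 0.0], "frozen_gpt.weight": [1.0]}
global_state = {"prompt_encoder.weight": [0.5, -0.5], "unknown.weight": [9.0]}

n = load_matching_weights(local_state, global_state)
print(f"Loaded {n} of {len(global_state)} weights")
```

A count lower than expected would be a hint to compare the checkpoint keys against the model's parameter names before running inference.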

Run the model:

In [None]:
response = model.generate(inputs=test_examples, length_params=None)

print('The prediction results of some sample queries with the trained model:')
for result in response['sentences']:
    print(result)
    print("-" * 30)
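For downstream use, it can be handy to pull the predicted label out of each generated string. The following is a minimal sketch, assuming each string ends with `sentiment: <label>`; the actual layout depends on the task template, and the helper name `extract_label` is hypothetical.

```python
from collections import Counter
from typing import Optional


def extract_label(sentence: str, field: str = "sentiment:") -> Optional[str]:
    """Return the text after the last occurrence of `field`, or None if absent."""
    _, sep, tail = sentence.rpartition(field)
    if not sep:
        return None
    label = tail.strip()
    return label or None


# Example strings in the assumed "<sentence> sentiment: <label>" layout
generated = [
    "The agreement is valid for four years . sentiment: neutral",
    "Diluted EPS rose to EUR3 .68 from EUR0 .50 . sentiment: positive",
]
labels = [extract_label(s) for s in generated]
label_counts = Counter(labels)
```

With the real model, `generated` would be replaced by `response['sentences']`.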

The expected predictions for the test examples look something like this:

>      The products have a low salt and fat content . sentiment: neutral
>      ------------------------------
>      The agreement is valid for four years . sentiment: neutral
>      ------------------------------
>      Diluted EPS rose to EUR3 .68 from EUR0 .50 . sentiment: positive
>      ------------------------------
>      The company is well positioned in Brazil and Uruguay . sentiment: positive
>      ------------------------------
>      Profit before taxes decreased by 9 % to EUR 187.8 mn in the first nine months of 2008 , compared to EUR 207.1 mn a year earlier . sentiment: negative
>      ------------------------------