# Prompt Learning with NeMo

In this example, we use NeMo's [prompt learning](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/prompt_learning.html)
feature to show how to adapt a large language model (LLM) to
a downstream task, such as financial sentiment prediction.

The prompt learning technique shown in the example is [p-tuning](https://arxiv.org/abs/2103.10385), which adds a small prompt encoder network to the LLM
to produce virtual token embeddings that guide the model toward the desired output of the downstream task.
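
As a rough illustration of this idea (not NeMo's actual `PromptEncoder`), the sketch below uses plain Python lists: a tiny trainable network turns learned inputs into virtual token embeddings, which are then placed in front of the frozen LLM's token embeddings. The function names, the single tanh layer, and the sizes are all assumptions made for illustration.

```python
import math
import random


def make_prompt_encoder(num_virtual_tokens, hidden_size, seed=0):
    """Return a function that produces virtual token embeddings.

    Hypothetical stand-in for a prompt encoder: one tanh layer over
    trainable inputs. Only these parameters would be updated in p-tuning.
    """
    rng = random.Random(seed)
    inputs = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
              for _ in range(num_virtual_tokens)]
    weights = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
               for _ in range(hidden_size)]
    bias = [0.0] * hidden_size

    def encode():
        # One output vector per virtual token
        return [
            [math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
             for row, b in zip(weights, bias)]
            for vec in inputs
        ]

    return encode


def prepend_virtual_tokens(virtual_embeddings, token_embeddings):
    """Place the virtual token embeddings in front of the frozen token embeddings."""
    return virtual_embeddings + token_embeddings


encode = make_prompt_encoder(num_virtual_tokens=4, hidden_size=8)
sentence_embeddings = [[0.1] * 8 for _ in range(5)]  # stand-in for 5 embedded input tokens
model_input = prepend_virtual_tokens(encode(), sentence_embeddings)
print(len(model_input), len(model_input[0]))
```

The LLM's own weights never appear in the sketch because they stay frozen; only the small encoder would receive gradient updates.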

For more details on how to change hyperparameters for prompt learning in NeMo, see this [tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/Multitask_Prompt_and_PTuning.ipynb), which is also the basis for this NVFlare tutorial.

## Dependencies
We assume you followed the instructions [here](../../README.md#requirements) 
to install the NeMo framework and the NeMo-NVFlare package. 

## Download the pre-trained LLM
In this example, we use `MegatronGPTModel`, a transformer language model built on the GPT architecture.

In [1]:
# Check what GPT .nemo models we have available on NGC
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
MegatronGPTModel.list_available_models()

[NeMo W 2023-06-01 17:59:34 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-06-01 17:59:34 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-06-01 17:59:38 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.


[PretrainedModelInfo(
 	pretrained_model_name=megatron_gpt_345m,
 	description=345M parameter GPT generative Megatron model.,
 	location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/megatron_gpt_345m.nemo
 )]

In [2]:
# Download the model from NGC
import os
model_file = "megatron_gpt_345m.nemo"
if not os.path.isfile(model_file):
    !wget "https://api.ngc.nvidia.com/v2/models/nvidia/nemo/megatron_gpt_345m/versions/1/files/$model_file"
else:
    print(f"{model_file} already downloaded.")

megatron_gpt_345m.nemo already downloaded.


## Data preprocessing
As our downstream task, we will use the [Financial PhraseBank dataset](https://huggingface.co/datasets/financial_phrasebank) for sentiment analysis.

The Financial PhraseBank dataset contains the sentiments for financial news headlines from a retail investor's perspective. Further details about the dataset can be found in Malo et al.'s ["Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts"](https://arxiv.org/abs/1307.5336).


#### 1. Download the preprocessing scripts
We use the preprocessing script provided by NeMo, which can be downloaded from GitHub.

In [3]:
script_name = "prompt_learning_financial_phrase_bank_preprocessing.py"
if not os.path.isfile(script_name):
    !wget -N "https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/dataset_processing/nlp/financial_phrase_bank/$script_name"
else:
    print(f"{script_name} already downloaded.")

prompt_learning_financial_phrase_bank_preprocessing.py already downloaded.


#### 2. Download the Financial PhraseBank Dataset

Download the `FinancialPhraseBank-v1.0.zip` dataset from [here](https://www.researchgate.net/profile/Pekka_Malo/publication/251231364_FinancialPhraseBank-v1.0/data/0c96051eee4fb1d56e000000/FinancialPhraseBank-v1.0.zip).

Then extract it under `./data`.
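
If you prefer to extract the archive from Python, a minimal sketch using the standard library could look like the following. It assumes the zip file was saved to the current directory; `extract_archive` is a hypothetical helper name, and the folder layout inside the archive is not checked.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, out_dir="data"):
    """Extract zip_path into out_dir and return the names of the top-level entries."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
    return sorted(p.name for p in out.iterdir())


# Assumed download location; adjust the path to wherever the file was saved
archive_name = "FinancialPhraseBank-v1.0.zip"
if Path(archive_name).is_file():
    print(extract_archive(archive_name, "data"))
else:
    print(f"{archive_name} not found, download it first.")
```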

#### 3. Preprocess the dataset

In [4]:
!python3 prompt_learning_financial_phrase_bank_preprocessing.py

Saving train split to data/FinancialPhraseBank-v1.0/financial_phrase_bank_train.jsonl
100%|███████████████████████████████████| 1811/1811 [00:00<00:00, 115010.74it/s]
Saving val split to data/FinancialPhraseBank-v1.0/financial_phrase_bank_val.jsonl
100%|█████████████████████████████████████| 226/226 [00:00<00:00, 113604.11it/s]
Saving test split to data/FinancialPhraseBank-v1.0/financial_phrase_bank_test.jsonl
100%|█████████████████████████████████████| 227/227 [00:00<00:00, 122567.84it/s]


#### 4. Split the dataset to simulate clients
Next, we split the training set across three clients to simulate federated learning for p-tuning with NeMo.

In [5]:
!python3 data/split_financial_phrase_data.py --data_path data/FinancialPhraseBank-v1.0/financial_phrase_bank_train.jsonl --num_clients 3 --out_dir data/FinancialPhraseBank-v1.0_split

Loaded training data with 1811 entries
Save split 1 of 3 with 604 entries to data/FinancialPhraseBank-v1.0_split/site-1.jsonl
Save split 2 of 3 with 604 entries to data/FinancialPhraseBank-v1.0_split/site-2.jsonl
Save split 3 of 3 with 603 entries to data/FinancialPhraseBank-v1.0_split/site-3.jsonl
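
The split sizes above are consistent with dividing the 1811 training entries into near-equal contiguous parts. A minimal sketch of such a split follows; `split_evenly` is a hypothetical helper and not necessarily how `split_financial_phrase_data.py` is implemented (it may, for example, shuffle the entries first).

```python
def split_evenly(entries, num_clients):
    """Split entries into num_clients contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(entries), num_clients)
    splits = []
    start = 0
    for i in range(num_clients):
        # The first `extra` clients receive one additional entry
        size = base + (1 if i < extra else 0)
        splits.append(entries[start:start + size])
        start += size
    return splits


entries = list(range(1811))  # stand-in for the 1811 training records
splits = split_evenly(entries, 3)
print([len(s) for s in splits])
```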


## Federated learning simulations
Next, we use NVFlare's [simulator](https://nvflare.readthedocs.io/en/latest/user_guide/fl_simulator.html) to simulate two scenarios: each client training only on its own local dataset, and all three clients training together using the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm implemented in NVFlare.

This setup requires a GPU with at least 16 GB of memory to run all clients in parallel on the same GPU.
If you have multiple GPUs in your system, you can use the `gpu` argument to assign one GPU to each client, e.g., `gpu="0,1"`.
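
For intuition, FedAvg builds a new global model by averaging the client updates, weighted by how much data each client trained on. The sketch below shows that weighted average on plain Python lists; `fed_avg` and the dictionary layout are illustrative assumptions, not NVFlare's aggregator API.

```python
def fed_avg(client_updates):
    """Weighted average of client weights.

    client_updates: list of (weights, num_samples) pairs, where weights maps
    a parameter name to a list of floats with the same shape on every client.
    """
    total_samples = sum(num_samples for _, num_samples in client_updates)
    reference_weights = client_updates[0][0]
    averaged = {}
    for name, values in reference_weights.items():
        averaged[name] = [
            sum(weights[name][i] * num_samples
                for weights, num_samples in client_updates) / total_samples
            for i in range(len(values))
        ]
    return averaged


# Three hypothetical clients with sample counts matching the data split in this example
updates = [
    ({"prompt_encoder.bias": [0.0, 1.0]}, 604),
    ({"prompt_encoder.bias": [0.0, 1.0]}, 604),
    ({"prompt_encoder.bias": [3.0, 1.0]}, 603),
]
print(fed_avg(updates))
```

In this example only the prompt encoder parameters would be exchanged and averaged, since the LLM itself stays frozen during p-tuning.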

#### 1. Local P-Tuning
First, we create the configuration files and modify them to include the current directory path, which is needed to access the dataset and the pre-trained LLM.
At this point, we also set the number of clients, local epochs, and FL rounds to simulate local training.

In [6]:
!python3 create_configs.py --job_folder "jobs/gpt_p-tuning_local_345M" --num_clients 3 --aggregation_epochs 50 --num_rounds 1

Created configs for 3 clients and set ROOT_DIR to /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning


Next, we simulate each client p-tuning on its local dataset using the FL simulator. To do this, we run only one round of FL, with each client running 50 p-tuning epochs on its local dataset.

In [7]:
from nvflare import SimulatorRunner

# Run the local p-tuning job with three simulated clients, one thread per client
simulator = SimulatorRunner(
    job_folder="jobs/gpt_p-tuning_local_345M",
    workspace="/tmp/nvflare/nemo/gpt_p-tuning_local_345M",
    n_clients=3,
    threads=3
)
run_status = simulator.run()
print("Simulator finished with run_status", run_status)

2023-06-01 17:59:42,489 - SimulatorRunner - INFO - Create the Simulator Server.
2023-06-01 17:59:42,495 - Cell - INFO - server: creating listener on tcp://0:37707
2023-06-01 17:59:42,497 - Cell - INFO - server: created backbone external listener for tcp://0:37707
2023-06-01 17:59:42,498 - ConnectorManager - INFO - 2372: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2023-06-01 17:59:42,499 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:5994] is starting
2023-06-01 17:59:43,002 - Cell - INFO - server: created backbone internal listener for tcp://localhost:5994
2023-06-01 17:59:43,004 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 PASSIVE tcp://0:37707] is starting
2023-06-01 17:59:43,202 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server localhost on Port 55737
2023-06-01 17:59:43,203 - SimulatorRunner - INFO - Deploy the Apps.
2023-06-01 17:59:43,215 - SimulatorRunner - INFO - Create the simulate c

[NeMo W 2023-06-01 18:00:01 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-06-01 18:00:01 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-06-01 18:00:02 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-06-01 18:00:02 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully

2023-06-01 18:00:03,082 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-3, peer_run=simulate_job, task_name=share_config, task_id=c8a5911c-7e5e-441f-b165-a7c96fbeff7a]: assigned task to client site-3: name=share_config, id=c8a5911c-7e5e-441f-b165-a7c96fbeff7a
2023-06-01 18:00:03,084 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-3, peer_run=simulate_job, task_name=share_config, task_id=c8a5911c-7e5e-441f-b165-a7c96fbeff7a]: sent task assignment to client. client_name:site-3 task_id:c8a5911c-7e5e-441f-b165-a7c96fbeff7a
2023-06-01 18:00:03,086 - GetTaskCommand - INFO - return task to client.  client_name: site-3  task_name: share_config   task_id: c8a5911c-7e5e-441f-b165-a7c96fbeff7a  sharable_header_task_id: c8a5911c-7e5e-441f-b165-a7c96fbeff7a
2023-06-01 18:00:03,121 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-3, peer_run=simulate_job]:

I0601 18:00:03.077422 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job]: Initializing the Learner...
I0601 18:00:03.078066 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job]: Running with distributed environment: LOCAL_RANK: 0, RANK: 0, WORLD_SIZE 1, MASTER_ADDR: localhost, and MASTER_PORT: 36839
I0601 18:00:03.078348 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job]: client runner started
I0601 18:00:03.078536 140659256891200 simulator_worker.py:85] Initialize ClientRunner for client: site-3
I0601 18:00:03.088636 140658085013248 communicator.py:200] Received from simulator_server server  (3492 Bytes). getTask: share_config time: 0.007776737213134766 seconds
I0601 18:00:03.092673 140659256891200 fed_client.py:91] pull_task completed. Task name:share_config Status:True 
I0601 18:00:03.092940 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task a

2023-06-01 18:00:03,494 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-2, peer_run=simulate_job, task_name=share_config, task_id=c1e25224-6319-49f1-b387-e2bb0c8a2a33]: assigned task to client site-2: name=share_config, id=c1e25224-6319-49f1-b387-e2bb0c8a2a33
2023-06-01 18:00:03,496 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-2, peer_run=simulate_job, task_name=share_config, task_id=c1e25224-6319-49f1-b387-e2bb0c8a2a33]: sent task assignment to client. client_name:site-2 task_id:c1e25224-6319-49f1-b387-e2bb0c8a2a33
2023-06-01 18:00:03,497 - GetTaskCommand - INFO - return task to client.  client_name: site-2  task_name: share_config   task_id: c1e25224-6319-49f1-b387-e2bb0c8a2a33  sharable_header_task_id: c1e25224-6319-49f1-b387-e2bb0c8a2a33
2023-06-01 18:00:03,536 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-2, peer_run=simulate_job]:

I0601 18:00:03.490007 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job]: Initializing the Learner...
I0601 18:00:03.490608 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job]: Running with distributed environment: LOCAL_RANK: 0, RANK: 0, WORLD_SIZE 1, MASTER_ADDR: localhost, and MASTER_PORT: 41387
I0601 18:00:03.490874 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job]: client runner started
I0601 18:00:03.491047 140713251325760 simulator_worker.py:85] Initialize ClientRunner for client: site-2
I0601 18:00:03.499837 140712079447808 communicator.py:200] Received from simulator_server server  (3492 Bytes). getTask: share_config time: 0.00640869140625 seconds
I0601 18:00:03.503662 140713251325760 fed_client.py:91] pull_task completed. Task name:share_config Status:True 
I0601 18:00:03.503941 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task assig

2023-06-01 18:00:03,713 - ShareConfig - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: task share_config exit with status TaskCompletionStatus.OK
2023-06-01 18:00:03,798 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: Workflow: share_config finalizing ...
2023-06-01 18:00:03,915 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: starting workflow scatter_and_gather (<class 'nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather'>) ...
2023-06-01 18:00:03,917 - ScatterAndGather - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: Initializing ScatterAndGather workflow.
2023-06-01 18:00:03,922 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: Workflow scatter_and_gather (<class 'nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather'>) started
2023-06-01 18:00:03,923 - ScatterAndGather - INFO - [identity=simulat

I0601 18:00:05.269148 140657783011072 communicator.py:200] Received from simulator_server server  (16873468 Bytes). getTask: train time: 0.11282181739807129 seconds
I0601 18:00:05.270359 140659256891200 fed_client.py:91] pull_task completed. Task name:train Status:True 
I0601 18:00:05.270645 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task assignment: name=train, id=42980685-a93e-4664-b2e2-9e89ea7f2802
I0601 18:00:05.271328 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: invoking task executor <class 'nemo_nvflare.learner_executor.NemoLearnerExecutor'>
I0601 18:00:05.271556 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Client trainer got ta

2023-06-01 18:00:05,548 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-2, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: assigned task to client site-2: name=train, id=e277103e-8b4a-4a10-9202-cf52a937c773
2023-06-01 18:00:05,550 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-2, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: sent task assignment to client. client_name:site-2 task_id:e277103e-8b4a-4a10-9202-cf52a937c773
2023-06-01 18:00:05,594 - GetTaskCommand - INFO - return task to client.  client_name: site-2  task_name: train   task_id: e277103e-8b4a-4a10-9202-cf52a937c773  sharable_header_task_id: e277103e-8b4a-4a10-9202-cf52a937c773
2023-06-01 18:00:05,632 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-1, peer_run=simulate_job, task_name=

I0601 18:00:05.680258 140711777445632 communicator.py:200] Received from simulator_server server  (16873468 Bytes). getTask: train time: 0.10965943336486816 seconds
I0601 18:00:05.681457 140713251325760 fed_client.py:91] pull_task completed. Task name:train Status:True 
I0601 18:00:05.681739 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task assignment: name=train, id=e277103e-8b4a-4a10-9202-cf52a937c773
I0601 18:00:05.682413 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: invoking task executor <class 'nemo_nvflare.learner_executor.NemoLearnerExecutor'>
I0601 18:00:05.682653 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Client trainer got ta

2023-06-01 18:00:05,929 - ScatterAndGather - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: Abort signal received. Exiting at round 0.
2023-06-01 18:00:05,931 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: Workflow: scatter_and_gather finalizing ...
[NeMo I 2023-06-01 18:00:06 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-06-01 18:00:06 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:06 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-06-01 18:00:06 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-06-01 18:00:06 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:06 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-06-01 18:00:06 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:06 megatron_init:253] Rank 0 has tensor model pa

23-06-01 18:00:06 - PID:2481 - rank:(0, 0, 0, 0) - microbatches.py:39 - INFO - setting number of micro-batches to constant 16


2023-06-01 18:00:06,426 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: ABOUT_TO_END_RUN fired
2023-06-01 18:00:06,427 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: END_RUN fired
2023-06-01 18:00:06,429 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: Server runner finished.
2023-06-01 18:00:07,789 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-1, peer_run=simulate_job]: server runner is finalizing - asked client to end the run
2023-06-01 18:00:07,801 - GetTaskCommand - INFO - return task to client.  client_name: site-1  task_name: __end_run__   task_id:   sharable_header_task_id: 
2023-06-01 18:00:07,807 - FederatedClient - INFO - pull_task completed. Task name:__end_run__ Status:True 
2023-06-01 18:00:07,807 - ClientRunner - INFO - [identity=site-1, run=simulate_job, peer=simulator_server, pe

I0601 18:00:07.807184 140671809345344 fed_client.py:91] pull_task completed. Task name:__end_run__ Status:True 
I0601 18:00:07.807564 140671809345344 fl_component.py:134] [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: server asked to end the run
I0601 18:00:07.807730 140671809345344 simulator_worker.py:102] End the Simulator run.
I0601 18:00:07.808275 140671809345344 simulator_worker.py:125] Clean up ClientRunner for : site-1 


2023-06-01 18:00:08,361 - SimulatorServer - INFO - Server app stopped.


2023-06-01 18:00:08,607 - nvflare.fuel.hci.server.hci - INFO - Admin Server localhost on Port 55737 shutdown!
2023-06-01 18:00:10,132 - SimulatorServer - INFO - shutting down server
2023-06-01 18:00:10,135 - SimulatorServer - INFO - canceling sync locks
2023-06-01 18:00:10,136 - SimulatorServer - INFO - server off
[NeMo I 2023-06-01 18:00:32 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-06-01 18:00:32 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:32 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-06-01 18:00:32 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-06-01 18:00:32 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:32 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-06-01 18:00:32 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 202

[NeMo W 2023-06-01 18:00:32 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.


[NeMo I 2023-06-01 18:00:32 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-06-01 18:00:32 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:32 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-06-01 18:00:32 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-06-01 18:00:32 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:32 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-06-01 18:00:32 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:32 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-06-01 18:00:32 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-06-01 18:00:32 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-06-01 18:00:32 megatron_init:285] All pipeline model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:00:32 megatron_init:286]

[NeMo W 2023-06-01 18:00:32 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
Downloading (…)lve/main/config.json: 100%|██████████| 665/665 [00:00<00:00, 133kB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 4.19MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 843kB/s]
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-06-01 18:00:36 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.



Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-06-01 18:00:36 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-06-01 18:00:38 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-06-01 18:00:38 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.
[NeMo I 2023-06-01 18:00:38 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-06-01 18:00:38 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.


Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-06-01 18:01:10 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-06-01 18:01:10 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:01:10 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-06-01 18:01:10 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-06-01 18:01:10 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:01:10 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-06-01 18:01:10 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:01:10 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-06-01 18:01:10 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-06-01 18:01:10 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-06-01 18:01:10 megatron_init:285] All pipeline model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:01:10 megatron_init:286]

[NeMo W 2023-06-01 18:01:10 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.


[NeMo I 2023-06-01 18:01:11 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-06-01 18:01:11 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:01:11 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-06-01 18:01:11 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-06-01 18:01:11 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:01:11 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-06-01 18:01:11 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:01:11 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-06-01 18:01:11 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-06-01 18:01:11 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-06-01 18:01:11 megatron_init:285] All pipeline model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:01:11 megatron_init:286]

[NeMo W 2023-06-01 18:01:11 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-06-01 18:01:12 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.


Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-06-01 18:01:12 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-06-01 18:01:13 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-06-01 18:01:13 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.
2023-06-01 18:01:13,927 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
2023-06-01 18:01:13,938 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e

Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
I0601 18:01:13.927469 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
I0601 18:01:13.938830 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Loaded 7 of 7 weights
I0601 18:01:13.945666 140713251325760 distributed.py:244] Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
I0601 18:01:13.950233 140713251325760 distributed_c10d.py:393] Added key: store_based_barrier_key:1 to store for rank: 0
I0601 18:01:13.950562 140713251325760 distribu

2023-06-01 18:01:14,077 - PromptLearner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
2023-06-01 18:01:14,087 - PromptLearner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Loaded 7 of 7 weights
2023-06-01 18:01:14,092 - lightning_fabric.utilities.distributed - INFO - Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
2023-06-01 18:01:14,095 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
2023-06-01 18:01:14,095 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_ba

I0601 18:01:14.077955 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
I0601 18:01:14.087286 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Loaded 7 of 7 weights
I0601 18:01:14.092693 140659256891200 distributed.py:244] Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
I0601 18:01:14.095425 140659256891200 distributed_c10d.py:393] Added key: store_based_barrier_key:1 to store for rank: 0
I0601 18:01:14.095743 140659256891200 distributed_c10d.py:427] Rank 0: Completed store-based barrier for key:store_based_barrie

[NeMo I 2023-06-01 18:01:15 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 
[NeMo I 2023-06-01 18:01:15 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 


604it [00:00, 818.55it/s]
604it [00:00, 793.51it/s]
0it [00:00, ?it/s]

[NeMo I 2023-06-01 18:01:16 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
[NeMo I 2023-06-01 18:01:16 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 
[NeMo I 2023-06-01 18:01:16 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
[NeMo I 2023-06-01 18:01:16 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 


226it [00:00, 890.19it/s]
I0601 18:01:16.408701 140659256891200 cuda.py:58] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
226it [00:00, 902.85it/s]
I0601 18:01:16.450244 140713251325760 cuda.py:58] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


[NeMo I 2023-06-01 18:01:16 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
2023-06-01 18:01:16,408 - pytorch_lightning.accelerators.cuda - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[NeMo I 2023-06-01 18:01:16 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
2023-06-01 18:01:16,450 - pytorch_lightning.accelerators.cuda - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Validation: 0it [00:00, ?it/s]

    
    


Validation DataLoader 0: 100%|██████████| 4/4 [00:06<00:00,  1.64s/it]
2023-06-01 18:01:24,199 - root - INFO - global_model_val_loss: 6.832405090332031
Validation DataLoader 0: 100%|██████████| 4/4 [00:06<00:00,  1.64s/it]
2023-06-01 18:01:24,207 - root - INFO - global_model_val_loss: 6.832405090332031

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│   global_model_val_loss   │     6.832405090332031     │
└───────────────────────────┴───────────────────────────┘
Validation DataLoader 0: 100%|██████████| 4/4 [00:06<00:00,  1.64s/it]

I0601 18:01:24.199280 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] global_model_val_loss: 6.832405090332031
I0601 18:01:24.207840 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] global_model_val_loss: 6.832405090332031


2023-06-01 18:01:24,752 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Global_model global_model_val_loss: 6.832405090332031
2023-06-01 18:01:24,753 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Current/Total Round: 1/1
2023-06-01 18:01:24,753 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Client identity: site-2
2023-06-01 18:01:24,767 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Loaded 7 of 7 weights
2023-06-01 18:01:24,767 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simul

I0601 18:01:24.752224 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Global_model global_model_val_loss: 6.832405090332031
I0601 18:01:24.753607 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Current/Total Round: 1/1
I0601 18:01:24.753800 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Client identity: site-2
I0601 18:01:24.767098 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Loaded 7 of 7 weights
I0601 18:01:24.767334 140713251325760 fl_component.py:1

[NeMo I 2023-06-01 18:01:24 nlp_overrides:105] Configuring DDP for model parallelism.
[NeMo I 2023-06-01 18:01:24 nlp_overrides:105] Configuring DDP for model parallelism.
[NeMo I 2023-06-01 18:01:25 modelPT:722] Optimizer config = FusedAdam (
    Parameter Group 0
        betas: [0.9, 0.98]
        bias_correction: True
        eps: 1e-08
        lr: 0.0001
        weight_decay: 0.01
    )
[NeMo I 2023-06-01 18:01:25 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7ffa0e1bd490>" 
    will be used during training (effective maximum steps = 11000) - 
    Parameters : 
    (warmup_steps: 50
    min_lr: 0.0
    constant_steps: 0
    max_steps: 11000
    )
[NeMo I 2023-06-01 18:01:25 modelPT:722] Optimizer config = FusedAdam (
    Parameter Group 0
        betas: [0.9, 0.98]
        bias_correction: True
        eps: 1e-08
        lr: 0.0001
        weight_decay: 0.01
    )
[NeMo I 2023-06-01 18:01:25 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_s

I0601 18:01:25.022401 140713251325760 model_summary.py:83] 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encoder  | PromptEncoder          | 4.2 M 
-----------------------------------------------------------
4.2 M     Trainable params
354 M     Non-trainable params
359 M     Total params
718.178   Total estimated model params size (MB)
I0601 18:01:25.037023 140659256891200 model_summary.py:83] 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encoder  | PromptEncoder          | 4.2 M 
-----------------------------------------------------------
4.2 M     Trainable params
354 M     Non-trainable params
359 M     Total para

Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:02<00:00,  1.03s/it]
2023-06-01 18:01:27,423 - root - INFO - val_loss: 6.231474876403809
2023-06-01 18:01:27,426 - root - INFO - val_loss: 6.231474876403809
Epoch 0:   0%|          | 0/13 [00:00<?, ?it/s]                            

I0601 18:01:27.423478 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 6.231474876403809
I0601 18:01:27.426760 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 6.231474876403809


Epoch 0:   8%|▊         | 1/13 [00:05<01:06,  5.56s/it, loss=8.21, v_num=0, reduced_train_loss=8.210, global_step=0.000]

    


Epoch 0:  69%|██████▉   | 9/13 [00:19<00:08,  2.20s/it, loss=6.62, v_num=0, reduced_train_loss=5.410, global_step=8.000]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 0:  69%|██████▉   | 9/13 [00:20<00:09,  2.27s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 0:  77%|███████▋  | 10/13 [00:20<00:06,  2.07s/it, loss=6.62, v_num=0, reduced_train_loss=5.410, global_step=8.000]
Epoch 0:  77%|███████▋  | 10/13 [00:21<00:06,  2.14s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
Epoch 0:  85%|████████▍ | 11/13 [00:21<00:03,  1.97s/it, loss=6.62, v_num=0, reduced_train_loss=5.410, global_step=8.000]
Epoch 0:  85%|████████▍ | 11/13 [00:22<00:04,  2.03s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
Epoch 0:  92%|█████████▏| 12/1

I0601 18:01:50.145513 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 4.991292953491211



Epoch 0:  92%|█████████▏| 12/13 [00:23<00:01,  1.94s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
Epoch 0: 100%|██████████| 13/13 [00:23<00:00,  1.80s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
2023-06-01 18:01:50,801 - root - INFO - val_loss: 5.130880355834961
Epoch 0: 100%|██████████| 13/13 [00:23<00:00,  1.80s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000, val_loss=5.130]
Epoch 1:   0%|          | 0/13 [00:00<?, ?it/s, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000, val_loss=5.130]         

I0601 18:01:50.801930 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 5.130880355834961


Epoch 1:  69%|██████▉   | 9/13 [00:16<00:07,  1.80s/it, loss=5.49, v_num=0, reduced_train_loss=3.260, global_step=17.00, val_loss=4.990]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 1:  69%|██████▉   | 9/13 [00:16<00:07,  1.79s/it, loss=5.71, v_num=0, reduced_train_loss=3.580, global_step=17.00, val_loss=5.130]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 1:  77%|███████▋  | 10/13 [00:17<00:05,  1.72s/it, loss=5.49, v_num=0, reduced_train_loss=3.260, global_step=17.00, val_loss=4.990]
Epoch 1:  77%|███████▋  | 10/13 [00:17<00:05,  1.71s/it, loss=5.71, v_num=0, reduced_train_loss=3.580, global_step=17.00, val_loss=5.130]
Epoch 1:  85%|████████▍ | 11/13 [00:18<00:03,  1.65s/it, loss=5.49, v_num=0, reduced_train_loss=3.260, global_step=17.00, val_loss=4.990]
Epoch 1:  85%|████████▍ | 11/13 [00:18<00:03,  1.64s/it, loss=5.71, v_nu

I0601 18:02:09.403805 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 2.129469394683838



Epoch 1:  92%|█████████▏| 12/13 [00:18<00:01,  1.58s/it, loss=5.71, v_num=0, reduced_train_loss=3.580, global_step=17.00, val_loss=5.130]
Epoch 1: 100%|██████████| 13/13 [00:19<00:00,  1.47s/it, loss=5.71, v_num=0, reduced_train_loss=3.580, global_step=17.00, val_loss=5.130]
2023-06-01 18:02:09,869 - root - INFO - val_loss: 2.3115713596343994
Epoch 1: 100%|██████████| 13/13 [00:19<00:00,  1.47s/it, loss=5.71, v_num=0, reduced_train_loss=3.580, global_step=17.00, val_loss=2.310]
Epoch 2:   0%|          | 0/13 [00:00<?, ?it/s, loss=5.71, v_num=0, reduced_train_loss=3.580, global_step=17.00, val_loss=2.310]         

I0601 18:02:09.869889 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 2.3115713596343994


Epoch 2:  69%|██████▉   | 9/13 [00:16<00:07,  1.81s/it, loss=3.23, v_num=0, reduced_train_loss=0.734, global_step=26.00, val_loss=2.130]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 2:  69%|██████▉   | 9/13 [00:16<00:07,  1.81s/it, loss=3.57, v_num=0, reduced_train_loss=0.778, global_step=26.00, val_loss=2.310]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 2:  77%|███████▋  | 10/13 [00:17<00:05,  1.73s/it, loss=3.23, v_num=0, reduced_train_loss=0.734, global_step=26.00, val_loss=2.130]
Epoch 2:  77%|███████▋  | 10/13 [00:17<00:05,  1.73s/it, loss=3.57, v_num=0, reduced_train_loss=0.778, global_step=26.00, val_loss=2.310]
Epoch 2:  85%|████████▍ | 11/13 [00:18<00:03,  1.66s/it, loss=3.23, v_num=0, reduced_train_loss=0.734, global_step=26.00, val_loss=2.130]
Epoch 2:  85%|████████▍ | 11/13 [00:18<00:03,  1.66s/it, loss=3.57, v_nu

I0601 18:02:28.750339 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.39972931146621704



Epoch 2:  92%|█████████▏| 12/13 [00:19<00:01,  1.60s/it, loss=3.57, v_num=0, reduced_train_loss=0.778, global_step=26.00, val_loss=2.310]
Epoch 2: 100%|██████████| 13/13 [00:19<00:00,  1.48s/it, loss=3.57, v_num=0, reduced_train_loss=0.778, global_step=26.00, val_loss=2.310]
2023-06-01 18:02:29,153 - root - INFO - val_loss: 0.5346519947052002
Epoch 2: 100%|██████████| 13/13 [00:19<00:00,  1.48s/it, loss=3.57, v_num=0, reduced_train_loss=0.778, global_step=26.00, val_loss=0.535]
Epoch 3:   0%|          | 0/13 [00:00<?, ?it/s, loss=3.57, v_num=0, reduced_train_loss=0.778, global_step=26.00, val_loss=0.535]         

I0601 18:02:29.153979 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.5346519947052002


Epoch 3:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=1.31, v_num=0, reduced_train_loss=0.465, global_step=35.00, val_loss=0.400]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 3:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=1.54, v_num=0, reduced_train_loss=0.502, global_step=35.00, val_loss=0.535]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 3:  77%|███████▋  | 10/13 [00:17<00:05,  1.74s/it, loss=1.31, v_num=0, reduced_train_loss=0.465, global_step=35.00, val_loss=0.400]
Epoch 3:  77%|███████▋  | 10/13 [00:17<00:05,  1.74s/it, loss=1.54, v_num=0, reduced_train_loss=0.502, global_step=35.00, val_loss=0.535]
Epoch 3:  85%|████████▍ | 11/13 [00:18<00:03,  1.67s/it, loss=1.31, v_num=0, reduced_train_loss=0.465, global_step=35.00, val_loss=0.400]
Epoch 3:  85%|████████▍ | 11/13 [00:18<00:03,  1.67s/it, loss=1.54, v_nu

I0601 18:02:48.101423 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.3097499907016754



Epoch 3:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=1.54, v_num=0, reduced_train_loss=0.502, global_step=35.00, val_loss=0.535]
Epoch 3: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=1.54, v_num=0, reduced_train_loss=0.502, global_step=35.00, val_loss=0.535]
2023-06-01 18:02:48,578 - root - INFO - val_loss: 0.3871537148952484
Epoch 3: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=1.54, v_num=0, reduced_train_loss=0.502, global_step=35.00, val_loss=0.387]
Epoch 4:   0%|          | 0/13 [00:00<?, ?it/s, loss=1.54, v_num=0, reduced_train_loss=0.502, global_step=35.00, val_loss=0.387]         

I0601 18:02:48.578599 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.3871537148952484


Epoch 4:  69%|██████▉   | 9/13 [00:16<00:07,  1.81s/it, loss=0.548, v_num=0, reduced_train_loss=0.413, global_step=44.00, val_loss=0.310]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 4:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.63, v_num=0, reduced_train_loss=0.561, global_step=44.00, val_loss=0.387]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 4:  77%|███████▋  | 10/13 [00:17<00:05,  1.73s/it, loss=0.548, v_num=0, reduced_train_loss=0.413, global_step=44.00, val_loss=0.310]
Epoch 4:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.63, v_num=0, reduced_train_loss=0.561, global_step=44.00, val_loss=0.387]
Epoch 4:  85%|████████▍ | 11/13 [00:18<00:03,  1.66s/it, loss=0.548, v_num=0, reduced_train_loss=0.413, global_step=44.00, val_loss=0.310]
Epoch 4:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.63, v

I0601 18:03:07.373527 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.2139081358909607



Epoch 4:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.63, v_num=0, reduced_train_loss=0.561, global_step=44.00, val_loss=0.387]
Epoch 4: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.63, v_num=0, reduced_train_loss=0.561, global_step=44.00, val_loss=0.387]
2023-06-01 18:03:08,163 - root - INFO - val_loss: 0.31660202145576477
Epoch 4: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.63, v_num=0, reduced_train_loss=0.561, global_step=44.00, val_loss=0.317]
Epoch 5:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.63, v_num=0, reduced_train_loss=0.561, global_step=44.00, val_loss=0.317]         

I0601 18:03:08.163386 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.31660202145576477


Epoch 5:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.433, v_num=0, reduced_train_loss=0.397, global_step=53.00, val_loss=0.214]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 5:  69%|██████▉   | 9/13 [00:16<00:07,  1.81s/it, loss=0.513, v_num=0, reduced_train_loss=0.363, global_step=53.00, val_loss=0.317]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 5:  77%|███████▋  | 10/13 [00:17<00:05,  1.74s/it, loss=0.433, v_num=0, reduced_train_loss=0.397, global_step=53.00, val_loss=0.214]
Epoch 5:  77%|███████▋  | 10/13 [00:17<00:05,  1.73s/it, loss=0.513, v_num=0, reduced_train_loss=0.363, global_step=53.00, val_loss=0.317]
Epoch 5:  85%|████████▍ | 11/13 [00:18<00:03,  1.67s/it, loss=0.433, v_num=0, reduced_train_loss=0.397, global_step=53.00, val_loss=0.214]
Epoch 5:  85%|████████▍ | 11/13 [00:18<00:03,  1.66s/it, loss=0.513

I0601 18:03:26.739750 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.19116896390914917



Epoch 5:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=0.513, v_num=0, reduced_train_loss=0.363, global_step=53.00, val_loss=0.317]
Epoch 5: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.513, v_num=0, reduced_train_loss=0.363, global_step=53.00, val_loss=0.317]
2023-06-01 18:03:27,512 - root - INFO - val_loss: 0.21267980337142944
Epoch 5: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.513, v_num=0, reduced_train_loss=0.363, global_step=53.00, val_loss=0.213]
Epoch 6:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.513, v_num=0, reduced_train_loss=0.363, global_step=53.00, val_loss=0.213]         

I0601 18:03:27.512576 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.21267980337142944


Epoch 6:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.356, v_num=0, reduced_train_loss=0.206, global_step=62.00, val_loss=0.191]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 6:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.423, v_num=0, reduced_train_loss=0.318, global_step=62.00, val_loss=0.213]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 6:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.356, v_num=0, reduced_train_loss=0.206, global_step=62.00, val_loss=0.191]
Epoch 6:  77%|███████▋  | 10/13 [00:17<00:05,  1.74s/it, loss=0.423, v_num=0, reduced_train_loss=0.318, global_step=62.00, val_loss=0.213]
Epoch 6:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.356, v_num=0, reduced_train_loss=0.206, global_step=62.00, val_loss=0.191]
Epoch 6:  85%|████████▍ | 11/13 [00:18<00:03,  1.66s/it, loss=0.423

I0601 18:03:46.349517 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1411426067352295



Epoch 6:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=0.423, v_num=0, reduced_train_loss=0.318, global_step=62.00, val_loss=0.213]
Epoch 6: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.423, v_num=0, reduced_train_loss=0.318, global_step=62.00, val_loss=0.213]
2023-06-01 18:03:46,900 - root - INFO - val_loss: 0.20767417550086975
Epoch 6: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.423, v_num=0, reduced_train_loss=0.318, global_step=62.00, val_loss=0.208]
Epoch 7:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.423, v_num=0, reduced_train_loss=0.318, global_step=62.00, val_loss=0.208]         

I0601 18:03:46.900951 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.20767417550086975


Epoch 7:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.292, v_num=0, reduced_train_loss=0.236, global_step=71.00, val_loss=0.141]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 7:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.352, v_num=0, reduced_train_loss=0.320, global_step=71.00, val_loss=0.208]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 7:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.292, v_num=0, reduced_train_loss=0.236, global_step=71.00, val_loss=0.141]
Epoch 7:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.352, v_num=0, reduced_train_loss=0.320, global_step=71.00, val_loss=0.208]
Epoch 7:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.292, v_num=0, reduced_train_loss=0.236, global_step=71.00, val_loss=0.141]
Epoch 7:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.352

I0601 18:04:06.004587 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12022870779037476



Epoch 7:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.352, v_num=0, reduced_train_loss=0.320, global_step=71.00, val_loss=0.208]
Epoch 7: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.352, v_num=0, reduced_train_loss=0.320, global_step=71.00, val_loss=0.208]
2023-06-01 18:04:06,430 - root - INFO - val_loss: 0.13132905960083008
Epoch 7: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.352, v_num=0, reduced_train_loss=0.320, global_step=71.00, val_loss=0.131]
Epoch 8:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.352, v_num=0, reduced_train_loss=0.320, global_step=71.00, val_loss=0.131]         

I0601 18:04:06.430940 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.13132905960083008


Epoch 8:  69%|██████▉   | 9/13 [00:16<00:07,  1.81s/it, loss=0.232, v_num=0, reduced_train_loss=0.180, global_step=80.00, val_loss=0.120]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 8:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.322, v_num=0, reduced_train_loss=0.303, global_step=80.00, val_loss=0.131]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 8:  77%|███████▋  | 10/13 [00:17<00:05,  1.73s/it, loss=0.232, v_num=0, reduced_train_loss=0.180, global_step=80.00, val_loss=0.120]
Epoch 8:  77%|███████▋  | 10/13 [00:17<00:05,  1.74s/it, loss=0.322, v_num=0, reduced_train_loss=0.303, global_step=80.00, val_loss=0.131]
Epoch 8:  85%|████████▍ | 11/13 [00:18<00:03,  1.66s/it, loss=0.232, v_num=0, reduced_train_loss=0.180, global_step=80.00, val_loss=0.120]
Epoch 8:  85%|████████▍ | 11/13 [00:18<00:03,  1.67s/it, loss=0.322

I0601 18:04:25.348242 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.12506113946437836



Epoch 8:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=0.322, v_num=0, reduced_train_loss=0.303, global_step=80.00, val_loss=0.131]
Epoch 8: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.322, v_num=0, reduced_train_loss=0.303, global_step=80.00, val_loss=0.131]
2023-06-01 18:04:25,797 - root - INFO - val_loss: 0.1160048171877861
Epoch 8: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.322, v_num=0, reduced_train_loss=0.303, global_step=80.00, val_loss=0.116]
Epoch 9:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.322, v_num=0, reduced_train_loss=0.303, global_step=80.00, val_loss=0.116]         

I0601 18:04:25.797415 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1160048171877861


Epoch 9:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.208, v_num=0, reduced_train_loss=0.145, global_step=89.00, val_loss=0.125]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 9:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.279, v_num=0, reduced_train_loss=0.177, global_step=89.00, val_loss=0.116]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 9:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.208, v_num=0, reduced_train_loss=0.145, global_step=89.00, val_loss=0.125]
Epoch 9:  77%|███████▋  | 10/13 [00:17<00:05,  1.77s/it, loss=0.279, v_num=0, reduced_train_loss=0.177, global_step=89.00, val_loss=0.116]
Epoch 9:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.208, v_num=0, reduced_train_loss=0.145, global_step=89.00, val_loss=0.125]
Epoch 9:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.279

I0601 18:04:44.831470 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1013915166258812



Epoch 9:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.279, v_num=0, reduced_train_loss=0.177, global_step=89.00, val_loss=0.116]
Epoch 9: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.279, v_num=0, reduced_train_loss=0.177, global_step=89.00, val_loss=0.116]
2023-06-01 18:04:45,498 - root - INFO - val_loss: 0.11481466144323349
Epoch 9: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.279, v_num=0, reduced_train_loss=0.177, global_step=89.00, val_loss=0.115]
Epoch 10:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.279, v_num=0, reduced_train_loss=0.177, global_step=89.00, val_loss=0.115]        

I0601 18:04:45.498575 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.11481466144323349


Epoch 10:  69%|██████▉   | 9/13 [00:17<00:07,  1.93s/it, loss=0.185, v_num=0, reduced_train_loss=0.180, global_step=98.00, val_loss=0.101]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 10:  69%|██████▉   | 9/13 [00:17<00:07,  1.91s/it, loss=0.255, v_num=0, reduced_train_loss=0.184, global_step=98.00, val_loss=0.115]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 10:  77%|███████▋  | 10/13 [00:18<00:05,  1.83s/it, loss=0.185, v_num=0, reduced_train_loss=0.180, global_step=98.00, val_loss=0.101]
Epoch 10:  77%|███████▋  | 10/13 [00:18<00:05,  1.83s/it, loss=0.255, v_num=0, reduced_train_loss=0.184, global_step=98.00, val_loss=0.115]
Epoch 10:  85%|████████▍ | 11/13 [00:19<00:03,  1.75s/it, loss=0.185, v_num=0, reduced_train_loss=0.180, global_step=98.00, val_loss=0.101]
Epoch 10:  85%|████████▍ | 11/13 [00:19<00:03,  1.75s/it, loss

I0601 18:05:05.138164 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08494291454553604



Epoch 10:  92%|█████████▏| 12/13 [00:20<00:01,  1.68s/it, loss=0.255, v_num=0, reduced_train_loss=0.184, global_step=98.00, val_loss=0.115]
Epoch 10: 100%|██████████| 13/13 [00:20<00:00,  1.56s/it, loss=0.255, v_num=0, reduced_train_loss=0.184, global_step=98.00, val_loss=0.115]
2023-06-01 18:05:05,782 - root - INFO - val_loss: 0.15055806934833527
Epoch 10: 100%|██████████| 13/13 [00:20<00:00,  1.56s/it, loss=0.255, v_num=0, reduced_train_loss=0.184, global_step=98.00, val_loss=0.151]
Epoch 11:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.255, v_num=0, reduced_train_loss=0.184, global_step=98.00, val_loss=0.151]         

I0601 18:05:05.782470 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.15055806934833527


Epoch 11:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.175, v_num=0, reduced_train_loss=0.120, global_step=107.0, val_loss=0.0849]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 11:  69%|██████▉   | 9/13 [00:16<00:07,  1.88s/it, loss=0.219, v_num=0, reduced_train_loss=0.176, global_step=107.0, val_loss=0.151]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 11:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.175, v_num=0, reduced_train_loss=0.120, global_step=107.0, val_loss=0.0849]
Epoch 11:  77%|███████▋  | 10/13 [00:17<00:05,  1.80s/it, loss=0.219, v_num=0, reduced_train_loss=0.176, global_step=107.0, val_loss=0.151]
Epoch 11:  85%|████████▍ | 11/13 [00:19<00:03,  1.73s/it, loss=0.219, v_num=0, reduced_train_loss=0.176, global_step=107.0, val_loss

I0601 18:05:25.001159 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1008821427822113



Epoch 11:  92%|█████████▏| 12/13 [00:20<00:01,  1.67s/it, loss=0.219, v_num=0, reduced_train_loss=0.176, global_step=107.0, val_loss=0.151]
Epoch 11: 100%|██████████| 13/13 [00:20<00:00,  1.55s/it, loss=0.219, v_num=0, reduced_train_loss=0.176, global_step=107.0, val_loss=0.151]
2023-06-01 18:05:25,918 - root - INFO - val_loss: 0.134514719247818
Epoch 11: 100%|██████████| 13/13 [00:20<00:00,  1.55s/it, loss=0.219, v_num=0, reduced_train_loss=0.176, global_step=107.0, val_loss=0.135]
Epoch 12:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.219, v_num=0, reduced_train_loss=0.176, global_step=107.0, val_loss=0.135]         

I0601 18:05:25.918585 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.134514719247818


Epoch 12:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.157, v_num=0, reduced_train_loss=0.125, global_step=116.0, val_loss=0.101] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 12:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.199, v_num=0, reduced_train_loss=0.181, global_step=116.0, val_loss=0.135]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 12:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.157, v_num=0, reduced_train_loss=0.125, global_step=116.0, val_loss=0.101]
Epoch 12:  77%|███████▋  | 10/13 [00:17<00:05,  1.77s/it, loss=0.199, v_num=0, reduced_train_loss=0.181, global_step=116.0, val_loss=0.135]
Epoch 12:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.157, v_num=0, reduced_train_loss=0.125, global_step=116.0, val_loss=0

I0601 18:05:44.498316 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10436573624610901



Epoch 12:  85%|████████▍ | 11/13 [00:18<00:03,  1.72s/it, loss=0.199, v_num=0, reduced_train_loss=0.181, global_step=116.0, val_loss=0.135]
Epoch 12:  92%|█████████▏| 12/13 [00:19<00:01,  1.66s/it, loss=0.199, v_num=0, reduced_train_loss=0.181, global_step=116.0, val_loss=0.135]
Epoch 12: 100%|██████████| 13/13 [00:19<00:00,  1.54s/it, loss=0.199, v_num=0, reduced_train_loss=0.181, global_step=116.0, val_loss=0.135]
2023-06-01 18:05:45,919 - root - INFO - val_loss: 0.09570994228124619
Epoch 12: 100%|██████████| 13/13 [00:19<00:00,  1.54s/it, loss=0.199, v_num=0, reduced_train_loss=0.181, global_step=116.0, val_loss=0.0957]
Epoch 13:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.199, v_num=0, reduced_train_loss=0.181, global_step=116.0, val_loss=0.0957]         

I0601 18:05:45.919863 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09570994228124619


Epoch 13:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.128, v_num=0, reduced_train_loss=0.134, global_step=125.0, val_loss=0.104]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 13:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.194, v_num=0, reduced_train_loss=0.180, global_step=125.0, val_loss=0.0957]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 13:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.128, v_num=0, reduced_train_loss=0.134, global_step=125.0, val_loss=0.104]
Epoch 13:  77%|███████▋  | 10/13 [00:17<00:05,  1.76s/it, loss=0.194, v_num=0, reduced_train_loss=0.180, global_step=125.0, val_loss=0.0957]
Epoch 13:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.128, v_num=0, reduced_train_loss=0.134, global_step=125.0, val_loss=

I0601 18:06:04.107512 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09089447557926178



Epoch 13:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.194, v_num=0, reduced_train_loss=0.180, global_step=125.0, val_loss=0.0957]
Epoch 13:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.194, v_num=0, reduced_train_loss=0.180, global_step=125.0, val_loss=0.0957]
Epoch 13: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.194, v_num=0, reduced_train_loss=0.180, global_step=125.0, val_loss=0.0957]
2023-06-01 18:06:05,597 - root - INFO - val_loss: 0.11376456916332245
Epoch 13: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.194, v_num=0, reduced_train_loss=0.180, global_step=125.0, val_loss=0.114] 
Epoch 14:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.194, v_num=0, reduced_train_loss=0.180, global_step=125.0, val_loss=0.114]         

I0601 18:06:05.597873 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.11376456916332245


Epoch 14:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.126, v_num=0, reduced_train_loss=0.0841, global_step=134.0, val_loss=0.0909]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 14:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.175, v_num=0, reduced_train_loss=0.132, global_step=134.0, val_loss=0.114]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 14:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.126, v_num=0, reduced_train_loss=0.0841, global_step=134.0, val_loss=0.0909]
Epoch 14:  77%|███████▋  | 10/13 [00:17<00:05,  1.77s/it, loss=0.175, v_num=0, reduced_train_loss=0.132, global_step=134.0, val_loss=0.114]
Epoch 14:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.126, v_num=0, reduced_train_loss=0.0841, global_step=134.0, val_

I0601 18:06:23.620007 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07008767873048782



Epoch 14:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.175, v_num=0, reduced_train_loss=0.132, global_step=134.0, val_loss=0.114]
Epoch 14:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.175, v_num=0, reduced_train_loss=0.132, global_step=134.0, val_loss=0.114]
Epoch 14: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.175, v_num=0, reduced_train_loss=0.132, global_step=134.0, val_loss=0.114]
2023-06-01 18:06:25,328 - root - INFO - val_loss: 0.10617248713970184
Epoch 14: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.175, v_num=0, reduced_train_loss=0.132, global_step=134.0, val_loss=0.106]
Epoch 15:   8%|▊         | 1/13 [00:01<00:21,  1.82s/it, loss=0.128, v_num=0, reduced_train_loss=0.211, global_step=135.0, val_loss=0.0701] 

I0601 18:06:25.328665 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10617248713970184


Epoch 15:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.118, v_num=0, reduced_train_loss=0.132, global_step=143.0, val_loss=0.0701] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 15:  62%|██████▏   | 8/13 [00:14<00:09,  1.86s/it, loss=0.162, v_num=0, reduced_train_loss=0.191, global_step=142.0, val_loss=0.106]
Epoch 15:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.118, v_num=0, reduced_train_loss=0.132, global_step=143.0, val_loss=0.0701]
Epoch 15:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.163, v_num=0, reduced_train_loss=0.221, global_step=143.0, val_loss=0.106]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 15:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.118, v_num=0, reduced_train_loss=0.132, global_step=143.0, val_loss=0.0701]
Epoch 15: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it,

I0601 18:06:43.113163 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0541568323969841



Epoch 16:   8%|▊         | 1/13 [00:01<00:22,  1.90s/it, loss=0.112, v_num=0, reduced_train_loss=0.0454, global_step=144.0, val_loss=0.0542]
Epoch 15:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.163, v_num=0, reduced_train_loss=0.221, global_step=143.0, val_loss=0.106]
Epoch 15: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.163, v_num=0, reduced_train_loss=0.221, global_step=143.0, val_loss=0.106]
2023-06-01 18:06:45,240 - root - INFO - val_loss: 0.08977265655994415
Epoch 15: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.163, v_num=0, reduced_train_loss=0.221, global_step=143.0, val_loss=0.0898]
Epoch 16:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.163, v_num=0, reduced_train_loss=0.221, global_step=143.0, val_loss=0.0898]         

I0601 18:06:45.240375 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08977265655994415


Epoch 16:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.106, v_num=0, reduced_train_loss=0.129, global_step=152.0, val_loss=0.0542] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 16:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.154, v_num=0, reduced_train_loss=0.173, global_step=151.0, val_loss=0.0898]
Epoch 16:  77%|███████▋  | 10/13 [00:17<00:05,  1.77s/it, loss=0.106, v_num=0, reduced_train_loss=0.129, global_step=152.0, val_loss=0.0542]
Epoch 16:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.149, v_num=0, reduced_train_loss=0.111, global_step=152.0, val_loss=0.0898]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 16:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.106, v_num=0, reduced_train_loss=0.129, global_step=152.0, val_loss=0.0542]
Epoch 16: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it

I0601 18:07:02.919969 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0644853487610817



Epoch 17:   8%|▊         | 1/13 [00:01<00:22,  1.85s/it, loss=0.106, v_num=0, reduced_train_loss=0.0792, global_step=153.0, val_loss=0.0645]
Epoch 16:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.149, v_num=0, reduced_train_loss=0.111, global_step=152.0, val_loss=0.0898]
Epoch 16: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.149, v_num=0, reduced_train_loss=0.111, global_step=152.0, val_loss=0.0898]
2023-06-01 18:07:04,977 - root - INFO - val_loss: 0.09590989351272583
Epoch 16: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.149, v_num=0, reduced_train_loss=0.111, global_step=152.0, val_loss=0.0959]
Epoch 17:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.149, v_num=0, reduced_train_loss=0.111, global_step=152.0, val_loss=0.0959]         

I0601 18:07:04.977424 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09590989351272583


Epoch 17:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.106, v_num=0, reduced_train_loss=0.0784, global_step=161.0, val_loss=0.0645] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 17:  62%|██████▏   | 8/13 [00:14<00:09,  1.86s/it, loss=0.147, v_num=0, reduced_train_loss=0.0614, global_step=160.0, val_loss=0.0959]
Epoch 17:  77%|███████▋  | 10/13 [00:17<00:05,  1.74s/it, loss=0.106, v_num=0, reduced_train_loss=0.0784, global_step=161.0, val_loss=0.0645]
Epoch 17:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.142, v_num=0, reduced_train_loss=0.0589, global_step=161.0, val_loss=0.0959]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 17:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=0.106, v_num=0, reduced_train_loss=0.0784, global_step=161.0, val_loss=0.0645]
Epoch 17: 100%|██████████| 13/13 [00:19<00:00,  1.4

I0601 18:07:22.286593 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.051474712789058685



Epoch 17:  77%|███████▋  | 10/13 [00:17<00:05,  1.78s/it, loss=0.142, v_num=0, reduced_train_loss=0.0589, global_step=161.0, val_loss=0.0959]
Epoch 18:   8%|▊         | 1/13 [00:01<00:22,  1.89s/it, loss=0.106, v_num=0, reduced_train_loss=0.073, global_step=162.0, val_loss=0.0515]
Epoch 17:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.142, v_num=0, reduced_train_loss=0.0589, global_step=161.0, val_loss=0.0959]
Epoch 17: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.142, v_num=0, reduced_train_loss=0.0589, global_step=161.0, val_loss=0.0959]
2023-06-01 18:07:24,799 - root - INFO - val_loss: 0.08990500867366791
Epoch 17: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.142, v_num=0, reduced_train_loss=0.0589, global_step=161.0, val_loss=0.0899]
Epoch 18:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.142, v_num=0, reduced_train_loss=0.0589, global_step=161.0, val_loss=0.0899]         

I0601 18:07:24.799031 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08990500867366791


Epoch 18:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.102, v_num=0, reduced_train_loss=0.0874, global_step=170.0, val_loss=0.0515]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 18:  62%|██████▏   | 8/13 [00:14<00:09,  1.86s/it, loss=0.133, v_num=0, reduced_train_loss=0.155, global_step=169.0, val_loss=0.0899]
Epoch 18:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.136, v_num=0, reduced_train_loss=0.199, global_step=170.0, val_loss=0.0899]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 18:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=0.102, v_num=0, reduced_train_loss=0.0874, global_step=170.0, val_loss=0.0515]
Epoch 18: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.102, v_num=0, reduced_train_loss=0.0874, global_step=170.0, va

I0601 18:07:41.667587 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06365110725164413



Epoch 19:   8%|▊         | 1/13 [00:01<00:22,  1.91s/it, loss=0.0992, v_num=0, reduced_train_loss=0.0774, global_step=171.0, val_loss=0.0637]
Epoch 18:  85%|████████▍ | 11/13 [00:18<00:03,  1.71s/it, loss=0.136, v_num=0, reduced_train_loss=0.199, global_step=170.0, val_loss=0.0899]
Epoch 18:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.136, v_num=0, reduced_train_loss=0.199, global_step=170.0, val_loss=0.0899]
Epoch 18: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.136, v_num=0, reduced_train_loss=0.199, global_step=170.0, val_loss=0.0899]
2023-06-01 18:07:44,691 - root - INFO - val_loss: 0.08345378935337067
Epoch 18: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.136, v_num=0, reduced_train_loss=0.199, global_step=170.0, val_loss=0.0835]
Epoch 19:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.136, v_num=0, reduced_train_loss=0.199, global_step=170.0, val_loss=0.0835]         

I0601 18:07:44.691220 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08345378935337067


Epoch 19:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.0997, v_num=0, reduced_train_loss=0.0571, global_step=179.0, val_loss=0.0637]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 19:  62%|██████▏   | 8/13 [00:14<00:09,  1.84s/it, loss=0.13, v_num=0, reduced_train_loss=0.200, global_step=178.0, val_loss=0.0835]
Epoch 19:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.0997, v_num=0, reduced_train_loss=0.0571, global_step=179.0, val_loss=0.0637]
Epoch 19:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0997, v_num=0, reduced_train_loss=0.0571, global_step=179.0, val_loss=0.0637]
Epoch 19:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.13, v_num=0, reduced_train_loss=0.200, global_step=178.0, val_loss=0.0835]
2023-06-01 18:08:01,260 - root - INFO - val_loss: 0.05994737520813942
Epoch 19: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.099

I0601 18:08:01.260590 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05994737520813942



Epoch 20:   8%|▊         | 1/13 [00:01<00:22,  1.86s/it, loss=0.0973, v_num=0, reduced_train_loss=0.138, global_step=180.0, val_loss=0.0599] 
Epoch 19:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.131, v_num=0, reduced_train_loss=0.173, global_step=179.0, val_loss=0.0835]
Epoch 19:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.131, v_num=0, reduced_train_loss=0.173, global_step=179.0, val_loss=0.0835]
Epoch 19: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.131, v_num=0, reduced_train_loss=0.173, global_step=179.0, val_loss=0.0835]
2023-06-01 18:08:04,390 - root - INFO - val_loss: 0.07826338708400726
Epoch 19: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.131, v_num=0, reduced_train_loss=0.173, global_step=179.0, val_loss=0.0783]
Epoch 20:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.131, v_num=0, reduced_train_loss=0.173, global_step=179.0, val_loss=0.0783]         

I0601 18:08:04.390181 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07826338708400726


Epoch 20:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.101, v_num=0, reduced_train_loss=0.182, global_step=188.0, val_loss=0.0599]  
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 20:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.134, v_num=0, reduced_train_loss=0.148, global_step=187.0, val_loss=0.0783]
Epoch 20:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.101, v_num=0, reduced_train_loss=0.182, global_step=188.0, val_loss=0.0599]
Epoch 20:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.101, v_num=0, reduced_train_loss=0.182, global_step=188.0, val_loss=0.0599]
Epoch 20: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.101, v_num=0, reduced_train_loss=0.182, global_step=188.0, val_loss=0.0599]
2023-06-01 18:08:20,892 - root - INFO - val_loss: 0.08986059576272964
Epoch 20: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.101, v_

I0601 18:08:20.892435 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08986059576272964


Epoch 20:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.131, v_num=0, reduced_train_loss=0.0903, global_step=188.0, val_loss=0.0783]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 21:   8%|▊         | 1/13 [00:01<00:23,  1.93s/it, loss=0.11, v_num=0, reduced_train_loss=0.223, global_step=189.0, val_loss=0.0899]
Epoch 20:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.131, v_num=0, reduced_train_loss=0.0903, global_step=188.0, val_loss=0.0783]
Epoch 20:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.131, v_num=0, reduced_train_loss=0.0903, global_step=188.0, val_loss=0.0783]
Epoch 20: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.131, v_num=0, reduced_train_loss=0.0903, global_step=188.0, val_loss=0.0783]
2023-06-01 18:08:24,111 - root - INFO - val_loss: 0.09582322090864182
Epoch 20: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.131,

I0601 18:08:24.111446 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09582322090864182


Epoch 21:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.119, v_num=0, reduced_train_loss=0.111, global_step=197.0, val_loss=0.0899] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 21:  62%|██████▏   | 8/13 [00:14<00:09,  1.84s/it, loss=0.129, v_num=0, reduced_train_loss=0.126, global_step=196.0, val_loss=0.0958]
Epoch 21:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.119, v_num=0, reduced_train_loss=0.111, global_step=197.0, val_loss=0.0899]
Epoch 21:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.119, v_num=0, reduced_train_loss=0.111, global_step=197.0, val_loss=0.0899]
Epoch 21: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.119, v_num=0, reduced_train_loss=0.111, global_step=197.0, val_loss=0.0899]
2023-06-01 18:08:40,503 - root - INFO - val_loss: 0.05841206759214401
Epoch 21: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.119, v_n

I0601 18:08:40.503492 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05841206759214401


Epoch 21:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.123, v_num=0, reduced_train_loss=0.0722, global_step=197.0, val_loss=0.0958]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 22:   8%|▊         | 1/13 [00:01<00:22,  1.87s/it, loss=0.117, v_num=0, reduced_train_loss=0.102, global_step=198.0, val_loss=0.0584]
Epoch 21:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.123, v_num=0, reduced_train_loss=0.0722, global_step=197.0, val_loss=0.0958]
Epoch 21:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.123, v_num=0, reduced_train_loss=0.0722, global_step=197.0, val_loss=0.0958]
Epoch 21: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.123, v_num=0, reduced_train_loss=0.0722, global_step=197.0, val_loss=0.0958]
2023-06-01 18:08:43,687 - root - INFO - val_loss: 0.08032045513391495
Epoch 21: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.123,

I0601 18:08:43.687682 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08032045513391495


Epoch 22:  69%|██████▉   | 9/13 [00:16<00:07,  1.88s/it, loss=0.128, v_num=0, reduced_train_loss=0.116, global_step=206.0, val_loss=0.0584] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 22:  62%|██████▏   | 8/13 [00:14<00:09,  1.84s/it, loss=0.131, v_num=0, reduced_train_loss=0.136, global_step=205.0, val_loss=0.0803]
Epoch 22:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.132, v_num=0, reduced_train_loss=0.147, global_step=206.0, val_loss=0.0803]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 22:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.128, v_num=0, reduced_train_loss=0.116, global_step=206.0, val_loss=0.0584]
Epoch 22: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.128, v_num=0, reduced_train_loss=0.116, global_step=206.0, val_lo

I0601 18:09:00.423516 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.053931303322315216



Epoch 22:  77%|███████▋  | 10/13 [00:17<00:05,  1.76s/it, loss=0.132, v_num=0, reduced_train_loss=0.147, global_step=206.0, val_loss=0.0803]
Epoch 23:   8%|▊         | 1/13 [00:01<00:23,  1.95s/it, loss=0.125, v_num=0, reduced_train_loss=0.0787, global_step=207.0, val_loss=0.0539]
Epoch 22:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.132, v_num=0, reduced_train_loss=0.147, global_step=206.0, val_loss=0.0803]
Epoch 22: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.132, v_num=0, reduced_train_loss=0.147, global_step=206.0, val_loss=0.0803]
2023-06-01 18:09:03,338 - root - INFO - val_loss: 0.10133230686187744
Epoch 22: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.132, v_num=0, reduced_train_loss=0.147, global_step=206.0, val_loss=0.101] 
Epoch 23:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.132, v_num=0, reduced_train_loss=0.147, global_step=206.0, val_loss=0.101]         

I0601 18:09:03.338592 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.10133230686187744


Epoch 23:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.114, v_num=0, reduced_train_loss=0.0967, global_step=215.0, val_loss=0.0539]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 23:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.13, v_num=0, reduced_train_loss=0.0818, global_step=214.0, val_loss=0.101] 
Epoch 23:  77%|███████▋  | 10/13 [00:17<00:05,  1.78s/it, loss=0.114, v_num=0, reduced_train_loss=0.0967, global_step=215.0, val_loss=0.0539]
Epoch 23:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.129, v_num=0, reduced_train_loss=0.146, global_step=215.0, val_loss=0.101]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 23:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.114, v_num=0, reduced_train_loss=0.0967, global_step=215.0, val_loss=0.0539]
Epoch 23: 100%|██████████| 13/13 [00:19<00:00,  1.53s

I0601 18:09:20.305953 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.048957034945487976



Epoch 23:  77%|███████▋  | 10/13 [00:17<00:05,  1.78s/it, loss=0.129, v_num=0, reduced_train_loss=0.146, global_step=215.0, val_loss=0.101]
Epoch 24:   8%|▊         | 1/13 [00:01<00:22,  1.86s/it, loss=0.115, v_num=0, reduced_train_loss=0.141, global_step=216.0, val_loss=0.049] 
Epoch 23:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.129, v_num=0, reduced_train_loss=0.146, global_step=215.0, val_loss=0.101]
Epoch 23: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.129, v_num=0, reduced_train_loss=0.146, global_step=215.0, val_loss=0.101]
2023-06-01 18:09:23,219 - root - INFO - val_loss: 0.08367011696100235
Epoch 23: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.129, v_num=0, reduced_train_loss=0.146, global_step=215.0, val_loss=0.0837]
Epoch 24:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.129, v_num=0, reduced_train_loss=0.146, global_step=215.0, val_loss=0.0837]         

I0601 18:09:23.219537 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08367011696100235


Epoch 24:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.106, v_num=0, reduced_train_loss=0.0582, global_step=224.0, val_loss=0.049] 
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 24:  62%|██████▏   | 8/13 [00:15<00:09,  1.89s/it, loss=0.111, v_num=0, reduced_train_loss=0.113, global_step=223.0, val_loss=0.0837]
Epoch 24:  85%|████████▍ | 11/13 [00:18<00:03,  1.66s/it, loss=0.106, v_num=0, reduced_train_loss=0.0582, global_step=224.0, val_loss=0.049]
Epoch 24:  92%|█████████▏| 12/13 [00:19<00:01,  1.60s/it, loss=0.106, v_num=0, reduced_train_loss=0.0582, global_step=224.0, val_loss=0.049]
Epoch 24: 100%|██████████| 13/13 [00:19<00:00,  1.48s/it, loss=0.106, v_num=0, reduced_train_loss=0.0582, global_step=224.0, val_loss=0.049]
2023-06-01 18:09:39,627 - root - INFO - val_loss: 0.047372132539749146
Epoch 24: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.106, v_

I0601 18:09:39.627806 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.047372132539749146


Epoch 24:  69%|██████▉   | 9/13 [00:16<00:07,  1.88s/it, loss=0.108, v_num=0, reduced_train_loss=0.0782, global_step=224.0, val_loss=0.0837]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 25:   8%|▊         | 1/13 [00:01<00:23,  1.95s/it, loss=0.103, v_num=0, reduced_train_loss=0.122, global_step=225.0, val_loss=0.0474]
Epoch 24:  85%|████████▍ | 11/13 [00:18<00:03,  1.72s/it, loss=0.108, v_num=0, reduced_train_loss=0.0782, global_step=224.0, val_loss=0.0837]
Epoch 24:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.108, v_num=0, reduced_train_loss=0.0782, global_step=224.0, val_loss=0.0837]
Epoch 24: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.108, v_num=0, reduced_train_loss=0.0782, global_step=224.0, val_loss=0.0837]
2023-06-01 18:09:43,149 - root - INFO - val_loss: 0.06828837096691132
Epoch 24: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.108,

I0601 18:09:43.149822 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06828837096691132


Epoch 25:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.089, v_num=0, reduced_train_loss=0.090, global_step=233.0, val_loss=0.0474]  
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 25:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.1, v_num=0, reduced_train_loss=0.0598, global_step=232.0, val_loss=0.0683]  
Epoch 25:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.089, v_num=0, reduced_train_loss=0.090, global_step=233.0, val_loss=0.0474]
Epoch 25:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.089, v_num=0, reduced_train_loss=0.090, global_step=233.0, val_loss=0.0474]
Epoch 25: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.089, v_num=0, reduced_train_loss=0.090, global_step=233.0, val_loss=0.0474]
2023-06-01 18:09:59,349 - root - INFO - val_loss: 0.04980413615703583
Epoch 25: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.089, v_

I0601 18:09:59.349560 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04980413615703583


Epoch 25:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.104, v_num=0, reduced_train_loss=0.162, global_step=233.0, val_loss=0.0683]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 26:   8%|▊         | 1/13 [00:01<00:23,  1.92s/it, loss=0.0892, v_num=0, reduced_train_loss=0.0912, global_step=234.0, val_loss=0.0498]
Epoch 25:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.104, v_num=0, reduced_train_loss=0.162, global_step=233.0, val_loss=0.0683]
Epoch 25:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.104, v_num=0, reduced_train_loss=0.162, global_step=233.0, val_loss=0.0683]
Epoch 25: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.104, v_num=0, reduced_train_loss=0.162, global_step=233.0, val_loss=0.0683]
2023-06-01 18:10:02,883 - root - INFO - val_loss: 0.06904532015323639
Epoch 25: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.104, v_n

I0601 18:10:02.883841 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06904532015323639


Epoch 26:  69%|██████▉   | 9/13 [00:16<00:07,  1.87s/it, loss=0.0792, v_num=0, reduced_train_loss=0.0858, global_step=242.0, val_loss=0.0498]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 26:  62%|██████▏   | 8/13 [00:14<00:09,  1.83s/it, loss=0.097, v_num=0, reduced_train_loss=0.0436, global_step=241.0, val_loss=0.069]
Epoch 26:  85%|████████▍ | 11/13 [00:18<00:03,  1.71s/it, loss=0.0792, v_num=0, reduced_train_loss=0.0858, global_step=242.0, val_loss=0.0498]
Epoch 26:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0792, v_num=0, reduced_train_loss=0.0858, global_step=242.0, val_loss=0.0498]
Epoch 26: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0792, v_num=0, reduced_train_loss=0.0858, global_step=242.0, val_loss=0.0498]
2023-06-01 18:10:19,161 - root - INFO - val_loss: 0.0479462705552578
Epoch 26: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.

I0601 18:10:19.161563 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0479462705552578


Epoch 26:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.0974, v_num=0, reduced_train_loss=0.132, global_step=242.0, val_loss=0.069]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 27:   8%|▊         | 1/13 [00:01<00:22,  1.90s/it, loss=0.0769, v_num=0, reduced_train_loss=0.0758, global_step=243.0, val_loss=0.0479]
Epoch 26:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.0974, v_num=0, reduced_train_loss=0.132, global_step=242.0, val_loss=0.069]
Epoch 26:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0974, v_num=0, reduced_train_loss=0.132, global_step=242.0, val_loss=0.069]
Epoch 26: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0974, v_num=0, reduced_train_loss=0.132, global_step=242.0, val_loss=0.069]
2023-06-01 18:10:22,501 - root - INFO - val_loss: 0.08676748722791672
Epoch 26: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0974, v_

I0601 18:10:22.501736 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08676748722791672


Epoch 27:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0787, v_num=0, reduced_train_loss=0.0561, global_step=251.0, val_loss=0.0479]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 27:  62%|██████▏   | 8/13 [00:14<00:09,  1.83s/it, loss=0.0834, v_num=0, reduced_train_loss=0.0551, global_step=250.0, val_loss=0.0868]
Epoch 27:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0787, v_num=0, reduced_train_loss=0.0561, global_step=251.0, val_loss=0.0479]
Epoch 27:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.0787, v_num=0, reduced_train_loss=0.0561, global_step=251.0, val_loss=0.0479]
Epoch 27: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.0787, v_num=0, reduced_train_loss=0.0561, global_step=251.0, val_loss=0.0479]
2023-06-01 18:10:38,734 - root - INFO - val_loss: 0.060201387852430344
Epoch 27: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=

I0601 18:10:38.734540 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.060201387852430344


Epoch 27:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.0857, v_num=0, reduced_train_loss=0.0866, global_step=251.0, val_loss=0.0868]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 28:   8%|▊         | 1/13 [00:01<00:23,  1.99s/it, loss=0.0819, v_num=0, reduced_train_loss=0.112, global_step=252.0, val_loss=0.0602]
Epoch 27:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.0857, v_num=0, reduced_train_loss=0.0866, global_step=251.0, val_loss=0.0868]
Epoch 27:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.0857, v_num=0, reduced_train_loss=0.0866, global_step=251.0, val_loss=0.0868]
Epoch 27: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.0857, v_num=0, reduced_train_loss=0.0866, global_step=251.0, val_loss=0.0868]
2023-06-01 18:10:42,048 - root - INFO - val_loss: 0.06863566488027573
Epoch 27: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0

I0601 18:10:42.048174 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06863566488027573


Epoch 28:  69%|██████▉   | 9/13 [00:16<00:07,  1.88s/it, loss=0.0784, v_num=0, reduced_train_loss=0.0436, global_step=260.0, val_loss=0.0602]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 28:  62%|██████▏   | 8/13 [00:14<00:09,  1.84s/it, loss=0.08, v_num=0, reduced_train_loss=0.0942, global_step=259.0, val_loss=0.0686]
Epoch 28:  85%|████████▍ | 11/13 [00:18<00:03,  1.72s/it, loss=0.0784, v_num=0, reduced_train_loss=0.0436, global_step=260.0, val_loss=0.0602]
Epoch 28:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0774, v_num=0, reduced_train_loss=0.0831, global_step=260.0, val_loss=0.0686]
Validation: 0it [00:00, ?it/s]
Epoch 28: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.0784, v_num=0, reduced_train_loss=0.0436, global_step=260.0, val_loss=0.0602]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 

I0601 18:10:58.664318 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.042932722717523575



Epoch 29:   8%|▊         | 1/13 [00:01<00:21,  1.83s/it, loss=0.0778, v_num=0, reduced_train_loss=0.0278, global_step=261.0, val_loss=0.0429]
Epoch 28:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.0774, v_num=0, reduced_train_loss=0.0831, global_step=260.0, val_loss=0.0686]
Epoch 28:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0774, v_num=0, reduced_train_loss=0.0831, global_step=260.0, val_loss=0.0686]
Epoch 28: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0774, v_num=0, reduced_train_loss=0.0831, global_step=260.0, val_loss=0.0686]
2023-06-01 18:11:01,776 - root - INFO - val_loss: 0.07661425322294235
Epoch 28: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0774, v_num=0, reduced_train_loss=0.0831, global_step=260.0, val_loss=0.0766]
Epoch 29:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.0774, v_num=0, reduced_train_loss=0.0831, global_step=260.0, val_loss=0.0766]         

I0601 18:11:01.776426 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07661425322294235


Epoch 29:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.0774, v_num=0, reduced_train_loss=0.0934, global_step=269.0, val_loss=0.0429]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 29:  62%|██████▏   | 8/13 [00:14<00:09,  1.83s/it, loss=0.0807, v_num=0, reduced_train_loss=0.131, global_step=268.0, val_loss=0.0766] ]
Epoch 29:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0774, v_num=0, reduced_train_loss=0.0934, global_step=269.0, val_loss=0.0429]
Epoch 29:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.0774, v_num=0, reduced_train_loss=0.0934, global_step=269.0, val_loss=0.0429]
Epoch 29:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.0851, v_num=0, reduced_train_loss=0.177, global_step=269.0, val_loss=0.0766]
2023-06-01 18:11:18,241 - root - INFO - val_loss: 0.048060305416584015


I0601 18:11:18.241319 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.048060305416584015



Epoch 30:   8%|▊         | 1/13 [00:01<00:22,  1.85s/it, loss=0.0741, v_num=0, reduced_train_loss=0.0342, global_step=270.0, val_loss=0.0481]
Epoch 29:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0851, v_num=0, reduced_train_loss=0.177, global_step=269.0, val_loss=0.0766]
Epoch 29:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0851, v_num=0, reduced_train_loss=0.177, global_step=269.0, val_loss=0.0766]
Epoch 29: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0851, v_num=0, reduced_train_loss=0.177, global_step=269.0, val_loss=0.0766]
2023-06-01 18:11:21,429 - root - INFO - val_loss: 0.0540040023624897
Epoch 29: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0851, v_num=0, reduced_train_loss=0.177, global_step=269.0, val_loss=0.054] 
Epoch 30:   0%|          | 0/13 [00:00<?, ?it/s, loss=0.0851, v_num=0, reduced_train_loss=0.177, global_step=269.0, val_loss=0.054]         

I0601 18:11:21.429900 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0540040023624897


Epoch 30:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.0794, v_num=0, reduced_train_loss=0.114, global_step=278.0, val_loss=0.0481] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 30:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.0924, v_num=0, reduced_train_loss=0.069, global_step=277.0, val_loss=0.054]1]
Epoch 30:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.0794, v_num=0, reduced_train_loss=0.114, global_step=278.0, val_loss=0.0481]
Epoch 30:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=0.0794, v_num=0, reduced_train_loss=0.114, global_step=278.0, val_loss=0.0481]
Epoch 30: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.0794, v_num=0, reduced_train_loss=0.114, global_step=278.0, val_loss=0.0481]
2023-06-01 18:11:37,685 - root - INFO - val_loss: 0.03813297301530838

I0601 18:11:37.685600 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.03813297301530838


Epoch 30:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.0912, v_num=0, reduced_train_loss=0.0654, global_step=278.0, val_loss=0.054]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 31:   8%|▊         | 1/13 [00:01<00:22,  1.90s/it, loss=0.0781, v_num=0, reduced_train_loss=0.0434, global_step=279.0, val_loss=0.0381]
Epoch 30:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.0912, v_num=0, reduced_train_loss=0.0654, global_step=278.0, val_loss=0.054]
Epoch 30:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0912, v_num=0, reduced_train_loss=0.0654, global_step=278.0, val_loss=0.054]
Epoch 30: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0912, v_num=0, reduced_train_loss=0.0654, global_step=278.0, val_loss=0.054]
2023-06-01 18:11:41,187 - root - INFO - val_loss: 0.06673140823841095

I0601 18:11:41.187021 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06673140823841095


Epoch 31:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.0739, v_num=0, reduced_train_loss=0.0325, global_step=287.0, val_loss=0.0381]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 31:  54%|█████▍    | 7/13 [00:13<00:11,  1.86s/it, loss=0.0888, v_num=0, reduced_train_loss=0.126, global_step=285.0, val_loss=0.0667] 
Epoch 31:  77%|███████▋  | 10/13 [00:17<00:05,  1.74s/it, loss=0.0739, v_num=0, reduced_train_loss=0.0325, global_step=287.0, val_loss=0.0381]
Epoch 31:  62%|██████▏   | 8/13 [00:14<00:09,  1.86s/it, loss=0.0882, v_num=0, reduced_train_loss=0.0348, global_step=286.0, val_loss=0.0667]]
Epoch 31:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=0.0739, v_num=0, reduced_train_loss=0.0325, global_step=287.0, val_loss=0.0381]
Epoch 31: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.0739, v_num=0, reduced_train_loss=0.0325, global_step=287.0, val_loss=0.0381]2023-06-01 18:11:57,141 - root - INFO - val_loss: 0.038876

I0601 18:11:57.141323 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0388767346739769


Epoch 31:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.0875, v_num=0, reduced_train_loss=0.0822, global_step=287.0, val_loss=0.0667]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 32:   8%|▊         | 1/13 [00:01<00:22,  1.91s/it, loss=0.0762, v_num=0, reduced_train_loss=0.0778, global_step=288.0, val_loss=0.0389]]
Epoch 32:  15%|█▌        | 2/13 [00:03<00:20,  1.84s/it, loss=0.0761, v_num=0, reduced_train_loss=0.0906, global_step=289.0, val_loss=0.0389]]
Epoch 31:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.0875, v_num=0, reduced_train_loss=0.0822, global_step=287.0, val_loss=0.0667]
Epoch 31: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.0875, v_num=0, reduced_train_loss=0.0822, global_step=287.0, val_loss=0.0667]
2023-06-01 18:12:01,070 - root - INFO - val_loss: 0.06519195437431335

I0601 18:12:01.070364 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06519195437431335


Epoch 32:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.0765, v_num=0, reduced_train_loss=0.0361, global_step=296.0, val_loss=0.0389]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 32:  54%|█████▍    | 7/13 [00:12<00:11,  1.85s/it, loss=0.0818, v_num=0, reduced_train_loss=0.119, global_step=294.0, val_loss=0.0652] 
Epoch 32:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.0765, v_num=0, reduced_train_loss=0.0361, global_step=296.0, val_loss=0.0389]
Epoch 32:  62%|██████▏   | 8/13 [00:14<00:09,  1.86s/it, loss=0.0816, v_num=0, reduced_train_loss=0.081, global_step=295.0, val_loss=0.0652]9]
Epoch 32:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.0765, v_num=0, reduced_train_loss=0.0361, global_step=296.0, val_loss=0.0389]
Epoch 32: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.0765, v_num=0, reduced_train_loss=0.0361, global_step=296.0, val_loss=0.0389]2023-06-01 18:12:16,687 - root - INFO - val_loss: 0.039988

I0601 18:12:16.687155 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.03998851031064987


Epoch 32:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.08, v_num=0, reduced_train_loss=0.0315, global_step=296.0, val_loss=0.0652] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 33:   8%|▊         | 1/13 [00:01<00:22,  1.91s/it, loss=0.0724, v_num=0, reduced_train_loss=0.0399, global_step=297.0, val_loss=0.040]
Epoch 32:  77%|███████▋  | 10/13 [00:17<00:05,  1.78s/it, loss=0.08, v_num=0, reduced_train_loss=0.0315, global_step=296.0, val_loss=0.0652]
Epoch 33:  15%|█▌        | 2/13 [00:03<00:20,  1.86s/it, loss=0.0678, v_num=0, reduced_train_loss=0.0214, global_step=298.0, val_loss=0.040]
Epoch 32:  92%|█████████▏| 12/13 [00:19<00:01,  1.66s/it, loss=0.08, v_num=0, reduced_train_loss=0.0315, global_step=296.0, val_loss=0.0652]
Epoch 32: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.08, v_num=0, reduced_train_loss=0.0315, global_step=296.0, val_loss=0.0652]2023-06-01 18:12:21,023 - root - INFO - val_loss: 0.0751685351133346

I0601 18:12:21.023462 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07516853511333466


Epoch 33:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0737, v_num=0, reduced_train_loss=0.0592, global_step=305.0, val_loss=0.040] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 33:  54%|█████▍    | 7/13 [00:12<00:11,  1.85s/it, loss=0.0819, v_num=0, reduced_train_loss=0.0401, global_step=303.0, val_loss=0.0752]
Epoch 33:  77%|███████▋  | 10/13 [00:17<00:05,  1.76s/it, loss=0.0737, v_num=0, reduced_train_loss=0.0592, global_step=305.0, val_loss=0.040]
Epoch 33:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.0857, v_num=0, reduced_train_loss=0.133, global_step=304.0, val_loss=0.0752] 
Epoch 33:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0737, v_num=0, reduced_train_loss=0.0592, global_step=305.0, val_loss=0.040]
Epoch 33: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0737, v_num=0, reduced_train_loss=0.0592, global_step=305.0, val_loss=0.040]2023-06-01 18:12:36,355 - root - INFO - val_loss: 0.0438085198

I0601 18:12:36.355930 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04380851984024048


Epoch 33:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.0855, v_num=0, reduced_train_loss=0.122, global_step=305.0, val_loss=0.0752]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 34:   8%|▊         | 1/13 [00:01<00:22,  1.86s/it, loss=0.0732, v_num=0, reduced_train_loss=0.0286, global_step=306.0, val_loss=0.0438]
Epoch 33:  77%|███████▋  | 10/13 [00:17<00:05,  1.77s/it, loss=0.0855, v_num=0, reduced_train_loss=0.122, global_step=305.0, val_loss=0.0752]
Epoch 34:  15%|█▌        | 2/13 [00:03<00:20,  1.88s/it, loss=0.0739, v_num=0, reduced_train_loss=0.0478, global_step=307.0, val_loss=0.0438]
Epoch 33:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0855, v_num=0, reduced_train_loss=0.122, global_step=305.0, val_loss=0.0752]
Epoch 33: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0855, v_num=0, reduced_train_loss=0.122, global_step=305.0, val_loss=0.0752]2023-06-01 18:12:40,794 - root - INFO - val_loss: 0.10678802430

I0601 18:12:40.794283 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.1067880243062973


Epoch 34:  69%|██████▉   | 9/13 [00:16<00:07,  1.87s/it, loss=0.0669, v_num=0, reduced_train_loss=0.0813, global_step=314.0, val_loss=0.0438]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 34:  54%|█████▍    | 7/13 [00:12<00:11,  1.84s/it, loss=0.0818, v_num=0, reduced_train_loss=0.0615, global_step=312.0, val_loss=0.107]
Epoch 34:  77%|███████▋  | 10/13 [00:17<00:05,  1.78s/it, loss=0.0669, v_num=0, reduced_train_loss=0.0813, global_step=314.0, val_loss=0.0438]
Epoch 34:  62%|██████▏   | 8/13 [00:14<00:09,  1.84s/it, loss=0.0829, v_num=0, reduced_train_loss=0.0735, global_step=313.0, val_loss=0.107]8]
Epoch 34:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0669, v_num=0, reduced_train_loss=0.0813, global_step=314.0, val_loss=0.0438]
Epoch 34: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0669, v_num=0, reduced_train_loss=0.0813, global_step=314.0, val_loss=0.0438]2023-06-01 18:12:56,089 - root - INFO - val_loss: 0.0300088

I0601 18:12:56.089313 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.030008822679519653


Epoch 34:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.0799, v_num=0, reduced_train_loss=0.0589, global_step=314.0, val_loss=0.107]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 35:   8%|▊         | 1/13 [00:01<00:22,  1.89s/it, loss=0.0659, v_num=0, reduced_train_loss=0.0332, global_step=315.0, val_loss=0.030]
Epoch 34:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.0799, v_num=0, reduced_train_loss=0.0589, global_step=314.0, val_loss=0.107]
Epoch 35:  15%|█▌        | 2/13 [00:03<00:20,  1.85s/it, loss=0.0681, v_num=0, reduced_train_loss=0.081, global_step=316.0, val_loss=0.030] ]
Epoch 34:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0799, v_num=0, reduced_train_loss=0.0589, global_step=314.0, val_loss=0.107]
Epoch 34: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0799, v_num=0, reduced_train_loss=0.0589, global_step=314.0, val_loss=0.107]2023-06-01 18:13:00,390 - root - INFO - val_loss: 0.075810529291

I0601 18:13:00.390986 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07581052929162979


Epoch 35:  69%|██████▉   | 9/13 [00:16<00:07,  1.88s/it, loss=0.0715, v_num=0, reduced_train_loss=0.0472, global_step=323.0, val_loss=0.030]]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 35:  54%|█████▍    | 7/13 [00:12<00:11,  1.84s/it, loss=0.0796, v_num=0, reduced_train_loss=0.0427, global_step=321.0, val_loss=0.0758]
Epoch 35:  77%|███████▋  | 10/13 [00:17<00:05,  1.80s/it, loss=0.0715, v_num=0, reduced_train_loss=0.0472, global_step=323.0, val_loss=0.030]
Epoch 35:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.0773, v_num=0, reduced_train_loss=0.0727, global_step=322.0, val_loss=0.0758]
Epoch 35:  92%|█████████▏| 12/13 [00:19<00:01,  1.65s/it, loss=0.0715, v_num=0, reduced_train_loss=0.0472, global_step=323.0, val_loss=0.030]
Epoch 35: 100%|██████████| 13/13 [00:19<00:00,  1.53s/it, loss=0.0715, v_num=0, reduced_train_loss=0.0472, global_step=323.0, val_loss=0.030]2023-06-01 18:13:16,011 - root - INFO - val_loss: 0.0405372604

I0601 18:13:16.011679 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.040537260472774506


Epoch 35:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0778, v_num=0, reduced_train_loss=0.0502, global_step=323.0, val_loss=0.0758]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 36:   8%|▊         | 1/13 [00:01<00:22,  1.84s/it, loss=0.0654, v_num=0, reduced_train_loss=0.138, global_step=324.0, val_loss=0.0405] 
Epoch 35:  77%|███████▋  | 10/13 [00:17<00:05,  1.76s/it, loss=0.0778, v_num=0, reduced_train_loss=0.0502, global_step=323.0, val_loss=0.0758]
Epoch 36:  15%|█▌        | 2/13 [00:03<00:20,  1.86s/it, loss=0.0691, v_num=0, reduced_train_loss=0.133, global_step=325.0, val_loss=0.0405]8]
Epoch 35:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0778, v_num=0, reduced_train_loss=0.0502, global_step=323.0, val_loss=0.0758]
Epoch 35: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0778, v_num=0, reduced_train_loss=0.0502, global_step=323.0, val_loss=0.0758]2023-06-01 18:13:20,019 - root - INFO - val_loss: 0.081948

I0601 18:13:20.019932 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08194859325885773


Epoch 36:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0676, v_num=0, reduced_train_loss=0.0203, global_step=332.0, val_loss=0.0405]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 36:  54%|█████▍    | 7/13 [00:12<00:11,  1.85s/it, loss=0.08, v_num=0, reduced_train_loss=0.0589, global_step=330.0, val_loss=0.0819] 
Epoch 36:  77%|███████▋  | 10/13 [00:17<00:05,  1.76s/it, loss=0.0676, v_num=0, reduced_train_loss=0.0203, global_step=332.0, val_loss=0.0405]
Epoch 36:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.0785, v_num=0, reduced_train_loss=0.0755, global_step=331.0, val_loss=0.0819]]
Epoch 36:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0676, v_num=0, reduced_train_loss=0.0203, global_step=332.0, val_loss=0.0405]
Epoch 36: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0676, v_num=0, reduced_train_loss=0.0203, global_step=332.0, val_loss=0.0405]2023-06-01 18:13:35,634 - root - INFO - val_loss: 0.0415136

I0601 18:13:35.634619 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.041513603180646896


Epoch 36:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.0773, v_num=0, reduced_train_loss=0.038, global_step=332.0, val_loss=0.0819] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 37:   8%|▊         | 1/13 [00:01<00:22,  1.88s/it, loss=0.0699, v_num=0, reduced_train_loss=0.0638, global_step=333.0, val_loss=0.0415]
Epoch 36:  77%|███████▋  | 10/13 [00:17<00:05,  1.77s/it, loss=0.0773, v_num=0, reduced_train_loss=0.038, global_step=332.0, val_loss=0.0819]
Epoch 37:  15%|█▌        | 2/13 [00:03<00:20,  1.84s/it, loss=0.0711, v_num=0, reduced_train_loss=0.105, global_step=334.0, val_loss=0.0415] 
Epoch 36:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0773, v_num=0, reduced_train_loss=0.038, global_step=332.0, val_loss=0.0819]
Epoch 36: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0773, v_num=0, reduced_train_loss=0.038, global_step=332.0, val_loss=0.0819]2023-06-01 18:13:39,710 - root - INFO - val_loss: 0.0609568208

I0601 18:13:39.710510 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06095682084560394


Epoch 37:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.0694, v_num=0, reduced_train_loss=0.0426, global_step=341.0, val_loss=0.0415]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 37:  54%|█████▍    | 7/13 [00:12<00:10,  1.82s/it, loss=0.0741, v_num=0, reduced_train_loss=0.0579, global_step=339.0, val_loss=0.061]
Epoch 37:  62%|██████▏   | 8/13 [00:14<00:09,  1.82s/it, loss=0.0746, v_num=0, reduced_train_loss=0.107, global_step=340.0, val_loss=0.061] 5]
Epoch 37:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.0694, v_num=0, reduced_train_loss=0.0426, global_step=341.0, val_loss=0.0415]
Epoch 37:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0694, v_num=0, reduced_train_loss=0.0426, global_step=341.0, val_loss=0.0415]
Epoch 37: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0694, v_num=0, reduced_train_loss=0.0426, global_step=341.0, val_loss=0.0415]2023-06-01 18:13:55,351 - root - INFO - val_loss: 0.0346840

I0601 18:13:55.351548 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.034684065729379654


Epoch 37:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.0762, v_num=0, reduced_train_loss=0.0755, global_step=341.0, val_loss=0.061]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 38:   8%|▊         | 1/13 [00:01<00:22,  1.87s/it, loss=0.0665, v_num=0, reduced_train_loss=0.0531, global_step=342.0, val_loss=0.0347]
Epoch 37:  77%|███████▋  | 10/13 [00:17<00:05,  1.75s/it, loss=0.0762, v_num=0, reduced_train_loss=0.0755, global_step=341.0, val_loss=0.061]
Epoch 38:  15%|█▌        | 2/13 [00:03<00:20,  1.83s/it, loss=0.0666, v_num=0, reduced_train_loss=0.0488, global_step=343.0, val_loss=0.0347]
Epoch 37:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0762, v_num=0, reduced_train_loss=0.0755, global_step=341.0, val_loss=0.061]
Epoch 37: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0762, v_num=0, reduced_train_loss=0.0755, global_step=341.0, val_loss=0.061]2023-06-01 18:13:59,321 - root - INFO - val_loss: 0.06927379220

I0601 18:13:59.321932 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06927379220724106


Epoch 38:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.0529, v_num=0, reduced_train_loss=0.0907, global_step=350.0, val_loss=0.0347]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 38:  54%|█████▍    | 7/13 [00:13<00:11,  1.91s/it, loss=0.0727, v_num=0, reduced_train_loss=0.129, global_step=348.0, val_loss=0.0693] 
Epoch 38:  77%|███████▋  | 10/13 [00:17<00:05,  1.74s/it, loss=0.0529, v_num=0, reduced_train_loss=0.0907, global_step=350.0, val_loss=0.0347]
Epoch 38:  62%|██████▏   | 8/13 [00:15<00:09,  1.90s/it, loss=0.0706, v_num=0, reduced_train_loss=0.0833, global_step=349.0, val_loss=0.0693]]
Epoch 38:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.0529, v_num=0, reduced_train_loss=0.0907, global_step=350.0, val_loss=0.0347]
Epoch 38: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.0529, v_num=0, reduced_train_loss=0.0907, global_step=350.0, val_loss=0.0347]2023-06-01 18:14:14,897 - root - INFO - val_loss: 0.039365

I0601 18:14:14.897034 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.03936554864048958


Epoch 38:  69%|██████▉   | 9/13 [00:17<00:07,  1.89s/it, loss=0.073, v_num=0, reduced_train_loss=0.109, global_step=350.0, val_loss=0.0693]  
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 39:   8%|▊         | 1/13 [00:01<00:23,  1.94s/it, loss=0.052, v_num=0, reduced_train_loss=0.0136, global_step=351.0, val_loss=0.0394] 
Epoch 38:  77%|███████▋  | 10/13 [00:18<00:05,  1.80s/it, loss=0.073, v_num=0, reduced_train_loss=0.109, global_step=350.0, val_loss=0.0693]
Epoch 39:  15%|█▌        | 2/13 [00:03<00:21,  1.91s/it, loss=0.0547, v_num=0, reduced_train_loss=0.0746, global_step=352.0, val_loss=0.0394]
Epoch 38:  92%|█████████▏| 12/13 [00:19<00:01,  1.67s/it, loss=0.073, v_num=0, reduced_train_loss=0.109, global_step=350.0, val_loss=0.0693]
Epoch 38: 100%|██████████| 13/13 [00:20<00:00,  1.54s/it, loss=0.073, v_num=0, reduced_train_loss=0.109, global_step=350.0, val_loss=0.0693]2023-06-01 18:14:19,396 - root - INFO - val_loss: 0.0936936959624

I0601 18:14:19.396110 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09369369596242905


Epoch 39:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.0535, v_num=0, reduced_train_loss=0.0834, global_step=359.0, val_loss=0.0394]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 39:  54%|█████▍    | 7/13 [00:12<00:11,  1.84s/it, loss=0.0748, v_num=0, reduced_train_loss=0.0682, global_step=357.0, val_loss=0.0937]
Epoch 39:  77%|███████▋  | 10/13 [00:17<00:05,  1.76s/it, loss=0.0535, v_num=0, reduced_train_loss=0.0834, global_step=359.0, val_loss=0.0394]
Epoch 39:  62%|██████▏   | 8/13 [00:14<00:09,  1.84s/it, loss=0.0715, v_num=0, reduced_train_loss=0.0745, global_step=358.0, val_loss=0.0937]]
Epoch 39:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0535, v_num=0, reduced_train_loss=0.0834, global_step=359.0, val_loss=0.0394]
Epoch 39: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0535, v_num=0, reduced_train_loss=0.0834, global_step=359.0, val_loss=0.0394]2023-06-01 18:14:34,532 - root - INFO - val_loss: 0.047257

I0601 18:14:34.532256 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.047257740050554276


Epoch 39:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0714, v_num=0, reduced_train_loss=0.0559, global_step=359.0, val_loss=0.0937]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 40:   8%|▊         | 1/13 [00:01<00:22,  1.86s/it, loss=0.054, v_num=0, reduced_train_loss=0.0527, global_step=360.0, val_loss=0.0473] 
Epoch 39:  77%|███████▋  | 10/13 [00:17<00:05,  1.76s/it, loss=0.0714, v_num=0, reduced_train_loss=0.0559, global_step=359.0, val_loss=0.0937]
Epoch 40:  15%|█▌        | 2/13 [00:03<00:20,  1.86s/it, loss=0.0533, v_num=0, reduced_train_loss=0.0273, global_step=361.0, val_loss=0.0473]]
Epoch 39:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0714, v_num=0, reduced_train_loss=0.0559, global_step=359.0, val_loss=0.0937]
Epoch 39: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0714, v_num=0, reduced_train_loss=0.0559, global_step=359.0, val_loss=0.0937]2023-06-01 18:14:39,029 - root - INFO - val_loss: 0.084085

I0601 18:14:39.029793 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08408527076244354


Epoch 40:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.056, v_num=0, reduced_train_loss=0.0815, global_step=368.0, val_loss=0.0473] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 40:  54%|█████▍    | 7/13 [00:12<00:11,  1.85s/it, loss=0.064, v_num=0, reduced_train_loss=0.0315, global_step=366.0, val_loss=0.0841] 
Epoch 40:  77%|███████▋  | 10/13 [00:17<00:05,  1.77s/it, loss=0.056, v_num=0, reduced_train_loss=0.0815, global_step=368.0, val_loss=0.0473]
Epoch 40:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.0642, v_num=0, reduced_train_loss=0.038, global_step=367.0, val_loss=0.0841]]
Epoch 40:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.056, v_num=0, reduced_train_loss=0.0815, global_step=368.0, val_loss=0.0473]
Epoch 40: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.056, v_num=0, reduced_train_loss=0.0815, global_step=368.0, val_loss=0.0473]2023-06-01 18:14:54,233 - root - INFO - val_loss: 0.0433248877

I0601 18:14:54.233572 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04332488775253296


Epoch 40:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.0632, v_num=0, reduced_train_loss=0.109, global_step=368.0, val_loss=0.0841]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 41:   8%|▊         | 1/13 [00:01<00:22,  1.86s/it, loss=0.0557, v_num=0, reduced_train_loss=0.0275, global_step=369.0, val_loss=0.0433]
Epoch 40:  77%|███████▋  | 10/13 [00:17<00:05,  1.77s/it, loss=0.0632, v_num=0, reduced_train_loss=0.109, global_step=368.0, val_loss=0.0841]
Epoch 41:  15%|█▌        | 2/13 [00:03<00:20,  1.84s/it, loss=0.0545, v_num=0, reduced_train_loss=0.0661, global_step=370.0, val_loss=0.0433]
Epoch 40:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0632, v_num=0, reduced_train_loss=0.109, global_step=368.0, val_loss=0.0841]
Epoch 40: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0632, v_num=0, reduced_train_loss=0.109, global_step=368.0, val_loss=0.0841]2023-06-01 18:14:58,833 - root - INFO - val_loss: 0.06607980281

I0601 18:14:58.833198 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06607980281114578


Epoch 41:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.0544, v_num=0, reduced_train_loss=0.0153, global_step=377.0, val_loss=0.0433] 
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 41:  54%|█████▍    | 7/13 [00:12<00:11,  1.85s/it, loss=0.0596, v_num=0, reduced_train_loss=0.0207, global_step=375.0, val_loss=0.0661]]
Epoch 41:  62%|██████▏   | 8/13 [00:14<00:09,  1.85s/it, loss=0.0627, v_num=0, reduced_train_loss=0.110, global_step=376.0, val_loss=0.0661] ]
Epoch 41:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.0544, v_num=0, reduced_train_loss=0.0153, global_step=377.0, val_loss=0.0433]
Epoch 41: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.0544, v_num=0, reduced_train_loss=0.0153, global_step=377.0, val_loss=0.0433]
2023-06-01 18:15:13,729 - root - INFO - val_loss: 0.04410412162542343

I0601 18:15:13.729714 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04410412162542343


Epoch 41:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.061, v_num=0, reduced_train_loss=0.0339, global_step=377.0, val_loss=0.0661]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 42:   8%|▊         | 1/13 [00:01<00:22,  1.87s/it, loss=0.0509, v_num=0, reduced_train_loss=0.0284, global_step=378.0, val_loss=0.0441]
Epoch 42:  15%|█▌        | 2/13 [00:03<00:20,  1.85s/it, loss=0.0478, v_num=0, reduced_train_loss=0.0225, global_step=379.0, val_loss=0.0441]
Epoch 41:  85%|████████▍ | 11/13 [00:18<00:03,  1.70s/it, loss=0.061, v_num=0, reduced_train_loss=0.0339, global_step=377.0, val_loss=0.0661]
Epoch 41:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.061, v_num=0, reduced_train_loss=0.0339, global_step=377.0, val_loss=0.0661]
Epoch 41: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.061, v_num=0, reduced_train_loss=0.0339, global_step=377.0, val_loss=0.0661]2023-06-01 18:15:18,561 - root - INFO - val_loss: 0.06102567911

I0601 18:15:18.561875 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06102567911148071


Epoch 42:  69%|██████▉   | 9/13 [00:16<00:07,  1.81s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0509, global_step=386.0, val_loss=0.0441]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 42:  54%|█████▍    | 7/13 [00:12<00:11,  1.84s/it, loss=0.0593, v_num=0, reduced_train_loss=0.0613, global_step=384.0, val_loss=0.061]1]
Epoch 42:  85%|████████▍ | 11/13 [00:18<00:03,  1.66s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0509, global_step=386.0, val_loss=0.0441]
Epoch 42:  92%|█████████▏| 12/13 [00:19<00:01,  1.60s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0509, global_step=386.0, val_loss=0.0441]
Epoch 42: 100%|██████████| 13/13 [00:19<00:00,  1.48s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0509, global_step=386.0, val_loss=0.0441]
2023-06-01 18:15:33,033 - root - INFO - val_loss: 0.04054192453622818

I0601 18:15:33.033331 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04054192453622818


Epoch 42:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0583, v_num=0, reduced_train_loss=0.0547, global_step=386.0, val_loss=0.061]]
Validation: 0it [00:00, ?it/s][A
Validation:   0%|          | 0/4 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s][A
Epoch 43:  15%|█▌        | 2/13 [00:03<00:20,  1.83s/it, loss=0.0497, v_num=0, reduced_train_loss=0.0811, global_step=388.0, val_loss=0.0405]
Epoch 42:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0583, v_num=0, reduced_train_loss=0.0547, global_step=386.0, val_loss=0.061]
Epoch 42:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0583, v_num=0, reduced_train_loss=0.0547, global_step=386.0, val_loss=0.061]
Epoch 42: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0583, v_num=0, reduced_train_loss=0.0547, global_step=386.0, val_loss=0.061]
2023-06-01 18:15:38,232 - root - INFO - val_loss: 0.08367002010345459

I0601 18:15:38.232507 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08367002010345459


Epoch 43:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0437, v_num=0, reduced_train_loss=0.0472, global_step=395.0, val_loss=0.0405]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 43:  54%|█████▍    | 7/13 [00:12<00:11,  1.84s/it, loss=0.0515, v_num=0, reduced_train_loss=0.011, global_step=393.0, val_loss=0.0837]
Epoch 43:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.0437, v_num=0, reduced_train_loss=0.0472, global_step=395.0, val_loss=0.0405]
Epoch 43:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0437, v_num=0, reduced_train_loss=0.0472, global_step=395.0, val_loss=0.0405]
Epoch 43: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0437, v_num=0, reduced_train_loss=0.0472, global_step=395.0, val_loss=0.0405]
2023-06-01 18:15:52,651 - root - INFO - val_loss: 0.04500887542963028

I0601 18:15:52.651866 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04500887542963028


Epoch 43:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.05, v_num=0, reduced_train_loss=0.0398, global_step=395.0, val_loss=0.0837]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 44:  15%|█▌        | 2/13 [00:03<00:19,  1.79s/it, loss=0.0462, v_num=0, reduced_train_loss=0.0555, global_step=397.0, val_loss=0.045]
Epoch 43:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.05, v_num=0, reduced_train_loss=0.0398, global_step=395.0, val_loss=0.0837]
Epoch 43:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.05, v_num=0, reduced_train_loss=0.0398, global_step=395.0, val_loss=0.0837]
Epoch 43: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.05, v_num=0, reduced_train_loss=0.0398, global_step=395.0, val_loss=0.0837]
2023-06-01 18:15:57,804 - root - INFO - val_loss: 0.08908981084823608

I0601 18:15:57.804195 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08908981084823608


Epoch 44:  69%|██████▉   | 9/13 [00:16<00:07,  1.82s/it, loss=0.0469, v_num=0, reduced_train_loss=0.0408, global_step=404.0, val_loss=0.045]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 44:  54%|█████▍    | 7/13 [00:12<00:11,  1.85s/it, loss=0.0491, v_num=0, reduced_train_loss=0.0444, global_step=402.0, val_loss=0.0891]
Epoch 44:  85%|████████▍ | 11/13 [00:18<00:03,  1.67s/it, loss=0.0469, v_num=0, reduced_train_loss=0.0408, global_step=404.0, val_loss=0.045]
Epoch 44:  92%|█████████▏| 12/13 [00:19<00:01,  1.61s/it, loss=0.0469, v_num=0, reduced_train_loss=0.0408, global_step=404.0, val_loss=0.045]
Epoch 44: 100%|██████████| 13/13 [00:19<00:00,  1.49s/it, loss=0.0469, v_num=0, reduced_train_loss=0.0408, global_step=404.0, val_loss=0.045]
2023-06-01 18:16:12,005 - root - INFO - val_loss: 0.04859080910682678

I0601 18:16:12.005881 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04859080910682678


Epoch 44:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0134, global_step=404.0, val_loss=0.0891]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 45:  15%|█▌        | 2/13 [00:03<00:20,  1.85s/it, loss=0.0459, v_num=0, reduced_train_loss=0.0184, global_step=406.0, val_loss=0.0486]
Epoch 44:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0134, global_step=404.0, val_loss=0.0891]
Epoch 44:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0134, global_step=404.0, val_loss=0.0891]
Epoch 44: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0134, global_step=404.0, val_loss=0.0891]
2023-06-01 18:16:17,503 - root - INFO - val_loss: 0.08786755800247192

I0601 18:16:17.503418 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08786755800247192


Epoch 45:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.0513, v_num=0, reduced_train_loss=0.0511, global_step=413.0, val_loss=0.0486]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 45:  54%|█████▍    | 7/13 [00:12<00:11,  1.83s/it, loss=0.0493, v_num=0, reduced_train_loss=0.0886, global_step=411.0, val_loss=0.0879]
Epoch 45:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0513, v_num=0, reduced_train_loss=0.0511, global_step=413.0, val_loss=0.0486]
Epoch 45:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0513, v_num=0, reduced_train_loss=0.0511, global_step=413.0, val_loss=0.0486]
Epoch 45: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0513, v_num=0, reduced_train_loss=0.0511, global_step=413.0, val_loss=0.0486]
2023-06-01 18:16:31,734 - root - INFO - val_loss: 0.03960010036826134

I0601 18:16:31.734048 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.03960010036826134


Epoch 45:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0486, v_num=0, reduced_train_loss=0.0344, global_step=413.0, val_loss=0.0879]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 46:  15%|█▌        | 2/13 [00:03<00:20,  1.86s/it, loss=0.0494, v_num=0, reduced_train_loss=0.0452, global_step=415.0, val_loss=0.0396]
Epoch 45:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0486, v_num=0, reduced_train_loss=0.0344, global_step=413.0, val_loss=0.0879]
Epoch 45:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0486, v_num=0, reduced_train_loss=0.0344, global_step=413.0, val_loss=0.0879]
Epoch 45: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0486, v_num=0, reduced_train_loss=0.0344, global_step=413.0, val_loss=0.0879]
2023-06-01 18:16:37,168 - root - INFO - val_loss: 0.06906101107597351

I0601 18:16:37.168536 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.06906101107597351


Epoch 46:  69%|██████▉   | 9/13 [00:16<00:07,  1.86s/it, loss=0.0601, v_num=0, reduced_train_loss=0.130, global_step=422.0, val_loss=0.0396]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 46:  54%|█████▍    | 7/13 [00:12<00:11,  1.84s/it, loss=0.0473, v_num=0, reduced_train_loss=0.0119, global_step=420.0, val_loss=0.0691]
Epoch 46:  85%|████████▍ | 11/13 [00:18<00:03,  1.71s/it, loss=0.0601, v_num=0, reduced_train_loss=0.130, global_step=422.0, val_loss=0.0396]
Epoch 46:  92%|█████████▏| 12/13 [00:19<00:01,  1.64s/it, loss=0.0601, v_num=0, reduced_train_loss=0.130, global_step=422.0, val_loss=0.0396]
Epoch 46: 100%|██████████| 13/13 [00:19<00:00,  1.52s/it, loss=0.0601, v_num=0, reduced_train_loss=0.130, global_step=422.0, val_loss=0.0396]
2023-06-01 18:16:51,538 - root - INFO - val_loss: 0.04582426697015762

I0601 18:16:51.538048 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04582426697015762


Epoch 46:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0467, v_num=0, reduced_train_loss=0.0376, global_step=422.0, val_loss=0.0691]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 47:  15%|█▌        | 2/13 [00:03<00:20,  1.85s/it, loss=0.0581, v_num=0, reduced_train_loss=0.0464, global_step=424.0, val_loss=0.0458]
Epoch 46:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.0467, v_num=0, reduced_train_loss=0.0376, global_step=422.0, val_loss=0.0691]
Epoch 46:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0467, v_num=0, reduced_train_loss=0.0376, global_step=422.0, val_loss=0.0691]
Epoch 46: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0467, v_num=0, reduced_train_loss=0.0376, global_step=422.0, val_loss=0.0691]
2023-06-01 18:16:56,762 - root - INFO - val_loss: 0.07064922153949738

I0601 18:16:56.762210 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07064922153949738


Epoch 47:  69%|██████▉   | 9/13 [00:16<00:07,  1.83s/it, loss=0.057, v_num=0, reduced_train_loss=0.0961, global_step=431.0, val_loss=0.0458]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 47:  54%|█████▍    | 7/13 [00:12<00:11,  1.84s/it, loss=0.0442, v_num=0, reduced_train_loss=0.0344, global_step=429.0, val_loss=0.0706]
Epoch 47:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.057, v_num=0, reduced_train_loss=0.0961, global_step=431.0, val_loss=0.0458]
Epoch 47:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.057, v_num=0, reduced_train_loss=0.0961, global_step=431.0, val_loss=0.0458]
Epoch 47: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.057, v_num=0, reduced_train_loss=0.0961, global_step=431.0, val_loss=0.0458]
2023-06-01 18:17:11,023 - root - INFO - val_loss: 0.03067917563021183

I0601 18:17:11.023819 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.03067917563021183


Epoch 47:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0437, v_num=0, reduced_train_loss=0.0459, global_step=431.0, val_loss=0.0706]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 48:  15%|█▌        | 2/13 [00:03<00:20,  1.85s/it, loss=0.0538, v_num=0, reduced_train_loss=0.0301, global_step=433.0, val_loss=0.0307]
Epoch 47:  85%|████████▍ | 11/13 [00:18<00:03,  1.68s/it, loss=0.0437, v_num=0, reduced_train_loss=0.0459, global_step=431.0, val_loss=0.0706]
Epoch 47:  92%|█████████▏| 12/13 [00:19<00:01,  1.62s/it, loss=0.0437, v_num=0, reduced_train_loss=0.0459, global_step=431.0, val_loss=0.0706]
Epoch 47: 100%|██████████| 13/13 [00:19<00:00,  1.50s/it, loss=0.0437, v_num=0, reduced_train_loss=0.0459, global_step=431.0, val_loss=0.0706]
2023-06-01 18:17:16,277 - root - INFO - val_loss: 0.09230723232030869

I0601 18:17:16.277606 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.09230723232030869


Epoch 48:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0516, v_num=0, reduced_train_loss=0.0407, global_step=440.0, val_loss=0.0307]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 48:  54%|█████▍    | 7/13 [00:12<00:11,  1.85s/it, loss=0.0397, v_num=0, reduced_train_loss=0.0486, global_step=438.0, val_loss=0.0923]
Epoch 48:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0516, v_num=0, reduced_train_loss=0.0407, global_step=440.0, val_loss=0.0307]
Epoch 48:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0516, v_num=0, reduced_train_loss=0.0407, global_step=440.0, val_loss=0.0307]
Epoch 48: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0516, v_num=0, reduced_train_loss=0.0407, global_step=440.0, val_loss=0.0307]
2023-06-01 18:17:30,628 - root - INFO - val_loss: 0.04659083113074303

I0601 18:17:30.628400 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.04659083113074303


Epoch 48:  69%|██████▉   | 9/13 [00:16<00:07,  1.84s/it, loss=0.0414, v_num=0, reduced_train_loss=0.0252, global_step=440.0, val_loss=0.0923]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 49:  15%|█▌        | 2/13 [00:03<00:20,  1.88s/it, loss=0.0456, v_num=0, reduced_train_loss=0.0392, global_step=442.0, val_loss=0.0466]
Epoch 48:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.0414, v_num=0, reduced_train_loss=0.0252, global_step=440.0, val_loss=0.0923]
Epoch 48:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.0414, v_num=0, reduced_train_loss=0.0252, global_step=440.0, val_loss=0.0923]
Epoch 48: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.0414, v_num=0, reduced_train_loss=0.0252, global_step=440.0, val_loss=0.0923]
2023-06-01 18:17:35,933 - root - INFO - val_loss: 0.07195349037647247

I0601 18:17:35.933097 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.07195349037647247


Epoch 49:  69%|██████▉   | 9/13 [00:17<00:07,  1.91s/it, loss=0.0458, v_num=0, reduced_train_loss=0.0556, global_step=449.0, val_loss=0.0466]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 49:  54%|█████▍    | 7/13 [00:12<00:11,  1.84s/it, loss=0.041, v_num=0, reduced_train_loss=0.049, global_step=447.0, val_loss=0.072]
Epoch 49:  85%|████████▍ | 11/13 [00:19<00:03,  1.73s/it, loss=0.0458, v_num=0, reduced_train_loss=0.0556, global_step=449.0, val_loss=0.0466]
Epoch 49:  92%|█████████▏| 12/13 [00:20<00:01,  1.67s/it, loss=0.0458, v_num=0, reduced_train_loss=0.0556, global_step=449.0, val_loss=0.0466]
Epoch 49: 100%|██████████| 13/13 [00:20<00:00,  1.55s/it, loss=0.0458, v_num=0, reduced_train_loss=0.0556, global_step=449.0, val_loss=0.0466]
2023-06-01 18:17:50,733 - root - INFO - val_loss: 0.05934610217809677

I0601 18:17:50.733752 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05934610217809677
I0601 18:17:50.740092 140659256891200 fit_loop.py:175] `Trainer.fit` stopped: `max_epochs=50` reached.


2023-06-01 18:17:51,743 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-3, peer_run=simulate_job]: got result from client site-3 for task: name=train, id=42980685-a93e-4664-b2e2-9e89ea7f2802
2023-06-01 18:17:51,745 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-3, peer_run=simulate_job, peer_rc=OK, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: ignored result submission since server runner's status is done
2023-06-01 18:17:51,747 - SubmitUpdateCommand - INFO - submit_update process. client_name:site-3   task_id:42980685-a93e-4664-b2e2-9e89ea7f2802

2023-06-01 18:17:51,627 - PromptLearner - INFO - [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Computed 7 weight differences for global model of length 7
2023-06-01 18:17:51,628 - PromptLearner - INFO - [identity=site

I0601 18:17:51.627754 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Computed 7 weight differences for global model of length 7
I0601 18:17:51.628289 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Local steps per epoch: 9
I0601 18:17:51.628584 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: Local epochs finished. Returning shareable
I0601 18:17:51.629480 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=42980685-a93e-4664-b2e2-9e89ea7f2802]: finished processing task
I0601 18:17:51.633421 140656

Epoch 49:  69%|██████▉   | 9/13 [00:16<00:07,  1.85s/it, loss=0.042, v_num=0, reduced_train_loss=0.033, global_step=449.0, val_loss=0.072]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
2023-06-01 18:17:53,759 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-3, peer_run=simulate_job]: server runner is finalizing - asked client to end the run

Epoch 49:  77%|███████▋  | 10/13 [00:17<00:05,  1.76s/it, loss=0.042, v_num=0, reduced_train_loss=0.033, global_step=449.0, val_loss=0.072]
2023-06-01 18:17:53,777 - GetTaskCommand - INFO - return task to client.  client_name: site-3  task_name: __end_run__   task_id:   sharable_header_task_id:
2023-06-01 18:17:53,782 - FederatedClient - INFO - pull_task completed. Task name:__end_run__ Status:True 
2023-06-01 18:17:53,782 - ClientRunner - INFO - [identity=site-3, run=simulate_job, peer=sim

I0601 18:17:53.782506 140659256891200 fed_client.py:91] pull_task completed. Task name:__end_run__ Status:True 
I0601 18:17:53.782876 140659256891200 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: server asked to end the run
I0601 18:17:53.783044 140659256891200 simulator_worker.py:102] End the Simulator run.
I0601 18:17:53.783630 140659256891200 simulator_worker.py:125] Clean up ClientRunner for : site-3 



Epoch 49:  85%|████████▍ | 11/13 [00:18<00:03,  1.69s/it, loss=0.042, v_num=0, reduced_train_loss=0.033, global_step=449.0, val_loss=0.072]
Epoch 49:  92%|█████████▏| 12/13 [00:19<00:01,  1.63s/it, loss=0.042, v_num=0, reduced_train_loss=0.033, global_step=449.0, val_loss=0.072]
Epoch 49: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.042, v_num=0, reduced_train_loss=0.033, global_step=449.0, val_loss=0.072]
2023-06-01 18:17:55,525 - root - INFO - val_loss: 0.08060461282730103
Epoch 49: 100%|██████████| 13/13 [00:19<00:00,  1.51s/it, loss=0.042, v_num=0, reduced_train_loss=0.033, global_step=449.0, val_loss=0.0806]
2023-06-01 18:17:55,532 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer.fit` stopped: `max_epochs=50` reached.

I0601 18:17:55.525470 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08060461282730103
I0601 18:17:55.532335 140713251325760 fit_loop.py:175] `Trainer.fit` stopped: `max_epochs=50` reached.


2023-06-01 18:17:56,510 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-2, peer_run=simulate_job]: got result from client site-2 for task: name=train, id=e277103e-8b4a-4a10-9202-cf52a937c773
2023-06-01 18:17:56,513 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-2, peer_run=simulate_job, peer_rc=OK, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: ignored result submission since server runner's status is done
2023-06-01 18:17:56,515 - SubmitUpdateCommand - INFO - submit_update process. client_name:site-2   task_id:e277103e-8b4a-4a10-9202-cf52a937c773


I0601 18:17:56.414454 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Computed 7 weight differences for global model of length 7
I0601 18:17:56.414947 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Local steps per epoch: 9
I0601 18:17:56.415169 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Local epochs finished. Returning shareable
I0601 18:17:56.416029 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: finished processing task
I0601 18:17:56.419102 140709


2023-06-01 18:17:56,414 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Computed 7 weight differences for global model of length 7
2023-06-01 18:17:56,414 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Local steps per epoch: 9
2023-06-01 18:17:56,415 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: Local epochs finished. Returning shareable
2023-06-01 18:17:56,416 - ClientRunner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=e277103e-8b4a-4a10-9202-cf52a937c773]: finished processing task
2023-06-01 18:17:56,419 - FederatedClient - INFO - Starting to push 

I0601 18:17:58.548664 140713251325760 fed_client.py:91] pull_task completed. Task name:__end_run__ Status:True 
I0601 18:17:58.549079 140713251325760 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: server asked to end the run
I0601 18:17:58.549272 140713251325760 simulator_worker.py:102] End the Simulator run.
I0601 18:17:58.549880 140713251325760 simulator_worker.py:125] Clean up ClientRunner for : site-2 


2023-06-01 18:18:02,196 - MPM - INFO - MPM: Good Bye!
Simulator finished with run_status 0
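
To compare clients, the per-client validation losses can be extracted from log lines like the ones above. The following is a minimal sketch, not part of the NVFlare or NeMo tooling: `val_losses_by_thread` is a hypothetical helper, and it groups values by the thread ID printed in the `I0601 ...` lines. In this particular run, thread `140659256891200` logs for `site-3` and thread `140713251325760` for `site-2`; that mapping is read off the log and will differ between runs.

```python
import re
from collections import defaultdict

# Matches lines such as:
# I0601 18:17:50.733752 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.0593...
VAL_LOSS = re.compile(r"^I\d{4} [\d:.]+ (\d+) \S+\] val_loss: ([0-9.eE+-]+)\s*$")


def val_losses_by_thread(lines):
    """Collect val_loss values per thread ID, in the order they were logged."""
    losses = defaultdict(list)
    for line in lines:
        match = VAL_LOSS.match(line)
        if match:
            losses[match.group(1)].append(float(match.group(2)))
    return dict(losses)


# Three lines copied from the log above; the second one is not a val_loss line.
sample_log = [
    "I0601 18:17:50.733752 140659256891200 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.05934610217809677",
    "I0601 18:17:50.740092 140659256891200 fit_loop.py:175] `Trainer.fit` stopped: `max_epochs=50` reached.",
    "I0601 18:17:55.525470 140713251325760 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 0.08060461282730103",
]

for thread_id, values in val_losses_by_thread(sample_log).items():
    print(thread_id, "final val_loss:", values[-1])
```

Grouping by thread ID is only a convenience for simulator logs where each client runs in its own thread; a more robust approach would log the site name alongside the metric.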


#### 2. Federated P-Tuning
We use the [FedAvg](https://arxiv.org/abs/1602.05629) algorithm to p-tune the model in a federated scenario. First, create the configuration files for this job.
This time, we increase the number of FL rounds to 50 and decrease the number of local epochs per round (`--aggregation_epochs`) to 1, so that the clients' updates are aggregated after every local epoch.
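
As a rough check on the training budget, the sketch below multiplies rounds, local epochs per round, and steps per epoch. The values `max_epochs=50` and 9 local steps per epoch are taken from the log of the previous run. Treating that run as a single round, and assuming the same 9 steps per epoch in the federated run, are assumptions. `total_local_steps` is a hypothetical helper, not an NVFlare or NeMo function.

```python
def total_local_steps(num_rounds: int, aggregation_epochs: int, steps_per_epoch: int) -> int:
    """Number of optimizer steps a single client performs over the whole job."""
    return num_rounds * aggregation_epochs * steps_per_epoch


STEPS_PER_EPOCH = 9  # "Local steps per epoch: 9" in the previous run's log

# Assumption: the previous (local) run used one round of 50 local epochs.
local_steps = total_local_steps(num_rounds=1, aggregation_epochs=50, steps_per_epoch=STEPS_PER_EPOCH)

# Federated run: --num_rounds 50 --aggregation_epochs 1
fedavg_steps = total_local_steps(num_rounds=50, aggregation_epochs=1, steps_per_epoch=STEPS_PER_EPOCH)

print("local:", local_steps, "fedavg:", fedavg_steps)
```

Under these assumptions each client performs the same number of local steps in both settings; what changes is how often the server aggregates the clients' updates.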

In [8]:
!python3 create_configs.py --job_folder "jobs/gpt_p-tuning_fedavg_345M" --num_clients 3 --aggregation_epochs 1 --num_rounds 50

Created configs for 3 clients and set ROOT_DIR to /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning


Next, simulate federated p-tuning with FedAvg. In each round, every client p-tunes for one local epoch and then sends its local model update to the server for aggregation. This is repeated for 50 FL rounds.
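
To make the aggregation step concrete, here is a minimal sketch of one FedAvg round on weight differences, in plain Python. It is an illustration, not NVFlare's aggregator: the function names and the parameter name `prompt_encoder.weight` are hypothetical, tensors are flattened to lists, and weighting each client by its number of local steps is an assumption.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]


def weight_diff(local: Weights, global_weights: Weights) -> Weights:
    """Return local - global for every parameter (what a client sends back)."""
    return {
        name: [l - g for l, g in zip(local[name], global_weights[name])]
        for name in global_weights
    }


def fedavg_update(
    global_weights: Weights, diffs: Sequence[Weights], num_steps: Sequence[int]
) -> Weights:
    """Add the step-weighted average of the client differences to the global model."""
    total = sum(num_steps)
    new_global = {}
    for name, g in global_weights.items():
        avg = [0.0] * len(g)
        for diff, n in zip(diffs, num_steps):
            for i, d in enumerate(diff[name]):
                avg[i] += d * n / total
        new_global[name] = [gi + ai for gi, ai in zip(g, avg)]
    return new_global


# Toy example with two clients and a single (hypothetical) parameter.
global_weights = {"prompt_encoder.weight": [0.0, 1.0]}
client_weights = [
    {"prompt_encoder.weight": [1.0, 1.0]},
    {"prompt_encoder.weight": [3.0, 3.0]},
]
diffs = [weight_diff(w, global_weights) for w in client_weights]
new_global = fedavg_update(global_weights, diffs, num_steps=[1, 3])
print(new_global)
```

Only the prompt encoder parameters need to be exchanged in p-tuning, since the LLM itself stays frozen; sending differences rather than full weights leaves the server update as a simple weighted sum.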

In [9]:
from nvflare import SimulatorRunner

# Simulate the FedAvg job with three clients, using one thread per client
simulator = SimulatorRunner(
    job_folder="jobs/gpt_p-tuning_fedavg_345M",
    workspace="/tmp/nvflare/nemo/gpt_p-tuning_fedavg_345M",
    n_clients=3,
    threads=3
)
# Blocks until the simulated job has finished
run_status = simulator.run()
print("Simulator finished with run_status", run_status)

2023-06-01 18:18:03,497 - SimulatorRunner - INFO - Create the Simulator Server.
2023-06-01 18:18:03,504 - Cell - INFO - server: creating listener on tcp://0:55255
2023-06-01 18:18:03,506 - Cell - INFO - server: created backbone external listener for tcp://0:55255
2023-06-01 18:18:03,507 - ConnectorManager - INFO - 6413: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2023-06-01 18:18:03,508 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:4457] is starting
2023-06-01 18:18:04,011 - Cell - INFO - server: created backbone internal listener for tcp://localhost:4457
2023-06-01 18:18:04,017 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 PASSIVE tcp://0:55255] is starting
2023-06-01 18:18:04,215 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server localhost on Port 59359
2023-06-01 18:18:04,217 - SimulatorRunner - INFO - Deploy the Apps.
2023-06-01 18:18:04,229 - SimulatorRunner - INFO - Create the simulate c

[NeMo W 2023-06-01 18:18:23 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-06-01 18:18:23 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-06-01 18:18:23 experimental:27] Module <class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2023-06-01 18:18:23 experimental:27] Module <class 'nemo.collections.nlp.models.text_normalization_as_tagging.thutmose_tagger.ThutmoseTaggerModel'> is experimental, not ready for production and is not fully

2023-06-01 18:18:25,109 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-2, peer_run=simulate_job, task_name=share_config, task_id=6836e3dd-68ed-4928-b9f4-b1b659918590]: assigned task to client site-2: name=share_config, id=6836e3dd-68ed-4928-b9f4-b1b659918590
2023-06-01 18:18:25,111 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-2, peer_run=simulate_job, task_name=share_config, task_id=6836e3dd-68ed-4928-b9f4-b1b659918590]: sent task assignment to client. client_name:site-2 task_id:6836e3dd-68ed-4928-b9f4-b1b659918590
2023-06-01 18:18:25,115 - GetTaskCommand - INFO - return task to client.  client_name: site-2  task_name: share_config   task_id: 6836e3dd-68ed-4928-b9f4-b1b659918590  sharable_header_task_id: 6836e3dd-68ed-4928-b9f4-b1b659918590
2023-06-01 18:18:25,152 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-2, peer_run=simulate_job]:

[NeMo W 2023-06-01 18:18:25 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.
I0601 18:18:25.103726 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job]: Initializing the Learner...
I0601 18:18:25.104350 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job]: Running with distributed environment: LOCAL_RANK: 0, RANK: 0, WORLD_SIZE 1, MASTER_ADDR: localhost, and MASTER_PORT: 40037
I0601 18:18:25.104617 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job]: client runner started
I0601 18:18:25.104791 139825093494592 simulator_worker.py:85] Initialize ClientRunner for client: site-2
I0601 18:18:25.117143 139823787386624 communicator.py:200] Received from simulator_server server  (3492 Bytes). getTask: share_config time: 0.009407520294189453 seconds
I0601 18:18:25.121602 139825093494

2023-06-01 18:18:25,231 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-1, peer_run=simulate_job, task_name=share_config, task_id=f4db6818-240b-4d93-94ba-19ec7015715a]: assigned task to client site-1: name=share_config, id=f4db6818-240b-4d93-94ba-19ec7015715a
2023-06-01 18:18:25,233 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-1, peer_run=simulate_job, task_name=share_config, task_id=f4db6818-240b-4d93-94ba-19ec7015715a]: sent task assignment to client. client_name:site-1 task_id:f4db6818-240b-4d93-94ba-19ec7015715a
2023-06-01 18:18:25,234 - GetTaskCommand - INFO - return task to client.  client_name: site-1  task_name: share_config   task_id: f4db6818-240b-4d93-94ba-19ec7015715a  sharable_header_task_id: f4db6818-240b-4d93-94ba-19ec7015715a
2023-06-01 18:18:25,271 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config, peer=site-1, peer_run=simulate_job]:

I0601 18:18:25.226405 140049889785664 fl_component.py:134] [identity=site-1, run=simulate_job]: Initializing the Learner...
I0601 18:18:25.227121 140049889785664 fl_component.py:134] [identity=site-1, run=simulate_job]: Running with distributed environment: LOCAL_RANK: 0, RANK: 0, WORLD_SIZE 1, MASTER_ADDR: localhost, and MASTER_PORT: 44059
I0601 18:18:25.227404 140049889785664 fl_component.py:134] [identity=site-1, run=simulate_job]: client runner started
I0601 18:18:25.227586 140049889785664 simulator_worker.py:85] Initialize ClientRunner for client: site-1
I0601 18:18:25.236530 140048717907712 communicator.py:200] Received from simulator_server server  (3492 Bytes). getTask: share_config time: 0.00652313232421875 seconds
I0601 18:18:25.240642 140049889785664 fed_client.py:91] pull_task completed. Task name:share_config Status:True 
I0601 18:18:25.240924 140049889785664 fl_component.py:134] [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task as

2023-06-01 18:18:25,512 - ShareConfig - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: task share_config exit with status TaskCompletionStatus.OK
NEMO version 1.17.0
2023-06-01 18:18:25,371 - PromptLearner - INFO - [identity=site-3, run=simulate_job]: Initializing the Learner...
2023-06-01 18:18:25,372 - PromptLearner - INFO - [identity=site-3, run=simulate_job]: Running with distributed environment: LOCAL_RANK: 0, RANK: 0, WORLD_SIZE 1, MASTER_ADDR: localhost, and MASTER_PORT: 44417
2023-06-01 18:18:25,372 - ClientRunner - INFO - [identity=site-3, run=simulate_job]: client runner started
2023-06-01 18:18:25,372 - ClientTaskWorker - INFO - Initialize ClientRunner for client: site-3
2023-06-01 18:18:25,384 - Communicator - INFO - Received from simulator_server server  (3492 Bytes). getTask: share_config time: 0.007061958312988281 seconds
2023-06-01 18:18:25,389 - FederatedClient - INFO - pull_task completed. Task name:share_config Status:True 
2023-06-01 18:18:25

I0601 18:18:25.427738 140410198423296 communicator.py:268]  SubmitUpdate size: 477 Bytes. time: 0.007112264633178711 seconds
I0601 18:18:25.428703 140412718274368 fl_component.py:134] [identity=site-3, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=share_config, task_id=7947adaf-5f47-40a8-b831-44172351602a]: result sent to server for task: name=share_config, id=7947adaf-5f47-40a8-b831-44172351602a
I0601 18:18:25.429044 140412718274368 simulator_worker.py:94] Finished one task run for client: site-3 interval: 2 task_processed: True


2023-06-01 18:18:25,714 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: starting workflow scatter_and_gather (<class 'nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather'>) ...
2023-06-01 18:18:25,716 - ScatterAndGather - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: Initializing ScatterAndGather workflow.
2023-06-01 18:18:25,719 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=share_config]: Workflow scatter_and_gather (<class 'nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather'>) started
2023-06-01 18:18:25,720 - ScatterAndGather - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: Beginning ScatterAndGather training phase.
2023-06-01 18:18:25,722 - ScatterAndGather - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: Round 0 started.
2023-06-01 18:18:25,723 - ScatterAndGather - INFO - [identity=simulator_server, run=s

I0601 18:18:27.305884 139823398516480 communicator.py:200] Received from simulator_server server  (16873468 Bytes). getTask: train time: 0.11535334587097168 seconds
I0601 18:18:27.307244 139825093494592 fed_client.py:91] pull_task completed. Task name:train Status:True 
I0601 18:18:27.307551 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: got task assignment: name=train, id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff
I0601 18:18:27.308239 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: invoking task executor <class 'nemo_nvflare.learner_executor.NemoLearnerExecutor'>
I0601 18:18:27.308483 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Client trainer got ta

2023-06-01 18:18:27,732 - ScatterAndGather - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: Abort signal received. Exiting at round 0.
2023-06-01 18:18:27,734 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: Workflow: scatter_and_gather finalizing ...
[NeMo I 2023-06-01 18:18:27 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-06-01 18:18:27 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:27 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-06-01 18:18:27 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-06-01 18:18:27 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:27 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-06-01 18:18:27 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:27 megatron_init:253] Rank 0 has tensor model pa

23-06-01 18:18:27 - PID:6522 - rank:(0, 0, 0, 0) - microbatches.py:39 - INFO - setting number of micro-batches to constant 16


2023-06-01 18:18:28,224 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: ABOUT_TO_END_RUN fired
2023-06-01 18:18:28,226 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: END_RUN fired
2023-06-01 18:18:28,228 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: Server runner finished.
2023-06-01 18:18:29,437 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-1, peer_run=simulate_job]: server runner is finalizing - asked client to end the run
2023-06-01 18:18:29,451 - GetTaskCommand - INFO - return task to client.  client_name: site-1  task_name: __end_run__   task_id:   sharable_header_task_id: 
2023-06-01 18:18:29,509 - SimulatorServer - INFO - Server app stopped.


2023-06-01 18:18:29,456 - FederatedClient - INFO - pull_task completed. Task name:__end_run__ Status:True 
2023-06-01 18:18:29,456 - ClientRunn

I0601 18:18:29.456576 140049889785664 fed_client.py:91] pull_task completed. Task name:__end_run__ Status:True 
I0601 18:18:29.456952 140049889785664 fl_component.py:134] [identity=site-1, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: server asked to end the run
I0601 18:18:29.457109 140049889785664 simulator_worker.py:102] End the Simulator run.
I0601 18:18:29.457633 140049889785664 simulator_worker.py:125] Clean up ClientRunner for : site-1 


2023-06-01 18:18:29,717 - nvflare.fuel.hci.server.hci - INFO - Admin Server localhost on Port 59359 shutdown!
2023-06-01 18:18:29,720 - SimulatorServer - INFO - shutting down server
2023-06-01 18:18:29,722 - SimulatorServer - INFO - canceling sync locks
2023-06-01 18:18:29,724 - SimulatorServer - INFO - server off
[NeMo I 2023-06-01 18:18:41 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-06-01 18:18:41 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:41 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-06-01 18:18:41 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-06-01 18:18:41 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:41 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-06-01 18:18:41 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:41 megatron_init:253] Rank 0 has tensor model parallel rank

[NeMo W 2023-06-01 18:18:41 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-06-01 18:18:41 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-06-01 18:18:43 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-06-01 18:18:43 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.


Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-06-01 18:18:59 megatron_init:225] Rank 0 has data parallel group: [0]
[NeMo I 2023-06-01 18:18:59 megatron_init:228] All data parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:59 megatron_init:229] Ranks 0 has data parallel rank: 0
[NeMo I 2023-06-01 18:18:59 megatron_init:237] Rank 0 has model parallel group: [0]
[NeMo I 2023-06-01 18:18:59 megatron_init:238] All model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:59 megatron_init:248] Rank 0 has tensor model parallel group: [0]
[NeMo I 2023-06-01 18:18:59 megatron_init:252] All tensor model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:59 megatron_init:253] Rank 0 has tensor model parallel rank: 0
[NeMo I 2023-06-01 18:18:59 megatron_init:267] Rank 0 has pipeline model parallel group: [0]
[NeMo I 2023-06-01 18:18:59 megatron_init:279] Rank 0 has embedding group: [0]
[NeMo I 2023-06-01 18:18:59 megatron_init:285] All pipeline model parallel group ranks: [[0]]
[NeMo I 2023-06-01 18:18:59 megatron_init:286]

[NeMo W 2023-06-01 18:18:59 modelPT:245] You tried to register an artifact under config key=tokenizer.vocab_file but an artifact for it has already been registered.
Using sep_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.


[NeMo I 2023-06-01 18:19:00 megatron_base_model:205] Padded vocab_size: 50304, original vocab_size: 50257, dummy tokens: 47.
[NeMo I 2023-06-01 18:19:01 nlp_overrides:374] Model MegatronGPTModel was successfully restored from /workspace/Code/nvflare/nemo_nvflare/integration/nemo/examples/prompt_learning/megatron_gpt_345m.nemo.
[NeMo I 2023-06-01 18:19:01 auto_tokenizer:172] 10 special tokens added, resize your model accordingly.
2023-06-01 18:19:01,541 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
2023-06-01 18:19:01,550 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558

Using pad_token, but it is not set yet.
Using mask_token, but it is not set yet.
I0601 18:19:01.541778 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Initialized model <class 'nemo_nvflare.fed_megatron_gpt_prompt_learning_model.FedMegatronGPTPromptLearningModel'> and prompt encoder <class 'nemo.collections.nlp.modules.common.prompt_encoder.PromptEncoder'>
I0601 18:19:01.550390 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Loaded 7 of 7 weights
I0601 18:19:01.555524 139825093494592 distributed.py:244] Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
I0601 18:19:01.557950 139825093494592 distributed_c10d.py:393] Added key: store_based_barrier_key:1 to store for rank: 0
I0601 18:19:01.558239 139825093494592 distribu

[NeMo I 2023-06-01 18:19:02 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 


604it [00:00, 795.19it/s]
0it [00:00, ?it/s]

[NeMo I 2023-06-01 18:19:03 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
[NeMo I 2023-06-01 18:19:03 gpt_prompt_learning_dataset:85] Loading and tokenizing dataset ... 


226it [00:00, 854.17it/s]
I0601 18:19:03.825299 139825093494592 cuda.py:58] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


[NeMo I 2023-06-01 18:19:03 gpt_prompt_learning_dataset:196] Skipped 0 sentences, sequence length too short or too long even after truncation
2023-06-01 18:19:03,825 - pytorch_lightning.accelerators.cuda - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Validation: 0it [00:00, ?it/s]

    


Validation DataLoader 0: 100%|██████████| 4/4 [00:06<00:00,  1.51s/it]2023-06-01 18:19:10,699 - root - INFO - global_model_val_loss: 6.832405090332031
Validation DataLoader 0: 100%|██████████| 4/4 [00:06<00:00,  1.51s/it]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│   global_model_val_loss   │     6.832405090332031     │
└───────────────────────────┴───────────────────────────┘


I0601 18:19:10.699887 139825093494592 fed_megatron_gpt_prompt_learning_model.py:99] global_model_val_loss: 6.832405090332031


2023-06-01 18:19:11,275 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Global_model global_model_val_loss: 6.832405090332031
2023-06-01 18:19:11,275 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Current/Total Round: 1/50
2023-06-01 18:19:11,276 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Client identity: site-2
2023-06-01 18:19:11,283 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Loaded 7 of 7 weights
2023-06-01 18:19:11,283 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simu

I0601 18:19:11.275437 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Global_model global_model_val_loss: 6.832405090332031
I0601 18:19:11.275945 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Current/Total Round: 1/50
I0601 18:19:11.276138 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Client identity: site-2
I0601 18:19:11.283552 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Loaded 7 of 7 weights
I0601 18:19:11.283828 139825093494592 fl_component.py:

[NeMo I 2023-06-01 18:19:11 modelPT:722] Optimizer config = FusedAdam (
    Parameter Group 0
        betas: [0.9, 0.98]
        bias_correction: True
        eps: 1e-08
        lr: 0.0001
        weight_decay: 0.01
    )
[NeMo I 2023-06-01 18:19:11 lr_scheduler:910] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7f2b294cb850>" 
    will be used during training (effective maximum steps = 11000) - 
    Parameters : 
    (warmup_steps: 50
    min_lr: 0.0
    constant_steps: 0
    max_steps: 11000
    )
2023-06-01 18:19:11,515 - pytorch_lightning.callbacks.model_summary - INFO - 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encoder  | PromptEncoder          | 4.2 M 
-----------------------------------------------------------
4.2 M     Trainable params
354 M     Non-trainable params


I0601 18:19:11.515919 139825093494592 model_summary.py:83] 
  | Name            | Type                   | Params
-----------------------------------------------------------
0 | frozen_model    | MegatronGPTModel       | 354 M 
1 | word_embeddings | VocabParallelEmbedding | 51.5 M
2 | prompt_encoder  | PromptEncoder          | 4.2 M 
-----------------------------------------------------------
4.2 M     Trainable params
354 M     Non-trainable params
359 M     Total params
718.178   Total estimated model params size (MB)


Sanity Checking DataLoader 0: 100%|██████████| 2/2 [00:01<00:00,  1.03it/s]2023-06-01 18:19:13,828 - root - INFO - val_loss: 6.231474876403809
Epoch 0:   0%|          | 0/13 [00:00<?, ?it/s]                            

I0601 18:19:13.828571 139825093494592 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 6.231474876403809


Epoch 0:  69%|██████▉   | 9/13 [00:19<00:08,  2.22s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
Validation: 0it [00:00, ?it/s]
Validation:   0%|          | 0/4 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|          | 0/4 [00:00<?, ?it/s]
Epoch 0:  77%|███████▋  | 10/13 [00:20<00:06,  2.10s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
Epoch 0:  85%|████████▍ | 11/13 [00:21<00:03,  1.99s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
Epoch 0:  92%|█████████▏| 12/13 [00:22<00:01,  1.90s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]
Epoch 0: 100%|██████████| 13/13 [00:22<00:00,  1.76s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000]2023-06-01 18:19:36,757 - root - INFO - val_loss: 5.130880355834961
Epoch 0: 100%|██████████| 13/13 [00:22<00:00,  1.76s/it, loss=6.76, v_num=0, reduced_train_loss=5.550, global_step=8.000, val_loss=5.130]
Epoch 0: 100%|██████████| 13/13 [

I0601 18:19:36.757718 139825093494592 fed_megatron_gpt_prompt_learning_model.py:99] val_loss: 5.130880355834961
I0601 18:19:36.766366 139825093494592 fit_loop.py:175] `Trainer.fit` stopped: `max_epochs=1` reached.


2023-06-01 18:19:37,731 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-2, peer_run=simulate_job]: got result from client site-2 for task: name=train, id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff
2023-06-01 18:19:37,734 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-2, peer_run=simulate_job, peer_rc=OK, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: ignored result submission since server runner's status is done
2023-06-01 18:19:37,736 - SubmitUpdateCommand - INFO - submit_update process. client_name:site-2   task_id:bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff

2023-06-01 18:19:37,643 - PromptLearner - INFO - [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Computed 7 weight differences for global model of length 7
2023-06-01 18:19:37,644 - PromptLearner - INFO - [identity=site

I0601 18:19:37.643822 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Computed 7 weight differences for global model of length 7
I0601 18:19:37.644314 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Local steps per epoch: 9
I0601 18:19:37.644602 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: Local epochs finished. Returning shareable
I0601 18:19:37.645701 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job, task_name=train, task_id=bf2d9558-29de-4f0e-ade0-e5f9e2a1d9ff]: finished processing task
I0601 18:19:37.648889 139814

2023-06-01 18:19:39,748 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather, peer=site-2, peer_run=simulate_job]: server runner is finalizing - asked client to end the run
2023-06-01 18:19:39,760 - GetTaskCommand - INFO - return task to client.  client_name: site-2  task_name: __end_run__   task_id:   sharable_header_task_id: 
2023-06-01 18:19:39,767 - FederatedClient - INFO - Shutting down client run: site-1
2023-06-01 18:19:39,769 - FederatedClient - INFO - Shutting down client run: site-2
2023-06-01 18:19:39,772 - FederatedClient - INFO - Shutting down client run: site-3
2023-06-01 18:19:39,772 - ServerRunner - INFO - [identity=simulator_server, run=simulate_job, wf=scatter_and_gather]: asked to abort - triggered abort_signal to stop the RUN
2023-06-01 18:19:39,766 - FederatedClient - INFO - pull_task completed. Task name:__end_run__ Status:True 
2023-06-01 18:19:39,766 - ClientRunner - INFO - [identity=site-2, run=simulate_job, peer=simulator

I0601 18:19:39.766087 139825093494592 fed_client.py:91] pull_task completed. Task name:__end_run__ Status:True 
I0601 18:19:39.766475 139825093494592 fl_component.py:134] [identity=site-2, run=simulate_job, peer=simulator_server, peer_run=simulate_job]: server asked to end the run
I0601 18:19:39.766636 139825093494592 simulator_worker.py:102] End the Simulator run.
I0601 18:19:39.767145 139825093494592 simulator_worker.py:125] Clean up ClientRunner for : site-2 


2023-06-01 18:19:43,291 - MPM - INFO - MPM: Good Bye!
Simulator finished with run_status 0


You can visualize the training process using TensorBoard:

In [None]:
!tensorboard --logdir /tmp/nvflare/nemo

TensorFlow installation not found - running with reduced feature set.

NOTE: Using experimental fast data loading logic. To disable, pass
    "--load_fast=false" and report issues on GitHub. More details:
    https://github.com/tensorflow/tensorboard/issues/4784

TensorBoard 2.9.0 at http://localhost:6006/ (Press CTRL+C to quit)


## Results
In this scenario, all clients use the same validation set, which allows a direct comparison between the locally p-tuned models and the federated global model. As expected, the global model trained with FedAvg reaches a lower validation loss than the models trained only on their local datasets. This is because the global model aggregates the updates from all clients, so it benefits from every client dataset without accessing the data directly, and can therefore generalize better.

![validation loss](./figs/val_loss.svg)
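
To make the aggregation step concrete, here is a minimal sketch of a FedAvg-style update on toy data. It is not NVFlare's aggregator: the function `fedavg_update`, the weight name `prompt_encoder.w`, and the choice to weight each client by its number of training samples are assumptions made for illustration. The sketch mirrors the client logs above only in that each client returns weight differences relative to the global model.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fedavg_update(
    global_weights: Weights,
    client_diffs: List[Weights],
    client_sizes: List[int],
) -> Weights:
    """Add the sample-size-weighted average of client weight differences
    to the global weights (hypothetical helper, not an NVFlare API)."""
    total = sum(client_sizes)
    new_global: Weights = {}
    for name, values in global_weights.items():
        updated = list(values)
        for diff, n_samples in zip(client_diffs, client_sizes):
            share = n_samples / total  # this client's contribution to the average
            for i, delta in enumerate(diff[name]):
                updated[i] += share * delta
        new_global[name] = updated
    return new_global


# Toy example: three clients, one weight tensor flattened to two values.
global_w = {"prompt_encoder.w": [1.0, 2.0]}
diffs = [
    {"prompt_encoder.w": [0.5, -1.0]},
    {"prompt_encoder.w": [-0.5, 1.0]},
    {"prompt_encoder.w": [1.0, 0.0]},
]
sizes = [100, 100, 200]  # assumed number of local training samples per client

new_w = fedavg_update(global_w, diffs, sizes)
print(new_w)
```

Because the update is an average over all clients, information from every local dataset reaches the global prompt encoder, even though only weight differences leave each site.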

## Inference

We can use `model.generate()` to run inference after p-tuning the model. 
Let's define some test examples to feed to the p-tuned model to see its predictions.

In [None]:
test_examples = [
    {"taskname": "sentiment", "sentence": "The products have a low salt and fat content ."},
    {"taskname": "sentiment", "sentence": "The agreement is valid for four years ."},
    {"taskname": "sentiment", "sentence": "Diluted EPS rose to EUR3 .68 from EUR0 .50 ."},
    {"taskname": "sentiment", "sentence": "The company is well positioned in Brazil and Uruguay ."},
    {"taskname": "sentiment", "sentence": "Profit before taxes decreased by 9 % to EUR 187.8 mn in the first nine months of 2008 , compared to EUR 207.1 mn a year earlier ."},
]

Next, we will load the global model.

In [None]:
import os
import torch
import pytorch_lightning as pl
from nemo_nvflare.fed_megatron_gpt_prompt_learning_model import FedMegatronGPTPromptLearningModel
from nemo_nvflare.utils import load_weights
from omegaconf import OmegaConf
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy
from pytorch_lightning.plugins.environments import TorchElasticEnvironment

# Load the model configuration from the job folder (server config)
config = OmegaConf.load("jobs/gpt_p-tuning_fedavg_345M/server/config/megatron_gpt_prompt_learning_config.yaml")

# Set GPT model path
config.model.language_model_path = "megatron_gpt_345m.nemo"

# Load task templates
config.model.task_templates = OmegaConf.load("jobs/gpt_p-tuning_fedavg_345M/server/config/task_templates.json")

# Set the tasks that were learned
config.model.new_tasks = ["sentiment"]

# Set up cluster environment parameters.
# Use the torch elastic cluster environment so that `creates_processes_externally` is True
# and the launcher is set to None; the trainer will then not try to spawn new processes,
# which avoids the misconfiguration error raised in an interactive session.
os.environ["LOCAL_RANK"] = '0'
os.environ["RANK"] = '0'
os.environ["WORLD_SIZE"] = '1'
strategy = NLPDDPStrategy(find_unused_parameters=False, no_ddp_communication_hook=True)
plugins = [TorchElasticEnvironment()]

# Set up the trainer and load the model that was used for p-tuning
trainer = pl.Trainer(plugins=plugins, strategy=strategy, **config.trainer)
model = FedMegatronGPTPromptLearningModel(cfg=config.model, trainer=trainer)
model.init_prompt_encoder()

print("Model initialized", type(model))

Overwrite the prompt encoder weights with those of the best global model:

In [None]:
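# Map tensors to CPU on load so this also works without a GPU; load_weights moves them to the target device
ckpt = torch.load("/tmp/nvflare/nemo/gpt_p-tuning_fedavg_345M/simulate_job/app_server/best_FL_global_model.pt", map_location="cpu")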
global_weights = ckpt["model"]

n_loaded = load_weights(model, global_weights, device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu"))
print(f"Loaded {n_loaded} of {len(global_weights)} weights")

Run the model on the test examples:

In [None]:
response = model.generate(inputs=test_examples, length_params=None)

print('The prediction results of some sample queries with the trained model:')
for result in response['sentences']:
    print(result)
    print("-" * 30)

The expected output predictions look something like this:

>      The products have a low salt and fat content . sentiment: neutral
>      ------------------------------
>      The agreement is valid for four years . sentiment: neutral
>      ------------------------------
>      Diluted EPS rose to EUR3 .68 from EUR0 .50 . sentiment: positive
>      ------------------------------
>      The company is well positioned in Brazil and Uruguay . sentiment: positive
>      ------------------------------
>      Profit before taxes decreased by 9 % to EUR 187.8 mn in the first nine months of 2008 , compared to EUR 207.1 mn a year earlier . sentiment: negative
>      ------------------------------
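
If you want to work with the predicted labels rather than the full generated strings, you can post-process the output. The sketch below assumes that each generated sentence ends with `sentiment: <label>`, as in the expected output above, and that the label set is `positive`, `neutral`, and `negative`. The helper `extract_sentiment` is hypothetical and not part of NeMo or NVFlare, and the sample strings stand in for `response['sentences']`.

```python
from typing import List, Optional

# Assumed label set, taken from the expected output shown above.
LABELS = ("positive", "neutral", "negative")


def extract_sentiment(generated: str, marker: str = "sentiment:") -> Optional[str]:
    """Return the label that follows the last `marker`, or None if the
    marker is missing or the label is not one of LABELS."""
    _, found, tail = generated.rpartition(marker)
    if not found:
        return None
    words = tail.strip().split()
    label = words[0].lower() if words else ""
    return label if label in LABELS else None


# Stand-in for response['sentences'] from the cell above.
sample_outputs: List[str] = [
    "The agreement is valid for four years . sentiment: neutral",
    "Diluted EPS rose to EUR3 .68 from EUR0 .50 . sentiment: positive",
]

labels = [extract_sentiment(s) for s in sample_outputs]
print(labels)
```

Returning `None` for unexpected generations makes it easy to count how often the model fails to produce one of the known labels.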