# Training Emely

## Run this notebook in Jupyter to work with WandB

This notebook is for training Emely with different configurations.
Use the blender_opts dictionary for the standard options.

### Configuration

The default options for training are located in temp_opts/default_blender_opts.json and temp_opts/run_blender_opts.json, which are the paths the code cells in this notebook read from. The default_blender_opts are assumed to stay unchanged, while the run_blender_opts can be altered for each model instance.

The current options that can be varied between models with default settings are:

- init_model: "zoo:blender/blender_90M/model"
- dict_file: "zoo:blender/blender_90M/model.dict"
- bs: 16
- betas: "0.9,0.999"
- lr: 1e-06
- dropout: 0.1
- inference: "beam"
- beam_size: 10
- beam_min_length: 10
- beam_block_ngram: 3
- wandb_project: "emely-v0.X"
- task: "internal,external,external-gpt3"
- multitask_weights: "6,3,3"
- mutators: null
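
The training function defined later in this notebook combines the two option sets with `dict.update`, applying the defaults last. The following is a minimal sketch of that precedence, using made-up values rather than the real JSON files; the variable names are illustrative only.

```python
# Illustrative values only; the real options live in the two JSON files.
run_opts = {"bs": 16, "lr": 1e-06, "mutators": None}
default_opts = {"optimizer": "adamax", "fp16": True}

merged = dict(run_opts)       # start from the per-model options
merged.update(default_opts)   # defaults are applied last

# If a key appeared in both dictionaries, the value applied last would win.
clash = {"bs": 16}
clash.update({"bs": 32})

print(merged)
print(clash)
```

Because the two files are meant to hold disjoint keys, the order normally does not matter, but a key accidentally added to both would take its value from the defaults.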

# Steps

## 1. Preparation

- Log in to WandB using "wandb login" or "wandb login --relogin"
- Prepare datasets for use by model training
- Make sure docker does not need sudo privileges and is pruned from previous runs to avoid name collisions
- Create a copy of this notebook, and name it accordingly (e.g. pipeline-v04.ipynb)
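
One way to check the docker requirement from inside the notebook is to run `docker info` as the current user and look at the exit status. The sketch below assumes the docker CLI is on the PATH; the helper name `docker_runs_without_sudo` is hypothetical and not part of this repository.

```python
import shutil
import subprocess

def docker_runs_without_sudo(timeout=10):
    """Return True if `docker info` succeeds for the current user."""
    if shutil.which("docker") is None:
        return False  # docker CLI not found on PATH
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # A non-zero status usually means the daemon is unreachable
    # or the user lacks permission to talk to it.
    return result.returncode == 0

print("docker usable without sudo:", docker_runs_without_sudo())
```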

## 2. Main options

Set the main options. Currently the only supported persona is interview.

## 3. Model specific preparations

- Run the first three cells of code, and edit the model specific options in temp_opts/model_i_opts.json
- Make sure torch.cuda.is_available() returns True
- Create the same number of new cells as the number of models and paste "model_names.append(run_training(i))" for model i in each cell.
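
Using one cell per model keeps a failed run from stopping the others. A loop with per-model error handling can give similar isolation; the sketch below is one possible alternative, not the notebook's own method. `train_all` and `fake_training` are hypothetical names, with `fake_training` standing in for `run_training` so the example runs on its own.

```python
def train_all(n_models, train_fn):
    """Call train_fn(i) for i = 1..n_models and collect names of successful runs."""
    trained = []
    for model_id in range(1, n_models + 1):
        try:
            trained.append(train_fn(model_id))
        except Exception as exc:
            # A failed run is reported and skipped, so later models still train.
            print(f"Training failed for model {model_id}: {exc}")
    return trained

# Stand-in for run_training, used only to make the sketch runnable.
def fake_training(model_id):
    if model_id == 2:
        raise RuntimeError("simulated failure")
    return f"model_{model_id}"

model_names = train_all(3, fake_training)
print(model_names)
```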

## 4. Run training

Once the above steps are completed, start training with "run all cells below". This automatically generates model names, uploads training logs to WandB, and creates Dockerfiles and docker images.

## Note: Testing is not yet implemented.

# Main options

In [31]:
n_models = 1
wandb_project_name = "emely-vXX"
persona = "interview"
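import os  # needed here because the imports cell runs after this one
os.makedirs("../../models/emely-runs", exist_ok=True)  # no error if the directory already exists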

# Imports

In [32]:
import json
from parlai.scripts.train_model import TrainModel
from pathlib import Path
import time
import os
import subprocess
from subprocess import Popen
import torch
import names
torch.cuda.is_available()

False

# Choose model specific settings

### Run the following cell, and then edit the model specific settings in the temp_opts json files.

In [33]:
with open("temp_opts/run_blender_opts.json","r") as file:
    run_blender_opts = json.load(file)
run_blender_opts["wandb_project"] = wandb_project_name
for i in range(n_models):
    with open("temp_opts/model_" + str(i+1) + "_opts.json","w") as file:
        json.dump(run_blender_opts, file, sort_keys=False, indent=4)
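
The generated per-model files can be edited by hand, or changed programmatically. The following sketch shows the second approach on a throwaway file; `override_opts` is a hypothetical helper and the option values are examples only.

```python
import json
import os
import tempfile

def override_opts(path, **overrides):
    """Load a JSON opts file, apply the overrides, and write it back."""
    with open(path, "r") as file:
        opts = json.load(file)
    opts.update(overrides)
    with open(path, "w") as file:
        json.dump(opts, file, sort_keys=False, indent=4)
    return opts

# Demonstration on a temporary file instead of temp_opts/model_1_opts.json
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "model_1_opts.json")
with open(demo_path, "w") as file:
    json.dump({"bs": 16, "lr": 1e-06}, file)

updated = override_opts(demo_path, lr=5e-06, dropout=0.2)
print(updated)
```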

# Define training function

For clarity, the training function is also defined in this notebook.

In [34]:
def run_training(model_id):

    with open("temp_opts/default_blender_opts.json","r") as file:
        default_blender_opts = json.load(file)

    with open("temp_opts/model_" + str(model_id) + "_opts.json","r") as file:
        run_blender_opts = json.load(file)

    # Set name for file and model run on wandb
    name = names.get_full_name().replace(" ", "_").lower()
    mf = Path.cwd().parents[1].joinpath(f'models/emely-runs/{name}/model')
    
    # Finalize training opts
    run_blender_opts["model_file"] = mf.as_posix()
    run_blender_opts["wandb_name"] = name
    run_blender_opts.update(default_blender_opts)

    # Uncomment the following line to run for one epoch during testing
    #run_blender_opts["eps"] = 1

    # Drop "mutators" when it is unset; .get avoids a KeyError if the key is missing
    if run_blender_opts.get("mutators") is None:
        run_blender_opts.pop("mutators", None)
    
    # Run training
    TrainModel.main(**run_blender_opts)

    # Wrap up
    os.system(f"parlai vacuum -mf ../../models/emely-runs/{name}/model")
    with open(f"../../models/emely-runs/{name}/run_opts.json","w") as file:
        json.dump(run_blender_opts, file, sort_keys=False, indent=4)

    return name

model_names = []
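
The model file path in `run_training` is built from `Path.cwd().parents[1]`, that is, the directory two levels above the notebook's working directory. The sketch below illustrates how that resolves, using a made-up working directory and run name; `PurePosixPath` is used so the example does not depend on the real file system.

```python
from pathlib import PurePosixPath

# Hypothetical working directory of the notebook; the real one may differ.
cwd = PurePosixPath("/home/user/emely-models/training/pipeline")
name = "some_name"  # placeholder for the generated run name

# parents[0] is the direct parent, parents[1] is two levels up
repo_root = cwd.parents[1]
mf = repo_root.joinpath(f"models/emely-runs/{name}/model")

print(repo_root)
print(mf.as_posix())
```

This is why the relative paths used elsewhere in the notebook start with `../../`: both forms point at the same directory only when the notebook is run from its own folder.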

# Run the training in separate cells

- Create as many cells as models to train and paste "model_names.append(run_training(i))", i=1...N, in each cell.
- Only the models that are successfully generated will be appended to model_names, and used for testing.

In [35]:
model_names.append(run_training(1))

14:49:08 | building dictionary first...
14:49:08 | No model with opt yet at: /home/ckjellson/code/emely-models/models/emely-runs/maria_ecton/model(.opt)
14:49:08 | your model is being loaded with opts that do not exist in the model you are initializing the weights with: allow_missing_init_opts: False,download_path: None,loglevel: info,dynamic_batching: None,verbose: False,is_debug: False,datapath: /home/ckjellson/code/emely-models/ParlAI/data,eval_dynamic_batching: None,num_workers: 0,max_train_steps: -1,log_every_n_steps: 50,validation_every_n_steps: -1,load_from_checkpoint: True,tensorboard_logdir: None,wandb_log: True,wandb_name: maria_ecton,wandb_project: emely-vXX,wandb_entity: None,mutators: None,n_encoder_layers: -1,n_decoder_layers: -1,model_parallel: False,beam_block_full_context: True,beam_delay: 30,beam_block_list_filename: None,temperature: 1.0,interactive_mode: False,history_reversed: False,history_add_global_end_token: None,special_tok_lst: None,bpe_vocab: None,bpe_m

wandb: wandb version 0.12.4 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade


14:49:14 | training...
14:49:15 | time:5s total_exs:16 total_steps:1 epochs:16.00 time_left:0s
    clen  clip  ctpb  ctps  ctrunc  ctrunclen  exps  exs  gnorm  llen  loss    lr  ltpb  ltps  ltrunc  ltrunclen   ppl  \
      41     1   656 481.7       0          0 11.75   16  47.34    11 2.819 1e-06   176 129.2       0          0 16.75   
    token_acc  token_em  total_train_updates  tpb   tps   ups  
        .3580         0                    1  832 610.9 .7351

14:49:15 | num_epochs completed:1.0 time elapsed:5.249457836151123s
14:49:15 | Saving dictionary to /home/ckjellson/code/emely-models/models/emely-runs/maria_ecton/model.dict
14:49:17 | Overriding opt["init_model"] to zoo:blender/blender_90M/model (previously: /home/ckjellson/code/emely-models/ParlAI/data/models/blender/blender_90M/model)
14:49:17 | Overriding opt["betas"] to (0.9, 0.999) (previously: [0.9, 0.999])
14:49:17 | Overriding opt["multitask_weights"] to (1.0,) (previously: [1.0])
14:49:17 | 


0,1
exs/train,16.0
clen/train,41.0
ctrunc/train,0.0
ctrunclen/train,0.0
llen/train,11.0
ltrunc/train,0.0
ltrunclen/train,0.0
loss/train,2.81868
ppl/train,16.75474
token_acc/train,0.35795




# Models are trained, now create docker images

### Start with creating a local parlai image

In [36]:
err = os.system(f"cp ../parlai-docker/Dockerfile ../../Dockerfile")
err = os.system(f"docker build -t parlai-emely-base-image ../..")
err = os.system(f"rm ../../Dockerfile")

In [37]:
image_names = []
for name in model_names:
    try:
        # Store the Dockerfile that is used in the model directory
        with open(f"../{persona}/Dockerfile","r") as file:
            lines = file.readlines()
        lines[2] = f"COPY models/emely-runs/{name} ./models/interview-model\n"
        dockerfile = "".join(lines)
        with open(f"../../models/emely-runs/{name}/Dockerfile","w") as file:
            file.write(dockerfile)
        
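        # Create docker image. os.system returns a non-zero status on failure
        # instead of raising, so the status is checked explicitly.
        err = os.system(f"cp ../../models/emely-runs/{name}/Dockerfile ../../Dockerfile")
        if err == 0:
            err = os.system(f"docker build -t {name} ../..")
        os.system("rm ../../Dockerfile")
        if err != 0:
            raise RuntimeError(f"docker build failed with status {err}")

        image_names.append(name)
    except Exception as e:
        print(f"Error generating docker-image for model {name}: {e}")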
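
The cell above replaces the third line of the persona Dockerfile by index, which breaks silently if the template gains or loses a line. A more defensive variant searches for the COPY instruction instead. The sketch below assumes the first COPY line is the one that copies the model; `patch_dockerfile` is a hypothetical helper and the template is made up, so the real Dockerfile may look different.

```python
def patch_dockerfile(text, name):
    """Point the first COPY instruction at the run directory for `name`."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.lstrip().upper().startswith("COPY "):
            lines[i] = f"COPY models/emely-runs/{name} ./models/interview-model\n"
            return "".join(lines)
    raise ValueError("no COPY instruction found in Dockerfile")

# Made-up template used only for this illustration.
template = (
    "FROM parlai-emely-base-image\n"
    "WORKDIR /app\n"
    "COPY models/placeholder ./models/interview-model\n"
    'CMD ["python", "serve.py"]\n'
)

patched = patch_dockerfile(template, "maria_ecton")
print(patched)
```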


# Run testing (not yet working)

In [38]:
if False:
    tested_names = []
    image_names = ["blender-minimal-1-model_1"]  # must be a list; iterating a string yields single characters
    #print(os.system(f"conda activate {testenv}"))
    for name in image_names:
        #try:
        p1 = Popen("/bin/bash", stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding='utf8')
        p2 = Popen("/bin/bash", stdin=subprocess.PIPE, stdout=subprocess.PIPE, encoding='utf8')

        out,err = p1.communicate(f"docker run --name {name} -p 8080:8080 {name}")
        print(out)
        print(err)
        time.sleep(5)
        out,err = p2.communicate(f"conda activate {testenv} ; python ../../../emely-testing/main.py")
        print(err)
        print(out)
        while p2.poll() is not None:
            out = p2.stdout
            #print(out)
        print(out)

        p2.kill()
        p1.kill()
        # p2 = Popen("python ../../../emely-testing/main.py")

        # while True:
        #     if p2.poll() is None:
        #         break

        print(os.system(f"docker stop {name}"))
        print(os.system(f"docker rm {name}"))
        # tested_names.append(name)
        #except:
        #    print(f"Error testing model {name}")

    #docker run --name blender-minimal-1-model_1  -p 8080:8080 blender-minimal-1-model_1
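
The disabled testing cell waits a fixed five seconds after starting the container. One possible alternative is to poll the published port until it accepts a TCP connection. The sketch below shows the idea with the standard library only; `wait_for_port` is a hypothetical helper, and an open port does not by itself guarantee that the service inside the container is ready to answer requests.

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not reachable yet; retry until the deadline
    return False

# Self-check against a throwaway local listener instead of a real container
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(1)
free_port = listener.getsockname()[1]
reachable = wait_for_port("127.0.0.1", free_port, timeout=2.0)
listener.close()
print(reachable)
```

For the container in the testing cell, the call would be along the lines of `wait_for_port("localhost", 8080)`, placed after the `docker run` command has been started in the background.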

# Final clean-up

In [39]:
print("Successfully trained and dockerized models:")
for name in image_names:    # image_names holds the dockerized models; when testing is implemented this should be "for name in tested_names:"
    print(f"{name}")

Successfully trained and dockerized models:
maria_ecton


In [40]:
for i in range(len(model_names)):
    os.system(f"rm temp_opts/model_{str(i+1)}_opts.json")

# --- End of pipeline ---

# Some utils to change the default files used in this notebook

In [22]:
if False:
    default_blender_opts = {
        "activation": "gelu",
        "attention_dropout": 0.0,
        "dict_lower": True,
        "dict_tokenizer": "bpe",
        "embedding_size": 512,
        "evaltask": "internal,external",
        "ffn_size": 2048,
        "fp16": True,
        "gradient_clip": 0.1,
        "label_truncate": 128,
        "learn_positional_embeddings": True,
        "lr_scheduler": "reduceonplateau",
        "metrics": "ppl,bleu-4,rouge-L",
        "model": "transformer/generator",
        "n_heads": 16,
        "n_layers": 8,
        "n_positions": 512,
        "optimizer": "adamax",
        "relu_dropout": 0.0,
        "save_after_valid": True,
        "skip_generation": False,
        "stim": 60,
        "tensorboard_log": True,
        "text_truncate": 512,
        "update_freq": 1,
        "variant": "xlm",
        "veps": 0.25,
        "vme": 20000,
        "vmm": "min",
        "vmt": "ppl",
        "vp": 15,
        "wblog": True
    }
    run_blender_opts = {
        'init_model': 'zoo:blender/blender_90M/model',
        'dict_file': 'zoo:blender/blender_90M/model.dict',
        'bs': 16,
        'betas': '0.9,0.999',
        'lr': 1e-06,
        'dropout': 0.1,
        'inference': 'beam',
        'beam_size': 10,
        'beam_min_length': 10,
        'beam_block_ngram': 3,
        'wandb_project': 'parlaiemely',
        'task': 'internal,external,external-gpt3',
        'multitask_weights': '6,3,3',
        'mutators': None
    }

    with open("temp_opts/default_blender_opts.json","w") as file:
        json.dump(default_blender_opts,file, sort_keys=True, indent=4)
    with open("temp_opts/run_blender_opts.json","w") as file:
        json.dump(run_blender_opts,file, sort_keys=False, indent=4)