# Sequential Training Example

[Sequential training](https://arxiv.org/abs/1811.01088v2) involves fine-tuning a language-encoding model (e.g. BERT) on one task (the "intermediate" task), and then again on a second task (the "target" task). In many cases, the right choice of intermediate task can improve the performance on the target task compared to fine-tuning only on the target task.

Between the two phases of training, we carry over only the language-encoding model; the task heads are not reused.
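
To make the idea of carrying over the encoder but not the task heads concrete, here is a minimal sketch using plain dictionaries in place of real model weights. The key prefixes (`encoder.`, `head_mnli.`, `head_rte.`) and the helper `keep_encoder_weights` are hypothetical names chosen for illustration; they are not `jiant`'s actual parameter names or API.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def keep_encoder_weights(state: Weights, encoder_prefix: str = "encoder.") -> Weights:
    """Return only the entries whose name starts with the encoder prefix."""
    return {name: value for name, value in state.items() if name.startswith(encoder_prefix)}


# Toy checkpoint from the intermediate phase: a shared encoder plus an MNLI head.
intermediate_checkpoint: Weights = {
    "encoder.layer0.weight": [0.1, 0.2],
    "encoder.layer1.weight": [0.3, 0.4],
    "head_mnli.weight": [0.5, 0.6, 0.7],  # 3-way classifier, not reused
}

# Fresh target model: same encoder shape, newly initialized RTE head.
target_model: Weights = {
    "encoder.layer0.weight": [0.0, 0.0],
    "encoder.layer1.weight": [0.0, 0.0],
    "head_rte.weight": [0.0, 0.0],  # 2-way classifier, trained from scratch
}

# Copy over the encoder weights only; the RTE head keeps its fresh initialization.
target_model.update(keep_encoder_weights(intermediate_checkpoint))
print(sorted(target_model))
```

After the update, the encoder entries hold the intermediate-phase values, while the MNLI head is dropped and the RTE head is untouched.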

--- 

In this notebook, we will:

* Train a RoBERTa base model on MNLI, and then further fine-tune the model on RTE

## Setup

#### Install dependencies

First, we will install libraries we need for this code.

In [None]:
%%capture
!git clone https://github.com/nyu-mll/jiant.git
%cd jiant
!pip install -r requirements-no-torch.txt
!pip install --no-deps -e ./
# Return to the parent directory so that the relative paths used below
# (jiant/jiant/scripts/..., ./tasks, ./cache, ./models) resolve correctly.
%cd ..

#### Download data

Next, we will download MNLI and RTE data. 

In [None]:
%%capture
# Download MNLI and RTE data
!PYTHONPATH=/content/jiant python jiant/jiant/scripts/download_data/runscript.py \
    download \
    --tasks mnli rte \
    --output_path=/content/tasks/

## `jiant` Pipeline

In [None]:
import sys
sys.path.insert(0, "/content/jiant")

In [None]:
import jiant.proj.main.tokenize_and_cache as tokenize_and_cache
import jiant.proj.main.export_model as export_model
import jiant.proj.main.scripts.configurator as configurator
import jiant.proj.main.runscript as main_runscript
import jiant.shared.caching as caching
import jiant.utils.python.io as py_io
import jiant.utils.display as display
import os

#### Download model

Next, we will download a `roberta-base` model. This also includes the tokenizer.

In [None]:
export_model.lookup_and_export_model(
    model_type="roberta-base",
    output_base_path="./models/roberta-base",
)

#### Tokenize and cache

With the model and data ready, we can now tokenize and cache the input features for our tasks. This converts the input examples to tokenized features ready to be consumed by the model, and saves them to disk in chunks.

In [None]:
# Tokenize and cache each task
for task_name in ["mnli", "rte"]:
    tokenize_and_cache.main(tokenize_and_cache.RunConfiguration(
        task_config_path=f"./tasks/configs/{task_name}_config.json",
        model_type="roberta-base",
        model_tokenizer_path="./models/roberta-base/tokenizer",
        output_dir=f"./cache/{task_name}",
        phases=["train", "val"],
    ))
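
To illustrate what "saved to disk in chunks" means, here is a minimal standard-library sketch of a chunked cache. It only demonstrates the general idea: the file naming, the JSON format, and the helpers `write_chunks` and `load_chunk` are assumptions made for this example and do not reflect `jiant`'s actual on-disk format.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

Row = Dict[str, Any]


def write_chunks(rows: List[Row], cache_dir: str, chunk_size: int) -> int:
    """Write rows to numbered JSON files of at most chunk_size rows each."""
    os.makedirs(cache_dir, exist_ok=True)
    num_chunks = 0
    for start in range(0, len(rows), chunk_size):
        path = os.path.join(cache_dir, f"chunk_{num_chunks:05d}.json")
        with open(path, "w") as f:
            json.dump(rows[start:start + chunk_size], f)
        num_chunks += 1
    return num_chunks


def load_chunk(cache_dir: str, index: int) -> List[Row]:
    """Load a single chunk without reading the rest of the cache."""
    with open(os.path.join(cache_dir, f"chunk_{index:05d}.json")) as f:
        return json.load(f)


# Five toy "tokenized" rows, cached two rows per chunk.
rows = [{"input_ids": [0, i, 2]} for i in range(5)]
cache_dir = tempfile.mkdtemp()
num_chunks = write_chunks(rows, cache_dir, chunk_size=2)
first_row = load_chunk(cache_dir, 0)[0]
print(num_chunks, first_row)
```

Chunking lets a training loop read one file at a time instead of holding every tokenized example in memory.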

We can inspect the first example in the first chunk of each task's cache.

In [None]:
row = caching.ChunkedFilesDataCache("./cache/mnli/train").load_chunk(0)[0]["data_row"]
print(row.input_ids)
print(row.tokens)

In [None]:
row = caching.ChunkedFilesDataCache("./cache/rte/val").load_chunk(0)[0]["data_row"]
print(row.input_ids)
print(row.tokens)

#### Writing a run config

Here we are going to write what we call a `jiant_task_container_config`. This configuration file defines many of the details of our training pipeline, such as which tasks we train on, which tasks we evaluate on, and the batch size for each task. The new version of `jiant` leans heavily toward explicitly specifying everything, for the sake of inspectability and leaving minimal surprises for the user, even at the cost of being more verbose.

Since we are training in two phases, we will need to write two run configs: one for MNLI, and one for RTE. (This might seem tedious, but note that these configs can easily be reused across different combinations of intermediate and target tasks.)

We use a helper "Configurator" to write out a `jiant_task_container_config`, since most of our setup is pretty standard. 

We start with the MNLI config:

**Depending on which GPU your Colab session is assigned, you may need to lower the train batch size.**

In [None]:
jiant_run_config = configurator.SimpleAPIMultiTaskConfigurator(
    task_config_base_path="./tasks/configs",
    task_cache_base_path="./cache",
    train_task_name_list=["mnli"],
    val_task_name_list=["mnli"],
    train_batch_size=8,
    eval_batch_size=16,
    epochs=0.1,
    num_gpus=1,
).create_config()
os.makedirs("./run_configs/", exist_ok=True)
py_io.write_json(jiant_run_config, "./run_configs/mnli_run_config.json")
display.show_json(jiant_run_config)

To briefly go over the major components of the `jiant_task_container_config`:

* `task_config_path_dict`: The paths to the task config files generated by the data download step above.
* `task_cache_config_dict`: The paths to the task feature caches we generated above.
* `sampler_config`: Determines how to sample from different tasks during training.
* `global_train_config`: The number of total steps and warmup steps during training.
* `task_specific_configs_dict`: Task-specific arguments for each task, such as training batch size and gradient accumulation steps.
* `taskmodels_config`: Task-model specific arguments for each task-model, including which tasks use which model.
* `metric_aggregator_config`: Determines how to weight/aggregate the metrics across multiple tasks.
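
As a rough guide to the step counts that appear in `global_train_config`, the sketch below estimates the total number of training steps from the dataset size, the batch size, and the (possibly fractional) number of epochs. This is an assumption about how such a count can be derived, not a description of `jiant`'s exact computation, which may round differently or account for gradient accumulation. The training set sizes used (392,702 for MNLI and 2,490 for RTE) are the commonly reported GLUE figures and are also stated here as assumptions.

```python
import math


def total_training_steps(num_examples: int, batch_size: int, epochs: float) -> int:
    """Estimate optimizer steps for a fractional number of epochs."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return math.ceil(epochs * steps_per_epoch)


# Assumed GLUE training set sizes; batch size and epochs match the configs in this notebook.
mnli_steps = total_training_steps(num_examples=392_702, batch_size=8, epochs=0.1)
rte_steps = total_training_steps(num_examples=2_490, batch_size=8, epochs=0.5)
print(mnli_steps, rte_steps)
```

Under these assumptions, the MNLI phase runs for roughly 4,900 steps and the RTE phase for roughly 150, which is why both phases finish quickly in this demonstration.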

Next, we will write the equivalent for RTE.

In [None]:
jiant_run_config = configurator.SimpleAPIMultiTaskConfigurator(
    task_config_base_path="./tasks/configs",
    task_cache_base_path="./cache",
    train_task_name_list=["rte"],
    val_task_name_list=["rte"],
    train_batch_size=8,
    eval_batch_size=16,
    epochs=0.5,
    num_gpus=1,
).create_config()
os.makedirs("./run_configs/", exist_ok=True)
py_io.write_json(jiant_run_config, "./run_configs/rte_run_config.json")
display.show_json(jiant_run_config)

#### Start training

Finally, we can start our training run. 

Before starting training, the script also prints out the list of parameters in our model. In the first phase, we are simply training on MNLI.

In [None]:
run_args = main_runscript.RunConfiguration(
    jiant_task_container_config_path="./run_configs/mnli_run_config.json",
    output_dir="./runs/mnli",
    model_type="roberta-base",
    model_path="./models/roberta-base/model/roberta-base.p",
    model_config_path="./models/roberta-base/model/roberta-base.json",
    model_tokenizer_path="./models/roberta-base/tokenizer",
    learning_rate=1e-5,
    eval_every_steps=500,
    do_train=True,
    do_val=True,
    do_save=True,
    force_overwrite=True,
)
main_runscript.run_loop(run_args)

The above run saves the best model weights to `./runs/mnli/best_model.p`. Now, we will pick up from those saved model weights and start training on RTE. In addition to changing the `model_path`, we also set `model_load_mode="partial"`. This tells `jiant` that we will not be loading and reusing the task heads from the previous run.

In [None]:
run_args = main_runscript.RunConfiguration(
    jiant_task_container_config_path="./run_configs/rte_run_config.json",
    output_dir="./runs/mnli___rte",
    model_type="roberta-base",
    model_path="./runs/mnli/best_model.p",  # Loading the best model
    model_load_mode="partial",
    model_config_path="./models/roberta-base/model/roberta-base.json",
    model_tokenizer_path="./models/roberta-base/tokenizer",
    learning_rate=1e-5,
    eval_every_steps=500,
    do_train=True,
    do_val=True,
    force_overwrite=True,
)
main_runscript.run_loop(run_args)

Finally, we should see the validation scores for RTE. You can compare these to the scores from training on RTE alone, and you should see a good margin of improvement.
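
For the comparison against training on RTE alone, a baseline run can reuse the RTE run config and start from the pretrained `roberta-base` weights, exactly as the first phase did for MNLI. The sketch below collects the arguments in a plain dictionary so that it can be inspected on its own; the output directory `./runs/rte_only` is a hypothetical name chosen for this example.

```python
# Arguments for an RTE-only baseline, mirroring the second-phase run above
# except that training starts from the pretrained roberta-base weights
# instead of the MNLI checkpoint, so no model_load_mode is set.
baseline_kwargs = dict(
    jiant_task_container_config_path="./run_configs/rte_run_config.json",
    output_dir="./runs/rte_only",  # hypothetical output directory
    model_type="roberta-base",
    model_path="./models/roberta-base/model/roberta-base.p",
    model_config_path="./models/roberta-base/model/roberta-base.json",
    model_tokenizer_path="./models/roberta-base/tokenizer",
    learning_rate=1e-5,
    eval_every_steps=500,
    do_train=True,
    do_val=True,
    force_overwrite=True,
)

# In this notebook, with main_runscript imported as above, the run would be:
# main_runscript.run_loop(main_runscript.RunConfiguration(**baseline_kwargs))
print(baseline_kwargs["model_path"])
```

Keeping the learning rate, batch size, and number of epochs identical to the second-phase run means that any difference in validation score can be attributed to the intermediate MNLI training.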