<img src="https://parl.ai/docs/_static/img/parlai.png" width="700"/>

**Author**: Stephen Roller ([GitHub](https://github.com/stephenroller), [Twitter](https://twitter.com/stephenroller))


# Welcome to the ParlAI interactive tutorial

In this tutorial we will:

- Chat with a neural network model!
- Show how to use common commands in ParlAI, like inspecting data and model outputs.
- See where to find information about many options.
- Show how to fine-tune a pretrained model on a specific task.
- Add our own datasets to ParlAI.
- And add our own models to ParlAI.

We won't be running any examples of using Amazon Mechanical Turk, or connecting to Chat services, but you can check out our [docs](https://parl.ai/docs/) for more information on these areas.

**Note:** *Make sure you're running this session with a GPU attached.*

In [1]:
!nvidia-smi

Wed Jul 13 10:38:00 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Quadro RTX 4000     Off  | 00000000:01:00.0  On |                  N/A |
| 30%   40C    P8     7W / 125W |     98MiB /  7981MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0  

## Installing parlai

We need to install ParlAI. Since we're in Google Colab, we can assume PyTorch and similar dependencies are already installed.

In [2]:
!pip3 install -q parlai
!pip3 install -q subword_nmt # extra requirement we need for this tutorial

# Chatting with a model

Let's start by chatting interactively with a model file from our model zoo! We'll pick our "tutorial transformer generator" model, which is a generative transformer trained on pushshift.io Reddit. You can take a look at the [model zoo](https://parl.ai/docs/zoo.html) for a more complete list.

In [10]:
# Import the Interactive script
from parlai.scripts.interactive import Interactive

# call it with particular args
Interactive.main(
    # the model_file is a filename path pointing to a particular model dump.
    # Model files that begin with "zoo:" are special files distributed by the ParlAI team.
    # They'll be automatically downloaded when you ask to use them.
    model_file='zoo:tutorial_transformer_generator/model'
)

12:16:53 | Overriding opt["model_file"] to /home/jf/.local/ParlAI/data/models/tutorial_transformer_generator/model (previously: /checkpoint/roller/20190909/cleanreddit/585/model)
12:16:53 | loading dictionary from /home/jf/.local/ParlAI/data/models/tutorial_transformer_generator/model.dict
12:16:53 | num words = 54944
12:16:53 | TransformerGenerator: full interactive mode on.
12:16:54 | Total parameters: 87,508,992 (87,508,992 trainable)
12:16:54 | Loading existing model params from /home/jf/.local/ParlAI/data/models/tutorial_transformer_generator/model
12:16:54 | Opt:
12:16:54 |     activation: gelu
12:16:54 |     adafactor_eps: '(1e-30, 0.001)'
12:16:54 |     adam_eps: 1e-06
12:16:54 |     add_p1_after_newln: False
12:16:54 |     aggregate_micro: False
12:16:54 |     allow_missing_init_opts: False
12:16:54 |     attention_dropout: 0.0
12:16:54 |     batch_length_range: 5
12:16:54 |     batch_sort_cache_type: pop
12:16:54 |     batch_sort_field: text
12:16:54 |     batchsize:

KeyboardInterrupt: Interrupted by user

The same on the command line:
```bash
python -m parlai.scripts.interactive --model-file zoo:tutorial_transformer_generator/model
```

In [6]:
from parlai.chat_service.services.terminal_chat import terminal_manager
terminal_manager    

<module 'parlai.chat_service.services.terminal_chat.terminal_manager' from '/home/jf/.local/ParlAI/parlai/chat_service/services/terminal_chat/terminal_manager.py'>

In [7]:
!python parlai/chat_service/services/terminal_chat/run.py --config-path parlai/chat_service/tasks/chatbot/config.yml --port 10001

python: can't open file '/home/jf/.local/ParlAI/my_work/parlai/chat_service/services/terminal_chat/run.py': [Errno 2] No such file or directory


In [None]:
!python -m parlai.scripts.interactive --model-file zoo:tutorial_transformer_generator/model

23:31:24 | Overriding opt["model_file"] to /usr/local/lib/python3.7/dist-packages/data/models/tutorial_transformer_generator/model (previously: /checkpoint/roller/20190909/cleanreddit/585/model)
23:31:24 | Loading model with `--beam-block-full-context false`
23:31:24 | Using CUDA
23:31:24 | loading dictionary from /usr/local/lib/python3.7/dist-packages/data/models/tutorial_transformer_generator/model.dict
23:31:24 | num words = 54944
23:31:24 | TransformerGenerator: full interactive mode on.
23:31:25 | DEPRECATED: XLM should only be used for backwards compatibility, as it involves a less-stable layernorm operation.
23:31:27 | Total parameters: 87,508,992 (87,508,992 trainable)
23:31:27 | Loading existing model params from /usr/local/lib/python3.7/dist-packages/data/models/tutorial_transformer_generator/model
23:31:28 | Opt:
23:31:28 |     activation: gelu
23:31:28 |     adafactor_eps: '(1e-30, 0.001)'
23:31:28 |     adam_eps: 1e-06
23:31:28 |     add_p1_after

# Taking a look at some data

We can look into a specific dataset. Let's look into the "empathetic dialogues" dataset, which aims to teach models how to respond with text expressing the appropriate emotion. We have over 80 existing datasets in ParlAI. You can take a full look in our [task list](https://parl.ai/docs/tasks.html).

In [None]:
# The display_data script is used to show the contents of a particular task.
# By default, it shows examples from the training set.
from parlai.scripts.display_data import DisplayData
DisplayData.main(task='empathetic_dialogues', num_examples=15)

23:32:50 | Opt:
23:32:50 |     allow_missing_init_opts: False
23:32:50 |     batchsize: 1
23:32:50 |     datapath: /usr/local/lib/python3.7/dist-packages/data
23:32:50 |     datatype: train:ordered
23:32:50 |     dict_class: None
23:32:50 |     display_add_fields: 
23:32:50 |     download_path: None
23:32:50 |     dynamic_batching: None
23:32:50 |     hide_labels: False
23:32:50 |     ignore_agent_reply: True
23:32:50 |     image_cropsize: 224
23:32:50 |     image_mode: raw
23:32:50 |     image_size: 256
23:32:50 |     init_model: None
23:32:50 |     init_opt: None
23:32:50 |     is_debug: False
23:32:50 |     loglevel: info
23:32:50 |     max_display_len: 1000
23:32:50 |     model: None
23:32:50 |     model_file: None
23:32:50 |     multitask_weights: [1]
23:32:50 |     mutators: None
23:32:50 |     num_examples: 15
23:32:50 |     override: "{'task': 'empathetic_dialogues', 'num_examples': 15}"
23:32:50 |     parlai_home: /usr/local/lib/python3.7/dist-packages
23:32:50 |     starttime

Downloading empatheticdialogues.tar.gz: 100%|██████████| 28.0M/28.0M [00:01<00:00, 25.4MB/s]


- - - NEW EPISODE: empathetic_dialogues - - -
I remember going to see the fireworks with my best friend. It was the first time we ever spent time alone together. Although there was a lot of people, we felt like the only people in the world.
   Was this a friend you were in love with, or just a best friend?
This was a best friend. I miss her.
   Where has she gone?
We no longer talk.
   Oh was this something that happened because of an argument?
- - - NEW EPISODE: empathetic_dialogues - - -
Was this a friend you were in love with, or just a best friend?
   This was a best friend. I miss her.
Where has she gone?
   We no longer talk.
- - - NEW EPISODE: empathetic_dialogues - - -
 it feels like hitting to blank wall when i see the darkness
   Oh ya? I don't really see how
dont you feel so.. its a wonder 


The unindented text is the _prompt_, while the indented text (shown in blue in a color terminal) is the _label_. That is, the label is what we will be training the model to mimic.
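Each turn shown above travels through ParlAI as a dictionary-like message. As a rough sketch, assuming plain Python dicts with only the `text`, `labels`, and `episode_done` fields (ParlAI's actual `Message` objects carry more fields), the first episode could be represented and split into prompt/label pairs like this:

```python
# Plain-dict approximation of the first empathetic_dialogues episode above.
# Only three fields are mirrored: the prompt ("text"), the target ("labels"),
# and whether this turn ends the episode ("episode_done").
episode = [
    {
        "text": (
            "I remember going to see the fireworks with my best friend. "
            "It was the first time we ever spent time alone together. "
            "Although there was a lot of people, we felt like the only "
            "people in the world."
        ),
        "labels": ["Was this a friend you were in love with, or just a best friend?"],
        "episode_done": False,
    },
    {
        "text": "This was a best friend. I miss her.",
        "labels": ["Where has she gone?"],
        "episode_done": False,
    },
    {
        "text": "We no longer talk.",
        "labels": ["Oh was this something that happened because of an argument?"],
        "episode_done": True,  # last turn of the conversation
    },
]


def prompt_label_pairs(messages):
    """Return (prompt, first label) pairs: what the model sees and must mimic."""
    return [(msg["text"], msg["labels"][0]) for msg in messages]


for prompt, label in prompt_label_pairs(episode):
    print(f"{prompt}\n   {label}")
```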

We can also ask to see fewer examples, and get them from the validation set instead.

In [None]:
# we can instead ask to see fewer examples, and get them from the valid set.
DisplayData.main(task='empathetic_dialogues', num_examples=3, datatype='valid')

23:36:09 | Opt:
23:36:09 |     allow_missing_init_opts: False
23:36:09 |     batchsize: 1
23:36:09 |     datapath: /usr/local/lib/python3.7/dist-packages/data
23:36:09 |     datatype: valid
23:36:09 |     dict_class: None
23:36:09 |     display_add_fields: 
23:36:09 |     download_path: None
23:36:09 |     dynamic_batching: None
23:36:09 |     hide_labels: False
23:36:09 |     ignore_agent_reply: True
23:36:09 |     image_cropsize: 224
23:36:09 |     image_mode: raw
23:36:09 |     image_size: 256
23:36:09 |     init_model: None
23:36:09 |     init_opt: None
23:36:09 |     is_debug: False
23:36:09 |     loglevel: info
23:36:09 |     max_display_len: 1000
23:36:09 |     model: None
23:36:09 |     model_file: None
23:36:09 |     multitask_weights: [1]
23:36:09 |     mutators: None
23:36:09 |     num_examples: 3
23:36:09 |     override: "{'task': 'empathetic_dialogues', 'num_examples': 3, 'datatype': 'valid'}"
23:36:09 |     parlai_home: /usr/local/lib/python3.7/dist-packages
23:36:09 |   

On the command line:
```bash
python -m parlai.scripts.display_data --task empathetic_dialogues
```
or, a bit shorter, with the `-t` alias:
```bash
python -m parlai.scripts.display_data -t empathetic_dialogues
```

# Training a model

Well, it's one thing to look at data, but what if we want to train our own model (from scratch)? Let's train a very simple seq2seq LSTM with attention to respond to empathetic dialogues.

To keep things quick, we won't use pretrained word embeddings (they take a long time to download), and we'll cap the training time at 2 minutes for this tutorial. The model won't perform very well, but that's okay.

In [None]:
# we'll save it in the "from_scratch_model" directory
!rm -rf from_scratch_model
!mkdir -p from_scratch_model

from parlai.scripts.train_model import TrainModel
TrainModel.main(
    # we MUST provide a filename
    model_file='from_scratch_model/model',
    # train on empathetic dialogues
    task='empathetic_dialogues',
    # limit training time to 2 minutes, and a batchsize of 16
    max_train_time=2 * 60,
    batchsize=16,
    
    # we specify the model type as seq2seq
    model='seq2seq',
    # some hyperparameter choices. We'll use attention. We could use pretrained
    # embeddings too, with embedding_type='fasttext', but they take a long
    # time to download.
    attention='dot',
    # tie the word embeddings of the encoder/decoder/softmax.
    lookuptable='all',
    # truncate text and labels at 64 tokens, for memory and time savings
    truncate=64,
)

23:36:22 | building dictionary first...
23:36:22 | Opt:
23:36:22 |     adafactor_eps: '(1e-30, 0.001)'
23:36:22 |     adam_eps: 1e-08
23:36:22 |     add_p1_after_newln: False
23:36:22 |     aggregate_micro: False
23:36:22 |     allow_missing_init_opts: False
23:36:22 |     attention: dot
23:36:22 |     attention_length: 48
23:36:22 |     attention_time: post
23:36:22 |     batchsize: 1
23:36:22 |     beam_block_full_context: True
23:36:22 |     beam_block_list_filename: None
23:36:22 |     beam_block_ngram: -1
23:36:22 |     beam_context_block_ngram: -1
23:36:22 |     beam_delay: 30
23:36:22 |     beam_length_penalty: 0.65
23:36:22 |     beam_min_length: 1
23:36:22 |     beam_size: 1
23:36:22 |     betas: '(0.9, 0.999)'
23:36:22 |     bidirectional: False
23:36:22 |     bpe_add_prefix_space: None
23:36:22 |     bpe_debug: False
23:36:22 |     bpe_dropout: None
23:36:22 |     bpe_merge: None
23:36:22 |     bpe_vocab: None
23:36:22 |     compute_tokenized_bleu: False
23:36:22 |     datap

Building dictionary: 100%|██████████| 64.6k/64.6k [00:04<00:00, 14.9kex/s]

23:36:27 | Saving dictionary to from_scratch_model/model.dict
23:36:27 | dictionary built with 22419 tokens in 0.0s





23:36:27 | No model with opt yet at: from_scratch_model/model(.opt)
23:36:27 | Using CUDA
23:36:27 | loading dictionary from from_scratch_model/model.dict
23:36:28 | num words = 22419
23:36:28 | Total parameters: 3,453,203 (3,453,203 trainable)
23:36:28 | Opt:
23:36:28 |     adafactor_eps: '(1e-30, 0.001)'
23:36:28 |     adam_eps: 1e-08
23:36:28 |     add_p1_after_newln: False
23:36:28 |     aggregate_micro: False
23:36:28 |     allow_missing_init_opts: False
23:36:28 |     attention: dot
23:36:28 |     attention_length: 48
23:36:28 |     attention_time: post
23:36:28 |     batchsize: 16
23:36:28 |     beam_block_full_context: True
23:36:28 |     beam_block_list_filename: None
23:36:28 |     beam_block_ngram: -1
23:36:28 |     beam_context_block_ngram: -1
23:36:28 |     beam_delay: 30
23:36:28 |     beam_length_penalty: 0.65
23:36:28 |     beam_min_length: 1
23:36:28 |     beam_size: 1
23:36:28 |     betas: '(0.9, 0.999)'
23:36:28 |     bidirectional: False
23:36:28 |     bpe_add_prefi

Our perplexity and F1 (word overlap) scores are pretty bad, and our BLEU-4 score is nearly 0. That's okay; we would normally want to train for well over an hour. Feel free to change `max_train_time` above.
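F1 here is a word-overlap score between the model's response and the label. The sketch below is a simplified version for intuition only: it lowercases and splits on whitespace, nothing more. As far as we know, ParlAI's own F1 also normalizes the text (for example, stripping punctuation and articles), so its values will differ somewhat.

```python
from collections import Counter


def unigram_f1(prediction: str, label: str) -> float:
    """Simplified token-overlap F1 between a prediction and a single label.

    Only lowercases and splits on whitespace; no punctuation or article removal.
    """
    pred_tokens = prediction.lower().split()
    label_tokens = label.lower().split()
    # Multiset intersection: a shared token counts min(pred count, label count) times.
    common = Counter(pred_tokens) & Counter(label_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(label_tokens)
    return 2 * precision * recall / (precision + recall)


# 2 shared tokens ("i", "am"): precision 2/8, recall 2/3, so F1 is about 0.36 (= 4/11).
print(unigram_f1("i am good , how are you ?", "I am fine"))
```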

## Performance is pretty bad there. Can we improve it?

The easiest way to improve it is to *initialize* using a *pretrained model*, utilizing *transfer learning*. Let's use the one from the interactive session at the beginning of this tutorial!
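The fine-tuning call below hard-codes architecture arguments, such as `n_heads` and `n_layers`, that must match the pretrained model. One way to look them up, assuming the downloaded model directory contains a JSON `model.opt` file next to the weights (check your download before relying on this), is to read that file directly. The helper `architecture_args` and the stand-in file below are hypothetical, used only to keep the example self-contained.

```python
import json
import os
import tempfile

# Architecture options that the fine-tuning cell copies by hand.
ARCH_KEYS = [
    "n_heads", "n_layers", "n_positions", "ffn_size", "embedding_size",
    "activation", "variant", "learn_positional_embeddings",
]


def architecture_args(opt_path, keys=ARCH_KEYS):
    """Hypothetical helper: load a JSON options file and keep only `keys`."""
    with open(opt_path, encoding="utf-8") as f:
        opt = json.load(f)
    return {k: opt[k] for k in keys if k in opt}


# Self-contained demo: write a stand-in options file instead of reading a real
# download. With ParlAI you might point at something like
# <datapath>/models/tutorial_transformer_generator/model.opt (an assumption).
stand_in_opt = {
    "n_heads": 16, "n_layers": 8, "n_positions": 512, "ffn_size": 2048,
    "embedding_size": 512, "activation": "gelu", "variant": "xlm",
    "learn_positional_embeddings": True,
    "batchsize": 64,  # made-up training option, included to show it is filtered out
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.opt")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stand_in_opt, f)
    arch = architecture_args(path)

print(arch)
```

The resulting dict could then be passed as `**arch` to `TrainModel.main` in place of the hand-copied values (not alongside them, or Python will reject the duplicate keyword arguments).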

In [None]:
!rm -rf from_pretrained
!mkdir -p from_pretrained

TrainModel.main(
    # similar to before
    task='empathetic_dialogues', 
    model='transformer/generator',
    model_file='from_pretrained/model',
    
    # initialize with a pretrained model
    init_model='zoo:tutorial_transformer_generator/model',
    
    # arguments we get from the pretrained model.
    # Unfortunately, these must be looked up separately for each model.
    n_heads=16, n_layers=8, n_positions=512, text_truncate=512,
    label_truncate=128, ffn_size=2048, embedding_size=512,
    activation='gelu', variant='xlm',
    dict_lower=True, dict_tokenizer='bpe',
    dict_file='zoo:tutorial_transformer_generator/model.dict',
    learn_positional_embeddings=True,
    
    # some training arguments, specific to this fine-tuning
    # use a small learning rate with the Adam optimizer
    lr=1e-5, optimizer='adam',
    warmup_updates=100,
    # early stopping on perplexity
    validation_metric='ppl',
    # train at most 10 minutes, and validate every 0.25 epochs
    max_train_time=600, validation_every_n_epochs=0.25,
    
    # these depend on your GPU. If you have a V100, these settings work well
    batchsize=12, fp16=True, fp16_impl='mem_efficient',
    
    # speeds up validation
    skip_generation=True,
    
    # helps us cram more examples into our gpu at a time
    dynamic_batching='full',
)

21:24:01 | building dictionary first...
21:24:01 | No model with opt yet at: from_pretrained/model(.opt)
21:24:01 | your model is being loaded with opts that do not exist in the model you are initializing the weights with: allow_missing_init_opts: False,download_path: None,loglevel: info,dynamic_batching: full,verbose: False,datapath: /usr/local/lib/python3.7/dist-packages/data,eval_dynamic_batching: None,load_from_checkpoint: True,tensorboard_logdir: None,wandb_log: False,wandb_name: None,wandb_project: None,mutators: None,train_experiencer_only: False,remove_political_convos: False,n_encoder_layers: -1,n_decoder_layers: -1,model_parallel: False,beam_block_full_context: True,beam_length_penalty: 0.65,topk: 10,topp: 0.9,beam_delay: 30,beam_block_list_filename: None,temperature: 1.0,compute_tokenized_bleu: False,interactive_mode: False,fp16_impl: mem_efficient,force_fp16_tokens: False,adafactor_eps: (1e-30, 0.001),history_reversed: False,history_add_global_end_token: None,special_t

({'ctpb': GlobalAverageMetric(3464),
  'ctps': GlobalTimerMetric(3.591e+04),
  'exps': GlobalTimerMetric(887.8),
  'exs': SumMetric(5738),
  'gpu_mem': GlobalAverageMetric(0.07747),
  'loss': AverageMetric(2.446),
  'lr': GlobalAverageMetric(1e-05),
  'ltpb': GlobalAverageMetric(1371),
  'ltps': GlobalTimerMetric(1.421e+04),
  'ppl': PPLMetric(11.54),
  'token_acc': AverageMetric(0.4445),
  'total_train_updates': GlobalFixedMetric(1716),
  'tpb': GlobalAverageMetric(4835),
  'tps': GlobalTimerMetric(5.013e+04)},
 {'ctpb': GlobalAverageMetric(3535),
  'ctps': GlobalTimerMetric(3.689e+04),
  'exps': GlobalTimerMetric(844.3),
  'exs': SumMetric(5259),
  'gpu_mem': GlobalAverageMetric(0.07745),
  'loss': AverageMetric(2.469),
  'lr': GlobalAverageMetric(1e-05),
  'ltpb': GlobalAverageMetric(1313),
  'ltps': GlobalTimerMetric(1.37e+04),
  'ppl': PPLMetric(11.81),
  'token_acc': AverageMetric(0.4422),
  'total_train_updates': GlobalFixedMetric(1716),
  'tpb': GlobalAverageMetric(4848),
  'tp
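
The two dictionaries above appear to be the validation and test reports returned by `TrainModel.main`. Perplexity (`ppl`) is the exponential of the average per-token loss (`loss`), and the printed values are consistent with that:

```python
import math

# (loss, ppl) pairs copied from the two reports printed above.
reports = {"first report": (2.446, 11.54), "second report": (2.469, 11.81)}

for name, (loss, ppl) in reports.items():
    # Tiny mismatches are expected because the printed values are rounded.
    print(f"{name}: exp({loss}) = {math.exp(loss):.2f} (reported ppl {ppl})")
```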

## Wow, that's a lot of options! Where do I find more info?

As you might have noticed, there are a LOT of options in ParlAI. Your best bet is to read the [ParlAI docs](https://parl.ai/docs) for a list of hyperparameters; they also list the command-line arguments for each model.

You can get some guidance in this notebook by using:

In [None]:
# note that if you want to see model-specific arguments, you must specify a model name
print(TrainModel.help(model='seq2seq'))

usage: TrainModel [-h] [-o INIT_OPT] [-v] [-t TASK]
                  [-dt {train,train:stream,train:ordered,train:ordered:stream,train:stream:ordered,train:evalmode,train:evalmode:stream,train:evalmode:ordered,train:evalmode:ordered:stream,train:evalmode:stream:ordered,valid,valid:stream,test,test:stream}]
                  [-nt NUMTHREADS] [-bs BATCHSIZE] [-dynb {None,batchsort,full}]
                  [-dp DATAPATH] [-m MODEL] [-mf MODEL_FILE] [-im INIT_MODEL]
                  [-et EVALTASK] [-eps NUM_EPOCHS] [-ttim MAX_TRAIN_TIME]
                  [-vtim VALIDATION_EVERY_N_SECS] [-stim SAVE_EVERY_N_SECS]
                  [-sval SAVE_AFTER_VALID] [-veps VALIDATION_EVERY_N_EPOCHS]
                  [-vp VALIDATION_PATIENCE] [-vmt VALIDATION_METRIC]
                  [-vmm {max,min}] [-mcs METRICS] [-micro AGGREGATE_MICRO]
                  [-tblog TENSORBOARD_LOG] [-hs HIDDENSIZE] [-esz EMBEDDINGSIZE]
                  [-nl NUMLAYERS] [-dr DROPOUT] [-bi BIDIRECTIONAL]
            

You'll notice the options are given as command-line arguments. We control our options via `argparse`. The option names are relatively predictable: `--init-model` becomes `init_model`, `--num-epochs` becomes `num_epochs`, and so on.
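To see the naming rule in isolation, here is a minimal example using Python's standard `argparse` with two flags borrowed from the help text above. It is a stand-in for ParlAI's parser, not ParlAI's code, and the types and defaults are illustrative only.

```python
import argparse

# Stand-in parser declaring two of the flags shown in the help output.
parser = argparse.ArgumentParser()
parser.add_argument("-im", "--init-model", type=str, default=None)
parser.add_argument("-eps", "--num-epochs", type=float, default=None)

# argparse derives each destination name from the long flag by replacing
# dashes with underscores, so "--init-model" is stored as "init_model".
opt = vars(parser.parse_args(
    ["--init-model", "zoo:tutorial_transformer_generator/model", "-eps", "2"]
))
print(opt)
# {'init_model': 'zoo:tutorial_transformer_generator/model', 'num_epochs': 2.0}
```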

# Looking at model predictions

We have shown how we can chat with a model ourselves, interactively. We might also want to inspect how the model responds to a fixed set of inputs. Let's use the model we just trained!


In [None]:
from parlai.scripts.display_model import DisplayModel
DisplayModel.main(
    task='empathetic_dialogues',
    model_file='from_pretrained/model',
    num_examples=2,
)

23:35:09 | No model with opt yet at: from_pretrained/model(.opt)


RuntimeError: ignored

Whoa, wait a second! The model isn't giving any responses? That's because we set `--skip-generation true` to speed up training. We need to turn that back off.

In [None]:
from parlai.scripts.display_model import DisplayModel
DisplayModel.main(
    task='empathetic_dialogues',
    model_file='from_pretrained/model',
    num_examples=2,
    skip_generation=False,
)

[ Using CUDA ]
Dictionary: loading dictionary from from_pretrained/model.dict
[ num words =  54944 ]
Total parameters: 87,508,992 (87,508,992 trainable)
[ Loading existing model params from from_pretrained/model ]
[creating task(s): empathetic_dialogues]
[EmpatheticDialoguesTeacher] Only use experiencer side? True, datatype: valid
- - - NEW EPISODE: empathetic_dialogues- - -
Today,as i was leaving for work in the morning,i had a tire burst in the middle of a busy road. That scared the hell out of me!
    labels: Are you fine now?
     model: oh no ! that ' s terrible ! did you get a new tire ?
Yeah,i'm doing alright now, but with minor injuries.
    labels: Cool :) Is your car damaged a lot?
     model: that ' s good . i hope you are okay .


On the command line:
```bash
python -m parlai.scripts.display_model --task empathetic_dialogues --model-file zoo:tutorial_transformer_generator/model
```

# Bringing your own datasets

What if you want to build your own dataset in ParlAI? Of course you can do that!

In [None]:
from parlai.core.teachers import register_teacher, DialogTeacher

@register_teacher("my_teacher")
class MyTeacher(DialogTeacher):
    def __init__(self, opt, shared=None):
        # opt is the command line arguments.
        
        # What is this shared thing?
        # We make many copies of a teacher, one per batch slot. `shared` lets the
        # copies reuse state (such as the loaded data) from the original teacher.
        
        # We just need to set the "datafile".  This is boilerplate, but differs in many teachers.
        # The "datafile" is the filename where we will load the data from. In this case, we'll set it to
        # the fold name (train/valid/test) + ".txt"
        opt['datafile'] = opt['datatype'].split(':')[0] + ".txt"
        super().__init__(opt, shared)
    
    def setup_data(self, datafile):
        # datafile tells us where to load from.
        # We'll just use some hardcoded data, but show how you could read the datafile here:
        print(f" ~~ Loading from {datafile} ~~ ")
        
        # setup_data should yield tuples of ((text, label), new_episode)
        # That is ((str, str), bool)
        
        # first episode
        # Notice how we have a call, a response, and then True? The True indicates this is the first message
        # in a conversation.
        yield ('Hello', 'Hi'), True
        # Next we have the second turn. This time, the last element is False, indicating we're still going
        yield ('How are you', 'I am fine'), False
        yield ("Let's say goodbye", 'Goodbye!'), False
        
        # second episode. We need to have True again!
        yield ("Hey", "hi there"), True
        yield ("Deja vu?", "Deja vu!"), False
        yield ("Last chance", "This is it"), False
        
        
DisplayData.main(task="my_teacher")

[creating task(s): my_teacher]
 ~~ Loading from train.txt ~~ 
- - - NEW EPISODE: my_teacher - - -
Hello
   Hi
How are you
   I am fine
Let's say goodbye
   Goodbye!
- - - NEW EPISODE: my_teacher - - -
Hey
   hi there
Deja vu?
   Deja vu!
Last chance
   This is it
EPOCH DONE
[ loaded 2 episodes with a total of 6 examples ]


Notice how the data corresponds to the utterances we provided? In reality, we'd normally want to load up a data file, loop through it, and yield the tuples from processed data. But for this simple example, it works well.
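As a sketch of that more realistic case, here is a generator that could serve as the body of `setup_data`. It assumes a hypothetical tab-separated format (one `text<TAB>label` pair per line, with a blank line between episodes); `read_episodes` is an illustrative name, so adapt the parsing to your actual data.

```python
import os
import tempfile


def read_episodes(datafile):
    """Yield ((text, label), new_episode) tuples from a hypothetical TSV file.

    Assumed format: each non-blank line is "text<TAB>label", and a blank
    line separates episodes.
    """
    new_episode = True
    with open(datafile, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                # Blank line: the next example starts a new conversation.
                new_episode = True
                continue
            text, label = line.split("\t", 1)
            yield (text, label), new_episode
            new_episode = False


# Self-contained demo reusing conversations from MyTeacher.
sample = "Hello\tHi\nHow are you\tI am fine\n\nHey\thi there\nDeja vu?\tDeja vu!\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    examples = list(read_episodes(path))

for example in examples:
    print(example)
```

Inside `MyTeacher`, `setup_data` could then just `yield from read_episodes(datafile)`, provided `datafile` points at a real file.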

We can now use our teacher in the standard places! Let's see how the model we trained earlier behaves with it:

In [None]:
DisplayModel.main(task='my_teacher', model_file='from_pretrained/model', skip_generation=False)

[ Using CUDA ]
Dictionary: loading dictionary from from_pretrained/model.dict
[ num words =  54944 ]
Total parameters: 87,508,992 (87,508,992 trainable)
[ Loading existing model params from from_pretrained/model ]
[creating task(s): my_teacher]
 ~~ Loading from valid.txt ~~ 
- - - NEW EPISODE: my_teacher- - -
Hello
    labels: Hi
     model: hi
How are you
    labels: I am fine
     model: i am good , how are you ?
Let's say goodbye
    labels: Goodbye!
     model: i am fine
- - - NEW EPISODE: my_teacher- - -
Hey
    labels: hi there
     model: hi
Deja vu?
    labels: Deja vu!
     model: i ' ve just been in this place before
Last chance
    labels: This is it
     model: i ' ve just been in this place before
EPOCH DONE


Note that the `register_teacher` decorator makes the commands aware of your teacher. If you leave it off, the commands won't be able to locate it. If you want to use your teacher on the command line, you'll need to put it in a very specific file: `parlai/tasks/my_teacher/agents.py`, and you'll need to name the class `DefaultTeacher` instead of `MyTeacher`.
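The idea behind a registration decorator like `register_teacher` can be sketched in a few lines: the decorator stores the class in a lookup table under a name, and a script can then resolve `task='my_teacher'` by looking that name up. This is a simplified illustration with hypothetical names (`TEACHER_REGISTRY`, `register`, `lookup`), not ParlAI's actual implementation.

```python
# Hypothetical registry mapping a task name to the class registered under it.
TEACHER_REGISTRY = {}


def register(name):
    """Return a class decorator that records the class under `name`."""
    def decorator(cls):
        TEACHER_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


def lookup(name):
    """Resolve a task name; unregistered names raise a KeyError."""
    try:
        return TEACHER_REGISTRY[name]
    except KeyError:
        raise KeyError(f"no teacher registered as {name!r}") from None


@register("toy_teacher")
class ToyTeacher:
    pass


print(lookup("toy_teacher").__name__)  # ToyTeacher
```

Leaving the decorator off means the class never enters the table, which is why the commands can't find an undecorated teacher.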

# Creating your own models

As a start, we'll implement a *very* simple agent. This agent will just respond with "Hello X, I'm Y", where X is the last word of the input and Y is the agent's name.

In [None]:
from parlai.core.agents import register_agent, Agent

@register_agent("hello")
class HelloAgent(Agent):
    @classmethod
    def add_cmdline_args(cls, parser, partial_opt=None):
        parser.add_argument('--name', type=str, default='Alice', help="The agent's name.")
        return parser
        
    def __init__(self, opt, shared=None):
        # similar to the teacher, we have the Opt and the shared memory objects!
        super().__init__(opt, shared)
        self.id = 'HelloAgent'
        self.name = opt['name']
    
    def observe(self, observation):
        # Gather the last word from the other user's input
        words = observation.get('text', '').split()
        if words:
            self.last_word = words[-1]
        else:
            self.last_word = "stranger!"
    
    def act(self):
        # act() returns a message dict with at least an 'id' and some 'text'.
        return {
            'id': self.id,
            'text': f"Hello {self.last_word}, I'm {self.name}",
        }
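
Before handing the agent to ParlAI's scripts, it helps to see the protocol they drive: each incoming message goes to `observe`, and then `act` produces the reply. The loop below is a simplified, ParlAI-free sketch of that exchange. `EchoNameAgent` and `run_episode` are hypothetical stand-ins that mirror `HelloAgent`'s logic; ParlAI's real worlds also handle batching, labels, and metrics.

```python
class EchoNameAgent:
    """Hypothetical stand-in mirroring HelloAgent's observe/act logic."""

    def __init__(self, name="Alice"):
        self.id = "EchoNameAgent"
        self.name = name
        self.last_word = "stranger!"

    def observe(self, observation):
        # Remember the last whitespace-separated word of the incoming text.
        words = observation.get("text", "").split()
        self.last_word = words[-1] if words else "stranger!"

    def act(self):
        # Reply with a message dict, like HelloAgent.act.
        return {"id": self.id, "text": f"Hello {self.last_word}, I'm {self.name}"}


def run_episode(agent, texts):
    """Simplified exchange: send each text to the agent and collect its replies."""
    replies = []
    for i, text in enumerate(texts):
        message = {"text": text, "episode_done": i == len(texts) - 1}
        agent.observe(message)
        replies.append(agent.act()["text"])
    return replies


for reply in run_episode(EchoNameAgent(), ["Hello", "How are you", "Let's say goodbye"]):
    print(reply)
```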

Let's try seeing how this agent behaves:

In [None]:
DisplayModel.main(task='my_teacher', model='hello')

[creating task(s): my_teacher]
 ~~ Loading from valid.txt ~~ 
- - - NEW EPISODE: my_teacher- - -
Hello
    labels: Hi
     model: Hello Hello, I'm Alice
How are you
    labels: I am fine
     model: Hello you, I'm Alice
Let's say goodbye
    labels: Goodbye!
     model: Hello goodbye, I'm Alice
- - - NEW EPISODE: my_teacher- - -
Hey
    labels: hi there
     model: Hello Hey, I'm Alice
Deja vu?
    labels: Deja vu!
     model: Hello vu?, I'm Alice
Last chance
    labels: This is it
     model: Hello chance, I'm Alice
EPOCH DONE


Notice how it reads the last word of the user's message and takes its name from the command-line argument? We can also interact with it easily enough.

In [None]:
Interactive.main(model='hello', name='Bob')

[ optional arguments: ] 
[  display_examples: False ]
[  display_ignore_fields: label_candidates,text_candidates ]
[  display_prettify: False ]
[  interactive_task: True ]
[  name: Bob ]
[ Main ParlAI Arguments: ] 
[  batchsize: 1 ]
[  datapath: /usr/local/lib/python3.6/dist-packages/data ]
[  datatype: train ]
[  download_path: /usr/local/lib/python3.6/dist-packages/downloads ]
[  dynamic_batching: None ]
[  hide_labels: False ]
[  image_mode: raw ]
[  init_opt: None ]
[  multitask_weights: [1] ]
[  numthreads: 1 ]
[  show_advanced_args: False ]
[  task: interactive ]
[ ParlAI Model Arguments: ] 
[  dict_class: None ]
[  init_model: None ]
[  model: hello ]
[  model_file: None ]
[ Local Human Arguments: ] 
[  local_human_candidates_file: None ]
[  single_turn: False ]
[ ParlAI Image Preprocessing Arguments: ] 
[  image_cropsize: 224 ]
[  image_size: 256 ]
Enter [DONE] if you want to end the episode, [EXIT] to quit.
[creating task(s): interactive]
Enter Your Message:

Similar to the teacher, the call to `register_agent` makes it available for use in commands. If you forget the `register_agent` decorator, you won't be able to refer to it. Similarly, if you wanted to use this model from the command line, you would need to save this code to a special folder: `parlai/agents/hello/hello.py`.

## Creating a neural network model

The base `Agent` class is very simple, but it also provides extremely little functionality. We have created solid abstractions for building neural-network models. [`TorchGeneratorAgent`](https://parl.ai/docs/torch_agent.html#module-parlai.core.torch_generator_agent) is one of our common abstractions, and it assumes a model that generates its output one token at a time.
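To make "one token at a time" concrete, here is a toy greedy decoding loop in plain Python. `toy_decoder_step` is hypothetical: it stands in for a decoder's `forward(input, encoder_state, incr_state)`, returning scores for the next token plus an updated incremental state so earlier steps need not be recomputed. `TorchGeneratorAgent` runs this kind of loop (and beam search) for you; details such as exactly which tokens are fed back at each step may differ from this sketch.

```python
START, END = "__start__", "__end__"

# Hypothetical toy "model": next-token scores keyed on the previous token only.
NEXT_TOKEN_SCORES = {
    START: {"hi": 0.9, "hello": 0.1},
    "hi": {"there": 0.6, END: 0.4},
    "there": {END: 1.0},
}


def toy_decoder_step(prev_token, encoder_state, incr_state):
    """Stand-in for one decoder forward pass.

    Returns (next-token scores, new incremental state). The toy ignores
    encoder_state, and its incremental state is just a step counter; a real
    LSTM decoder would carry its (h, c) tensors here instead.
    """
    step = 0 if incr_state is None else incr_state
    scores = NEXT_TOKEN_SCORES.get(prev_token, {END: 1.0})
    return scores, step + 1


def greedy_decode(encoder_state, max_len=10):
    """Generate tokens one at a time, always taking the highest-scoring one."""
    tokens = [START]
    incr_state = None  # None signals the first step, as in the example Decoder
    for _ in range(max_len):
        scores, incr_state = toy_decoder_step(tokens[-1], encoder_state, incr_state)
        best = max(scores, key=scores.get)
        if best == END:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start token


print(greedy_decode(encoder_state=None))  # ['hi', 'there']
```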

The following is from our [ExampleSeq2Seq](https://github.com/facebookresearch/ParlAI/blob/master/parlai/agents/examples/seq2seq.py) agent. It's a simple RNN model, trained like a machine translation model. The model is too complex to go over in this document, but please feel free to [read our TorchGeneratorAgent tutorial](https://parl.ai/docs/tutorial_torch_generator_agent.html).
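One detail worth knowing before reading `reorder_encoder_states`: PyTorch's `nn.LSTM` returns its hidden and cell states shaped `(num_layers, batch, hidden)` even when `batch_first=True`, which is why the model code indexes with `h[:, indices, :]`. The sketch below mimics that selection on nested lists (layer, then batch, then hidden) so it runs without PyTorch; `reorder_lstm_state` is a hypothetical helper. Repeated indices duplicate a batch entry, as happens when beam search expands one context into several hypotheses.

```python
def reorder_lstm_state(state, indices):
    """Select (possibly repeated) batch entries from a [layer][batch][hidden] state.

    Mirrors h[:, indices, :] on a (num_layers, batch, hidden) tensor.
    """
    return [[layer[i] for i in indices] for layer in state]


# Placeholder hidden state: 2 layers, batch of 2, hidden size 3.
h = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # layer 0: batch items 0 and 1
    [[1.1, 1.2, 1.3], [1.4, 1.5, 1.6]],  # layer 1: batch items 0 and 1
]

# Expand each batch item into beam_size copies.
beam_size = 2
indices = [b for b in range(len(h[0])) for _ in range(beam_size)]  # [0, 0, 1, 1]
expanded = reorder_lstm_state(h, indices)
print(len(expanded), len(expanded[0]))  # 2 4  (layers, new batch size)
```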

In [None]:
import torch.nn as nn
import torch.nn.functional as F
import parlai.core.torch_generator_agent as tga


class Encoder(nn.Module):
    """
    Example encoder, consisting of an embedding layer and a 1-layer LSTM with the
    specified hidden size.
    Pay particular attention to the ``forward`` output.
    """

    def __init__(self, embeddings, hidden_size):
        """
        Initialization.
        Arguments here can be used to provide hyperparameters.
        """
        # must call super on all nn.Modules.
        super().__init__()

        self.embeddings = embeddings
        self.lstm = nn.LSTM(
            input_size=hidden_size,
            hidden_size=hidden_size,
            num_layers=1,
            batch_first=True,
        )

    def forward(self, input_tokens):
        """
        Perform the forward pass for the encoder.
        Input *must* be input_tokens, which are the context tokens given
        as a matrix of lookup IDs.
        :param input_tokens:
            Input tokens as a bsz x seqlen LongTensor.
            Likely will contain padding.
        :return:
            You can return anything you like; it will be passed verbatim
            into the decoder for conditioning. However, it should be something
            you can easily manipulate in ``reorder_encoder_states``.
            This particular implementation returns the hidden and cell states from the
            LSTM.
        """
        embedded = self.embeddings(input_tokens)
        _output, hidden = self.lstm(embedded)
        return hidden


class Decoder(nn.Module):
    """
    Basic example decoder, consisting of an embedding layer and a 1-layer LSTM with the
    specified hidden size. Decoder allows for incremental decoding by ingesting the
    current incremental state on each forward pass.
    Pay particular note to the ``forward``.
    """

    def __init__(self, embeddings, hidden_size):
        """
        Initialization.
        Arguments here can be used to provide hyperparameters.
        """
        super().__init__()
        self.embeddings = embeddings
        self.lstm = nn.LSTM(
            input_size=hidden_size,
            hidden_size=hidden_size,
            num_layers=1,
            batch_first=True,
        )

    def forward(self, input, encoder_state, incr_state=None):
        """
        Run forward pass.
        :param input:
            The currently generated tokens from the decoder.
        :param encoder_state:
            The output from the encoder module.
        :param incr_state:
            The previous hidden state of the decoder.
        """
        embedded = self.embeddings(input)
        if incr_state is None:
            # this is our very first call. We want to seed the LSTM with the
            # hidden state of the encoder
            state = encoder_state
        else:
            # We've generated some tokens already, so we can reuse the existing
            # decoder state
            state = incr_state

        # get the new output and decoder incremental state
        output, incr_state = self.lstm(embedded, state)

        return output, incr_state


class ExampleModel(tga.TorchGeneratorModel):
    """
    ExampleModel implements the abstract methods of TorchGeneratorModel to define how to
    re-order encoder states and decoder incremental states.
    It also instantiates the embedding table, encoder, and decoder, and defines the
    final output layer.
    """

    def __init__(self, dictionary, hidden_size=1024):
        super().__init__(
            padding_idx=dictionary[dictionary.null_token],
            start_idx=dictionary[dictionary.start_token],
            end_idx=dictionary[dictionary.end_token],
            unknown_idx=dictionary[dictionary.unk_token],
        )
        self.embeddings = nn.Embedding(len(dictionary), hidden_size)
        self.encoder = Encoder(self.embeddings, hidden_size)
        self.decoder = Decoder(self.embeddings, hidden_size)

    def output(self, decoder_output):
        """
        Perform the final output -> logits transformation.
        """
        return F.linear(decoder_output, self.embeddings.weight)

    def reorder_encoder_states(self, encoder_states, indices):
        """
        Reorder the encoder states to select only the given batch indices.
        Since encoder_state can be arbitrary, you must implement this yourself.
        Typically you will just want to index select on the batch dimension.
        """
        h, c = encoder_states
        return h[:, indices, :], c[:, indices, :]

    def reorder_decoder_incremental_state(self, incr_state, indices):
        """
        Reorder the decoder states to select only the given batch indices.
        This method can be a stub which always returns None; this will result in the
        decoder doing a complete forward pass for every single token, making generation
        O(n^2). However, if any state can be cached, then this method should be
        implemented to reduce the generation complexity to O(n).
        """
        h, c = incr_state
        return h[:, indices, :], c[:, indices, :]


@register_agent("my_first_lstm")
class Seq2seqAgent(tga.TorchGeneratorAgent):
    """
    Example agent.
    Implements the interface for TorchGeneratorAgent. The minimum requirement is that it
    implements ``build_model``, but we will want to include additional command line
    parameters.
    """

    @classmethod
    def add_cmdline_args(cls, argparser, partial_opt=None):
        """
        Add CLI arguments.
        """
        # Make sure to add all of TorchGeneratorAgent's arguments, forwarding
        # partial_opt so parent classes can see the options parsed so far
        super().add_cmdline_args(argparser, partial_opt=partial_opt)

        # Add custom arguments only for this model.
        group = argparser.add_argument_group('Example TGA Agent')
        group.add_argument(
            '-hid', '--hidden-size', type=int, default=1024, help='Hidden size.'
        )
        return argparser

    def build_model(self):
        """
        Construct the model.
        """

        model = ExampleModel(self.dict, self.opt['hidden_size'])
        # Optionally initialize pre-trained embeddings by copying them from another
        # source: GloVe, fastText, etc.
        self._copy_embeddings(model.embeddings.weight, self.opt['embedding_type'])
        return model
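The docstring of `reorder_decoder_incremental_state` says that caching the decoder state turns generation from O(n^2) into O(n), and both reorder methods index `h[:, indices, :]` rather than `h[indices]`. The sketch below illustrates both points in plain Python so it runs without a GPU. It is not ParlAI code: `rnn_step` is a hypothetical toy recurrence standing in for one LSTM step, states are integers or nested lists instead of tensors, and `run_decoder` and `reorder_hidden` are illustrative names.

```python
from typing import List, Optional, Tuple


def rnn_step(state: int, token: int) -> int:
    """Hypothetical toy recurrence standing in for one LSTM step.

    The real decoder carries (h, c) tensors; an int keeps the sketch runnable.
    """
    return (state * 31 + token) % 1_000_003


def run_decoder(encoder_state: int, tokens: List[int], use_cache: bool) -> Tuple[int, int]:
    """Feed `tokens` one generation step at a time.

    Returns (final_state, total_token_steps) so the two strategies can be compared.
    """
    incr_state: Optional[int] = None
    final_state = encoder_state
    total_steps = 0
    for t in range(1, len(tokens) + 1):
        if use_cache:
            # Mirrors Decoder.forward: seed from the encoder on the first call,
            # afterwards continue from the cached state and feed only the newest token.
            state = encoder_state if incr_state is None else incr_state
            new_inputs = tokens[t - 1 : t]
        else:
            # No cache: restart from the encoder state and replay the whole prefix.
            state = encoder_state
            new_inputs = tokens[:t]
        for tok in new_inputs:
            state = rnn_step(state, tok)
            total_steps += 1
        incr_state = state
        final_state = state
    return final_state, total_steps


def reorder_hidden(h: List[List[List[float]]], indices: List[int]) -> List[List[List[float]]]:
    """Select batch rows from a (num_layers, bsz, hidden) nested list.

    nn.LSTM returns hidden and cell states with the batch on dimension 1 even
    when batch_first=True, hence h[:, indices, :] in the tutorial code.
    """
    return [[layer[i] for i in indices] for layer in h]


cached = run_decoder(7, [5, 3, 8, 1], use_cache=True)
uncached = run_decoder(7, [5, 3, 8, 1], use_cache=False)
print(cached[0] == uncached[0], cached[1], uncached[1])  # True 4 10

h = [[[0.0], [1.0], [2.0]]]  # 1 layer, batch of 3, hidden size 1
print(reorder_hidden(h, [2, 0]))  # [[[2.0], [0.0]]]
```

The cached run touches each token once (4 steps), while the uncached run replays the prefix every time (1 + 2 + 3 + 4 = 10 steps), yet both end in the same state. The nested-list layout follows PyTorch's documented behavior that `batch_first=True` applies to the input and output tensors but not to the hidden and cell states.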

Now we can train our new model. Let's train it on the toy task we created earlier.

In [None]:
# Of course, we can train the model! Let's train it on our silly toy task from above
!rm -rf my_first_lstm
!mkdir -p my_first_lstm

TrainModel.main(
    model='my_first_lstm',
    model_file='my_first_lstm/model',
    task='my_teacher',
    batchsize=1,
    validation_every_n_secs=10,
    max_train_time=60,
)

Building dictionary: 100%|██████████| 6.00/6.00 [00:00<00:00, 1.91kex/s]

[ building dictionary first... ]
[creating task(s): my_teacher]
 ~~ Loading from train.txt ~~ 
 ~~ Loading from train.txt ~~ 
Dictionary: saving dictionary to my_first_lstm/model.dict
[ dictionary built with 30 tokens in 0s ]
[ no model with opt yet at: my_first_lstm/model(.opt) ]
[ Using CUDA ]
Dictionary: loading dictionary from my_first_lstm/model.dict
[ num words =  30 ]
Total parameters: 16,824,320 (16,824,320 trainable)
[creating task(s): my_teacher]
 ~~ Loading from train.txt ~~ 
[ training... ]





[ time:10.0s total_exs:1828 epochs:304.67 ]
     clip  exs  gnorm  gpu_mem   loss  lr   ppl  token_acc  total_train_updates   tpb  updates
   .01641 1828  1.368    .9171 .04105   1 1.042      .9942                 1828 3.328     1828

[ time:10.0s total_exs:1828 epochs:304.67 ]
    gpu_mem  lr  total_train_updates
      .9171   1                 1828

[creating task(s): my_teacher]
 ~~ Loading from valid.txt ~~ 
[ running eval: valid ]
[ eval completed in 0.06s ]
valid:
    accuracy   bleu-4  exs  f1  gpu_mem  loss  lr  ppl  token_acc  total_train_updates   tpb
           1 .0003337    6   1    .9171     0   1    1          1                 1828 3.333

[ new best accuracy: 1 ]
[ saving best valid model: my_first_lstm/model ]
[ task solved! stopping. ]
[ Using CUDA ]
Dictionary: loading dictionary from my_first_lstm/model.dict
[ num words =  30 ]
Total parameters: 16,824,320 (16,824,320 trainable)
[ Loading existing model params from my_first_lstm/model ]
[creating task(s): my_teacher]
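The `Total parameters: 16,824,320` line in the log can be checked by hand. The sketch below assumes the standard `torch.nn.LSTM` parameterization (per layer: `weight_ih` and `weight_hh` with 4 * hidden rows, plus two bias vectors of length 4 * hidden), the 30-token dictionary reported in the log, and the default `--hidden-size` of 1024. The helper names are illustrative. Because `output` reuses `self.embeddings.weight` through `F.linear`, the output projection adds no parameters of its own.

```python
def lstm_params(input_size: int, hidden_size: int) -> int:
    """Parameter count of a 1-layer torch.nn.LSTM with default biases.

    weight_ih: (4*hidden, input), weight_hh: (4*hidden, hidden),
    bias_ih and bias_hh: (4*hidden,) each.
    """
    return 4 * hidden_size * (input_size + hidden_size) + 2 * 4 * hidden_size


def example_model_params(vocab_size: int, hidden_size: int) -> int:
    """Count ExampleModel's parameters under the assumptions stated above."""
    embeddings = vocab_size * hidden_size  # one table shared by encoder, decoder, and output
    encoder = lstm_params(hidden_size, hidden_size)  # embeddings feed the LSTM directly
    decoder = lstm_params(hidden_size, hidden_size)
    return embeddings + encoder + decoder


print(f"{example_model_params(30, 1024):,}")  # 16,824,320
```

Almost all of the 16.8M parameters sit in the two LSTMs (about 8.4M each); the embedding table contributes only 30 * 1024 = 30,720.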

Let's see how it does. It should reproduce the data perfectly:

In [None]:
DisplayModel.main(model_file='my_first_lstm/model', task='my_teacher')

[ Using CUDA ]
Dictionary: loading dictionary from my_first_lstm/model.dict
[ num words =  30 ]
Total parameters: 16,824,320 (16,824,320 trainable)
[ Loading existing model params from my_first_lstm/model ]
[creating task(s): my_teacher]
 ~~ Loading from valid.txt ~~ 
[1;31m- - - NEW EPISODE: my_teacher- - -[0;0m
[0mHello[0;0m
[1;94m    labels: Hi[0;0m
[0;95m     model: Hi[0;0m
[0mHow are you[0;0m
[1;94m    labels: I am fine[0;0m
[0;95m     model: I am fine[0;0m
[0mLet's say goodbye[0;0m
[1;94m    labels: Goodbye![0;0m
[0;95m     model: Goodbye ![0;0m
[1;31m- - - NEW EPISODE: my_teacher- - -[0;0m
[0mHey[0;0m
[1;94m    labels: hi there[0;0m
[0;95m     model: hi there[0;0m
[0mDeja vu?[0;0m
[1;94m    labels: Deja vu![0;0m
[0;95m     model: Deja vu ![0;0m
[0mLast chance[0;0m
[1;94m    labels: This is it[0;0m
[0;95m     model: This is it[0;0m
EPOCH DONE


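Unsurprisingly, we got perfect accuracy. The dataset is only six short exchanges, and a 16.8M-parameter LSTM can easily memorize them. (The extra space in outputs such as `Goodbye !` comes from joining the generated tokens with spaces; it is not a mistake.) Nonetheless, a great success!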
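Validation accuracy is 1 even though the model printed `Goodbye !` for the label `Goodbye!`, because exact-match accuracy compares strings after normalizing them. The function below is a SQuAD-style sketch of such a normalization (lowercase, replace punctuation with spaces, drop the articles a/an/the, collapse whitespace). It approximates, but is not copied from, ParlAI's metric code, so the exact rules there may differ; `normalize_answer` and `exact_match` here are illustrative names.

```python
import re
import string

# Articles removed before comparison (assumption: SQuAD-style rules).
_ARTICLES = re.compile(r"\b(a|an|the)\b")


def normalize_answer(text: str) -> str:
    """Lowercase, replace punctuation with spaces, drop articles, collapse whitespace."""
    text = text.lower()
    text = "".join(" " if ch in string.punctuation else ch for ch in text)
    text = _ARTICLES.sub(" ", text)
    return " ".join(text.split())


def exact_match(prediction: str, label: str) -> bool:
    """True when prediction and label agree after normalization."""
    return normalize_answer(prediction) == normalize_answer(label)


pairs = [("Goodbye !", "Goodbye!"), ("Deja vu !", "Deja vu!"), ("hi there", "Hi")]
for pred, label in pairs:
    print(f"{pred!r} vs {label!r}: {exact_match(pred, label)}")
```

The first two pairs match once punctuation is stripped, while a genuinely different answer such as `hi there` versus `Hi` still counts as wrong.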

# What's next!

The sky's the limit! Be sure to check out our [GitHub](https://github.com/facebookresearch/ParlAI) and [follow ParlAI on Twitter](https://twitter.com/parlai_parley). We're eager to hear what you're using ParlAI for!

Here are some other great resources:
- [Our research page](https://parl.ai/projects/)
- [ParlAI Documentation](https://parl.ai/docs/index.html)
- [Tutorial: Writing a Ranker model](https://parl.ai/docs/tutorial_torch_ranker_agent.html)
- [Tutorial: Using Mechanical Turk](https://parl.ai/docs/tutorial_mturk.html)
- [Tutorial: Connecting to chat services](https://parl.ai/docs/tutorial_chat_service.html)