<a href="https://colab.research.google.com/github/prime2911/handsome-jack-discord-chatbot/blob/main/model_train_upload_workflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tune a DialoGPT model

Adapted from the notebook in [this Medium post](https://towardsdatascience.com/make-your-own-rick-sanchez-bot-with-transformers-and-dialogpt-fine-tuning-f85e6d1f4e30?gi=e4a72d1510f0).

## Setup

In [3]:
from google.colab import drive
drive.mount("/content/drive/")

Mounted at /content/drive/


In [4]:
!pip -q install transformers


In [5]:
import os
os.chdir("/content/drive/My Drive/Colab Notebooks/Handsome Jack Discord Chatbot")

In [6]:
# all the imports

import glob
import logging
import os
import pickle
import random
import re
import shutil
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd
import torch  # used by ConversationDataset and set_seed, defined before the model cell

from sklearn.model_selection import train_test_split

from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler
from tqdm.notebook import tqdm, trange

from pathlib import Path

from transformers import (
    MODEL_WITH_LM_HEAD_MAPPING,
    WEIGHTS_NAME,
    AdamW,
    AutoConfig,
    PreTrainedModel,
    PreTrainedTokenizer,
    get_linear_schedule_with_warmup,
)


try:
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    from tensorboardX import SummaryWriter

In [7]:
!ls

cached		    handsome_jack.txt		       parse_scripts.ipynb
handsome_bot.ipynb  model_train_upload_workflow.ipynb  runs
handsome_jack.csv   output-small


## Get Data from Kaggle

In [None]:
# !mkdir ~/.kaggle
# !cp kaggle.json ~/.kaggle/kaggle.json

In [None]:
# !kaggle datasets download ruolinzheng/twewy-game-script -f twewy-name-line-full.csv

In [8]:
data = pd.read_csv("handsome_jack.csv")

In [9]:
data.sample(6)

Unnamed: 0,name,line
469,Angel,I've been following Jack's orders from the be...
1278,Brick,What the - are those Vaults?
714,Brick,Feel the wrath of the Slab King!
834,Angel,That's one of the body doubles Jack uses to c...
733,Brick,"Watch out for the mortars, Slab!"
1086,Brick,Got some Slabs headin' in for pickup!


In [10]:
CHARACTER_NAME = "Handsome Jack"

In [11]:
contexted = []

# context window of size 7
n = 7

for i in data[data.name == CHARACTER_NAME].index:
  if i < n:
    continue
  row = []
  prev = i - 1 - n  # subtract an extra 1 so the row holds the current response plus the n previous lines
  for j in range(i, prev, -1):
    row.append(data.line[j])
  contexted.append(row)

columns = ['response', 'context'] 
columns = columns + ['context/' + str(i) for i in range(n - 1)]

df = pd.DataFrame.from_records(contexted, columns=columns)
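
The windowing in the cell above can be checked on a toy script. This is a minimal sketch that uses a plain list instead of a DataFrame; the speakers, lines, and the helper name `build_rows` are made up for illustration and are not part of the notebook.

```python
# Toy script as (speaker, line) pairs. All names and lines are hypothetical.
script = [
    ("A", "line 0"),
    ("B", "line 1"),
    ("A", "line 2"),
    ("Hero", "line 3"),
    ("B", "line 4"),
    ("Hero", "line 5"),
]


def build_rows(script, character, n):
    """Return one row per line spoken by `character`:
    the response followed by the n previous lines, newest first."""
    rows = []
    for i, (speaker, _) in enumerate(script):
        if speaker != character or i < n:
            continue  # skip other speakers and lines without n predecessors
        # Same index arithmetic as the notebook: i, i-1, ..., i-n (n + 1 items).
        rows.append([script[j][1] for j in range(i, i - n - 1, -1)])
    return rows


rows = build_rows(script, "Hero", n=2)
for row in rows:
    print(row)
```

Each row has `n + 1` entries, which is why the notebook's `n = 7` produces eight columns: `response`, `context`, and `context/0` through `context/5`.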

In [12]:
df

Unnamed: 0,response,context,context/0,context/1,context/2,context/3,context/4,context/5
0,"Hey, kiddo. Jack here – President of Hyperion...","Many thanks, friend of friends! Onward!",…you still there? I don’t hear Bullymongs any...,Hey! What’s that noise? Are you fighting?,You’ll need that funny little robot’s help to...,Ugh – AGAIN?! Jack’s tearing Pandora apart to...,Ooh!,"Let’s go! If we don’t get my eye back, we’ll ..."
1,"Attention, people of Pandora! Handsome Jack h...","Keep your wits about you, minion -- this glac...",This way!,"This way to Southern Shelf, minion -- let's g...",Lemme know when you’re ready to go meet with ...,It’s a long way to Sanctuary -- please take w...,You’re welcome! Perks of being an Artificial ...,Yes! I knew I'd get it eventually.
2,"""Hey, everybody! How are ya? Jack here!""",You've discovered one of Helena Pierce's audi...,"""We've hijacked the train that runs through T...","Come on -- work, curse you! Ah, fecal matter ...",Ah – there you are.,I see our fearless leader Jack is looking for...,"Minion, roll out!","Now that Liar’s Berg is clear, I might as wel..."
3,"""I'm sorry, what was your name?""","""What is the meaning of this?""","""NOBODY MOVE.""","""Hey, everybody! How are ya? Jack here!""",You've discovered one of Helena Pierce's audi...,"""We've hijacked the train that runs through T...","Come on -- work, curse you! Ah, fecal matter ...",Ah – there you are.
4,"""Well, Ms. Pierce - and please don't tell me ...","""Pierce.""","""I'm sorry, what was your name?""","""What is the meaning of this?""","""NOBODY MOVE.""","""Hey, everybody! How are ya? Jack here!""",You've discovered one of Helena Pierce's audi...,"""We've hijacked the train that runs through T..."
...,...,...,...,...,...,...,...,...
166,This can't be happening... THIS CAN'T BE HAPP...,NO!!!,KILL.,WARRIOR!,The greatest alien power Pandora has ever see...,...I WIN!,"You're too late, bandit...",Unghh!
167,"No, no, no... I can't die like this... not wh...",Holy badass - I think you killed it. Never hu...,This can't be happening... THIS CAN'T BE HAPP...,NO!!!,KILL.,WARRIOR!,The greatest alien power Pandora has ever see...,...I WIN!
168,The Warrior was practically a god. How - HOW ...,"No, no, no... I can't die like this... not wh...",Holy badass - I think you killed it. Never hu...,This can't be happening... THIS CAN'T BE HAPP...,NO!!!,KILL.,WARRIOR!,The greatest alien power Pandora has ever see...
169,You idiots! The Warrior could have brought pe...,The Warrior was practically a god. How - HOW ...,"No, no, no... I can't die like this... not wh...",Holy badass - I think you killed it. Never hu...,This can't be happening... THIS CAN'T BE HAPP...,NO!!!,KILL.,WARRIOR!


In [13]:
df.sample(6)

Unnamed: 0,response,context,context/0,context/1,context/2,context/3,context/4,context/5
108,"So this is how you bandits fight, is that it,...",Right -- I'm on it!,"I meant NOW, Roland!","Uh... did I miss something, or is Angel a SIR...",Roland! I need you to lower the shields aroun...,Hey! Up here! Need a hand?,I'm sending ammo to you!,Grab this ammo!
115,'Sup.,ROLAND!,"We've got the Vault Key, but this isn't over ...",She's dead. Jack just lost his only way to aw...,The kind of guy who deserves to die.,What kind of person would do this to their ow...,"Angel?! NO, ANGEL!","Dad, I have to tell you something... You're a..."
13,Butt Stallion says hello.,I should probably clarify -- the diamond hors...,Aaaaaand open!,I’m rackin’ my brain trying to think of a nam...,"Minion, what have you DONE?! These were human...","Hey! How -- ah, these pretzels suck… So, how’...",They’re called that because they rip people’s...,Be careful taking down Boom Bewm. He’s one of...
89,"Oh, come ON! What's wrong with that statue?! ...",Initializing laser cutter.,I can actually see why you'd wanna tear that ...,Cutting operation complete.,"Hey, you know what book I'm reading there? It...",Initializing laser cutter.,Does that feel good? You got it out of your s...,Cutting operation complete.
91,"What is this even ACCOMPLISHING?! Are you, ar...",Initializing laser cutter.,"Oh, for the LOVE of... ALRIGHT! Great! Succes...",Cutting operation complete.,"Oh, come ON! What's wrong with that statue?! ...",Initializing laser cutter.,I can actually see why you'd wanna tear that ...,Cutting operation complete.
87,"Hey, you know what book I'm reading there? It...",Initializing laser cutter.,Does that feel good? You got it out of your s...,Cutting operation complete.,Nooo! The bot's been destroyed! You failed!,Repair the constructor!,OH GOD! The bot's only got a quarter of its h...,"They damaged the constructor, minion!"


In [14]:
trn_df, val_df = train_test_split(df, test_size=0.1)
trn_df.head()

Unnamed: 0,response,context,context/0,context/1,context/2,context/3,context/4,context/5
24,Bandits of Sanctuary: I hear a new Vault Hunt...,"Well, let's head to the center a' town and pl...","Now, you gon' help us out with this Roland si...","Wait a minute! Ha! Well, hang me upside down ...",Scooter: Catch-A-Ride!],Sanctuary. Built on the ruins of the Dahl cor...,"Crap. I mean, uh... darn. Roland needs your h...",Commander Roland never came back from his sec...
31,"(Oh, I'm sorry) 'Condescend' is a word that m...","Be honest with yourself, kid. Do you really t...",Roland is being taken to a Hyperion outpost i...,"God dammit, he's getting away! They'll want t...",The drop-barge is coming in! Kill the constru...,Nailed it!,Good kill!,Nice!
86,Does that feel good? You got it out of your s...,Cutting operation complete.,Nooo! The bot's been destroyed! You failed!,Repair the constructor!,OH GOD! The bot's only got a quarter of its h...,"They damaged the constructor, minion!",The bot is half dead!,They're attacking the constructor!
72,THAT is why you don't screw with me. You and ...,NOOOOOOO!,"Oh, now I remember! EXPLOSIIIIIIVE!",Okay. She's still alive. Get the microchip fr...,"I'm loadin' the tranq dart! Tranquila, Blood ...","Corrosion... yeah, I remember that one... Com...",Corrosion!,"She's charged with electricity, watch out!"
143,"Why isn't this working, Angel?",Drinks are on Handsome Jack!,"Thanks Jimmy -- rnrgh! Well, then.","It's Jeffrey, sir. And no.",Tut-tut-tut-tut-tut... Shhhhh. There we go. A...,"Ghkk... please, please...","No, no, Jimmy, choking is something you do wh...",Choked... Mister Moorin...


In [15]:
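# create dataset suitable for our model
def construct_conv(row, tokenizer, eos = True):
    # Note: `eos` is unused; an EOS token is always appended after every utterance.
    flatten = lambda l: [item for sublist in l for item in sublist]
    # Each row is [response, newest context, ..., oldest context]. Reversing it makes the
    # token sequence run from the oldest context to the response.
    conv = list(reversed([tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]))
    conv = flatten(conv)
    return conv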

class ConversationDataset(Dataset):
    def __init__(self, tokenizer: PreTrainedTokenizer, args, df, block_size=512):

        block_size = block_size - (tokenizer.model_max_length - tokenizer.max_len_single_sentence)

        directory = args.cache_dir
        cached_features_file = os.path.join(
            directory, args.model_type + "_cached_lm_" + str(block_size)
        )

        if os.path.exists(cached_features_file) and not args.overwrite_cache:
            logger.info("Loading features from cached file %s", cached_features_file)
            with open(cached_features_file, "rb") as handle:
                self.examples = pickle.load(handle)
        else:
            logger.info("Creating features from dataset file at %s", directory)

            self.examples = []
            for _, row in df.iterrows():
                conv = construct_conv(row, tokenizer)
                self.examples.append(conv)

            logger.info("Saving features into cached file %s", cached_features_file)
            with open(cached_features_file, "wb") as handle:
                pickle.dump(self.examples, handle, protocol=pickle.HIGHEST_PROTOCOL)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, item):
        return torch.tensor(self.examples[item], dtype=torch.long)
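
To see what `construct_conv` produces without downloading a model, the sketch below swaps in a toy tokenizer. `ToyTokenizer` is a hypothetical stand-in (one id per word, EOS id 0), not the DialoGPT tokenizer, so the ids are only illustrative.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: one id per whitespace-separated word."""

    eos_token_id = 0

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        # Assign ids starting at 1, in order of first appearance.
        return [self.vocab.setdefault(word, len(self.vocab) + 1) for word in text.split()]


def construct_conv(row, tokenizer):
    # Encode each utterance and append EOS, then reverse so the oldest
    # context comes first and the response comes last, and flatten.
    encoded = [tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]
    return [token for utterance in reversed(encoded) for token in utterance]


row = ["fine thanks", "how are you", "hello"]  # response first, oldest context last
tokens = construct_conv(row, ToyTokenizer())
print(tokens)  # [6, 0, 3, 4, 5, 0, 1, 2, 0]
```

The response ends up at the tail of the sequence, separated from its context by EOS tokens, which is the layout a causal language model needs to learn "context, then reply".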

In [16]:
# Caching and storing of data/checkpoints

def load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False):
    return ConversationDataset(tokenizer, args, df_val if evaluate else df_trn)


def set_seed(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.n_gpu > 0:
        torch.cuda.manual_seed_all(args.seed)


def _sorted_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> List[str]:
    ordering_and_checkpoint_path = []

    glob_checkpoints = glob.glob(os.path.join(args.output_dir, "{}-*".format(checkpoint_prefix)))

    for path in glob_checkpoints:
        if use_mtime:
            ordering_and_checkpoint_path.append((os.path.getmtime(path), path))
        else:
            regex_match = re.match(".*{}-([0-9]+)".format(checkpoint_prefix), path)
            if regex_match and regex_match.groups():
                ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))

    checkpoints_sorted = sorted(ordering_and_checkpoint_path)
    checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
    return checkpoints_sorted


def _rotate_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> None:
    if not args.save_total_limit:
        return
    if args.save_total_limit <= 0:
        return

    # Check if we should delete older checkpoint(s)
    checkpoints_sorted = _sorted_checkpoints(args, checkpoint_prefix, use_mtime)
    if len(checkpoints_sorted) <= args.save_total_limit:
        return

    number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - args.save_total_limit)
    checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
    for checkpoint in checkpoints_to_be_deleted:
        logger.info("Deleting older checkpoint [{}] due to args.save_total_limit".format(checkpoint))
        shutil.rmtree(checkpoint)
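
The checkpoint helpers above sort by the step number embedded in the directory name rather than by the raw string, which matters once step counts differ in length. A minimal, filesystem-free sketch of the same ordering and rotation logic follows; the paths and the helper names `sort_checkpoints` and `to_delete` are hypothetical.

```python
import re


def sort_checkpoints(paths, prefix="checkpoint"):
    """Order checkpoint paths by the integer step in their name, oldest first."""
    numbered = []
    for path in paths:
        match = re.match(r".*{}-([0-9]+)".format(prefix), path)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def to_delete(paths, save_total_limit):
    """Return the oldest checkpoints beyond the limit (none if the limit is unset)."""
    if not save_total_limit or save_total_limit <= 0:
        return []
    ordered = sort_checkpoints(paths)
    return ordered[: max(0, len(ordered) - save_total_limit)]


# Hypothetical checkpoint directories.
paths = [
    "output-small/checkpoint-10500",
    "output-small/checkpoint-3500",
    "output-small/checkpoint-7000",
]
print(sort_checkpoints(paths))
print(to_delete(paths, save_total_limit=2))
```

A plain `sorted(paths)` would place `checkpoint-10500` first, so the newest checkpoint would be the first one deleted. With `save_total_limit = None`, as in this notebook's `Args`, nothing is deleted.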

## Build Model

In [17]:
from transformers import AutoModelWithLMHead, AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-small")


In [18]:
"""
Fine-tuning the library models for language modeling on a text file (GPT, GPT-2, BERT, RoBERTa).
GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa are fine-tuned
using a masked language modeling (MLM) loss.
"""

# Configs
logger = logging.getLogger(__name__)

MODEL_CONFIG_CLASSES = list(MODEL_WITH_LM_HEAD_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

In [19]:
# Args to allow for easy conversion of python script to notebook
class Args():
    def __init__(self):
        self.output_dir = 'output-small'
        self.model_type = 'gpt2'
        self.model_name_or_path = 'microsoft/DialoGPT-small'
        self.config_name = 'microsoft/DialoGPT-small'
        self.tokenizer_name = 'microsoft/DialoGPT-small'
        self.cache_dir = 'cached'
        self.block_size = 512
        self.do_train = True
        self.do_eval = True
        self.evaluate_during_training = False
        self.per_gpu_train_batch_size = 4
        self.per_gpu_eval_batch_size = 4
        self.gradient_accumulation_steps = 1
        self.learning_rate = 5e-5
        self.weight_decay = 0.0
        self.adam_epsilon = 1e-8
        self.max_grad_norm = 1.0
        self.num_train_epochs = 50
        self.max_steps = -1
        self.warmup_steps = 0
        self.logging_steps = 1000
        self.save_steps = 3500
        self.save_total_limit = None
        self.eval_all_checkpoints = False
        self.no_cuda = False
        self.overwrite_output_dir = True
        self.overwrite_cache = True
        self.should_continue = False
        self.seed = 42
        self.local_rank = -1
        self.fp16 = False
        self.fp16_opt_level = 'O1'

args = Args()
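
The `learning_rate` and `warmup_steps` values above feed the linear warmup-and-decay schedule created in `train()`. The sketch below reproduces, in plain Python, my understanding of the multiplier that schedule applies. It is an illustration under that assumption, not the library's code, and `linear_schedule_lr` is a hypothetical name. The total of 1900 steps is taken from the training log further down.

```python
def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear ramp up during warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Settings from Args: learning_rate=5e-5, warmup_steps=0.
for step in (0, 950, 1900):
    print(step, linear_schedule_lr(step, base_lr=5e-5, warmup_steps=0, total_steps=1900))
```

With no warmup, the rate starts at 5e-05, reaches 2.5e-05 halfway through, and hits 0.0 at the final step.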

## Train and Evaluate

In [20]:
def train(args, train_dataset, model: PreTrainedModel, tokenizer: PreTrainedTokenizer) -> Tuple[int, float]:
    """ Train the model """
    if args.local_rank in [-1, 0]:
        tb_writer = SummaryWriter()

    args.train_batch_size = args.per_gpu_train_batch_size * max(1, args.n_gpu)

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    train_sampler = RandomSampler(train_dataset) if args.local_rank == -1 else DistributedSampler(train_dataset)
    train_dataloader = DataLoader(
        train_dataset, sampler=train_sampler, batch_size=args.train_batch_size, collate_fn=collate, drop_last = True
    )

    if args.max_steps > 0:
        t_total = args.max_steps
        args.num_train_epochs = args.max_steps // (len(train_dataloader) // args.gradient_accumulation_steps) + 1
    else:
        t_total = len(train_dataloader) // args.gradient_accumulation_steps * args.num_train_epochs

    model = model.module if hasattr(model, "module") else model  # Take care of distributed/parallel training
    model.resize_token_embeddings(len(tokenizer))
    # add_special_tokens_(model, tokenizer)


    # Prepare optimizer and schedule (linear warmup and decay)
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": args.weight_decay,
        },
        {"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], "weight_decay": 0.0},
    ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total
    )

    # Check if saved optimizer or scheduler states exist
    if (
        args.model_name_or_path
        and os.path.isfile(os.path.join(args.model_name_or_path, "optimizer.pt"))
        and os.path.isfile(os.path.join(args.model_name_or_path, "scheduler.pt"))
    ):
        # Load in optimizer and scheduler states
        optimizer.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "optimizer.pt")))
        scheduler.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "scheduler.pt")))

    if args.fp16:
        try:
            from apex import amp
        except ImportError:
            raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
        model, optimizer = amp.initialize(model, optimizer, opt_level=args.fp16_opt_level)

    # multi-gpu training (should be after apex fp16 initialization)
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    # Distributed training (should be after apex fp16 initialization)
    if args.local_rank != -1:
        model = torch.nn.parallel.DistributedDataParallel(
            model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True
        )

    # Train!
    logger.info("***** Running training *****")
    logger.info("  Num examples = %d", len(train_dataset))
    logger.info("  Num Epochs = %d", args.num_train_epochs)
    logger.info("  Instantaneous batch size per GPU = %d", args.per_gpu_train_batch_size)
    logger.info(
        "  Total train batch size (w. parallel, distributed & accumulation) = %d",
        args.train_batch_size
        * args.gradient_accumulation_steps
        * (torch.distributed.get_world_size() if args.local_rank != -1 else 1),
    )
    logger.info("  Gradient Accumulation steps = %d", args.gradient_accumulation_steps)
    logger.info("  Total optimization steps = %d", t_total)

    global_step = 0
    epochs_trained = 0
    steps_trained_in_current_epoch = 0
    # Check if continuing training from a checkpoint
    if args.model_name_or_path and os.path.exists(args.model_name_or_path):
        try:
            # set global_step to global_step of last saved checkpoint from model path
            checkpoint_suffix = args.model_name_or_path.split("-")[-1].split("/")[0]
            global_step = int(checkpoint_suffix)
            epochs_trained = global_step // (len(train_dataloader) // args.gradient_accumulation_steps)
            steps_trained_in_current_epoch = global_step % (len(train_dataloader) // args.gradient_accumulation_steps)

            logger.info("  Continuing training from checkpoint, will skip to saved global_step")
            logger.info("  Continuing training from epoch %d", epochs_trained)
            logger.info("  Continuing training from global step %d", global_step)
            logger.info("  Will skip the first %d steps in the first epoch", steps_trained_in_current_epoch)
        except ValueError:
            logger.info("  Starting fine-tuning.")

    tr_loss, logging_loss = 0.0, 0.0

    model.zero_grad()
    train_iterator = trange(
        epochs_trained, int(args.num_train_epochs), desc="Epoch", disable=args.local_rank not in [-1, 0]
    )
    set_seed(args)  # Added here for reproducibility
    for _ in train_iterator:
        epoch_iterator = tqdm(train_dataloader, desc="Iteration", disable=args.local_rank not in [-1, 0])
        for step, batch in enumerate(epoch_iterator):

            # Skip past any already trained steps if resuming training
            if steps_trained_in_current_epoch > 0:
                steps_trained_in_current_epoch -= 1
                continue

            inputs, labels = (batch, batch)
            if inputs.shape[1] > 1024: continue
            inputs = inputs.to(args.device)
            labels = labels.to(args.device)
            model.train()
            outputs = model(inputs, labels=labels)
            loss = outputs[0]  # model outputs are always tuple in transformers (see doc)

            if args.n_gpu > 1:
                loss = loss.mean()  # mean() to average on multi-gpu parallel training
            if args.gradient_accumulation_steps > 1:
                loss = loss / args.gradient_accumulation_steps

            if args.fp16:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            tr_loss += loss.item()
            if (step + 1) % args.gradient_accumulation_steps == 0:
                if args.fp16:
                    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), args.max_grad_norm)
                else:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)
                optimizer.step()
                scheduler.step()  # Update learning rate schedule
                model.zero_grad()
                global_step += 1

                if args.local_rank in [-1, 0] and args.logging_steps > 0 and global_step % args.logging_steps == 0:
                    # Log metrics
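                    if (
                        args.local_rank == -1 and args.evaluate_during_training
                    ):  # Only evaluate when single GPU otherwise metrics may not average well
                        # NOTE: evaluate() also requires df_trn and df_val, which train() does not receive,
                        # so this call raises a TypeError if evaluate_during_training is set to True.
                        results = evaluate(args, model, tokenizer)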
                        for key, value in results.items():
                            tb_writer.add_scalar("eval_{}".format(key), value, global_step)
                    tb_writer.add_scalar("lr", scheduler.get_last_lr()[0], global_step)
                    tb_writer.add_scalar("loss", (tr_loss - logging_loss) / args.logging_steps, global_step)
                    logging_loss = tr_loss

                if args.local_rank in [-1, 0] and args.save_steps > 0 and global_step % args.save_steps == 0:
                    checkpoint_prefix = "checkpoint"
                    # Save model checkpoint
                    output_dir = os.path.join(args.output_dir, "{}-{}".format(checkpoint_prefix, global_step))
                    os.makedirs(output_dir, exist_ok=True)
                    model_to_save = (
                        model.module if hasattr(model, "module") else model
                    )  # Take care of distributed/parallel training
                    model_to_save.save_pretrained(output_dir)
                    tokenizer.save_pretrained(output_dir)

                    torch.save(args, os.path.join(output_dir, "training_args.bin"))
                    logger.info("Saving model checkpoint to %s", output_dir)

                    _rotate_checkpoints(args, checkpoint_prefix)

                    torch.save(optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
                    torch.save(scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
                    logger.info("Saving optimizer and scheduler states to %s", output_dir)

            if args.max_steps > 0 and global_step > args.max_steps:
                epoch_iterator.close()
                break
        if args.max_steps > 0 and global_step > args.max_steps:
            train_iterator.close()
            break

    if args.local_rank in [-1, 0]:
        tb_writer.close()

    return global_step, tr_loss / global_step

# Evaluation of some model

def evaluate(args, model: PreTrainedModel, tokenizer: PreTrainedTokenizer, df_trn, df_val, prefix="") -> Dict:
    # Evaluate on the validation split and write the results under args.output_dir
    eval_output_dir = args.output_dir

    eval_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=True)
    os.makedirs(eval_output_dir, exist_ok=True)
    args.eval_batch_size = args.per_gpu_eval_batch_size * max(1, args.n_gpu)
    # Note that DistributedSampler samples randomly

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    eval_sampler = SequentialSampler(eval_dataset)
    eval_dataloader = DataLoader(
        eval_dataset, sampler=eval_sampler, batch_size=args.eval_batch_size, collate_fn=collate, drop_last = True
    )

    # multi-gpu evaluate
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    # Eval!
    logger.info("***** Running evaluation {} *****".format(prefix))
    logger.info("  Num examples = %d", len(eval_dataset))
    logger.info("  Batch size = %d", args.eval_batch_size)
    eval_loss = 0.0
    nb_eval_steps = 0
    model.eval()

    for batch in tqdm(eval_dataloader, desc="Evaluating"):
        inputs, labels = (batch, batch)
        inputs = inputs.to(args.device)
        labels = labels.to(args.device)

        with torch.no_grad():
            outputs = model(inputs, labels=labels)
            lm_loss = outputs[0]
            eval_loss += lm_loss.mean().item()
        nb_eval_steps += 1

    eval_loss = eval_loss / nb_eval_steps
    perplexity = torch.exp(torch.tensor(eval_loss))

    result = {"perplexity": perplexity}

    output_eval_file = os.path.join(eval_output_dir, prefix, "eval_results.txt")
    with open(output_eval_file, "w") as writer:
        logger.info("***** Eval results {} *****".format(prefix))
        for key in sorted(result.keys()):
            logger.info("  %s = %s", key, str(result[key]))
            writer.write("%s = %s\n" % (key, str(result[key])))

    return result
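
The metric reported by `evaluate()` is perplexity, computed as the exponential of the mean per-batch language-modeling loss. A minimal sketch of that arithmetic with made-up loss values (the function name and the numbers are illustrative only):

```python
import math


def perplexity(batch_losses):
    """Perplexity as exp of the mean per-batch cross-entropy loss."""
    mean_loss = sum(batch_losses) / len(batch_losses)
    return math.exp(mean_loss)


losses = [2.0, 2.5, 3.0]  # hypothetical per-batch losses
print(perplexity(losses))
```

A mean loss of 0 gives a perplexity of 1, the best possible value; higher losses grow the perplexity exponentially. Note that the list must be non-empty: with `drop_last = True`, a validation set smaller than one batch would leave no batches to average.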

In [21]:
# Main runner

def main(df_trn, df_val):
    args = Args()
    
    if args.should_continue:
        sorted_checkpoints = _sorted_checkpoints(args)
        if len(sorted_checkpoints) == 0:
            raise ValueError("Used --should_continue but no checkpoint was found in --output_dir.")
        else:
            args.model_name_or_path = sorted_checkpoints[-1]

    if (
        os.path.exists(args.output_dir)
        and os.listdir(args.output_dir)
        and args.do_train
        and not args.overwrite_output_dir
        and not args.should_continue
    ):
        raise ValueError(
            "Output directory ({}) already exists and is not empty. Use --overwrite_output_dir to overcome.".format(
                args.output_dir
            )
        )

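    # Setup CUDA, GPU & distributed training (falls back to CPU and honors args.no_cuda)
    use_cuda = torch.cuda.is_available() and not args.no_cuda
    device = torch.device("cuda" if use_cuda else "cpu")
    args.n_gpu = torch.cuda.device_count() if use_cuda else 0
    args.device = device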

    # Setup logging
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN,
    )
    logger.warning(
        "Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s",
        args.local_rank,
        device,
        args.n_gpu,
        bool(args.local_rank != -1),
        args.fp16,
    )

    # Set seed
    set_seed(args)

    config = AutoConfig.from_pretrained(args.config_name, cache_dir=args.cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
    model = AutoModelWithLMHead.from_pretrained(
        args.model_name_or_path,
        from_tf=False,
        config=config,
        cache_dir=args.cache_dir,
    )
    model.to(args.device)
    
    logger.info("Training/evaluation parameters %s", args)

    # Training
    if args.do_train:
        train_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False)

        global_step, tr_loss = train(args, train_dataset, model, tokenizer)
        logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

    # Saving best-practices: if you use save_pretrained for the model and tokenizer, you can reload them using from_pretrained()
    if args.do_train:
        # Create output directory if needed
        os.makedirs(args.output_dir, exist_ok=True)

        logger.info("Saving model checkpoint to %s", args.output_dir)
        # Save a trained model, configuration and tokenizer using `save_pretrained()`.
        # They can then be reloaded using `from_pretrained()`
        model_to_save = (
            model.module if hasattr(model, "module") else model
        )  # Take care of distributed/parallel training
        model_to_save.save_pretrained(args.output_dir)
        tokenizer.save_pretrained(args.output_dir)

        # Good practice: save your training arguments together with the trained model
        torch.save(args, os.path.join(args.output_dir, "training_args.bin"))

        # Load a trained model and vocabulary that you have fine-tuned
        model = AutoModelWithLMHead.from_pretrained(args.output_dir)
        tokenizer = AutoTokenizer.from_pretrained(args.output_dir)
        model.to(args.device)

    # Evaluation
    results = {}
    if args.do_eval and args.local_rank in [-1, 0]:
        checkpoints = [args.output_dir]
        if args.eval_all_checkpoints:
            checkpoints = list(
                os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + "/**/" + WEIGHTS_NAME, recursive=True))
            )
            logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN)  # Reduce logging
        logger.info("Evaluate the following checkpoints: %s", checkpoints)
        for checkpoint in checkpoints:
            global_step = checkpoint.split("-")[-1] if len(checkpoints) > 1 else ""
            prefix = checkpoint.split("/")[-1] if checkpoint.find("checkpoint") != -1 else ""

            model = AutoModelWithLMHead.from_pretrained(checkpoint)
            model.to(args.device)
            result = evaluate(args, model, tokenizer, df_trn, df_val, prefix=prefix)
            result = dict((k + "_{}".format(global_step), v) for k, v in result.items())
            results.update(result)

    return results

## Run the Main Function

In [22]:
main(trn_df, val_df)

03/10/2022 18:02:20 - INFO - __main__ -   Training/evaluation parameters <__main__.Args object at 0x7f942d263710>
03/10/2022 18:02:20 - INFO - __main__ -   Creating features from dataset file at cached
03/10/2022 18:02:21 - INFO - __main__ -   Saving features into cached file cached/gpt2_cached_lm_512
03/10/2022 18:02:22 - INFO - __main__ -   ***** Running training *****
03/10/2022 18:02:22 - INFO - __main__ -     Num examples = 153
03/10/2022 18:02:22 - INFO - __main__ -     Num Epochs = 50
03/10/2022 18:02:22 - INFO - __main__ -     Instantaneous batch size per GPU = 4
03/10/2022 18:02:22 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 4
03/10/2022 18:02:22 - INFO - __main__ -     Gradient Accumulation steps = 1
03/10/2022 18:02:22 - INFO - __main__ -     Total optimization steps = 1900



03/10/2022 18:25:09 - INFO - __main__ -    global_step = 1900, average loss = 0.6599142013215705
03/10/2022 18:25:09 - INFO - __main__ -   Saving model checkpoint to output-small
03/10/2022 18:25:15 - INFO - __main__ -   Evaluate the following checkpoints: ['output-small']
03/10/2022 18:25:17 - INFO - __main__ -   Creating features from dataset file at cached
03/10/2022 18:25:17 - INFO - __main__ -   Saving features into cached file cached/gpt2_cached_lm_512
03/10/2022 18:25:17 - INFO - __main__ -   ***** Running evaluation  *****
03/10/2022 18:25:17 - INFO - __main__ -     Num examples = 18
03/10/2022 18:25:17 - INFO - __main__ -     Batch size = 4



03/10/2022 18:25:18 - INFO - __main__ -   ***** Eval results  *****
03/10/2022 18:25:18 - INFO - __main__ -     perplexity = tensor(3.3902)


{'perplexity_': tensor(3.3902)}
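
The numbers in the log can be checked by hand. The sketch below is a rough sanity check, not the notebook's own code. It assumes that the data loaders drop the last incomplete batch (inferred from 1900 = 38 × 50, where 153 // 4 = 38) and that perplexity is the exponential of the mean evaluation loss, which is the usual definition. The function name `steps_per_epoch` is hypothetical.

```python
import math


def steps_per_epoch(num_examples, batch_size, drop_last=True):
    """Number of batches one pass over the data yields."""
    if drop_last:
        return num_examples // batch_size
    return math.ceil(num_examples / batch_size)


# Training: 153 examples, batch size 4, 50 epochs, no gradient accumulation.
train_steps = steps_per_epoch(153, 4)
total_steps = train_steps * 50
print(train_steps, total_steps)  # 38 1900

# Evaluation: 18 examples, batch size 4.
eval_batches = steps_per_epoch(18, 4)
print(eval_batches)  # 4

# Assuming perplexity = exp(mean loss), invert it to recover the mean eval loss.
eval_loss = math.log(3.3902)
print(round(eval_loss, 2))
```

Under that assumption the mean evaluation loss is about 1.22, noticeably above the average training loss of roughly 0.66 reported at step 1900. A gap of this kind is what one would expect after 50 epochs on 153 examples.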

## Load the Trained Model

In [23]:
tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small')
model = AutoModelWithLMHead.from_pretrained('output-small')



In [24]:
# Let's chat for 4 lines
for step in range(4):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')
    # print(new_user_input_ids)

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response; max_length caps the total sequence (chat history plus reply) at 200 tokens
    chat_history_ids = model.generate(
        bot_input_ids, max_length=200,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=100,
        top_p=0.7,
        temperature=0.8
    )

    # pretty-print the last output tokens from the bot
    print("JackBot: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

>> User:Hey there!
JackBot:  Hey! Up here!
>> User:What's up?
JackBot:  Hey -- you know what book I'm reading there? It's called, 101 Ways to Forcibly Make Vandals Eat Their Own Entrails. I'll let you borrow it sometime.
>> User:Kiss my ass
JackBot:  101 Ways To Forcically Make Vans Eat Their own Entrail. I shoulda seen it coming.
>> User:Eat my shorts
JackBot:  Ha ha! I was wondering when Roland'd call in that favor.
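
The chat loop relies on two facts: the generated sequence starts with the prompt, so slicing from the prompt length onward yields only the reply, and `max_length` counts prompt and reply together. The sketch below illustrates both with plain Python lists instead of tensors. `fake_generate`, `new_tokens`, the `EOS` id and all token ids are hypothetical stand-ins, not the `transformers` API.

```python
EOS = 0  # hypothetical end-of-sequence token id


def fake_generate(prompt_ids, reply_ids, max_length):
    """Stand-in for model.generate: prompt followed by a reply, capped at max_length tokens in total."""
    room = max(0, max_length - len(prompt_ids))
    return prompt_ids + (reply_ids + [EOS])[:room]


def new_tokens(history_ids, prompt_len):
    """List analogue of chat_history_ids[:, bot_input_ids.shape[-1]:][0]."""
    return history_ids[prompt_len:]


history = []
# Each pair is (user token ids, the reply the fake model will produce).
for user_ids, bot_ids in [([11, 12], [21, 22, 23]), ([13], [24, 25])]:
    prompt = history + user_ids + [EOS]      # history plus the new user turn
    history = fake_generate(prompt, bot_ids, max_length=200)
    reply = new_tokens(history, len(prompt))  # only what was generated this turn
    print(reply)

# A prompt of 199 tokens leaves room for a single generated token.
truncated = fake_generate([1] * 199, [5, 6], max_length=200)
print(len(truncated))  # 200
```

Because the history grows every turn, a long conversation eventually leaves little or no room under a fixed `max_length`. Trimming the oldest turns from the history is one possible way to keep replies from being cut short.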


## Push Model to Hugging Face

In [25]:
os.chdir("/content/")

In [26]:
!ls

drive  sample_data


In [27]:
!pip install huggingface_hub



In [28]:
from huggingface_hub import notebook_login

notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-credential store but this isn't the helper defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal in case you want to set this credential helper as the default

git config --global credential.helper store


In [29]:
!git config --global credential.helper store

In [None]:
!huggingface-cli repo create DialoGPT-small-handsomejack

git version 2.17.1
Looks like you do not have git-lfs installed, please install. You can install from https://git-lfs.github.com/. Then run `git lfs install` (you only have to do this once).

You are about to create Prime2911/DialoGPT-small-handsomejack
Proceed? [Y/n] y

Your repo now lives at:
  https://huggingface.co/Prime2911/DialoGPT-small-handsomejack

You can clone it locally with the command below, and commit/push as usual.

  git clone https://huggingface.co/Prime2911/DialoGPT-small-handsomejack



In [30]:
!sudo apt-get install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 1s (1,976 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package git-lfs.
(Reading database ... 155335 files and directories c

In [31]:
!cat /root/.huggingface/token

hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

In [32]:
# Access token redacted; substitute your own token in place of the placeholder.
!git clone https://Prime2911:hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX@huggingface.co/Prime2911/DialoGPT-small-handsomejack

Cloning into 'DialoGPT-small-handsomejack'...
remote: Enumerating objects: 17, done.
remote: Counting objects: 100% (17/17), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 17 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (17/17), done.
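
Embedding an access token in the clone URL, as in the cell above, puts it in the notebook output and in the repository's remote configuration. A minimal sketch of one way to limit that exposure follows: read the token from an environment variable and redact it before anything is printed. The variable name `HF_TOKEN` and the helpers `build_clone_url` and `redact` are assumptions for illustration, not part of `huggingface_hub`.

```python
import os
import re
from urllib.parse import quote


def build_clone_url(user, repo, token):
    """Build an HTTPS clone URL that carries the credentials (handle with care)."""
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@huggingface.co/{user}/{repo}"


def redact(url):
    """Replace the password part of a credentialed URL before logging it."""
    return re.sub(r"(https://[^:/@]+):[^@]+@", r"\1:***@", url)


# HF_TOKEN is an assumed variable name; the fallback is a dummy value for the demo.
token = os.environ.get("HF_TOKEN", "hf_example")
url = build_clone_url("Prime2911", "DialoGPT-small-handsomejack", token)
print(redact(url))
```

The unredacted `url` could then be handed to `git clone` through `subprocess.run` rather than typed into a cell, so the token never appears in saved output. If a token has already been published in a notebook, revoking it in the account settings is the safer course.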


In [33]:
!ls "/content/drive/MyDrive/Colab Notebooks/Handsome Jack Discord Chatbot/output-small"

config.json	  pytorch_model.bin	   tokenizer.json
eval_results.txt  special_tokens_map.json  training_args.bin
merges.txt	  tokenizer_config.json    vocab.json


In [34]:
!mv /content/drive/MyDrive/Colab\ Notebooks/Handsome\ Jack\ Discord\ Chatbot/output-small/* DialoGPT-small-handsomejack/

In [35]:
os.chdir("DialoGPT-small-handsomejack/")

In [36]:
!ls

config.json	  pytorch_model.bin	   tokenizer_config.json  vocab.json
eval_results.txt  README.md		   tokenizer.json
merges.txt	  special_tokens_map.json  training_args.bin


In [37]:
!git lfs install

Updated git hooks.
Git LFS initialized.


In [38]:
!pwd

/content/DialoGPT-small-handsomejack


In [39]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   eval_results.txt
	modified:   pytorch_model.bin
	modified:   training_args.bin

no changes added to commit (use "git add" and/or "git commit -a")


In [40]:
!git add .

In [41]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   eval_results.txt
	modified:   pytorch_model.bin
	modified:   training_args.bin



In [42]:
!git config --global user.email "squallduy99@gmail.com"
# Tip: using the same email as your huggingface.co account will link your commits to your profile
!git config --global user.name "prime2911"

In [43]:
!git commit -m "Increased the number of epochs"

[main 222b456] Increased the number of epochs
 3 files changed, 3 insertions(+), 3 deletions(-)


In [44]:
!git push

Git LFS: (2 of 2 files) 486.76 MB / 486.76 MB
Counting objects: 5, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (5/5), 592 bytes | 592.00 KiB/s, done.
Total 5 (delta 1), reused 0 (delta 0)
To https://huggingface.co/Prime2911/DialoGPT-small-handsomejack
   8d93a2b..222b456  main -> main


In [None]:
# MY_MODEL_NAME = 'DialoGPT-small-joshua'
# with open('HuggingFace-API-key.txt', 'rt') as f:
#   HUGGINGFACE_API_KEY = f.read().strip()

In [None]:
# model.push_to_hub(MY_MODEL_NAME, use_auth_token=HUGGINGFACE_API_KEY)
# tokenizer.push_to_hub(MY_MODEL_NAME, use_auth_token=HUGGINGFACE_API_KEY)

## All Done!