<a href="https://colab.research.google.com/github/atkinsonsean/Capstone/blob/main/2_2_model_gpt2_small.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **GPT-2 CHATBOT MODEL (SMALL)**

This notebook is adapted from several projects and owes a huge debt of gratitude to [Hugging Face](https://github.com/huggingface/transformers/blob/master/examples/legacy/run_language_modeling.py), [Nathan Cooper](https://nathancooper.io/i-am-a-nerd/chatbot/deep-learning/gpt2/2020/05/12/chatbot-part-1.html), and [Tornike Tsereteli](https://towardsdatascience.com/rick-and-morty-story-generation-with-gpt2-using-transformers-and-streamlit-in-57-lines-of-code-8f81a8f92692).



In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [None]:
! pip -q install transformers


# **IMPORTING LIBRARIES**

In [None]:
import os
os.chdir("/content/drive/My Drive/Colab Notebooks")

In [None]:
from transformers import AutoModelWithLMHead, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-small")


In [None]:
import glob
import logging
import os
import pickle
import random
import re
import shutil
from typing import Dict, List, Tuple

import pandas as pd
import numpy as np
import torch

from sklearn.model_selection import train_test_split

from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler
from tqdm.notebook import tqdm, trange

from pathlib import Path

from transformers import (
    MODEL_WITH_LM_HEAD_MAPPING,
    WEIGHTS_NAME,
    AdamW,
    AutoConfig,
    AutoModelWithLMHead,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
    get_linear_schedule_with_warmup,
)


try:
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    from tensorboardX import SummaryWriter

# Configs
logger = logging.getLogger(__name__)

MODEL_CONFIG_CLASSES = list(MODEL_WITH_LM_HEAD_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

# **GPT-2 OUT OF THE BOX TEST**

In [None]:
for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')
    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids
    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(
        bot_input_ids, max_length=1000,
        pad_token_id=tokenizer.eos_token_id
    )
    # pretty-print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

>> User:Tell me about Amazon
DialoGPT: I'm in the same boat. I'm in the same boat as you.
>> User:What do you know about Amazon?
DialoGPT: I'm in the same boat. I'm in the same boat.
>> User:What do you know about Microsoft?
DialoGPT: What do you know about Google?
>> User:What do you know about Google?
DialoGPT: What do you know about Google?
>> User:What is your name?
DialoGPT: What is your favorite thing about the internet?
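
The loop above recovers each reply by slicing off the prompt: the generated sequence holds the prompt tokens followed by the new ones, so everything past `bot_input_ids.shape[-1]` is the bot's turn. Below is a minimal sketch of that bookkeeping with plain Python lists standing in for tensors. The token ids and the `fake_generate` helper are made up for illustration and are not part of `transformers`.

```python
# Toy stand-in for model.generate: returns the prompt followed by new tokens.
# The ids are arbitrary; 0 plays the role of the EOS token.
EOS = 0

def fake_generate(prompt_ids, reply_ids):
    """Hypothetical helper: mimic generation by appending a reply plus EOS."""
    return prompt_ids + reply_ids + [EOS]

history = []
turns = [([11, 12], [21, 22, 23]), ([13], [24])]  # (user ids, bot ids) per turn
replies = []
for user_ids, bot_ids in turns:
    bot_input = history + user_ids + [EOS]        # history plus the new user turn
    history = fake_generate(bot_input, bot_ids)   # full sequence so far
    replies.append(history[len(bot_input):])      # only the newly generated part

print(replies)  # [[21, 22, 23, 0], [24, 0]]
print(history)  # [11, 12, 0, 21, 22, 23, 0, 13, 0, 24, 0]
```

Because the whole history is fed back in on every turn, the prompt grows with each exchange, which is why the loop caps the total length at 1000 tokens.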


# **CONFIGURING VARIABLES**

In [None]:
class Args():
    def __init__(self):
        self.output_dir = 'output-small-50-20-epochs'
        self.model_type = 'gpt2'
        self.model_name_or_path = 'microsoft/DialoGPT-small'
        self.config_name = 'microsoft/DialoGPT-small'
        self.tokenizer_name = 'microsoft/DialoGPT-small'
        self.cache_dir = 'cached'
        self.block_size = 512
        self.do_train = True
        self.do_eval = True
        self.evaluate_during_training = False
        self.per_gpu_train_batch_size = 4
        self.per_gpu_eval_batch_size = 4
        self.gradient_accumulation_steps = 1
        self.learning_rate = 5e-5
        self.weight_decay = 0.0
        self.adam_epsilon = 1e-8
        self.max_grad_norm = 1.0
        self.num_train_epochs = 20
        self.max_steps = -1
        self.warmup_steps = 0
        self.logging_steps = 1000
        self.save_steps = 3500
        self.save_total_limit = None
        self.eval_all_checkpoints = False
        self.no_cuda = False
        self.overwrite_output_dir = True
        self.overwrite_cache = True
        self.should_continue = False
        self.seed = 42
        self.local_rank = -1
        self.fp16 = False
        self.fp16_opt_level = 'O1'

args = Args()
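
With `warmup_steps = 0`, the linear schedule used during training should start at `learning_rate` and decay to zero by the final optimization step. The sketch below reimplements that multiplier in plain Python for illustration only; it is not the library code. The total of 199,960 steps is taken from the training log further down, and `linear_schedule_multiplier` is a hypothetical name.

```python
def linear_schedule_multiplier(step, warmup_steps, total_steps):
    """Factor applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 during warmup (unused here, warmup is 0).
        return step / max(1, warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

base_lr = 5e-5          # args.learning_rate
warmup_steps = 0        # args.warmup_steps
total_steps = 199960    # "Total optimization steps" in the training log

for step in (0, total_steps // 2, total_steps):
    lr = base_lr * linear_schedule_multiplier(step, warmup_steps, total_steps)
    print(step, lr)
```

Halfway through training the multiplier is 0.5, so the learning rate has dropped to half its starting value.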

# **IMPORTING DATA**

In [None]:
import pandas as pd

In [None]:
topical_chat = pd.read_csv('topical_chat.csv')

The dataset I decided to use is a [topical chat dataset](https://www.kaggle.com/arnavsharmaas/chatbot-dataset-topical-chat) from Amazon that consists of 8,000+ conversations and 184,000+ messages.

It was created by Arnav Sharma and is a more streamlined version of [this](https://github.com/alexa/Topical-Chat) original Amazon Alexa dataset.

The dataset contains a conversation id, a message (which is either a reply or the start of a conversation), and the sentiment of each message.

The dataset spans 8 broad topics and contains conversation partners who do not have defined roles. It was created with the goal of [aiding in the effort to build a socialbot that can have deep, engaging open-domain conversations with humans](https://m.media-amazon.com/images/G/01/amazon.jobs/3079_Paper._CB1565131710_.pdf).

The eight broad topics are as follows:
- fashion
- politics
- books
- sports
- general entertainment
- music
- science and technology
- movies

In [None]:
#due to the time it takes to train and fine-tune a model, I had to limit my data to the first 50,000 entries
df = topical_chat[:50000]

In [None]:
df

Unnamed: 0,conversation_id,message,sentiment
0,1,Are you a fan of Google or Microsoft?,Curious to dive deeper
1,1,Both are excellent technology they are helpfu...,Curious to dive deeper
2,1,"I'm not a huge fan of Google, but I use it a...",Curious to dive deeper
3,1,Google provides online related services and p...,Curious to dive deeper
4,1,"Yeah, their services are good. I'm just not a...",Curious to dive deeper
...,...,...,...
49995,2289,So good that she broke 8 national records and...,Curious to dive deeper
49996,2289,Nice. Most players after retirement have fina...,Curious to dive deeper
49997,2289,Wow that is sad. DId you know Iverson signed ...,Curious to dive deeper
49998,2289,"I have not heard of it, did he get a good deal?",Curious to dive deeper


In [None]:
df.head(50)

Unnamed: 0,conversation_id,message,sentiment
0,1,Are you a fan of Google or Microsoft?,Curious to dive deeper
1,1,Both are excellent technology they are helpfu...,Curious to dive deeper
2,1,"I'm not a huge fan of Google, but I use it a...",Curious to dive deeper
3,1,Google provides online related services and p...,Curious to dive deeper
4,1,"Yeah, their services are good. I'm just not a...",Curious to dive deeper
5,1,Google is leading the alphabet subsidiary and...,Curious to dive deeper
6,1,Did you know Google had hundreds of live goat...,Curious to dive deeper
7,1,"It is very interesting. Google provide ""Chrom...",Curious to dive deeper
8,1,I like Google Chrome. Do you use it as well f...,Curious to dive deeper
9,1,Yes.Google is the biggest search engine and G...,Curious to dive deeper


# **CREATING CONTEXT FOR EACH MESSAGE**

In [None]:
contexted = []
n = 7
for i in range(n, len(df['message'])):
  row = []
  prev = i - 1 - n # we additionally subtract 1, so row will contain current response and 7 previous responses  
  for j in range(i, prev, -1):
    row.append(df['message'][j])
  contexted.append(row)
columns = ['response', 'context'] 
columns = columns + ['context/'+str(i) for i in range(n-1)]
df = pd.DataFrame.from_records(contexted, columns=columns)
df.head(5)

Unnamed: 0,response,context,context/0,context/1,context/2,context/3,context/4,context/5
0,"It is very interesting. Google provide ""Chrom...",Did you know Google had hundreds of live goat...,Google is leading the alphabet subsidiary and...,"Yeah, their services are good. I'm just not a...",Google provides online related services and p...,"I'm not a huge fan of Google, but I use it a...",Both are excellent technology they are helpfu...,Are you a fan of Google or Microsoft?
1,I like Google Chrome. Do you use it as well f...,"It is very interesting. Google provide ""Chrom...",Did you know Google had hundreds of live goat...,Google is leading the alphabet subsidiary and...,"Yeah, their services are good. I'm just not a...",Google provides online related services and p...,"I'm not a huge fan of Google, but I use it a...",Both are excellent technology they are helpfu...
2,Yes.Google is the biggest search engine and G...,I like Google Chrome. Do you use it as well f...,"It is very interesting. Google provide ""Chrom...",Did you know Google had hundreds of live goat...,Google is leading the alphabet subsidiary and...,"Yeah, their services are good. I'm just not a...",Google provides online related services and p...,"I'm not a huge fan of Google, but I use it a..."
3,"By the way, do you like Fish?",Yes.Google is the biggest search engine and G...,I like Google Chrome. Do you use it as well f...,"It is very interesting. Google provide ""Chrom...",Did you know Google had hundreds of live goat...,Google is leading the alphabet subsidiary and...,"Yeah, their services are good. I'm just not a...",Google provides online related services and p...
4,Yes. They form a sister group of tourniquets-...,"By the way, do you like Fish?",Yes.Google is the biggest search engine and G...,I like Google Chrome. Do you use it as well f...,"It is very interesting. Google provide ""Chrom...",Did you know Google had hundreds of live goat...,Google is leading the alphabet subsidiary and...,"Yeah, their services are good. I'm just not a..."
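
The windowing logic is easier to follow on a toy list. This sketch uses the same index arithmetic as the cell above, with a smaller `n` and made-up message strings; `build_context_rows` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
def build_context_rows(messages, n):
    """Return rows of [response, previous, ..., n-th previous] for each index >= n."""
    rows = []
    for i in range(n, len(messages)):
        # Walk backwards from the response at i to the message n steps earlier.
        rows.append([messages[j] for j in range(i, i - n - 1, -1)])
    return rows

messages = ["m0", "m1", "m2", "m3", "m4"]   # stand-ins for df['message']
rows = build_context_rows(messages, n=2)
for row in rows:
    print(row)
# ['m2', 'm1', 'm0']
# ['m3', 'm2', 'm1']
# ['m4', 'm3', 'm2']
```

Each row holds `n + 1` messages, and the first `n` messages never appear as a response. The window slides over the message column only and does not look at `conversation_id`, so a row can span the boundary between two conversations.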


In [None]:
df_trn, df_val = train_test_split(df, test_size = 0.2)
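
The sizes that show up later in the training log follow from a few lines of arithmetic. This is a sketch under two assumptions: that `train_test_split` rounds the test share up, and that training runs on a single GPU. Both agree with the 39,994 training examples and 199,960 optimization steps reported in the log.

```python
import math

n_messages = 50_000      # rows kept from topical_chat
n_context = 7            # n in the context-building loop
test_size = 0.2
batch_size = 4           # args.per_gpu_train_batch_size on a single GPU
epochs = 20              # args.num_train_epochs

n_rows = n_messages - n_context            # one row per index in range(n, len)
n_val = math.ceil(test_size * n_rows)      # assumed rounding: test share rounds up
n_train = n_rows - n_val
steps_per_epoch = n_train // batch_size    # drop_last=True discards the remainder
total_steps = steps_per_epoch * epochs

print(n_rows, n_val, n_train, steps_per_epoch, total_steps)
# 49993 9999 39994 9998 199960
```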

# **CONVERTING THE CONTEXT AND RESPONSE DATAFRAME INTO A SINGLE CONVERSATION STRING**

In [None]:
#separating conversation strings and telling the model when a person is done speaking
def construct_conv(row, tokenizer, eos = True):
    flatten = lambda l: [item for sublist in l for item in sublist]
    conv = list(reversed([tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]))
    conv = flatten(conv)
    return conv

class ConversationDataset(Dataset):
    def __init__(self, tokenizer: PreTrainedTokenizer, args, df, block_size=512):

        block_size = block_size - (tokenizer.model_max_length - tokenizer.max_len_single_sentence)

        directory = args.cache_dir
        cached_features_file = os.path.join(
            directory, args.model_type + "_cached_lm_" + str(block_size)
        )

        if os.path.exists(cached_features_file) and not args.overwrite_cache:
            logger.info("Loading features from cached file %s", cached_features_file)
            with open(cached_features_file, "rb") as handle:
                self.examples = pickle.load(handle)
        else:
            logger.info("Creating features from dataset file at %s", directory)

            self.examples = []
            for _, row in df.iterrows():
                conv = construct_conv(row, tokenizer)
                self.examples.append(conv)

            logger.info("Saving features into cached file %s", cached_features_file)
            with open(cached_features_file, "wb") as handle:
                pickle.dump(self.examples, handle, protocol=pickle.HIGHEST_PROTOCOL)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, item):
        return torch.tensor(self.examples[item], dtype=torch.long)
      
# Caching and storing of data/checkpoints

def load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False):
    return ConversationDataset(tokenizer, args, df_val if evaluate else df_trn)

def set_seed(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.n_gpu > 0:
        torch.cuda.manual_seed_all(args.seed)

def _sorted_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> List[str]:
    ordering_and_checkpoint_path = []

    glob_checkpoints = glob.glob(os.path.join(args.output_dir, "{}-*".format(checkpoint_prefix)))

    for path in glob_checkpoints:
        if use_mtime:
            ordering_and_checkpoint_path.append((os.path.getmtime(path), path))
        else:
            regex_match = re.match(".*{}-([0-9]+)".format(checkpoint_prefix), path)
            if regex_match and regex_match.groups():
                ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))

    checkpoints_sorted = sorted(ordering_and_checkpoint_path)
    checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
    return checkpoints_sorted

def _rotate_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> None:
    if not args.save_total_limit:
        return
    if args.save_total_limit <= 0:
        return

    # Check if we should delete older checkpoint(s)
    checkpoints_sorted = _sorted_checkpoints(args, checkpoint_prefix, use_mtime)
    if len(checkpoints_sorted) <= args.save_total_limit:
        return

    number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - args.save_total_limit)
    checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
    for checkpoint in checkpoints_to_be_deleted:
        logger.info("Deleting older checkpoint [{}] due to args.save_total_limit".format(checkpoint))
        shutil.rmtree(checkpoint)
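
To see what `construct_conv` produces, the sketch below runs the same encode, append-EOS, reverse, and flatten steps with a stub tokenizer. `StubTokenizer` is hypothetical and only assigns small integers to words; the real tokenizer yields different ids, but the ordering of turns is the same.

```python
class StubTokenizer:
    """Hypothetical tokenizer: maps each word to a small integer id."""
    eos_token_id = 0

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        # Assign ids 1, 2, 3, ... in order of first appearance.
        return [self.vocab.setdefault(w, len(self.vocab) + 1) for w in text.split()]

def construct_conv(row, tokenizer):
    # Same steps as the notebook's function: encode each turn, append EOS,
    # reverse the turn order, then flatten into one list of ids.
    turns = [tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]
    return [tok for turn in reversed(turns) for tok in turn]

# Row order matches the dataframe: response first, then most recent context.
row = ["fine thanks", "how are you", "hello"]
ids = construct_conv(row, StubTokenizer())
print(ids)  # [6, 0, 3, 4, 5, 0, 1, 2, 0]
```

The reversal puts the oldest turn first and the response last, with an EOS id closing every turn, so the model reads the conversation in chronological order.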

# **TRAINING AND EVALUATION**

In [None]:
def train(args, train_dataset, model: PreTrainedModel, tokenizer: PreTrainedTokenizer) -> Tuple[int, float]:
    """ Train the model """
    if args.local_rank in [-1, 0]:
        tb_writer = SummaryWriter()

    args.train_batch_size = args.per_gpu_train_batch_size * max(1, args.n_gpu)

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    train_sampler = RandomSampler(train_dataset) if args.local_rank == -1 else DistributedSampler(train_dataset)
    train_dataloader = DataLoader(
        train_dataset, sampler=train_sampler, batch_size=args.train_batch_size, collate_fn=collate, drop_last = True
    )

    if args.max_steps > 0:
        t_total = args.max_steps
        args.num_train_epochs = args.max_steps // (len(train_dataloader) // args.gradient_accumulation_steps) + 1
    else:
        t_total = len(train_dataloader) // args.gradient_accumulation_steps * args.num_train_epochs

    model = model.module if hasattr(model, "module") else model  # Take care of distributed/parallel training
    model.resize_token_embeddings(len(tokenizer))
    # add_special_tokens_(model, tokenizer)


    # Prepare optimizer and schedule (linear warmup and decay)
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": args.weight_decay,
        },
        {"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], "weight_decay": 0.0},
    ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total
    )

    # Check if saved optimizer or scheduler states exist
    if (
        args.model_name_or_path
        and os.path.isfile(os.path.join(args.model_name_or_path, "optimizer.pt"))
        and os.path.isfile(os.path.join(args.model_name_or_path, "scheduler.pt"))
    ):
        # Load in optimizer and scheduler states
        optimizer.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "optimizer.pt")))
        scheduler.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "scheduler.pt")))

    if args.fp16:
        try:
            from apex import amp
        except ImportError:
            raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
        model, optimizer = amp.initialize(model, optimizer, opt_level=args.fp16_opt_level)

    # multi-gpu training (should be after apex fp16 initialization)
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    # Distributed training (should be after apex fp16 initialization)
    if args.local_rank != -1:
        model = torch.nn.parallel.DistributedDataParallel(
            model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True
        )

    # Train!
    logger.info("***** Running training *****")
    logger.info("  Num examples = %d", len(train_dataset))
    logger.info("  Num Epochs = %d", args.num_train_epochs)
    logger.info("  Instantaneous batch size per GPU = %d", args.per_gpu_train_batch_size)
    logger.info(
        "  Total train batch size (w. parallel, distributed & accumulation) = %d",
        args.train_batch_size
        * args.gradient_accumulation_steps
        * (torch.distributed.get_world_size() if args.local_rank != -1 else 1),
    )
    logger.info("  Gradient Accumulation steps = %d", args.gradient_accumulation_steps)
    logger.info("  Total optimization steps = %d", t_total)

    global_step = 0
    epochs_trained = 0
    steps_trained_in_current_epoch = 0
    # Check if continuing training from a checkpoint
    if args.model_name_or_path and os.path.exists(args.model_name_or_path):
        try:
            # set global_step to global_step of last saved checkpoint from model path
            checkpoint_suffix = args.model_name_or_path.split("-")[-1].split("/")[0]
            global_step = int(checkpoint_suffix)
            epochs_trained = global_step // (len(train_dataloader) // args.gradient_accumulation_steps)
            steps_trained_in_current_epoch = global_step % (len(train_dataloader) // args.gradient_accumulation_steps)

            logger.info("  Continuing training from checkpoint, will skip to saved global_step")
            logger.info("  Continuing training from epoch %d", epochs_trained)
            logger.info("  Continuing training from global step %d", global_step)
            logger.info("  Will skip the first %d steps in the first epoch", steps_trained_in_current_epoch)
        except ValueError:
            logger.info("  Starting fine-tuning.")

    tr_loss, logging_loss = 0.0, 0.0

    model.zero_grad()
    train_iterator = trange(
        epochs_trained, int(args.num_train_epochs), desc="Epoch", disable=args.local_rank not in [-1, 0]
    )
    set_seed(args)  # Added here for reproducibility
    for _ in train_iterator:
        epoch_iterator = tqdm(train_dataloader, desc="Iteration", disable=args.local_rank not in [-1, 0])
        for step, batch in enumerate(epoch_iterator):

            # Skip past any already trained steps if resuming training
            if steps_trained_in_current_epoch > 0:
                steps_trained_in_current_epoch -= 1
                continue

            inputs, labels = (batch, batch)
            if inputs.shape[1] > 1024: continue
            inputs = inputs.to(args.device)
            labels = labels.to(args.device)
            model.train()
            outputs = model(inputs, labels=labels)
            loss = outputs[0]  # model outputs are always tuple in transformers (see doc)

            if args.n_gpu > 1:
                loss = loss.mean()  # mean() to average on multi-gpu parallel training
            if args.gradient_accumulation_steps > 1:
                loss = loss / args.gradient_accumulation_steps

            if args.fp16:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            tr_loss += loss.item()
            if (step + 1) % args.gradient_accumulation_steps == 0:
                if args.fp16:
                    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), args.max_grad_norm)
                else:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)
                optimizer.step()
                scheduler.step()  # Update learning rate schedule
                model.zero_grad()
                global_step += 1

                if args.local_rank in [-1, 0] and args.logging_steps > 0 and global_step % args.logging_steps == 0:
                    # Log metrics
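                    if (
                        args.local_rank == -1 and args.evaluate_during_training
                    ):  # Only evaluate when single GPU otherwise metrics may not average well
                        # NOTE: evaluate() as defined below also requires df_trn and df_val, so this
                        # call raises a TypeError if evaluate_during_training is set to True.
                        results = evaluate(args, model, tokenizer)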
                        for key, value in results.items():
                            tb_writer.add_scalar("eval_{}".format(key), value, global_step)
                    tb_writer.add_scalar("lr", scheduler.get_last_lr()[0], global_step)
                    tb_writer.add_scalar("loss", (tr_loss - logging_loss) / args.logging_steps, global_step)
                    logging_loss = tr_loss

                if args.local_rank in [-1, 0] and args.save_steps > 0 and global_step % args.save_steps == 0:
                    checkpoint_prefix = "checkpoint"
                    # Save model checkpoint
                    output_dir = os.path.join(args.output_dir, "{}-{}".format(checkpoint_prefix, global_step))
                    os.makedirs(output_dir, exist_ok=True)
                    model_to_save = (
                        model.module if hasattr(model, "module") else model
                    )  # Take care of distributed/parallel training
                    model_to_save.save_pretrained(output_dir)
                    tokenizer.save_pretrained(output_dir)

                    torch.save(args, os.path.join(output_dir, "training_args.bin"))
                    logger.info("Saving model checkpoint to %s", output_dir)

                    _rotate_checkpoints(args, checkpoint_prefix)

                    torch.save(optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
                    torch.save(scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
                    logger.info("Saving optimizer and scheduler states to %s", output_dir)

            if args.max_steps > 0 and global_step > args.max_steps:
                epoch_iterator.close()
                break
        if args.max_steps > 0 and global_step > args.max_steps:
            train_iterator.close()
            break

    if args.local_rank in [-1, 0]:
        tb_writer.close()

    return global_step, tr_loss / global_step

# Evaluation of a model

def evaluate(args, model: PreTrainedModel, tokenizer: PreTrainedTokenizer, df_trn, df_val, prefix="") -> Dict:
    # Evaluate on the validation split and write the results to the output directory
    eval_output_dir = args.output_dir

    eval_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=True)
    os.makedirs(eval_output_dir, exist_ok=True)
    args.eval_batch_size = args.per_gpu_eval_batch_size * max(1, args.n_gpu)
    # Note that DistributedSampler samples randomly

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    eval_sampler = SequentialSampler(eval_dataset)
    eval_dataloader = DataLoader(
        eval_dataset, sampler=eval_sampler, batch_size=args.eval_batch_size, collate_fn=collate, drop_last = True
    )

    # multi-gpu evaluate
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    # Eval!
    logger.info("***** Running evaluation {} *****".format(prefix))
    logger.info("  Num examples = %d", len(eval_dataset))
    logger.info("  Batch size = %d", args.eval_batch_size)
    eval_loss = 0.0
    nb_eval_steps = 0
    model.eval()

    for batch in tqdm(eval_dataloader, desc="Evaluating"):
        inputs, labels = (batch, batch)
        inputs = inputs.to(args.device)
        labels = labels.to(args.device)

        with torch.no_grad():
            outputs = model(inputs, labels=labels)
            lm_loss = outputs[0]
            eval_loss += lm_loss.mean().item()
        nb_eval_steps += 1

    eval_loss = eval_loss / nb_eval_steps
    perplexity = torch.exp(torch.tensor(eval_loss))

    result = {"perplexity": perplexity}

    output_eval_file = os.path.join(eval_output_dir, prefix, "eval_results.txt")
    with open(output_eval_file, "w") as writer:
        logger.info("***** Eval results {} *****".format(prefix))
        for key in sorted(result.keys()):
            logger.info("  %s = %s", key, str(result[key]))
            writer.write("%s = %s\n" % (key, str(result[key])))

    return result
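
The perplexity reported by `evaluate` is the exponential of the mean evaluation loss. A minimal sketch of that calculation with made-up loss values (these are not results from this notebook); `perplexity_from_batch_losses` is a hypothetical helper name.

```python
import math

def perplexity_from_batch_losses(batch_losses):
    """exp of the average per-batch loss, mirroring evaluate() above."""
    mean_loss = sum(batch_losses) / len(batch_losses)
    return math.exp(mean_loss)

# Made-up loss values for illustration only.
losses = [2.0, 2.5, 3.0]
ppl = perplexity_from_batch_losses(losses)
print(ppl)   # exp(2.5), roughly 12.18
```

A loss of zero would give a perplexity of 1, the best possible value. Since the average is taken over batches rather than over tokens, batches of different lengths contribute equally to the result.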

In [None]:
def main(df_trn, df_val):
    args = Args()

    if args.should_continue:
        sorted_checkpoints = _sorted_checkpoints(args)
        if len(sorted_checkpoints) == 0:
            raise ValueError("Used --should_continue but no checkpoint was found in --output_dir.")
        else:
            args.model_name_or_path = sorted_checkpoints[-1]

    if (
        os.path.exists(args.output_dir)
        and os.listdir(args.output_dir)
        and args.do_train
        and not args.overwrite_output_dir
        and not args.should_continue
    ):
        raise ValueError(
            "Output directory ({}) already exists and is not empty. Use --overwrite_output_dir to overcome.".format(
                args.output_dir
            )
        )

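    # Setup CUDA, GPU & distributed training
    # Fall back to CPU when CUDA is unavailable or args.no_cuda is set
    use_cuda = torch.cuda.is_available() and not args.no_cuda
    device = torch.device("cuda" if use_cuda else "cpu")
    args.n_gpu = torch.cuda.device_count() if use_cuda else 0
    args.device = device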

    # Setup logging
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN,
    )
    logger.warning(
        "Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s",
        args.local_rank,
        device,
        args.n_gpu,
        bool(args.local_rank != -1),
        args.fp16,
    )

    # Set seed
    set_seed(args)

    config = AutoConfig.from_pretrained(args.config_name, cache_dir=args.cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
    model = AutoModelWithLMHead.from_pretrained(
        args.model_name_or_path,
        from_tf=False,
        config=config,
        cache_dir=args.cache_dir,
    )
    model.to(args.device)

    logger.info("Training/evaluation parameters %s", args)

    # Training
    if args.do_train:
        train_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False)

        global_step, tr_loss = train(args, train_dataset, model, tokenizer)
        logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

    # Saving best-practices: if you use save_pretrained for the model and tokenizer, you can reload them using from_pretrained()
    if args.do_train:
        # Create output directory if needed
        os.makedirs(args.output_dir, exist_ok=True)

        logger.info("Saving model checkpoint to %s", args.output_dir)
        # Save a trained model, configuration and tokenizer using `save_pretrained()`.
        # They can then be reloaded using `from_pretrained()`
        model_to_save = (
            model.module if hasattr(model, "module") else model
        )  # Take care of distributed/parallel training
        model_to_save.save_pretrained(args.output_dir)
        tokenizer.save_pretrained(args.output_dir)

        # Good practice: save your training arguments together with the trained model
        torch.save(args, os.path.join(args.output_dir, "training_args.bin"))

        # Load a trained model and vocabulary that you have fine-tuned
        model = AutoModelWithLMHead.from_pretrained(args.output_dir)
        tokenizer = AutoTokenizer.from_pretrained(args.output_dir)
        model.to(args.device)

    # Evaluation
    results = {}
    if args.do_eval and args.local_rank in [-1, 0]:
        checkpoints = [args.output_dir]
        if args.eval_all_checkpoints:
            checkpoints = list(
                os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + "/**/" + WEIGHTS_NAME, recursive=True))
            )
            logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN)  # Reduce logging
        logger.info("Evaluate the following checkpoints: %s", checkpoints)
        for checkpoint in checkpoints:
            global_step = checkpoint.split("-")[-1] if len(checkpoints) > 1 else ""
            prefix = checkpoint.split("/")[-1] if checkpoint.find("checkpoint") != -1 else ""

            model = AutoModelWithLMHead.from_pretrained(checkpoint)
            model.to(args.device)
            result = evaluate(args, model, tokenizer, df_trn, df_val, prefix=prefix)
            result = dict((k + "_{}".format(global_step), v) for k, v in result.items())
            results.update(result)

    return results
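
The keys of the dictionary returned by `main` depend on how the checkpoint path is split. The sketch below mirrors that string handling in isolation; `result_key` is a hypothetical helper, and the paths are built from the `output_dir` configured above.

```python
def result_key(checkpoint, n_checkpoints, metric="perplexity"):
    """Mirror how main() derives the suffix for each metric name."""
    global_step = checkpoint.split("-")[-1] if n_checkpoints > 1 else ""
    return "{}_{}".format(metric, global_step)

output_dir = "output-small-50-20-epochs"

# Default settings: only the final output directory is evaluated.
print(result_key(output_dir, n_checkpoints=1))                       # perplexity_

# With several checkpoints, each checkpoint directory gets its step number.
print(result_key(output_dir + "/checkpoint-3500", n_checkpoints=2))  # perplexity_3500

# The output directory itself ends in "-epochs", so its suffix is not a number.
print(result_key(output_dir, n_checkpoints=2))                       # perplexity_epochs
```

With the defaults in `Args` (`eval_all_checkpoints = False`), the only key is `perplexity_`.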

In [None]:
main(df_trn, df_val)


02/22/2022 07:59:25 - INFO - __main__ -   Training/evaluation parameters <__main__.Args object at 0x7ff38d679d10>
02/22/2022 07:59:25 - INFO - __main__ -   Creating features from dataset file at cached
02/22/2022 08:01:32 - INFO - __main__ -   Saving features into cached file cached/gpt2_cached_lm_512
02/22/2022 08:01:32 - INFO - __main__ -   ***** Running training *****
02/22/2022 08:01:32 - INFO - __main__ -     Num examples = 39994
02/22/2022 08:01:32 - INFO - __main__ -     Num Epochs = 20
02/22/2022 08:01:32 - INFO - __main__ -     Instantaneous batch size per GPU = 4
02/22/2022 08:01:32 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 4
02/22/2022 08:01:32 - INFO - __main__ -     Gradient Accumulation steps = 1
02/22/2022 08:01:32 - INFO - __main__ -     Total optimization steps = 199960


Epoch:   0%|          | 0/20 [00:00<?, ?it/s]

Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 08:12:46 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-3500
02/22/2022 08:12:57 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-3500
02/22/2022 08:24:08 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-7000
02/22/2022 08:24:20 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-7000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 08:35:28 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-10500
02/22/2022 08:35:42 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-10500
02/22/2022 08:46:49 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-14000
02/22/2022 08:47:03 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-14000
02/22/2022 08:58:10 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-17500
02/22/2022 08:58:15 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-17500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 09:09:22 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-21000
02/22/2022 09:09:27 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-21000
02/22/2022 09:20:33 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-24500
02/22/2022 09:20:36 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-24500
02/22/2022 09:31:38 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-28000
02/22/2022 09:31:43 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-28000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 09:42:47 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-31500
02/22/2022 09:42:51 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-31500
02/22/2022 09:53:56 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-35000
02/22/2022 09:53:59 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-35000
02/22/2022 10:05:04 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-38500
02/22/2022 10:05:08 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-38500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 10:16:11 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-42000
02/22/2022 10:16:14 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-42000
02/22/2022 10:27:19 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-45500
02/22/2022 10:27:22 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-45500
02/22/2022 10:38:28 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-49000
02/22/2022 10:38:31 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-49000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 10:49:33 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-52500
02/22/2022 10:49:36 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-52500
02/22/2022 11:00:42 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-56000
02/22/2022 11:00:45 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-56000
02/22/2022 11:11:49 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-59500
02/22/2022 11:11:54 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-59500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 11:22:59 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-63000
02/22/2022 11:23:03 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-63000
02/22/2022 11:34:08 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-66500
02/22/2022 11:34:12 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-66500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 11:45:17 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-70000
02/22/2022 11:45:20 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-70000
02/22/2022 11:56:25 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-73500
02/22/2022 11:56:30 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-73500
02/22/2022 12:07:35 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-77000
02/22/2022 12:07:38 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-77000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 12:18:40 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-80500
02/22/2022 12:18:44 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-80500
02/22/2022 12:29:49 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-84000
02/22/2022 12:29:52 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-84000
02/22/2022 12:40:56 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-87500
02/22/2022 12:40:59 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-87500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 12:52:08 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-91000
02/22/2022 12:52:12 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-91000
02/22/2022 13:03:16 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-94500
02/22/2022 13:03:19 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-94500
02/22/2022 13:14:22 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-98000
02/22/2022 13:14:25 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-98000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 13:25:28 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-101500
02/22/2022 13:25:32 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-101500
02/22/2022 13:36:39 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-105000
02/22/2022 13:36:44 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-105000
02/22/2022 13:47:50 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-108500
02/22/2022 13:47:53 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-108500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 13:58:58 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-112000
02/22/2022 13:59:01 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-112000
02/22/2022 14:10:07 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-115500
02/22/2022 14:10:10 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-115500
02/22/2022 14:21:15 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-119000
02/22/2022 14:21:19 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-119000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 14:32:23 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-122500
02/22/2022 14:32:27 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-122500
02/22/2022 14:43:32 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-126000
02/22/2022 14:43:36 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-126000
02/22/2022 14:54:40 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-129500
02/22/2022 14:54:45 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-129500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 15:05:51 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-133000
02/22/2022 15:05:54 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-133000
02/22/2022 15:16:58 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-136500
02/22/2022 15:17:01 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-136500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 15:28:05 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-140000
02/22/2022 15:28:10 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-140000
02/22/2022 15:39:13 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-143500
02/22/2022 15:39:17 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-143500
02/22/2022 15:50:24 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-147000
02/22/2022 15:50:28 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-147000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 16:01:35 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-150500
02/22/2022 16:01:40 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-150500
02/22/2022 16:12:40 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-154000
02/22/2022 16:12:45 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-154000
02/22/2022 16:23:50 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-157500
02/22/2022 16:23:53 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-157500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 16:35:00 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-161000
02/22/2022 16:35:03 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-161000
02/22/2022 16:46:09 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-164500
02/22/2022 16:46:12 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-164500
02/22/2022 16:57:15 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-168000
02/22/2022 16:57:19 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-168000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 17:08:28 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-171500
02/22/2022 17:08:31 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-171500
02/22/2022 17:19:33 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-175000
02/22/2022 17:19:37 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-175000
02/22/2022 17:30:42 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-178500
02/22/2022 17:30:45 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-178500


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 17:41:50 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-182000
02/22/2022 17:41:53 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-182000
02/22/2022 17:52:59 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-185500
02/22/2022 17:53:04 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-185500
02/22/2022 18:04:08 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-189000
02/22/2022 18:04:11 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-189000


Iteration:   0%|          | 0/9998 [00:00<?, ?it/s]

02/22/2022 18:15:16 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-192500
02/22/2022 18:15:19 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-192500
02/22/2022 18:26:27 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-196000
02/22/2022 18:26:30 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-196000
02/22/2022 18:37:34 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs/checkpoint-199500
02/22/2022 18:37:37 - INFO - __main__ -   Saving optimizer and scheduler states to output-small-50-20-epochs/checkpoint-199500
02/22/2022 18:39:04 - INFO - __main__ -    global_step = 199960, average loss = 0.6357949749282877
02/22/2022 18:39:04 - INFO - __main__ -   Saving model checkpoint to output-small-50-20-epochs
02/22/2022 18:39:08 - INFO - __main__ -   Evaluate the following checkpoints: ['output

Evaluating:   0%|          | 0/2499 [00:00<?, ?it/s]

02/22/2022 18:42:16 - INFO - __main__ -   ***** Eval results  *****
02/22/2022 18:42:16 - INFO - __main__ -     perplexity = tensor(1.1191)


{'perplexity_': tensor(1.1191)}
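
The figures in the log above can be cross-checked with a little arithmetic. The sketch below is not part of the training code. It assumes the checkpoint interval is 3500 steps (inferred from the checkpoint names) and that `evaluate` reports perplexity as the exponential of the mean evaluation loss, which is the usual definition but is not shown in this section.

```python
import math

# Values copied from the training log above.
num_examples = 39994
per_gpu_batch_size = 4
num_epochs = 20
save_steps = 3500  # assumption: inferred from checkpoint-3500, checkpoint-7000, ...

# 39994 / 4 is not a whole number; the progress bars show 9998 iterations,
# which is consistent with the final partial batch being dropped.
steps_per_epoch = num_examples // per_gpu_batch_size
total_steps = steps_per_epoch * num_epochs

# Number of periodic checkpoints and the step of the last one.
num_checkpoints = total_steps // save_steps
last_checkpoint = num_checkpoints * save_steps

print(steps_per_epoch, total_steps, num_checkpoints, last_checkpoint)

# Assumption: perplexity = exp(mean eval loss), so the loss can be recovered.
reported_perplexity = 1.1191
implied_eval_loss = math.log(reported_perplexity)
print(round(implied_eval_loss, 4))
```

The first line of output reproduces the 9998 iterations per epoch, the 199960 total optimization steps, and the final periodic checkpoint at step 199500 seen in the log.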

# **CHATTING WITH THE MODEL**



In [None]:
tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small')
model = AutoModelWithLMHead.from_pretrained('output-small-50-20-epochs')

# Let's chat for 6 turns
for step in range(6):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')
    # print(new_user_input_ids)

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    # (top_p and top_k only take effect when do_sample=True; without it, decoding is greedy)
    chat_history_ids = model.generate(
        bot_input_ids, max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
        top_p=0.92, top_k=50
    )

    # pretty-print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))



>> User:What's your favourite company?
DialoGPT:  I like Amazon. I think it's so cool that they started as an online bookstore.
>> User:Know any science facts?
DialoGPT:  I know that they have a publishing arm and a publishing arm.
>> User:Do you like science?
DialoGPT:  I do. I am a fan of the field. I am not a scientist though.
>> User:Know any sports teams?
DialoGPT:  I do. I am a fan of the NBA.
>> User:What's your favourite basketball team?
DialoGPT: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
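
The last reply above degenerates into a run of `!`. The notebook does not diagnose the cause. One variation sometimes tried in multi-turn loops like this one is to cap how many past turns are fed back to the model. The following is a minimal sketch on plain Python lists of token IDs, not the author's method: `trim_history`, `max_turns`, and the example IDs are hypothetical, while 50256 is the end-of-text token ID used by GPT-2 and DialoGPT.

```python
from typing import List

EOS_ID = 50256  # GPT-2 / DialoGPT end-of-text token ID


def trim_history(token_ids: List[int], max_turns: int, eos_id: int = EOS_ID) -> List[int]:
    """Keep only the last `max_turns` EOS-terminated turns of a flat ID list."""
    turns, current = [], []
    for tok in token_ids:
        current.append(tok)
        if tok == eos_id:
            # An EOS token closes the current turn.
            turns.append(current)
            current = []
    if current:
        # Trailing tokens without a closing EOS are kept as a final turn.
        turns.append(current)
    kept = turns[-max_turns:] if max_turns > 0 else []
    return [tok for turn in kept for tok in turn]


# Three made-up turns; keep only the two most recent ones.
history = [11, 12, EOS_ID, 21, EOS_ID, 31, 32, 33, EOS_ID]
print(trim_history(history, max_turns=2))
```

In the chat loop, the helper could be applied to `chat_history_ids[0].tolist()` and the result wrapped back into a tensor with `torch.tensor([...])` before the next `torch.cat`. Whether this avoids the degenerate output here is untested.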

# **PUSHING TO HUGGING FACE**

In [None]:
!sudo apt-get install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-470
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 39 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 0s (17.5 MB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-pr

In [None]:
!git config --global user.email "#####"
!git config --global user.name "#####"

In [None]:
MY_MODEL_NAME = 'DialoGPT-small-sean'
HUGGINGFACE_API_KEY = '#####'

In [None]:
model.push_to_hub(MY_MODEL_NAME, use_auth_token=HUGGINGFACE_API_KEY)
tokenizer.push_to_hub(MY_MODEL_NAME, use_auth_token=HUGGINGFACE_API_KEY)

Cloning https://huggingface.co/satkinson/DialoGPT-small-sean into local empty directory.


Upload file pytorch_model.bin:   0%|          | 3.38k/487M [00:00<?, ?B/s]

To https://huggingface.co/satkinson/DialoGPT-small-sean
   972d367..94986c3  main -> main

To https://huggingface.co/satkinson/DialoGPT-small-sean
   94986c3..d5f7d00  main -> main



'https://huggingface.co/satkinson/DialoGPT-small-sean/commit/d5f7d00a4f7e16e8d38b79da08475ea572ea78a3'
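
The cell that sets `HUGGINGFACE_API_KEY` stores the token as a string literal in the notebook. A possible alternative, sketched below under assumptions, is to read it from an environment variable so that it never appears in a saved or shared copy. The helper `read_hub_token` and the choice of environment variable name are hypothetical; only the standard library is used.

```python
import os


def read_hub_token(env_var: str = "HUGGINGFACE_API_KEY") -> str:
    """Return the Hub token from the environment, or raise a clear error."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before pushing.")
    return token


# Dummy value so the sketch runs anywhere; a real token should be set
# outside the notebook and never committed.
os.environ["HUGGINGFACE_API_KEY"] = "hf_dummy_token_for_illustration"
HUGGINGFACE_API_KEY = read_hub_token()
print(HUGGINGFACE_API_KEY.startswith("hf_"))
```

The resulting value can then be passed as `use_auth_token=HUGGINGFACE_API_KEY`, exactly as in the `push_to_hub` calls above.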