# Funnybot's Generator Models

## Overview

This notebook builds and evaluates models used by Funnybot to generate jokes. 

I'm using the [GPT2 transformers](https://huggingface.co/gpt2) as well as other utilities from [Hugging Face](https://huggingface.co) to train and evaluate the models in this project.

Because of my current hardware constraints and my reluctance to spend hard-earned money on cloud GPUs (in other words, I'm a cheapskate), these models do not generate incredible jokes. Still, I hope these notebooks will help someone struggling to find a straightforward and complete tutorial on text generation on the web.

The models from Hugging Face are pre-trained: they have already been optimized to model natural language and to generate natural language text. In this notebook we are going to fine-tune them to generate text from a particular class, i.e., funny jokes.

## Dependencies

In [1]:
from datasets import Dataset

import evaluate
from evaluate import evaluator
from evaluate import load

import pandas as pd

from pathlib import Path

import profanity_check

import string

import shutil

import torch
from torch import nn
from torch.utils.data import DataLoader

from tqdm.auto import tqdm

from transformers import PreTrainedTokenizer, DataCollatorWithPadding, GenerationConfig, TextGenerationPipeline
from transformers import GPT2LMHeadModel, GPT2Config, GPT2Tokenizer
from torch.optim import AdamW
from transformers import get_scheduler

This notebook requires the native dependencies from the "Comedy Club" project. You may install them by running:

```
cd <"Comedy Club" project root>
pip install .
```

Additionally, the dependencies defined in [requirements-dev.txt](../requirements-dev.txt) are required. You may install them by running:

```
pip install -r requirements-dev.txt
```

## Raw Data
### Dad Jokes

The following normalization will be applied to all jokes:

- Remove non-ASCII characters (with the assumption that jokes are always in English).
- Remove some non-standard punctuation characters.
- Remove excessive spacing.


That's just to generate a nicer output for the application in our project. The GPT2 models don't seem to mind these characters, and I haven't noticed any difference in the quality of the results.

In [2]:
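def normalize_sentence(row):
    # Punctuation to strip: everything except ".", "-" and "'".
    characters_to_remove = string.punctuation.replace(".", "").replace("-", "").replace("'", "")

    # Drop non-ASCII characters (this also removes curly apostrophes).
    text = row["text"].encode('ascii', errors='ignore').decode()
    # Collapse runs of whitespace into single spaces.
    text = " ".join(text.split()).strip()
    # Note: str.split() treats its argument as one multi-character separator, so this
    # only matches the exact punctuation sequence and leaves single marks such as "?" in place.
    text = " ".join(text.split(characters_to_remove)).strip()

    return text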

data_dir = save_dir = Path().absolute().parent.parent / "data"

dad_jokes_df = pd.read_csv(data_dir / "dad-jokes.csv")
dad_jokes_df["type"] = "Dad Jokes"
dad_jokes_df = dad_jokes_df.rename(columns={"Joke": "text"})

dad_jokes_df["text"] = dad_jokes_df.apply(normalize_sentence, axis=1)

dad_jokes_df

|     | text | type |
|----:|:-----|:-----|
| 0 | I'm tired of following my dreams. I'm just goi... | Dad Jokes |
| 1 | Did you hear about the guy whose whole left si... | Dad Jokes |
| 2 | Why didnt the skeleton cross the road? Because... | Dad Jokes |
| 3 | What did one nut say as he chased another nut?... | Dad Jokes |
| 4 | Where do fish keep their money? In the riverbank | Dad Jokes |
| ... | ... | ... |
| 738 | What do you call a guy lying on your doorstep?... | Dad Jokes |
| 739 | I met this girl on a dating site and, I don't ... | Dad Jokes |
| 740 | What did the calculator say to the student? Yo... | Dad Jokes |
| 741 | What do you call a gorilla wearing headphones?... | Dad Jokes |


Assuming that the jokes are always in the English language, I decided to encode the text to ASCII to remove non-standard punctuation characters from the text.
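
To make the effect of that choice concrete, here is a minimal sketch of what ASCII encoding with `errors='ignore'` does to a curly apostrophe, which is why the outputs above show "didnt". It also shows one possible alternative that this notebook does not use: translating typographic quotes to their ASCII equivalents first. The helper names `to_ascii` and `to_ascii_keep_quotes` and the replacement table are my own assumptions, made up for this illustration.

```python
# Hypothetical helpers for illustration; they are not part of the notebook's pipeline.

# Assumed mapping of a few typographic characters to ASCII equivalents.
TYPOGRAPHIC_TO_ASCII = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (curly apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def to_ascii(text: str) -> str:
    """Drop every non-ASCII character, then collapse runs of whitespace."""
    ascii_text = text.encode("ascii", errors="ignore").decode()
    return " ".join(ascii_text.split())


def to_ascii_keep_quotes(text: str) -> str:
    """Same as to_ascii, but translate typographic quotes to ASCII first."""
    return to_ascii(text.translate(TYPOGRAPHIC_TO_ASCII))


joke = "Why didn\u2019t the skeleton cross   the road?"
print(to_ascii(joke))              # Why didnt the skeleton cross the road?
print(to_ascii_keep_quotes(joke))  # Why didn't the skeleton cross the road?
```

Whether the extra translation step is worth it depends on how much you care about apostrophes in the generated jokes; the models in this notebook were trained without it.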

We will initialize our jokes dataset with "Dad Jokes":

In [3]:
jokes_df = dad_jokes_df

### Question/Answer Jokes

In [4]:
def combine_question_and_answer(row):
    return " ".join([str(row["Question"]).replace("Q:", ""), str(row["Answer"]).replace("A:", "")])

qa_jokes_df = pd.read_csv(data_dir / "question-answer-jokes.csv")
qa_jokes_df["type"] = "Question Answer Jokes"
qa_jokes_df["text"] = qa_jokes_df.apply(combine_question_and_answer, axis=1)
qa_jokes_df["text"] = qa_jokes_df.apply(normalize_sentence, axis=1)
qa_jokes_df = qa_jokes_df.drop(columns=["ID", "Question", "Answer"])

qa_jokes_df.sort_values("text", ascending=False)

|     | type | text |
|----:|:-----|:-----|
| 32258 | Question Answer Jokes | you met the short guy who came out of the cupb... |
| 16162 | Question Answer Jokes | you know why Santa sack is so big? because he ... |
| 12611 | Question Answer Jokes | you know who makes the best cocoa? paedophiles |
| 2452 | Question Answer Jokes | you know what would be cool ? " an ice cube ..... |
| 18938 | Question Answer Jokes | you know what really turns on a nerd? unprotec... |
| ... | ... | ... |
| 33029 | Question Answer Jokes | "What is your greatest strength"? Brevity. |
| 21341 | Question Answer Jokes | "What do you call someone who makes cakes in S... |
| 14407 | Question Answer Jokes | "HUGE for an Asian" slogan stupid or funny? Ye... |
| 223 | Question Answer Jokes | "Did you hear about the $3,000,000 Maryland St... |
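
One detail worth knowing about `combine_question_and_answer`: `str.replace("Q:", "")` removes every occurrence of the marker, not just the one at the start of the string. The sketch below is a possible variant that only strips a leading marker. The function names `strip_marker` and `combine` are hypothetical, and I'm assuming the markers, when present, appear at the very beginning of each field.

```python
def strip_marker(text: str, marker: str) -> str:
    """Remove `marker` only when it starts the text, then trim whitespace."""
    text = text.strip()
    if text.startswith(marker):
        text = text[len(marker):]
    return text.strip()


def combine(question: str, answer: str) -> str:
    """Join a question and its answer into a single joke string."""
    return " ".join([strip_marker(question, "Q:"), strip_marker(answer, "A:")])


print(combine("Q: What is your IQ: high or low?", "A: Yes."))
# What is your IQ: high or low? Yes.
```

With plain `replace`, the "Q:" inside "IQ:" would have been removed as well.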


We now add the question/answer jokes to the jokes dataset:

In [5]:
jokes_df = pd.concat([qa_jokes_df, dad_jokes_df], axis=0)

In [6]:
model_types = jokes_df["type"].unique().tolist()

model_types

['Question Answer Jokes', 'Dad Jokes']

In [7]:
model_type = model_types[1]

model_type

'Dad Jokes'

In [8]:
jokes_df = jokes_df[jokes_df["type"] == model_type]

jokes_df

|     | type | text |
|----:|:-----|:-----|
| 0 | Dad Jokes | I'm tired of following my dreams. I'm just goi... |
| 1 | Dad Jokes | Did you hear about the guy whose whole left si... |
| 2 | Dad Jokes | Why didnt the skeleton cross the road? Because... |
| 3 | Dad Jokes | What did one nut say as he chased another nut?... |
| 4 | Dad Jokes | Where do fish keep their money? In the riverbank |
| ... | ... | ... |
| 738 | Dad Jokes | What do you call a guy lying on your doorstep?... |
| 739 | Dad Jokes | I met this girl on a dating site and, I don't ... |
| 740 | Dad Jokes | What did the calculator say to the student? Yo... |
| 741 | Dad Jokes | What do you call a gorilla wearing headphones?... |


The same process may be performed for any other specialty jokes category. We may add a new type of joke to our dataset and train a specific transformer for it.

We don't want any profanity in our dataset given that we are aiming for a "clean" bot, so we remove the offending jokes from the dataset:

In [9]:
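def check_profanity(row):
    # profanity_check.predict returns 1 for offensive text and 0 otherwise.
    return profanity_check.predict([row["text"]])[0] > 0

# Work on a copy: jokes_df is a filtered selection of the combined DataFrame.
jokes_df = jokes_df.copy()
jokes_df["profanity"] = jokes_df.apply(check_profanity, axis=1)

# Keep only the clean jokes and renumber the rows.
jokes_df = jokes_df[~jokes_df["profanity"]].reset_index(drop=True)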

jokes_df

|     | type | text | profanity |
|----:|:-----|:-----|:----------|
| 0 | Dad Jokes | I'm tired of following my dreams. I'm just goi... | False |
| 1 | Dad Jokes | Did you hear about the guy whose whole left si... | False |
| 2 | Dad Jokes | Why didnt the skeleton cross the road? Because... | False |
| 3 | Dad Jokes | Where do fish keep their money? In the riverbank | False |
| 4 | Dad Jokes | I accidentally took my cats meds last night. D... | False |
| ... | ... | ... | ... |
| 725 | Dad Jokes | What do you call a guy lying on your doorstep?... | False |
| 726 | Dad Jokes | I met this girl on a dating site and, I don't ... | False |
| 727 | Dad Jokes | What did the calculator say to the student? Yo... | False |
| 728 | Dad Jokes | What do you call a gorilla wearing headphones?... | False |


This cell should be executed only for development purposes. It truncates the dataset (before it is split into "train" and "test") to shorten training time:

In [10]:
#%%script false --no-raise-error

max_dataset_size = 500

jokes_df = jokes_df[:max_dataset_size]

jokes_df

|     | type | text | profanity |
|----:|:-----|:-----|:----------|
| 0 | Dad Jokes | I'm tired of following my dreams. I'm just goi... | False |
| 1 | Dad Jokes | Did you hear about the guy whose whole left si... | False |
| 2 | Dad Jokes | Why didnt the skeleton cross the road? Because... | False |
| 3 | Dad Jokes | Where do fish keep their money? In the riverbank | False |
| 4 | Dad Jokes | I accidentally took my cats meds last night. D... | False |
| ... | ... | ... | ... |
| 495 | Dad Jokes | A man walked in to a bar with some asphalt on ... | False |
| 496 | Dad Jokes | Did you know the first French fries weren't ac... | False |
| 497 | Dad Jokes | Ill tell you something about German sausages, ... | False |
| 498 | Dad Jokes | Where did Captain Hook get his hook? From a se... | False |


## Creating Datasets

Now it's time to create datasets that can be consumed by the transformers. We will also split our dataset into two datasets, for training and evaluation:

In [11]:
full_dataset = Dataset.from_pandas(jokes_df)

raw_datasets = full_dataset.train_test_split(test_size=0.3)

raw_datasets

DatasetDict({
    train: Dataset({
        features: ['type', 'text', 'profanity'],
        num_rows: 350
    })
    test: Dataset({
        features: ['type', 'text', 'profanity'],
        num_rows: 150
    })
})
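
The split above is random, so the jokes that end up in "train" and "test" change on every run. As far as I know, `train_test_split` also accepts a `seed` argument to make the split repeatable. The sketch below shows the idea in plain Python rather than with the `datasets` API; the function `split_rows` and the seed value are assumptions for illustration only.

```python
import random


def split_rows(rows, test_size=0.3, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(500))  # stand-in for the 500 jokes kept above
train, test = split_rows(rows)
print(len(train), len(test))  # 350 150
```

Calling `split_rows` again with the same seed returns exactly the same two lists, which makes results from different training runs easier to compare.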

We need to wrap every text with a start and end token:

In [12]:
def wrap_text(example):
    example["text"] = "<|startoftext|>" + example["text"] + "<|endoftext|>"
    return example

raw_datasets["train"] = raw_datasets["train"].map(wrap_text)

raw_datasets["train"]["text"][:2]

Map:   0%|          | 0/350 [00:00<?, ? examples/s]

['<|startoftext|>I never wanted to believe that my Dad was stealing from his job as a road worker. But when I got home, all the signs were there.<|endoftext|>',
 '<|startoftext|>This furniture store keeps emailing me, all I wanted was one night stand!<|endoftext|>']

## Creating a Tokenizer

As for the checkpoint, we are going to use [gpt2](https://huggingface.co/gpt2), which is suitable for text generation and small enough for the purposes of this challenge (development).

Hugging Face makes available the following GPT2 checkpoints for transformers:

- gpt2 (137M parameters)
- gpt2-medium (380M parameters)
- gpt2-large (821M parameters)
- gpt2-xl (1.5B parameters)

Even gpt2-medium was challenging for my local computer, so we will stick to "gpt2" for the purposes of this project.

In [13]:
checkpoint = "gpt2"

The tokenizer we are going to use is the following (suitable for our generator). We first check it on a sample sentence, and then use it to tokenize our jokes dataset:

In [14]:
tokenizer = GPT2Tokenizer.from_pretrained(
    checkpoint,
    bos_token="<|startoftext|>",
    eos_token="<|endoftext|>",
    padding=True,
    pad_token="<|pad|>",
    padding_side="left"
)

inputs = tokenizer("<|startoftext|>This is my sentence.<|endoftext|><|pad|>")
tokenizer.convert_ids_to_tokens(inputs["input_ids"])

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


['<|startoftext|>',
 'This',
 'Ġis',
 'Ġmy',
 'Ġsentence',
 '.',
 '<|endoftext|>',
 '<|pad|>']

>***Note:***
>
>*The warning is expected. We did add new tokens to the vocabulary (beginning/ending of sentence, and padding tokens) and we are going to fine-tune the word embeddings when we train the model.*

In [15]:
def tokenize_function(example):
    return tokenizer(example["text"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

tokenized_datasets["train"]

Map:   0%|          | 0/350 [00:00<?, ? examples/s]

Map:   0%|          | 0/150 [00:00<?, ? examples/s]

Dataset({
    features: ['type', 'text', 'profanity', 'input_ids', 'attention_mask'],
    num_rows: 350
})

In [16]:
tokenized_datasets["test"]

Dataset({
    features: ['type', 'text', 'profanity', 'input_ids', 'attention_mask'],
    num_rows: 150
})

The original feature columns cannot be used for training, so they will be removed. We also change the format to "torch":

In [17]:
tokenized_datasets = tokenized_datasets.remove_columns(["text", "type", "profanity"])
tokenized_datasets.set_format("torch")

tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 350
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask'],
        num_rows: 150
    })
})

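According to [this Hugging Face forum thread](https://discuss.huggingface.co/t/shifting-ids-to-the-right-when-training-gpt-2-on-text-generation/5308), labels should be initialized with the values of the input IDs; the model shifts them by one position internally when it computes the loss: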

In [18]:
def add_labels(example):
    example["labels"] = example["input_ids"]
    return example

tokenized_datasets["train"] = tokenized_datasets["train"].map(add_labels)
tokenized_datasets["test"] = tokenized_datasets["test"].map(add_labels)

tokenized_datasets

Map:   0%|          | 0/350 [00:00<?, ? examples/s]

Map:   0%|          | 0/150 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 350
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 150
    })
})
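
If copying the input IDs into the labels looks suspicious, the toy sketch below may help. It mimics, in plain Python, the shift that the model applies internally: the prediction made at each position is compared with the token that follows it. The function `next_token_pairs` is a made-up name, and words stand in for token IDs to keep the output readable.

```python
def next_token_pairs(token_ids):
    """Pair each position's current token with the token the model should predict next."""
    inputs = token_ids[:-1]   # every token except the last one
    targets = token_ids[1:]   # every token except the first one
    return list(zip(inputs, targets))


tokens = ["<|startoftext|>", "This", "is", "funny", "<|endoftext|>"]
for current, target in next_token_pairs(tokens):
    print(f"{current!r} -> {target!r}")
```

Because the shift happens inside the model, we don't shift anything ourselves when building the dataset.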

## Creating a Data Loader

The data loader allows us to feed our dataset in batches during training. First we need to create a data collator:

In [19]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

data_collator

DataCollatorWithPadding(tokenizer=GPT2Tokenizer(name_or_path='gpt2', vocab_size=50257, model_max_length=1024, is_fast=False, padding_side='left', truncation_side='right', special_tokens={'bos_token': AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': AddedToken("<|pad|>", rstrip=False, lstrip=False, single_word=False, normalized=True)}, clean_up_tokenization_spaces=True), padding=True, max_length=None, pad_to_multiple_of=None, return_tensors='pt')

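Now we create a data loader for the training dataset. No `batch_size` is given, so the default of one example per batch applies, as the shapes in the output show: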

In [20]:
train_dataloader = DataLoader(
    tokenized_datasets["train"], shuffle=True, collate_fn=data_collator
)

# Grab a single batch to inspect its shapes.
batch = next(iter(train_dataloader))
{k: v.shape for k, v in batch.items()}

{'input_ids': torch.Size([1, 17]),
 'attention_mask': torch.Size([1, 17]),
 'labels': torch.Size([1, 17])}
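
With one example per batch, the collator never actually has to pad anything. To show what left padding would look like with larger batches, here is a plain-Python sketch. The function `pad_left` is hypothetical and is not what `DataCollatorWithPadding` runs internally; padding the labels with -100 (the index PyTorch's cross-entropy loss ignores by default) is my assumption about how one would handle labels in that case.

```python
def pad_left(sequences, pad_id, ignore_index=-100):
    """Left-pad token ID lists to a common length; build attention masks and labels."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask, labels = [], [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        # 0 marks padding positions the model should not attend to.
        attention_mask.append([0] * n_pad + [1] * len(seq))
        # Padding positions are excluded from the loss.
        labels.append([ignore_index] * n_pad + list(seq))
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch_example = pad_left([[11, 12, 13], [21]], pad_id=0)
print(batch_example["input_ids"])       # [[11, 12, 13], [0, 0, 21]]
print(batch_example["attention_mask"])  # [[1, 1, 1], [0, 0, 1]]
print(batch_example["labels"])          # [[11, 12, 13], [-100, -100, 21]]
```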

## Training our Model

Given that our purpose is text generation, [GPT2LMHeadModel](https://huggingface.co/docs/transformers/v4.31.0/en/model_doc/gpt2#transformers.GPT2LMHeadModel) is suitable for the job.

The following might seem odd, but it's the [way recommended by Hugging Face](https://huggingface.co/docs/transformers/generation_strategies). We need to save a pre-trained model to a temporary directory, save our generation configuration next to it, and then load the model from the temporary directory.

In [21]:
model = GPT2LMHeadModel.from_pretrained(checkpoint)
temp_model_dir = "/tmp/cached_gpt2_model"
shutil.rmtree(temp_model_dir, ignore_errors=True)
model.save_pretrained(temp_model_dir)

configuration = GenerationConfig(
    max_new_tokens=100,
    min_new_tokens=10,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
    padding_side=tokenizer.padding_side
)
configuration.save_pretrained(temp_model_dir)

model = GPT2LMHeadModel.from_pretrained(temp_model_dir)
model.resize_token_embeddings(len(tokenizer))

outputs = model(**batch)
outputs[:2]

(tensor(77.5884, grad_fn=<NllLossBackward0>),
 tensor([[[ -36.9449,  -36.4698,  -40.5080,  ...,  -37.1380,   -1.3210,
             -2.5912],
          [ -99.8990, -100.1197, -103.9894,  ..., -101.4061,   -3.2685,
             -7.5770],
          [ -78.9669,  -78.4222,  -82.3988,  ...,  -79.1327,   -2.7331,
             -6.0738],
          ...,
          [ -19.0832,  -18.6768,  -23.9441,  ...,  -21.0065,   -0.2918,
             -0.8426],
          [ -96.6570,  -95.2647,  -97.4394,  ...,  -93.1870,   -2.0474,
             -6.7438],
          [ -88.9241,  -81.9257,  -84.6400,  ...,  -90.6247,   -1.5910,
             -5.8709]]], grad_fn=<UnsafeViewBackward0>))
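
The generation configuration above enables sampling (`do_sample=True`) and restricts the candidates with `top_k=50` and `top_p=0.9`. The sketch below illustrates those two filters on a made-up four-word distribution. It is a simplified plain-Python version, not the library's implementation (which works on logits over the full vocabulary), and the function name `filter_top_k_top_p` is my own.

```python
def filter_top_k_top_p(probs, top_k=50, top_p=0.9):
    """Return the renormalized candidate distribution after top-k, then top-p filtering."""
    # Top-k: keep only the k most likely tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy = {"road": 0.5, "bank": 0.3, "moon": 0.15, "fridge": 0.05}
print(list(filter_top_k_top_p(toy, top_k=3, top_p=0.8)))  # ['road', 'bank']
```

The next token is then drawn at random from the surviving candidates, which is why the same prompt can produce different jokes on every call.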

To train our model using [PyTorch](https://pytorch.org/) we will require the following components:

- Optimizer (in case you are not familiar with ML, the optimizer updates the model's weights using a variant of stochastic gradient descent).
- Scheduler (a component that adjusts the learning rate over the course of the training steps).
- Tokenizer (created in previous sections).
- Data Loader (created in previous sections).

We are going to use AdamW, a variant of Adam with decoupled weight decay, as our model's optimizer:

In [22]:
optimizer = AdamW(model.parameters(), lr=5e-5)

The device used depends on your own hardware:

In [23]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

device

device(type='cpu')

Here are the parameters used for our training steps:

In [24]:
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
num_warmup_steps = int(num_training_steps * 0.1)

(num_training_steps, num_warmup_steps)

(1050, 105)

We will use 10% of our training steps for warm-up.

Here's our scheduler:

In [25]:
scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)
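
To give an intuition for what the "linear" scheduler does with these numbers (3 epochs of 350 batches, so 1050 steps, of which 105 are warm-up), here is a small sketch of the learning-rate multiplier as I understand it: it ramps up from 0 to 1 during warm-up and then decays linearly back to 0. The function `linear_schedule_factor` is a hypothetical re-implementation for illustration, not the library's code.

```python
def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Warm-up: ramp linearly from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Decay: ramp linearly from 1 down to 0 at the last step.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 5e-5
for step in (0, 52, 105, 578, 1050):
    lr = base_lr * linear_schedule_factor(step, 105, 1050)
    print(step, f"{lr:.2e}")
```

The learning rate peaks at the base value of `5e-5` right after warm-up (step 105) and reaches zero at the last training step.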

Finally, we train our model using our dataset:

In [26]:
progress_bar = tqdm(range(num_training_steps))

# Move the model to the target device once, before the loop.
model.to(device)
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)

  0%|          | 0/1050 [00:00<?, ?it/s]

> ***Note:***
> 
> *If training is taking too long or you run out of memory, and you are just running this notebook for the sake of learning, try reducing `max_dataset_size` and `num_epochs`. Both variables are defined earlier in this notebook. The performance of the models will degrade, though.*

## Testing our Model

Now we can create a pipeline that uses our model and tokenizer and make predictions:

In [27]:
pipeline = TextGenerationPipeline(
    model=model.to(device),
    tokenizer=tokenizer,
)

predictions = pipeline("", num_return_sequences=len(raw_datasets["test"]))
predictions = [prediction["generated_text"] for prediction in predictions]

(predictions[:2], len(predictions))

(["Why does the skylight make the sun look a little more blue? Because it can't make the moon look all that bad.",
  'I was afraid to open my fridge after I was told I was a vampire. I ate ice with the fridge. I have always been sad.'],
 150)

## Evaluating our Model

We need a way to evaluate the performance of our models, so we will use an evaluator:

In [28]:
task_evaluator = evaluator("text-generation")

model.eval()
evaluator_results = task_evaluator.compute(
    model_or_pipeline=model.to(device),
    tokenizer=tokenizer,
    data=raw_datasets["test"],
    input_column="text",
)

print(pd.DataFrame(evaluator_results, index=[0]).to_markdown())

|    |   total_word_count |   unique_words |   total_time_in_seconds |   samples_per_second |   latency_in_seconds |
|---:|-------------------:|---------------:|------------------------:|---------------------:|---------------------:|
|  0 |               3691 |           1108 |                 68.9528 |               2.1754 |             0.459685 |


***epochs: 3***

|    |   total_word_count |   unique_words |   total_time_in_seconds |   samples_per_second |   latency_in_seconds |
|---:|-------------------:|---------------:|------------------------:|---------------------:|---------------------:|
|  0 |               5068 |           1377 |                 148.925 |              1.47054 |             0.680024 |

***epochs: 4***

|    |   total_word_count |   unique_words |   total_time_in_seconds |   samples_per_second |   latency_in_seconds |
|---:|-------------------:|---------------:|------------------------:|---------------------:|---------------------:|
|  0 |               5105 |           1445 |                 170.742 |              1.28264 |             0.779644 |
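
For reference, the first two columns in these tables are simple counts over the generated text. The sketch below reproduces the idea in plain Python. It assumes whitespace splitting and lowercasing, which may differ from the tokenization the metric actually uses, and the function name `word_counts` is made up.

```python
def word_counts(texts):
    """Return total and unique word counts over a list of generated texts."""
    words = [word.lower() for text in texts for word in text.split()]
    return {"total_word_count": len(words), "unique_words": len(set(words))}


print(word_counts(["Why did the chicken cross the road", "The road was there"]))
# {'total_word_count': 11, 'unique_words': 8}
```

Counts like these describe how much text was produced and how varied its vocabulary is, but nothing about its quality.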

That's not extremely useful, so let's use [BERTScore](https://huggingface.co/spaces/evaluate-metric/bertscore) for evaluating text generation instead:

In [29]:
bertscore = load("bertscore")

references = raw_datasets["test"]["text"]
scorer_results = bertscore.compute(predictions=predictions, references=references, model_type="distilbert-base-uncased")
scorer_results = pd.DataFrame(scorer_results, index=[0] * len(raw_datasets["test"]))

scorer_results[:2]

|    | precision | recall | f1 | hashcode |
|---:|----------:|-------:|---:|:---------|
| 0 | 0.735906 | 0.75223 | 0.743979 | distilbert-base-uncased_L5_no-idf_version=0.3.... |
| 0 | 0.69258 | 0.682768 | 0.687639 | distilbert-base-uncased_L5_no-idf_version=0.3.... |


In [30]:
print(scorer_results.describe().to_markdown())

|       |   precision |      recall |          f1 |
|:------|------------:|------------:|------------:|
| count | 150         | 150         | 150         |
| mean  |   0.705332  |   0.701462  |   0.703114  |
| std   |   0.0370431 |   0.0402527 |   0.0360881 |
| min   |   0.611148  |   0.603059  |   0.615114  |
| 25%   |   0.683321  |   0.675502  |   0.681355  |
| 50%   |   0.702779  |   0.698414  |   0.701225  |
| 75%   |   0.731247  |   0.72337   |   0.723633  |
| max   |   0.800699  |   0.820124  |   0.801497  |


It can't tell how funny a joke is, but at least it can compute the similarity between the generated jokes and a reference dataset (in our case, the "test" dataset).
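
Roughly speaking, the score matches every token embedding in one text with its most similar token embedding in the other text and averages those similarities. The sketch below shows that greedy matching on made-up 2-dimensional vectors. The real metric uses contextual embeddings from the model named in `model_type`, so treat this as an illustration of the idea only; `cosine` and `greedy_match_scores` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(candidate, reference):
    """Precision, recall and F1 from greedy token matching, in the style of BERTScore."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy 2-dimensional "embeddings" (made up for illustration).
candidate = [[1.0, 0.0], [0.0, 1.0]]
reference = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(greedy_match_scores(candidate, reference))
```

Note that in this notebook each generated joke is compared with whichever test joke happens to share its position, so the scores say more about overall style than about any specific joke.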

Here are some results for different training parameters:

***epochs: 3 max_dataset_size: 700***

|       |   precision |      recall |          f1 |
|:------|------------:|------------:|------------:|
| count | 219         | 219         | 219         |
| mean  |   0.7054    |   0.70002   |   0.70246   |
| std   |   0.0358243 |   0.0363763 |   0.0337329 |
| min   |   0.592793  |   0.607776  |   0.600191  |
| 25%   |   0.682816  |   0.673237  |   0.678634  |
| 50%   |   0.702439  |   0.697652  |   0.700339  |
| 75%   |   0.727483  |   0.724764  |   0.723261  |
| max   |   0.828355  |   0.820289  |   0.802596  |

***epochs: 4 max_dataset_size: 700***

|       |   precision |      recall |          f1 |
|:------|------------:|------------:|------------:|
| count | 219         | 219         | 219         |
| mean  |   0.705329  |   0.699855  |   0.702345  |
| std   |   0.0428907 |   0.0411949 |   0.0399797 |
| min   |   0.560061  |   0.595935  |   0.577442  |
| 25%   |   0.675965  |   0.671475  |   0.674927  |
| 50%   |   0.700021  |   0.695283  |   0.699969  |
| 75%   |   0.728157  |   0.724505  |   0.723873  |
| max   |   0.903992  |   0.891035  |   0.897467  |

You may notice that there's no significant change in either the precision or the f1 scores when we increase the number of epochs.

The right procedure would be to try different parameters to squeeze out as much performance as possible. However, it seems we have already reached the threshold of overfitting.

A larger dataset will improve performance. Any AI model will eventually hit a performance ceiling and stop improving no matter how much data we use, but models based on neural networks tend to have a much higher ceiling than others, so we should expect to keep improving this model quite a bit by adding data.

Since this project is for demonstration purposes, though, I'm not willing to spend the time and resources to achieve peak performance.

Using my current hardware (no GPUs, 8 GB of RAM), training this model with the full 38K records from "Question Answer Jokes" would take over 40 hours... Yep, I could use a computer upgrade (even though cloud GPUs would probably be a more cost-effective alternative).

Still, I could probably do much more analysis and improve this notebook quite a lot, but I'll leave this for another weekend...

> ***TODO:***
>
>- Sometimes the kernel will die after running out of memory. The notebook could use some memory optimization (e.g., deleting unused variables, using a context manager, strategically using Jupyter's reset command).
>- Increase the dataset size, even though it would take a good number of hours to run.

## Saving our Models
### Local Environment

We are required to save our models to the directory the application is expecting.

A new folder will be created for each different joke type. Models saved under `./joke-generator/models` will be used by our project.

In [31]:
save_dir = Path().absolute().parent / "joke-generator"

model.save_pretrained(save_dir / "models" / model_type)
tokenizer.save_pretrained(save_dir / "tokenizer")

('/home/marcio/workspace/konfuzio-ai/ai-comedy-club/bots/funnybot/transformers/joke-generator/tokenizer/tokenizer_config.json',
 '/home/marcio/workspace/konfuzio-ai/ai-comedy-club/bots/funnybot/transformers/joke-generator/tokenizer/special_tokens_map.json',
 '/home/marcio/workspace/konfuzio-ai/ai-comedy-club/bots/funnybot/transformers/joke-generator/tokenizer/vocab.json',
 '/home/marcio/workspace/konfuzio-ai/ai-comedy-club/bots/funnybot/transformers/joke-generator/tokenizer/merges.txt',
 '/home/marcio/workspace/konfuzio-ai/ai-comedy-club/bots/funnybot/transformers/joke-generator/tokenizer/added_tokens.json')

### Hugging Face's Hub

You will need a Hugging Face account, and of course you will only be able to push to repositories in your own account.

In [32]:
%%script false --no-raise-error

model.push_to_hub(f"marciogualtieri/funnybot-joke-generator-model-{model_type.lower().replace(' ', '-')}")
tokenizer.push_to_hub("marciogualtieri/funnybot-joke-generator-tokenizer")

pytorch_model.bin:   0%|          | 0.00/498M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/marciogualtieri/funnybot-joke-generator-tokenizer/commit/0b5f6f3fab7922b1501dc05b35cc39cbe993ae39', commit_message='Upload tokenizer', commit_description='', oid='0b5f6f3fab7922b1501dc05b35cc39cbe993ae39', pr_url=None, pr_revision=None, pr_num=None)
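
Both saving steps derive their target from `model_type`: the local directory uses the joke type as-is, while the Hub repository name lowercases it and replaces spaces with hyphens. The sketch below isolates that naming logic so it can be checked without training anything. The helper names `repo_slug` and `model_dir` are hypothetical, and the relative `joke-generator` path is only used for this example.

```python
from pathlib import Path


def repo_slug(model_type: str) -> str:
    """Turn a joke type such as 'Dad Jokes' into a URL-friendly name."""
    return model_type.lower().replace(" ", "-")


def model_dir(save_dir: Path, model_type: str) -> Path:
    """Directory holding the fine-tuned model for one joke type."""
    return save_dir / "models" / model_type


save_dir = Path("joke-generator")  # relative path used only for this example
for joke_type in ["Question Answer Jokes", "Dad Jokes"]:
    print(model_dir(save_dir, joke_type), "|", repo_slug(joke_type))
```

Keeping one directory (and one repository) per joke type means a new category can be added later without touching the models that already exist.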