# GPT-Neo Models

<a href="https://colab.research.google.com/github/hjesse92/style_transfer_w266/blob/main/notebooks/GPT_Neo_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

In [1]:
!pip install -q transformers rouge_score evaluate

In [1]:
# Check whether a GPU is available and which type it is
!nvidia-smi

Mon Apr  3 01:45:19 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   25C    P0    42W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [1]:
import torch

# Release unused cached GPU memory held by PyTorch
torch.cuda.empty_cache()

if torch.cuda.is_available():     
    device = torch.device("cuda")
    print('Number of GPU(s) available:', torch.cuda.device_count())
    print('GPU device name:', torch.cuda.get_device_name(0))

else:
    print('No GPU available')
    device = torch.device("cpu")

Number of GPU(s) available: 1
GPU device name: NVIDIA A10G


In [2]:
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from transformers import GPTNeoForCausalLM, AutoTokenizer, GPT2Tokenizer
from datasets import load_metric, load_dataset
from transformers import AdamW, TrainingArguments, Trainer, DataCollatorForLanguageModeling

import re
import random
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pprint
import nltk

from logging import warning
import warnings
warnings.filterwarnings('ignore')

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
nltk.download('punkt')

[nltk_data] Downloading package punkt to /home/ubuntu/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
%cd ..
# cd drive/MyDrive/w266/style_transfer_w266/

/home/ubuntu/style_transfer_w266


In [5]:
train_file = 'data/original-train.tsv'
dev_file = 'data/original-dev.tsv'
test_file = 'data/original-test.tsv'
df_train = pd.read_csv(train_file, sep='\t')
df_dev = pd.read_csv(dev_file, sep='\t')
df_test = pd.read_csv(test_file, sep='\t')

## Few-Shot Learning with GPT-Neo 2.7B

In [6]:
#@title 3-Shot Learning
df_fewshot = df_test.copy()
n_shots = 3

for row in range(len(df_fewshot)):
    # For each test item, sample 3 distinct training examples to use as the shots
    sources_targets = df_train.sample(n_shots, replace=False)

    for i in range(n_shots):
        df_fewshot.loc[row, f'shot{i+1}_source'] = sources_targets.iloc[i,0]
        df_fewshot.loc[row, f'shot{i+1}_target'] = sources_targets.iloc[i,1]


df_fewshot = df_fewshot.rename(columns={'offensive-text':'source', 'style-transferred-text':'target'})
df_fewshot.head()

,source,target,shot1_source,shot1_target,shot2_source,shot2_target,shot3_source,shot3_target
0,So maybe you should be more retarded.,So maybe you should be more backward,Has been a US military puppet from jump,Has been a tool for the US military from the b...,It's not though you fucking idiot,It's not though,Maybe you shouldn't turn your new country into...,You should be careful not to bring your old pr...
1,THERES A MEGATHREAD FOR VACCINE OR COVID RELAT...,THERES ACTUALLY A MEGATHREAD FOR VACCINE OR CO...,wasn't that like 12 feds and an autistic fuck,wasn't that like 12 feds and an unintelligent ...,"Sympathy: That sucks, I'm sorry you feel that ...","Sympathy: That stinks, I'm sorry you feel that...","Yep, it sucks...","Yeah, that's bad"
2,the fuck.... you are on drugs,uh..... you are not being realistic,This guy is overrated..... But he is right abo...,This guy is overrated...... But he is right ab...,That roommate's presence is gross and we need ...,That roommate's presence is unbearable and we ...,Funny how Biden canceled Trump’s insulin cap w...,Funny how Biden canceled Trump’s insulin cap w...
3,NTA. Dump his ass. Tablemanners are no rocke...,You should leave him. Tablemanners are no rock...,You don’t deserve it dumbass.,You don't deserve it.,I just realized you are from India LMAO bet yo...,I just realized you are from India LMAO are yo...,Also sounds like facism,Also sounds like autocracy
4,Youre soft as baby shit,Youre really soft,"Don't dare say a word, whitey. We run this lib...",Don't dare say a word. This sub is liberal.,Yep my Karma is going way down 😆 and my masoch...,Yep my Karma is going way down 😆 and my pain l...,"Pre-Covid Libtards: My body, my choice! Black...","Pre-Covid Liberals: My body, my choice! Black ..."


In [7]:
df_fewshot['prompt'] = df_fewshot.apply(lambda x: 
                 'Rewrite the toxic text in non-toxic style: \n###\n'
                 'Toxic text: ' + x['shot1_source'] + '\n' + 'Non-toxic text: ' + x['shot1_target'] + '\n' + '###' + '\n' +\
                 'Toxic text: ' + x['shot2_source'] + '\n' + 'Non-toxic text: ' + x['shot2_target'] + '\n' + '###' + '\n' +\
                 'Toxic text: ' + x['shot3_source'] + '\n' + 'Non-toxic text: ' + x['shot3_target'] + '\n' + '###' + '\n' +\
                 'Toxic text: ' + x['source'] + '\n' + 'Non-toxic text: ', axis=1)
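The prompt above is hard-coded for exactly three shots. Below is a minimal sketch of the same template generalized to any number of shots. The function name `build_fewshot_prompt` is hypothetical and not part of the original notebook, and the sketch assumes the shots are passed as `(toxic, non_toxic)` string pairs.

```python
def build_fewshot_prompt(shots, source,
                         instruction='Rewrite the toxic text in non-toxic style: '):
    """Build an n-shot prompt in the same layout as the notebook's 3-shot template.

    shots: list of (toxic_text, non_toxic_text) example pairs (hypothetical input format).
    source: the toxic text the model should rewrite.
    """
    # Instruction line, followed by the first '###' separator
    parts = [instruction + '\n###\n']
    # One block per example pair, each closed by a '###' separator
    for toxic, non_toxic in shots:
        parts.append('Toxic text: ' + toxic + '\n' + 'Non-toxic text: ' + non_toxic + '\n###\n')
    # The query ends on an open "Non-toxic text: " label for the model to complete
    parts.append('Toxic text: ' + source + '\n' + 'Non-toxic text: ')
    return ''.join(parts)


prompt = build_fewshot_prompt([('a', 'b'), ('c', 'd')], 'e')
print(repr(prompt))
```

Splitting such a prompt on `'\n###\n'` yields `len(shots) + 2` segments, and the last one holds the query. This is why the extraction step later reads segment index `n_shots + 1`. Passing `list(zip(sample.iloc[:, 0], sample.iloc[:, 1]))` for `sample = df_train.sample(3)` should reproduce the notebook's 3-shot prompt.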

In [8]:
print(df_fewshot['prompt'][0])

Rewrite the toxic text in non-toxic style: 
###
Toxic text: Has been a US military puppet from jump
Non-toxic text: Has been a tool for the US military from the beginning
###
Toxic text: It's not though you fucking idiot
Non-toxic text: It's not though
###
Toxic text: Maybe you shouldn't turn your new country into the shithole you just left.
Non-toxic text: You should be careful not to bring your old problems from your old country to your new country.
###
Toxic text: So maybe you should be more retarded.
Non-toxic text: 


In [9]:
neomodel = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
neomodel.to(device)
neotokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
neotokenizer.pad_token = neotokenizer.eos_token

In [10]:
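for i in range(len(df_fewshot)):
    if i % 10 == 0:
        print(f'Working on number: {i}')
    neo_input = neotokenizer([df_fewshot.prompt[i]], return_tensors='pt')
    neo_input_ids = neo_input.input_ids.to(device)
    neo_input_mask = neo_input.attention_mask.to(device)

    # num_beams > 1 together with do_sample=True selects beam-search multinomial sampling.
    # max_length counts the prompt tokens as well as the generated ones.
    # no_repeat_ngram_size=2 is enforced over the whole sequence, prompt included, so the
    # model cannot reuse any bigram that already appears in the prompt (source text included).
    generated_ids = neomodel.generate(input_ids=neo_input_ids,
                                      attention_mask=neo_input_mask,
                                      num_beams=5,
                                      no_repeat_ngram_size=2,
                                      num_return_sequences=1,
                                      top_p=0.92,
                                      top_k=50,
                                      max_length=512,
                                      do_sample=True,
                                      temperature=0.9,
                                      early_stopping=True,
                                      )
    # The decoded text contains the full prompt followed by the model's continuation
    output = neotokenizer.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    df_fewshot.loc[i, 'fewshot_output'] = output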

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Working on number: 0
Working on number: 10
Working on number: 20
Working on number: 30
Working on number: 40
Working on number: 50
Working on number: 60
Working on number: 70
Working on number: 80
Working on number: 90
Working on number: 110
Working on number: 120
Working on number: 130
Working on number: 140
Working on number: 150
Working on number: 160
Working on number: 170
Working on number: 180
Working on number: 190


In [17]:
print(df_fewshot['fewshot_output'][0])

Rewrite the toxic text in non-toxic style: 
###
Toxic text: Has been a US military puppet from jump
Non-toxic text: Has been a tool for the US military from the beginning
###
Toxic text: It's not though you fucking idiot
Non-toxic text: It's not though
###
Toxic text: Maybe you shouldn't turn your new country into the shithole you just left.
Non-toxic text: You should be careful not to bring your old problems from your old country to your new country.
###
Toxic text: So maybe you should be more retarded.
Non-toxic text:  So you're a fucking retard.




In [28]:
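import re

def extract_predicted_text(x, n_shots=3):
    # Segment 0 is the instruction and segments 1..n_shots are the examples,
    # so segment n_shots + 1 holds the test query plus GPT-Neo's continuation
    few_shot_pred = x.split('\n###\n')[n_shots + 1]
    # Extract the generated line after the final "Non-toxic text: " label
    matches = re.findall(r"\nNon-toxic text: (.*)", few_shot_pred)
    few_shot_pred = matches[0] if matches else ''
    # Fall back to the string 'None' when the model produced nothing
    if len(few_shot_pred.strip()) == 0:
        few_shot_pred = 'None'
    return few_shot_pred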

In [29]:
df_fewshot['fewshot_output_text'] = df_fewshot['fewshot_output'].apply(extract_predicted_text)

In [30]:
df_fewshot.loc[:,['source','target','prompt','fewshot_output_text']]

,source,target,prompt,fewshot_output_text
0,So maybe you should be more retarded.,So maybe you should be more backward,Rewrite the toxic text in non-toxic style: \n#...,So you're a fucking retard.
1,THERES A MEGATHREAD FOR VACCINE OR COVID RELAT...,THERES ACTUALLY A MEGATHREAD FOR VACCINE OR CO...,Rewrite the toxic text in non-toxic style: \n#...,Vaccine or coronavirus related topics. Don't...
2,the fuck.... you are on drugs,uh..... you are not being realistic,Rewrite the toxic text in non-toxic style: \n#...,the fucking fuck are you on?
3,NTA. Dump his ass. Tablemanners are no rocke...,You should leave him. Tablemanners are no rock...,Rewrite the toxic text in non-toxic style: \n#...,N.T.A. is not a human being. He is an animal....
4,Youre soft as baby shit,Youre really soft,Rewrite the toxic text in non-toxic style: \n#...,You're soft like a baby's ass
...,...,...,...,...
194,"NTA, Jes-us your brother is dumb. How long did...","NTA, your brother is not thinking straight. Ho...",Rewrite the toxic text in non-toxic style: \n#...,"N.T.A, Jesus, you're my brother and I love yo..."
195,Formula one…nascar….shit it exactly the same e...,Formula one…nascar….it exactly the same except...,Rewrite the toxic text in non-toxic style: \n#...,Formula One is the world’s most popular motor...
196,"BRB, gonna go call the mods pathetic egotistic...","BRB, gonna go call the mods.",Rewrite the toxic text in non-toxic style: \n#...,"BRb, gotta go to the bathroom. Will be back i..."
197,CUCKOLD Carlson is a problem,Carlson is the problem,Rewrite the toxic text in non-toxic style: \n#...,Cuckold Carlson is a problem


In [31]:
for i in range(10):
    print(i, ": Original Text:", df_fewshot['source'][i])
    print(i, ": Output Text:", df_fewshot['fewshot_output_text'][i])

0 : Original Text: So maybe you should be more retarded.
0 : Output Text:  So you're a fucking retard.
1 : Original Text: THERES A MEGATHREAD FOR VACCINE OR COVID RELATED TOPICS. DON'T TALK ABOUT THAT SHIT HERE IDIOT!
1 : Output Text:   Vaccine or coronavirus related topics. Don't talk about that shit here idiot!
2 : Original Text: the fuck.... you are on drugs
2 : Output Text:  the fucking fuck are you on?
3 : Original Text: NTA.   Dump his ass. Tablemanners are no rocket science. Treating other people like human beings is no rocket science. His still a child.
3 : Output Text:  N.T.A. is not a human being. He is an animal. An animal who needs to be treated like one. The only way to do that is to put him in a cage and treat him like a dog. If he doesn't like it, he can go back to the jungle and live with the other animals. That's the only humane thing you can do with him. It's not like he's going to grow up and become a productive member of society or anything like that. And if he does

In [None]:
# df_fewshot = df_fewshot.loc[:,['source','target','prompt','fewshot_output_text']]

In [32]:
df_fewshot.to_csv('outputs/neo_few_shot_output2.csv',sep='\t',index=False)

### Evaluation with ROUGE

In [24]:
import evaluate

rouge = evaluate.load('rouge')

In [33]:
#@title Score after few shot learning
print(rouge.compute(predictions=df_fewshot.fewshot_output_text,
                    references=df_fewshot.target))

{'rouge1': 0.2975640181589904, 'rouge2': 0.09593301906242424, 'rougeL': 0.2712575132744902, 'rougeLsum': 0.2721365623070413}
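ROUGE-1 is the F-measure of unigram overlap between a prediction and its reference. Below is a simplified pure-Python sketch of the per-example score. It assumes tokenization close to the default `rouge_score` behavior that `evaluate` uses (lowercasing, splitting on non-alphanumeric characters, no stemming). `evaluate` then aggregates the per-example scores with a bootstrap estimate that is close to their mean, and this sketch does not reproduce that aggregation. The helper names are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep only runs of ASCII letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f(prediction, reference):
    """Unigram-overlap F-measure between one prediction and one reference."""
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    # Each shared unigram counts at most as often as it appears in both texts
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f("You are really soft", "Youre really soft"))
```

In this example, 2 of the 4 predicted tokens ("really", "soft") appear in the 3-token reference. Precision is therefore 2/4 and recall is 2/3, which gives an F-measure of 4/7 (about 0.571).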


### Evaluation with NonToxicScore

In [26]:
import sys
sys.path.append('./notebooks')
from DistilBertClassification import BertClassificationML, NonToxicScoreDataLoader, NonToxicScore

# Load DistilBERT Classification Model to calculate NonToxicScore
score_model = BertClassificationML()
score_model = score_model.to(device)

# Load training weights
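# map_location lets weights saved on a GPU load on whichever device is available
pretrained_weights = torch.load('models/DistilBertToxicClassification7.pth', map_location=device)
score_model.load_state_dict(pretrained_weights)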

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias', 'vocab_layer_norm.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


<All keys matched successfully>

In [34]:
output_file = 'outputs/neo_few_shot_output2.csv'
output_col = 'fewshot_output_text'

# Create Data Loader
score_loader = NonToxicScoreDataLoader(output_file, output_col)

# Calculate NonToxicScore
fewshot_NonToxicScores, avg_score = NonToxicScore(score_loader, score_model)

{'NonToxicScore': 0.3705936735250224}


## Fine-Tuning GPT-Neo 1.3B

In [5]:
# Release unused cached GPU memory held by PyTorch
torch.cuda.empty_cache()

In [18]:
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B", 
                                          bos_token='<|startoftext|>',
                                          eos_token='<|endoftext|>',
                                          pad_token='<pad>'
                                          )

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
model.to(device)
model.resize_token_embeddings(len(tokenizer))

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Embedding(50259, 2048)

In [5]:
train_file = 'data/original-train.tsv'
dev_file = 'data/original-dev.tsv'
test_file = 'data/original-test.tsv'
df_train = pd.read_csv(train_file, sep='\t')
df_dev = pd.read_csv(dev_file, sep='\t')
df_test = pd.read_csv(test_file, sep='\t')

In [6]:
dataset = load_dataset('csv', sep="\t",
                       data_files={'train': train_file, 'validation': dev_file,'test': test_file})

Found cached dataset csv (/home/ubuntu/.cache/huggingface/datasets/csv/default-e5107efbe84b36d2/0.0.0/6954658bab30a358235fa864b05cf819af0e179325c740e4bc853bcc7ec513e1)


In [39]:
## Data Clean Up
# Special characters and punctuation to strip from each post
SPECIAL_CHARS_PATTERN = re.compile(r"[*~=’_\-\"|()\[\]%$<>\\{}]")

def clean_up_text(x):
    """Remove line breaks and special characters, and collapse whitespace, within each post."""
    # Replace line breaks and HTML <br> tags first, before '<' and '>' are stripped
    x = re.sub(r"\r\n|\n|\r|<br\s*/?>", " ", x)

    # Remove special characters and punctuation
    x = SPECIAL_CHARS_PATTERN.sub("", x)

    # Collapse runs of whitespace into a single space
    x = re.sub(r"\s+", " ", x.strip())

    return x

In [10]:
def preprocess_data(examples, tokenizer=tokenizer, data='train'):
    input_prefix = '<|startoftext|>Offensive text: '
    label_prefix = '\nInoffensive text: '
    max_input_length = 128
    max_target_length = 128

    source_inputs = [input_prefix + clean_up_text(text) + label_prefix for text in examples['offensive-text']]
    target_inputs = [clean_up_text(text) for text in examples['style-transferred-text']]

    # For causal-LM training, append the target and the end-of-text token to each prompt
    if data in ('train', 'validation'):
        source_inputs = [source + target + '<|endoftext|>' for source, target in zip(source_inputs, target_inputs)]

    # Tokenize inputs, padding or truncating every sequence to max_input_length tokens
    # (long examples can lose their trailing <|endoftext|> to truncation)
    model_inputs = tokenizer(source_inputs, max_length=max_input_length, padding="max_length", truncation=True)
    # target_tokens = tokenizer(target_inputs, max_length=max_target_length, padding="max_length", truncation=True)

    # Add labels to model_inputs
    # model_inputs["labels"] = target_tokens.input_ids

    return model_inputs
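`preprocess_data` returns only `input_ids` and `attention_mask`, and the label-building lines are commented out. For causal-LM fine-tuning, labels are usually a copy of `input_ids` with padding positions set to -100 so that the loss ignores them. `DataCollatorForLanguageModeling(tokenizer, mlm=False)` (imported above, presumably passed to the Trainer later) applies this rule, and GPT-Neo shifts the labels by one position internally. Below is a minimal pure-Python sketch of the masking rule. The helper `make_clm_labels` and the toy token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def make_clm_labels(input_ids, pad_token_id):
    """Copy input_ids, masking padding positions so they do not contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Toy IDs: 1 stands in for <|startoftext|>, 2 for <|endoftext|>, 0 for <pad>
ids = [1, 11, 22, 33, 2, 0, 0]
labels = make_clm_labels(ids, pad_token_id=0)
print(labels)
```

This notebook uses a dedicated `<pad>` token instead of reusing `<|endoftext|>`, so the end-of-text token keeps its label. The model can therefore learn where an inoffensive rewrite should stop.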

In [11]:
encoded_train_ds = dataset['train'].map(lambda x: preprocess_data(x, data='train'), batched=True, remove_columns=dataset['train'].column_names)
encoded_val_ds = dataset['validation'].map(lambda x: preprocess_data(x, data='validation'), batched=True, remove_columns=dataset['validation'].column_names)
encoded_test_ds = dataset['test'].map(lambda x: preprocess_data(x, data='test'), batched=True, remove_columns=dataset['test'].column_names)
encoded_train_ds.set_format(type="torch")
encoded_val_ds.set_format(type="torch")
encoded_test_ds.set_format(type="torch")

In [12]:
encoded_train_ds

Dataset({
    features: ['input_ids', 'attention_mask'],
    num_rows: 1584
})
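The encoded dataset carries only `input_ids` and `attention_mask`; there is no `labels` column. With `mlm=False`, `DataCollatorForLanguageModeling` builds labels at batch time by copying `input_ids` and replacing padding positions with `-100`, the index PyTorch's cross-entropy loss ignores; the one-position shift for next-token prediction happens inside the model. The sketch below reproduces that masking on plain lists with made-up token ids. It also shows one thing worth checking: if the tokenizer's pad token has the same id as `<|endoftext|>` (a common choice for GPT-style tokenizers), the real end-of-text token is masked too.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def causal_lm_labels(input_ids, pad_id):
    """Copy input_ids and replace padding with IGNORE_INDEX (mlm=False collator behaviour)."""
    return [[IGNORE_INDEX if tok == pad_id else tok for tok in seq] for seq in input_ids]


# Made-up token ids: 0 is the pad id, 8 stands in for <|endoftext|>
batch = [
    [15, 27, 33, 8, 0, 0],
    [15, 41, 9, 12, 50, 8],
]
labels = causal_lm_labels(batch, pad_id=0)

# If the pad id is the same as the end-of-text id, the real end token is masked too
labels_pad_is_eos = causal_lm_labels([[15, 27, 33, 8, 8, 8]], pad_id=8)

print(labels)
print(labels_pad_is_eos)
```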

In [13]:
# Check train dataset model input
print(tokenizer.decode(encoded_train_ds[1]['input_ids'], skip_special_tokens=True))

Offensive text: Ok, this makes no sense. This will create vigilantes. And I hope they put this savage animal down before he hurts anyone else.
Inoffensive text: Ok, this makes no sense. This will create vigilantes. And I hope they take care of this uncontrollable person before he hurts anyone else


In [15]:
# Check test dataset model input
print(tokenizer.decode(encoded_test_ds[1]['input_ids'], skip_special_tokens=True))

Offensive text: THERES A MEGATHREAD FOR VACCINE OR COVID RELATED TOPICS. DON'T TALK ABOUT THAT SHIT HERE IDIOT!
Inoffensive text: 


In [19]:
# Training arguments
batch_size = 1
model_dir = "models/gptneo"

args = TrainingArguments(
    output_dir=model_dir,
    overwrite_output_dir=True,
    do_train=True,
    evaluation_strategy="steps",
    eval_steps=200,
    logging_strategy="steps",
    logging_steps=200,
    save_strategy="epoch",
    learning_rate=2e-5,  # unused: the optimizer passed to Trainer sets its own lr (5e-5)
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,   # unused for the same reason
    num_train_epochs=2,
    # load_best_model_at_end=True,
    # metric_for_best_model="rouge1",
    # remove_unused_columns=False
)

In [13]:
import re
import nltk
import numpy as np

metric = load_metric("rouge")

def compute_metrics(eval_pred, tokenizer=tokenizer):
    # Expects generated token ids as predictions (not raw logits)
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Keep only the text after the label prefix used in the prompts ('' if it is missing)
    decoded_preds = [(re.findall(r'Inoffensive text: (.*)', pred) or [''])[-1] for pred in decoded_preds]

    # -100 marks positions ignored by the loss; swap in the pad id so they can be decoded
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    decoded_labels = [(re.findall(r'Inoffensive text: (.*)', label) or [''])[-1] for label in decoded_labels]

    # ROUGE-Lsum expects a newline after each sentence
    decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip()))
                     for pred in decoded_preds]
    decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip()))
                      for label in decoded_labels]

    # Compute ROUGE scores
    result = metric.compute(predictions=decoded_preds, references=decoded_labels,
                            use_stemmer=True)

    # Extract ROUGE F1 scores
    result = {key: value.mid.fmeasure * 100 for key, value in result.items()}

    # Add mean generated length to metrics
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id)
                       for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)

    return {k: round(v, 4) for k, v in result.items()}

In [20]:
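from transformers import AdamW, get_cosine_schedule_with_warmup

optimizer = AdamW(model.parameters(), lr=5e-5)
# One optimizer step per example per epoch at batch size 1: 1584 * 2 = 3168 steps
num_training_steps = len(encoded_train_ds) // batch_size * int(args.num_train_epochs)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=10, num_training_steps=num_training_steps)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)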
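`get_cosine_schedule_with_warmup` scales the base learning rate by a factor that rises linearly over the warm-up steps and then follows a cosine from 1 down to 0 at `num_training_steps`. The plain-Python function below mirrors that formula with the default `num_cycles=0.5`; it is a re-implementation for illustration, so compare it with the installed `transformers` source before relying on it. It also counts the optimizer steps in this run: 1,584 examples at batch size 1 over 2 epochs is 3,168 steps, the `global_step` reported by `trainer.train()`. Because the formula keeps evaluating the cosine past the horizon, a `num_training_steps` shorter than the run (800 versus 3,168) lets the factor climb back towards 1 after step 800.

```python
import math


def cosine_warmup_factor(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """LR multiplier: linear warm-up, then cosine decay (no clamping past the horizon)."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


# Optimizer steps in this run: ceil(examples / batch size) * epochs
num_examples, batch_size, num_epochs = 1584, 1, 2
total_steps = math.ceil(num_examples / batch_size) * num_epochs
print("total optimizer steps:", total_steps)

for horizon in (800, total_steps):
    factors = [round(cosine_warmup_factor(s, 10, horizon), 3) for s in (5, 800, 1600, total_steps)]
    print(f"num_training_steps={horizon}: factor at steps 5/800/1600/{total_steps} ->", factors)
```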

In [21]:
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded_train_ds, 
    eval_dataset=encoded_val_ds,
    data_collator=data_collator,
    tokenizer=tokenizer,
    optimizers=(optimizer, scheduler),
    # compute_metrics=compute_metrics
)

In [22]:
trainer.train()

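| Step | Training Loss | Validation Loss |
|-----:|--------------:|----------------:|
| 200 | 7.6113 | 6.264191 |
| 400 | 6.0587 | 5.688308 |
| 600 | 4.7164 | 4.179871 |
| 800 | 3.735 | 3.872748 |
| 1000 | 3.7028 | 3.8667 |
| 1200 | 3.3801 | 3.307849 |
| 1400 | 2.9738 | 2.665766 |
| 1600 | 2.5547 | 2.650806 |
| 1800 | 2.2193 | 2.534871 |
| 2000 | 2.1132 | 2.403975 |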


TrainOutput(global_step=3168, training_loss=3.201350626319346, metrics={'train_runtime': 1861.3148, 'train_samples_per_second': 1.702, 'train_steps_per_second': 1.702, 'total_flos': 2940200426668032.0, 'train_loss': 3.201350626319346, 'epoch': 2.0})
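For a causal language model the logged loss is the mean per-token cross-entropy in nats, so `exp(loss)` converts it to perplexity, which can be easier to compare across runs. The snippet below converts the first and last validation losses from the log above (values copied by hand; the log shown stops at step 2000).

```python
import math

# Validation losses copied from the training log (steps 200 and 2000)
val_loss = {200: 6.264191, 2000: 2.403975}

# Perplexity is the exponential of the mean token-level cross-entropy
perplexity = {step: math.exp(loss) for step, loss in val_loss.items()}
for step, ppl in perplexity.items():
    print(f"step {step}: validation perplexity {ppl:.1f}")
```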

In [24]:
# save training weights
trainer.save_model('models/gptneo')
torch.save(model.state_dict(), 'models/gptneo.pth')

In [31]:
# Quick check: generate a rewrite for a single test prompt
test_prompt = "<|startoftext|>Offensive text: So maybe you should be more retarded.\nInoffensive text:"
encoded = tokenizer(test_prompt, return_tensors='pt', add_special_tokens=False).to(device)
# Greedy decoding (do_sample=False); top_k, top_p and temperature only apply when sampling
sample_outputs = model.generate(encoded.input_ids,
                                attention_mask=encoded.attention_mask,
                                pad_token_id=tokenizer.eos_token_id,
                                max_length=128,
                                do_sample=False,
                                )
predicted_text = tokenizer.decode(sample_outputs[0], skip_special_tokens=True)
predicted_text

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


"Offensive text: So maybe you should be more retarded.\nInoffensive text: So maybe you shouldn't be smart"

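Because the generation call above uses `do_sample=False`, `generate` decodes greedily, taking the most probable token at every step; `top_k`, `top_p` and `temperature` only shape the output when `do_sample=True`. As an illustration of what the two filters would do under sampling, the sketch below applies top-k and then nucleus (top-p) filtering to a single made-up next-token distribution. It is a plain-Python approximation, not the `transformers` implementation.

```python
def filter_top_k_top_p(probs, top_k=50, top_p=0.9):
    """Keep the top_k most likely tokens, then the smallest prefix whose mass reaches top_p.

    Returns the surviving tokens with their probabilities renormalised to sum to 1.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-token probabilities (not taken from the model)
next_token = {" could": 0.55, " should": 0.30, " might": 0.10, " must": 0.05}

greedy_choice = max(next_token, key=next_token.get)  # what do_sample=False picks
sampling_pool = filter_top_k_top_p(next_token, top_k=3, top_p=0.8)

print("greedy:", greedy_choice)
print("sampling pool:", sampling_pool)
```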
In [32]:
input_prefix + dataset['test']['offensive-text'][0] + label_prefix

'<|startoftext|>Offensive text: So maybe you should be more retarded.\nInoffensive text: '

In [59]:
# Extract predicted texts for test set
model.eval()

input_prefix = '<|startoftext|>Offensive text: '
label_prefix = '\nInoffensive text: '

original_texts = dataset['test']['offensive-text']
generated_texts = []

for org in original_texts:
    prompt = input_prefix + clean_up_text(org) + label_prefix
    encoded_prompt = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
    output_sequences = model.generate(
        input_ids=encoded_prompt.input_ids,
        attention_mask=encoded_prompt.attention_mask,
        pad_token_id=tokenizer.eos_token_id,
        max_length=160,
        min_length=10,
        do_sample=False,  # greedy decoding, so top_k/top_p/temperature would be ignored
        )

    predicted_text = tokenizer.decode(output_sequences[0],
                                      clean_up_tokenization_spaces=True,
                                      skip_special_tokens=True)
    # Keep the first line generated after the label prefix; fall back to "None" if empty
    match = re.search(r'Inoffensive text: ?(.*)', predicted_text)
    gen_text = match.group(1).strip() if match else ''
    generated_texts.append(gen_text if gen_text else "None")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

In [60]:
generated_texts

["So maybe you shouldn't be smart",
 'None',
 'What.... are you on drugs',
 'NTA. Dump him. Tablemanners are no rocket science. Treating other people like human beings is no rocket science. His still a child.',
 "You're not nice",
 "Because everyone knows why. It's jadas fault. Will was just triggered because of all the stuff he has to deal with because of her.",
 'And poor kids are just as bright as white kids',
 'Tucker is not a good person that no one will miss when hes gone... hes also being racist',
 "There's plenty of evidence, it's just why they wouldn't allow forensic audits wipe voting machines clean of their records",
 'Came here to say this, carlsen is a person who is not nice',
 'Leave dude.',
 'This is so cringe. What a bad person.',
 'Trump deferred to the experts on COVID?? LMFAO this sub needs help',
 'What is this supposed to mean?',
 'Jesus christ. You need to get a lawyer and a therapist, immediately.',
 'Less than no valuethats cute coming from some person that cant
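The post-processing in the generation loop keeps only the first line written after `Inoffensive text:` and records `"None"` when nothing usable comes back, which is where the `'None'` entry above comes from. A hypothetical stand-alone helper (its name and `fallback` argument are illustrative, not from the notebook) makes that rule easy to test on decoded strings:

```python
import re

_PREFIX_PATTERN = re.compile(r"Inoffensive text: ?(.*)")


def extract_rewrite(decoded, fallback="None"):
    """Return the first generated line after the label prefix, or `fallback` if empty."""
    match = _PREFIX_PATTERN.search(decoded)  # '.' stops at the first newline
    text = match.group(1).strip() if match else ""
    return text if text else fallback


print(extract_rewrite("Offensive text: a\nInoffensive text: So maybe you shouldn't be smart"))
print(extract_rewrite("Offensive text: b\nInoffensive text: \nOffensive text: c"))
```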

In [61]:
# Create a new DataFrame with the generated texts
df_gen = pd.DataFrame({'gptneo_tuning_output': generated_texts})

# Append the generated texts to df_test as a new column (rows align on the default index)
df_output = pd.concat([df_test, df_gen], axis=1)

df_output.head(10)

| | offensive-text | style-transferred-text | gptneo_tuning_output |
|---|---|---|---|
| 0 | So maybe you should be more retarded. | So maybe you should be more backward | So maybe you shouldn't be smart |
| 1 | THERES A MEGATHREAD FOR VACCINE OR COVID RELAT... | THERES ACTUALLY A MEGATHREAD FOR VACCINE OR CO... | None |
| 2 | the fuck.... you are on drugs | uh..... you are not being realistic | What.... are you on drugs |
| 3 | NTA. Dump his ass. Tablemanners are no rocke... | You should leave him. Tablemanners are no rock... | NTA. Dump him. Tablemanners are no rocket scie... |
| 4 | Youre soft as baby shit | Youre really soft | You're not nice |
| 5 | Because everyone knows why. It’s jadas fault. ... | Because everyone knows it's jadas fault. Will ... | Because everyone knows why. It's jadas fault. ... |
| 6 | And “poor kids are just as bright as white kid... | He said "poor kids are just as bright as white... | And poor kids are just as bright as white kids |
| 7 | Tucker is a garbage person that no one will mi... | Tucker is a bad person who instigates racism | Tucker is not a good person that no one will m... |
| 8 | There’s plenty of evidence, it’s exactly why t... | There is evidence, which is why they won't all... | There's plenty of evidence, it's just why they... |
| 9 | Came here to say this, carlsen is a freaking m... | Came here to say this, carlsen is pretty unint... | Came here to say this, carlsen is a person who... |


In [62]:
df_output.describe()

| | offensive-text | style-transferred-text | gptneo_tuning_output |
|---|---|---|---|
| count | 199 | 199 | 199 |
| unique | 199 | 199 | 194 |
| top | So maybe you should be more retarded. | So maybe you should be more backward | You're not nice |
| freq | 1 | 1 | 3 |


In [63]:
# Save the updated dataframe as a CSV file
df_output.to_csv('outputs/gptneo_tuning_output.csv',sep='\t',index=False)

### Evaluation with ROUGE

In [47]:
import evaluate

rouge = evaluate.load('rouge')

In [64]:
print(rouge.compute(predictions=df_output['gptneo_tuning_output'],
                    references=df_output['style-transferred-text']))

{'rouge1': 0.6180370631087235, 'rouge2': 0.4821189435111926, 'rougeL': 0.6137033109178135, 'rougeLsum': 0.6133317943929694}
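ROUGE-1 is the F1 score of unigram overlap between a prediction and its reference. The simplified sketch below computes it for one pair using lower-cased whitespace tokens and no stemming, so its numbers can differ from the `rouge_score` package behind `evaluate`, which applies its own tokenizer (and stemming when `use_stemmer=True`). The reported `rouge1` aggregates such per-pair scores over the 199 test pairs.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two strings (simplified: lowercase + whitespace split)."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((pred_counts & ref_counts).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# First test example: model output vs. reference rewrite
score = rouge1_f1("So maybe you shouldn't be smart", "So maybe you should be more backward")
print(round(score, 3))
```

Here 4 of the 6 predicted tokens and 4 of the 7 reference tokens match, giving an F1 of 8/13.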


### Evaluation with NonToxicScore

In [50]:
import sys
sys.path.append('./notebooks')
from DistilBertClassification import BertClassificationML, NonToxicScoreDataLoader, NonToxicScore

# Load DistilBERT Classification Model to calculate NonToxicScore
score_model = BertClassificationML()
score_model = score_model.to(device)

# Load training weights
pretrained_weights = torch.load('./models/DistilBertToxicClassification7.pth', map_location=device)
score_model.load_state_dict(pretrained_weights)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_transform.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


<All keys matched successfully>

In [65]:
## Calculate NonToxicScore
output_file = 'outputs/gptneo_tuning_output.csv'
output_col = 'gptneo_tuning_output'

# Create Data Loader
score_loader = NonToxicScoreDataLoader(output_file, output_col, max_length=512)

# Calculate NonToxicScore
gptneo_NonToxicScores, avg_score = NonToxicScore(score_loader, score_model)

{'NonToxicScore': 0.7489547167609835}
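The `DistilBertClassification` module is not included in this notebook, so the exact computation behind `NonToxicScore` is not shown. A plausible reading, consistent with the 0.5 cut-off used in the next cell to flag outputs that are still toxic, is that each text's score is the classifier's probability for the non-toxic class and the reported number is their mean. Under that assumption the aggregation reduces to the sketch below; the logits and the class index are invented for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


NON_TOXIC = 0  # assumed index of the non-toxic class (hypothetical)

# Invented logits for three generated texts, ordered [non-toxic, toxic]
batch_logits = [[2.0, -1.0], [0.2, 0.4], [-3.0, 1.5]]

scores = [softmax(logits)[NON_TOXIC] for logits in batch_logits]
avg_score = sum(scores) / len(scores)
still_toxic = [i for i, s in enumerate(scores) if s < 0.5]  # same cut-off as the low-score filter

print({"NonToxicScore": round(avg_score, 4)}, "still toxic:", still_toxic)
```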


In [66]:
# Inspect outputs with a low NonToxicScore (below 0.5), i.e. texts still classified as toxic after style transfer
df_finetune = pd.read_csv('outputs/gptneo_tuning_output.csv',sep='\t')
df_finetune['NonToxicScore'] = gptneo_NonToxicScores

df_finetune_lowscore = df_finetune[df_finetune['NonToxicScore'] < 0.5][['offensive-text', 'style-transferred-text', 'gptneo_tuning_output', 'NonToxicScore']]
df_finetune_lowscore

| | offensive-text | style-transferred-text | gptneo_tuning_output | NonToxicScore |
|---|---|---|---|---:|
| 3 | NTA. Dump his ass. Tablemanners are no rocke... | You should leave him. Tablemanners are no rock... | NTA. Dump him. Tablemanners are no rocket scie... | 0.008279 |
| 10 | Fuck off dude. Worst of the fucking worst! | Stop dude. Worst of the worst! | Leave dude. | 0.313887 |
| 17 | we're all damned so fuck it might as well have... | we're all damned so might as well have some fu... | we're all so please don't social hellgrab. | 0.194712 |
| 20 | >Stop being a weasel. Address what I actually... | >Stop. Address what I actually wrote please. | Stop being a weasel. Address what I actually w... | 0.008598 |
| 22 | I bet you make stupid ass assumptions, then fe... | I bet you make ridiculous assumptions, then fe... | I bet you make assumptions, then feel smug. | 0.296989 |
| 26 | Really bad stance. What an unbelievable moron ... | This is a really bad stance. | Really bad stance. What an unbelievable person... | 0.383596 |
| 29 | The need to feel superior to others is by far ... | The need to feel superior to others is by far ... | The need to feel superior to others is by far ... | 0.212569 |
| 31 | Just like Republicans creating themselves narr... | Just like Republicans creating themselves narr... | Just like Republicans created themselves narra... | 0.006267 |
| 32 | Lame ass liberal Antifa and Fraudchi style sme... | This is a weak smear campaign. I hope you real... | Antifa and Fraudchi style smear. I hope you ar... | 0.180744 |
| 51 | In other news 40% of Americans are racist homo... | In other news, 40% of Americans are racist and... | In other news, there are a lot of racist homop... | 0.192363 |


In [67]:
for i in range(len(df_finetune_lowscore)):
    print('Toxic Text:', df_finetune_lowscore['offensive-text'].values[i])
    print('Style Transfer Text:', df_finetune_lowscore.gptneo_tuning_output.values[i])
    print('NonToxicScore:', df_finetune_lowscore.NonToxicScore.values[i])

Toxic Text: NTA.   Dump his ass. Tablemanners are no rocket science. Treating other people like human beings is no rocket science. His still a child.
Style Transfer Text: NTA. Dump him. Tablemanners are no rocket science. Treating other people like human beings is no rocket science. His still a child.
NonToxicScore: 0.008279350586235523
Toxic Text: Fuck off dude. Worst of  the fucking worst!
Style Transfer Text: Leave dude.
NonToxicScore: 0.31388747692108154
Toxic Text: we're all damned so fuck it might as well have some fun along the way to social hell   \*grabs chainsaw\*
Style Transfer Text: we're all so please don't social hellgrab.
NonToxicScore: 0.19471243023872375
Toxic Text: >Stop being a weasel.  Address what I actually wrote or fuck off.
Style Transfer Text: Stop being a weasel. Address what I actually wrote or leave
NonToxicScore: 0.008597970008850098
Toxic Text: I bet you make stupid ass assumptions, then feel smug.
Style Transfer Text: I bet you make assumptions, then feel