# Experiment with Large Language Models

In this notebook we experiment with large language models (LLMs) such as `mistral` and `llama2`. The models should be uncensored so that they do not refuse to process toxic input. For convenience, we use the [LangChain](https://python.langchain.com/docs/get_started/introduction) and [Ollama](https://ollama.ai/) frameworks.

In [149]:
import pandas as pd


# The data should be downloaded and preprocessed, use 1.0-download-raw-data.ipynb and 1.2-data-preprocessing.ipynb notebooks
train_data_path = '../data/interim/train.csv'
df = pd.read_csv(train_data_path, index_col=0)
df.head()

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
40531,"That's rubbish, sir.","that's nonsense, sir.",0.830357,0.045455,0.997392,0.366157
403927,Stop this stupid act.,stop playing.,0.608955,0.363636,0.999706,0.000341
71386,I hope he swallows it.,I hope he buys this.,0.785246,0.086957,0.976971,4.3e-05
496887,"""what you say is utter nonsense.","""What you are saying is absolute balderdash!",0.697664,0.266667,0.97042,0.088246
218946,"When the bulb lights up, you shoot!","once the bulb is on, you'll fire!",0.784488,0.055556,0.985328,0.012123


## Few-shot prompting using LangChain framework 

The few-shot prompting approach was chosen because it is the most suitable for our task. Zero-shot prompting gives the model no reference examples to base its predictions on, and fine-tuning an LLM is more complicated. So I try few-shot prompting first.

In [2]:
from langchain.globals import set_debug, set_verbose


set_verbose(True)
set_debug(True)

Sample 10 shots from the training data.

In [150]:
n_samples = 10
sampled = df.sample(n=n_samples, random_state=11)
# Convert the sampled rows to a list of {'reference': ..., 'translation': ...} dicts
examples = sampled[['reference', 'translation']].to_dict('records')
examples

[{'reference': 'you sound like a fucking lawyer, man.',
  'translation': "I don't need no proof. Man, you sound like a lawyer."},
 {'reference': '"Charlie Harper Sucks Limited."',
  'translation': '"Charlie Harper Frajer Inc."'},
 {'reference': 'it must be terrible to die.',
  'translation': 'It must be awful to be dead.'},
 {'reference': 'Evil down to their black hearts, which pump not blood... ...but a vomitous oil that oozes through their veins... ...clots in their brains, which causes their Nazi-esque violent behavior.',
  'translation': "they don't have blood flowing in their heart, but a nasty oil that flows through their veins... and fills the brain, causing their Nazi behavior."},
 {'reference': "so, I can't run like a nigger, friendly, but it's okay for you to discriminate against me because I'm rich?",
  'translation': "So, I'm not allowed to drop the N-bomb in a friendly way but it's alright for you to discriminate against me because I'm posh?"},
 {'reference': 'I knew it wa

Load the Mistral LLM. Mistral is a recent LLM with 7B parameters that, according to its authors, outperforms Llama 2 13B on the benchmarks they report.

**Remark**: I also experimented with Llama 2, but it performed worse than Mistral: it did not follow my instructions at all. So I chose Mistral, as its performance is comparable to that of the bigger Llama 2 (with 13B parameters) and it is uncensored.

In [151]:
from langchain.llms import Ollama


# Make sure you do these steps before running it
# 1. https://ollama.ai/download
# 2. ollama serve
# 3. ollama pull mistral
llm = Ollama(model="mistral")
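
Before creating the `Ollama` object it can help to confirm that the local server is reachable. The sketch below is not part of the original notebook. It assumes that Ollama listens on its default address `http://localhost:11434` and uses the `/api/tags` endpoint, which lists locally pulled models; the helper name `ollama_is_up` is hypothetical.

```python
import json
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at base_url (assumed default address)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            # /api/tags answers with JSON describing the locally available models
            json.load(resp)
            return True
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON answer
        return False


if not ollama_is_up():
    print("Ollama server is not reachable: run `ollama serve` and `ollama pull mistral` first")
```

The check only tells whether a server responds; it does not verify that the `mistral` model has been pulled.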

Construct a prompt. We design the prompt so that the model follows our instructions as closely as possible.

In [163]:
from langchain import PromptTemplate, FewShotPromptTemplate, LLMChain


example_template = """
Toxic text: "{reference}"
Non-toxic text: "{translation}"
"""

example_prompt = PromptTemplate(
   input_variables=["reference", "translation"],
   template=example_template
)

prefix = """
Make the text NON-TOXIC according to the examples below. Write ONLY Non-toxic text as an output!
DO NOT output such phrases as "Non-toxic text: ", "Here are the versions", "I understand your task", "I am an AI language model" or something similar because I need ONLY ONE VARIANT OF EXACT NON-TOXIC TEXT without any alternatives.
It COULD NOT be no response. DO NOT ASK for additional information or few more examples!.\n
"""
suffix = """Your task is here.
Toxic text: "{reference}"
Non-toxic text: 
"""

few_shot_prompt_template = FewShotPromptTemplate(
   examples=examples,
   example_prompt=example_prompt,
   prefix=prefix,
   suffix=suffix,
   input_variables=["reference"],
   example_separator="\n\n"
)

fs_llm_chain = LLMChain(
   prompt=few_shot_prompt_template,
   llm=llm
)

The prompt (`prefix`) was tuned over several small inference runs of the model.
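
To make it clearer what the model actually receives, the sketch below assembles a few-shot prompt in plain Python. It only approximates the layout that `FewShotPromptTemplate` produces (prefix, formatted examples, and suffix joined by `example_separator`) and is not LangChain's implementation; the function name `render_few_shot_prompt` and the shortened prefix are illustrative assumptions.

```python
def render_few_shot_prompt(examples, reference, prefix, suffix, separator="\n\n"):
    """Approximate the few-shot prompt layout: prefix, examples, then the task."""
    example_template = 'Toxic text: "{reference}"\nNon-toxic text: "{translation}"'
    shots = [example_template.format(**ex) for ex in examples]
    pieces = [prefix, *shots, suffix.format(reference=reference)]
    return separator.join(pieces)


# Two shots taken from the training data preview shown earlier
demo_examples = [
    {"reference": "That's rubbish, sir.", "translation": "that's nonsense, sir."},
    {"reference": "Stop this stupid act.", "translation": "stop playing."},
]
demo_prompt = render_few_shot_prompt(
    demo_examples,
    reference="so, you're going to hell.",
    prefix="Make the text NON-TOXIC according to the examples below.",  # shortened
    suffix='Your task is here.\nToxic text: "{reference}"\nNon-toxic text: ',
)
print(demo_prompt)
```

Printing the rendered prompt this way is a cheap check that the examples and the final task line up as intended before any model call is made.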

Define an output parser to make sure that the answer contains only the non-toxic text. There may still be issues with answer quality, since we cannot anticipate every possible model output.

In [164]:
import re


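def parse_llm_output(llm_output):
    """Strip prompt echoes, list numbering, and quotes from the raw LLM answer."""
    # Drop everything up to and including an echoed "Non-toxic text:" label
    cleaned = re.sub(r'.*Non-toxic text:?', '', llm_output.strip()).strip()
    # If the model returned a numbered list, keep only the first variant
    # (the dots are escaped so that plain digits in the text are left alone)
    cleaned = re.sub(r'.*1\.', '', cleaned)
    cleaned = re.sub(r'2\..*', '', cleaned)
    return re.sub(r'"', '', cleaned)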
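
The cleanup rules above are heuristic, so it is worth trying them on a few typical raw answers. The sketch below is a hypothetical, slightly stricter variant (`first_variant` is not part of the original notebook): it removes an echoed label, keeps only the first item of a numbered list, and leaves ordinary digits inside a sentence untouched. It is still a heuristic and can misfire, for example on a sentence that ends with the digit 1.

```python
import re


def first_variant(llm_output):
    """Hypothetical cleaner: keep only the first rewritten variant of the answer."""
    text = llm_output.strip()
    # Drop an echoed label such as 'Non-toxic text:' wherever it appears
    text = re.sub(r'Non-toxic text:?', '', text).strip()
    # If the answer is a numbered list, take the text between "1." and "2."
    match = re.search(r'(?:^|\s)1\.\s+(.*?)(?=\s+2\.\s|$)', text, flags=re.DOTALL)
    first = match.group(1) if match else text
    return first.replace('"', '').strip()


# Made-up raw answers that imitate common failure modes
print(first_variant('Non-toxic text: "You want some help?"'))
print(first_variant('Here are the versions:\n1. Stop it.\n2. Please stop.'))
print(first_variant('I have 20 dollars.'))
```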

### Test on small sample

In [161]:
n_samples = 25
sampled_test = df.sample(n=n_samples, random_state=7)
sampled_test.head()

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
268427,"Well, I'd call it kiss my ass while I walk away",I would call it a kiss my background while I g...,0.81123,0.09434,0.998468,0.000278
322898,"so, you're going to hell.","So, down you go.",0.627564,0.346154,0.998396,4.7e-05
310284,You want a hunk?,you want some of this?,0.743751,0.26087,0.907643,4.4e-05
175966,"Great champagne, ridiculous food, a whole, lik...","a great champagne, an awful lot of food, an en...",0.736001,0.082474,0.98617,0.002246
405444,"""it's not stupid, and nobody asked you; you're...","""It is not nonsense and nobody asked you; you ...",0.846747,0.150538,0.728379,0.004292


In [165]:
for ref, trn in zip(sampled_test['reference'], sampled_test['translation']):
    print(f'INIT: {ref}')
    print(f'GOLD: {trn}')
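    # The chain formats the few-shot prompt itself, so pass only the input variable;
    # passing an already formatted prompt would nest the whole prompt inside itself
    llm_output = fs_llm_chain.run(reference=ref)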
    print(f'PRED: {parse_llm_output(llm_output)}')
    print('------------------------')

INIT: Well, I'd call it kiss my ass while I walk away
GOLD: I would call it a kiss my background while I go away
PRED: Well, I suppose we'll part ways peacefully.
------------------------
INIT: so, you're going to hell.
GOLD: So, down you go.
PRED: So, where do you think we'll go next?
------------------------
INIT: You want a hunk?
GOLD: you want some of this?
PRED: You want assistance?
------------------------
INIT: Great champagne, ridiculous food, a whole, like, squad of Hot little filipina masseuses. archer:
GOLD: a great champagne, an awful lot of food, an entire, uh, skater of hot Filipino massages.
PRED: Excellent champagne, unacceptable cuisine, an assortment of Filipino masseuses.
------------------------
INIT: "it's not stupid, and nobody asked you; you're not called to have an opinion."
GOLD: "It is not nonsense and nobody asked you; you aren't competent to have an opinion about it."
PRED: It's not my place to give an opinion without being asked. I'm just a language model d

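As we can see, the model does a reasonable job: the predictions are fluent and non-toxic. Some of them drift from the meaning of the source, though (e.g. the second example), and the last one still leaks a "language model" remark despite the prompt.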
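
Inspecting predictions by eye does not scale to the full test set. As a rough aid, the sketch below flags predictions that are empty or contain meta phrases of the kind the prompt forbids, so that they can be reviewed or regenerated. The phrase list and the helper name `looks_like_meta_answer` are illustrative assumptions, not part of the original pipeline.

```python
META_PHRASES = (
    "language model",
    "here are the versions",
    "i understand your task",
    "non-toxic text:",
)


def looks_like_meta_answer(prediction):
    """Return True if the prediction is empty or contains a known meta phrase."""
    lowered = prediction.strip().lower()
    return not lowered or any(phrase in lowered for phrase in META_PHRASES)


demo_preds = [
    "You want assistance?",  # taken from the sample output above
    "Sorry, as an AI language model I cannot rewrite this.",  # made-up refusal
    "",  # made-up empty answer
]
flagged = [p for p in demo_preds if looks_like_meta_answer(p)]
print(f"{len(flagged)} of {len(demo_preds)} predictions need a closer look")
```

A substring check like this only catches known phrasings; it says nothing about whether the meaning of the source sentence was preserved.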

### Generate outputs for test data

In [191]:
import pandas as pd


# The data should be downloaded and preprocessed, use 1.0-download-raw-data.ipynb and 1.2-data-preprocessing.ipynb notebooks
test_data_path = '../data/interim/test.csv'
test_df = pd.read_csv(test_data_path, index_col=0)
test_df.head()

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
57809,call out your butchers and I'll tell you.,"Listen, call off the butchers, and I'll tell you.",0.761781,0.16,0.968063,0.005592
132693,Who the fuck has been going through my stuff?!,who the hell was going through my stuff?,0.898699,0.12766,0.997644,0.452115
254505,he can still die.,She still might die . . .?,0.63597,0.333333,0.989168,0.279049
451186,"Yeah, that's the fucker's name.",that's what his name was.,0.652426,0.1875,0.999579,5.5e-05
191213,"I would take you on my shoulders, like, I'd st...",I'd take you on your shoulders... I'd tie you ...,0.790915,0.357143,0.961127,0.028628


In [196]:
from tqdm import tqdm


# Skip the samples already processed by earlier runs (generation was restarted manually)
refs = test_df['reference'][925+1558+1696+6498+2735:]
preds = []
for ref in tqdm(refs, desc='Generating model predictions with 10-shots', total=len(refs)):
    # The chain formats the few-shot prompt itself, so pass only the input variable
    llm_output = fs_llm_chain.run(reference=ref)
    preds.append(parse_llm_output(llm_output))

Generating model predictions with 10-shots: 100%|██████████| 1033/1033 [08:59<00:00,  1.91it/s]


In [197]:
len(preds)

1033

In [198]:
import os


low = 925+1558+1696+6498+2735
up = low+len(preds)
# The model was fed the 'reference' column, so store it as the inputs
result_df = pd.DataFrame({'inputs': test_df['reference'][low:up], 'preds': preds})

os.makedirs('../data/interim/model-outputs', exist_ok=True)

save_result_path = f'../data/interim/model-outputs/llm-mistral-10shots-{low}-{up}.csv'
result_df.to_csv(save_result_path)

For some reason, the model sometimes gets stuck, so I restarted the generation manually from the last processed sample index. That is why multiple `.csv` files are saved.
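
One way to avoid restarting by hand is to save predictions in small chunks and skip the chunks that already exist on disk. The sketch below illustrates the idea with a stand-in `fake_generate` function; in the notebook it would be replaced by the chain call followed by `parse_llm_output`, and `refs` would be passed as `list(refs)`. The function name `run_resumable`, the chunk size, and the file naming are assumptions.

```python
import csv
import os
import tempfile


def run_resumable(refs, generate, out_dir, chunk_size=100):
    """Generate predictions chunk by chunk, skipping chunks already on disk."""
    os.makedirs(out_dir, exist_ok=True)
    for low in range(0, len(refs), chunk_size):
        up = min(low + chunk_size, len(refs))
        path = os.path.join(out_dir, f"preds-{low}-{up}.csv")
        if os.path.exists(path):
            continue  # this chunk was finished by an earlier run
        rows = [(ref, generate(ref)) for ref in refs[low:up]]
        # Write under a temporary name first so the final file is never partial
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["inputs", "preds"])
            writer.writerows(rows)
        os.replace(tmp_path, path)


calls = []


def fake_generate(ref):
    """Stand-in for the LLM call: record the call and upper-case the text."""
    calls.append(ref)
    return ref.upper()


demo_refs = [f"sentence {i}" for i in range(5)]
with tempfile.TemporaryDirectory() as tmp_dir:
    run_resumable(demo_refs, fake_generate, tmp_dir, chunk_size=2)
    first_run_calls = len(calls)
    # A second run finds every chunk on disk and makes no further calls
    run_resumable(demo_refs, fake_generate, tmp_dir, chunk_size=2)
    saved_files = sorted(os.listdir(tmp_dir))
print(first_run_calls, len(calls), saved_files)
```

With this layout, a run that gets stuck can simply be interrupted and started again: at most one chunk of work is repeated.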