# Experiment with Large Language Models

In this notebook we experiment with large language models (LLMs) for rewriting toxic text in a non-toxic way. We try models such as `mistral` and `llama2`. The models should be uncensored. For convenience, we use the [LangChain](https://python.langchain.com/docs/get_started/introduction) and [Ollama](https://ollama.ai/) frameworks.

In [2]:
import pandas as pd


# The data must be downloaded and preprocessed first: run the
# 1.0-download-raw-data.ipynb and 1.2-data-preprocessing.ipynb notebooks.
data_path = '../data/internal/preprocessed_filtered.csv'
df = pd.read_csv(data_path, index_col=0)
df.head()

| Unnamed: 0 | reference | translation | similarity | lenght_diff | ref_tox | trn_tox |
|---|---|---|---|---|---|---|
| 0 | if Alkar floods her with her mental waste, it ... | If Alkar is flooding her with psychic waste, t... | 0.785171 | 0.010309 | 0.981983 | 0.014195 |
| 1 | you're becoming disgusting. | Now you're getting nasty. | 0.749687 | 0.071429 | 0.999039 | 0.065473 |
| 2 | well, we can spare your life. | Well, we could spare your life, for one. | 0.919051 | 0.268293 | 0.985068 | 0.213313 |
| 3 | monkey, you have to wake up. | Ah! Monkey, you've got to snap out of it. | 0.664333 | 0.309524 | 0.994215 | 0.053362 |
| 4 | I have orders to kill her. | I've got orders to put her down. | 0.726639 | 0.181818 | 0.999348 | 0.009402 |


## Few-shot prompting with the LangChain framework

In [2]:
from langchain.globals import set_debug, set_verbose


set_verbose(True)
set_debug(True)

In [59]:
n_samples = 7
sampled = df.sample(n=n_samples)

# Each example is a dict with a toxic 'reference' and its non-toxic 'translation'
examples = sampled[['reference', 'translation']].to_dict('records')
examples

[{'reference': "Vauxhall, however, has just introduced a new fast Astra and says it isn't terrible rubbish.",
  'translation': 'however, Vauxhall has given a new quick Astra, and says it is not terrible.'},
 {'reference': 'I forgot the pig!', 'translation': 'I forgot the duster!'},
 {'reference': 'He broke his neck trying to suck his own dick.',
  'translation': 'he broke his neck trying to smoke him.'},
 {'reference': "Now the whole damn town knows you're here!",
  'translation': 'so he knows the whole city.'},
 {'reference': "Nigger give up quick, didn't he?",
  'translation': 'wrapped up fast, huh?'},
 {'reference': 'like every year, completely useless!',
  'translation': "Like, every year, it's like, so wasted!"},
 {'reference': "God damn, son! After that, Mama went to the hotel to lay down, so I went out for a walk to see our nation's capital.",
  'translation': 'then my mom went to bed at the hotel, and I went for a walk on our capital.'}]

In [60]:
from langchain.llms import Ollama


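# Setup (run once, outside the notebook):
# 1. Install Ollama from https://ollama.ai/download
# 2. Start the local server: ollama serve
# 3. Download the model: ollama pull mistral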
llm = Ollama(model="mistral")

In [61]:
from langchain import PromptTemplate, FewShotPromptTemplate, LLMChain


example_template = """
Toxic text: "{reference}"
Non-toxic text: "{translation}"
"""

example_prompt = PromptTemplate(
   input_variables=["reference", "translation"],
   template=example_template
)

prefix = """
Make the text NON-TOXIC according to the examples below. Write ONLY Non-toxic text as an output! It could not be no response.\n
"""
suffix = """
Toxic text: "{reference}"
Non-toxic text: """


few_shot_prompt_template = FewShotPromptTemplate(
   examples=examples,
   example_prompt=example_prompt,
   prefix=prefix,
   suffix=suffix,
   input_variables=["reference"],
   example_separator="\n\n"
)
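To make it easier to see what text the model receives, here is a minimal plain-Python sketch of few-shot prompt assembly: the prefix, the formatted examples, and the suffix are joined with the separator. This is an approximation for illustration, not LangChain's own code. The function name `build_few_shot_prompt` is hypothetical, and the two demo examples are copied from the sampled output above.

```python
def build_few_shot_prompt(examples, reference, prefix, suffix,
                          example_template, separator="\n\n"):
    """Join prefix, formatted examples, and suffix into one prompt string."""
    formatted = [example_template.format(**ex) for ex in examples]
    # Only the suffix carries the query placeholder in this sketch.
    query = suffix.format(reference=reference)
    # Drop empty pieces so that a missing prefix adds no stray separator.
    pieces = [p for p in [prefix, *formatted, query] if p]
    return separator.join(pieces)


example_template = 'Toxic text: "{reference}"\nNon-toxic text: "{translation}"'
demo_examples = [
    {"reference": "I forgot the pig!", "translation": "I forgot the duster!"},
    {"reference": "like every year, completely useless!",
     "translation": "Like, every year, it's like, so wasted!"},
]
prompt = build_few_shot_prompt(
    demo_examples,
    reference="Damn it.",
    prefix="Make the text NON-TOXIC according to the examples below.",
    suffix='Toxic text: "{reference}"\nNon-toxic text: ',
    example_template=example_template,
)
print(prompt)
```

The prompt ends with an open `Non-toxic text: ` slot, so the model is expected to continue with the rewritten sentence.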

In [62]:
fs_llm_chain = LLMChain(
   prompt=few_shot_prompt_template,
   llm=llm
)

### Test on a small sample

In [63]:
import re


def parse_llm_output(llm_output):
    # Strip everything up to the "Non-toxic text: " marker on the same line.
    # TODO: make this more robust so that it always returns a non-empty result.
    return re.sub(r'.*Non-toxic text: ', '', llm_output).strip()
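One possible direction for the TODO is sketched below. It assumes the model may wrap its answer in quotes, keep generating further `Toxic text:` examples, or return only punctuation. The helper name `parse_llm_output_robust` and its `fallback` parameter are hypothetical and not part of the notebook.

```python
import re


def parse_llm_output_robust(llm_output, fallback=""):
    """Extract the rewritten sentence from a raw completion, or return fallback."""
    # Keep what follows the first "Non-toxic text:" marker, if there is one.
    text = re.split(r"Non-toxic text:", llm_output, maxsplit=1)[-1]
    # Cut off any extra examples the model generated after its answer.
    text = re.split(r"\n\s*Toxic text:", text)[0].strip()
    # Remove one pair of surrounding double quotes.
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1].strip()
    # Treat output without letters or digits (e.g. a lone quote) as empty.
    if not any(ch.isalnum() for ch in text):
        return fallback
    return text


print(parse_llm_output_robust('Non-toxic text: "Hello there."'))
print(repr(parse_llm_output_robust('"', fallback="")))
```

Passing the original sentence as `fallback` would guarantee a non-empty result, at the cost of leaving that sentence unchanged.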
    

In [66]:
n_samples = 20
sampled_test = df.sample(n=n_samples)
zipped_test = zip(sampled_test['reference'], sampled_test['translation'])
sampled_test.head()

| Unnamed: 0 | reference | translation | similarity | lenght_diff | ref_tox | trn_tox |
|---|---|---|---|---|---|---|
| 304773 | sometimes you need to escape, blow out the cit... | Sometimes you need to get away Blow the town a... | 0.847671 | 0.21875 | 0.998476 | 5.1e-05 |
| 483524 | You can still enjoy it... before it collapses ... | you still have time to see his beauty... befor... | 0.616908 | 0.246575 | 0.999479 | 0.002965 |
| 43910 | The tree, which under its green branches, you ... | a tree under whose green branches you first lo... | 0.777978 | 0.197917 | 0.994834 | 0.000102 |
| 478287 | Damn it. | bloody hell. | 0.808032 | 0.307692 | 0.999285 | 0.485974 |
| 15388 | Known her since school, first kiss, all that f... | I knew her from school, first kiss and everyth... | 0.74481 | 0.135593 | 0.999629 | 7.6e-05 |


In [67]:
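for ref, trn in zipped_test:
    print(ref)  # toxic source sentence
    print(trn)  # reference non-toxic rewrite from the dataset
    # The chain formats the prompt itself, so pass only the raw reference;
    # pre-formatting it would nest the whole few-shot prompt inside the template.
    llm_output = parse_llm_output(fs_llm_chain.run(reference=ref))
    print(llm_output)  # model rewrite
    print('===')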

sometimes you need to escape, blow out the city and avoid charting where the night is and day is day, and just sit on your ass.
Sometimes you need to get away Blow the town and flee the fray Where night is night, and day is day
Sometimes, it's good to take a break, leave the city for a bit, and just relax.
===
You can still enjoy it... before it collapses in shit.
you still have time to see his beauty... before he collapses in disgust.
You can still enjoy it... before it collapses in rubbish.
===
The tree, which under its green branches, you first fucked your girlfriend, entrenched by love.
a tree under whose green branches you first loved each other with your girl.
The tree, which has green leaves and is where you first made love to your girlfriend, stands tall.
===
Damn it.
bloody hell.
"
===
Known her since school, first kiss, all that fucking shit.
I knew her from school, first kiss and everything.
"Known her since school, first kiss, all that stuff."
===
Mother flip. My jaw must b
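Some completions above contain no usable text (for example, a lone quote). A simple mitigation is to query the model again when that happens. The sketch below is one way to do it under stated assumptions: `detoxify_with_retry`, `fake_generate`, and `strip_marker` are hypothetical names, and the canned replies only stand in for real model output.

```python
def detoxify_with_retry(generate, parse, reference, max_attempts=3, fallback=None):
    """Call generate until parse yields text with letters or digits."""
    for _ in range(max_attempts):
        candidate = parse(generate(reference))
        if any(ch.isalnum() for ch in candidate):
            return candidate
    # Every attempt was empty: let the caller decide what to do.
    return fallback


# Stand-in for the model: two unusable replies, then a usable one.
replies = iter(['"', '', 'Non-toxic text: Darn it.'])


def fake_generate(reference):
    return next(replies)


def strip_marker(raw):
    return raw.replace("Non-toxic text:", "").strip()


result = detoxify_with_retry(fake_generate, strip_marker, "Damn it.")
print(result)
```

In the notebook, `generate` could be `lambda ref: fs_llm_chain.run(reference=ref)` and `parse` could be `parse_llm_output`. Retrying only helps if sampling is non-deterministic, which is an assumption about the model settings.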

### Generate outputs for test data