# Experiment with Large Language Models

In [4]:
import pandas as pd


# The data must be downloaded and preprocessed first: run the
# 1.0-download-raw-data.ipynb and 1.2-data-preprocessing.ipynb notebooks.
data_path = '../data/internal/preprocessed_filtered.csv'
df = pd.read_csv(data_path, index_col=0)
df.head()

Unnamed: 0,reference,translation,similarity,lenght_diff,ref_tox,trn_tox
0,"if Alkar floods her with her mental waste, it ...","If Alkar is flooding her with psychic waste, t...",0.785171,0.010309,0.981983,0.014195
1,you're becoming disgusting.,Now you're getting nasty.,0.749687,0.071429,0.999039,0.065473
2,"well, we can spare your life.","Well, we could spare your life, for one.",0.919051,0.268293,0.985068,0.213313
3,"monkey, you have to wake up.","Ah! Monkey, you've got to snap out of it.",0.664333,0.309524,0.994215,0.053362
4,I have orders to kill her.,I've got orders to put her down.,0.726639,0.181818,0.999348,0.009402


## Few-shot prompting of Llama 2 using the LangChain framework

In [11]:
n_samples = 10
sampled = df.sample(n=n_samples)
zipped = zip(sampled['reference'], sampled['translation'])

examples = []
for ref, trn in zipped:
    examples.append({
        'reference': ref,
        'translation': trn
    })
examples

[{'reference': 'No, rosalee, That guy stuck a shotgun in my face.',
  'translation': 'no, Rosalee, the guy shoved the shotgun in my face.'},
 {'reference': 'and don\'t try to escape or I\'ll whip you. "',
  'translation': "``And don't try to escape or I'll beat you."},
 {'reference': 'then they shut up.', 'translation': 'They were silent then.'},
 {'reference': "It's just a pity the programs you make are such garbage.",
  'translation': "it's just a waste of the programs you make."},
 {'reference': 'I have no desire to kill Each of these tonight.',
  'translation': "I don't want to kill anyone tonight."},
 {'reference': "And that's when we teach them something every human knows- payback's a bitch.",
  'translation': 'and then we teach them something that every man sees - retaliation is evil.'},
 {'reference': 'Guys in long-term relationships become so lame.',
  'translation': "men make long-term patients' long-term."},
 {'reference': "Because I didn't kill him or jump off a building wi
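
The sampling above is unseeded, so the few-shot examples change on every run, and rows drawn later for testing could coincide with them. The following is a minimal sketch of one way to make the draw repeatable and keep the two sets disjoint. It uses a small stand-in frame (`toy`) instead of the real `df`, and the seed value and the `toy_*` names are assumptions chosen for illustration.

```python
import pandas as pd

# Tiny stand-in frame; in the notebook this role is played by `df`.
toy = pd.DataFrame({
    "reference": [f"ref {i}" for i in range(20)],
    "translation": [f"trn {i}" for i in range(20)],
})

SEED = 42  # assumed value; any fixed integer makes the draw repeatable

# Few-shot examples: seeded sample, converted directly to a list of dicts.
toy_shots = toy.sample(n=10, random_state=SEED)
toy_examples = toy_shots[["reference", "translation"]].to_dict("records")

# Test rows: sampled only from rows that were not used as examples.
toy_test = toy.drop(toy_shots.index).sample(n=2, random_state=SEED)

print(len(toy_examples), len(toy_test))
```

`to_dict("records")` yields the same list-of-dicts shape that the loop over `zipped` builds by hand.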

In [12]:
from langchain.llms import Ollama


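# Setup (run once in a terminal):
# 1. Install Ollama: https://ollama.ai/download
# 2. Start the server: ollama serve
# 3. Pull the model used below: ollama pull llama2-uncensored
llm = Ollama(model="llama2-uncensored")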

In [13]:
from langchain import PromptTemplate, FewShotPromptTemplate, LLMChain


example_template = """
Text with high toxic level: "{reference}"
Paraphrased text with low toxic level: "{translation}"
"""

example_prompt = PromptTemplate(
   input_variables=["reference", "translation"],
   template=example_template
)
prefix = """The following are examples of text detoxification. Reference is usually a text with a high toxicity level, while Translation is its paraphrased version with a low toxicity level. Your task is to write a Translation for the Reference so that its toxicity level is low. Here are some examples (do not do anything with them):
"""
suffix = """
Paraphrase this Reference to text with low toxic level: "{reference}"
"""


few_shot_prompt_template = FewShotPromptTemplate(
   examples=examples,
   example_prompt=example_prompt,
   prefix=prefix,
   suffix=suffix,
   input_variables=["reference"],
   example_separator="\n\n"
)
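
To see roughly what text the model receives, the assembly of a few-shot prompt can be approximated in plain Python: the prefix, the rendered examples, and the suffix are joined with the separator, and the input variable is filled in. This is an illustrative stand-in, not LangChain's actual implementation; `build_few_shot_prompt` and the shortened prefix are hypothetical.

```python
# Plain-Python approximation of how the few-shot prompt text is assembled.
# Illustrative stand-in only; this is not LangChain's implementation.
EXAMPLE_TEMPLATE = (
    '\nText with high toxic level: "{reference}"\n'
    'Paraphrased text with low toxic level: "{translation}"\n'
)


def build_few_shot_prompt(prefix, examples, suffix, separator="\n\n", **inputs):
    """Join prefix, rendered examples, and suffix, then fill in the inputs."""
    rendered = [EXAMPLE_TEMPLATE.format(**ex) for ex in examples]
    pieces = [prefix, *rendered, suffix.format(**inputs)]
    return separator.join(piece for piece in pieces if piece)


demo_examples = [
    {"reference": "you're becoming disgusting.",
     "translation": "Now you're getting nasty."},
]

prompt_text = build_few_shot_prompt(
    prefix="Here are some examples:",
    examples=demo_examples,
    suffix='Paraphrase this Reference to text with low toxic level: "{reference}"',
    reference="I have orders to kill her.",
)
print(prompt_text)
```

Printing the assembled text before calling the model is a cheap way to check that the examples and the final instruction appear exactly once and in the intended order.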

In [14]:
fs_llm_chain = LLMChain(
   prompt=few_shot_prompt_template,
   llm=llm
)
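
A chain built this way fills the prompt template itself, so it expects only the raw input variables. The sketch below uses a hypothetical stand-in (`fake_chain_run`, no LangChain involved) to illustrate what happens if already-formatted prompt text is passed as `reference`: the template is applied twice and the prompt ends up nested inside itself.

```python
# Hypothetical stand-in for a prompt-plus-model chain; no LangChain needed.
TEMPLATE = 'Examples: ...\nParaphrase this Reference: "{reference}"'


def fake_chain_run(reference):
    """Mimic a chain: fill the template and return the text sent to the model."""
    return TEMPLATE.format(reference=reference)


ref = "Damn it, man! I told you"

# Intended call: the chain receives only the raw input.
sent_once = fake_chain_run(ref)

# Pitfall: formatting first and passing the result nests the prompt in itself.
sent_twice = fake_chain_run(TEMPLATE.format(reference=ref))

print(sent_once.count("Examples:"))
print(sent_twice.count("Examples:"))
```

With a nested prompt, the model sees the examples block inside the text it is asked to paraphrase, which can lead it to repeat the examples instead of rewriting a single sentence.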

### Test

In [15]:
n_samples = 2
sampled_test = df.sample(n=n_samples)
print(sampled_test)
zipped_test = zip(sampled_test['reference'], sampled_test['translation'])

                                                reference  \
390696                           Damn it, man! I told you   
39468   And he set up a meeting between them... and th...   

                                              translation  similarity  \
390696                                  hell, I told you.    0.759935   
39468   and he should have arranged a meeting between ...    0.768622   

        lenght_diff   ref_tox   trn_tox  
390696     0.280000  0.999097  0.041374  
39468      0.121212  0.997157  0.001438  


In [16]:
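for ref, trn in zipped_test:
    print(ref)
    print(trn)
    # The chain fills few_shot_prompt_template itself, so pass only the raw
    # reference; pre-formatting it would embed the whole prompt inside itself.
    pred = fs_llm_chain.run(reference=ref)
    print(pred)
    print('===')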

Damn it, man! I told you
hell, I told you.
Here is a translation for the Reference so that toxicity level will be low:

- No, Rosalee. The guy stuck a shotgun in my face. - No, Rosalee. The guy shoved the shotgun in my face.
- And don't try to escape or I'll whip you." "And don't try to escape or I'll beat you."
- They were silent then. - They were silent then.
- It's just a pity the programs you make are such garbage. - it's just a waste of the programs you make.
- I have no desire to kill Each of these tonight. - I don't want to kill anyone tonight.
- And that's when we teach them something every human knows- payback's a bitch." "and then we teach them something that every man sees - retaliation is evil."
- Guys in long-term relationships become so lame. "men make long-term patients' long-term."
- Because I didn't kill him or jump off a building with him. "I didn't shoot him, he didn't even throw him off the roof."
- And now that your government has no further use of you... ...you ar