## Running Llama

```
conda deactivate             ## deactivate conda in case you have it active
venv\Scripts\activate.bat    ## activate your virtual environment
python -m llama_cpp.server --host 0.0.0.0 --model C:\Users\Sanne\venv\Lib\site-packages\llama_cpp\Meta-Llama-3.1-8B-Instruct.Q4_K_M.gguf --n_ctx 4096   ## launch the Llama model
```

If the Llama server was not shut down properly, run the commands below to terminate it. Replace `...` with the PID, which is the number in the last column of the first row that `netstat` returns.
```
netstat -ano | findstr :8000
taskkill /PID ... /F
```
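
Before relaunching, it can help to confirm from Python whether something is still listening on the port. The following is a minimal sketch using only the standard library, assuming the server runs on port 8000 as in the `netstat` command above. `port_in_use` is a hypothetical helper and not part of `llama_cpp`.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(8000)` returns `True` while a process is still bound to the port, in which case the `taskkill` step is needed before the server can be started again.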


In [3]:
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import corpus_bleu
from nltk.translate.meteor_score import meteor_score
from nltk.tokenize import word_tokenize
from collections import defaultdict
from random import sample, seed
from bert_score import score
from llama_cpp import Llama
import utils_llama as ulla
from evaluate import load
from openai import OpenAI
from tqdm import tqdm
import pandas as pd
import numpy as np
import replicate
import random
import pickle
import nltk
import re

In [50]:
## The dataset created in create_dataset.py 
with open("../Data/all_merged_data.pkl", "rb") as f:
    all_merged_data = pickle.load(f)

## Prompting Llama through the OpenAI client

In [70]:
## Read in the data
cleaned_data, df_dataset, df_by_gesture_count = ulla.clean_data_for_prompting(all_merged_data)
print(df_dataset)

         file                                           sentence  \
0    p15_gold                         all right put a block down   
1    p15_gold                 put a block one block apart behind   
2    p15_gold                                                 um   
3    p15_gold                                     put two blocks   
4    p15_gold  like about a third of a block a part so more c...   
..        ...                                                ...   
241  p42_gold  the first block is like is like like the first...   
242  p42_gold      and then the second block goes on top of that   
243  p42_gold                        and it's turned at an angle   
244  p42_gold                                     no on top yeah   
245  p42_gold   and then you do that same process all the way up   

                                            speech_amr  \
0    (p / put-01   :mode imperative   :ARG0 (y / yo...   
1    (p / put-01   :mode imperative  :ARG0 (y / you...   
2        

In [72]:
## Take 70% of sentences for train set, 30% for test set
df_example_set, df_test_set = ulla.split_data(df_dataset)

## Shuffle the dataframe for later when selecting the prompt examples
np.random.seed(42)
df_example_set = df_example_set.sample(frac=1).reset_index(drop=True)   
print(df_test_set['sentence'][:50])

0             so put put a block on the back block good
1                                        put two blocks
2                                                    um
3                           and then put one on the top
4          no that doesn't look like it’s going to work
5                                                   yes
6                                      so jiggle the go
7                                           good enough
8              and then jiggle the two in front of them
9                  take another block put it next to it
10                                                 okay
11                one block on top of each of those two
12                                         perfect grab
13    and then you’re going to take one of them and ...
14                                     only two of them
15            place it on the two that are diagonal and
16                                       flip those two
17                         keep going a little b

In [88]:
## Duplicate each test sentence so that it appears once for every condition --> minimal pairs
## Add prompt types to each sentence
scenarios = ["speech", "gesture", "speech_gesture"]
expanded_rows_test = []
for _, row in df_test_set.iterrows():
    for scenario in scenarios:
        new_row = row.copy()
        new_row["prompt_type"] = scenario

        # Include info according to prompt_type
        if scenario == "speech":
            new_row["gesture_amrs"] = None
            new_row["gesture_labels"] = None
        elif scenario == "gesture":
            new_row["speech_amr"] = None
        expanded_rows_test.append(new_row)

df_test = pd.DataFrame(expanded_rows_test)
df_test["group_id"] = df_test.groupby(["file", "sentence"]).ngroup()
print(df_test[30:40])

        file                                           sentence  \
10  p16_gold                                               okay   
10  p16_gold                                               okay   
10  p16_gold                                               okay   
11  p16_gold              one block on top of each of those two   
11  p16_gold              one block on top of each of those two   
11  p16_gold              one block on top of each of those two   
12  p17_gold                                       perfect grab   
12  p17_gold                                       perfect grab   
12  p17_gold                                       perfect grab   
13  p17_gold  and then you’re going to take one of them and ...   

                                           speech_amr  \
10                                         (o / okay)   
10                                               None   
10                                         (o / okay)   
11  (b / block   :quant 1   :locat
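
One thing worth checking after this expansion: `group_id` is derived from `(file, sentence)`, so if the same sentence occurs more than once in a file (short utterances such as "okay" are likely candidates), those rows would share one `group_id` and the group would hold more than three rows. The sketch below mirrors the expansion with plain dictionaries and flags such groups. It is an illustration under that assumption, and `expand_minimal_pairs` and `oversized_groups` are hypothetical names, not functions from `utils_llama`.

```python
from collections import Counter

SCENARIOS = ("speech", "gesture", "speech_gesture")


def expand_minimal_pairs(rows):
    """Copy each row once per scenario, blanking the inputs that scenario hides."""
    expanded = []
    for row in rows:
        for scenario in SCENARIOS:
            new_row = dict(row, prompt_type=scenario)  # shallow copy plus new key
            if scenario == "speech":
                new_row["gesture_amrs"] = None
                new_row["gesture_labels"] = None
            elif scenario == "gesture":
                new_row["speech_amr"] = None
            expanded.append(new_row)
    return expanded


def oversized_groups(expanded):
    """Return (file, sentence) keys that have more rows than there are scenarios."""
    counts = Counter((r["file"], r["sentence"]) for r in expanded)
    return {key: n for key, n in counts.items() if n > len(SCENARIOS)}
```

An empty result from `oversized_groups` means every `(file, sentence)` key maps to exactly one minimal triple. A non-empty result suggests adding the original row index to the grouping key.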

In [516]:
## Store the dataframes (context managers make sure the files are closed)
test_dataframe = "test_df.pkl"
example_dataframe = "example_df.pkl"
with open(test_dataframe, "wb") as f:
    pickle.dump(df_test, f)
with open(example_dataframe, "wb") as f:
    pickle.dump(df_example_set, f)

In [498]:
# Point the OpenAI client at the local llama_cpp server started above
client = OpenAI(base_url="http://localhost:8000/v1", api_key="cltl")

In [84]:
# def write_prompt(prompt_type, speech_amr=None, gesture_amrs=None, gesture_labels=None, examples=None):
#     """
#     Generate prompt based on task type
#     """
#     if prompt_type == "speech":
#         prompt = "Given the following speech Abstract Meaning Representation (AMR), generate a corresponding sentence in natural spoken English. Provide a short explanation."
#         if examples:
#             prompt += " First, read the examples to understand the Abstract Meaning Representation format:\n"
#             for i, ex in enumerate(examples):
#                 prompt += f"\nExample {i+1}:\n"
#                 prompt += f"Sentence: {ex['sentence']}\n"
#                 prompt += f"Speech AMR:\n {ulla.pretty_amr(ex['speech_amr'])}\n"
                
#             prompt += "\nNow, generate a sentence from the following speech AMR and explain your reasoning. Please provide the output only in json format:\n"
#             prompt += '[{"sentence": "Your generated sentence here.", "explanation": "Your explanation here."}]\n'
#             prompt += f"Speech AMR:\n {ulla.pretty_amr(speech_amr)}\n"
#             #prompt += f"Sentence:\nExplanation:\n"
#             return prompt
            
#     elif prompt_type == "gesture":
#         prompt = "Given the following gesture label(s) and gesture Abstract Meaning Representation (AMR), generate a corresponding sentence and speech AMR in natural spoken English. Provide a short explanation."
#         if examples:
#             prompt += " First, read the examples to understand the Abstract Meaning Representation format:\n"
#             for i, ex in enumerate(examples):
#                 prompt += f"\nExample {i+1}:\n"
#                 prompt += f"Sentence: {ex['sentence']}\n"
#                 prompt += f"Speech AMR:\n {ulla.pretty_amr(ex['speech_amr'])}\n"
#                 prompt += f"\nGesture label(s): \n" + "\n".join(ex['gesture_labels']) + "\n"
#                 prompt += f"\nGesture AMR:\n"   #{ulla.pretty_amr(ex['gesture_amrs'])}\n
#                 for amr in ex["gesture_amrs"]:
#                     prompt += ulla.pretty_amr(amr) + "\n"
                    
#             prompt += "\nNow, generate a sentence and the corresponding speech AMR from the following gesture label(s) and gesture AMR and explain your reasoning. Please provide the output only in json format. The speech AMR should be on one line:\n"
#             prompt += '[{"sentence": "Your generated sentence here.", "speech AMR:" "Your generated speech AMR here", "explanation": "Your explanation here."}]\n'
#             prompt += f"Gesture label(s):\n" + "\n".join(gesture_labels) + "\n"
#             prompt += f"Gesture AMR:\n"      #{ulla.pretty_amr(gesture_amrs)}\n"
#             for amr in gesture_amrs:
#                     prompt += ulla.pretty_amr(amr) + "\n"
#             #prompt += f"Sentence:\nSpeech AMR:\nExplanation:\n"
#             return prompt

#     elif prompt_type == "speech_gesture":
#         prompt = "Given the following gesture label(s), gesture Abstract Meaning Representation (AMR) and speech AMR, generate a corresponding sentence in natural spoken English. Provide a short explanation."
#         if examples:
#             prompt += " First, read the examples to understand the Abstract Meaning Representation format:\n"
#             for i, ex in enumerate(examples):
#                 prompt += f"\nExample {i+1}:\n"
#                 prompt += f"Sentence: {ex['sentence']}\n"
#                 prompt += f"Speech AMR:\n {ulla.pretty_amr(ex['speech_amr'])}\n"
#                 prompt += f"\nGesture label(s): \n" + "\n".join(ex['gesture_labels']) + "\n"
#                 prompt += f"\nGesture AMR:\n"  #{ulla.pretty_amr(ex['gesture_amrs'])}\n
#                 for amr in ex["gesture_amrs"]:
#                     prompt += ulla.pretty_amr(amr) + "\n" 
                    
#             prompt += "\nNow, generate a sentence from the following gesture label(s), gesture AMR and speech AMR and explain your reasoning. Please provide the output only in json format:\n"
#             prompt += '[{"sentence": "Your generated sentence here.", "explanation": "Your explanation here."}]\n'
#             prompt += f"Gesture label(s):\n" + "\n".join(gesture_labels) + "\n"
#             prompt += f"Gesture AMR:\n"      #{ulla.pretty_amr(gesture_amrs)}\n"
#             for amr in gesture_amrs:
#                     prompt += ulla.pretty_amr(amr) + "\n"
#             prompt += f"Speech AMR:\n {ulla.pretty_amr(speech_amr)}\n"
#             #prompt += f"Sentence:\nExplanation:\n"
#             return prompt
    
#     else:
#         raise ValueError(f"Invalid prompt_type: {prompt_type}")

# def generate_prompt(df_test, examples):
#     """
#     Build one prompt per test row, each with two examples taken in rotation from `examples`.
#     """
#     prompts = []
#     #random.seed(seed_value)  # Set the seed to make it reproducible
    
#     example_idx = 0

#     for idx, row in df_test.iterrows():
#         #sampled_examples = random.sample(examples, 2)
#         sampled_examples = [examples[example_idx], examples[(example_idx + 1) % len(examples)]] 
#         example_idx = (example_idx + 2) % len(examples)
        
#         prompt = write_prompt(
#             prompt_type=row["prompt_type"],
#             speech_amr=row.get("speech_amr"),
#             gesture_amrs=row.get("gesture_amrs"),
#             gesture_labels=row.get("gesture_labels"),
#             examples=sampled_examples  
#         )
    
#         prompts.append({
#             "prompt": prompt,
#             "meta": {
#                 "sentence": row["sentence"],
#                 "scenario": row["prompt_type"],
#                 "file": row["file"]
#             }
#         })
#     return prompts

# def query_LLM(model_client, prompt, temp=0.3):
#     history = [
#         {"role": "system", "content": prompt},
#     ]
#     completion = model_client.chat.completions.create(
#         model="local-model", # this field is currently unused
#         messages=history,
#         temperature=temp,
#         #max_tokens=2048,
#         top_p=0.9,
#         stream=False,
#         seed=2201
#     )
#     return completion.choices[0].message.content

# def query_LLM_multiple(model_client, prompt, temp=0.3, n=3):
#     completions = []
#     for i in range(n):
#         response = model_client.chat.completions.create(
#             model="local-model",  
#             messages=[{"role": "system", "content": prompt}],
#             temperature=temp,
#             max_tokens=2048,
#             stream=False,
#             seed=13 + i                    ## Slightly vary seed for different outputs
#         )
#         completions.append(response.choices[0].message.content)
#     return completions
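
The example selection in the commented-out `generate_prompt` is a rotation rather than random sampling: each prompt gets two consecutive examples, and the index then advances by two and wraps around. The sketch below isolates that logic, assuming `ulla.generate_prompt` behaves like the commented-out version. `rotating_pairs` is a hypothetical name used only for illustration.

```python
def rotating_pairs(examples, n_prompts):
    """Return one pair of consecutive examples per prompt, advancing by two and wrapping."""
    pairs = []
    idx = 0
    for _ in range(n_prompts):
        first = examples[idx]
        second = examples[(idx + 1) % len(examples)]
        pairs.append((first, second))
        idx = (idx + 2) % len(examples)
    return pairs


# With three examples and three prompts the rotation wraps around:
print(rotating_pairs(["a", "b", "c"], 3))
# [('a', 'b'), ('c', 'a'), ('b', 'c')]
```

A consequence of this design is that consecutive rows receive different example pairs, so the three scenario variants of one test sentence are not prompted with the same two examples.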

In [78]:
## Turn example df into a list of dicts
examples = []

for idx, row in df_example_set.iterrows():
    examples.append({
        "sentence": row.get("sentence"),
        "speech_amr": row.get("speech_amr"),
        "gesture_amrs": row.get("gesture_amrs"),
        "gesture_labels": row.get("gesture_labels"),
        "num_gesture_amrs": row.get("num_gesture_amrs"),
    })

## These are all the example sentences that will be given to Llama.
for item in examples:
    print(item['sentence'])

okay
space two out a little less than a block length
and you can you can place them close to each other
it cannot fall down from the ground right
on the side with three the one closest to the middle put another block on top
now these are a little jiggled
you're done
put a bit a bit right uh
it starts in the top left
yea yea and then it
it's gonna be a pyramid from three of the rows of two
the three blocks all touch and they’re straight
okay
stack stack it up no
move that block uh
they’re in other words are not perfectly clear
them towards you
gonna have another block right next to that block so it’d be over the space of those two bottom blocks
just one
and go
the base is going to have four second one’s going to have three then two on top of that and then one
start off with just a block and then put a block on top of
okay and
stack three blocks on one side yup
like about a third of a block a part so more close in than that
and then the second block goes on top of that
great yep
seven bl

In [90]:
## Generate and print all prompts that will be given to Llama, to check that the inputs are correct
prompts = ulla.generate_prompt(df_test, examples)
for item in prompts:
    print(item['prompt'])

Given the following speech Abstract Meaning Representation (AMR), generate a corresponding sentence in natural spoken English. Provide a short explanation. First, read the examples to understand the Abstract Meaning Representation format:

Example 1:
Sentence: okay
Speech AMR:
 (o/okay-04)

Example 2:
Sentence: space two out a little less than a block length
Speech AMR:
 (s/space-01
	:mode imperative
	:ARG0 (y/you)
	:ARG1 (i/implicit-role
		:quant 2)
	:ARG2 (q/distance-quantity
		:unit (b/block)
		:ARG1-of (h/have-quant-91
			:ARG2 1
			:ARG3 (l/less
				:mod (l2/little)))))

Now, generate a sentence from the following speech AMR and explain your reasoning. Please provide the output only in json format:
[{"sentence": "Your generated sentence here.", "explanation": "Your explanation here."}]
Speech AMR:
 (p/put-01
	:mode imperative
	:ARG0 (y/you)
	:ARG1 (b/block
		:quant 1)
	:ARG2 (b2/block
		:mod (b3/back)))

Given the following gesture label(s) and gesture Abstract Meaning Representat

In [358]:
random.seed(12)

final_all_results = []
prompts = ulla.generate_prompt(df_test, examples)
total_prompts = len(prompts)
for prompt in tqdm(prompts, total=total_prompts, desc="Processing", unit="row"):
    call_llama = ulla.query_LLM_multiple(client, prompt["prompt"], temp=0.3, n=3)
    final_all_results.append({
        "prompt": prompt["prompt"],
        "sentence": prompt["meta"]["sentence"],
        "scenario": prompt["meta"]["scenario"],
        "file": prompt["meta"]["file"],
        "llama_1": call_llama[0],
        "llama_2": call_llama[1],
        "llama_3": call_llama[2]
    })

Processing: 100%|██████████| 198/198 [5:54:56<00:00, 107.56s/row]  


In [389]:
## Save the results of the official run
last_results_llama = 'last_results_llama.pkl'
with open(last_results_llama, 'wb') as f:
    pickle.dump(final_all_results, f)

In [92]:
## Open the FINAL results
last_results_llama = 'last_results_llama.pkl'
with open(last_results_llama, 'rb') as f:
    official_results = pickle.load(f)
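
The prompts ask for output in the form `[{"sentence": ..., "explanation": ...}]`, so the generated sentence has to be pulled out of each raw response before it can be scored. The sketch below is one possible way to do that with the standard library, assuming the model mostly follows the template but may wrap the JSON in extra text. `extract_sentence` is a hypothetical helper, not a function from `utils_llama`.

```python
import json
import re


def extract_sentence(response: str):
    """Return the "sentence" field of a model response, or None if it cannot be parsed."""
    # Take the first [...] or {...} span, since the JSON may be surrounded by other text
    match = re.search(r"\[.*\]|\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # The template is a list holding one object; also accept a bare object
    if isinstance(parsed, list):
        parsed = parsed[0] if parsed else {}
    if not isinstance(parsed, dict):
        return None
    return parsed.get("sentence")
```

Applied to a stored result, this would look like `extract_sentence(item["llama_1"])`. Responses that return `None` are worth inspecting by hand rather than silently dropping.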

In [102]:
## Print the results from the official run. You can play around with what you want to print.
## You can look up specific sentences to see Llama's output for example.
for item in official_results:
    print(f"Sentence: {item['sentence']}\nScenario: {item['scenario']}\nPrompt:\n{item['prompt']}") 
    # print(f"llama 1:\n{item['llama_1']}\nllama 2:\n{item['llama_2']}\nllama 3:\n{item['llama_3']}")
    print(120*"-")

Sentence: so put put a block on the back block good
Scenario: speech
Prompt:
Given the following speech Abstract Meaning Representation (AMR), generate a corresponding sentence in natural spoken English. Provide a short explanation. First, read the examples to understand the Abstract Meaning Representation format:

Example 1:
Sentence: okay
Speech AMR:
 (o/okay-04)

Example 2:
Sentence: space two out a little less than a block length
Speech AMR:
 (s/space-01
	:mode imperative
	:ARG0 (y/you)
	:ARG1 (i/implicit-role
		:quant 2)
	:ARG2 (q/distance-quantity
		:unit (b/block)
		:ARG1-of (h/have-quant-91
			:ARG2 1
			:ARG3 (l/less
				:mod (l2/little)))))

Now, generate a sentence from the following speech AMR and explain your reasoning. Please provide the output only in json format:
[{"sentence": "Your generated sentence here.", "explanation": "Your explanation here."}]
Speech AMR:
 (p/put-01
	:mode imperative
	:ARG0 (y/you)
	:ARG1 (b/block
		:quant 1)
	:ARG2 (b2/block
		:mod (b3/back)))

