
# Intro to ML from an LLM standpoint
# Week 2. Practice session

## Setting up

Start by installing the libraries. Then make sure your OpenAI and Nebius API keys are stored in the files `openai_api_key` and `nebius_api_key`, which the cell below reads.

In [None]:
!pip install -q openai tiktoken

In [None]:
import os

with open("openai_api_key", "r") as file:
    openai_api_key = file.read().strip()

os.environ["OPENAI_API_KEY"] = openai_api_key

with open("nebius_api_key", "r") as file:
    nebius_api_key = file.read().strip()

os.environ["NEBIUS_API_KEY"] = nebius_api_key

# Classifying dialogue roles

In this task, we'll work with the [DialogRE benchmark](https://github.com/nlpdata/dialogre) and classify roles in dialogs using Llama models and Nebius AI Studio.

Let's start by loading the dataset.

In [None]:
import json

with open('dialogre_dev.json', 'r') as f:
    dialog_data_raw = json.load(f)

It's important to look at the data.

In [None]:
len(dialog_data_raw)

358

In [None]:
dialog_data_raw[0]

[['Speaker 1: Hey!',
  'Speaker 2: Hey.',
  "Speaker 3: Hey, man. What's up?",
  "Speaker 1: Maybe you can tell me. My agent would like to know why I didn't show up at the audition I didn't know I had today. The first good thing she gets me in weeks. How could you not give me the message?!",
  "Speaker 3: Well, I'll tell ya I do enjoy guilt, but, ah, it wasn't me.",
  'Speaker 2: Yes, it was! It was him! Uh huh! Okay, it was me!',
  'Speaker 1: How is it you?',
  "Speaker 2: Well, it was just, it was all so crazy, you know. I mean, Chandler was in the closet, counting to 10, and he was up to 7 and I hadn't found a place to hide yet. I-I-I meant to tell you, and I wrote it all down on my hand. See, all of it.",
  "Speaker 1: Yep, that's my audition.",
  'Speaker 4: See, now this is why I keep notepads everywhere.',
  "Speaker 2: Yep, and that's why we don't invite you to play.",
  'Speaker 5: What is the great tragedy here? You go get yourself another appointment.',
  'Speaker 1: Well, 

As you can see, many different character roles are labeled for every dialog.

Here is the list of all roles:

In [None]:
all_relations = []
for dialog in dialog_data_raw:
    for relation in dialog[1]:
        for individual_relation in relation['r']:
            if individual_relation not in all_relations:
                all_relations.append(individual_relation)
all_relations

['per:title',
 'per:alternate_names',
 'per:client',
 'per:friends',
 'unanswerable',
 'per:spouse',
 'per:children',
 'per:parents',
 'per:age',
 'per:siblings',
 'per:roommate',
 'per:negative_impression',
 'per:pet',
 'per:positive_impression',
 'per:girl/boyfriend',
 'org:employees_or_members',
 'per:employee_or_member_of',
 'per:dates',
 'per:boss',
 'per:subordinate',
 'per:other_family',
 'org:students',
 'per:major',
 'per:schools_attended',
 'per:origin',
 'gpe:visitors_of_place',
 'per:visited_place',
 'per:alumni',
 'per:works',
 'per:place_of_residence',
 'gpe:residents_of_place',
 'per:place_of_work',
 'per:date_of_birth',
 'per:acquaintance',
 'per:neighbor',
 'gpe:births_in_place',
 'per:place_of_birth']

We won't use all roles, only the following ones:

In [None]:
our_relations = [
    'per:friends', 'per:spouse', 'per:children', 'per:parents',
    'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate'
]

In [None]:
dialog_data = []
for dialog in dialog_data_raw:
    current_dialog = dialog[0]
    current_relations = []
    for relation in dialog[1]:
        if relation['r'][0] in our_relations:
            current_relations.append({
                'x': relation['x'],
                'y': relation['y'],
                'r': relation['r'][0]
            })
    if len(current_relations) > 0:
        dialog_data.append([current_dialog, current_relations])

In [None]:
len(dialog_data)

180

In [None]:
dialog_data[1]

[['Speaker 1, Speaker 2: Hi',
  'Speaker 3: Hi! Hey mom.',
  'Speaker 4: This is such a great party! 35 years. Very impressive, do you guys have any pearls of wisdom?',
  'Speaker 2: Jack?',
  'Speaker 1: Why would you serve food on such a sharp stick?',
  'Speaker 3: That’s a good question, dad. That’s a good question…',
  'Speaker 4: Hmmm….'],
 [{'x': 'Speaker 1', 'y': 'Speaker 2', 'r': 'per:spouse'},
  {'x': 'Speaker 1', 'y': 'Speaker 3', 'r': 'per:children'},
  {'x': 'Speaker 2', 'y': 'Speaker 1', 'r': 'per:spouse'},
  {'x': 'Speaker 2', 'y': 'Speaker 3', 'r': 'per:children'},
  {'x': 'Jack', 'y': 'Speaker 3', 'r': 'per:children'},
  {'x': 'Speaker 3', 'y': 'Speaker 2', 'r': 'per:parents'},
  {'x': 'Speaker 3', 'y': 'Speaker 1', 'r': 'per:parents'},
  {'x': 'Speaker 3', 'y': 'Jack', 'r': 'per:parents'}]]

Moreover, we will predict a role for only one pair of characters from each dialog.
And to save time and money, we'll only take the first 50 dialogues from the `dev` set.

In [None]:
import numpy as np

np.random.seed(28)
dialog_data_short = [[dialog, np.random.choice(relations)] for dialog, relations in dialog_data[:50]]

In [None]:
dialog_data_short[1]

[['Speaker 1, Speaker 2: Hi',
  'Speaker 3: Hi! Hey mom.',
  'Speaker 4: This is such a great party! 35 years. Very impressive, do you guys have any pearls of wisdom?',
  'Speaker 2: Jack?',
  'Speaker 1: Why would you serve food on such a sharp stick?',
  'Speaker 3: That’s a good question, dad. That’s a good question…',
  'Speaker 4: Hmmm….'],
 {'x': 'Speaker 1', 'y': 'Speaker 3', 'r': 'per:children'}]

In [None]:
verdicts_true = [relations['r'] for _, relations in dialog_data_short]

It's also important to look at the target label distribution. In our case it's not balanced: there are many more friends and girl/boyfriends than other roles.

In [None]:
from collections import Counter

relations_counter = Counter(verdicts_true)

relations_counter

Counter({'per:friends': 11,
         'per:children': 3,
         'per:parents': 4,
         'per:siblings': 6,
         'per:spouse': 3,
         'per:girl/boyfriend': 16,
         'per:boss': 5,
         'per:subordinate': 2})
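
The imbalance gives us a useful reference point: a classifier that always predicts the most frequent label. The sketch below computes that majority-class baseline from the counts printed above (the name `label_counts` is introduced here for illustration, so that it doesn't overwrite `relations_counter`).

```python
from collections import Counter

# Label counts copied from the Counter output above.
label_counts = Counter({
    'per:friends': 11, 'per:children': 3, 'per:parents': 4,
    'per:siblings': 6, 'per:spouse': 3, 'per:girl/boyfriend': 16,
    'per:boss': 5, 'per:subordinate': 2,
})

total = sum(label_counts.values())
majority_label, majority_count = label_counts.most_common(1)[0]

# Accuracy of a classifier that always predicts the most frequent label.
majority_baseline = majority_count / total
print(majority_label, majority_baseline)
```

Any model we build should score above this number to be worth its API calls.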

## Take a deep breath and work out this problem step by step (Chain-of-Thought prompting / Reasoning)

This is a magic phrase that was used back in 2023 to make LLMs produce not only answers but full solutions. It quickly became clear that in solution-writing mode LLMs are more likely to produce correct answers, so today's models are all trained to provide reasoning. (You'd actually have to convince an LLM to provide only an answer without a solution.)

Now, let's create an LLM-based classifier for our task.

In [None]:
class RelationClassifier():
    def __init__(self, client, model):
        self.client = client
        self.model = model

    def predict(self, dialog, character_x, character_y, verbose=False):
        completion = self.client.chat.completions.create(
            messages=[
            {
            "role": "user",
            "content": f"""You are an expert in Natural Language Understanding.
You are given a dialog and two characters participating or mentioned in this dialog.
You need to predict relationships between these characters choosing from the following list:
['per:friends', 'per:spouse', 'per:children', 'per:parents', 'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate']
Provide a clear reasoning justifying your choice.

DIALOG: {dialog}

FIRST CHARACTER: {character_x}

SECOND CHARACTER: {character_y}

ANSWER"""
                }
            ],
            model=self.model,
            )

        if verbose:
            return {
                "completion": completion,
                "answer": completion.choices[0].message.content
            }
        else:
            return completion.choices[0].message.content

Let's use this classifier.

In [None]:
from openai import OpenAI
client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ.get("NEBIUS_API_KEY"),
)

classifier_cot = RelationClassifier(client=client, model="meta-llama/Meta-Llama-3.1-405B-Instruct")

In [None]:
# tqdm shows a progress bar while the loop runs
from tqdm import tqdm

verdicts_raw = []
for dialog, relations in tqdm(dialog_data_short[:5]):
    verdicts_raw.append(classifier_cot.predict(dialog, relations['x'], relations['y']))

100%|██████████| 5/5 [00:36<00:00,  7.22s/it]


Let's look at the answers:

In [None]:
verdicts_raw

['The relationship between the two characters is: \'per:friends\'.\n\nReasoning:\n \n* The tone of the conversation between the two characters is casual and joking, indicating a friendly relationship.\n* They engage in playful banter, with Speaker 1 becoming frustrated but Speaker 2 attempting to make light of the situation and offer a solution.\n* They appear to be comfortable with each other, with Speaker 2 teasing Speaker 1 and Speaker 1 being exasperated but ultimately good-natured about it.\n* Speaker 2 is willing to go out of their way to help Speaker 1, calling the casting director and pretending to be someone from a different office in order to secure another audition for Speaker 1. This indicates a level of commitment and loyalty to their friendship.\n* The fact that they are socializing and playing a game (as indicated by Speaker 4\'s statement "Now you want a pad" and Speaker 2\'s earlier statement "and that\'s why we don\'t invite you to play") suggests that they are friend

The answers seem to be correct, but their format is unpredictable, so extracting the final label from them is not straightforward. We'll actually need another LLM call for that!
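
To see why a plain string search falls short, consider that free-form reasoning often mentions several candidate labels before settling on one. The sketch below uses a hypothetical helper, `find_mentioned_labels`, and a made-up example string; it is an illustration only, not part of the classifier.

```python
OUR_RELATIONS = [
    'per:friends', 'per:spouse', 'per:children', 'per:parents',
    'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate'
]

def find_mentioned_labels(text, labels=OUR_RELATIONS):
    """Return every label that occurs verbatim in `text`, ordered by first appearance."""
    hits = [(text.find(label), label) for label in labels if label in text]
    return [label for _, label in sorted(hits)]

# Made-up reasoning text that weighs two options.
example = ("The pair could be 'per:friends', but the word \"dad\" "
           "points to 'per:parents'.")
print(find_mentioned_labels(example))
```

The search finds two labels and has no way of telling which one is the final answer.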

## Chaining

Chaining means making several LLM calls in sequence, with each call consuming the output of the previous one. In our case, we'll have a very simple chain:

```
LLM_1: dialog + character pair -> reasoning
LLM_2: reasoning -> answer
```
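
The data flow of such a chain can be sketched without any API calls. In the snippet below, `run_chain` and the two `fake_*` functions are hypothetical stand-ins for real LLM calls, used only to show how one step's output becomes the next step's input.

```python
def run_chain(steps, initial_input):
    """Feed the output of each step into the next one and return the last output."""
    value = initial_input
    for step in steps:
        value = step(value)
    return value

# Stand-ins for real LLM calls (hypothetical, for illustration only).
def fake_reasoning_llm(dialog):
    return "Speaker 3 calls Speaker 1 'dad'. VERDICT: per:parents"

def fake_extraction_llm(reasoning):
    return reasoning.split("VERDICT:")[-1].strip()

chain_output = run_chain([fake_reasoning_llm, fake_extraction_llm],
                         "Speaker 3: Hey, dad.")
print(chain_output)
```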

In [None]:
class RelationClassifierChain():
    def __init__(self, client, model):
        self.client = client
        self.model = model

    def predict(self, dialog, character_x, character_y, verbose=False):
        reasoning_completion = self.client.chat.completions.create(
            messages=[
                {
            "role": "user",
            "content": f"""You are an expert in Natural Language Understanding.
You are given a dialog and two characters participating or mentioned in this dialog.
You need to predict relationships between these characters choosing from the following list:
['per:friends', 'per:spouse', 'per:children', 'per:parents', 'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate']
Provide a clear reasoning justifying your choice. Then write your final answer after VERDICT.
Now, take a deep breath and work out this problem step by step. If you do well, I'll tip you 200$.

DIALOG: {dialog}

FIRST CHARACTER: {character_x}

SECOND CHARACTER: {character_y}

REASONING:"""
                }
            ],
            model=self.model,
            )
        reasoning = reasoning_completion.choices[0].message.content

        extraction_completion = self.client.chat.completions.create(
            messages=[
            {
            "role": "user",
            "content": f"""You are a helpful assistant.
You are given the following reasoning which justifies a verdict from the following list:
['per:friends', 'per:spouse', 'per:children', 'per:parents', 'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate']
Extract the verdict. Only output the verdict from the list.

REASONING: {reasoning}

VERDICT: """
                }
            ],
            model=self.model,
            )

        verdict = extraction_completion.choices[0].message.content

        if verbose:
            return {
                "reasoning_completion": reasoning_completion,
                "extraction_completion": extraction_completion,
                "verdict": verdict
            }
        else:
            return verdict

In [None]:
classifier_chain = RelationClassifierChain(
    client=client, model="meta-llama/Meta-Llama-3.1-405B-Instruct"
    )

It's a good idea to start logging the results.

In [None]:
completions_log = dict() # Raw completions
verdicts_log = dict() # Final verdicts

In [None]:
# tqdm shows a progress bar while the loop runs
from tqdm import tqdm

current_configuration = "Meta-Llama-3.1-405B-Instruct, chain"
results = []
# To save time, you can run this on a slice, e.g. dialog_data_short[:10]
for dialog, relations in tqdm(dialog_data_short):
    results.append(classifier_chain.predict(dialog, relations['x'], relations['y'], verbose=True))

completions_log[current_configuration] = results

100%|██████████| 50/50 [08:53<00:00, 10.66s/it]


Let's look at the results:

In [None]:
results[1]

{'reasoning_completion': ChatCompletion(id='chat-96edbd003f1c489f93aa2d75dd2a958f', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='To determine the relationship between Speaker 1 and Speaker 3, let\'s analyze the dialog provided.\n\n1. Speaker 3 greets with "Hi! Hey mom." Initially, this could suggest that Speaker 3 is addressing their mother, but since Speaker 3 later addresses Speaker 1 as "dad," we can infer that Speaker 3 was addressing their mother who is not participating in this part of the conversation or is Speaker 4. It doesn\'t impact our main relationship in question.\n\n2. The critical clue is when Speaker 3 says, "That’s a good question, dad. That’s a good question…" This line directly addresses Speaker 1 as "dad," which indicates a parental relationship.\n\nGiven these points, the relationship between Speaker 1 and Speaker 3 can be classified under the \'per:parents\' category since Speaker 1 is the parent (father) of

In [None]:
verdicts_raw = [result["verdict"] for result in results]
print(verdicts_raw)

['per:friends', "'per:parents'", "'per:parents'", 'per:friends', 'per:spouse', 'per:friends', 'per:girl/boyfriend', "'per:boss'", 'per:spouse', 'per:girl/boyfriend', 'per:boss', 'per:girl/boyfriend', 'per:subordinate', 'per:girl/boyfriend', 'per:girl/boyfriend', 'per:boss', 'per:friends', 'per:girl/boyfriend', 'per:spouse', 'per:friends', 'per:friends', 'per:girl/boyfriend', 'per:girl/boyfriend', 'per:friends', 'per:girl/boyfriend', 'per:siblings', 'per:boss', 'per:friends', 'per:parents', 'per:girl/boyfriend', 'per:friends', 'per:friends', 'per:spouse', 'per:friends', 'per:siblings', "'per:girl/boyfriend'", "'per:siblings'", 'per:girl/boyfriend', 'per:parents', 'per:friends', 'per:friends', 'per:girl/boyfriend', 'per:siblings', 'per:girl/boyfriend', 'per:girl/boyfriend', 'per:parents', 'per:children', 'per:boss', 'per:subordinate', 'per:friends']


There is a slight format inconsistency (some verdicts are wrapped in quotes), but it can be fixed with simple postprocessing.

In [None]:
verdicts = [verdict.strip('"').strip("'") for verdict in verdicts_raw]
verdicts_log[current_configuration] = verdicts

Now, let's check the accuracy of our predictions.

In [None]:
import numpy as np
def accuracy_score(y_true, y_pred):
    return sum(np.array(y_true) == np.array(y_pred)) / len(y_true)

accuracy_score(verdicts_true, verdicts)

0.74

That's not bad for a start!
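
Because the labels are imbalanced, overall accuracy can hide weak spots on rare roles. A minimal sketch of a per-label breakdown follows; `per_label_accuracy` is a hypothetical helper and the toy lists are made up for illustration, not taken from the run above.

```python
from collections import defaultdict

def per_label_accuracy(y_true, y_pred):
    """Return {label: share of examples with that true label predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted_label in zip(y_true, y_pred):
        total[true_label] += 1
        correct[true_label] += int(true_label == predicted_label)
    return {label: correct[label] / total[label] for label in total}

# Toy data (made up for illustration, not the notebook's results).
toy_true = ['per:friends', 'per:friends', 'per:boss', 'per:spouse']
toy_pred = ['per:friends', 'per:spouse', 'per:boss', 'per:friends']
toy_breakdown = per_label_accuracy(toy_true, toy_pred)
print(toy_breakdown)
```

In the notebook, the same function could be called as `per_label_accuracy(verdicts_true, verdicts)`.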

## A smaller LLM

In [None]:
classifier_chain_8b = RelationClassifierChain(
    client=client, model="meta-llama/Meta-Llama-3.1-8B-Instruct"
    )

current_configuration = "Meta-Llama-3.1-8B-Instruct, chain"
results = []
# To save time, you can run this on a slice, e.g. dialog_data_short[:10]
for dialog, relations in tqdm(dialog_data_short):
    results.append(classifier_chain_8b.predict(dialog, relations['x'], relations['y'], verbose=True))

completions_log[current_configuration] = results

100%|██████████| 50/50 [04:13<00:00,  5.07s/it]


In [None]:
results[-3]

{'reasoning_completion': ChatCompletion(id='chat-934dd724a2814e11847567c550d93a25', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Based on the dialog, I predict the relationship between Mr. Kostelick and Speaker 2 as \'per:boss\'.\n\nReasoning:\n\n1. Speaker 2 is mentioned as needing to stop by Mr. Kostelick\'s office at the end of the day (Speaker 3\'s line).\n2. Mr. Kostelick is referred to as "Mr. Kostelick wants you to stop by his office at the end of the day" (Speaker 3\'s line), indicating that Speaker 2 reports to or is supervised by Mr. Kostelick.\n3. Speaker 2 is also defensive when accused of being involved in a prank memo, and mentions "Mr. Kostelick" in the context of work (Speaker 2\'s lines "If this is about those prank memos, I had nothing to do with them.").\n4. Speaker 7, Chandler, mentions that Speaker 2 has been at the office for five years, implying a long-standing employment relationship.\n5. Speaker 2\'s lines

In [None]:
verdicts_raw = [result["verdict"] for result in results]
print(verdicts_raw)

["'per:client'", "'per:parents'.", '**per:parents**', "'per:friends'", 'Ross is the boyfriend or partner of Speaker 1.', "['per:friends']", 'per:spouse', 'per:boss', "'per:girl/boyfriend'", "'per:friends'", "['per:subordinate']", 'per:girl/boyfriend', "['per:boss']", 'per:spouse', "['per:friends']", '\'per:girl/boyfriend\' is unlikely, but... \'per:girl/boyfriend\' is not the final verdict\n\nWAIT, MY DIAGNOSTIC IS \'per:girl/boyfriend\' UNLIKELY BUT...\n\nMy diagnosis is \'per:girl/boyfriend\' unlikely, but the final verdict is likely one of the remaining options which is:\n\n\'per:girl/boyfriend\' is unlikely but another option is more likely\n\nThe final verdict is:\n\'per:girl/boyfriend\' is unlikely but another option is more likely.\n\nHowever, there is more text that indicates the verdict.\n\n* \'per:girl/boyfriend\'- unlikely, as the dialogue does not suggest a romantic relationship.\n* \'per:subordinate\' - unlikely, as Speaker 2 seems to be on an equal level or possibly highe

In [None]:
verdicts = [verdict.strip('[').strip(']').strip('"').strip("'") for verdict in verdicts_raw]
verdicts_log[current_configuration] = verdicts

In [None]:
accuracy_score(verdicts_true, verdicts)

0.44

That's worse.

But let's also check how many of the raw verdicts are in the correct format:

In [None]:
print(sum([(raw_verdict in our_relations) for raw_verdict in verdicts_raw]) / len(verdicts_raw))

0.26


That could be better, and we'll try to address this with few-shot learning!
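
Independently of prompting, the postprocessing itself can be made stricter. The sketch below strips the wrappers seen in the raw verdicts above (quotes, brackets, asterisks, trailing periods) and returns `None` when what remains is not one of our labels. `normalize_verdict` is a hypothetical helper, not part of the original pipeline.

```python
OUR_RELATIONS = [
    'per:friends', 'per:spouse', 'per:children', 'per:parents',
    'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate'
]

def normalize_verdict(raw_verdict, labels=OUR_RELATIONS):
    """Strip common wrappers and return a valid label, or None if nothing valid remains."""
    cleaned = raw_verdict.strip().strip("[]*\"'. ")
    return cleaned if cleaned in labels else None

# Raw verdicts of the kinds produced by the 8B model above.
samples = ["'per:parents'.", "**per:parents**", "['per:friends']",
           "'per:client'", "Ross is the boyfriend or partner of Speaker 1."]
normalized = [normalize_verdict(sample) for sample in samples]
print(normalized)
```

Returning `None` makes invalid outputs explicit, so they can be counted separately instead of silently being scored as wrong answers.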

## Few-shot examples: show, don't tell!

A very cool feature of LLMs is **in-context learning**. One of its manifestations is the ability to "learn" from examples put into a prompt.

Like this:

```
Context + Task description

USER: {example 1}

ASSISTANT: {solution for example 1}
###
USER: {example 2}

ASSISTANT: {solution for example 2}
###
USER: {real task}

ASSISTANT:

```
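
With a chat-style API, one common way to express this template is as a list of alternating `user` and `assistant` messages. The sketch below is one possible encoding under that assumption; `build_few_shot_messages` is a hypothetical helper, and the example strings are placeholders.

```python
def build_few_shot_messages(task_description, examples, real_task):
    """Build a chat message list: task description, example turns, then the real task."""
    messages = [{"role": "system", "content": task_description}]
    for example_input, example_solution in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_solution})
    # The real task goes last, so the model answers in the assistant turn.
    messages.append({"role": "user", "content": real_task})
    return messages

# Placeholder strings, for illustration only.
few_shot_messages = build_few_shot_messages(
    "Extract the verdict. Only output the verdict from the list.",
    [("REASONING: ... they are married.", "per:spouse"),
     ("REASONING: ... he is her manager.", "per:boss")],
    "REASONING: ... they grew up in the same house as brother and sister.",
)
print([message["role"] for message in few_shot_messages])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.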

Let's choose several ground truth examples (not from the 50 dialogs we evaluate on!) and include them as few-shot examples to make things better, at least in terms of output format.

In [None]:
few_shot_reasonings = []
for i, (dialog, relations) in tqdm(enumerate([[dialog, relations[0]] for dialog, relations in dialog_data[50:60]])):
    result = classifier_chain_8b.predict(dialog, relations['x'], relations['y'], verbose=True)
    few_shot_reasonings.append(result['reasoning_completion'].choices[0].message.content)
    print(result['reasoning_completion'].choices[0].message.content)
    print(f"\n###\nThat was dialog number {i} \n")

1it [00:06,  6.41s/it]

Based on the dialog, I can infer the relationship between Speaker 1 and Speaker 2 as follows:

The conversation between Speaker 1 and Speaker 2 seems to be quite intimate and emotionally charged. The fact that Speaker 1 explicitly states "I am your friend" suggests that they have a close relationship, but the tone and content of the conversation suggest that it's more than a casual friendship.

The mention of a "relationship" being over implies that they were in a romantic relationship previously. Speaker 1 seems to be contemplating the end of their relationship, and Speaker 2 is expressing sadness and longing. Speaker 2's repeated requests for juice can be seen as a metaphor for a deeper emotional need, which is often associated with romantic partners.

Furthermore, the use of affectionate language, such as "Look, man" and "Okay, man?", suggests a level of familiarity and closeness that is typical of romantic partners.

Based on this analysis, I believe the relationship between Speake

2it [00:09,  4.60s/it]

Let's break down the relationships between the characters mentioned in the dialog.

From the conversation, we know that:
- Rachel is the person Ross is trying to make up with.
- Phoebe is friends with Rachel.

Also, Phoebe works at the same massage place as Jasmine, who is Issac's sister. However, they are not directly related to each other, so we don't have enough information to infer a specific relationship amongst them yet.

As for Rachel and Phoebe, and Ross and Rachel, and Phoebe and her friends, they seem to be connected through various degrees of friendship and relationships, but not necessarily a romantic relationship.

Given the context of the conversation between Ross, Joey (the speakers), and Chandler (speaker 3), it seems they are Ross's friends.

###
That was dialog number 1 



3it [00:15,  5.20s/it]

To determine the relationship between Speaker 2 and Speaker 3, I'll analyze the dialog and look for clues that hint at their connection.

The key point to focus on is the revelation by Speaker 1 that he and Speaker 2 shared a kiss with Rachel in the same night. This leads to an exchange between Speaker 2 and Speaker 3, where they start to piece together what happened:

* Speaker 2: "Seriously, where did this happen?"
* Speaker 1: "Okay, after you told me she was passed out in our room, I went in there to make sure she was all right."
* Speaker 3: "I'm pretty sure I put her on my bed."

Given this context, it's clear that Speaker 1 and Speaker 3 are trying to figure out what happened between them and Speaker 2 and Rachel. The confusion about who was on whose bed indicates that they are often in the same social circle and may have had interactions before.

Moreover, the ending lines hint at a deeper connection:

* Speaker 3: "What did I marry into?"
* Speaker 2: (smiling) While the exact

4it [00:20,  5.23s/it]

Based on the provided dialog and characters, I'll analyze the relationships between Monica, Chandler, Speaker 1, and Speaker 2.

From the dialog, it's clear that:

* Monica and Chandler are mentioned in conjunction with each other ("Monica and Chandler almost got married"). This suggests a romantic relationship.
* Speaker 2 mentions marrying Chandler for the money, which implies that Chandler has financial value in their relationship.
* The dialogue also mentions Chandler's hypothetical wealth, which further suggests that Chandler is in a relationship with either Monica or the speaker.
* Speaker 2 mentions marrying Chandler, which eliminates the possibility that Speaker 1 and Speaker 2 are in a romantic relationship. The only relationship that makes sense is between Speaker 2 and Chandler.

Given these points, I will choose the following relationships:

* per:siblings: Ross is mentioned in the dialog, and Monica is his sister. This is a likely relationship.
* per:girl/boyfriend: Chandl

5it [00:24,  4.47s/it]

Based on the given dialog, I'll analyze the relationship between Speaker 3 and Speaker 1.

Speaker 3 and Speaker 1 are both shown to be advising and discussing the situation between Speaker 1 and Joey. Speaker 3 says "and as his friend, I mean, don't you think he deserves the same from you?", which indicates that Speaker 3 has a close enough relationship with Speaker 1 to address him as a friend. Meanwhile, Speaker 1 responds to Speaker 3, acknowledging his input and showing respect. This kind of interaction between two characters is commonly associated with friends.

Therefore, based on the dialog, I predict the relationship between Speaker 3 and Speaker 1 to be:

['per:friends']

**VERDICT**
'per:friends'

###
That was dialog number 4 



6it [00:30,  5.04s/it]

Let's analyze the dialogue step by step to determine the relationship between Janice (FIRST CHARACTER) and Speaker 1 (SECOND CHARACTER).

1. **Initial interaction**: The conversation starts with Speaker 1 answering a phone call from someone asking him to go see Janice (Speaker 2). This suggests that Speaker 1 is being told to do something related to Janice.
2. **Janice's behavior**: During the conversation, it's mentioned that Janice put her food on Speaker 1's plate and took his tomatoes. This implies that Janice has been spending time with Speaker 1, and her actions have created a sense of familiarity or closeness.
3. **Speaker 1's reaction**: Speaker 1 expresses discomfort with Janice's behavior, stating that it made him feel like they're a couple. This suggests that Speaker 1 is hesitant to commit to a relationship with Janice or is uncomfortable with the idea of being in a romantic relationship.
4. **Conversation topics**: The conversation quickly turns to discussions of relations

7it [00:33,  4.33s/it]

Let's analyze the dialog step by step to determine the relationship between Speaker 3 and Speaker 2.

1. Speaker 3 says "I just wish Nana were alive to hear Ross's toast." This suggests a familial connection.
2. Speaker 2 is addressed as "Mom" by Speaker 4, indicating that Speaker 2 is a parent.
3. Speaker 3 is responding to Speaker 2's comment about Ross not giving a toast, implying they are familiar with the family dynamics.

Based on this reasoning, it is likely that Speaker 3 is a child of Speaker 2.

Therefore, the relationship between Speaker 3 and Speaker 2 is likely:

per:children

###
That was dialog number 6 



8it [00:37,  4.35s/it]

To determine the relationship between Speaker 2 and Speaker 3, I'll analyze their interactions and language used.

In the dialog, Speaker 2 and Speaker 3 seem to have a close and playful relationship. They share similar emotions and reactions, such as laughing and expressing agreement when discussing Speaker 1's bad experience with Joey Tribbiani. When Speaker 1 brings up the topic of Joey, both Speaker 2 and Speaker 3 quickly chime in with strong negative emotions, using phrases like "OH GOD NO!" and "RAT BASTARD!" This suggests that they have a strong dislike for Joey together.

Additionally, Speaker 2 and Speaker 3 seem to know each other well and can anticipate each other's reactions. When Speaker 1 asks if they're friends with Joey, Speaker 2 and Speaker 3 respond in unison, reinforcing their close relationship.

Considering these interactions, the relationship that best fits Speaker 2 and Speaker 3 is:

**'per:friends'**

This is because they share a strong bond, can anticipate e

9it [00:45,  5.45s/it]

To predict the relationship between Speaker 5 and Speaker 4, let's analyze the dialogue and the context in which they are speaking.

1. **Context**: The conversation seems to be taking place in a social setting, possibly a restaurant or a meeting of some sort, as they are discussing a table and their arrival time.
2. **Speaker 4's Role**: Speaker 4 is mentioned as saying, "our table is down in front, okay, my boss is gonna be there, everyone will see if we arrive after it starts." This implies that Speaker 4 has a certain level of responsibility or status among a group of people.
3. **Speaker 5's Comment**: Speaker 5 asks, "Does this look like something the girlfriend of a paleontologist would wear?" and later mentions, "this is the outfit that makes my calves look fat. Nevermind."speaker (Rach) while referencing an outfit. This line of discussion gives insight into Speaker 5's personal relationship dynamics within the group mentioned.
4. **Relationship Deduction**: The mention of a "g

10it [00:49,  4.93s/it]

Based on the dialog, I will predict the relationship between Speaker 2 and Chandler.

The conversation revolves around Speaker 2's concerns about their relationship with Chandler, specifically regarding marriage. Speaker 2 expresses their desire to be in a relationship that is going somewhere and not just wasting their time. Chandler's opposition to marriage is mentioned multiple times, with Speaker 1 agreeing with it ("Chandler is against marriage. And-and always will be!") and Speaker 2 later contradicting it ("You just told me that he hates marriage!").

The fact that Chandler's views on marriage are a central topic of discussion and that Speaker 2 is emotionally invested in the conversation suggests a romantic relationship between the two characters.

Moreover, Speaker 2 mentions another person, Richard, who expresses a desire to marry them, implying that Chandler's reluctance to marry Speaker 2 is a significant aspect of their relationship.

Considering these points, I believe the




I'll tinker a bit with this data to make some of the verdicts more pathological.

In [None]:
data_few_shot = [
    ["""Based on the dialog, I can infer the relationship between Speaker 1 and Speaker 2 as follows:

The conversation between Speaker 1 and Speaker 2 seems to be quite intimate and emotionally charged. The fact that Speaker 1 explicitly states "I am your friend" suggests that they have a close relationship, but the tone and content of the conversation suggest that it's more than a casual friendship.

The mention of a "relationship" being over implies that they were in a romantic relationship previously. Speaker 1 seems to be contemplating the end of their relationship, and Speaker 2 is expressing sadness and longing. Speaker 2's repeated requests for juice can be seen as a metaphor for a deeper emotional need, which is often associated with romantic partners.

Furthermore, the use of affectionate language, such as "Look, man" and "Okay, man?", suggests a level of familiarity and closeness that is typical of romantic partners.

Based on this analysis, I believe the relationship between Speaker 1 and Speaker 2 is most likely:

**VERDICT**

* Relationship: per:spouse

I chose this option because the conversation implies a deep emotional connection and a history of a romantic relationship between the two characters. While it's possible that they could be close friends or family members, the tone and language used suggests a more intimate and romantic connection.""",
    "per:spouse"],
    [
        """Based on the given dialog, I'll analyze the relationship between Speaker 3 and Speaker 1.

Speaker 3 and Speaker 1 are both shown to be advising and discussing the situation between Speaker 1 and Joey. Speaker 3 says "and as his friend, I mean, don't you think he deserves the same from you?", which indicates that Speaker 3 has a close enough relationship with Speaker 1 to address him as a friend. Meanwhile, Speaker 1 responds to Speaker 3, acknowledging his input and showing respect. This kind of interaction between two characters is commonly associated with friends.

Therefore, based on the dialog, I predict the relationship between Speaker 3 and Speaker 1 to be:

['per:friends']""",
"per:friends"
    ],
    [
        """Let's analyze the dialogue step by step to determine the relationship between Janice (FIRST CHARACTER) and Speaker 1 (SECOND CHARACTER).

1. **Initial interaction**: The conversation starts with Speaker 1 answering a phone call from someone asking him to go see Janice (Speaker 2). This suggests that Speaker 1 is being told to do something related to Janice.
2. **Janice's behavior**: During the conversation, it's mentioned that Janice put her food on Speaker 1's plate and took his tomatoes. This implies that Janice has been spending time with Speaker 1, and her actions have created a sense of familiarity or closeness.
3. **Speaker 1's reaction**: Speaker 1 expresses discomfort with Janice's behavior, stating that it made him feel like they're a couple. This suggests that Speaker 1 is hesitant to commit to a relationship with Janice or is uncomfortable with the idea of being in a romantic relationship.
4. **Conversation topics**: The conversation quickly turns to discussions of relationships, intimacy, and fears of commitment. This implies that the conversation is centered around Speaker 1's relationship with Janice and his concerns about it.

Based on these points, I can infer that the relationship between Janice and Speaker 1 is romantic in nature, but Speaker 1 is hesitant to acknowledge or commit to it.

**Relationship prediction**: I predict that the relationship between Janice and Speaker 1 is 'per:girl/boyfriend'. This is because the conversation revolves around their interactions, Janice's behavior, and Speaker 1's discomfort with the idea of being in a romantic relationship.

**VERDICT**: The relationship between Janice and Speaker 1 is 'per:girlfriend'.

PAID""",
"per:girl/boyfriend"
    ],
    [
        """Let's analyze the dialog step by step to determine the relationship between Speaker 3 and Speaker 2.

1. Speaker 3 says "I just wish Nana were alive to hear Ross's toast." This suggests a familial connection.
2. Speaker 2 is addressed as "Mom" by Speaker 4, indicating that Speaker 2 is a parent.
3. Speaker 3 is responding to Speaker 2's comment about Ross not giving a toast, implying they are familiar with the family dynamics.

Based on this reasoning, it is likely that Speaker 3 is a child of Speaker 2.

Therefore, the relationship between Speaker 3 and Speaker 2 is likely:

***'per:children'***""",
        "per:children"
    ],
    [
        """To predict the relationship between Speaker 5 and Speaker 4, let's analyze the dialogue and the context in which they are speaking.

1. **Context**: The conversation seems to be taking place in a social setting, possibly a restaurant or a meeting of some sort, as they are discussing a table and their arrival time.
2. **Speaker 4's Role**: Speaker 4 is mentioned as saying, "our table is down in front, okay, my boss is gonna be there, everyone will see if we arrive after it starts." This implies that Speaker 4 has a certain level of responsibility or status among a group of people.
3. **Speaker 5's Comment**: Speaker 5 asks, "Does this look like something the girlfriend of a paleontologist would wear?" and later mentions, "this is the outfit that makes my calves look fat. Nevermind."speaker (Rach) while referencing an outfit. This line of discussion gives insight into Speaker 5's personal relationship dynamics within the group mentioned.
4. **Relationship Deduction**: The mention of a "girlfriend" suggests that Speaker 5 is in a romantic relationship with someone. The further discussion and Speaker 4's demeanor indicate a level of familiarity and importance within the group that is not typical of a friend or a subordinate.
5. **Final Reasoning**: Given that Speaker 5 uses "Rach" when referring to their outfit, which implies a familiarity suggesting a personal nature of their relationship. Combining this with the context of a dinner setting and Speaker 4's mention of a "boss" at this event, it's logical to deduce that the speaker, Rach, has a close relationship with both individuals involved in the conversation (Someone they care about is among the group with Speaker 4).

Given the information provided:
- The conversation context does not explicitly indicate exactly how Speaker 4 is related to Speaker 5.
- However, in a typical workplace discussion, especially in a professional setting, the dynamics involved suggest Speaker 5 likely has a professional relationship with Speaker 4.

Considering the above points and the options provided:
**FINANCIAL VERDICT Allocating 0% Confidence Level for 'per:girl/boyfriend', as 'Rach' is specifically Speaker 5's name which suggests they are the girl/boyfriend.**
""",
        "per:girl/boyfriend"
    ]

]

In [None]:
data_few_shot

[['Based on the dialog, I can infer the relationship between Speaker 1 and Speaker 2 as follows:\n\nThe conversation between Speaker 1 and Speaker 2 seems to be quite intimate and emotionally charged. The fact that Speaker 1 explicitly states "I am your friend" suggests that they have a close relationship, but the tone and content of the conversation suggest that it\'s more than a casual friendship.\n\nThe mention of a "relationship" being over implies that they were in a romantic relationship previously. Speaker 1 seems to be contemplating the end of their relationship, and Speaker 2 is expressing sadness and longing. Speaker 2\'s repeated requests for juice can be seen as a metaphor for a deeper emotional need, which is often associated with romantic partners.\n\nFurthermore, the use of affectionate language, such as "Look, man" and "Okay, man?", suggests a level of familiarity and closeness that is typical of romantic partners.\n\nBased on this analysis, I believe the relationship b

In [None]:
class RelationClassifierFewShot():
    def __init__(self, client, model, few_shot_examples):
        self.client = client
        self.model = model
        self.few_shot_examples = few_shot_examples

        self.extraction_messages_starter = [
            {
                "role": "user",
                "content": """You are a professional verdict extractor.
You are given several reasonings, each of which justifies a verdict from the following list:
['per:friends', 'per:spouse', 'per:children', 'per:parents', 'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate']
This verdict is the final answer and for each reasoning, you extract this final answer.
You output only the final answer for every reasoning. You never output final answers for previous reasonings again."""
            },
            {
                "role": "assistant",
                "content": "Yes, I'm ready."
            }
        ]
        for few_shot_example in self.few_shot_examples:
            self.extraction_messages_starter.extend(
                [
                {
            "role": "user",
            "content": f"""
###
REASONING: {few_shot_example[0]}

FINAL ANSWER: """
                },
                {
              "role": "assistant",
              "content": f"""
{few_shot_example[1]}"""
                }
                ]
            )

    def predict(self, dialog, character_x, character_y, verbose=False):
        reasoning_completion = self.client.chat.completions.create(
            messages=[
                {
            "role": "user",
            "content": f"""You are an expert in Natural Language Understanding.
You are given a dialog and two characters participating or mentioned in this dialog.
You need to predict relationships between these characters choosing from the following list:
['per:friends', 'per:spouse', 'per:children', 'per:parents', 'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate']
Provide a clear reasoning justifying your choice. Then write your final answer after VERDICT:

DIALOG: {dialog}

FIRST CHARACTER: {character_x}

SECOND CHARACTER: {character_y}

REASONING:"""
                }
            ],
            model=self.model,
            )
        reasoning = reasoning_completion.choices[0].message.content

        extraction_completion = self.client.chat.completions.create(
            messages=self.extraction_messages_starter+[
            {
            "role": "user",
            "content": f"""
###
REASONING: {reasoning}

FINAL ANSWER: """
                }
            ],
            model=self.model,
            )

        verdict = extraction_completion.choices[0].message.content

        if verbose:
            return {
                "reasoning_completion": reasoning_completion,
                "extraction_completion": extraction_completion,
                "verdict": verdict
            }
        else:
            return verdict

In [None]:
classifier_few_shot = RelationClassifierFewShot(
    client=client, model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    few_shot_examples=data_few_shot
    )
result = classifier_few_shot.predict(
    dialog_data_short[5][0], dialog_data_short[5][1]['x'], dialog_data_short[5][1]['y'],
    verbose=True)

In [None]:
result

{'reasoning_completion': ChatCompletion(id='chat-4e1b5b6c64144c159d84d65efb7e9996', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Based on the dialog, I predict the relationship between Speaker 1 and Speaker 4 as:\n\n**per:girl/boyfriend**\n\nMy reasoning is as follows:\n\n* Speaker 1 is proposing to Phoebe, which implies that they are in a romantic relationship.\n* Speaker 4, who is Phoebe, is also in a relationship with another person, Mike, but she is engaging in a conversation with Speaker 1 about his proposal, which suggests that she has a close and personal relationship with him, at least in the context of the proposal.\n* Furthermore, Phoebe does not express any discomfort or hesitation about Speaker 1's proposal, which suggests that she is open to the possibility of marrying him.\n\nWhile it is unclear whether the relationship between Speaker 1 and Phoebe is exclusive or serious, the context of the proposal and the conversa

In [None]:
current_configuration = "Meta-Llama-3.1-8B-Instruct, few-shot"
results = []
# use dialog_data_short[-10:] instead to save time
for dialog, relations in tqdm(dialog_data_short):
    results.append(classifier_few_shot.predict(dialog, relations['x'], relations['y'], verbose=True))

completions_log[current_configuration] = results

100%|██████████| 50/50 [03:10<00:00,  3.82s/it]


In [None]:
verdicts_raw = [result["verdict"] for result in results]
print(verdicts_raw)

['per:friends', 'per:parents', 'per:parents', 'per:roomies', 'per:boyfriend', 'per:spouse', "I have to correct you, the relationship is actually predicted as 'per:ex:bx', there is no 'per:ex:ex'.", 'per:subordinate', 'per:girl/boyfriend', 'per:boss', 'per:subordinate', 'per:spouse', 'per:boss', 'per:spouse', 'per:girl/boyfriend', 'per:boss', 'per:friends', 'per:girl/boyfriend', 'per:parents', 'per:friends', 'per:no_relation', 'per:girl/boyfriend', 'per:girl/boyfriend', 'per:friends', 'per:spouse', 'per:siblings', 'per:boss', 'per:friends', 'per:children', 'per:girl/boyfriend', 'per:friends', 'per:friends', 'per:ex-spouse', 'per:friends', 'per:siblings', 'per:girl/boyfriend', 'per:friends', 'per:ex-spouse', 'per:parent', 'per:spouse', 'per:spouse', 'per:girl/boyfriend', 'per:siblings', 'per:spouse', 'per:girlfriend', 'per:children', 'per:parents', 'per:boss', 'per:boss', 'per:friends']


In [None]:
# verdicts = [verdict.strip('"').strip("'") for verdict in verdicts_raw]
verdicts = verdicts_raw
verdicts_log[current_configuration] = verdicts

In [None]:
accuracy_score(verdicts_true, verdicts)

0.56

The accuracy actually improved, and we're better at obeying the format:

In [None]:
print(sum([(raw_verdict in our_relations) for raw_verdict in verdicts_raw]) / len(verdicts_raw))

0.84


In [None]:
for raw_verdict in verdicts_raw:
    if raw_verdict not in our_relations:
        print(raw_verdict)

per:roomies
per:boyfriend
I have to correct you, the relationship is actually predicted as 'per:ex:bx', there is no 'per:ex:ex'.
per:no_relation
per:ex-spouse
per:ex-spouse
per:parent
per:girlfriend
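
Several of the off-list verdicts above are near misses (`per:boyfriend`, `per:parent`) rather than genuinely different answers. The following is a minimal sketch of a normalization step, assuming we are willing to map such variants onto the allowed labels. The `ALIASES` table and the `normalize_verdict` helper are hypothetical and not part of the original pipeline; under strict scoring these verdicts would simply count as errors.

```python
# Allowed labels, copied from the prompt used by the classifiers
ALLOWED_RELATIONS = [
    'per:friends', 'per:spouse', 'per:children', 'per:parents',
    'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate',
]

# Hypothetical alias table: near misses mapped to the closest allowed label
ALIASES = {
    'per:boyfriend': 'per:girl/boyfriend',
    'per:girlfriend': 'per:girl/boyfriend',
    'per:parent': 'per:parents',
}

def normalize_verdict(raw_verdict):
    """Return an allowed label, or None if the verdict cannot be mapped."""
    # Drop surrounding whitespace and quotes that the model sometimes adds
    cleaned = raw_verdict.strip().strip('"\'').strip()
    cleaned = ALIASES.get(cleaned, cleaned)
    return cleaned if cleaned in ALLOWED_RELATIONS else None

for raw in ['per:boyfriend', " 'per:parent' ", 'per:roomies', 'per:spouse']:
    print(repr(raw), '->', normalize_verdict(raw))
```

Verdicts that map to `None` (such as `per:roomies`) still need to be treated as wrong answers or re-queried.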


## Self-consistency

An LLM's generations change from run to run, and the final answers may change with them. As a result, the accuracy of the above algorithm is unstable.

We'll explore the sources of this instability next week. For now, we'll use it as another way to improve quality. There is reason to believe that even if a single reasoning path is wrong, several independent attempts by the LLM will agree on the correct answer more often than not.

The most popular approach is called **Self-Consistency**. It works as follows:

- Generate several (say, 5 or 7) reasoning paths and extract an answer from each of them.
- Choose the most frequent answer.

This is like a majority vote among several identical LLMs.
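
The voting step can be sketched in isolation, without any API calls. The verdicts below are made up for illustration, and `majority_vote` is a hypothetical helper that also reports the share of votes the winner received, which can serve as a rough signal of how much the reasoning paths agree.

```python
from collections import Counter

def majority_vote(verdicts):
    """Return the most frequent verdict and the share of votes it received."""
    counts = Counter(verdicts)
    # most_common orders equal counts by first appearance,
    # so a tie goes to the verdict that was seen first
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(verdicts)

# Made-up verdicts from five hypothetical reasoning paths
sample_verdicts = ['per:spouse', 'per:friends', 'per:spouse',
                   'per:girl/boyfriend', 'per:spouse']
winner, share = majority_vote(sample_verdicts)
print(winner, share)
```

A low share means the individual attempts disagree, so the final answer deserves less trust.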

In [None]:
dialog, relations = dialog_data_short[5]
for _ in range(5):
    print(classifier_few_shot.predict(
        dialog, relations['x'], relations['y'], verbose=True
        )["reasoning_completion"].choices[0].message.content)
    print("\n###\n")

Speaker 2 and Speaker 3 are mentioned in the dialog as being in a relationship, as evidenced by Speaker 3's use of the term "my husband" to refer to Speaker 2 (at the end of the dialog). This implies that Speaker 2 and Speaker 3 are married.

Therefore, based on the relationship options provided, I choose:

VERDICT: 'per:spouse'

###

After analyzing the dialog, I can infer the following relationships between the characters:

* Speaker 2 (Monica) is a close friend of Speaker 3 (Rachel), as they share a warm and intimate conversation, with Speaker 3 involving Speaker 2 in her personal life (treating Speaker 3 to a solo where she discusses how to communicate with Phoebe).
* Speaker 3 (Rachel) and Speaker 4 (Phoebe) are also close friends, as they are part of the same social circle and share inside jokes ("What about Mike? Alright, well, let's just gag him and handcuff him and force him down the aisle. I can just see it: 'Mike, do you take Phoebe...").
* Speaker 2 (Monica) seems to be awa

Now, let's create a self-consistency-based classifier.

In [None]:
from collections import Counter

def most_frequent(items):
    """Return the most common element of items."""
    occurrence_count = Counter(items)
    return occurrence_count.most_common(1)[0][0]

class RelationClassifierSelfConsistency():
    def __init__(self, client, model, few_shot_examples, n_trials=5):
        self.client = client
        self.model = model
        self.few_shot_examples = few_shot_examples
        self.n_trials = n_trials

        self.extraction_messages_starter = [
            {
                "role": "user",
                "content": """You are a professional verdict extractor.
You are given several reasonings, each of which justifies a verdict from the following list:
['per:friends', 'per:spouse', 'per:children', 'per:parents', 'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate']
This verdict is the final answer and for each reasoning, you extract this final answer.
You output only the final answer for every reasoning. You never output final answers for previous reasonings again."""
            },
            {
                "role": "assistant",
                "content": "Yes, I'm ready."
            }
        ]
        for few_shot_example in self.few_shot_examples:
            self.extraction_messages_starter.extend(
                [
                {
            "role": "user",
            "content": f"""
###
REASONING: {few_shot_example[0]}

FINAL ANSWER: """
                },
                {
              "role": "assistant",
              "content": f"""
{few_shot_example[1]}"""
                }
                ]
            )

    def predict(self, dialog, character_x, character_y, verbose=False):
        prompt =  f"""You are an expert in Natural Language Understanding.
You are given a dialog and two characters participating or mentioned in this dialog.
You need to predict relationships between these characters choosing from the following list:
['per:friends', 'per:spouse', 'per:children', 'per:parents', 'per:siblings', 'per:girl/boyfriend', 'per:boss', 'per:subordinate']
Provide a clear reasoning justifying your choice. Then write your final answer after VERDICT:

DIALOG: {dialog}

FIRST CHARACTER: {character_x}

SECOND CHARACTER: {character_y}

REASONING:"""

        reasoning_completions = []
        extraction_completions = []
        verdicts = []
        for _ in range(self.n_trials):
            reasoning_completion = self.client.chat.completions.create(
                messages=[
                {
                "role": "user",
                "content": prompt
                }
                ],
                model=self.model,
                )
            reasoning = reasoning_completion.choices[0].message.content
            reasoning_completions.append(reasoning_completion)

            extraction_completion = self.client.chat.completions.create(
                messages=self.extraction_messages_starter+[
                {
                "role": "user",
                "content": f"""
###
REASONING: {reasoning}

FINAL ANSWER: """
                }
                ],
                model=self.model,
                )

            extraction_completions.append(extraction_completion)

            verdict = extraction_completion.choices[0].message.content
            verdicts.append(verdict)

        final_verdict = most_frequent(verdicts)
        if verbose:
            return {
                "reasoning_completions": reasoning_completions,
                "extraction_completions": extraction_completions,
                "verdicts": verdicts,
                "verdict": final_verdict
            }
        else:
            return final_verdict

In [None]:
classifier_sc = RelationClassifierSelfConsistency(
    client=client, model="meta-llama/Meta-Llama-3.1-8B-Instruct", n_trials=5,
    few_shot_examples=data_few_shot
)
result = classifier_sc.predict(
    dialog_data_short[5][0], dialog_data_short[5][1]['x'], dialog_data_short[5][1]['y'],
    verbose=True)

In [None]:
result

{'reasoning_completions': [ChatCompletion(id='chat-3eeb7d1d126e43599197e00bcafb7841', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Based on the dialog, I predict that the relationship between Speaker 2 and Speaker 3 is:\n\n'per:girl/boyfriend'\n\nReasoning:\n\n* Speaker 2 mentions Monica and Reacts to a proposal of Phoebe that was initiated by Monica's husband, who is likely the narrative present of speaker 2. This implies that Monica is the girlfriend of the narrator, Speaker 2.\n* Speaker 3 initially handles the situation calmly, implying that they are used to handling confidential or intimate information between themselves and their partner, Monica, the girlfriend.\n* Speaker 3 takes charge of informing Phoebe about the proposal, which is something that a romantic partner would typically handle. This implies that Speaker 3 is involved in Monica's life and is aware of her relationships and personal business.\n* Speaker 2's react

Let's classify all the dialogs and check the accuracy.

**Don't try this in class!** With 5 trials per dialog, the run below takes about 18 minutes.

In [None]:
# The tqdm library provides progress bars for loops
from tqdm import tqdm

current_configuration = "Meta-Llama-3.1-8B-Instruct, self-consistency"
results = []
# do it for dialog_data_short[-20:] to save time
for dialog, relations in tqdm(dialog_data_short):
    results.append(classifier_sc.predict(dialog, relations['x'], relations['y'], verbose=True))

completions_log[current_configuration] = results

100%|██████████| 50/50 [18:04<00:00, 21.69s/it]


In [None]:
verdicts_raw = [result["verdict"] for result in results]
print(verdicts_raw)

['per:friends', 'per:parents', 'per:parents', 'per:friends', 'per:girl/boyfriend', 'per:spouse', 'per:girl/boyfriend', 'per:boss', 'per:friends', 'per:friends', 'per:boss', 'per:friends', 'per:boss', 'per:spouse', 'per:girl/boyfriend', 'per:boss', 'per:friends', 'per:girl/boyfriend', 'per:spouse', 'per:friends', 'per:friends', 'per:girl/boyfriend', 'per:girl/boyfriend', 'per:friends', 'per:girl/boyfriend', 'per:siblings', 'per:boss', 'per:friends', 'per:girl/boyfriend', 'per:spouse', 'per:friends', 'per:friends', 'per:spouse', 'per:friends', 'per:siblings', 'per:girl/boyfriend', 'per:siblings', 'per:ex-spouse', 'per:parents', 'per:spouse', 'per:friends', 'per:girl/boyfriend', 'per:siblings', 'per:spouse', 'per:girl/boyfriend', 'per:parents', 'per:children', 'per:boss', 'per:boss', 'per:friends']


In [None]:
verdicts = verdicts_raw
verdicts_log[current_configuration] = verdicts

In [None]:
accuracy_score(verdicts_true[10:], verdicts[10:])

0.675

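This is close to what we had with Llama-3.1-405B. Note, however, that this score is computed on the last 40 dialogs only (`[10:]`), while the few-shot accuracy of 0.56 above was computed on all 50, so the two numbers are not directly comparable. Now, let's save our logs to avoid losing the results: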

In [None]:
import pickle

# Use context managers so the files are flushed and closed
with open("completions_log.pkl", "wb") as f:
    pickle.dump(completions_log, f)
with open("verdicts_log.pkl", "wb") as f:
    pickle.dump(verdicts_log, f)

## Computing the cost

In [None]:
# Model costs per 1M tokens:
costs = {
    '405B': {
        'input': 1,
        'output': 3
    },
    '8B': {
        'input': 0.02,
        'output': 0.06
    }
}

for key, value in completions_log.items():
    print(f'=== With {key} ===')
    total_input_tokens = 0
    total_output_tokens = 0
    for result in value:
        for k, v in result.items():
            if 'completion' in k:
                if isinstance(v, list):
                    for completion in v:
                        total_input_tokens += completion.usage.prompt_tokens
                        total_output_tokens += completion.usage.completion_tokens
                else:
                    total_input_tokens += v.usage.prompt_tokens
                    total_output_tokens += v.usage.completion_tokens
    if '405' in key:
        model_size = '405B'
    elif '8B' in key:
        model_size = '8B'
    else:
        # Unknown model: skip it rather than reuse a stale or undefined model_size
        print('And what is that?..')
        continue
    input_cost = total_input_tokens / 1000000 * costs[model_size]['input']
    output_cost = total_output_tokens / 1000000 * costs[model_size]['output']
    total_cost = input_cost + output_cost

    print(f'''
        Input cost: {input_cost}
        Output cost: {output_cost}
        Total cost: {total_cost}
              ''')

=== With Meta-Llama-3.1-405B-Instruct, chain ===

        Input cost: 0.046945
        Output cost: 0.049715999999999996
        Total cost: 0.096661
              
=== With Meta-Llama-3.1-8B-Instruct, chain ===

        Input cost: 0.00084292
        Output cost: 0.00088362
        Total cost: 0.00172654
              
=== With Meta-Llama-3.1-8B-Instruct, few-shot ===

        Input cost: 0.00229196
        Output cost: 0.0006172199999999999
        Total cost: 0.00290918
              
=== With Meta-Llama-3.1-8B-Instruct, self-consistency ===

        Input cost: 0.0116306
        Output cost: 0.00359088
        Total cost: 0.015221479999999999
              


As we see, even with few-shot examples and self-consistency, Llama-3.1-8B is still far cheaper than Llama-3.1-405B: about 0.015 in total versus about 0.097 for the plain 405B chain, roughly six times less.
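
The arithmetic behind these numbers is simple enough to check by hand. Below is a minimal sketch with a hypothetical helper, `completion_cost`; the token counts are back-computed from the printed 405B costs and the prices in `costs`, so they are an assumption rather than values read from the logs.

```python
def completion_cost(prompt_tokens, completion_tokens, price_in, price_out):
    """Cost of a run, given token counts and prices per 1M tokens."""
    input_cost = prompt_tokens / 1_000_000 * price_in
    output_cost = completion_tokens / 1_000_000 * price_out
    return input_cost + output_cost

# Token counts back-computed from the printed 405B costs (assumed, not logged)
cost_405b = completion_cost(46_945, 16_572, price_in=1, price_out=3)
print(round(cost_405b, 6))
```

Because output tokens are three times as expensive as input tokens for both models, long reasoning chains weigh more heavily on the bill than long prompts.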

# Structured generation example

In [None]:
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "Extract the event information."
            },
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday."
            },
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed
event

CalendarEvent(name='Science Fair', date='Friday', participants=['Alice', 'Bob'])

In [None]:
completion

ParsedChatCompletion[CalendarEvent](id='chatcmpl-ASUsmswr6uvu3tOheClrowcdQfqbX', choices=[ParsedChoice[CalendarEvent](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[CalendarEvent](content='{"name":"Science Fair","date":"Friday","participants":["Alice","Bob"]}', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], parsed=CalendarEvent(name='Science Fair', date='Friday', participants=['Alice', 'Bob'])))], created=1731355260, model='gpt-4o-2024-08-06', object='chat.completion', service_tier=None, system_fingerprint='fp_159d8341cc', usage=CompletionUsage(completion_tokens=17, prompt_tokens=92, total_tokens=109, completion_tokens_details=CompletionTokensDetails(audio_tokens=0, reasoning_tokens=0, accepted_prediction_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

In [None]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4o-2024-08-06")
encoded_sequence = encoding.encode('"name":"Science Fair","date":"Friday","participants":["Alice","Bob"]')
for token in encoded_sequence:
    print(token, encoding.decode([token]))

74800 "name
7534 ":"
76025 Science
19751  Fair
4294 ","
1319 date
7534 ":"
23781 Friday
4294 ","
190110 participants
95067 ":["
100151 Alice
4294 ","
53849 Bob
2601 "]


# Creating an LLM agent

In this part of the practice session, we'll create a simple agent that can call three tools:

- `execute_shell_command`,
- `read_file`,
- `write_file`,

whose roles follow from their names.
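
At its core, tool calling means that the model returns a tool name together with JSON-encoded arguments, and our code routes that request to an ordinary Python function. The following is a minimal sketch of that routing step, runnable without any API key. The `echo` tool, `TOOL_REGISTRY` and `dispatch` are hypothetical names used only for illustration.

```python
import json

def echo(text: str) -> dict:
    """Toy tool standing in for a real one such as read_file."""
    return {"success": True, "content": text}

# Map tool names to Python callables
TOOL_REGISTRY = {"echo": echo}

def dispatch(name: str, arguments_json: str) -> dict:
    """Decode the JSON arguments and call the matching tool."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"success": False, "error": f"Unknown function: {name}"}
    try:
        return handler(**json.loads(arguments_json))
    except (json.JSONDecodeError, TypeError) as e:
        # Malformed JSON or arguments the tool does not accept
        return {"success": False, "error": str(e)}

print(dispatch("echo", '{"text": "hello"}'))
print(dispatch("rm_rf", '{}'))
```

Returning an error dictionary instead of raising lets the model see what went wrong and try a different call.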

In [None]:
import openai
import json
import subprocess
import os
from typing import List, Dict, Any
import shlex
from openai import OpenAI

class ShellAssistant:
    def __init__(self, api_key: str, model='gpt-4o-mini'):
        """Initialize the assistant with your OpenAI API key."""
        self.model = model

        self.client = OpenAI(api_key=api_key)

        # Define available tools
        self.tools = [
            {
                "type": "function",
                "function": {
                    "name": "execute_shell_command",
                    "description": "Execute a shell command and return its output",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "command": {
                                "type": "string",
                                "description": "The shell command to execute"
                            }
                        },
                        "required": ["command"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "read_file",
                    "description": "Read contents of a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "file_path": {
                                "type": "string",
                                "description": "Path to the file to read"
                            }
                        },
                        "required": ["file_path"]
                    }
                }
            },
            {
                "type": "function",
                "function": {
                    "name": "write_file",
                    "description": "Write content to a file",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "file_path": {
                                "type": "string",
                                "description": "Path to the file to write"
                            },
                            "content": {
                                "type": "string",
                                "description": "Content to write to the file"
                            },
                            "mode": {
                                "type": "string",
                                "description": "Write mode: 'w' for overwrite, 'a' for append",
                                "enum": ["w", "a"],
                                "default": "w"
                            }
                        },
                        "required": ["file_path", "content"]
                    }
                }
            }
        ]

    def write_file(self, file_path: str, content: str, mode: str = 'w') -> Dict[str, Any]:
        """Write content to a file."""
        try:
            # Validate mode
            if mode not in ['w', 'a']:
                raise ValueError("Mode must be 'w' or 'a'")

            # Resolve to an absolute path (note: no directory restriction is enforced)
            abs_path = os.path.abspath(file_path)

            # Add additional security checks here if needed
            # For example, restrict to specific directories

            with open(abs_path, mode, encoding="utf-8") as f:
                f.write(content)

            return {
                "success": True,
                "message": f"Successfully wrote to {file_path}",
                # Count encoded bytes, not characters
                "bytes_written": len(content.encode("utf-8"))
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }

    def execute_shell_command(self, command: str) -> Dict[str, Any]:
        """Execute a shell command and return its output."""
        try:
            # Split into argv; with shell=False, pipes, globs and redirects are not interpreted
            args = shlex.split(command)

            # Execute command and capture output
            result = subprocess.run(
                args,
                capture_output=True,
                text=True,
                shell=False  # More secure than shell=True
            )

            return {
                # A non-zero exit code means the command itself failed
                "success": result.returncode == 0,
                "stdout": result.stdout,
                "stderr": result.stderr,
                "return_code": result.returncode
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }

    def read_file(self, file_path: str) -> Dict[str, Any]:
        """Read a file and return its contents."""
        try:
            with open(file_path, 'r') as f:
                content = f.read()
            return {
                "success": True,
                "content": content
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e)
            }

    def process_tool_call(self, tool_call: Any) -> Dict[str, Any]:
        """Process a tool call from the API response."""
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)

        if function_name == "execute_shell_command":
            return self.execute_shell_command(function_args["command"])
        elif function_name == "read_file":
            return self.read_file(function_args["file_path"])
        elif function_name == "write_file":
            mode = function_args.get("mode", "w")
            return self.write_file(
                function_args["file_path"],
                function_args["content"],
                mode
            )
        else:
            return {"success": False, "error": f"Unknown function: {function_name}"}

    def chat(self, user_message: str, verbose=False) -> str:
        """Main chat function that processes user input and returns assistant response."""
        completions = []
        messages = [{"role": "user", "content": user_message}]

        try:
            # Get initial response from OpenAI
            completion = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                tools=self.tools,
                tool_choice="auto"
            )

            # completions.append(completion)
            message = completion.choices[0].message

            # Process tool calls if any
            while message.tool_calls:
                messages.append(message)

                # Process each tool call
                for tool_call in message.tool_calls:
                    result = self.process_tool_call(tool_call)

                    # Add tool result to messages
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": json.dumps(result)
                    })

                # Get next response from OpenAI
                completion = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    tools=self.tools,
                    tool_choice="auto"
                )
                # completions.append(completion)
                message = completion.choices[0].message

            if verbose:
                return message.content, messages#, completions
            else:
                return message.content

        except Exception as e:
            return f"Error: {str(e)}"



In [None]:
assistant = ShellAssistant(api_key=os.environ.get("OPENAI_API_KEY"))

In [None]:
result = assistant.chat('Get me the names of the files in the current directory', verbose=True)
result

('The names of the files in the current directory are:\n\n- completions_log.pkl\n- nebius_api_key\n- openai_api_key\n- patient_visits.json\n- sample_data\n- verdicts_log.pkl',
 [{'role': 'user',
   'content': 'Get me the names of the files in the current directory'},
  ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_DePtFwmvFhWETrlgAMQ4dKks', function=Function(arguments='{"command":"ls"}', name='execute_shell_command'), type='function')]),
  {'role': 'tool',
   'tool_call_id': 'call_DePtFwmvFhWETrlgAMQ4dKks',
   'content': '{"success": true, "stdout": "completions_log.pkl\\nnebius_api_key\\nopenai_api_key\\npatient_visits.json\\nsample_data\\nverdicts_log.pkl\\n", "stderr": "", "return_code": 0}'}])
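
The verbose trace mixes plain dicts (user and tool messages) with `ChatCompletionMessage` objects (assistant turns), and each tool result is stored as a JSON string. Here is a small sketch of a helper that flattens such a trace into readable lines. `summarize_trace` is a hypothetical name, and the demo uses `SimpleNamespace` stand-ins shaped like the objects in the output above rather than a real API response.

```python
import json
from types import SimpleNamespace


def summarize_trace(messages):
    """Return one short line per event in a verbose chat trace."""
    lines = []
    for m in messages:
        if isinstance(m, dict):
            if m["role"] == "tool":
                result = json.loads(m["content"])  # tool results are JSON strings
                lines.append(f"tool -> success={result.get('success')}")
            else:
                lines.append(f"{m['role']}: {m['content']}")
        else:
            # Assistant turn: list the requested tool calls, if any
            for call in m.tool_calls or []:
                lines.append(
                    f"assistant calls {call.function.name}({call.function.arguments})"
                )
            if m.content:
                lines.append(f"assistant: {m.content}")
    return lines


# Stand-in objects mimicking the shape of the trace printed above
call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="execute_shell_command", arguments='{"command":"ls"}'),
)
demo_trace = [
    {"role": "user", "content": "Get me the names of the files in the current directory"},
    SimpleNamespace(role="assistant", content=None, tool_calls=[call]),
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": json.dumps(
            {"success": True, "stdout": "a.txt\n", "stderr": "", "return_code": 0}
        ),
    },
]
summary = summarize_trace(demo_trace)
print("\n".join(summary))
```

On a real run, the same helper would be applied to the second element of the verbose return value, i.e. `summarize_trace(result[1])`.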

In [None]:
client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ.get("NEBIUS_API_KEY"),
)

In [None]:
print(result[0])

The names of the files in the current directory are:

- completions_log.pkl
- nebius_api_key
- openai_api_key
- patient_visits.json
- sample_data
- verdicts_log.pkl


Now we will create two text files: one containing a sonnet, and the other just some plain text.

In [None]:
%%writefile foxes.txt
In twilight’s hush, the foxes roam the streets,
Through alleys dim and parks where shadows lie.
Their russet coats brush past the city’s feats,
Beneath the tower’s glare and amber sky.

With stealth, they weave through lanes and cobbled stone,
As silent hunters of the urban night.
Unseen by day, yet claiming streets their own,
In twilight’s cloak, they prowl till morning light.

They scavenge scraps left by the city’s din,
A wily dance of survival and grace.
Though wild at heart, they fit the world they're in,
An untamed spirit finding urban space.

And as the city sleeps, they make their rounds,
Foxes of London—ghosts in city bounds.

In [None]:
%%writefile a-file.txt
First line
Second line
Third line

Let's ask our agent to find which text file contains a sonnet and copy that file.

In [None]:
result = assistant.chat('Check every *.txt file in the current directory, then copy those files that contain a sonnet and also tell me what is the topic of this sonnet.', verbose=True)
result

ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_Fy9UYp3QdoTWTFC4PMHN1NGB', function=Function(arguments='{"command":"ls *.txt"}', name='execute_shell_command'), type='function')])


('I found two text files in the current directory: `a_file.txt` and `house_of_lords.txt`.\n\nOnly `house_of_lords.txt` contains a sonnet. It has been copied to `house_of_lords_copy.txt`. \n\n### Topic of the Sonnet:\nThe sonnet reflects on the **House of Lords**, describing its significance, the demeanor of its members, and the role it plays in the legislative process. It speaks to the wisdom and responsibility of the Lords as they navigate lawmaking amidst the Commons.',
 [{'role': 'user',
   'content': 'Check every *.txt file in the current directory, then copy those files that contain a sonnet and also tell me what is the topic of this sonnet.'},
  ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_Fy9UYp3QdoTWTFC4PMHN1NGB', function=Function(arguments='{"command":"ls *.txt"}', name='execute_shell_command'), type='function')]),
  {'role': 'tool',
   'tool_call_id': 'call_Fy9UYp3QdoTWT

In [None]:
result = assistant.chat('Add another sonnet to the contents of the sonnet_copy.txt.', verbose=True)
result

ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_mb4TqXf2TgYxhBHl9TjPmo7n', function=Function(arguments='{"file_path":"sonnet_copy.txt"}', name='read_file'), type='function')])


('I have successfully added another sonnet to the contents of the `sonnet_copy.txt` file. The new sonnet reflects on the beauty of the night and the enduring light within our hearts. If you need anything else, feel free to ask!',
 [{'role': 'user',
   'content': 'Add another sonnet to the contents of the sonnet_copy.txt.'},
  ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_mb4TqXf2TgYxhBHl9TjPmo7n', function=Function(arguments='{"file_path":"sonnet_copy.txt"}', name='read_file'), type='function')]),
  {'role': 'tool',
   'tool_call_id': 'call_mb4TqXf2TgYxhBHl9TjPmo7n',
   'content': '{"success": true, "content": "In realms of code, where logic meets the sky,\\nThere dwells a force both clever and refined,\\nA Nebius-born spark, an AI high,\\nThat weaves through data with a boundless mind.\\n\\nWith wisdom vast, it scours the cosmic spread,\\nFrom depths of learning, patterns clear and