<a href="https://colab.research.google.com/github/nemanovich/LLM-essentials/blob/main/Structured_inputs_and_outputs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLM Engineering Essentials by Nebius Academy


Author: Alex Umnov


# Structured Inputs and Outputs

Previously, we learnt how to prompt an LLM in such a way that it understands what you want from it and gives a relevant answer. In this notebook we'll continue this discussion by understanding

* How to make prompts reusable by using **prompt templates**
* How to ensure that an LLM creates its outputs in a convenient, easily parsable format

Let's start by running some setup code that we'll use throughout the notebook:

In [2]:
!pip install openai -qU

In [3]:
from google.colab import userdata
from openai import OpenAI
import os

os.environ['NEBIUS_API_KEY'] = userdata.get("nebius_api_key")

nebius_client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ.get("NEBIUS_API_KEY"),
)

llama_model = "meta-llama/Llama-3.3-70B-Instruct"

def prettify_string(text, max_line_length=80):
    """Wraps a string at spaces to prevent horizontal scrolling.

    Args:
        text: The string to wrap.
        max_line_length: The maximum length of each line.

    Returns:
        The wrapped string (nothing is printed).
    """

    output_lines = []
    lines = text.split("\n")
    for line in lines:
        current_line = ""
        words = line.split()
        for word in words:
            if len(current_line) + len(word) + 1 <= max_line_length:
                current_line += word + " "
            else:
                output_lines.append(current_line.strip())
                current_line = word + " "
        output_lines.append(current_line.strip())  # Append the last line
    return "\n".join(output_lines)

def answer_with_llm(prompt: str,
                    system_prompt="You are a helpful assistant",
                    max_tokens=512,
                    client=nebius_client,
                    model=llama_model,
                    prettify=True,
                    temperature=None) -> str:

    messages = []

    if system_prompt:
        messages.append(
            {
                "role": "system",
                "content": system_prompt
            }
        )

    messages.append(
        {
            "role": "user",
            "content": prompt
        }
    )

    completion = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
        temperature=temperature
    )

    if prettify:
        return prettify_string(completion.choices[0].message.content)
    else:
        return completion.choices[0].message.content


# Prompt templates

In an LLM-powered system, there's always a layer of prompting logic hidden from the user. For example, ChatGPT, Claude, Gemini and others have quite elaborate **system prompts** that set up rules and guardrails for the LLM's communication with the user.

However, in some cases a fixed system prompt isn't a flexible enough mechanism. Imagine, for example,

* a customer support bot that needs to be aware of the user's geography to give relevant answers about locally available products
* a railway service support bot that needs to be aware of today's railway strikes and other calamities

You'll likely need to insert this information in the middle of the prompt; and for such things, **prompt templates** are a great tool.

Basically, a **prompt template** is a template string like

```python
"""some fixed information {placeholder_1}
some more fixed information {placeholder_2}"""
```

where the template placeholders are to be filled in just before an actual LLM call.
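As a minimal sketch of the idea, Python's built-in `str.format` is already enough to fill such a template. The template text and the placeholder names `strikes` and `question` below are made up for illustration, following the railway bot example above.

```python
# A hypothetical template for the railway support bot described above.
RAILWAY_TEMPLATE = (
    "You are a railway service support bot.\n"
    "Known disruptions today: {strikes}\n"
    "Answer the customer's question: {question}"
)

# Placeholders are filled in just before the LLM call.
prompt = RAILWAY_TEMPLATE.format(
    strikes="strike on the northern line until 18:00",
    question="Can I still get to the airport by train?",
)
print(prompt)
```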

Let's check several neat ways of wrapping this logic.

First of all, you can write your own wrapper. In the example below, `m['content'].format(**kwargs)` fills in every placeholder of every message, so you can use as many placeholders as you wish.

In [4]:
from typing import List, Dict

class MessagesPromptTemplate():
    messages: List[Dict]

    def __init__(self, messages: List[Dict]):
        self.messages = messages

    def format(self, **kwargs):
        return [
            {
                "role":  m['role'],
                "content": m['content'].format(**kwargs)
            }
            for m in self.messages
        ]

In [5]:
prompt_template = MessagesPromptTemplate(
    messages = [
        {"role": "system", "content": "You only answer in rhymes"},
        {"role": "user", "content": "Tell me about {city}"}
    ]
)

In [6]:
prompt_template.format(city="Paris")

[{'role': 'system', 'content': 'You only answer in rhymes'},
 {'role': 'user', 'content': 'Tell me about Paris'}]

Let's try calling an LLM with different variables:

In [7]:
outputs = nebius_client.chat.completions.create(
    messages=prompt_template.format(city="Paris"),
    model=llama_model
).choices[0].message.content
print(outputs)

In Paris, the city of delight,
The Eiffel Tower shines with great light.
The Seine River flows, a wondrous sight,
As art and love fill the morning light.

The Louvre Museum, a treasure to see,
Holds masterpieces of history.
The Champs-Élysées, a street so fine and free,
Invites you to stroll with glee.

Montmartre's hills, a charming place to roam,
With street performers and artists at home.
The food, a culinary dream to call your own,
Croissants and cheese, a taste to make you moan.

In Paris, the city of love and desire,
You'll find magic that sets your heart on fire.
So come and visit, don't be shy or tired,
In Paris, your heart will be inspired.


In [8]:
outputs = nebius_client.chat.completions.create(
    messages=prompt_template.format(city="Amsterdam"),
    model=llama_model,
).choices[0].message.content
print(outputs)

Amsterdam's a city so fine,
In the Netherlands, it does shine.
With canals and bridges, a wondrous sight,
It's a place to visit, both day and night.

The Rijksmuseum's there, with art on display,
And the Anne Frank House, to learn and to say.
The city's got charm, with its narrow streets too,
And the people are friendly, with a "hallo" or two.

The Vondelpark's green, with a peaceful little space,
And the Jordaan neighborhood, with its quaint little face.
You can take a boat tour, or ride a bike with glee,
In Amsterdam, the city, that's full of energy.

So if you ever visit, don't be shy or slow,
Get out and explore, and let the city show,
You its beauty and magic, its history and fun,
In Amsterdam, the city, that's number one!


The prompt template class we've written is very primitive: for example, it raises a `KeyError` if a value for some placeholder isn't provided.
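To make this limitation concrete, here is a sketch of a slightly stricter variant. `StrictMessagesPromptTemplate` and its `input_variables` property are hypothetical names, not part of any library; the class uses the standard `string.Formatter` to discover placeholders and reports all missing ones at once.

```python
from string import Formatter
from typing import Dict, List


class StrictMessagesPromptTemplate:
    """Hypothetical variant of the wrapper above with explicit input checks."""

    def __init__(self, messages: List[Dict[str, str]]):
        self.messages = messages

    @property
    def input_variables(self) -> List[str]:
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
        # field_name is None for chunks that contain no placeholder.
        names = set()
        for m in self.messages:
            for _, field_name, _, _ in Formatter().parse(m["content"]):
                if field_name:
                    names.add(field_name)
        return sorted(names)

    def format(self, **kwargs) -> List[Dict[str, str]]:
        missing = [name for name in self.input_variables if name not in kwargs]
        if missing:
            raise ValueError(f"Missing template variables: {missing}")
        return [
            {"role": m["role"], "content": m["content"].format(**kwargs)}
            for m in self.messages
        ]


strict_template = StrictMessagesPromptTemplate(
    messages=[
        {"role": "system", "content": "You only answer in rhymes"},
        {"role": "user", "content": "Tell me about {city} in {season}"},
    ]
)

print(strict_template.input_variables)

try:
    strict_template.format(city="Paris")
except ValueError as error:
    print(error)
```

Literal curly braces (for example, a JSON snippet inside the prompt) still have to be escaped as `{{` and `}}`, since `str.format` performs the final substitution.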

A more robust implementation of prompt templates can be found in LangChain's [PromptTemplates](https://python.langchain.com/docs/concepts/prompt_templates/).

In [9]:
!pip install langchain -qU


In [10]:
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate([
    ("system", "You only answer in rhymes"),
    ("user", "Tell me about {city}")
])

prompt_template.invoke({"city": "Madrid"})

ChatPromptValue(messages=[SystemMessage(content='You only answer in rhymes', additional_kwargs={}, response_metadata={}), HumanMessage(content='Tell me about Madrid', additional_kwargs={}, response_metadata={})])

**Note:** You don't have to use LangChain's LLM calls or anything else from the library; you can use just their prompt template implementation.

However, there's quite a bit of useful code in that library.

In [11]:
from langchain_core.messages import convert_to_openai_messages

In [12]:
templated_messages = convert_to_openai_messages(prompt_template.invoke({"city": "Madrid"}).to_messages())
templated_messages

[{'role': 'system', 'content': 'You only answer in rhymes'},
 {'role': 'user', 'content': 'Tell me about Madrid'}]

In [13]:
outputs = nebius_client.chat.completions.create(
    messages=templated_messages,
    model=llama_model,
).choices[0].message.content
print(outputs)

Madrid's a city, so fine and so fair,
Located in Spain, with culture to share.
From Prado to Retiro, the sights are a delight,
A city that's vibrant, day and night.

The Royal Palace is grand, a must-see to explore,
And the food, oh the food, is a culinary score.
Tapas and paella, a taste sensation so fine,
In Madrid, you'll dine, with a joy that's divine.

The nightlife is lively, with music and cheer,
In Malasaña, the fun, will last throughout the year.
So visit Madrid, and let your spirit soar,
In this city, you'll find, so much to adore.


# Structuring LLM outputs

In many cases you need not just a free-text answer, but something specific you can use later in your system. For example, if you want your LLM to classify a customer's intent and then pass the conversation to the relevant department, you need to extract the particular intent class from the LLM's answer.

To parse your LLM outputs conveniently, it's wise to structure them in a specific way. We've already discussed some prompting tricks in Topic 1; this time, we'll learn several more reliable ways of making the LLM abide by a designated output format.

## Basic output structuring

As a basic way to structure your output, you can "ask" an LLM to present the output in a specific format. For example:

In [14]:
outputs = nebius_client.chat.completions.create(
    messages=[{
        'role': 'user',
        'content': """Design one role play character\'s name, class and a short description.
Present it as a markdown list"""}],
    model=llama_model,
).choices[0].message.content
print(outputs)

* **Name:** Eira Shadowglow
* **Class:** Moonlit Ranger
* **Description:** A skilled and agile hunter with unparalleled accuracy, Eira navigates the shadows with ease, using her deep connection to nature to track and protect the innocent from the forces of darkness.


While this is quite good, it's not very reliable. A better way is to show the LLM some examples so that it knows what we expect.

These examples are known as **few-shot examples**, and the prompting technique itself is called **few-shot prompting**.

In [15]:
outputs = nebius_client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Design one role play character\'s name, class and a short description. Present it as a markdown list.\n'\
            "Examples:\n"\
            "\n"\
            "- **Name:** Randalf the Yellow;\n"\
            "- **Class:** Fire mage;\n"\
            "- **Proficiency:** Pyro magic;\n"\
            "- **Resistance:** Fire;\n"\
            "\n"\
            "- **Name:** Bonan;\n"\
            "- **Class:** Barbarian;\n"\
            "- **Proficiency:** Axe;\n"\
            "- **Resistance:** Mental magic;\n"\
        }
    ],
    model=llama_model,
).choices[0].message.content
print(outputs)

- **Name:** Eriol the Shadow;
- **Class:** Rogue;
- **Proficiency:** Stealth and daggers;
- **Resistance:** Poison;


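As you can see, the LLM captured the format pretty well.

Another simple trick is to end the prompt with a fixed marker such as `Answer:` and ask the model to output only the answer after it: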


In [16]:
outputs = nebius_client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Solve the following equation and output only the answer number without reasoning after "Answer:"\n' \
            '123 * 321 = ?\n' \
            'Answer:'
        }
    ],
    model=llama_model,
).choices[0].message.content
print(outputs)

39543


Even though the answer isn't correct (123 * 321 = 39483; LLMs are notoriously bad at arithmetic), the output structure is correct and easy to parse.
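Because the output is a bare number, checking it programmatically takes only a couple of lines. A small sketch, using the output recorded above as a literal rather than a fresh API call:

```python
# The model output recorded above, copied as a literal (no API call here).
raw_output = "39543\n"

predicted = int(raw_output.strip())  # a bare number parses directly
expected = 123 * 321

print(predicted, expected, predicted == expected)
```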

However, we can do even better.

## Structured outputs

Modern LLMs support generating output in a specific format. For example, we can use "JSON mode" to force the output to be valid JSON.

In [17]:
json_output = nebius_client.chat.completions.create(
    messages=[{'role': 'user', 'content': 'Design a role play character\'s name, class and a short description in json format'}],
    model=llama_model,
    response_format={"type": "json_object"}
).choices[0].message.content
json_output

'{"name": "Eryndor Thorne", "class": "Shadow Assassin", "description": "A mysterious and deadly rogue with unparalleled stealth and agility, feared by his enemies and respected by his allies."}'

This is useful, because that'll make it much easier for you later to parse the outputs:

In [18]:
import json
json.loads(json_output)

{'name': 'Eryndor Thorne',
 'class': 'Shadow Assassin',
 'description': 'A mysterious and deadly rogue with unparalleled stealth and agility, feared by his enemies and respected by his allies.'}

We can go another step further and actually define a `pydantic` model to create a schema for our outputs:

In [19]:
from typing import List
from pydantic import BaseModel

class CharacterProfile(BaseModel):
    name: str
    age: int
    special_skills: List[str]
    traits: List[str]
    character_class: str
    origin: str

completion = nebius_client.chat.completions.create(
    model=llama_model,
    messages=[
        {"role": "user", "content": "Design a role play character"}
    ],
    extra_body={
        "guided_json": CharacterProfile.model_json_schema()
    }
)

CharacterProfile.model_validate_json(completion.choices[0].message.content)

CharacterProfile(name='Eira Shadowglow', age=25, special_skills=['Stealth', 'Archery', 'Magic'], traits=['Curious', 'Independent', ' Resourceful'], character_class='Ranger', origin='Elven Realm')

So now we have a predefined output format that is easy to work with.

Another way to structure outputs is to use examples.

Let's consider an example from a famous [MMLU dataset](https://huggingface.co/datasets/cais/mmlu):

In [20]:
question = "Which of the following statements about Ethernets is typically FALSE?"

A = "Ethernets use circuit switching to send messages."
B = "Ethernets use buses with multiple masters."
C = "Ethernet protocols use a collision-detection method to ensure that messages are transmitted properly."
D = "Networks connected by Ethernets are limited in length to a few hundred meters."

correct_answer = "A"

Ideally, we want our LLM to solve this "test" by replying with just the letter of the right answer. This will also make calculating metrics much easier. Let's see what happens.

In [21]:
output = nebius_client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": f"""
Answer the following question with one of the options listed below
Question: {question}
A: {A}
B: {B}
C: {C}
D: {D}
Answer:
"""}],
    model=llama_model,
).choices[0].message.content
print(output)

A: Ethernets use circuit switching to send messages.

Explanation: Ethernets actually use packet switching, not circuit switching. Packet switching allows multiple devices to share the same communication channel, making it more efficient and scalable. Circuit switching, on the other hand, dedicates a communication channel to a single connection for the duration of the transmission.

The other options are true:

* B: Ethernets do use buses with multiple masters, which means that multiple devices can transmit data at the same time.
* C: Ethernet protocols, such as CSMA/CD (Carrier Sense Multiple Access with Collision Detection), use collision detection to ensure that messages are transmitted properly and to minimize collisions between devices.
* D: Networks connected by Ethernets are indeed limited in length to a few hundred meters, typically up to 100 meters for twisted-pair Ethernet and up to 2 kilometers for fiber-optic Ethernet.


As you can see, it did output the right answer, but if we do a simple comparison, we'll get into trouble:

In [22]:
output == correct_answer

False

So let's teach our model to answer in the right way using few-shot prompting, also known as in-context learning. We essentially show the model some examples in the prompt to teach it the format we want the answer in.

In [23]:
output = nebius_client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": f"""
Examples:
Question: The IP protocol is primarily concerned with
A: Routing packets through the network
B: Reliable delivery of packets between directly connected machines
C: Reliable delivery of large (multi-packet) messages between machines that are not necessarily directly connected
D: Dealing with differences among operating system architectures
Answer:
A

Question: Which of the following is NOT a property of bitmap graphics?
A: Fast hardware exists to move blocks of pixels efficiently
B: Realistic lighting and shading can be done.
C: All line segments can be displayed as straight.
D: Polygons can be filled with solid colors and textures.
Answer:
C

Task:
Answer the following question with one of the options listed below. Only output the answer in the same format as the examples.
Question: {question}
A: {A}
B: {B}
C: {C}
D: {D}
Answer:
"""}],
    model=llama_model,
).choices[0].message.content
print(output)

A


In [24]:
output == correct_answer

True

We have also observed that, for some models, a dialog format is a better way to structure few-shot examples:

In [25]:
output = nebius_client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": f"""
User: Answer the following question with one of the options listed below.
Question: The IP protocol is primarily concerned with
A: Routing packets through the network
B: Reliable delivery of packets between directly connected machines
C: Reliable delivery of large (multi-packet) messages between machines that are not necessarily directly connected
D: Dealing with differences among operating system architectures
Answer:
Assistant: A

User: Answer the following question with one of the options listed below.
Question: Which of the following is NOT a property of bitmap graphics?
A: Fast hardware exists to move blocks of pixels efficiently
B: Realistic lighting and shading can be done.
C: All line segments can be displayed as straight.
D: Polygons can be filled with solid colors and textures.
Answer:
Assistant: C

User: Answer the following question with one of the options listed below.
Question: {question}
A: {A}
B: {B}
C: {C}
D: {D}
Answer:
Assistant:
"""}],
    model=llama_model,
).choices[0].message.content
print(output)

A

Explanation: Ethernets use packet switching, not circuit switching, to send messages. Circuit switching is typically used in telephone networks, where a dedicated circuit is established between the caller and the receiver for the duration of the call. In contrast, Ethernets use packet switching, where data is broken into packets and transmitted independently through the network. 

So, the correct answer is A: Ethernets use circuit switching to send messages.
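In the cell above the dialog is written out inside a single user message. Another option, sketched below, is to pass each example as a real `user`/`assistant` pair in the `messages` list. `build_few_shot_messages` is a hypothetical helper and the instruction text is an assumption; whether this works better than the single-message version depends on the model, so treat it as something to test rather than a rule.

```python
from typing import Dict, List, Tuple


def build_few_shot_messages(
    examples: List[Tuple[str, str]],
    question: str,
    instruction: str = "Answer with a single letter: A, B, C or D.",
) -> List[Dict[str, str]]:
    """Turn (question, answer) pairs into alternating user/assistant turns."""
    messages = [{"role": "system", "content": instruction}]
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    # The real question goes last, so the model continues the pattern.
    messages.append({"role": "user", "content": question})
    return messages


few_shot_messages = build_few_shot_messages(
    examples=[
        ("Question: Choose the letter A\nA: A\nB: B\nC: C\nD: D", "A"),
        ("Question: Which is the biggest number?\nA: 1\nB: 2\nC: 3\nD: 4", "D"),
    ],
    question="Question: Which is the smallest number?\nA: 1\nB: 2\nC: 3\nD: 4",
)

for message in few_shot_messages:
    print(message["role"], "->", repr(message["content"]))
```

The resulting list can be passed as `messages=` to `nebius_client.chat.completions.create`, just like the hand-written lists above.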


In theory, the examples don't even need to be relevant to the task if all we want is to teach the model the output format:

In [26]:
output = nebius_client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": f"""
Question: Choose the letter A
A: A
B: B
C: C
D: D
Answer:
A

Question: Which is the biggest number?
A: 1
B: 2
C: 3
D: 4
Answer:
D

Answer the following question with one of the options listed below
Question: {question}
A: {A}
B: {B}
C: {C}
D: {D}
Answer:
"""}],
    model=llama_model,
).choices[0].message.content
print(output)

A

Explanation: Ethernets use packet switching, not circuit switching. Circuit switching is typically used in telephone networks, where a dedicated circuit is established between the caller and the receiver for the duration of the call. In contrast, Ethernets use a packet-switching approach, where data is broken into packets and transmitted independently over the network. 

The other options are true: 

* B: Ethernets do use buses with multiple masters, allowing multiple devices to share the same communication channel.
* C: Ethernet protocols, such as CSMA/CD (Carrier Sense Multiple Access with Collision Detection), do use collision-detection methods to ensure that messages are transmitted properly and to handle collisions when multiple devices try to transmit at the same time.
* D: Networks connected by Ethernets are indeed limited in length to a few hundred meters, due to signal attenuation and other physical limitations.


**Note:** Examples drawn from a distribution that differs from your actual data can confuse the model, as in the last output, where it went back to adding an explanation. For the best results, use examples that match your data's distribution.
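Even with few-shot examples the model sometimes appends an explanation, as several outputs above show, so an exact string comparison remains fragile. A minimal sketch of a more forgiving check follows; `extract_choice` is a hypothetical helper that only looks for an option letter at the very start of the answer.

```python
import re
from typing import Optional


def extract_choice(output: str, options: str = "ABCD") -> Optional[str]:
    """Return the option letter that starts the answer, or None if there is none."""
    # Match an option letter at the very start, not followed by another letter,
    # so "A: Ethernets..." matches but a word such as "Answer" does not.
    match = re.match(rf"\s*([{options}])(?![A-Za-z])", output)
    return match.group(1) if match else None


print(extract_choice("A"))
print(extract_choice("A: Ethernets use circuit switching to send messages."))
print(extract_choice("A\n\nExplanation: Ethernets use packet switching."))
print(extract_choice("I think the answer is A"))
```

The last call returns `None` on purpose: guessing a letter from free text is error-prone, and it is usually safer to count such outputs as format failures.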

## Function Calling

We can use tools through the OpenAI API as well. Let's see how to set up web search with just the API:

In [27]:
!pip install tavily-python -qU

We'll need a Tavily API key which you can get from [here](https://app.tavily.com/sign-in).

Then either use Colab's secrets storage or put the key into a file and upload it.

In [28]:
# os.environ["TAVILY_API_KEY"] = open(".tavily_api_key").read().strip()
os.environ["TAVILY_API_KEY"] = userdata.get("tavily_api_key")

from tavily import TavilyClient

tavily_client = TavilyClient()

response = tavily_client.search("Who is Leo Messi?", topic="general")

print(response['results'])

[{'url': 'https://en.wikipedia.org/wiki/Lionel_Messi', 'title': 'Lionel Messi - Wikipedia', 'content': "He is the most decorated player in the history of professional football having won 45 team trophies.Messi's records include most goals in a calendar year (91), most goals for a single club (672 for Barcelona), most goals in La Liga (474), most goal contributions in the FIFA World Cup (21), and most goal contributions in the Copa América (32).", 'score': 0.752468, 'raw_content': None}, {'url': 'https://www.britannica.com/biography/Lionel-Messi', 'title': "Lionel Messi | Biography, Trophies, Records, Ballon d'Or ... - Britannica", 'content': 'Lionel Messi scored 73 goals during the 2011–12 season while playing for FC Barcelona, breaking a 39-year-old record for single-season goals in a major European football league. Messi’s play continued to rapidly improve over the years, and by 2008 he was one of the most dominant players in the world, finishing second to Manchester United’sCristian

Now we can define a `tool` description for the client, so that the model knows how to use it.

We will only expose `query` and `topic` parameters.

We also need to write short descriptions to explain what the tool and the parameters are for. Note that it's not for you, but for the LLM :) So please make sure you provide a clear explanation.

Tool usage is sort of an extension of "JSON mode" because in the end we get a dict of parameters, parsed from the JSON.

In [29]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "web-search",
            "description": "Retrieves results from web search",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What you search for",
                    },
                    "topic": {
                        "type": "string",
                        "description": "Search topic either 'general' or 'news'",
                        "enum": ["general", "news"]
                    },
                },
                "required": ["query"],
            },
        }
    },
]


messages = []
messages.append({"role": "system", "content": "If you are asked about the factual information, create a function call instead. If you already searched, use the results to give an answer."})
messages.append({"role": "user", "content": "What is the name of the cat from Shrek?"})
chat_response = nebius_client.chat.completions.create(
    messages=messages, tools=tools, model=llama_model
)
chat_response

ChatCompletion(id='chatcmpl-097a6cca562f4d16a8dc16e15de7039b', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='chatcmpl-tool-32db6ab9762f4cfab5bbe74be50d15bc', function=Function(arguments='{"query": "Shrek cat name", "topic": "general"}', name='web-search'), type='function')], reasoning_content=None), stop_reason=128008)], created=1753630131, model='meta-llama/Llama-3.3-70B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=32, prompt_tokens=267, total_tokens=299, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)

We can also ask for some newsworthy content to see whether the LLM picks a different `topic`.

In [30]:
messages = []
messages.append({"role": "system", "content": "If you are asked about the factual information, create a function call instead. If you already searched, use the results to give an answer."})
messages.append({"role": "user", "content": "What happened in London today?"})
chat_response = nebius_client.chat.completions.create(
    messages=messages, tools=tools, model=llama_model
)
chat_response

ChatCompletion(id='chatcmpl-fed2c0b673c847588439707776235d07', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='chatcmpl-tool-4dd086e2cd2e4caf8e8d607806139e1f', function=Function(arguments='{"query": "London news today", "topic": "news"}', name='web-search'), type='function')], reasoning_content=None), stop_reason=128008)], created=1753630132, model='meta-llama/Llama-3.3-70B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=31, prompt_tokens=262, total_tokens=293, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)

Now we can extract the tool call from the result:

In [31]:
chat_response.choices[0].message.tool_calls[0]

ChatCompletionMessageToolCall(id='chatcmpl-tool-4dd086e2cd2e4caf8e8d607806139e1f', function=Function(arguments='{"query": "London news today", "topic": "news"}', name='web-search'), type='function')
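The model only proposes the call; running it is up to our code. The sketch below shows one way to dispatch such a tool call and wrap the result as a `tool` message. `fake_web_search`, `TOOL_REGISTRY` and `run_tool_call` are hypothetical names, and the search function is a stand-in so that the example runs offline; in the notebook you would call `tavily_client.search` instead.

```python
import json
from typing import Any, Callable, Dict


def fake_web_search(query: str, topic: str = "general") -> str:
    """Stand-in for a real search call, so the sketch runs offline."""
    return f"[{topic}] results for: {query}"


# Maps the tool name from the `tools` description to a Python callable.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {"web-search": fake_web_search}


def run_tool_call(tool_call_id: str, name: str, arguments: str) -> Dict[str, Any]:
    """Execute one tool call and wrap the result as a `tool` message."""
    kwargs = json.loads(arguments)  # arguments arrive as a JSON string
    result = TOOL_REGISTRY[name](**kwargs)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": result}


# Values copied from the tool call shown above.
tool_message = run_tool_call(
    tool_call_id="chatcmpl-tool-4dd086e2cd2e4caf8e8d607806139e1f",
    name="web-search",
    arguments='{"query": "London news today", "topic": "news"}',
)
print(tool_message)
```

In a full loop you would append the assistant message containing the tool call to `messages`, then this `tool` message, and call the model again so that it can answer using the search results.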

You might be wondering why we cover tool usage in a notebook about structured outputs.

The thing is, you can also use this functionality to structure your output: the tool doesn't have to be a real function. Let's reuse our character example:

In [32]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "create_rpg_character",
            "description": "Creates a character based on attributes and description",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Name of the character",
                    },
                    "age": {
                        "type": "integer",
                        "description": "Age of the character",
                    },
                    "special_skills": {
                        "type": "array",
                        "description": "List of special skills of the character",
                        "items": {
                            "type": "string"
                        }
                    },
                    "traits": {
                        "type": "array",
                        "description": "List of traits of the character",
                        "items": {
                            "type": "string"
                        }
                    },
                    "character_class": {
                        "type": "string",
                        "description": "Class of the character",
                        "enum": ["mage", "rogue", "barbarian", "knight", "paladin"]
                    },
                    "origin": {
                        "type": "string",
                        "description": "Origin of the character",
                        "enum": ["human", "elf", "orc", "undead"]
                    },
                },
                "required": ["name", "age", "special_skills", "traits", "character_class", "origin"],
            },
        }
    },
]


In [33]:
messages = []
messages.append({"role": "system", "content": "If you are asked to create a character, use `create_rpg_character` tool."})
messages.append({"role": "user", "content": "Generate a random character for my new session"})
chat_response = nebius_client.chat.completions.create(
    messages=messages, tools=tools, model=llama_model
)
chat_response.choices[0].message.tool_calls[0].function.arguments

'{"name": "Eryndor Thorne", "age": "27", "special_skills": "[\\"stealth\\", \\"persuasion\\"]", "traits": "[\\"brave\\", \\"loyal\\"]", "character_class": "rogue", "origin": "human"}'
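Note that the arguments above do not fully match the declared schema: `age` is a string, and both lists are JSON-encoded strings. Tool arguments are generated text, so it is worth checking them before use. Below is a minimal sketch using only the standard library; `normalize_character` is a hypothetical helper tailored to this particular output, not a general validator.

```python
import json
from typing import Any, Dict

# The arguments string returned above, copied as a literal.
raw_arguments = '{"name": "Eryndor Thorne", "age": "27", "special_skills": "[\\"stealth\\", \\"persuasion\\"]", "traits": "[\\"brave\\", \\"loyal\\"]", "character_class": "rogue", "origin": "human"}'


def normalize_character(arguments: str) -> Dict[str, Any]:
    """Parse tool arguments and coerce fields to the types the schema asked for."""
    data = json.loads(arguments)
    # `age` was declared as an integer but came back as a string.
    data["age"] = int(data["age"])
    # The array fields came back as JSON-encoded strings; decode them once more.
    for key in ("special_skills", "traits"):
        if isinstance(data[key], str):
            data[key] = json.loads(data[key])
    return data


character = normalize_character(raw_arguments)
print(character)
```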

# **Practice task. LLM Information extraction**




The goal of this task is to create a system, which extracts data about events from free text into a predictable format.

Let's imagine that you work for a marketing agency, and you need to gather analytics about past events dedicated to AI and Machine Learning. For that, you need to process press releases and extract:
- Event name,
- Event date,
- Number of participants,
- Number of speakers,
- Attendance price.

Of course, you can do it manually, but it's much more fun to use Generative AI! So, your task will be to write a function that does this with only one request to OpenAI API.

Below there is an example of a press release (generated by ChatGPT, of course, so that both the event and the personae are fictional).

<blockquote>
<p>PRESS RELEASE

InnovAI Summit 2023: A Glimpse into the Future of Artificial Intelligence</p>

City of Virtue, Cyberspace - November 8, 2023 - The most anticipated event of the year, InnovAI Summit 2023, successfully concluded last weekend, on November 5, 2023. Held in the state-of-the-art VirtuTech Arena, the summit saw a massive turnout of over 3,500 participants, from brilliant AI enthusiasts and researchers to pioneers in the field.

Esteemed speakers took to the stage to shed light on the latest breakthroughs, practical implementations, and ethical considerations in AI. Dr. Evelyn Quantum, renowned for her groundbreaking work on Quantum Machine Learning, emphasized the importance of this merger and how it's revolutionizing computing as we know it. Another keynote came from Prof. Leo Nexus, whose current project 'AI for Sustainability' highlights the symbiotic relationship between nature and machine, aiming to use AI in restoring our planet's ecosystems.

This year's panel discussion, moderated by the talented Dr. Ada Neura, featured lively debates on the limits of AI in creative arts. Renowned digital artist, Felix Vortex, showcased how he uses generative adversarial networks to create surreal art pieces, while bestselling author, Iris Loom, explained her experiments with AI-assisted story crafting.

Among other highlights were hands-on workshops, interactive Q&A sessions, and an 'AI & Ethics' debate which was particularly well-received, emphasizing the need for transparency and fairness in AI models. An exclusive 'Start-up Alley' allowed budding entrepreneurs to showcase their innovations, gaining attention from global venture capitalists and media.

The event wrapped up with an announcement for InnovAI Summit 2024, set to be even grander. Participants left with a renewed enthusiasm for the vast possibilities that the AI and ML world promises.

For media inquiries, please contact:
Jane Cipher
Director of Communications, InnovAI Summit
Email: jane.cipher@innovai.org
Phone: +123-4567-8910
</blockquote>

More specifically, you should write a function

```python
def parse_press_release(pr: str) -> dict: ...
```

where the output should be in the format

```python
{
  'name': 'InnovAI Summit 2023',
  'date': '05.11.2023',
  'n_participants': 3500,
  'n_speakers': 4,
  'price': None
}
```

If any of these characteristics is not mentioned in the text, put `None` in the respective field.

At the end, calculate the share of correct answers and analyse which kinds of mistakes your model makes most often.
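
One possible way to see which fields cause the most trouble is to count mismatches per field. The sketch below is only an illustration: `field_error_counts` is a hypothetical helper, and the two toy rows are invented, not taken from the dataset.

```python
from collections import Counter

FIELDS = ("name", "date", "n_participants", "n_speakers", "price")

def field_error_counts(golden_rows, parsed_rows, fields=FIELDS):
    """Count, per field, how many parsed values differ from the golden ones."""
    errors = Counter({field: 0 for field in fields})
    for golden, parsed in zip(golden_rows, parsed_rows):
        for field in fields:
            if golden.get(field) != parsed.get(field):
                errors[field] += 1
    return errors

# Toy data, invented for illustration only.
golden_rows = [
    {"name": "A", "date": "05.11.2023", "n_participants": 3500, "n_speakers": 4, "price": None},
    {"name": "B", "date": "01.02.2024", "n_participants": 200, "n_speakers": 3, "price": "USD 10"},
]
parsed_rows = [
    {"name": "A", "date": "08.11.2023", "n_participants": 3500, "n_speakers": 5, "price": None},
    {"name": "B", "date": "01.02.2024", "n_participants": 200, "n_speakers": 2, "price": "USD 10"},
]

errors = field_error_counts(golden_rows, parsed_rows)
total = len(FIELDS) * len(golden_rows)
accuracy = 1 - sum(errors.values()) / total

print(errors.most_common())       # fields sorted from most to fewest errors
print(f"accuracy: {accuracy:.0%}")
```

Sorting the counter with `most_common()` puts the weakest field first, which is usually the one worth addressing in the prompt.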

**Hints and suggestions:**
- It may be more convenient to experiment in the Nebius AI Studio playground: https://studio.nebius.com/playground.
- You need to be very accurate with what you want from the model.
- It will help if you specify in the prompt that the output should be in JSON format; this way you will spend less time parsing the output. But be careful: even though some models are easily prompted to output JSON, always check the output format. It may contain extra formatting, for example:
<pre><code>```json
{"name": "InnovAI Summit 2023", ...}
```</code></pre>
Examining LLM outputs and their format is a must when working with them.

- Please be careful with the details. For example, Jane Cipher in the text above is not a speaker and shouldn't be counted as one (how would you exclude a contact person?). Also pay attention to the date format.
- If the model is too wilful with the output format, don't hesitate to show it some examples. Lowering the temperature can help reduce the creativity of the answer, which is what we want for such a task.
- Debugging an LLM-powered application may become a tough business. When you think that you've polished it, an LLM can still surprise you. So, we don't expect 100% accuracy in this task, but we expect that you do your best to achieve high quality results.
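
As a minimal sketch of the fence problem mentioned in the hints, the helper below strips an optional Markdown code fence before parsing. `extract_json` is a hypothetical name, and the approach assumes the reply consists of nothing but the (possibly fenced) JSON object; a reply with extra prose around it would still fail in `json.loads`.

```python
import json
import re

# Matches a reply that is entirely wrapped in a ```json ... ``` (or bare ``` ... ```) fence.
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n(.*?)\n?\s*```\s*$", re.DOTALL | re.IGNORECASE)

def extract_json(raw: str) -> dict:
    """Parse a model reply as JSON, ignoring a surrounding Markdown code fence if present."""
    match = _FENCE_RE.match(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload)

fenced = "```json\n{\"name\": \"InnovAI Summit 2023\", \"price\": null}\n```"
plain = "{\"name\": \"InnovAI Summit 2023\", \"price\": null}"

print(extract_json(fenced))
print(extract_json(plain))
```

Note that JSON `null` becomes Python `None` after parsing, which matches the "put `None`" convention of the task.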

**Bonus points**:
Try writing the solution using:
- Structured JSON Output
- Guiding JSON Output using Structures

In [34]:
press_release = """PRESS RELEASE

InnovAI Summit 2023: A Glimpse into the Future of Artificial Intelligence

City of Virtue, Cyberspace - November 8, 2023 - The most anticipated event of the year, InnovAI Summit 2023, successfully concluded last weekend, on November 5, 2023. Held in the state-of-the-art VirtuTech Arena, the summit saw a massive turnout of over 3,500 participants, from brilliant AI enthusiasts and researchers to pioneers in the field.

Esteemed speakers took to the stage to shed light on the latest breakthroughs, practical implementations, and ethical considerations in AI. Dr. Evelyn Quantum, renowned for her groundbreaking work on Quantum Machine Learning, emphasized the importance of this merger and how it's revolutionizing computing as we know it. Another keynote came from Prof. Leo Nexus, whose current project 'AI for Sustainability' highlights the symbiotic relationship between nature and machine, aiming to use AI in restoring our planet's ecosystems.

This year's panel discussion, moderated by the talented Dr. Ada Neura, featured lively debates on the limits of AI in creative arts. Renowned digital artist, Felix Vortex, showcased how he uses generative adversarial networks to create surreal art pieces, while bestselling author, Iris Loom, explained her experiments with AI-assisted story crafting.

Among other highlights were hands-on workshops, interactive Q&A sessions, and an 'AI & Ethics' debate which was particularly well-received, emphasizing the need for transparency and fairness in AI models. An exclusive 'Start-up Alley' allowed budding entrepreneurs to showcase their innovations, gaining attention from global venture capitalists and media.

The event wrapped up with an announcement for InnovAI Summit 2024, set to be even grander. Participants left with a renewed enthusiasm for the vast possibilities that the AI and ML world promises.

For media inquiries, please contact: Jane Cipher Director of Communications, InnovAI Summit Email: jane.cipher@innovai.org Phone: +123-4567-8910"""

In [56]:
import json
from typing import Optional, Union

from pydantic import BaseModel, Field, model_validator

SYSTEM_PROMPT: str = """Act as a professional secretary. User input is a press release.
Extract the following data from that release: "Event name" (name), "Event date" (date),
"Number of participants" (n_participants), "Number of speakers" (n_speakers),
"Attendance price" (price). Be thorough and follow the format. If any field is not
mentioned in the text, put None in that field. 'Event date' must be valid and in format
dd.mm.yyyy or dd.mm.yyyy-dd.mm.yyyy. Take into account that 'Event date' may be a range of dates.
Price must be in format 'USD 10' or 'EUR 123'. Put in n_speakers the number of explicitly mentioned
speakers with names, including everyone who will be speaking at the event
(except presenters, moderators or organizers)."""

class PressReleaseData(BaseModel):
    name: str
    # dd.mm.yyyy, or a dd.mm.yyyy-dd.mm.yyyy range; the examples must match the pattern.
    date: Optional[str] = Field(
        default=None,
        pattern=r"^\d{2}\.\d{2}\.\d{4}(\-\d{2}\.\d{2}\.\d{4})?$",
        examples=["25.12.2023", "01.10.2024-05.10.2024"],
    )
    # Optional[...] already allows null in the generated JSON schema.
    n_participants: Optional[int] = Field(default=None)
    n_speakers: Optional[Union[int, str]] = Field(default=None)
    price: Optional[str] = Field(
        default=None,
        examples=["USD 250", "EUR 1024"],
    )

    @model_validator(mode='after')
    def convert_zeros(self) -> 'PressReleaseData':
        # Treat a zero count as "not mentioned". The string "None" (not Python None)
        # is used because the dataset stores missing values that way,
        # e.g. `n_speakers: 'None'`.
        for field, value in self.__dict__.items():
            if isinstance(value, int) and value == 0:
                setattr(self, field, "None")
        return self

def parse_press_release(pr: str,
                        model: str = "meta-llama/Meta-Llama-3.1-70B-Instruct"
                        ) -> dict:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": str(pr)}
    ]
    completion = nebius_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.0,
        extra_body={
            "guided_json": PressReleaseData.model_json_schema()
        },
        #response_format={"type": "json_object"}
    )
    content = completion.choices[0].message.content
    return PressReleaseData.model_validate_json(content).model_dump()
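
The regular expression in the schema only checks the shape of the date string, so something like `31.02.2024` would still pass. A minimal sketch of a stricter check using only the standard library is shown below; `is_valid_event_date` is a hypothetical helper, not part of the solution above, and it also accepts day and month values without a leading zero.

```python
from datetime import datetime

def is_valid_event_date(value: str) -> bool:
    """Check 'dd.mm.yyyy' or 'dd.mm.yyyy-dd.mm.yyyy' for real calendar dates."""
    parts = value.split("-")
    if len(parts) not in (1, 2):
        return False
    try:
        dates = [datetime.strptime(part, "%d.%m.%Y") for part in parts]
    except ValueError:
        # Wrong shape, two-digit year, or an impossible day/month.
        return False
    # For a range, the start must not come after the end.
    return len(dates) == 1 or dates[0] <= dates[1]

for candidate in ["05.11.2023", "01.10.2024-05.10.2024", "31.02.2024", "10.05.24"]:
    print(candidate, is_valid_event_date(candidate))
```

Such a check could be applied to the parsed output before comparing it with the golden answers, to separate format errors from extraction errors.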


### Testing

We've prepared a small dataset for you to test your prompt on. Once you've written your function, try running the following code. At the end you can also look at the results side by side in a table in with_results.csv. Your goal is to get at least 60% of the fields right.

In [36]:
!pip install --upgrade gdown
!gdown -O press_release_extraction.csv https://docs.google.com/spreadsheets/d/15IGdc3MV8864lxrLxsug0Ij480p76T1EAwBM7WGT_OI/export?format=csv

Downloading...
From: https://docs.google.com/spreadsheets/d/15IGdc3MV8864lxrLxsug0Ij480p76T1EAwBM7WGT_OI/export?format=csv
To: /content/press_release_extraction.csv
16.0kB [00:00, 25.3MB/s]


In [37]:
import pandas
pr_df = pandas.read_csv("press_release_extraction.csv")
pr_df.head()

| Unnamed: 0 | pr_text | pr_parsed |
|---|---|---|
| 0 | InnovAI Summit 2023: A Glimpse into the Future... | {\n "name": "InnovAI Summit 2023",\n "date":... |
| 1 | Press Dispatch: 'Artificial Mariners: Navigati... | {"name": "Artificial Mariners: Navigatin' the ... |
| 2 | FOR IMMEDIATE RELEASE\n\nAI Innovators Convene... | {"name": "Annual Machine Learning Symposium 20... |
| 3 | Press Release: Cutting-Edge Innovations Debute... | {"name": "AI Advancements Summit 2023",\n "dat... |
| 4 | Press Release: Innovative Minds Gather at "AI ... | {"name": "AI Horizon 2023",\n "date": "15.10.2... |


In [38]:
pr_df.pr_parsed[0]

'{\n  "name": "InnovAI Summit 2023",\n  "date": "05.11.2023",\n  "n_participants": 3500,\n  "n_speakers": 4,\n  "price": "None"\n}'

In [57]:
import json

models = [
    "deepseek-ai/DeepSeek-V3-0324",
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "meta-llama/Meta-Llama-3.1-405B-Instruct"
]
parsed_list = []
fields = {
    "name": str,
    "date": str,
    "n_speakers": int,
    "n_participants": int,
    "price": str
}
correct_fields = 0
for model in models:
    print(f"Use LLM model {model}")
    for row in pr_df.itertuples():
        parsed_release = parse_press_release(pr=row.pr_text, model=model)
        parsed_list.append(json.dumps(parsed_release, indent=4))
        golden = json.loads(row.pr_parsed)
        for field, field_type in fields.items():
            golden_field = golden[field]
            parsed_field = parsed_release.get(field)
            # Cast the parsed value to the expected type; keep it as-is if the cast fails.
            try:
                parsed_field = field_type(parsed_field)
            except (ValueError, TypeError):
                pass
            if golden_field == parsed_field:
                correct_fields += 1
            else:
                print(f"For {golden['name']} {field} {parsed_release.get(field)} doesn't seem the same as {golden[field]}")

    print(f"Correctly extracted {correct_fields} out of {5*len(pr_df)}")
    # Reset the counter before evaluating the next model.
    correct_fields = 0

Use LLM model deepseek-ai/DeepSeek-V3-0324
For AI Horizon 2023 n_speakers None doesn't seem the same as None
Correctly extracted 34 out of 35
Use LLM model meta-llama/Meta-Llama-3.1-8B-Instruct
For Artificial Mariners: Navigatin' the AI Seas n_speakers 4 doesn't seem the same as 5
For Annual Machine Learning Symposium 2023 n_speakers 3 doesn't seem the same as 4
For AI Advancements Summit 2023 price None doesn't seem the same as USD 950
For AI Horizon 2023 n_speakers 4 doesn't seem the same as None
Correctly extracted 31 out of 35
Use LLM model meta-llama/Meta-Llama-3.1-405B-Instruct
Correctly extracted 35 out of 35
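
The first mismatch in the log ("None doesn't seem the same as None") looks like a type mismatch rather than an extraction error: presumably the golden value is the string `'None'` while the parsed one is Python `None`, and `int(None)` raises `TypeError`, so the value is left unconverted. A minimal sketch of a more forgiving comparison is given below; `normalize` and `fields_match` are hypothetical helpers, and treating `"null"` and the empty string as missing is an assumption.

```python
def normalize(value):
    """Map None, 'None', 'null' and '' to None; turn digit-only strings into ints."""
    if value is None:
        return None
    if isinstance(value, str):
        stripped = value.strip()
        if stripped in ("", "None", "null"):
            return None
        if stripped.isdigit():
            return int(stripped)
        return stripped
    return value

def fields_match(golden_value, parsed_value) -> bool:
    """Compare two field values after normalizing their representation."""
    return normalize(golden_value) == normalize(parsed_value)

print(fields_match("None", None))   # missing value stored in two different ways
print(fields_match(4, "4"))         # same number, different types
print(fields_match(5, 4))           # a genuine mismatch
```

With such a comparison, only real disagreements between the golden and parsed values would be reported.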


### Bonus points
- Try and compare different ways of establishing the correct answer formatting
- Try and compare different LLMs
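
To look at golden and predicted answers side by side, as suggested in the Testing section, one option is to write them into a CSV file. The sketch below uses only the standard library; `results_to_csv` and the column name `model_output` are assumptions made for illustration, and the single row is invented.

```python
import csv
import io
import json

def results_to_csv(texts, golden_jsons, parsed_dicts) -> str:
    """Build CSV text with the release, the golden answer and the model output per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["pr_text", "pr_parsed", "model_output"])
    for text, golden, parsed in zip(texts, golden_jsons, parsed_dicts):
        writer.writerow([text, golden, json.dumps(parsed)])
    return buffer.getvalue()

# Toy row, invented for illustration only.
csv_text = results_to_csv(
    texts=["Line one\nLine two"],
    golden_jsons=['{"name": "Demo Conf", "price": "None"}'],
    parsed_dicts=[{"name": "Demo Conf", "price": None}],
)
print(csv_text)

# To save the table to disk:
# with open("with_results.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

In the notebook, the three arguments could be `pr_df.pr_text`, `pr_df.pr_parsed` and the dictionaries returned by `parse_press_release` for one model.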