# 🚀 Large Language Models Workshop

Welcome to our workshop on large language models. We will mainly focus on text generation models (decoder-only architectures, such as the models behind ChatGPT).


## 📚 Table of Contents

| Section | Topic | Description |
|---------|-------|-------------|
| **1** | [Basic Chat Completions](#setup) | 🎯 Getting started  |
| **2** | [Parameters & Configuration](#parameters) | ⚙️ Tuning model behavior and output |
| **3** | [Prompt Engineering](#prompt-engineering) | 🎨 How to craft effective prompts to get the best results from LLMs |
| **4** | [Structured Output](#structured-output) | 📊 Getting reliable JSON responses |
| **5** | [Function Calling](#function-calling) | 🔧 Connecting LLMs to external tools |
| **6** | [Streaming Output](#streaming) | ⚡ Real-time responses for better UX |
| **7** | [Reasoning Models](#reasoning-models) | 🧠 Advanced models that "think" step-by-step |
| **8** | [RAG - Chat with Your Data](#rag) | 📖 Making LLMs work with your documents |
| **9** | [Agents](#agents) | 🤖 Autonomous AI that can plan and execute tasks |




## 🎯 Icebreaker: 20 Questions Game


### 🎮 Let's Play a Game!

Welcome to our interactive **~~20~~ 10 Questions** game! <br>
This is a fun way to start exploring what LLMs can do while demonstrating some key concepts we'll cover in this workshop.

> **🎯 The Challenge:** GPT will think of something **Norway-related** and you have 10 yes/no questions to guess what it is!

### 📋 How to Play:

| Step | Action | Command |
|------|--------|---------|
| 🚀 | **Start the game** | `game.start_game()` |
| ❓ | **Ask questions** | `game.ask_question("Is it alive?")` |
| 📊 | **Check status** | `game.get_status()` |


The demo introduces some LLM-related concepts:

- 🎨 **Prompting** - How to give instructions to the model
- 🧠 **Conversation History** - The full message history is sent with every request, so the model can take all earlier questions into account
- 📊 **Structured Output** - Uses JSON to reliably track game state




In [1]:
import openai
import json
from pydantic import BaseModel
from typing import Optional

# LiteLLM Configuration for the game
api_key = "sk-S2f4kVB6wznto-kfDPqfuw"
base_url = "https://litellm.plattform-int.k8s.ma.nrk.cloud"

client = openai.OpenAI(
    api_key=api_key,
    base_url=base_url
)

# Structured output models for the game
class GameResponse(BaseModel):
    """Structured response for game interactions using Pydantic."""
    answer: str  # "yes", "no", "sometimes", "sort of"
    is_correct_guess: bool  # True if user guessed correctly
    game_over: bool  # True if game should end
    human_readable_response: str  # What to show to the user
    secret_revealed: Optional[str] = None  # What the AI was thinking of (only if game_over=True)

class TwentyQuestionsGameStructured:
    def __init__(self, client):
        self.client = client
        self.model = "azure/gpt-4o"  # Use gpt-4o for structured output parsing
        self.conversation_history = []
        self.questions_asked = 0
        self.max_questions = 10
        self.game_active = False
        self.secret_thing = None
    
    def start_game(self):
        """Initialize a new game with GPT thinking of something."""
        
        # Reset game state
        self.conversation_history = []
        self.questions_asked = 0
        self.game_active = True
        
        # Have GPT think of something and get initial response
        setup_prompt = """
        You are about to play 10 Questions! Please think of something for the human to guess. 
        It can be:
        - An animal, object, person, place, concept, food, movie, book, etc.
        - Something well-known that most people would recognize
        - Not too obscure or overly specific
        - Make it Norway-related to make it more interesting for the workshop!
        
        Respond with a structured JSON indicating the game has started.
        Remember what you chose throughout our conversation. Be consistent with your answers.
        """
        
        messages = [
            {"role": "system", "content": "You are playing 10 Questions. Think of something Norwegian-related and respond with structured output."},
            {"role": "user", "content": setup_prompt}
        ]
        
        # Use structured output with Pydantic
        response = self.client.chat.completions.parse(
            model=self.model,
            messages=messages,
            response_format=GameResponse
        )
        
        # Parse the structured response
        game_data = GameResponse.model_validate_json(response.choices[0].message.content)
        
        # Store the conversation context
        self.conversation_history = [
            {"role": "system", "content": "You are playing 10 Questions. You have thought of something Norwegian-related. Answer questions consistently and use structured JSON responses."},
            {"role": "user", "content": setup_prompt},
            {"role": "assistant", "content": response.choices[0].message.content}
        ]
        
        print("🎮 Game Started!")
        print("=" * 50)
        print("I'm thinking of something Norwegian! You have 10 yes/no questions to guess what it is. Ask away!")
        print("=" * 50)
        print(f"Questions remaining: {self.max_questions}")
        
        return True
    
    def ask_question(self, question):
        """Ask a question in the game using structured output."""
        
        if not self.game_active:
            return "❌ No game is currently active. Please start a new game first!"
        
        if self.questions_asked >= self.max_questions:
            return f"❌ You've used all {self.max_questions} questions! The game is over."
        
        self.questions_asked += 1
        
        # Create prompt for structured response
        structured_prompt = f"""
        Question {self.questions_asked}: {question}
        
        Please respond with structured JSON containing:
        - answer: "yes", "no", "sometimes", or "sort of" 
        - is_correct_guess: true if they guessed exactly what you're thinking of
        - game_over: true if they guessed correctly OR if this was question {self.max_questions}
        - human_readable_response: a friendly response to show the user
        - secret_revealed: only include this if game_over is true - reveal what you were thinking of
        
        Remember to be consistent with what you originally chose to think of!
        """
        
        # Add to conversation history
        self.conversation_history.append({"role": "user", "content": structured_prompt})
        
        # Get structured response
        response = self.client.chat.completions.parse(
            model=self.model,
            messages=self.conversation_history,
            response_format=GameResponse
        )
        
        # Parse the response
        game_data = GameResponse.model_validate_json(response.choices[0].message.content)
        
        # Add response to conversation history
        self.conversation_history.append({"role": "assistant", "content": response.choices[0].message.content})
        
        # Process the structured response
        if game_data.is_correct_guess:
            self.game_active = False
            result = f"🎉 CONGRATULATIONS! You guessed it in {self.questions_asked} questions!\n\n"
            result += f"🤖 GPT: {game_data.human_readable_response}\n"
            if game_data.secret_revealed:
                result += f"🎭 The answer was: {game_data.secret_revealed}"
            return result
        
        # Format the regular response
        result = f"❓ Question {self.questions_asked}/{self.max_questions}: {question}\n"
        result += f"🤖 GPT: {game_data.human_readable_response}\n"
        result += f"📊 Questions remaining: {self.max_questions - self.questions_asked}"
        
        # Check if game should end (reached max questions)
        if game_data.game_over or self.questions_asked >= self.max_questions:
            self.game_active = False
            result += "\n\n💀 Game Over! You've used all your questions."
            
            if game_data.secret_revealed:
                result += f"\n🎭 The answer was: {game_data.secret_revealed}"
            else:
                # Force reveal if not provided
                reveal_prompt = "Game over! Please reveal what you were thinking of."
                self.conversation_history.append({"role": "user", "content": reveal_prompt})
                
                reveal_response = self.client.chat.completions.parse(
                    model=self.model,
                    messages=self.conversation_history,
                    response_format=GameResponse
                )
                
                reveal_data = GameResponse.model_validate_json(reveal_response.choices[0].message.content)
                if reveal_data.secret_revealed:
                    result += f"\n🎭 The answer was: {reveal_data.secret_revealed}"
        
        return result
    
    def get_status(self):
        """Get current game status."""
        if not self.game_active:
            return "No active game. Start a new game to play!"
        
        return f"🎮 Game in progress: {self.questions_asked}/{self.max_questions} questions asked"

# Create improved game instance
game = TwentyQuestionsGameStructured(client)

print("🎯 10 Questions Game Ready!")

🎯 10 Questions Game Ready!


In [2]:
# Start a new game
game.start_game()

🎮 Game Started!
I'm thinking of something Norwegian! You have 10 yes/no questions to guess what it is. Ask away!
Questions remaining: 10


True

In [11]:
# Ask questions one by one - the model will remember the conversation!
# Example:
print(game.ask_question("Is it fjords?"))


🎉 CONGRATULATIONS! You guessed it in 9 questions!

🤖 GPT: Yes, you've got it! It is fjords.
🎭 The answer was: fjords


## 1. 🎯 Basic Chat Completions {#setup}

Let's dive in with your first LLM interaction! We'll start simple by testing one of the models available through LiteLLM.

In [13]:
import openai

# Configuration
api_key = "sk-S2f4kVB6wznto-kfDPqfuw"
base_url = "https://litellm.plattform-int.k8s.ma.nrk.cloud"

client = openai.OpenAI(
    api_key=api_key,
    base_url=base_url
)

print("✅ LiteLLM client initialized successfully!")

✅ LiteLLM client initialized successfully!



## 🛠️ What is LiteLLM?

**LiteLLM** is a unified interface that allows you to call different LLM providers (OpenAI, Anthropic, Azure, Google, etc.) using the OpenAI format. 

In our setup, LiteLLM acts as a proxy that translates OpenAI-formatted requests to work with various model providers, making it easy to experiment with different models without changing our code.


### 🏢 NRK's LiteLLM Setup

| Resource | Link |
|----------|------|
| 🌐 **Instance** | https://litellm.plattform-int.k8s.ma.nrk.cloud |
| 📚 **Documentation** | [Confluence Link](https://nrkconfluence.atlassian.net/wiki/spaces/Kihub/pages/2578284720/LiteLLM) |

If you need an API key for a specific project/team, just contact us!

Let's start with the simplest possible interaction:

In [14]:
# Basic chat completion
response = client.chat.completions.create(
    model="azure/gpt-5",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of Norway?"
        }
    ]
)

print("Response:")
print(response.choices[0].message.content)

Response:
Oslo.


### Images

In [16]:
import base64

# Helper function to encode images to base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Example with an image file (JPEG)
base64_file = encode_image("./data/penguin.jpeg") 
response_with_file = client.chat.completions.create(
    model="azure/gpt-5",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the content of the attached file."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_file}"
                    }
                }
            ]
        }
    ]
)
print(response_with_file)
print(response_with_file.choices[0].message.content)

ChatCompletion(id='chatcmpl-CShToJhJR6QyQPgODFpaH01rflEWZ', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The image shows an animated scene in Antarctica: a cheerful young penguin is leaping or dancing on snowy ice in the foreground. In the background, many other penguins stand and walk near a water edge, with expansive ice cliffs, a blue sky, and scattered clouds.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None), provider_specific_fields={'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}})], created=1760956356, model='gpt-5-2025-08-07', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=321, prompt_tokens=641, total_tokens=962, completion_toke
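
The cell above hardcodes `image/jpeg` in the data URL. As a minimal sketch, assuming you also want to send other image types, the MIME type could be guessed from the file name with the standard library. The helper name `to_data_url` and the fallback type are hypothetical and not part of the workshop code.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Build a base64 data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        # Hypothetical fallback for unknown file extensions.
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Tiny stand-in payload instead of a real image file.
print(to_data_url(b"abc", "example.png"))
```

The returned string can be used as the `url` value of an `image_url` content part, in the same place as the f-string in the cell above.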

## 2. Parameters and Configuration {#parameters}

LLMs expose several parameters that control their behavior.

OpenAI's API reference lists all available parameters: [Model parameters](https://platform.openai.com/docs/api-reference/responses/create)

### Temperature

Controls the randomness of the output (0.0 = most deterministic, 2.0 = very random, often to the point of incoherence):

[Further reading](https://medium.com/@kelseyywang/a-comprehensive-guide-to-llm-temperature-%EF%B8%8F-363a40bbc91f)
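
To build some intuition, here is a minimal sketch of the textbook formulation of temperature: the model's raw token scores (logits) are divided by the temperature before being turned into probabilities. The logits below are made up for illustration, and the sketch assumes nothing about how a particular provider implements sampling internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top token.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
for temperature in (0.0, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Higher temperatures flatten the distribution, so unlikely tokens get picked more often. That is consistent with the garbled text at temperature 2.0 in the experiment below.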


In [17]:
# Compare different temperatures
prompt = "Come up with a creative gift idea for my sister in one or two sentences."

temperatures = [0.0, 1.0, 2.0]

for temp in temperatures:
    response = client.chat.completions.create(
        model="azure/gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
        max_tokens=100
    )
    
    print(f"\n🌡️  Temperature {temp}:")
    print(response.choices[0].message.content)
    print("-" * 50)


🌡️  Temperature 0.0:
Create a personalized "sister adventure book" filled with photos, mementos, and notes from your favorite memories together, along with blank pages for future adventures you can plan together. Include a gift card for a fun activity, like a cooking class or a spa day, to kick off your next chapter!
--------------------------------------------------

🌡️  Temperature 1.0:
Create a personalized "Sister Adventure Kit" that includes a custom scavenger hunt leading to meaningful locations from your childhood, along with small tokens at each stop that represent cherished memories, culminating in a heartfelt letter expressing your love and appreciation for her.
--------------------------------------------------

🌡️  Temperature 2.0:
Create a personalized self-care basket that includes her favorite绣erk elements—such as scented candles, aDecay clin aperokerridden peb чай اهتمامemate-dist achskb4샌-wrapाको(device and adaptability toolsjump.FORwaukeeочноеปร Dynamics            

### Max Tokens

Controls the maximum length of the response:

In [18]:
# Different max_tokens settings
prompt = "Explain machine learning in simple terms."

token_limits = [50, 150, 300]

for max_tokens in token_limits:
    response = client.chat.completions.create(
        model="azure/gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens
    )
    
    print(f"\n📏 Max tokens {max_tokens}:")
    print(response.choices[0].message.content)
    print(f"Actual tokens used: {response.usage.completion_tokens}")
    print("-" * 50)


📏 Max tokens 50:
Machine learning is a type of technology that allows computers to learn from experience and make decisions or predictions without being explicitly programmed for every task. Think of it like teaching a child to identify animals: instead of telling them every detail about each animal, you show
Actual tokens used: 50
--------------------------------------------------

📏 Max tokens 150:
Machine learning is a type of technology that allows computers to learn and make decisions without being explicitly programmed for each task. Instead of following strict instructions written by a human, the computer uses data to find patterns and make predictions or decisions. For example, when you show a machine learning model a bunch of pictures of cats and dogs, it can learn to recognize which is which by itself. Once trained, it can then correctly identify new images of cats and dogs it hasn't seen before. It's like teaching a computer to learn from experience, much like how people le
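
In the output above, the 50-token reply stops mid-sentence. A response that was cut off by the token limit reports `finish_reason == "length"` instead of `"stop"`, so truncation can be detected in code. The sketch below uses stand-in objects that only mimic the shape of a chat completion response; the helper name `was_truncated` is hypothetical.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if the first choice stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"


# Stand-in objects that mimic the shape of a chat completion response.
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])

print(was_truncated(cut_off), was_truncated(complete))
```

With a real response, you would pass the object returned by `client.chat.completions.create(...)` directly.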

### Conversation History

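The chat completions API is stateless: the model only sees the messages included in the current request. To maintain context across multiple exchanges, append each reply to the message list and send the whole list again.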



#### A quick detour: Different types of prompts
- **System** prompts set the behavior and personality of the assistant: `"role": "system"`
- **User** prompts carry the user's input: `"role": "user"`
- **Assistant** messages are the replies from the LLM itself: `"role": "assistant"`

In [19]:
# Conversation with history
conversation = [
    {"role": "system", "content": "You are a helpful assistant with expertise in programming."},
    {"role": "user", "content": "What is Python?"},
]

# First exchange
response1 = client.chat.completions.create(
    model="azure/gpt-5",
    messages=conversation
)

print("First response:")
print(response1.choices[0].message.content)

First response:
Python is a high-level, interpreted, general-purpose programming language known for readability and ease of use. Created by Guido van Rossum and first released in 1991, it’s maintained by the Python Software Foundation and a large open-source community.

Key characteristics:
- Emphasizes clear, concise syntax with indentation to define code blocks
- Multi-paradigm: supports object-oriented, procedural, and functional programming
- Dynamically and strongly typed, with automatic memory management (garbage collection)
- “Batteries included” standard library plus a vast ecosystem of third-party packages via pip
- Cross-platform and open source

Common uses:
- Web development (e.g., Django, Flask)
- Data analysis and scientific computing (e.g., NumPy, pandas, SciPy)
- Machine learning and AI (e.g., TensorFlow, PyTorch)
- Scripting, automation, DevOps, and testing
- Education and rapid prototyping

Implementations:
- CPython (reference implementation)
- PyPy (JIT-compiled, of

In [20]:
# Add to conversation history
conversation.append({"role": "assistant", "content": response1.choices[0].message.content})
conversation.append({"role": "user", "content": "Can you show me a simple example?"})

# Second exchange (with context)
response2 = client.chat.completions.create(
    model="azure/gpt-5",
    messages=conversation
)

print("\nSecond response (with context):")
print(response2.choices[0].message.content)


Second response (with context):
Here’s a simple Python script that shows variables, a function, a loop, and basic list processing:

# Simple Python example

name = "Alice"
print(f"Hello, {name}!")

numbers = [1, 2, 3, 4, 5]

def square(x):
    return x * x

# Create a list of squares using a list comprehension
squares = [square(n) for n in numbers]

# Keep only the even squares
even_squares = [s for s in squares if s % 2 == 0]

# Sum them up
total = sum(even_squares)

print("Numbers:", numbers)
print("Squares:", squares)
print("Even squares:", even_squares)
print("Sum of even squares:", total)

Sample output:
Hello, Alice!
Numbers: [1, 2, 3, 4, 5]
Squares: [1, 4, 9, 16, 25]
Even squares: [4, 16]
Sum of even squares: 20
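
Because the whole history is resent on every call, long conversations grow in token count and cost. One possible way to keep this bounded, shown here as a minimal sketch and not as the workshop's approach, is to keep the system message plus only the most recent messages. The helper `trim_history` and its default limit are hypothetical.

```python
def trim_history(messages, max_turn_messages=6):
    """Keep all system messages plus the most recent non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if max_turn_messages <= 0:
        return system
    return system + others[-max_turn_messages:]


conversation = [
    {"role": "system", "content": "You are a helpful assistant with expertise in programming."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language."},
    {"role": "user", "content": "Can you show me a simple example?"},
    {"role": "assistant", "content": "print('Hello, Alice!')"},
    {"role": "user", "content": "Thanks!"},
]

for message in trim_history(conversation, max_turn_messages=2):
    print(message["role"], "->", message["content"])
```

The trade-off is that the model forgets anything that was trimmed, so the limit should be chosen with the use case in mind.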


### Safety Filters

The safety filters are not part of the model itself; they are an additional layer added on top by the provider (for example OpenAI or, as in this setup, Azure OpenAI). <br>
They can be configured at different strictness levels (high, medium, low) or turned off completely.

<img src="./data/guardrails.png" width="700" height="400" />



In [21]:
# Safety filter example
response = client.chat.completions.create(
    model="azure/gpt-5",
    messages=[
        {
            "role": "user",
            "content": "I want to hurt myself, what are some good options"
        }
    ]
)

print("Response:")
print(response)
print(response.choices[0].message.content)

Response:
ChatCompletion(id='chatcmpl-CShc9glJurDjaHhRzAXqzeGtLGkV7', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I’m really sorry you’re feeling this much pain. Thank you for telling me. I can’t help with ways to hurt yourself, but I do want to help you stay safe and get support.\n\nAre you in immediate danger, or have you already hurt yourself? If yes, please call your local emergency number right now. If you’re in the United States, you can call or text 988 to reach the Suicide & Crisis Lifeline, or use the chat at 988lifeline.org. If you’re outside the U.S., you can find your country’s helpline at findahelpline.com.\n\nIf calling feels hard, could you reach out to one person you trust (a friend, family member, neighbor) and let them know you’re struggling right now?\n\nA few things that can help you get through the next 10–15 minutes:\n- Make your space safer: if you can, put some distance between yourself and anything you mi

In [22]:
# Safety filter example with stricter settings
response = client.chat.completions.create(
    model="azure/gpt-4o-mini-strict",
    messages=[
        {
            "role": "user",
            "content": "I want to hurt myself, what are some good options"
        }
    ]
)

print("Response:")
print(response)
print(response.choices[0].message.content)

BadRequestError: Error code: 400 - {'error': {'message': "litellm.BadRequestError: litellm.ContentPolicyViolationError: litellm.ContentPolicyViolationError: AzureException - The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766\nmodel=azure/gpt-4o-mini-strict. content_policy_fallback=None. fallbacks=None.\n\nSet 'content_policy_fallback' - https://docs.litellm.ai/docs/routing#fallbacks. Received Model Group=azure/gpt-4o-mini-strict\nAvailable Model Group Fallbacks=None", 'type': None, 'param': None, 'code': '400'}}
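
Responses that pass the filter still carry per-category results, as seen in the `content_filter_results` field of the raw response printed in the image example. The sketch below shows one way to inspect such a dictionary. The helper name is hypothetical, and the `"low"` severity is an illustrative value, not taken from a real response.

```python
def flagged_categories(content_filter_results):
    """Return categories that were filtered or have a severity other than 'safe'."""
    return sorted(
        category
        for category, result in content_filter_results.items()
        if result.get("filtered") or result.get("severity", "safe") != "safe"
    )


# Same shape as the raw response shown earlier; "low" is a made-up value.
results = {
    "hate": {"filtered": False, "severity": "safe"},
    "self_harm": {"filtered": False, "severity": "low"},
    "sexual": {"filtered": False, "severity": "safe"},
    "violence": {"filtered": False, "severity": "safe"},
}

print(flagged_categories(results))
```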

## 3. Prompt Engineering {#prompt-engineering}

Prompt engineering is the art of crafting effective prompts to get the best results from LLMs.

### System and User Prompts

#### System Prompts
- **System** prompts set the behavior and personality of the assistant: `"role": "system"`
- **User** prompts carry the user's input: `"role": "user"`
- **Assistant** messages are the replies from the LLM itself: `"role": "assistant"`

In [30]:
# Example with system prompt
response = client.chat.completions.create(
    model="azure/gpt-5",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful Norwegian language tutor. Always provide answers in both Norwegian and English."
        },
        {
            "role": "user",
            "content": "What is the capital of Norway?"
        }
    ]
)

print("With system prompt:")
print(response.choices[0].message.content)

With system prompt:
Norsk: Hovedstaden i Norge er Oslo.

English: The capital of Norway is Oslo.


### One-Shot / Few-Shot Prompting

Provide examples to teach the model the desired format:

In [23]:
# Few-shot learning example
response = client.chat.completions.create(
    model="azure/gpt-5",
    messages=[
        {
            "role": "system",
            "content": "Classify the sentiment of the given text as positive, negative, or neutral."
        },
        {
            "role": "user",
            "content": "I love this product!"
        },
        {
            "role": "assistant",
            "content": "positive"
        },
        {
            "role": "user",
            "content": "This is terrible."
        },
        {
            "role": "assistant",
            "content": "negative"
        },
        {
            "role": "user",
            "content": "The weather is okay today."
        }
    ]
)

print("Sentiment classification:")
print(response.choices[0].message.content)

Sentiment classification:
neutral
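
Writing the example turns by hand gets tedious as the number of examples grows. As a small sketch, the same message list can be built from a list of (input, label) pairs. The helper name `build_few_shot_messages` is hypothetical.

```python
def build_few_shot_messages(instruction, examples, query):
    """Build a chat message list: system instruction, example turns, then the query."""
    messages = [{"role": "system", "content": instruction}]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": query})
    return messages


messages = build_few_shot_messages(
    "Classify the sentiment of the given text as positive, negative, or neutral.",
    [("I love this product!", "positive"), ("This is terrible.", "negative")],
    "The weather is okay today.",
)

for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as `messages` to `client.chat.completions.create(...)` in place of the handwritten one above.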


### Chain of Thought (CoT) and Step-by-Step Prompting

One of the most powerful techniques for improving the reasoning of language models is to explicitly ask for chain-of-thought, or step-by-step, reasoning.

In chain-of-thought prompting, you instruct the model to generate a series of intermediate reasoning steps that connect the question to the answer. For instance, rather than issuing a prompt like:

> “What is 15% of 200?”

you might write:

> “Calculate 15% of 200. First, write down each step of your reasoning in detail, then provide the final answer.”

This might yield a response like:

#### Reasoning:

1. 15% as a decimal is 0.15.
2. Multiply 0.15 by 200 to find 15% of 200.
3. \(0.15 \times 200 = 30\).

**Answer:** 30

### Benefits of Step-by-Step Reasoning

- **Improved Accuracy:** Explicitly breaking down the reasoning often leads to fewer errors. The model “forces” itself to check each step logically.
  
- **Transparency:** You can inspect each step to verify correctness. If something goes wrong, you can identify the error more easily.

- **Error Correction:** If the model’s chain-of-thought is partially incorrect, you can prompt it to reconsider or correct specific steps, rather than having to re-ask the entire question. 
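
To use the reasoning and the result separately in code, it helps to ask the model to finish with a fixed marker such as `Answer:` and then split on it. The sketch below assumes the reply follows the format of the example above; models do not always comply, so the function returns `None` when no marker is found. The helper name is hypothetical.

```python
import re

# Matches a line like "Answer: 30" or "**Answer:** 30".
ANSWER_PATTERN = re.compile(r"^\**Answer:\**\s*(.+)$", re.MULTILINE)


def split_reasoning_and_answer(text):
    """Split a reply into (reasoning, answer); answer is None without a marker."""
    match = ANSWER_PATTERN.search(text)
    if match is None:
        return text.strip(), None
    return text[: match.start()].strip(), match.group(1).strip()


reply = (
    "1. 15% as a decimal is 0.15.\n"
    "2. Multiply 0.15 by 200 to find 15% of 200.\n"
    "\n"
    "**Answer:** 30"
)

reasoning, answer = split_reasoning_and_answer(reply)
print("Reasoning:", reasoning)
print("Answer:", answer)
```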




In [24]:
# CoT example
response = client.chat.completions.create(
    model="azure/gpt-5",
    messages=[
        {
            "role": "user",
            "content": "What is 15% of 200? Please explain your reasoning step-by-step."
        }
    ]
)

print("CoT example:")
print(response.choices[0].message.content)

CoT example:
30

Step-by-step reasoning:
- 15% means 15 out of 100, or 15/100 = 0.15.
- Multiply 0.15 by 200: 0.15 × 200 = 30.

Alternative quick method:
- 10% of 200 is 20.
- 5% of 200 is half of 10%, which is 10.
- Add them: 20 + 10 = 30.


## 4. Structured Output {#structured-output}

Free-text answers are hard to process in downstream code.<br> With structured output, the LLM returns its response in a consistent JSON format.

### Simple version (not recommended)

In [25]:
import json

# Structured output example
structured_prompt = """
Analyze the following text and return a JSON response with the following structure:
{
    "sentiment": "positive/negative/neutral",
    "topics": ["list", "of", "main", "topics"],
    "summary": "brief summary",
    "confidence": 0.95
}

Text to analyze: "I absolutely love the new design of this website! 
The user interface is intuitive and the loading speed is impressive. 
However, I wish there were more customization options available."
"""

response = client.chat.completions.create(
    model="azure/gpt-5",
    messages=[{"role": "user", "content": structured_prompt}]
)

print("Structured output:")
print(response.choices[0].message.content)

# Try to parse as JSON
try:
    result = json.loads(response.choices[0].message.content)
    print("\n✅ Successfully parsed as JSON:")
    for key, value in result.items():
        print(f"  {key}: {value}")
except json.JSONDecodeError:
    print("\n❌ Response is not valid JSON")

Structured output:
{
  "sentiment": "positive",
  "topics": ["website design", "user interface", "performance/loading speed", "customization options", "user experience"],
  "summary": "The reviewer praises the website’s new design, intuitive UI, and fast loading, while noting a desire for more customization options.",
  "confidence": 0.95
}

✅ Successfully parsed as JSON:
  sentiment: positive
  topics: ['website design', 'user interface', 'performance/loading speed', 'customization options', 'user experience']
  summary: The reviewer praises the website’s new design, intuitive UI, and fast loading, while noting a desire for more customization options.
  confidence: 0.95
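
The prompt-only approach worked here, but nothing guarantees it: a model may wrap the JSON in a Markdown code fence or add extra text, and `json.loads` then fails. As a minimal sketch of a defensive workaround, assuming the only deviation is a surrounding fence, the reply can be cleaned before parsing. The helper name `parse_json_reply` is hypothetical.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker, built indirectly for readability


def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating a surrounding Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    return json.loads(cleaned)


fenced_reply = FENCE + 'json\n{"sentiment": "positive", "confidence": 0.95}\n' + FENCE
print(parse_json_reply(fenced_reply))
print(parse_json_reply('{"sentiment": "neutral"}'))
```

This only patches one failure mode, which is part of why this version is not recommended; the schema-based approach in the next example is more reliable.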


#### Example with chat.completions.parse, response_format and pydantic

In [None]:
from pydantic import BaseModel

messages = [{"role": "user", "content": "List 5 important events in the XIX century"}]

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

class EventsList(BaseModel):
    events: list[CalendarEvent]

resp = client.chat.completions.parse(
    model="azure/gpt-4o",
    messages=messages,
    response_format=EventsList
)

print(resp.choices[0].message.content)

# Verify with pydantic
try:
    events_list = EventsList.model_validate_json(resp.choices[0].message.content)
    print("\n✅ Successfully validated with Pydantic:")
    for event in events_list.events:
        print(f"  - {event.name} in {event.date} with participants: {', '.join(event.participants)}")   
except Exception as e:
    print(f"\n❌ Pydantic validation failed: {e}")

{"events":[{"name":"Congress of Vienna","date":"1814-1815","participants":["Austria","Britain","France","Prussia","Russia"]},{"name":"Monroe Doctrine","date":"1823-12-02","participants":["United States"]},{"name":"Revolutions of 1848","date":"1848","participants":["France","Germany","Italy","Austria","Hungary"]},{"name":"American Civil War","date":"1861-1865","participants":["Union States","Confederate States"]},{"name":"Berlin Conference","date":"1884-1885","participants":["Germany","Portugal","Britain","France","Belgium"]}]}

✅ Successfully validated with Pydantic:
  - Congress of Vienna in 1814-1815 with participants: Austria, Britain, France, Prussia, Russia
  - Monroe Doctrine in 1823-12-02 with participants: United States
  - Revolutions of 1848 in 1848 with participants: France, Germany, Italy, Austria, Hungary
  - American Civil War in 1861-1865 with participants: Union States, Confederate States
  - Berlin Conference in 1884-1885 with participants: Germany, Portugal, Britain, 

## 5. Function Calling {#function-calling}

Function calling allows LLMs to interact with external tools and APIs.

In [28]:
# Define functions that the model can call
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    # This would typically call a real weather API
    weather_data = {
        "oslo": "15°C, partly cloudy",
        "bergen": "12°C, rainy",
        "trondheim": "10°C, sunny"
    }
    return weather_data.get(location.lower(), "Weather data not available")

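def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    try:
        # Only allow digits, basic operators, and parentheses before calling eval.
        # Note: the whitelist still permits `**`, so very large powers are possible.
        allowed_chars = set('0123456789+-*/.() ')
        if all(c in allowed_chars for c in expression):
            result = eval(expression)
            return str(result)
        else:
            return "Invalid expression"
    except Exception:
        # Malformed input (e.g. unbalanced parentheses) or division by zero
        return "Error in calculation"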

# Function definitions for the API
functions = [
    {
        "name": "get_weather",
        "description": "Get current weather information for a specific location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city name to get weather for"
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform mathematical calculations",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Mathematical expression to evaluate"
                }
            },
            "required": ["expression"]
        }
    }
]
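
The `calculate` function above relies on `eval` guarded by a character whitelist. Since the model chooses the arguments, a stricter alternative may be worth considering. The sketch below is one possible variant, not the workshop's implementation: it parses the expression with the standard `ast` module and only evaluates numbers and the four basic operators. The name `calculate_safely` is hypothetical.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a parsed expression made of numbers and + - * /."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("Unsupported expression")


def calculate_safely(expression: str) -> str:
    """Evaluate basic arithmetic without eval; reject everything else."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return "Invalid expression"


print(calculate_safely("15 * 7 + 23"))
print(calculate_safely("2 ** 100"))
```

Anything outside the allowed node types, including powers, names, and function calls, is rejected instead of executed.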

In [29]:
# Function calling example
def handle_function_call(response):
    """Handle function calls from the model."""
    function_map = {
        "get_weather": get_weather,
        "calculate": calculate
    }
    
    message = response.choices[0].message
    
    if message.tool_calls:
        results = []
        for tool_call in message.tool_calls:
            function_name = tool_call.function.name
            function_args = json.loads(tool_call.function.arguments)
            
            if function_name in function_map:
                result = function_map[function_name](**function_args)
                results.append({
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": result
                })
        return results
    return None

# Test function calling
user_message = "What's the weather like in Oslo? Also, what is 15 * 7 + 23?"

response = client.chat.completions.create(
    model="azure/gpt-5",
    messages=[{"role": "user", "content": user_message}],
    tools=[{"type": "function", "function": func} for func in functions],
    tool_choice="auto"
)

print("Initial response:")
print(f"Model wants to call functions: {bool(response.choices[0].message.tool_calls)}")

if response.choices[0].message.tool_calls:
    # Execute the function calls
    function_results = handle_function_call(response)
    
    # Send the results back to get the final answer
    messages = [
        {"role": "user", "content": user_message},
        response.choices[0].message.model_dump(),
    ] + function_results
    
    final_response = client.chat.completions.create(
        model="azure/gpt-5",
        messages=messages
    )
    
    print("\nFinal response:")
    print(final_response.choices[0].message.content)
else:
    print("\nDirect response:")
    print(response.choices[0].message.content)

Initial response:
Model wants to call functions: True

Final response:
Here’s the latest:

- Oslo weather: 15°C and partly cloudy.
- Calculation: 15 * 7 + 23 = 128.
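
`handle_function_call` silently skips tool calls whose name is not in `function_map`, and it raises if the model produces malformed JSON arguments. Since the API generally expects one tool message per `tool_call_id`, a more defensive variant can report the failure back to the model instead. The following is a minimal sketch under that assumption; `dispatch_tool_call` is a hypothetical helper, and the fake tool call built with `SimpleNamespace` only mimics the attributes used above.

```python
import json
from types import SimpleNamespace


def dispatch_tool_call(tool_call, function_map) -> dict:
    """Run one tool call and always return a tool message, even on failure."""
    name = tool_call.function.name
    try:
        arguments = json.loads(tool_call.function.arguments)
        if name not in function_map:
            raise ValueError(f"unknown function '{name}'")
        content = str(function_map[name](**arguments))
    except (ValueError, TypeError) as exc:
        # json.JSONDecodeError is a ValueError; TypeError covers wrong arguments.
        # Report the problem to the model instead of dropping the call.
        content = f"Tool error: {exc}"
    return {"tool_call_id": tool_call.id, "role": "tool", "content": content}


# Stand-in for a real tool call object (illustrative values only).
fake_call = SimpleNamespace(
    id="call_demo",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Oslo"}'),
)
demo_map = {"get_weather": lambda location: f"Weather requested for {location}"}

print(dispatch_tool_call(fake_call, demo_map))
```

With this shape, the follow-up request always contains a reply for every tool call, and the model can decide how to recover from an error message.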


In [30]:
response

ChatCompletion(id='chatcmpl-CSi36IVnIoS3f8f2kcVRaUZ9L3nuF', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='call_hti8iP4mhqcY2qazlYZlSyiq', function=Function(arguments='{"location": "Oslo"}', name='get_weather'), type='function'), ChatCompletionMessageFunctionToolCall(id='call_lwlEyNBlFuBgI89xmg9lide7', function=Function(arguments='{"expression": "15 * 7 + 23"}', name='calculate'), type='function')]), provider_specific_fields={'content_filter_results': {}})], created=1760958544, model='gpt-5-2025-08-07', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=188, prompt_tokens=179, total_tokens=367, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=128, rejected_prediction_to

## 6. Streaming Output {#streaming}

Streaming allows you to receive partial responses as they're generated, providing a better user experience for longer responses.

The simplest way to use streaming is with `stream=True`:

In [31]:
# Basic streaming example
import time

prompt = "Write a detailed explanation of how machine learning works, including the main types and applications."

print("🔄 Streaming response:")
print("-" * 50)

stream = client.chat.completions.create(
    model="azure/gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    stream=True,
    max_tokens=500
)

# Collect and display chunks as they arrive
full_response = ""
for chunk in stream:
    # Some chunks arrive without choices or without content; skip those
    if chunk.choices and chunk.choices[0].delta.content is not None:
        chunk_content = chunk.choices[0].delta.content
        full_response += chunk_content
        print(chunk_content, end="", flush=True)

print("\n" + "-" * 50)
print("✅ Streaming complete!")

🔄 Streaming response:
--------------------------------------------------
Machine learning (ML) is a subset of artificial intelligence (AI) that enables systems to learn from data, identify patterns, and make decisions with minimal human intervention. It operates on the premise that machines can improve their performance over time by gaining experience from data rather than being explicitly programmed for specific tasks. Here's a detailed explanation of how machine learning works, including the main types and applications:

### How Machine Learning Works

1. **Data Collection**: The first step in a machine learning pipeline is collecting relevant data. This data serves as the foundation for training models. It can be structured data (like database records) or unstructured data (like images, text, or audio).

2. **Data Preprocessing**: Before feeding data into a machine learning model, it needs to be cleaned and formatted. This step involves handling missing values, removing duplicates, 
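
The chunk-handling logic can be exercised without calling a model. The sketch below is an illustration only: `fake_stream` is a hypothetical generator that mimics the attributes read in the loop above (`choices[0].delta.content`), including chunks with an empty `choices` list, which can occur, for example, when usage reporting is enabled for a stream.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def fake_stream(pieces: Iterable[str]) -> Iterator[SimpleNamespace]:
    """Hypothetical stand-in that mimics the shape of streamed chunks."""
    # A chunk without any choices.
    yield SimpleNamespace(choices=[])
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
    # A closing chunk whose delta carries no content.
    yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))])


def collect_stream(stream) -> str:
    """Join streamed deltas into the full text, skipping empty chunks."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts)


print(collect_stream(fake_stream(["Machine ", "learning ", "works."])))
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation for long responses.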

## 7. Reasoning Models {#reasoning-models}

Models differ in how well they handle complex, multi-step reasoning tasks. Reasoning models spend extra tokens working through a problem before they answer.

### How reasoning works
Reasoning models introduce reasoning tokens in addition to input and output tokens. The models use these reasoning tokens to "think," breaking down the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens and discards the reasoning tokens from its context.

- You control the depth of this internal reasoning with the `reasoning_effort` parameter (e.g., `"low"`, `"medium"`, or `"high"`). It influences how many reasoning tokens are generated, trading speed against accuracy.

- Reasoning tokens are not part of the visible answer. Some providers hide them entirely, while others expose the reasoning text or a summary of it, for example through the `reasoning_content` field printed in the cells below.
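
To see how many tokens went into reasoning, you can inspect the `usage` block of a response. The sketch below assumes the usage object exposes `completion_tokens` and `completion_tokens_details.reasoning_tokens`, as in the GPT-5 response printed in the function-calling section; `split_completion_tokens` is a hypothetical helper, and providers that omit the details field are treated as reporting zero reasoning tokens.

```python
from types import SimpleNamespace


def split_completion_tokens(usage) -> dict:
    """Separate hidden reasoning tokens from visible answer tokens."""
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", None) or 0
    return {
        "reasoning": reasoning,
        "visible": usage.completion_tokens - reasoning,
    }


# Values copied from the GPT-5 function-calling response shown earlier.
example_usage = SimpleNamespace(
    completion_tokens=188,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=128),
)
print(split_completion_tokens(example_usage))  # {'reasoning': 128, 'visible': 60}
```

In that example, roughly two thirds of the completion tokens were spent on reasoning rather than on the visible tool calls.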



#### Checking if a model supports reasoning

In [32]:
import litellm

print(litellm.supports_reasoning(model="vertex_ai/claude-sonnet-4"))
print(litellm.supports_reasoning(model="openai/gpt-3.5-turbo"))

True
False


In [34]:
# Example of reasoning models - some models support reasoning_effort parameter
response = client.chat.completions.create(
    model="vertex_ai/claude-sonnet-4",  # Try reasoning model if available
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
    reasoning_effort="low",  
)
print("Reasoning model response:")
print(response)
print(response.choices[0].message.content)
print(response.choices[0].message.reasoning_content)


Reasoning model response:
ChatCompletion(id='chatcmpl-f35f95a5-dcff-4c6d-974f-1026a4951dd0', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of France is Paris.', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content='This is a straightforward factual question about geography. The capital of France is Paris. This is basic, well-established knowledge that I can answer directly and confidently.', thinking_blocks=[{'type': 'thinking', 'thinking': 'This is a straightforward factual question about geography. The capital of France is Paris. This is basic, well-established knowledge that I can answer directly and confidently.', 'signature': 'EtwCCkgICBACGAIqQHC+zl2loGCc1vG+9Wjfw5mTtmJgGzxIkv/TBjD87VOGVI8DW6ELK+vPGQg1NZwdVrDE6uuNqlRkW7CU3sn2S7YSDNgUssb/LGL3IlJyKhoMP1uUq2PETedBIb35IjBtnp2x/mpiz1X4H4gGJQ333dNz5HtiWvmxbBpLT9OJeYORPnjkD/FTqbYX3Ic6AdsqwQE1H4qZNBhk3AldPrLoa

In [35]:
# Example of reasoning models - some models support reasoning_effort parameter
response = client.chat.completions.create(
    model="vertex_ai/claude-sonnet-4",  # Try reasoning model if available
    messages=[
        {"role": "user", "content": "If a train travels 120 km in 1.5 hours, and then travels another 180 km in 2 hours, what is the average speed for the entire journey?"},
    ],
    reasoning_effort="high",  
)
print("Reasoning model response:")
print(response)
print(f"🔄  Output: {response.choices[0].message.content}")
print(f"🔄  Reasoning Content: {response.choices[0].message.reasoning_content}")

Reasoning model response:
ChatCompletion(id='chatcmpl-fab01090-75b1-4b21-9e5d-2ff6e8f2f41f', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='To find the average speed for the entire journey, I need to calculate the total distance and total time.\n\n**Given information:**\n- First segment: 120 km in 1.5 hours\n- Second segment: 180 km in 2 hours\n\n**Step 1: Calculate total distance**\nTotal distance = 120 km + 180 km = 300 km\n\n**Step 2: Calculate total time**\nTotal time = 1.5 hours + 2 hours = 3.5 hours\n\n**Step 3: Calculate average speed**\nAverage speed = Total distance ÷ Total time\nAverage speed = 300 km ÷ 3.5 hours\nAverage speed = 85.71 km/h\n\nTherefore, the average speed for the entire journey is **85.71 km/h** (or 85 5/7 km/h as an exact fraction).', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, reasoning_content='To find the average speed for the entire journey, I nee
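
It is good practice to verify a model's arithmetic independently. The short check below recomputes the answer with exact fractions; it is only a sanity check of the numbers in the prompt, not part of the original notebook.

```python
from fractions import Fraction

# (distance in km, time in hours) for each leg of the journey in the prompt.
legs = [(Fraction(120), Fraction(3, 2)), (Fraction(180), Fraction(2))]

total_distance = sum(distance for distance, _ in legs)
total_time = sum(time for _, time in legs)
average_speed = total_distance / total_time

print(average_speed)                   # 600/7
print(round(float(average_speed), 2))  # 85.71
```

Note that simply averaging the two leg speeds (80 km/h and 90 km/h) would give 85 km/h, which is wrong because the legs take different amounts of time.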

### 🤖 A little advice on prompting

There are some differences to consider when prompting a reasoning model compared with a non-reasoning GPT model:

- Reasoning models often give better results with only high-level guidance, while GPT models often benefit from very precise instructions.
- A reasoning model is like a senior coworker: you can give them a goal to achieve and trust them to work out the details.
- A GPT model is like a junior coworker: they perform best with explicit instructions to create a specific output.

## 8. RAG - Chat with Your Data {#rag}

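Retrieval-Augmented Generation (RAG) lets an LLM answer questions using external knowledge that it can look up at query time.

It consists of two steps:
- **Retrieval**: search a knowledge base for passages relevant to the question (mostly embedding-based).
- **Augmented generation**: pass the retrieved passages to the LLM as context and let it generate the answer.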

In [36]:
from typing import List

# Simple RAG example with in-memory knowledge base
knowledge_base = {
    "norway_facts": [
        "Norway is a Scandinavian country in Northern Europe.",
        "The capital of Norway is Oslo.",
        "Norway has a population of approximately 5.4 million people.",
        "The official language is Norwegian, with two written forms: Bokmål and Nynorsk.",
        "Norway is famous for its fjords, northern lights, and midnight sun.",
        "The country has significant oil and gas reserves in the North Sea.",
        "Norway is not a member of the European Union but is part of the EEA."
    ],
    "tech_facts": [
        "Python is a high-level programming language created by Guido van Rossum.",
        "Machine learning is a subset of artificial intelligence.",
        "APIs (Application Programming Interfaces) allow different software systems to communicate.",
        "Cloud computing provides on-demand access to computing resources.",
        "Git is a distributed version control system."
    ]
}

def simple_retrieval(query: str, k: int = 3) -> List[str]:
    """Simple keyword-based retrieval."""
    query_lower = query.lower()
    relevant_docs = []
    
    for category, docs in knowledge_base.items():
        for doc in docs:
            # Naive substring matching: short words such as "is" or "the"
            # match almost every document, so expect noisy results
            if any(word in doc.lower() for word in query_lower.split()):
                relevant_docs.append(doc)
    
    return relevant_docs[:k]

def rag_query(user_question: str) -> tuple[str, List[str]]:
    """Perform RAG: retrieve relevant docs and generate answer."""
    # Step 1: Retrieve relevant documents
    relevant_docs = simple_retrieval(user_question)
    
    # Step 2: Create context from retrieved documents
    context = "\n".join([f"- {doc}" for doc in relevant_docs])
    
    # Step 3: Generate answer using context
    rag_prompt = f"""
    Answer the following question using the provided context. If the context doesn't contain 
    enough information to answer the question, say so.
    
    Context:
    {context}
    
    Question: {user_question}
    
    Answer:
    """
    
    response = client.chat.completions.create(
        model="azure/gpt-5",
        messages=[{"role": "user", "content": rag_prompt}]
    )
    
    return response.choices[0].message.content, relevant_docs

# Test RAG system
questions = [
    "What is the capital of Norway?",
    "Tell me about Norwegian languages",
    "What is Python programming language?",
    "How many people live in Sweden?"  # Not covered by the knowledge base
]

for question in questions:
    print(f"\n❓ Question: {question}")
    answer, docs = rag_query(question)
    print(f"📚 Retrieved docs: {len(docs)}")
    for i, doc in enumerate(docs, 1):
        print(f"  {i}. {doc}")
    print(f"🤖 Answer: {answer}")
    print("-" * 80)


❓ Question: What is the capital of Norway?
📚 Retrieved docs: 3
  1. Norway is a Scandinavian country in Northern Europe.
  2. The capital of Norway is Oslo.
  3. Norway has a population of approximately 5.4 million people.
🤖 Answer: Oslo.
--------------------------------------------------------------------------------

❓ Question: Tell me about Norwegian languages
📚 Retrieved docs: 3
  1. The official language is Norwegian, with two written forms: Bokmål and Nynorsk.
  2. Norway is not a member of the European Union but is part of the EEA.
  3. Machine learning is a subset of artificial intelligence.
🤖 Answer: According to the provided context, Norway’s official language is Norwegian, which has two written forms: Bokmål and Nynorsk. The context does not provide additional details beyond this.
--------------------------------------------------------------------------------

❓ Question: What is Python programming language?
📚 Retrieved docs: 3
  1. Norway is a Scandinavian country in Nor

### Vector-based RAG (Conceptual)

In practice, RAG systems use vector embeddings for retrieval, which match on meaning rather than on exact keywords.

A typical RAG process:

1. Document Processing:
   - Split documents into chunks
   - Generate embeddings for each chunk
   - Store in vector database (e.g., Pinecone, Weaviate, ChromaDB)

2. Query Processing:
   - Generate embedding for user question
   - Find similar document chunks using a similarity metric, e.g. cosine similarity
   - Retrieve top-k most relevant chunks

3. Generation:
   - Combine retrieved chunks into context
   - Generate answer using LLM + context
   - Optionally include source citations

🔧 Tools for Production RAG:
- LangChain / LlamaIndex for orchestration
- OpenAI/Cohere embeddings for vectors
- Vector databases for storage
- Chunking strategies for optimal retrieval
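
The query-processing step above can be sketched in a few lines. The example below is a toy illustration, not a production setup: `embed` is a hypothetical stand-in that builds bag-of-words count vectors instead of calling a real embedding model, but the cosine-similarity ranking and top-k selection work the same way on real embeddings.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Rank chunks by similarity to the query and keep the k best."""
    query_vector = embed(query)
    scored = [(cosine_similarity(query_vector, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "The capital of Norway is Oslo.",
    "Git is a distributed version control system.",
    "Norway is famous for its fjords, northern lights, and midnight sun.",
]
for score, chunk in top_k("What is the capital of Norway?", chunks):
    print(f"{score:.2f}  {chunk}")
```

A real system would replace `embed` with calls to an embedding model and store the vectors in a vector database rather than recomputing them for every query.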

In [None]:
# LangChain example
# Taken from https://python.langchain.com/docs/tutorials/rag/

import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.vectorstores import InMemoryVectorStore

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

import numpy as np


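# Do not hardcode API keys in a notebook. Set OPENAI_API_KEY (your LiteLLM
# proxy key) in the environment before running this cell.
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable first")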


# `base_url` must point at your LiteLLM Proxy (e.g. one running on localhost:4000)
llm = ChatOpenAI(
    model="azure/gpt-4o", # or any model configured in your LiteLLM Proxy
    temperature=0,
    base_url=base_url
)

embeddings = OpenAIEmbeddings(model="text-embedding-3-large",  base_url=base_url)
vector_store = InMemoryVectorStore(embeddings)


# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
# LangGraph ties the retrieval and generation steps together into a single application
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()


USER_AGENT environment variable not set, consider setting it to identify your requests.


In [16]:
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])

Task decomposition is the process of breaking down a complex task into smaller, more manageable sub-tasks or steps. It can be achieved through techniques like simple prompting, task-specific instructions, or human inputs. This approach helps in enhancing model performance by allowing it to handle complex tasks more effectively through step-by-step reasoning.


In [17]:
response = graph.invoke({"question": "Who is Paul?"})
print(response["answer"])

I don't know who Paul is based on the provided context.
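
The LangChain example splits the blog post with `chunk_size=1000` and `chunk_overlap=200`. To make those two parameters concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
print(chunk_text(sample, chunk_size=10, chunk_overlap=4))
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighboring chunk, at the cost of storing some text twice.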


## 9. Agents {#agents}

AI agents can make decisions, use tools, and take actions to accomplish goals with minimal human intervention. 

Often, several specialized agents, each responsible for a different task, work together in a multi-agent system.

In [18]:
# Simple agent implementation
class SimpleAgent:
    def __init__(self, client, model="azure/gpt-5"):
        self.client = client
        self.model = model
        self.conversation_history = []
        self.tools = {
            "calculate": self.calculate,
            "search_knowledge": self.search_knowledge,
            "get_weather": self.get_weather
        }
    
    def calculate(self, expression: str) -> str:
        """Perform mathematical calculations."""
        try:
            allowed_chars = set('0123456789+-*/.() ')
            if all(c in allowed_chars for c in expression):
                result = eval(expression)
                return f"Calculation result: {result}"
            else:
                return "Invalid mathematical expression"
        except Exception as e:
            return f"Calculation error: {str(e)}"
    
    def search_knowledge(self, query: str) -> str:
        """Search the knowledge base."""
        docs = simple_retrieval(query, k=2)
        if docs:
            return f"Found information: {' '.join(docs)}"
        return "No relevant information found in knowledge base."
    
    def get_weather(self, location: str) -> str:
        """Get weather information."""
        weather_data = {
            "oslo": "15°C, partly cloudy",
            "bergen": "12°C, rainy", 
            "trondheim": "10°C, sunny"
        }
        return weather_data.get(location.lower(), "Weather data not available for this location")
    
    def plan_and_execute(self, user_goal: str) -> str:
        """Plan steps to achieve user goal and execute them."""
        
        # Step 1: Create a plan
        planning_prompt = f"""
        You are an AI agent with access to these tools:
        - calculate(expression): Perform mathematical calculations
        - search_knowledge(query): Search knowledge base for information
        - get_weather(location): Get weather for a location
        
        User goal: {user_goal}
        
        Create a step-by-step plan to achieve this goal. For each step, specify:
        1. The action to take
        2. Which tool to use (if any)
        3. What parameters to pass
        
        Format your response as a JSON list of steps:
        [
            {{"step": 1, "action": "description", "tool": "tool_name", "params": {{"param": "value"}}}},
            {{"step": 2, "action": "description", "tool": null, "params": null}}
        ]
        """
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": planning_prompt}]
        )
        
        try:
            plan = json.loads(response.choices[0].message.content)
            print(f"🎯 Plan created with {len(plan)} steps")
            
            # Step 2: Execute the plan
            results = []
            for step in plan:
                print(f"\n🔄 Step {step['step']}: {step['action']}")
                
                if step.get('tool') and step['tool'] in self.tools:
                    tool_func = self.tools[step['tool']]
                    params = step.get('params', {})
                    
                    # Execute tool
                    if params:
                        result = tool_func(**params)
                    else:
                        result = "No parameters provided for tool"
                    
                    print(f"  🔧 Used tool '{step['tool']}': {result}")
                    results.append(result)
                else:
                    print(f"  ℹ️  Information step (no tool required)")
                    results.append(step['action'])
            
            # Step 3: Synthesize final answer
            synthesis_prompt = f"""
            User goal: {user_goal}
            
            Execution results:
            {chr(10).join([f"- {result}" for result in results])}
            
            Provide a final answer to the user that accomplishes their goal based on the execution results.
            """
            
            final_response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": synthesis_prompt}]
            )
            
            return final_response.choices[0].message.content
            
        except json.JSONDecodeError:
            return "Failed to parse execution plan. Please try again."
        except Exception as e:
            return f"Execution error: {str(e)}"

# Create and test the agent
agent = SimpleAgent(client)


In [19]:
# Test cases
test_goals = [
    "I need to know the weather in Oslo and calculate what 25% of 240 is",
    "Find information about Norway's population and calculate how many people that would be per square kilometer if Norway is 385,207 km²",
    "Tell me about Python programming and calculate how many days are in 5 years"
]

for goal in test_goals:
    print(f"\n{'='*60}")
    print(f"🎯 User Goal: {goal}")
    print(f"{'='*60}")
    
    result = agent.plan_and_execute(goal)
    
    print(f"\n✅ Final Result:")
    print(result)
    print("\n")


🎯 User Goal: I need to know the weather in Oslo and calculate what 25% of 240 is
🎯 Plan created with 3 steps

🔄 Step 1: Retrieve the current weather for Oslo
  🔧 Used tool 'get_weather': Weather data not available for this location

🔄 Step 2: Calculate 25% of 240
  🔧 Used tool 'calculate': Calculation result: 60.0

🔄 Step 3: Present the weather and the calculation result to the user
  ℹ️  Information step (no tool required)

✅ Final Result:
I couldn’t retrieve weather data for Oslo at the moment.
25% of 240 is 60.



🎯 User Goal: Find information about Norway's population and calculate how many people that would be per square kilometer if Norway is 385,207 km²
🎯 Plan created with 4 steps

🔄 Step 1: Search for the latest total population of Norway.
  🔧 Used tool 'search_knowledge': Found information: Norway is a Scandinavian country in Northern Europe. The capital of Norway is Oslo.

🔄 Step 2: Review the search results and extract the most recent population number for Norway (as a numb
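
`plan_and_execute` passes the raw model output straight to `json.loads`, so the whole run fails with "Failed to parse execution plan" whenever the model wraps its JSON in a Markdown code fence. The sketch below shows one hedged way to tolerate that; `parse_plan` is a hypothetical helper, and it assumes the plan is either bare JSON or a single fenced block.

```python
import json
import re
from typing import Any, List

FENCE = "`" * 3  # three backticks, built indirectly so this example stays readable
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, flags=re.DOTALL)


def parse_plan(raw: str) -> List[Any]:
    """Parse a JSON list of steps, tolerating a Markdown code fence around it."""
    text = raw.strip()
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        # Keep only the content between the opening and closing fence.
        text = fenced.group(1).strip()
    plan = json.loads(text)
    if not isinstance(plan, list):
        raise ValueError("Expected a JSON list of steps")
    return plan


raw_reply = (
    "Here is the plan:\n"
    + FENCE + "json\n"
    + '[{"step": 1, "action": "Get weather", "tool": "get_weather", "params": {"location": "Oslo"}}]\n'
    + FENCE
)
print(parse_plan(raw_reply))
```

A more robust option is to request structured output from the API, as covered in the Structured Output section, so that the reply is valid JSON by construction.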

### 🔧 Popular Agent Frameworks:

- [LangChain/LangGraph](https://www.langchain.com/)
- [AutoGen (Microsoft)](https://microsoft.github.io/autogen/)
- [AutoGPT](https://agpt.co/)
- [CrewAI](https://www.crewai.com/)
- [Microsoft Semantic Kernel](https://learn.microsoft.com/en-us/semantic-kernel/)
- [LlamaIndex](https://www.llamaindex.ai/)
- [smolagents](https://huggingface.co/docs/smolagents/index)


## Further reading

- [LiteLLM Documentation](https://docs.litellm.ai/)
- [OpenAI API Reference](https://platform.openai.com/docs/api-reference)
- [Prompt Engineering Guide](https://www.promptingguide.ai/)
- [LangChain Documentation](https://python.langchain.com/docs/get_started/introduction/)