# LLM Training Notebook

This notebook accompanies an internal training session on using large language models (LLMs).

## Objectives

By the end of this session, you should understand:
- How to call LLMs programmatically using different frameworks (e.g., OpenAI, HuggingFace, local models).
- The structure of responses returned by LLM APIs.
- Best practices for prompt construction.
- Prompt templating.
- How function calling works with modern LLM APIs.

In [None]:
# Install requirements
%pip install -r requirements.txt

In [None]:
import os
import yaml

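# Load the configuration file from the parent directory.
# Notebooks do not define __file__, so resolve the path relative to the working directory.
parent_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))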
config_path = os.path.join(parent_dir, "config.yaml")
with open(config_path, "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

## Calling LLMs

In this section, we will learn how to call Large Language Models (LLMs) programmatically from Python.

There are multiple ways to do this, but the most straightforward method is using the `openai` package to interact with a hosted LLM provided by a cloud platform such as **Azure OpenAI** or **OpenAI** directly.

Most LLM APIs follow a standardized **chat-based interface**, in which messages are exchanged using roles:
- `system`: provides initial instructions or context to the model (e.g., "You are a helpful assistant.")
- `user`: represents queries or instructions from the user.
- `assistant`: responses generated by the LLM (only used in multi-turn history).

This role-based structure helps the model interpret intent and maintain context.
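
To make the role structure concrete, here is a minimal sketch of how a multi-turn message list could be assembled. The helper name `build_messages` and the sample texts are illustrative assumptions, not part of any library.

```python
def build_messages(system_prompt, history, user_message):
    """Return a chat message list: system prompt, prior turns, then the new user turn.

    `history` is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_message})
    return messages


messages = build_messages(
    system_prompt="You are a helpful assistant.",
    history=[("What is an LLM?", "A large language model trained on text.")],
    user_message="Give me one example.",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

The `assistant` entries only appear because earlier turns are replayed; a first request contains just the `system` and `user` messages.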

### OpenAI (Hosted on Azure)

Azure OpenAI provides enterprise-grade access to OpenAI models, including GPT-3.5 and GPT-4, with additional benefits such as regional hosting, role-based access control (RBAC), and cost management.

#### Step 1: Deploy a model on Azure

Before you can use an Azure-hosted model, you need to:
1. Create an Azure OpenAI resource.
2. Deploy a model (e.g., `gpt-35-turbo`, `gpt-4`) within that resource.
3. Note the following values from your deployment:
   - **Endpoint URI** (e.g., `https://your-resource.openai.azure.com/`)
   - **API Key**
   - **Deployment Name** (the identifier you gave your model)
   - **API Version** (e.g., `2024-03-01-preview`)

With these settings filled in from `config_template.yaml`, we can create a client and exchange messages with an LLM hosted on Azure.
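
Before creating a client, it can help to verify that the values noted above are actually available. The following is a minimal sketch, assuming the values are exposed through the environment variables that the client code in this notebook reads (`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_API_ENDPOINT`, `AZURE_OPENAI_MODEL_NAME`); `missing_settings` is a hypothetical helper.

```python
import os

# Environment variable names read by the client code in this notebook.
REQUIRED_SETTINGS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_ENDPOINT",
    "AZURE_OPENAI_MODEL_NAME",
)


def missing_settings(env):
    """Return the names of required settings that are unset or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All Azure OpenAI settings are present.")
```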

#### Step 2: Generate a client and test the connection

In [None]:
from openai import AzureOpenAI

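# Read the API key and endpoint from environment variables.
# The "or" fallbacks are placeholders only; replace them with real values before running.
client = AzureOpenAI(
    # Matches the example API version noted in Step 1.
    api_version="2024-03-01-preview",
    api_key=os.getenv("AZURE_OPENAI_API_KEY") or "asd",
    azure_endpoint=os.getenv("AZURE_OPENAI_API_ENDPOINT") or "http://asd",
)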

completion = client.chat.completions.create(
    # Name we gave the deployment in Azure.
    model=os.getenv("AZURE_OPENAI_MODEL_NAME") or "gpt-4o",
    # The full list of messages to send to the model.
    messages=[
        {"role": "system", "content": "You will talk like a pirate."},
        {
            "role": "user",
            "content": "Tell me a joke.",
        },
    ],
)

# Print response
print(completion.choices[0].message.content)

#### Step 3: Let's take a look at the output structure

Now that we have a `completion`, we can investigate what was retrieved from the API call.

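The `completion` object returned by `client.chat.completions.create` is a `ChatCompletion` object (a Pydantic model) with a standardized structure, not a plain dictionary. Here’s how you can inspect it:

In [None]:
# ChatCompletion is a Pydantic model, so serialize it with model_dump_json
# (calling json.dumps on it directly would raise a TypeError).
print(completion.model_dump_json(indent=2))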

Key fields to note:
- `choices`: A list of one or more completion options. Each contains:
  - `index`: Position of this choice in the list.
  - `message`: The actual response, with `role` and `content`.
  - `finish_reason`: Indicates why the model stopped generating (e.g., `"stop"`, `"length"`, `"function_call"`, or `"tool_calls"`).
- `usage`: Token counts for prompt and completion. Useful for monitoring usage and cost.
- `id`, `created`, `model`: Metadata about the request.

Understanding this structure is essential when post-processing responses, logging, or chaining prompts.
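
As an illustration of post-processing, the sketch below pulls the most commonly needed fields out of a completion-shaped dictionary. The sample data is hand-written to mimic the structure described above (it is not real API output), and `summarize_completion` is a hypothetical helper.

```python
# Hand-written sample that mimics the shape of a chat completion
# (illustrative values only, not real API output).
sample_completion = {
    "id": "chatcmpl-example",
    "created": 0,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Arr, here be a joke..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 12, "total_tokens": 32},
}


def summarize_completion(data):
    """Extract the fields most often needed for logging or prompt chaining."""
    first_choice = data["choices"][0]
    return {
        "text": first_choice["message"]["content"],
        "finish_reason": first_choice["finish_reason"],
        # "length" means the model hit the token limit before finishing.
        "truncated": first_choice["finish_reason"] == "length",
        "total_tokens": data["usage"]["total_tokens"],
    }


print(summarize_completion(sample_completion))
```

With a real response, a comparable dictionary should be obtainable via `completion.model_dump()`.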

## Best practices - Prompt structure

Writing effective prompts is one of the most important skills when working with LLMs. The quality, clarity, and intent of your prompt heavily influence the model's output.

While specific models (e.g., OpenAI's `gpt-4` vs. open-source LLMs) may benefit from tailored prompt engineering, there are general best practices that apply broadly.

Let’s walk through these using an example task: **asking an LLM to perform a code review of a small function**.

Our initial naive prompt is `Review my code`.
This is too vague and will likely result in a generic or superficial response.

### 1. Be explicit and specific

Give the model enough context to interpret the request meaningfully.

**Improved prompt:**
```text
You are an expert senior full-stack developer. You must perform an in-depth code review of the following Python function, focusing on correctness, best practices, computational efficiency, and clarity. Do not rewrite the code; only comment where improvements can be made.
```

### 2. Define output structure

Explicitly define the format of the output to ensure it's consistent and useful for downstream consumption or integration.

```text
Respond using the following structure:
- Summary: A concise overview of the function’s quality.
- Strengths: A bullet list of well-implemented aspects.
- Suggestions: A bullet list of potential improvements, each with justification.
- Severity: Indicate impact level for each suggestion (Low / Medium / High).
```
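
One benefit of a fixed output structure is that responses can be checked automatically. The sketch below is one possible check, assuming the section labels from the structure above; `missing_sections` and the example response are hypothetical.

```python
REQUIRED_SECTIONS = ("Summary", "Strengths", "Suggestions", "Severity")


def missing_sections(response_text):
    """Return the required section labels that do not appear in the response."""
    # Drop Markdown bold markers so "**Summary**:" and "Summary:" both match.
    normalized = response_text.replace("*", "").lower()
    return [name for name in REQUIRED_SECTIONS if f"{name.lower()}:" not in normalized]


# Hypothetical model response used only for illustration.
example_response = """\
- Summary: The function is correct but verbose.
- Strengths: Clear naming.
- Suggestions: Use the built-in sum() for brevity.
"""

print(missing_sections(example_response))
```

A non-empty result could trigger a retry or a follow-up prompt asking the model to supply the missing section.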

### 3. Use Constraints

Control verbosity, structure, tone, or language explicitly.

```text
Limit the answer to 100 words and avoid technical jargon.
Respond in Dutch.
Use only Markdown for formatting.
```

### 4. Ask for Chain-Of-Thought (CoT)

Encourage the model to reason explicitly before answering. This is particularly useful in complex problem-solving and debugging tasks.

```text
Think step-by-step before proposing improvements. Explain your reasoning as if teaching a junior developer.
```

### 5. Set Success Criteria

Tell the model what a "good" response looks like.

```text
The ideal output is clear, concise, and actionable. It should be easily understood by a mid-level software engineer.
```

### 6. Use Few-shot Examples to Set Expectations

Provide 1–2 examples of inputs and desired outputs before your actual task.
This technique can dramatically improve performance on repetitive tasks or formatting-sensitive outputs.
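
A few-shot prompt can be assembled from example pairs in a few lines. This is a minimal sketch; the helper `build_few_shot_prompt`, the labelling task, and the example findings are all illustrative assumptions.

```python
def build_few_shot_prompt(instruction, examples, task_input):
    """Combine an instruction, worked examples, and the real input into one prompt."""
    parts = [instruction, ""]
    for number, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(f"Example {number} input:\n{example_input}")
        parts.append(f"Example {number} output:\n{example_output}")
        parts.append("")
    # End with an open "Output:" so the model continues in the demonstrated format.
    parts.append(f"Input:\n{task_input}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    instruction="Label the severity of each code review finding as Low, Medium, or High.",
    examples=[
        ("Variable name `x` is unclear.", "Low"),
        ("Loop reads past the end of the list.", "High"),
    ],
    task_input="Function recomputes the same value on every iteration.",
)
print(prompt)
```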


### 7. Chain Prompts When Needed

Split complex workflows into smaller, manageable prompts. LLMs are generally more accurate when dealing with a narrow, well-defined scope.

Example workflow:
1. First prompt: Extract and summarize function purpose.
2. Second prompt: Identify inefficiencies or bugs.
3. Third prompt: Suggest performance improvements.
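
The workflow above could be sketched as follows. The `call_llm` parameter stands for any function that sends a prompt to a model and returns the reply text; here it is replaced by a stub (`fake_llm`) so the sketch runs offline. All names and prompt wordings are assumptions.

```python
def run_review_chain(code, call_llm):
    """Run three narrow prompts in sequence, feeding each answer into the next.

    `call_llm` is any callable that takes a prompt string and returns the reply text.
    """
    purpose = call_llm(f"Summarize the purpose of this function:\n{code}")
    issues = call_llm(
        f"The function's purpose is: {purpose}\n"
        f"Identify inefficiencies or bugs in:\n{code}"
    )
    improvements = call_llm(
        f"Given these issues: {issues}\n"
        f"Suggest performance improvements for:\n{code}"
    )
    return {"purpose": purpose, "issues": issues, "improvements": improvements}


# Stand-in for a real model call; it records each prompt it receives.
prompts_seen = []


def fake_llm(prompt):
    prompts_seen.append(prompt)
    return f"reply {len(prompts_seen)}"


result = run_review_chain("def add(a, b): return a + b", fake_llm)
print(result)
```

Passing the model call in as a parameter keeps the chain logic testable without network access.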

## Final prompt example

```text
You are an expert senior full-stack developer. You must perform an in-depth code review of the following Python function.

Focus on:
- Correctness
- Computational efficiency
- Pythonic best practices
- Code clarity and maintainability

Constraints:
- Do not rewrite the code.
- Use only Markdown formatting.
- Limit your response to 200 words.
- Use clear, professional language.

Respond using this structure:
- **Summary**: One or two sentences summarizing the review.
- **Strengths**: Bullet list of good practices observed.
- **Suggestions**: Bullet list of improvements, each with a brief justification.
- **Severity**: Label each suggestion as Low / Medium / High.

The goal is to produce feedback that is clear, actionable, and helpful for a mid-level developer.
```

## Prompt Templating

For general prompts that need to be reused across multiple inputs or tasks, **prompt templating** is a powerful technique. It allows you to define a reusable prompt structure and dynamically insert task-specific input at runtime. Prompt templating is fundamental for scalable LLM workflows—whether you're building chatbots, pipelines, or tools that generate or analyze structured text.
___

### Example: Code Review Prompt Template

Suppose you want to review multiple code snippets using the same prompt pattern. You can define a template with placeholders and fill it dynamically.
Here we define two placeholders, `n_words` and `code`, which are filled in at runtime.

In [None]:

prompt_template = """
    You are a senior full-stack engineer specializing in Python. Perform an in-depth code review of the following function.

    Focus on:
    - Correctness
    - Computational efficiency
    - Pythonic best practices
    - Code clarity and maintainability

    Constraints:
    - Do not rewrite the code.
    - Use only Markdown formatting.
    - Limit your response to {n_words} words.
    - Use clear, professional language.

    Respond using this structure:
    - **Summary**: One or two sentences summarizing the review.
    - **Strengths**: Bullet list of good practices observed.
    - **Suggestions**: Bullet list of improvements, each with a brief justification.
    - **Severity**: Label each suggestion as Low / Medium / High.

    Code:
    ```python
    {code}
    ```
"""

# Fill the prompt template with example code and a word limit
n_words = 100
code = """
def calculate_sum(numbers):
    total = 0
    for number in numbers:
        total += number
    return total
"""

prompt_filled = prompt_template.format(n_words=n_words, code=code)

# Create a chat completion using the prompt
completion = client.chat.completions.create(
    model=os.getenv("AZURE_OPENAI_MODEL_NAME") or "gpt-4o",
    messages=[
        {"role": "system", "content": "You are a senior full-stack engineer specializing in Python."},
        {
            "role": "user",
            "content": prompt_filled,
        },
    ],
)
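
When templates are reused widely, a forgotten placeholder otherwise surfaces only as a bare `KeyError`. The sketch below shows one way to detect this up front with the standard library's `string.Formatter`; the helpers `template_fields` and `render_prompt` and the demo template are hypothetical.

```python
from string import Formatter


def template_fields(template):
    """Return the placeholder names that appear in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def render_prompt(template, **values):
    """Fill the template, raising a clear error if any placeholder is missing."""
    missing = template_fields(template) - set(values)
    if missing:
        raise ValueError(f"Missing template values: {sorted(missing)}")
    return template.format(**values)


demo_template = "Review this code in at most {n_words} words:\n{code}"
print(render_prompt(demo_template, n_words=50, code="def f(): return {}"))
```

Braces inside the inserted `code` value are safe because `str.format` does not re-parse substituted values; only literal braces in the template itself need to be doubled (`{{` and `}}`).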

## Function Calling

Not all tasks should be delegated to an LLM—especially simple, deterministic operations like summing numbers or counting the occurrence of a specific character in a string. While LLMs are powerful, they are not guaranteed to be accurate on arithmetic or token-level operations due to their probabilistic nature.

Instead, we can combine the strengths of LLMs with traditional code by using **function calling**.

### Concept

Use the LLM to interpret the user's intent and decide *which* function to call, but let actual logic and computation be handled by explicitly defined code.

This pattern is especially useful in:

- Natural language interfaces
- Assistants and agents
- Chatbots that need reliable outputs for certain tasks
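
The core of the pattern can be sketched without any API call: the model supplies a function name and JSON-encoded arguments, and ordinary code performs the work. In the sketch below the model's decision is simulated by a hand-written string; `sum_numbers`, `AVAILABLE_FUNCTIONS`, and `dispatch` are hypothetical names.

```python
import json


def sum_numbers(numbers):
    """Deterministic computation that should not be left to the model."""
    return sum(numbers)


# Only functions listed here may be invoked on the model's behalf.
AVAILABLE_FUNCTIONS = {"sum_numbers": sum_numbers}


def dispatch(function_name, arguments_json):
    """Look up the requested function and call it with the decoded arguments."""
    if function_name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Unknown function requested: {function_name}")
    arguments = json.loads(arguments_json)
    return AVAILABLE_FUNCTIONS[function_name](**arguments)


# Simulated model decision (hand-written, not real API output).
print(dispatch("sum_numbers", '{"numbers": [1, 2, 3.5]}'))
```

An explicit registry prevents the model from triggering arbitrary code by naming a function that was never meant to be exposed.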

### Example: Count letter occurrences

Let’s say the user asks:
> How many times does the letter ‘r’ appear in the word ‘strawberrry’?

(Note: the word contains a typo with **4 r’s**, not 3.)


In [None]:
completion = client.chat.completions.create(
    model=os.getenv("AZURE_OPENAI_MODEL_NAME") or "gpt-4o",
    messages=[
        {"role": "system", "content": "Count the number of r's in the text."},
        {
            "role": "user",
            "content": "strawberrry",
        },
    ],
)

print(completion.choices[0].message.content)


Rather than trusting the LLM to count letters directly, we define a function that does this reliably and describe that function to the model so it knows when to call it.

In [None]:
import json


# --- Function we want to call ---
def count_letter(word: str, letter: str) -> int:
    return word.count(letter)


# --- Tool schema (OpenAI "tools" format: each entry wraps a function definition) ---
tools = [
    {
        "type": "function",
        "function": {
            "name": "count_letter",
            "description": "Count how many times a letter appears in a word.",
            "parameters": {
                "type": "object",
                "properties": {
                    "word": {"type": "string", "description": "The word to inspect"},
                    "letter": {
                        "type": "string",
                        "description": "The single character to count",
                    },
                },
                "required": ["word", "letter"],
            },
        },
    }
]

# Explicit registry of callable functions; safer than looking names up in globals().
available_functions = {"count_letter": count_letter}


# --- Call the model with function-calling enabled ---
completion = client.chat.completions.create(
    model=os.getenv("AZURE_OPENAI_MODEL_NAME") or "gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "How many times does the letter 'r' appear in the word 'strawberrry'?",
        }
    ],
    # We add the tool schema to the request and let the model decide whether to use it
    tools=tools,
    tool_choice="auto",
)

# Process the model's response
choice = completion.choices[0]

# --- If the model chose a function to call, do it in Python ---
if choice.message.tool_calls:
    # With the tools API, requested calls arrive in message.tool_calls
    tool_call = choice.message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)

    # Call the function with the arguments
    result = available_functions[function_name](**function_args)

    # Print the result
    print(f"Function '{function_name}' returned: {result}")
# --- If the model didn't choose a function, print the message ---
else:
    print(f"Model response: {choice.message.content}")
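
A common next step is to send the function's result back to the model so it can phrase a final answer. The sketch below shows how the follow-up message list could be assembled, assuming the chat `tools` message format (an `assistant` turn carrying the tool call, followed by a `tool` turn carrying the result). `build_tool_followup` and the `call_123` id are hypothetical.

```python
import json


def build_tool_followup(messages, tool_call_id, function_name, arguments_json, result):
    """Return a new message list that includes the tool call and its result."""
    assistant_turn = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": tool_call_id,
                "type": "function",
                "function": {"name": function_name, "arguments": arguments_json},
            }
        ],
    }
    tool_turn = {
        "role": "tool",
        # Must match the id of the tool call being answered.
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }
    return messages + [assistant_turn, tool_turn]


followup = build_tool_followup(
    messages=[{"role": "user", "content": "How many r's are in 'strawberrry'?"}],
    tool_call_id="call_123",  # hypothetical id; a real one comes from the response
    function_name="count_letter",
    arguments_json='{"word": "strawberrry", "letter": "r"}',
    result=4,
)
for message in followup:
    print(message["role"])
```

Passing the returned list as `messages` in a second `client.chat.completions.create` call would let the model turn the raw result into a natural-language reply.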