# Basic LLM Calls and RAG Workflow
## Quick Start Guide for Large Language Models (LLMs)

For practice purposes, I use OpenAI's GPT-4o-mini model for text-based Q&A (and gpt-5-nano in later cells). In this exercise, I show how to make basic LLM calls and implement a Retrieval-Augmented Generation (RAG) workflow in Python. The "database" used here is a simple structured dummy dataset, which can easily be replaced with any other data source (Notebook 5).

All environment variables are set in the `.env` file, which is not included in this repository for security reasons. A brief description of the `.env` file is provided below.

The basic LLM call is made using the `openai` Python package, which is installed via pip. 

```python
from openai import OpenAI

# Set your OpenAI API key and base URL (or omit both to read them from environment variables)
client = OpenAI(api_key="your-api-key", base_url="your-base-url")

message = "What is the capital of France?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # gpt-5-nano also works, but rejects temperature and max_tokens
    temperature=0.7,  # controls randomness in the response
    max_tokens=100,  # maximum number of tokens in the response
    top_p=1.0,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Please answer the question: " + message},
    ],
)
print(response.choices[0].message.content)
```
### Getting Started with the OpenAI Module

```python
# Optional: install the required packages (run once in a notebook cell)
!pip install openai
!pip install python-dotenv
```

### Set up the OpenAI client

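With the latest OpenAI Python SDK (version 1.0 and later), the way to call the API has changed compared with older versions. The key differences are:

- You create a client instance (`OpenAI()`) and call methods on it, such as `client.chat.completions.create`, instead of the old module-level `openai.ChatCompletion.create`.

- The client automatically reads the API key and base URL from the `OPENAI_API_KEY` and `OPENAI_BASE_URL` environment variables if you don't pass them explicitly.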


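```python
from openai import OpenAI

# Pass the key and endpoint explicitly, or call OpenAI() to read them from environment variables
client = OpenAI(api_key="your-api-key", base_url="your-base-url")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```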



### Getting Environment Variables from the .env File
It is good practice to store sensitive information such as API keys in environment variables. The `python-dotenv` package can load these variables from a `.env` file into the process environment, and the `os` module reads them with `os.getenv`.

You should avoid hardcoding sensitive information in your code. Instead, use environment variables to store them securely.

A typical `.env` file might look like this:

```
OPENAI_API_KEY=your-api-key
OPENAI_API_BASE=your-base-url
OPENAI_MODEL=gpt-5-nano
other_variable=value
```
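`python-dotenv` handles the parsing of this file for you: `load_dotenv()` reads `.env` into `os.environ`. To make the format concrete, here is a deliberately simplified, standard-library-only stand-in. `parse_env_text` is a hypothetical helper, not part of `python-dotenv`, and it ignores features such as quoted values and variable expansion that the real library handles.

```python
import os


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments (simplified)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


example = """
# Credentials for the OpenAI-compatible endpoint
OPENAI_API_KEY=your-api-key
OPENAI_API_BASE=your-base-url
other_variable=value
"""

parsed = parse_env_text(example)
# Like load_dotenv's default behavior, do not overwrite variables that are already set
for key, value in parsed.items():
    os.environ.setdefault(key, value)

print(sorted(parsed))  # ['OPENAI_API_BASE', 'OPENAI_API_KEY', 'other_variable']
```

In the notebook itself, prefer `from dotenv import load_dotenv; load_dotenv()`, which covers the edge cases this sketch skips.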


```python
import os

from dotenv import load_dotenv
from openai import OpenAI

# Load variables from the .env file into os.environ (existing variables are not overwritten)
load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")  # use the variable name from your .env, or set it manually
print("OpenAI API Key:", api_key[:10] + "..." if api_key else "Not found")  # only show the first 10 characters

base_url = os.getenv("OPENAI_API_BASE", "https://api.openai.com/v1")  # falls back to the official endpoint
print("OpenAI Base URL:", base_url)

model = os.getenv("OPENAI_MODEL", "gpt-5-nano")  # model name used by later cells
print("OpenAI Model:", model)

# Initialize the OpenAI client with the API key and base URL
client = OpenAI(
    api_key=api_key,
    base_url=base_url
)
```

```
OpenAI API Key: sk-nzH4YCa...
OpenAI Base URL: https://xiaoai.plus/v1
OpenAI Model: gpt-5-nano
```


### Calling the OpenAI API
To get a response from the OpenAI API, you can use the `client.chat.completions.create` method. This method takes several parameters, including the model to use, the temperature, and the messages to send to the model.

The basic steps for calling the OpenAI API are as follows:

Step 1: Import the necessary libraries and load the environment variables.

Step 2: Create a client instance with your API key and base URL.

Step 3: Define a function that generates a response from the LLM, optionally using RAG data.
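Step 3 can be sketched by separating prompt construction from the API call, so the prompt logic can be checked without network access. `build_messages` and `generate_response` are hypothetical helpers (not from the original notebook); the second one assumes a client created as in Step 2.

```python
from typing import Optional


def build_messages(question: str, context: Optional[str] = None) -> list[dict]:
    """Build a Chat Completions message list, optionally adding RAG context."""
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    if context:
        # RAG: place the retrieved data next to the question
        user_content = f"Use the following data to answer.\n{context}\n\nQuestion: {question}"
    else:
        user_content = question
    messages.append({"role": "user", "content": user_content})
    return messages


def generate_response(client, question: str, context: Optional[str] = None,
                      model: str = "gpt-4o-mini") -> str:
    """Send the messages with an OpenAI client and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(question, context),
    )
    return response.choices[0].message.content


print(build_messages("What is the capital of France?")[-1]["content"])
# What is the capital of France?
```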

There are several critical parameters to consider when making an LLM call:
- `model`: The model to use for the LLM call, such as "gpt-4o-mini" or "gpt-5-nano". This determines the capabilities, performance and cost of the call.

- `max_tokens`: The maximum number of tokens to generate in the response. This limits the length of the response and helps control costs. Reasoning models such as gpt-5-nano reject `max_tokens`; use `max_completion_tokens` instead.

- `top_p`: Controls the diversity of the response through nucleus sampling: the model samples only from the smallest set of most likely tokens whose cumulative probability reaches `top_p`. A value of 1.0 means no filtering. Setting `top_p=0.9` keeps only the most likely tokens that together cover 90% of the probability mass, which can reduce randomness and improve coherence. (This differs from top-k sampling, which keeps a fixed number of tokens.) As `top_p` approaches 0, only the single most likely token remains, so responses become nearly deterministic.

- `messages`: A list of messages to send to the model. Each message is a dictionary with a `role` ("system", "user", or "assistant") and `content` (the text of the message). This allows for multi-turn conversations and context management.

- `temperature`: A **scalar** value that controls the randomness of the model's predictions. It rescales the probability distribution over vocabulary tokens before the next token is sampled, influencing the model's creativity and output variability. Unlike `top_p`, which lies between 0 and 1, temperature can in principle be any non-negative value, though providers set an upper limit (OpenAI's API accepts 0 to 2). A higher temperature (e.g., 0.7) produces more creative responses, while a lower temperature (e.g., 0.2) produces more focused and deterministic ones.

- `repetition_penalty`: Discourages the model from repeating the same words or phrases. A higher value (e.g., 1.2) penalizes repetition more strongly, which can produce more varied responses. **The OpenAI API does not offer this parameter; its closest equivalents are `frequency_penalty` and `presence_penalty`.**
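To make `temperature` and `top_p` concrete, here is a toy sketch of how they reshape a next-token distribution: logits are divided by the temperature before the softmax, then nucleus (top-p) filtering keeps the smallest set of most likely tokens whose cumulative probability reaches `top_p`. The four-token vocabulary and its logits are made up for illustration, and providers' production samplers may differ in detail.

```python
import math


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Divide logits by the temperature (must be > 0 here), then normalize with a softmax."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    max_value = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - max_value) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize over the kept tokens


# Made-up logits for the token that follows "The capital of France is"
logits = {"Paris": 5.0, "Lyon": 2.5, "Nice": 2.0, "banana": 0.0}

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: P(Paris)={probs['Paris']:.3f}")  # lower temperature -> sharper

nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.9)
print(sorted(nucleus))  # ['Lyon', 'Paris']
```

A lower temperature concentrates probability on "Paris". With `top_p=0.9` at temperature 1.0, "Nice" and "banana" can never be sampled, because "Paris" and "Lyon" already cover more than 90% of the mass.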

```python
# Example of using GPT-4o-mini
message = "What is the capital of France?"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": message}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    max_tokens=500,  # limit response length
    temperature=0.7,  # control randomness
    top_p=0.5,  # nucleus sampling: keep the most likely tokens covering 50% of the probability mass
)
print("Response from GPT-4o-mini:", response.choices[0].message.content)
```

```
Response from GPT-4o-mini: The capital of France is Paris.
```


```python
# Example usage of the OpenAI client with the gpt-5-nano model.
# This assumes OPENAI_API_KEY and OPENAI_API_BASE are set correctly in your environment;
# otherwise, replace them with your actual API key and base URL.
message = "What is the capital of France?"

# Optionally wrap the question in an instruction and send `prompt` instead of `message`:
# prompt = "You are a helpful assistant. Please answer the question: " + message

response = client.chat.completions.create(
    model=model,  # "gpt-5-nano" unless OPENAI_MODEL says otherwise
    # gpt-5-nano does not accept temperature or max_tokens, so they are omitted:
    # temperature=0.7,  # controls randomness in the response
    # max_tokens=100,  # maximum number of tokens in the response
    top_p=1.0,
    messages=[
        {"role": "user", "content": message}
    ]
)

print(response.choices[0].message.content)
print("Model used is:", model)
# For this question, the model should answer "Paris"
```

```
Paris.
Model used is: gpt-5-nano
```


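Multimodal LLMs can also answer questions about non-text data such as images, audio, and video, depending on the model. The cells below ask gpt-5-nano about an image, first by URL and then from a local file encoded as base64.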

```python
# Ask gpt-5-nano a question about an image given by URL
image_url ='https://static.independent.co.uk/s3fs-public/thumbnails/image/2017/03/28/13/kitten.jpg?quality=75&width=1250&crop=3%3A2%2Csmart&auto=webp'

response_pic = client.chat.completions.create(
    model="gpt-5-nano",
    # gpt-5-nano does not accept temperature or max_tokens, so they are omitted
    top_p=1.0,
    messages=[
        {"role": "user",
         "content": [{"type": "text", "text": "What is this a picture of?"},
                     # The image part expects an object with a "url" key
                     {"type": "image_url", "image_url": {"url": image_url}}]
        }
    ]
)
print(response_pic.choices[0].message.content)
```

```
A kitten (cat) inside a cardboard box with its mouth open.
```


```python
import base64

def encode_image(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

local_image_path = './resource/cat.jpg'  # path to the local image file

# Encode the image
base64_image = encode_image(local_image_path)

response_pic = client.chat.completions.create(
    model="gpt-5-nano",
    # gpt-5-nano does not accept temperature or max_tokens, so they are omitted
    top_p=1.0,
    messages=[
        {"role": "user",
         "content": [{"type": "text", "text": "What is this a picture of?"},
                     # Local images are sent inline as a base64 data URL
                     {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}]
        }
    ]
)
print(response_pic.choices[0].message.content)
```

```
Two cats.
```
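The local-image cell hardcodes `image/jpeg` in the data URL. A small helper can instead derive the MIME type from the file extension with Python's standard `mimetypes` module and build the same `{"type": "image_url", "image_url": {"url": ...}}` content part. `image_content_part` and `image_question` are hypothetical helpers, and falling back to `image/jpeg` for unknown extensions is an assumption.

```python
import base64
import mimetypes


def image_content_part(image_path: str) -> dict:
    """Return a Chat Completions image content part holding the file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"  # assumption: fall back to JPEG for unknown extensions
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def image_question(question: str, image_path: str) -> list[dict]:
    """Build a single user message combining a text question and a local image."""
    return [{"role": "user",
             "content": [{"type": "text", "text": question}, image_content_part(image_path)]}]
```

Usage: `messages = image_question("What is this a picture of?", "./resource/cat.jpg")`, then pass `messages` to `client.chat.completions.create` as in the cell above.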


### LLM Calls with Multiple Rounds
Multi-round calls are useful for more complex interactions where the model needs to maintain context over several exchanges.

Unlike a single-round call, a multi-round call requires maintaining a list of messages that represents the conversation history. The API itself is stateless, so the whole list is sent with every request. The list alternates between user and assistant roles (optionally preceded by a system message).

The `messages` list looks like this:

```python
[
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is the population of Paris?"},
    {"role": "assistant", "content": "The population of Paris is approximately 2.1 million."},
    # ...
]
```

The messages represent the conversation history, allowing the model to understand the context of the current interaction. In a loose sense, this resembles `Retrieval-Augmented Generation (RAG)`: the model answers using context supplied in the prompt (here, the `conversation history`) rather than relying only on what it learned during training.
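Because each API call is stateless, the client code must append every turn to the list and resend it. The sketch below shows that bookkeeping in isolation. `run_conversation` is a hypothetical helper, and `fake_chat` is an offline stand-in for a function that sends `messages` to the model, so the logic runs without an API key.

```python
from typing import Callable

Message = dict[str, str]


def run_conversation(user_turns: list[str],
                     chat_fn: Callable[[list[Message]], str],
                     system_prompt: str = "You are a helpful assistant.") -> list[Message]:
    """Run one chat round per user turn, resending the growing history each time."""
    messages: list[Message] = [{"role": "system", "content": system_prompt}]
    for user_text in user_turns:
        messages.append({"role": "user", "content": user_text})
        reply = chat_fn(messages)  # the model sees every earlier turn
        messages.append({"role": "assistant", "content": reply})
    return messages


def fake_chat(messages: list[Message]) -> str:
    """Offline stand-in for an LLM call: reports how much history it received."""
    return f"I can see {len(messages)} messages."


history = run_conversation(["What is the capital of France?",
                            "What is its population?"], fake_chat)
for message in history:
    print(f"{message['role']}: {message['content']}")
```

With a real client, `chat_fn` could be `lambda msgs: client.chat.completions.create(model="gpt-5-nano", messages=msgs).choices[0].message.content`.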

#### Example of Multi-Round Chat

Run the following code to see how to run a multi-round chat with the OpenAI API. Make sure to replace the `api_key` and `base_url` with your own values or set them in the `.env` file.

```python
# Import the necessary libraries
import os

from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI

# Load environment variables from the .env file
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")  # use the variable name from your .env, or set it manually
print("OpenAI API Key:", api_key[:10] + "..." if api_key else "Not found")  # only show the first 10 characters
base_url = os.getenv("OPENAI_API_BASE", "https://api.openai.com/v1")  # falls back to the official endpoint
print("OpenAI Base URL:", base_url)

# Initialize the OpenAI client with the API key and base URL
client = OpenAI(
    api_key=api_key,
    base_url=base_url
)
```

```python
def multi_round_chat(messages):
    """Send the full conversation history and return the assistant's reply."""
    response = client.chat.completions.create(
        model="gpt-5-nano",
        # gpt-5-nano does not accept temperature or max_tokens, so they are omitted
        top_p=1.0,
        messages=messages
    )
    return response.choices[0].message.content

# The conversation history, resent to the model on every round
messages = []
for _ in range(3):
    user_input = input("You: ")
    print("You:", user_input)
    messages.append({"role": "user", "content": user_input})

    response = multi_round_chat(messages)
    print("Assistant:")
    display(Markdown(response))
    print("-" * 50)
    messages.append({"role": "assistant", "content": response})  # remember the reply
```


The code above runs a multi-round conversation with the LLM. The user types a message, and the assistant responds based on the full conversation history, which is resent on every call. The loop runs for three rounds; increase the `range` to chat longer. It can be viewed as a simple chat interface with the LLM.


## A Simple Database: Dictionary or JSON
Much real-world data is stored in structured formats such as JSON or XML, and LLMs can also return structured data in their responses. In Python, such records map naturally onto a list of dictionaries:
```python
house_data = [
    {
        "address": "123 Main St",
        "price": 500000,
        "bedrooms": 3,
        "bathrooms": 2,
        "features": ["garage", "garden"]
    },
    {
        "address": "456 Elm St",
        "price": 600000,
        "bedrooms": 4,
        "bathrooms": 3,
        "features": ["pool", "fireplace"]
    }
]
```
This is a simple example of structured data. With f-strings, you can format the records into readable text that includes the variable data. For example:
```python
description = f"The house at {house_data[0]['address']} has {house_data[0]['bedrooms']} bedrooms and is priced at ${house_data[0]['price']:,}. Its features include: {', '.join(house_data[0]['features'])}."

print(description)
```
This will output:
```
The house at 123 Main St has 3 bedrooms and is priced at $500,000. Its features include: garage, garden.
```
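Going the other way, when an LLM is asked to return structured data, a common pattern is to request JSON and parse the reply with `json.loads`, guarding against malformed output. The sketch below uses a hard-coded reply string in place of a real model response; `parse_house_reply` and its required keys are assumptions chosen to match the `house_data` records.

```python
import json

REQUIRED_KEYS = {"address", "price", "bedrooms", "bathrooms", "features"}


def parse_house_reply(reply_text: str) -> list[dict]:
    """Parse a JSON list of houses from model output, keeping only complete records."""
    text = reply_text.strip()
    if text.startswith("```"):
        # Models sometimes wrap JSON in a fenced block: drop the opening and closing fence lines
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip().startswith("```"):
            lines = lines[:-1]
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []  # malformed output: the caller can retry or report an error
    if not isinstance(data, list):
        return []
    return [item for item in data if isinstance(item, dict) and REQUIRED_KEYS.issubset(item)]


reply = ('[{"address": "789 Oak St, Springfield", "price": 450000, "bedrooms": 2, '
         '"bathrooms": 1, "features": ["fenced yard", "new roof"]}, {"address": "incomplete"}]')
houses = parse_house_reply(reply)
print(len(houses))  # 1
```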

### Example of using a simple database
Run the following code to see how to use a simple database (a list of dictionaries in this case) to store and retrieve information. This can be useful for applications such as real estate listings, product catalogs, or any other structured data. The `house_info` function transforms the data into readable text that can be stored in a variable and reused later, for example inside a prompt.

```python
# Example data for a real estate listing
house_data = [
    {"address": "123 Main St, Springfield", "price": 500000, "bedrooms": 3, "bathrooms": 2, "features": ["garage", "garden"]},
    {"address": "456 Elm St, Springfield", "price": 600000, "bedrooms": 4, "bathrooms": 3, "features": ["pool", "fireplace"]},
    {"address": "789 Oak St, Springfield", "price": 450000, "bedrooms": 2, "bathrooms": 1, "features": ["fenced yard", "new roof"]},
    {"address": "101 Pine St, Springfield", "price": 700000, "bedrooms": 5, "bathrooms": 4, "features": ["home office", "basement"]},
    {"address": "202 Maple St, Springfield", "price": 550000, "bedrooms": 3, "bathrooms": 2, "features": ["deck", "modern kitchen"]},
]
# Access the data with a simple print and formatting
print("Real Estate Listings:")
for house in house_data:
    print(f"Address: {house['address']}, Price: ${house['price']}, Bedrooms: {house['bedrooms']}, Bathrooms: {house['bathrooms']}, Features: {', '.join(house['features'])}")
    print("-" * 50)  # separator for readability

# Improved formatting with descriptive text
def house_info(houses):
    """Return a readable, multi-line description of every house."""
    layout = ''
    for house in houses:
        layout += f"House located at {house['address']} is priced at ${house['price']}. It has {house['bedrooms']} bedrooms and {house['bathrooms']} bathrooms. Notable features include: {', '.join(house['features'])}.\n"
        layout += "=*=" * 20 + "\n"  # separator for readability
    return layout

print("Formatted Real Estate Listings:")
formatted_info = house_info(house_data)
print(formatted_info)
```

## Putting It All Together (LLM + Structured Data)
You can combine the LLM calls with structured data to create a more complex application. For example, you can use the LLM to generate a summary of a dataset or to answer questions about the data.

The workflow can be summarized as follows:
1. **Load the data**: Load the structured data from a file or database.
2. **Process the data**: Use Python to process the data and prepare it for the LLM, for example by formatting it as text.
3. **Call the LLM**: Use the processed data as input to the LLM, either as part of the prompt or as a separate message in a multi-round conversation.   

`Diagram`
```mermaid
flowchart TD
    A[Load Data] --> B[Process Data]        
    B --> C[Call LLM]
    C --> D[Receive Response]
    D --> E[Display Result]
    E --> F[User Input]
    F --> C
```
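The three steps can be sketched end to end with the standard `json` module. Here the "file" is a JSON string so the example runs as-is; in practice you would read it with `json.load` from an open file. `load_houses`, `format_houses` and `build_prompt_messages` are hypothetical names, and the final LLM call is left to a client as in the earlier cells.

```python
import json

# Step 1 input: a JSON string standing in for the contents of a data file
raw_json = """
[
  {"address": "123 Main St, Springfield", "price": 500000, "bedrooms": 3, "bathrooms": 2,
   "features": ["garage", "garden"]},
  {"address": "789 Oak St, Springfield", "price": 450000, "bedrooms": 2, "bathrooms": 1,
   "features": ["fenced yard", "new roof"]}
]
"""


def load_houses(text: str) -> list[dict]:
    """Step 1: parse the JSON records into Python dictionaries."""
    return json.loads(text)


def format_houses(houses: list[dict]) -> str:
    """Step 2: turn each record into one line of plain text for the prompt."""
    return "\n".join(
        f"- {h['address']}: ${h['price']:,}, {h['bedrooms']} bed, {h['bathrooms']} bath, "
        f"features: {', '.join(h['features'])}"
        for h in houses
    )


def build_prompt_messages(houses: list[dict], question: str) -> list[dict]:
    """Step 3 (preparation): package the data and the question for a chat completion."""
    system = "You are a real estate assistant. Answer only from these listings:\n" + format_houses(houses)
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]


messages = build_prompt_messages(load_houses(raw_json), "Which house is cheapest?")
print(messages[0]["content"])
# Step 3 (call): client.chat.completions.create(model="gpt-5-nano", messages=messages)
```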
### Simple Q&A Example with Structured Data


```python
# Initialize the client
import os

from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI

load_dotenv()  # read OPENAI_API_KEY / OPENAI_API_BASE from the .env file
api_key = os.getenv("OPENAI_API_KEY")  # use the variable name from your .env, or set it manually
print("OpenAI API Key:", api_key[:10] + "..." if api_key else "Not found")  # only show the first 10 characters
base_url = os.getenv("OPENAI_API_BASE", "https://api.openai.com/v1")  # falls back to the official endpoint
print("OpenAI Base URL:", base_url)

# Initialize the OpenAI client with the API key and base URL
client = OpenAI(
    api_key=api_key,
    base_url=base_url,
)
```


```python
# Load data
house_data = [
    {"address": "123 Main St, Springfield", "price": 500000, "bedrooms": 3, "bathrooms": 2, "features": ["garage", "garden"]},
    {"address": "456 Elm St, Springfield", "price": 600000, "bedrooms": 4, "bathrooms": 3, "features": ["pool", "fireplace"]},
    {"address": "789 Oak St, Springfield", "price": 450000, "bedrooms": 2, "bathrooms": 1, "features": ["fenced yard", "new roof"]},
    {"address": "101 Pine St, Springfield", "price": 700000, "bedrooms": 5, "bathrooms": 4, "features": ["home office", "basement"]},
    {"address": "202 Maple St, Springfield", "price": 550000, "bedrooms": 3, "bathrooms": 2, "features": ["deck", "modern kitchen"]},
]
def house_info(houses):
    """Return a readable, multi-line description of every house."""
    layout = ''
    for house in houses:
        layout += f"House located at {house['address']} is priced at ${house['price']}. It has {house['bedrooms']} bedrooms and {house['bathrooms']} bathrooms. Notable features include: {', '.join(house['features'])}.\n"
        layout += "=*=" * 20 + "\n"  # separator for readability
    return layout

# Build the prompt: instructions + house data (the retrieved context) + the user's question
def sys_prompt(house_data, user_query):
    prompt = "You are a real estate assistant. Use the following houses information to answer users' queries:\n"
    prompt += house_info(house_data)
    prompt += "\nNow, answer the user's query based on the provided house information. If you don't know the answer, say 'I don't know'. \n"
    prompt += f"User Query: {user_query}\n"
    return prompt


# Call the LLM with a single prompt and return the reply text
def generate_llm_response(prompt, api_key=api_key, base_url=base_url):
    client = OpenAI(api_key=api_key, base_url=base_url)
    response = client.chat.completions.create(
        model="gpt-5-nano",
        # gpt-5-nano does not accept temperature or max_tokens, so they are omitted
        top_p=1.0,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```


```python
# Start the conversation
user_input = input("You: ")
print("You:", user_input)
# Record the conversation in a messages list
messages = [{"role": "user", "content": user_input}]

# Generate a response using the LLM with the house data
response = generate_llm_response(sys_prompt(house_data, user_input), api_key=api_key, base_url=base_url)
print("Assistant:")
display(Markdown(response))
messages.append({"role": "assistant", "content": response})
```



### Multi-Round Conversation Example with Structured Data


```python
for _ in range(3):
    user_input = input("You: ")
    print("You:", user_input)
    messages.append({"role": "user", "content": user_input})

    # Each call sends the house data and the latest question only;
    # the messages list records the conversation but is not passed to the model
    response = generate_llm_response(sys_prompt(house_data, user_input), api_key=api_key, base_url=base_url)
    print("\nAssistant:")
    display(Markdown(response))
    messages.append({"role": "assistant", "content": response})

print("End of conversation. Thank you for using the assistant!")
```

The code above runs a three-round conversation with the LLM. The user types a message and the assistant answers from the house data. Note that each call sends only the house information and the latest question: the `messages` list keeps a record of the conversation, but it is not passed to the model, so follow-up questions that depend on earlier turns may not be understood.
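Because the loop above resends only the listings and the latest question, the model cannot resolve follow-ups such as "Does it have a pool?". One way to let it see earlier turns, sketched here under the assumption that the listings fit comfortably in the context window, is to send the listings once as a system message followed by the accumulated history. `build_rag_history` is a hypothetical helper; in the notebook, `context` would be `house_info(house_data)` and `history` the `messages` list.

```python
Message = dict[str, str]


def build_rag_history(context: str, history: list[Message]) -> list[Message]:
    """Return a system message carrying the data, followed by the full conversation so far."""
    system = {
        "role": "system",
        "content": ("You are a real estate assistant. Use the following houses information "
                    "to answer users' queries. If you don't know the answer, say 'I don't know'.\n"
                    + context),
    }
    return [system] + history


# Offline example with a made-up context string and history
context = "House located at 456 Elm St, Springfield is priced at $600000. Notable features include: pool, fireplace."
history = [
    {"role": "user", "content": "Which houses cost less than $650,000?"},
    {"role": "assistant", "content": "456 Elm St is priced at $600,000."},
    {"role": "user", "content": "Does it have a pool?"},
]
request = build_rag_history(context, history)
print([m["role"] for m in request])  # ['system', 'user', 'assistant', 'user']
# With a real client: client.chat.completions.create(model="gpt-5-nano", messages=request)
```

Since the whole history is resent on every round, very long conversations eventually need trimming to stay within the model's context window.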

### Example with RAG as a Parameter
In the following cell, we define a function that takes a prompt and an optional RAG parameter. If `use_RAG` is `True`, the function formats the prompt with the structured data before making the LLM call; otherwise the question is sent as-is, so the model has no access to the listings.


```python
def generate_llm_with_rag(prompt, api_key=api_key, base_url=base_url, use_RAG=True):
    """Answer a prompt with gpt-5-nano, optionally adding the house data (RAG)."""
    client = OpenAI(api_key=api_key, base_url=base_url)
    if use_RAG:
        # Wrap the question with the house data using sys_prompt
        prompt = sys_prompt(house_data, prompt)
    # Without RAG, the raw question is sent to the model unchanged
    response = client.chat.completions.create(
        model="gpt-5-nano",
        # gpt-5-nano does not accept temperature or max_tokens, so they are omitted
        top_p=1.0,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```


```python
user_input = input("You: ")
print("You:", user_input)
# Generate a response using the LLM with RAG
response_generate_llm_with_rag = generate_llm_with_rag(user_input, api_key=api_key, base_url=base_url, use_RAG=True)
print("RAG Assistant:")
display(Markdown(response_generate_llm_with_rag))

user_input = input("You: ")
print("You:", user_input)
# Generate a response using the LLM without RAG
response_generate_llm_with_rag_no_rag = generate_llm_with_rag(user_input, api_key=api_key, base_url=base_url, use_RAG=False)
print("Assistant (without RAG):")
display(Markdown(response_generate_llm_with_rag_no_rag))
```
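In this notebook the retrieval step passes every listing to the model, which works for five houses but not for a large catalog. A common refinement is to retrieve only the records relevant to the question before building the prompt. The sketch below swaps in a naive keyword-overlap score purely for illustration (production RAG systems typically use embeddings and a vector index); `retrieve_houses` is a hypothetical helper and its scoring rule is an assumption.

```python
import re


def words(text: str) -> set[str]:
    """Lower-case word tokens, used for a crude keyword match."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_houses(houses: list[dict], question: str, k: int = 2) -> list[dict]:
    """Return up to k houses whose address/features share the most words with the question."""
    query_words = words(question)
    scored = []
    for house in houses:
        house_words = words(house["address"] + " " + " ".join(house["features"]))
        score = len(query_words & house_words)
        if score > 0:
            scored.append((score, house))
    # Highest overlap first; the sort is stable, so ties keep the original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [house for _, house in scored[:k]]


house_data = [
    {"address": "123 Main St, Springfield", "price": 500000, "bedrooms": 3, "bathrooms": 2, "features": ["garage", "garden"]},
    {"address": "456 Elm St, Springfield", "price": 600000, "bedrooms": 4, "bathrooms": 3, "features": ["pool", "fireplace"]},
    {"address": "101 Pine St, Springfield", "price": 700000, "bedrooms": 5, "bathrooms": 4, "features": ["home office", "basement"]},
]

matches = retrieve_houses(house_data, "Which house has a pool?")
print([h["address"] for h in matches])  # ['456 Elm St, Springfield']
```

The filtered list could then be passed to `sys_prompt(...)` in place of the full `house_data`, keeping the prompt short as the dataset grows.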


