# 1. Connecting to and Querying a DataSource (XLSX - SQLite)

This is a series of examples showing how you can connect Agents to different sources of data with Microsoft Azure's OpenAI Service.

This is the <b>third</b> of 3 similar scripts.

## Note Upfront
If you want your LLM to have knowledge of your own data, you can:  
  - Fine-tune your model with your data, but this is very expensive to do (CPU) and doesn't work well with data that changes.  
  - Use RAG with a vector database: this is suitable for documents, pictures, ..., but what about data in structured storage like databases or spreadsheets?
  - Use a connector to a database: this is what this module is about.
  
- In this series we show how you can set up an Agent:
  - with Langchain agents that connect to a SQL database or CSV file
  - via the Azure OpenAI Assistants API (function calling + code interpreter) (stateful management + short-term memory)
  - via Azure OpenAI Function Calling, to perform tasks based on your questions  
- The 2nd part of this exercise connects to a SQLite database, so SQLite must be installed in your environment  
- This demo is based on https://learn.deeplearning.ai/courses/building-your-own-database-agent  

## prereqs 
0. Set up your local repo by cloning this git repo. Don't forget to install the dependencies listed in requirements.txt
1. Have an MS Azure account with a valid subscription  
    - Running through the course steps cost me < €0.50, but keep an eye on the costs (portal.azure.com > search for 'Invoices' > Select 'Invoices' > Cost Management > Cost Analysis).  
    - Remove the project when it is no longer needed to avoid recurring costs.      
2. Have an AI Foundry project with a deployed model
* Create a project -> Azure AI Foundry Resource  
    - Choose a meaningful name, the subscription you have set up, a resource group (or create a new one), and a region (I typically pick Sweden Central as most of the AI models are there)
* Pick the right URLs & credentials !!  
    - Pick the API key and put it in your local .env (see the example below)  
    - Under libraries, pick the Azure OpenAI endpoint: something like https://<project_name>-resource.openai.azure.com/  

Your .env needs to look something like:  
AZURE_OPENAI_API_KEY=<your_api_key>  
AZURE_URL=https://<project_name>-resource.openai.azure.com/  

* Deploy a model: you can pick any model, but I work with gpt-4.1-mini. Put the model name and the version in the .env file to have all parameters in one location. Note that the code below passes AZURE_OPENAI_MODEL_VERSION to the client as its `api_version`.  
AZURE_OPENAI_MODEL=gpt-4.1-mini  
AZURE_OPENAI_MODEL_VERSION=2025-03-01-preview
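
Since every later cell depends on these four values, it can help to check them before creating the client. The following is a minimal sketch, not part of the original notebook: the helper name `missing_env_vars` is hypothetical, and it only assumes the variable names shown in the .env example above.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_URL",
    "AZURE_OPENAI_MODEL",
    "AZURE_OPENAI_MODEL_VERSION",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank.

    `env` defaults to os.environ; pass a plain dict to check other mappings.
    """
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]


# Example with a made-up mapping: two of the four values are absent.
example_env = {"AZURE_OPENAI_API_KEY": "dummy", "AZURE_URL": "https://example.invalid/"}
print("Missing:", missing_env_vars(example_env))
```

In the notebook you would call `missing_env_vars()` right after `load_dotenv(override=True)` and stop if the returned list is not empty.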

## 1.4 Loading an XLSX and using the built-in CODE INTERPRETER

### 1.4.1 Step 1: Setting up your Azure & Langchain

In [2]:
import os
import pandas as pd
from IPython.display import Markdown, HTML, display
from dotenv import load_dotenv
import json
import time

# Load environment variables from .env file
load_dotenv(override=True)   # override=True: values in the local .env file take precedence over system-set environment variables

True

In [3]:

from openai import AzureOpenAI

# Azure OpenAI Configuration (CORRECT endpoint from Azure Portal)
# endpoint = "https://js-alphacentauri-resource.cognitiveservices.azure.com/"
endpoint = os.getenv("AZURE_URL")
deployment = os.getenv("AZURE_OPENAI_MODEL")
v_model = os.getenv("AZURE_OPENAI_MODEL")
api_version = os.getenv("AZURE_OPENAI_MODEL_VERSION")  # Updated to latest API version for Responses API support

# Get API key from environment
subscription_key = os.getenv("AZURE_OPENAI_API_KEY")

# if issues, uncomment the following to validate the keys are correctly read
# print("✅ Azure OpenAI client configured")
# print(f"Endpoint: {endpoint}")
# print(f"Deployment: {deployment}")
# print(f"API Version: {api_version}")    
# print(f"Subscription Key: {subscription_key}")
# print(f"API Key (1st 5 Chars): {subscription_key[:5]}...")

# Create Azure OpenAI client
client = AzureOpenAI(
    api_version=api_version,
    azure_endpoint=endpoint,
    api_key=subscription_key,
)

In [4]:
# Test the connection
response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Say hello and tell me you're working! But to lighten up the atmosphere, tell a joke about Generative AI and Langchain.",
        }
    ],
    model=deployment
)

print(response.choices[0].message.content)


Hello! I'm up and running, ready to assist you. To lighten the mood, here’s a joke for you:

Why did the Generative AI break up with Langchain?

Because it got tired of all the callbacks! 😄

How can I help you today?


## 1.4: Leveraging CODE_INTERPRETER 

### Code Interpreter

In [5]:
client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=subscription_key,
    api_version=api_version
    )

In [6]:
# Upload the CSV file
file_path = "./data/synthetic_sales_data.csv"

with open(file_path, "rb") as f:
    uploaded = client.files.create(
        file=f,
        purpose="assistants"  # Azure currently uses "assistants" until official user_data support
    )
print("Uploaded file id:", uploaded.id)

Uploaded file id: assistant-CFjSeKKrBiE16hSzZHpi4n


In [7]:
# Prepare query and use Responses API
user_question = "What's the sales total per region per year"
response = client.responses.create(
    model=v_model,
    instructions="""You are a sales business intelligence assistant answering questions about sales data. The CSV file has been uploaded and is available for analysis. 
    Answer only with the response to the question, no other text or comments, no reasoning, no explanation. 
    Do not ask follow-up questions""",
    tools=[
        {
            "type": "code_interpreter",
            "container": {"type": "auto", "file_ids": [uploaded.id]}
        }
    ],
    input=[{
        "role": "user",
        "content": user_question
    }]
)

In [8]:
print(response.model_dump_json(indent=2))

{
  "id": "resp_00640b25058dc77b006908d36bfb40819399e69a3f7e55f3a1",
  "created_at": 1762186091.0,
  "error": null,
  "incomplete_details": null,
  "instructions": "You are a sales business intelligence assistant answering questions about sales data. The CSV file has been uploaded and is available for analysis. \n    Answer only with the response to the question, no other text or comments, no reasoning, no explanation. \n    Do not ask follow-up questions",
  "metadata": {},
  "model": "gpt-4.1-mini",
  "object": "response",
  "output": [
    {
      "id": "ci_00640b25058dc77b006908d36e35908193b414e65fa8180df2",
      "code": "import pandas as pd\n\n# Load the data\nfile_path = '/mnt/data/assistant-CFjSeKKrBiE16hSzZHpi4n-synthetic_sales_data.csv'\ndf = pd.read_csv(file_path)\n\n# Check the columns to understand the structure\ndf.columns",
      "container_id": "cntr_6908d36c802c8190882f10bb2bd918440d6eb9dbf3b9fb8f",
      "outputs": null,
      "status": "completed",
      "type": "cod

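The dump above shows that the output items produced by the code interpreter carry a `code` field. To read only the code the model generated, without the full JSON, something like the following may be enough. It is a sketch under assumptions: the helper name `generated_code` is hypothetical, it relies only on items exposing a `code` attribute as the JSON suggests, and the stand-in objects below are made up so the block runs without an Azure connection.

```python
from types import SimpleNamespace


def generated_code(output_items):
    """Collect the `code` string of every output item that has a non-empty one."""
    snippets = []
    for item in output_items:
        code = getattr(item, "code", None)  # message items have no `code` attribute
        if code:
            snippets.append(code)
    return snippets


# Made-up stand-ins for response.output: one tool item, one assistant message.
demo_output = [
    SimpleNamespace(code="import pandas as pd\ndf.columns", status="completed"),
    SimpleNamespace(role="assistant", content=[]),
]

for snippet in generated_code(demo_output):
    print(snippet)
    print("---")
```

In the notebook you would pass `response.output` instead of `demo_output`.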
In [9]:
# Extract assistant text content from the response object
for item in response.output:
    # Check if this is a message (has 'role' attribute)
    if hasattr(item, 'role') and item.role == "assistant":
        # Iterate through content items
        if hasattr(item, 'content'):
            for content_item in item.content:
                if hasattr(content_item, 'type') and content_item.type == "output_text":
                    if hasattr(content_item, 'text'):
                        print(content_item.text)
                        print("\n---\n")  # Separator between messages

| Region | Year | Revenue  |
|--------|------|----------|
| East   | 2023 | 379,388  |
| East   | 2024 | 255,207  |
| East   | 2025 | 914,767  |
| East   | 2026 | 1,090,312|
| East   | 2027 | 615,688  |
| East   | 2028 | 296,825  |
| East   | 2029 | 642,100  |
| East   | 2030 | 440,879  |
| East   | 2031 | 894,307  |
| North  | 2023 | 664,430  |
| North  | 2024 | 776,848  |
| North  | 2025 | 1,124,744|
| North  | 2026 | 425,787  |
| North  | 2027 | 1,152,745|
| North  | 2028 | 340,727  |
| North  | 2029 | 645,810  |
| North  | 2030 | 391,115  |
| North  | 2031 | 336,765  |
| South  | 2023 | 806,581  |
| South  | 2024 | 859,722  |
| South  | 2025 | 656,457  |
| South  | 2026 | 1,687,648|
| South  | 2028 | 1,013,571|
| South  | 2029 | 1,494,355|
| South  | 2030 | 522,308  |
| West   | 2023 | 1,252,562|
| West   | 2024 | 843,120  |
| West   | 2025 | 979,114  |
| West   | 2027 | 1,462,810|
| West   | 2028 | 197,609  |
| West   | 2029 | 109,376  |
| West   | 2030 | 1,673,645|

---
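
Because the model writes and runs its own analysis code, it is worth cross-checking a table like the one above locally. A minimal sketch using an in-memory SQLite database follows. The column names `region`, `year` and `revenue` are assumptions taken from the headers of the model's answer (the real CSV headers may differ), the function name is hypothetical, and the sample rows are made up for illustration.

```python
import sqlite3


def totals_per_region_year(rows):
    """Sum revenue per (region, year).

    `rows` is an iterable of (region, year, revenue) tuples.
    Returns a list of (region, year, total) tuples sorted by region, then year.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, year, SUM(revenue) FROM sales "
            "GROUP BY region, year ORDER BY region, year"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample rows, not taken from synthetic_sales_data.csv.
sample_rows = [
    ("East", 2023, 100.0),
    ("East", 2023, 50.0),
    ("West", 2024, 70.0),
]
for region, year, total in totals_per_region_year(sample_rows):
    print(region, year, total)
```

To compare against the real file, you would read the CSV rows (for example with the standard `csv` module), map them to `(region, year, revenue)` tuples, and pass them to the function.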

