# 1. Connecting to and Querying a DataSource (XLSX - SQLite)

This is a series of examples showing how you can connect Agents to different data sources with Microsoft Azure OpenAI Service.

This is the <b>SECOND</b> file of 3 similar scripts

## Note Upfront
If you want your LLM to have knowledge of your own data, you can:  
  - fine-tune your model on your data, but this is expensive (compute) and doesn't work well with data that changes.  
  - use RAG with a vector database: this suits documents, pictures, ..., but what about data in structured storage such as databases or spreadsheets?
  - use a connector to a database: this is what this module is about.
  
- In this series we show how you can set up an Agent to:
  - use LangChain agents to connect to a SQL database or CSV file
  - use the Azure OpenAI Assistants API (function calling + code interpreter, with stateful management and short-term memory)
  - use Azure OpenAI Function Calling to perform tasks based on your questions  
- The 2nd part of this exercise connects to a SQLite database, so you need SQLite installed in your environment.  
- This demo is based on https://learn.deeplearning.ai/courses/building-your-own-database-agent  

## Prerequisites
0. Set up your local repo by cloning this git repo. Don't forget to install the packages in requirements.txt (`pip install -r requirements.txt`).
1. Have a Microsoft Azure account with a valid subscription.  
    - Running through the course steps cost me < €0.50, but keep an eye on the costs (portal.azure.com > search for 'Invoices' > select 'Invoices' > Cost Management > Cost Analysis).  
    - Remove the project when no longer needed to avoid recurring costs.      
2. Have an AI Foundry project with a deployed model.
* Create a project -> Azure AI Foundry Resource  
    - choose a meaningful name, the subscription you set up, a resource group (or create a new one) and a region (I typically pick Sweden Central, as most AI models are available there)
* Pick the right URLs & credentials!  
    - copy the API key and put it in your local .env file  
    - for the endpoint, pick Azure OpenAI under 'Libraries'; it looks something like https://<project_name>-resource.openai.azure.com/  

Your .env needs to look something like this:  
AZURE_OPENAI_API_KEY=<your_api_key>  
AZURE_URL=https://<project_name>-resource.openai.azure.com/

* Deploy a model: you can pick any model; I work with gpt-4.1-mini. Put the deployment name and the API version in the .env file to keep all parameters in one location (`AZURE_OPENAI_MODEL_VERSION` is passed to the client as `api_version`):  
AZURE_OPENAI_MODEL=gpt-4.1-mini  
AZURE_OPENAI_MODEL_VERSION=2025-03-01-preview
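If one of these values is missing, `os.getenv` silently returns `None` and the problem only surfaces later, inside the client setup or on the first request. A minimal sketch of an up-front check, assuming the four variable names above; `missing_env_vars` and `require_env` are hypothetical helpers, not part of `python-dotenv` or the OpenAI SDK:

```python
import os

# Hypothetical list: the names follow the .env layout shown above.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_URL",
    "AZURE_OPENAI_MODEL",
    "AZURE_OPENAI_MODEL_VERSION",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or blank in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


def require_env(names, env=None):
    """Return {name: value} for all names, or raise one error listing every missing name."""
    env = os.environ if env is None else env
    missing = missing_env_vars(names, env)
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Usage after load_dotenv(override=True):
# settings = require_env(REQUIRED_VARS)
# client = AzureOpenAI(api_version=settings["AZURE_OPENAI_MODEL_VERSION"], ...)
```

Collecting all missing names in one error saves a round of fix-and-rerun when several entries are absent.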

## 1.3 Connecting to a database via tool calls and predefined SQL

### 1.3.1 Step 1: Setting up your Azure OpenAI client

In [4]:
import os
import pandas as pd
from IPython.display import Markdown, HTML, display
from dotenv import load_dotenv
import json

# Load environment variables from .env file
load_dotenv(override=True)   # override=True: values in your local .env file take precedence over system-set environment variables

True

In [5]:

from openai import AzureOpenAI

# Azure OpenAI Configuration (CORRECT endpoint from Azure Portal)
# endpoint = "https://js-alphacentauri-resource.cognitiveservices.azure.com/"
endpoint = os.getenv("AZURE_URL")
deployment = os.getenv("AZURE_OPENAI_MODEL")
v_model = os.getenv("AZURE_OPENAI_MODEL")
api_version = os.getenv("AZURE_OPENAI_MODEL_VERSION")  # API version passed to the client (read from .env)

# Get API key from environment
subscription_key = os.getenv("AZURE_OPENAI_API_KEY")

# if issues, uncomment the following to validate the keys are correctly read
# print("✅ Azure OpenAI client configured")
# print(f"Endpoint: {endpoint}")
# print(f"Deployment: {deployment}")
# print(f"API Version: {api_version}")    
# print(f"Subscription Key: {subscription_key}")
# print(f"API Key (1st 5 Chars): {subscription_key[:5]}...")

# Create Azure OpenAI client
client = AzureOpenAI(
    api_version=api_version,
    azure_endpoint=endpoint,
    api_key=subscription_key,
)

In [6]:
# Test the connection
response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Say hello and tell me you're working! But to lighten up the atmosphere, tell a joke about Generative AI and Langchain.",
        }
    ],
    model=deployment
)

print(response.choices[0].message.content)


Hello! I'm working hard and ready to assist you. To lighten up the mood, here’s a joke for you:

Why did the Generative AI break up with LangChain?

Because it got tired of all the endless chains - and wanted to generate some original sparks instead! 😄


### 1.3.2 Step 4: Azure OpenAI Function Calling
We define the SQL queries as Python functions and let the model choose them via function calling (with optional parameters).  
The advantages are:  
- it is more secure: no REPL is needed to run generated code  
- it is more precise: the agent doesn't have to figure out how to construct the query, which leaves less room for error  

The disadvantage is less flexibility: you have to predefine the SQL queries.
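The "more secure" point holds only if the arguments the model chooses are passed to the database as bound parameters. If they are pasted into the SQL string with an f-string, a crafted value can still change the query. A minimal sketch using the standard-library `sqlite3` module and a hypothetical in-memory table with made-up rows:

```python
import sqlite3

# Hypothetical in-memory table with a few made-up rows, loosely modeled on sales_db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_db (Region TEXT, ProductLine TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO sales_db VALUES (?, ?, ?)",
    [("East", "Clothing", 10.0), ("West", "Clothing", 20.0), ("East", "Toys", 5.0)],
)


def unsafe_profit(reg):
    # The argument is pasted into the SQL text: a crafted value can rewrite the query.
    sql = f"SELECT SUM(profit) FROM sales_db WHERE Region = '{reg}'"
    return conn.execute(sql).fetchone()[0]


def safe_profit(reg):
    # The argument is sent as a bound parameter: it is always treated as a plain value.
    sql = "SELECT SUM(profit) FROM sales_db WHERE Region = ?"
    return conn.execute(sql, (reg,)).fetchone()[0]


malicious = "East' OR '1'='1"
print(unsafe_profit("East"), safe_profit("East"))  # 15.0 15.0
print(unsafe_profit(malicious))  # 35.0: the region filter was bypassed
print(safe_profit(malicious))    # None: no region has that literal name
```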


#### 1.3.2.1 Simple tool example
Let's start with a simple function/tool definition and let the LLM decide to 'call' it.

In [7]:
# Example function: returns hard-coded (mock) weather data for a few cities
def get_current_weather(location, unit="Celsius"):
    """Get the current weather in a given location. 
    The default unit when not specified is Celsius"""
    if "merelbeke" in location.lower():
        return json.dumps(
            {"location": "Merelbeke", "country":"Belgium", "temperature": "20", "unit": unit}
        )
    elif "antwerpen" in location.lower():
        return json.dumps(
            {"location": "Antwerpen", "country":"Belgium", "temperature": "25", "unit": unit}
        )
    elif "las vegas" in location.lower():
        return json.dumps(
            {"location": "Las Vegas", "country":"USA", "temperature": "35", "unit": unit}
        )
    else:
        return json.dumps(
            {"location": location, "country":"unknown", "temperature": "unknown", "unit":  unit}
        )

get_current_weather("Merelbeke")

'{"location": "Merelbeke", "country": "Belgium", "temperature": "20", "unit": "Celsius"}'

In [8]:
# user prompt
weather_messages = [
    {"role": "user",
     "content": """What's the weather like in Merelbeke,
                   Antwerpen, and Las Vegass?"""
    }
]

#tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": """Get the current weather in a given
                              location. The default unit when not
                              specified is Celsius""",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": """The city and state,
                                        e.g. San Francisco, CA""",
                    },
                    "unit": {
                        "type": "string",
                        "default":"Celsius",
                        "enum": [ "Fahrenheit", "Celsius"],
                        "description": """The measuring unit for
                                          the temperature.
                                          If not explicitly specified
                                          the default unit is 
                                          Celsius"""
                    },
                },
                "required": ["location"],
            },
        },
    }
]
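The `required` and `enum` entries tell the model what to send, but nothing forces it to comply, so it can be worth checking the returned arguments before calling your function. A minimal sketch, assuming only the subset of JSON Schema used in the weather tool (a library such as `jsonschema` would cover the full specification); `validate_arguments` is a hypothetical helper:

```python
import json

# A trimmed copy of the weather schema, kept here so the sketch runs on its own.
weather_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "default": "Celsius", "enum": ["Fahrenheit", "Celsius"]},
    },
    "required": ["location"],
}


def validate_arguments(raw_arguments, schema):
    """Parse the model's JSON arguments, apply defaults and check required/enum fields.

    Only covers the small subset of JSON Schema used above. Returns (args, errors).
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return {}, [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return {}, ["arguments must be a JSON object"]
    errors = []
    for name, spec in schema["properties"].items():
        if name not in args and "default" in spec:
            args[name] = spec["default"]  # fill in the documented default
        if name in args and "enum" in spec and args[name] not in spec["enum"]:
            errors.append(f"{name}={args[name]!r} is not one of {spec['enum']}")
    errors += [f"missing required argument: {n}" for n in schema["required"] if n not in args]
    return args, errors


print(validate_arguments('{"location": "Merelbeke"}', weather_schema))
print(validate_arguments('{"location": "Antwerpen", "unit": "Kelvin"}', weather_schema))
```

Returning the errors instead of raising lets you send them back to the model as the tool result, so it can retry with corrected arguments.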

In [None]:
# Create the Azure OpenAI client (same settings as above)
client = AzureOpenAI(
    api_version=api_version,
    azure_endpoint=endpoint,
    api_key=subscription_key,
)

# Ask the model; with tool_choice="auto" it decides itself whether to call a tool
response = client.chat.completions.create(
    model=v_model,
    messages=weather_messages,
    tools=tools,
    tool_choice="auto", 
)

response_message = response.choices[0].message
print ("Response message: \n" , response_message.model_dump_json(indent=2), "\n\n" )
##########################################################################################################
# In tool_calls you can see that the LLM only INDICATES that we should use the get_current_weather function;
# we still need to execute it ourselves in the Python code
##########################################################################################################

tool_calls = response_message.tool_calls
print("Tool calls: \n" , tool_calls, "\n\n")
# print ("TOOLS CALLS: \n" , tool_calls.model_dump_json(indent=2) , "\n\n" )

Response message: 
 {
  "content": null,
  "refusal": null,
  "role": "assistant",
  "annotations": [],
  "audio": null,
  "function_call": null,
  "tool_calls": [
    {
      "id": "call_w5FHLIbyGosBvJCfdttANQK1",
      "function": {
        "arguments": "{\"location\": \"Merelbeke\"}",
        "name": "get_current_weather"
      },
      "type": "function"
    },
    {
      "id": "call_oyGsgAv9ZW3tiuE4RPkzShM6",
      "function": {
        "arguments": "{\"location\": \"Antwerpen\"}",
        "name": "get_current_weather"
      },
      "type": "function"
    },
    {
      "id": "call_m0796sByjrVpekoH4xSrOZAj",
      "function": {
        "arguments": "{\"location\": \"Las Vegas\"}",
        "name": "get_current_weather"
      },
      "type": "function"
    }
  ]
} 


Tool calls: 
 [ChatCompletionMessageFunctionToolCall(id='call_w5FHLIbyGosBvJCfdttANQK1', function=Function(arguments='{"location": "Merelbeke"}', name='get_current_weather'), type='function'), ChatCompletionMessageFu

In [15]:
available_functions = {
    "get_current_weather": get_current_weather,
} 

answers = []

##############################################################################################################
# Here we check whether the LLM requests function calls and, if so, which ones with which parameters
# We loop over them and call the weather function
##############################################################################################################

if tool_calls:
   
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        function_response = function_to_call(
            location=function_args.get("location"),
            unit=function_args.get("unit", "Celsius")  # the model only passed 'location' (see tool_calls above), so fall back to Celsius
        )
        answers.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            }
        )  
    print ("Answers: \n" , answers , "\n\n" ) 


def print_arguments(obj):
    """Recursively find JSON 'arguments'/'content' strings and print their key/value pairs."""
    if isinstance(obj, list):
        for item in obj:
            print_arguments(item)
    elif isinstance(obj, dict):
        for key, val in obj.items():
            # a serialized JSON object: parse it and print each field
            if key in ("arguments", "content") and isinstance(val, str) and val.strip().startswith("{"):
                try:
                    parsed = json.loads(val)
                except json.JSONDecodeError:
                    continue
                for k, v in parsed.items():
                    # 'unit' closes each weather record: add a blank line after it
                    print(f"{k}: {v} \n" if k == "unit" else f"{k}: {v}")
            else:
                print_arguments(val)

print_arguments(answers)
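The loop above assumes every tool call names a known function and carries valid JSON. A slightly more defensive sketch reports problems back to the model as tool content instead of raising. `SimpleNamespace` objects stand in for the SDK's tool-call objects (only `id`, `function.name` and `function.arguments` are used); `dispatch_tool_calls` and `fake_weather` are hypothetical names:

```python
import json
from types import SimpleNamespace


def dispatch_tool_calls(tool_calls, available_functions):
    """Run each requested tool and build one 'tool' message per tool_call_id.

    Unknown function names and malformed or unexpected arguments are reported
    back as error content instead of raising, so one bad call does not stop the loop.
    """
    messages = []
    for call in tool_calls:
        name = call.function.name
        func = available_functions.get(name)
        if func is None:
            content = {"error": f"unknown function {name!r}"}
        else:
            try:
                content = func(**json.loads(call.function.arguments))
            except (json.JSONDecodeError, TypeError) as exc:
                # TypeError also covers arguments that do not match the function signature
                content = {"error": f"bad arguments for {name}: {exc}"}
        if not isinstance(content, str):
            content = json.dumps(content)  # send the tool result as a JSON string
        messages.append({"tool_call_id": call.id, "role": "tool", "name": name, "content": content})
    return messages


# Hypothetical stand-ins for the notebook's function and the SDK objects.
def fake_weather(location, unit="Celsius"):
    return json.dumps({"location": location, "temperature": "20", "unit": unit})


calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_current_weather", arguments='{"location": "Merelbeke"}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(
        name="get_stock_price", arguments="{}")),
    SimpleNamespace(id="call_3", function=SimpleNamespace(
        name="get_current_weather", arguments='{"location": ')),
]

for msg in dispatch_tool_calls(calls, {"get_current_weather": fake_weather}):
    print(msg["tool_call_id"], msg["content"])
```

Passing `**args` straight through works here because the schema property names match the Python parameter names; the SQL tools below follow the same convention.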

Answers: 
 [{'tool_call_id': 'call_w5FHLIbyGosBvJCfdttANQK1', 'role': 'tool', 'name': 'get_current_weather', 'content': '{"location": "Merelbeke", "country": "Belgium", "temperature": "20", "unit": "Celsius"}'}, {'tool_call_id': 'call_oyGsgAv9ZW3tiuE4RPkzShM6', 'role': 'tool', 'name': 'get_current_weather', 'content': '{"location": "Antwerpen", "country": "Belgium", "temperature": "25", "unit": "Celsius"}'}, {'tool_call_id': 'call_m0796sByjrVpekoH4xSrOZAj', 'role': 'tool', 'name': 'get_current_weather', 'content': '{"location": "Las Vegas", "country": "USA", "temperature": "35", "unit": "Celsius"}'}] 


location: Merelbeke
country: Belgium
temperature: 20
unit: Celsius 

location: Antwerpen
country: Belgium
temperature: 25
unit: Celsius 

location: Las Vegas
country: USA
temperature: 35
unit: Celsius 



#### 1.3.2.2 Tool Calling for SQL Statements

In [36]:
from sqlalchemy import create_engine

# Path to your SQLite database file
# database_file_path = "./data/test.db"
database_file_path = "./data/sales_db2.db"

#read in the data from the csv file
df = pd.read_csv("./data/synthetic_sales_data.csv", sep="#").fillna(value=0)

# Create an engine to connect to the SQLite database
# SQLite only requires the path to the database file
engine = create_engine(f'sqlite:///{database_file_path}')
# file_url = "./data/all-states-history.csv"
# df = pd.read_csv(file_url).fillna(value = 0)
# df.to_sql(
#     'all_states_history',
#     con=engine,
#     if_exists='replace',
#     index=False
# )
df.to_sql(
    'sales_db',
    con=engine,
    if_exists='replace',
    index=False
)

100

In [55]:
import numpy as np
from sqlalchemy import text

# ---------------------------------------------------------------------
# These functions provide SQL access helpers for the 'sales_db' table.
# They are intended to be called by an LLM (or any automation layer)
# that wants to query specific metrics (KPI_01, profit, cost) filtered
# by region and product line.
#
# Structure:
#   a) Define the SQL query
#   b) Execute the query
#   c) Return the result as a list of dicts (records); see each function for the empty-result behaviour
#
# Note: These functions rely on a globally available SQLAlchemy 'engine'
#       and expect a 'sales_db' table with columns:
#       region, productline, KPI_01, profit, cost, etc.
# ---------------------------------------------------------------------

def Total_KPI_01(reg, prodline):
    """
    Return the KPI_01 value for a given region and product line.

    LLM_HINT:
    - Use this function when the user asks for KPI_01 data for a specific region/product line.
    - It reads from 'sales_db' and filters on both region and productline.
    - Returns a list of records (as dicts); KPI_01 is None when no data exists.

    Args:
        reg (str): The region name (e.g. 'East', 'West').
        prodline (str): The product line (e.g. 'Clothing', 'Electronics').

    Returns:
        list[dict]: The query result as a list of dictionaries.
    """
    empty = [{'Region': reg, 'ProductLine': prodline, 'KPI_01': None}]
    try:
        # :reg and :prodline are bound parameters, so LLM-supplied values never become SQL text.
        # KPI_01 is not aggregated: for groups with several rows SQLite returns the value
        # of one arbitrary row. Use SUM(KPI_01) if you need a true total.
        query = text("""
        SELECT region, productline, KPI_01
        FROM sales_db
        WHERE region = :reg AND productline = :prodline
        GROUP BY region, productline;
        """)

        with engine.connect() as connection:
            result = pd.read_sql_query(query, connection, params={"reg": reg, "prodline": prodline})

        # a JSON-friendly fallback (instead of NaN) keeps the answer loop below working
        return result.to_dict('records') if not result.empty else empty
    except Exception as e:
        print(e)
        return empty
    
    
def sum_profit_cost(reg, prodline):
    """
    Return the total (sum) of profit and cost for a given region and product line.

    LLM_HINT:
    - Use this function when the user asks for combined totals of profit and cost
      filtered by region and productline.
    - The function groups by both region and productline.

    Args:
        reg (str): The region name.
        prodline (str): The product line name.

    Returns:
        list[dict]: The query result as a list of dictionaries; the sums are None if no data exists.
    """
    empty = [{'Region': reg, 'ProductLine': prodline, 'sum(profit)': None, 'sum(cost)': None}]
    try:
        # bound parameters instead of f-string interpolation (avoids SQL injection)
        query = text("""
        SELECT region, productline, sum(profit), sum(cost)
        FROM sales_db
        WHERE region = :reg AND productline = :prodline
        GROUP BY region, productline;
        """)

        with engine.connect() as connection:
            result = pd.read_sql_query(query, connection, params={"reg": reg, "prodline": prodline})

        return result.to_dict('records') if not result.empty else empty
    except Exception as e:
        print(e)
        return empty
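`Total_KPI_01` groups by region and product line but selects `KPI_01` without an aggregate. SQLite accepts this, but for a group with several rows it returns `KPI_01` from one of them, while the tool description further down promises a sum. A minimal sketch with the standard-library `sqlite3` module and two made-up rows shows the difference:

```python
import sqlite3

# Hypothetical rows: two East/Electronics records with different KPI_01 values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_db (Region TEXT, ProductLine TEXT, KPI_01 REAL)")
conn.executemany("INSERT INTO sales_db VALUES (?, ?, ?)",
                 [("East", "Electronics", 40.0), ("East", "Electronics", 60.0)])

params = {"reg": "East", "prodline": "Electronics"}

# Bare column: SQLite accepts it but takes KPI_01 from one row of the group.
bare = conn.execute(
    "SELECT Region, ProductLine, KPI_01 FROM sales_db "
    "WHERE Region = :reg AND ProductLine = :prodline GROUP BY Region, ProductLine",
    params).fetchone()

# Aggregated: SUM/AVG make the result well defined for the whole group.
agg = conn.execute(
    "SELECT Region, ProductLine, SUM(KPI_01), AVG(KPI_01) FROM sales_db "
    "WHERE Region = :reg AND ProductLine = :prodline GROUP BY Region, ProductLine",
    params).fetchone()

print(bare)  # ('East', 'Electronics', 40.0) or ('East', 'Electronics', 60.0)
print(agg)   # ('East', 'Electronics', 100.0, 50.0)
```

Whether KPI_01 should be summed or averaged depends on what the KPI measures; that choice is up to you and is not settled by the data shown here.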


In [56]:
print(Total_KPI_01("East","Electronics"))

print(sum_profit_cost("East","Clothing"))

print(sum_profit_cost("Americas","Clothing"))

[{'Region': 'East', 'ProductLine': 'Electronics', 'KPI_01': 80.44}]
[{'Region': 'East', 'ProductLine': 'Clothing', 'sum(profit)': 632182.71, 'sum(cost)': 1705407.29}]
[{'Region': 'Americas', 'ProductLine': 'Clothing', 'sum(profit)': None, 'sum(cost)': None}]


In [57]:
sql_messages = [
    {"role": "user",
     "content": """ How much was the summed-up KPI_01 for Region East and productline Electronics?"""
    },
    {"role": "user",
     "content": """ What is the sum of the profits and costs for Region East and productline Clothing"""
    },
    {"role": "user",
     "content": """ What is the sum of the profits and costs for Region Americas and productline Clothing"""
    },
    
]

In [58]:
tools_sql = [
    {
        "type": "function",
        "function": {
            "name": "get_total_kpi_01_for_region_productline",
            "description": """Retrieves the sum of the KPI_01 for a specified region and specified productline.""",
            "parameters": {
                "type": "object",
                "properties": {
                    "reg": {
                        "type": "string",
                        "description": """The name of the region
                                          (e.g., 'East', 'West')."""
                    },
                    "prodline": {
                        "type": "string",
                        "description": """The name of the productline 
                                          (e.g. 'Toys','Clothing')."""
                    }
                },
                "required": ["reg", "prodline"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_sum_profit_cost_for_region_productline",
            "description": """Retrieves the sum of the profits and the costs for a specified region and specified productline.""",
            "parameters": {
                "type": "object",
                "properties": {
                    "reg": {
                        "type": "string",
                        "description": """The name of the region
                                          (e.g., 'East', 'West')."""
                    },
                    "prodline": {
                        "type": "string",
                        "description": """The name of the productline 
                                          (e.g. 'Toys','Clothing')."""
                    }
                },
                "required": ["reg", "prodline"]
            }
        }
    }
]
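The tool names in `tools_sql`, the keys of `available_functions` and the parameter names (`reg`, `prodline`) all have to line up, otherwise the dispatch step breaks. A minimal sketch of a consistency check using `inspect.signature`; `check_tool_registry`, `make_tool` and the stub functions are hypothetical:

```python
import inspect


def check_tool_registry(tools, available_functions):
    """List mismatches between the tool schemas and the registered Python functions."""
    problems = []
    for tool in tools:
        spec = tool["function"]
        func = available_functions.get(spec["name"])
        if func is None:
            problems.append(f"{spec['name']}: no Python function registered")
            continue
        params = inspect.signature(func).parameters
        for prop in spec["parameters"]["properties"]:
            if prop not in params:
                problems.append(f"{spec['name']}: schema property {prop!r} not in {func.__name__}()")
    return problems


# Hypothetical stubs; the second one deliberately uses a different parameter name.
def kpi_stub(reg, prodline): ...
def profit_cost_stub(reg, product_line): ...


def make_tool(name):
    # trimmed version of the entries in tools_sql
    return {"type": "function", "function": {
        "name": name,
        "parameters": {"type": "object",
                       "properties": {"reg": {"type": "string"}, "prodline": {"type": "string"}},
                       "required": ["reg", "prodline"]}}}


tools_demo = [make_tool("get_total_kpi_01_for_region_productline"),
              make_tool("get_sum_profit_cost_for_region_productline")]
registry = {"get_total_kpi_01_for_region_productline": kpi_stub,
            "get_sum_profit_cost_for_region_productline": profit_cost_stub}
print(check_tool_registry(tools_demo, registry))
```

In the notebook you could run `check_tool_registry(tools_sql, available_functions)` once `available_functions` is defined in the next cell and expect an empty list.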

In [61]:
# Send all three questions at once; the model can return several tool calls in one response
response = client.chat.completions.create(
    model=v_model,
    messages=sql_messages,
    tools=tools_sql,
    tool_choice="auto", 
)

response_sql_message = response.choices[0].message
tool_calls = response_sql_message.tool_calls

print (tool_calls)
sql_answers = []

available_functions = {
    "get_total_kpi_01_for_region_productline": Total_KPI_01,
    "get_sum_profit_cost_for_region_productline":sum_profit_cost
}  

if tool_calls:
   
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        function_response = function_to_call(
            reg=function_args.get("reg"),
            prodline=function_args.get("prodline"),
        )
        sql_answers.append(
            {
                "tool_call_id": tool_call.id,
                "role": "tool",
                "name": function_name,
                "content": json.dumps(function_response),
            }
        ) 
    print ("Answers: \n" , sql_answers , "\n\n" ) 

[ChatCompletionMessageFunctionToolCall(id='call_DW5a9FBBJ897Uzw55NB5eSpt', function=Function(arguments='{"reg": "East", "prodline": "Electronics"}', name='get_total_kpi_01_for_region_productline'), type='function'), ChatCompletionMessageFunctionToolCall(id='call_l2eoOq6txC4vT0Eh1EYcBJsL', function=Function(arguments='{"reg": "East", "prodline": "Clothing"}', name='get_sum_profit_cost_for_region_productline'), type='function'), ChatCompletionMessageFunctionToolCall(id='call_bsPXNPzUzGl0OXqOOkml3ziY', function=Function(arguments='{"reg": "Americas", "prodline": "Clothing"}', name='get_sum_profit_cost_for_region_productline'), type='function')]
Answers: 
 [{'tool_call_id': 'call_DW5a9FBBJ897Uzw55NB5eSpt', 'role': 'tool', 'name': 'get_total_kpi_01_for_region_productline', 'content': '[{"Region": "East", "ProductLine": "Electronics", "KPI_01": 80.44}]'}, {'tool_call_id': 'call_l2eoOq6txC4vT0Eh1EYcBJsL', 'role': 'tool', 'name': 'get_sum_profit_cost_for_region_productline', 'content': '[{"Reg

In [64]:
for ans in sql_answers:
    content = json.loads(ans['content'])
    # handle both list and dict formats safely
    data = content[0] if isinstance(content, list) else content

    print(f"- Request: {ans['name']}")
    print(f"- Region: {data.get('Region', '')}")
    print(f"- ProdLine: {data.get('ProductLine', '')}")
    print(f"- KPI_01: {data.get('KPI_01', '')}")
    print(f"- Sum of profit: {data.get('sum(profit)', '')}")
    print(f"- Sum of Cost: {data.get('sum(cost)', '')} \n\n")

- Request: get_total_kpi_01_for_region_productline
- Region: East
- ProdLine: Electronics
- KPI_01: 80.44
- Sum of profit: 
- Sum of Cost:  


- Request: get_sum_profit_cost_for_region_productline
- Region: East
- ProdLine: Clothing
- KPI_01: 
- Sum of profit: 632182.71
- Sum of Cost: 1705407.29 


- Request: get_sum_profit_cost_for_region_productline
- Region: Americas
- ProdLine: Clothing
- KPI_01: 
- Sum of profit: None
- Sum of Cost: None 




In [71]:
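# Note: this call only resends the user questions. The assistant turn with the tool calls
# (response_sql_message) and the tool results (sql_answers) are missing, so the model has
# no data to answer from and asks for it, as the output below shows.
second_response = client.chat.completions.create(
    model=v_model,
    messages=sql_messages,
)
print(second_response)
print(json.dumps(second_response, default=lambda o: getattr(o, '__dict__', str(o)), indent=2))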


ChatCompletion(id='chatcmpl-CXqlpUKpAR50YP3m1KUNNH3rkRf1t', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Could you please provide the data or the table that contains the information for KPI_01, profits, costs, regions, and product lines? This will help me calculate the sums accurately.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None), content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'protected_material_text': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1762183709, model='gpt-4.1-mini-2025-04-14', object='chat.completion', service_tier=None, system_fingerprint='fp_3dcd5944f5', usage=CompletionUsage(completion_tokens=38, prompt_tokens=64, total_tokens=102, completion_tokens_details=CompletionTokens
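To get an actual answer, the second request has to include the assistant turn that asked for the tools, followed by one `tool` message per `tool_call_id`. A minimal sketch of assembling that history with plain dicts (the sample values are copied from the outputs above; `build_followup_messages` is a hypothetical helper):

```python
import json


def build_followup_messages(user_messages, assistant_message, tool_messages):
    """Assemble the history for the second request.

    The assistant turn that requested the tools must come right before the
    'tool' messages, and every tool_call_id needs exactly one answer.
    """
    requested = sorted(call["id"] for call in assistant_message.get("tool_calls", []))
    answered = sorted(msg["tool_call_id"] for msg in tool_messages)
    if requested != answered:
        raise ValueError(f"tool calls {requested} and answers {answered} do not match")
    return [*user_messages, assistant_message, *tool_messages]


# Sample values copied from the outputs above.
user_messages = [{"role": "user",
                  "content": "What is the sum of the profits and costs for Region East and productline Clothing"}]
assistant_message = {"role": "assistant", "content": None, "tool_calls": [{
    "id": "call_l2eoOq6txC4vT0Eh1EYcBJsL", "type": "function",
    "function": {"name": "get_sum_profit_cost_for_region_productline",
                 "arguments": json.dumps({"reg": "East", "prodline": "Clothing"})}}]}
tool_messages = [{"role": "tool", "tool_call_id": "call_l2eoOq6txC4vT0Eh1EYcBJsL",
                  "content": json.dumps([{"Region": "East", "ProductLine": "Clothing",
                                          "sum(profit)": 632182.71, "sum(cost)": 1705407.29}])}]

messages = build_followup_messages(user_messages, assistant_message, tool_messages)
print([m["role"] for m in messages])  # ['user', 'assistant', 'tool']
```

In the notebook this would roughly become `build_followup_messages(sql_messages, assistant_message, sql_answers)`, with `assistant_message` built as `{"role": "assistant", "content": response_sql_message.content, "tool_calls": [tc.model_dump() for tc in response_sql_message.tool_calls]}` and the result passed as `messages=` to `client.chat.completions.create`; treat that wiring as an untested sketch.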