## TODOs

- [x] take an Excel file and convert it to markdown 
- [  ] then upload to the service 
- [  ] to use File Search
 

# **Sales Analyst Agent**

## **About the Scenario**
This notebook demonstrates a practical use case where an AI Agent is leveraged to gain actionable insights and address key analytical questions related to AdventureWorks sales analysis. It combines file search, computation, and data visualization to streamline decision-making.

The scenario utilizes an AI Agent integrated with two powerful tools: *File Search* and *Code Interpreter*. These tools work together to retrieve sales data from within an Excel Workbook and calculate sales metrics, replicating real-world workflows in retail sales.

### Key Steps:
1. *File Conversion*: Convert Excel Workboot into Markdown tables.
2. *Upload Data*:Import a Markdown file containing the tables into the OpenAI Project.
3. *Perform Sales Analysis*: Leverage *File Search* and *Code Interpreter to compute key sales metrics and insights.
4. *Generate Report*: Leverage *Code Interpreter* to generate sales insights visualizations and leverage Python libraries to render a report.

## **Data**
This scenario uses files from the folder [`data/`](./data/) in this repo. You can clone this repo or copy this folder to make sure you have access to these files when running the sample.

## **Time**
You should expect to spend 10-15 minutes building and running this scenario. 

## **Before you begin**

#### Step 1: Install required libraries
Install dependencies directly within a Jupyter notebook is a good practice because it ensures that all required packages are installed in the correct versions, making the notebook self-contained and reproducible. This approach helps other users or collaborators to set up the environment quickly and avoid potential issues related to missing or incompatible packages.

In [3]:
# Install the packages
%pip install -r ./requirements.txt

print("\nPackages installed successfully.")

Processing c:\developer\repos\local\notebooks\packages\azure_ai_projects-1.0.0b1-py3-none-any.whl (from -r ./requirements.txt (line 7))
azure-ai-projects is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
Note: you may need to restart the kernel to use updated packages.

Packages installed successfully.


#### Step 2: Setting up the environment
Before we begin, we need to load the necessary environment variables from a `.env` file. These variables include sensitive information such as API keys and endpoint URLs, which are crucial for running the code successfully.

Here’s what you need to do:
- Ensure your `.env` file is properly configured in the `.venv/.env` format. We have provided an template `.env` file, `.env.example` for your reference.
- Verify that all required secrets are included in the file before running the code.


The `.env` file must contain the following secrets:
- PROJECT_CONNECTION_STRING: URL to connect to the Azure OpenAI Project to access project resources.
- AZURE_OPENAI_DEPLOYMENT: The name of the Azure OpenAI model deployment.

Now, let’s load these variables and get started!

<code style="background:yellow;color:black">Note: Make sure to keep your `.env` file secure and avoid sharing it publicly. </code>

*For more information about leveraging Python Virtual Environments can be found [here](https://docs.python.org/3/library/venv.html).*

In [4]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Retrieve the secrets
__PROJECT_CONNECTION_STRING = os.getenv("PROJECT_CONNECTION_STRING")
__AZURE_OPENAI_DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT")

# Verify environment variables
if not all([__PROJECT_CONNECTION_STRING, __AZURE_OPENAI_DEPLOYMENT]):
    raise EnvironmentError("One or more environment variables are missing. Please check the .env file.")
else:
    print("Environment variables loaded successfully.")

Environment variables loaded successfully.


In [2]:
!az account show

{
  "environmentName": "AzureCloud",
  "homeTenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
  "id": "921496dc-987f-410f-bd57-426eb2611356",
  "isDefault": true,
  "managedByTenants": [],
  "name": "Azure OpenAI - Agents - Experiments",
  "state": "Enabled",
  "tenantDefaultDomain": "microsoft.onmicrosoft.com",
  "tenantDisplayName": "Microsoft",
  "tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
  "user": {
    "name": "johnsonjes@microsoft.com",
    "type": "user"
  }
}


In [1]:
!az ad signed-in-user show

{
  "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#users/$entity",
  "businessPhones": [],
  "displayName": "Jess Johnson",
  "givenName": "Jess",
  "id": "196fc068-4019-48a4-a185-9339ef62e6b2",
  "jobTitle": "Senior Customer Engineer",
  "mail": "jess.johnson@microsoft.com",
  "mobilePhone": null,
  "officeLocation": "PITTSBURGH-910 RIVER/Mobile",
  "preferredLanguage": null,
  "surname": "Johnson",
  "userPrincipalName": "johnsonjes@microsoft.com"
}


## **Azure OpenAI Agent Setup**

#### Step 1: Initializing the Azure Agent Runtime Client

Next, we’ll initialize the Azure Agent Runtime client. This client allows us to interact with Azure OpenAI Project Agents. We will use a `DefaultAzureCredential` to authenticate, meaning you will have to be logged in with the Azure CLI.

In [5]:
import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Initialize the Azure AI Project client
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["PROJECT_CONNECTION_STRING"],
)

agent_client = project_client.agents

print("Agent client created successfully.")

Agent client created successfully.


### Step 2: Define `excel_to_markdown_tables` function

In [6]:
import os
import pandas as pd

def excel_to_markdown_tables(excel_file: str, data_dir_path: str="data") -> None:
    """
    Converts each sheet of an Excel file to separate Markdown table files.

    Parameters:
        excel_file (str): Path to the Excel file.
    """
    try:
        

        # Load the Excel file
        workbook = pd.ExcelFile(excel_file)
        print(f"Workbook '{excel_file}' successfully loaded.")

        # Get the base name of the Excel file without the extension
        base_name = os.path.splitext(os.path.basename(excel_file))[0]

        print(f"Base name of the Excel file: {base_name}")

        # Create a directory with the same name as the workbook, if it doesn't exist
        output_dir = os.path.join(data_dir_path, base_name)
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
            print(f"Output directory '{output_dir}' created.")
        else:
            print(f"Output directory '{output_dir}' already exists.")

        
        # Iterate through each sheet in the workbook
        for sheet_name in workbook.sheet_names:
            print(f"Processing sheet: {sheet_name}")
            
            # Read the sheet into a DataFrame
            df = workbook.parse(sheet_name)
            
            # Convert the DataFrame to a Markdown table
            md = df.to_markdown(index=False, tablefmt="pipe")
            
            # Define the output file name
            output_file = os.path.join(output_dir, f"{sheet_name}.md")
            
            # Write the Markdown table to the file, overwriting if it exists
            with open(output_file, "w") as f:
                f.write(md)
            
            print(f"Markdown table for sheet '{sheet_name}' saved to '{output_file}'.")
            
        print("All sheets processed successfully.")
       

    except FileNotFoundError:
        print(f"Error: The file '{excel_file}' was not found.")
        raise

    except Exception as e:
        print(f"An error occurred while processing the Excel file: {e}")
        raise

print("Function 'excel_to_markdown_tables' created successfully.")

Function 'excel_to_markdown_tables' created successfully.


In [7]:
# Convert the Excel file to a Markdown table
excel_to_markdown_tables("data/AdventureWorksSales.xlsx", "data")
print("\nExcel file converted to Markdown tables successfully.")

Workbook 'data/AdventureWorksSales.xlsx' successfully loaded.
Base name of the Excel file: AdventureWorksSales
Output directory 'data\AdventureWorksSales' already exists.
Processing sheet: Sales Order_data
Markdown table for sheet 'Sales Order_data' saved to 'data\AdventureWorksSales\Sales Order_data.md'.
Processing sheet: Sales Territory_data
Markdown table for sheet 'Sales Territory_data' saved to 'data\AdventureWorksSales\Sales Territory_data.md'.
Processing sheet: Sales_data
Markdown table for sheet 'Sales_data' saved to 'data\AdventureWorksSales\Sales_data.md'.
Processing sheet: Reseller_data
Markdown table for sheet 'Reseller_data' saved to 'data\AdventureWorksSales\Reseller_data.md'.
Processing sheet: Date_data
Markdown table for sheet 'Date_data' saved to 'data\AdventureWorksSales\Date_data.md'.
Processing sheet: Product_data
Markdown table for sheet 'Product_data' saved to 'data\AdventureWorksSales\Product_data.md'.
Processing sheet: Customer_data
Markdown table for sheet 'Cus

### Step 3: Define `upload_dir_markdown` function

In [8]:
"""
1 - pass in a directory path to the files to be uploaded
2 - iterate through the files in the directory if the file is not markdown, skip
3 - check if the file type and name exists already in the project
4 - if it does, delete the cloud file
5 - upload the local file to the cloud
"""

def upload_markdown(local_dir: str, agent) -> list:
    uploaded_files = []

    # Get the list of files in the local directory
    local_files = os.listdir(local_dir)
    print(f"Local files: {local_files}")

    # Get the list of files in the cloud directory
    cloud_files = agent.list_files()
    print(f"Cloud files: {cloud_files}")

    # Iterate through the local files
    for local_file in local_files:
        print(f"Processing local file: {local_file}")

        # Check if the file is a Markdown file
        if not local_file.endswith(".md"):
            print(f"Skipping non-Markdown file: {local_file}")
            continue
        else:
            print(f"Processing Markdown file: {local_file}")

            # Check if the file already exists in the cloud
            for cloud_file in cloud_files.data:
                if cloud_file.filename == local_file:
                    print(f"Deleting existing cloud file: {cloud_file.filename}")
                    #agent.delete_file(cloud_file.filename)

    return uploaded_files

files = upload_markdown("data/AdventureWorksSales", agent_client)

Local files: ['Customer_data.md', 'Date_data.md', 'Product_data.md', 'Reseller_data.md', 'Sales Order_data.md', 'Sales Territory_data.md', 'Sales_data.md', 'skip_file_test.txt']
Cloud files: {'object': 'list', 'data': [{'object': 'file', 'id': 'assistant-Ao0m9ODz7l8T9ytdU8OJK72d', 'purpose': 'assistants', 'filename': 'Customer_data.md', 'bytes': 2791535, 'created_at': 1732301842, 'status': 'processed', 'status_details': None}, {'object': 'file', 'id': 'assistant-cxxskDp3s4LfzQVnA2TmGqLf', 'purpose': 'assistants', 'filename': 'Customer_data.md', 'bytes': 2791535, 'created_at': 1732301840, 'status': 'processed', 'status_details': None}, {'object': 'file', 'id': 'assistant-WrK3C4ppdShoGIoz6xasNPeu', 'purpose': 'assistants', 'filename': 'Customer_data.md', 'bytes': 2791535, 'created_at': 1732301840, 'status': 'processed', 'status_details': None}, {'object': 'file', 'id': 'assistant-ZQG4NygtbFuJK9guZz8GutH1', 'purpose': 'assistants', 'filename': 'Customer_data.md', 'bytes': 2791535, 'create

In [91]:
import os

def upload_dir_markdown(dir: str, agent) -> list:
    """
    Uploads all Markdown files in a directory to the Azure AI Project.

    Parameters:
        dir (str): Directory containing the Markdown files.
    """
    try:
        # List of uploaded file IDs
        file_ids = []

        # Get the list of files in the directory
        local_files = os.listdir(dir)
        print(f"Files in directory '{dir}': {local_files}")

        cloud_files = agent_client.list_files()

        # Iterate through each file in the directory
        for local_file in local_files:
            print("Local file: ", local_file)
           
            if local_file.endswith(".md"):

                print("Cloud Files: ", cloud_files.data)
                for cloud_file in cloud_files.data:

                    """
                    # Delete existing file on Azure if it has the same name and purpose
                    if cloud_file.filename.endswith(".md") and cloud_file.purpose == "assistants" and cloud_file.filename == local_file:
                        agent.delete_file(cloud_file.id)
                        print(f"Deleted existing file: {cloud_file.filename}")

                     # Upload new file
                    with open(os.path.join(dir,local_file), "rb") as file_data:
                        f = agent.upload_file(file=file_data, purpose="assistants")

                        # Append file ID to the list
                        file_ids.append(f.id)
                    """
                """
                for service_file in existing_files.data:
                    if service_file.filename == local_file and service_file.purpose == "assistants":
                        #agent.delete_file(service_file.id)
                        print(f"Deleted existing file: {local_file}")

                
                    # Upload new file
                    with open(os.path.join(dir,local_file), "rb") as file_data:
                        f = agent.upload_file(file=file_data, purpose="assistants")

                        # Append file ID to the list
                        file_ids.append(f.id)

                    print(f"Uploaded file: {f.filename}")
                """
        return file_ids
    
    except FileNotFoundError:
        print(f"Error: The directory '{dir}' was not found.")
        raise

    except Exception as e:
        print(f"An error occurred while uploading the files: {e}")
        raise

print("Function 'upload_dir_markdown' created successfully.")

"""
# Directory containing files to upload
directory="data"
sales_file="sales_tables.md"
sales_file_id=None

# Check if the directory exists
if not os.path.isdir(directory):
    print(f"Directory '{directory}' does not exist.")
    raise FileNotFoundError(f"Directory '{directory}' does not exist.")

file_path = os.path.join(directory, sales_file)

# Check if the file exists
if not os.path.isfile(file_path):
    print(f"Skipping non-file item: {sales_file}")

try:
    # Delete existing file on Azure if it has the same name and purpose
    existing_files = agent_client.list_files()
    for f in existing_files.data:
        if f.filename == sales_file and f.purpose == "assistants":
            agent_client.delete_file(f.id)
            print(f"Deleted existing file: {sales_file}")

    # Upload new file
    with open(file_path, "rb") as file_data:
        file = agent_client.upload_file(file=file_data, purpose="assistants")
        portfolio_file_id = file.id
    print(f"Uploaded file: {sales_file}")
except Exception as e:
    print(f"Error processing file '{sales_file}': {e}")
"""

Function 'upload_dir_markdown' created successfully.


'\n# Directory containing files to upload\ndirectory="data"\nsales_file="sales_tables.md"\nsales_file_id=None\n\n# Check if the directory exists\nif not os.path.isdir(directory):\n    print(f"Directory \'{directory}\' does not exist.")\n    raise FileNotFoundError(f"Directory \'{directory}\' does not exist.")\n\nfile_path = os.path.join(directory, sales_file)\n\n# Check if the file exists\nif not os.path.isfile(file_path):\n    print(f"Skipping non-file item: {sales_file}")\n\ntry:\n    # Delete existing file on Azure if it has the same name and purpose\n    existing_files = agent_client.list_files()\n    for f in existing_files.data:\n        if f.filename == sales_file and f.purpose == "assistants":\n            agent_client.delete_file(f.id)\n            print(f"Deleted existing file: {sales_file}")\n\n    # Upload new file\n    with open(file_path, "rb") as file_data:\n        file = agent_client.upload_file(file=file_data, purpose="assistants")\n        portfolio_file_id = file.id

## **Azure OpenAI Agent Workflow**

In [90]:
uploaded_file_ids = upload_dir_markdown("data/AdventureWorksSales", agent_client)
print("\nMarkdown files uploaded successfully.")

print("Uploaded file IDs:", uploaded_file_ids)

Files in directory 'data/AdventureWorksSales': ['Customer_data.md', 'Date_data.md', 'Product_data.md', 'Reseller_data.md', 'Sales Order_data.md', 'Sales Territory_data.md', 'Sales_data.md']
Local file:  Customer_data.md
Cloud Files:  [{'object': 'file', 'id': 'assistant-Ao0m9ODz7l8T9ytdU8OJK72d', 'purpose': 'assistants', 'filename': 'Customer_data.md', 'bytes': 2791535, 'created_at': 1732301842, 'status': 'processed', 'status_details': None}, {'object': 'file', 'id': 'assistant-cxxskDp3s4LfzQVnA2TmGqLf', 'purpose': 'assistants', 'filename': 'Customer_data.md', 'bytes': 2791535, 'created_at': 1732301840, 'status': 'processed', 'status_details': None}, {'object': 'file', 'id': 'assistant-WrK3C4ppdShoGIoz6xasNPeu', 'purpose': 'assistants', 'filename': 'Customer_data.md', 'bytes': 2791535, 'created_at': 1732301840, 'status': 'processed', 'status_details': None}, {'object': 'file', 'id': 'assistant-ZQG4NygtbFuJK9guZz8GutH1', 'purpose': 'assistants', 'filename': 'Customer_data.md', 'bytes': 

vector store gets created when file is uploaded and attached to the message tag as file_search from thread and conversation scope