# BUSINESS SCIENCE GENERATIVE AI/ML TIPS
### AI-TIP 001 | HOW TO MAKE YOUR FIRST DATA ANALYSIS AGENT WITH PYTHON

**GOALS:**
- Create a data analysis agent that can answer questions about a dataset
- Use OpenAI's API to interact with the data

In [None]:
# Libraries
from langchain_openai import ChatOpenAI
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.agents.agent_types import AgentType

import os
from pathlib import Path
import sys
import yaml
from pprint import pprint
import pandas as pd

# Add the directory to the system path if not already present
current_dir = Path.cwd() / '001_pandas_dataframe_agent'
if str(current_dir) not in sys.path:
    sys.path.append(str(current_dir))

# Now I can import from the utils.parsers module
from utils.parsers import parse_json_to_dataframe

## 1.0 SET UP OPENAI API

Set up the OpenAI API key and model.

In [6]:
# OPENAI SETUP
model = 'gpt-4o-mini'

# Set your OpenAI API key from a local credentials file (expects an `openai` entry)
with open('../credentials.yml') as f:
    os.environ['OPENAI_API_KEY'] = yaml.safe_load(f)['openai']
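
The setup cell assumes that `../credentials.yml` exists and contains an `openai` entry. As a minimal sketch of a more defensive variant, the key could be taken from an already-set environment variable first, with a clear error when nothing is configured. The helper name `resolve_api_key` is hypothetical and not part of the original notebook.

```python
import os


def resolve_api_key(env_var="OPENAI_API_KEY", fallback=None):
    """Return the API key from the environment, or `fallback` if it is unset.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    key = os.environ.get(env_var) or fallback
    if not key:
        raise RuntimeError(
            f"No API key found: set {env_var} or pass a fallback value."
        )
    return key
```

One possible use is `os.environ['OPENAI_API_KEY'] = resolve_api_key(fallback=credentials.get('openai'))`, where `credentials` stands for the dictionary loaded from the YAML file.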

## 2.0 CREATE THE DATA ANALYSIS AGENT

- Load a CSV file of your data
- Create an agent that can answer questions about the data
- Set the return format to JSON
- Run the agent
- Parse the JSON

### Load your dataset

In [7]:
# Load your dataset
df = pd.read_csv('001_data_analysis_agent/data/customer_data.csv')
df

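|     | TransactionID | Date       | CustomerID | Product   | Quantity | Price | Sales | Geography |
|-----|---------------|------------|------------|-----------|----------|-------|-------|-----------|
| 0   | 1             | 2023-01-01 | C021       | Product C | 4        | 50.0  | 200.0 | North     |
| 1   | 2             | 2023-01-02 | C021       | Product B | 7        | 20.0  | 140.0 | North     |
| 2   | 3             | 2023-01-03 | C030       | Product E | 6        | 30.0  | 180.0 | North     |
| 3   | 4             | 2023-01-04 | C010       | Product D | 8        | 25.0  | 200.0 | South     |
| 4   | 5             | 2023-01-05 | C008       | Product A | 4        | 15.0  | 60.0  | West      |
| ... | ...           | ...        | ...        | ...       | ...      | ...   | ...   | ...       |
| 95  | 96            | 2023-04-06 | C016       | Product D | 5        | 25.0  | 125.0 | South     |
| 96  | 97            | 2023-04-07 | C019       | Product B | 9        | 20.0  | 180.0 | East      |
| 97  | 98            | 2023-04-08 | C002       | Product B | 8        | 20.0  | 160.0 | East      |
| 98  | 99            | 2023-04-09 | C006       | Product C | 1        | 50.0  | 50.0  | South     |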


### Create the data analysis agent

In [8]:
# Initialize the LLM
llm = ChatOpenAI(
    model_name=model,
    temperature=0
)

# Create an agent that can interact with the Pandas DataFrame
data_analysis_agent = create_pandas_dataframe_agent(
    llm, 
    df, 
    agent_type=AgentType.OPENAI_FUNCTIONS,
    suffix="Always return a JSON dictionary that can be parsed into a data frame containing the requested information.",
    verbose=True,
    allow_dangerous_code=True
)
data_analysis_agent

AgentExecutor(verbose=True, agent=RunnableAgent(runnable=RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_to_openai_function_messages(x['intermediate_steps']))
})
| ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], input_types={'agent_scratchpad': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annot

### Invoke (Run) the Data Analysis Agent

In [9]:
# Run the agent
response = data_analysis_agent.invoke("What are the total sales by geography?")
response

pprint(response['output'])



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': "df.groupby('geography')['sales'].sum().reset_index().to_json(orient='records')"}`

KeyError: 'geography'

Invoking: `python_repl_ast` with `{'query': 'df.columns.tolist()'}`

['TransactionID', 'Date', 'CustomerID', 'Product', 'Quantity', 'Price', 'Sales', 'Geography']

Invoking: `python_repl_ast` with `{'query': "df.groupby('Geography')['Sales'].sum().reset_index().to_json(orient='records')"}`

[{"Geography":"East","Sales":3070.0},{"Geography":"North","Sales":3440.0},{"Geography":"South","Sales":3825.0},{"Geography":"West","Sales":3830.0}]

Here is the total sales by geography:

```json
[
    {"Geography": "East", "Sales": 3070.0},
    {"Geography": "North", "Sales": 3440.0},
    {"Geography": "South", "Sales": 3825.0},
    {"Geography": "West", "Sales": 3830.0}
]
```

> Fi
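
In the trace, the agent's first attempt fails with `KeyError: 'geography'` because it guessed lowercase column names, and it only succeeds after listing the columns. One thing to try is to state the exact column names in the question itself. The sketch below is an illustration only: `build_question` is a hypothetical helper, and whether it saves the extra tool call depends on the model.

```python
def build_question(question, columns):
    """Append the exact column names to a question (hypothetical helper)."""
    column_list = ", ".join(f"'{name}'" for name in columns)
    return (
        f"{question}\n"
        f"Use these exact, case-sensitive column names: {column_list}."
    )


# Column names as reported by the agent's `df.columns.tolist()` call.
columns = ['TransactionID', 'Date', 'CustomerID', 'Product',
           'Quantity', 'Price', 'Sales', 'Geography']
question = build_question("What are the total sales by geography?", columns)
print(question)
```

The resulting string could be passed to `data_analysis_agent.invoke(question)` in place of the bare question; in the notebook, `list(df.columns)` would supply the names instead of the hard-coded list.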

### Post Processing: Convert the text response to a data frame

In [12]:
# Parse the JSON
response_df = parse_json_to_dataframe(response['output'])
response_df

|     | Geography | Sales  |
|-----|-----------|--------|
| 0   | East      | 3070.0 |
| 1   | North     | 3440.0 |
| 2   | South     | 3825.0 |
| 3   | West      | 3830.0 |
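
The notebook does not show the source of `parse_json_to_dataframe`, so the following is only a sketch of how such a parsing step might work, not the author's implementation. It assumes the agent's answer contains either a fenced JSON block or bare JSON, and it uses a hypothetical function name, `extract_json_records`, to avoid confusion with the real helper.

```python
import json
import re

# Three backticks, built this way so the listing does not contain a literal fence.
FENCE = "`" * 3


def extract_json_records(text):
    """Pull the first JSON array or object out of an LLM text response.

    Hypothetical stand-in for illustration; the notebook's own
    `parse_json_to_dataframe` helper is not shown and may work differently.
    """
    # Prefer the contents of a fenced block (optionally tagged `json`).
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, flags=re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Within that text, take the outermost [...] or {...} span.
    match = re.search(r"(\[.*\]|\{.*\})", candidate, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON array or object found in the response.")
    return json.loads(match.group(1))


# A shortened response in the same shape as the agent's answer.
sample = (
    "Here is the total sales by geography:\n\n"
    f"{FENCE}json\n"
    '[{"Geography": "East", "Sales": 3070.0},\n'
    ' {"Geography": "North", "Sales": 3440.0}]\n'
    f"{FENCE}"
)
records = extract_json_records(sample)
print(records)
```

A list of records like this can then be turned into a data frame with `pd.DataFrame(records)`.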


### Make a visualization from the AI Agent's response

In [15]:
# Make a visualization

import plotly.express as px

fig = px.bar(response_df, x='Geography', y='Sales', title='Sales by Geography')
fig

## 3.0 WANT TO LEARN HOW TO USE GENERATIVE AI AND LLMS FOR DATA SCIENCE?

- Join My Live 8-Week AI For Data Scientists Bootcamp
- Live Cohorts are happening once per quarter. Schedule:
    - Week 1: Live Kickoff Clinic + Local LLM Training + AI Fast Track
    - Week 2: Retrieval Augmented Generation (RAG) For Data Scientists
    - Week 3: Business Intelligence AI Copilot (SQL + Pandas Tools)
    - Week 4: Customer Analytics Agent Team (Multi-Agent Workflows)
    - Week 5: Time Series Forecasting Agent Team (Multi-Agent Machine Learning Workflows)
    - Week 6: LLM Model Deployment With AWS Bedrock
    - Week 7: Fine-Tuning LLM Models & RAG Deployments With AWS Bedrock
    - Week 8: AI App Deployment With AWS Cloud (Docker, EC2, NGINX)

- Enroll here: [https://learn.business-science.io/generative-ai-bootcamp-enroll](https://learn.business-science.io/generative-ai-bootcamp-enroll)