In [1]:
import os

from aita.datasource.snowflake import SnowflakeDataSource
from aita.datasource.postgresql import PostgreSqlDataSource
from aita.agent.base import AitaAgent
from aita.agent.sql import SqlAgent

In [2]:
# Ensure the OpenAI API key is set in the environment
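The cell above only states the requirement, so a minimal sketch of a fail-fast check follows. It assumes the key is read from the `OPENAI_API_KEY` environment variable (the name the OpenAI Python client uses by default); `ensure_api_key` is a hypothetical helper and is not part of aita.

```python
import os


def ensure_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    # Assumption: the agent picks the key up from this environment variable.
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"{var_name} is not set; export it before creating an agent.")
    return api_key
```

Calling `ensure_api_key()` before constructing an agent surfaces a missing key immediately instead of at the first chat request.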

In [3]:
# Create a simple Aita agent with the GPT-3.5-turbo model
aita_agent = AitaAgent("gpt-3.5-turbo")
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To find the top 5 customers who have made the most purchases, you would typically analyze your sales data. The process may vary depending on the format and structure of your data, but here is a general approach using SQL as an example:

Assuming you have a table named `orders` that contains information about customer orders, including the customer ID and the total amount of each purchase, you can write a SQL query to group the data by customer ID, sum the total purchase amounts for each customer, and then rank the customers based on their total purchase amounts. Here is an example query that could be used:

```sql
SELECT customer_id, SUM(total_amount) AS total_purchase_amount
FROM orders
GROUP BY customer_id
ORDER BY total_purchase_amount DESC
LIMIT 5;
```

In this query:
- `customer_id` is the customer identifier.
- `total_amount` is the total amount of each purchase.
- We are grouping the data by `customer_id` and calculating the total purchase amount for each customer using the `SUM

In [4]:
# Example of using a customized prompt template
aita_agent = AitaAgent("gpt-3.5-turbo") \
    .set_prompt_template("Data Schema: snowflake_sample_data.tpch_sf1")
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To identify the top 5 customers making the most purchases, you would typically need access to a dataset that includes information about customer transactions. The dataset would include details such as customer ID, purchase amount, purchase date, etc.

Here is a general approach to finding the top 5 customers making the most purchases:

1. **Data Preparation**: Load the dataset into a data analysis tool such as Python (using libraries like Pandas) or SQL (using queries).

2. **Grouping and Aggregation**: Group the data by customer ID and aggregate the total purchase amounts for each customer.

3. **Sorting**: Sort the aggregated data in descending order of the total purchase amounts to identify the customers making the most purchases.

4. **Select Top 5 Customers**: Select the top 5 customers based on the total purchase amounts.

Here is an example code snippet in Python using Pandas to achieve this:

```python
import pandas as pd

# Load dataset into a DataFrame (assuming 'df' is the D
```

## SqlAgent example

In [2]:
# Read the Snowflake connection settings from environment variables
SNOWFLAKE_USER = os.environ.get("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.environ.get("SNOWFLAKE_PASSWORD")
SNOWFLAKE_ACCOUNT = os.environ.get("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_WAREHOUSE = os.environ.get("SNOWFLAKE_WAREHOUSE")
SNOWFLAKE_DATABASE = os.environ.get("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.environ.get("SNOWFLAKE_SCHEMA")
SNOWFLAKE_ROLE = os.environ.get("SNOWFLAKE_ROLE")
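`os.environ.get` returns `None` for any variable that is unset, so a missing setting would only show up later when the connection is opened. The sketch below reports missing names up front. Whether the data source actually requires all seven settings is an assumption, and `missing_snowflake_settings` is a hypothetical helper.

```python
import os

# Suffixes of the SNOWFLAKE_* variables read in the cell above.
SNOWFLAKE_SETTINGS = ("USER", "PASSWORD", "ACCOUNT", "WAREHOUSE", "DATABASE", "SCHEMA", "ROLE")


def missing_snowflake_settings(environ=os.environ):
    """Return the names of SNOWFLAKE_* variables that are unset or empty."""
    return [
        f"SNOWFLAKE_{key}"
        for key in SNOWFLAKE_SETTINGS
        if not environ.get(f"SNOWFLAKE_{key}")
    ]


missing = missing_snowflake_settings()
if missing:
    print("Missing Snowflake settings:", ", ".join(missing))
```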

In [3]:
sf_datasource = SnowflakeDataSource(
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    account=SNOWFLAKE_ACCOUNT,
    warehouse=SNOWFLAKE_WAREHOUSE,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    role=SNOWFLAKE_ROLE,
)

In [4]:
pg_datasource = PostgreSqlDataSource(
    connection_url="postgresql://@localhost:5432/aita"
)
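The connection URL above has an empty user part. As a sketch of how such a URL could be assembled when credentials are needed, the hypothetical helper below percent-encodes the user and password with the standard library. The user name and password in the usage comment are invented for illustration.

```python
from urllib.parse import quote


def build_pg_url(host, port, database, user="", password=""):
    """Build a postgresql:// URL, percent-encoding the credentials."""
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{database}"


# With no credentials this reproduces the URL used in the cell above.
print(build_pg_url("localhost", 5432, "aita"))
# Hypothetical credentials containing reserved characters are escaped.
print(build_pg_url("localhost", 5432, "aita", user="ana", password="p@ss/word"))
```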

In [5]:
# Example of using the AitaAgent with a Snowflake data source. In this setup, no data catalog is provided to the agent.
aita_agent = AitaAgent("gpt-3.5-turbo") \
    .set_prompt_template("Data Schema: snowflake_sample_data.tpch_sf1") \
    .add_datasource(sf_datasource)
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who have made the most purchases from the provided Snowflake sample data schema `snowflake_sample_data.tpch_sf1`, you can use the following SQL query:

```sql
SELECT c.custkey, c.name, COUNT(o.orderkey) AS total_purchases
FROM snowflake_sample_data.tpch_sf1.customer c
JOIN snowflake_sample_data.tpch_sf1.orders o ON c.custkey = o.custkey
GROUP BY c.custkey, c.name
ORDER BY total_purchases DESC
LIMIT 5;
```

In this query:
- We are selecting the customer key `c.custkey`, customer name `c.name`, and counting the total number of orders made by each customer.
- We are joining the `customer` table with the `orders` table on the `custkey` to get the necessary information.
- We are grouping the results by customer key and name.
- We are ordering the results by the total number of purchases in descending order.
- Finally, we are limiting the output to the top 5 customers with the most purchases.

You can run this query in your Snowflake environment to get the top 5 cu

In [5]:
# Basic example of using the SQL agent. Here the data catalog is provided to the agent.
from aita.prompt.base import BasicDataSourcePromptTemplate

sql_agent = SqlAgent("gpt-3.5-turbo") \
    .set_prompt_template(BasicDataSourcePromptTemplate) \
    .add_datasource(sf_datasource)

In [6]:
sql_agent.chat("I want to get the top 5 customers which making the most purchases")


I want to get the top 5 customers which making the most purchases
Tool Calls:
  convert_to_pandas (call_0KopTnhCStfAIHok3d4yyv2X)
 Call ID: call_0KopTnhCStfAIHok3d4yyv2X
  Args:
    query: SELECT C.C_NAME, SUM(L.L_EXTENDEDPRICE) AS TOTAL_PURCHASES FROM CUSTOMER C JOIN ORDERS O ON C.C_CUSTKEY = O.O_CUSTKEY JOIN LINEITEM L ON O.O_ORDERKEY = L.L_ORDERKEY GROUP BY C.C_NAME ORDER BY TOTAL_PURCHASES DESC LIMIT 5


<generator object Pregel.stream at 0x13ffe9d90>
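The query in the tool call above joins CUSTOMER, ORDERS, and LINEITEM and sums the line-item prices per customer. The same pattern can be tried locally without a Snowflake account. The sketch below uses an in-memory SQLite database with a few invented rows. Only the column names follow the query above; the data is made up, and `LIMIT 2` is used because the toy data holds three customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL);
INSERT INTO customer VALUES (1, 'Customer#1'), (2, 'Customer#2'), (3, 'Customer#3');
INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO lineitem VALUES (10, 100.0), (10, 50.0), (11, 25.0), (20, 300.0), (30, 10.0);
""")

# Same join and aggregation shape as the generated query, on toy data.
top_customers = conn.execute("""
SELECT c.c_name, SUM(l.l_extendedprice) AS total_purchases
FROM customer c
JOIN orders o ON c.c_custkey = o.o_custkey
JOIN lineitem l ON o.o_orderkey = l.l_orderkey
GROUP BY c.c_name
ORDER BY total_purchases DESC
LIMIT 2
""").fetchall()

print(top_customers)  # [('Customer#2', 300.0), ('Customer#1', 175.0)]
```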

In [8]:
# Continue the conversation and allow the agent to run the pending tool call
sql_agent.chat(allow_run_tool=True)

Name: sql_database_query

[('Customer#000143500', Decimal('7154828.98')), ('Customer#000095257', Decimal('6645071.02')), ('Customer#000087115', Decimal('6528332.52')), ('Customer#000134380', Decimal('6405556.97')), ('Customer#000103834', Decimal('6397480.12'))]

The top 5 customers who have made the most purchases are as follows:

1. Customer#000143500 - Total Purchases: $7,154,828.98
2. Customer#000095257 - Total Purchases: $6,645,071.02
3. Customer#000087115 - Total Purchases: $6,528,332.52
4. Customer#000134380 - Total Purchases: $6,405,556.97
5. Customer#000103834 - Total Purchases: $6,397,480.12

These customers have made the highest total purchases.


<generator object Pregel.stream at 0x118521ce0>

In [9]:
# Example of using the SQL agent to run a SQL query directly.
sample_sql_query = """
SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10
"""

sql_agent.chat(sample_sql_query)



SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10


The query you provided aims to retrieve the top 10 customers based on the total purchase amount. It calculates the sum of the `O_TOTALPRICE` from the `ORDERS` table for each customer and then ranks the customers in descending order of total purchase amount.

I will execute the provided query to get the top 10 customers based on their total purchase amount.
Tool Calls:
  sql_database_query (call_0XzB9V8OOyOvTu9qsGM4f7uD)
 Call ID: call_0XzB9V8OOyOvTu9qsGM4f7uD
  Args:
    query: SELECT C_CUSTKEY, C_NAME, SUM(O_TOTALPRICE) AS TOTAL_PURCHASE FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS ON C_CUSTKEY = O_CUSTKEY GROUP BY C_CUSTKEY, C_NAME ORDER BY TOTAL_PURCHASE DESC LIMIT 10;
<generator object Pregel.stream 

In [10]:
# Execute the pending tool call directly, identified by its call ID
sql_agent.call_tool("sql_database_query", tool_id="call_0XzB9V8OOyOvTu9qsGM4f7uD")

[(Decimal('143500'), 'Customer#000143500', Decimal('7012696.48')),
 (Decimal('95257'), 'Customer#000095257', Decimal('6563511.23')),
 (Decimal('87115'), 'Customer#000087115', Decimal('6457526.26')),
 (Decimal('131113'), 'Customer#000131113', Decimal('6311428.86')),
 (Decimal('103834'), 'Customer#000103834', Decimal('6306524.23')),
 (Decimal('134380'), 'Customer#000134380', Decimal('6291610.15')),
 (Decimal('69682'), 'Customer#000069682', Decimal('6287149.42')),
 (Decimal('102022'), 'Customer#000102022', Decimal('6273788.41')),
 (Decimal('98587'), 'Customer#000098587', Decimal('6265089.35')),
 (Decimal('85102'), 'Customer#000085102', Decimal('6135483.63'))]
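The tool returns plain tuples of `Decimal` values and strings. A minimal sketch of turning such rows into a ranked, human-readable list follows, using the first three rows from the output above. `format_rows` is a hypothetical helper, not part of aita.

```python
from decimal import Decimal

# First three rows copied from the tool output above.
rows = [
    (Decimal("143500"), "Customer#000143500", Decimal("7012696.48")),
    (Decimal("95257"), "Customer#000095257", Decimal("6563511.23")),
    (Decimal("87115"), "Customer#000087115", Decimal("6457526.26")),
]


def format_rows(rows):
    """Render (custkey, name, total) tuples as ranked text lines."""
    lines = []
    for rank, (custkey, name, total) in enumerate(rows, start=1):
        # Decimal supports the ',' and '.2f' format specifiers directly.
        lines.append(f"{rank}. {name} (key {int(custkey)}): {total:,.2f}")
    return lines


for line in format_rows(rows):
    print(line)
```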