In [9]:
import os

from aita.datasource.snowflake import SnowflakeDataSource
from aita.datasource.postgresql import PostgreSqlDataSource
from aita.agent.base import AitaAgent
from aita.agent.sql import SqlAgent

In [10]:
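# Ensure the OpenAI API key is set in the environment before creating any agent
# (OpenAI clients read OPENAI_API_KEY by default)
assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY before running this notebook"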

In [11]:
# Create a simple AitaAgent backed by the gpt-3.5-turbo model
aita_agent = AitaAgent("gpt-3.5-turbo")
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who have made the most purchases, you would typically need access to a dataset that includes information about customers and their purchases. Here is a general outline of the steps you can follow to analyze this data:

1. **Data Preparation**: Load the dataset containing customer information and purchase history into your analysis environment. This dataset should ideally include columns such as customer ID, purchase amount, and purchase date.

2. **Data Cleaning**: Check for any missing or duplicate values in the dataset and handle them accordingly.

3. **Data Aggregation**: Aggregate the data to calculate the total purchase amount for each customer. You can use functions like `groupby` in Python (if you're using pandas) to aggregate the data.

4. **Ranking**: Once you have the total purchase amount for each customer, you can rank the customers based on their total purchase amounts in descending order.

5. **Select Top 5**: Finally, select the top 5 customers
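The aggregation, ranking, and selection steps in the answer above can be sketched with the standard library alone. This is a minimal illustration, not output from the agent: the `purchases` rows and the `top_customers_by_spend` helper are hypothetical, and on a real DataFrame pandas' `groupby` would be the more idiomatic route.

```python
from collections import defaultdict
import heapq


def top_customers_by_spend(purchases, n=5):
    """Sum purchase amounts per customer and return the n largest totals.

    `purchases` is an iterable of (customer_id, amount) pairs (hypothetical shape).
    """
    totals = defaultdict(float)
    for customer_id, amount in purchases:
        totals[customer_id] += amount  # aggregate: total spend per customer
    # Rank by total spend (descending) and keep only the top n customers.
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


# Invented sample rows for illustration only.
purchases = [
    ("C1", 120.0), ("C2", 80.0), ("C1", 40.0),
    ("C3", 300.0), ("C4", 10.0), ("C2", 95.0),
    ("C5", 60.0), ("C6", 5.0),
]
print(top_customers_by_spend(purchases, n=5))
# [('C3', 300.0), ('C2', 175.0), ('C1', 160.0), ('C5', 60.0), ('C4', 10.0)]
```

`heapq.nlargest` avoids sorting every customer when only the top few are needed.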

In [12]:
# Example of using a customized prompt template
aita_agent = AitaAgent("gpt-3.5-turbo") \
    .set_prompt_template("Data Schema: snowflake_sample_data.tpch_sf1")
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who have made the most purchases, you would typically have a dataset with customer IDs and their respective purchase counts. Here is a general outline of how you can approach this using Python code:

1. Assuming you have a dataset or database table with customer IDs and purchase counts, you can load this data into a Pandas DataFrame.
2. You can then sort the DataFrame based on the purchase counts in descending order.
3. Finally, you can extract the top 5 customers with the highest purchase counts.

Here is some sample Python code that demonstrates this process:

```python
import pandas as pd

# Assuming 'data' is a dictionary or list containing customer IDs and purchase counts
data = {
    'Customer_ID': [1, 2, 3, 4, 5],
    'Purchase_Count': [20, 15, 30, 25, 18]
}

# Create a DataFrame from the data
df = pd.DataFrame(data)

# Sort the DataFrame by 'Purchase_Count' in descending order
df = df.sort_values(by='Purchase_Count', ascending=False)

# Get the top 5 customers
top_5_customers = df.head(5)
print(top_5_customers)
```

## SqlAgent example

In [13]:
# Read the Snowflake connection settings from environment variables (used to build the data sources below)
SNOWFLAKE_USER = os.environ.get("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.environ.get("SNOWFLAKE_PASSWORD")
SNOWFLAKE_ACCOUNT = os.environ.get("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_WAREHOUSE = os.environ.get("SNOWFLAKE_WAREHOUSE")
SNOWFLAKE_DATABASE = os.environ.get("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.environ.get("SNOWFLAKE_SCHEMA")
SNOWFLAKE_ROLE = os.environ.get("SNOWFLAKE_ROLE")

In [14]:
sf_datasource = SnowflakeDataSource(
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    account=SNOWFLAKE_ACCOUNT,
    warehouse=SNOWFLAKE_WAREHOUSE,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    role=SNOWFLAKE_ROLE,
)
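`os.environ.get` returns `None` for any variable that is not set, so a missing setting only surfaces later as a connection error. A small fail-fast check can be run before constructing `SnowflakeDataSource`; the `missing_env` helper and `SNOWFLAKE_SETTINGS` list below are hypothetical and not part of aita.

```python
import os

# The environment variables read in the cell above.
SNOWFLAKE_SETTINGS = [
    "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE", "SNOWFLAKE_DATABASE", "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_ROLE",
]


def missing_env(names, environ=os.environ):
    """Return the names from `names` that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]
```

For example, raising `RuntimeError` when `missing_env(SNOWFLAKE_SETTINGS)` returns a non-empty list reports every absent setting at once instead of failing on the first connection attempt.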

In [15]:
pg_datasource = PostgreSqlDataSource(
    connection_url="postgresql://@localhost:5432/aita"
)
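The connection URL follows the usual `postgresql://user@host:port/database` shape, with an empty user part before `@`. With libpq-based drivers an omitted user typically falls back to the operating-system user name, although the exact behavior depends on the driver behind `PostgreSqlDataSource`, which this notebook does not show. A standard-library sketch that splits the URL into its components (the `describe_pg_url` helper is hypothetical):

```python
from urllib.parse import urlsplit


def describe_pg_url(url):
    """Split a PostgreSQL connection URL into its components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username or None,  # '' when the URL has '@' but no user name
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


print(describe_pg_url("postgresql://@localhost:5432/aita"))
# {'scheme': 'postgresql', 'user': None, 'host': 'localhost', 'port': 5432, 'database': 'aita'}
```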

In [16]:
# Example of using the AitaAgent with a Snowflake data source. No data catalog is provided to this agent.
aita_agent = AitaAgent("gpt-3.5-turbo") \
    .set_prompt_template("Data Schema: snowflake_sample_data.tpch_sf1") \
    .add_datasource(sf_datasource)
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who have made the most purchases from the `snowflake_sample_data.tpch_sf1` dataset, you can use a SQL query. Here is an example query that you can use:

```sql
SELECT c_custkey, c_name, COUNT(o_orderkey) AS total_purchases
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchases DESC
LIMIT 5;
```

In this query:
- We are selecting the `c_custkey` (customer key), `c_name` (customer name), and counting the number of `o_orderkey` (orders) for each customer.
- We are joining the `customer` and `orders` tables on the `c_custkey` and `o_custkey` fields.
- We are grouping the results by `c_custkey` and `c_name`.
- We are ordering the results in descending order based on the total number of purchases.
- We are limiting the results to the top 5 customers with the most purchases.
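The query uses only standard SQL (an inner join, `GROUP BY`, `ORDER BY ... DESC`, and `LIMIT`), so its logic can be checked locally without Snowflake. The sketch below uses Python's built-in `sqlite3` with tiny `customer` and `orders` tables that borrow the TPC-H column names; the rows are invented for illustration, and the `snowflake_sample_data.tpch_sf1` prefix is dropped because SQLite has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
    CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER, o_totalprice REAL);
    INSERT INTO customer VALUES (1, 'Customer#1'), (2, 'Customer#2'), (3, 'Customer#3');
    INSERT INTO orders VALUES
        (10, 1, 100.0), (11, 1, 50.0), (12, 1, 25.0),
        (20, 2, 900.0),
        (30, 3, 40.0), (31, 3, 60.0);
""")

# Same shape as the agent's query: count orders per customer, most orders first.
rows = conn.execute("""
    SELECT c_custkey, c_name, COUNT(o_orderkey) AS total_purchases
    FROM customer
    JOIN orders ON c_custkey = o_custkey
    GROUP BY c_custkey, c_name
    ORDER BY total_purchases DESC
    LIMIT 5
""").fetchall()
print(rows)  # [(1, 'Customer#1', 3), (3, 'Customer#3', 2), (2, 'Customer#2', 1)]
```

Because the join is an inner join, customers with no orders are left out of the result entirely.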

You can run this query in your Snowflake environment to retrieve the top 5 customer

In [17]:
# Basic example of using the SqlAgent. Here the data catalog is provided to the agent.
from aita.prompt.base import BasicDataSourcePromptTemplate

sql_agent = SqlAgent("gpt-3.5-turbo") \
    .set_prompt_template(BasicDataSourcePromptTemplate) \
    .add_datasource(sf_datasource)

In [18]:
sql_agent.chat("I want to get the top 5 customers which making the most purchases")


I want to get the top 5 customers which making the most purchases

To get the top 5 customers who made the most purchases, we will need to query the `CUSTOMER` and `ORDERS` tables to calculate the total purchases made by each customer. Then we can rank the customers based on their total purchases.

Here is the SQL query to achieve this:

```sql
SELECT
    C.C_CUSTKEY AS CustomerID,
    C.C_NAME AS CustomerName,
    SUM(O.O_TOTALPRICE) AS TotalPurchases
FROM
    TPCH_SF1.CUSTOMER C
JOIN
    TPCH_SF1.ORDERS O ON C.C_CUSTKEY = O.O_CUSTKEY
GROUP BY
    C.C_CUSTKEY, C.C_NAME
ORDER BY
    TotalPurchases DESC
LIMIT 5;
```

This query will join the `CUSTOMER` and `ORDERS` tables on the `C_CUSTKEY` and `O_CUSTKEY` columns, calculate the total purchases made by each customer, group the results by customer, order them by total purchases in descending order, and limit the output to the top 5 customers.
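This query ranks customers by total spend (`SUM(O_TOTALPRICE)`), whereas the AitaAgent's earlier query ranked them by number of orders (`COUNT(o_orderkey)`). "Most purchases" is ambiguous, and the two metrics can disagree. A toy illustration with invented order rows:

```python
from collections import Counter, defaultdict

# Invented (customer, order_total) rows: A places many small orders, B one large order.
orders = [("A", 10.0), ("A", 15.0), ("A", 20.0), ("B", 500.0), ("C", 100.0), ("C", 120.0)]

# Equivalent of COUNT(o_orderkey) per customer.
order_counts = Counter(customer for customer, _ in orders)

# Equivalent of SUM(o_totalprice) per customer.
spend = defaultdict(float)
for customer, total in orders:
    spend[customer] += total

by_count = sorted(order_counts, key=order_counts.get, reverse=True)
by_spend = sorted(spend, key=spend.get, reverse=True)
print(by_count)  # ['A', 'C', 'B']
print(by_spend)  # ['B', 'C', 'A']
```

Stating the metric explicitly in the prompt ("by number of orders" or "by total order value") removes the ambiguity.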

Let me run this query against the database to get the top 5 customers for you.
Tool Calls:
  s


In [8]:
# Continue the conversation and let the agent execute its pending SQL tool call
sql_agent.chat(allow_run_tool=True)

Name: sql_database_query

[('Customer#000143500', Decimal('7154828.98')), ('Customer#000095257', Decimal('6645071.02')), ('Customer#000087115', Decimal('6528332.52')), ('Customer#000134380', Decimal('6405556.97')), ('Customer#000103834', Decimal('6397480.12'))]

The top 5 customers who have made the most purchases are as follows:

1. Customer#000143500 - Total Purchases: $7,154,828.98
2. Customer#000095257 - Total Purchases: $6,645,071.02
3. Customer#000087115 - Total Purchases: $6,528,332.52
4. Customer#000134380 - Total Purchases: $6,405,556.97
5. Customer#000103834 - Total Purchases: $6,397,480.12

These customers have made the highest total purchases.
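The tool returns raw `(name, Decimal)` tuples, and the agent's summary renders them as ranked dollar amounts. The same rendering can be reproduced deterministically in Python, which is useful for checking the model's summary against the raw rows. The two rows below are copied from the tool output above; the `format_ranking` helper is hypothetical, and the `$` prefix simply mirrors the agent's summary.

```python
from decimal import Decimal

# First two rows of the sql_database_query output shown above.
rows = [
    ("Customer#000143500", Decimal("7154828.98")),
    ("Customer#000095257", Decimal("6645071.02")),
]


def format_ranking(rows):
    """Render (name, amount) rows as numbered lines with thousands separators."""
    return [
        f"{rank}. {name} - Total Purchases: ${amount:,.2f}"
        for rank, (name, amount) in enumerate(rows, start=1)
    ]


for line in format_ranking(rows):
    print(line)
# 1. Customer#000143500 - Total Purchases: $7,154,828.98
# 2. Customer#000095257 - Total Purchases: $6,645,071.02
```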



In [9]:
# Example of handing a raw SQL query to the SQL agent; it proposes a tool call that runs the query
sample_sql_query = """
SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10
"""

sql_agent.chat(sample_sql_query)



SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10


The query you provided aims to retrieve the top 10 customers based on the total purchase amount. It calculates the sum of the `O_TOTALPRICE` from the `ORDERS` table for each customer and then ranks the customers in descending order of total purchase amount.

I will execute the provided query to get the top 10 customers based on their total purchase amount.
Tool Calls:
  sql_database_query (call_0XzB9V8OOyOvTu9qsGM4f7uD)
 Call ID: call_0XzB9V8OOyOvTu9qsGM4f7uD
  Args:
    query: SELECT C_CUSTKEY, C_NAME, SUM(O_TOTALPRICE) AS TOTAL_PURCHASE FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS ON C_CUSTKEY = O_CUSTKEY GROUP BY C_CUSTKEY, C_NAME ORDER BY TOTAL_PURCHASE DESC LIMIT 10;

In [10]:
# Execute the pending tool call manually, identified by the call ID from the output above
sql_agent.call_tool("sql_database_query", tool_id="call_0XzB9V8OOyOvTu9qsGM4f7uD")

[(Decimal('143500'), 'Customer#000143500', Decimal('7012696.48')),
 (Decimal('95257'), 'Customer#000095257', Decimal('6563511.23')),
 (Decimal('87115'), 'Customer#000087115', Decimal('6457526.26')),
 (Decimal('131113'), 'Customer#000131113', Decimal('6311428.86')),
 (Decimal('103834'), 'Customer#000103834', Decimal('6306524.23')),
 (Decimal('134380'), 'Customer#000134380', Decimal('6291610.15')),
 (Decimal('69682'), 'Customer#000069682', Decimal('6287149.42')),
 (Decimal('102022'), 'Customer#000102022', Decimal('6273788.41')),
 (Decimal('98587'), 'Customer#000098587', Decimal('6265089.35')),
 (Decimal('85102'), 'Customer#000085102', Decimal('6135483.63'))]