In [1]:
import os

from aita.datasource.snowflake import SnowflakeDataSource
from aita.datasource.postgresql import PostgreSqlDataSource
from aita.agent.base import AitaAgent
from aita.agent.sql import SqlAgent

In [2]:
# Baseline: an agent with no schema information and no data source.
aita_agent = AitaAgent("gpt-3.5-turbo")
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who are making the most purchases, you would typically need transaction data that includes customer information and purchase details. Assuming you have a dataset with this information, you can follow these steps using Python and pandas:

1. Load the transaction data into a pandas DataFrame.
2. Group the data by customer ID and count the number of purchases for each customer.
3. Sort the customers based on the number of purchases in descending order.
4. Select the top 5 customers.

Here is a sample code snippet to achieve this:

```python
import pandas as pd

# Load the transaction data into a pandas DataFrame
data = {
    'customer_id': [1, 2, 3, 1, 2, 3, 4, 5, 1, 2],
    'purchase_amount': [100, 150, 200, 120, 180, 220, 90, 80, 130, 170]
}

df = pd.DataFrame(data)

# Group the data by customer ID and count the number of purchases for each customer
customer_purchases = df.groupby('customer_id').size().reset_index(name='purchase_count')

# Sort the customers b
```

In [3]:
# Mention the schema name in the prompt; no data source is attached yet.
aita_agent = AitaAgent("gpt-3.5-turbo") \
    .set_prompt_template("Data Schema: snowflake_sample_data.tpch_sf1")
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To identify the top 5 customers who are making the most purchases, you would typically analyze transaction data to determine the total number of purchases made by each customer. Below are the general steps you can follow to achieve this:

1. **Data Collection**: Gather the transaction data that includes customer information and purchase details. This data can typically be found in a database or a file.

2. **Data Preprocessing**: Clean and preprocess the data to ensure consistency and accuracy. This may involve handling missing values, removing duplicates, and formatting the data appropriately.

3. **Data Aggregation**: Aggregate the transaction data based on customer information to calculate the total number of purchases made by each customer.

4. **Ranking Customers**: Rank the customers based on the total number of purchases in descending order.

5. **Select Top Customers**: Finally, select the top 5 customers with the highest number of purchases to identify the most valuable custom

In [4]:
# Read the Snowflake connection settings from environment variables.
SNOWFLAKE_USER = os.environ.get("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.environ.get("SNOWFLAKE_PASSWORD")
SNOWFLAKE_ACCOUNT = os.environ.get("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_WAREHOUSE = os.environ.get("SNOWFLAKE_WAREHOUSE")
SNOWFLAKE_DATABASE = os.environ.get("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.environ.get("SNOWFLAKE_SCHEMA")
SNOWFLAKE_ROLE = os.environ.get("SNOWFLAKE_ROLE")
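
`os.environ.get` returns `None` for any variable that is not set, so a missing setting would likely only surface later, when the data source tries to connect. A minimal sketch of an up-front check follows. The helper name `missing_env_vars` is hypothetical and not part of aita, and treating all seven variables as required is an assumption.

```python
import os

# The settings read in the cell above (assumed here to all be required).
SNOWFLAKE_VARS = [
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_ROLE",
]


def missing_env_vars(names, environ=None):
    """Return the names in `names` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Report missing settings before building the data source.
missing = missing_env_vars(SNOWFLAKE_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the helper easy to try out without touching the real environment.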

In [5]:
sf_datasource = SnowflakeDataSource(
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    account=SNOWFLAKE_ACCOUNT,
    warehouse=SNOWFLAKE_WAREHOUSE,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    role=SNOWFLAKE_ROLE,
)

In [6]:
pg_datasource = PostgreSqlDataSource(
    connection_url="postgresql://@localhost:5432/aita"
)

In [7]:
# Attach the Snowflake data source in addition to the schema name in the prompt.
aita_agent = AitaAgent("gpt-3.5-turbo") \
    .set_prompt_template("Data Schema: snowflake_sample_data.tpch_sf1") \
    .add_datasource(sf_datasource)
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who have made the most purchases from the provided data schema (snowflake_sample_data.tpch_sf1), you can use a SQL query similar to the following:

```sql
SELECT c_custkey, c_name, COUNT(o_orderkey) AS num_purchases
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY num_purchases DESC
LIMIT 5;
```

In this query:
1. We are selecting the customer key (`c_custkey`), customer name (`c_name`), and counting the number of orders each customer has made using the `COUNT()` function.
2. We are joining the `customer` and `orders` tables on the customer key (`c_custkey`) to get the necessary data.
3. We are grouping the results by customer key and name.
4. We are ordering the results in descending order based on the number of purchases.
5. Finally, we are limiting the output to the top 5 customers with the most purchases.

You can execute this query in your Snowflake enviro
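
"Most purchases" can mean the most orders, as in the query above, or the largest total spend. The sketch below shows that the two readings can rank customers differently. It uses an in-memory SQLite database with made-up rows (not TPC-H data) and borrows only the column names from the query above.

```python
import sqlite3

# Toy tables that reuse the TPC-H column names; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE orders (
    o_orderkey INTEGER PRIMARY KEY,
    o_custkey INTEGER,
    o_totalprice REAL
);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES
    (10, 1, 20.0), (11, 1, 30.0), (12, 1, 10.0),  -- Alice: three small orders
    (13, 2, 500.0);                               -- Bob: one large order
""")

# Reading 1: rank by number of orders.
by_count = conn.execute("""
    SELECT c_name, COUNT(o_orderkey) AS num_purchases
    FROM customer JOIN orders ON c_custkey = o_custkey
    GROUP BY c_custkey, c_name
    ORDER BY num_purchases DESC
    LIMIT 5
""").fetchall()

# Reading 2: rank by total amount spent.
by_total = conn.execute("""
    SELECT c_name, SUM(o_totalprice) AS total_purchases
    FROM customer JOIN orders ON c_custkey = o_custkey
    GROUP BY c_custkey, c_name
    ORDER BY total_purchases DESC
    LIMIT 5
""").fetchall()

print(by_count)  # Alice first: she placed the most orders
print(by_total)  # Bob first: he spent the most overall
```

Grouping by `c_custkey` as well as `c_name` keeps two customers who share a name from being merged into one row.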

In [8]:
# Basic example of using the SQL agent
from aita.prompt.base import BasicDataSourcePromptTemplate

sql_agent = SqlAgent("gpt-3.5-turbo") \
    .set_prompt_template(BasicDataSourcePromptTemplate) \
    .add_datasource(sf_datasource)

sql_agent.chat("I want to get the top 5 customers which making the most purchases")


I want to get the top 5 customers which making the most purchases

To get the top 5 customers who are making the most purchases, we can use the `ORDERS` and `CUSTOMER` tables. We will need to join these two tables based on the `O_CUSTKEY` in the `ORDERS` table and the `C_CUSTKEY` in the `CUSTOMER` table. Then, we can sum up the total purchases made by each customer and finally retrieve the top 5 customers with the highest total purchases.

I will first write the SQL query to perform this task. Let's proceed with executing the query to retrieve the top 5 customers.
Tool Calls:
  sql_database_query (call_QDe8K8ugn71zns1hXsEpLCYo)
 Call ID: call_QDe8K8ugn71zns1hXsEpLCYo
  Args:
    query: SELECT c.C_NAME AS Customer_Name, SUM(o.O_TOTALPRICE) AS Total_Purchases
FROM TPCH_SF1.CUSTOMER c
JOIN TPCH_SF1.ORDERS o ON c.C_CUSTKEY = o.O_CUSTKEY
GROUP BY c.C_NAME
ORDER BY Total_Purchases DESC
LIMIT 5;


<generator object Pregel.stream at 0x12e464d00>

In [9]:
# Allow the agent to run the tool call it proposed in the previous cell.
sql_agent.chat(allow_run_tool=True)

Name: sql_database_query

[('Customer#000143500', Decimal('7012696.48')), ('Customer#000095257', Decimal('6563511.23')), ('Customer#000087115', Decimal('6457526.26')), ('Customer#000131113', Decimal('6311428.86')), ('Customer#000103834', Decimal('6306524.23'))]

The top 5 customers who have made the most purchases are as follows:

1. Customer#000143500 - Total Purchases: $7,012,696.48
2. Customer#000095257 - Total Purchases: $6,563,511.23
3. Customer#000087115 - Total Purchases: $6,457,526.26
4. Customer#000131113 - Total Purchases: $6,311,428.86
5. Customer#000103834 - Total Purchases: $6,306,524.23

These customers are ranked based on the total purchases they have made. Let me know if you need any further analysis or information.


<generator object Pregel.stream at 0x12fd92320>

In [9]:
# Example of using the SQL agent to run a SQL query directly.
sample_sql_query = """
SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10
"""

sql_agent.chat(sample_sql_query)



SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10


The query you provided aims to retrieve the top 10 customers based on the total purchase amount. It calculates the sum of the `O_TOTALPRICE` from the `ORDERS` table for each customer and then ranks the customers in descending order of total purchase amount.

I will execute the provided query to get the top 10 customers based on their total purchase amount.
Tool Calls:
  sql_database_query (call_0XzB9V8OOyOvTu9qsGM4f7uD)
 Call ID: call_0XzB9V8OOyOvTu9qsGM4f7uD
  Args:
    query: SELECT C_CUSTKEY, C_NAME, SUM(O_TOTALPRICE) AS TOTAL_PURCHASE FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS ON C_CUSTKEY = O_CUSTKEY GROUP BY C_CUSTKEY, C_NAME ORDER BY TOTAL_PURCHASE DESC LIMIT 10;
<generator object Pregel.stream 

In [10]:
# Run the tool call proposed above directly, identified by its call ID.
sql_agent.call_tool("sql_database_query", tool_id="call_0XzB9V8OOyOvTu9qsGM4f7uD")

[(Decimal('143500'), 'Customer#000143500', Decimal('7012696.48')),
 (Decimal('95257'), 'Customer#000095257', Decimal('6563511.23')),
 (Decimal('87115'), 'Customer#000087115', Decimal('6457526.26')),
 (Decimal('131113'), 'Customer#000131113', Decimal('6311428.86')),
 (Decimal('103834'), 'Customer#000103834', Decimal('6306524.23')),
 (Decimal('134380'), 'Customer#000134380', Decimal('6291610.15')),
 (Decimal('69682'), 'Customer#000069682', Decimal('6287149.42')),
 (Decimal('102022'), 'Customer#000102022', Decimal('6273788.41')),
 (Decimal('98587'), 'Customer#000098587', Decimal('6265089.35')),
 (Decimal('85102'), 'Customer#000085102', Decimal('6135483.63'))]
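
The raw result is a list of tuples holding `Decimal` values, which is awkward to read. A minimal sketch of turning such rows into a ranked, formatted listing follows. The helper `format_rows` is hypothetical and not part of aita. The rows are the first three copied from the output above, and the dollar sign mirrors the agent's earlier summary rather than anything stated in the data.

```python
from decimal import Decimal

# First three rows copied from the query result above.
rows = [
    (Decimal('143500'), 'Customer#000143500', Decimal('7012696.48')),
    (Decimal('95257'), 'Customer#000095257', Decimal('6563511.23')),
    (Decimal('87115'), 'Customer#000087115', Decimal('6457526.26')),
]


def format_rows(rows):
    """Turn (custkey, name, total) tuples into numbered, readable lines."""
    return [
        f"{rank}. {name} (key {int(custkey)}): ${total:,.2f}"
        for rank, (custkey, name, total) in enumerate(rows, start=1)
    ]


for line in format_rows(rows):
    print(line)
```

Formatting the `Decimal` directly with `,.2f` avoids a round trip through `float`, so the amounts are printed exactly as returned by the database.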