In [1]:
import os

from aita.datasource.snowflake import SnowflakeDataSource
from aita.datasource.postgresql import PostgreSqlDataSource
from aita.agent.base import AitaAgent
from aita.agent.sql import SqlAgent

In [2]:
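# Baseline: no schema context and no data source attached,
# so the model can only answer in general terms.
aita_agent = AitaAgent("gpt-3.5-turbo")
aita_agent.chat("I want to get the top 5 customers which making the most purchases")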

To get the top 5 customers who have made the most purchases, you would typically need access to a database or dataset containing information about customers and their purchases. Assuming you have a dataset with this information, you can follow these general steps to identify the top 5 customers:

1. **Data Preparation:**
    - Load the dataset into a data analysis tool or programming environment like Python with libraries such as Pandas.
    - Ensure that the dataset contains information about customers and their purchases, including customer IDs or names, purchase amounts, and dates.

2. **Data Exploration:**
    - Explore the dataset to understand its structure and the information it contains.
    - Identify the columns that contain customer information and purchase amounts.

3. **Data Aggregation:**
    - Group the data by customer ID or name and aggregate the purchase amounts to calculate the total purchases made by each customer.
    - This can be done using functions like `groupb

In [3]:
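# Pass the schema name as prompt context so the model can propose
# a concrete SQL query (still without running anything).
aita_agent = AitaAgent("gpt-3.5-turbo") \
    .set_prompt_context("Data Schema: snowflake_sample_data.tpch_sf1")
aita_agent.chat("I want to get the top 5 customers which making the most purchases")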

To get the top 5 customers who have made the most purchases, you can use SQL queries to analyze the data in the `snowflake_sample_data.tpch_sf1` schema. Here is a sample SQL query that you can use to achieve this:

```sql
SELECT c.c_custkey, c.c_name, COUNT(o.o_orderkey) AS total_orders
FROM snowflake_sample_data.tpch_sf1.customer c
JOIN snowflake_sample_data.tpch_sf1.orders o ON c.c_custkey = o.o_custkey
GROUP BY c.c_custkey, c.c_name
ORDER BY total_orders DESC
LIMIT 5;
```

In this query:
- We are selecting the customer key (`c_custkey`) and customer name (`c_name`) from the `customer` table and counting the total number of orders made by each customer.
- We are joining the `customer` table with the `orders` table on the customer key (`c_custkey`).
- We are grouping the results by customer key and name.
- We are ordering the results by the total number of orders in descending order.
- Finally, we are limiting the output to the top 5 customers with the most purchases.

You can run thi

In [4]:
SNOWFLAKE_USER = os.environ.get("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.environ.get("SNOWFLAKE_PASSWORD")
SNOWFLAKE_ACCOUNT = os.environ.get("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_WAREHOUSE = os.environ.get("SNOWFLAKE_WAREHOUSE")
SNOWFLAKE_DATABASE = os.environ.get("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.environ.get("SNOWFLAKE_SCHEMA")
SNOWFLAKE_ROLE = os.environ.get("SNOWFLAKE_ROLE")
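
Because `os.environ.get` returns `None` for an unset variable, a missing setting would only surface later, when the data source tries to connect. A minimal sketch of an early check follows; the helper name `missing_env_vars` and the `REQUIRED_VARS` list are illustrative assumptions, not part of aita.

```python
import os

# Names taken from the cell above; the helper below is a hypothetical
# convenience, not an aita API.
REQUIRED_VARS = [
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_ROLE",
]


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given mapping."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    # Report the gap up front instead of failing inside the connector.
    print("Missing Snowflake settings:", ", ".join(missing))
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.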

In [5]:
sf_datasource = SnowflakeDataSource(
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    account=SNOWFLAKE_ACCOUNT,
    warehouse=SNOWFLAKE_WAREHOUSE,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    role=SNOWFLAKE_ROLE,
)

In [6]:
# PostgreSQL data source; defined here but not used by the cells below.
pg_datasource = PostgreSqlDataSource(
    connection_url="postgresql://@localhost:5432/aita"
)

In [7]:
# Same prompt context as before, now with the Snowflake data source attached.
aita_agent = AitaAgent("gpt-3.5-turbo") \
    .set_prompt_context("Data Schema: snowflake_sample_data.tpch_sf1") \
    .add_datasource(sf_datasource)
aita_agent.chat("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who make the most purchases, you can use SQL to query the data from the table `snowflake_sample_data.tpch_sf1`. You will need to look at the relevant columns that indicate customer information and purchase details.

Here is a sample SQL query that you can use to achieve this:

```sql
SELECT c.custkey, c.name, COUNT(o.orderkey) as total_purchases
FROM snowflake_sample_data.tpch_sf1.customer c
JOIN snowflake_sample_data.tpch_sf1.orders o ON c.custkey = o.custkey
GROUP BY c.custkey, c.name
ORDER BY total_purchases DESC
LIMIT 5;
```

In this query:
- `snowflake_sample_data.tpch_sf1.customer` is the table containing customer information.
- `snowflake_sample_data.tpch_sf1.orders` is the table containing order information.
- `c.custkey` is the customer key used to join the two tables.
- `c.name` is the customer name.
- `o.orderkey` is the order key used to count the total number of purchases for each customer.
- We are grouping by customer key and name to count the 

In [8]:
# Basic example of using the SQL agent
from aita.prompt.base import BasicDataSourcePromptTemplate

sql_agent = SqlAgent("gpt-3.5-turbo") \
    .set_prompt_context(BasicDataSourcePromptTemplate) \
    .add_datasource(sf_datasource)

sql_agent.chat("I want to get the top 5 customers which making the most purchases")


I want to get the top 5 customers which making the most purchases

To get the top 5 customers who are making the most purchases, we can query the `CUSTOMER` and `ORDERS` tables to calculate the total purchases made by each customer. Then, we can rank the customers based on the total purchases and select the top 5 customers.

Here is the general approach we will follow:
1. Join the `CUSTOMER` and `ORDERS` tables on the `C_CUSTKEY` column.
2. Calculate the total purchases made by each customer by summing up the `O_TOTALPRICE` from the `ORDERS` table.
3. Rank the customers based on their total purchases.
4. Select the top 5 customers with the highest total purchases.

Let's proceed with executing this query to get the top 5 customers making the most purchases.
Tool Calls:
  sql_database_query (call_0gVJbKZrJx9GbYnLY62kMbZ5)
 Call ID: call_0gVJbKZrJx9GbYnLY62kMbZ5
  Args:
    query: SELECT C.C_CUSTKEY, C.C_NAME, SUM(O.O_TOTALPRICE) AS TOTAL_PURCHASES 
FROM CUSTOMER C 
JOIN ORDERS O ON C.C

<generator object Pregel.stream at 0x166a11c70>
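
The earlier responses ranked customers by number of orders (`COUNT`), while the SQL agent here ranks them by total spend (`SUM` of `O_TOTALPRICE`). The two readings of "most purchases" can give different answers. The sketch below illustrates this with the standard-library `sqlite3` module and a tiny invented data set; the table contents and the helper `top_customers` are assumptions for illustration only, not TPC-H data or an aita API.

```python
import sqlite3

# Toy tables that mimic the TPC-H column names; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE orders (
    o_orderkey INTEGER PRIMARY KEY,
    o_custkey INTEGER,
    o_totalprice REAL
);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO orders VALUES
    (10, 1, 50.0), (11, 1, 25.0), (12, 1, 25.0),
    (13, 2, 500.0),
    (14, 3, 75.0), (15, 3, 75.0);
""")


def top_customers(metric_sql, limit=2):
    """Join, aggregate per customer, rank, and keep the top `limit` rows.

    `metric_sql` is a trusted, hard-coded aggregate expression; it is
    never built from user input.
    """
    query = f"""
        SELECT c.c_name, {metric_sql} AS metric
        FROM customer c
        JOIN orders o ON c.c_custkey = o.o_custkey
        GROUP BY c.c_custkey, c.c_name
        ORDER BY metric DESC, c.c_name
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


by_count = top_customers("COUNT(o.o_orderkey)")  # most orders placed
by_spend = top_customers("SUM(o.o_totalprice)")  # highest total spend
print(by_count)
print(by_spend)
```

In this toy data, Alice places the most orders but Bob spends the most, so it is worth stating in the prompt which ranking is intended.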

In [9]:
# Let the agent run the pending tool call (the query generated in the previous cell).
sql_agent.chat(allow_run_tool=True)

Name: sql_database_query

[(Decimal('143500'), 'Customer#000143500', Decimal('7012696.48')), (Decimal('95257'), 'Customer#000095257', Decimal('6563511.23')), (Decimal('87115'), 'Customer#000087115', Decimal('6457526.26')), (Decimal('131113'), 'Customer#000131113', Decimal('6311428.86')), (Decimal('103834'), 'Customer#000103834', Decimal('6306524.23'))]

The top 5 customers who have made the most purchases are as follows:

1. Customer#000143500 with total purchases of $7,012,696.48
2. Customer#000095257 with total purchases of $6,563,511.23
3. Customer#000087115 with total purchases of $6,457,526.26
4. Customer#000131113 with total purchases of $6,311,428.86
5. Customer#000103834 with total purchases of $6,306,524.23

These customers have the highest total purchases among all customers.


<generator object Pregel.stream at 0x166925480>

In [9]:
# Example of using the SQL agent to run a SQL query directly.
sample_sql_query = """
SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10
"""

sql_agent.chat(sample_sql_query)



SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10


The query you provided aims to retrieve the top 10 customers based on the total purchase amount. It calculates the sum of the `O_TOTALPRICE` from the `ORDERS` table for each customer and then ranks the customers in descending order of total purchase amount.

I will execute the provided query to get the top 10 customers based on their total purchase amount.
Tool Calls:
  sql_database_query (call_0XzB9V8OOyOvTu9qsGM4f7uD)
 Call ID: call_0XzB9V8OOyOvTu9qsGM4f7uD
  Args:
    query: SELECT C_CUSTKEY, C_NAME, SUM(O_TOTALPRICE) AS TOTAL_PURCHASE FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.ORDERS ON C_CUSTKEY = O_CUSTKEY GROUP BY C_CUSTKEY, C_NAME ORDER BY TOTAL_PURCHASE DESC LIMIT 10;
<generator object Pregel.stream 

In [10]:
# Run the pending tool call from the previous cell, identified by its call ID.
sql_agent.call_tool("sql_database_query", tool_id="call_0XzB9V8OOyOvTu9qsGM4f7uD")

[(Decimal('143500'), 'Customer#000143500', Decimal('7012696.48')),
 (Decimal('95257'), 'Customer#000095257', Decimal('6563511.23')),
 (Decimal('87115'), 'Customer#000087115', Decimal('6457526.26')),
 (Decimal('131113'), 'Customer#000131113', Decimal('6311428.86')),
 (Decimal('103834'), 'Customer#000103834', Decimal('6306524.23')),
 (Decimal('134380'), 'Customer#000134380', Decimal('6291610.15')),
 (Decimal('69682'), 'Customer#000069682', Decimal('6287149.42')),
 (Decimal('102022'), 'Customer#000102022', Decimal('6273788.41')),
 (Decimal('98587'), 'Customer#000098587', Decimal('6265089.35')),
 (Decimal('85102'), 'Customer#000085102', Decimal('6135483.63'))]
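
The tool returns plain tuples of `(customer key, name, total)` with `Decimal` values. A minimal sketch of turning such rows into readable lines is shown below, using the first three rows from the output above; the helper `format_rows` is a hypothetical name, and the dollar sign simply mirrors the agent's earlier summary rather than anything stated by the data.

```python
from decimal import Decimal

# First three rows copied from the query result above.
rows = [
    (Decimal("143500"), "Customer#000143500", Decimal("7012696.48")),
    (Decimal("95257"), "Customer#000095257", Decimal("6563511.23")),
    (Decimal("87115"), "Customer#000087115", Decimal("6457526.26")),
]


def format_rows(rows):
    """Format (custkey, name, total) tuples as 'name: $total' strings."""
    # Decimal supports the ',' and '.2f' format specifiers directly,
    # so no conversion to float is needed.
    return [f"{name}: ${total:,.2f}" for _custkey, name, total in rows]


for line in format_rows(rows):
    print(line)
```

Keeping the totals as `Decimal` avoids the rounding noise a float conversion could introduce in monetary values.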