In [1]:
import os

from aita.datasource.snowflake import SnowflakeDataSource
from aita.datasource.postgresql import PostgreSqlDataSource
from aita.agent.base import AitaAgent
from aita.agent.sql import SqlAgent

## Ensure the OpenAI API key is set in the environment

In [2]:
# Create a simple Aita agent with the GPT-3.5-turbo model
aita_agent = AitaAgent("gpt-3.5-turbo")
aita_agent.stream("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who are making the most purchases, you would typically need to analyze your sales data. If you have a dataset with information about customers and their purchases, you can use data analysis tools like Python (using libraries like pandas) or SQL to query your database.

Here is a general approach using SQL:

```sql
SELECT customer_id, COUNT(*) as purchase_count
FROM purchases
GROUP BY customer_id
ORDER BY purchase_count DESC
LIMIT 5;
```

In this SQL query:
- Replace `purchases` with the actual table name where your purchase data is stored.
- `customer_id` is the column that represents the customer identifier in your dataset.
- This query will count the number of purchases made by each customer, group the results by customer, order them in descending order of purchase count, and finally, limit the output to the top 5 customers.

If you are using Python with pandas, you can achieve the same result by loading your data into a DataFrame and then using the `groupb

In [3]:
# Example of using a customized prompt template
aita_agent = AitaAgent("gpt-3.5-turbo").set_prompt_context(
    "Data Schema: snowflake_sample_data.tpch_sf1"
)
aita_agent.stream("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who have made the most purchases, you would typically need access to a dataset that includes information about customer purchases. The steps to achieve this would involve using data analysis tools like Python or SQL to query and analyze the data. Here is a general approach you can take:

1. **Load the Data**: If the data is stored in a database, you can use SQL to extract the relevant information. If the data is in a CSV file or another format, you can use Python libraries like Pandas to load the data into a DataFrame.

2. **Aggregate the Data**: You will need to aggregate the data to calculate the total number of purchases made by each customer. This can be done using SQL queries or Pandas functions like groupby.

3. **Rank the Customers**: Once you have the total number of purchases for each customer, you can rank the customers based on this metric. In SQL, you can use the RANK() or ROW_NUMBER() functions. In Python, you can use the rank() method in Pandas.
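
The load, aggregate, and rank steps listed above can be sketched without any third-party packages. This example deliberately uses `csv` and `collections.Counter` from the standard library in place of pandas, and the CSV content and column names are invented for illustration only.

```python
import csv
import io
from collections import Counter

# Hypothetical purchase log; in practice this would come from a file or database.
raw_csv = """customer_id,amount
alice,10.0
bob,4.5
alice,2.0
carol,8.0
alice,1.0
bob,3.0
"""

# 1. Load: parse the CSV text into a list of dictionaries.
records = list(csv.DictReader(io.StringIO(raw_csv)))

# 2. Aggregate: count the number of purchases per customer.
purchase_counts = Counter(record["customer_id"] for record in records)

# 3. Rank: most_common() orders customers by count, highest first.
top_5 = purchase_counts.most_common(5)
print(top_5)
```

With real data, the same three steps map onto a SQL `GROUP BY` with `ORDER BY ... LIMIT 5`, as in the earlier response.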

## SqlAgent example

In [4]:
# Read the Snowflake connection settings from environment variables
SNOWFLAKE_USER = os.environ.get("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.environ.get("SNOWFLAKE_PASSWORD")
SNOWFLAKE_ACCOUNT = os.environ.get("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_WAREHOUSE = os.environ.get("SNOWFLAKE_WAREHOUSE")
SNOWFLAKE_DATABASE = os.environ.get("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.environ.get("SNOWFLAKE_SCHEMA")
SNOWFLAKE_ROLE = os.environ.get("SNOWFLAKE_ROLE")
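
Because `os.environ.get` returns `None` for an unset variable, a missing setting would only surface later when the connection is attempted. The following is a small sketch of an early check. The helper name `missing_env_vars` is hypothetical and not part of aita, and the variable list simply mirrors the cell above.

```python
import os

# Names taken from the cell above; extend the list as needed.
REQUIRED_VARS = [
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_ROLE",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

The same helper could be pointed at the OpenAI API key variable, assuming the key is supplied through the environment as the heading near the top of the notebook suggests.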

In [5]:
sf_datasource = SnowflakeDataSource(
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    account=SNOWFLAKE_ACCOUNT,
    warehouse=SNOWFLAKE_WAREHOUSE,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    role=SNOWFLAKE_ROLE,
)

In [6]:
pg_datasource = PostgreSqlDataSource(connection_url="postgresql://@localhost:5432/aita")
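
The connection URL above has an empty user name before the `@`. As a rough sketch, a URL of this shape can be assembled from its parts with the standard library so that special characters in credentials are percent-encoded. The helper `build_pg_url` is hypothetical and not part of aita.

```python
from urllib.parse import quote, urlsplit


def build_pg_url(host, port, database, user="", password=None):
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"postgresql://{credentials}@{host}:{port}/{database}"


# Reproduces the URL used in the cell above (no user, no password).
url = build_pg_url("localhost", 5432, "aita")
parts = urlsplit(url)
print(url)
print(parts.hostname, parts.port, parts.path.lstrip("/"))
```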

In [7]:
# Example of using the AitaAgent with a Snowflake data source. Here no data catalog is provided to the agent.
aita_agent = (
    AitaAgent("gpt-3.5-turbo")
    .set_prompt_context("Data Schema: snowflake_sample_data.tpch_sf1")
    .add_datasource(sf_datasource)
)
aita_agent.stream("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who have made the most purchases from the provided data schema `snowflake_sample_data.tpch_sf1`, you can use SQL queries to analyze the data. Here is a sample SQL query that you can use to achieve this:

```sql
SELECT c_custkey, c_name, COUNT(o_orderkey) AS total_orders
FROM snowflake_sample_data.tpch_sf1.customer c
JOIN snowflake_sample_data.tpch_sf1.orders o ON c.c_custkey = o.o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_orders DESC
LIMIT 5;
```

In this SQL query:
- We are selecting the customer key (`c_custkey`), customer name (`c_name`), and counting the number of orders made by each customer.
- We are joining the `customer` table with the `orders` table on the `c_custkey` to `o_custkey` relationship.
- We are grouping the results by customer key and name.
- We are ordering the results by the total number of orders in descending order.
- Finally, we are limiting the output to the top 5 customers with the most orders.

You can run this query in you

In [8]:
# Basic example of using the SQL agent. Here the data catalog is provided to the agent.
from aita.prompt.base import BasicDataSourcePromptTemplate

sql_agent = (
    SqlAgent("gpt-3.5-turbo")
    .set_prompt_context(BasicDataSourcePromptTemplate)
    .add_datasource(sf_datasource)
)

In [9]:
sql_agent.stream("I want to get the top 5 customers which making the most purchases")


I want to get the top 5 customers which making the most purchases

To find the top 5 customers who have made the most purchases, we can query the `CUSTOMER` and `ORDERS` tables to calculate the total amount spent by each customer. Here's the general approach to achieve this:

1. Join the `CUSTOMER` and `ORDERS` tables on the `C_CUSTKEY` column.
2. Group the data by customer and calculate the total amount spent by each customer.
3. Sort the customers based on the total amount spent in descending order.
4. Select the top 5 customers.

Let's start by writing a SQL query to perform these steps.
Tool Calls:
  sql_database_query (call_uY19X7LPfkXBjgAn4NBZIhIx)
 Call ID: call_uY19X7LPfkXBjgAn4NBZIhIx
  Args:
    query: SELECT C.C_CUSTKEY, C.C_NAME, SUM(O.O_TOTALPRICE) AS TOTAL_AMOUNT_SPENT
FROM CUSTOMER C
JOIN ORDERS O ON C.C_CUSTKEY = O.O_CUSTKEY
GROUP BY C.C_CUSTKEY, C.C_NAME
ORDER BY TOTAL_AMOUNT_SPENT DESC
LIMIT 5;
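
Note that this query ranks customers by `SUM(O_TOTALPRICE)`, whereas the earlier AitaAgent response ranked them by `COUNT(o_orderkey)`. The two readings of "most purchases" can produce different rankings. The sketch below illustrates this with Python's built-in `sqlite3` module. The tables reuse the TPC-H column names, but the rows are made-up sample data.

```python
import sqlite3

# In-memory database with tiny, invented customer and orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
    CREATE TABLE orders (
        o_orderkey INTEGER PRIMARY KEY,
        o_custkey INTEGER,
        o_totalprice REAL
    );
    INSERT INTO customer VALUES (1, 'A'), (2, 'B');
    INSERT INTO orders VALUES
        (1, 1, 10.0), (2, 1, 10.0), (3, 1, 10.0), (4, 2, 500.0);
    """
)


def top_customer(metric_sql):
    """Return the name of the top customer under the given aggregate expression.

    metric_sql is interpolated directly, so pass only fixed, trusted strings.
    """
    query = f"""
        SELECT c.c_name
        FROM customer c
        JOIN orders o ON c.c_custkey = o.o_custkey
        GROUP BY c.c_custkey, c.c_name
        ORDER BY {metric_sql} DESC
        LIMIT 1
    """
    return conn.execute(query).fetchone()[0]


by_count = top_customer("COUNT(o.o_orderkey)")  # customer with the most orders
by_spend = top_customer("SUM(o.o_totalprice)")  # customer with the highest spend
print(by_count, by_spend)
```

Stating the intended metric in the prompt (number of orders versus total amount spent) removes this ambiguity.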


<generator object Pregel.stream at 0x1373e7fa0>

In [10]:
sql_agent.stream(allow_run_tool=True)

Name: sql_database_query

[(Decimal('143500'), 'Customer#000143500', Decimal('7012696.48')), (Decimal('95257'), 'Customer#000095257', Decimal('6563511.23')), (Decimal('87115'), 'Customer#000087115', Decimal('6457526.26')), (Decimal('131113'), 'Customer#000131113', Decimal('6311428.86')), (Decimal('103834'), 'Customer#000103834', Decimal('6306524.23'))]

The top 5 customers who have made the most purchases are as follows:

1. Customer#000143500 - Total Amount Spent: $7,012,696.48
2. Customer#000095257 - Total Amount Spent: $6,563,511.23
3. Customer#000087115 - Total Amount Spent: $6,457,526.26
4. Customer#000131113 - Total Amount Spent: $6,311,428.86
5. Customer#000103834 - Total Amount Spent: $6,306,524.23

These customers have spent the most on purchases.
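
The raw tool result is a list of tuples containing `Decimal` values, which the agent then summarizes in prose. As an illustrative sketch, the same kind of summary line can be produced directly from such rows. The two rows are copied from the output above, and the helper `format_row` is hypothetical.

```python
from decimal import Decimal

# First two rows copied from the tool output above.
rows = [
    (Decimal("143500"), "Customer#000143500", Decimal("7012696.48")),
    (Decimal("95257"), "Customer#000095257", Decimal("6563511.23")),
]


def format_row(rank, row):
    """Render one (custkey, name, total) tuple as a numbered summary line."""
    _custkey, name, total = row
    # Decimal supports the ',' and '.2f' format specifiers without float rounding.
    return f"{rank}. {name} - Total Amount Spent: ${total:,.2f}"


lines = [format_row(rank, row) for rank, row in enumerate(rows, start=1)]
print("\n".join(lines))
```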


<generator object Pregel.stream at 0x1078ad500>

In [11]:
# Example of using the SQL agent to run a SQL query directly.
sample_sql_query = """
SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10
"""

sql_agent.stream(sample_sql_query)



SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10


It seems you have run the query again. The top 5 customers with the most purchases are:

1. Customer#000143500 - Total Purchase: $7,012,696.48
2. Customer#000095257 - Total Purchase: $6,563,511.23
3. Customer#000087115 - Total Purchase: $6,457,526.26
4. Customer#000131113 - Total Purchase: $6,311,428.86
5. Customer#000103834 - Total Purchase: $6,306,524.23

These customers have the highest total purchase amounts.


<generator object Pregel.stream at 0x1379dd820>