In [1]:
from aita.agent.base import AitaAgent
from aita.agent.sql import SqlAgent

## Ensure the OpenAI API key is set in the environment
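A minimal pre-flight check you could run first. It assumes the agent picks up the key from `OPENAI_API_KEY`, the variable the OpenAI client libraries read by default; the notebook itself does not name the variable, and `require_api_key` is a hypothetical helper, not part of `aita`.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var_name`, or fail early with a clear message."""
    # Treat whitespace-only values the same as a missing variable.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before creating an agent.")
    return key
```

Calling `require_api_key()` before `AitaAgent("gpt-3.5-turbo")` turns a missing key into an immediate, readable error instead of a failure on the first model call.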

In [2]:
# Create a simple AitaAgent with the GPT-3.5-turbo model
aita_agent = AitaAgent("gpt-3.5-turbo")
aita_agent.stream("I want to get the top 5 customers which making the most purchases")

To find the top 5 customers who have made the most purchases, you can follow these steps:

1. **Aggregate the data**: Start by aggregating your data to get the total number of purchases made by each customer.

2. **Rank the customers**: Once you have the total number of purchases for each customer, you can rank the customers based on this metric.

3. **Select the top 5 customers**: Finally, you can select the top 5 customers with the highest number of purchases.

Here is an example query in SQL to achieve this:

```sql
SELECT customer_id, COUNT(*) AS total_purchases
FROM purchases
GROUP BY customer_id
ORDER BY total_purchases DESC
LIMIT 5;
```

In this query:
- `purchases` is the table where your purchase data is stored.
- `customer_id` is the column that identifies each customer.
- We are counting the number of purchases made by each customer and then ordering the results in descending order.
- Finally, we limit the results to the top 5 customers.

You can run this query on your datab

<generator object RunnableSequence.stream at 0x1756c8d60>
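To see what the suggested query actually returns, here is a minimal sketch that runs it against an in-memory SQLite database. The `purchases` rows are made up for illustration; only the table and `customer_id` column names come from the answer above. One change is deliberate: `customer_id` is added as a secondary sort key, because without it the rows that tie at the `LIMIT 5` cutoff come back in an arbitrary order.

```python
import sqlite3

# Hypothetical toy data: one row per purchase, identified by customer_id.
customer_ids = [1, 1, 2, 3, 3, 3, 4, 5, 5, 6]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER)")
conn.executemany(
    "INSERT INTO purchases (customer_id) VALUES (?)",
    [(c,) for c in customer_ids],
)

# Count purchases per customer, sort by that count (ties broken by id), keep 5.
top5 = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS total_purchases
    FROM purchases
    GROUP BY customer_id
    ORDER BY total_purchases DESC, customer_id
    LIMIT 5
    """
).fetchall()
print(top5)  # [(3, 3), (1, 2), (5, 2), (2, 1), (4, 1)]
```

Customer 6 also has one purchase but is cut by the limit; with the tie-breaker the cut is at least reproducible.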

In [3]:
# Example of setting a custom context prompt (here, the schema to target)
aita_agent = AitaAgent("gpt-3.5-turbo").set_context_prompt(
    "Data Schema: snowflake_sample_data.tpch_sf1"
)
aita_agent.stream("I want to get the top 5 customers which making the most purchases")

To get the top 5 customers who have made the most purchases, you would typically need access to a dataset that includes information about customer purchases. The steps to achieve this would involve using data analysis tools like Python or SQL to query and analyze the data. Here is a general approach you can take:

1. **Load the Data**: If the data is stored in a database, you can use SQL to extract the relevant information. If the data is in a CSV file or another format, you can use Python libraries like Pandas to load the data into a DataFrame.

2. **Aggregate the Data**: You will need to aggregate the data to calculate the total number of purchases made by each customer. This can be done using SQL queries or Pandas functions like groupby.

3. **Rank the Customers**: Once you have the total number of purchases for each customer, you can rank the customers based on this metric. In SQL, you can use the RANK() or ROW_NUMBER() functions. In Python, you can use the rank() method in Pandas.
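The answer mentions both `RANK()` and `ROW_NUMBER()`; they only differ when customers tie. A small SQLite sketch with hypothetical per-customer totals (window functions need SQLite 3.25 or newer, which current Python builds bundle):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (customer_id INTEGER, total_purchases INTEGER)")
# Hypothetical totals: customers 2 and 3 are tied on 7 purchases.
conn.executemany(
    "INSERT INTO totals VALUES (?, ?)",
    [(1, 9), (2, 7), (3, 7), (4, 5)],
)

ranked = conn.execute(
    """
    SELECT customer_id,
           RANK()       OVER (ORDER BY total_purchases DESC)              AS rnk,
           ROW_NUMBER() OVER (ORDER BY total_purchases DESC, customer_id) AS row_num
    FROM totals
    ORDER BY row_num
    """
).fetchall()
print(ranked)  # [(1, 1, 1), (2, 2, 2), (3, 2, 3), (4, 4, 4)]
```

`RANK()` gives tied customers the same rank and then skips a number, so filtering on `rnk <= 5` can return more than five rows; `ROW_NUMBER()` always numbers rows consecutively, so `row_num <= 5` returns at most five.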

## SqlAgent example

In [1]:
import os
from aita.agent.base import AitaAgent
from aita.datasource.snowflake import SnowflakeDataSource

# Read the Snowflake connection settings from environment variables
SNOWFLAKE_USER = os.environ.get("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.environ.get("SNOWFLAKE_PASSWORD")
SNOWFLAKE_ACCOUNT = os.environ.get("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_WAREHOUSE = os.environ.get("SNOWFLAKE_WAREHOUSE")
SNOWFLAKE_DATABASE = os.environ.get("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.environ.get("SNOWFLAKE_SCHEMA")
SNOWFLAKE_ROLE = os.environ.get("SNOWFLAKE_ROLE")
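`os.environ.get` silently returns `None` for unset variables, so a typo in the shell setup only surfaces later as a connection error. A minimal sketch of an up-front check, using the same variable names as the cell above; `missing_env_vars` and `check_snowflake_env` are hypothetical helpers, and which variables you treat as required depends on your setup.

```python
import os
from typing import Iterable, List, Mapping

# The variable names read in the cell above.
SNOWFLAKE_VARS = (
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_ROLE",
)


def missing_env_vars(names: Iterable[str], environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


def check_snowflake_env(environ: Mapping[str, str] = os.environ) -> None:
    """Raise one error listing every missing Snowflake setting."""
    missing = missing_env_vars(SNOWFLAKE_VARS, environ)
    if missing:
        raise RuntimeError("Missing Snowflake settings: " + ", ".join(missing))
```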

In [2]:
sf_datasource = SnowflakeDataSource(
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    account=SNOWFLAKE_ACCOUNT,
    warehouse=SNOWFLAKE_WAREHOUSE,
    database=SNOWFLAKE_DATABASE,
    schema=SNOWFLAKE_SCHEMA,
    role=SNOWFLAKE_ROLE or "ACCOUNTADMIN",  # use SNOWFLAKE_ROLE if set, else fall back to ACCOUNTADMIN
)

In [4]:
# Example of attaching a Snowflake data source to the plain AitaAgent.
# No data catalog is provided to this agent.
aita_agent = (
    AitaAgent("gpt-3.5-turbo")
    .set_context_prompt("Data Schema: snowflake_sample_data.tpch_sf1")
    .add_datasource(sf_datasource)
)

In [12]:
aita_agent.stream("I want to get the top 5 customers which making the most purchases", display=True)

To get the top 5 customers who have made the most purchases from the `snowflake_sample_data.tpch_sf1` dataset, you can use the following SQL query:

```sql
SELECT c.c_custkey, c.c_name, COUNT(o.o_orderkey) AS total_purchases
FROM snowflake_sample_data.tpch_sf1.customer c
JOIN snowflake_sample_data.tpch_sf1.orders o ON c.c_custkey = o.o_custkey
GROUP BY c.c_custkey, c.c_name
ORDER BY total_purchases DESC
LIMIT 5;
```

In this query:
- We are selecting the customer key (`c_custkey`), customer name (`c_name`), and counting the number of orders for each customer as `total_purchases`.
- We are joining the `customer` and `orders` tables on the `c_custkey` (customer key) and `o_custkey` (customer key in orders table).
- We are grouping the results by customer key and customer name.
- We are ordering the results by the total number of purchases in descending order.
- Finally, we are limiting the output to the top 5 customers with the most purchases.

You can run this query in your Snowflake en

<generator object RunnableSequence.stream at 0x17ec922f0>
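The query above joins `customer` to `orders` and counts orders per customer. A minimal sketch of the same query on a toy in-memory SQLite copy of those two tables; the column names come from the answer, while the rows are hypothetical and only a handful of the real TPC-H columns are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
    CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
    """
)
# Hypothetical rows; customer 4 has no orders at all.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Customer#1"), (2, "Customer#2"), (3, "Customer#3"), (4, "Customer#4")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(10, 1), (11, 2), (12, 2), (13, 2), (14, 3), (15, 3)],
)

# Same structure as the generated query: inner join, group, count, sort, limit.
rows = conn.execute(
    """
    SELECT c.c_custkey, c.c_name, COUNT(o.o_orderkey) AS total_purchases
    FROM customer c
    JOIN orders o ON c.c_custkey = o.o_custkey
    GROUP BY c.c_custkey, c.c_name
    ORDER BY total_purchases DESC
    LIMIT 5
    """
).fetchall()
print(rows)  # [(2, 'Customer#2', 3), (3, 'Customer#3', 2), (1, 'Customer#1', 1)]
```

Customer 4 does not appear: the inner `JOIN` drops customers without orders. A `LEFT JOIN` would keep them with a count of 0, which matters only if fewer than five customers have placed orders.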

In [3]:
# Basic example of using SqlAgent. Unlike the plain AitaAgent, it is given the data catalog.
from aita.agent.sql import SqlAgent
from aita.prompt.base import BasicContextPromptTemplate

sql_agent = (
    SqlAgent("gpt-3.5-turbo")
    .set_context_prompt(BasicContextPromptTemplate)
    .add_datasource(sf_datasource)
)

In [4]:
sql_agent.stream("I want to get the top 5 customers which making the most purchases", display=True)


I want to get the top 5 customers which making the most purchases


NotImplementedError: Unsupported message type: <class 'generator'>

In [10]:
# Continue the run and allow the agent to execute its pending tool call (the SQL query)
sql_agent.stream(allow_run_tool=True)

Name: sql_database_query

[(Decimal('143500'), 'Customer#000143500', Decimal('7012696.48')), (Decimal('95257'), 'Customer#000095257', Decimal('6563511.23')), (Decimal('87115'), 'Customer#000087115', Decimal('6457526.26')), (Decimal('131113'), 'Customer#000131113', Decimal('6311428.86')), (Decimal('103834'), 'Customer#000103834', Decimal('6306524.23'))]

The top 5 customers who have made the most purchases are as follows:

1. Customer#000143500 - Total Amount Spent: $7,012,696.48
2. Customer#000095257 - Total Amount Spent: $6,563,511.23
3. Customer#000087115 - Total Amount Spent: $6,457,526.26
4. Customer#000131113 - Total Amount Spent: $6,311,428.86
5. Customer#000103834 - Total Amount Spent: $6,306,524.23

These customers have spent the most on purchases.


<generator object Pregel.stream at 0x1078ad500>
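The `sql_database_query` tool returns plain tuples with `Decimal` values, and the model then rewrites them as a ranked list. If you only need the formatted list, that step does not require an LLM. A minimal sketch using the exact rows from the tool output above; `format_ranking` is a hypothetical helper, not part of `aita`.

```python
from decimal import Decimal

# Rows exactly as returned by the sql_database_query tool call above:
# (customer key, customer name, total amount spent).
rows = [
    (Decimal("143500"), "Customer#000143500", Decimal("7012696.48")),
    (Decimal("95257"), "Customer#000095257", Decimal("6563511.23")),
    (Decimal("87115"), "Customer#000087115", Decimal("6457526.26")),
    (Decimal("131113"), "Customer#000131113", Decimal("6311428.86")),
    (Decimal("103834"), "Customer#000103834", Decimal("6306524.23")),
]


def format_ranking(rows):
    """Turn (key, name, amount) tuples into numbered, currency-formatted lines."""
    # Decimal supports the ',' thousands separator and fixed-point precision.
    return [
        f"{i}. {name} - Total Amount Spent: ${amount:,.2f}"
        for i, (_key, name, amount) in enumerate(rows, start=1)
    ]


for line in format_ranking(rows):
    print(line)
```

Keeping the amounts as `Decimal` until the final formatting step avoids floating-point rounding on currency values.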

In [11]:
# Example of using the SQL agent to run a SQL query directly.
sample_sql_query = """
SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10
"""

sql_agent.stream(sample_sql_query)



SELECT c_custkey, c_name, SUM(o_totalprice) AS total_purchase
FROM snowflake_sample_data.tpch_sf1.customer
JOIN snowflake_sample_data.tpch_sf1.orders
ON c_custkey = o_custkey
GROUP BY c_custkey, c_name
ORDER BY total_purchase
DESC LIMIT 10


It seems you have run the query again. The top 5 customers with the most purchases are:

1. Customer#000143500 - Total Purchase: $7,012,696.48
2. Customer#000095257 - Total Purchase: $6,563,511.23
3. Customer#000087115 - Total Purchase: $6,457,526.26
4. Customer#000131113 - Total Purchase: $6,311,428.86
5. Customer#000103834 - Total Purchase: $6,306,524.23

These customers have the highest total purchase amounts.


<generator object Pregel.stream at 0x1379dd820>
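"Most purchases" is ambiguous. The AitaAgent query earlier ranks customers by number of orders (`COUNT(o.o_orderkey)`), while the SqlAgent answer and the query above rank them by total spend (`SUM(o_totalprice)`). A toy SQLite sketch with hypothetical orders shows that the two metrics can pick different customers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER, o_totalprice REAL)"
)
# Hypothetical orders: customer 1 places three small orders, customer 2 one large order.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 15.0), (3, 1, 20.0), (4, 2, 500.0)],
)

# Top customer by number of orders.
by_count = conn.execute(
    "SELECT o_custkey FROM orders GROUP BY o_custkey "
    "ORDER BY COUNT(o_orderkey) DESC LIMIT 1"
).fetchone()[0]

# Top customer by total amount spent.
by_spend = conn.execute(
    "SELECT o_custkey FROM orders GROUP BY o_custkey "
    "ORDER BY SUM(o_totalprice) DESC LIMIT 1"
).fetchone()[0]

print(by_count, by_spend)  # 1 2
```

Stating the metric explicitly in the prompt (for example "by total order value") removes the ambiguity for either agent.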