# Window Example with the Spark SQL ReAct Agent

This example is taken from the Databricks blog post [Introducing Window Functions in Spark SQL](https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html).

In [1]:
from pyspark_ai import SparkAI

spark_ai = SparkAI(verbose=True)
spark_ai.activate()

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/09 14:45:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


In [2]:
spark = spark_ai._spark
# Create the productRevenue DataFrame
df = spark.createDataFrame(
    [
        ("Thin", "Cellphone", 6000),
        ("Normal", "Tablet", 1500),
        ("Mini", "Tablet", 5500),
        ("Ultra thin", "Cellphone", 5000),
        ("Very thin", "Cellphone", 6000),
        ("Big", "Tablet", 2500),
        ("Bendable", "Cellphone", 3000),
        ("Foldable", "Cellphone", 3000),
        ("Pro", "Tablet", 4500),
        ("Pro2", "Tablet", 6500)
    ],
    ["product", "category", "revenue"]
)

In [3]:
df.ai.transform("What are the best-selling and the second best-selling products in every category?").show()

INFO: Creating temp view for the transform:
df.createOrReplaceTempView("spark_ai_temp_view_85b651")



> Entering new chain...
Thought: I will use the ROW_NUMBER() function to rank the products in each category by revenue. Then, I will filter the results to only include the top 2 products in each category.
Action: query_validation
Action Input: 
```
WITH ranked_products AS (
  SELECT 
    product, 
    category, 
    revenue, 
    ROW_NUMBER() OVER(PARTITION BY category ORDER BY revenue DESC) as rank
  FROM spark_ai_temp_view_85b651
)
SELECT product, category, revenue
FROM ranked_products
WHERE rank <= 2
```
Observation: OK
Thought: I now know the final answer.
Final Answer: 
```
WITH ranked_products AS (
  SELECT 
    product, 
    category, 
    revenue, 
    ROW_NUMBER() OVER(PARTITION BY category ORDER BY revenue DESC) as rank
  FROM spark_ai_temp_view_85b
```

+---------+---------+-------+
|  product| category|revenue|
+---------+---------+-------+
|     Thin|Cellphone|   6000|
|Very thin|Cellphone|   6000|
|     Pro2|   Tablet|   6500|
|     Mini|   Tablet|   5500|
+---------+---------+-------+
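
The agent ranked products with `ROW_NUMBER()`, which assigns distinct numbers even when revenues tie, so the top two rows per category may not match the top two revenue levels. The sketch below is an illustration, not part of the original notebook: it runs the same query shape against SQLite through Python's standard `sqlite3` module instead of Spark (assuming the bundled SQLite is 3.25 or newer, which is where window functions were added). The table name `product_revenue`, the alias `rnk`, and the helper `top_two` are hypothetical names chosen for this example.

```python
import sqlite3

# The ten rows from the notebook's DataFrame.
rows = [
    ("Thin", "Cellphone", 6000),
    ("Normal", "Tablet", 1500),
    ("Mini", "Tablet", 5500),
    ("Ultra thin", "Cellphone", 5000),
    ("Very thin", "Cellphone", 6000),
    ("Big", "Tablet", 2500),
    ("Bendable", "Cellphone", 3000),
    ("Foldable", "Cellphone", 3000),
    ("Pro", "Tablet", 4500),
    ("Pro2", "Tablet", 6500),
]

# Assumption: SQLite stands in for Spark SQL here; only the ranking logic is compared.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_revenue (product TEXT, category TEXT, revenue INTEGER)"
)
conn.executemany("INSERT INTO product_revenue VALUES (?, ?, ?)", rows)


def top_two(rank_fn):
    """Return rows whose per-category rank is at most 2 under the given ranking function."""
    query = f"""
        WITH ranked_products AS (
          SELECT product, category, revenue,
                 {rank_fn}() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
          FROM product_revenue
        )
        SELECT product, category, revenue
        FROM ranked_products
        WHERE rnk <= 2
        ORDER BY category, revenue DESC, product
    """
    return conn.execute(query).fetchall()


row_number_result = top_two("ROW_NUMBER")   # one row per rank position
dense_rank_result = top_two("DENSE_RANK")   # tied revenues share a rank

for label, result in [("ROW_NUMBER", row_number_result), ("DENSE_RANK", dense_rank_result)]:
    print(label)
    for row in result:
        print("  ", row)
```

With this data, `ROW_NUMBER` returns the same four products shown in the table above, while `DENSE_RANK` also returns the 5000-revenue cellphone, because the two 6000-revenue cellphones share rank 1. Which behavior is wanted depends on whether "second best-selling" means the second row or the second-highest revenue.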



                                                                                

In [4]:
df.ai.transform("What is the difference between the revenue of each product and the revenue of the best-selling product in the same category of that product?").show()

INFO: Creating temp view for the transform:
df.createOrReplaceTempView("spark_ai_temp_view_5248a4")



> Entering new chain...
Thought: I need to find the maximum revenue for each category first, then subtract the revenue of each product from the maximum revenue of its category.
Action: query_validation
Action Input: 
```
WITH max_revenue AS (
  SELECT category, MAX(revenue) as max_revenue
  FROM spark_ai_temp_view_5248a4
  GROUP BY category
)
SELECT a.product, a.category, a.revenue, b.max_revenue - a.revenue as revenue_difference
FROM spark_ai_temp_view_5248a4 a
JOIN max_revenue b
ON a.category = b.category
```
Observation: OK
Thought: I now know the final answer.
Final Answer: 
```
WITH max_revenue AS (
  SELECT category, MAX(revenue) as max_revenue
  FROM spark_ai_temp_view_5248a4
  GROUP BY category
)
SELECT a.product, a.category, a.revenue, b.max_revenue 