## Spark SQL Features Demonstration

This notebook demonstrates key Spark SQL features with examples:
- Creating, using, and dropping Views
- Queries with and without CTEs (Common Table Expressions)
- Customers without orders (using Join vs SubQuery)
- CTAS (Create Table As Select)

### 1. Create a View
We create a temporary view to simplify later queries. A view acts like a virtual table defined by a query and stores no data of its own; a `TEMP VIEW` exists only for the current Spark session.

In [None]:
spark.sql("""
CREATE OR REPLACE TEMP VIEW completed_orders AS
SELECT * FROM orders WHERE order_status IN ('COMPLETE','CLOSED')
""")

### 2. Use a View
Query the view instead of the base table. The status filter is already part of the view definition, so it does not need to be repeated here.

In [None]:
spark.sql("SELECT order_id, order_date FROM completed_orders LIMIT 10").show()

### 3. Drop a View
Drop the view when it is no longer needed. `IF EXISTS` prevents an error if the view has already been removed.

In [None]:
spark.sql("DROP VIEW IF EXISTS completed_orders")
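
The create, query, and drop steps above can be sketched end to end on a few rows. This is a minimal sketch under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in engine so that it runs without a Spark session, the sample rows are hypothetical, and SQLite has no `CREATE OR REPLACE`, so a `DROP VIEW IF EXISTS` followed by `CREATE TEMP VIEW` plays that role.

```python
import sqlite3

# Hypothetical sample rows; the real `orders` table is not shown in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, order_status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2013-07-25 00:00:00", "COMPLETE"),
        (2, "2013-07-25 00:00:00", "PENDING"),
        (3, "2013-07-26 00:00:00", "CLOSED"),
    ],
)

# Stand-in for Spark's CREATE OR REPLACE TEMP VIEW (assumption: SQLite syntax).
conn.execute("DROP VIEW IF EXISTS completed_orders")
conn.execute("""
CREATE TEMP VIEW completed_orders AS
SELECT * FROM orders WHERE order_status IN ('COMPLETE','CLOSED')
""")

# Query the view: only COMPLETE and CLOSED orders are visible.
view_rows = conn.execute(
    "SELECT order_id, order_status FROM completed_orders ORDER BY order_id"
).fetchall()
print(view_rows)  # [(1, 'COMPLETE'), (3, 'CLOSED')]

# The view stores the query, not the data, so a new base row shows up in it.
conn.execute("INSERT INTO orders VALUES (4, '2013-07-27 00:00:00', 'COMPLETE')")
count_after_insert = conn.execute("SELECT COUNT(*) FROM completed_orders").fetchone()[0]
print(count_after_insert)  # 3

# After dropping the view, querying it fails.
conn.execute("DROP VIEW IF EXISTS completed_orders")
try:
    conn.execute("SELECT * FROM completed_orders")
    view_exists = True
except sqlite3.OperationalError:
    view_exists = False
print(view_exists)  # False
```

In this sketch the failure after the drop surfaces as `sqlite3.OperationalError`; the exception type raised by Spark for a missing view is different.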

### 4. Query Without CTE
Calculate daily revenue for completed and closed orders in a single query, without a CTE.

In [None]:
spark.sql("""
SELECT to_date(o.order_date) AS order_date,
       ROUND(SUM(oi.order_item_subtotal),2) AS daily_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE','CLOSED')
GROUP BY to_date(o.order_date)
ORDER BY order_date
""").show(10, truncate=False)

### 5. Query With CTE
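The same calculation, but with the aggregation moved into a CTE for readability. A CTE is visible only within the statement that defines it, where it can be referenced by name more than once.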

**CTE (Common Table Expression):** A temporary result set defined within the query using the `WITH` clause.

In [None]:
spark.sql("""
WITH revenue_cte AS (
  SELECT to_date(o.order_date) AS order_date,
         SUM(oi.order_item_subtotal) AS revenue
  FROM orders o
  JOIN order_items oi ON o.order_id = oi.order_item_order_id
  WHERE o.order_status IN ('COMPLETE','CLOSED')
  GROUP BY to_date(o.order_date)
)
SELECT order_date, ROUND(revenue,2) AS daily_revenue
FROM revenue_cte
ORDER BY order_date
""").show(10, truncate=False)

### 6. Customers Without Orders (Join Method)
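Find customers who have not placed any orders, using a LEFT JOIN. Customers with no matching order get NULL in every `orders` column, so filtering on `o.order_id IS NULL` keeps only those customers.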

In [None]:
spark.sql("""
SELECT c.customer_id, c.customer_fname, c.customer_lname
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.order_customer_id
WHERE o.order_id IS NULL
LIMIT 10
""").show()

### 7. Customers Without Orders (Subquery Method)
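Find customers without orders using a `NOT IN` subquery. The subquery filters out NULL keys, because `NOT IN` returns no rows at all if the subquery yields even one NULL.

In [None]:
spark.sql("""
SELECT customer_id, customer_fname, customer_lname
FROM customers
WHERE customer_id NOT IN (
    SELECT DISTINCT order_customer_id FROM orders
    WHERE order_customer_id IS NOT NULL
)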
LIMIT 10
""").show()