## Performance Optimization
## Analyze Query Plans

Use `EXPLAIN` to see how Spark executes your query internally and to identify performance bottlenecks such as:

- Full table scans
- Skewed joins
- Unnecessary shuffles

Analyzing the query plan lets you fix these issues on a small workload, before they inflate compute costs at scale.
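
Of the bottlenecks listed above, skew can be checked roughly even before reading a plan. The following is a minimal sketch in plain Python, not a Spark API: the `skew_ratio` helper and the sample keys are hypothetical and only illustrate the idea. It compares the largest key group against the average group size; a value far above 1 suggests that one join or grouping key dominates.

```python
from collections import Counter


def skew_ratio(keys):
    """Return the largest key-group size divided by the mean group size.

    A result close to 1.0 means the keys are evenly spread; larger values
    mean a single key holds a disproportionate share of the rows.
    """
    counts = Counter(keys)
    if not counts:
        return 0.0  # no rows, so nothing can be skewed
    mean_size = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean_size


# Hypothetical join keys: one category holds 8 of the 10 rows.
sample_keys = ["electronics.smartphone"] * 8 + [
    "appliances.kitchen",
    "computers.notebook",
]

print(round(skew_ratio(sample_keys), 2))  # 2.4
```

On a real table the same counts would come from a `GROUP BY` on the join key; the threshold at which skew becomes a problem depends on the workload.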

In [0]:
%sql
EXPLAIN FORMATTED
SELECT
  category_code,
  COUNT(*) AS total_events
FROM ecommerce_catalog.silver.events_nov_clean
WHERE event_time = '2024-10-15'
  AND event_type = 'purchase'
GROUP BY category_code;

## Partition Events Table by event_type

In [0]:
%sql
CREATE TABLE ecommerce_catalog.silver.events_delta_partitioned
USING DELTA
PARTITIONED BY (event_type)
AS
SELECT
  event_time,
  event_type,
  product_id,
  category_id,
  category_code,
  brand,
  price,
  user_id,
  user_session
FROM ecommerce_catalog.bronze.events_nov;


## Validate Partition Pruning

In [0]:
%sql
EXPLAIN FORMATTED
SELECT *
FROM ecommerce_catalog.silver.events_delta_partitioned
WHERE event_type = 'purchase';  -- filter on the partition column so pruning can apply
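
Partition pruning only applies when the filter is on the partition column. The sketch below is a plain-Python simulation under stated assumptions, not Delta Lake code: the sample rows and the `write_partitioned` and `scan` helpers are hypothetical, and the simulation ignores the file-level statistics that Delta can also use to skip data. It shows that a filter on `event_type` opens one partition, while a filter on `event_time` has to open all of them.

```python
from collections import defaultdict

# Hypothetical sample rows: (event_time, event_type, product_id)
events = [
    ("2024-10-15", "view", 1),
    ("2024-10-15", "purchase", 2),
    ("2024-10-16", "cart", 3),
    ("2024-10-16", "purchase", 4),
    ("2024-10-17", "view", 5),
]

EVENT_TIME, EVENT_TYPE = 0, 1  # column positions in each row


def write_partitioned(rows, partition_index):
    """Group rows by the partition column, like one directory per value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_index]].append(row)
    return dict(partitions)


def scan(partitions, partition_index, filter_index, value):
    """Return (matching rows, number of partitions that had to be read)."""
    if filter_index == partition_index:
        # Filter is on the partition column: only one partition is opened.
        selected = {value: partitions.get(value, [])}
    else:
        # Filter is on another column: every partition must be opened.
        selected = partitions
    rows = [
        row
        for part in selected.values()
        for row in part
        if row[filter_index] == value
    ]
    return rows, len(selected)


by_type = write_partitioned(events, EVENT_TYPE)

_, read_for_type = scan(by_type, EVENT_TYPE, EVENT_TYPE, "purchase")
_, read_for_time = scan(by_type, EVENT_TYPE, EVENT_TIME, "2024-10-15")

print(read_for_type)  # 1 partition read
print(read_for_time)  # 3 partitions read
```

In the `EXPLAIN FORMATTED` output, the equivalent signal is whether the scan node reports a partition filter on the partition column.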

## Apply ZORDER on High-Usage Columns
Z-order the columns that appear most often in filters:

- user_id → user behavior analysis
- product_id → product performance
- category_code → category analytics

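ZORDER colocates rows with similar values in the same files, which improves file-level data skipping inside each partition. Its effectiveness drops with each additional column, so the command below Z-orders by `user_id` and `product_id` only.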

In [0]:
%sql
OPTIMIZE ecommerce_catalog.silver.events_delta_partitioned
ZORDER BY (user_id, product_id);
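
To build intuition for why Z-ordering helps file skipping on more than one column, here is a minimal sketch of Z-order (Morton) bit interleaving in plain Python. It is an illustration under simplifying assumptions, not Delta Lake's implementation: the `interleave_bits` and `files_to_read` helpers, the 4x4 grid of ids, and the four-rows-per-file layout are all hypothetical.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


def files_to_read(rows, column, value, rows_per_file=4):
    """Count files whose min/max range on `column` could contain `value`."""
    files = [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]
    hits = 0
    for file_rows in files:
        values = [row[column] for row in file_rows]
        if min(values) <= value <= max(values):
            hits += 1
    return hits


USER_ID, PRODUCT_ID = 0, 1  # column positions in each row

# Hypothetical data: every (user_id, product_id) pair with ids 0..3.
grid = [(u, p) for u in range(4) for p in range(4)]

linear = sorted(grid)  # sorted by user_id, then product_id
zordered = sorted(grid, key=lambda row: interleave_bits(row[0], row[1]))

# Linear sort: great for user_id filters, useless for product_id filters.
print(files_to_read(linear, USER_ID, 2))       # 1
print(files_to_read(linear, PRODUCT_ID, 2))    # 4

# Z-order: both filters skip half of the files.
print(files_to_read(zordered, USER_ID, 2))     # 2
print(files_to_read(zordered, PRODUCT_ID, 2))  # 2
```

The trade-off visible here is the reason to keep the ZORDER column list short: each added column spreads the locality more thinly, so no single filter skips as many files.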