<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/7_Basic_Query_Optimization/2_Basic_Optimization.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Basic Optimization

## Overview

### 🥅 Analysis Goals

- Apply basic query optimization tips to a customer revenue query built on the `sales` and `customer` tables, focusing on:
    - Choosing the right join type
    - Filtering rows as early as possible
    - Selecting only the columns that are needed
- The end goal is a query that returns the same per-customer results while running more efficiently.

### 📘 Concepts Covered

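General concepts we’re going to cover

- Join selection (`INNER JOIN` vs. `LEFT JOIN`)
- Early filtering (`WHERE` vs. `HAVING`)
- Set operations and subqueries (`UNION ALL`, `IN`, `EXISTS`)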

---
## Major Topic

### 📝 Notes

- Query optimization improves SQL performance by reducing execution time and resource usage.

**Basic Optimization Tips**  
- Use `INNER JOIN` instead of `LEFT JOIN` when unmatched rows aren’t needed.  
- Filter early using `WHERE`, not `HAVING`, to reduce processed rows.  
- Avoid `SELECT *`; select only the required columns.  
- Use `UNION ALL` instead of `UNION` when duplicates are acceptable, since `UNION` does extra work to remove them.  
- Pre-filter data before `GROUP BY` and `DISTINCT` to avoid unnecessary calculations.  
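- Replace chains of `OR` conditions on the same column with `IN`, which is easier to read and can help the planner use an index.  
- Consider `EXISTS` instead of `IN` for subqueries on large datasets, since `EXISTS` can stop at the first matching row.  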
- Ensure data types match in comparisons to prevent slow implicit conversions.  
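
A few of the tips above can be sketched against the `sales` and `customer` tables used later in this notebook. This is a minimal illustration, not a benchmark: the customer keys in the filter are hypothetical values, and some planners may already rewrite the slower forms automatically, so the actual gain depends on the database.

```sql
-- Filter early: HAVING is applied after grouping, so every customer is grouped first.
SELECT
    customerkey,
    SUM(quantity * netprice * exchangerate) AS net_revenue
FROM sales
GROUP BY customerkey
HAVING customerkey IN (1, 2, 3);  -- hypothetical keys, for illustration only

-- Same result, but WHERE removes rows before they are grouped.
-- IN also replaces: customerkey = 1 OR customerkey = 2 OR customerkey = 3
SELECT
    customerkey,
    SUM(quantity * netprice * exchangerate) AS net_revenue
FROM sales
WHERE customerkey IN (1, 2, 3)
GROUP BY customerkey;

-- EXISTS: list customers with at least one sale.
-- The subquery only needs to find one matching row per customer.
SELECT c.customerkey
FROM customer c
WHERE EXISTS (
    SELECT 1
    FROM sales s
    WHERE s.customerkey = c.customerkey
);
```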

### 💻 Final Result

- Optimize the query to run more efficiently. 

#### Query Optimization

**Basic Optimization**

1. Start from the query below. The `sales_data` CTE aggregates `sales` per customer (cohort year, net revenue, and number of orders), and the outer query joins that result to `customer`.

In [None]:
WITH sales_data AS (
    SELECT
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        SUM(quantity * netprice * exchangerate) AS net_revenue,
        COUNT(orderkey) AS num_orders
    FROM sales
    GROUP BY customerkey
)
SELECT
    c.customerkey,
    s.cohort_year,
    CONCAT(TRIM(c.givenname), ' ', TRIM(c.surname)) AS cleaned_name,
    COALESCE(s.net_revenue, 0) AS net_revenue,
    COALESCE(s.num_orders, 0) AS total_orders,
    s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value
FROM customer c
LEFT JOIN sales_data s ON c.customerkey = s.customerkey;
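
One way to check whether a change actually helps is to compare execution plans. The sketch below assumes a PostgreSQL database, where `EXPLAIN ANALYZE` runs the query and reports the plan with timings. It also assumes that customers without any sales are not needed in the result; only under that assumption can the `LEFT JOIN` become an `INNER JOIN`, and the `COALESCE` defaults for unmatched customers are then no longer required.

```sql
-- Assumption: rows for customers with no sales are not needed.
-- INNER JOIN drops those customers, so the output differs from the LEFT JOIN version.
EXPLAIN ANALYZE
WITH sales_data AS (
    SELECT
        customerkey,
        EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
        SUM(quantity * netprice * exchangerate) AS net_revenue,
        COUNT(orderkey) AS num_orders
    FROM sales
    GROUP BY customerkey
)
SELECT
    c.customerkey,
    s.cohort_year,
    CONCAT(TRIM(c.givenname), ' ', TRIM(c.surname)) AS cleaned_name,
    s.net_revenue,
    s.num_orders AS total_orders,
    s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value
FROM customer c
INNER JOIN sales_data s ON c.customerkey = s.customerkey;
```

Running `EXPLAIN ANALYZE` on the original `LEFT JOIN` query as well gives a baseline to compare planning and execution times against. If every customer must appear in the output, keep the `LEFT JOIN` and the `COALESCE` calls.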
