# 07 — Subqueries & CTEs

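In hotel tech, you're constantly answering questions like *"which bookings are above average?"* or *"how does this segment compare to the whole?"*. Questions like these compare each row against an aggregate, which means nesting one query inside another or naming the intermediate result with a CTE.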

**What You'll Practice:**
- Subqueries in **SELECT** (scalar comparisons)
- Subqueries in **FROM** (derived tables)
- Subqueries in **WHERE** (filtering with `ANY`, `ALL`, `EXISTS`)
- CTEs — single and chained
- Correlated subqueries

---

### Quick Reference

```sql
-- Subquery in SELECT (scalar — must return one value)
SELECT col, (SELECT AVG(x) FROM t) AS avg_x FROM t;

-- Subquery in FROM (derived table — needs alias)
SELECT * FROM (SELECT ... ) AS sub;

-- Subquery in WHERE
WHERE col > ALL (SELECT col FROM t WHERE ...)    -- greater than every value
WHERE col > ANY (SELECT col FROM t WHERE ...)    -- greater than at least one
WHERE EXISTS (SELECT 1 FROM t WHERE t.id = o.id) -- true if any row matches (o = alias from the outer query)

-- CTE (readable alternative to subqueries)
WITH cte AS (SELECT ...) SELECT * FROM cte;
```
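
Correlated subqueries are on the practice list but not in the reference above, so here is a minimal sketch. It runs on inline toy data; the `rooms` table and its values are made up for illustration and are not part of `hotel_bookings`.

```sql
-- Toy data only: "rooms" and its values are hypothetical.
WITH rooms(hotel, adr) AS (
    VALUES ('City', 100.0), ('City', 140.0), ('Resort', 80.0), ('Resort', 60.0)
)
SELECT
    r1.hotel,
    r1.adr,
    -- Correlated: the inner query references r1.hotel from the outer row,
    -- so it is conceptually evaluated once per outer row.
    (SELECT AVG(r2.adr) FROM rooms r2 WHERE r2.hotel = r1.hotel) AS hotel_avg
FROM rooms r1
ORDER BY r1.hotel, r1.adr;
```

Each City row is paired with the City average (120) and each Resort row with the Resort average (70). Without the `r2.hotel = r1.hotel` condition the subquery would return the same overall average for every row.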

In [1]:
%load_ext sql
%sql postgresql://admin:password@postgres:5432/mastery_db

---
## Section A — Subqueries in SELECT

---

### Quiz 1 — ADR vs Hotel Average

> **From: Revenue Analyst**  
> *For every booking, I need to see the ADR (average daily rate) alongside the overall average ADR, and the difference between the two. That way I can flag bookings that are significantly above or below average. Don't use a JOIN; use a subquery in the SELECT clause.*

**Skills:** Scalar subquery in SELECT

In [2]:
%%sql
SELECT
    hotel, country, adr,
    ROUND((SELECT AVG(adr) FROM hotel_bookings WHERE adr > 0)::numeric, 2) AS overall_avg,
    ROUND((adr - (SELECT AVG(adr) FROM hotel_bookings WHERE adr > 0))::numeric, 2) AS diff_from_avg
FROM hotel_bookings
WHERE adr > 0
ORDER BY diff_from_avg DESC
LIMIT 15


| hotel | country | adr | overall_avg | diff_from_avg |
|---|---|---|---|---|
| City Hotel | PRT | 5400.0 | 103.53 | 5296.47 |
| City Hotel | ITA | 510.0 | 103.53 | 406.47 |
| Resort Hotel | PRT | 508.0 | 103.53 | 404.47 |
| City Hotel | PRT | 451.5 | 103.53 | 347.97 |
| Resort Hotel | PRT | 450.0 | 103.53 | 346.47 |
| Resort Hotel | PRT | 437.0 | 103.53 | 333.47 |
| Resort Hotel | PRT | 426.25 | 103.53 | 322.72 |
| Resort Hotel | ESP | 402.0 | 103.53 | 298.47 |
| Resort Hotel | MAR | 397.38 | 103.53 | 293.85 |
| Resort Hotel | PRT | 392.0 | 103.53 | 288.47 |


<details><summary>Hint</summary>

```sql
SELECT
    hotel, country, adr,
    ROUND((SELECT AVG(adr) FROM hotel_bookings WHERE adr > 0)::numeric, 2) AS overall_avg,
    ROUND((adr - (SELECT AVG(adr) FROM hotel_bookings WHERE adr > 0))::numeric, 2) AS diff_from_avg
FROM hotel_bookings
WHERE adr > 0
ORDER BY diff_from_avg DESC
LIMIT 15;
```
</details>

---

### Quiz 2 — Each Booking vs Its Hotel's Average

> **From: Revenue Analyst**  
> *Actually, comparing against the OVERALL average isn't fair — Resort and City hotels have very different pricing. Can you compare each booking's ADR against the average ADR for THAT hotel type specifically? Use a correlated subquery.*

**Skills:** Correlated subquery (references outer query)

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    hotel, country, market_segment, adr,
    ROUND((
        SELECT AVG(b2.adr)
        FROM hotel_bookings b2
        WHERE b2.hotel = b1.hotel AND b2.adr > 0
    )::numeric, 2) AS hotel_avg_adr,
    ROUND((adr - (
        SELECT AVG(b2.adr)
        FROM hotel_bookings b2
        WHERE b2.hotel = b1.hotel AND b2.adr > 0
    ))::numeric, 2) AS diff
FROM hotel_bookings b1
WHERE adr > 0
ORDER BY diff DESC
LIMIT 15;
```
</details>

---
## Section B — Subqueries in FROM

---

### Quiz 3 — Market Segment Report with Totals

> **From: Commercial Director**  
> *I need each market segment with its total bookings AND the number of distinct countries it brings in. Use a subquery in FROM to get the country count, then JOIN it with the booking count.*

**Skills:** Derived table (subquery in FROM), JOIN

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    bc.market_segment,
    bc.total_bookings,
    cc.distinct_countries
FROM
    (SELECT market_segment, COUNT(*) AS total_bookings
     FROM hotel_bookings
     GROUP BY market_segment) bc
LEFT JOIN
    (SELECT market_segment, COUNT(DISTINCT country) AS distinct_countries
     FROM hotel_bookings
     WHERE country IS NOT NULL
     GROUP BY market_segment) cc
ON bc.market_segment = cc.market_segment
ORDER BY bc.total_bookings DESC;
```
</details>

---

### Quiz 4 — High-Value Bookings per Hotel

> **From: GM**  
> *For each hotel, find the total revenue (SUM of ADR) from bookings where the stay is longer than 7 nights. Then tell me what percentage of total revenue these long stays represent. Use subqueries in FROM.*

**Skills:** Two derived tables joined together

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    t.hotel,
    ROUND(t.total_rev::numeric, 2) AS total_revenue,
    ROUND(ls.long_stay_rev::numeric, 2) AS long_stay_revenue,
    ROUND((ls.long_stay_rev / t.total_rev * 100)::numeric, 2) AS pct_from_long_stays
FROM
    (SELECT hotel, SUM(adr) AS total_rev
     FROM hotel_bookings WHERE adr > 0
     GROUP BY hotel) t
JOIN
    (SELECT hotel, SUM(adr) AS long_stay_rev
     FROM hotel_bookings
     WHERE adr > 0 AND (stays_in_weekend_nights + stays_in_week_nights) > 7
     GROUP BY hotel) ls
ON t.hotel = ls.hotel;
```
</details>

---
## Section C — Subqueries in WHERE (ANY, ALL, EXISTS)

---

### Quiz 5 — Cheaper Than All Direct Bookings

> **From: OTA Manager**  
> *We promised our OTA partners that their guests wouldn't pay more than the cheapest direct booking rate. Find all bookings from 'Online TA' where the ADR is less than ALL direct booking ADRs for the same hotel. Use the `ALL` keyword.*

**Skills:** Subquery with ALL
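
Before writing the query, it may help to see how `< ALL` behaves on a tiny example. The sketch below uses inline toy data; the `direct` and `ota` names and all rates are made up for illustration.

```sql
-- Toy data only: "direct", "ota", and the rates are hypothetical.
WITH direct(adr) AS (VALUES (90.0), (120.0), (150.0)),
     ota(adr)    AS (VALUES (75.0), (95.0), (160.0))
SELECT
    o.adr,
    -- True only if o.adr is below every direct rate
    o.adr < ALL (SELECT d.adr FROM direct d)  AS below_all_direct,
    -- Same answer here, because the direct set is non-empty and has no NULLs
    o.adr < (SELECT MIN(d.adr) FROM direct d) AS below_min_direct
FROM ota o
ORDER BY o.adr;
```

Only the 75.0 row is true in both columns. The two forms diverge when the subquery returns no rows: `< ALL` over an empty set is true, while the comparison against `MIN` yields NULL.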

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT o.hotel, o.market_segment, o.country, o.adr
FROM hotel_bookings o
WHERE o.market_segment = 'Online TA'
  AND o.adr > 0
  AND o.adr < ALL (
      SELECT d.adr FROM hotel_bookings d
      WHERE d.market_segment = 'Direct' AND d.adr > 0
        AND d.hotel = o.hotel  -- compare only against direct bookings at the same hotel
  )
ORDER BY o.adr
LIMIT 20;
```
</details>

---

### Quiz 6 — Countries with Repeat Guests

> **From: Loyalty Program Manager**  
> *I need a list of countries that have at least one repeat guest (`is_repeated_guest = 1`). Don't count — just check existence. Use `EXISTS`.*

**Skills:** EXISTS subquery
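
As a warm-up, here is a minimal sketch of `EXISTS` on inline toy data. The `guests` and `countries` names and their rows are made up for illustration.

```sql
-- Toy data only: "guests" and "countries" are hypothetical.
WITH guests(country, is_repeated_guest) AS (
    VALUES ('PRT', 0), ('PRT', 1), ('ESP', 0), ('ESP', 0), ('FRA', 1)
),
countries AS (
    SELECT DISTINCT country FROM guests
)
SELECT c.country
FROM countries c
WHERE EXISTS (
    -- Only the presence of a matching row matters, so SELECT 1 is a common convention
    SELECT 1
    FROM guests g
    WHERE g.country = c.country
      AND g.is_repeated_guest = 1
)
ORDER BY c.country;
```

This returns FRA and PRT. ESP is dropped because none of its rows has `is_repeated_guest = 1`, and no counting is involved at any point.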

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT DISTINCT b1.country
FROM hotel_bookings b1
WHERE b1.country IS NOT NULL
  AND EXISTS (
      SELECT 1 FROM hotel_bookings b2
      WHERE b2.country = b1.country
        AND b2.is_repeated_guest = 1
  )
ORDER BY b1.country;
```
</details>

---

### Quiz 7 — Months That Beat Any Month of the Previous Year

> **From: Revenue Manager**  
> *For 2016, find the months whose average ADR was higher than ANY month's average ADR in 2015, that is, higher than at least one 2015 month. Use `ANY`.*

**Skills:** Subquery with ANY
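
The difference between `ANY` and `ALL` is easy to mix up, so the sketch below puts them side by side. It uses inline toy data; the `y2015` and `y2016` names and all averages are made up for illustration.

```sql
-- Toy data only: "y2015", "y2016", and the averages are hypothetical.
WITH y2015(month_name, avg_adr) AS (
    VALUES ('July', 110.0), ('August', 125.0), ('December', 70.0)
),
y2016(month_name, avg_adr) AS (
    VALUES ('January', 65.0), ('May', 100.0), ('August', 140.0)
)
SELECT
    c.month_name,
    c.avg_adr,
    -- True if the value beats at least one 2015 month
    c.avg_adr > ANY (SELECT p.avg_adr FROM y2015 p) AS beats_any,
    -- True only if the value beats every 2015 month
    c.avg_adr > ALL (SELECT p.avg_adr FROM y2015 p) AS beats_all
FROM y2016 c
ORDER BY c.avg_adr;
```

January (65) beats nothing, May (100) beats only December's 70 so `beats_any` is true and `beats_all` is false, and August (140) is true in both columns.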

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    arrival_date_month,
    ROUND(AVG(adr)::numeric, 2) AS avg_adr_2016
FROM hotel_bookings
WHERE arrival_date_year = 2016 AND adr > 0
GROUP BY arrival_date_month
HAVING AVG(adr) > ANY (
    SELECT AVG(adr)
    FROM hotel_bookings
    WHERE arrival_date_year = 2015 AND adr > 0
    GROUP BY arrival_date_month
)
ORDER BY avg_adr_2016 DESC;
```
</details>

---
## Section D — CTEs (Single & Chained)

---

### Quiz 8 — Cancellation Risk Scoring

> **From: Operations VP**  
> *Build a cancellation risk model using CTEs:*  
> *Step 1 — CTE: Get each booking's total nights, whether it was canceled, and the lead time.*  
> *Step 2 — CTE: Calculate cancellation rate by lead_time bucket (0–30, 31–90, 91–180, 181+).*  
> *Step 3 — Final query: Show each bucket with its cancellation rate, labeled as LOW/MEDIUM/HIGH risk.*

**Skills:** Multiple chained CTEs, CASE WHEN bucketing
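
The Quick Reference only shows a single CTE, so here is a minimal sketch of the chaining syntax, where each CTE reads from the one before it. It uses inline toy data with just two buckets; the `sample_bookings` name, its rows, and the 'short'/'long' labels are made up for illustration.

```sql
-- Toy data only: "sample_bookings" and its rows are hypothetical.
WITH sample_bookings(lead_time, is_canceled) AS (
    VALUES (5, 0), (20, 1), (45, 1), (60, 0), (200, 1), (250, 1)
),
bucketed AS (
    -- Second CTE reads from the first
    SELECT
        is_canceled,
        CASE WHEN lead_time <= 30 THEN 'short' ELSE 'long' END AS bucket
    FROM sample_bookings
),
rates AS (
    -- Third CTE reads from the second
    SELECT bucket, ROUND(AVG(is_canceled) * 100, 1) AS cancel_pct
    FROM bucketed
    GROUP BY bucket
)
SELECT bucket, cancel_pct
FROM rates
ORDER BY bucket;
```

CTEs are separated by commas under one `WITH`, and a later CTE can reference any earlier one by name. On this toy data the 'long' bucket cancels 75% of the time and the 'short' bucket 50%.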

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
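WITH bookings AS (
    -- Step 1: one row per booking with total nights, cancellation flag, and lead time
    SELECT
        stays_in_weekend_nights + stays_in_week_nights AS total_nights,
        is_canceled,
        lead_time,
        CASE
            WHEN lead_time <= 30  THEN '0-30 days'
            WHEN lead_time <= 90  THEN '31-90 days'
            WHEN lead_time <= 180 THEN '91-180 days'
            ELSE '181+ days'
        END AS lead_bucket
    FROM hotel_bookings
),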
bucket_stats AS (
    SELECT
        lead_bucket,
        COUNT(*) AS total,
        SUM(is_canceled) AS canceled,
        ROUND(SUM(is_canceled)::numeric / COUNT(*) * 100, 2) AS cancel_rate
    FROM bookings
    GROUP BY lead_bucket
)
SELECT
    lead_bucket, total, canceled, cancel_rate,
    CASE
        WHEN cancel_rate > 45 THEN 'HIGH'
        WHEN cancel_rate > 30 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS risk_level
FROM bucket_stats
ORDER BY cancel_rate DESC;
```
</details>

---

### Quiz 9 — Guest Lifetime Value (LTV)

> **From: Loyalty Program Manager**  
> *Build a guest LTV estimate:*  
> *CTE 1: For each customer_type, compute avg ADR, avg total nights, and avg number of special requests.*  
> *CTE 2: Estimate LTV = avg_adr × avg_total_nights.*  
> *Final: Rank customer types by LTV and show all columns.*

**Skills:** Chained CTEs, computed columns

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
WITH segment_metrics AS (
    SELECT
        customer_type,
        COUNT(*) AS bookings,
        ROUND(AVG(adr)::numeric, 2) AS avg_adr,
        ROUND(AVG(stays_in_weekend_nights + stays_in_week_nights)::numeric, 1) AS avg_nights,
        ROUND(AVG(total_of_special_requests)::numeric, 2) AS avg_requests
    FROM hotel_bookings
    WHERE adr > 0 AND is_canceled = 0
    GROUP BY customer_type
),
ltv AS (
    SELECT *,
        ROUND((avg_adr * avg_nights)::numeric, 2) AS estimated_ltv
    FROM segment_metrics
)
SELECT *,
    RANK() OVER (ORDER BY estimated_ltv DESC) AS ltv_rank
FROM ltv;
```
</details>

---

### Quiz 10 — Rewrite Subquery as CTE

> **From: Senior Engineer (code review)**  
> *This subquery works but it's unreadable. Rewrite it using CTEs:*
>
> ```sql
> SELECT market_segment, total_bookings, avg_adr
> FROM (
>     SELECT market_segment, COUNT(*) AS total_bookings,
>            ROUND(AVG(adr)::numeric, 2) AS avg_adr
>     FROM hotel_bookings WHERE adr > 0
>     GROUP BY market_segment
> ) sub
> WHERE total_bookings > 5000
> ORDER BY avg_adr DESC;
> ```

**Skills:** Refactoring subquery → CTE

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
WITH segment_summary AS (
    SELECT
        market_segment,
        COUNT(*) AS total_bookings,
        ROUND(AVG(adr)::numeric, 2) AS avg_adr
    FROM hotel_bookings
    WHERE adr > 0
    GROUP BY market_segment
)
SELECT market_segment, total_bookings, avg_adr
FROM segment_summary
WHERE total_bookings > 5000
ORDER BY avg_adr DESC;
```
</details>

---
## Bonus — Free Play

In [None]:
%%sql


In [None]:
%%sql


---
**Next:** [08_functions_by_type.ipynb](./08_functions_by_type.ipynb) — numeric, datetime, string, and NULL functions.