# 08 — Functions by Data Type

Real hotel operations data is messy — NULLs, inconsistent strings, raw dates, imprecise numbers. SQL functions let you clean, transform, and enrich data at query time.

**What You'll Practice:**
- **Numeric:** ROUND, FLOOR, CEIL, MOD, ABS
- **DateTime:** EXTRACT, DATE_TRUNC, AGE, intervals, TO_DATE
- **String:** CONCAT, REPLACE, SUBSTRING, POSITION, TRIM, UPPER/LOWER, LENGTH
- **NULL:** COALESCE, NULLIF, imputation strategies

---

### Quick Reference (PostgreSQL)

| Category | Functions |
|----------|-----------|
| Numeric | `ROUND(x,n)`, `FLOOR(x)`, `CEIL(x)`, `MOD(x,y)`, `ABS(x)`, `POWER(x,n)` |
| DateTime | `EXTRACT(part FROM date)`, `DATE_TRUNC('month',date)`, `AGE(d1,d2)`, `TO_DATE(text,format)`, `NOW()`, `date + INTERVAL '2 days'` |
| String | `CONCAT(a,b)`, `a \|\| b`, `UPPER(s)`, `LOWER(s)`, `TRIM(s)`, `REPLACE(s,old,new)`, `SUBSTRING(s FROM n FOR len)`, `POSITION(sub IN s)`, `LENGTH(s)`, `LEFT(s,n)`, `RIGHT(s,n)` |
| NULL | `COALESCE(a,b,c)`, `NULLIF(a,b)` |
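
To get a feel for the table above before starting the quizzes, here is a minimal sketch that calls one or two functions from each category on literal values. It needs no tables, so it should run on any PostgreSQL database; the literals and column aliases are made up for illustration.

```sql
-- Self-contained: literal inputs only, no tables required.
SELECT
    ROUND(123.456, 1)                      AS rounded,         -- 123.5
    FLOOR(-2.5)                            AS floored,         -- -3 (rounds toward negative infinity)
    CEIL(-2.5)                             AS ceiled,          -- -2
    MOD(17, 5)                             AS remainder,       -- 2
    EXTRACT(YEAR FROM DATE '2017-07-15')   AS yr,              -- 2017
    DATE '2017-07-15' + INTERVAL '2 days'  AS two_days_later,  -- 2017-07-17 00:00:00 (a timestamp)
    UPPER(TRIM('  resort hotel  '))        AS cleaned,         -- RESORT HOTEL
    POSITION('@' IN 'guest@example.com')   AS at_position,     -- 6
    COALESCE(NULL, 'fallback')             AS first_non_null,  -- fallback
    NULLIF(0, 0)                           AS zero_to_null;    -- NULL
```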

In [None]:
%load_ext sql
%sql postgresql://admin:password@postgres:5432/mastery_db

---
## Section A — Numeric Functions

---

### Quiz 1 — Spend Bins for Guest Segmentation

> **From: Marketing**  
> *We want to segment guests by how much they spent per night. Create spend bins: $0–50, $50–100, $100–150, $150–200, $200+. Use FLOOR to compute the bin, then count how many bookings fall into each. This will drive our email campaigns.*

**Skills:** FLOOR for binning, GROUP BY computed column

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
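SELECT
    LEAST(FLOOR(adr / 50) * 50, 200) AS spend_bin_lower,                   -- cap so 200 and above share one bin
    NULLIF(LEAST(FLOOR(adr / 50) * 50, 200) + 50, 250) AS spend_bin_upper, -- NULL marks the open-ended $200+ bin
    COUNT(*) AS bookings,
    ROUND(AVG(adr)::numeric, 2) AS avg_adr
FROM hotel_bookings
WHERE adr > 0 AND adr < 500  -- drop zero rates and extreme outliers
GROUP BY LEAST(FLOOR(adr / 50) * 50, 200)
ORDER BY spend_bin_lower;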
```
</details>
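
Binning with `FLOOR(adr / 50) * 50` produces half-open bins: a rate of exactly 50 lands in the 50–100 bin, not 0–50. The sketch below checks the edges on a handful of made-up rates and compares two ways of getting an open-ended top bin: capping with `LEAST`, and PostgreSQL's built-in `WIDTH_BUCKET`. The sample values and aliases are illustrative only.

```sql
-- Edge-case check for 50-wide spend bins; sample rates are invented.
SELECT
    adr,
    FLOOR(adr / 50) * 50              AS floor_bin_lower,   -- 0, 50, 100, 200, 300
    LEAST(FLOOR(adr / 50) * 50, 200)  AS capped_bin_lower,  -- 0, 50, 100, 200, 200
    WIDTH_BUCKET(adr, 0, 200, 4)      AS bucket_no          -- 1, 2, 3, 5, 5 (5 = at or above 200)
FROM (VALUES (49.99), (50.00), (149.50), (200.00), (320.00)) AS sample(adr);
```

`WIDTH_BUCKET` returns a bucket number rather than a lower bound, so it suits cases where the bins only need an ordering, not a dollar label.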

---

### Quiz 2 — Weekend vs Weekday Revenue Split

> **From: Revenue Manager**  
> *For each hotel, calculate estimated total revenue as `adr × total_nights`. Then break it down: what percent comes from weekend nights vs weekday nights? Use ROUND to keep it clean.*

**Skills:** Arithmetic with ROUND, percentage calculation

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    hotel,
    ROUND(SUM(adr * (stays_in_weekend_nights + stays_in_week_nights))::numeric, 0) AS total_revenue,
    ROUND(SUM(adr * stays_in_weekend_nights)::numeric, 0) AS weekend_revenue,
    ROUND(SUM(adr * stays_in_week_nights)::numeric, 0) AS weekday_revenue,
    ROUND(
        (SUM(adr * stays_in_weekend_nights) /
         NULLIF(SUM(adr * (stays_in_weekend_nights + stays_in_week_nights)), 0) * 100)::numeric, 1  -- cast the whole ratio so ROUND(x, n) works even when adr is a float
    ) AS weekend_pct
FROM hotel_bookings
WHERE adr > 0 AND is_canceled = 0
GROUP BY hotel;
```
</details>
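
Two details in percentage calculations are easy to trip over. First, PostgreSQL's two-argument `ROUND(x, n)` is defined for `numeric` but not for `double precision`, so a float result has to be cast before rounding. Second, `NULLIF(denominator, 0)` turns a division-by-zero error into a NULL. The sketch below shows both on literals, assuming (as the `::numeric` casts in the hints suggest) that `adr` is stored as a float.

```sql
-- Rounding a percentage safely; literal inputs only.
SELECT
    ROUND(2.0 / 3.0 * 100, 1)                             AS pct_numeric,     -- 66.7 (numeric in, numeric out)
    ROUND((2.0::float8 / 3.0::float8 * 100)::numeric, 1)  AS pct_from_float,  -- 66.7 (cast first, then round)
    100.0 / NULLIF(0, 0)                                  AS safe_division;   -- NULL instead of an error
-- ROUND(2.0::float8 / 3.0::float8 * 100, 1) would fail:
-- there is no round(double precision, integer) in PostgreSQL.
```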

---
## Section B — DateTime Functions

---

### Quiz 3 — Build a Proper Arrival Date

> **From: Data Engineer**  
> *Our `hotel_bookings` table stores the arrival date as separate columns (year, month, day). That's terrible for date math. Create a proper `DATE` column by combining them using `TO_DATE` and string concatenation. Then add a `checkout_date` by adding total nights as an interval.*

**Skills:** TO_DATE, string concatenation (`||` or CONCAT), INTERVAL, date arithmetic

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    hotel,
    TO_DATE(
        arrival_date_year || '-' || arrival_date_month || '-' || arrival_date_day_of_month,
        'YYYY-Month-DD'
    ) AS arrival_date,
    TO_DATE(
        arrival_date_year || '-' || arrival_date_month || '-' || arrival_date_day_of_month,
        'YYYY-Month-DD'
    ) + (stays_in_weekend_nights + stays_in_week_nights) * INTERVAL '1 day' AS checkout_date,
    stays_in_weekend_nights + stays_in_week_nights AS total_nights
FROM hotel_bookings
WHERE is_canceled = 0
LIMIT 10;
```
</details>
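
One subtlety when adding nights to a date: in PostgreSQL, `date + integer` stays a `date`, while `date + interval` becomes a `timestamp`. The sketch below builds one arrival date from made-up parts that mirror the `hotel_bookings` column names (the `total_nights` column is a hypothetical stand-in for the two stay columns) and shows both forms side by side.

```sql
-- One invented booking: arrives 5 July 2017, stays 3 nights.
WITH sample(arrival_date_year, arrival_date_month, arrival_date_day_of_month, total_nights) AS (
    VALUES (2017, 'July', 5, 3)
),
parsed AS (
    SELECT
        TO_DATE(
            arrival_date_year || '-' || arrival_date_month || '-' || arrival_date_day_of_month,
            'YYYY-Month-DD'
        ) AS arrival_date,
        total_nights
    FROM sample
)
SELECT
    arrival_date,                                                                 -- 2017-07-05
    arrival_date + total_nights                            AS checkout_as_date,      -- 2017-07-08
    arrival_date + total_nights * INTERVAL '1 day'         AS checkout_as_timestamp, -- 2017-07-08 00:00:00
    pg_typeof(arrival_date + total_nights * INTERVAL '1 day') AS interval_result_type
FROM parsed;
```

If a downstream column must be a `DATE`, either use the integer form or cast the interval result with `::date`.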

---

### Quiz 4 — Monthly Booking Trends with DATE_TRUNC

> **From: BI Analyst**  
> *Using the `reservation_status_date` column (which IS a real date), truncate it to month using `DATE_TRUNC`. Then show the booking count and average ADR per month. This feeds our dashboard.*

**Skills:** DATE_TRUNC, EXTRACT

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    DATE_TRUNC('month', reservation_status_date::date) AS month,
    COUNT(*) AS bookings,
    ROUND(AVG(adr)::numeric, 2) AS avg_adr,
    SUM(is_canceled) AS cancellations
FROM hotel_bookings
WHERE adr > 0
GROUP BY DATE_TRUNC('month', reservation_status_date::date)
ORDER BY month;
```
</details>
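
`DATE_TRUNC` and `EXTRACT` answer different questions, which matters for trend charts. `DATE_TRUNC('month', d)` keeps the year and returns the first instant of that month as a timestamp value, while `EXTRACT(MONTH FROM d)` returns only the month number, so July 2016 and July 2017 collapse into the same group. A small sketch on invented dates:

```sql
-- Comparing month bucketing strategies; sample dates are invented.
SELECT
    d                              AS status_date,
    DATE_TRUNC('month', d)::date   AS month_start,   -- first day of that month, year preserved
    EXTRACT(MONTH FROM d)          AS month_number,  -- 7, 7, 7, 8 (year is lost)
    TO_CHAR(d, 'YYYY-MM')          AS month_label    -- text label, handy for dashboards
FROM (VALUES
        (DATE '2016-07-15'),
        (DATE '2017-07-15'),
        (DATE '2017-07-31'),
        (DATE '2017-08-01')
     ) AS sample(d);
```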

---

### Quiz 5 — OTA Reservation: Days Until Arrival

> **From: Operations**  
> *Using `hotel_reservations`, the table has `arrival_year`, `arrival_month`, `arrival_date` as integers. Build a proper arrival date, then extract the day of week (1=Monday, 7=Sunday). Also show which quarter the arrival falls in.*

**Skills:** MAKE_DATE (or TO_DATE), EXTRACT(ISODOW / QUARTER)

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    booking_id,
    MAKE_DATE(arrival_year, arrival_month, arrival_date) AS arrival_dt,
    EXTRACT(ISODOW FROM MAKE_DATE(arrival_year, arrival_month, arrival_date)) AS day_of_week,
    EXTRACT(QUARTER FROM MAKE_DATE(arrival_year, arrival_month, arrival_date)) AS quarter,
    lead_time
FROM hotel_reservations
LIMIT 15;
```
</details>
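
PostgreSQL has two day-of-week fields, and only one matches the 1=Monday, 7=Sunday numbering requested here. `DOW` runs from 0 (Sunday) to 6 (Saturday); `ISODOW` runs from 1 (Monday) to 7 (Sunday). The sketch below compares them on a Sunday and the following Monday; the dates are chosen only for illustration.

```sql
-- 2018-10-07 is a Sunday, 2018-10-08 a Monday.
SELECT
    d,
    TO_CHAR(d, 'FMDay')       AS day_name,
    EXTRACT(DOW FROM d)       AS dow,      -- 0 for Sunday, 1 for Monday
    EXTRACT(ISODOW FROM d)    AS isodow,   -- 7 for Sunday, 1 for Monday
    EXTRACT(QUARTER FROM d)   AS quarter   -- 4 for both
FROM (VALUES (MAKE_DATE(2018, 10, 7)), (MAKE_DATE(2018, 10, 8))) AS sample(d);
```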

---
## Section C — String Functions

---

### Quiz 6 — Clean Guest Email Domains

> **From: Marketing**  
> *The `hotel_bookings` table has guest emails. Extract just the domain (after the @), convert to lowercase, then find the top 10 most common email providers.*

**Skills:** SUBSTRING, POSITION, LOWER, GROUP BY

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    LOWER(SUBSTRING(email FROM POSITION('@' IN email) + 1)) AS email_domain,
    COUNT(*) AS bookings
FROM hotel_bookings
WHERE email IS NOT NULL
GROUP BY email_domain
ORDER BY bookings DESC
LIMIT 10;
```
</details>
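
The `SUBSTRING ... POSITION` pattern assumes every email contains an `@`. When it does not, `POSITION` returns 0 and the substring starts at character 1, so the whole value comes back as the "domain". The sketch below shows that behavior on invented addresses and compares it with `SPLIT_PART`, which returns an empty string instead.

```sql
-- Domain extraction on clean, padded, and malformed inputs (all invented).
SELECT
    '[' || email || ']'                                      AS raw_email,
    POSITION('@' IN email)                                   AS at_pos,                 -- 0 when '@' is missing
    LOWER(SUBSTRING(email FROM POSITION('@' IN email) + 1))  AS domain_via_substring,   -- whole string when '@' is missing
    LOWER(SPLIT_PART(TRIM(email), '@', 2))                   AS domain_via_split_part   -- empty string when '@' is missing
FROM (VALUES
        ('Ana.Silva@Gmail.com'),
        ('  bob@example.org '),
        ('no-at-sign')
     ) AS sample(email);
```

Filtering with `WHERE POSITION('@' IN email) > 0` is one way to keep malformed rows out of a top-10 list.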

---

### Quiz 7 — Booking ID Generator

> **From: Engineering**  
> *We need to generate a booking reference for each row: format `HTL-{hotel_type_code}-{year}-{row_num}`. Use LEFT to get first 3 chars of hotel name, LPAD for zero-padded row number, and CONCAT to build it. Show 10 rows.*

**Skills:** CONCAT, LEFT, LPAD, ROW_NUMBER

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    CONCAT(
        'HTL-',
        UPPER(LEFT(REPLACE(hotel, ' Hotel', ''), 3)),
        '-',
        arrival_date_year,
        '-',
        LPAD(ROW_NUMBER() OVER (ORDER BY reservation_status_date)::text, 6, '0')
    ) AS booking_ref,
    hotel,
    country,
    adr
FROM hotel_bookings
LIMIT 10;
```
</details>
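
Two behaviors are worth knowing before relying on a generated reference. `LPAD` pads short values but also truncates values longer than the target width, and `CONCAT` skips NULL arguments whereas `||` returns NULL if any operand is NULL. The sketch below uses made-up row numbers and a hard-coded hotel name and year.

```sql
-- Padding and concatenation edge cases; inputs are invented.
SELECT
    n,
    LPAD(n::text, 6, '0')   AS padded,  -- 000007, 119390, 123456 (7-digit input is truncated)
    CONCAT('HTL-', UPPER(LEFT('Resort Hotel', 3)), '-', 2017, '-', LPAD(n::text, 6, '0')) AS booking_ref,
    CONCAT('HTL-', NULL, 'X')           AS concat_skips_null,  -- HTL-X
    'HTL-' || NULL || 'X'               AS pipes_return_null   -- NULL
FROM (VALUES (7), (119390), (1234567)) AS sample(n);
```

If the table could ever exceed 999,999 rows, a wider pad keeps references unique.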

---

### Quiz 8 — Standardize Meal Descriptions

> **From: Data Quality Team**  
> *The `meal` column stores terse codes: 'BB', 'HB', 'FB', 'SC', 'Undefined'. Map them to human-readable labels with CASE, and TRIM any accidental whitespace first. Output: hotel, meal code, meal label, count.*

**Skills:** CASE WHEN as a string mapper, TRIM

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    hotel,
    TRIM(meal) AS meal_code,
    CASE TRIM(meal)
        WHEN 'BB' THEN 'Bed & Breakfast'
        WHEN 'HB' THEN 'Half Board'
        WHEN 'FB' THEN 'Full Board'
        WHEN 'SC' THEN 'Self Catering'
        ELSE 'Not Selected'
    END AS meal_label,
    COUNT(*) AS bookings
FROM hotel_bookings
GROUP BY hotel, TRIM(meal)
ORDER BY hotel, bookings DESC;
```
</details>
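
Why trim before mapping? In a `text` column, `'BB'` and `'BB   '` are different values, so an untrimmed code falls through to the `ELSE` branch and also forms its own group. A minimal sketch on invented codes, with brackets added so the stray whitespace is visible:

```sql
-- Effect of TRIM on label mapping; sample codes are invented.
SELECT
    '[' || meal || ']'  AS raw_code,
    CASE meal       WHEN 'BB' THEN 'Bed & Breakfast' WHEN 'HB' THEN 'Half Board' ELSE 'Not Selected' END AS label_without_trim,
    CASE TRIM(meal) WHEN 'BB' THEN 'Bed & Breakfast' WHEN 'HB' THEN 'Half Board' ELSE 'Not Selected' END AS label_with_trim
FROM (VALUES ('BB'), ('BB   '), (' HB'), ('Undefined')) AS sample(meal);
```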

---
## Section D — NULL Handling

---

### Quiz 9 — Fill Missing Agent IDs

> **From: Analytics Lead**  
> *Many bookings have NULL `agent` and `company` values. For reporting, replace NULL agent with 0 (meaning 'Direct / No Agent') and NULL company with 0 (meaning 'Individual'). Also use NULLIF to turn `adr = 0` into NULL so our averages aren't skewed.*

**Skills:** COALESCE, NULLIF

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
SELECT
    hotel,
    COALESCE(agent, 0)   AS agent_clean,
    COALESCE(company, 0) AS company_clean,
    adr AS adr_raw,
    NULLIF(adr, 0)       AS adr_no_zeros,
    ROUND(AVG(NULLIF(adr, 0)) OVER (PARTITION BY hotel)::numeric, 2) AS hotel_avg_adr_clean
FROM hotel_bookings
LIMIT 15;
```
</details>
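
The reason `NULLIF(adr, 0)` changes an average is that aggregates such as `AVG` and `COUNT(column)` skip NULLs, while `COALESCE(adr, 0)` does the opposite and pulls NULLs into the calculation as zeros. The sketch below shows all three readings on four invented rates.

```sql
-- How NULL handling shifts an average; sample rates are invented.
SELECT
    COUNT(*)                 AS all_rows,           -- 4
    COUNT(NULLIF(adr, 0))    AS nonzero_rows,       -- 2
    AVG(adr)                 AS avg_with_zeros,     -- (0 + 100 + 200) / 3 = 100
    AVG(NULLIF(adr, 0))      AS avg_without_zeros,  -- (100 + 200) / 2 = 150
    AVG(COALESCE(adr, 0))    AS avg_nulls_as_zero   -- (0 + 100 + 200 + 0) / 4 = 75
FROM (VALUES (0.0), (100.0), (200.0), (NULL)) AS sample(adr);
```

Which of the three is "right" depends on whether a zero rate means a free stay or a data-entry gap.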

---

### Quiz 10 — Impute Missing Values with Most Common

> **From: Data Scientist**  
> *For bookings where `country` is NULL, I want to fill it with the most common country for that hotel type. Write a CTE chain: (1) find the top country per hotel, (2) use COALESCE + a JOIN to fill NULLs.*

**Skills:** COALESCE + CTE + JOIN for imputation (advanced NULL handling)

In [None]:
%%sql


<details><summary>Hint</summary>

```sql
WITH country_counts AS (
    SELECT hotel, country, COUNT(*) AS cnt
    FROM hotel_bookings
    WHERE country IS NOT NULL
    GROUP BY hotel, country
),
top_country AS (
    SELECT hotel, country AS top_country
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY hotel ORDER BY cnt DESC, country) AS rn  -- country breaks ties deterministically
        FROM country_counts
    ) ranked
    WHERE rn = 1
)
SELECT
    b.hotel,
    b.country AS original_country,
    COALESCE(b.country, tc.top_country) AS imputed_country,
    b.adr
FROM hotel_bookings b
LEFT JOIN top_country tc ON b.hotel = tc.hotel
WHERE b.country IS NULL
LIMIT 15;
```
</details>
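
As an alternative to ranking with `ROW_NUMBER`, PostgreSQL offers the ordered-set aggregate `MODE() WITHIN GROUP`, which returns the most frequent non-NULL value per group in a single step. The sketch below applies it to a tiny invented sample shaped like `hotel_bookings`; it is one possible shortcut, not a replacement for the CTE chain the quiz asks for.

```sql
-- Most-common-value imputation with MODE(); sample rows are invented.
WITH sample(hotel, country) AS (
    VALUES
        ('City Hotel',   'PRT'),
        ('City Hotel',   'PRT'),
        ('City Hotel',   'FRA'),
        ('City Hotel',   NULL),
        ('Resort Hotel', 'GBR'),
        ('Resort Hotel', 'GBR'),
        ('Resort Hotel', NULL)
),
top_country AS (
    -- MODE() ignores NULL inputs, so no WHERE filter is needed here.
    SELECT hotel, MODE() WITHIN GROUP (ORDER BY country) AS top_country
    FROM sample
    GROUP BY hotel
)
SELECT
    s.hotel,
    s.country                            AS original_country,
    COALESCE(s.country, tc.top_country)  AS imputed_country  -- PRT for City Hotel NULLs, GBR for Resort Hotel NULLs
FROM sample s
JOIN top_country tc ON s.hotel = tc.hotel
ORDER BY s.hotel, s.country NULLS LAST;
```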

---
## Bonus — Free Play

In [None]:
%%sql


In [None]:
%%sql


---
**Next:** [09_data_analysis_applications.ipynb](./09_data_analysis_applications.ipynb) — pivoting, rolling calculations, and deduplication.