# 05 — Hotel Reservation Practice

A fresh dataset, empty cells, and **you write all the SQL**.

---

### Dataset: Hotel Reservation (~36k rows)

| Column | Description |
|--------|-------------|
| `booking_id` | Unique booking identifier (INN00001, INN00002, ...) |
| `no_of_adults` | Number of adults |
| `no_of_children` | Number of children |
| `no_of_weekend_nights` | Weekend nights in the stay |
| `no_of_week_nights` | Weekday nights in the stay |
| `type_of_meal_plan` | Meal Plan 1, Meal Plan 2, Meal Plan 3, Not Selected |
| `required_car_parking_space` | 0 or 1 |
| `room_type_reserved` | Room_Type 1 through Room_Type 7 |
| `lead_time` | Days between booking and arrival |
| `arrival_year` | Year of arrival (2017, 2018) |
| `arrival_month` | Month of arrival (1–12) |
| `arrival_date` | Day of month |
| `market_segment_type` | Online, Offline, Corporate, Complementary, Aviation |
| `repeated_guest` | 1 = returning guest, 0 = new |
| `no_of_previous_cancellations` | Prior cancellations by this guest |
| `no_of_previous_bookings_not_canceled` | Prior successful bookings |
| `avg_price_per_room` | Average price per room per night |
| `no_of_special_requests` | Number of special requests |
| `booking_status` | Canceled / Not_Canceled |

---
## Setup — Load Data

In [1]:
%load_ext sql
%sql postgresql://admin:password@postgres:5432/mastery_db

In [2]:
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv('/app/data/hotel_reservation.csv')
print(f"Loaded {len(df):,} rows  x  {len(df.columns)} columns")

engine = create_engine("postgresql://admin:password@postgres:5432/mastery_db")
df.columns = df.columns.str.lower()  # lowercase column names for easier SQL
df.to_sql('hotel_reservations', engine, if_exists='replace', index=False)
print("Table 'hotel_reservations' created.")

Loaded 36,275 rows  x  19 columns
Table 'hotel_reservations' created.


In [None]:
%%sql
SELECT COUNT(*) AS total_rows FROM hotel_reservations;

In [3]:
%%sql
SELECT * FROM hotel_reservations LIMIT 5;

booking_id,no_of_adults,no_of_children,no_of_weekend_nights,no_of_week_nights,type_of_meal_plan,required_car_parking_space,room_type_reserved,lead_time,arrival_year,arrival_month,arrival_date,market_segment_type,repeated_guest,no_of_previous_cancellations,no_of_previous_bookings_not_canceled,avg_price_per_room,no_of_special_requests,booking_status
INN00001,2,0,1,2,Meal Plan 1,0,Room_Type 1,224,2017,10,2,Offline,0,0,0,65.0,0,Not_Canceled
INN00002,2,0,2,3,Not Selected,0,Room_Type 1,5,2018,11,6,Online,0,0,0,106.68,1,Not_Canceled
INN00003,1,0,2,1,Meal Plan 1,0,Room_Type 1,1,2018,2,28,Online,0,0,0,60.0,0,Canceled
INN00004,2,0,0,2,Meal Plan 1,0,Room_Type 1,211,2018,5,20,Online,0,0,0,100.0,0,Canceled
INN00005,2,0,1,1,Not Selected,0,Room_Type 1,48,2018,4,11,Online,0,0,0,94.5,0,Canceled


---
## Warm-Up: Explore the Data

**Q1.** How many distinct room types, meal plans, and market segments are there?

In [None]:
%%sql


**Q2.** What is the overall cancellation rate?
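
One possible hint, sketched on a toy in-memory SQLite table rather than the hotel data: averaging a `CASE` expression that yields 1 or 0 gives the share of rows matching a condition. The table `orders` and its columns are made up for illustration; the same SQL pattern should carry over to PostgreSQL.

```python
import sqlite3

# Toy stand-in table (hypothetical names), not the hotel data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "Returned"), (2, "Kept"), (3, "Kept"), (4, "Returned"), (5, "Kept")],
)

# CASE yields 1.0 for matching rows and 0.0 otherwise,
# so its average is the fraction of matching rows.
return_rate_pct = conn.execute(
    """
    SELECT ROUND(100.0 * AVG(CASE WHEN status = 'Returned' THEN 1.0 ELSE 0.0 END), 1)
    FROM orders
    """
).fetchone()[0]

print(return_rate_pct)  # 2 of 5 rows match
```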

In [None]:
%%sql


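**Q3.** Show the min, avg, median, and max of `avg_price_per_room`. PostgreSQL has no built-in `MEDIAN` aggregate; `PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ...)` is the usual substitute.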

In [None]:
%%sql


---
## GROUP BY & Aggregations

**Q4.** Count bookings and compute the cancellation rate per `room_type_reserved`. Which room type has the highest cancellation rate?

In [None]:
%%sql


**Q5.** Average price per room by `market_segment_type`, ordered highest to lowest.

In [None]:
%%sql


**Q6.** Monthly bookings count for each year. Is there a seasonal trend?

In [None]:
%%sql


**Q7.** Which `type_of_meal_plan` generates the most revenue? Use `SUM(avg_price_per_room)` as a rough proxy; it ignores the length of each stay.

In [None]:
%%sql


---
## Window Functions

**Q8.** Rank room types by average price using `DENSE_RANK`.

In [None]:
%%sql


**Q9.** For each market segment, get the top 3 most expensive bookings (use `ROW_NUMBER` + subquery).
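
As a hint for the general top-N-per-group pattern, here is a minimal sketch on a toy SQLite table (the `sales` table and its columns are hypothetical, and SQLite 3.25 or newer is assumed for window functions). The inner query numbers rows within each group, and the outer query keeps only the first few.

```python
import sqlite3

# Toy stand-in table (hypothetical names), not the hotel data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 30), ("north", 20),
     ("south", 5), ("south", 50), ("south", 25)],
)

# A window function cannot be referenced in the WHERE clause of the
# same query level, so the numbering happens in a subquery.
top2 = conn.execute(
    """
    SELECT region, amount
    FROM (
        SELECT region, amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
        FROM sales
    ) AS ranked
    WHERE rn <= 2
    ORDER BY region, amount DESC
    """
).fetchall()

for row in top2:
    print(row)
```

Note that `ROW_NUMBER` breaks ties arbitrarily unless the `ORDER BY` inside the window includes a tiebreaker column.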

In [None]:
%%sql


**Q10.** Calculate a running total of monthly bookings, ordered by `arrival_year` and `arrival_month` so the total carries across both years.

In [None]:
%%sql


**Q11.** Month-over-month change in average price using `LAG`.
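
A hint for the shape of a `LAG` query, sketched on a toy SQLite table (the `prices` table is hypothetical; SQLite 3.25 or newer is assumed). Aggregating to one row per month first, in a CTE, keeps the window step simple: `LAG` then just looks one row back in month order.

```python
import sqlite3

# Toy stand-in table (hypothetical names), not the hotel data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (yr INTEGER, mon INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [(2017, 12, 100), (2017, 12, 120), (2018, 1, 90), (2018, 2, 105)],
)

changes = conn.execute(
    """
    WITH monthly AS (
        SELECT yr, mon, AVG(price) AS avg_price
        FROM prices
        GROUP BY yr, mon
    )
    SELECT yr, mon, avg_price,
           avg_price - LAG(avg_price) OVER (ORDER BY yr, mon) AS change
    FROM monthly
    ORDER BY yr, mon
    """
).fetchall()

# The first month has no predecessor, so its change is NULL (None in Python).
for row in changes:
    print(row)
```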

In [None]:
%%sql


**Q12.** Split bookings into price quartiles using `NTILE(4)`. Show min/avg/max price per quartile.
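
A small sketch of how `NTILE` buckets combine with `GROUP BY`, on a toy SQLite table (the `items` table is hypothetical; SQLite 3.25 or newer is assumed). The bucket number is assigned in a subquery and then used as the grouping key.

```python
import sqlite3

# Toy stand-in table (hypothetical names), not the hotel data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [(p,) for p in (10, 20, 30, 40, 50, 60, 70, 80)],
)

buckets = conn.execute(
    """
    SELECT bucket, MIN(price), MAX(price), COUNT(*)
    FROM (
        SELECT price, NTILE(4) OVER (ORDER BY price) AS bucket
        FROM items
    ) AS bucketed
    GROUP BY bucket
    ORDER BY bucket
    """
).fetchall()

# Eight rows split evenly into four buckets of two rows each.
for row in buckets:
    print(row)
```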

In [None]:
%%sql


---
## Advanced Aggregations (ROLLUP, CUBE, FILTER)

**Q13.** Use `ROLLUP` to get revenue by `arrival_year → arrival_month` with year subtotals and a grand total.

In [None]:
%%sql


**Q14.** Use `FILTER` to pivot: for each room type, show count of Canceled vs Not_Canceled as separate columns.
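
To illustrate the `FILTER` pivot pattern without giving the answer away, here is a sketch on a toy SQLite table (the `tickets` table is hypothetical; aggregate `FILTER` is assumed to be available, which needs SQLite 3.30 or newer). Each `COUNT(*) FILTER (WHERE ...)` counts only the rows that pass its own condition, so one `GROUP BY` produces several conditional columns.

```python
import sqlite3

# Toy stand-in table (hypothetical names), not the hotel data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (team TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [("a", "open"), ("a", "closed"), ("a", "open"), ("b", "closed")],
)

pivot = conn.execute(
    """
    SELECT team,
           COUNT(*) FILTER (WHERE state = 'open')   AS open_n,
           COUNT(*) FILTER (WHERE state = 'closed') AS closed_n
    FROM tickets
    GROUP BY team
    ORDER BY team
    """
).fetchall()

# A team with no matching rows gets 0 for that column, not NULL.
for row in pivot:
    print(row)
```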

In [None]:
%%sql


**Q15.** Use `CUBE` on `market_segment_type` and `room_type_reserved` to get all cross-combination booking counts.

In [None]:
%%sql


---
## CTEs & Business Logic

**Q16.** Write a CTE that computes cancellation rate per `market_segment_type`, then flag segments with rate > 30% as `'HIGH RISK'`.
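
The two-step structure (compute a rate in a CTE, then label it) might look like this on a toy SQLite table. The `signups` table, its columns, and the `'OK'` label for the non-flagged case are all assumptions made for illustration; the exercise itself only specifies `'HIGH RISK'`.

```python
import sqlite3

# Toy stand-in table (hypothetical names), not the hotel data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (channel TEXT, churned INTEGER)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [("ads", 1), ("ads", 1), ("ads", 0),
     ("email", 0), ("email", 0), ("email", 0), ("email", 1)],
)

flagged = conn.execute(
    """
    WITH rates AS (
        -- churned is stored as 0/1, so its average is the churn fraction
        SELECT channel, 100.0 * AVG(churned) AS churn_pct
        FROM signups
        GROUP BY channel
    )
    SELECT channel,
           ROUND(churn_pct, 1) AS churn_pct,
           CASE WHEN churn_pct > 30 THEN 'HIGH RISK' ELSE 'OK' END AS flag
    FROM rates
    ORDER BY channel
    """
).fetchall()

for row in flagged:
    print(row)
```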

In [None]:
%%sql


**Q17.** Find bookings from repeat guests (`repeated_guest = 1`) that were still canceled. How many are there? How does their average lead time compare with that of canceled bookings from non-repeat guests?

In [None]:
%%sql


**Q18.** Create a "guest profile" CTE with total nights (`no_of_weekend_nights + no_of_week_nights`), total spend (`avg_price_per_room * total_nights`), and market segment. Then find the top 10 highest-spend bookings.

In [None]:
%%sql


---
## Performance (Bonus)

**Q19.** Run `EXPLAIN ANALYZE` on a query that filters by `lead_time > 200`. Then create an index on `lead_time`, rerun the same query, and compare the two plans and execution times.

In [None]:
%%sql


In [None]:
%%sql


**Q20.** Free play — write any query you want. Combine window functions, CTEs, FILTER, whatever you've learned!

In [None]:
%%sql


In [None]:
%%sql


In [None]:
%%sql
