# 03 Competitive Benchmarking

**Assignment: Identify comparable hotel groups, best practices, recommendations for underperformers, validation**

- Methodology: group by review volume and average rating tier (comparable peers)
- Performance across groups
- Best practices within peers
- Recommendations for underperformers

In [2]:
import sys
from pathlib import Path
project_root = Path.cwd() if (Path.cwd() / "src").exists() else Path.cwd().parent
sys.path.insert(0, str(project_root))

from src.benchmarking import (
    get_reviews_df,
    comparable_groups_by_volume_and_rating,
    best_practices_within_peers,
    recommendations_for_underperformers,
)
import pandas as pd

df = get_reviews_df(sample=True)
print("Reviews:", len(df))

Reviews: 5000


## 1. Comparable groups (volume + rating tier)

In [3]:
peers = comparable_groups_by_volume_and_rating(df)
print("Peer groups:", peers["peer_group"].nunique())
peers.groupby("peer_group").agg(n_hotels=("offering_id", "count"), avg_rating=("avg_rating", "mean")).head(10)

Peer groups: 5


| peer_group | n_hotels | avg_rating |
|---|---|---|
| 0_0 | 111 | 1.466557 |
| 0_1 | 212 | 2.999533 |
| 0_2 | 159 | 3.640752 |
| 0_3 | 511 | 4.08651 |
| 0_4 | 589 | 4.8335 |
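
The labels above (`0_0` through `0_4`) look like a volume tier index joined to a rating tier index, but the implementation in `src.benchmarking` is not shown in this notebook. The following is only a minimal sketch of how such tiering could work, assuming a per-hotel table with `n_reviews` and `avg_rating` columns. The function name `sketch_peer_groups` and both sets of bin edges are hypothetical, chosen for illustration.

```python
import pandas as pd


def sketch_peer_groups(hotels: pd.DataFrame) -> pd.DataFrame:
    """Assign a "<volume_tier>_<rating_tier>" label to each hotel.

    Hypothetical sketch: the bin edges below are illustrative assumptions,
    not the thresholds used by src.benchmarking.
    """
    out = hotels.copy()
    # Volume tier: fixed review-count bands (assumed edges).
    out["volume_tier"] = pd.cut(
        out["n_reviews"],
        bins=[0, 5, 20, float("inf")],
        labels=False,
    )
    # Rating tier: bands over the 1-5 rating scale (assumed edges).
    out["rating_tier"] = pd.cut(
        out["avg_rating"],
        bins=[0, 2, 3, 4, 4.5, 5],
        labels=False,
        include_lowest=True,
    )
    out["peer_group"] = (
        out["volume_tier"].astype(str) + "_" + out["rating_tier"].astype(str)
    )
    return out


demo = pd.DataFrame(
    {"offering_id": [1, 2, 3], "n_reviews": [1, 10, 50], "avg_rating": [1.5, 3.8, 4.9]}
)
print(sketch_peer_groups(demo)[["offering_id", "peer_group"]])
```

Fixed edges keep the tiers stable between runs. Quantile-based edges would adapt to the data but can collapse into a single tier when most hotels have the same review count.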


## 2. Best practices within peers (e.g. cleanliness)

In [4]:
best = best_practices_within_peers(peers, metric="avg_cleanliness")
best[best["rank_in_peer"] == 1][["offering_id", "peer_group", "n_reviews", "avg_rating", "avg_cleanliness"]].head(15)

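|  | offering_id | peer_group | n_reviews | avg_rating | avg_cleanliness |
|---|---|---|---|---|---|
| 6 | 73706 | 0_0 | 1 | 2.0 | 5.0 |
| 478 | 94194 | 0_0 | 1 | 2.0 | 5.0 |
| 1089 | 239806 | 0_0 | 1 | 2.0 | 5.0 |
| 1412 | 1201174 | 0_0 | 1 | 1.0 | 5.0 |
| 202 | 82357 | 0_1 | 1 | 3.0 | 5.0 |
| 213 | 82821 | 0_1 | 2 | 3.0 | 5.0 |
| 377 | 90986 | 0_1 | 1 | 3.0 | 5.0 |
| 436 | 93489 | 0_1 | 1 | 3.0 | 5.0 |
| 709 | 108981 | 0_1 | 2 | 3.0 | 5.0 |
| 725 | 109412 | 0_1 | 1 | 3.0 | 5.0 |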
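
Every top-ranked hotel listed above has only one or two reviews, so a perfect 5.0 cleanliness score rests on very little evidence. One possible refinement is to rank only hotels that meet a minimum review count. The sketch below assumes the same column names as the output above. The function `rank_within_peers` and its `min_reviews` parameter are hypothetical and are not part of `src.benchmarking`.

```python
import pandas as pd


def rank_within_peers(
    hotels: pd.DataFrame, metric: str = "avg_cleanliness", min_reviews: int = 1
) -> pd.DataFrame:
    """Rank hotels on `metric` inside each peer group (1 = best).

    Hypothetical sketch: hotels below `min_reviews`, or with a missing
    metric value, are excluded before ranking.
    """
    keep = (hotels["n_reviews"] >= min_reviews) & hotels[metric].notna()
    eligible = hotels[keep].copy()
    # method="min" gives tied hotels the same (best) rank.
    eligible["rank_in_peer"] = (
        eligible.groupby("peer_group")[metric]
        .rank(method="min", ascending=False)
        .astype(int)
    )
    return eligible.sort_values(["peer_group", "rank_in_peer"])


demo = pd.DataFrame(
    {
        "offering_id": [1, 2, 3, 4],
        "peer_group": ["0_0", "0_0", "0_1", "0_1"],
        "n_reviews": [1, 10, 5, 5],
        "avg_cleanliness": [5.0, 4.0, 3.0, 4.5],
    }
)
print(rank_within_peers(demo, min_reviews=5))
```

A higher `min_reviews` gives more trustworthy leaders but shrinks each peer group, so the threshold is a judgment call that depends on how many reviews the sample holds per hotel.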


## 3. Underperformers (bottom 25% cleanliness in a peer group)

In [5]:
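# Pick the largest peer group (the most frequent peer_group label)
group = peers["peer_group"].mode().iloc[0]
# Flag hotels in the bottom 25% of that group on cleanliness
rec = recommendations_for_underperformers(peers, peer_group=group, metric="avg_cleanliness", bottom_pct=0.25)
rec[["offering_id", "n_reviews", "avg_cleanliness", "peer_median", "gap"]].head(10)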

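|  | offering_id | n_reviews | avg_cleanliness | peer_median | gap |
|---|---|---|---|---|---|
| 1211 | 507339 | 1 | 3.0 | 5.0 | 2.0 |
| 231 | 83372 | 1 | 3.0 | 5.0 | 2.0 |
| 142 | 80983 | 1 | 4.0 | 5.0 | 1.0 |
| 114 | 80593 | 2 | 4.0 | 5.0 | 1.0 |
| 1132 | 261234 | 1 | 4.0 | 5.0 | 1.0 |
| 715 | 109156 | 1 | 4.0 | 5.0 | 1.0 |
| 896 | 123022 | 1 | 4.0 | 5.0 | 1.0 |
| 901 | 123556 | 2 | 4.0 | 5.0 | 1.0 |
| 910 | 124956 | 1 | 4.0 | 5.0 | 1.0 |
| 874 | 120614 | 1 | 4.0 | 5.0 | 1.0 |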
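
In the output above, `gap` equals `peer_median` minus `avg_cleanliness` (for example 5.0 - 3.0 = 2.0). The selection logic itself lives in `src.benchmarking` and is not shown. A minimal sketch of one way to produce the same columns follows, assuming a quantile cutoff within the chosen peer group. The function name `flag_underperformers` is hypothetical.

```python
import pandas as pd


def flag_underperformers(
    hotels: pd.DataFrame,
    peer_group: str,
    metric: str = "avg_cleanliness",
    bottom_pct: float = 0.25,
) -> pd.DataFrame:
    """Return hotels at or below the `bottom_pct` quantile of `metric`
    within one peer group, with their gap to the peer median.

    Hypothetical sketch, not the src.benchmarking implementation.
    """
    in_group = hotels[hotels["peer_group"] == peer_group]
    peer_median = in_group[metric].median()
    cutoff = in_group[metric].quantile(bottom_pct)
    flagged = in_group[in_group[metric] <= cutoff].copy()
    flagged["peer_median"] = peer_median
    # Positive gap = how far the hotel sits below the peer median.
    flagged["gap"] = peer_median - flagged[metric]
    return flagged.sort_values("gap", ascending=False)


demo = pd.DataFrame(
    {
        "offering_id": [1, 2, 3, 4, 5, 6],
        "peer_group": ["0_4"] * 5 + ["0_0"],
        "avg_cleanliness": [3.0, 4.0, 5.0, 5.0, 5.0, 1.0],
    }
)
print(flag_underperformers(demo, peer_group="0_4"))
```

Because many hotels share the same score, the `<=` comparison can flag more than exactly 25% of the group. When more than three quarters of a group scores 5.0, the cutoff itself becomes 5.0 and every hotel is flagged with a gap of zero.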


## 4. Validation

**Rationale:** Hotels with a similar review volume and a similar average rating tier tend to sit in the same market segment (e.g. high volume plus high rating suggests established premium hotels), so grouping on these two attributes should yield comparable properties.

**Data check:** If within-group variance of a key metric (e.g. avg_rating) is lower than the overall variance across all hotels, then peers in a group are more similar to each other than to the full set, supporting that the grouping identifies comparable properties. Results below.

In [6]:
# Within-group vs overall variance of avg_rating (validation)
var_overall = peers["avg_rating"].var()
within_var = peers.groupby("peer_group")["avg_rating"].var()
mean_within_var = within_var.mean()
print(f"Variance of avg_rating across all hotels: {var_overall:.4f}")
print(f"Mean variance of avg_rating within each peer group: {mean_within_var:.4f}")
print(f"Within-group variance < overall? {mean_within_var < var_overall}")
print()
print("Variance by peer_group:")
print(within_var.to_string())
print()
print("Conclusion: Peers in the same group have more similar avg_rating (lower within-group variance)")
print("than the full set, so volume + rating tier successfully identifies comparable hotels.")

Variance of avg_rating across all hotels: 0.9071
Mean variance of avg_rating within each peer group: 0.0772
Within-group variance < overall? True

Variance by peer_group:
peer_group
0_0    0.264428
0_1    0.042242
0_2    0.016087
0_3    0.019856
0_4    0.043189

Conclusion: Peers in the same group have more similar avg_rating (lower within-group variance)
than the full set, so volume + rating tier successfully identifies comparable hotels.
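
Because `avg_rating` is one of the variables that defines the peer groups, its within-group variance is expected to be low by construction. A complementary check is to repeat the comparison on a metric that was not used for grouping, such as `avg_cleanliness`. The sketch below computes the ratio of pooled within-group variance to overall variance; values well below 1 suggest the groups are more homogeneous than the full set on that metric. The helper name `within_to_overall_variance_ratio` is hypothetical, and pooling by degrees of freedom is a choice made here, whereas the cell above takes an unweighted mean of the group variances.

```python
import pandas as pd


def within_to_overall_variance_ratio(
    df: pd.DataFrame, group_col: str, metric: str
) -> float:
    """Pooled within-group variance of `metric` divided by its overall variance.

    Hypothetical helper. Each group's sample variance is weighted by its
    degrees of freedom (n - 1), so single-hotel groups contribute nothing.
    """
    overall = df[metric].var()
    grouped = df.groupby(group_col)[metric]
    dof = grouped.count() - 1
    # Groups with one hotel have NaN variance; sum() skips them by default.
    pooled = (grouped.var() * dof).sum() / dof.sum()
    return float(pooled / overall)


demo = pd.DataFrame(
    {
        "peer_group": ["a", "a", "b", "b"],
        "avg_cleanliness": [1.0, 1.2, 4.8, 5.0],
    }
)
print(within_to_overall_variance_ratio(demo, "peer_group", "avg_cleanliness"))
```

With the notebook's data this could be called as `within_to_overall_variance_ratio(peers, "peer_group", "avg_cleanliness")`. A ratio close to 1 would indicate that the grouping says little about cleanliness, even though it separates hotels cleanly on `avg_rating`.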
