# 03. CoffeeKing Consulting Deliverables

**Goal:** Turn the SQL + analysis work into *actionable, consulting-style deliverables* for a hypothetical start-up brand called **CoffeeKing**.

**What this notebook produces**
- A shortlist of **launch targets** (city x concept) and **benchmark businesses**
- Saved output tables in SQLite (so results are reusable)

**Data source**
- Yelp Academic Dataset (business + review) loaded into `coffeeking.db`
- Tables used: `coffee_business`, `coffee_business_enriched`, `coffee_reviews`

In [1]:
import sqlite3
import pandas as pd
from pathlib import Path

# Set Path
DB_PATH = Path("../db/coffeeking.db")

# Connect
conn = sqlite3.connect(DB_PATH)
print("DB connected:", DB_PATH.resolve())

# List tables 
tables = pd.read_sql_query("""
SELECT name
FROM sqlite_master
WHERE type = 'table'
ORDER BY name;
""", conn)

tables

DB connected: /Users/kwaknakyung/projects/Coffeeking-Yelp/db/coffeeking.db


|   | name |
|---|------|
| 0 | coffee_analysis |
| 1 | coffee_business |
| 2 | coffee_business_enriched |
| 3 | coffee_reviews |
| 4 | coffeeking_reco_v1 |


In [3]:
def show_schema(table_name):
    return pd.read_sql_query(f"PRAGMA table_info({table_name});", conn)

# coffee_analysis schema check
schema_analysis = show_schema("coffee_analysis")
schema_analysis

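|   | cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|---|
| 0 | 0 | business_id | TEXT | 0 |  | 0 |
| 1 | 1 | name | TEXT | 0 |  | 0 |
| 2 | 2 | address | TEXT | 0 |  | 0 |
| 3 | 3 | city | TEXT | 0 |  | 0 |
| 4 | 4 | state | TEXT | 0 |  | 0 |
| 5 | 5 | postal_code | TEXT | 0 |  | 0 |
| 6 | 6 | latitude | REAL | 0 |  | 0 |
| 7 | 7 | longitude | REAL | 0 |  | 0 |
| 8 | 8 | stars | REAL | 0 |  | 0 |
| 9 | 9 | review_count | INT | 0 |  | 0 |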


In [4]:
# coffee_business_enriched schema check
schema_enriched = show_schema("coffee_business_enriched")
schema_enriched

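|   | cid | name | type | notnull | dflt_value | pk |
|---|---|---|---|---|---|---|
| 0 | 0 | business_id | TEXT | 0 |  | 0 |
| 1 | 1 | name | TEXT | 0 |  | 0 |
| 2 | 2 | address | TEXT | 0 |  | 0 |
| 3 | 3 | city | TEXT | 0 |  | 0 |
| 4 | 4 | state | TEXT | 0 |  | 0 |
| 5 | 5 | postal_code | TEXT | 0 |  | 0 |
| 6 | 6 | latitude | REAL | 0 |  | 0 |
| 7 | 7 | longitude | REAL | 0 |  | 0 |
| 8 | 8 | stars | REAL | 0 |  | 0 |
| 9 | 9 | review_count | INTEGER | 0 |  | 0 |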


In [6]:
import pandas as pd

# 1) drop old analysis table (if exists)
conn.execute("DROP TABLE IF EXISTS coffee_analysis;")
conn.commit()

# 2) create analysis table: one row per business with review-based metrics
create_q = """
CREATE TABLE coffee_analysis AS
WITH review_agg AS (
  SELECT
    business_id,
    COUNT(*) AS n_reviews_loaded,
    AVG(stars) AS avg_review_stars
  FROM coffee_reviews
  GROUP BY business_id
)
SELECT
  b.business_id,
  b.name,
  b.state,
  b.city,
  b.address,
  b.postal_code,
  b.latitude,
  b.longitude,
  b.concept_group,
  b.has_food,
  b.has_alcohol,
  b.has_market,
  b.stars AS business_stars,
  b.review_count AS business_review_count,
  ra.n_reviews_loaded,
  ra.avg_review_stars,
  -- SQLite's one-argument LOG() is base-10 (built-in math functions, SQLite 3.35+)
  (LOG(1 + ra.n_reviews_loaded) * ra.avg_review_stars) AS visibility_score
FROM coffee_business_enriched b
JOIN review_agg ra
  ON b.business_id = ra.business_id;
"""
conn.execute(create_q)
conn.commit()

# 3) quick sanity check: schema + sample
schema = pd.read_sql_query("PRAGMA table_info(coffee_analysis);", conn)
sample = pd.read_sql_query("""
SELECT business_id, state, city, concept_group, n_reviews_loaded, avg_review_stars, visibility_score
FROM coffee_analysis
ORDER BY visibility_score DESC
LIMIT 5;
""", conn)

schema, sample


(    cid                   name  type  notnull dflt_value  pk
 0     0            business_id  TEXT        0       None   0
 1     1                   name  TEXT        0       None   0
 2     2                  state  TEXT        0       None   0
 3     3                   city  TEXT        0       None   0
 4     4                address  TEXT        0       None   0
 5     5            postal_code  TEXT        0       None   0
 6     6               latitude  REAL        0       None   0
 7     7              longitude  REAL        0       None   0
 8     8          concept_group  TEXT        0       None   0
 9     9               has_food   INT        0       None   0
 10   10            has_alcohol   INT        0       None   0
 11   11             has_market   INT        0       None   0
 12   12         business_stars  REAL        0       None   0
 13   13  business_review_count   INT        0       None   0
 14   14       n_reviews_loaded              0       None   0
 15   15

In [7]:
MIN_CITY_N = 80 

top_cities = pd.read_sql_query(f"""
WITH scored AS(
    SELECT
        state,
        city,
        visibility_score
    FROM coffee_analysis
),
scored_with_decile AS(
    SELECT
        *,
        NTILE(10) OVER (ORDER BY visibility_score DESC) AS decile
    FROM scored
),
city AS (
    SELECT
        state,
        city,
        COUNT(*) AS n_business,
        SUM(CASE WHEN decile = 1 THEN 1 ELSE 0 END) AS n_top10pct,
        ROUND(1.0 * SUM(CASE WHEN decile = 1 THEN 1 ELSE 0 END) / COUNT(*), 3) AS city_winner_rate,
        ROUND(AVG(visibility_score), 2) AS avg_visibility
    FROM scored_with_decile
    GROUP BY state,city
)
SELECT *
FROM city
WHERE n_business >= {MIN_CITY_N}
ORDER BY city_winner_rate DESC, avg_visibility DESC
LIMIT 10;
""", conn)

top_cities

|   | state | city | n_business | n_top10pct | city_winner_rate | avg_visibility |
|---|---|---|---|---|---|---|
| 0 | CA | Santa Barbara | 160 | 44 | 0.275 | 7.47 |
| 1 | LA | New Orleans | 422 | 95 | 0.225 | 6.99 |
| 2 | MO | Saint Louis | 197 | 41 | 0.208 | 6.58 |
| 3 | FL | Saint Petersburg | 90 | 16 | 0.178 | 6.63 |
| 4 | NV | Reno | 265 | 41 | 0.155 | 6.24 |
| 5 | IN | Indianapolis | 375 | 56 | 0.149 | 5.84 |
| 6 | FL | Clearwater | 110 | 16 | 0.145 | 5.54 |
| 7 | TN | Nashville | 394 | 56 | 0.142 | 6.24 |
| 8 | PA | Philadelphia | 1066 | 147 | 0.138 | 5.89 |
| 9 | AZ | Tucson | 395 | 53 | 0.134 | 5.87 |


### Deliverable 1 - Where the market is strongest (City "Winner Rate")

To identify cities where highly visible coffee concepts are most likely to emerge, I define **winners** as businesses in the **top 10%** of the **Visibility Score** and compute a **city winner rate**:

- **Visibility Score** = log(1 + n_reviews_loaded) x avg_review_stars (base-10 log, because SQLite's one-argument `LOG()` is base-10)
- **City winner rate** = (# of winners in the city) / (total businesses in the city)

To reduce noise from very small markets, I only rank cities with **at least 80 businesses** in the dataset. The table above summarizes the **Top 10 cities by winner rate**, along with sample size and average visibility.
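To make the two definitions concrete, here is a minimal pandas sketch on toy data (the cities `Alpha`/`Beta` and their review counts are hypothetical, not dataset results). It assumes the base-10 log that SQLite's `LOG()` uses and reproduces `NTILE(10)` bucket sizes by hand; it illustrates the metric and is not the notebook's production path, which stays in SQL.

```python
import math
import pandas as pd

def ntile_labels(n_rows: int, k: int = 10) -> list:
    """Bucket labels 1..k like SQL NTILE(k): the first (n_rows % k) buckets get one extra row."""
    q, r = divmod(n_rows, k)
    labels = []
    for bucket in range(1, k + 1):
        labels.extend([bucket] * (q + 1 if bucket <= r else q))
    return labels

def score_businesses(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Base-10 log, matching SQLite's one-argument LOG() used to build coffee_analysis
    out["visibility_score"] = [
        math.log10(1 + n) * stars
        for n, stars in zip(out["n_reviews_loaded"], out["avg_review_stars"])
    ]
    # One global ranking across all cities, like NTILE(10) OVER (ORDER BY visibility_score DESC)
    out = out.sort_values("visibility_score", ascending=False, kind="mergesort").reset_index(drop=True)
    out["decile"] = ntile_labels(len(out), 10)
    return out

def city_winner_rates(scored: pd.DataFrame, min_city_n: int) -> pd.DataFrame:
    # Winner = decile 1; winner rate = winners / businesses per (state, city)
    city = (
        scored.assign(is_winner=scored["decile"].eq(1))
        .groupby(["state", "city"], as_index=False)
        .agg(n_business=("is_winner", "count"), n_top10pct=("is_winner", "sum"))
    )
    city["city_winner_rate"] = (city["n_top10pct"] / city["n_business"]).round(3)
    return city[city["n_business"] >= min_city_n].sort_values("city_winner_rate", ascending=False)

# Toy data: two hypothetical cities with 10 businesses each
toy = pd.DataFrame({
    "state": ["CA"] * 10 + ["NV"] * 10,
    "city": ["Alpha"] * 10 + ["Beta"] * 10,
    "n_reviews_loaded": [200, 150, 120, 100, 90, 80, 70, 60, 50, 40,
                         30, 25, 20, 15, 12, 10, 8, 6, 4, 2],
    "avg_review_stars": [4.5] * 10 + [3.5] * 10,
})
scored = score_businesses(toy)
print(city_winner_rates(scored, min_city_n=5))
```

With 20 businesses, the top decile holds exactly 2, both from `Alpha`, so `Alpha` gets a winner rate of 0.2 and `Beta` gets 0.0. The `min_city_n` filter plays the same role as the 80-business threshold above.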

In [10]:
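winner_city_pairs = [(row["state"], row["city"]) for _, row in top_cities.iterrows()]

def sql_literal(value):
    # Escape single quotes so a name like "Coeur d'Alene" cannot break the SQL string
    return "'" + str(value).replace("'", "''") + "'"

city_filter_sql = ", ".join([f"({sql_literal(s)},{sql_literal(c)})" for s, c in winner_city_pairs])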

print(city_filter_sql)


('CA','Santa Barbara'), ('LA','New Orleans'), ('MO','Saint Louis'), ('FL','Saint Petersburg'), ('NV','Reno'), ('IN','Indianapolis'), ('FL','Clearwater'), ('TN','Nashville'), ('PA','Philadelphia'), ('AZ','Tucson')
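Interpolating names into SQL text works for this shortlist, but a parameterized alternative avoids quoting problems entirely. A minimal sketch, assuming a hypothetical helper `target_cities_cte` and a throwaway in-memory table (the `Coeur d'Alene` row is toy data, not a dataset result): the target pairs become a `VALUES` CTE joined on `(state, city)`, and the values travel as `?` parameters. It uses its own connection, so the notebook's `conn` is untouched.

```python
import sqlite3

def target_cities_cte(pairs):
    """Return (cte_sql, params) for a CTE named `targets` listing (state, city) pairs."""
    if not pairs:
        raise ValueError("need at least one (state, city) pair")
    rows = ", ".join(["(?, ?)"] * len(pairs))
    params = [value for pair in pairs for value in pair]
    return f"targets(state, city) AS (VALUES {rows})", params

# Toy table standing in for the city x concept results
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE toy_combo (state TEXT, city TEXT, concept_group TEXT)")
demo_conn.executemany("INSERT INTO toy_combo VALUES (?, ?, ?)", [
    ("CA", "Santa Barbara", "coffee+alcohol"),
    ("ID", "Coeur d'Alene", "market/retail"),  # apostrophe would break naive quoting
    ("NV", "Reno", "market/retail"),
])

cte, params = target_cities_cte([("CA", "Santa Barbara"), ("ID", "Coeur d'Alene")])
rows = demo_conn.execute(
    f"WITH {cte} "
    "SELECT c.state, c.city FROM toy_combo c "
    "JOIN targets t ON c.state = t.state AND c.city = t.city "
    "ORDER BY c.state",
    params,
).fetchall()
print(rows)
```

Only the placeholder count is formatted into the SQL text; every city name is bound by the driver.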


In [11]:
city_concept_compare = pd.read_sql_query(f"""
WITH scored AS (
  SELECT
    business_id,
    state,
    city,
    concept_group,
    visibility_score
  FROM coffee_analysis
),
scored_with_decile AS (
  SELECT
    *,
    NTILE(10) OVER (ORDER BY visibility_score DESC) AS decile
  FROM scored
),
combo AS (
  SELECT
    state,
    city,
    concept_group,
    COUNT(*) AS n_business,
    SUM(CASE WHEN decile = 1 THEN 1 ELSE 0 END) AS n_top10pct,
    ROUND(1.0 * SUM(CASE WHEN decile = 1 THEN 1 ELSE 0 END) / COUNT(*), 3) AS winner_rate,
    ROUND(AVG(visibility_score), 2) AS avg_visibility
  FROM scored_with_decile
  GROUP BY state, city, concept_group
)
SELECT *
FROM combo
WHERE (state, city) IN ({city_filter_sql})
  AND concept_group IN ('coffee+alcohol', 'market/retail')
ORDER BY state, city, winner_rate DESC, avg_visibility DESC;
""", conn)

city_concept_compare


|   | state | city | concept_group | n_business | n_top10pct | winner_rate | avg_visibility |
|---|---|---|---|---|---|---|---|
| 0 | AZ | Tucson | coffee+alcohol | 62 | 10 | 0.161 | 7.27 |
| 1 | AZ | Tucson | market/retail | 22 | 3 | 0.136 | 5.52 |
| 2 | CA | Santa Barbara | coffee+alcohol | 27 | 11 | 0.407 | 8.99 |
| 3 | CA | Santa Barbara | market/retail | 17 | 6 | 0.353 | 7.67 |
| 4 | FL | Clearwater | market/retail | 3 | 1 | 0.333 | 5.96 |
| 5 | FL | Clearwater | coffee+alcohol | 13 | 3 | 0.231 | 6.32 |
| 6 | FL | Saint Petersburg | coffee+alcohol | 19 | 5 | 0.263 | 7.77 |
| 7 | FL | Saint Petersburg | market/retail | 10 | 2 | 0.2 | 5.94 |
| 8 | IN | Indianapolis | coffee+alcohol | 41 | 12 | 0.293 | 7.9 |
| 9 | IN | Indianapolis | market/retail | 27 | 4 | 0.148 | 6.34 |


In [12]:
import numpy as np

df = city_concept_compare.copy()

# small sample warning
df["sample_flag"] = np.where(df["n_business"] < 25, "SMALL N (treat as signal)", "")

df["rank_in_city"] = df.groupby(["state","city"])["winner_rate"].rank(ascending=False, method="first")

reco_city = (
    df[df["rank_in_city"] == 1]
    .loc[:, ["state","city","concept_group","winner_rate","avg_visibility","n_business","n_top10pct","sample_flag"]]
    .rename(columns={
        "concept_group": "recommended_concept",
        "winner_rate": "recommended_winner_rate",
        "avg_visibility": "recommended_avg_visibility",
        "n_business": "recommended_n_business",
        "n_top10pct": "recommended_n_top10pct"
    })
    .sort_values(["recommended_winner_rate","recommended_avg_visibility"], ascending=False)
)

reco_city


|   | state | city | recommended_concept | recommended_winner_rate | recommended_avg_visibility | recommended_n_business | recommended_n_top10pct | sample_flag |
|---|---|---|---|---|---|---|---|---|
| 2 | CA | Santa Barbara | coffee+alcohol | 0.407 | 8.99 | 27 | 11 |  |
| 4 | FL | Clearwater | market/retail | 0.333 | 5.96 | 3 | 1 | SMALL N (treat as signal) |
| 10 | LA | New Orleans | coffee+alcohol | 0.329 | 7.84 | 73 | 24 |  |
| 12 | MO | Saint Louis | coffee+alcohol | 0.324 | 7.88 | 34 | 11 |  |
| 8 | IN | Indianapolis | coffee+alcohol | 0.293 | 7.9 | 41 | 12 |  |
| 6 | FL | Saint Petersburg | coffee+alcohol | 0.263 | 7.77 | 19 | 5 | SMALL N (treat as signal) |
| 14 | NV | Reno | market/retail | 0.231 | 5.88 | 26 | 6 |  |
| 16 | PA | Philadelphia | coffee+alcohol | 0.2 | 6.89 | 140 | 28 |  |
| 18 | TN | Nashville | coffee+alcohol | 0.194 | 7.12 | 67 | 13 |  |
| 0 | AZ | Tucson | coffee+alcohol | 0.161 | 7.27 | 62 | 10 |  |


In [14]:
reco_city.to_sql("coffeeking_reco_v2", conn, if_exists="replace", index=False)

print("Saved table: coffeeking_reco_v2")

# sanity check
pd.read_sql_query("SELECT * FROM coffeeking_reco_v2 LIMIT 10;", conn)


Saved table: coffeeking_reco_v2


|   | state | city | recommended_concept | recommended_winner_rate | recommended_avg_visibility | recommended_n_business | recommended_n_top10pct | sample_flag |
|---|---|---|---|---|---|---|---|---|
| 0 | CA | Santa Barbara | coffee+alcohol | 0.407 | 8.99 | 27 | 11 |  |
| 1 | FL | Clearwater | market/retail | 0.333 | 5.96 | 3 | 1 | SMALL N (treat as signal) |
| 2 | LA | New Orleans | coffee+alcohol | 0.329 | 7.84 | 73 | 24 |  |
| 3 | MO | Saint Louis | coffee+alcohol | 0.324 | 7.88 | 34 | 11 |  |
| 4 | IN | Indianapolis | coffee+alcohol | 0.293 | 7.9 | 41 | 12 |  |
| 5 | FL | Saint Petersburg | coffee+alcohol | 0.263 | 7.77 | 19 | 5 | SMALL N (treat as signal) |
| 6 | NV | Reno | market/retail | 0.231 | 5.88 | 26 | 6 |  |
| 7 | PA | Philadelphia | coffee+alcohol | 0.2 | 6.89 | 140 | 28 |  |
| 8 | TN | Nashville | coffee+alcohol | 0.194 | 7.12 | 67 | 13 |  |
| 9 | AZ | Tucson | coffee+alcohol | 0.161 | 7.27 | 62 | 10 |  |


# Executive Summary (CoffeeKing Launch Recommendation)

We built a simple "Recommendation Engine" using Yelp review behavior to identify where CoffeeKing is most likely to achieve fast discoverability and strong customer sentiment.

### Core Metric (Visibility Score):

Visibility = log(1 + review_volume) x average review stars.

This captures "well-known AND well-liked" businesses while reducing distortion from extreme review-count outliers.

### Winner Definition:

A "winner" is a business in the top 10% of Visibility Score (top decile).

### Decision KPI (Winner Rate):

Winner Rate = (# winners) / (total businesses) within each city x concept segment.
This answers a business question directly: *"If we open in this market with this concept, how often does that segment produce top performers?"*

### Key Decision:

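Across the strongest markets, coffee+alcohol is the most consistently recommended of the two concepts compared (coffee+alcohol vs. market/retail): it is the pick in 8 of the 10 shortlisted cities. The exceptions are Reno (NV), where market/retail performs better (suggesting a "destination market" angle fits that local market), and Clearwater (FL), where the market/retail pick rests on only 3 businesses.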

### Caution on small samples:

Some city segments have limited sample sizes (flagged as "SMALL N"). These should be treated as directional signals, not final conclusions.


In [15]:
reco_v2 = pd.read_sql_query("""
SELECT *
FROM coffeeking_reco_v2
ORDER BY recommended_winner_rate DESC, recommended_avg_visibility DESC;
""", conn)

reco_v2

|   | state | city | recommended_concept | recommended_winner_rate | recommended_avg_visibility | recommended_n_business | recommended_n_top10pct | sample_flag |
|---|---|---|---|---|---|---|---|---|
| 0 | CA | Santa Barbara | coffee+alcohol | 0.407 | 8.99 | 27 | 11 |  |
| 1 | FL | Clearwater | market/retail | 0.333 | 5.96 | 3 | 1 | SMALL N (treat as signal) |
| 2 | LA | New Orleans | coffee+alcohol | 0.329 | 7.84 | 73 | 24 |  |
| 3 | MO | Saint Louis | coffee+alcohol | 0.324 | 7.88 | 34 | 11 |  |
| 4 | IN | Indianapolis | coffee+alcohol | 0.293 | 7.9 | 41 | 12 |  |
| 5 | FL | Saint Petersburg | coffee+alcohol | 0.263 | 7.77 | 19 | 5 | SMALL N (treat as signal) |
| 6 | NV | Reno | market/retail | 0.231 | 5.88 | 26 | 6 |  |
| 7 | PA | Philadelphia | coffee+alcohol | 0.2 | 6.89 | 140 | 28 |  |
| 8 | TN | Nashville | coffee+alcohol | 0.194 | 7.12 | 67 | 13 |  |
| 9 | AZ | Tucson | coffee+alcohol | 0.161 | 7.27 | 62 | 10 |  |


In [16]:
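reco_v2 = reco_v2.copy()
# rate x n reconstructs the observed winner count (recommended_n_top10pct), up to rounding of the rate
reco_v2["expected_winners_in_segment"] = (reco_v2["recommended_winner_rate"] * reco_v2["recommended_n_business"]).round(1)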

reco_v2[[
    "state", "city", "recommended_concept", "recommended_winner_rate", "recommended_n_business",
    "expected_winners_in_segment", "sample_flag"
]]

|   | state | city | recommended_concept | recommended_winner_rate | recommended_n_business | expected_winners_in_segment | sample_flag |
|---|---|---|---|---|---|---|---|
| 0 | CA | Santa Barbara | coffee+alcohol | 0.407 | 27 | 11.0 |  |
| 1 | FL | Clearwater | market/retail | 0.333 | 3 | 1.0 | SMALL N (treat as signal) |
| 2 | LA | New Orleans | coffee+alcohol | 0.329 | 73 | 24.0 |  |
| 3 | MO | Saint Louis | coffee+alcohol | 0.324 | 34 | 11.0 |  |
| 4 | IN | Indianapolis | coffee+alcohol | 0.293 | 41 | 12.0 |  |
| 5 | FL | Saint Petersburg | coffee+alcohol | 0.263 | 19 | 5.0 | SMALL N (treat as signal) |
| 6 | NV | Reno | market/retail | 0.231 | 26 | 6.0 |  |
| 7 | PA | Philadelphia | coffee+alcohol | 0.2 | 140 | 28.0 |  |
| 8 | TN | Nashville | coffee+alcohol | 0.194 | 67 | 13.0 |  |
| 9 | AZ | Tucson | coffee+alcohol | 0.161 | 62 | 10.0 |  |


# CoffeeKing Launch Playbook (Pilot -> Learn -> Scale)

**Objective**: maximize early discoverability and strong customer sentiment using Yelp-based market signals. 

## Phase 1 - Pilot (highest "probability of winning" segments)

We prioritize markets where a given concept has a high winner rate (share of businesses in the top 10% Visibility Score). This is a practical "odds of success" metric for first-launch decisions.

### Pilot #1 - Santa Barbara, CA (coffee+alcohol)

- Recommended concept: coffee+alcohol
- Winner rate: 0.407 (11 winners / 27 businesses)
- Avg visibility (segment): 8.99
- Why this pilot: Highest winner rate among the recommended city x concept segments, with strong average visibility, suggesting that an "evening/social" coffee positioning is most likely to produce standout performers here.

### Pilot #2 - New Orleans, LA (coffee+alcohol)

- Recommended concept: coffee+alcohol
- Winner rate: 0.329 (24 / 73)
- Avg visibility (segment): 7.84
- Why this pilot: Strong win probability with a larger sample size (73 businesses), making it a high-confidence market to validate a repeatable coffee+alcohol play.

### Pilot #3 - Indianapolis, IN (coffee+alcohol)

- Recommended concept: coffee+alcohol
- Winner rate: 0.293 (12 / 41)
- Avg visibility (segment): 7.90
- Why this pilot: Consistent coffee+alcohol strength across a meaningful market size; useful for testing whether the "social coffee" positioning generalizes beyond coastal/tourist-heavy areas. 

## Phase 2 - Alternative Bet (market-specific concept fit)

Not all markets reward the same positioning. We explicitly include a "counter-strategy" market where a different concept is favored.

**Alternative Bet - Reno, NV (market/retail)**

- Recommended concept: market/retail
- Winner rate: 0.231 (6/26)
- Avg visibility (segment): 5.88
- Why it matters: Reno is a clear exception where market/retail outperforms coffee+alcohol. This suggests a destination-driven retail/market format may be a better fit than a nightlife/social coffee hybrid.

## Phase 3 - Scale Candidates (more winners in absolute terms)

Winner rate measures probability of producing winners, but scaling decisions also care about how many winners exist to learn from and benchmark against. Large markets can produce more winners even with a lower winner rate.

**Scale Candidate - Philadelphia, PA (coffee+alcohol)**

- Winner rate: 0.200 (28 / 140)
- Expected winners in segment: 28
- Why scale here: The win probability is lower than in the top pilots, but the market produces many winners in absolute count, which makes it ideal for benchmarking, competitor mapping, and selecting differentiated positioning once the core concept is validated.

**Additional scale candidates (concept: coffee+alcohol)**

- Nashville, TN: winner rate 0.194 (13/67) | expected winners 13
- Tucson, AZ: winner rate 0.161 (10/62) | expected winners 10
- Saint Louis, MO: winner rate 0.324 (11/34) | expected winners 11

These markets provide a balanced mix of solid win probability and enough winners for learning.

### Small-sample caution (treat as signal)

Some segments are flagged SMALL N and should not drive final decisions alone.

- Clearwater, FL (market/retail): winner rate 0.333 (1/3) - signal only
- Saint Petersburg, FL (coffee+alcohol): winner rate 0.263 (5/19) - signal only

Recommendation: use these as "watchlist" markets and validate with additional data (e.g., more comparable cities, operational constraints, or follow-up analysis).
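One way to quantify "SMALL N" (not part of the original analysis) is a 95% Wilson score interval around each winner rate. A minimal sketch using the winner/business counts from the recommendation table above; `wilson_interval` is a hypothetical helper, and z = 1.96 assumes the usual normal approximation at the 95% level.

```python
import math

def wilson_interval(winners: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a winner rate (winners / n); z=1.96 gives ~95% coverage."""
    if n == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = winners / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# (winners, businesses) per recommended segment, taken from coffeeking_reco_v2
segments = {
    "Clearwater, FL (market/retail)": (1, 3),
    "Saint Petersburg, FL (coffee+alcohol)": (5, 19),
    "New Orleans, LA (coffee+alcohol)": (24, 73),
    "Philadelphia, PA (coffee+alcohol)": (28, 140),
}
for name, (w, n) in segments.items():
    lo, hi = wilson_interval(w, n)
    print(f"{name}: rate={w / n:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

With only 3 businesses, Clearwater's interval spans roughly 0.06 to 0.79, which is why it belongs on the watchlist rather than in the pilot list.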

## Final Recommendation (decision-ready)

If CoffeeKing's priority is fast discoverability + strong ratings, launch a coffee+alcohol hybrid in Santa Barbara and New Orleans as the primary pilots. In parallel, run an alternative-format pilot in Reno using market/retail positioning to test market-specific fit. Once the concept is validated, scale into larger markets like Philadelphia, where the number of winners supports deeper competitor benchmarking and expansion planning.

# Pilot KPI Scoreboard (First 60 days)

**Goal:** Validate that the chosen concept-market pairing can generate early traction (demand) without sacrificing quality (ratings).

### KPI 1 - Early Visibility (Demand x Quality)

- Definition: Visibility Score = log(1 + review volume) x avg rating
- Why it matters: Single metric that captures "people are talking about it" and "they like it."

### KPI 2 - Review Velocity

- Definition: reviews per week (or per month) post-launch
- Why it matters: Fastest proxy for local awareness + foot traffic

### KPI 3 - Rating Stability

- Definition: average rating + % of 1-2 star reviews
- Why it matters: Guards against "high volume but bad experience" scenarios

### KPI 4 - Concept Fit Signals

- Definition: share of reviews mentioning keywords aligned with the concept
    - coffee+alcohol: "cocktail", "wine", "bar", "happy hour", "night" 
    - market/retail: "market", "shop", "selection", "local goods"
- Why it matters: Confirms customers perceive you the way you intended.

### KPI 5 - Competitive Benchmark Position

- Definition: rank vs. top local competitors on visibility score (or review velocity)
- Why it matters: Ensures CoffeeKing is actually "winning the category", not just performing okay.
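KPIs 1 to 4 can be computed from a pilot's own reviews. A minimal sketch, assuming a reviews DataFrame with Yelp-style `date`, `stars`, and `text` columns and a known launch date (`pilot_kpis` and the toy reviews are hypothetical). Keyword matching is whole-word and case-insensitive, so "bar" does not count "barista" (and "night" will not count "nightlife"); KPI 5 needs competitor data and is left out.

```python
import math
import re
import pandas as pd

# Keyword buckets copied from KPI 4
CONCEPT_KEYWORDS = {
    "coffee+alcohol": ["cocktail", "wine", "bar", "happy hour", "night"],
    "market/retail": ["market", "shop", "selection", "local goods"],
}

def pilot_kpis(reviews: pd.DataFrame, launch_date: str, concept: str, days: int = 60) -> dict:
    """KPIs 1-4 for one pilot over the first `days` after launch (assumes date/stars/text columns)."""
    start = pd.Timestamp(launch_date)
    end = start + pd.Timedelta(days=days)
    dates = pd.to_datetime(reviews["date"])
    window = reviews[(dates >= start) & (dates < end)]
    n = len(window)
    if n == 0:
        return {"n_reviews": 0}
    avg_rating = window["stars"].mean()
    # Whole-word, non-capturing alternation of the concept's keywords
    pattern = r"\b(?:" + "|".join(map(re.escape, CONCEPT_KEYWORDS[concept])) + r")\b"
    text = window["text"].fillna("")
    return {
        "n_reviews": n,
        "visibility_score": math.log10(1 + n) * avg_rating,  # KPI 1 (base-10 log, like SQLite LOG)
        "reviews_per_week": n / (days / 7),                    # KPI 2
        "avg_rating": avg_rating,                               # KPI 3
        "pct_1_2_star": window["stars"].le(2).mean(),           # KPI 3
        "concept_fit_share": text.str.contains(pattern, case=False, regex=True).mean(),  # KPI 4
    }

toy_reviews = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-10", "2024-02-01", "2024-04-15"],
    "stars": [5, 4, 2, 1],
    "text": ["Great cocktail list and espresso", "Friendly barista, good latte",
             "Slow service at night", "Outside the 60-day window"],
})
kpis = pilot_kpis(toy_reviews, launch_date="2024-01-01", concept="coffee+alcohol")
print(kpis)
```

The April review falls outside the 60-day window, so the toy pilot has 3 reviews, one of them 1-2 stars, and two that match coffee+alcohol keywords.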

# How to Use This (Reusable SQL "Client Toolkit")

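### What this notebook leaves behind
- A clean SQLite database (`coffeeking.db`)
- A scoring table (`coffee_analysis`) with `visibility_score` (deciles are not stored in the table; the analysis queries compute them on the fly with `NTILE(10)`)
- A recommendation table (`coffeeking_reco_v2`) that picks the best concept per city (based on winner rate)

### Reusable SQL Snippets (copy/paste)
These are designed so a non-technical stakeholder can rerun the same analysis after updating the database. Note that `coffeeking_reco_v2` is a saved snapshot: it only changes when the recommendation cells above are rerun.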

In [None]:
# Code: Connect + helper runner

import pandas as pd
import sqlite3
from pathlib import Path

DB_PATH = Path("../db/coffeeking.db")
conn = sqlite3.connect(DB_PATH)
print("DB connected:", DB_PATH.resolve())

def q(sql: str):
    return pd.read_sql_query(sql, conn)


In [None]:
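# Code: Pull the "top winners" (overall top 50 businesses)
# coffee_analysis has no stored decile column, so compute it on the fly (same NTILE logic as above)

q("""
WITH scored AS (
  SELECT
    *,
    NTILE(10) OVER (ORDER BY visibility_score DESC) AS decile
  FROM coffee_analysis
)
SELECT
  state, city, concept_group,
  business_id,
  n_reviews_loaded,
  ROUND(avg_review_stars, 3) AS avg_review_stars,
  ROUND(visibility_score, 2) AS visibility_score,
  decile
FROM scored
WHERE decile = 1
ORDER BY visibility_score DESC
LIMIT 50;
""")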


In [None]:
# Code: City Leaderboard (probability of "winning")
# Deciles are computed on the fly: coffee_analysis does not store a decile column

q("""
WITH scored AS (
  SELECT
    state,
    city,
    visibility_score,
    NTILE(10) OVER (ORDER BY visibility_score DESC) AS decile
  FROM coffee_analysis
),
city AS (
  SELECT
    state,
    city,
    COUNT(*) AS n_business,
    SUM(CASE WHEN decile = 1 THEN 1 ELSE 0 END) AS n_top10pct,
    ROUND(1.0 * SUM(CASE WHEN decile = 1 THEN 1 ELSE 0 END) / COUNT(*), 3) AS city_winner_rate,
    ROUND(AVG(visibility_score), 2) AS avg_visibility
  FROM scored
  GROUP BY state, city
)
SELECT *
FROM city
WHERE n_business >= 80
ORDER BY city_winner_rate DESC, avg_visibility DESC
LIMIT 15;
""")


In [None]:
# Code: Best concept per city (your "Recommendation Engine" output)

q("""
SELECT *
FROM coffeeking_reco_v2
ORDER BY recommended_winner_rate DESC, recommended_avg_visibility DESC;
""")


In [None]:
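# Code: Compare concepts inside a chosen city (easy client question)
# Change the city/state at the top; values are passed as query parameters,
# so names containing apostrophes work too. Deciles are computed over all
# businesses first, then the chosen city is filtered.

STATE = "CA"
CITY = "Santa Barbara"

pd.read_sql_query("""
WITH scored AS (
  SELECT
    state, city, concept_group, visibility_score,
    NTILE(10) OVER (ORDER BY visibility_score DESC) AS decile
  FROM coffee_analysis
),
combo AS (
  SELECT
    state, city, concept_group,
    COUNT(*) AS n_business,
    SUM(CASE WHEN decile = 1 THEN 1 ELSE 0 END) AS n_top10pct,
    ROUND(1.0 * SUM(CASE WHEN decile = 1 THEN 1 ELSE 0 END) / COUNT(*), 3) AS winner_rate,
    ROUND(AVG(visibility_score), 2) AS avg_visibility
  FROM scored
  GROUP BY state, city, concept_group
)
SELECT *
FROM combo
WHERE state = ? AND city = ?
ORDER BY winner_rate DESC, avg_visibility DESC;
""", conn, params=(STATE, CITY))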


# Launch Playbook - Pilot -> Alt Bet -> Scale

**Objective:** Maximize early discoverability (Visibility Score) by choosing the concept-market pairing that is most over-represented among "winners" (top 10% visibility businesses).

### Pilot markets (highest-confidence bets)

- **Santa Barbara, CA** -> coffee+alcohol *(winner_rate 0.407; n=27; top10=11)*
- **New Orleans, LA** -> coffee+alcohol *(0.329; n=73; top10=24)*
- **Saint Louis, MO** -> coffee+alcohol *(0.324; n=34; top10=11)*
- **Indianapolis, IN** -> coffee+alcohol *(0.293; n=41; top10=12)*

### Alt-bets (strategic alternatives / local-fit plays)

- **Reno, NV -> market/retail** (0.231; n=26; top10=6)
    - This is the key exception where market/retail outperforms coffee+alcohol, suggesting a stronger retail-destination fit.
- **Philadelphia, PA -> coffee+alcohol** (0.200; n=140; top10=28)
    - Large market: produces many winners in absolute terms, but the probability of "winning" is lower than top pilot cities.

### Small-N signals (treat as directional, not definitive)

- **Clearwater, FL** -> market/retail (0.333; n=3; top10=1)
- **Saint Petersburg, FL** -> coffee+alcohol (0.263; n=19; top10=5)

### Scale plan

1. Pilot in 1-2 high-confidence markets with the recommended concept
2. Track KPI scoreboard for 60 days (visibility + review velocity + rating stability)
3. Expand into the next market tier based on repeatable KPI lift

# Risks, Assumptions, and Next Steps

### Risks / Assumptions

- **Concept tagging is rule-based** (keyword buckets). It's explainable and reproducible, but not perfect.
- **Yelp data reflects Yelp behavior** (reviewing patterns differ by city and demographic).
- **Winner rate depends on sample size** - small city x concept segments should be treated as "signals."
- **Visibility Score prioritizes discoverability,** not unit economics (rent, margins, competition costs not included).

### Next Steps

- Add price range ($-$$$$) and hours if available to refine segmentation
- Add *sentiment / topic extraction* from review text (validate "concept fit")
- Add geo clustering (neighborhood-level hotspots within top cities)
- Create a lightweight dashboard (top cities + concepts + competitors list)