
# SMS Outreach Pilot — Technical Notebook

**Author:** Lujean El‑Hadri  
**Purpose:** Document the data preparation, scoring approach, and visuals that power the **SMS Outreach Pilot** dashboard.  


---

## Business Goal

Increase **HRA completions** in a D‑SNP/Medicare population by:
- Prioritizing outreach to members most likely to convert and/or with greatest need,
- Using **SMS** as the primary channel when consent is present,
- Providing a clear **BI dashboard** for real‑time monitoring and triage.

**Key Questions this analysis supports:**
1. How are members distributed across **outreach_priority** levels?
2. Among those with and without **text_opt_in**, where are the biggest pockets of opportunity?
3. Where do we see higher **ER utilization** (proxy for risk) that may require white‑glove outreach?
4. Which members should be **top targets** today?

> This notebook powers the dashboard and serves as a reproducible, transparent reference.



## Data Dictionary (columns expected)

- `member_id` — unique identifier for a member  
- `outreach_score` — numeric score (higher = higher priority)  
- `outreach_priority` — **Low / Medium / High** (derived from score thresholds)  
- `hra_overdue_days` — days overdue for HRA  
- `text_opt_in` — 1 if member gave SMS consent, else 0  
- `er_visit_count` — recent ER visits count (or proxy measure)  
- `chronic_conditions` — count of chronic conditions

> If some columns are missing, the notebook will still run and skip related outputs.
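
As a rough sketch of how that skip logic could be centralized, the helper below reports which expected columns are present and which are missing. The name `check_columns` and the toy frame are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Column names taken from the data dictionary above.
EXPECTED_COLUMNS = [
    "member_id", "outreach_score", "outreach_priority", "hra_overdue_days",
    "text_opt_in", "er_visit_count", "chronic_conditions",
]

def check_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS):
    """Return (present, missing) lists of expected column names (hypothetical helper)."""
    cols = set(frame.columns)
    present = [c for c in expected if c in cols]
    missing = [c for c in expected if c not in cols]
    return present, missing

# Toy data for illustration only; not real member records.
toy = pd.DataFrame({"member_id": [1, 2], "outreach_score": [72.5, 41.0]})
present, missing = check_columns(toy)
print("Present:", present)
print("Missing:", missing)
```

Running a check like this right after loading would show up front which charts and tables are going to be skipped.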


In [6]:
import pandas as pd
import numpy as np
import plotly.express as px

from pathlib import Path

DATASET = Path(r"/mnt/data/sms_outreach_ranked.csv")
assert DATASET.exists(), f"Dataset not found: {DATASET}"

pd.options.display.float_format = lambda x: f"{x:,.2f}"


AssertionError: Dataset not found: \mnt\data\sms_outreach_ranked.csv
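
The `AssertionError` above shows the hard-coded path failing on a machine where that location does not exist. One hedged option is to let the location be overridden. In the sketch below, the environment variable name `SMS_OUTREACH_DATA` and the helper `resolve_dataset` are assumptions introduced for illustration; only the default path comes from the notebook.

```python
import os
from pathlib import Path

# Default location used in the notebook cell above.
DEFAULT_DATASET = Path("/mnt/data/sms_outreach_ranked.csv")

def resolve_dataset(env_var: str = "SMS_OUTREACH_DATA") -> Path:
    """Return the dataset path, preferring an environment override (hypothetical helper)."""
    override = os.environ.get(env_var)
    path = Path(override) if override else DEFAULT_DATASET
    if not path.is_file():
        # Raise explicitly: unlike assert, this check is not stripped under `python -O`.
        raise FileNotFoundError(f"Dataset not found: {path} (set {env_var} to override)")
    return path
```

With something like this in place, the loading cell could use `DATASET = resolve_dataset()` and stop early with a clear message instead of cascading into later cells.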

In [None]:
df = pd.read_csv(DATASET)
df.columns = [c.strip().lower() for c in df.columns]

print("Rows:", len(df))
df.head(10)


FileNotFoundError: [Errno 2] No such file or directory: '\\mnt\\data\\sms_outreach_ranked.csv'

In [None]:
summary = pd.DataFrame({
    "column": df.columns,
    "dtype": [df[c].dtype for c in df.columns],
    "non_null": [df[c].notna().sum() for c in df.columns],
    "nulls": [df[c].isna().sum() for c in df.columns]
})
summary


NameError: name 'df' is not defined


## Scoring & Priority Buckets (reproducible rules)

If `outreach_priority` is **missing**, we derive it from `outreach_score` using thresholds:
- **High:** ≥ 60  
- **Medium:** ≥ 50 and < 60  
- **Low:** < 50

> These thresholds are configurable and should be tuned to business goals and capacity.
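
A minimal sketch of these rules with the cut points exposed as parameters, using `pd.cut`. The helper name `assign_priority` and the toy scores are assumptions for illustration. In this version, missing or non-numeric scores come back as missing rather than falling into **Low**.

```python
import numpy as np
import pandas as pd

# Cut points mirror the rules above; adjust them to match outreach capacity.
HIGH_MIN = 60
MEDIUM_MIN = 50

def assign_priority(scores: pd.Series, medium_min=MEDIUM_MIN, high_min=HIGH_MIN) -> pd.Series:
    """Bucket scores into Low / Medium / High (hypothetical helper)."""
    numeric = pd.to_numeric(scores, errors="coerce")
    # right=False makes the bins left-closed: [-inf, 50), [50, 60), [60, inf).
    return pd.cut(
        numeric,
        bins=[-np.inf, medium_min, high_min, np.inf],
        labels=["Low", "Medium", "High"],
        right=False,
    )

# Toy scores chosen to sit on and around the boundaries.
toy_scores = pd.Series([49.99, 50, 59.995, 60, None])
print(assign_priority(toy_scores).tolist())
```

Left-closed bins mean a score of exactly 60 is **High** and anything from 50 up to, but not including, 60 is **Medium**, which matches the comparison order in the cell below.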


In [7]:
if "outreach_priority" not in df.columns:
    if "outreach_score" in df.columns:
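        def bucket(score):
            # NaN comparisons return False, so check for missing scores first;
            # otherwise unscored members would silently be labeled "Low".
            try:
                if pd.isna(score): return np.nan
                if score >= 60: return "High"
                if score >= 50: return "Medium"
                return "Low"
            except TypeError:
                # Non-numeric scores (e.g. stray strings) get no priority.
                return np.nan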
        df["outreach_priority"] = df["outreach_score"].apply(bucket)
    else:
        df["outreach_priority"] = np.nan

if "text_opt_in" in df.columns:
    df["text_opt_in"] = df["text_opt_in"].fillna(0).astype(int)




## Overview Visuals (used in the BI dashboard)

### 1) Priority Distribution (donut)


In [8]:
if "outreach_priority" in df.columns:
    pr = df["outreach_priority"].value_counts(dropna=False).rename_axis("priority").reset_index(name="count")
    fig = px.pie(pr, values="count", names="priority", hole=0.55,
                 title="Outreach Priority Distribution")
    fig.update_traces(textposition="inside", textinfo="percent+label")
    fig.show()
else:
    print("Column 'outreach_priority' not found; skipping donut chart.")




### 2) Text Opt-In by Priority (stacked bars)


In [9]:

if {"outreach_priority", "text_opt_in"}.issubset(df.columns):
    # size() counts rows per group, so this chart does not require member_id.
    grp = (df.groupby(["outreach_priority", "text_opt_in"]).size()
             .reset_index(name="count"))
    totals = grp.groupby("outreach_priority")["count"].transform("sum")
    grp["pct"] = grp["count"] / totals * 100
    # Cast the 0/1 flag to str so Plotly uses discrete colors, not a continuous scale.
    grp["text_opt_in"] = grp["text_opt_in"].astype(str)

    # category_orders keeps tiers in Low/Medium/High order; sorting the labels would be alphabetical.
    fig = px.bar(grp,
                 x="pct", y="outreach_priority",
                 color="text_opt_in", orientation="h",
                 category_orders={"outreach_priority": ["Low", "Medium", "High"]},
                 title="Text Opt-In by Priority (share)",
                 labels={"pct":"% of members","outreach_priority":"Priority","text_opt_in":"Opt-in"})
    fig.update_layout(barmode="stack", xaxis_tickformat=".1f")
    fig.show()
else:
    print("Required columns not found; skipping stacked bars.")




### 3) ER Utilization by Priority (bar)


In [10]:

if set(["outreach_priority","er_visit_count"]).issubset(df.columns):
    er = (df.groupby("outreach_priority")["er_visit_count"].mean()
            .reset_index(name="avg_er_visits"))
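    # category_orders keeps tiers in Low/Medium/High order; sorting the labels would be alphabetical.
    fig = px.bar(er,
                 x="outreach_priority", y="avg_er_visits",
                 category_orders={"outreach_priority": ["Low", "Medium", "High"]},
                 title="Average ER Visits by Priority",
                 labels={"outreach_priority":"Priority","avg_er_visits":"Average ER visits"})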
    fig.show()
else:
    print("Required columns not found; skipping ER chart.")




## Top Outreach Targets


In [11]:

target_cols = ["member_id", "outreach_score", "outreach_priority", "hra_overdue_days",
               "text_opt_in", "er_visit_count", "chronic_conditions"]
cols = [c for c in target_cols if c in df.columns]
sdf = df.copy()
if "outreach_score" in sdf.columns:
    # Rank by score, breaking ties by days overdue when that column exists.
    sort_keys = [c for c in ["outreach_score", "hra_overdue_days"] if c in sdf.columns]
    sdf = sdf.sort_values(sort_keys, ascending=False)
# Chain reset_index rather than mutating a slice in place, which can trigger SettingWithCopyWarning.
top = (sdf[cols] if cols else sdf).head(25).reset_index(drop=True)
top




## Summary & Next Steps

- Priority bands align with capacity and expected conversion pockets.  
- Opt‑in concentration in higher priority tiers makes SMS effective.  
- Elevated ER utilization in higher priority tiers points to members who may need white‑glove (concierge) outreach.  
- Next: tune thresholds, test SMS templates/cadence, and iterate with weekly refresh.
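
One possible way to approach the threshold tuning mentioned above is to start from outreach capacity and work backwards to a score cutoff. The sketch below is an assumption-laden illustration: the helper `threshold_for_capacity`, the capacity figure, and the toy scores are all hypothetical.

```python
import pandas as pd

def threshold_for_capacity(scores: pd.Series, capacity: int) -> float:
    """Return the lowest score among the `capacity` highest-scoring members (hypothetical helper)."""
    ranked = pd.to_numeric(scores, errors="coerce").dropna().sort_values(ascending=False)
    if capacity <= 0 or ranked.empty:
        raise ValueError("capacity must be positive and scores must contain numeric values")
    # If capacity exceeds the scored population, every scored member fits.
    return float(ranked.iloc[min(capacity, len(ranked)) - 1])

# Toy scores for illustration only.
toy_scores = pd.Series([82, 75, 61, 58, 52, 47, 33])
cutoff = threshold_for_capacity(toy_scores, capacity=3)
print(f"Candidate High threshold for a capacity of 3: {cutoff}")
```

Members scoring at or above the returned cutoff would form the **High** tier. Ties at the cutoff can push the tier slightly above capacity, so the result is a starting point for review rather than a final rule.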
