# HRA SMS Outreach Pilot - Stratification Analysis

This analysis simulates an SMS outreach campaign aimed at increasing Health Risk Assessment (HRA) completions among Dual Eligible Special Needs Plan (DSNP) members.
Stratified scoring was used to identify high-priority members and to explore demographic patterns that could inform targeted communication strategies.

## Step 1: Load Required Libraries

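The following Python libraries were imported for data manipulation and visualization:
- `os` builds file paths so the notebook can locate the data regardless of folder nesting
- `pandas` is used for data wrangling
- `plotly.express` enables interactive plotting
- `plotly.graph_objects` builds the custom grouped comparison chart
- `plotly.io` sets the default renderer for Jupyter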

```python
import os

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio

# Load plotly.js from a CDN so charts render inside the notebook
pio.renderers.default = 'notebook_connected'
```

## Step 2: Load the Cleaned and Scored Dataset

A relative-path lookup is used so the notebook runs whether it is opened from the main project directory or from a Jupyter subfolder.
This dataset was generated by the `sms_scoring.py` script and contains risk scores, risk tiers, and SMS eligibility flags.

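```python
# Look for data/ in the current folder, then one level up (project root or a subfolder)
cwd = os.getcwd()
candidates = [os.path.join(d, 'data', 'sms_outreach_ranked.csv') for d in (cwd, os.path.dirname(cwd))]
data_path = next((p for p in candidates if os.path.exists(p)), candidates[-1])
df = pd.read_csv(data_path)  # raises FileNotFoundError if neither location has the file

eligible = df[df['eligible_for_sms'].eq(True)]
```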
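
A more general variant of the path lookup walks up parent directories from the working directory until it finds the data file, so deeper folder nesting also works. This is a minimal sketch using `pathlib` instead of `os.path`; the helper name `find_data_file` and its `max_levels` parameter are hypothetical and not part of the project.

```python
from pathlib import Path


def find_data_file(relative_path, start=None, max_levels=3):
    """Return the first existing `folder / relative_path`, checking `start`
    and up to `max_levels` parent folders. Hypothetical helper."""
    start = (Path(start) if start is not None else Path.cwd()).resolve()
    # start itself, then its parents, capped at max_levels steps up
    for folder in [start, *start.parents][: max_levels + 1]:
        candidate = folder / relative_path
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"{relative_path} not found within {max_levels} levels of {start}")
```

Usage: `df = pd.read_csv(find_data_file(Path('data') / 'sms_outreach_ranked.csv'))`. Capping the search depth avoids silently picking up a same-named file far outside the project.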


## Step 3: Quick Overview of the Dataset

The shape and structure of the dataset are examined, along with a few sample records.

```python
print("Dataset shape:", df.shape)
print("\nSample data:")
print(df.head())
```
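
Beyond reading `df.head()`, a quick check can confirm that the columns later cells rely on are present and count missing values before any plotting. This is a minimal sketch; the helper name `check_columns` is hypothetical, and the column list is simply the set of columns used by the charts in this notebook.

```python
import pandas as pd

# Columns referenced by the plotting cells in this notebook
REQUIRED_COLUMNS = ['risk_tier', 'score', 'age', 'chronic_conditions', 'eligible_for_sms']


def check_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> pd.DataFrame:
    """Raise KeyError if a required column is missing; otherwise return
    each required column's dtype and null count."""
    missing = [col for col in required if col not in frame.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return pd.DataFrame({
        'dtype': frame[required].dtypes.astype(str),
        'null_count': frame[required].isna().sum(),
    })
```

Usage: `check_columns(df)` returns one row per required column, which makes unexpected nulls in `age` or `score` easy to spot.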


## Step 4: Bar Chart - SMS-Eligible Member Count by Risk Tier

This chart shows how many SMS-eligible members fall into each risk tier (Low, Medium, High) based on the scoring formula.
Risk tiers help segment the population for more strategic outreach.

```python
# Count SMS-eligible members per tier
eligible_risk_counts = eligible['risk_tier'].value_counts().reset_index()
eligible_risk_counts.columns = ['risk_tier', 'count']

fig_eligible_risk = px.bar(
    eligible_risk_counts,
    x='risk_tier',
    y='count',
    title='SMS-Eligible Members by Risk Tier',
    text='count',
    color='risk_tier',
    # Fixed tier order; value_counts() would otherwise sort bars by frequency
    category_orders={'risk_tier': ['Low', 'Medium', 'High']},
    color_discrete_map={'Low': 'lightgreen', 'Medium': 'orange', 'High': 'crimson'}
)
fig_eligible_risk.update_layout(xaxis_title='Risk Tier', yaxis_title='Eligible Member Count')
fig_eligible_risk.show()
```
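
The tiers in this dataset are assigned upstream (presumably in `sms_scoring.py`), and the actual cutoffs are not shown in this notebook. To illustrate how a numeric score can be mapped to tiers, here is a sketch using `pd.cut` with hypothetical cutoffs; `TIER_CUTOFFS` and `assign_risk_tier` are assumptions, not the project's values. Because `pd.cut` returns an ordered categorical, `value_counts(sort=False)` and plots keep the Low, Medium, High order.

```python
import pandas as pd

# Hypothetical cutoffs, right-inclusive: (-inf, 4] Low, (4, 7] Medium, (7, inf) High
TIER_CUTOFFS = [float('-inf'), 4, 7, float('inf')]
TIER_LABELS = ['Low', 'Medium', 'High']


def assign_risk_tier(scores: pd.Series) -> pd.Series:
    """Map numeric scores to an ordered Low < Medium < High categorical."""
    return pd.cut(scores, bins=TIER_CUTOFFS, labels=TIER_LABELS)


scores = pd.Series([2.0, 4.0, 5.5, 7.0, 9.1])
tiers = assign_risk_tier(scores)
# sort=False keeps category order instead of sorting by frequency
print(tiers.value_counts(sort=False))
```

A score exactly on a cutoff (4.0 or 7.0 here) falls into the lower tier because the bins are closed on the right.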


## Step 5: Box Plot - Score Distribution by Risk Tier

This box plot shows the spread and concentration of scores within each tier across all members, not only SMS-eligible ones.
It offers a visual check on whether the scoring system separates the segments: if it does, adjacent tiers' boxes should overlap little.

```python
fig_score_box = px.box(
    df,
    x='risk_tier',
    y='score',
    color='risk_tier',
    title='Score Distribution by Risk Tier',
    category_orders={'risk_tier': ['Low', 'Medium', 'High']},
    color_discrete_map={'Low': 'lightgreen', 'Medium': 'orange', 'High': 'crimson'}
)
fig_score_box.update_layout(xaxis_title='Risk Tier', yaxis_title='Score')
fig_score_box.show()
```
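
The box plot gives a visual read on separation; a numeric check can complement it by comparing each tier's score range with the next tier's. This is a minimal sketch, assuming tiers are labeled Low, Medium, and High as in the charts; `tier_score_ranges` is a hypothetical helper. If tiers are assigned purely by score cutoffs, non-overlapping ranges are guaranteed by construction, so the spread within each tier is then the more informative part.

```python
import pandas as pd

TIER_ORDER = ['Low', 'Medium', 'High']


def tier_score_ranges(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize scores per tier and flag overlap with the next-higher tier."""
    ranges = (
        frame.groupby('risk_tier')['score']
        .agg(['count', 'min', 'median', 'max'])
        .reindex(TIER_ORDER)
    )
    # True when this tier's max score reaches the next tier's min score
    ranges['overlaps_next'] = ranges['max'] >= ranges['min'].shift(-1)
    return ranges


sample = pd.DataFrame({
    'risk_tier': ['Low', 'Low', 'Medium', 'Medium', 'High', 'High'],
    'score': [1, 3, 4, 6, 5, 9],
})
print(tier_score_ranges(sample))
```

In this toy sample, Medium's top score (6) exceeds High's lowest score (5), so that pair is flagged; on the real data, `tier_score_ranges(df)` applies the same check.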


## Step 6: Average Age & Chronic Conditions by Tier

A grouped bar chart compares average age and average chronic condition count across the three risk tiers.
These are important indicators for tailoring messages to member needs.

```python
# Mean age and chronic condition count per tier, reshaped to long format for grouping
group_stats = df.groupby('risk_tier')[['age', 'chronic_conditions']].mean().round(1).reset_index()
fig_age_chronic = px.bar(
    group_stats.melt(id_vars='risk_tier', value_vars=['age', 'chronic_conditions']),
    x='risk_tier',
    y='value',
    color='variable',
    barmode='group',
    text='value',  # label bars, since condition counts are small next to ages
    category_orders={'risk_tier': ['Low', 'Medium', 'High']},
    title='Avg Age and Chronic Conditions by Risk Tier'
)
fig_age_chronic.update_layout(xaxis_title='Risk Tier', yaxis_title='Value', legend_title='Metric')
fig_age_chronic.show()
```
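
Because age (in years) and chronic condition counts sit on very different scales, the condition bars can look nearly flat next to age on a shared axis. One alternative, sketched here rather than taken from the project, is to express each tier's average as a ratio to the all-member average so both metrics become comparable; `index_to_overall` is a hypothetical helper name.

```python
import pandas as pd


def index_to_overall(frame: pd.DataFrame, metrics=('age', 'chronic_conditions')) -> pd.DataFrame:
    """Divide each tier's mean by the all-member mean for every metric.

    1.0 means "same as the population average"; 1.5 means 50% above it.
    """
    metrics = list(metrics)
    tier_means = frame.groupby('risk_tier')[metrics].mean()
    overall_means = frame[metrics].mean()
    return (tier_means / overall_means).round(2)


sample = pd.DataFrame({
    'risk_tier': ['Low', 'Low', 'High', 'High'],
    'age': [60, 70, 75, 75],
    'chronic_conditions': [0, 2, 4, 2],
})
print(index_to_overall(sample))
```

The overall mean is taken over members, not over tier means, so larger tiers weigh more in the baseline.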


## Step 7: Comparing Strict vs. Tier-Based SMS Eligibility

This chart compares two ways of identifying members for SMS outreach:

- **Strict Eligibility** follows a detailed rule set: age, chronic conditions, dual-eligibility, past responses, etc.
- **Tier-Based Eligibility** simply targets all members in the **Medium** and **High** risk tiers.

Why it matters:
A broader, tier-based strategy may reach at-risk members who would still benefit from follow-up, even if they do not meet every strict criterion. The difference in member counts below shows how many additional members tier-based outreach would reach; contacting them could improve care visibility, potentially boost **HRA completion rates**, and help close care gaps that affect **STAR ratings and plan revenue**.

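```python
# Strict: members flagged by the eligibility rules in sms_scoring.py
strict_eligible = df[df['eligible_for_sms'].eq(True)]
# Tier-based: every Medium- or High-tier member
tier_based_eligible = df[df['risk_tier'].isin(['High', 'Medium'])]

# Per-tier counts in a fixed order; tiers with no members get 0
strict_counts = strict_eligible['risk_tier'].value_counts().reindex(['Low', 'Medium', 'High'], fill_value=0)
tier_counts = tier_based_eligible['risk_tier'].value_counts().reindex(['Low', 'Medium', 'High'], fill_value=0)
```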

```python
fig_compare = go.Figure(data=[
    go.Bar(name='Strict Eligibility', x=strict_counts.index, y=strict_counts.values, marker_color='indianred'),
    go.Bar(name='Tier-Based Eligibility', x=tier_counts.index, y=tier_counts.values, marker_color='steelblue')
])

fig_compare.update_layout(
    title='Comparison: Strict vs Tier-Based SMS Eligibility by Risk Tier',
    xaxis_title='Risk Tier',
    yaxis_title='Number of Members',
    barmode='group'
)

# Print each bar's count above it
fig_compare.update_traces(textposition='outside', texttemplate='%{y}')

fig_compare.show()
```
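
The side-by-side bars show counts per tier but not how the two member sets overlap. A row-level comparison makes the incremental reach explicit: `tier_based_only` is the number of members the tier-based approach would add. This is a minimal sketch using boolean masks on the same DataFrame, so no member ID column is needed; `compare_eligibility` is a hypothetical helper.

```python
import pandas as pd


def compare_eligibility(frame: pd.DataFrame) -> pd.Series:
    """Count members reached by each approach using row-level boolean masks."""
    strict = frame['eligible_for_sms'].eq(True)
    tier_based = frame['risk_tier'].isin(['Medium', 'High'])
    return pd.Series({
        'both': int((strict & tier_based).sum()),
        'tier_based_only': int((tier_based & ~strict).sum()),
        'strict_only': int((strict & ~tier_based).sum()),
        'neither': int((~strict & ~tier_based).sum()),
    })


sample = pd.DataFrame({
    'risk_tier': ['Low', 'Medium', 'Medium', 'High', 'High'],
    'eligible_for_sms': [False, False, True, True, False],
})
print(compare_eligibility(sample))
```

The four counts always sum to the number of rows, which is a quick sanity check when running `compare_eligibility(df)` on the full dataset.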
               

## Notes:
- Low-risk members show no bar in the eligible-only chart and a count of zero in the comparison chart because they did not meet the SMS eligibility filters, and the tier-based approach excludes the Low tier by design.
- This mirrors real-world scenarios where only certain member groups are targeted based on risk or compliance goals.
- Future enhancements could include tailoring messages by zip code, language, or predicted responsiveness to boost HRA completion and STAR performance.

## Overview:
This project simulates a targeted outreach strategy to help more high-risk members complete their Health Risk Assessments (HRAs). Using real-world data patterns, members were grouped by health needs, age, and past engagement, then ranked to decide who should be contacted first. The visuals help care teams and strategy leads quickly see where to focus their outreach.
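
The ranking step can be sketched as sorting SMS-eligible members by score and numbering them in contact order. This is an illustration only; the actual ranking logic lives in `sms_scoring.py` and is not shown here. Breaking score ties by chronic condition count is an assumption, and `rank_for_outreach` and the `contact_priority` column are hypothetical names.

```python
import pandas as pd


def rank_for_outreach(frame: pd.DataFrame) -> pd.DataFrame:
    """Order SMS-eligible members by descending score and number them 1..n."""
    ranked = (
        frame[frame['eligible_for_sms'].eq(True)]
        # Assumed tie-breaker: members with more chronic conditions go first
        .sort_values(['score', 'chronic_conditions'], ascending=[False, False])
        .reset_index(drop=True)
    )
    ranked['contact_priority'] = range(1, len(ranked) + 1)
    return ranked


sample = pd.DataFrame({
    'member': ['A', 'B', 'C', 'D'],
    'score': [7, 9, 7, 3],
    'chronic_conditions': [2, 1, 4, 5],
    'eligible_for_sms': [True, True, True, False],
})
print(rank_for_outreach(sample)[['member', 'score', 'contact_priority']])
```

Member D is excluded because it is not SMS-eligible, and C outranks A on the tie-breaker despite an equal score.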

Improving HRA completion rates can support higher STAR ratings, the quality rating system CMS uses for Medicare Advantage plans. Higher ratings can lead to increased CMS bonus payments and a stronger plan reputation. Future updates may include sample text messages or tools that automate how messages are sent and tracked.