# Medi-Cal Enrollment AI Analytics

In this analysis of Medi-Cal enrollment data, we used a suite of Snowflake AI SQL functions to turn raw monthly enrollment figures into clear, actionable insights. We began by aggregating enrollment counts by eligibility group over a six-month window and computing the percentage change for each group. To label each trend as “Increasing,” “Stable,” or “Decreasing,” we used the AI_CLASSIFY function. It assigns each input (here, a prompt describing a group’s monthly counts) to one of a set of user-defined categories. Next, we enriched the output with natural-language summaries generated by AI_COMPLETE (using the `claude-3-7-sonnet` model) from PROMPT templates. These give a plain-English explanation of each group’s trajectory.

For visualization, we turned the classified and summarized results into a horizontal diverging bar chart using Altair. Positive percentage changes are shaded blue, and zero or negative changes are light gray, so stakeholders can see at a glance which eligibility groups are growing or shrinking. Two pie charts then show each group’s share of total enrollment in the latest month and how the groups are distributed across the three trend labels. The pipeline combines AI-driven classification and summarization with traditional SQL aggregation and simple visual encoding to give an end-to-end view of enrollment trends.

```python
# Import python packages
import pandas as pd
import altair as alt

# We can also use Snowpark for our analyses!
from snowflake.snowpark.context import get_active_session

session = get_active_session()
```

## AI_CLASSIFY – Trend Tagging and AI_COMPLETE – Summary Generation

This notebook cell aggregates Medi-Cal enrollment data and then performs two AI-powered tasks on it:

1. **Data Aggregation**  
   - Collects each eligibility group’s `NUMBER_OF_ENROLLEES` values for the six monthly reporting periods ending at the dataset’s most recent `REPORTING_PERIOD`, ordered by date, into a comma-delimited string (`ENROLLEE_COUNTS`).  
   - Records the first and last reporting dates (`RP_FIRST`, `RP_LAST`).

2. **Trend Classification**  
   - Uses **AI_CLASSIFY** with a fixed label set:
     - **Increasing** – enrollee counts are rising  
     - **Stable**     – enrollee counts show little or no change  
     - **Decreasing** – enrollee counts are falling  
   - Applies a **PROMPT** template to the `ENROLLEE_COUNTS` string and returns the most appropriate trend label for each group.

3. **Natural-Language Summary**  
   - Uses **AI_COMPLETE** (model `claude-3-7-sonnet`) with a **PROMPT** to generate a single plain-English summary per group.  
   - The summary includes the first and last values and the calculated percentage change over the period.

The result is a table `ENROLLMENT_TRENDS` containing:
- The original `ELIGIBILITY_GROUP`  
- `TREND_CLASSIFICATION` from **AI_CLASSIFY**  
- `SUMMARY` from **AI_COMPLETE**  
- Reporting Periods (`RP_FIRST`, `RP_LAST`) and the raw `ENROLLEE_COUNTS` for reference  
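The following Python-only sketch makes the aggregation and prompt assembly concrete. It approximates what the SQL cell computes for each group, running on a few made-up rows instead of the Medi-Cal table. It is an illustration, not Snowflake's implementation. Several details are assumptions:

- The names `add_months`, `aggregate`, and `TREND_TEMPLATE` are hypothetical.
- `str.format` with `{0}`-style placeholders stands in for the PROMPT function's substitution.
- The month arithmetic assumes DATEADD clamps the day to the end of a shorter month.

```python
import calendar
from collections import defaultdict
from datetime import date

# Template in the same {0}..{3} placeholder style as the SQL PROMPT call.
TREND_TEMPLATE = (
    "The enrollee counts for {0} over the past six months between {1} and {2} are: {3}. "
    "Label the trend as Increasing, Stable, or Decreasing."
)


def add_months(d: date, n: int) -> date:
    """Shift d by n calendar months, clamping the day to the target month's length."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def aggregate(rows, months=6):
    """Collapse (group, period, count) rows into one record per eligibility group."""
    latest = max(period for _, period, _ in rows)
    start = add_months(latest, -(months - 1))  # e.g. Dec -> Jul for six months
    by_group = defaultdict(list)
    for group, period, count in rows:
        if start <= period <= latest:
            by_group[group].append((period, count))

    records = {}
    for group, points in by_group.items():
        points.sort()  # order by reporting period, like LISTAGG ... WITHIN GROUP
        counts = [count for _, count in points]
        enrollee_counts = ",".join(str(c) for c in counts)
        rp_first, rp_last = points[0][0], points[-1][0]
        first, last = counts[0], counts[-1]
        records[group] = {
            "ENROLLEE_COUNTS": enrollee_counts,
            "RP_FIRST": rp_first,
            "RP_LAST": rp_last,
            # None when the first count is zero (percentage change undefined)
            "PCT_CHANGE": round((last - first) / first * 100, 2) if first else None,
            "TREND_PROMPT": TREND_TEMPLATE.format(group, rp_first, rp_last, enrollee_counts),
        }
    return records


# Made-up sample rows, not Medi-Cal data.
rows = [
    ("Group A", date(2024, 6, 1), 90),   # falls outside the six-month window
    ("Group A", date(2024, 7, 1), 100),
    ("Group A", date(2024, 12, 1), 110),
    ("Group B", date(2024, 8, 1), 200),
    ("Group B", date(2024, 12, 1), 190),
]
result = aggregate(rows)
print(result["Group A"]["ENROLLEE_COUNTS"], result["Group A"]["PCT_CHANGE"])  # 100,110 10.0
print(result["Group B"]["ENROLLEE_COUNTS"], result["Group B"]["PCT_CHANGE"])  # 200,190 -5.0
```

In this sketch, `RP_FIRST` is the earliest month a group actually reports inside the window. Group B's is 2024-08-01, not the window start.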

```sql
CREATE OR REPLACE TABLE ENROLLMENT_TRENDS AS
(
  WITH LATEST AS (
    -- Most recent reporting period in the whole dataset
    SELECT MAX(REPORTING_PERIOD) AS MAX_RP
    FROM MEDI_CAL_ELIGIBILITY.ELIGIBILITY.MEDI_CAL_ENROLLMENT
  ),
  ENROLLMENT_DATA AS (
    SELECT
      E.ELIGIBILITY_GROUP,
      MIN(E.REPORTING_PERIOD)                           AS RP_FIRST,
      MAX(E.REPORTING_PERIOD)                           AS RP_LAST,
      LISTAGG(E.NUMBER_OF_ENROLLEES, ',')
        WITHIN GROUP (ORDER BY E.REPORTING_PERIOD)      AS ENROLLEE_COUNTS,
      -- Enrollee counts at the first and last reporting periods in the window
      MIN_BY(E.NUMBER_OF_ENROLLEES, E.REPORTING_PERIOD) AS FIRST_COUNT,
      MAX_BY(E.NUMBER_OF_ENROLLEES, E.REPORTING_PERIOD) AS LAST_COUNT
    FROM MEDI_CAL_ELIGIBILITY.ELIGIBILITY.MEDI_CAL_ENROLLMENT E
    CROSS JOIN LATEST L
    -- Six monthly periods: the latest month plus the five before it
    WHERE E.REPORTING_PERIOD BETWEEN DATEADD(month, -5, L.MAX_RP) AND L.MAX_RP
    GROUP BY E.ELIGIBILITY_GROUP
  )

  SELECT
    ELIGIBILITY_GROUP,

    -- Generate a plain-English summary from the first/last counts and the
    -- percent change computed in SQL (NULL-safe for a zero first count)
    AI_COMPLETE(
      'claude-3-7-sonnet',
      PROMPT(
        'Summarize the trend for {0} over the past six months in a single brief statement: '
        || 'first count = {1}, last count = {2}, percent change = {3}.',
        ELIGIBILITY_GROUP,
        TO_VARCHAR(FIRST_COUNT),
        TO_VARCHAR(LAST_COUNT),
        COALESCE(
          TO_VARCHAR(ROUND((LAST_COUNT - FIRST_COUNT) / NULLIF(FIRST_COUNT, 0) * 100, 2)) || '%',
          'undefined (first count is zero)'
        )
      )
    )                                             AS SUMMARY,

    -- Classify the trend from the ordered monthly counts
    AI_CLASSIFY(
      PROMPT(
        'The enrollee counts for {0} over the past six months between {1} and {2} are: {3}. '
        || 'Label the trend as Increasing, Stable, or Decreasing.',
        ELIGIBILITY_GROUP,
        TO_VARCHAR(RP_FIRST),
        TO_VARCHAR(RP_LAST),
        ENROLLEE_COUNTS
      ),
      [
        { 'label': 'Increasing', 'description': 'The enrollee counts are increasing.' },
        { 'label': 'Stable',     'description': 'The enrollee counts are stable.' },
        { 'label': 'Decreasing', 'description': 'The enrollee counts are decreasing.' }
      ],
      {
        'task_description': 'Determine whether the given enrollee counts string indicates an increasing, stable, or decreasing trend.',
        'output_mode':      'single'
      }
    )                                             AS TREND_CLASSIFICATION,

    RP_FIRST,
    RP_LAST,
    ENROLLEE_COUNTS

  FROM ENROLLMENT_DATA
);
```


```sql
-- Show trends
SELECT ELIGIBILITY_GROUP, SUMMARY, TREND_CLASSIFICATION:labels[0]::STRING AS TREND FROM ENROLLMENT_TRENDS;
```
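AI_CLASSIFY labels come from a model, so it can help to cross-check them against a simple deterministic rule. The sketch below does this locally. It assumes the classification result has been serialized as JSON of the form `{"labels": [...]}`, which is the shape `TREND_CLASSIFICATION:labels[0]` reads. The ±2% "Stable" band, the helper names, and the sample labels are all hypothetical and are not output from the actual model.

```python
import json

STABLE_BAND_PCT = 2.0  # hypothetical: |change| within this band counts as "Stable"


def rule_based_trend(enrollee_counts: str, band: float = STABLE_BAND_PCT) -> str:
    """Label a comma-delimited count string by its first-to-last percentage change."""
    counts = [int(v) for v in enrollee_counts.split(",")]
    first, last = counts[0], counts[-1]
    if first == 0:
        raise ValueError("percentage change is undefined for a zero first count")
    pct = (last - first) / first * 100
    if pct > band:
        return "Increasing"
    if pct < -band:
        return "Decreasing"
    return "Stable"


def ai_label(classification_json: str) -> str:
    """Python counterpart of TREND_CLASSIFICATION:labels[0]::STRING."""
    return json.loads(classification_json)["labels"][0]


def disagreements(rows):
    """Return (group, ai_label, rule_label) for groups where the two labels differ."""
    out = []
    for group, counts, classification in rows:
        ai, rule = ai_label(classification), rule_based_trend(counts)
        if ai != rule:
            out.append((group, ai, rule))
    return out


# Made-up counts and labels, not real AI_CLASSIFY output.
sample = [
    ("Group A", "100,101,103,105,108,110", '{"labels": ["Increasing"]}'),
    ("Group B", "200,199,201,200,199,201", '{"labels": ["Stable"]}'),
    ("Group C", "300,298,295,294,290,288", '{"labels": ["Stable"]}'),
]
print(disagreements(sample))  # [('Group C', 'Stable', 'Decreasing')]
```

The rule looks only at the first and last values, as the bar chart does. A disagreement therefore flags a group worth inspecting, not necessarily a wrong AI label.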

## Enrollment Trend Visualization

This cell turns the precomputed `ENROLLMENT_TRENDS` table into a horizontal diverging bar chart of the percentage change in Medi-Cal enrollment for each eligibility group over the six-month window.

1. **Data Retrieval**  
   - Reads `ELIGIBILITY_GROUP`, `ENROLLEE_COUNTS`, `RP_FIRST`, and `RP_LAST` from the `ENROLLMENT_TRENDS` table in Snowflake.

2. **Parsing & Cleanup**  
   - Splits the comma-delimited `ENROLLEE_COUNTS` into the first and last monthly values.  
   - Converts these values to integers and drops any malformed rows.

3. **Calculations**  
   - Computes the percentage change between the first and last counts.  
   - Formats the change as a string (e.g. `+12.34%`) for easy reading.

4. **Date Range Extraction**  
   - Parses `RP_FIRST` and `RP_LAST` into Python `date` objects.  
   - Determines the overall reporting window for use in the x-axis title.

5. **Sorting**  
   - Orders the DataFrame by the numeric percentage change to prepare for the diverging layout.

6. **Visualization**  
   - Uses Altair to render a horizontal bar chart where:
     - Bars to the right (positive change) are colored blue.  
     - Bars to the left (negative change) are colored light gray; a group with zero change is also gray.  
   - The x-axis title dynamically displays the date range.  
   - The y-axis lists eligibility groups.

7. **Outcome**  
   - Shows at a glance which eligibility groups had the largest relative increases or decreases in enrollment over the window.
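The parsing and percentage-change steps can be sketched in plain Python to show the edge cases they must survive: groups with different numbers of months, non-numeric values, and a zero first count, where percentage change is undefined. This is a minimal illustration with made-up rows and hypothetical helper names (`parse_first_last`, `prepare_bars`). The notebook itself does this work with pandas.

```python
def parse_first_last(enrollee_counts):
    """Return (first, last) ints from a comma-delimited string, or None if malformed."""
    if not enrollee_counts:
        return None
    parts = [p.strip() for p in enrollee_counts.split(",")]
    try:
        return int(parts[0]), int(parts[-1])
    except ValueError:
        return None


def prepare_bars(rows):
    """Compute numeric/formatted percentage change, bar color, and diverging sort order."""
    bars = []
    for group, counts in rows:
        parsed = parse_first_last(counts)
        if parsed is None or parsed[0] == 0:
            continue  # malformed row or zero baseline: percentage change undefined
        first, last = parsed
        pct = (last - first) / first * 100
        bars.append({
            "group": group,
            "pct_change_num": pct,
            "pct_change": f"{pct:+.2f}%",
            # Same rule as the chart: blue only for strictly positive change
            "color": "steelblue" if pct > 0 else "lightgray",
        })
    bars.sort(key=lambda b: b["pct_change_num"])  # most negative first
    return bars


rows = [
    ("Group A", "100,105,110"),
    ("Group B", "200,190"),  # fewer months than Group A: still parsed
    ("Group C", "n/a"),      # malformed: dropped
    ("Group D", "0,5"),      # zero baseline: dropped
]
for bar in prepare_bars(rows):
    print(bar["group"], bar["pct_change"], bar["color"])
# Group B -5.00% lightgray
# Group A +10.00% steelblue
```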


```python
# 1. Fetch data including RP_FIRST and RP_LAST
df = session.sql("""
  SELECT ELIGIBILITY_GROUP, ENROLLEE_COUNTS, RP_FIRST, RP_LAST
  FROM ENROLLMENT_TRENDS
""").to_pandas()

# 2. Vectorized parse of ENROLLEE_COUNTS into first and last counts.
#    .str[0] / .str[-1] work even when groups have different numbers of months;
#    non-numeric values become NaN, and those rows are dropped.
parts = df['ENROLLEE_COUNTS'].str.split(',')
df['first_count'] = pd.to_numeric(parts.str[0], errors='coerce')
df['last_count']  = pd.to_numeric(parts.str[-1], errors='coerce')
df = df.dropna(subset=['first_count', 'last_count'])
# Percentage change is undefined for a zero baseline, so drop those rows too
df = df.loc[df['first_count'] != 0].astype({'first_count': int, 'last_count': int})

# 3. Compute percentage change (numeric and formatted)
df['pct_change_num'] = (df['last_count'] - df['first_count']) / df['first_count'] * 100
df['pct_change']     = df['pct_change_num'].map("{:+.2f}%".format)

# 4. Parse dates and determine the overall range
df['RP_FIRST'] = pd.to_datetime(df['RP_FIRST']).dt.date
df['RP_LAST']  = pd.to_datetime(df['RP_LAST']).dt.date
start_date = df['RP_FIRST'].min().isoformat()
end_date   = df['RP_LAST'].max().isoformat()

# 5. Sort for diverging order (most negative change first)
df = df.sort_values('pct_change_num')

# 6. Print the DataFrame for verification; only a cell's last expression
#    is rendered automatically
print(df)

# 7. Create horizontal diverging bar chart with dynamic x-axis title,
#    passing only the columns the chart uses
x_title = f'Percentage Change ({start_date} to {end_date})'

chart = alt.Chart(df[['ELIGIBILITY_GROUP', 'pct_change_num', 'pct_change']]).mark_bar().encode(
    x=alt.X('pct_change_num:Q', title=x_title),
    y=alt.Y('ELIGIBILITY_GROUP:N', sort=list(df['ELIGIBILITY_GROUP']), title='Eligibility Group'),
    color=alt.condition(
        alt.datum.pct_change_num > 0,
        alt.value('steelblue'),  # growth
        alt.value('lightgray')   # no change or decline
    ),
    tooltip=[
        alt.Tooltip('ELIGIBILITY_GROUP:N', title='Group'),
        alt.Tooltip('pct_change:N',        title='Change')
    ]
).properties(
    width=600,
    height=300,
    title='Enrollment Percentage Change'
)

chart
```


## Enrollment Trend & Share Analysis

This notebook cell transforms the precomputed `ENROLLMENT_TRENDS` table into two interactive pie charts:

1. **Share of Total Enrollees by Group (Last Reporting Period)**  
   - **Data Loading:** Retrieves `ELIGIBILITY_GROUP`, the comma-delimited `ENROLLEE_COUNTS`, and `RP_LAST` (last reporting date) from Snowflake into a pandas DataFrame.  
   - **Vectorized Parsing:** Splits each `ENROLLEE_COUNTS` string on commas and casts to integers, extracting the **first** and **last** values as `first_count` and `last_count`.  
   - **Share Calculation:** Computes each group’s `share_last = last_count / sum(last_count)` in pandas.  
   - **Dynamic Title:** Derives the month-year label (e.g. “Dec 2024”) from `RP_LAST.max()` for the chart title.  
   - **Pie Chart:** Uses `mark_arc()` to display each group’s share of total enrollees in the final month, with tooltips showing raw `last_count` and percentage share.

2. **Distribution of Eligibility Groups by Precomputed Trend Classification**  
   - **Data Loading:** Pulls `TREND_CLASSIFICATION:labels[0]` directly from Snowflake as `trend`, with no Python-side recalculation.  
   - **Aggregation:** Builds `df_trend` by counting how many groups fall into each `trend` category and computing `share_trend = count / total_groups`.  
   - **Pie Chart:** Uses `mark_arc()` to show the relative proportion of groups in each trend category, with tooltips for the trend label, raw group count, and percentage share.

### Key Efficiency Improvements
- **Precomputed trends** are fetched from the data warehouse, so no percent-change or thresholding logic is duplicated in Python.  
- **Vectorized parsing** of the comma-delimited counts uses pandas string methods instead of per-row loops or `apply`.  
- **Independent color scales & legends** ensure each pie only shows the categories it actually uses.  
- **Dynamic chart titles** automatically reflect the actual reporting period.  
- **Side-by-side rendering** for an at-a-glance comparison of enrollment distribution vs. trend breakdown.
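The share and trend-distribution calculations, plus the month-year title label, reduce to a few lines of arithmetic. Below is a stdlib-only sketch on made-up values with hypothetical function names. It also shows why the label should be formatted from the latest date rather than taken as the maximum of already-formatted strings. The `%b` month abbreviations assume the default C/English locale.

```python
from collections import Counter
from datetime import date


def last_period_shares(last_counts):
    """Each group's fraction of total enrollees in the last reporting period."""
    total = sum(last_counts.values())
    return {group: count / total for group, count in last_counts.items()}


def trend_distribution(trends):
    """Map each trend label to (number of groups, fraction of groups)."""
    counts = Counter(trends.values())
    n_groups = sum(counts.values())
    return {label: (c, c / n_groups) for label, c in counts.items()}


def period_label(rp_last_dates):
    """Format the latest date; comparing formatted strings would sort alphabetically."""
    return max(rp_last_dates).strftime("%b %Y")


# Made-up values for illustration.
print(last_period_shares({"Group A": 300, "Group B": 100}))
# {'Group A': 0.75, 'Group B': 0.25}
print(trend_distribution({"Group A": "Increasing", "Group B": "Decreasing",
                          "Group C": "Increasing", "Group D": "Stable"}))
# {'Increasing': (2, 0.5), 'Decreasing': (1, 0.25), 'Stable': (1, 0.25)}

dates = [date(2024, 9, 1), date(2024, 12, 1)]
print(period_label(dates))                       # Dec 2024
print(max(d.strftime("%b %Y") for d in dates))   # Sep 2024 (alphabetical, not latest)
```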


```python
# 1. Pull data (with precomputed trend) and lowercase the column names
df = (
    session
      .sql("""
        SELECT
          ELIGIBILITY_GROUP,
          ENROLLEE_COUNTS,
          RP_LAST,
          TREND_CLASSIFICATION:labels[0]::STRING AS trend
        FROM ENROLLMENT_TRENDS
      """)
      .to_pandas()
)
df.columns = df.columns.str.lower()

# 2. Split counts → first_count, last_count.
#    .str[0] / .str[-1] handle groups with different numbers of months;
#    unparseable values become NaN and are dropped before casting to int.
parts = df['enrollee_counts'].str.split(',')
df['first_count'] = pd.to_numeric(parts.str[0], errors='coerce')
df['last_count']  = pd.to_numeric(parts.str[-1], errors='coerce')
df = (
    df.dropna(subset=['first_count', 'last_count'])
      .astype({'first_count': int, 'last_count': int})
)

# 3a) Compute share of total for the last period
df['share_last'] = df['last_count'] / df['last_count'].sum()

# 3b) Build trend-summary table
df_trend = (
    df['trend']
      .value_counts()
      .rename_axis('trend')
      .reset_index(name='count')
)
df_trend['share_trend'] = df_trend['count'] / df_trend['count'].sum()

# 4. Pie 1: Enrollment by Eligibility Group.
#    Take the latest date first, then format it; the max of formatted strings
#    would be alphabetical (e.g. "Sep 2024" > "Dec 2024").
period_label = pd.to_datetime(df['rp_last']).max().strftime('%b %Y')
pie1 = alt.Chart(df).mark_arc().encode(
    theta='share_last:Q',
    color=alt.Color(
        'eligibility_group:N',
        title='Eligibility Group',
        legend=alt.Legend(orient='left')
    ),
    tooltip=[
        alt.Tooltip('eligibility_group:N', title='Group'),
        alt.Tooltip('last_count:Q',        title='Count',       format=','),
        alt.Tooltip('share_last:Q',        title='Pct of Total', format='.1%')
    ]
).properties(
    title=f'Enrollment by Eligibility Group ({period_label})',
    width=350, height=350
)

# 5. Pie 2: Distribution of precomputed trends
pie2 = alt.Chart(df_trend).mark_arc().encode(
    theta='share_trend:Q',
    color=alt.Color(
        'trend:N',
        title='Trend',
        legend=alt.Legend(orient='right')
    ),
    tooltip=[
        alt.Tooltip('trend:N',       title='Trend'),
        alt.Tooltip('count:Q',       title='# of Groups'),
        alt.Tooltip('share_trend:Q', title='Pct of Groups', format='.1%')
    ]
).properties(
    title='Count of Eligibility Groups by Trend',
    width=300, height=300
)

# 6. Place the pies side by side with independent color scales and legends
chart = (
    pie1 | pie2
).resolve_scale(
    color='independent'
).resolve_legend(
    color='independent'
)

chart
```

