# Abstract
In ≤300 words:  
This study quantifies how the recent surge in GLP-1 use for weight loss affects public support for Medicare coverage of drug costs, self-reported attitudes, and projected resource utilisation. …existing content shortened for brevity…

# Introduction
• State the research question.  
• Explain the relevance to payers, providers, and policymakers.  
• Summarise the literature gap in one paragraph.  

In [None]:
# Imports & configuration
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use("seaborn-v0_8-darkgrid")
sns.set_context("talk")

## Data Sources
1. KFF public-opinion CSVs (`data-UWjw5.csv`, `data-Gskui.csv`)  
2. CMS Innovation Center milestone archive (proxy for utilisation pressure).  
3. *(optional)* Social-media API (Twitter/X) for sentiment.  

In [None]:
# Load & preview KFF polling data
kff_path = "/Users/marcfridson/Project_Data_All/Data Files/KFF/"
raw_before = pd.read_csv(kff_path+"data-UWjw5.csv")
raw_after  = pd.read_csv(kff_path+"data-Gskui.csv")
display(raw_before.head(3))
display(raw_after.head(3))

In [None]:
# --- Data Wrangling ----------------------------------------------------------
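def tidy_kff(df: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Strip HTML tags from the label column and melt response columns into tidy long format."""
    df = df.copy()
    # Drop punctuation from headers so they match the punctuation-free keys used later
    df.columns = df.columns.str.replace(r'[^\w\s]+', '', regex=True)
    # Clean the first (label) column and forward-fill blank labels
    df.iloc[:,0] = (df.iloc[:,0]
                    .str.replace(r'<.*?>', '', regex=True)
                    .str.strip()
                    .replace({'':np.nan})).ffill()
    long = (df
            .rename(columns={df.columns[0]:"Group"})
            .melt(id_vars="Group", var_name="Response", value_name="Pct")
            .assign(Phase=suffix))
    # Assumption: percentages may arrive as strings such as "45%"; coerce them to numbers
    long["Pct"] = pd.to_numeric(long["Pct"].astype(str).str.rstrip("%"), errors="coerce")
    return long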

before = tidy_kff(raw_before, "Before")
after  = tidy_kff(raw_after,  "After")
kff = pd.concat([before, after], ignore_index=True)  # avoid duplicate index labels
display(kff.head())
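
Because `pivot_table` averages duplicate rows silently, it may be worth validating the tidy frame before computing deltas. The following is a minimal sketch, not part of the original pipeline: `check_tidy` is a hypothetical helper, the toy rows are invented, and the 0-100 range is an assumption about how the percentages are encoded.

```python
import pandas as pd

def check_tidy(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in a tidy polling frame."""
    missing = {"Group", "Response", "Pct", "Phase"} - set(df.columns)
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    pct = pd.to_numeric(df["Pct"], errors="coerce")
    if pct.isna().any():
        problems.append(f"{int(pct.isna().sum())} non-numeric Pct value(s)")
    if ((pct < 0) | (pct > 100)).any():
        problems.append("Pct outside the 0-100 range")
    dupes = int(df.duplicated(subset=["Group", "Response", "Phase"]).sum())
    if dupes:
        problems.append(f"{dupes} duplicated Group/Response/Phase row(s)")
    return problems

# Invented rows with three deliberate defects
toy = pd.DataFrame({
    "Group": ["Total", "Total", "Total"],
    "Response": ["Yes", "No", "No"],
    "Pct": [61, "n/a", 150],
    "Phase": ["Before", "Before", "Before"],
})
issues = check_tidy(toy)
print(issues)
```

In the notebook this would be called as `check_tidy(kff)`; an empty list means none of the three checks fired.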

In [None]:
# Compute change in support after arguments
delta = (kff.pivot_table(index=["Group","Response"],
                         columns="Phase",
                         values="Pct")
               .assign(Delta=lambda d: d["After"]-d["Before"])
               .reset_index())
delta.head()

In [None]:
# --- EDA Visuals -------------------------------------------------------------
palette = {"Yes Medicare should cover the cost":"#4c72b0",
           "No Medicare should not cover the cost":"#c44e52"}
yes_label = "Yes Medicare should cover the cost"
plt.figure(figsize=(10,6))
# Use one colour from the palette; passing palette= without hue is deprecated in seaborn >= 0.13
sns.barplot(data=delta[delta["Response"] == yes_label],
            y="Group", x="Delta", color=palette[yes_label])
plt.title("Shift in Public Support for Medicare Coverage of GLP-1s\n(After hearing pro and con arguments)")
plt.xlabel("Δ Percentage Points")
plt.ylabel("")
plt.tight_layout()
plt.show()

### Interim Findings
Describe which segments show the greatest opinion swing, and link this to potential uptake of weight-loss coverage.  
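
One way to identify the segments with the largest swing is to rank groups by the absolute size of the change. The sketch below uses invented numbers purely for illustration (they are not KFF results), and `AbsDelta` is a hypothetical helper column.

```python
import pandas as pd

# Invented percentages for illustration only; not KFF results
toy = pd.DataFrame({
    "Group": ["Total", "Total", "Ages 65+", "Ages 65+", "Ages 18-29", "Ages 18-29"],
    "Response": ["Yes"] * 6,
    "Phase": ["Before", "After"] * 3,
    "Pct": [60, 50, 58, 40, 65, 62],
})

# Same pivot-and-subtract pattern as the delta cell
wide = (toy.pivot_table(index=["Group", "Response"], columns="Phase", values="Pct")
           .assign(Delta=lambda d: d["After"] - d["Before"])
           .reset_index())

# Rank by magnitude so large drops and large gains both surface
ranked = (wide.assign(AbsDelta=wide["Delta"].abs())
              .sort_values("AbsDelta", ascending=False)
              .reset_index(drop=True))
print(ranked[["Group", "Before", "After", "Delta"]])
```

Applied to the notebook's `delta` frame, the same `sort_values` step would list the groups to discuss first.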

In [None]:
# --- Utilisation Proxy using CMS Milestones file ----------------------------
cms_path = "/Users/marcfridson/Project_Data_All/Data Files/Innovation Center Milestones and Updates/2025-05-14/"
cms = pd.read_csv(cms_path+"Milestones and Updates-Upload-File-RCHD-05-14-2025.csv")
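glp1_related = cms[cms["Model Name (ID's which detail page to populate)"]
                   .str.contains("Value-Based Insurance Design|Enhanced Medication Therapy Management",
                                 na=False)].copy()  # copy avoids SettingWithCopyWarning on the next line
glp1_related["Date"] = pd.to_datetime(glp1_related["Date"])
# "MS" (month start) bins by calendar month; the "M" alias is deprecated in pandas >= 2.2
monthly = glp1_related.groupby(pd.Grouper(key="Date", freq="MS")).size()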
monthly.plot(kind="line", figsize=(10,4), title="CMS Milestone Frequency (GLP-1 relevant programs)")
plt.ylabel("Count")
plt.show()

### Social-Media Sentiment (Skeleton)
*Uncomment to run. `snscrape` itself takes no API key; the official API alternative requires a bearer token.*

```python
# import snscrape.modules.twitter as sntwitter
# import nltk, re, textblob
# tweets = [t.content for t in sntwitter.TwitterSearchScraper("Ozempic Wegovy since:2023-01-01").get_items()]
# # basic polarity scoring …
```
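
The polarity-scoring step left open in the skeleton could look roughly like the following. This is a toy stand-in rather than TextBlob or NLTK: the word lists are invented for illustration, `polarity` is a hypothetical helper, and the sample texts are made up.

```python
import re

# Tiny invented lexicon for illustration; a real run would use a maintained sentiment library
POSITIVE = {"love", "great", "effective", "helped", "amazing"}
NEGATIVE = {"nausea", "expensive", "shortage", "awful", "denied"}

def polarity(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits, 0.0 if no hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

# Made-up example posts
samples = [
    "Wegovy helped me, great results",
    "Ozempic is so expensive and the nausea is awful",
    "Picked up my prescription today",
]
scores = [polarity(s) for s in samples]
print(scores)
```

A lexicon count like this ignores negation and sarcasm, so it is only suitable for checking that the plumbing works before swapping in a proper model.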

In [None]:
# --- Simple outcome simulation example --------------------------------------
# Assumption: each percentage-point rise in support converts 1:1 into a
# percentage-point rise in utilisation; simulate the annual resource impact.
support_growth = delta.loc[delta.Response.str.startswith("Yes"),"Delta"].mean()
baseline_cost = 10600  # hypothetical annual GLP-1 drug cost
patients = 1_000_000   # hypothetical Medicare eligibles
added_cost = support_growth/100 * patients * baseline_cost
print(f"Projected added Medicare spend: ${added_cost/1e9:,.2f} B")
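
Since the projection rests entirely on how much of an opinion shift turns into actual prescriptions, a small sensitivity sweep may be more informative than a single number. In this sketch `added_spend` and its `conversion` parameter are hypothetical additions, and every input value is illustrative. Note that a negative delta (support falling after the arguments) produces a negative figure.

```python
def added_spend(delta_pp: float, patients: int, annual_cost: float,
                conversion: float = 1.0) -> float:
    """Added annual spend = (delta_pp / 100) * conversion * patients * annual_cost."""
    return delta_pp / 100 * conversion * patients * annual_cost

# All inputs hypothetical: a 5-point shift, 1M eligibles, $10,600 per patient-year
results = {}
for conversion in (0.1, 0.5, 1.0):
    results[conversion] = added_spend(delta_pp=5.0, patients=1_000_000,
                                      annual_cost=10_600, conversion=conversion)
    print(f"conversion={conversion:.1f}: ${results[conversion] / 1e9:.3f} B")
```

Setting `conversion=1.0` reproduces the one-to-one assumption used in the cell above.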

# Conclusions
Summarise insights, acknowledge limitations (small public-opinion sample, lack of claims data), and suggest policy actions.  

## References
KFF (2024). Public opinion polling dataset on GLP-1 coverage…  
CMS Innovation Center Milestone Archive (accessed 2025-07-10)…  

## Data-Collection Checklist (raw-data sources only)

| # | Source | What to Grab | Where to Store | How |
|---|--------|--------------|----------------|-----|
| 1 | **KFF Health-Tracking Poll** | `data-UWjw5.csv`, `data-Gskui.csv` | `/Data Files/KFF/` | Manual download from KFF → save CSVs. |
| 2 | **CMS Innovation Milestones** | `Milestones and Updates-Upload-File-RCHD-05-14-2025.csv` | `/Data Files/Innovation Center Milestones and Updates/2025-05-14/` | Click “Export CSV” on CMS page. |
| 3 | **FDA FAERS** | Q1-2025 ASCII zip + schema | `/GLP1_Data/FDA/` | Download from FAERS dashboard → unzip. |
| 4 | **Reddit User Experiences** | JSON from ≥10 threads | `/GLP1_Data/Reddit/` | `snscrape reddit-submission … > reddit.json` |
| 5 | **Twitter/X API** | Tweets JSON (keyword: “Ozempic OR Wegovy”) | `/GLP1_Data/Twitter/` | Use snscrape or official API with bearer token. |
| 6 | **Optional Medicare Claims (LDS Part D)** | Annual Part D claims CSV/Parquet | `/Data Files/Claims/` | Request via CMS VRDC → decrypt & copy locally. |

### General procedure  
1. Create the listed sub-folders if they do not exist.  
2. Follow “What to Grab” for each row; keep original filenames when possible.  
3. Version-control small raw files; large claims/FAERS archives can be kept out of Git.  
4. After collection, run the notebook; code cells assume the structure above.
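
Steps 1 and 4 can be scripted. The sketch below assumes all checklist folders sit under a single project root, which the checklist does not state explicitly; `ensure_dirs` and `missing_files` are hypothetical helpers, and the dry run uses a throwaway temporary directory rather than real data.

```python
import tempfile
from pathlib import Path

# Sub-folders from the checklist, relative to an assumed common project root
SUBFOLDERS = [
    "Data Files/KFF",
    "Data Files/Innovation Center Milestones and Updates/2025-05-14",
    "GLP1_Data/FDA",
    "GLP1_Data/Reddit",
    "GLP1_Data/Twitter",
    "Data Files/Claims",
]

# Files the code cells read directly
REQUIRED_FILES = [
    "Data Files/KFF/data-UWjw5.csv",
    "Data Files/KFF/data-Gskui.csv",
    "Data Files/Innovation Center Milestones and Updates/2025-05-14/"
    "Milestones and Updates-Upload-File-RCHD-05-14-2025.csv",
]

def ensure_dirs(root: Path) -> list[Path]:
    """Create every checklist sub-folder under root; existing folders are left alone."""
    paths = [root / sub for sub in SUBFOLDERS]
    for p in paths:
        p.mkdir(parents=True, exist_ok=True)
    return paths

def missing_files(root: Path) -> list[str]:
    """List required raw files that are not yet present under root."""
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]

# Dry run in a throwaway directory so nothing touches the real project tree
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_dirs(Path(tmp))
    all_exist = all(p.is_dir() for p in created)
    missing = missing_files(Path(tmp))
print(all_exist, len(created), len(missing))
```

For a real run, pass the actual project root to both functions and proceed to the notebook only once `missing_files` returns an empty list.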