## 1. Background

In 2020, more than 3.26 million Medicare beneficiaries depended on one or more forms of insulin to manage their medical conditions. Their aggregate out-of-pocket cost for insulin alone was \$1.03 billion, and a Medicare beneficiary spent \$572 out of pocket on insulin annually. Insulin is a life-saving medication and a critical component of medical management for many people. The cost burden widens the gap in insulin access, which raises the risk of complications from unmanaged diabetes, such as amputation or heart attack. The Centers for Medicare & Medicaid Services (CMS) recognized the severe consequences of insulin inaccessibility and implemented the Part D Senior Savings Model (SSM) on January 1, 2021. The SSM offers supplementary insulin coverage that caps the out-of-pocket expense for insulin at \$35 per one-month supply during the deductible, initial coverage, and coverage gap phases. The SSM aims to close the insulin access gap and improve medication adherence.

## 2. Objective

The objective of this project is to utilize multiple machine learning algorithms to predict the change in average 30-day insulin consumption among Medicare beneficiaries in response to the SSM.

## 3. Data

The final dataset I use for analysis consists of 10 years of microdata, from 2010 to 2019, on insulin consumption, expenditure, and comorbid conditions. It contains 9,004 observations, each of which is a sampled individual. Note that this is a cross-sectional dataset: observations in different survey years are not the same individuals. The sample is limited to individuals who filled at least one insulin prescription during a given survey year.

The final dataset combines three separate datasets: demand and expenditure, prescription details, and comorbid conditions. All of them come from the Medical Expenditure Panel Survey (MEPS). The following sections display examples of each dataset from 2010. The datasets have the same format across all included survey years; because of their size and limited computational capacity, I show only the 2010 examples to illustrate the structure of the data.
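
One way the three sources might be joined for a single survey year, and the years then stacked, is sketched below. This is only an illustration under assumptions: the key `PERSON_ID`, the columns `OOP_PAID`, `QUANTITY`, `HYPERTENSION`, and `YEAR`, the helper `build_year`, and all toy values are hypothetical and are not taken from the MEPS files or from the actual processing steps.

```python
import pandas as pd

def build_year(demand, rx, conditions, year):
    """Join three per-person tables for one survey year (hypothetical schema)."""
    merged = (
        demand
        .merge(rx, on="PERSON_ID", how="inner")         # keep only people with insulin fills
        .merge(conditions, on="PERSON_ID", how="left")  # attach conditions where available
    )
    merged["YEAR"] = year
    return merged

# Toy inputs; every column name and value here is made up for illustration.
demand_2010 = pd.DataFrame({"PERSON_ID": [1, 2, 3], "OOP_PAID": [62.5, 35.0, 0.0]})
rx_2010 = pd.DataFrame({"PERSON_ID": [1, 2], "QUANTITY": [50.0, 10.0]})
cond_2010 = pd.DataFrame({"PERSON_ID": [1, 2, 3], "HYPERTENSION": [1, 0, 1]})

demand_2011 = pd.DataFrame({"PERSON_ID": [7], "OOP_PAID": [20.0]})
rx_2011 = pd.DataFrame({"PERSON_ID": [7], "QUANTITY": [30.0]})
cond_2011 = pd.DataFrame({"PERSON_ID": [7], "HYPERTENSION": [0]})

# Stack the years; because the data are cross-sectional, rows from
# different years are treated as different individuals.
pooled = pd.concat(
    [build_year(demand_2010, rx_2010, cond_2010, 2010),
     build_year(demand_2011, rx_2011, cond_2011, 2011)],
    ignore_index=True,
)
print(pooled)
```

In this sketch the inner join drops person 3, who has no insulin prescription record, which mirrors the restriction to individuals with at least one insulin fill in a survey year.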

### 3.1 Demand and Expenditure

This dataset contains data on insulin demand, out-of-pocket expenditure, and insurance coverage. The data are hierarchical: rows represent different unit types, and the data for one individual span many rows. Consider the example below, in which all displayed rows contain data about one sampled individual in survey year 2010; let us call him A:

* Row 0 has *RECTYPE* P, meaning that it contains A's overall demographic and socioeconomic information, including his survey ID, age, age at first diabetes diagnosis, household income, race, insurance coverage (Medicare, Medicaid, private insurance, etc.), and other individual-level information.
* Row 1 has *RECTYPE* M, meaning that it contains information specific to A's household. This record type mainly serves survey tracking and is not used in this analysis.
* Rows 2-4 have *RECTYPE* F, meaning that each row describes one filled prescription, including which medicine was filled, how much A paid, how much each of his insurers paid, how many days the fill supplied, etc.

The main purpose of this analysis is to predict demand per individual per 30 days. I therefore aggregated all type F records for each individual to create aggregate insulin demand data per individual per survey year.
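
The aggregation of type F rows can be sketched on a toy version of the hierarchical layout. Only `RECTYPE` and its P/M/F codes come from the description above; the column names `PERSON_ID`, `DAYS_SUPPLIED`, and `OOP_PAID`, the helper `aggregate_fills`, and the values are hypothetical placeholders, not the actual variables in the file.

```python
import pandas as pd

# Toy hierarchical records for one survey year. Column names other than
# RECTYPE are hypothetical placeholders.
records = pd.DataFrame({
    "PERSON_ID":     [1, 1, 1, 1, 1, 2, 2, 2],
    "RECTYPE":       ["P", "M", "F", "F", "F", "P", "M", "F"],
    "DAYS_SUPPLIED": [None, None, 30, 30, 90, None, None, 30],
    "OOP_PAID":      [None, None, 10.0, 12.5, 40.0, None, None, 35.0],
})

def aggregate_fills(df):
    """Collapse prescription-level (F) rows to one row per person."""
    fills = df[df["RECTYPE"] == "F"]
    return (
        fills.groupby("PERSON_ID")
        .agg(
            n_fills=("RECTYPE", "count"),         # number of prescriptions filled
            total_days=("DAYS_SUPPLIED", "sum"),  # total days of supply
            total_oop=("OOP_PAID", "sum"),        # total out-of-pocket payment
        )
        .reset_index()
    )

per_person = aggregate_fills(records)
print(per_person)
```

The P and M rows are dropped before grouping, so person-level and household-level records do not contribute to the totals.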

In [None]:
import numpy as np
import pandas as pd

# Example: load the 2010 demand and expenditure file
diabetes10 = pd.read_excel('/Users/parimagphanthong/Library/CloudStorage/OneDrive-WashingtonStateUniversity(email.wsu.edu)/EconS 701 Capstone/diabetes2010.xlsx')

pd.set_option('display.max_columns', None) # set to display all columns

diabetes10.head()

### 3.2 Prescription Details

This dataset contains the breakdown of each filled insulin prescription, including strength and quantity. Each row is one filled prescription, so an individual's insulin prescription data for a survey year span many rows. As before, I aggregated prescription quantity for each individual to create aggregate insulin quantity data per individual per survey year.

In [None]:
# Example: load the 2010 prescription details file
insulin10 = pd.read_excel('/kaggle/input/insulin/insulin2010.xlsx')

insulin10.head()

### 3.3 Comorbid Conditions

This dataset contains comorbid conditions and other health-related conditions. Each row is an individual.