## Matching Datasets

What I've learned so far:

- The values in the Excel files match the original (un-updated) PDFs, not the updated ones.

- Example:
    - Excel data, item 1.9, Federal Transitional Reinsurance Program payments expected from HHS (as indicated by HHS as of 6/30), for Aetna Health of FL: 10,355,176.30.
    - Reported in the original PDF: 18628 Aetna Health Inc. (a FL corp.) FL 10,355,176.30 (28,025,200.75) 5,437,975.83
    - Reported in the updated PDF: 18628 Aetna Health Inc. (a FL corp.) FL 10,360,565.46 (28,025,200.75) 5,437,975.83
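
To put a number on the discrepancy in the example above, the first amount in the Aetna Health Inc. (FL) row can be compared across the two PDFs. This is a minimal sketch that uses only the two values quoted above; the variable names are my own.

```python
from decimal import Decimal

# first amount in the Aetna Health Inc. (a FL corp.) row, as quoted above
original_pdf = Decimal("10355176.30")   # original PDF (matches the Excel value)
updated_pdf = Decimal("10360565.46")    # updated PDF

# Decimal keeps the cents exact, unlike float subtraction
difference = updated_pdf - original_pdf
print(difference)
```

The other two amounts in that row are identical in both PDFs, so only this one value changed in the update.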

My goal with this Python notebook is to create a final spreadsheet for each year. I expect each spreadsheet to have columns along these lines:

HIOS ID ... MR_SUBMISSION_TEMPLATE_ID ... COMPANY NAME ... REPORTED VALUES ... ACTUAL VALUES

with one spreadsheet for 2014 and one for 2015. That way, we can analyze how the values change over time.

In [90]:
import pandas as pd

# for debugging, display everything: 
pd.options.display.max_seq_items = 2000
pd.options.display.max_rows = 4000

In [109]:
from re import sub
from decimal import Decimal

# strip the pdf data to make it parseable

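# placeholder strings in the PDF data that mean "no numeric value"
NON_NUMERIC = {'$-', 'N/E', 'N/A_MA_Issuer', 'N/A_DefaultCharge', 'N/A_Default_Charge'}

def strip_money_values( column ):
    # note: operates on the global pdf_data dataframe
    def to_value(money):
        if money in NON_NUMERIC:
            return "N/A"
        # drop everything except digits and the decimal point
        value = Decimal(sub(r'[^\d.]', '', money))
        # accounting notation: parentheses mean a negative amount
        return -value if '(' in money else value

    # assign the whole column at once; chained indexing such as
    # pdf_data[column][i] = ... may write to a temporary copy instead
    pdf_data[column] = pdf_data[column].map(to_value)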

print("strip_money_values defined")

def perform_stripping( path ): 
    columns = ["HHS RISK ADJUSTMENT TRANSFER AMOUNT (INDIVIDUAL MARKET, INCLUDING CATASTROPHIC)",
            "REINSURANCE PAYMENT AMOUNT (OR NOT ELIGIBLE)",
            "HHS RISK ADJUSTMENT TRANSFERS AMOUNT (SMALL GROUP MARKET)"]

    for column in columns:
        strip_money_values(column)
    print(pdf_data)
    pdf_data.to_csv(path_or_buf=path)
    

print("perform_stripping defined")

strip_money_values defined
perform_stripping defined
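
One side effect of writing the cleaned data with `to_csv` is worth spelling out: by default the index is written as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again, and the `"N/A"` strings are read back as NaN. The sketch below shows both effects on a made-up two-row frame (the data and the in-memory buffer are illustrative assumptions, not the real input files).

```python
import io

import pandas as pd

# made-up stand-in for the cleaned PDF data
toy = pd.DataFrame({"HIOS ID": [1, 2], "REINSURANCE": ["10355176.30", "N/A"]})

# default to_csv: the index becomes an unnamed first column
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)

# index=False leaves the index out of the file
buf = io.StringIO()
toy.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf)

print(list(with_index.columns))
print(list(without_index.columns))
# "N/A" is one of read_csv's default missing-value markers, so the column is numeric
print(without_index["REINSURANCE"].dtype)
```

Passing `index=False` when writing would avoid having to delete the `Unnamed` columns after every round trip.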


In [112]:
# how to run the functions above to strip all '$', ',', '(' and ')' characters from the data
# (perform_stripping works on the global pdf_data, so it must be loaded first)
#pdf_data = pd.read_csv("input/2014-Benefit-Year-RI-RA-Updated.csv")
#perform_stripping("input/2014-Benefit-Year-RI-RA-Updated-numeric.csv")

In [113]:
# read in excel data
excel_data_2015 = pd.read_csv("input/2015_Part1_2_Summary_Data_Premium_Claims.csv", dtype=str)

# only keep the rows that we are looking up
a_data_2015 = excel_data_2015.loc[excel_data_2015["ROW_LOOKUP_CODE"]
                            .isin({"FED_REINS_PAYMENTS", "FED_RISK_ADJ_NET_PAYMENTS"})]

# only keep the columns that contain numbers that we are looking up
b_data_2015 = a_data_2015[["MR_SUBMISSION_TEMPLATE_ID", "ROW_LOOKUP_CODE", "CMM_INDIVIDUAL_Q1", "CMM_SMALL_GROUP_Q1"]]

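# remove rows that contain null values in *BOTH* columns that we are looking up
# (thresh=3 keeps only rows with at least 3 non-null values out of the 4 columns,
# so, with the ID and lookup code present, at least one value column must be non-null)
c_data_2015 = b_data_2015.dropna(thresh=3)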

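# drop rows that duplicate an earlier row in both the CMM_INDIVIDUAL_Q1 and CMM_SMALL_GROUP_Q1 columns.
# for some reason, companies are reported twice.
# caveat: this also drops different companies that happen to report identical values
d_data_2015 = c_data_2015.drop_duplicates(subset=["CMM_INDIVIDUAL_Q1","CMM_SMALL_GROUP_Q1"])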

# get member months data
#e_data_2015 = excel_data_2015.loc[excel_data_2015["ROW_LOOKUP_CODE"].isin({"MEMBER_MONTHS"})]

# remove all irrelevant data for individual market
#f_data_2015 = e_data_2015[["MR_SUBMISSION_TEMPLATE_ID", "ROW_LOOKUP_CODE", "CMM_INDIVIDUAL_Q1"]].dropna(how='any')
# remove all irrelevant data for small group market
#f1_data_2015 = e_data_2015[["MR_SUBMISSION_TEMPLATE_ID", "ROW_LOOKUP_CODE", "CMM_SMALL_GROUP_Q1"]].dropna(how='any')

# remove 0 entries; rename column
# (the columns are read as strings, so convert with astype(float) before comparing)
#g_data_2015 = f_data_2015.loc[f_data_2015["CMM_INDIVIDUAL_Q1"].astype(float) > 0]
#g_data_2015 = g_data_2015.rename(columns = {"CMM_INDIVIDUAL_Q1":"MEMBER_MONTHS_INDIVIDUAL"})
#g1_data_2015 = f1_data_2015.loc[f1_data_2015["CMM_SMALL_GROUP_Q1"].astype(float) > 0]
#g1_data_2015 = g1_data_2015.rename(columns = {"CMM_SMALL_GROUP_Q1":"MEMBER_MONTHS_SMALL_GROUP"})

# create one dataframe for reinsurance payments and another for risk adjustment payments
data_2015_reins = d_data_2015.loc[d_data_2015["ROW_LOOKUP_CODE"].isin({"FED_REINS_PAYMENTS"})]
data_2015_riskadj = d_data_2015.loc[d_data_2015["ROW_LOOKUP_CODE"].isin({"FED_RISK_ADJ_NET_PAYMENTS"})]

data_2015_reins = data_2015_reins.rename(columns={'CMM_INDIVIDUAL_Q1': 'REINSURANCE'})

del data_2015_reins["CMM_SMALL_GROUP_Q1"]

#print(data_2015_reins)
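# convert only the numeric columns; ROW_LOOKUP_CODE stays a string.
# errors='coerce' turns unparseable entries into NaN (errors='ignore' is deprecated in recent pandas)
for col in ["MR_SUBMISSION_TEMPLATE_ID", "REINSURANCE"]:
    data_2015_reins[col] = pd.to_numeric(data_2015_reins[col], errors="coerce")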
#print(data_2015_reins.dtypes)
#print(data_2015_reins)

# columns= is required here; a bare mapping would rename index labels instead
data_2015_riskadj = data_2015_riskadj.rename(columns={'CMM_INDIVIDUAL_Q1': 'RISK_TRANSFER_INDIVIDUAL',
                       'CMM_SMALL_GROUP_Q1':'RISK_TRANSFER_SMALLGROUP'})
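
The `dropna(thresh=3)` and `drop_duplicates(subset=...)` steps in this cell are easy to misread, so here is a small sketch of what each one does on made-up rows. The column names come from the cell above; the values and the four toy submissions are invented for illustration.

```python
import numpy as np
import pandas as pd

# four invented submissions with the same four columns as b_data_2015
toy = pd.DataFrame({
    "MR_SUBMISSION_TEMPLATE_ID": ["1", "2", "3", "4"],
    "ROW_LOOKUP_CODE": ["FED_REINS_PAYMENTS"] * 4,
    "CMM_INDIVIDUAL_Q1": ["100", np.nan, np.nan, "100"],
    "CMM_SMALL_GROUP_Q1": [np.nan, "50", np.nan, np.nan],
})

# thresh=3 keeps rows with at least 3 non-null values,
# so only submission 3 (both value columns null) is dropped
kept = toy.dropna(thresh=3)
kept_ids = kept["MR_SUBMISSION_TEMPLATE_ID"].tolist()

# deduplicating on the value columns alone also removes submission 4,
# a different submission that happens to carry the same values as submission 1
deduped = kept.drop_duplicates(subset=["CMM_INDIVIDUAL_Q1", "CMM_SMALL_GROUP_Q1"])
deduped_ids = deduped["MR_SUBMISSION_TEMPLATE_ID"].tolist()

print(kept_ids)
print(deduped_ids)
```

If distinct companies can legitimately report the same amounts (zero is the obvious case), it may be safer to include an identifying column in `subset`. Whether that fits depends on how the duplicate rows differ in the real data, which I have not checked.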

In [114]:
df = pd.read_csv("input/2015-Benefit-Year-RI-RA-Not-Updated-numeric.csv")
df = df.rename(columns={'REINSURANCE PAYMENT AMOUNT (OR NOT ELIGIBLE)': 'REINSURANCE',
     'HHS RISK ADJUSTMENT TRANSFER AMOUNT (INDIVIDUAL MARKET, INCLUDING CATASTROPHIC)': 'RISK_TRANSFER_INDIVIDUAL',
     'HHS RISK ADJUSTMENT TRANSFERS AMOUNT (SMALL GROUP MARKET)':'RISK_TRANSFER_SMALLGROUP'})
# drop leftover index columns from earlier to_csv round trips, if present
df = df.drop(columns=["Unnamed: 0", "Unnamed: 0.1"], errors="ignore")

#print(df)
df = df.merge(data_2015_reins, on="REINSURANCE")
print(df)
#print(df.dtypes)

    HIOS ID               HIOS INPUTTED INSURANCE COMPANY NAME STATE  \
0     11082                       Aetna Life Insurance Company    AK   
1     93122                             Freedom Life Insurance    AL   
2     60079                           Coventry Health and Life    AR   
3     61273                             Freedom Life Insurance    AR   
4     65441                         Phoenix Health Plans, Inc.    AZ   
5     35305                   Trustmark Life Insurance Company    CA   
6     56887  County of Ventura, dba Ventura County Health C...    CA   
7     64618                  National Health Insurance Company    CA   
8     71408                             Moda Health Plan, Inc.    CA   
9     81914             Coventry Health Care of Delaware, Inc.    DE   
10    15980                           Humana Insurance Company    FL   
11    83883                Florida Health Solution HMO Company    FL   
12    24775                           Celtic Insurance Company  

MR_SUBMISSION_TEMPLATE_ID      int64
ROW_LOOKUP_CODE              float64
REINSURANCE                  float64
CMM_SMALL_GROUP_Q1           float64
dtype: object
        MR_SUBMISSION_TEMPLATE_ID  ROW_LOOKUP_CODE   REINSURANCE  \
26170                      134669              NaN  5.518706e+06   
40864                      135760              NaN  0.000000e+00   
43468                      135842              NaN  5.970215e+06   
43654                      135947              NaN  1.257778e+07   
44863                      136114              NaN  3.864422e+07   
48955                      136457              NaN  7.171320e+06   
49141                      136459              NaN  3.283275e+06   
49513                      136599              NaN  5.213869e+04   
59836                      137008              NaN  3.913465e+07   
60394                      137157              NaN  8.190500e+04   
65137                      137219              NaN  7.544022e+07   
65323                 

Empty DataFrame
Columns: [Unnamed: 0, Unnamed: 0.1, HIOS ID, HIOS INPUTTED INSURANCE COMPANY NAME, STATE, REINSURANCE PAYMENT AMOUNT (OR NOT ELIGIBLE), HHS RISK ADJUSTMENT TRANSFER AMOUNT (INDIVIDUAL MARKET, INCLUDING CATASTROPHIC), HHS RISK ADJUSTMENT TRANSFERS AMOUNT (SMALL GROUP MARKET), MR_SUBMISSION_TEMPLATE_ID_x, ROW_LOOKUP_CODE_x, CMM_SMALL_GROUP_Q1_x, MR_SUBMISSION_TEMPLATE_ID_y, ROW_LOOKUP_CODE_y, CMM_INDIVIDUAL_Q1, CMM_SMALL_GROUP_Q1_y]
Index: []
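
An inner merge on a dollar amount silently discards every row that does not match exactly, which makes an empty or short result hard to debug. One way to see what failed to match is an outer merge with `indicator=True`. The sketch below uses two invented frames (`pdf_side` and `excel_side` are hypothetical names, and the IDs and amounts are made up), not the real inputs.

```python
import pandas as pd

# invented stand-ins for the PDF-derived and Excel-derived frames
pdf_side = pd.DataFrame({
    "HIOS ID": [1, 2],
    "REINSURANCE": [5518706.0, 81905.0],
})
excel_side = pd.DataFrame({
    "MR_SUBMISSION_TEMPLATE_ID": [101, 102],
    "REINSURANCE": [5518706.0, 39134653.11],
})

# how="outer" keeps unmatched rows from both sides;
# indicator=True adds a _merge column saying where each row came from
merged = pdf_side.merge(excel_side, on="REINSURANCE", how="outer", indicator=True)
counts = merged["_merge"].value_counts()
print(counts)

# rows present in only one source are the ones to inspect by hand
unmatched = merged[merged["_merge"] != "both"]
print(unmatched)
```

Rows marked `left_only` exist only in the PDF data and rows marked `right_only` only in the Excel data. Because the key is a float, values that differ by a rounding error also end up unmatched, so rounding both sides to cents before merging may be worth trying.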


In [70]:
print(data_2015_reins)

       MR_SUBMISSION_TEMPLATE_ID     ROW_LOOKUP_CODE CMM_INDIVIDUAL_Q1  \
26170                     134669  FED_REINS_PAYMENTS           5518706   
40864                     135760  FED_REINS_PAYMENTS                 0   
43468                     135842  FED_REINS_PAYMENTS           5970215   
43654                     135947  FED_REINS_PAYMENTS       12577778.91   
44863                     136114  FED_REINS_PAYMENTS       38644223.02   
48955                     136457  FED_REINS_PAYMENTS           7171320   
49141                     136459  FED_REINS_PAYMENTS           3283275   
49513                     136599  FED_REINS_PAYMENTS          52138.69   
59836                     137008  FED_REINS_PAYMENTS       39134653.11   
60394                     137157  FED_REINS_PAYMENTS             81905   
65137                     137219  FED_REINS_PAYMENTS       75440222.47   
65323                     137253  FED_REINS_PAYMENTS        7001993.36   
65509                     137255  FED_