## Matching Datasets

Things that I've learned:

- The Excels match the un-updated PDFs.

- Example: 
    - 1.9 Federal Transitional Reinsurance Program payments expected from HHS (as indicated by HHS as of 6/30). For Aetna Health of FL: 10,355,176.30.
    - Reported in the original PDF: 18628 Aetna Health Inc. (a FL corp.) FL 10,355,176.30 (28,025,200.75) 5,437,975.83
    - Reported in the updated PDF: 18628 Aetna Health Inc. (a FL corp.) FL 10,360,565.46 (28,025,200.75) 5,437,975.83

My goal with this Python notebook is to create final spreadsheets for each year. I imagine the final spreadsheet will be like:

HHOS ID ... MR_SUBMISSION_TEMPLATE_ID ... COMPANY NAME ... REPORTED VALUES ... ACTUAL VALUES

for 2014 and 2015. That way, we can do our time analysis.

In [121]:
import pandas as pd

# for debugging, display everything: 
pd.options.display.max_seq_items = 2000
pd.options.display.max_rows = 4000

In [122]:
from re import sub
from decimal import Decimal

# strip the pdf data to make it parseable

def strip_money_values( column ):
    i = 0
    for money in pdf_data[column]:
        if money == '$-' or money == 'N/E' or \
            money == 'N/A_MA_Issuer' or money == 'N/A_DefaultCharge' or \
            money == 'N/A_Default_Charge':
            pdf_data[column][i] = "N/A"
        else:
            value = Decimal(sub(r'[^\d.]', '', money))
            if '(' in money:
                value = -1 * value
            else:
                value = value
            pdf_data[column][i] = value
        i = i + 1

print("strip_money_values defined")

def perform_stripping( path ): 
    columns = ["HHS RISK ADJUSTMENT TRANSFER AMOUNT (INDIVIDUAL MARKET, INCLUDING CATASTROPHIC)",
            "REINSURANCE PAYMENT AMOUNT (OR NOT ELIGIBLE)",
            "HHS RISK ADJUSTMENT TRANSFERS AMOUNT (SMALL GROUP MARKET)"]

    for column in columns:
        strip_money_values(column)
    print(pdf_data)
    pdf_data.to_csv(path_or_buf=path)
    

print("perform_stripping defined")

strip_money_values defined
perform_stripping defined


In [123]:
# how to execute the above scripts to clean the data of all '$', ',', and '(')''s.
#pdf_data = pd.read_csv("input/2014-Benefit-Year-RI-RA-Updated.csv")
#perform_stripping("input/2014-Benefit-Year-RI-RA-Updated-numeric.csv")
# THIS CREATES THE FOLLOWING SPREADSHEET:
# HIOS ID ... COMPANY NAME ... STATE ... REINSURANCE ... RISK ADJUSTMENT INDIDIVDUAL ... RISK ADJUSTMENT SMALL GROUP

In [149]:
# WE WANT TO CREATE THE FOLLOWING SPREADSHEET:
# SUBMISSION ID ... MEMBER MONTHS ... REINSURANCE ... RISK ADJUSTMENT INDIVIDUAL ... RISK ADJUSTMENT SMALL GROUP

# read in excel data
excel_data_2015 = pd.read_csv("input/2015_Part1_2_Summary_Data_Premium_Claims.csv", dtype=str)
excel_data_2015 = excel_data_2015.apply(pd.to_numeric, errors='ignore')

# only keep the rows that we are looking up
a_data_2015 = excel_data_2015.loc[excel_data_2015["ROW_LOOKUP_CODE"]
                            .isin({"FED_REINS_PAYMENTS", "FED_RISK_ADJ_NET_PAYMENTS", "MEMBER_MONTHS"})]

# only keep the columns that contain numbers that we are looking up
b_data_2015 = a_data_2015[["MR_SUBMISSION_TEMPLATE_ID", "ROW_LOOKUP_CODE", "CMM_INDIVIDUAL_Q1", "CMM_SMALL_GROUP_Q1"]]

# remove rows that contain null values in *BOTH* columns that we are looking up
# (thus, threshold = 3 null columns to drop the column)
c_data_2015 = b_data_2015.dropna(thresh=3)

# drop duplicates in the CMM_INDIVIDUAL_Q1 and CMM_SMALL_GROUP_Q1 rows. 
# for some reason, companies are reported twice.
d_data_2015 = c_data_2015.drop_duplicates(subset=["CMM_INDIVIDUAL_Q1","CMM_SMALL_GROUP_Q1"])

# get member months data as base for combining the rows
df = d_data_2015.loc[d_data_2015["ROW_LOOKUP_CODE"].isin({"MEMBER_MONTHS"})]
df = df[["MR_SUBMISSION_TEMPLATE_ID","CMM_INDIVIDUAL_Q1","CMM_SMALL_GROUP_Q1"]]
df["MEMBER_MONTHS_INDIVIDUAL"] = df["CMM_INDIVIDUAL_Q1"]
df["MEMBER_MONTHS_SMALL_GROUP"] = df["CMM_SMALL_GROUP_Q1"]
del df["CMM_INDIVIDUAL_Q1"]
del df["CMM_SMALL_GROUP_Q1"]

# merge the reinsurance
data_2015_reins = d_data_2015.loc[d_data_2015["ROW_LOOKUP_CODE"].isin({"FED_REINS_PAYMENTS"})]

# rename column to reinsurance
data_2015_reins["REINSURANCE"] = data_2015_reins["CMM_INDIVIDUAL_Q1"]
data_2015_reins = data_2015_reins[["MR_SUBMISSION_TEMPLATE_ID","REINSURANCE"]]
df = df.merge(data_2015_reins, on="MR_SUBMISSION_TEMPLATE_ID")

# merge the risk adjustment
data_2015_riskadj = d_data_2015.loc[d_data_2015["ROW_LOOKUP_CODE"].isin({"FED_RISK_ADJ_NET_PAYMENTS"})]

# rename columns for merging
data_2015_riskadj["RISK_TRANSFER_INDIVIDUAL"] = data_2015_riskadj["CMM_INDIVIDUAL_Q1"]
data_2015_riskadj["RISK_TRANSFER_SMALL_GROUP"] = data_2015_riskadj["CMM_SMALL_GROUP_Q1"]
data_2015_riskadj = data_2015_riskadj[["MR_SUBMISSION_TEMPLATE_ID","RISK_TRANSFER_INDIVIDUAL","RISK_TRANSFER_SMALL_GROUP"]]
df = df.merge(data_2015_riskadj, on="MR_SUBMISSION_TEMPLATE_ID")

print(df)

     MR_SUBMISSION_TEMPLATE_ID  MEMBER_MONTHS_INDIVIDUAL  \
0                       134669                  237567.0   
1                       135842                   69211.0   
2                       135947                  514810.0   
3                       136114                  890026.0   
4                       136457                  525565.0   
5                       136459                   37735.0   
6                       136599                     618.0   
7                       137008                  578453.0   
8                       137157                   43144.0   
9                       137219                 1434483.0   
10                      137253                  254595.0   
11                      137255                   78033.0   
12                      137266                 1180859.0   
13                      137276                    2147.0   
14                      137277                    1491.0   
15                      137278          

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


In [114]:
df = pd.read_csv("input/2015-Benefit-Year-RI-RA-Not-Updated-numeric.csv")
df = df.rename(columns={'REINSURANCE PAYMENT AMOUNT (OR NOT ELIGIBLE)': 'REINSURANCE',
     'HHS RISK ADJUSTMENT TRANSFER AMOUNT (INDIVIDUAL MARKET, INCLUDING CATASTROPHIC)': 'RISK_TRANSFER_INDIVIDUAL',
     'HHS RISK ADJUSTMENT TRANSFERS AMOUNT (SMALL GROUP MARKET)':'RISK_TRANSFER_SMALLGROUP'})
try:
    del df["Unnamed: 0"]
except KeyError:
    pass

try: 
    del df["Unnamed: 0.1"]
except KeyError:
    pass

#print(df)
df = df.merge(data_2015_reins, on="REINSURANCE")
print(df)
#print(df.dtypes)

    HIOS ID               HIOS INPUTTED INSURANCE COMPANY NAME STATE  \
0     11082                       Aetna Life Insurance Company    AK   
1     93122                             Freedom Life Insurance    AL   
2     60079                           Coventry Health and Life    AR   
3     61273                             Freedom Life Insurance    AR   
4     65441                         Phoenix Health Plans, Inc.    AZ   
5     35305                   Trustmark Life Insurance Company    CA   
6     56887  County of Ventura, dba Ventura County Health C...    CA   
7     64618                  National Health Insurance Company    CA   
8     71408                             Moda Health Plan, Inc.    CA   
9     81914             Coventry Health Care of Delaware, Inc.    DE   
10    15980                           Humana Insurance Company    FL   
11    83883                Florida Health Solution HMO Company    FL   
12    24775                           Celtic Insurance Company  

In [132]:
df2 = pd.read_csv("input/2015-Benefit-Year-RI-RA-Not-Updated-numeric.csv")
df2 = df2.rename(columns={'REINSURANCE PAYMENT AMOUNT (OR NOT ELIGIBLE)': 'REINSURANCE',
     'HHS RISK ADJUSTMENT TRANSFER AMOUNT (INDIVIDUAL MARKET, INCLUDING CATASTROPHIC)': 'RISK_TRANSFER_INDIVIDUAL',
     'HHS RISK ADJUSTMENT TRANSFERS AMOUNT (SMALL GROUP MARKET)':'RISK_TRANSFER_SMALLGROUP'})
try:
    del df2["Unnamed: 0"]
except KeyError:
    pass

try: 
    del df2["Unnamed: 0.1"]
except KeyError:
    pass

#print(df)
df2 = df2.merge(data_2015_riskadj, on="RISK_TRANSFER_SMALLGROUP", how='outer')
print(df2)
#print(df.dtypes)

      HIOS ID               HIOS INPUTTED INSURANCE COMPANY NAME STATE  \
0       44580                           Humana Insurance Company    AL   
1       44580                           Humana Insurance Company    AL   
2       44580                           Humana Insurance Company    AL   
3       44580                           Humana Insurance Company    AL   
4       44580                           Humana Insurance Company    AL   
5       44580                           Humana Insurance Company    AL   
6       44580                           Humana Insurance Company    AL   
7       44580                           Humana Insurance Company    AL   
8       44580                           Humana Insurance Company    AL   
9       44580                           Humana Insurance Company    AL   
10      44580                           Humana Insurance Company    AL   
11      44580                           Humana Insurance Company    AL   
12      44580                         

In [130]:
#data_2015_riskadj = data_2015_riskadj.rename({'CMM_INDIVIDUAL_Q1': 'RISK_TRANSFER_INDIVIDUAL',
#                       'CMM_SMALL_GROUP_Q1':'RISK_TRANSFER_SMALLGROUP'})

#print(data_2015_riskadj)
data_2015_riskadj.rename({'CMM_INDIVIDUAL_Q1': "RISK_TRANSFER_INDIVIDUAL", 'CMM_SMALL_GROUP_Q1':"RISK_TRANSFER_SMALLGROUP"})
#print(data_2015_riskadj['CMM_INDIVIDUAL_Q1'])
data_2015_riskadj["RISK_TRANSFER_INDIVIDUAL"] = data_2015_riskadj["CMM_INDIVIDUAL_Q1"]
del data_2015_riskadj["CMM_INDIVIDUAL_Q1"]
data_2015_riskadj["RISK_TRANSFER_SMALLGROUP"] = data_2015_riskadj["CMM_SMALL_GROUP_Q1"]
del data_2015_riskadj["CMM_SMALL_GROUP_Q1"]

In [131]:
print(data_2015_riskadj)

        MR_SUBMISSION_TEMPLATE_ID            ROW_LOOKUP_CODE  \
26171                      134669  FED_RISK_ADJ_NET_PAYMENTS   
26822                      134816  FED_RISK_ADJ_NET_PAYMENTS   
40865                      135760  FED_RISK_ADJ_NET_PAYMENTS   
43469                      135842  FED_RISK_ADJ_NET_PAYMENTS   
43655                      135947  FED_RISK_ADJ_NET_PAYMENTS   
44864                      136114  FED_RISK_ADJ_NET_PAYMENTS   
48956                      136457  FED_RISK_ADJ_NET_PAYMENTS   
49142                      136459  FED_RISK_ADJ_NET_PAYMENTS   
49514                      136599  FED_RISK_ADJ_NET_PAYMENTS   
54350                      136920  FED_RISK_ADJ_NET_PAYMENTS   
54815                      136946  FED_RISK_ADJ_NET_PAYMENTS   
55280                      136951  FED_RISK_ADJ_NET_PAYMENTS   
55466                      136953  FED_RISK_ADJ_NET_PAYMENTS   
56024                      136959  FED_RISK_ADJ_NET_PAYMENTS   
59093                      136994  FED_R

Empty DataFrame
Columns: [Unnamed: 0, Unnamed: 0.1, HIOS ID, HIOS INPUTTED INSURANCE COMPANY NAME, STATE, REINSURANCE PAYMENT AMOUNT (OR NOT ELIGIBLE), HHS RISK ADJUSTMENT TRANSFER AMOUNT (INDIVIDUAL MARKET, INCLUDING CATASTROPHIC), HHS RISK ADJUSTMENT TRANSFERS AMOUNT (SMALL GROUP MARKET), MR_SUBMISSION_TEMPLATE_ID_x, ROW_LOOKUP_CODE_x, CMM_SMALL_GROUP_Q1_x, MR_SUBMISSION_TEMPLATE_ID_y, ROW_LOOKUP_CODE_y, CMM_INDIVIDUAL_Q1, CMM_SMALL_GROUP_Q1_y]
Index: []


In [70]:
print(data_2015_reins)

       MR_SUBMISSION_TEMPLATE_ID     ROW_LOOKUP_CODE CMM_INDIVIDUAL_Q1  \
26170                     134669  FED_REINS_PAYMENTS           5518706   
40864                     135760  FED_REINS_PAYMENTS                 0   
43468                     135842  FED_REINS_PAYMENTS           5970215   
43654                     135947  FED_REINS_PAYMENTS       12577778.91   
44863                     136114  FED_REINS_PAYMENTS       38644223.02   
48955                     136457  FED_REINS_PAYMENTS           7171320   
49141                     136459  FED_REINS_PAYMENTS           3283275   
49513                     136599  FED_REINS_PAYMENTS          52138.69   
59836                     137008  FED_REINS_PAYMENTS       39134653.11   
60394                     137157  FED_REINS_PAYMENTS             81905   
65137                     137219  FED_REINS_PAYMENTS       75440222.47   
65323                     137253  FED_REINS_PAYMENTS        7001993.36   
65509                     137255  FED_