In [27]:
from IPython.display import HTML

HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
    

# The Healthcare Cost Report Information System (HCRIS) 

## Guide to HCRIS Database
Each Medicare-certified provider must submit an annual cost report to the Centers for Medicare & Medicaid Services (CMS). HCRIS is a comprehensive relational database built from these reports. It includes provider information such as facility characteristics, utilization data, costs and charges by cost center (in total and for Medicare), Medicare settlement data, and financial statement data. The cost reports in HCRIS are as follows:
- Hospital Cost Report (CMS-2552-96 and CMS-2552-10)
- Skilled Nursing Facility Cost Report (CMS-2540-96 and CMS-2540-10)
- Home Health Agency Cost Report (CMS-1728-94)
- Renal Facility Cost Report (CMS-265-94 and CMS-265-11) 
- Health Clinic Cost Report (CMS-222-92) 
- Hospice Cost Report (CMS-1984-99) 
- Federally Qualified Health Clinic Cost Report (CMS-224-14)
- Community Mental Health Center Cost Report (CMS-2088-92)

We are only interested in the hospital reports, CMS-2552-96 and CMS-2552-10. The last two digits of each form number in parentheses above give the year of that form's format. For example, the hospital cost report has two formats, 1996 and 2010. Because CMS changed the hospital cost report format in 2010, the hospital system has two subsystems.

For detailed instructions, the best source is the [CMS website](https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/). It is a long journey to understand the structure and complexity of the database. For easier access to the data, the best alternative is the [NBER website](http://www.nber.org/data/hcris.html); many thanks to Jean Roth. For Stata users, there is a nice public good created by [Adam Sacarny](https://github.com/asacarny/hospital-cost-reports). His work helped me figure out the cost report structure much better than CMS's website did. Also, instead of merging the data from scratch, I used his data as a starting point and added the new variables I need.

The first step is to understand the structure of the database, such as the key variables that link the tables. Since HCRIS is not a regular one-file dataset, we need to determine which data we need for our study. To do that, we need to know the worksheet code, line, and column that identify each item on the cost report forms. A time-consuming but more thorough way is to look at [this CMS link](https://www.cms.gov/Regulations-and-Guidance/Guidance/Manuals/Paper-Based-Manuals-Items/CMS021935.html). The Provider Reimbursement Manual includes all the cost report forms and instructions. Since we are interested in hospital reports, we select and download Chapter 40. The zip file contains two PDF files: one has the [forms](https://github.com/msari6/CommunityBenefits_SDOH/blob/master/HCRIS/P152_40/R15P240f.pdf), the other the [instructions](https://github.com/msari6/CommunityBenefits_SDOH/blob/master/HCRIS/P152_40/R15P240.pdf). An alternative is the [Cost report data website](https://www.costreportdata.com/worksheet_formats.html), which organizes access to each form and its instructions. Say I need the total bad debt expense for the entire hospital complex. I need to look at line 26, column 1 of the [Worksheet S-10](https://www.costreportdata.com/worksheets/Form_S100.pdf) form.

Now we have:   
- Worksheet Code ='S100000' 
- Line Number ='02600'
- Column Number ='00100' 
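
As a minimal sketch of how these three keys select a value, the snippet below filters a long-format numeric table. The column names (`rpt_rec_num`, `wksht_cd`, `line_num`, `clmn_num`, `itm_val_num`), the helper `extract_item`, and the toy rows are assumptions for illustration; they are not values from a real cost report, and the layout of your own extract may differ.

```python
import pandas as pd

# Toy stand-in for a long-format HCRIS numeric table (made-up rows).
nmrc = pd.DataFrame({
    "rpt_rec_num": [1001, 1001, 1002, 1002],
    "wksht_cd":    ["S100000", "G300000", "S100000", "S100000"],
    "line_num":    ["02600", "00400", "02600", "02700"],
    "clmn_num":    ["00100", "00100", "00100", "00100"],
    "itm_val_num": [250000.0, 9000000.0, 120000.0, 30000.0],
})

def extract_item(table, wksht_cd, line_num, clmn_num):
    """Return one value per report for the given worksheet, line, and column."""
    mask = (
        (table["wksht_cd"] == wksht_cd)
        & (table["line_num"] == line_num)
        & (table["clmn_num"] == clmn_num)
    )
    return table.loc[mask].set_index("rpt_rec_num")["itm_val_num"]

# Total bad debt expense: Worksheet S-10, line 26, column 1.
total_bad_debt = extract_item(nmrc, "S100000", "02600", "00100")
print(total_bad_debt)
```

Keeping the codes as zero-padded strings avoids accidental mismatches such as `2600` versus `'02600'`.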

## Community Benefit Project 

CMS defines uncompensated care in the instruction document as follows:

> Uncompensated care consists of charity care, non-Medicare bad debt, and non-reimbursable
Medicare bad debt. Uncompensated care does not include courtesy allowances, discounts given
to patients that do not meet the hospital’s charity care policy, or discounts given to uninsured
patients that do not meet the hospital's FAP, or bad debt reimbursed by Medicare.

I need [Worksheet S-10](https://www.costreportdata.com/worksheets/Form_S100.pdf) to obtain the uncompensated care data. To make the data comparable across hospitals, I express uncompensated care costs as a percentage of total operating expenses. Based on the CMS definition, I need three components from the cost reports to compute uncompensated care cost.

1. Charity Care Costs
2. Non-Medicare Bad Debt
3. Non-reimbursable Medicare Bad Debt

The lines I need:

- Charity Care Cost: Worksheet S-10, Line 20, Column 3
- Charity Care Cost: Worksheet S-10, Line 22, Column 3
- Charity Care Cost: Worksheet S-10, Line 23, Column 3
- Total bad debt expense: Worksheet S-10, Line 26, Column 1
- Medicare reimbursable Bad Debt: Worksheet S-10, Line 27, Column 1
- Medicare allowable bad debts: Worksheet S-10, Line 27.01, Column 1
- Non-Medicare Bad Debt: Worksheet S-10, Line 28, Column 1
- Total Operating Expense: Worksheet G-3, Line 4, Column 1

One important note: beginning October 1, 2013, hospitals calculate non-Medicare bad debt expense (Line 28) by subtracting Medicare allowable bad debts (Line 27.01) from total bad debt expense for the entire hospital complex (Line 26). Before October 1, 2013, Line 27 was used instead of Line 27.01. It is therefore better to adjust the data so that the calculation is consistent across years. Line 29 is the sum of the non-Medicare bad debt expense (Line 28) and the non-reimbursable Medicare bad debt expense (Line 27.01 - Line 27).

Here is the Uncompensated Care calculation:

> UC = Charity Care Costs + Non-Medicare Bad Debt + Non-reimbursable Medicare Bad Debt

> UC = (Line 20 - Line 22) + (Line 28) + (Line 27.01 - Line 27)

Then UC will be divided by total operating expense to obtain the percentage. 
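
The calculation above can be sketched as a small function. This is an illustration under stated assumptions rather than the code used to build the dataset: the function name, argument names, and input values are hypothetical, and Line 28 is recomputed as Line 26 minus Line 27.01 for every year, which is one way to apply the post-2013 rule consistently.

```python
def uncompensated_care_share(line20, line22, line26, line27, line27_01,
                             operating_expense):
    """Return (UC, UC as a percentage of total operating expense)."""
    charity_care = line20 - line22
    # Assumption: recompute Line 28 with the post-2013 rule for all years.
    non_medicare_bad_debt = line26 - line27_01
    non_reimbursable_medicare_bad_debt = line27_01 - line27
    uc = (charity_care
          + non_medicare_bad_debt
          + non_reimbursable_medicare_bad_debt)
    return uc, 100.0 * uc / operating_expense

# Hypothetical inputs, not taken from any cost report.
uc, uc_pct = uncompensated_care_share(
    line20=500_000, line22=50_000,
    line26=300_000, line27=40_000, line27_01=60_000,
    operating_expense=20_000_000,
)
print(uc, uc_pct)
```

Under this harmonization the two bad debt terms sum to Line 26 minus Line 27, so Line 27.01 cancels out of the total.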

In [this post](https://github.com/msari6/CommunityBenefits_SDOH/blob/master/HCRIS/Comment%20on%20Uncompensated%20Care%20in%20HCRIS.md), I explain line by line how I calculated uncompensated care and handled the change that happened in 2013.



In Schedule H, the community benefit expense calculation is shown in [Worksheet 1 of the Schedule H instructions](https://www.irs.gov/pub/irs-pdf/i990sh.pdf). For example, here is the calculation for the "Financial Assistance at Cost" category of community benefits:

 1. Calculate the amount of gross patient charges written off under financial assistance policies
 2. Take the ratio of patient care cost to charges reported on the form (the cost-to-charge ratio)
 3. Estimate the cost (multiply the gross patient charges by the ratio)
 4. Calculate Medicaid provider taxes, fees, and assessments
 5. Add the estimated cost (3) to (4) to find **total community benefit expense**
 6. Calculate revenue from uncompensated care pools or programs
 7. Calculate other direct offsetting revenue
 8. Add 6 to 7 to find **total direct offsetting revenue**
 9. Subtract **total direct offsetting revenue** from **total community benefit expense** to find **net community benefit expense**
 10. Calculate the percent of total expense. The numerator is **net community benefit expense** and the denominator is total functional expense from Form 990, Part IX, line 25, column (A).
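
The steps above can be sketched as plain arithmetic. The function name, argument names, and dollar amounts below are hypothetical and chosen only to make the flow of the worksheet visible; they do not come from any filed Schedule H.

```python
def net_financial_assistance(charges_written_off, cost_to_charge_ratio,
                             medicaid_taxes_fees, uc_pool_revenue,
                             other_offsetting_revenue,
                             total_functional_expense):
    """Follow steps 1-10 and return (net expense, percent of total expense)."""
    estimated_cost = charges_written_off * cost_to_charge_ratio    # step 3
    total_cb_expense = estimated_cost + medicaid_taxes_fees        # step 5
    offsetting_revenue = uc_pool_revenue + other_offsetting_revenue  # step 8
    net_cb_expense = total_cb_expense - offsetting_revenue         # step 9
    percent = 100.0 * net_cb_expense / total_functional_expense    # step 10
    return net_cb_expense, percent

# Hypothetical inputs for illustration only.
net_expense, pct_of_expense = net_financial_assistance(
    charges_written_off=2_000_000,
    cost_to_charge_ratio=0.25,
    medicaid_taxes_fees=100_000,
    uc_pool_revenue=150_000,
    other_offsetting_revenue=50_000,
    total_functional_expense=40_000_000,
)
print(net_expense, pct_of_expense)
```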
 
I will use the percentage of total operating expense for the reason given above. 
Here is an empirical question: as the calculation shows, Medicaid provider taxes, fees, and assessments are part of the financial assistance category, which is a big share of total community benefit (CB) provision. As an indirect provision, what is the share of Medicaid-related fees in CB provision?
 
### How does the IRS define the patient care cost-to-charge ratio (CCR)?

[Worksheet 2](https://www.irs.gov/pub/irs-pdf/i990sh.pdf) (page 15) of the Schedule H instructions guides hospitals through calculating the ratio, unless the hospital uses another cost accounting method or system. 

- CCR = Adjusted Patient Care Cost / Adjusted Gross Patient Charges 
- Adjusted Patient Care Cost = Total operating expense - Nonpatient expense 
- Adjusted Gross Patient Charges = Gross Patient Charges - Gross charges for community benefit programs
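
As a worked example of the three definitions above, the sketch below computes the CCR from four inputs. The function name, argument names, and amounts are hypothetical; only the formula follows the list.

```python
def cost_to_charge_ratio(total_operating_expense, nonpatient_expense,
                         gross_patient_charges, community_benefit_charges):
    """Return adjusted patient care cost divided by adjusted gross charges."""
    adjusted_cost = total_operating_expense - nonpatient_expense
    adjusted_charges = gross_patient_charges - community_benefit_charges
    if adjusted_charges <= 0:
        raise ValueError("adjusted gross patient charges must be positive")
    return adjusted_cost / adjusted_charges

# Hypothetical hospital: 90M adjusted cost over 300M adjusted charges.
ccr = cost_to_charge_ratio(
    total_operating_expense=100_000_000,
    nonpatient_expense=10_000_000,
    gross_patient_charges=320_000_000,
    community_benefit_charges=20_000_000,
)
print(round(ccr, 4))
```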

 
IRS Form 990 includes 8 categories of community benefits, while the cost report data include only bad debt and charity care. To have comparable data, I will calculate uncompensated care costs as a percentage of total expense for each hospital. 


In [1]:
import os
from pathlib import Path

import numpy as np
import pandas as pd

In [17]:
data_folder = os.path.join(*["/home","msari", "Project1", "RawData", "HCRIS"])
hcris = os.path.join(data_folder, "hcris_merged_hospyear.csv")

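In [21]:
# The data covers cost reports from 2001 to 2016. Read `year` as a string.
cp = pd.read_csv(hcris, dtype={'year': str}, low_memory=False)

In [18]:
# Parse the coverage start date (day, abbreviated month, year).
# Values that do not match the format become NaT.
cp['covg_begin_dt'] = pd.to_datetime(cp['covg_begin_dt'], format='%d%b%Y', errors='coerce')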

In [5]:
# Show at most 100 rows when displaying a DataFrame
pd.set_option('display.max_rows', 100)

# Show at most 60 columns when displaying a DataFrame
pd.set_option('display.max_columns', 60)
# Display floats with thousands separators and two decimals
pd.set_option('display.float_format', '{:20,.2f}'.format)

Let's look at the summary statistics of the data. Right now the dataset does not have ZIP code or provider location information. Later in this report, it will be added by merging with AHA data and CMS provider ID data to get accurate provider locations.

In [22]:
cp.describe()

Unnamed: 0,pn,beds_adultped_wtd,beds_totadultped_wtd,beds_total_min,beds_total_max,beds_total_wtd,availbeddays_adultped,ipbeddays_adultped,ipdischarges_adultped,income,totcost,margin,uccare_chg_harmonized,uccare_cost_harmonized,netpatrev,othinc,opexp,othexp,donations,invinc,iphosprev,ipgenrev,ipicrev,iprcrev,ipancrev,ipoprev,iptotrev,opancrev,opoprev,optotrev,tottotrev,ccr_min,ccr_max,ccr_wtd,chguccare,totinitchcare,ppaychcare,nonmcbaddebt,costuccare_v2010,nreports,nfmt96,nfmt10,nno_uncomp,frac_year_covered,flag_short,flag_long,cost_nonmdc_reimb_bdebt,costofcharitycare,totbaddebtexp,mdc_reimb_bdebt,mdc_allow_bdebt,costof_unc_care,costof_unc_care_adj,costof_uc_prct_totexp,costof_uc_prct_totexp1
count,99045.0,97590.0,97590.0,97415.0,97415.0,97621.0,97395.0,97246.0,97262.0,97540.0,97540.0,95270.0,97540.0,67709.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,67709.0,67709.0,73428.0,97540.0,97540.0,97540.0,97540.0,97540.0,99045.0,99045.0,99045.0,99045.0,99045.0,99045.0,99045.0,97540.0,97540.0,97540.0,97540.0,97540.0,97540.0,67709.0,67372.0,96891.0
mean,266621.12,175.83,175.83,191.38,195.85,193.23,51864.31,24252.66,5688.22,120153182.76,115544790.14,-1203.58,13734421.83,6273621.49,110672949.03,9480233.7,114071357.29,1473432.85,313255.14,885962.38,41412720.44,46746832.49,14367698.88,61114531.37,126354971.13,3868501.31,192872225.21,90474085.79,40721974.86,137761381.69,330633606.9,1.45,1.95,2.0,4650119.15,5368324.23,99230.79,3815209.24,2655268.59,1.61,0.98,0.63,0.17,0.98,0.05,0.0,313849.74,1531788.91,2017385.02,80962.43,111002.5,7205516.04,2923638.49,0.02,0.04
std,153661.02,19501.54,19501.54,19518.93,19519.89,19498.59,3518524.71,35355.01,8441.53,231371202.67,214396809.52,220811.69,80877409.34,59939105.78,208230494.35,47928442.2,212309030.53,12713967.35,3191718.94,5686241.5,110973475.61,120064042.7,46441463.06,156491254.73,276857136.37,22141523.14,414858592.99,210535274.84,126318485.78,283864844.56,673629397.47,23.26,53.83,39.21,23479721.47,32207064.79,1176666.97,66944210.2,27466944.59,0.5,0.85,0.83,0.48,0.12,0.22,0.01,2185110.28,13672363.15,9843629.15,300966.63,482992.54,37008658.05,21683804.91,0.08,0.1
min,10001.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,-3121932864.0,-75562659.25,-48267368.0,-12159279.0,-157275116.0,-3292672161.78,-2211753344.0,-75565986.02,-391603432.0,-120312702.0,-332499416.0,-16032271.0,-16032271.0,-16698137.0,-16032271.0,-6334094.64,-87811331.0,-11354309.98,-9291365.43,-1197235963.99,-9485422.26,-17244267.88,0.0,0.0,0.0,-128490.0,-14538724.42,-12395972.36,-13283251.15,-157275119.09,1.0,0.0,0.0,0.0,0.0,0.0,0.0,-49338942.96,-21075574.0,-20589234.4,-131053.0,-201620.0,-27364834.0,-98444996.0,-3.07,-1.64
25%,140207.0,25.0,25.0,25.0,25.0,25.0,9125.0,2916.8,549.4,14002741.25,14452955.38,-0.02,0.0,0.12,12733197.01,298167.47,14292993.25,0.0,0.0,0.0,2094070.45,3291734.22,0.0,3534779.3,3558153.98,0.0,9824711.87,1.0,0.0,6406973.96,24158191.18,0.25,0.26,0.27,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,260005.0,65.0,65.0,70.0,72.0,71.0,23360.0,11156.26,1988.29,38139630.0,38607870.0,0.03,371158.38,1287273.0,35547035.75,1400925.85,38149382.13,0.0,0.0,4637.36,9887941.08,12040700.31,525848.0,13641035.92,21064309.83,0.0,39810811.39,14679804.53,3767653.5,39057462.6,79099693.0,0.36,0.37,0.37,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,390307.0,150.0,150.0,170.0,174.0,171.58,54385.0,32270.02,7627.86,129709411.0,126881292.0,0.09,9033882.38,4828219.5,122239233.5,5556676.25,125045773.81,53386.51,53703.34,173623.48,36712018.65,42705982.31,7737928.25,52182716.5,121741276.09,633851.74,186331788.63,90727330.85,28861029.49,153806102.35,348328570.26,0.5,0.52,0.52,0.0,190024.05,0.0,795304.79,670070.0,2.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0,83301.74,0.0,0.0,0.0,612450.31,1238170.75,0.02,0.02
max,673065.0,6092016.0,6092016.0,6092016.0,6092016.0,6092016.0,1098010950.0,2332687.88,136833.0,6598134272.0,5297989120.0,119.25,20404447232.0,6698841088.0,6414278253.0,6287346021.97,5297989337.0,906290042.73,593445000.0,531236363.65,5745012647.0,6430999639.0,1425333444.01,7337473879.0,6993513158.15,1355861286.41,10870890747.0,6033815293.42,5711795709.0,10741656000.0,15618749067.0,1797.04,10902.2,4884.38,2793923000.0,1971785232.85,111633887.03,20397675776.0,6703468780.0,4.0,4.0,4.0,3.0,1.25,1.0,1.0,292224113.21,1795243680.98,767339067.18,7550529.78,13671565.0,1993296704.0,2590847744.0,10.33,2.47


In [24]:
cp.groupby('year')['pn'].nunique()

year
2001    6231
2002    6179
2003    6191
2004    6296
2005    6329
2006    6146
2007    6151
2008    6172
2009    6164
2010    6170
2011    6168
2012    6179
2013    6162
2014    6179
2015    6167
2016    6161
Name: pn, dtype: int64

The data contain 99,045 observations and cover 2001 to 2016. The number of hospitals is between roughly 6,150 and 6,330 each year, which is confirmed with AHA data. The output below shows summary statistics for the uncompensated care variables.

In [25]:
uc_columns = ['pn', 'year', 'ccr_min', 'ccr_max', 'ccr_wtd', 'chguccare', 'totinitchcare', 'ppaychcare', 'nonmcbaddebt', 'costuccare_v2010', 'nno_uncomp', 
    'cost_nonmdc_reimb_bdebt', 'costofcharitycare', 'totbaddebtexp', 'mdc_reimb_bdebt', 'mdc_allow_bdebt', 'costof_unc_care',
 'costof_unc_care_adj', 'costof_uc_prct_totexp', 'costof_uc_prct_totexp1']
uncompenscare = ['pn', 'costof_unc_care',  'costof_unc_care_adj', 'costof_uc_prct_totexp', 'costof_uc_prct_totexp1','opexp', 'ccr_wtd']
cp[uncompenscare].describe()

| statistic | pn | costof_unc_care | costof_unc_care_adj | costof_uc_prct_totexp | costof_uc_prct_totexp1 | opexp | ccr_wtd |
|---|---:|---:|---:|---:|---:|---:|---:|
| count | 99045.0 | 97540.0 | 67709.0 | 67372.0 | 96891.0 | 97540.0 | 73428.0 |
| mean | 266621.12 | 7205516.04 | 2923638.49 | 0.02 | 0.04 | 114071357.29 | 2.0 |
| std | 153661.02 | 37008658.05 | 21683804.91 | 0.08 | 0.1 | 212309030.53 | 39.21 |
| min | 10001.0 | -27364834.0 | -98444996.0 | -3.07 | -1.64 | -75565986.02 | 0.0 |
| 25% | 140207.0 | 0.0 | 0.0 | 0.0 | 0.0 | 14292993.25 | 0.27 |
| 50% | 260005.0 | 0.0 | 0.0 | 0.0 | 0.0 | 38149382.13 | 0.37 |
| 75% | 390307.0 | 612450.31 | 1238170.75 | 0.02 | 0.02 | 125045773.81 | 0.52 |
| max | 673065.0 | 1993296704.0 | 2590847744.0 | 10.33 | 2.47 | 5297989337.0 | 4884.38 |


In [26]:
def f(x):
    """Summarize the uncompensated care measures for one year's reports."""
    d = {}
    d['Provider Number'] = x['pn'].nunique()
    d['Uncompensated Care Charges (v10)'] = x['costof_unc_care'].mean()
    d['Uncompensated Care Costs (v10)'] = x['costof_unc_care_adj'].mean()
    d['Uncompensated Care Charges (diff_cal)'] = x['uccare_chg_harmonized'].mean()
    d['Uncompensated Care Costs (diff_cal)'] = x['uccare_cost_harmonized'].mean()
    # Scale the ratios to percentages without modifying the group in place.
    d['Uncompensated Care Charges % of Operating Expense'] = x['costof_uc_prct_totexp'].mean() * 100
    d['Uncompensated Care Costs % of Operating Expense'] = x['costof_uc_prct_totexp1'].mean() * 100

    # A dict keeps insertion order, so the Series index follows the order above.
    return pd.Series(d)

cp.groupby('year').apply(f)

| year | Provider Number | Uncompensated Care Charges (v10) | Uncompensated Care Costs (v10) | Uncompensated Care Charges (diff_cal) | Uncompensated Care Costs (diff_cal) | Uncompensated Care Charges % of Operating Expense | Uncompensated Care Costs % of Operating Expense |
|---|---:|---:|---:|---:|---:|---:|---:|
| 2001 | 6231.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2002 | 6179.0 | 0.0 | 0.0 | 1719880.13 | 4350848.93 | 0.0 | 0.0 |
| 2003 | 6191.0 | 0.0 | 0.0 | 6642868.79 | 5514348.74 | 0.0 | 0.0 |
| 2004 | 6296.0 | 0.0 | 0.0 | 6927941.69 | 3966329.64 | 0.0 | 0.0 |
| 2005 | 6329.0 | 0.0 | 0.0 | 7293485.38 | 4155717.15 | 0.0 | 0.0 |
| 2006 | 6146.0 | 0.0 | 0.0 | 8816396.06 | 5055446.83 | 0.0 | 0.0 |
| 2007 | 6151.0 | 0.0 | 0.0 | 9768513.42 | 5537306.6 | 0.0 | 0.0 |
| 2008 | 6172.0 | 0.0 | 0.0 | 10897430.93 | 5796432.82 | 0.0 | 0.0 |
| 2009 | 6164.0 | 0.0 | 0.0 | 11825703.6 | 5816580.27 | 0.0 | 0.0 |
| 2010 | 6170.0 | 4831520.32 | 2201065.46 | 15092686.58 | 6639560.53 | 1.1 | 2.73 |
