# Personal Project: Hospital Provider Cost Reports

## Exploratory Data Analysis on Hospital Provider Cost Reports

### Introduction
This project analyzes **Hospital Provider Cost Reports** published between 2011 and 2022. The dataset includes annual hospital cost reporting files, which were combined into a single consolidated dataset for analysis.

My primary objective is to analyze hospital provider cost report data to identify patterns, trends, and potential areas for improvement. This work serves as a learning exercise to strengthen my data analysis skills and will result in a polished portfolio piece showcasing my ability to turn raw data into actionable insights.

📂 **Data Source**: [CMS Hospital Provider Cost Report](https://data.cms.gov/provider-compliance/cost-report/hospital-provider-cost-report)

### Research Questions
The following questions guide the analysis of hospital cost report data, with emphasis on financial performance, capacity, and care delivery trends across hospitals and regions.  

Financial Performance
- How has hospital net income changed over time, both nationally and by state?
- What is the average cost-to-charge ratio across hospitals, and which hospitals are outliers?
- How has the cost-to-charge ratio evolved over the years?
- How do urban vs. rural hospitals differ in financial performance trends?
- What are the long-term trends in assets, liabilities, and fund balances?

Capacity & Operations
- Which states or counties have the highest hospital capacity in terms of beds and patient days?
- How has staffing (FTE employees) changed relative to patient volume?

Care Delivery & Equity
- What are the trends in charity care and uncompensated care over time?
- How much uncompensated care is provided by region or hospital type?
- How do inpatient vs. outpatient revenues vary by hospital size and type?

### Who is the intended audience?
The insights will be tailored to meet the needs of several stakeholder groups:
- Hospital administrators and executives – to inform operational and financial decision-making
- Researchers & analysts – to serve as a reference point for further studies
- General public & patient advocates – to promote understanding of healthcare costs and trends
- Policy makers and government agencies – to guide regulation, funding allocation, and healthcare reform initiatives

### Type of Analyses
Descriptive analysis:
- Visualize cost distribution by department and facility characteristics
- Analyze Medicare settlement trends and utilization data
- Explore patterns that might indicate cost drivers or inefficiencies

Financial insights:
- Highlight KPIs such as cost-to-reimbursement ratios
- Detect and flag unusual cost spikes or trends in expenses
- Provide easy-to-understand summaries for quick decision-making

Marketing/strategy angle:
- Benchmark facilities against each other on cost efficiency
- Show utilization strengths in high-demand services
- Present geographic and service-line heatmaps for competitive insights

### Preparing the Environment

We will import the required libraries and read in the dataset.
- Pandas - Data Manipulation
- NumPy - Numerical Operations
- Matplotlib - Data Visualization
- Seaborn - Data Visualization
- Warnings - Warning Control (used to suppress warning messages)

In [1]:
# Import libraries and alias for easy reading
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

In [6]:
# Suppress all warning messages to keep the output clean
warnings.filterwarnings('ignore')

# Configure plot styles: use matplotlib's default style
plt.style.use("default")

# Apply seaborn's theme for better-looking plots
sns.set_theme()

# Display all columns when printing DataFrames
pd.set_option("display.max_columns", None)

# Set the number of decimal places to display in DataFrames
pd.set_option("display.precision", 2)

In [15]:
# File path for the CSV data (replace with your own path if needed)
file_path = "C:/Users/Jason/Documents/Documents/Projects/Hospital_Provider_Cost_Report/Data/interim/hospital_provider_cost_output.csv"

# Read/load data in CSV format into a pandas DataFrame for analysis
hospital_provider_cost_data = pd.read_csv(file_path)

### Data Source & Preparation

The dataset comes from annual Hospital Provider Cost Reports, published between 2011 and 2022. Each year was originally provided as a separate CSV file (such as `CostReport_2011_Final.csv`, `CostReport_2012_Final.csv`, `CostReport_2022_Final.csv`).  

To simplify the analysis, these files were combined into a **single consolidated dataset**.  
A preprocessing script (`merge_hospital_provider_cost_report_data.py`) was used to:
- Load all CSV files from 2011–2022
- Extract the year from the filename
- Add a new column `Cost Report Year`
- Merge them into one unified dataset
- Save the result as `hospital_provider_cost_output.csv`

This consolidated file is what we use for all further analysis.
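The preprocessing steps listed above can be sketched roughly as follows. This is not the actual `merge_hospital_provider_cost_report_data.py` script; the function names `extract_year` and `merge_cost_reports` are hypothetical, and the filename pattern is assumed from the examples given (`CostReport_<year>_Final.csv`).

```python
import re
from pathlib import Path

import pandas as pd


def extract_year(filename: str) -> int:
    """Pull the four-digit year out of a name like CostReport_2011_Final.csv."""
    match = re.search(r"CostReport_(\d{4})_Final", filename)
    if match is None:
        raise ValueError(f"No year found in filename: {filename}")
    return int(match.group(1))


def merge_cost_reports(input_dir: str, output_path: str) -> pd.DataFrame:
    """Combine the yearly CSV files into one DataFrame and save it as CSV."""
    frames = []
    # Assumed naming pattern; adjust the glob if the raw files are named differently
    for csv_path in sorted(Path(input_dir).glob("CostReport_*_Final.csv")):
        yearly = pd.read_csv(csv_path, low_memory=False)
        # Tag every row with the year taken from the filename
        yearly["Cost Report Year"] = extract_year(csv_path.name)
        frames.append(yearly)
    if not frames:
        raise FileNotFoundError(f"No cost report files found in {input_dir}")
    # Stack the yearly frames and renumber the rows from 0
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(output_path, index=False)
    return merged
```

Calling `merge_cost_reports("Data/raw", "Data/interim/hospital_provider_cost_output.csv")` would then produce the consolidated file; both paths here are placeholders.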

### Data Exploration
Let's look at the first and last rows of the data using `df.head()` and `df.tail()`.

In [19]:
# Display the first 5 rows of the dataset
hospital_provider_cost_data.head()

Unnamed: 0,rpt_rec_num,Provider CCN,Hospital Name,Street Address,City,State Code,Zip Code,County,Medicare CBSA Number,Rural Versus Urban,CCN Facility Type,Provider Type,Type of Control,Fiscal Year Begin Date,Fiscal Year End Date,FTE - Employees on Payroll,Number of Interns and Residents (FTE),Total Days Title V,Total Days Title XVIII,Total Days Title XIX,Total Days (V + XVIII + XIX + Unknown),Number of Beds,Total Bed Days Available,Total Discharges Title V,Total Discharges Title XVIII,Total Discharges Title XIX,Total Discharges (V + XVIII + XIX + Unknown),Number of Beds + Total for all Subproviders,Hospital Total Days Title V For Adults & Peds,Hospital Total Days Title XVIII For Adults & Peds,Hospital Total Days Title XIX For Adults & Peds,Hospital Total Days (V + XVIII + XIX + Unknown) For Adults & Peds,Hospital Number of Beds For Adults & Peds,Hospital Total Bed Days Available For Adults & Peds,Hospital Total Discharges Title V For Adults & Peds,Hospital Total Discharges Title XVIII For Adults & Peds,Hospital Total Discharges Title XIX For Adults & Peds,Hospital Total Discharges (V + XVIII + XIX + Unknown) For Adults & Peds,Cost of Charity Care,Total Bad Debt Expense,Cost of Uncompensated Care,Total Unreimbursed and Uncompensated Care,Total Salaries From Worksheet A,Overhead Non-Salary Costs,Depreciation Cost,Total Costs,Inpatient Total Charges,Outpatient Total Charges,Combined Outpatient + Inpatient Total Charges,Wage-Related Costs (Core),Wage-Related Costs (RHC/FQHC),Total Salaries (adjusted),Contract Labor: Direct Patient Care,Wage Related Costs for Part - A Teaching Physicians,Wage Related Costs for Interns and Residents,Cash on Hand and in Banks,Temporary Investments,Notes Receivable,Accounts Receivable,Less: Allowances for Uncollectible Notes and Accounts Receivable,Inventory,Prepaid Expenses,Other Current Assets,Total Current Assets,Land,Land Improvements,Buildings,Leasehold Improvements,Fixed Equipment,Major Movable Equipment,Minor Equipment Depreciable,Health Information Technology Designated Assets,Total Fixed Assets,Investments,Other Assets,Total Other Assets,Total Assets,Accounts Payable,"Salaries, Wages, and Fees Payable",Payroll Taxes Payable,Notes and Loans Payable (Short Term),Deferred Income,Other Current Liabilities,Total Current Liabilities,Mortgage Payable,Notes Payable,Unsecured Loans,Other Long Term Liabilities,Total Long Term Liabilities,Total Liabilities,General Fund Balance,Total Fund Balances,Total Liabilities and Fund Balances,DRG Amounts Other Than Outlier Payments,DRG Amounts Before October 1,DRG Amounts After October 1,Outlier Payments For Discharges,Disproportionate Share Adjustment,Allowable DSH Percentage,Managed Care Simulated Payments,Total IME Payment,Inpatient Revenue,Outpatient Revenue,Total Patient Revenue,Less Contractual Allowance and Discounts on Patients' Accounts,Net Patient Revenue,Less Total Operating Expense,Net Income from Service to Patients,Total Other Income,Total Income,Total Other Expenses,Net Income,Cost To Charge Ratio,Net Revenue from Medicaid,Medicaid Charges,Net Revenue from Stand-Alone CHIP,Stand-Alone CHIP Charges,Cost Report Year
0,285,10005,MARSHALL MEDICAL CENTER - SOUTH,2505 U.S. HIGHWAY 431,BOAZ,AL,35957-,MARSHALL,13820.0,R,STH,1,9,10/01/2010,09/30/2011,598.72,,,9132.0,4591.0,19641.0,114.0,41610.0,,2109.0,1134.0,5283.0,114.0,,7508.0,3073.0,15068.0,102.0,37230.0,,2109.0,1134.0,5283.0,1310000.0,,1130000.0,1440000.0,26700000.0,44700000.0,5080000.0,58000000.0,68600000.0,139000000.0,208000000.0,6970000.0,,26700000.0,983584.0,,,6880000.0,,,6380000.0,,564530.0,190283.0,1300000.0,15300000.0,1970000.0,2700000.0,76300000.0,,6320000.0,49300000.0,42937.0,,56500000.0,,37500000.0,37500000.0,109000000.0,4100000.0,2600000.0,,,,-5690000.0,1010000.0,12900000.0,,,48100.0,13000000.0,14000000.0,95300000.0,95300000.0,109000000.0,12700000.0,,,23664.0,2180000.0,0.17,,,70700000.0,150000000.0,221000000.0,158000000.0,63300000.0,71300000.0,-8020000.0,5330000.0,-2690000.0,,-2690000.0,0.28,8360000.0,30600000.0,,,2011
1,1022,271326,BEARTOOTH BILLINGS CLINIC,2525 NORTH BROADWAY,RED LODGE,MT,59806,CARBON,99927.0,R,CAH,1,2,01/01/2011,12/31/2011,70.56,,,922.0,10.0,1196.0,25.0,9125.0,,104.0,3.0,155.0,25.0,,298.0,10.0,410.0,25.0,9125.0,,104.0,3.0,155.0,205000.0,503000.0,735000.0,827000.0,3380000.0,6850000.0,1390000.0,8830000.0,2270000.0,6210000.0,8480000.0,,,,,,,933000.0,1300000.0,,2830000.0,-609000.0,174550.0,66852.0,,4770000.0,695000.0,1270000.0,14500000.0,,,4180000.0,,,18500000.0,,6880.0,6880.0,23300000.0,61200.0,287000.0,136879.0,133727.0,,224000.0,842000.0,17200000.0,167519.0,,,17300000.0,18200000.0,5130000.0,5130000.0,23300000.0,,,,,,,,,2310000.0,7920000.0,10200000.0,1570000.0,8660000.0,10200000.0,-1560000.0,725000.0,-837000.0,,-837000.0,1.06,186000.0,263000.0,,,2011
2,1496,10052,LAKE MARTIN COMMUNITY HOSPITAL,1231 SOUTH STREET,DADEVILLE,AL,36853,TALLAPOOSA,99919.0,U,STH,1,5,01/01/2011,12/31/2011,72.65,,,1564.0,299.0,2681.0,46.0,16790.0,,468.0,75.0,892.0,46.0,,1564.0,299.0,2681.0,46.0,16790.0,,468.0,75.0,892.0,,2410000.0,1470000.0,1910000.0,5240000.0,8040000.0,241000.0,12700000.0,4760000.0,15400000.0,20200000.0,343000.0,129485.0,5240000.0,153051.0,,,851000.0,,,4920000.0,-3580000.0,189453.0,10221.0,,2900000.0,,,,,,2610000.0,,,687000.0,,51900.0,51900.0,3630000.0,1200000.0,333000.0,112449.0,,,-266000.0,1380000.0,,-68357.0,,,-68400.0,1310000.0,2330000.0,2330000.0,3630000.0,1960000.0,,,,223000.0,0.11,,,6700000.0,14700000.0,21400000.0,8220000.0,13100000.0,13300000.0,-139000.0,272000.0,133000.0,,133000.0,0.64,287000.0,1140000.0,,,2011
3,1501,13025,HEALTHSOUTH LAKESHORE HOSPITAL,3800 RIDGEWAY DRIVE,BIRMINGHAM,AL,35209,JEFFERSON,13820.0,U,RH,5,4,01/01/2011,12/31/2011,297.77,,,23060.0,230.0,32378.0,100.0,36500.0,,1668.0,25.0,2390.0,100.0,,23060.0,230.0,32378.0,100.0,36500.0,,1668.0,25.0,2390.0,,,-8800.0,-8800.0,14800000.0,13300000.0,2190000.0,27800000.0,59300000.0,2040000.0,61400000.0,,,,,,,73300000.0,,,4850000.0,-890000.0,134049.0,104878.0,,77500000.0,,,32700000.0,718801.0,,2430000.0,,,5550000.0,,20.0,20.0,83000000.0,525000.0,910000.0,,,,1620000.0,3060000.0,,,,3060000.0,3060000.0,6110000.0,76900000.0,76900000.0,83000000.0,,,,,,,,,59300000.0,2040000.0,61400000.0,21800000.0,39600000.0,28200000.0,11400000.0,422000.0,11800000.0,21729.0,11800000.0,0.45,,,,,2011
4,1504,103037,HEALTHSOUTH REHABILITATION HOSPITAL,901 NORTH CLEARWATER-LARGO ROAD,LARGO,FL,33770,PINELLAS,45300.0,U,RH,5,4,01/01/2011,12/31/2011,154.6,,,13304.0,699.0,16961.0,70.0,25550.0,,1086.0,44.0,1370.0,70.0,,13304.0,699.0,16961.0,70.0,25550.0,,1086.0,44.0,1370.0,,,-15000.0,-15000.0,8290000.0,7440000.0,717000.0,16500000.0,24900000.0,3560.0,24900000.0,,,,,,,40800000.0,,,1970000.0,-364000.0,122400.0,86945.0,,42600000.0,1200000.0,,10500000.0,,,3460000.0,,,4960000.0,,1550000.0,1550000.0,49100000.0,973000.0,611000.0,,,,391000.0,1980000.0,,,,,,1980000.0,47100000.0,47100000.0,49100000.0,,,,,,,,,24900000.0,3560.0,24900000.0,4670000.0,20200000.0,15700000.0,4510000.0,56900.0,4570000.0,,4570000.0,0.66,,,,,2011


In [22]:
# Display the last 5 rows of the dataset
hospital_provider_cost_data.tail()

Unnamed: 0,rpt_rec_num,Provider CCN,Hospital Name,Street Address,City,State Code,Zip Code,County,Medicare CBSA Number,Rural Versus Urban,CCN Facility Type,Provider Type,Type of Control,Fiscal Year Begin Date,Fiscal Year End Date,FTE - Employees on Payroll,Number of Interns and Residents (FTE),Total Days Title V,Total Days Title XVIII,Total Days Title XIX,Total Days (V + XVIII + XIX + Unknown),Number of Beds,Total Bed Days Available,Total Discharges Title V,Total Discharges Title XVIII,Total Discharges Title XIX,Total Discharges (V + XVIII + XIX + Unknown),Number of Beds + Total for all Subproviders,Hospital Total Days Title V For Adults & Peds,Hospital Total Days Title XVIII For Adults & Peds,Hospital Total Days Title XIX For Adults & Peds,Hospital Total Days (V + XVIII + XIX + Unknown) For Adults & Peds,Hospital Number of Beds For Adults & Peds,Hospital Total Bed Days Available For Adults & Peds,Hospital Total Discharges Title V For Adults & Peds,Hospital Total Discharges Title XVIII For Adults & Peds,Hospital Total Discharges Title XIX For Adults & Peds,Hospital Total Discharges (V + XVIII + XIX + Unknown) For Adults & Peds,Cost of Charity Care,Total Bad Debt Expense,Cost of Uncompensated Care,Total Unreimbursed and Uncompensated Care,Total Salaries From Worksheet A,Overhead Non-Salary Costs,Depreciation Cost,Total Costs,Inpatient Total Charges,Outpatient Total Charges,Combined Outpatient + Inpatient Total Charges,Wage-Related Costs (Core),Wage-Related Costs (RHC/FQHC),Total Salaries (adjusted),Contract Labor: Direct Patient Care,Wage Related Costs for Part - A Teaching Physicians,Wage Related Costs for Interns and Residents,Cash on Hand and in Banks,Temporary Investments,Notes Receivable,Accounts Receivable,Less: Allowances for Uncollectible Notes and Accounts Receivable,Inventory,Prepaid Expenses,Other Current Assets,Total Current Assets,Land,Land Improvements,Buildings,Leasehold Improvements,Fixed Equipment,Major Movable Equipment,Minor Equipment Depreciable,Health Information Technology Designated Assets,Total Fixed Assets,Investments,Other Assets,Total Other Assets,Total Assets,Accounts Payable,"Salaries, Wages, and Fees Payable",Payroll Taxes Payable,Notes and Loans Payable (Short Term),Deferred Income,Other Current Liabilities,Total Current Liabilities,Mortgage Payable,Notes Payable,Unsecured Loans,Other Long Term Liabilities,Total Long Term Liabilities,Total Liabilities,General Fund Balance,Total Fund Balances,Total Liabilities and Fund Balances,DRG Amounts Other Than Outlier Payments,DRG Amounts Before October 1,DRG Amounts After October 1,Outlier Payments For Discharges,Disproportionate Share Adjustment,Allowable DSH Percentage,Managed Care Simulated Payments,Total IME Payment,Inpatient Revenue,Outpatient Revenue,Total Patient Revenue,Less Contractual Allowance and Discounts on Patients' Accounts,Net Patient Revenue,Less Total Operating Expense,Net Income from Service to Patients,Total Other Income,Total Income,Total Other Expenses,Net Income,Cost To Charge Ratio,Net Revenue from Medicaid,Medicaid Charges,Net Revenue from Stand-Alone CHIP,Stand-Alone CHIP Charges,Cost Report Year
73969,776749,251325,COVINGTON COUNTY HOSPITAL,803 GERALD MCRANEY DRIVE,COLLINS,MS,39428,COVINGTON,25620.0,U,CAH,1,9,10/01/2021,09/30/2022,263.92,,,3316.0,1.0,4872.0,25.0,9125.0,,64.0,11.0,124.0,35.0,,235.0,1.0,441.0,25.0,9125.0,,64.0,11.0,124.0,1.0,82100.0,28700.0,710000.0,20400000.0,22000000.0,671000.0,30700000.0,9290000.0,44500000.0,53800000.0,,,,,,,7850000.0,,,10400000.0,-5930000.0,871000.0,309000.0,1180000.0,15300000.0,290000.0,,,,,10700000.0,,,11200000.0,,,,26500000.0,685000.0,,,820000.0,,2860000.0,4370000.0,,5180000.0,,,5180000.0,9550000.0,17000000.0,17000000.0,26500000.0,,,,,,,,,17200000.0,46000000.0,63200000.0,29600000.0,33700000.0,42500000.0,-8840000.0,9720000.0,884000.0,,884000.0,0.57,1260000.0,3990000.0,,,2022
73970,776764,330273,PUTNAM HOSPITAL CENTER,670 STONELEIGH AVE,CARMEL,NY,10512,PUTNAM,35614.0,U,STH,1,2,10/01/2021,09/30/2022,485.43,6.79,,6281.0,460.0,12328.0,81.0,29487.0,,1390.0,82.0,2797.0,101.0,,5661.0,432.0,10091.0,71.0,25837.0,,1390.0,82.0,2797.0,1120000.0,2700000.0,2150000.0,12400000.0,46100000.0,100000000.0,8900000.0,133000000.0,128000000.0,217000000.0,345000000.0,18300000.0,,46100000.0,3820000.0,,,3010000.0,,,30100000.0,-20500000.0,3430000.0,3150000.0,959000.0,22800000.0,1190000.0,1220000.0,92100000.0,,,111000000.0,,,55800000.0,,77800000.0,77800000.0,156000000.0,6900000.0,4440000.0,,1030000.0,,13700000.0,51800000.0,17300000.0,,,29600000.0,46900000.0,98800000.0,45000000.0,57700000.0,156000000.0,,,16300000.0,,,,6070000.0,533000.0,128000000.0,217000000.0,345000000.0,234000000.0,111000000.0,146000000.0,-34700000.0,8550000.0,-26200000.0,12200000.0,-38400000.0,0.38,7930000.0,47300000.0,,,2022
73971,776821,453304,TEXAS CHILDRENS HOSPITAL,6621 FANNIN,HOUSTON,TX,77030,HARRIS,26420.0,U,CH,7,2,10/01/2021,09/30/2022,11381.06,471.86,,573.0,20313.0,248901.0,863.0,315279.0,,83.0,1635.0,36460.0,863.0,,370.0,10046.0,154906.0,578.0,212318.0,,83.0,1635.0,36460.0,,,,,1130000000.0,2110000000.0,123000000.0,2260000000.0,3730000000.0,2930000000.0,6660000000.0,,,,,,,12200000.0,,,1180000000.0,-543000000.0,53700000.0,141000000.0,275000000.0,1240000000.0,159000000.0,,1670000000.0,56300000.0,986000000.0,662000000.0,174000000.0,86300000.0,2200000000.0,2950000000.0,379000000.0,3330000000.0,6760000000.0,290000000.0,179000000.0,24800000.0,,11600000.0,201000000.0,706000000.0,,1160000000.0,,124000000.0,1280000000.0,1990000000.0,4770000000.0,4770000000.0,6760000000.0,,,,,,,,,3730000000.0,2720000000.0,6450000000.0,3960000000.0,2500000000.0,3240000000.0,-745000000.0,564000000.0,-182000000.0,-2380000.0,-179000000.0,,,,,,2022
73972,776830,520096,WHEATON FRANCISCAN HEALTHCARE - ALL,3801 SPRING STREET,RACINE,WI,53405-,RACINE,39540.0,U,STH,1,1,07/01/2022,06/30/2023,1007.11,8.49,,7488.0,3179.0,41292.0,285.0,104025.0,,1542.0,577.0,7709.0,299.0,,6780.0,2659.0,34539.0,244.0,89060.0,,1542.0,577.0,7709.0,7360000.0,11200000.0,10700000.0,19600000.0,71400000.0,196000000.0,21000000.0,254000000.0,295000000.0,588000000.0,883000000.0,28200000.0,,71400000.0,8040000.0,171672.0,,9740.0,,,95200000.0,-46800000.0,7110000.0,764000.0,,75600000.0,3910000.0,4340000.0,254000000.0,951000.0,11000000.0,57800000.0,879000.0,,216000000.0,,2320000.0,2320000.0,294000000.0,5030000.0,5600000.0,,,,14700000.0,60500000.0,,,,5440000.0,5440000.0,65900000.0,228000000.0,228000000.0,294000000.0,,4270000.0,11900000.0,,1040000.0,0.26,24600000.0,252000.0,295000000.0,588000000.0,883000000.0,615000000.0,268000000.0,315000000.0,-47200000.0,46900000.0,-252000.0,128000.0,-380000.0,0.29,55400000.0,224000000.0,,,2022
73973,776832,520136,ASCENSION SE WISCONSIN HOSPITAL INC,5000 WEST CHAMBERS STREET,MILWAUKEE,WI,53210-,MILWAUKEE,33340.0,U,STH,1,1,07/01/2022,06/30/2023,1119.01,42.57,,10424.0,8085.0,51396.0,301.0,109865.0,,2060.0,703.0,9750.0,310.0,,9426.0,1647.0,32308.0,205.0,74825.0,,2060.0,703.0,9750.0,4590000.0,12300000.0,7990000.0,22300000.0,103000000.0,215000000.0,39600000.0,335000000.0,433000000.0,806000000.0,1240000000.0,25600000.0,,103000000.0,7010000.0,,,479000.0,,,104000000.0,-57300000.0,11200000.0,749000.0,,98800000.0,12400000.0,1920000.0,210000000.0,1980000.0,11600000.0,73000000.0,,,194000000.0,,4170000.0,4170000.0,297000000.0,5000000.0,7530000.0,,,,11300000.0,85800000.0,,,,6170000.0,6170000.0,92000000.0,205000000.0,205000000.0,297000000.0,,4950000.0,15800000.0,,1330000.0,0.26,32400000.0,1220000.0,433000000.0,806000000.0,1240000000.0,862000000.0,377000000.0,374000000.0,2980000.0,5400000.0,8380000.0,779000.0,7600000.0,0.27,75900000.0,334000000.0,,,2022


You can see that the dataset contains a mix of categorical, geographical, and numerical variables.

Each record (row) corresponds to a hospital’s annual Medicare cost report submission for a specific fiscal year. The dataset captures:

- Operational statistics (beds, employees, interns/residents)
- Utilization metrics (patient days, discharges, inpatient vs. outpatient activity)
- Financial performance (costs, charges, revenues, charity care, uncompensated care, net income)
- Geographical details (city, state, county, rural vs. urban classification)
- Organizational attributes (facility type, ownership/control type, teaching status)

Next, we will use df.shape and df.info() to explore the dataset’s dimensions and variable types in more detail.

### Data Validation

In [32]:
# Shape of the data set (rows, columns)
print("Shape of dataset:", hospital_provider_cost_data.shape)

Shape of dataset: (73974, 118)


In [39]:
# Summarised information of data set
hospital_provider_cost_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73974 entries, 0 to 73973
Columns: 118 entries, rpt_rec_num to Cost Report Year
dtypes: float64(103), int64(5), object(10)
memory usage: 66.6+ MB


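The dataset has 73,974 rows (not counting the header) and 118 columns: 103 `float64`, 5 `int64`, and 10 `object` columns. The types broadly match the values, with two caveats: the fiscal year dates are stored as `object` (text) rather than datetimes, and identifier columns such as `Provider CCN` are read as integers.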
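The date columns in the preview are plain text (for example `10/01/2010`) and `Provider CCN` is read as `int64`, so a small type clean-up may be useful before any time-based analysis. The sketch below uses a tiny stand-in frame with values taken from the preview. The helper name `tidy_types` is hypothetical, and padding the CCN to six characters is an assumption based on CCNs being six-character identifiers.

```python
import pandas as pd

# Tiny stand-in frame; in the notebook this would be hospital_provider_cost_data
sample = pd.DataFrame({
    "Provider CCN": [10005, 271326],
    "Fiscal Year Begin Date": ["10/01/2010", "01/01/2011"],
    "Fiscal Year End Date": ["09/30/2011", "12/31/2011"],
})


def tidy_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with CCNs as zero-padded strings and dates as datetimes."""
    out = df.copy()
    # Assumption: CCNs are six characters, so 10005 becomes "010005"
    out["Provider CCN"] = out["Provider CCN"].astype(str).str.zfill(6)
    for col in ["Fiscal Year Begin Date", "Fiscal Year End Date"]:
        # Dates in the preview follow MM/DD/YYYY; unparseable values become NaT
        out[col] = pd.to_datetime(out[col], format="%m/%d/%Y", errors="coerce")
    return out


tidied = tidy_types(sample)
print(tidied.dtypes)
```

Working on a copy keeps the original DataFrame untouched, which makes it easy to compare before and after.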

In [44]:
# Column names and data types
print("\nColumn Info:")
print(hospital_provider_cost_data.dtypes)


Column Info:
rpt_rec_num                            int64
Provider CCN                           int64
Hospital Name                         object
Street Address                        object
City                                  object
                                      ...   
Net Revenue from Medicaid            float64
Medicaid Charges                     float64
Net Revenue from Stand-Alone CHIP    float64
Stand-Alone CHIP Charges             float64
Cost Report Year                       int64
Length: 118, dtype: object


In [50]:
# Quick descriptive stats (numeric only)
print("\nDescriptive Statistics (numeric columns):")
display(hospital_provider_cost_data.describe().T)


Descriptive Statistics (numeric columns):


Unnamed: 0,count,mean,std,min,25%,50%,75%,max
rpt_rec_num,73974.0,6.78e+05,8.53e+04,2.85e+02,6.19e+05,6.96e+05,7.48e+05,7.77e+05
Provider CCN,73974.0,2.69e+05,1.58e+05,1.00e+04,1.40e+05,2.60e+05,3.92e+05,7.13e+05
Medicare CBSA Number,73048.0,5.37e+04,3.42e+04,1.00e+00,2.64e+04,3.97e+04,9.99e+04,1.00e+05
Provider Type,73974.0,1.72e+00,1.55e+00,1.00e+00,1.00e+00,1.00e+00,1.00e+00,1.10e+01
Type of Control,73974.0,4.23e+00,3.22e+00,1.00e+00,2.00e+00,4.00e+00,5.00e+00,1.30e+01
...,...,...,...,...,...,...,...,...
Net Revenue from Medicaid,54482.0,2.38e+07,1.61e+08,-2.69e+07,1.86e+06,6.45e+06,2.01e+07,3.11e+10
Medicaid Charges,54457.0,1.27e+08,2.92e+08,-2.12e+07,6.79e+06,3.26e+07,1.24e+08,1.57e+10
Net Revenue from Stand-Alone CHIP,12037.0,7.81e+05,4.38e+06,-9.51e+04,2.53e+04,9.40e+04,3.58e+05,3.00e+08
Stand-Alone CHIP Charges,12218.0,3.75e+06,1.77e+07,-1.05e+06,1.17e+05,4.96e+05,1.96e+06,8.85e+08


In [55]:
# Quick descriptive stats (categorical only)
print("\nDescriptive Statistics (categorical columns):")
display(hospital_provider_cost_data.describe(include=['O']).T)


Descriptive Statistics (categorical columns):


Unnamed: 0,count,unique,top,freq
Hospital Name,73974,9118,ENCOMPASS HEALTH REHABILITATION HOSP,485
Street Address,73916,8543,444 LAFAYETTE ROAD,102
City,73974,3418,HOUSTON,506
State Code,73974,56,TX,7123
Zip Code,73974,6795,55164-0979,102
County,70162,1899,LOS ANGELES,1180
Rural Versus Urban,73048,2,U,44409
CCN Facility Type,73974,8,STH,40901
Fiscal Year Begin Date,73974,955,01/01/2022,2456
Fiscal Year End Date,73974,746,12/31/2021,2457


In [59]:
# Count missing values per column
print("\nMissing Values per Column:")
print(hospital_provider_cost_data.isnull().sum())


Missing Values per Column:
rpt_rec_num                              0
Provider CCN                             0
Hospital Name                            0
Street Address                          58
City                                     0
                                     ...  
Net Revenue from Medicaid            19492
Medicaid Charges                     19517
Net Revenue from Stand-Alone CHIP    61937
Stand-Alone CHIP Charges             61756
Cost Report Year                         0
Length: 118, dtype: int64


In [62]:
# Percentage of missing values
print("\nPercentage Missing:")
print((hospital_provider_cost_data.isnull().sum() / len(hospital_provider_cost_data) * 100).round(2))


Percentage Missing:
rpt_rec_num                           0.00
Provider CCN                          0.00
Hospital Name                         0.00
Street Address                        0.08
City                                  0.00
                                     ...  
Net Revenue from Medicaid            26.35
Medicaid Charges                     26.38
Net Revenue from Stand-Alone CHIP    83.73
Stand-Alone CHIP Charges             83.48
Cost Report Year                      0.00
Length: 118, dtype: float64
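Because the printed Series is truncated to the first and last few columns, it can help to combine the counts and percentages into one table and sort it so that the sparsest columns come first. The following is a minimal sketch on a toy frame; `missing_summary` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """List columns with missing values, most incomplete first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_pct", ascending=False)


# Toy data standing in for hospital_provider_cost_data
sample = pd.DataFrame({
    "Hospital Name": ["A", "B", "C", "D"],
    "Net Income": [1.0, np.nan, 3.0, 4.0],
    "Stand-Alone CHIP Charges": [np.nan, np.nan, np.nan, 2.0],
})
print(missing_summary(sample))
```

In the notebook, `missing_summary(hospital_provider_cost_data).head(20)` would show the twenty most incomplete columns.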


In [64]:
# Detect potential outliers using IQR
numeric_cols = hospital_provider_cost_data.select_dtypes(include=[np.number]).columns
Q1 = hospital_provider_cost_data[numeric_cols].quantile(0.25)
Q3 = hospital_provider_cost_data[numeric_cols].quantile(0.75)
IQR = Q3 - Q1
outliers = ((hospital_provider_cost_data[numeric_cols] < (Q1 - 1.5 * IQR)) | (hospital_provider_cost_data[numeric_cols] > (Q3 + 1.5 * IQR))).sum()

print("\nPotential Outliers per Numeric Column:")
print(outliers)


Potential Outliers per Numeric Column:
rpt_rec_num                            247
Provider CCN                             0
Medicare CBSA Number                     0
Provider Type                        17411
Type of Control                       8866
                                     ...  
Net Revenue from Medicaid             6224
Medicaid Charges                      6070
Net Revenue from Stand-Alone CHIP     1700
Stand-Alone CHIP Charges              1579
Cost Report Year                         0
Length: 108, dtype: int64
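Some of the flagged columns are identifiers or coded categories (for example `rpt_rec_num`, `Provider Type`, and `Type of Control`), where an IQR "outlier" has no real meaning. One option is to wrap the same IQR rule in a function that skips those columns. The sketch below runs on toy data; the function name and the exclusion list are assumptions made for illustration, not a definitive classification of the columns.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, exclude=()) -> pd.Series:
    """Count values outside 1.5 * IQR per numeric column, skipping `exclude`."""
    numeric = df.select_dtypes(include="number")
    # Ignore names in `exclude` that are not present in the frame
    numeric = numeric.drop(columns=list(exclude), errors="ignore")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    mask = (numeric < (q1 - 1.5 * iqr)) | (numeric > (q3 + 1.5 * iqr))
    return mask.sum().sort_values(ascending=False)


# Assumed list of identifier and coded columns to leave out
ID_AND_CODE_COLUMNS = [
    "rpt_rec_num", "Provider CCN", "Medicare CBSA Number",
    "Provider Type", "Type of Control", "Cost Report Year",
]

# Toy data standing in for hospital_provider_cost_data
sample = pd.DataFrame({
    "rpt_rec_num": [1, 2, 3, 4, 5, 6],
    "Provider Type": [1, 1, 1, 1, 1, 5],
    "Number of Beds": [25, 30, 28, 27, 26, 900],
})
counts = iqr_outlier_counts(sample, exclude=ID_AND_CODE_COLUMNS)
print(counts)
```

Sorting the result puts the columns with the most flagged values at the top, which is a convenient starting point for a closer look.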


## Exploratory Data Analysis (EDA)

## Key Findings / Insights

## Limitations

## Next Steps / Future Work

## Notebook Outline
1. Data Import & Inspection
2. Data Cleaning
3. Exploratory Data Analysis (EDA)
4. Visualizations
5. Insights & Conclusions