## Dataset Exploration
Descriptions of variables found within the CMS Hospital Readmissions Reduction Program dataset.

In [1]:
# Import dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Import the data set
hospital_df = pd.read_csv('FY_2025_Hospital_Readmissions_Reduction_Program_Hospital.csv')
hospital_df.head()

| | Facility Name | Facility ID | State | Measure Name | Number of Discharges | Footnote | Excess Readmission Ratio | Predicted Readmission Rate | Expected Readmission Rate | Number of Readmissions | Start Date | End Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SOUTHEAST HEALTH MEDICAL CENTER | 10001 | AL | READM-30-AMI-HRRP | 296.0 | NaN | 0.9483 | 13.0146 | 13.7235 | 36 | 07/01/2020 | 06/30/2023 |
| 1 | SOUTHEAST HEALTH MEDICAL CENTER | 10001 | AL | READM-30-CABG-HRRP | 151.0 | NaN | 0.9509 | 9.6899 | 10.1898 | 13 | 07/01/2020 | 06/30/2023 |
| 2 | SOUTHEAST HEALTH MEDICAL CENTER | 10001 | AL | READM-30-HF-HRRP | 681.0 | NaN | 1.0597 | 21.5645 | 20.3495 | 151 | 07/01/2020 | 06/30/2023 |
| 3 | SOUTHEAST HEALTH MEDICAL CENTER | 10001 | AL | READM-30-HIP-KNEE-HRRP | NaN | NaN | 0.9654 | 4.268 | 4.4211 | Too Few to Report | 07/01/2020 | 06/30/2023 |
| 4 | SOUTHEAST HEALTH MEDICAL CENTER | 10001 | AL | READM-30-PN-HRRP | 490.0 | NaN | 0.9715 | 16.1137 | 16.5863 | 77 | 07/01/2020 | 06/30/2023 |


### Variable Descriptions
The 12 variables included in the CMS Hospital Readmissions Reduction Program dataset are:
1) Facility Name - Name of the medical facility at which data was collected
2) Facility ID - Identifier for the medical facility at which data was collected
3) State - State in which the facility is located
4) Measure Name - Identifier for the measure used for a given observation. The measures (taken directly from the CMS data dictionary) are:
    * READM-30-AMI-HRRP: Excess readmission ratio for heart attack patients
    * READM-30-COPD-HRRP: Excess readmission ratio for chronic obstructive pulmonary disease (COPD) patients
    * READM-30-CABG-HRRP: Excess readmission ratio for Coronary Artery Bypass Graft (CABG) patients
    * READM-30-HF-HRRP: Excess readmission ratio for heart failure patients
    * READM-30-HIP-KNEE-HRRP: Excess readmission ratio for hip/knee replacement patients
    * READM-30-PN-HRRP: Excess readmission ratio for pneumonia patients
5) Number of Discharges - The number of discharges for the given measure type during the reporting period
6) Footnote - Optional inclusion of a footnote with the observation
7) Excess Readmission Ratio - Ratio of the predicted readmission rate to the expected readmission rate, estimating how a facility compares to the rate expected from the national average. A value above 1 indicates more readmissions than expected; a value below 1 indicates fewer
8) Predicted Readmission Rate - The rate of readmissions for the given facility and measure type as predicted by hospital data
9) Expected Readmission Rate - The expected readmission rate based on the national average rate for similar patients
10) Number of Readmissions - The actual number of readmissions for the given observation
11) Start Date - Start date for the reporting period
12) End Date - End date for the reporting period
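
The relationship between variables 7 to 9 can be checked against the rows shown earlier. The snippet below is a minimal sketch, assuming the ratio is simply the predicted rate divided by the expected rate, as described above. The values are copied from the first three rows of the `head()` output; the `recomputed_ratio` column and `max_abs_diff` variable are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Three rows copied from the head() output above (Facility ID 10001).
sample = pd.DataFrame({
    "Measure Name": ["READM-30-AMI-HRRP", "READM-30-CABG-HRRP", "READM-30-HF-HRRP"],
    "Excess Readmission Ratio": [0.9483, 0.9509, 1.0597],
    "Predicted Readmission Rate": [13.0146, 9.6899, 21.5645],
    "Expected Readmission Rate": [13.7235, 10.1898, 20.3495],
})

# Recompute the ratio as predicted / expected (hypothetical column name).
sample["recomputed_ratio"] = (
    sample["Predicted Readmission Rate"] / sample["Expected Readmission Rate"]
)

# Largest absolute gap between the reported and recomputed ratios.
max_abs_diff = (
    sample["recomputed_ratio"] - sample["Excess Readmission Ratio"]
).abs().max()

print(sample[["Measure Name", "Excess Readmission Ratio", "recomputed_ratio"]])
print(max_abs_diff)
```

For these three rows the recomputed values agree with the reported ratio to within rounding, which is consistent with the description of variable 7.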

In [7]:
# Print datatypes
hospital_df.dtypes

Facility Name                  object
Facility ID                     int64
State                          object
Measure Name                   object
Number of Discharges          float64
Footnote                      float64
Excess Readmission Ratio      float64
Predicted Readmission Rate    float64
Expected Readmission Rate     float64
Number of Readmissions         object
Start Date                     object
End Date                       object
dtype: object

### Variable Datatypes
The 12 variables have the following datatypes:
1) Facility Name - object (string)
2) Facility ID - int64 (integer id)
3) State - object (string)
4) Measure Name - object (string)
5) Number of Discharges - float64 (floating point number)
6) Footnote - float64 (floating point number)
7) Excess Readmission Ratio - float64 (floating point number)
8) Predicted Readmission Rate - float64 (floating point number)
9) Expected Readmission Rate - float64 (floating point number)
10) Number of Readmissions - object (generally integer, although occasionally a string)
11) Start Date - object (string, can likely be cast to a datetime object)
12) End Date - object (string, can likely be cast to a datetime object)
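
The casts suggested for variables 10 to 12 might look like the following. This is a sketch under stated assumptions, not a step taken in this notebook: it uses a small hypothetical `sample` frame built from the values in the `head()` output rather than the full file, assumes the dates are in MM/DD/YYYY format (as the `06/30/2023` value suggests), and treats non-numeric entries such as `Too Few to Report` as missing.

```python
import pandas as pd

# Hypothetical sample mirroring the three object-typed columns.
sample = pd.DataFrame({
    "Number of Readmissions": ["36", "13", "151", "Too Few to Report", "77"],
    "Start Date": ["07/01/2020"] * 5,
    "End Date": ["06/30/2023"] * 5,
})

# errors="coerce" converts non-numeric strings to NaN instead of raising.
readmissions = pd.to_numeric(sample["Number of Readmissions"], errors="coerce")

# Parse the date strings, assuming a month/day/year layout.
start = pd.to_datetime(sample["Start Date"], format="%m/%d/%Y")
end = pd.to_datetime(sample["End Date"], format="%m/%d/%Y")

print(readmissions)
print(start.dtype, end.dtype)
```

Coercing to NaN discards the distinction between a suppressed count and a truly missing one, so it may be worth keeping the original string column alongside the numeric one.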

In [5]:
# For fun, a quick description of the dataset
hospital_df.describe()

| | Facility ID | Number of Discharges | Footnote | Excess Readmission Ratio | Predicted Readmission Rate | Expected Readmission Rate |
|---|---|---|---|---|---|---|
| count | 18510.0 | 8340.0 | 6583.0 | 11927.0 | 11927.0 | 11927.0 |
| mean | 261770.055105 | 279.269904 | 3.187756 | 1.001719 | 14.995386 | 14.961234 |
| std | 164647.739172 | 266.018069 | 2.089167 | 0.080547 | 5.017854 | 4.871997 |
| min | 10001.0 | 0.0 | 1.0 | 0.4779 | 1.6742 | 2.8921 |
| 25% | 110073.0 | 115.0 | 1.0 | 0.95655 | 12.533 | 12.6128 |
| 50% | 250048.0 | 197.0 | 5.0 | 0.9982 | 16.0602 | 16.146 |
| 75% | 390133.0 | 354.0 | 5.0 | 1.043 | 18.609 | 18.66735 |
| max | 670327.0 | 4501.0 | 7.0 | 1.643 | 27.8095 | 25.3942 |
