**More Data For Hospital Readmissions**
***
My initial EDA suggested that too few observations may have led to unexpected or inaccurate results.  To test this hypothesis, I will create a cleaned readmissions dataframe without grouping the data by state.

In [81]:
# Importing Necessary Tools
import pandas as pd
import numpy as np

In [82]:
# Read File Into a DataFrame and Set Column Names
col = ['hospital_name', 'provider_number', 'state', 'measure', 'discharges','footnote',
           'readmission_ratio','predicted_rate','expected_rate','readmissions','start_date','end_date']
df = pd.read_csv('Readmissions.csv')
df.columns=col

**Initial removal of the following columns:**<br>
-  Measure:  The observations are to be grouped together at the state level to get the overall state readmission ratio, which makes this column unnecessary for this analysis.
-  Footnote:  Footnotes are associated with a lack of information, and most of the affected rows will be removed in the cleaning process.
-  Start_Date:  Does not provide any useful information for this analysis.  The value is also the same for all rows.
-  End_Date:  Does not provide any useful information for this analysis.  The value is also the same for all rows.
-  Discharges:  Currently not needed for this analysis.  High number of null values (8,072).
-  Readmissions:  Data captured in another dataframe for analysis.  High number of null values (8,192).
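
Null counts like the ones quoted above can be checked by coercing the text columns to numbers and counting the resulting NaNs. The following is a minimal sketch on a toy frame, assuming missing values appear as the string `Not Available`; the rows are made up for illustration and are not taken from `Readmissions.csv`.

```python
import pandas as pd

# Toy stand-in for the raw file; values are illustrative only.
toy = pd.DataFrame({
    'discharges': ['120', 'Not Available', '87', 'Not Available'],
    'readmissions': ['15', 'Not Available', 'Not Available', 'Not Available'],
})

# Coerce non-numeric placeholders to NaN, then count nulls per column.
null_counts = toy.apply(pd.to_numeric, errors='coerce').isna().sum()
print(null_counts)
```

Running the same two calls on the real columns should give the per-column totals to compare against the figures in the list.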

In [83]:
# Removing Columns

usecols= ['hospital_name', 'provider_number', 'state','readmission_ratio',
           'predicted_rate','expected_rate']
df=df[usecols]
df.head()

Unnamed: 0,hospital_name,provider_number,state,readmission_ratio,predicted_rate,expected_rate
0,HIGHLANDS MEDICAL CENTER,10061,AL,Not Available,Not Available,Not Available
1,CLAY COUNTY HOSPITAL,10073,AL,0.9853,14.4,14.6
2,NORTHEAST ALABAMA REGIONAL MEDICAL CENTER,10078,AL,1.4044,6.1,4.3
3,NORTHEAST ALABAMA REGIONAL MEDICAL CENTER,10078,AL,0.9653,16.7,17.3
4,ATHENS LIMESTONE HOSPITAL,10079,AL,1.0204,4.3,4.2


In [84]:
# Coerce Readmission Ratios, Predicted Rates, and Expected Rates to Numeric so 'Not Available' Becomes NaN
tonumeric=['readmission_ratio','predicted_rate','expected_rate']
dfa = df[tonumeric].apply(pd.to_numeric, errors='coerce')
# Setting up additional columns to concatenate
dfb = df[['hospital_name','provider_number','state']]

In [85]:
# Concatenating Data Back Together and Confirming DataFrame Integrity
df2= pd.concat([dfb,dfa], axis=1)
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19830 entries, 0 to 19829
Data columns (total 6 columns):
hospital_name        19830 non-null object
provider_number      19830 non-null int64
state                19830 non-null object
readmission_ratio    14411 non-null float64
predicted_rate       14411 non-null float64
expected_rate        14411 non-null float64
dtypes: float64(3), int64(1), object(2)
memory usage: 929.6+ KB
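
As a possible alternative to coercing and concatenating, the placeholder could be declared as missing when the file is read. This sketch assumes `Not Available` is the only non-numeric placeholder in the numeric columns; the CSV text and hospital names below are hypothetical stand-ins, not the real file.

```python
import io
import pandas as pd

# Hypothetical two-row extract standing in for the real CSV.
csv_text = (
    "hospital_name,state,readmission_ratio\n"
    "HOSPITAL A,AL,Not Available\n"
    "HOSPITAL B,AL,0.9853\n"
)

# na_values tells read_csv to treat the placeholder as NaN,
# so the column is parsed as float without a separate coercion step.
toy = pd.read_csv(io.StringIO(csv_text), na_values=['Not Available'])
print(toy.dtypes)
```

If other placeholders exist in the data, `pd.to_numeric(..., errors='coerce')` as used above is the more forgiving choice, since it turns any unparseable value into NaN.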


In [86]:
# Dropping Null Values to Finish Cleaned Data
final = df2.dropna(how='any')
final.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14411 entries, 1 to 19828
Data columns (total 6 columns):
hospital_name        14411 non-null object
provider_number      14411 non-null int64
state                14411 non-null object
readmission_ratio    14411 non-null float64
predicted_rate       14411 non-null float64
expected_rate        14411 non-null float64
dtypes: float64(3), int64(1), object(2)
memory usage: 788.1+ KB


In [87]:
# Save and Print Final DataFrame Heading
final.to_csv('Readmissions_2.csv')
final.head()

Unnamed: 0,hospital_name,provider_number,state,readmission_ratio,predicted_rate,expected_rate
1,CLAY COUNTY HOSPITAL,10073,AL,0.9853,14.4,14.6
2,NORTHEAST ALABAMA REGIONAL MEDICAL CENTER,10078,AL,1.4044,6.1,4.3
3,NORTHEAST ALABAMA REGIONAL MEDICAL CENTER,10078,AL,0.9653,16.7,17.3
4,ATHENS LIMESTONE HOSPITAL,10079,AL,1.0204,4.3,4.2
5,ATHENS LIMESTONE HOSPITAL,10079,AL,1.0616,17.0,16.0
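
One detail to keep in mind when saving: `to_csv` writes the index as an unnamed first column by default, so reading the file back produces an extra `Unnamed: 0` column. A small sketch using an in-memory buffer (toy values, assumed for illustration) shows the difference when `index=False` is passed.

```python
import io
import pandas as pd

# Toy frame with a non-contiguous index, as left behind by dropna().
toy = pd.DataFrame(
    {'state': ['AL', 'AL'], 'readmission_ratio': [0.9853, 1.0204]},
    index=[1, 4],
)

# Default behaviour: the index is written and comes back as 'Unnamed: 0'.
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)

# index=False: only the data columns are written.
buf = io.StringIO()
toy.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf)

print(list(with_index.columns))
print(list(without_index.columns))
```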


In [120]:
# Now it's Time to Bring in the Expanded Home Health Care Agencies Spreadsheet
df = pd.read_csv('HHC_Agencies.csv')

In [121]:
# Creating List and Renaming Columns
col = ['state', 'cms_number','name','address','city','zip','phone','own_type','nursing_care',
       'physical_therapy', 'occupational_therapy','pathology_services','medical_soc_services',
       'home_health_aid','cert_date','star_rating','footnote','timeliness','footnote','rx_ed',
       'footnote','fall_risk','footnote','depression_check','footnote','flu_shot','footnote',
       'pneumonia_shot','footnote','d_foot_care','footnote','move_buff','footnote','in_out_bed_buff',
       'footnote','bathing_buff','footnote','move_pain_debuff','footnote','breathing_buff','footnote',
       'healing_buff','footnote','oral_rx_buff','footnote','hospital_admit','footnote','urgent_noadmit',
      'footnote','readmit_expectation','footnote','er_admit_expectation','footnote','footnote']
df.columns=col

In [122]:
df.head()

Unnamed: 0,state,cms_number,name,address,city,zip,phone,own_type,nursing_care,physical_therapy,...,footnote,hospital_admit,footnote.1,urgent_noadmit,footnote.2,readmit_expectation,footnote.3,er_admit_expectation,footnote.4,footnote.5
0,AL,17000,ALABAMA DEPARTMENT OF PUBLIC HEALTH HOME CARE,"201 MONROE STREET, THE RSA TOWER, SUITE 1200",MONTGOMERY,36104,3342065341,Government - State/ County,True,True,...,This measure currently does not have data or p...,,This measure currently does not have data or p...,,This measure currently does not have data or p...,Not Available,This measure currently does not have data or p...,Not Available,This measure currently does not have data or p...,
1,AL,17008,JEFFERSON COUNTY HOME CARE,2201 ARLINGTON AVENUE,BESSEMER,35020,2059169500,Government - State/ County,True,True,...,,,The number of patient episodes for this measur...,,The number of patient episodes for this measur...,Not Available,The number of patient episodes for this measur...,Not Available,The number of patient episodes for this measur...,
2,AL,17009,ALACARE HOME HEALTH & HOSPICE,2970 LORNA ROAD,BIRMINGHAM,35216,2058242680,Proprietary,True,True,...,,18.3,,11.4,,Worse Than Expected,,Worse Than Expected,,
3,AL,17013,KINDRED AT HOME,1239 RUCKER BLVD,ENTERPRISE,36330,3343470234,Proprietary,True,True,...,,15.5,,15.1,,Same As Expected,,Worse Than Expected,,
4,AL,17014,AMEDISYS HOME HEALTH,68278 MAIN STREET,BLOUNTSVILLE,35031,8664864919,Proprietary,True,True,...,,18.9,,12.1,,Same As Expected,,Same As Expected,,


**Removal of the following columns:**<br>
-  All Footnotes:  Footnotes are mostly empty and associated with a lack of information.  Most empty data will be removed in the cleaning process.
-  Address:  Does not provide any useful information for this analysis.
-  City:  Does not provide any useful information for this analysis.
-  Zip:  Does not provide any useful information for this analysis.
-  Phone:  Does not provide any useful information for this analysis.
-  Type of Ownership:  Does not provide any useful information for this analysis.
-  Date Certified:  Does not provide any useful information for this analysis.
-  Wound Care Improvement: Significant loss of data (7,400 null values).
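
Because every footnote column was given the same name, the removals listed above could also be done with a boolean mask over the column labels instead of listing every column to keep. This is a sketch on a toy frame with made-up values; the notebook itself selects columns with an explicit `usecols` list.

```python
import pandas as pd

# Toy frame with repeated 'footnote' labels, mimicking the renamed columns.
toy = pd.DataFrame(
    [['AL', '201 MONROE STREET', 98.1, None, 97.5, None]],
    columns=['state', 'address', 'timeliness', 'footnote', 'rx_ed', 'footnote'],
)

# Keep every column whose label is not in the unwanted set.
keep_mask = ~toy.columns.isin(['footnote', 'address'])
trimmed = toy.loc[:, keep_mask]
print(list(trimmed.columns))
```

The explicit list has the advantage of documenting exactly which columns survive; the mask is shorter when many columns share a label.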

In [123]:
# Removing Columns
usecols= ['state', 'cms_number','name','nursing_care','physical_therapy','occupational_therapy',
          'pathology_services','medical_soc_services','home_health_aid','star_rating','timeliness',
          'rx_ed','fall_risk','depression_check','flu_shot','pneumonia_shot','d_foot_care',
          'move_buff','in_out_bed_buff','bathing_buff','move_pain_debuff','breathing_buff',
          'oral_rx_buff','hospital_admit','urgent_noadmit','readmit_expectation',
          'er_admit_expectation']
df=df[usecols]
df.head()

Unnamed: 0,state,cms_number,name,nursing_care,physical_therapy,occupational_therapy,pathology_services,medical_soc_services,home_health_aid,star_rating,...,move_buff,in_out_bed_buff,bathing_buff,move_pain_debuff,breathing_buff,oral_rx_buff,hospital_admit,urgent_noadmit,readmit_expectation,er_admit_expectation
0,AL,17000,ALABAMA DEPARTMENT OF PUBLIC HEALTH HOME CARE,True,True,True,True,True,True,,...,,,,,,,,,Not Available,Not Available
1,AL,17008,JEFFERSON COUNTY HOME CARE,True,True,True,True,True,True,3.0,...,71.2,64.3,64.1,80.4,76.7,49.2,,,Not Available,Not Available
2,AL,17009,ALACARE HOME HEALTH & HOSPICE,True,True,True,True,True,True,4.0,...,79.4,75.4,83.5,85.9,81.3,72.4,18.3,11.4,Worse Than Expected,Worse Than Expected
3,AL,17013,KINDRED AT HOME,True,True,True,False,False,True,4.0,...,77.6,71.4,80.3,83.6,79.3,59.9,15.5,15.1,Same As Expected,Worse Than Expected
4,AL,17014,AMEDISYS HOME HEALTH,True,True,True,True,True,True,4.0,...,81.3,72.8,82.1,78.0,85.7,68.5,18.9,12.1,Same As Expected,Same As Expected


In [124]:
# Reviewing Information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11678 entries, 0 to 11677
Data columns (total 27 columns):
state                   11678 non-null object
cms_number              11678 non-null int64
name                    11678 non-null object
nursing_care            11678 non-null bool
physical_therapy        11678 non-null bool
occupational_therapy    11678 non-null bool
pathology_services      11678 non-null bool
medical_soc_services    11678 non-null bool
home_health_aid         11678 non-null bool
star_rating             8890 non-null float64
timeliness              9606 non-null float64
rx_ed                   9579 non-null float64
fall_risk               9380 non-null float64
depression_check        9582 non-null float64
flu_shot                9276 non-null float64
pneumonia_shot          9540 non-null float64
d_foot_care             8473 non-null float64
move_buff               8844 non-null float64
in_out_bed_buff         8749 non-null float64
bathing_buff            8887 n

In [125]:
# Dropping Null Values and Resetting the Index
df2=df.dropna(how='any')
df2 = df2.reset_index(drop=True)
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7664 entries, 0 to 7663
Data columns (total 27 columns):
state                   7664 non-null object
cms_number              7664 non-null int64
name                    7664 non-null object
nursing_care            7664 non-null bool
physical_therapy        7664 non-null bool
occupational_therapy    7664 non-null bool
pathology_services      7664 non-null bool
medical_soc_services    7664 non-null bool
home_health_aid         7664 non-null bool
star_rating             7664 non-null float64
timeliness              7664 non-null float64
rx_ed                   7664 non-null float64
fall_risk               7664 non-null float64
depression_check        7664 non-null float64
flu_shot                7664 non-null float64
pneumonia_shot          7664 non-null float64
d_foot_care             7664 non-null float64
move_buff               7664 non-null float64
in_out_bed_buff         7664 non-null float64
bathing_buff            7664 non-null flo

In [126]:
# Reviewing Distinct Values for Readmission Expectations and ER Admission Expectations
df2.readmit_expectation.unique()

array(['Worse Than Expected', 'Same As Expected', 'Not Available',
       'Better Than Expected'], dtype=object)

In [127]:
df2.er_admit_expectation.unique()

array(['Worse Than Expected', 'Same As Expected', 'Better Than Expected',
       'Not Available'], dtype=object)

In [128]:
# Changing Categories to Numeric Values for Readmission Expectations and ER Admission Expectations
# Map each category label to its numeric code (columns 25 and 26 of df2)
expectation_codes = {'Not Available': 0, 'Worse Than Expected': 1,
                     'Same As Expected': 2, 'Better Than Expected': 3}
for column in ['readmit_expectation', 'er_admit_expectation']:
    df2[column] = df2[column].map(expectation_codes)
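
The encoding above places `Not Available` at 0, on the same scale as the three ratings. As a hedged alternative (not the approach used in this notebook), an ordered categorical keeps the three ratings in rank order and marks anything outside them as missing. The series below is a toy example.

```python
import pandas as pd

# Rating levels from worst to best; 'Not Available' is deliberately left out.
levels = ['Worse Than Expected', 'Same As Expected', 'Better Than Expected']

toy = pd.Series(['Worse Than Expected', 'Not Available',
                 'Better Than Expected', 'Same As Expected'])

# Values not listed in `levels` become missing; their integer code is -1.
ordered = pd.Categorical(toy, categories=levels, ordered=True)
codes = pd.Series(ordered.codes, index=toy.index)
print(codes.tolist())
```

Whether missing ratings should be dropped, kept as a separate code, or encoded as 0 depends on how the column is used downstream.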
    

In [129]:
# Save and Print Final DataFrame Heading
df2.to_csv('HHC_Agencies_Cleaned.csv')
df2.head()

Unnamed: 0,state,cms_number,name,nursing_care,physical_therapy,occupational_therapy,pathology_services,medical_soc_services,home_health_aid,star_rating,...,move_buff,in_out_bed_buff,bathing_buff,move_pain_debuff,breathing_buff,oral_rx_buff,hospital_admit,urgent_noadmit,readmit_expectation,er_admit_expectation
0,AL,17009,ALACARE HOME HEALTH & HOSPICE,True,True,True,True,True,True,4.0,...,79.4,75.4,83.5,85.9,81.3,72.4,18.3,11.4,1,1
1,AL,17013,KINDRED AT HOME,True,True,True,False,False,True,4.0,...,77.6,71.4,80.3,83.6,79.3,59.9,15.5,15.1,2,1
2,AL,17014,AMEDISYS HOME HEALTH,True,True,True,True,True,True,4.0,...,81.3,72.8,82.1,78.0,85.7,68.5,18.9,12.1,2,2
3,AL,17016,SOUTHEAST ALABAMA HOMECARE,True,True,True,True,True,False,5.0,...,85.8,79.0,87.9,91.5,87.2,80.6,16.9,11.9,2,2
4,AL,17018,KINDRED AT HOME,True,True,True,True,True,True,4.0,...,82.8,73.9,85.2,80.8,85.0,66.0,22.2,10.2,1,2
