# Value Based Care Performance for US States
### A SQLAlchemy/JavaScript project demonstrating ETL and visualization

Using performance data for End-Stage Renal Disease clinics reported for Payment Year 2019, this project demonstrates:
- Connecting to a SQLite database
- Extracting data with SQL
- Transforming with Pandas
- Loading the results into CSV files

View the data: https://data.medicare.gov/Dialysis-Facility-Compare/ESRD-QIP-Complete-QIP-Data-Payment-Year-2019/qt66-qjh7

<img src="static/images/ESRDdatasource.png">

## Dependencies

In [1]:
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy import inspect

## E X T R A C T

### Connect to Database
##### The database includes three key tables:
esrdqip_total_2019 (payment year 2019 scores from the End-Stage Renal Disease Quality Incentive Program, for every clinic that participated)<br>
states_latlon (coordinates for US states)<br>
zipcodes_latlon (coordinates for US zip codes)<br>

In [2]:
# First, we connect the engine to the database
engine = create_engine('sqlite:///db/esrdqip.sqlite')
engine

Engine(sqlite:///db/esrdqip.sqlite)

In [3]:
# Create the database connection...

connection = engine.connect()

# ...and query the data
result = connection.execute("select distinct State from esrdqip_total_2019")
for row in result:
    print("State:", row['State'])
connection.close()

State: AL
State: AK
State: AZ
State: AR
State: CA
State: CO
State: CT
State: DE
State: DC
State: FL
State: GA
State: HI
State: ID
State: IL
State: IN
State: IA
State: KS
State: KY
State: LA
State: ME
State: MD
State: MA
State: MI
State: MN
State: MS
State: MO
State: MT
State: NE
State: NV
State: NJ
State: NM
State: NY
State: NC
State: ND
State: OH
State: OK
State: OR
State: PA
State: PR
State: RI
State: SC
State: SD
State: TN
State: TX
State: UT
State: VT
State: VI
State: VA
State: WA
State: WV
State: WI
State: AS
State: GU
State: MP
State: NH
State: WY
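
SQLAlchemy 1.4 and later expect raw SQL strings to be wrapped in `text()`. The sketch below runs the same kind of distinct-state query in that style. It uses a small in-memory table as a stand-in for `db/esrdqip.sqlite`; the name `demo_engine` and the three sample rows are assumptions made for illustration.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite database standing in for db/esrdqip.sqlite (assumption)
demo_engine = create_engine("sqlite://")

# begin() opens a transaction and commits it when the block exits
with demo_engine.begin() as conn:
    conn.execute(text('CREATE TABLE esrdqip_total_2019 ("Facility Name" TEXT, "State" TEXT)'))
    conn.execute(
        text("INSERT INTO esrdqip_total_2019 VALUES (:name, :state)"),
        [
            {"name": "WALKER COUNTY DIALYSIS", "state": "AL"},
            {"name": "FMC PRICHARD", "state": "AL"},
            {"name": "FMC SOLDOTNA", "state": "AK"},
        ],
    )

# The context manager closes the connection when the block exits
with demo_engine.connect() as conn:
    result = conn.execute(text('SELECT DISTINCT "State" FROM esrdqip_total_2019 ORDER BY "State"'))
    states = [row[0] for row in result]

print(states)
```

Reading each row by position (`row[0]`) avoids relying on string-key row access, which differs between SQLAlchemy versions.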


In [4]:
# A MetaData object holds the Table objects that describe the database schema
metadata = MetaData()

# reflect() loads the definitions of the tables that already exist in the database the engine is connected to.
# (create_all() would only create tables defined on the metadata, and none are defined here.)
metadata.reflect(bind=engine)

# Using inspector, we can see the table structure and variable types.
inspector = inspect(engine)

In [5]:
# We can check out the tables in the database
inspector.get_table_names()

['esrdqip_total_2019', 'states_latlon', 'zipcodes_latlon']
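
The table names above can also be obtained by reflecting the schema into a `MetaData` object. The following is a minimal sketch, assuming an in-memory database with one stand-in table; `demo_engine` and `demo_metadata` are illustrative names, not part of the project.

```python
from sqlalchemy import MetaData, create_engine, text

# In-memory stand-in for the project database (assumption)
demo_engine = create_engine("sqlite://")
with demo_engine.begin() as conn:
    conn.execute(
        text('CREATE TABLE states_latlon ("Abbr" TEXT, "Latitude" REAL, "Longitude" REAL, "State" TEXT)')
    )

# reflect() reads the definitions of tables that already exist in the database
demo_metadata = MetaData()
demo_metadata.reflect(bind=demo_engine)

table_names = sorted(demo_metadata.tables)
reflected_columns = [column.name for column in demo_metadata.tables["states_latlon"].columns]
print(table_names, reflected_columns)
```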

In [6]:
# We can check out the columns in the table of ESRD scores
esrd_column_dict = inspector.get_columns('esrdqip_total_2019')

esrd_column_list = []
for column in esrd_column_dict:
    esrd_column_list.append(column['name'])
esrd_column_list

['Facility Name',
 'CMS Certification Number (CCN)',
 'Alternate CCN',
 'Address',
 'City',
 'State',
 'Zip Code',
 'Network',
 'VAT Catheter Measure Score',
 'VAT Catheter Reason for No Score (See Footnotes File)',
 'VAT Catheter Achievement Measure Rate',
 'Number of Patients Included in VAT Catheter Measure Score Achievement Period',
 'VAT Catheter Achievement Period Numerator',
 'VAT Catheter Achievement Period Denominator',
 'VAT Catheter Improvement Measure Rate',
 'VAT Catheter Improvement Period Numerator',
 'VAT Catheter Improvement Period Denominator',
 'VAT Catheter Measure Score Applied',
 'National Average VAT Catheter Measure Score',
 'VAT Fistula Measure Score',
 'VAT Fistula Reason for No Score (See Footnotes File)',
 'VAT Fistula Achievement Measure Rate',
 'Number of Patients Included in VAT Fistula Measure Score Achievement Period',
 'VAT Fistula Achievement Period Numerator',
 'VAT Fistula Achievement Period Denominator',
 'VAT Fistula Improvement Measure Rate',
 'V

In [7]:
# We can check out the columns in the table of state coordinates
states_column_dict = inspector.get_columns('states_latlon')

states_column_list = []
for column in states_column_dict:
    states_column_list.append(column['name'])
states_column_list

['Abbr', 'Latitude', 'Longitude', 'State']

In [8]:
# We can check out the columns in the table of zipcode coordinates
zips_column_dict = inspector.get_columns('zipcodes_latlon')

zips_column_list = []
for column in zips_column_dict:
    zips_column_list.append(column['name'])
zips_column_list

['Zipcode',
 'ZipCodeType',
 'City',
 'State',
 'LocationType',
 'Lat',
 'Long',
 'Location',
 'Decommisioned',
 'TaxReturnsFiled',
 'EstimatedPopulation',
 'TotalWages']
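
The three column listings above repeat the same loop, so they could be folded into one helper. This is a sketch under assumptions: `column_names_by_table` is a hypothetical helper name, and the two small tables are stand-ins created in memory.

```python
from sqlalchemy import create_engine, inspect, text


def column_names_by_table(engine):
    """Return {table name: [column names]} for every table the engine can see."""
    inspector = inspect(engine)
    return {
        table: [column["name"] for column in inspector.get_columns(table)]
        for table in inspector.get_table_names()
    }


# In-memory stand-in tables (assumption)
demo_engine = create_engine("sqlite://")
with demo_engine.begin() as conn:
    conn.execute(text('CREATE TABLE states_latlon ("Abbr" TEXT, "State" TEXT)'))
    conn.execute(text('CREATE TABLE zipcodes_latlon ("Zipcode" TEXT, "Lat" REAL, "Long" REAL)'))

columns = column_names_by_table(demo_engine)
print(columns)
```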

In [9]:
# Let's execute raw SQL on a table using SQLAlchemy
with engine.connect() as con:

    rs = con.execute('SELECT "Facility Name", "VAT Catheter Measure Score" FROM esrdqip_total_2019 LIMIT 10')

    for row in rs:
        print(row)

# The with block closes the connection automatically once the query is done.

('WALKER COUNTY DIALYSIS', '4')
('FMC PRICHARD', '8')
('OZARK DIALYSIS', '10')
('DCI GEORGIANA', '7')
('FMC THOMASVILLE', '4')
('FMC WEST', '8')
('PICKENS COUNTY DIALYSIS', '4')
('FRESENIUS MEDICAL CARE BIRMINGHAM HOME LLC', '7')
('BIO MEDICAL APPLICATIONS OF ALABAMA INC', 'No Score')
('FMC SOLDOTNA', '5')


## T R A N S F O R M

### Query the tables into dataframes for data transformation.

In [10]:
facility_totals_df = pd.read_sql_query("""
SELECT 
    esrdqip_total_2019."State",
    esrdqip_total_2019."Facility Name",    
    esrdqip_total_2019."Total Performance Score",
    esrdqip_total_2019."PY2019 Payment Reduction Percentage",
    esrdqip_total_2019."CMS Certification Date",
    states_latlon."Latitude" as StateLat,
    states_latlon."Longitude" as StateLon
FROM
    esrdqip_total_2019 JOIN states_latlon 
    ON esrdqip_total_2019."State" = states_latlon."Abbr"
""", con=engine.connect())

facility_totals_df.head()

| | State | Facility Name | Total Performance Score | PY2019 Payment Reduction Percentage | CMS Certification Date | StateLat | StateLon |
|---|---|---|---|---|---|---|---|
| 0 | AL | WALKER COUNTY DIALYSIS | 62 | No Reduction | 12/29/1987 | 32.6010112 | -86.6807365 |
| 1 | AL | FMC PRICHARD | 62 | No Reduction | 4/23/1990 | 32.6010112 | -86.6807365 |
| 2 | AL | OZARK DIALYSIS | 74 | No Reduction | 8/4/1992 | 32.6010112 | -86.6807365 |
| 3 | AL | DCI GEORGIANA | 83 | No Reduction | 12/31/1996 | 32.6010112 | -86.6807365 |
| 4 | AL | FMC THOMASVILLE | 64 | No Reduction | 9/15/1998 | 32.6010112 | -86.6807365 |


In [11]:
pay_reduct_bystate_df = pd.read_sql_query("""
SELECT 
    esrdqip_total_2019."State",
    esrdqip_total_2019."PY2019 Payment Reduction Percentage",
    COUNT(esrdqip_total_2019."Facility Name") as FacilityCount
FROM
    esrdqip_total_2019
GROUP BY 
    esrdqip_total_2019."State",
    esrdqip_total_2019."PY2019 Payment Reduction Percentage"
""", con=engine.connect())

pay_reduct_bystate_df.head()

| | State | PY2019 Payment Reduction Percentage | FacilityCount |
|---|---|---|---|
| 0 | AK | No Reduction | 9 |
| 1 | AL | 0.50% | 32 |
| 2 | AL | 1.00% | 8 |
| 3 | AL | 1.50% | 5 |
| 4 | AL | No Reduction | 131 |
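
The long format above has one row per state and reduction level. For charting, a wide layout with one row per state can be easier to work with. The sketch below pivots the five preview rows; the names `counts`, `wide`, and `reduced_share` are illustrative and not part of the notebook.

```python
import pandas as pd

# Rows copied from the preview above
counts = pd.DataFrame({
    "State": ["AK", "AL", "AL", "AL", "AL"],
    "PY2019 Payment Reduction Percentage": ["No Reduction", "0.50%", "1.00%", "1.50%", "No Reduction"],
    "FacilityCount": [9, 32, 8, 5, 131],
})

# One row per state, one column per reduction level; missing combinations become 0
wide = (
    counts.pivot(index="State", columns="PY2019 Payment Reduction Percentage", values="FacilityCount")
    .fillna(0)
    .astype(int)
)

# Share of facilities in each state that received any payment reduction
reduced_share = 1 - wide["No Reduction"] / wide.sum(axis=1)
print(reduced_share.round(3))
```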


In [12]:
esrd_df = pd.read_sql_query("""
SELECT 
    esrdqip_total_2019."Facility Name",
    esrdqip_total_2019."CMS Certification Number (CCN)",
    esrdqip_total_2019."City",
    esrdqip_total_2019."State",
    esrdqip_total_2019."Zip Code",
    esrdqip_total_2019."VAT Catheter Measure Score",
    esrdqip_total_2019."National Average VAT Catheter Measure Score",
    esrdqip_total_2019."VAT Fistula Measure Score",
    esrdqip_total_2019."National Average Fistula Measure Score",
    esrdqip_total_2019."Vascular Access Combined Measure Score",
    esrdqip_total_2019."National Average Vascular Access Combined Measure Score",
    esrdqip_total_2019."Kt/V Comprehensive Measure Score",
    esrdqip_total_2019."National Average Kt/V Comprehensive Measure Score",
    esrdqip_total_2019."Hypercalcemia Measure Score",
    esrdqip_total_2019."National Average Hypercalcemia Measure Score",
    esrdqip_total_2019."NHSN Influenza Measure Score",
    esrdqip_total_2019."National Average NHSN Influenza Measure Score",
    esrdqip_total_2019."NHSN BSI Measure Score",
    esrdqip_total_2019."National Average NHSN BSI Measure Score",
    esrdqip_total_2019."NHSN Combined Measure Score",
    esrdqip_total_2019."National Average NHSN Combined Measure Score",
    esrdqip_total_2019."ICH CAHPS Measure Score",
    esrdqip_total_2019."National Average ICH CAHPS Measure Score",
    esrdqip_total_2019."Mineral Metabolism Measure Score",
    esrdqip_total_2019."National Avg Mineral Metabolism Measure Score",
    esrdqip_total_2019."Anemia Management Measure Score",
    esrdqip_total_2019."National Average Anemia Management Measure Score",
    esrdqip_total_2019."Standardized Readmission Ratio (SRR) Measure Score",
    esrdqip_total_2019."National Average SRR Measure Score",
    esrdqip_total_2019."Standardized Transfusion Ratio (STrR) Measure Score",
    esrdqip_total_2019."National Average STrR Measure Score",
    esrdqip_total_2019."Clinical Depression Screening and Follow-up Measure Score",
    esrdqip_total_2019."National Average Clinical Depression Screening and Follow-up Measure Score",
    esrdqip_total_2019."Pain Assessment and Follow-up Measure Score",
    esrdqip_total_2019."National Average Pain Assessment and Follow-up Measure Score",
    esrdqip_total_2019."Total Performance Score",
    esrdqip_total_2019."PY2019 Payment Reduction Percentage",
    esrdqip_total_2019."CMS Certification Date"
FROM
    esrdqip_total_2019
""", con=engine.connect())

# Mask every 'No Score' entry as NaN, then drop any row that contains a missing value
esrd_df = esrd_df[esrd_df != 'No Score'].dropna()

# Cast scores as integers for statistical calculations.
# Every score column selected above ends in "Measure Score", except the total.
score_cols = [col for col in esrd_df.columns if col.endswith('Measure Score')]
score_cols.append('Total Performance Score')
esrd_df[score_cols] = esrd_df[score_cols].astype('int64')

# Preview the dataframe
esrd_df.head()

| | Facility Name | CMS Certification Number (CCN) | City | State | Zip Code | VAT Catheter Measure Score | National Average VAT Catheter Measure Score | VAT Fistula Measure Score | National Average Fistula Measure Score | Vascular Access Combined Measure Score | ... | National Average SRR Measure Score | Standardized Transfusion Ratio (STrR) Measure Score | National Average STrR Measure Score | Clinical Depression Screening and Follow-up Measure Score | National Average Clinical Depression Screening and Follow-up Measure Score | Pain Assessment and Follow-up Measure Score | National Average Pain Assessment and Follow-up Measure Score | Total Performance Score | PY2019 Payment Reduction Percentage | CMS Certification Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | FMC PRICHARD | 12537 | WHISTLER | AL | 36613 | 8 | 5 | 7 | 5 | 8 | ... | 5 | 2 | 5 | 10 | 10 | 10 | 10 | 62 | No Reduction | 4/23/1990 |
| 2 | OZARK DIALYSIS | 12544 | OZARK | AL | 36360 | 10 | 5 | 7 | 5 | 9 | ... | 5 | 7 | 5 | 10 | 10 | 10 | 10 | 74 | No Reduction | 8/4/1992 |
| 5 | FMC WEST | 12601 | BIRMINGHAM | AL | 35211 | 8 | 5 | 1 | 5 | 5 | ... | 5 | 4 | 5 | 10 | 10 | 10 | 10 | 57 | 0.50% | 3/27/2001 |
| 6 | PICKENS COUNTY DIALYSIS | 12640 | CARROLLTON | AL | 35447 | 4 | 5 | 0 | 5 | 2 | ... | 5 | 3 | 5 | 10 | 10 | 10 | 10 | 66 | No Reduction | 1/5/2011 |
| 10 | 031308 GILA RIVER DIALYSIS EAST | 32315 | SACATON | AZ | 85247 | 8 | 5 | 10 | 5 | 9 | ... | 5 | 10 | 5 | 10 | 10 | 10 | 10 | 76 | No Reduction | 1/4/2006 |
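
Another way to handle the `'No Score'` strings is to let pandas coerce them to missing values while converting to numbers. The sketch below is an alternative under assumptions, not the notebook's method; it reuses three rows from the raw query output shown earlier, and `scores` and `clean` are illustrative names.

```python
import pandas as pd

# Values copied from the raw SQL output shown earlier in this document
scores = pd.DataFrame({
    "Facility Name": [
        "WALKER COUNTY DIALYSIS",
        "FMC PRICHARD",
        "BIO MEDICAL APPLICATIONS OF ALABAMA INC",
    ],
    "VAT Catheter Measure Score": ["4", "8", "No Score"],
})

score_cols = [col for col in scores.columns if col.endswith("Measure Score")]

# errors="coerce" turns anything non-numeric (here "No Score") into NaN
scores[score_cols] = scores[score_cols].apply(pd.to_numeric, errors="coerce")

# Drop facilities with a missing score, then store the rest as integers
clean = scores.dropna(subset=score_cols).astype({col: "int64" for col in score_cols})
print(clean)
```

Unlike comparing the whole frame against `'No Score'`, this only touches the score columns, so a missing value in an unrelated column would not remove a facility.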


In [13]:
# Create dataframe with states (spelled out)

states_df = pd.read_sql_query("""
SELECT * FROM states_latlon
""", con=engine.connect())

states_df.head()

| | Abbr | Latitude | Longitude | State |
|---|---|---|---|---|
| 0 | AL | 32.6010112 | -86.6807365 | Alabama |
| 1 | AK | 61.3025006 | -158.7750198 | Alaska |
| 2 | AZ | 34.1682185 | -111.930907 | Arizona |
| 3 | AR | 34.7519275 | -92.1313784 | Arkansas |
| 4 | CA | 37.2718745 | -119.2704153 | California |


In [14]:
# Create a dataframe of zip code location data (useful for map visualizations)

zips_df = pd.read_sql_query("""
SELECT
    zipcodes_latlon."Zipcode",
    zipcodes_latlon."Lat",
    zipcodes_latlon."Long"
FROM
    zipcodes_latlon
""", con=engine.connect())

zips_df.head()

| | Zipcode | Lat | Long |
|---|---|---|---|
| 0 | 99950 | 55.34 | -131.64 |
| 1 | 99929 | 55.95 | -131.96 |
| 2 | 99928 | 55.45 | -131.79 |
| 3 | 99927 | 56.3 | -133.57 |
| 4 | 99926 | 55.14 | -131.49 |


In [15]:
# Merge the dataframes to build one complete dataframe with all of the useful information

# Merge esrd scores with zip code data
merge1_df = pd.merge(esrd_df, zips_df, how= "inner", left_on= "Zip Code", right_on= "Zipcode")\
    .drop(columns="Zipcode")
merge1_df.head()

# Merge in the state names spelled out
merge2_df = pd.merge(merge1_df, states_df, how= "inner", left_on= "State", right_on= "Abbr")\
    .drop(columns=["Abbr", "Latitude", "Longitude"])\
    .rename(columns= {"State_x" : "State Abbr", "State_y" : "state"})
merge2_df.head()

# Select and reorder the desired columns
complete_df = merge2_df[['Facility Name',
'CMS Certification Number (CCN)',
'City',
'state',
'State Abbr',
'Zip Code',
'Lat',
'Long',
'VAT Catheter Measure Score',
'National Average VAT Catheter Measure Score',
'VAT Fistula Measure Score',
'National Average Fistula Measure Score',
'Vascular Access Combined Measure Score',
'National Average Vascular Access Combined Measure Score',
'Kt/V Comprehensive Measure Score',
'National Average Kt/V Comprehensive Measure Score',
'Hypercalcemia Measure Score',
'National Average Hypercalcemia Measure Score',
'NHSN Influenza Measure Score',
'National Average NHSN Influenza Measure Score',
'NHSN BSI Measure Score',
'National Average NHSN BSI Measure Score',
'NHSN Combined Measure Score',
'National Average NHSN Combined Measure Score',
'ICH CAHPS Measure Score',
'National Average ICH CAHPS Measure Score',
'Mineral Metabolism Measure Score',
'National Avg Mineral Metabolism Measure Score',
'Anemia Management Measure Score',
'National Average Anemia Management Measure Score',
'Standardized Readmission Ratio (SRR) Measure Score',
'National Average SRR Measure Score',
'Standardized Transfusion Ratio (STrR) Measure Score',
'National Average STrR Measure Score',
'Clinical Depression Screening and Follow-up Measure Score',
'National Average Clinical Depression Screening and Follow-up Measure Score',
'Pain Assessment and Follow-up Measure Score',
'National Average Pain Assessment and Follow-up Measure Score',
'Total Performance Score',
'PY2019 Payment Reduction Percentage',
'CMS Certification Date']]

# Preview the data
complete_df.head()

| | Facility Name | CMS Certification Number (CCN) | City | state | State Abbr | Zip Code | Lat | Long | VAT Catheter Measure Score | National Average VAT Catheter Measure Score | ... | National Average SRR Measure Score | Standardized Transfusion Ratio (STrR) Measure Score | National Average STrR Measure Score | Clinical Depression Screening and Follow-up Measure Score | National Average Clinical Depression Screening and Follow-up Measure Score | Pain Assessment and Follow-up Measure Score | National Average Pain Assessment and Follow-up Measure Score | Total Performance Score | PY2019 Payment Reduction Percentage | CMS Certification Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FMC PRICHARD | 12537 | WHISTLER | Alabama | AL | 36613 | 30.76 | -88.12 | 8 | 5 | ... | 5 | 2 | 5 | 10 | 10 | 10 | 10 | 62 | No Reduction | 4/23/1990 |
| 1 | OZARK DIALYSIS | 12544 | OZARK | Alabama | AL | 36360 | 31.43 | -85.64 | 10 | 5 | ... | 5 | 7 | 5 | 10 | 10 | 10 | 10 | 74 | No Reduction | 8/4/1992 |
| 2 | FMC WEST | 12601 | BIRMINGHAM | Alabama | AL | 35211 | 33.52 | -86.79 | 8 | 5 | ... | 5 | 4 | 5 | 10 | 10 | 10 | 10 | 57 | 0.50% | 3/27/2001 |
| 3 | RCG PRINCETON | 12526 | BIRMINGHAM | Alabama | AL | 35211 | 33.52 | -86.79 | 8 | 5 | ... | 5 | 4 | 5 | 10 | 10 | 10 | 10 | 48 | 1.00% | 9/18/1985 |
| 4 | PICKENS COUNTY DIALYSIS | 12640 | CARROLLTON | Alabama | AL | 35447 | 33.26 | -88.09 | 4 | 5 | ... | 5 | 3 | 5 | 10 | 10 | 10 | 10 | 66 | No Reduction | 1/5/2011 |
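
An inner merge silently drops facilities whose zip code has no match in the coordinates table. One way to check for such losses is a left merge with `indicator=True`, sketched below. The third facility row is made up for illustration, and `facilities`, `zips`, and `checked` are assumed names.

```python
import pandas as pd

facilities = pd.DataFrame({
    # The first two rows come from the preview above; the last row is made up
    "Facility Name": ["FMC PRICHARD", "OZARK DIALYSIS", "EXAMPLE CLINIC"],
    "Zip Code": ["36613", "36360", "00000"],
})
zips = pd.DataFrame({
    "Zipcode": ["36613", "36360"],
    "Lat": [30.76, 31.43],
    "Long": [-88.12, -85.64],
})

# how="left" keeps every facility; indicator=True adds a _merge column saying where each row matched.
# validate="many_to_one" raises an error if a zip code appears more than once in the lookup table.
checked = facilities.merge(
    zips,
    how="left",
    left_on="Zip Code",
    right_on="Zipcode",
    indicator=True,
    validate="many_to_one",
)

unmatched = checked.loc[checked["_merge"] == "left_only", "Facility Name"].tolist()
print(unmatched)
```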


In [16]:
# Work on an explicit copy so that adding a column does not raise SettingWithCopyWarning
complete_df = complete_df.copy()

# Facility age as of the end of 2018 (a Timedelta, displayed in days)
complete_df["FacilityAge"] = pd.to_datetime('12/31/2018') - pd.to_datetime(complete_df["CMS Certification Date"])
complete_df.head()


| | Facility Name | CMS Certification Number (CCN) | City | state | State Abbr | Zip Code | Lat | Long | VAT Catheter Measure Score | National Average VAT Catheter Measure Score | ... | Standardized Transfusion Ratio (STrR) Measure Score | National Average STrR Measure Score | Clinical Depression Screening and Follow-up Measure Score | National Average Clinical Depression Screening and Follow-up Measure Score | Pain Assessment and Follow-up Measure Score | National Average Pain Assessment and Follow-up Measure Score | Total Performance Score | PY2019 Payment Reduction Percentage | CMS Certification Date | FacilityAge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FMC PRICHARD | 12537 | WHISTLER | Alabama | AL | 36613 | 30.76 | -88.12 | 8 | 5 | ... | 2 | 5 | 10 | 10 | 10 | 10 | 62 | No Reduction | 4/23/1990 | 10479 days |
| 1 | OZARK DIALYSIS | 12544 | OZARK | Alabama | AL | 36360 | 31.43 | -85.64 | 10 | 5 | ... | 7 | 5 | 10 | 10 | 10 | 10 | 74 | No Reduction | 8/4/1992 | 9645 days |
| 2 | FMC WEST | 12601 | BIRMINGHAM | Alabama | AL | 35211 | 33.52 | -86.79 | 8 | 5 | ... | 4 | 5 | 10 | 10 | 10 | 10 | 57 | 0.50% | 3/27/2001 | 6488 days |
| 3 | RCG PRINCETON | 12526 | BIRMINGHAM | Alabama | AL | 35211 | 33.52 | -86.79 | 8 | 5 | ... | 4 | 5 | 10 | 10 | 10 | 10 | 48 | 1.00% | 9/18/1985 | 12157 days |
| 4 | PICKENS COUNTY DIALYSIS | 12640 | CARROLLTON | Alabama | AL | 35447 | 33.26 | -88.09 | 4 | 5 | ... | 3 | 5 | 10 | 10 | 10 | 10 | 66 | No Reduction | 1/5/2011 | 2917 days |
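
`FacilityAge` is stored as a Timedelta, which is why it prints as a number of days. If a plain number is more convenient downstream, the `.dt.days` accessor extracts it. The sketch below uses three certification dates from the table above; dividing by 365.25 to approximate years is an assumption made here, not something the notebook does.

```python
import pandas as pd

# Certification dates copied from the preview above
cert_dates = pd.Series(["4/23/1990", "8/4/1992", "1/5/2011"], name="CMS Certification Date")

# Age as of December 31, 2018, matching the reference date used in the notebook
age = pd.Timestamp("2018-12-31") - pd.to_datetime(cert_dates, format="%m/%d/%Y")

# Whole days as integers, and an approximate age in years
age_days = age.dt.days
age_years = (age_days / 365.25).round(1)

print(age_days.tolist())
```

An explicit `format` avoids any ambiguity between month-first and day-first parsing.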


In [17]:
# Create a State Averages Dataframe

avg_state_scores_df = round(complete_df[['state',
'VAT Catheter Measure Score',
'National Average VAT Catheter Measure Score',
'VAT Fistula Measure Score',
'National Average Fistula Measure Score',
'Vascular Access Combined Measure Score',
'National Average Vascular Access Combined Measure Score',
'Kt/V Comprehensive Measure Score',
'National Average Kt/V Comprehensive Measure Score',
'Hypercalcemia Measure Score',
'National Average Hypercalcemia Measure Score',
'NHSN Influenza Measure Score',
'National Average NHSN Influenza Measure Score',
'NHSN BSI Measure Score',
'National Average NHSN BSI Measure Score',
'NHSN Combined Measure Score',
'National Average NHSN Combined Measure Score',
'ICH CAHPS Measure Score',
'National Average ICH CAHPS Measure Score',
'Mineral Metabolism Measure Score',
'National Avg Mineral Metabolism Measure Score',
'Anemia Management Measure Score',
'National Average Anemia Management Measure Score',
'Standardized Readmission Ratio (SRR) Measure Score',
'National Average SRR Measure Score',
'Standardized Transfusion Ratio (STrR) Measure Score',
'National Average STrR Measure Score',
'Clinical Depression Screening and Follow-up Measure Score',
'National Average Clinical Depression Screening and Follow-up Measure Score',
'Pain Assessment and Follow-up Measure Score',
'National Average Pain Assessment and Follow-up Measure Score',
'Total Performance Score']].groupby("state").mean(), 2)

avg_state_scores_df.head()

| state | VAT Catheter Measure Score | National Average VAT Catheter Measure Score | VAT Fistula Measure Score | National Average Fistula Measure Score | Vascular Access Combined Measure Score | National Average Vascular Access Combined Measure Score | Kt/V Comprehensive Measure Score | National Average Kt/V Comprehensive Measure Score | Hypercalcemia Measure Score | National Average Hypercalcemia Measure Score | ... | National Average Anemia Management Measure Score | Standardized Readmission Ratio (SRR) Measure Score | National Average SRR Measure Score | Standardized Transfusion Ratio (STrR) Measure Score | National Average STrR Measure Score | Clinical Depression Screening and Follow-up Measure Score | National Average Clinical Depression Screening and Follow-up Measure Score | Pain Assessment and Follow-up Measure Score | National Average Pain Assessment and Follow-up Measure Score | Total Performance Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alabama | 6.74 | 5.0 | 2.64 | 5.0 | 4.89 | 5.0 | 7.95 | 8.0 | 8.33 | 8.0 | ... | 10.0 | 4.7 | 5.0 | 4.62 | 5.0 | 9.84 | 10.0 | 9.92 | 10.0 | 63.1 |
| Alaska | 3.5 | 5.0 | 6.5 | 5.0 | 4.5 | 5.0 | 7.5 | 8.0 | 8.0 | 8.0 | ... | 10.0 | 6.0 | 5.0 | 6.5 | 5.0 | 10.0 | 10.0 | 10.0 | 10.0 | 73.5 |
| Arizona | 6.71 | 5.0 | 7.64 | 5.0 | 7.11 | 5.0 | 8.44 | 8.0 | 8.45 | 8.0 | ... | 10.0 | 5.2 | 5.0 | 6.59 | 5.0 | 10.0 | 10.0 | 9.98 | 10.0 | 70.08 |
| Arkansas | 4.75 | 5.0 | 3.88 | 5.0 | 4.38 | 5.0 | 8.53 | 8.0 | 7.91 | 8.0 | ... | 10.0 | 4.66 | 5.0 | 3.44 | 5.0 | 10.0 | 10.0 | 10.0 | 10.0 | 60.75 |
| California | 5.48 | 5.0 | 5.77 | 5.0 | 5.6 | 5.0 | 7.95 | 8.0 | 8.24 | 8.0 | ... | 10.0 | 4.77 | 5.0 | 5.81 | 5.0 | 9.95 | 10.0 | 9.94 | 10.0 | 67.06 |
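
A state average says little on its own when only a few facilities contribute to it, so it can help to report the facility count next to the mean. The sketch below does this for four Total Performance Scores taken from the previews above; `sample` and `summary` are illustrative names.

```python
import pandas as pd

# Total Performance Scores copied from the facility previews above
sample = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alabama", "Arizona"],
    "Total Performance Score": [62, 74, 57, 76],
})

# Named aggregation: one output column for the mean, one for the number of facilities
summary = (
    sample.groupby("state")["Total Performance Score"]
    .agg(mean_score="mean", facilities="count")
    .round(2)
)
print(summary)
```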


In [18]:
# Create a City Averages Dataframe

avg_city_scores_df = round(complete_df[['City',
'VAT Catheter Measure Score',
'National Average VAT Catheter Measure Score',
'VAT Fistula Measure Score',
'National Average Fistula Measure Score',
'Vascular Access Combined Measure Score',
'National Average Vascular Access Combined Measure Score',
'Kt/V Comprehensive Measure Score',
'National Average Kt/V Comprehensive Measure Score',
'Hypercalcemia Measure Score',
'National Average Hypercalcemia Measure Score',
'NHSN Influenza Measure Score',
'National Average NHSN Influenza Measure Score',
'NHSN BSI Measure Score',
'National Average NHSN BSI Measure Score',
'NHSN Combined Measure Score',
'National Average NHSN Combined Measure Score',
'ICH CAHPS Measure Score',
'National Average ICH CAHPS Measure Score',
'Mineral Metabolism Measure Score',
'National Avg Mineral Metabolism Measure Score',
'Anemia Management Measure Score',
'National Average Anemia Management Measure Score',
'Standardized Readmission Ratio (SRR) Measure Score',
'National Average SRR Measure Score',
'Standardized Transfusion Ratio (STrR) Measure Score',
'National Average STrR Measure Score',
'Clinical Depression Screening and Follow-up Measure Score',
'National Average Clinical Depression Screening and Follow-up Measure Score',
'Pain Assessment and Follow-up Measure Score',
'National Average Pain Assessment and Follow-up Measure Score',
'Total Performance Score']].groupby("City").mean(), 2)

avg_city_scores_df.head()

| City | VAT Catheter Measure Score | National Average VAT Catheter Measure Score | VAT Fistula Measure Score | National Average Fistula Measure Score | Vascular Access Combined Measure Score | National Average Vascular Access Combined Measure Score | Kt/V Comprehensive Measure Score | National Average Kt/V Comprehensive Measure Score | Hypercalcemia Measure Score | National Average Hypercalcemia Measure Score | ... | National Average Anemia Management Measure Score | Standardized Readmission Ratio (SRR) Measure Score | National Average SRR Measure Score | Standardized Transfusion Ratio (STrR) Measure Score | National Average STrR Measure Score | Clinical Depression Screening and Follow-up Measure Score | National Average Clinical Depression Screening and Follow-up Measure Score | Pain Assessment and Follow-up Measure Score | National Average Pain Assessment and Follow-up Measure Score | Total Performance Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ABBEVILLE | 10.0 | 5.0 | 4.0 | 5.0 | 7.0 | 5.0 | 10.0 | 8.0 | 10.0 | 8.0 | ... | 10.0 | 7.0 | 5.0 | 10.0 | 5.0 | 10.0 | 10.0 | 10.0 | 10.0 | 82.0 |
| ABERDEEN | 8.0 | 5.0 | 8.33 | 5.0 | 8.0 | 5.0 | 9.67 | 8.0 | 9.67 | 8.0 | ... | 10.0 | 5.0 | 5.0 | 5.33 | 5.0 | 10.0 | 10.0 | 10.0 | 10.0 | 73.0 |
| ABILENE | 4.67 | 5.0 | 6.67 | 5.0 | 5.67 | 5.0 | 8.67 | 8.0 | 9.67 | 8.0 | ... | 10.0 | 4.67 | 5.0 | 1.33 | 5.0 | 10.0 | 10.0 | 10.0 | 10.0 | 67.0 |
| ABINGDON | 4.0 | 5.0 | 7.0 | 5.0 | 5.0 | 5.0 | 9.0 | 8.0 | 10.0 | 8.0 | ... | 10.0 | 8.0 | 5.0 | 8.0 | 5.0 | 10.0 | 10.0 | 10.0 | 10.0 | 84.0 |
| ADEL | 10.0 | 5.0 | 6.0 | 5.0 | 8.0 | 5.0 | 10.0 | 8.0 | 9.0 | 8.0 | ... | 10.0 | 7.0 | 5.0 | 8.0 | 5.0 | 10.0 | 10.0 | 10.0 | 10.0 | 84.0 |


## L O A D

In [19]:
# Load the cleaned dataframes into CSV files. These will be used to build interactive visualizations (D3).

complete_df.to_csv("./db/csv/complete.csv")
avg_state_scores_df.to_csv("./db/csv/avgstatescores.csv")
avg_city_scores_df.to_csv("./db/csv/avgcityscores.csv")
pay_reduct_bystate_df.to_csv("./db/csv/payreductbystate.csv")
facility_totals_df.to_csv("./db/csv/facilitytotals.csv")
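
`to_csv` writes the row index as an unnamed first column by default, and it fails if the target folder does not exist. The sketch below shows one way to handle both, writing to a temporary directory so it can run anywhere. `demo_df` and its two rows are illustrative; the `db/csv` layout mirrors the paths used above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small illustrative frame standing in for one of the cleaned dataframes
demo_df = pd.DataFrame({"State": ["AL", "AK"], "FacilityCount": [176, 9]})

with tempfile.TemporaryDirectory() as tmp:
    # Create the output folder first so that to_csv has somewhere to write
    out_dir = Path(tmp) / "db" / "csv"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "payreductbystate.csv"

    # index=False leaves the meaningless 0..n-1 row index out of the file
    demo_df.to_csv(out_path, index=False)
    round_trip = pd.read_csv(out_path)

print(round_trip.columns.tolist())
```

For the grouped averages, where the index holds the state or city name, the default `index=True` is the right choice because the index carries real data.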