# Creator: Subhash Nair, San Diego, CA - https://www.linkedin.com/in/nairsubhash/
# IBM data science professional certificate Capstone project work (January 2020)

### Goal:
### 1. K-Means clustering of zip codes on a California map based on the most common causes of death from the past ~17 years.
### 2. Use this notebook to find the most common causes of death in your California zip code, or any other information that you may seek from this dataset.
-----------------------------------------------------------------------------------------------------------------


### Raw data sources:
### 1. California Leading Causes of Death by ZIP Code - https://healthdata.gov/dataset/leading-causes-death-zip-code
### 2. California zip code, latitude and longitude data - https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/export/?refine.state=CA
-------------------------------------------------------------------------------------------------------------------------


### Obtain California Leading Causes of Death by ZIP Code from https://healthdata.gov/dataset/leading-causes-death-zip-code
### The code in the cell below contains sensitive account information, which Watson Studio removes on publication to GitHub

In [69]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Year,ZIP Code,Causes of Death,Count
0,1999,90001,ALZ,3
1,1999,90001,CAN,53
2,1999,90001,CLD,16
3,1999,90001,DIA,16
4,1999,90001,HOM,11
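Since the original loading cell is hidden, here is a minimal sketch of how the file might be read once downloaded. The in-memory CSV is a stand-in for the real file, and the third row (with a 4-digit zip code) is made up for illustration. Reading the zip code column as a string is an assumption that makes malformed codes easy to spot later.

```python
import io
import pandas as pd

# Stand-in for the downloaded CSV; a real run would pass the file path instead.
# The first two rows match the output above, the third row is illustrative only.
sample_csv = io.StringIO(
    "Year,ZIP Code,Causes of Death,Count\n"
    "1999,90001,ALZ,3\n"
    "1999,90001,CAN,53\n"
    "1999,9001,CLD,16\n"
)

# Read zip codes as strings so short or malformed codes are preserved as typed.
dfD_sample = pd.read_csv(sample_csv, dtype={"ZIP Code": str})
print(dfD_sample.dtypes)
```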


### Please refer to the following long descriptions of the 3-character codes for the causes of death.
##### HTD Diseases of the Heart 
##### CAN Malignant Neoplasms (Cancers)
##### STK Cerebrovascular Disease (Stroke)
##### CLD Chronic Lower Respiratory Disease (CLRD)
##### INJ Unintentional Injuries
##### PNF Pneumonia and Influenza
##### DIA Diabetes Mellitus
##### ALZ Alzheimer's Disease
##### LIV Chronic Liver Disease and Cirrhosis
##### SUI Intentional Self Harm (Suicide)
##### HYP Essential Hypertension and Hypertensive Renal Disease
##### HOM Homicide
##### NEP Nephritis, Nephrotic Syndrome and Nephrosis
##### CPD Chronic pulmonary disease
##### OTH All Other Causes of Death
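The code list above can also be kept as a lookup table so that outputs show readable names. This is a sketch only: `CAUSE_LABELS` and the `CauseLabel` column are names introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Code-to-description lookup, copied from the list above.
CAUSE_LABELS = {
    "HTD": "Diseases of the Heart",
    "CAN": "Malignant Neoplasms (Cancers)",
    "STK": "Cerebrovascular Disease (Stroke)",
    "CLD": "Chronic Lower Respiratory Disease (CLRD)",
    "INJ": "Unintentional Injuries",
    "PNF": "Pneumonia and Influenza",
    "DIA": "Diabetes Mellitus",
    "ALZ": "Alzheimer's Disease",
    "LIV": "Chronic Liver Disease and Cirrhosis",
    "SUI": "Intentional Self Harm (Suicide)",
    "HYP": "Essential Hypertension and Hypertensive Renal Disease",
    "HOM": "Homicide",
    "NEP": "Nephritis, Nephrotic Syndrome and Nephrosis",
    "CPD": "Chronic pulmonary disease",
    "OTH": "All Other Causes of Death",
}

# Small sample using the raw column name from the dataset.
sample = pd.DataFrame({"Causes of Death": ["ALZ", "CAN", "CLD"], "Count": [3, 53, 16]})

# Add a readable label next to each 3-character code.
sample["CauseLabel"] = sample["Causes of Death"].map(CAUSE_LABELS)
print(sample)
```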

In [70]:
#Change column names
dfD = dfD.rename({'ZIP Code':'ZipCode', 'Causes of Death':'CausesofDeath'}, axis=1)
dfD.head()

Unnamed: 0,Year,ZipCode,CausesofDeath,Count
0,1999,90001,ALZ,3
1,1999,90001,CAN,53
2,1999,90001,CLD,16
3,1999,90001,DIA,16
4,1999,90001,HOM,11


#### Eliminate bad zip code data (4-digit zip codes in the raw dataset), convert column datatypes, etc.

In [71]:
dfD['ZipCode'] = dfD['ZipCode'].apply(str)
dfD.dtypes

Year              int64
ZipCode          object
CausesofDeath    object
Count             int64
dtype: object

In [72]:
dfD = dfD[~(dfD.ZipCode.str.len() < 5)]
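The length check above keeps any string of five or more characters. A stricter variant, sketched below on made-up values, keeps only strings that are exactly five digits. Whether the raw data contains other malformed codes is not known from this notebook, so treat this as an optional alternative.

```python
import pandas as pd

# Illustrative values only: two valid codes, one 4-digit code, one with a letter.
zips = pd.Series(["90001", "9001", "92127", "9021A"], name="ZipCode")

# True only where the whole string is exactly five digits.
is_valid = zips.str.fullmatch(r"\d{5}")
valid_zips = zips[is_valid]
print(valid_zips.tolist())
```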

In [73]:
dfD.groupby(['ZipCode','CausesofDeath']).Count.sum().head()

ZipCode  CausesofDeath
90001    ALZ               78
         CAN              892
         CLD              136
         DIA              235
         HOM              147
Name: Count, dtype: int64

#### California zip code, latitude and longitude data obtained from https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/export/?refine.state=CA
#### The code in the cell below contains sensitive account information, which Watson Studio removes on publication to GitHub

In [74]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Zip,Latitude,Longitude
0,92232,33.026203,-115.284581
1,93227,36.357151,-119.425371
2,93234,36.209815,-120.0847
3,93529,37.765218,-119.07769
4,93761,36.746375,-119.639658


#### Rename columns, convert column datatypes, etc.

In [75]:
df = df.rename({'Zip':'ZipCode'}, axis=1)


In [76]:
df['ZipCode'] = df['ZipCode'].apply(str)
df.head()

Unnamed: 0,ZipCode,Latitude,Longitude
0,92232,33.026203,-115.284581
1,93227,36.357151,-119.425371
2,93234,36.209815,-120.0847
3,93529,37.765218,-119.07769
4,93761,36.746375,-119.639658


#### Merge the two datasets on zip code

In [77]:
dfD = pd.merge(df, dfD, how='outer', on=['ZipCode'])
dfD.reset_index(drop=True, inplace=True)
dfD.head()

Unnamed: 0,ZipCode,Latitude,Longitude,Year,CausesofDeath,Count
0,92232,33.026203,-115.284581,2016.0,HTD,0.0
1,92232,33.026203,-115.284581,2016.0,CAN,0.0
2,92232,33.026203,-115.284581,2016.0,STK,0.0
3,92232,33.026203,-115.284581,2016.0,CLD,0.0
4,92232,33.026203,-115.284581,2016.0,ALZ,1.0


#### The CA health dataset includes zip codes that do not exist, so eliminate merged rows that have no latitude and longitude. Also eliminate zip codes from the coordinates dataset that have no death records (missing Year).
#### Eliminate rows where the death count is zero.

In [78]:
import numpy as np
dfD = dfD[np.isfinite(dfD['Year'])]
dfD = dfD[np.isfinite(dfD['Latitude'])]
dfD.shape

(396783, 6)
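An outer merge followed by dropping rows with a missing `Year` or `Latitude` keeps the same rows as an inner merge on `ZipCode`. The sketch below uses toy frames (the unmatched rows are made up) to show both routes. The `indicator=True` option adds a `_merge` column that reports which side each row came from, which can help when checking how many zip codes fail to match.

```python
import pandas as pd

# Toy coordinate and death tables; only 92232 appears in both.
coords = pd.DataFrame({
    "ZipCode": ["92232", "93227"],
    "Latitude": [33.026203, 36.357151],
    "Longitude": [-115.284581, -119.425371],
})
deaths = pd.DataFrame({
    "ZipCode": ["92232", "90001"],
    "Year": [2016, 1999],
    "CausesofDeath": ["ALZ", "CAN"],
    "Count": [1, 53],
})

# Outer merge with an indicator column to inspect unmatched rows.
outer = pd.merge(coords, deaths, how="outer", on="ZipCode", indicator=True)
print(outer["_merge"].value_counts())

# Inner merge keeps only matched rows, so Year and Count stay integers.
inner = pd.merge(coords, deaths, how="inner", on="ZipCode")
print(inner.dtypes)
```

With the inner merge there are no missing values to introduce floats, so the later integer conversions of `Year` and `Count` would not be needed.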

In [79]:
dfD['Year'] = dfD['Year'].apply(int)
dfD['Count'] = dfD['Count'].apply(int)
dfD['CausesofDeath'] = dfD['CausesofDeath'].apply(str)


In [80]:
dfD = dfD[dfD['Count'] > 0]


#### Obtain the causes of death in your California zip code

In [81]:
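# Note: Jupyter displays only the last expression in a cell, so the output below is dfD.head(), not the 92127 lookup
dfD.loc[dfD['ZipCode'] == "92127"].head()
dfD.head()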

Unnamed: 0,ZipCode,Latitude,Longitude,Year,CausesofDeath,Count
4,92232,33.026203,-115.284581,2016,ALZ,1
6,92232,33.026203,-115.284581,2016,DIA,1
14,93227,36.357151,-119.425371,1999,CAN,3
15,93227,36.357151,-119.425371,1999,CLD,1
18,93227,36.357151,-119.425371,1999,HTD,1


#### The code below lists the top causes of death for a given zip code during the past ~20 years (the Beverly Hills zip code is used as an example, where heart disease and cancer are at the top)

In [82]:
dfD1 = dfD.loc[dfD['ZipCode'] == "90210"]
dfD1.groupby(['ZipCode','CausesofDeath'])['Count'].sum().nlargest(10)

ZipCode  CausesofDeath
90210    HTD              962
         CAN              907
         OTH              518
         STK              253
         ALZ              180
         CLD              136
         PNF              121
         INJ               95
         DIA               60
         SUI               44
Name: Count, dtype: int64
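The same lookup can be wrapped in a small function so that any zip code can be queried without editing the cell. `top_causes` is a hypothetical helper name, and the frame below is toy data rather than the real dataset.

```python
import pandas as pd

def top_causes(deaths, zip_code, n=10):
    """Return the n largest total death counts by cause for one zip code."""
    subset = deaths.loc[deaths["ZipCode"] == zip_code]
    return subset.groupby("CausesofDeath")["Count"].sum().nlargest(n)

# Toy data: counts are illustrative, not taken from the dataset.
toy = pd.DataFrame({
    "ZipCode": ["90210", "90210", "90210", "90001"],
    "Year": [1999, 2000, 1999, 1999],
    "CausesofDeath": ["HTD", "HTD", "CAN", "HTD"],
    "Count": [5, 7, 9, 100],
})

result = top_causes(toy, "90210", n=2)
print(result)
```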

#### Zip codes in descending order by number of deaths by suicide during the past ~20 years. Carmichael (95608 zip code) in Sacramento County has the highest number.

In [83]:
dfD1 = dfD.loc[dfD['CausesofDeath'] == "SUI"]
dfD1.groupby(['ZipCode','CausesofDeath'])['Count'].sum().nlargest(10)


ZipCode  CausesofDeath
95608    SUI              192
92345    SUI              189
92021    SUI              171
92101    SUI              168
94109    SUI              167
96001    SUI              158
92683    SUI              157
94509    SUI              151
92103    SUI              150
94533    SUI              150
Name: Count, dtype: int64

In [84]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [85]:
address = 'California, USA'

geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of California are {}, {}.'.format(latitude, longitude))

The geographical coordinates of California are 36.7014631, -118.7559974.


In [86]:
dfD.reset_index(drop=True, inplace=True)
dfD.head()

Unnamed: 0,ZipCode,Latitude,Longitude,Year,CausesofDeath,Count
0,92232,33.026203,-115.284581,2016,ALZ,1
1,92232,33.026203,-115.284581,2016,DIA,1
2,93227,36.357151,-119.425371,1999,CAN,3
3,93227,36.357151,-119.425371,1999,CLD,1
4,93227,36.357151,-119.425371,1999,HTD,1


In [87]:
dfD1 = dfD.loc[(dfD.Year == 2016) & (dfD.CausesofDeath == "NEP")]
dfD1.shape


(1004, 6)

#### As an example, show deaths in 2016 in California where the cause of death was "NEP" i.e. deaths due to Nephritis, Nephrotic Syndrome and Nephrosis (Kidney diseases)

In [88]:
# create map of California with causes of death as "NEP" using latitude and longitude values
map_dfD = folium.Map(location=[latitude, longitude], zoom_start=6)

# add markers to map
for lat, lng, label in zip(dfD1['Latitude'], dfD1['Longitude'], dfD1['CausesofDeath']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dfD)  
    
map_dfD

In [89]:
dfD1 = dfD.groupby(['ZipCode','CausesofDeath']).Count.sum().reset_index()


In [90]:
dfD2 = dfD1.pivot(index='ZipCode', columns='CausesofDeath').reset_index()
dfD2.fillna(0, inplace=True)


In [91]:
dfD3 = pd.DataFrame(dfD2.to_records())
dfD3.columns = ['index','ZipCode', 'ALZ', 'CAN', 'CLD', 'DIA' , 'HOM' , 'HTD' , 'HYP' , 'INJ' , 'LIV' , 'NEP' , 'OTH' , 'PNF' , 'STK' , 'SUI']
dfD3.drop('index', axis=1, inplace=True)
dfD3.head()

Unnamed: 0,ZipCode,ALZ,CAN,CLD,DIA,HOM,HTD,HYP,INJ,LIV,NEP,OTH,PNF,STK,SUI
0,90001,78.0,892.0,136.0,235.0,147.0,1196.0,89.0,228.0,168.0,39.0,804.0,125.0,246.0,38.0
1,90002,91.0,989.0,200.0,233.0,174.0,1364.0,87.0,206.0,100.0,61.0,892.0,136.0,314.0,38.0
2,90003,114.0,1261.0,242.0,264.0,235.0,1705.0,89.0,278.0,161.0,65.0,1127.0,181.0,354.0,57.0
3,90004,113.0,1140.0,158.0,164.0,58.0,1390.0,78.0,177.0,121.0,26.0,735.0,182.0,303.0,88.0
4,90005,66.0,719.0,88.0,126.0,32.0,799.0,48.0,123.0,71.0,17.0,436.0,146.0,186.0,76.0
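The pivot, `fillna`, `to_records` and manual column renaming above can likely be collapsed into one `pivot_table` call. The sketch below is an alternative under that assumption, shown on three rows whose counts are taken from the table above. Passing `values="Count"` avoids the two-level column header, and `fill_value=0` fills missing combinations.

```python
import pandas as pd

# Long-format totals per zip code and cause (subset of the values shown above).
totals = pd.DataFrame({
    "ZipCode": ["90001", "90001", "90002"],
    "CausesofDeath": ["ALZ", "CAN", "CAN"],
    "Count": [78, 892, 989],
})

# One row per zip code, one column per cause, zeros where a cause is absent.
wide = totals.pivot_table(
    index="ZipCode",
    columns="CausesofDeath",
    values="Count",
    aggfunc="sum",
    fill_value=0,
).reset_index()
wide.columns.name = None  # drop the leftover "CausesofDeath" axis label
print(wide)
```

The column names then come from the data itself, so the hard-coded list of cause codes is not needed.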


In [92]:
def return_most_common_deaths(row, num_top_deaths):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_deaths]

#### Results below sorted on zip code for easier look up

In [93]:
num_top_deaths = 14
indicators = ['st', 'nd', 'rd']

# create columns according to number of top deaths
columns = ['ZipCode']
for ind in np.arange(num_top_deaths):
    try:
        columns.append('{}{} Most Common cause of Death'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common cause of Death'.format(ind+1))

# create a new dataframe
Causesofdeath_sorted = pd.DataFrame(columns=columns)
Causesofdeath_sorted['ZipCode'] = dfD3['ZipCode']

for ind in np.arange(dfD3.shape[0]):
    Causesofdeath_sorted.iloc[ind, 1:] = return_most_common_deaths(dfD3.iloc[ind, :], num_top_deaths)
Causesofdeath_sorted.sort_values(Causesofdeath_sorted.columns[0])

Unnamed: 0,ZipCode,1st Most Common cause of Death,2nd Most Common cause of Death,3rd Most Common cause of Death,4th Most Common cause of Death,5th Most Common cause of Death,6th Most Common cause of Death,7th Most Common cause of Death,8th Most Common cause of Death,9th Most Common cause of Death,10th Most Common cause of Death,11th Most Common cause of Death,12th Most Common cause of Death,13th Most Common cause of Death,14th Most Common cause of Death
0,90001,HTD,CAN,OTH,STK,DIA,INJ,LIV,HOM,CLD,PNF,HYP,ALZ,NEP,SUI
1,90002,HTD,CAN,OTH,STK,DIA,INJ,CLD,HOM,PNF,LIV,ALZ,HYP,NEP,SUI
2,90003,HTD,CAN,OTH,STK,INJ,DIA,CLD,HOM,PNF,LIV,ALZ,HYP,NEP,SUI
3,90004,HTD,CAN,OTH,STK,PNF,INJ,DIA,CLD,LIV,ALZ,SUI,HYP,HOM,NEP
4,90005,HTD,CAN,OTH,STK,PNF,DIA,INJ,CLD,SUI,LIV,ALZ,HYP,HOM,NEP
5,90006,HTD,CAN,OTH,STK,PNF,INJ,DIA,CLD,LIV,ALZ,HOM,SUI,HYP,NEP
6,90007,HTD,CAN,OTH,STK,PNF,INJ,DIA,LIV,CLD,HOM,ALZ,HYP,SUI,NEP
7,90008,HTD,CAN,OTH,STK,CLD,PNF,DIA,INJ,ALZ,HYP,HOM,LIV,NEP,SUI
8,90009,HTD,OTH,LIV,DIA,CAN,SUI,STK,PNF,NEP,INJ,HYP,HOM,CLD,ALZ
9,90010,HTD,CAN,OTH,STK,PNF,SUI,DIA,CLD,ALZ,INJ,LIV,NEP,HYP,HOM


In [94]:
# set number of clusters
num_top_deaths = 5
kclusters = 5

dfD3_clustering = dfD3.drop('ZipCode', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dfD3_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:5] 


array([1, 1, 4, 1, 3], dtype=int32)
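K-Means on raw counts tends to group zip codes by how many deaths they record overall, which largely tracks population. If the aim is to compare the mix of causes, one option is to convert each row to shares of its total before clustering. The sketch below illustrates that idea and is not what the notebook does. The two large rows reuse counts from the table above, and the two small rows are made up.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Two large and two small zip codes (small rows are illustrative only).
counts = pd.DataFrame(
    {"CAN": [892, 989, 9, 10], "HTD": [1196, 1364, 5, 4], "INJ": [228, 206, 20, 25]},
    index=["big_a", "big_b", "small_a", "small_b"],
)

# Divide each row by its total so every row sums to 1.
shares = counts.div(counts.sum(axis=1), axis=0)

# Cluster on the cause mix rather than on absolute counts.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(shares)
print(dict(zip(shares.index, labels)))
```

Whether shares or raw counts are the better input depends on the question being asked, so the choice is worth stating alongside the results.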

In [95]:
# add clustering labels
Causesofdeath_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join Causesofdeath_sorted onto df to add latitude/longitude for each zip code
df = df.join(Causesofdeath_sorted.set_index('ZipCode'), on='ZipCode')
df = df.dropna()
df = df.reset_index(drop=True)

In [96]:
df['Cluster Labels'] = df['Cluster Labels'].astype(int)
df.head()

Unnamed: 0,ZipCode,Latitude,Longitude,Cluster Labels,1st Most Common cause of Death,2nd Most Common cause of Death,3rd Most Common cause of Death,4th Most Common cause of Death,5th Most Common cause of Death,6th Most Common cause of Death,7th Most Common cause of Death,8th Most Common cause of Death,9th Most Common cause of Death,10th Most Common cause of Death,11th Most Common cause of Death,12th Most Common cause of Death,13th Most Common cause of Death,14th Most Common cause of Death
0,92232,33.026203,-115.284581,0,DIA,ALZ,SUI,STK,PNF,OTH,NEP,LIV,INJ,HYP,HTD,HOM,CLD,CAN
1,93227,36.357151,-119.425371,0,HTD,CAN,OTH,STK,INJ,CLD,DIA,LIV,HYP,HOM,PNF,SUI,ALZ,NEP
2,93234,36.209815,-120.0847,0,OTH,CAN,HTD,INJ,DIA,STK,PNF,LIV,HYP,SUI,HOM,CLD,ALZ,NEP
3,93529,37.765218,-119.07769,0,OTH,HTD,CAN,INJ,LIV,DIA,CLD,SUI,STK,PNF,NEP,HYP,HOM,ALZ
4,94931,38.328614,-122.71044,0,CAN,HTD,OTH,STK,CLD,INJ,ALZ,DIA,SUI,PNF,LIV,HOM,HYP,NEP


In [97]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=5)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

countmap = 0
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df['Latitude'], df['Longitude'], df['ZipCode'], df['Cluster Labels']):
    
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster - 1],
        fill=True,
        fill_color=rainbow[cluster - 1],
        fill_opacity=0.7).add_to(map_clusters)
    countmap = countmap + 1
    # Folium crashes on the complete dataset (likely because of the limits of a free Watson account), so limit the map to 50 markers for demonstration purposes.
    if countmap == 50:
        break
map_clusters
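In the cell above, the `ys` list is only used for its length, and `rainbow[cluster - 1]` gives cluster 0 the last color in the list. A simpler palette, sketched here as one possible alternative, maps each cluster label directly to its own color. `palette` and `color_for` are names introduced for illustration.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced rainbow color per cluster, as hex strings for folium.
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

# Direct label-to-color lookup: cluster 0 gets the first color, and so on.
color_for = {cluster: palette[cluster] for cluster in range(kclusters)}
print(color_for)
```

Inside the marker loop, `color_for[cluster]` could then be passed as both `color` and `fill_color`.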

#### Five clusters created by K-Means shown below:

In [98]:
df.loc[df['Cluster Labels'] == 0, df.columns[[0] + list(range(4, df.shape[1]))]]

Unnamed: 0,ZipCode,1st Most Common cause of Death,2nd Most Common cause of Death,3rd Most Common cause of Death,4th Most Common cause of Death,5th Most Common cause of Death,6th Most Common cause of Death,7th Most Common cause of Death,8th Most Common cause of Death,9th Most Common cause of Death,10th Most Common cause of Death,11th Most Common cause of Death,12th Most Common cause of Death,13th Most Common cause of Death,14th Most Common cause of Death
0,92232,DIA,ALZ,SUI,STK,PNF,OTH,NEP,LIV,INJ,HYP,HTD,HOM,CLD,CAN
1,93227,HTD,CAN,OTH,STK,INJ,CLD,DIA,LIV,HYP,HOM,PNF,SUI,ALZ,NEP
2,93234,OTH,CAN,HTD,INJ,DIA,STK,PNF,LIV,HYP,SUI,HOM,CLD,ALZ,NEP
3,93529,OTH,HTD,CAN,INJ,LIV,DIA,CLD,SUI,STK,PNF,NEP,HYP,HOM,ALZ
4,94931,CAN,HTD,OTH,STK,CLD,INJ,ALZ,DIA,SUI,PNF,LIV,HOM,HYP,NEP
8,93701,OTH,HTD,CAN,INJ,STK,DIA,LIV,CLD,PNF,HOM,SUI,ALZ,NEP,HYP
9,92693,HTD,OTH,INJ,SUI,STK,PNF,NEP,LIV,HYP,HOM,DIA,CLD,CAN,ALZ
10,93140,INJ,SUI,STK,PNF,OTH,NEP,LIV,HYP,HTD,HOM,DIA,CLD,CAN,ALZ
11,93255,HTD,CAN,OTH,CLD,INJ,STK,DIA,SUI,PNF,LIV,NEP,HYP,ALZ,HOM
13,95383,CAN,HTD,OTH,INJ,CLD,STK,SUI,DIA,PNF,ALZ,LIV,HYP,NEP,HOM


In [99]:
df.loc[df['Cluster Labels'] == 1, df.columns[[0] + list(range(4, df.shape[1]))]]

Unnamed: 0,ZipCode,1st Most Common cause of Death,2nd Most Common cause of Death,3rd Most Common cause of Death,4th Most Common cause of Death,5th Most Common cause of Death,6th Most Common cause of Death,7th Most Common cause of Death,8th Most Common cause of Death,9th Most Common cause of Death,10th Most Common cause of Death,11th Most Common cause of Death,12th Most Common cause of Death,13th Most Common cause of Death,14th Most Common cause of Death
7,90063,HTD,CAN,OTH,DIA,STK,LIV,INJ,PNF,CLD,ALZ,HYP,HOM,NEP,SUI
15,90301,HTD,CAN,OTH,STK,CLD,PNF,DIA,INJ,LIV,ALZ,HOM,HYP,NEP,SUI
18,95965,HTD,CAN,OTH,CLD,INJ,STK,ALZ,DIA,PNF,LIV,SUI,HYP,NEP,HOM
22,95826,CAN,HTD,OTH,CLD,STK,INJ,ALZ,PNF,DIA,SUI,LIV,HYP,NEP,HOM
23,93458,HTD,OTH,CAN,STK,INJ,CLD,DIA,ALZ,PNF,LIV,HYP,SUI,NEP,HOM
24,93215,HTD,CAN,OTH,INJ,STK,CLD,DIA,PNF,LIV,ALZ,SUI,NEP,HYP,HOM
28,94131,CAN,HTD,OTH,STK,INJ,CLD,ALZ,PNF,SUI,DIA,LIV,HYP,NEP,HOM
40,92410,HTD,OTH,CAN,CLD,STK,DIA,INJ,LIV,PNF,HOM,ALZ,HYP,SUI,NEP
46,95358,HTD,CAN,OTH,INJ,CLD,STK,DIA,ALZ,PNF,LIV,SUI,HYP,NEP,HOM
53,90065,HTD,CAN,OTH,STK,CLD,PNF,DIA,INJ,LIV,ALZ,SUI,HYP,NEP,HOM


In [100]:
df.loc[df['Cluster Labels'] == 2, df.columns[[0] + list(range(4, df.shape[1]))]]

Unnamed: 0,ZipCode,1st Most Common cause of Death,2nd Most Common cause of Death,3rd Most Common cause of Death,4th Most Common cause of Death,5th Most Common cause of Death,6th Most Common cause of Death,7th Most Common cause of Death,8th Most Common cause of Death,9th Most Common cause of Death,10th Most Common cause of Death,11th Most Common cause of Death,12th Most Common cause of Death,13th Most Common cause of Death,14th Most Common cause of Death
16,92543,HTD,CAN,OTH,CLD,STK,INJ,ALZ,PNF,DIA,LIV,HYP,SUI,NEP,HOM
20,92114,HTD,CAN,OTH,STK,DIA,INJ,CLD,ALZ,PNF,LIV,HYP,SUI,HOM,NEP
26,93308,HTD,CAN,OTH,CLD,INJ,STK,DIA,ALZ,PNF,LIV,SUI,HYP,NEP,HOM
49,92220,HTD,CAN,OTH,CLD,STK,INJ,ALZ,DIA,PNF,LIV,HYP,SUI,NEP,HOM
66,95240,HTD,CAN,OTH,STK,CLD,INJ,ALZ,PNF,DIA,LIV,SUI,HYP,NEP,HOM
81,92503,HTD,CAN,OTH,CLD,STK,INJ,DIA,ALZ,PNF,LIV,SUI,HYP,NEP,HOM
173,95608,HTD,OTH,CAN,STK,CLD,ALZ,PNF,INJ,DIA,SUI,HYP,LIV,NEP,HOM
215,93230,HTD,CAN,OTH,CLD,STK,INJ,DIA,ALZ,PNF,LIV,SUI,HYP,NEP,HOM
265,94509,HTD,CAN,OTH,STK,CLD,INJ,DIA,PNF,ALZ,LIV,SUI,HYP,HOM,NEP
270,92376,HTD,OTH,CAN,CLD,STK,DIA,INJ,PNF,ALZ,LIV,HOM,HYP,SUI,NEP


In [101]:
df.loc[df['Cluster Labels'] == 3, df.columns[[0] + list(range(4, df.shape[1]))]]

Unnamed: 0,ZipCode,1st Most Common cause of Death,2nd Most Common cause of Death,3rd Most Common cause of Death,4th Most Common cause of Death,5th Most Common cause of Death,6th Most Common cause of Death,7th Most Common cause of Death,8th Most Common cause of Death,9th Most Common cause of Death,10th Most Common cause of Death,11th Most Common cause of Death,12th Most Common cause of Death,13th Most Common cause of Death,14th Most Common cause of Death
5,95322,HTD,CAN,OTH,CLD,INJ,STK,PNF,ALZ,DIA,LIV,SUI,NEP,HYP,HOM
6,90038,HTD,CAN,OTH,STK,INJ,DIA,CLD,PNF,LIV,ALZ,SUI,HYP,HOM,NEP
12,93465,CAN,HTD,OTH,STK,CLD,ALZ,INJ,PNF,DIA,SUI,LIV,NEP,HYP,HOM
31,93110,HTD,CAN,OTH,STK,ALZ,CLD,INJ,PNF,DIA,SUI,LIV,HYP,NEP,HOM
33,95337,HTD,CAN,OTH,CLD,STK,INJ,ALZ,DIA,PNF,LIV,SUI,HYP,NEP,HOM
37,94089,CAN,HTD,OTH,STK,CLD,DIA,INJ,PNF,ALZ,LIV,SUI,HYP,NEP,HOM
39,92320,HTD,CAN,OTH,CLD,STK,ALZ,INJ,DIA,PNF,SUI,HYP,LIV,NEP,HOM
43,96094,CAN,HTD,OTH,INJ,CLD,STK,PNF,ALZ,DIA,SUI,LIV,NEP,HYP,HOM
48,91302,CAN,HTD,OTH,STK,ALZ,CLD,INJ,PNF,SUI,DIA,HYP,LIV,NEP,HOM
56,90502,HTD,CAN,OTH,STK,CLD,PNF,DIA,ALZ,INJ,LIV,SUI,HYP,NEP,HOM


In [102]:
df.loc[df['Cluster Labels'] == 4, df.columns[[0] + list(range(4, df.shape[1]))]]

Unnamed: 0,ZipCode,1st Most Common cause of Death,2nd Most Common cause of Death,3rd Most Common cause of Death,4th Most Common cause of Death,5th Most Common cause of Death,6th Most Common cause of Death,7th Most Common cause of Death,8th Most Common cause of Death,9th Most Common cause of Death,10th Most Common cause of Death,11th Most Common cause of Death,12th Most Common cause of Death,13th Most Common cause of Death,14th Most Common cause of Death
27,93611,HTD,CAN,OTH,STK,ALZ,CLD,INJ,PNF,DIA,HYP,LIV,SUI,NEP,HOM
35,92840,HTD,CAN,OTH,STK,CLD,INJ,PNF,ALZ,DIA,LIV,HYP,SUI,NEP,HOM
36,92583,HTD,CAN,OTH,CLD,STK,INJ,ALZ,DIA,LIV,PNF,SUI,HYP,NEP,HOM
45,90220,HTD,CAN,OTH,STK,DIA,CLD,INJ,HOM,PNF,ALZ,HYP,LIV,NEP,SUI
47,92553,HTD,CAN,OTH,INJ,STK,CLD,DIA,PNF,ALZ,LIV,SUI,HYP,HOM,NEP
62,92024,HTD,CAN,OTH,ALZ,STK,CLD,INJ,PNF,DIA,SUI,HYP,LIV,NEP,HOM
72,95482,HTD,CAN,OTH,CLD,STK,INJ,DIA,PNF,SUI,LIV,ALZ,NEP,HYP,HOM
74,92346,CAN,HTD,OTH,CLD,STK,DIA,INJ,ALZ,PNF,LIV,SUI,HYP,HOM,NEP
83,95901,HTD,CAN,OTH,CLD,INJ,STK,PNF,LIV,DIA,SUI,ALZ,HYP,NEP,HOM
95,91001,HTD,CAN,OTH,STK,CLD,PNF,ALZ,INJ,DIA,HYP,LIV,SUI,NEP,HOM
