# <u> Applied Data Science Capstone Project</u>
### <i> Major Japanese Cities and Access to Healthcare </i>
#### Kordell Mitchell Bernaldez Tan
<p> Biomedical and Clinical Engineering | Electrical Engineering 
 
 California State University, Long Beach </p>
                                              
    

#### This notebook organizes and analyzes population data for major cities in Japan and their access to healthcare facilities using Python libraries, and investigates the correlation between inpatient and outpatient numbers and locations of treatment. Healthcare data was measured between 1975 and 2016.

### Import Python Libraries

In [1]:
# IMPORT LIBRARIES FOR DATA PROCESSING
import numpy as np
import pandas as pd
import requests
import json

# Function to flatten a JSON response into a dataframe
# (available as pandas.json_normalize since pandas 1.0; older releases used pandas.io.json)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import k-means for the clustering stage
from sklearn.cluster import KMeans

# IMPORT FOLIUM
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
print('FOLIUM LIBRARY SUCCESSFULLY INSTALLED.')

FOLIUM LIBRARY SUCCESSFULLY INSTALLED.


### Read .CSV and .XLS Files into Pandas Dataframes

In [2]:
# Read .csv files and convert to a pandas dataframe
    # Population of Japanese Cities downloaded from 'https://simplemaps.com/data/jp-cities'
jp_pop = pd.read_csv('jp_population.csv')

    # Health Statistics and Data of Japan Prefectures (1975-2017)
    # Downloaded from 'https://knoema.com/JP10209/health-and-medical-care-statistics-of-japan-social-indicators-by-prefecture?regionId=JP'
jhealth = pd.read_excel('japan_healthcare_data.xls')

print('Data files were successfully read into the notebook.')

Data files were successfully read into the notebook.


In [3]:
# View the imported data sets
# Format header of population table and display
jp_pop.columns = ['City','Latitude','Longitude','Country','ID','Admin','Capital','Population','Population Proper']
jp_pop.head()

Unnamed: 0,City,Latitude,Longitude,Country,ID,Admin,Capital,Population,Population Proper
0,Tokyo,35.685,139.751389,Japan,JP,Tōkyō,primary,35676000.0,8336599.0
1,Ōsaka,34.683333,135.516667,Japan,JP,Ōsaka,admin,11294000.0,2592413.0
2,Yokohama,35.433333,139.65,Japan,JP,Kanagawa,admin,3697894.0,3697894.0
3,Nagoya,35.183333,136.9,Japan,JP,Aichi,admin,3230000.0,2191279.0
4,Fukuoka,33.6,130.416667,Japan,JP,Fukuoka,admin,2792000.0,1392289.0


### Formatting Datasets

In [4]:
# Organize table by alphabetical order of cities and reindex
jp_pop = jp_pop.sort_values(by = 'City')
jp_pop = jp_pop.reset_index(drop = True)
jp_pop['City'] = jp_pop['City'].str.replace('ō','o').str.replace('Ō','O')
jp_pop.head()

Unnamed: 0,City,Latitude,Longitude,Country,ID,Admin,Capital,Population,Population Proper
0,Akita,39.716667,140.1,Japan,JP,Akita,admin,320069.0,281856.0
1,Aomori,40.816667,140.733333,Japan,JP,Aomori,admin,298394.0,264749.0
2,Asahikawa,43.767778,142.370278,Japan,JP,Hokkaidō,,356612.0,325547.0
3,Ashino,43.015176,144.397259,Japan,JP,Hokkaidō,,198566.0,183612.0
4,Chiba,35.6,140.116667,Japan,JP,Chiba,admin,,


In [5]:
# Remove the top 10 rows of unnecessary data from 'jhealth' (everything above the header row)
jhealth = jhealth.iloc[10:].copy()

jhealth.head()

Unnamed: 0,Statistics name :,Prefectural Data Social Indicators by Prefecture,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,...,Unnamed: 179,Unnamed: 180,Unnamed: 181,Unnamed: 182,Unnamed: 183,Unnamed: 184,Unnamed: 185,Unnamed: 186,Unnamed: 187,Unnamed: 188
10,SURVEY YEAR Code,SURVEY YEAR,AREA Code,AREA,/I Health and Medical Care,"#I0210101_Height, male of the fifth grade (ele...",Annotation,"#I0210102_Height, female of the fifth grade (e...",Annotation,"#I0210103_Height, male of the second grade (ju...",...,"#I1520302_Cases of benefits paid (per 1,000 de...",Annotation,#I1520401_Amount of benefits paid (per person ...,Annotation,#I1520402_Amount of benefits paid (per depende...,Annotation,#I1520501_Amount of benefits paid (per person ...,Annotation,#I1520502_Amount of benefits paid (per depende...,Annotation
11,1975100000,1975,0,All Japan,,136.4,,137.6,,156.1,...,***,,***,,***,,***,,***,
12,1975100000,1975,1000,Hokkaido,,136.6,,137.2,,156.4,...,***,,***,,***,,***,,***,
13,1975100000,1975,2000,Aomori-ken,,136.5,,138.5,,156.1,...,***,,***,,***,,***,,***,
14,1975100000,1975,3000,Iwate-ken,,136.8,,138.1,,156.2,...,***,,***,,***,,***,,***,


In [6]:
# Use the first remaining row as the column headers
jhealth.columns = jhealth.iloc[0]

In [7]:
# Assign the columns of interest to variables to be concatenated
year = jhealth['SURVEY YEAR']
area = jhealth['AREA']
inpatients = jhealth['#I04102_New inpatients, general hospitals (per 100,000 persons)[person]']
outpatients = jhealth['#I0420102_Outpatients in general hospitals per day (per 100,000 persons)[person]']
mc_num = jhealth['#I0910103_General hospitals (per 100,000 persons)[hospitals]']

In [8]:
# Concatenate the columns of interest into a new data frame 'data'
data = pd.concat([year,area,inpatients,outpatients,mc_num],axis = 1)
# Drop the first row, which still holds the header labels
data.drop(data.index[0],inplace = True)

# Rename the data frame column headers and display
data.columns = ['Year','Prefecture','Inpatients','Outpatients','Hospitals per 100,000 persons']
data = data.reset_index(drop = True) 
data.head()

Unnamed: 0,Year,Prefecture,Inpatients,Outpatients,"Hospitals per 100,000 persons"
0,1975,All Japan,5400.6,920.8,6.5
1,1975,Hokkaido,6375.8,1156.2,8.1
2,1975,Aomori-ken,5358.7,1116.8,6.3
3,1975,Iwate-ken,5611.8,1107.2,6.1
4,1975,Miyagi-ken,5036.5,952.7,6.2


In [9]:
# To make 'Prefecture' easier to use as a key, strip the administrative suffixes (-ken, -to, -fu)
data['Prefecture'] = data['Prefecture'].str.replace('-ken', '').str.replace('-to', '').str.replace('-fu', '')

# Sort values by year in descending order
data.sort_values(by = 'Year',ascending = False,inplace = True)
data = data.reset_index(drop = True)

# Only concerned with data acquired in year 2016:
# drop the most recent survey year's 48 rows (47 prefectures plus 'All Japan')
# so that the 2016 rows are at the top of the table
data = data.iloc[48:].reset_index(drop = True)
data.head()

Unnamed: 0,Year,Prefecture,Inpatients,Outpatients,"Hospitals per 100,000 persons"
0,2016,Saitama,8728.8,837.2,4.0
1,2016,Gifu,11459.9,1021.5,4.5
2,2016,Nagano,13917.2,1200.1,5.5
3,2016,Yamanashi,12015.8,1096.0,6.3
4,2016,Fukui,13988.0,1318.4,7.4
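
The chained `str.replace` calls in this cell remove `-ken`, `-to`, and `-fu` wherever they appear in a name. A minimal sketch of a stricter alternative, assuming the suffix should be removed only at the end of the name; the helper `strip_prefecture_suffix` and the sample names are hypothetical and not part of the notebook:

```python
import re

# Hypothetical helper: remove a trailing '-ken', '-to', or '-fu' only
_SUFFIX = re.compile(r'-(ken|to|fu)$')

def strip_prefecture_suffix(name):
    """Return the prefecture name without its administrative suffix."""
    return _SUFFIX.sub('', name)

# Illustrative sample names, not read from the data file
names = ['Aomori-ken', 'Tokyo-to', 'Osaka-fu', 'Hokkaido', 'All Japan']
cleaned = [strip_prefecture_suffix(n) for n in names]
print(cleaned)
```

In pandas the same pattern could be applied with `data['Prefecture'].str.replace(r'-(ken|to|fu)$', '', regex=True)`.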


In [10]:
# Assign 2016 data to a new variable and sort by prefecture name
d2016 = data.iloc[0:48].sort_values('Prefecture')

# Remove the 'All Japan' row and display
d2016 = d2016[d2016['Prefecture'] != 'All Japan']
d2016 = d2016.reset_index(drop = True)
d2016.head()

Unnamed: 0,Year,Prefecture,Inpatients,Outpatients,"Hospitals per 100,000 persons"
0,2016,Aichi,10874.0,870.3,3.8
1,2016,Akita,13892.4,1220.4,5.2
2,2016,Aomori,11987.9,1016.1,6.2
3,2016,Chiba,10306.3,953.8,4.0
4,2016,Ehime,14253.1,1306.7,9.2


### Generating the Map of Japan using Folium

In [11]:
# Folium Map of Japan and Prefectures
    # We will observe the healthcare facilities surrounding Tokyo
latitude = 35.685
longitude = 139.751389


# create map of Japan using latitude and longitude values
map_japan = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to the map using the coordinates in the jp_pop table
for lat, lng, label in zip(jp_pop['Latitude'], jp_pop['Longitude'], jp_pop['City']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='purple',
        fill=True,
        fill_color='#9589dc',
        fill_opacity=.5).add_to(map_japan)

### Foursquare API Access

In [12]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


In [13]:
# Define the Foursquare category ID used to search for medical facilities
med_center = '4bf58dd8d48988d104941735'

In [14]:
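# Create the request for medical facility venues
radius = 100000 # search radius in meters (100 km)
LIMIT = 1000000000 # requested maximum number of venues; the API caps what it actually returns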

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&categoryId={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    med_center,
    VERSION,
    latitude,
    longitude,
    radius,
    LIMIT)
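
The request URL above is assembled by positional string formatting. A minimal sketch of the same URL built from named parameters with the standard library, which also takes care of URL encoding; `build_explore_url` is a hypothetical helper, and the credentials and the limit of 100 are placeholders chosen for illustration:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, category_id, version,
                      lat, lng, radius, limit):
    """Hypothetical helper: assemble the explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'categoryId': category_id,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, for illustration only
example_url = build_explore_url('ID', 'SECRET', '4bf58dd8d48988d104941735',
                                '20180605', 35.685, 139.751389, 100000, 100)

# Parse the query string back to confirm the parameters survived encoding
query = parse_qs(urlsplit(example_url).query)
print(query['ll'], query['radius'])
```

Keeping the parameters in a dictionary makes it harder to swap two positional arguments by accident, for example latitude and longitude.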

In [15]:
# Call request to Foursquare database and display
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d1678c8342adf0038ef8f69'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Tokyo',
  'headerFullLocation': 'Tokyo',
  'headerLocationGranularity': 'city',
  'query': 'medical',
  'totalResults': 252,
  'suggestedBounds': {'ne': {'lat': 36.5850009000009,
    'lng': 140.8573732175807},
   'sw': {'lat': 34.784999099999105, 'lng': 138.6454047824193}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b2f1e7cf964a52099e924e3',
       'name': 'akiba:F 献血ルーム',
       'location': {'address': '外神田1-16-9',
        'crossStreet': '朝風二号館ビル 5F',
        'lat': 35.697689525355095,
        'lng': 139.77249322340495,
        'labeledLatLngs': [{'label': 'displa

In [16]:
# Foursquare API 'Get Categories' Function Definition
    # function that extracts the category of the medical facility
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
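
To illustrate what the function returns, here is a minimal sketch of an equivalent lookup run on hand-written rows. `first_category_name` is a hypothetical variant that uses `.get` instead of `try`/`except`, and the sample rows only mimic the shape of the flattened Foursquare items:

```python
def first_category_name(row):
    """Hypothetical variant of get_category_type using dict-style lookups."""
    categories = row.get('categories')
    if categories is None:
        categories = row.get('venue.categories', [])
    # Return the name of the first category, or None when the list is empty
    return categories[0]['name'] if categories else None

# Hand-written sample rows, not real API output
rows = [
    {'venue.categories': [{'name': 'Medical Center'}]},
    {'categories': [{'name': 'Hospital'}]},
    {'venue.categories': []},
]
names = [first_category_name(r) for r in rows]
print(names)
```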

In [17]:
# Filtering and Organizing retrieved venues and relative categories
venues = results['response']['groups'][0]['items']

# flatten JSON
med_centers = json_normalize(venues) 

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']

# Keep all rows but only the filtered columns
med_centers = med_centers.loc[:, filtered_columns]

# filter the category for each row
med_centers['venue.categories'] = med_centers.apply(get_category_type, axis=1)

# clean columns
med_centers.columns = [col.split(".")[-1] for col in med_centers.columns]

med_centers.head()

Unnamed: 0,name,categories,lat,lng
0,akiba:F 献血ルーム,Medical Center,35.69769,139.772493
1,国立成育医療研究センター,Medical Center,35.633549,139.612043
2,Japan Red Cross Medical Center (日本赤十字社医療センター),Medical Center,35.654831,139.717755
3,鶯谷健診センター,Medical Center,35.725696,139.775926
4,St. Luke's Hospital (聖路加国際病院),Medical Center,35.667372,139.7775


In [18]:
# Organize data set by category and display
med_centers.sort_values(by = 'categories',inplace=True)
med_centers = med_centers.reset_index(drop = True)

# Shows all rows and columns, regardless of size
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns',None)
med_centers.columns = ['Name','Category','lat','lng']

# Display the medical facilities with their categories and locations
display(med_centers)

Unnamed: 0,Name,Category,lat,lng
0,社会医療法人財団 石心会 さやま総合クリニック,Doctor's Office,35.853096,139.40146
1,新橋トラストクリニック,Doctor's Office,35.66635,139.756028
2,Tokyo Midtown Clinic (東京ミッドタウンクリニック),Doctor's Office,35.666244,139.731219
3,川崎幸クリニック,Doctor's Office,35.534934,139.690485
4,春日クリニック,Doctor's Office,35.711892,139.752365
5,はしもと内科クリニック,Doctor's Office,35.810866,139.720307
6,ひもんや外科内科クリニック,Doctor's Office,35.62036,139.681653
7,フィオーレ健診クリニック,Doctor's Office,35.698028,139.709052
8,オーバルコート健診クリニック,Doctor's Office,35.623107,139.729029
9,深川ギャザリアクリニック,Doctor's Office,35.666736,139.804728


In [19]:
# Add the surrounding medical facilities to the map
for lat, lng, label in zip(med_centers['lat'], med_centers['lng'], med_centers['Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#1089dc',
        fill_opacity=.1).add_to(map_japan)

map_japan

In [20]:
# Analyze access to facilities in regards to the cities surrounding Tokyo
tokyo = jp_pop[jp_pop['City'] == 'Tokyo']
kawagoe = jp_pop[jp_pop['City'] == 'Kawagoe']
saitama = jp_pop[jp_pop['City'] == 'Saitama']
chiba = jp_pop[jp_pop['City'] == 'Chiba']
shinkawasaki = jp_pop[jp_pop['City'] == 'Shinkawasaki']
yokohama = jp_pop[jp_pop['City'] == 'Yokohama']
hachioji = jp_pop[jp_pop['City'] == 'Hachioji']

# Concatenate rows into new data frame
tokyo_health = pd.concat([tokyo,kawagoe,saitama,chiba,
                          shinkawasaki,yokohama,hachioji]).sort_values(by = 'City').reset_index(drop = True)
tokyo_health

Unnamed: 0,City,Latitude,Longitude,Country,ID,Admin,Capital,Population,Population Proper
0,Chiba,35.6,140.116667,Japan,JP,Chiba,admin,,
1,Hachioji,35.655833,139.323889,Japan,JP,Tōkyō,,579399.0,579399.0
2,Kawagoe,35.908611,139.485278,Japan,JP,Saitama,,337931.0,337931.0
3,Saitama,35.9,139.65,Japan,JP,Saitama,admin,,
4,Shinkawasaki,35.550193,139.670327,Japan,JP,Kanagawa,,1437266.0,1306785.0
5,Tokyo,35.685,139.751389,Japan,JP,Tōkyō,primary,35676000.0,8336599.0
6,Yokohama,35.433333,139.65,Japan,JP,Kanagawa,admin,3697894.0,3697894.0


In [21]:
def getNearbyMC(names, latitudes, longitudes, radius=100000):
    
    mc_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&categoryId={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            med_center,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        mc_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-city lists into a single list of rows
    nearby_mc = pd.DataFrame([item for city_venues in mc_list for item in city_venues])
    nearby_mc.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Medical Facility', 
                  'Facility Latitude', 
                  'Facility Longitude', 
                  'Facility Category']
    
    return nearby_mc

In [22]:
# Obtain Medical Facility information in Tokyo and the surrounding cities
mc_venues = getNearbyMC(names = tokyo_health['City'],
                                   latitudes = tokyo_health['Latitude'],
                                   longitudes = tokyo_health['Longitude']
                                  )
mc_venues

Chiba
Hachioji
Kawagoe
Saitama
Shinkawasaki
Tokyo
Yokohama


Unnamed: 0,City,City Latitude,City Longitude,Medical Facility,Facility Latitude,Facility Longitude,Facility Category
0,Chiba,35.6,140.116667,akiba:F 献血ルーム,35.69769,139.772493,Medical Center
1,Chiba,35.6,140.116667,国立成育医療研究センター,35.633549,139.612043,Medical Center
2,Chiba,35.6,140.116667,鶯谷健診センター,35.725696,139.775926,Medical Center
3,Chiba,35.6,140.116667,Japan Red Cross Medical Center (日本赤十字社医療センター),35.654831,139.717755,Medical Center
4,Chiba,35.6,140.116667,St. Luke's Hospital (聖路加国際病院),35.667372,139.7775,Medical Center
5,Chiba,35.6,140.116667,関東ITソフトウェア健康保険組合 大久保健診センター,35.701846,139.695944,Medical Center
6,Chiba,35.6,140.116667,Tokyo Metropolitan Hiroo Hospital (東京都立広尾病院),35.646944,139.722337,Medical Center
7,Chiba,35.6,140.116667,Tokyo Women's Medical University Yachiyo Medic...,35.730192,140.096734,Medical Center
8,Chiba,35.6,140.116667,埼玉医科大学国際医療センター,35.921375,139.32216,Medical Center
9,Chiba,35.6,140.116667,JR東京総合病院,35.685145,139.700143,Medical Center


In [23]:
mc_venues.groupby('City').count()

Unnamed: 0_level_0,City Latitude,City Longitude,Medical Facility,Facility Latitude,Facility Longitude,Facility Category
City,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Chiba,100,100,100,100,100,100
Hachioji,100,100,100,100,100,100
Kawagoe,100,100,100,100,100,100
Saitama,100,100,100,100,100,100
Shinkawasaki,100,100,100,100,100,100
Tokyo,100,100,100,100,100,100
Yokohama,100,100,100,100,100,100
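
Every city returns exactly 100 venues, and the Chiba rows listed above are facilities located in central Tokyo. One possible explanation, offered here as an assumption rather than a finding of the notebook, is that the 100 km search radius around each city covers the whole Tokyo area, so the queries overlap heavily. A minimal sketch that checks how far each city lies from the Tokyo coordinates, using the haversine formula and a mean Earth radius of 6371 km:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# City coordinates copied from the tokyo_health table
cities = {
    'Chiba': (35.6, 140.116667),
    'Hachioji': (35.655833, 139.323889),
    'Kawagoe': (35.908611, 139.485278),
    'Saitama': (35.9, 139.65),
    'Shinkawasaki': (35.550193, 139.670327),
    'Yokohama': (35.433333, 139.65),
}
tokyo_center = (35.685, 139.751389)

distances = {name: haversine_km(*tokyo_center, *coords)
             for name, coords in cities.items()}
for name, km in distances.items():
    print('{}: {:.1f} km from Tokyo'.format(name, km))
```

If all distances come out well below the 100 km radius, a smaller `radius` argument to `getNearbyMC` may separate the cities more clearly.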


In [24]:
# Use one-hot encoding to determine the distribution of
# medical facility categories per city
    # one hot encoding
mc_onehot = pd.get_dummies(mc_venues[['Facility Category']], prefix="", prefix_sep="")

    # add the city column back to the dataframe
mc_onehot['City'] = mc_venues['City'] 

    # move city column to the first column
fixed_columns = [mc_onehot.columns[-1]] + list(mc_onehot.columns[:-1])
mc_onehot = mc_onehot[fixed_columns]

mc_onehot

Unnamed: 0,City,Doctor's Office,Hospital,Medical Center
0,Chiba,0,0,1
1,Chiba,0,0,1
2,Chiba,0,0,1
3,Chiba,0,0,1
4,Chiba,0,0,1
5,Chiba,0,0,1
6,Chiba,0,0,1
7,Chiba,0,0,1
8,Chiba,0,0,1
9,Chiba,0,0,1


In [25]:
# Share of each facility category per city (mean of the one-hot columns)
mc_grouped = mc_onehot.groupby('City').mean().reset_index()
mc_grouped

Unnamed: 0,City,Doctor's Office,Hospital,Medical Center
0,Chiba,0.13,0.45,0.42
1,Hachioji,0.13,0.46,0.41
2,Kawagoe,0.13,0.44,0.43
3,Saitama,0.13,0.46,0.41
4,Shinkawasaki,0.13,0.48,0.39
5,Tokyo,0.13,0.45,0.42
6,Yokohama,0.13,0.48,0.39
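
Taking the mean of a 0/1 one-hot column within a city is the same as dividing the count of that category by the number of venues in the city. A minimal pure-Python sketch of that calculation, using hand-made `(city, category)` pairs that stand in for `mc_venues`:

```python
from collections import Counter, defaultdict

# Hand-made (city, category) pairs, for illustration only
records = [
    ('A', 'Hospital'), ('A', 'Hospital'),
    ('A', 'Medical Center'), ('A', "Doctor's Office"),
    ('B', 'Hospital'), ('B', 'Medical Center'),
]

# Count how often each category occurs in each city
counts = defaultdict(Counter)
for city, category in records:
    counts[city][category] += 1

# Share of each category = count / number of venues in that city
shares = {
    city: {cat: n / sum(c.values()) for cat, n in c.items()}
    for city, c in counts.items()
}
print(shares)
```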


In [26]:
# Print the three most common facility types for each city
num_top_mc = 3

for mc in mc_grouped['City']:
    print("----"+mc+"----")
    temp = mc_grouped[mc_grouped['City'] == mc].T.reset_index()
    temp.columns = ['Medical Facility','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_mc))
    print('\n')

----Chiba----
  Medical Facility  freq
0         Hospital  0.45
1   Medical Center  0.42
2  Doctor's Office  0.13


----Hachioji----
  Medical Facility  freq
0         Hospital  0.46
1   Medical Center  0.41
2  Doctor's Office  0.13


----Kawagoe----
  Medical Facility  freq
0         Hospital  0.44
1   Medical Center  0.43
2  Doctor's Office  0.13


----Saitama----
  Medical Facility  freq
0         Hospital  0.46
1   Medical Center  0.41
2  Doctor's Office  0.13


----Shinkawasaki----
  Medical Facility  freq
0         Hospital  0.48
1   Medical Center  0.39
2  Doctor's Office  0.13


----Tokyo----
  Medical Facility  freq
0         Hospital  0.45
1   Medical Center  0.42
2  Doctor's Office  0.13


----Yokohama----
  Medical Facility  freq
0         Hospital  0.48
1   Medical Center  0.39
2  Doctor's Office  0.13




In [27]:
def return_most_common_med(row, num_top_mc):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_mc]

In [28]:
# Check top 3 facilities and sort by frequency relative to city
num_top_mc = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_mc):
    try:
        columns.append('{}{} Most Common Facility'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Facility'.format(ind+1))

# create a new dataframe
mc_sorted = pd.DataFrame(columns=columns)
mc_sorted['City'] = mc_grouped['City']

for ind in np.arange(mc_grouped.shape[0]):
    mc_sorted.iloc[ind, 1:] = return_most_common_med(mc_grouped.iloc[ind, :], num_top_mc)

mc_sorted.head()

Unnamed: 0,City,1st Most Common Facility,2nd Most Common Facility,3rd Most Common Facility
0,Chiba,Hospital,Medical Center,Doctor's Office
1,Hachioji,Hospital,Medical Center,Doctor's Office
2,Kawagoe,Hospital,Medical Center,Doctor's Office
3,Saitama,Hospital,Medical Center,Doctor's Office
4,Shinkawasaki,Hospital,Medical Center,Doctor's Office
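
The column names above come from a list of suffixes with a fallback to `th`. A minimal sketch of a general ordinal helper that would also handle values such as 11 or 21; `ordinal` is a hypothetical function, not part of the notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th, 13th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the same column headers for the top three facilities
columns = ['City'] + ['{} Most Common Facility'.format(ordinal(i))
                      for i in range(1, 4)]
print(columns)
```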


### Performing K-Means Clustering to Determine the Distribution of Medical Facilities Surrounding Tokyo

In [29]:
# set number of clusters
kclusters = 5

# drop the city names so that only the numeric category shares remain
mc_grouped_clustering = mc_grouped.drop(columns='City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([1, 2, 3, 2, 0, 1, 0], dtype=int32)
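
Although `kclusters` is 5, the label array contains only the values 0 to 3. A plausible reason, stated here as an interpretation, is that the seven rows contain only four distinct feature vectors, and k-means cannot populate more clusters than there are distinct points. A minimal sketch that counts the distinct rows, with the shares copied from the `mc_grouped` table:

```python
# Category shares copied from the mc_grouped table
# (Doctor's Office, Hospital, Medical Center)
shares = {
    'Chiba': (0.13, 0.45, 0.42),
    'Hachioji': (0.13, 0.46, 0.41),
    'Kawagoe': (0.13, 0.44, 0.43),
    'Saitama': (0.13, 0.46, 0.41),
    'Shinkawasaki': (0.13, 0.48, 0.39),
    'Tokyo': (0.13, 0.45, 0.42),
    'Yokohama': (0.13, 0.48, 0.39),
}

# Group cities that share an identical feature vector
groups = {}
for city, vector in shares.items():
    groups.setdefault(vector, []).append(city)

n_distinct = len(groups)
print(n_distinct, sorted(groups.values()))
```

The groups found this way can be compared with the cluster labels above to see whether cities with identical shares received the same label.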

In [30]:
# add clustering labels
mc_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mc_merged = tokyo_health

# merge to add latitude/longitude for each city
mc_merged = mc_merged.join(mc_sorted.set_index('City'), on='City')
mc_merged.head()

Unnamed: 0,City,Latitude,Longitude,Country,ID,Admin,Capital,Population,Population Proper,Cluster Labels,1st Most Common Facility,2nd Most Common Facility,3rd Most Common Facility
0,Chiba,35.6,140.116667,Japan,JP,Chiba,admin,,,1,Hospital,Medical Center,Doctor's Office
1,Hachioji,35.655833,139.323889,Japan,JP,Tōkyō,,579399.0,579399.0,2,Hospital,Medical Center,Doctor's Office
2,Kawagoe,35.908611,139.485278,Japan,JP,Saitama,,337931.0,337931.0,3,Hospital,Medical Center,Doctor's Office
3,Saitama,35.9,139.65,Japan,JP,Saitama,admin,,,2,Hospital,Medical Center,Doctor's Office
4,Shinkawasaki,35.550193,139.670327,Japan,JP,Kanagawa,,1437266.0,1306785.0,0,Hospital,Medical Center,Doctor's Office


In [31]:
# Add cluster groups to the Japan Map
    # set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mc_merged['Latitude'], mc_merged['Longitude'], mc_merged['City'], mc_merged['Cluster Labels'].astype(int)):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_japan)
       
map_japan
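
The marker colors above come from matplotlib's `rainbow` colormap, indexed with `cluster-1`, so label 0 wraps around to the last color. As a dependency-free illustration, and not the notebook's method, the sketch below builds a palette with the standard-library `colorsys` module and indexes it directly by cluster label; `cluster_colors` is a hypothetical helper:

```python
import colorsys

def cluster_colors(k):
    """Hypothetical helper: k evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_colors(5)
labels = [1, 2, 3, 2, 0, 1, 0]   # cluster labels reported by k-means above
marker_colors = [palette[label] for label in labels]
print(marker_colors)
```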

### Examining Cluster Groups

#### Cluster 0

In [32]:
c0 = mc_merged.loc[mc_merged['Cluster Labels'] == 0, mc_merged.columns[[0] + list(range(5, mc_merged.shape[1]))]].reset_index(drop = True)
c0

Unnamed: 0,City,Admin,Capital,Population,Population Proper,Cluster Labels,1st Most Common Facility,2nd Most Common Facility,3rd Most Common Facility
0,Shinkawasaki,Kanagawa,,1437266.0,1306785.0,0,Hospital,Medical Center,Doctor's Office
1,Yokohama,Kanagawa,admin,3697894.0,3697894.0,0,Hospital,Medical Center,Doctor's Office


#### Cluster 1

In [33]:
c1 = mc_merged.loc[mc_merged['Cluster Labels'] == 1, mc_merged.columns[[0] + list(range(5, mc_merged.shape[1]))]].reset_index(drop = True)
c1

Unnamed: 0,City,Admin,Capital,Population,Population Proper,Cluster Labels,1st Most Common Facility,2nd Most Common Facility,3rd Most Common Facility
0,Chiba,Chiba,admin,,,1,Hospital,Medical Center,Doctor's Office
1,Tokyo,Tōkyō,primary,35676000.0,8336599.0,1,Hospital,Medical Center,Doctor's Office


#### Cluster 2

In [34]:
c2 = mc_merged.loc[mc_merged['Cluster Labels'] == 2, mc_merged.columns[[0] + list(range(5, mc_merged.shape[1]))]].reset_index(drop = True)
c2

Unnamed: 0,City,Admin,Capital,Population,Population Proper,Cluster Labels,1st Most Common Facility,2nd Most Common Facility,3rd Most Common Facility
0,Hachioji,Tōkyō,,579399.0,579399.0,2,Hospital,Medical Center,Doctor's Office
1,Saitama,Saitama,admin,,,2,Hospital,Medical Center,Doctor's Office


#### Cluster 3

In [35]:
c3 = mc_merged.loc[mc_merged['Cluster Labels'] == 3, mc_merged.columns[[0] + list(range(5, mc_merged.shape[1]))]].reset_index(drop = True)
c3

Unnamed: 0,City,Admin,Capital,Population,Population Proper,Cluster Labels,1st Most Common Facility,2nd Most Common Facility,3rd Most Common Facility
0,Kawagoe,Saitama,,337931.0,337931.0,3,Hospital,Medical Center,Doctor's Office
