# Obesity and Yoga

We know that (child) obesity in the US is a large-scale problem. Numerous factors lead to obesity, but there are also a number of ways to reduce and fight this "epidemic", as the Centers for Disease Control and Prevention (CDC) labeled it in 1999.
I want to use the Foursquare API to find where we could open a Yoga studio: Yoga has been shown to help combat obesity, and it would be a good exercise for the whole family.

We use the data from https://chronicdata.cdc.gov/500-Cities/500-Cities-Local-Data-for-Better-Health-2019-relea/6vp6-wxuq
We need only the obesity measure (adults aged 18 and over with a BMI of 30 or more) for the 500 cities. The original file is slightly more than 200 MB in size.

The resulting file can be found here:
https://raw.githubusercontent.com/naveen1973/coursera_foursquare/master/obesity.csv

In [1]:
import pandas as pd

# Download the full 500 Cities CSV from the link above (about 200 MB) and read it into df.
# The file name below is a placeholder: use the name of the file you downloaded.
df = pd.read_csv("500_Cities_Local_Data.csv")

# Keep only the 2017 age-adjusted obesity prevalence at city level
df_small = df[(df['Year'] == 2017) &                                        # select year
              (df['MeasureId'] == "OBESITY") &                              # select measure
              (df['Measure'] == "Obesity among adults aged >=18 Years") &
              (df['GeographicLevel'] == "City") &                           # we want city level
              (df['DataValueTypeID'] == "AgeAdjPrv")                        # age-adjusted prevalence
             ]

# Save only the columns we need as obesity.csv
df_small[['CityName', 'Data_Value', 'PopulationCount', 'GeoLocation', 'CityFIPS']].to_csv('obesity.csv')

# https://github.com/naveen1973/coursera_foursquare/blob/master/500_Cities__Local_Data_for_Better_Health__2019_For_github.ipynb
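The filter above combines several boolean conditions with `&`. A minimal sketch of the same pattern on a few made-up rows (the column names match the 500 Cities file as used above; all values are hypothetical) shows which rows survive:

```python
import pandas as pd

# Hypothetical rows that mimic a few columns of the 500 Cities file;
# the values are invented purely to illustrate the filter.
df = pd.DataFrame({
    "Year": [2017, 2017, 2016, 2017],
    "MeasureId": ["OBESITY", "OBESITY", "OBESITY", "CSMOKING"],
    "GeographicLevel": ["City", "Census Tract", "City", "City"],
    "DataValueTypeID": ["AgeAdjPrv", "CrdPrv", "AgeAdjPrv", "AgeAdjPrv"],
    "CityName": ["A", "A", "B", "C"],
    "Data_Value": [30.1, 28.4, 33.0, 17.2],
})

# Each comparison yields a boolean Series; '&' combines them element-wise.
# The parentheses are required because '&' binds more tightly than '=='.
mask = (
    (df["Year"] == 2017)
    & (df["MeasureId"] == "OBESITY")
    & (df["GeographicLevel"] == "City")
    & (df["DataValueTypeID"] == "AgeAdjPrv")
)
df_small = df[mask]
print(df_small[["CityName", "Data_Value"]])
```

Only the first row meets all four conditions; the others fail on geographic level, year or measure.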


In [2]:
# Let's import the standard libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # skip this line if folium is already installed (e.g. from the Foursquare API lab)
import folium # map rendering library

# Foursquare credentials and request settings (placeholders: replace with your own values)
CLIENT_ID = 'your-client-id'
CLIENT_SECRET = 'your-client-secret'
VERSION = '20180605' # example API version date (YYYYMMDD)
LIMIT = 100 # maximum number of venues to request per neighborhood

print('All imports done!')


# Collect geo-coordinates
We collected the latitude and longitude of the neighborhoods of Gary (Indiana). Note that we do not use ALL the neighborhoods: two of them (Pulaski and Morningside Historic District) return no venues at all, as hardly anyone lives there, and including them would lead to missing ("None") values later in the process.
We call this data set df_gary.
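The notebook does not show how the two empty neighborhoods were left out of geolocation.csv. One way to do it in pandas, assuming the full neighborhood list is already in a DataFrame, is a negated `isin` mask. The DataFrame below is a hypothetical stand-in with names only:

```python
import pandas as pd

# Hypothetical full list of neighborhoods (coordinates omitted; only the names matter here).
neighborhoods = pd.DataFrame({
    "Neighborhood": ["Aetna", "Pulaski", "Morningside Historic District", "Midtown"],
})

# Neighborhoods that returned no Foursquare venues.
no_venues = ["Pulaski", "Morningside Historic District"]

# isin() builds a boolean mask; '~' negates it so the listed names are dropped.
df_gary = neighborhoods[~neighborhoods["Neighborhood"].isin(no_venues)].reset_index(drop=True)
print(df_gary)
```

`reset_index(drop=True)` renumbers the remaining rows from 0, which keeps positional lookups such as `df_gary.loc[1, ...]` predictable.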

In [78]:
df_gary = pd.read_csv("geolocation.csv")
df_gary

,Neighborhood,Latitude,Longitude
0,Aetna,41.59062,-87.28896
1,Ambridge Mann,41.601548,-87.359294
2,Black Oak,41.565452,-87.396802
3,Brunswick,41.602158,-87.403293
4,Downtown,41.6039,-87.33717
5,Downtown West,41.6025,-87.338611
6,Emerson,41.602869,-87.332217
7,Glen Park,41.553072,-87.336509
8,Midtown,41.5767,-87.343813
9,Miller Beach,41.601,-87.261


# Create a map of Gary, Indiana

In [79]:
address = 'Gary, Indiana'

geolocator = Nominatim(user_agent="gary_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Gary (Indiana) are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Gary (Indiana) are 41.6021292, -87.3371372.


In [80]:
# create map of Gary (Indiana) using latitude and longitude values
map_gary = folium.Map(location=[latitude, longitude], zoom_start=10)

In [81]:
# add a marker for each neighborhood to the map
for lat, lng, neighborhood in zip(df_gary['Latitude'], df_gary['Longitude'], df_gary['Neighborhood']):
    label = folium.Popup(neighborhood, parse_html=True)  # popup shows the neighborhood name
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='purple',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7).add_to(map_gary)
    
map_gary

In [82]:
# This is how we find the latitude/longitude of a neighborhood

neighborhood_latitude = df_gary.loc[1, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_gary.loc[1, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_gary.loc[1, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Ambridge Mann are 41.601548, -87.35929399999999.


In [84]:
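url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, 500, LIMIT)  # venues within 500 m
results = requests.get(url).json()
# results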

In [85]:
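# Now create a function that extracts the category of the venue
def get_category_type(row):
    # the column is 'categories' or 'venue.categories', depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']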
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

### For each venue in the neighborhood, we want its category, latitude and longitude

In [86]:
# extract the list of venues from the JSON response
venues = results['response']['groups'][0]['items']

# flatten the JSON in readable Pandas format
nearby_venues = pd.json_normalize(venues)

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()
# nearby_venues['categories'].unique()


,name,categories,lat,lng
0,Abandoned Train Station,Light Rail Station,41.604033,-87.362667
1,McDonald's,Fast Food Restaurant,41.601636,-87.356141
2,Horace Mann Child Care,Daycare,41.599474,-87.363369
3,Shoreline,Light Rail Station,41.6019,-87.353505
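To see what `pd.json_normalize` does with the nested Foursquare items, here is a minimal offline sketch. The `items` list is a hand-made stand-in: its keys follow the structure the notebook reads (`venue`, `categories`, `location`), but the venue names and coordinates are invented. The lambda plays the role of `get_category_type`:

```python
import pandas as pd

# Hand-made stand-in for results['response']['groups'][0]['items'] (values invented).
items = [
    {"venue": {"name": "Venue A",
               "categories": [{"name": "Park"}],
               "location": {"lat": 41.60, "lng": -87.33}}},
    {"venue": {"name": "Venue B",
               "categories": [],
               "location": {"lat": 41.61, "lng": -87.34}}},
]

# json_normalize turns nested dicts into dotted column names such as 'venue.location.lat';
# lists (like 'venue.categories') stay as a single cell holding the list.
cols = ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]
flat = pd.json_normalize(items)[cols].copy()

# Keep the first category name, or None when the list is empty.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Drop the dotted prefixes, as the notebook does with split('.')[-1].
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```

The `.copy()` makes the column selection an independent DataFrame, so the later assignment does not trigger pandas' chained-assignment warning.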


In [87]:
# How many venues were there in this neighborhood?
print('There are {} venues in this neighborhood.'.format(nearby_venues.shape[0]))

There are 4 venues in this neighborhood.


# Explore Neighborhoods in Gary (Indiana)

In [88]:
# For each neighborhood, get the name, category, latitude and longitude of the venues within a 500-meter radius.

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
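`getNearbyVenues` builds the explore URL with string formatting. A minimal sketch of the same request URL built with the standard library's `urllib.parse.urlencode`, which takes care of escaping, is shown below. `build_explore_url` is a hypothetical helper, and the credentials and version date are placeholders:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: same endpoint and parameters as getNearbyVenues,
    but the query string is assembled and escaped by urlencode."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; coordinates of Ambridge Mann from df_gary.
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 41.601548, -87.359294)
print(url)
```

`urlencode` writes the comma in `ll` as `%2C`, which servers decode back to a comma. With `requests`, the same dictionary can instead be passed as `requests.get(base_url, params=params)`.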

In [89]:
gary_venues = getNearbyVenues(names=df_gary['Neighborhood'],
                                   latitudes=df_gary['Latitude'],
                                   longitudes=df_gary['Longitude']
                                  )

Aetna
Ambridge Mann
Black Oak
Brunswick
Downtown
Downtown West
Emerson
Glen Park
Midtown
Miller Beach
Tolleston
Westside


In [90]:
print(gary_venues.shape)
gary_venues.head()

(49, 7)


,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Aetna,41.59062,-87.28896,Aetna,41.594145,-87.291222,Tourist Information Center
1,Ambridge Mann,41.601548,-87.359294,Abandoned Train Station,41.604033,-87.362667,Light Rail Station
2,Ambridge Mann,41.601548,-87.359294,McDonald's,41.601636,-87.356141,Fast Food Restaurant
3,Ambridge Mann,41.601548,-87.359294,Horace Mann Child Care,41.599474,-87.363369,Daycare
4,Ambridge Mann,41.601548,-87.359294,Shoreline,41.6019,-87.353505,Light Rail Station


In [91]:
# How many venues were returned for each neighborhood?
gary_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Aetna,1,1,1,1,1,1
Ambridge Mann,4,4,4,4,4,4
Black Oak,1,1,1,1,1,1
Brunswick,4,4,4,4,4,4
Downtown,7,7,7,7,7,7
Downtown West,8,8,8,8,8,8
Emerson,6,6,6,6,6,6
Glen Park,8,8,8,8,8,8
Midtown,1,1,1,1,1,1
Miller Beach,4,4,4,4,4,4
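`groupby(...).count()` repeats the same number in every column. A minimal sketch of two more direct alternatives, using the first five rows of `gary_venues` printed earlier (Aetna and Ambridge Mann): `size()` gives the number of venues per neighborhood, and `nunique()` gives the number of distinct categories:

```python
import pandas as pd

# First five rows of gary_venues (only the two relevant columns).
venues = pd.DataFrame({
    "Neighborhood": ["Aetna", "Ambridge Mann", "Ambridge Mann", "Ambridge Mann", "Ambridge Mann"],
    "Venue Category": ["Tourist Information Center", "Light Rail Station",
                       "Fast Food Restaurant", "Daycare", "Light Rail Station"],
})

# Number of venues per neighborhood: one value per group instead of one per column.
venue_counts = venues.groupby("Neighborhood").size()

# Number of distinct venue categories per neighborhood.
category_counts = venues.groupby("Neighborhood")["Venue Category"].nunique()
print(venue_counts, category_counts, sep="\n\n")
```

Ambridge Mann has four venues but only three categories, because two of its venues are light rail stations.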


In [92]:
print('There are {} unique categories.'.format(len(gary_venues['Venue Category'].unique())))

There are 31 unique categories.


# 3. Analyze Each Neighborhood
### We now want to find out which venue categories are most common in each neighborhood

In [93]:
# one hot encoding
gary_onehot = pd.get_dummies(gary_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
gary_onehot['Neighborhood'] = gary_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [gary_onehot.columns[-1]] + list(gary_onehot.columns[:-1])
gary_onehot = gary_onehot[fixed_columns]

gary_onehot.head()

,Neighborhood,American Restaurant,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare,Discount Store,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Food,Fried Chicken Joint,Gym,Historic Site,Hostel,Ice Cream Shop,Indie Theater,Intersection,Light Rail Station,Mexican Restaurant,Mobile Phone Shop,Park,Pizza Place,Platform,Restaurant,Seafood Restaurant,Theater,Tourist Information Center,Train Station
0,Aetna,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
1,Ambridge Mann,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2,Ambridge Mann,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ambridge Mann,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ambridge Mann,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [94]:
gary_onehot.shape

(49, 32)

In [95]:
gary_grouped = gary_onehot.groupby('Neighborhood').mean().reset_index()
gary_grouped

,Neighborhood,American Restaurant,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare,Discount Store,Electronics Store,Fast Food Restaurant,Fish & Chips Shop,Food,Fried Chicken Joint,Gym,Historic Site,Hostel,Ice Cream Shop,Indie Theater,Intersection,Light Rail Station,Mexican Restaurant,Mobile Phone Shop,Park,Pizza Place,Platform,Restaurant,Seafood Restaurant,Theater,Tourist Information Center,Train Station
0,Aetna,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
1,Ambridge Mann,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Black Oak,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brunswick,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
4,Downtown,0.142857,0.142857,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857
5,Downtown West,0.0,0.125,0.125,0.0,0.125,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125
6,Emerson,0.166667,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667
7,Glen Park,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0
8,Midtown,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Miller Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
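The frequencies in `gary_grouped` are column means of the 0/1 indicators: within a neighborhood, the mean of a category's column is the share of that neighborhood's venues in the category. A small sketch with the first five rows of `gary_venues` reproduces the Aetna and Ambridge Mann values:

```python
import pandas as pd

# First five rows of gary_venues (Aetna and Ambridge Mann).
venues = pd.DataFrame({
    "Neighborhood": ["Aetna"] + ["Ambridge Mann"] * 4,
    "Venue Category": ["Tourist Information Center", "Light Rail Station",
                       "Fast Food Restaurant", "Daycare", "Light Rail Station"],
})

# One indicator column per category; dtype=int keeps 0/1 rather than True/False.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 columns per neighborhood = share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Light Rail Station gets 0.5 for Ambridge Mann (two of its four venues), matching the `gary_grouped` row.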


In [96]:
gary_grouped.shape

(12, 32)

In [97]:
num_top_venues = 5

for hood in gary_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = gary_grouped[gary_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aetna----
                        venue  freq
0  Tourist Information Center   1.0
1         American Restaurant   0.0
2                      Hostel   0.0
3                     Theater   0.0
4          Seafood Restaurant   0.0


----Ambridge Mann----
                  venue  freq
0    Light Rail Station  0.50
1               Daycare  0.25
2  Fast Food Restaurant  0.25
3   American Restaurant  0.00
4        Ice Cream Shop  0.00


----Black Oak----
                        venue  freq
0                        Park   1.0
1         American Restaurant   0.0
2                      Hostel   0.0
3  Tourist Information Center   0.0
4                     Theater   0.0


----Brunswick----
                 venue  freq
0           Restaurant  0.25
1    Electronics Store  0.25
2    Mobile Phone Shop  0.25
3   Mexican Restaurant  0.25
4  American Restaurant  0.00


----Downtown----
                 venue  freq
0  American Restaurant  0.14
1                  Gym  0.14
2             Platform  0.14
3

In [98]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
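When a neighborhood has fewer categories than `num_top_venues`, `return_most_common_venues` fills the remaining slots with zero-frequency categories in arbitrary tie order. This is why, for example, Aetna's 2nd to 10th most common venues are all categories it does not have. A minimal sketch of a variant that only reports categories actually present and pads the rest with `None` (`top_nonzero_venues` is a hypothetical name):

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Variant of return_most_common_venues that ignores zero-frequency categories.

    `row` is one row of gary_grouped: the neighborhood name followed by category shares.
    Returns num_top_venues entries, padded with None when fewer categories are present.
    """
    shares = row.iloc[1:].astype(float)
    present = shares[shares > 0].sort_values(ascending=False, kind="mergesort")
    names = list(present.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))

# Ambridge Mann's non-zero shares from gary_grouped, plus two zero-share categories.
row = pd.Series({"Neighborhood": "Ambridge Mann", "Bus Station": 0.0, "Daycare": 0.25,
                 "Fast Food Restaurant": 0.25, "Gym": 0.0, "Light Rail Station": 0.5})
print(top_nonzero_venues(row, 5))
```

The returned list has a fixed length, so it can still be assigned to a row of `neighborhoods_venues_sorted`.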

In [99]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no suffix listed beyond 'rd', so use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = gary_grouped['Neighborhood']

for ind in np.arange(gary_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(gary_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aetna,Tourist Information Center,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare
1,Ambridge Mann,Light Rail Station,Daycare,Fast Food Restaurant,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service
2,Black Oak,Park,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare
3,Brunswick,Mobile Phone Shop,Mexican Restaurant,Restaurant,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Discount Store,Daycare,Chinese Restaurant,Train Station
4,Downtown,Train Station,Gym,Bank,Baseball Stadium,Bus Station,American Restaurant,Platform,Pizza Place,Fish & Chips Shop,Theater
5,Downtown West,Train Station,Gym,Bank,Baseball Stadium,Platform,Bus Station,Chinese Restaurant,Discount Store,Basketball Court,Business Service
6,Emerson,Train Station,Platform,Baseball Stadium,Bus Station,Food,American Restaurant,Pizza Place,Fish & Chips Shop,Bank,Theater
7,Glen Park,Fried Chicken Joint,Discount Store,Theater,Bank,Pizza Place,Hostel,Business Service,Ice Cream Shop,Fish & Chips Shop,Fast Food Restaurant
8,Midtown,Historic Site,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare,Discount Store
9,Miller Beach,Seafood Restaurant,Mexican Restaurant,Fish & Chips Shop,Train Station,Fried Chicken Joint,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service
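The `indicators` list with `try/except` gives the right suffixes for 1 to 10, but for larger values of `num_top_venues` it would produce labels such as "21th" or "22th". A small sketch of a hypothetical `ordinal()` helper that handles any positive integer:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th are irregular (11th, 12th, 13th)
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns)
```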


# 4. Cluster Neighborhoods

In [100]:
# set number of clusters
kclusters = 5

gary_grouped_clustering = gary_grouped.drop(columns='Neighborhood') # k-means needs only the numeric frequency columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(gary_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 4, 0, 0, 0, 0, 0, 2, 0], dtype=int32)
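The number of clusters (`kclusters = 5`) is fixed here for 12 neighborhoods. One common way to sanity-check that choice is to compare inertia (the elbow method) and silhouette score across a range of k. The minimal sketch below assumes scikit-learn is available (the notebook already imports it). It uses random stand-in data shaped like a frequency table rather than `gary_grouped_clustering`, so its numbers say nothing about Gary:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Stand-in for gary_grouped_clustering: 12 rows of category shares (random, illustration only).
rng = np.random.default_rng(0)
X = rng.random((12, 6))
X = X / X.sum(axis=1, keepdims=True)  # each row sums to 1, like the frequency table

scores = {}
for k in range(2, 7):
    # n_init is set explicitly so results do not depend on the library's default
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = (km.inertia_, silhouette_score(X, km.labels_))

for k, (inertia, sil) in scores.items():
    print(f"k={k}: inertia={inertia:.3f}, silhouette={sil:.3f}")
```

Inertia always drops as k grows, so look for a bend in the curve. A higher silhouette score indicates better-separated clusters. With so few neighborhoods, several clusters of size one (as in the labels above) are expected.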

In [101]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

gary_merged = df_gary

# join the cluster labels and top venues onto df_gary, which holds the latitude/longitude of each neighborhood
gary_merged = gary_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

gary_merged

,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aetna,41.59062,-87.28896,3,Tourist Information Center,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare
1,Ambridge Mann,41.601548,-87.359294,0,Light Rail Station,Daycare,Fast Food Restaurant,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service
2,Black Oak,41.565452,-87.396802,4,Park,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare
3,Brunswick,41.602158,-87.403293,0,Mobile Phone Shop,Mexican Restaurant,Restaurant,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Discount Store,Daycare,Chinese Restaurant,Train Station
4,Downtown,41.6039,-87.33717,0,Train Station,Gym,Bank,Baseball Stadium,Bus Station,American Restaurant,Platform,Pizza Place,Fish & Chips Shop,Theater
5,Downtown West,41.6025,-87.338611,0,Train Station,Gym,Bank,Baseball Stadium,Platform,Bus Station,Chinese Restaurant,Discount Store,Basketball Court,Business Service
6,Emerson,41.602869,-87.332217,0,Train Station,Platform,Baseball Stadium,Bus Station,Food,American Restaurant,Pizza Place,Fish & Chips Shop,Bank,Theater
7,Glen Park,41.553072,-87.336509,0,Fried Chicken Joint,Discount Store,Theater,Bank,Pizza Place,Hostel,Business Service,Ice Cream Shop,Fish & Chips Shop,Fast Food Restaurant
8,Midtown,41.5767,-87.343813,2,Historic Site,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare,Discount Store
9,Miller Beach,41.601,-87.261,0,Seafood Restaurant,Mexican Restaurant,Fish & Chips Shop,Train Station,Fried Chicken Joint,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service
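This join explains why neighborhoods without venues had to be excluded. A neighborhood that is missing from `neighborhoods_venues_sorted` gets `NaN` as its cluster label, the label column turns into floats, and `int(cluster)` in the map cell would fail on that `NaN`. A minimal sketch with Aetna and Black Oak (coordinates and labels from the tables above) plus Pulaski, whose coordinates are left empty here:

```python
import pandas as pd

# Three neighborhoods, but only two have cluster labels (Pulaski returned no venues).
coords = pd.DataFrame({"Neighborhood": ["Aetna", "Black Oak", "Pulaski"],
                       "Latitude": [41.59062, 41.565452, None],
                       "Longitude": [-87.28896, -87.396802, None]})
labels = pd.DataFrame({"Neighborhood": ["Aetna", "Black Oak"],
                       "Cluster Labels": [3, 4]})

merged = coords.join(labels.set_index("Neighborhood"), on="Neighborhood")
print(merged)  # Pulaski gets NaN in 'Cluster Labels', so the column becomes float

# Drop unlabeled rows and restore integer labels before using them as list indices.
merged = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(merged)
```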


In [102]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(gary_merged['Latitude'], gary_merged['Longitude'], gary_merged['Neighborhood'], gary_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
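The palette for the map comes from matplotlib's rainbow colormap. If you want to avoid matplotlib, you can build a palette of k evenly spaced hues with the standard library's `colorsys` module. This is a swapped-in technique, not what the notebook uses, and `cluster_colors` is a hypothetical helper:

```python
import colorsys

def cluster_colors(k):
    """Return k hex colors with evenly spaced hues (a stdlib alternative to cm.rainbow)."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_colors(5)
# Index directly by the cluster label (0 .. k-1), so cluster 0 gets the first color.
print(palette[0], palette[4])
```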

# Conclusion: the most suitable neighborhood for a Yoga studio
We believe that Ambridge Mann is the best fit:
<ul>
  <li>There is a daycare, so parents can drop off their children and attend a Yoga session.</li>
  <li>The Shoreline light rail station makes it easy to reach.</li>
  <li>Downtown and Downtown West, about 2 km to the east, have a gym, a bus station and a baseball stadium, so the area is sport-minded and we could start by giving sessions at the gym. These venues also appear in Ambridge Mann's own top-10 list, but only as ties at zero frequency: Foursquare returned none of them within 500 m of Ambridge Mann.</li>
</ul>



In [104]:
gary_merged.loc[gary_merged['Cluster Labels'] == 0, gary_merged.columns[[0] + list(range(4, gary_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Ambridge Mann,Light Rail Station,Daycare,Fast Food Restaurant,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service
3,Brunswick,Mobile Phone Shop,Mexican Restaurant,Restaurant,Electronics Store,Fish & Chips Shop,Fast Food Restaurant,Discount Store,Daycare,Chinese Restaurant,Train Station
4,Downtown,Train Station,Gym,Bank,Baseball Stadium,Bus Station,American Restaurant,Platform,Pizza Place,Fish & Chips Shop,Theater
5,Downtown West,Train Station,Gym,Bank,Baseball Stadium,Platform,Bus Station,Chinese Restaurant,Discount Store,Basketball Court,Business Service
6,Emerson,Train Station,Platform,Baseball Stadium,Bus Station,Food,American Restaurant,Pizza Place,Fish & Chips Shop,Bank,Theater
7,Glen Park,Fried Chicken Joint,Discount Store,Theater,Bank,Pizza Place,Hostel,Business Service,Ice Cream Shop,Fish & Chips Shop,Fast Food Restaurant
9,Miller Beach,Seafood Restaurant,Mexican Restaurant,Fish & Chips Shop,Train Station,Fried Chicken Joint,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service
10,Tolleston,Basketball Court,Intersection,Train Station,Gym,Bank,Baseball Stadium,Bus Station,Business Service,Chinese Restaurant,Daycare


In [105]:
gary_merged.loc[gary_merged['Cluster Labels'] == 1, gary_merged.columns[[0] + list(range(4, gary_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Westside,Indie Theater,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare


In [106]:
gary_merged.loc[gary_merged['Cluster Labels'] == 2, gary_merged.columns[[0] + list(range(4, gary_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Midtown,Historic Site,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare,Discount Store


In [107]:
gary_merged.loc[gary_merged['Cluster Labels'] == 3, gary_merged.columns[[0] + list(range(4, gary_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aetna,Tourist Information Center,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare


In [108]:
gary_merged.loc[gary_merged['Cluster Labels'] == 4, gary_merged.columns[[0] + list(range(4, gary_merged.shape[1]))]]

,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Black Oak,Park,Train Station,Gym,Bank,Baseball Stadium,Basketball Court,Bus Station,Business Service,Chinese Restaurant,Daycare
