## CAPSTONE PROJECT

### Yoga & Meditation shops in New York City

An entrepreneur wants to open 5 tea stores in New York City that also sell yoga, meditation and alternative-medicine equipment, books and tools. She wants to find out how yoga-related venues are distributed in the area and how well they can be grouped into clusters, each served by one of the 5 stores, ideally within walking distance and with as few river and bridge crossings as possible (to make access easier for cyclists and walkers).

In this notebook we perform the data acquisition, pre-processing, analysis and visualization needed to identify five candidate areas for the stores.

**Import all necessary dependencies**

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


**Use the Foursquare API**

We will use the Foursquare API to obtain a list of yoga studios.

First, we use the geopy library to get New York City's geographical coordinates.

In [1]:
address = 'New York City, NY'

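# recent geopy versions require a user agent; 'ny_explorer' is an arbitrary name chosen here
geolocator = Nominatim(user_agent='ny_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.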

In [3]:
# Putting together the URL for the foursquare api call
CLIENT_ID = 'M1OEMNM3EGHKQ4U3IQZ4WTJWRNTHIGJ4QAH5J4OWWVJYN2ML' # your Foursquare ID
CLIENT_SECRET = 'NWDV3GKRHBDPRSAFZ3QAAUWR5B13HPHHI4DB0TGDKJBIQVYP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
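
The cell above hard-codes the API credentials in the notebook. As a minimal sketch of an alternative, the credentials could be read from environment variables so they never appear in a shared copy. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions of this sketch, not a Foursquare convention.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical; pick any names and export them
    in the shell before starting the notebook server.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        # Fail early with a readable message instead of sending an empty key
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.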


In [4]:
# Radius from the center of New York City 
# and number of venues (LIMIT) to retrieve information on
radius = 5000
LIMIT = 200

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}&query=Yoga Studio'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url


'https://api.foursquare.com/v2/venues/explore?client_id=M1OEMNM3EGHKQ4U3IQZ4WTJWRNTHIGJ4QAH5J4OWWVJYN2ML&client_secret=NWDV3GKRHBDPRSAFZ3QAAUWR5B13HPHHI4DB0TGDKJBIQVYP&ll=40.7308619,-73.9871558&v=20180605&radius=5000&limit=200&query=Yoga Studio'
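
The URL above contains an unencoded space in `query=Yoga Studio`; `requests` escapes it when sending, but the string itself is not a strictly valid URL. A hedged sketch of building the same request from a parameter dictionary with the standard library is shown below. The helper name `build_explore_url` and the placeholder credentials are illustrative only; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, lat, lng,
                      version='20180605', radius=5000, limit=200,
                      query='Yoga Studio'):
    """Build the venues/explore URL with every parameter percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # "lat,lng" as a single parameter
        'v': version,
        'radius': radius,
        'limit': limit,
        'query': query,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# Placeholder credentials; the coordinates are the ones in the URL above
example_url = build_explore_url('YOUR_ID', 'YOUR_SECRET', 40.7308619, -73.9871558)
print(example_url)
```

Here `urlencode` writes the space as `+` and the comma in `ll` as `%2C`, which should decode on the server side to the same values as the hand-formatted string.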

In [5]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5bd2e9a14434b9406e1705d2'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'New York',
  'headerFullLocation': 'New York',
  'headerLocationGranularity': 'city',
  'query': 'yoga studio',
  'totalResults': 216,
  'suggestedBounds': {'ne': {'lat': 40.775861945000045,
    'lng': -73.92788286035417},
   'sw': {'lat': 40.685861854999956, 'lng': -74.04642873964582}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '45367ca4f964a520ab3b1fe3',
       'name': 'Yoga to the People',
       'location': {'address': '12 Saint Marks Pl',
        'crossStreet': 'btwn 2nd & 3rd Ave.',
        'lat': 40.72898670891247,
        'lng': -73.98936467240519,
        ...

In [6]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
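
To illustrate how this helper behaves, here is a small self-contained example on hand-written rows. The venue name and coordinates are taken from the response excerpt above, but the content of `venue.categories` is not visible in that truncated output, so its shape here (a list of dictionaries with a `name` key) is an assumption.

```python
def get_category_type(row):
    """Return the first category name from a raw or flattened venue row."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hypothetical flattened row; the 'venue.categories' payload is an assumed shape
sample_row = {
    'venue.name': 'Yoga to the People',
    'venue.categories': [{'name': 'Yoga Studio'}],
    'venue.location.lat': 40.72898670891247,
    'venue.location.lng': -73.98936467240519,
}
empty_row = {'venue.categories': []}  # a venue without any category

print(get_category_type(sample_row))
print(get_category_type(empty_row))
```

The first call falls through to the `venue.categories` key and returns the category name; the second returns `None` because the list is empty.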

In [7]:
# Clean the JSON and structure it into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
yoga_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
yoga_venues = yoga_venues.loc[:, filtered_columns]

# filter the category for each row
yoga_venues['venue.categories'] = yoga_venues.apply(get_category_type, axis=1)

# clean columns
yoga_venues.columns = [col.split(".")[-1] for col in yoga_venues.columns]

yoga_venues.shape

(100, 4)

In [8]:
yoga_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Yoga to the People | Yoga Studio | 40.728987 | -73.989365 |
| 1 | Yoga Vida | Yoga Studio | 40.733937 | -73.992687 |
| 2 | Modo Yoga NYC | Yoga Studio | 40.734674 | -73.99867 |
| 3 | Sacred Sounds Yoga | Yoga Studio | 40.728638 | -74.000115 |
| 4 | Jivamukti Yoga School NYC | Yoga Studio | 40.734314 | -73.991184 |


**Create a map of New York with Yoga-related venues superimposed on top**

In [9]:
# create map of New York using latitude and longitude values

map_newyork = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map

for lat, lng, name in zip(yoga_venues['lat'], yoga_venues['lng'], yoga_venues['name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
map_newyork

**Clustering**

Run k-means on the venue coordinates (latitude and longitude) to group the yoga venues into 5 clusters, one per planned store.

In [10]:

# set number of clusters
kclusters = 5

# keep only the coordinates; newer pandas versions require the axis as a keyword
yoga_venues_clustering = yoga_venues.drop(['name', 'categories'], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init = 10).fit(yoga_venues_clustering)

# check cluster labels generated for each row in the dataframe

kmeans.labels_[0:10] 

array([2, 0, 0, 2, 0, 2, 0, 2, 0, 2], dtype=int32)
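
For intuition about what `KMeans(...).fit(...)` does with the latitude/longitude pairs, the following is a minimal sketch of the basic k-means (Lloyd's) iteration in plain Python. It is not scikit-learn's implementation, which by default uses k-means++ initialisation and, with `n_init = 10`, keeps the best of several restarts. The function name `kmeans_sketch` and its parameters are illustrative.

```python
import random


def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on (lat, lng) tuples; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center (squared Euclidean)
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        if new_centers == centers:  # nothing moved, so the assignment is stable
            break
        centers = new_centers
    return labels, centers


# Coordinates of the five venues shown by yoga_venues.head() above
sample_points = [
    (40.728987, -73.989365),
    (40.733937, -73.992687),
    (40.734674, -73.99867),
    (40.728638, -74.000115),
    (40.734314, -73.991184),
]
sample_labels, sample_centers = kmeans_sketch(sample_points, k=2)
print(sample_labels, sample_centers)
```

Both this sketch and the notebook's `KMeans` call treat degrees of latitude and longitude as if they were planar coordinates. At New York's latitude a degree of longitude is shorter than a degree of latitude, so east-west separations count somewhat more than they would in metres.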

In [11]:
# add clustering labels
yoga_venues['Cluster Labels'] = kmeans.labels_

In [12]:
yoga_venues.head()

| | name | categories | lat | lng | Cluster Labels |
|---|---|---|---|---|---|
| 0 | Yoga to the People | Yoga Studio | 40.728987 | -73.989365 | 2 |
| 1 | Yoga Vida | Yoga Studio | 40.733937 | -73.992687 | 0 |
| 2 | Modo Yoga NYC | Yoga Studio | 40.734674 | -73.99867 | 0 |
| 3 | Sacred Sounds Yoga | Yoga Studio | 40.728638 | -74.000115 | 2 |
| 4 | Jivamukti Yoga School NYC | Yoga Studio | 40.734314 | -73.991184 | 0 |


Finally, let's visualize the resulting clusters on the map, using one color per cluster.

In [13]:
# create map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map

markers_colors = []
for lat, lon, name, cluster in zip(yoga_venues['lat'], yoga_venues['lng'], yoga_venues['name'],yoga_venues['Cluster Labels']):
    label = folium.Popup(str(name) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
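
The stated goal is for each store to serve its cluster within walking distance, which the map alone does not quantify. As a rough, hedged check, the sketch below places a candidate store at each cluster's mean coordinate and reports the largest great-circle (haversine) distance from that point to any venue in the cluster. The helper names are illustrative, only the five rows shown by `yoga_venues.head()` are used, and straight-line distance ignores the street grid and any river or bridge crossings.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cluster_reach(venues):
    """Map each cluster label to (centroid_lat, centroid_lng, max_km_to_centroid)."""
    groups = defaultdict(list)
    for lat, lng, label in venues:
        groups[label].append((lat, lng))
    reach = {}
    for label, pts in groups.items():
        clat = sum(p[0] for p in pts) / len(pts)
        clng = sum(p[1] for p in pts) / len(pts)
        farthest = max(haversine_km(clat, clng, la, ln) for la, ln in pts)
        reach[label] = (clat, clng, farthest)
    return reach


# (lat, lng, cluster label) for the five rows shown by yoga_venues.head()
venues = [
    (40.728987, -73.989365, 2),
    (40.733937, -73.992687, 0),
    (40.734674, -73.99867, 0),
    (40.728638, -74.000115, 2),
    (40.734314, -73.991184, 0),
]
reach = cluster_reach(venues)
for label, (clat, clng, farthest) in sorted(reach.items()):
    print('Cluster {}: centroid ({:.6f}, {:.6f}), farthest venue {:.2f} km'.format(
        label, clat, clng, farthest))
```

On the full dataframe the same function could be fed with `zip(yoga_venues['lat'], yoga_venues['lng'], yoga_venues['Cluster Labels'])`, and the result compared with whatever walking-distance threshold the entrepreneur considers acceptable.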