<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Explore Financial Firms in Manhattan and the Eateries Around Them</font></h1>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>
2. <a href="#item2">Look for Manhattan in New York City</a>
3. <a href="#item3">Explore financial firms in Manhattan</a>
4. <a href="#item4">Extract eateries around the financial firm</a>
5. <a href="#item5">Explore eatery categories around the financial firm</a>
6. <a href="#item6">Explore the organic eateries around each financial firm</a>
7. <a href="#item7">Cluster the financial firms and find out the locations of organic eateries</a>    
</font>
</div>

Import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (top-level since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Download and Explore Dataset

The New York neighborhoods and locations dataset is freely available at https://geo.nyu.edu/catalog/nyu_2451_34572.

In [None]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

### Load and explore the data

Next, let's load the data.

In [2]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

In [3]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [4]:
neighborhoods_data=newyork_data['features']

Let's take a look at the first item in this list.

In [5]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Transform the data into a *pandas* dataframe

The next task is to transform this data of nested Python dictionaries into a *pandas* dataframe. Let's start by creating an empty dataframe.

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [7]:
neighborhoods

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|


Then let's loop through the data, collect one record per neighborhood, and build the dataframe from those records in a single step.

In [8]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']
    
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the dataframe once from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)

Quickly examine the resulting dataframe.

In [9]:
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [10]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.
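The row-by-row loop above can also be written with pandas' own JSON flattening. Here is a minimal sketch, assuming pandas 1.0 or later (where `json_normalize` is exposed as `pd.json_normalize`); the two-feature list is a hand-trimmed stand-in for `newyork_data['features']` that keeps only the keys the loop reads, and `neighborhoods_alt` is a hypothetical name.

```python
import pandas as pd

# A hand-trimmed stand-in for newyork_data['features'], with the same nesting as the real file
features = [
    {'type': 'Feature',
     'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'type': 'Feature',
     'geometry': {'type': 'Point', 'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

# Flatten the nested dictionaries; nested keys become dotted column names
flat = pd.json_normalize(features)

# List values are kept as-is, so split GeoJSON's [longitude, latitude] pair explicitly
coords = pd.DataFrame(flat['geometry.coordinates'].tolist(), columns=['Longitude', 'Latitude'])

neighborhoods_alt = pd.DataFrame({
    'Borough': flat['properties.borough'],
    'Neighborhood': flat['properties.name'],
    'Latitude': coords['Latitude'],
    'Longitude': coords['Longitude'],
})
print(neighborhoods_alt)
```

Because `json_normalize` does not expand lists such as `coordinates`, the longitude/latitude split still has to be done by hand, in the same order the loop uses.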


#### Use the geopy library to get the latitude and longitude values of New York City.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [11]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Create a map of New York with neighborhoods superimposed on top.

In [12]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Let's slice the original dataframe and create a new dataframe of the Manhattan data.

In [13]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


## 2. Locate Manhattan in New York City and visualize it on the map

Let's get the geographical coordinates of Manhattan.

In [14]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


In [15]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

Next, we will use the Foursquare API to find the financial firms in Manhattan and segment them.

#### Define Foursquare Credentials and Version

In [16]:
# replace the placeholders with your own Foursquare credentials; never publish real keys
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
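Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of reading them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices for this example, not anything Foursquare defines.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping (the process environment by default).

    FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are hypothetical variable names.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret

# Example with an explicit mapping instead of the real environment
cid, secret = load_foursquare_credentials({'FOURSQUARE_CLIENT_ID': 'abc',
                                           'FOURSQUARE_CLIENT_SECRET': 'xyz'})

# Print only a masked form so the secret never ends up in saved notebook output
print('CLIENT_ID: ' + cid)
print('CLIENT_SECRET: ' + '*' * len(secret))
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real environment.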


## 3. Explore Manhattan for financial firms

#### Now let's search for the financial institutions in Manhattan

First, let's create the GET request URL. Name the URL **url**.

In [17]:
search_query='finance,investment'
radius=10000
LIMIT=100
url='https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
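Building the query string by hand with `.format` works here, but it does not percent-encode reserved characters such as the comma in `finance,investment`, or any spaces or ampersands a future query might contain. A sketch using the standard library's `urllib.parse.urlencode`; the helper `build_search_url` is hypothetical and reuses only the parameter names already present in the URL above. If you prefer, `requests.get(base_url, params=...)` performs the same encoding for you.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build a venues/search URL whose parameters are percent-encoded (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

url_example = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                               40.7900869, -73.9598295, '20180605',
                               'finance,investment', 10000, 100)
print(url_example)
```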

Send the GET request and examine the results.

In [18]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cc0f9ea2b274a0039774030'},
 'response': {'venues': [{'id': '4e5465301f6e7ab6b1b28def',
    'name': 'Fidelity Investment',
    'location': {'address': '1356 3rd Ave',
     'lat': 40.77302882997516,
     'lng': -73.95793558189008,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.77302882997516,
       'lng': -73.95793558189008}],
     'distance': 1905,
     'postalCode': '10075',
     'cc': 'US',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['1356 3rd Ave',
      'New York, NY 10075',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d10a951735',
      'name': 'Bank',
      'pluralName': 'Banks',
      'shortName': 'Bank',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/financial_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1556150762',
    'hasPerk': False},
   {'id': '51adc8498bbdb821d3c901b2',
    'name': 'The

Define a function that extracts the category name of a venue from a given row.

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # rows flattened from 'explore' results prefix their keys with 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
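A quick check of `get_category_type` on two made-up rows: one shaped like a flattened `venues/search` result (a `categories` key) and one shaped like a flattened `venues/explore` item (a `venue.categories` key, here with no categories). The values are illustrative only, not real API output.

```python
import pandas as pd

def get_category_type(row):
    """Return the first category name of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']          # flattened search results
    except KeyError:
        categories_list = row['venue.categories']    # flattened explore items
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Two made-up rows shaped like the flattened Foursquare responses
search_row = pd.Series({'name': 'Fidelity Investment',
                        'categories': [{'id': 'x', 'name': 'Bank'}]})
explore_row = pd.Series({'venue.name': 'Some Cafe',
                         'venue.categories': []})

print(get_category_type(search_row))
print(get_category_type(explore_row))
```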

In [20]:
venues = results['response']['venues']   
nearby_venues = json_normalize(venues) # flatten JSON
nearby_venues.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d10a951735', 'name': 'B...",False,4e5465301f6e7ab6b1b28def,1356 3rd Ave,US,New York,United States,,1905,"[1356 3rd Ave, New York, NY 10075, United States]","[{'label': 'display', 'lat': 40.77302882997516...",40.773029,-73.957936,,10075,NY,Fidelity Investment,v-1556150762,
1,"[{'id': '5032764e91d4c4b30a586d5a', 'name': 'C...",False,51adc8498bbdb821d3c901b2,1717 Broadway,US,New York,United States,,3493,"[1717 Broadway, New York, NY 10019, United Sta...","[{'label': 'display', 'lat': 40.7633040105658,...",40.763304,-73.98144,,10019,NY,The Finance Encyclopedia,v-1556150762,
2,"[{'id': '4bf58dd8d48988d124941735', 'name': 'O...",False,507d7707e4b0751354e6c4bf,www.apollospartners.com,US,New York,United States,,700,"[www.apollospartners.com, New York, NY 10025, ...","[{'label': 'display', 'lat': 40.7927105, 'lng'...",40.79271,-73.967388,,10025,NY,Apollos Partners NYC office (Executive Search ...,v-1556150762,
3,"[{'id': '503287a291d4c4b30a586d65', 'name': 'F...",False,5c8657f835f983002c3ee75f,1345 Avenue of the Americas,US,New York,United States,,3425,"[1345 Avenue of the Americas, New York, NY 101...","[{'label': 'display', 'lat': 40.762857, 'lng':...",40.762857,-73.978752,,10105,NY,First Eagle Investment Management,v-1556150762,
4,"[{'id': '4bf58dd8d48988d124941735', 'name': 'O...",False,5c0ecb0767af3a002c4ec4d4,40 W 57th St,US,New York,United States,,3244,"[40 W 57th St, New York, NY 10019, United States]","[{'label': 'display', 'lat': 40.763417, 'lng':...",40.763417,-73.97536,,10019,NY,HPS Investment Partners,v-1556150762,


Now we are ready to clean the JSON and structure it into a *pandas* dataframe.

In [21]:
venues = results['response']['venues']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns (copy so the column assignment below does not act on a view)
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns].copy()

# filter the category for each row
nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

# keep the firms in their own dataframe with readable column names
finfirms = nearby_venues.rename(columns={'name': 'Name', 'categories': 'Category', 'lat': 'Latitude', 'lng': 'Longitude'})
finfirms.head()

|   | Name | Category | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Fidelity Investment | Bank | 40.773029 | -73.957936 |
| 1 | The Finance Encyclopedia | Campaign Office | 40.763304 | -73.98144 |
| 2 | Apollos Partners NYC office (Executive Search ... | Office | 40.79271 | -73.967388 |
| 3 | First Eagle Investment Management | Financial or Legal Service | 40.762857 | -73.978752 |
| 4 | HPS Investment Partners | Office | 40.763417 | -73.97536 |
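To see what `json_normalize` and the column clean-up in In [21] do, here is a sketch on two hand-written venues (made-up values trimmed to a few fields, not real API output). Note how a key present in only one venue, `postalCode`, becomes a column that is NaN for the other.

```python
import pandas as pd

# Two made-up venues shaped like results['response']['venues']
venues = [
    {'name': 'Fidelity Investment',
     'categories': [{'name': 'Bank'}],
     'location': {'lat': 40.773029, 'lng': -73.957936, 'postalCode': '10075'}},
    {'name': 'HPS Investment Partners',
     'categories': [{'name': 'Office'}],
     'location': {'lat': 40.763417, 'lng': -73.97536}},
]

# Nested keys become dotted columns such as 'location.lat' and 'location.lng'
flat = pd.json_normalize(venues)
print(list(flat.columns))

# Keep the same four columns as In [21], take the first category, then drop the 'location.' prefix
df = flat.loc[:, ['name', 'categories', 'location.lat', 'location.lng']].copy()
df['categories'] = df['categories'].apply(lambda cats: cats[0]['name'] if cats else None)
df.columns = [col.split('.')[-1] for col in df.columns]
print(df)
```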


In [22]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

50 venues were returned by Foursquare.


#### Visualize the financial firms on Manhattan's map

In [23]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
for lat, lng, label in zip(finfirms['Latitude'], finfirms['Longitude'], finfirms['Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=False,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

The map above gives a bird's-eye view of where the financial firms (green) are located relative to Manhattan's neighborhoods (blue).

## 4. Extract eateries around the financial firms listed above

##### Let's create a function that repeats the same process for each of the financial firms listed

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=3000, limit=LIMIT):
    
    search_query = 'organic,eatery'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            search_query,
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue;
        # venues without a category get None instead of raising an IndexError
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # flatten the per-firm lists into one list of rows
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Name', 
                  'Firm Latitude', 
                  'Firm Longitude', 
                  'Eatery', 
                  'Eatery Latitude', 
                  'Eatery Longitude', 
                  'Eatery Category']
    
    return(nearby_venues)
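Because `getNearbyVenues` calls the live API, it is hard to check in isolation. The sketch below factors its parsing step into a hypothetical helper, `parse_explore_items`, and runs it on a hand-made list shaped like `response['groups'][0]['items']` (only the fields the function reads; the second venue, "Unlabeled Kiosk", is invented to show the empty-category case). It also shows how the nested list comprehension flattens one list per firm into a single list of rows.

```python
import pandas as pd

def parse_explore_items(firm_name, firm_lat, firm_lng, items):
    """Turn an explore 'items' list into row tuples (hypothetical helper).

    Reads the same fields as getNearbyVenues; a venue without a category gets None.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        rows.append((firm_name, firm_lat, firm_lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name'] if categories else None))
    return rows

# A hand-made stand-in for response['groups'][0]['items'] (not real API output)
mock_items = [
    {'venue': {'name': 'Organic Harvest Cafe',
               'location': {'lat': 40.77796, 'lng': -73.945745},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unlabeled Kiosk',
               'location': {'lat': 40.7701, 'lng': -73.9550},
               'categories': []}},
]

per_firm = [parse_explore_items('Fidelity Investment', 40.773029, -73.957936, mock_items)]

# Flatten the list of per-firm lists, as the nested comprehension in getNearbyVenues does
flat_rows = [row for firm_rows in per_firm for row in firm_rows]
eateries = pd.DataFrame(flat_rows, columns=['Name', 'Firm Latitude', 'Firm Longitude',
                                            'Eatery', 'Eatery Latitude', 'Eatery Longitude',
                                            'Eatery Category'])
print(eateries)
```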

#### Now run the above function for each financial firm and store the result in a new dataframe called *finfirm_eateries*.

In [25]:
# collect the eateries around every financial firm
finfirm_eateries = getNearbyVenues(names=finfirms['Name'],
                                   latitudes=finfirms['Latitude'],
                                   longitudes=finfirms['Longitude'])

Fidelity Investment
The Finance Encyclopedia
Apollos Partners NYC office (Executive Search - Accounting & Finance Professionals)
First Eagle Investment Management
HPS Investment Partners
Citadel Investment Group
Fortress Investment Group
Main & Wall University, The Institute for Modular-Finance
Stone Harbor Investment Partners Lp
US Post Office
Fidelity Investments
JPMorgan Chase & Co. World Headquarters
Schroders Investment Management
Carnegie Investment Counsel
Korea Investment Corp.
a2 investment group
PennantPark Investment Corporation
Tudor Investment Corporation
Decagon Alternative Investment Consultants
Signature Investment Group
Worlwide Investment Network
AC Investment Management
G2 Investment
Midwood Investment & Development
Polygon Investment Partners
Tiedemann Investment Group
Senator Investment Group
plural investment
Robeco Investment Management
Capricorn Investment Group
Falcon Investment Advisors
1919 Investment Counsel
Al Rayyan Tourism Investment Company
Aquamarine In

#### Let's check the size of the resulting dataframe of financial firms and surrounding eateries

In [26]:
print(finfirm_eateries.shape)
finfirm_eateries.head()

(1065, 7)


|   | Name | Firm Latitude | Firm Longitude | Eatery | Eatery Latitude | Eatery Longitude | Eatery Category |
|---|---|---|---|---|---|---|---|
| 0 | Fidelity Investment | 40.773029 | -73.957936 | Ateaz Organic Coffee & Tea | 40.775124 | -73.953813 | Café |
| 1 | Fidelity Investment | 40.773029 | -73.957936 | Duke Eatery | 40.756797 | -73.975107 | Food & Drink Shop |
| 2 | Fidelity Investment | 40.773029 | -73.957936 | Cock & Bull British Pub and Eatery | 40.755946 | -73.980621 | Pub |
| 3 | Fidelity Investment | 40.773029 | -73.957936 | KA Organic Handmade Shoes | 40.774416 | -73.955096 | Shoe Store |
| 4 | Fidelity Investment | 40.773029 | -73.957936 | Julia's Organic Skincare | 40.769118 | -73.95479 | Health & Beauty Service |


#### Let's filter the organic eateries from the above dataframe and name it *finfirm_org_eateries*

In [27]:
# keep the venues whose name mentions "Organic"
finfirm_org_eateries = finfirm_eateries[finfirm_eateries['Eatery'].str.contains('Organic')].reset_index(drop=True)

# drop non-food businesses that also use "Organic" in their names
non_eateries = ['Spa', 'Dry Cleaners', 'Dry Cleaning', 'Skincare', 'Hair', 'Healing', 'Shoes']
finfirm_org_eateries = finfirm_org_eateries[~finfirm_org_eateries['Eatery'].str.contains('|'.join(non_eateries))]
finfirm_org_eateries.head()

|   | Name | Firm Latitude | Firm Longitude | Eatery | Eatery Latitude | Eatery Longitude | Eatery Category |
|---|---|---|---|---|---|---|---|
| 0 | Fidelity Investment | 40.773029 | -73.957936 | Ateaz Organic Coffee & Tea | 40.775124 | -73.953813 | Café |
| 3 | Fidelity Investment | 40.773029 | -73.957936 | Organic Harvest Cafe | 40.77796 | -73.945745 | Café |
| 4 | Fidelity Investment | 40.773029 | -73.957936 | Healthy Organic Deli | 40.781418 | -73.94616 | Deli / Bodega |
| 5 | Fidelity Investment | 40.773029 | -73.957936 | Organic Fruit Shakes & Smoothies | 40.786135 | -73.950812 | Food Truck |
| 6 | The Finance Encyclopedia | 40.763304 | -73.98144 | Bean & Bean Organic Coffee | 40.747189 | -73.9971 | Coffee Shop |
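`Series.str.contains` is case-sensitive and treats its pattern as a regular expression by default, so the exclusion filters can be expressed as a single alternation. Also, `reset_index` returns a new object unless `inplace=True`, so its result has to be assigned to take effect. A small sketch on made-up names (the lowercase `organic juice bar` is invented to show the case issue):

```python
import pandas as pd

names = pd.Series(['Ateaz Organic Coffee & Tea', "Julia's Organic Skincare",
                   'KA Organic Handmade Shoes', 'organic juice bar', 'Duke Eatery'])

# Case-sensitive, like In [27]: the lowercase 'organic juice bar' is missed
print(names[names.str.contains('Organic')].tolist())

# Case-insensitive match, then drop non-food businesses with one regex alternation
is_organic = names.str.contains('organic', case=False)
non_food = names.str.contains('Skincare|Shoes|Spa|Hair|Healing|Dry Clean')
organic_food = names[is_organic & ~non_food].reset_index(drop=True)  # assigned, so the reset sticks
print(organic_food.tolist())
```

Substring patterns are blunt: `'Spa'` would also drop a name such as "Spanish Organic Kitchen", so it is worth inspecting the excluded rows before trusting the filter.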


#### Let's check how many (firm, organic eatery) pairs remain after filtering

In [28]:
finfirm_org_eateries.shape

(310, 7)

#### Now, let's check how many eateries were returned around each financial firm

In [29]:
finfirm_eateries.groupby('Name').count()

| Name | Firm Latitude | Firm Longitude | Eatery | Eatery Latitude | Eatery Longitude | Eatery Category |
|---|---|---|---|---|---|---|
| 1919 Investment Counsel | 23 | 23 | 23 | 23 | 23 | 23 |
| AC Investment Management | 21 | 21 | 21 | 21 | 21 | 21 |
| Al Rayyan Tourism Investment Company | 23 | 23 | 23 | 23 | 23 | 23 |
| Apollos Partners NYC office (Executive Search - Accounting & Finance Professionals) | 10 | 10 | 10 | 10 | 10 | 10 |
| Aquamarine Investment Partners | 23 | 23 | 23 | 23 | 23 | 23 |
| Bardin Hill Investment Partners LP | 23 | 23 | 23 | 23 | 23 | 23 |
| Bowey Investment Management, LLC | 23 | 23 | 23 | 23 | 23 | 23 |
| Candlewood Investment Group | 25 | 25 | 25 | 25 | 25 | 25 |
| Capricorn Investment Group | 22 | 22 | 22 | 22 | 22 | 22 |
| Carnegie Investment Counsel | 19 | 19 | 19 | 19 | 19 | 19 |
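Every column in the `count()` output above repeats the same number, because `count()` reports the non-null values in each column separately. `groupby(...).size()` returns a single row count per group instead (it would differ from `count()` only if some values were missing). A sketch on a made-up two-firm dataframe:

```python
import pandas as pd

pairs = pd.DataFrame({'Name': ['Firm A', 'Firm A', 'Firm B'],
                      'Eatery': ['Organic Harvest Cafe', 'Healthy Organic Deli', 'Chelsea Organic']})

# One number per firm instead of one (identical) count per column
counts = pairs.groupby('Name').size().sort_values(ascending=False)
print(counts)
```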


## 5. Explore different categories of eateries around the firms

#### Let's find out how many unique categories can be curated from all the returned eateries

In [30]:
print('There are {} unique categories.'.format(len(finfirm_eateries['Eatery Category'].unique())))

There are 29 unique categories.


#### Analyze the types of eateries in each financial firm's vicinity

In [33]:
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.x would otherwise return booleans)
fineats_onehot = pd.get_dummies(finfirm_eateries[['Eatery Category']], prefix="", prefix_sep="", dtype=int)

# add the firm name column back to the dataframe
fineats_onehot['Name'] = finfirm_eateries['Name'] 
# move the firm name column to the front
fixed_columns = [fineats_onehot.columns[-1]] + list(fineats_onehot.columns[:-1])
fineats_onehot = fineats_onehot[fixed_columns]

print(fineats_onehot.shape)
fineats_onehot.head()

(1065, 30)


Unnamed: 0,Name,American Restaurant,BBQ Joint,Breakfast Spot,Burger Joint,Business Service,Café,Cocktail Bar,Coffee Shop,Deli / Bodega,Dry Cleaner,Food & Drink Shop,Food Truck,Fruit & Vegetable Store,Health & Beauty Service,Intersection,Juice Bar,Korean Restaurant,Lounge,Market,Massage Studio,Mediterranean Restaurant,New American Restaurant,Organic Grocery,Pub,Restaurant,Salad Place,Shoe Store,Spa,Vegetarian / Vegan Restaurant
0,Fidelity Investment,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
4,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
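A small illustration of the one-hot step on made-up data. It assumes pandas 2.x, where `get_dummies` returns booleans by default, so `dtype=int` is passed to keep the 0/1 display seen above; `DataFrame.insert` is shown as an alternative way to put `Name` first instead of rebuilding the column list.

```python
import pandas as pd

eateries = pd.DataFrame({'Name': ['Firm A', 'Firm A', 'Firm B'],
                         'Eatery Category': ['Café', 'Pub', 'Café']})

# One column per category; prefix='' and prefix_sep='' keep the bare category names
onehot = pd.get_dummies(eateries[['Eatery Category']], prefix='', prefix_sep='', dtype=int)

# Put the firm name in the first position (rows are aligned on the index)
onehot.insert(0, 'Name', eateries['Name'])
print(onehot)
```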


Drop the non-eatery categories, since they are irrelevant to this study.

In [34]:
# drop the categories that are not eateries
fineats_onehot.drop(columns=['Intersection', 'Dry Cleaner', 'Shoe Store', 'Massage Studio', 'Spa'], inplace=True)
print(fineats_onehot.shape)
fineats_onehot.head()

(1065, 25)


Unnamed: 0,Name,American Restaurant,BBQ Joint,Breakfast Spot,Burger Joint,Business Service,Café,Cocktail Bar,Coffee Shop,Deli / Bodega,Food & Drink Shop,Food Truck,Fruit & Vegetable Store,Health & Beauty Service,Juice Bar,Korean Restaurant,Lounge,Market,Mediterranean Restaurant,New American Restaurant,Organic Grocery,Pub,Restaurant,Salad Place,Vegetarian / Vegan Restaurant
0,Fidelity Investment,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Fidelity Investment,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


The dataframe now has 25 columns: the firm name plus 24 eatery categories.

#### Next, let's group the rows by financial firm and take the mean of each category column, which gives the frequency of each eatery category around that firm

In [35]:
fineats_grouped = fineats_onehot.groupby('Name').mean().reset_index()
fineats_grouped.head()

Unnamed: 0,Name,American Restaurant,BBQ Joint,Breakfast Spot,Burger Joint,Business Service,Café,Cocktail Bar,Coffee Shop,Deli / Bodega,Food & Drink Shop,Food Truck,Fruit & Vegetable Store,Health & Beauty Service,Juice Bar,Korean Restaurant,Lounge,Market,Mediterranean Restaurant,New American Restaurant,Organic Grocery,Pub,Restaurant,Salad Place,Vegetarian / Vegan Restaurant
0,1919 Investment Counsel,0.043478,0.0,0.0,0.043478,0.0,0.130435,0.0,0.043478,0.086957,0.043478,0.043478,0.0,0.043478,0.0,0.043478,0.0,0.043478,0.086957,0.043478,0.0,0.043478,0.086957,0.086957,0.0
1,AC Investment Management,0.047619,0.0,0.0,0.047619,0.0,0.142857,0.0,0.047619,0.095238,0.047619,0.047619,0.0,0.047619,0.0,0.047619,0.0,0.0,0.095238,0.047619,0.0,0.047619,0.095238,0.047619,0.0
2,Al Rayyan Tourism Investment Company,0.043478,0.0,0.0,0.043478,0.0,0.130435,0.0,0.043478,0.086957,0.043478,0.043478,0.0,0.043478,0.0,0.043478,0.0,0.043478,0.086957,0.043478,0.0,0.043478,0.086957,0.086957,0.0
3,Apollos Partners NYC office (Executive Search ...,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.2,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aquamarine Investment Partners,0.086957,0.0,0.043478,0.043478,0.0,0.086957,0.0,0.043478,0.173913,0.043478,0.043478,0.0,0.043478,0.0,0.043478,0.0,0.0,0.086957,0.043478,0.0,0.043478,0.043478,0.043478,0.0
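Why the mean gives a frequency: each one-hot column is 1 when an eatery belongs to that category and 0 otherwise, so its mean within a firm equals (eateries of that category near the firm) / (all eateries returned near the firm). A worked example on made-up data:

```python
import pandas as pd

onehot = pd.DataFrame({'Name': ['Firm A', 'Firm A', 'Firm A', 'Firm B'],
                       'Café': [1, 1, 0, 0],
                       'Pub':  [0, 0, 1, 1]})

# Mean of a 0/1 column = share of that firm's eateries that fall in the category
freq = onehot.groupby('Name').mean().reset_index()
print(freq)
```

Firm A has three eateries, two of them cafés, so its `Café` value is 2/3 (about 0.67), and each firm's frequencies sum to 1.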


#### Let's confirm the new size

In [36]:
print(fineats_grouped.shape)


(49, 25)


#### Let's print each firm along with the top 5 most common eateries

In [37]:
num_top_venues = 5

for firm in fineats_grouped['Name']:
    print("----"+firm+"----")
    temp = fineats_grouped[fineats_grouped['Name'] == firm].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1919 Investment Counsel----
                      venue  freq
0                      Café  0.13
1               Salad Place  0.09
2                Restaurant  0.09
3  Mediterranean Restaurant  0.09
4             Deli / Bodega  0.09


----AC Investment Management----
                      venue  freq
0                      Café  0.14
1                Restaurant  0.10
2  Mediterranean Restaurant  0.10
3             Deli / Bodega  0.10
4       American Restaurant  0.05


----Al Rayyan Tourism Investment Company----
                      venue  freq
0                      Café  0.13
1               Salad Place  0.09
2                Restaurant  0.09
3  Mediterranean Restaurant  0.09
4             Deli / Bodega  0.09


----Apollos Partners NYC office (Executive Search - Accounting & Finance Professionals)----
                     venue  freq
0               Food Truck   0.2
1                     Café   0.1
2            Deli / Bodega   0.1
3  Fruit & Vegetable Store   0.1
4              

#### Let's put that into a *pandas* dataframe

First, let's write a function that sorts a firm's eatery categories by frequency, in descending order, and returns the top ones.

In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 eatery categories for each financial firm.

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Eatery'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Eatery'.format(ind+1))

# create a new dataframe
finfirm_eateries_sorted = pd.DataFrame(columns=columns)
finfirm_eateries_sorted['Name'] = fineats_grouped['Name']

for ind in np.arange(fineats_grouped.shape[0]):
    finfirm_eateries_sorted.iloc[ind, 1:] = return_most_common_venues(fineats_grouped.iloc[ind, :], num_top_venues)

finfirm_eateries_sorted.head()

Unnamed: 0,Name,1st Most Common Eatery,2nd Most Common Eatery,3rd Most Common Eatery,4th Most Common Eatery,5th Most Common Eatery,6th Most Common Eatery,7th Most Common Eatery,8th Most Common Eatery,9th Most Common Eatery,10th Most Common Eatery
0,1919 Investment Counsel,Café,Restaurant,Deli / Bodega,Mediterranean Restaurant,Salad Place,American Restaurant,Health & Beauty Service,Coffee Shop,Food & Drink Shop,Food Truck
1,AC Investment Management,Café,Restaurant,Deli / Bodega,Mediterranean Restaurant,American Restaurant,Health & Beauty Service,Food & Drink Shop,Food Truck,Salad Place,Burger Joint
2,Al Rayyan Tourism Investment Company,Café,Restaurant,Deli / Bodega,Mediterranean Restaurant,Salad Place,American Restaurant,Health & Beauty Service,Coffee Shop,Food & Drink Shop,Food Truck
3,Apollos Partners NYC office (Executive Search ...,Food Truck,Fruit & Vegetable Store,Café,Market,Deli / Bodega,BBQ Joint,Breakfast Spot,Burger Joint,Business Service,Cocktail Bar
4,Aquamarine Investment Partners,Deli / Bodega,American Restaurant,Mediterranean Restaurant,Café,Korean Restaurant,Food & Drink Shop,Food Truck,Salad Place,Health & Beauty Service,Burger Joint
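The `indicators` list in In [39] yields correct suffixes up to 20 (11, 12 and 13 correctly fall through to "th"), but it would produce "21th" if `num_top_venues` were raised past 20. A sketch of a general ordinal helper (the name `ordinal` is hypothetical), together with `return_most_common_venues` applied to one made-up frequency row:

```python
import pandas as pd

def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ..., '11th', '12th', '13th', '21st', ..."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def return_most_common_venues(row, num_top_venues):
    # skip the 'Name' entry, then rank the categories by frequency
    row_categories = row.iloc[1:]
    return row_categories.sort_values(ascending=False).index.values[0:num_top_venues]

row = pd.Series({'Name': 'Firm A', 'Café': 0.5, 'Pub': 0.2, 'Deli / Bodega': 0.3})
top = return_most_common_venues(row, 2)
columns = ['Name'] + ['{} Most Common Eatery'.format(ordinal(i + 1)) for i in range(2)]
print(columns)
print(list(top))
```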


## 6. Explore the organic eateries around the firms

### Next, let's also analyze the different organic eateries around each financial firm.

This mirrors what we did for the eatery categories, except that the one-hot columns are now the individual organic eatery names.

In [40]:
# one hot encoding (dtype=int keeps 0/1 values; pandas 2.x would otherwise return booleans)
fineats_org_onehot = pd.get_dummies(finfirm_org_eateries[['Eatery']], prefix="", prefix_sep="", dtype=int)

# add the firm name column back to the dataframe
fineats_org_onehot['Name'] = finfirm_org_eateries['Name'] 
# move the firm name column to the front
fixed_columns = [fineats_org_onehot.columns[-1]] + list(fineats_org_onehot.columns[:-1])

fineats_org_onehot = fineats_org_onehot[fixed_columns]
print(fineats_org_onehot.shape)
fineats_org_onehot.head()

(310, 22)


Unnamed: 0,Name,Ateaz Organic Coffee & Tea,Bean & Bean Organic Coffee,Central Park Organic Deli Grocery,Chelsea Organic,Creative Organic Foods Store,East Village Organic,Evergreen Organic Foods & Deli,Go Green Natural Juice Organic,Healthy Organic Deli,Jahlookova Natural Organic Health Mart,NYC Healthy Bites - Organic Hot Dogs,Olives Organic Market,Organic Cold Pressed Juices & Smoothies,Organic Fruit Shakes & Smoothies,Organic Harvest Cafe,Organic Mexican Food Truck,Organic Rug Cleaners,Smokey Burger Organic,Sol Tan Organic,The Organic Grill,The Organic Salad
0,Fidelity Investment,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
4,Fidelity Investment,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
5,Fidelity Investment,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
6,The Finance Encyclopedia,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group the rows by financial firm and take the mean of each column, which gives the frequency of each organic eatery around that firm

In [41]:
orgeats_grouped = fineats_org_onehot.groupby('Name').mean().reset_index()
orgeats_grouped.head()

Unnamed: 0,Name,Ateaz Organic Coffee & Tea,Bean & Bean Organic Coffee,Central Park Organic Deli Grocery,Chelsea Organic,Creative Organic Foods Store,East Village Organic,Evergreen Organic Foods & Deli,Go Green Natural Juice Organic,Healthy Organic Deli,Jahlookova Natural Organic Health Mart,NYC Healthy Bites - Organic Hot Dogs,Olives Organic Market,Organic Cold Pressed Juices & Smoothies,Organic Fruit Shakes & Smoothies,Organic Harvest Cafe,Organic Mexican Food Truck,Organic Rug Cleaners,Smokey Burger Organic,Sol Tan Organic,The Organic Grill,The Organic Salad
0,1919 Investment Counsel,0.166667,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.166667
1,AC Investment Management,0.166667,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.166667
2,Al Rayyan Tourism Investment Company,0.166667,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.166667
3,Apollos Partners NYC office (Executive Search ...,0.166667,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Aquamarine Investment Partners,0.142857,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.142857
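A minimal sketch on made-up data (the firm and eatery names are hypothetical) of what `groupby('Name').mean()` computes: each 0/1 column is averaged per firm, so the value is the share of that firm's nearby organic eateries carrying that name, and each firm's row sums to 1. This is why a firm with six distinct nearby organic eateries, such as 1919 Investment Counsel, shows 0.166667 (1/6) in six columns.

```python
import pandas as pd

# hypothetical one-hot table: one row per (firm, nearby organic eatery) pair
onehot = pd.DataFrame({
    'Name': ['Firm A', 'Firm A', 'Firm A', 'Firm B'],
    'Organic Cafe': [1, 0, 0, 1],
    'Juice Bar': [0, 1, 1, 0],
})

# average each 0/1 column per firm: the share of that firm's eateries with this name
grouped = onehot.groupby('Name').mean().reset_index()
print(grouped)
```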


Let's confirm the new size: there is now one row per financial firm.

In [42]:
print(orgeats_grouped.shape)

(49, 22)


#### Let's print each firm along with the top 5 most common organic eateries

In [43]:
num_top_venues = 5

for firm in orgeats_grouped['Name']:
    print("----" + firm + "----")
    # turn the firm's row into a two-column (venue, freq) table
    temp = orgeats_grouped[orgeats_grouped['Name'] == firm].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]  # drop the 'Name' row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1919 Investment Counsel----
                        venue  freq
0  Ateaz Organic Coffee & Tea  0.17
1  Bean & Bean Organic Coffee  0.17
2       Smokey Burger Organic  0.17
3  Organic Mexican Food Truck  0.17
4           The Organic Salad  0.17


----AC Investment Management----
                        venue  freq
0  Ateaz Organic Coffee & Tea  0.17
1  Bean & Bean Organic Coffee  0.17
2       Smokey Burger Organic  0.17
3  Organic Mexican Food Truck  0.17
4           The Organic Salad  0.17


----Al Rayyan Tourism Investment Company----
                        venue  freq
0  Ateaz Organic Coffee & Tea  0.17
1  Bean & Bean Organic Coffee  0.17
2       Smokey Burger Organic  0.17
3  Organic Mexican Food Truck  0.17
4           The Organic Salad  0.17


----Apollos Partners NYC office (Executive Search - Accounting & Finance Professionals)----
                                    venue  freq
0              Ateaz Organic Coffee & Tea  0.17
1            Creative Organic Foods Store  0.17


#### Let's put that into a *pandas* dataframe

Using the `return_most_common_venues` function defined earlier to sort the eateries, let's create a dataframe that displays the *10 most common organic eateries* for each financial firm.

In [44]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Org Eatery'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... use the 'th' suffix
        columns.append('{}th Most Common Org Eatery'.format(ind+1))

# create a new dataframe
org_eateries_sorted = pd.DataFrame(columns=columns)
org_eateries_sorted['Name'] = orgeats_grouped['Name']

# fill each row with that firm's organic eateries, most frequent first
for ind in np.arange(orgeats_grouped.shape[0]):
    org_eateries_sorted.iloc[ind, 1:] = return_most_common_venues(orgeats_grouped.iloc[ind, :], num_top_venues)

org_eateries_sorted.head()

Unnamed: 0,Name,1st Most Common Org Eatery,2nd Most Common Org Eatery,3rd Most Common Org Eatery,4th Most Common Org Eatery,5th Most Common Org Eatery,6th Most Common Org Eatery,7th Most Common Org Eatery,8th Most Common Org Eatery,9th Most Common Org Eatery,10th Most Common Org Eatery
0,1919 Investment Counsel,The Organic Salad,Organic Mexican Food Truck,Bean & Bean Organic Coffee,Chelsea Organic,Ateaz Organic Coffee & Tea,Smokey Burger Organic,Organic Rug Cleaners,Organic Harvest Cafe,Organic Fruit Shakes & Smoothies,Organic Cold Pressed Juices & Smoothies
1,AC Investment Management,The Organic Salad,Organic Mexican Food Truck,Bean & Bean Organic Coffee,Chelsea Organic,Ateaz Organic Coffee & Tea,Smokey Burger Organic,Organic Rug Cleaners,Organic Harvest Cafe,Organic Fruit Shakes & Smoothies,Organic Cold Pressed Juices & Smoothies
2,Al Rayyan Tourism Investment Company,The Organic Salad,Organic Mexican Food Truck,Bean & Bean Organic Coffee,Chelsea Organic,Ateaz Organic Coffee & Tea,Smokey Burger Organic,Organic Rug Cleaners,Organic Harvest Cafe,Organic Fruit Shakes & Smoothies,Organic Cold Pressed Juices & Smoothies
3,Apollos Partners NYC office (Executive Search ...,NYC Healthy Bites - Organic Hot Dogs,Organic Fruit Shakes & Smoothies,Central Park Organic Deli Grocery,Creative Organic Foods Store,Jahlookova Natural Organic Health Mart,Ateaz Organic Coffee & Tea,Organic Harvest Cafe,Organic Mexican Food Truck,Organic Cold Pressed Juices & Smoothies,Olives Organic Market
4,Aquamarine Investment Partners,The Organic Salad,Organic Mexican Food Truck,Bean & Bean Organic Coffee,Chelsea Organic,Olives Organic Market,Ateaz Organic Coffee & Tea,Smokey Burger Organic,Organic Harvest Cafe,Organic Fruit Shakes & Smoothies,Organic Cold Pressed Juices & Smoothies
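In the grouped table above, 1919 Investment Counsel has only six organic eateries with a non-zero frequency, so positions 7 to 10 in its row (Organic Rug Cleaners, Organic Harvest Cafe, and so on) are eateries with frequency 0, ranked arbitrarily among themselves. A minimal sketch of one way to avoid that, using a hypothetical helper `top_nonzero_venues` (not part of the original notebook) that keeps only positive frequencies and pads the rest with empty strings so every row still has `num_top_venues` entries:

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Hypothetical helper: return up to num_top_venues venue names with
    frequency > 0, most frequent first, padded with '' to a fixed length."""
    freqs = row.drop('Name').astype(float)  # skip the firm name entry
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind='mergesort')
    names = list(freqs.index[:num_top_venues])
    # pad so the result always fills num_top_venues columns
    return names + [''] * (num_top_venues - len(names))

# hypothetical row shaped like one row of the grouped table
row = pd.Series({'Name': 'Firm A', 'Organic Cafe': 0.5, 'Juice Bar': 0.25,
                 'Deli': 0.25, 'Rug Cleaner': 0.0})
print(top_nonzero_venues(row, 5))
```

Under this assumption it could be used in place of `return_most_common_venues` in the loop above, since it also returns exactly `num_top_venues` values per row.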


## 7. Cluster the financial firms and locate the organic eateries

Run *k*-means to cluster the financial firms into 5 clusters, based on the frequencies of the eatery categories around each firm (`fineats_grouped`).

In [45]:
# set number of clusters
kclusters = 5

# cluster on the eatery-category frequencies only, so drop the firm name
fineats_grouped_clustering = fineats_grouped.drop(columns='Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(fineats_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 2, 3, 0, 0, 0, 0, 2, 2], dtype=int32)
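A minimal sketch on made-up frequency profiles (rows are hypothetical firms, columns are hypothetical eatery categories) of what k-means does with these vectors: each firm is assigned to its nearest cluster centroid, so firms with identical or very similar eatery profiles share a label. `n_init=10` is set explicitly here as an assumption, so the sketch behaves the same regardless of the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical frequency profiles: rows are firms, columns are eatery categories
X = np.array([
    [0.5, 0.5, 0.0],  # firm 1
    [0.5, 0.5, 0.0],  # firm 2: same profile as firm 1
    [0.0, 0.1, 0.9],  # firm 3
    [0.0, 0.0, 1.0],  # firm 4: close to firm 3
])

kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
print(kmeans.labels_)  # firms 1 and 2 share a label, as do firms 3 and 4
```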

Let's create a new dataframe that includes the cluster as well as the top 10 eatery categories in each firm's neighborhood.

In [46]:
# add clustering labels
finfirm_eateries_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# attach each firm's cluster label and top eatery categories to its location
fineats_merged = finfirms.join(finfirm_eateries_sorted.set_index('Name'), on='Name')

fineats_merged.head()  # check the last columns!

Unnamed: 0,Name,Category,Latitude,Longitude,Cluster Labels,1st Most Common Eatery,2nd Most Common Eatery,3rd Most Common Eatery,4th Most Common Eatery,5th Most Common Eatery,6th Most Common Eatery,7th Most Common Eatery,8th Most Common Eatery,9th Most Common Eatery,10th Most Common Eatery
0,Fidelity Investment,Bank,40.773029,-73.957936,2,Café,Food Truck,Pub,Coffee Shop,Deli / Bodega,Health & Beauty Service,Food & Drink Shop,Vegetarian / Vegan Restaurant,BBQ Joint,Breakfast Spot
1,The Finance Encyclopedia,Campaign Office,40.763304,-73.98144,2,Café,Restaurant,Deli / Bodega,Mediterranean Restaurant,Salad Place,American Restaurant,Health & Beauty Service,Food & Drink Shop,Food Truck,Burger Joint
2,Apollos Partners NYC office (Executive Search ...,Office,40.79271,-73.967388,3,Food Truck,Fruit & Vegetable Store,Café,Market,Deli / Bodega,BBQ Joint,Breakfast Spot,Burger Joint,Business Service,Cocktail Bar
3,First Eagle Investment Management,Financial or Legal Service,40.762857,-73.978752,2,Café,Restaurant,Deli / Bodega,Mediterranean Restaurant,American Restaurant,Health & Beauty Service,Food & Drink Shop,Food Truck,Salad Place,Burger Joint
4,HPS Investment Partners,Office,40.763417,-73.97536,0,Deli / Bodega,Café,American Restaurant,Mediterranean Restaurant,Korean Restaurant,Food & Drink Shop,Food Truck,Salad Place,Health & Beauty Service,Coffee Shop
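The `join` above is a left join on `Name`: if any firm in `finfirms` had no eateries returned, it would get `NaN` in `Cluster Labels`, the column would become float, and using the labels as list indices for the map colors would fail. A minimal sketch on made-up data (firm names and coordinates are hypothetical) of that effect and of one way to guard against it before mapping:

```python
import pandas as pd

# hypothetical firm locations
firms = pd.DataFrame({'Name': ['Firm A', 'Firm B', 'Firm C'],
                      'Latitude': [40.76, 40.79, 40.70],
                      'Longitude': [-73.98, -73.97, -74.01]})
# hypothetical per-firm summary; Firm C had no eateries, so it is missing here
summary = pd.DataFrame({'Name': ['Firm A', 'Firm B'], 'Cluster Labels': [2, 0]})

merged = firms.join(summary.set_index('Name'), on='Name')
print(merged['Cluster Labels'].dtype)  # float64, because Firm C has no label (NaN)

# drop firms without a label and restore integer labels
merged = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(merged)
```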


#### Finally, let's visualize the resulting clusters

The map below shows the financial firms, colored by cluster, along with the organic eateries around them (smaller brown markers).

In [47]:
# create map centered on Manhattan
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add a marker for each financial firm, colored by its cluster
for lat, lon, poi, cluster in zip(fineats_merged['Latitude'], fineats_merged['Longitude'], fineats_merged['Name'], fineats_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

# locate the organic eateries on the map as smaller brown markers
for lat, lon, eat in zip(finfirm_org_eateries['Eatery Latitude'], finfirm_org_eateries['Eatery Longitude'], finfirm_org_eateries['Eatery']):
    label = folium.Popup(str(eat), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='brown',
        fill=True,
        fill_color='brown',
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

<a id='item5'></a>