## Introduction

A client is moving from Paris to Brooklyn, New York for work and his wife wants to move the Coffee Shop she has now in Paris to Brooklyn. They ask me where in the Borough I would recommend to open it.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Brookly</a>

3. <a href="#item3">Conclusion</a>    
</font>
</div>

Download all the dependencies needed

In [1]:
import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

from geopy.geocoders import Nominatim 

import requests 
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium 

## 1. Download and Explore Dataset

Dataset that contains boroughs and neighborhoods in New York as well as the the latitude and logitude coordinates of each neighborhood: 
https://geo.nyu.edu/catalog/nyu_2451_34572

I use the downloaded file placed on the server, so I run a `wget` command and access the data. 

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load data

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Define a new variable that includes the Features data.

In [4]:
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Tranform the data into a *pandas* dataframe

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Fill the dataframe one row at a time with a loop.

In [6]:
for data in neighborhoods_data:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

In [7]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


#### Create a new dataframe of Brooklyn data.

In [8]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471


In [9]:
brooklyn_data.shape

(70, 4)

#### Get the latitude and longitud of Brooklyn and create a map with the neighborhoods superimposed on top

In [10]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Brooklyn are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Brooklyn are 40.6501038, -73.9495823.


In [11]:
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

## 2. Explore Neighborhoods in Brooklyn

Define Foursquare Credentials and Version

In [32]:
CLIENT_ID = 'NA3LVSVRKA340AEGF5LJWUFT4ZLS3KDUWSFV3M20JQFMJKKM' # your Foursquare ID
CLIENT_SECRET = 'ABM2AZ5SFVGWNYOJLYNFZX55JLBHWQOG0KSSQWOABNFOHHN5' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: NA3LVSVRKA340AEGF5LJWUFT4ZLS3KDUWSFV3M20JQFMJKKM
CLIENT_SECRET:ABM2AZ5SFVGWNYOJLYNFZX55JLBHWQOG0KSSQWOABNFOHHN5


#### Function to get venues in all the neighborhoods of Brooklyn

In [36]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Run the above function

In [37]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )


Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heig

In [41]:
print(brooklyn_venues.shape)
brooklyn_venues.head()

(5626, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
1,Bay Ridge,40.625801,-74.030621,Cocoa Grinder,40.623967,-74.030863,Juice Bar
2,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Bagel Shop
3,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Breakfast Spot
4,Bay Ridge,40.625801,-74.030621,Ho' Brah Taco Joint,40.62296,-74.031371,Taco Place


Get a new dataframe that includes only CoffeShops and Cafes in Brooklyn

In [72]:
brooklyn_coffeeshops=brooklyn_venues.loc[brooklyn_venues['Venue Category'] == 'Coffee Shop']
brooklyn_cafe=brooklyn_venues.loc[brooklyn_venues['Venue Category'] == 'Café']
frames = [brooklyn_coffeeshops, brooklyn_cafe]
brooklyn_coffee = pd.concat(frames)
brooklyn_coffee.shape

(262, 7)

I consider that Starbucks won´t be a competitor for the new Coffee Shop because they produce American coffee that nothing has to do with French coffee. So, I drop Starbucks from the Coffee dataframe

In [77]:
starbucks = brooklyn_coffee[brooklyn_coffee['Venue'] == 'Starbucks' ]
brooklyn_coffee_competitors = brooklyn_coffee.drop(starbucks.index, axis=0)
#reset index
brooklyn_coffee_competitors.reset_index(drop=True,inplace=True)
brooklyn_coffee_competitors.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bensonhurst,40.611009,-73.99518,Caffe Romeo,40.609732,-73.989766,Coffee Shop
1,Greenpoint,40.730201,-73.954241,Homecoming,40.729696,-73.957525,Coffee Shop
2,Greenpoint,40.730201,-73.954241,Upright Coffee,40.729332,-73.953892,Coffee Shop
3,Greenpoint,40.730201,-73.954241,Maman,40.730427,-73.958035,Coffee Shop
4,Greenpoint,40.730201,-73.954241,Café de Colombia,40.730526,-73.951822,Coffee Shop


## 3. Conclusion

I would recommend to open the coffee store in one of the Neighborhoods with less coffee stores in Brooklyn. So, I need to count stores per Neighborhood


In [130]:
#Add a column count that counts frecuency of each neighborhood
brooklyn_coffee_competitors['count'] = brooklyn_coffee_competitors.groupby('Neighborhood')['Neighborhood'].transform('count')
#Get only the column I'm interested in
coffee_per_neigh=brooklyn_coffee_competitors[['Neighborhood','count']]
coffee_per_neigh.head()

Unnamed: 0,Neighborhood,count
0,Bensonhurst,2
1,Greenpoint,18
2,Greenpoint,18
3,Greenpoint,18
4,Greenpoint,18


In [139]:
#Drop duplicates
coffee_per_neigh.drop_duplicates(subset=None, keep='first', inplace=True)
coffee_per_neigh.head()
#Reset index
coffee_per_neigh.reset_index(drop=True,inplace=True)
coffee_per_neigh.head()
#Sort
coffee_per_neigh=coffee_per_neigh.sort_values(['count'], ascending=True)
coffee_per_neigh.head()

Unnamed: 0,Neighborhood,count
0,Bensonhurst,2
31,Sheepshead Bay,2
22,City Line,2
21,Ocean Hill,2
18,Borough Park,2


Find out number of stores

In [142]:
coffee_per_neigh['count'].unique()

array([ 2,  4,  6,  8, 10, 12, 16, 18])

I would recommend to open a coffee shop in those neighborhoods in which there are only 2 coffee shops. So, the recommendation list is the following:

In [143]:
#Get only Count=2
brooklyn_recommen_coffee = coffee_per_neigh[ coffee_per_neigh['count'] == 2 ]
brooklyn_recommen_coffee

Unnamed: 0,Neighborhood,count
0,Bensonhurst,2
31,Sheepshead Bay,2
22,City Line,2
21,Ocean Hill,2
18,Borough Park,2
