# Description of the problem and a Discussion of the background

Famous Indian restaurant in Newyork is planning to open their branch in Toronto. They approached us to find a best location in Toronto where the branch can be opened. As Toronto already got many Indian restaurant, it's very important to find a spot which is 

* Similar to the current location in Newyork
* Not having much Indian restaurants

# Description of the data and How it will be used to solve the problem

Newyork data will be downloaded from the following site and cleanedup for this project. 

https://cocl.us/new_york_dataset

For Toronto, web scrapping will be done to extract the data from the following site

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Once we have the data available, the following approach will be used to solve the problem

* Toronto data will be used first to assess the current restaurant location and the amenities available within 500 meters and set this as a base line for the future location in Newyork
* With the help of Newyork data, we will come up with nice neighbourhoods which is quite similar with the current Toronto neighborhood, but not infested much with Indian restaurants.
* Foursquare data will be used for segmentation and KClustering will be used to bucket the neighbourhood which shows similar behaviour

Once the analysis is carried out, the report will be generated and provided to the client with the following information.

Best top 3 locations in Newyork which shows quite similar structure to current restaurant location in Toronto, but not having more Indian restaurants in those locations, which is a must criteria from the client for this new location selection

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

# Loading Newyork Data

In [2]:
# Reading the json as a dict
import json

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    


In [3]:
newyork_data['features'][0]['properties']

{'annoangle': 0.0,
 'annoline1': 'Wakefield',
 'annoline2': None,
 'annoline3': None,
 'bbox': [-73.84720052054902,
  40.89470517661,
  -73.84720052054902,
  40.89470517661],
 'borough': 'Bronx',
 'name': 'Wakefield',
 'stacked': 1}

In [4]:
columns = ['Borough','Neighborhood','Lat','Lon']
nyc_df = pd.DataFrame(columns=columns)


In [5]:
for data in newyork_data['features']:
    borough = data['properties']['borough']
    neighbour = data['properties']['name']
    lat = data['properties']['bbox'][1]
    lon = data['properties']['bbox'][0]
    nyc_df = nyc_df.append(
        {'Borough':borough,
         'Neighborhood':neighbour,
         'Lat':lat,
         'Lon':lon   
        },ignore_index=True
    )


In [6]:
nyc_df.head()

Unnamed: 0,Borough,Neighborhood,Lat,Lon
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [7]:
!conda install -c conda-forge folium=0.5.0 --yes 

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    altair:  2.2.2-py35_1 conda-forge
    branca:  0.3.1-py_0   conda-forge
    folium:  0.5.0-py_0   conda-forge
    vincent: 0.4.4-py_1   conda-forge

altair-2.2.2-p 100% |################################| Time: 0:00:00  32.35 MB/s
branca-0.3.1-p 100% |################################| Time: 0:00:00  36.91 MB/s
vincent-0.4.4- 100% |################################| Time: 0:00:00  40.13 MB/s
folium-0.5.0-p 100% |################################| Time: 0:00:00  44.89 MB/s


In [8]:
import folium

In [9]:
latitude =40.730610
longitude = -73.935242
map_nyc = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat,lan,borough in zip(nyc_df.Lat,nyc_df.Lon,nyc_df.Borough):
    
    label = '{}, {}, {}'.format(lat,lan,borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lan],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_nyc)
map_nyc

In [10]:
rest_lan = 40.7826825671257
rest_lon = -73.95325646837112

In [11]:
CLIENT_ID = 'IYUYZGQ1MKKUXRYJVYPIDLZ5OHJ0FZH0EW43ZDS554AJCIUB' # your Foursquare ID
CLIENT_SECRET = '3X3UA3W0VYCYVLM1LD2KP03E2J1RI4YL3BVGD0SYEWVTWFOM' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentails:
CLIENT_ID: IYUYZGQ1MKKUXRYJVYPIDLZ5OHJ0FZH0EW43ZDS554AJCIUB
CLIENT_SECRET:3X3UA3W0VYCYVLM1LD2KP03E2J1RI4YL3BVGD0SYEWVTWFOM


In [12]:
def get_100_venues(borough_latitude,borough_longitude):
    LIMIT = 100 # limit of number of venues returned by Foursquare API
    radius = 500 # define radius
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    borough_latitude, 
    borough_longitude, 
    radius, 
    LIMIT)
    return url

In [13]:
url =get_100_venues(rest_lan,rest_lon)
url


'https://api.foursquare.com/v2/venues/explore?&client_id=IYUYZGQ1MKKUXRYJVYPIDLZ5OHJ0FZH0EW43ZDS554AJCIUB&client_secret=3X3UA3W0VYCYVLM1LD2KP03E2J1RI4YL3BVGD0SYEWVTWFOM&v=20180604&ll=40.7826825671257,-73.95325646837112&radius=500&limit=100'

In [14]:
import requests
results = requests.get(url).json()


In [15]:
results['response']['groups'][0]['items']

[{'reasons': {'count': 0,
   'items': [{'reasonName': 'globalInteractionReason',
     'summary': 'This spot is popular',
     'type': 'general'}]},
  'referralId': 'e-0-4aeca8edf964a52002ca21e3-0',
  'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/bookstore_',
      'suffix': '.png'},
     'id': '4bf58dd8d48988d114951735',
     'name': 'Bookstore',
     'pluralName': 'Bookstores',
     'primary': True,
     'shortName': 'Bookstore'}],
   'id': '4aeca8edf964a52002ca21e3',
   'location': {'address': '1435 Lexington Ave',
    'cc': 'US',
    'city': 'New York',
    'country': 'United States',
    'crossStreet': 'at E 93rd St',
    'distance': 196,
    'formattedAddress': ['1435 Lexington Ave (at E 93rd St)',
     'New York, NY 10128',
     'United States'],
    'labeledLatLngs': [{'label': 'display',
      'lat': 40.784225883561795,
      'lng': -73.95213507194228}],
    'lat': 40.784225883561795,
    'lng': -73.95213507194228,
    'postalCode': 

In [21]:
for count,data in enumerate(results['response']['groups'][0]['items']):
    
   
        print ("{}-{}-{}".format(count,data['venue']['categories'][0]['shortName'],data['venue']))
 
    
        


0-Bookstore-{'name': 'Kitchen Arts & Letters', 'photos': {'count': 0, 'groups': []}, 'id': '4aeca8edf964a52002ca21e3', 'location': {'postalCode': '10128', 'lat': 40.784225883561795, 'city': 'New York', 'country': 'United States', 'distance': 196, 'address': '1435 Lexington Ave', 'formattedAddress': ['1435 Lexington Ave (at E 93rd St)', 'New York, NY 10128', 'United States'], 'state': 'NY', 'crossStreet': 'at E 93rd St', 'labeledLatLngs': [{'lng': -73.95213507194228, 'lat': 40.784225883561795, 'label': 'display'}], 'lng': -73.95213507194228, 'cc': 'US'}, 'venuePage': {'id': '402546699'}, 'categories': [{'name': 'Bookstore', 'pluralName': 'Bookstores', 'id': '4bf58dd8d48988d114951735', 'shortName': 'Bookstore', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/bookstore_', 'suffix': '.png'}, 'primary': True}]}
1-Wine Bar-{'name': 'Kaia Wine Bar', 'photos': {'count': 0, 'groups': []}, 'delivery': {'id': '320292', 'url': 'https://www.seamless.com/menu/kaia-wine-bar-1614-3rd-

In [100]:
rest_lan=13.067
rest_lon=80.237
cat_url = 'https://api.foursquare.com/v2/venues/search?&categoryId=4bf58dd8d48988d10f941735&client_id={}&client_secret={}&v={}&ll={},{}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,rest_lan,rest_lon)
cat_res = requests.get(cat_url).json()

In [101]:
cat_res

{'meta': {'code': 200, 'requestId': '5c81e5e1351e3d13a56e7041'},
 'response': {'confident': False,
  'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/indian_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d10f941735',
      'name': 'Indian Restaurant',
      'pluralName': 'Indian Restaurants',
      'primary': True,
      'shortName': 'Indian'}],
    'hasPerk': False,
    'id': '4bb9942653649c74a81d48fb',
    'location': {'address': 'Nungambakkam High Rd',
     'cc': 'IN',
     'city': 'Chennai',
     'country': 'India',
     'distance': 789,
     'formattedAddress': ['Nungambakkam High Rd',
      'Chennai 600034',
      'Tamil Nadu',
      'India'],
     'labeledLatLngs': [{'label': 'display',
       'lat': 13.063903633822777,
       'lng': 80.24354758362048}],
     'lat': 13.063903633822777,
     'lng': 80.24354758362048,
     'postalCode': '600034',
     'state': 'Tamil Nadu'},
    'name': 'Apoorva Sangeetha',
    'referralId': 'v-1

In [102]:
for item in cat_res['response']['venues']:
    print ("{} - {},{}".format(item['name'],item['location']['lat'],item['location']['lng']))

Apoorva Sangeetha - 13.063903633822777,80.24354758362048
Southern Spices - 12.979273840830421,80.26465903108732
Saravana Mess - 13.054053584228644,80.23774969315136
Parkway Inn - 12.823802,80.23078
Madras Pavilion - 13.01111721192126,80.22041892204294
Kappa Chakka Kandhari - 13.063702,80.247786
Sangeetha Restaurant - 12.852087,80.225945
Nair Mess - 13.064454532787801,80.27756034842496
Hotel Saravana Bhavan - 12.996520376888332,80.19022398979648
Gangotree - 13.0472419839075,80.25453810828202
Hotel Saravana Bhavan - 13.068072337369237,80.27122564026403
Ratna cafe - 13.058739850625004,80.2741698052978
Saravana Bhavan - 13.00766,80.25952
Ghumaghumalu Andhra Mess - 12.840474398841922,80.22735239340072
Sukkkubai Beef Biryani Shop - 12.998769171130256,80.20138073942545
Hotel Saravana Bhavan - 13.044409912548184,80.26412494093498
Salem RR Biryani - 13.025442172283624,80.17587925268884
Saravana Bhavan - 13.085040465835968,80.21034251172765
Parambriym - 13.128321647644043,80.21631622314453
Murug

In [107]:
ind_columns = ['Restaurant Name','Lat','Lon','Rating']
ind_rest_df = pd.DataFrame(columns=ind_columns)
ind_rest_df

Unnamed: 0,Restaurant Name,Lat,Lon,Rating


In [108]:
for item in cat_res['response']['venues']:
    a =item['name']
    b=item['location']['lat']
    c=item['location']['lng']
    ind_rest_df = ind_rest_df.append({'Restaurant Name':a,'Lat':b,'Lon':c},ignore_index=True)
ind_rest_df.head()

Unnamed: 0,Restaurant Name,Lat,Lon,Rating
0,Apoorva Sangeetha,13.063904,80.243548,
1,Southern Spices,12.979274,80.264659,
2,Saravana Mess,13.054054,80.23775,
3,Parkway Inn,12.823802,80.23078,
4,Madras Pavilion,13.011117,80.220419,


In [105]:
latitude =13.067
longitude = 80.243
map_ind_res = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat,lan,name in zip(ind_rest_df.Lat,ind_rest_df.Lon,ind_rest_df['Restaurant Name']):
    
    label = '{}, {}, {}'.format(lat,lan,name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lan],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_ind_res)
map_ind_res

In [113]:
for item in cat_res['response']['venues']:
    #print (item['id'])
    url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(item['id'],CLIENT_ID,CLIENT_SECRET,VERSION)
    #print (url)
    resp = requests.get(url).json()
    try:
        print ("{} is {} and rating is {}" .format(resp['response']['venue']['name'],resp['response']['venue']['price']['message'],resp['response']['venue']['rating']))
        ind_rest_df = ind_rest_df.append({'Rating':'{}'.format(resp['response']['venue']['rating'])},ignore_index=True)
    except:
        ind_rest_df = ind_rest_df.append({'Rating':'Not Available'},ignore_index=True)
        print ("No price info for {}".format(resp['response']['venue']['name']))

Apoorva Sangeetha is Moderate and rating is 7.8
Southern Spices is Moderate and rating is 5.8
No price info for Saravana Mess
No price info for Parkway Inn
Madras Pavilion is Moderate and rating is 8.5
No price info for Kappa Chakka Kandhari
No price info for Sangeetha Restaurant
Nair Mess is Moderate and rating is 8.1
No price info for Hotel Saravana Bhavan
Gangotree is Moderate and rating is 7.5
Hotel Saravana Bhavan is Moderate and rating is 6.7
Ratna cafe is Moderate and rating is 7.9
No price info for Saravana Bhavan
Ghumaghumalu Andhra Mess is Moderate and rating is 7.7
Sukkkubai Beef Biryani Shop is Moderate and rating is 7.5
Hotel Saravana Bhavan is Moderate and rating is 7.8
Salem RR Biryani is Moderate and rating is 5.8
Saravana Bhavan is Moderate and rating is 6.4
No price info for Parambriym
No price info for Murugan Idli Shop
Hotel Saravana Bhavan is Moderate and rating is 6.3
No price info for The Grand Sweets & Snacks
No price info for Hotel Saravana Bhavan
No price info

In [114]:
ind_rest_df

Unnamed: 0,Restaurant Name,Lat,Lon,Rating
0,Apoorva Sangeetha,13.063904,80.243548,
1,Southern Spices,12.979274,80.264659,
2,Saravana Mess,13.054054,80.237750,
3,Parkway Inn,12.823802,80.230780,
4,Madras Pavilion,13.011117,80.220419,
5,Kappa Chakka Kandhari,13.063702,80.247786,
6,Sangeetha Restaurant,12.852087,80.225945,
7,Nair Mess,13.064455,80.277560,
8,Hotel Saravana Bhavan,12.996520,80.190224,
9,Gangotree,13.047242,80.254538,
