# Description of the Problem and Discussion of the Background

A famous Indian restaurant in New York is planning to open a branch in Toronto. They approached us to find the best location in Toronto for the new branch. As Toronto already has many Indian restaurants, it is important to find a spot that is:

* Similar to the current location in New York
* Not already crowded with Indian restaurants

# Description of the Data and How It Will Be Used to Solve the Problem

The New York data will be downloaded from the following site and cleaned up for this project.

https://cocl.us/new_york_dataset

For Toronto, web scraping will be used to extract the data from the following site:

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
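
A minimal sketch of how the Toronto table might be pulled and cleaned, using `pandas.read_html`. The column name `Borough` and the placeholder value `Not assigned` are assumptions about the page layout, which may differ on the live page; the helper names and the sample rows are hypothetical and only stand in for the scraped table.

```python
import pandas as pd


def fetch_postal_table(url):
    """Download the page and return its first HTML table as a DataFrame."""
    # pandas.read_html needs an HTML parser such as lxml to be installed.
    return pd.read_html(url)[0]


def clean_postal_table(raw, borough_col='Borough', missing='Not assigned'):
    """Drop rows whose borough holds the placeholder value."""
    # borough_col and missing are assumptions about the table layout.
    cleaned = raw[raw[borough_col] != missing]
    return cleaned.reset_index(drop=True)


# Hand-made stand-in for the scraped table (hypothetical rows).
sample = pd.DataFrame({
    'Postcode': ['X1A', 'X2A', 'X3A'],
    'Borough': ['Not assigned', 'Borough A', 'Borough B'],
    'Neighbourhood': ['Not assigned', 'Area 1', 'Area 2'],
})
toronto_df = clean_postal_table(sample)
```

In the notebook, `sample` would be replaced by `fetch_postal_table(url)` with the Wikipedia URL above, after checking the actual column names.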

Once the data is available, the following approach will be used to solve the problem:

* New York data will be used first to assess the current restaurant location and the amenities available within 500 meters, and to set this as a baseline for the future location in Toronto.
* With the help of the Toronto data, we will identify neighbourhoods that closely resemble the current New York neighbourhood but do not already have many Indian restaurants.
* Foursquare data will be used for segmentation, and k-means clustering will be used to group neighbourhoods that show similar behaviour.
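
The clustering step in the last bullet can be sketched as follows, assuming the clustering meant here is k-means applied to per-neighbourhood venue-category frequencies. This is a NumPy-only illustration with a deterministic farthest-point initialisation; in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice. The function names and the sample frequencies are hypothetical.

```python
import numpy as np


def farthest_point_init(X, k):
    """Pick k starting centroids: row 0, then repeatedly the row
    farthest from the rows already chosen."""
    chosen = [0]
    while len(chosen) < k:
        # Distance from every row to its nearest chosen row.
        dists = np.linalg.norm(
            X[:, None, :] - X[chosen][None, :, :], axis=2
        ).min(axis=1)
        chosen.append(int(dists.argmax()))
    return X[chosen].copy()


def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        # Assign each row to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Hypothetical venue-category frequencies (columns: cafe, park, bar).
venue_profiles = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.05, 0.15, 0.80],
    [0.10, 0.10, 0.80],
]
labels, centroids = kmeans(venue_profiles, k=2)
```

Neighbourhoods that receive the same label as the baseline neighbourhood would then be the candidates to examine further.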

Once the analysis is complete, a report will be provided to the client with the following information:

The top 3 locations in Toronto whose structure closely resembles the current restaurant location in New York but which do not already have many Indian restaurants, a mandatory criterion from the client for the new location.
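
One way the final shortlist could be produced is sketched below: score each candidate neighbourhood by cosine similarity to the baseline venue profile, drop those with too many Indian restaurants, and keep the best few. The cosine measure, the `max_indian` threshold, the function name, and the sample figures are all assumptions made for illustration, not the method fixed by this project.

```python
import numpy as np
import pandas as pd


def rank_candidates(profiles, baseline, indian_counts, max_indian=1, top_n=3):
    """Return the top_n candidates most similar to the baseline profile
    among those with at most max_indian Indian restaurants."""
    b = baseline.reindex(profiles.columns).fillna(0).to_numpy(dtype=float)
    P = profiles.to_numpy(dtype=float)
    # Cosine similarity of each row with the baseline (0 for all-zero rows).
    norms = np.linalg.norm(P, axis=1) * np.linalg.norm(b)
    sims = np.divide(P @ b, norms, out=np.zeros(len(P)), where=norms > 0)
    counts = indian_counts.reindex(profiles.index).fillna(0).astype(int)
    result = pd.DataFrame(
        {'similarity': sims, 'indian_restaurants': counts},
        index=profiles.index,
    )
    result = result[result['indian_restaurants'] <= max_indian]
    return result.sort_values('similarity', ascending=False).head(top_n)


# Hypothetical venue-category frequencies and restaurant counts.
profiles = pd.DataFrame(
    [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.0, 0.1, 0.9], [0.5, 0.3, 0.2]],
    index=['Area A', 'Area B', 'Area C', 'Area D'],
    columns=['Cafe', 'Park', 'Bar'],
)
baseline = pd.Series({'Cafe': 0.5, 'Park': 0.3, 'Bar': 0.2})
indian_counts = pd.Series({'Area A': 0, 'Area B': 1, 'Area C': 0, 'Area D': 5})

shortlist = rank_candidates(profiles, baseline, indian_counts, top_n=2)
```

Here `Area D` matches the baseline exactly but is excluded because it already has several Indian restaurants, which mirrors the client's requirement.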

In [158]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

# Loading New York Data

In [166]:
# Reading the json as a dict
import json

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    


In [167]:
newyork_data['features'][0]['properties']

{'annoangle': 0.0,
 'annoline1': 'Wakefield',
 'annoline2': None,
 'annoline3': None,
 'bbox': [-73.84720052054902,
  40.89470517661,
  -73.84720052054902,
  40.89470517661],
 'borough': 'Bronx',
 'name': 'Wakefield',
 'stacked': 1}

In [168]:
columns = ['Borough','Neighborhood','Lat','Lon']
nyc_df = pd.DataFrame(columns=columns)


In [169]:
# Collect one record per neighbourhood, then build the DataFrame in a single
# step (DataFrame.append was removed in pandas 2.0).
rows = []
for data in newyork_data['features']:
    props = data['properties']
    rows.append({
        'Borough': props['borough'],
        'Neighborhood': props['name'],
        'Lat': props['bbox'][1],  # bbox is [lon, lat, lon, lat]
        'Lon': props['bbox'][0],
    })
nyc_df = pd.DataFrame(rows, columns=columns)


In [170]:
nyc_df.head()

| | Borough | Neighborhood | Lat | Lon |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [145]:
!conda install -c conda-forge folium=0.5.0 --yes 

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge


In [171]:
import folium

In [172]:
# Map centre (coordinates within New York City)
latitude = 40.730610
longitude = -73.935242
map_nyc = folium.Map(location=[latitude, longitude], zoom_start=12)

# Add one circle marker per neighbourhood
for lat, lon, borough in zip(nyc_df.Lat, nyc_df.Lon, nyc_df.Borough):
    label = '{}, {}, {}'.format(lat, lon, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_nyc)
map_nyc
