# Capstone Project: Starting a business in....
## Solomin Oleg

The project consists of the following sections.

#### Introduction 
where you discuss the business problem and who would be interested in this project.

#### Data 
where you describe the data that will be used to solve the problem and the source of the data.

#### Methodology section
which represents the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and which machine learning techniques were used and why.

#### Results section
where you discuss the results.

#### Discussion section
where you discuss any observations you noted and any recommendations you can make based on the results.

#### Conclusion section 
where you conclude the report.

It is convenient to split the final task into two stages. The first stage is devoted to describing the problem and understanding the data, while the second stage consists of the methodology, exploratory analysis, results, and conclusions.

## Part 1

* A description of `the problem` and a discussion of the background. 
* A description of `the data` and how it will be used to solve the problem.

## Deciding to move from Toronto to NYC

The contractor is going to move from Toronto to NYC. He has sold a well-known barbershop located in Downtown Toronto and is looking to start a new venture.
The contractor is open to new ideas, so he is ready to accept unbiased research results.

## Libraries

In [1]:
import numpy as np 

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

from geopy.geocoders import Nominatim

import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

print('Libraries imported.')

Libraries imported.


## Manhattan

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [7]:
NYC_neighborhoods_data = newyork_data['features']
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
NYC_neighborhoods = pd.DataFrame(columns=column_names)

In [9]:
rows = []
for data in NYC_neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
NYC_neighborhoods = pd.DataFrame(rows, columns=column_names)

In [11]:
addressNYC = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
locationNYC = geolocator.geocode(addressNYC)
latitudeNYC = locationNYC.latitude
longitudeNYC = locationNYC.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitudeNYC, longitudeNYC))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
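
Nominatim's `geocode` returns `None` when it cannot resolve an address, in which case the attribute access in the cell above would raise an `AttributeError`. A minimal sketch of a defensive wrapper follows. `coordinates_or_raise` is a hypothetical helper name, not part of geopy, and the fake geocoder exists only so the example runs offline.

```python
from types import SimpleNamespace


def coordinates_or_raise(geocode, address):
    """Return (latitude, longitude) for address, or raise ValueError.

    `geocode` is any callable with the same contract as
    Nominatim.geocode: it returns an object exposing .latitude and
    .longitude, or None when the address is not found.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('Could not geocode address: {!r}'.format(address))
    return location.latitude, location.longitude


# Stand-in geocoder so the sketch runs without network access (hypothetical).
def fake_geocode(address):
    known = {'New York City, NY': SimpleNamespace(latitude=40.7127281,
                                                  longitude=-74.0060152)}
    return known.get(address)


print(coordinates_or_raise(fake_geocode, 'New York City, NY'))
```

With the real geolocator, the equivalent call would be `coordinates_or_raise(geolocator.geocode, addressNYC)`.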


In [34]:
manhattan_data = NYC_neighborhoods[NYC_neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


In [37]:
manhattan_data = manhattan_data[manhattan_data['Latitude'] < 40.76].reset_index(drop=True)
manhattan_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 1 | Manhattan | Clinton | 40.759101 | -73.996119 |
| 2 | Manhattan | Midtown | 40.754691 | -73.981669 |
| 3 | Manhattan | Murray Hill | 40.748303 | -73.978332 |
| 4 | Manhattan | Chelsea | 40.744035 | -74.003116 |


In [45]:
addressM = 'Financial District, NY'

geolocator = Nominatim(user_agent="ny_explorer")
locationM = geolocator.geocode(addressM)
latitudeM = locationM.latitude
longitudeM = locationM.longitude
print('The geographical coordinates of Financial District NYC are {}, {}.'.format(latitudeM, longitudeM))

The geographical coordinates of Financial District NYC are 40.7076124, -74.009378.


In [48]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitudeM, longitudeM], zoom_start=12)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

## Toronto

In [17]:
# Url to get Data from Wikipedia concerning "List of postal codes of Canada: M"
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [18]:
raw_data=pd.read_html(url)

In [19]:
df=raw_data[0]

In [20]:
df_adj=df.dropna(axis=0)

In [21]:
url_coord = 'http://cocl.us/Geospatial_data'

coord=pd.read_csv(url_coord)

coord_adj = coord.rename(columns={'Postal Code' : 'Postal code'})

In [22]:
T_neighborhoods=pd.merge(df_adj,coord_adj, on='Postal code')
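
`pd.merge` defaults to an inner join, so postal codes missing from either table are dropped without any warning. The sketch below uses the `indicator=True` option of `pd.merge` to list unmatched rows before they disappear. The two small frames are made-up stand-ins for `df_adj` and `coord_adj`, not the real data.

```python
import pandas as pd

# Made-up stand-ins for df_adj and coord_adj (illustrative values only).
left = pd.DataFrame({'Postal code': ['M5A', 'M5B', 'M9Z'],
                     'Borough': ['Downtown Toronto', 'Downtown Toronto', 'Example']})
right = pd.DataFrame({'Postal code': ['M5A', 'M5B'],
                      'Latitude': [43.65426, 43.657162],
                      'Longitude': [-79.360636, -79.378937]})

# A left join with indicator=True adds a '_merge' column that records
# whether each row was found in both frames or only in the left one.
checked = pd.merge(left, right, on='Postal code', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal code'].tolist()
print('Postal codes without coordinates:', unmatched)
```

If the unmatched list is empty, the inner join used in the notebook loses nothing.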

In [23]:
T_neighborhoods.drop(columns=['Postal code'], inplace = True)

In [30]:
downtown_data = T_neighborhoods[T_neighborhoods['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Downtown Toronto | Regent Park / Harbourfront | 43.65426 | -79.360636 |
| 1 | Downtown Toronto | Queen's Park / Ontario Provincial Government | 43.662301 | -79.389494 |
| 2 | Downtown Toronto | Garden District / Ryerson | 43.657162 | -79.378937 |
| 3 | Downtown Toronto | St. James Town | 43.651494 | -79.375418 |
| 4 | Downtown Toronto | Berczy Park | 43.644771 | -79.373306 |


In [25]:
addressT = 'Downtown Toronto, CA'

geolocator = Nominatim(user_agent="ny_explorer")
locationT = geolocator.geocode(addressT)
latitudeT = locationT.latitude
longitudeT = locationT.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitudeT, longitudeT))

The geographical coordinates of Downtown Toronto are 43.6563221, -79.3809161.


In [28]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown = folium.Map(location=[latitudeT, longitudeT], zoom_start=13)

# add markers to map
for lat, lng, label in zip(downtown_data['Latitude'], downtown_data['Longitude'], downtown_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)  
    
map_downtown

## Explore neighborhoods in Manhattan and Downtown Toronto

Define the Foursquare credentials and API version.

In [50]:
CLIENT_ID = '5YCHHU4TOBAQ2PF1DEJFXUMNUS4KRQOE4T25G4JSFEYDPP4J' # your Foursquare ID
CLIENT_SECRET = 'IC1QEDFWDPDT24Q1GG5RSYMIQWH5CACYS1VWW24U3SWUWUEC' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 5YCHHU4TOBAQ2PF1DEJFXUMNUS4KRQOE4T25G4JSFEYDPP4J
CLIENT_SECRET:IC1QEDFWDPDT24Q1GG5RSYMIQWH5CACYS1VWW24U3SWUWUEC
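
Hard-coding and printing API secrets in a notebook exposes them to anyone who sees the file. One common alternative, sketched below, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are assumptions made for illustration, not something Foursquare requires.

```python
import os


def load_credentials(environ=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names are hypothetical; pick any names and export
    them in the shell before starting the notebook.
    """
    try:
        client_id = environ['FOURSQUARE_CLIENT_ID']
        client_secret = environ['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing))
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demonstration with placeholder values rather than real credentials.
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLE_ID_123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLE_SECRET_456'}
demo_id, demo_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
print('CLIENT_SECRET: ' + mask(demo_secret))
```

In the notebook itself, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would pick up the values from the real environment.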


In [53]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT=100, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
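
The list comprehension in `getNearbyVenues` assumes that every venue has at least one category and that the response always contains a `groups` entry; if either assumption fails, an exception is raised partway through the loop. The following sketch separates the parsing step so that it can be checked offline. `extract_venue_rows` and the sample payload are hypothetical, and the payload only mimics the fields that the function above reads.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style JSON payload into a list of row tuples.

    Venues without a category get None instead of raising IndexError,
    and a payload without groups yields an empty list.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written payload shaped like the fields used above (not real API output).
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Barbershop',
               'location': {'lat': 40.7157, 'lng': -73.9941},
               'categories': [{'name': 'Barbershop'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 40.7160, 'lng': -73.9950},
               'categories': []}},
]}]}}

for row in extract_venue_rows('Chinatown', 40.715618, -73.994279, sample):
    print(row)
```

Rows produced this way have the same seven fields, in the same order, as the column names assigned in `getNearbyVenues`.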

In [54]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Chinatown
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Gramercy
Battery Park City
Financial District
Noho
Civic Center
Midtown South
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [55]:
downtown_venues = getNearbyVenues(names=downtown_data['Neighborhood'],
                                   latitudes=downtown_data['Latitude'],
                                   longitudes=downtown_data['Longitude']
                                  )

Regent Park / Harbourfront
Queen's Park / Ontario Provincial Government
Garden District / Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond / Adelaide / King
Harbourfront East / Union Station / Toronto Islands
Toronto Dominion Centre / Design Exchange
Commerce Court / Victoria Hotel
University of Toronto / Harbord
Kensington Market / Chinatown / Grange Park
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport
Rosedale
Stn A PO Boxes
St. James Town / Cabbagetown
First Canadian Place / Underground city
Church and Wellesley


In [58]:
print(manhattan_venues.shape)
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))
manhattan_venues.head()

(1885, 7)
There are 270 unique categories.


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Chinatown | 40.715618 | -73.994279 | Cheeky Sandwiches | 40.715821 | -73.99183 | Sandwich Place |
| 1 | Chinatown | 40.715618 | -73.994279 | Kiki's | 40.714476 | -73.992036 | Greek Restaurant |
| 2 | Chinatown | 40.715618 | -73.994279 | Hotel 50 Bowery NYC | 40.715936 | -73.996789 | Hotel |
| 3 | Chinatown | 40.715618 | -73.994279 | Renew Day Spa | 40.715559 | -73.996747 | Spa |
| 4 | Chinatown | 40.715618 | -73.994279 | Michaeli Bakery | 40.714704 | -73.991847 | Bakery |


In [73]:
barberM = manhattan_venues[manhattan_venues['Venue Category'].str.contains("Barber")]
print('There are {} barbershops.'.format(len(barberM['Venue Category'])))

There are 16 barbershops.


In [59]:
print(downtown_venues.shape)
print('There are {} unique categories.'.format(len(downtown_venues['Venue Category'].unique())))
downtown_venues.head()

(1225, 7)
There are 204 unique categories.


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Regent Park / Harbourfront | 43.65426 | -79.360636 | Roselle Desserts | 43.653447 | -79.362017 | Bakery |
| 1 | Regent Park / Harbourfront | 43.65426 | -79.360636 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 2 | Regent Park / Harbourfront | 43.65426 | -79.360636 | Morning Glory Cafe | 43.653947 | -79.361149 | Breakfast Spot |
| 3 | Regent Park / Harbourfront | 43.65426 | -79.360636 | Cooper Koo Family YMCA | 43.653249 | -79.358008 | Distribution Center |
| 4 | Regent Park / Harbourfront | 43.65426 | -79.360636 | Body Blitz Spa East | 43.654735 | -79.359874 | Spa |


In [74]:
barberT = downtown_venues[downtown_venues['Venue Category'].str.contains("Barber")]
print('There are {} barbershops.'.format(len(barberT['Venue Category'])))

There are 6 barbershops.
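
Raw counts are hard to compare because the two samples differ in size: 1885 venues across 23 Manhattan neighborhoods versus 1225 venues across 19 Downtown Toronto neighborhoods. The short sketch below normalizes the counts reported above; `barbershop_rates` is a hypothetical helper. Each request is capped at `LIMIT=100` venues and neighboring 500 m search circles may overlap, so these ratios are rough indicators rather than market measurements.

```python
# Figures copied from the outputs above.
samples = {
    'Manhattan (filtered)': {'barbershops': 16, 'venues': 1885, 'neighborhoods': 23},
    'Downtown Toronto': {'barbershops': 6, 'venues': 1225, 'neighborhoods': 19},
}


def barbershop_rates(counts):
    """Return (share of venues in percent, barbershops per neighborhood)."""
    share = 100.0 * counts['barbershops'] / counts['venues']
    per_neighborhood = counts['barbershops'] / counts['neighborhoods']
    return share, per_neighborhood


for area, counts in samples.items():
    share, per_neighborhood = barbershop_rates(counts)
    print('{}: {:.2f}% of venues, {:.2f} per neighborhood'.format(
        area, share, per_neighborhood))
```

On these figures, barbershops make up about 0.85% of the sampled venues in the Manhattan subset and about 0.49% in Downtown Toronto.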
