## Week 4 and 5 Capstone Project - The Battle of Neighborhoods

### For this project, I want to find out which city, New York City or Toronto, is more vegan friendly, so that I can recommend one over the other as the better travel destination for the vegan clients of my fictional travel agency.


### NOTE: If running the notebook raises KeyError: 'groups', the likely cause is that my free Foursquare account allows only a limited number of API calls and the quota has been exceeded. After re-running the notebook one too many times, I could no longer fetch the venue data or plot the maps. Please review the code itself, and not just the output or lack of output. Thank you!



### Introduction/Business Problem:
In this project, I own a fictional travel agency, and my clients are having a hard time trying to decide between two locations: New York City and Toronto. 

Because they are finding it hard to really choose between the two, they have decided to make the decision based on their dietary restrictions. 

They are vegetarian and would prefer to eat at vegan/vegetarian restaurants.

Therefore, this project aims to discover which city is more vegan friendly, thereby helping my clients choose their next destination.

It also has the benefit of giving my clients a list of all the vegan/vegetarian restaurants in their city of choice, thus eliminating the headache of trying to find and choose vegan/vegetarian restaurants.


### Data:

Based on the definition of our problem, the factors that will influence my clients' decision are:

1. The number of vegan/vegetarian restaurants in each city
2. Which city has the greater number of vegan/vegetarian restaurants.

For this analysis, I wanted to present both a list of all the restaurants in each city, as well as a map of all the restaurants in each city, to be able to properly visualize the number of vegan/vegetarian restaurants in each city and to provide a list that would make it easy for my clients to plan out their trip.

We will require the following for this analysis:

1. A list of all the neighborhoods in each city, along with the approximate latitude and longitude of each neighborhood.

2. A list of venues generated using the Foursquare Places API that will allow us to see what kinds of venue categories exist, and then narrow that down to the venues under the category 'Vegetarian / Vegan Restaurant'.

3. The coordinates of each city, obtained using the Nominatim geocoder from the geopy package, which we will then use to create a base map.

4. A map of the filtered venues that shows their relative positions, so that my clients can use it as a reference when planning their stops and make sure there is a place to eat near any site they want to see.





Import statements are at the top for convenience.

In [1]:
import pandas as pd 
import numpy as np 

from bs4 import BeautifulSoup
import requests 

import geocoder 
import folium
from geopy.geocoders import Nominatim

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import matplotlib.cm as cm
import matplotlib.colors as colors

### Step 1: Obtaining the neighborhood datasets for each city, which include the Borough, Neighborhood, Latitude and Longitude of each neighborhood.

For each city our approach will differ, primarily due to:
1. the location of each of the datasets
2. how much of the data is missing
3. how much preprocessing we will need to do
4. how much of the data we will need to clean and/or eliminate.





#### New York City Neighborhood Data

For New York City, we will obtain the data by grabbing the URL for the dataset that was given to us by the course admins for the previous lab.

We will grab the raw JSON from the URL using the requests package. From the JSON, we will isolate the data that we need, which is under 'features' in the JSON dict. 

Once we have that sorted, we will create a dataframe with the columns Borough, Neighborhood, Latitude and Longitude, and then load the data using a for loop.

In [2]:
#New York

# #grab URL and store it in a URL variable
NURL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json"

print(NURL)

# download the dataset and parse the JSON response
nyr = requests.get(NURL).json()

nydata = nyr["features"]

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one row per neighborhood
rows = []

for data in nydata:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
nyneigh = pd.DataFrame(rows, columns=column_names)

nyneigh.head()


https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json


| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


#### Toronto Neighborhood Data

For the Toronto neighborhood data, our approach differs because we draw on two sources: a Wikipedia table of postal codes, which we scrape, and a CSV file of geospatial coordinates. We merge the two at the end to obtain a neighborhoods table with the same Borough, Neighborhood, Latitude and Longitude columns as the New York City neighborhood data.

In [3]:
#toronto

#grab URL and store it in a URL variable
URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

#obtain raw html
r = requests.get(URL)
#parse into beautifulsoup 
soup = BeautifulSoup(r.content, 'html.parser')
#find required content
table = soup.findAll('table',attrs ={'rules':'all','cellspacing':0,}) 
#convert to text
a = table[0].text

"""
The BeautifulSoup text contains many newlines. Splitting on the newline
character removes them and leaves empty strings between cells; those
empty strings are filtered out in the next loop.
"""
b = a.split("\n")
#create a list
c = []
#now only add stuff that isn't an empty char or does not contain Not Assigned
for word in b:
    if "Not assigned" not in word and word != "":
        c.append(word)
#create a new table to store results
table_contents=[]

for word in c:
    #dict to store values for each neighborhood
    cell = {}
    #Since each postal code is 3 characters long, take the first three characters as the postal code value
    cell['Postal Code'] = word[:3]
    #set a temp variable equal to the rest of the word, which has the borough and the neighborhood
    temp = word[3:]
    #split by the parenthesis, and grab the first half which is the borough, set it to the borough of the cell
    cell['Borough'] = temp.split('(')[0]
    #same logic for the neighborhood, just strip all the unneeded characters
    cell['Neighborhood'] = (((((temp).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
    #append the dict to the table
    table_contents.append(cell)
#convert the list of dicts to a dataframe named trawneigh
trawneigh=pd.DataFrame(table_contents)
#handle special cases
trawneigh['Borough']=trawneigh['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

#load the postal code coordinates from the local CSV file
trawnocoords = pd.read_csv("/Users/shreyasiyengar/VSC/Coursera_Capstone/Geospatial_Coordinates.csv", sep=",", header=0)



trawneigh.head()
    
trawneigh = pd.merge(trawneigh, trawnocoords[['Postal Code', 'Latitude', 'Longitude']], on='Postal Code')
trawneigh.head()
    



| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 |
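
The string slicing and chained replace calls used to split each scraped cell could also be expressed as a single regular expression. The following is only a sketch: it assumes each cell's text looks like "M5ADowntown Toronto(Regent Park / Harbourfront)", which is inferred from the special cases handled above rather than checked against the live page, and the names CELL_PATTERN and parse_cell are hypothetical helpers that are not part of the notebook.

```python
import re

# Assumed cell layout: postal code, then borough, then neighborhoods in parentheses.
CELL_PATTERN = re.compile(r"^(?P<postal>[A-Z]\d[A-Z])(?P<borough>[^(]+)\((?P<hoods>.*)\)$")


def parse_cell(text):
    """Split one scraped cell into postal code, borough and neighborhood."""
    match = CELL_PATTERN.match(text.strip())
    if match is None:
        # Cells without parentheses (e.g. unassigned postal codes) are skipped.
        return None
    # Neighborhoods are separated by " / " in the cell; join them with commas.
    hoods = ", ".join(part.strip() for part in match.group("hoods").split("/"))
    return {
        "Postal Code": match.group("postal"),
        "Borough": match.group("borough").strip(),
        "Neighborhood": hoods,
    }


print(parse_cell("M5ADowntown Toronto(Regent Park / Harbourfront)"))
```

Returning None for cells that do not match makes it easy to drop them with a simple filter before building the dataframe.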


### Step 2: Obtaining the Foursquare venue tables for each city 

#### Now that we have the neighborhood tables for NYC and Toronto, we need to grab the Foursquare venue tables for each city.

#### Foursquare Credential prep for API usage

In [4]:
# Foursquare Credential Prep
CLIENT_ID = 'KOGTV4EAQCBU054ZKLCRMAINCATFAHUJWLHPTAUZ4A3N33SG' # your Foursquare ID
CLIENT_SECRET = 'R0X1HHFDDDIQF5EHZHICRSD1IX2J4BWNEWP4VF3XFC43EYDP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: KOGTV4EAQCBU054ZKLCRMAINCATFAHUJWLHPTAUZ4A3N33SG
CLIENT_SECRET:R0X1HHFDDDIQF5EHZHICRSD1IX2J4BWNEWP4VF3XFC43EYDP
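
Because the cell above prints the credentials in full, one option is to read them from environment variables and mask them before printing. This is a minimal sketch under assumptions: the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and the helpers load_foursquare_credentials and mask are hypothetical and are not something Foursquare or this notebook defines.

```python
import os


def load_foursquare_credentials(env=None):
    """Read the client ID and secret from environment variables (assumed names)."""
    env = os.environ if env is None else env
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret


def mask(value, visible=4):
    """Show only the first few characters so the full secret is never printed."""
    return value[:visible] + "*" * max(len(value) - visible, 0)


# Example with placeholder values instead of real credentials
demo_env = {"FOURSQUARE_CLIENT_ID": "EXAMPLEID123", "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
print("CLIENT_SECRET: " + mask(client_secret))
```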


#### The following function, getNearbyVenues, uses the Foursquare credentials defined earlier to make a GET request for the venues within a given radius (500 meters by default) of each neighborhood's latitude and longitude. It returns a dataframe that we store for later use.

In [5]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
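
The KeyError: 'groups' mentioned in the note at the top comes from indexing the response directly. A more defensive way to read the payload might look like the sketch below. It assumes the same response layout that getNearbyVenues relies on (response, groups, items, venue), and it assumes that a quota-exceeded reply simply lacks the 'groups' key; the helper name extract_venue_rows and the sample payloads are hypothetical.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Return one tuple per venue, or an empty list if the payload has no 'groups'."""
    groups = (payload.get("response") or {}).get("groups") or []
    if not groups:
        # e.g. the API quota was exceeded, so there is nothing to extract
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        # some venues may have no category; record None instead of failing
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hand-written sample payloads that mimic the assumed layout
ok_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "KFC",
               "location": {"lat": 43.754387, "lng": -79.333021},
               "categories": [{"name": "Fast Food Restaurant"}]}}]}]}}
quota_payload = {"meta": {"code": 429}, "response": {}}

print(extract_venue_rows(ok_payload, "Parkwoods", 43.753259, -79.329656))
print(extract_venue_rows(quota_payload, "Parkwoods", 43.753259, -79.329656))
```

With a helper like this, a neighborhood whose request fails contributes no rows instead of stopping the whole loop, so the venues already fetched are not lost.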

    

#### Here we call the getNearbyVenues function twice, once per city, and store the results in two variables, trawno_venues and nyc_venues, for later use. The output is the name of each neighborhood as the function processes it.

In [6]:
trawno_venues = getNearbyVenues(names=trawneigh['Neighborhood'],
                                   latitudes= trawneigh['Latitude'],
                                   longitudes=trawneigh['Longitude']
                                  )
nyc_venues = getNearbyVenues(names=nyneigh['Neighborhood'],
                                   latitudes= nyneigh['Latitude'],
                                   longitudes=nyneigh['Longitude']
                                  )                                 

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills North
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
The Danforth  East
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview East
The Danforth

NOTE: If running the notebook raises KeyError: 'groups', the likely cause is that my free Foursquare account allows only a limited number of API calls and the quota has been exceeded.


#### First 5 venues in the trawno_venues dataset.

In [7]:
trawno_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | KFC | 43.754387 | -79.333021 | Fast Food Restaurant |
| 2 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 3 | Parkwoods | 43.753259 | -79.329656 | GreenWin pool | 43.756232 | -79.333842 | Pool |
| 4 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |


#### First 5 venues in the nyc_venues dataset.

In [8]:
nyc_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896649 | -73.844846 | Pharmacy |
| 2 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop |
| 3 | Wakefield | 40.894705 | -73.847201 | Walgreens | 40.896528 | -73.8447 | Pharmacy |
| 4 | Wakefield | 40.894705 | -73.847201 | Dunkin' | 40.890459 | -73.849089 | Donut Shop |


### Step 3: Filtering the venue datasets to obtain only vegan/vegetarian restaurants in the city. 

#### We have to filter the venue datasets to obtain a list of all the vegan/vegetarian restaurants in each city.

#### First 5 Vegan/Vegetarian Venues in Toronto

In [9]:
trawveg = trawno_venues.loc[trawno_venues['Venue Category'] == 'Vegetarian / Vegan Restaurant']
trawveg.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 269 | St. James Town | 43.651494 | -79.375418 | Fresh On Front | 43.647815 | -79.374453 | Vegetarian / Vegan Restaurant |
| 360 | Berczy Park | 43.644771 | -79.373306 | Fresh On Front | 43.647815 | -79.374453 | Vegetarian / Vegan Restaurant |
| 479 | Central Bay Street | 43.657952 | -79.387383 | Vegetarian Haven | 43.656016 | -79.392758 | Vegetarian / Vegan Restaurant |
| 599 | Richmond, Adelaide, King | 43.650571 | -79.384568 | Rosalinda | 43.650252 | -79.385156 | Vegetarian / Vegan Restaurant |
| 619 | Richmond, Adelaide, King | 43.650571 | -79.384568 | Planta Queen | 43.650622 | -79.388154 | Vegetarian / Vegan Restaurant |


#### First 5 Vegan/Vegetarian venues in New York City

In [10]:
nyveg = nyc_venues.loc[nyc_venues['Venue Category'] == 'Vegetarian / Vegan Restaurant']
nyveg.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 657 | Country Club | 40.844246 | -73.824099 | Vegetarian Joint | 40.842828 | -73.825764 | Vegetarian / Vegan Restaurant |
| 1258 | Greenpoint | 40.730201 | -73.954241 | Jungle Cafe | 40.730201 | -73.954761 | Vegetarian / Vegan Restaurant |
| 1601 | Prospect Heights | 40.676822 | -73.964859 | Natural Blend | 40.673819 | -73.962923 | Vegetarian / Vegan Restaurant |
| 1724 | Bushwick | 40.698116 | -73.925258 | Hartbreakers | 40.701627 | -73.922853 | Vegetarian / Vegan Restaurant |
| 1725 | Bushwick | 40.698116 | -73.925258 | Sol Sips | 40.699135 | -73.92251 | Vegetarian / Vegan Restaurant |


### Step 4: Mapping out the venues for each city.

We need to map out the venues for each city in order to provide a better visual analysis for my clients.

#### We first need to use the Nominatim geocoder from the geopy package to obtain the geographical coordinates of each city, which we will then use to create a base map for each city.

#### Geographic coordinates of Toronto

In [11]:
taddress = 'Toronto, ON'

geolocator = Nominatim(user_agent="trawno_explorer")
tlocation = geolocator.geocode(taddress)
tlatitude = tlocation.latitude
tlongitude = tlocation.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(tlatitude, tlongitude))


The geographical coordinates of Toronto are 43.6534817, -79.3839347.


#### Geographic coordinates of New York City

In [12]:
naddress = 'New York City, NY'

geolocator = Nominatim(user_agent="nyc_explorer")
nlocation = geolocator.geocode(naddress)
nlatitude = nlocation.latitude
nlongitude = nlocation.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(nlatitude, nlongitude))


The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Generate maps for each city, and mark each venue that is in the filtered vegan/vegetarian datasets

#### Map of Toronto

In [13]:
map_trawno = folium.Map(location=[tlatitude, tlongitude], zoom_start=10)

# add markers to map
for lat, lng, venue, neighborhood in zip(trawveg['Venue Latitude'], trawveg['Venue Longitude'], trawveg['Venue'], trawveg['Neighborhood']):
    # place each marker at the venue itself and label it with the venue name and its neighborhood
    label = '{}, {}'.format(venue, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ffffff',
        fill_opacity=0.7,
        parse_html=False).add_to(map_trawno)  
    
map_trawno

#### Map of New York City

In [14]:
map_ny = folium.Map(location=[nlatitude, nlongitude], zoom_start=10)

# add markers to map
for lat, lng, venue, neighborhood in zip(nyveg['Venue Latitude'], nyveg['Venue Longitude'], nyveg['Venue'], nyveg['Neighborhood']):
    # place each marker at the venue itself and label it with the venue name and its neighborhood
    label = '{}, {}'.format(venue, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#ffffff',
        fill_opacity=0.7,
        parse_html=False).add_to(map_ny)  
    
map_ny

## Analysis

Since this is a simple analysis, we only compare the number of vegan/vegetarian venues found in each city. We can look at the maps, as well as the number of rows in each filtered dataset, to see which city is more vegan friendly.
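
One caveat when comparing row counts: the same venue can be returned for two neighboring neighborhoods, as Fresh On Front is in the Toronto table above. The sketch below counts each venue once by its name and coordinates. The helper count_unique_venues is hypothetical, and the sample rows are only the five Toronto rows shown earlier, not the full dataset, so the printed number is not the city total.

```python
def count_unique_venues(rows):
    """Count venues once each, identified by name and coordinates."""
    seen = {(row["Venue"], row["Venue Latitude"], row["Venue Longitude"]) for row in rows}
    return len(seen)


# The first five vegan/vegetarian rows from the Toronto output above
sample_rows = [
    {"Neighborhood": "St. James Town", "Venue": "Fresh On Front",
     "Venue Latitude": 43.647815, "Venue Longitude": -79.374453},
    {"Neighborhood": "Berczy Park", "Venue": "Fresh On Front",
     "Venue Latitude": 43.647815, "Venue Longitude": -79.374453},
    {"Neighborhood": "Central Bay Street", "Venue": "Vegetarian Haven",
     "Venue Latitude": 43.656016, "Venue Longitude": -79.392758},
    {"Neighborhood": "Richmond, Adelaide, King", "Venue": "Rosalinda",
     "Venue Latitude": 43.650252, "Venue Longitude": -79.385156},
    {"Neighborhood": "Richmond, Adelaide, King", "Venue": "Planta Queen",
     "Venue Latitude": 43.650622, "Venue Longitude": -79.388154},
]

print(len(sample_rows))                  # 5 rows
print(count_unique_venues(sample_rows))  # 4 distinct venues
```

In the notebook, rows in this shape could be obtained with trawveg.to_dict("records") and nyveg.to_dict("records"), so both cities would be compared on distinct venues rather than raw rows.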

## Results and Discussion

Based on the number of restaurants and the map visualization of this data, the clear winner in this case is New York City, and I would recommend New York City to my clients for travelling.

When I was looking at the data, I realized that because I focused only on purely vegan/vegetarian restaurants and did not include restaurants that offer vegan/vegetarian options alongside their non-vegetarian dishes, I might have eliminated some viable options for my clients. However, given the sheer number of vegetarian restaurants on the list, I believe there are plenty of options even if my clients want a little more variety, and that NYC is very vegan friendly.



## Conclusion

I can confidently recommend New York City as the better option for my clients to travel to. In the future, I can use this approach to evaluate other cities for vegan friendliness and travel viability. I could also adjust the approach to check for restaurants that match a customer's preferred cuisine, and/or create food-based tours where the main focus is travelling and sampling famous local joints and eateries. This would make my package options very diverse while offering a special theme that other agencies might not have.

