### Week 4 Peer-Graded Assignment

Now that you have the skills and tools to use location data to explore a geographical location, you will spend the next two weeks on a project of your own design. Be as creative as you want: use the Foursquare location data to explore or compare neighborhoods or cities of your choice, or come up with a problem that the Foursquare location data can help solve. If you cannot think of an idea or a problem, here are some ideas to get you started:

In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. 
* Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.
* In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

These are just a few of the many ideas and problems that can be solved using location data, alone or in combination with other datasets. No matter what you decide to do, make sure to justify why the problem you want to solve is important and why a client or a group of people would be interested in your project.


#### For this week, you will be required to submit the following:

* A description of the problem and a discussion of the background. (15 marks)
* A description of the data and how it will be used to solve the problem. (15 marks)

For the second week, the final deliverables of the project will be:

* A link to your Notebook on your GitHub repository, showing your code. (15 marks)
* A full report consisting of all of the following components (15 marks):
    * Introduction, where you discuss the business problem and who would be interested in this project.
    * Data, where you describe the data that will be used to solve the problem and the source of the data.
    * Methodology, the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed (if any), and which machine learning methods were used and why.
    * Results, where you discuss the results.
    * Discussion, where you discuss any observations you noted and any recommendations you can make based on the results.
    * Conclusion, where you conclude the report.
* Your choice of a presentation or blog post. (10 marks)

### Data description
I have decided to compare Toronto and Cape Town in the final project. I have already scraped the data for Cape Town in https://github.com/pieterdt1979/coursera_capstone/blob/master/Cape_Town_Data.ipynb and saved it to a CSV file so it can be loaded easily again. One city is in a developed country, whereas the other is in a developing country:

* The average GDP per capita in Canada is about 51000 USD, as per https://tradingeconomics.com/canada/gdp-per-capita
* The average GDP per capita in South Africa is 7000 USD, as per https://tradingeconomics.com/south-africa/gdp-per-capita

This gap in economic development is what makes the two cities interesting to compare. Venue data for Manhattan (New York) is also loaded below as a third point of reference. I will also import historical economic data for both countries from tradingeconomics.com.
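To put the gap in numbers, the two rounded GDP-per-capita figures quoted above differ by a factor of a little over seven. A minimal sketch of that arithmetic, using only the rounded values from the tradingeconomics.com pages cited above (not the historical series that will be imported later):

```python
# Rounded GDP per capita figures (USD) as quoted from tradingeconomics.com above
gdp_per_capita_usd = {'Canada': 51000, 'South Africa': 7000}

# How many times larger Canada's GDP per capita is than South Africa's
ratio = gdp_per_capita_usd['Canada'] / gdp_per_capita_usd['South Africa']
print('Canada / South Africa GDP per capita ratio: {:.1f}'.format(ratio))
```

With these inputs the ratio is about 7.3; it should be recomputed if the figures are refreshed from the source.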

In [14]:
# Importing all the modules that will be needed
# !pip3 install tradingeconomics
import json

import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.style.use('ggplot')
import tradingeconomics as te  # Used to pull economic data for the two countries
from requests import get  # Used for the Foursquare API calls and to scrape cost-of-living data from:
                          # https://www.expatistan.com/cost-of-living/toronto?currency=ZAR
                          # https://www.expatistan.com/cost-of-living/cape-town?currency=CAD
import folium
from sklearn.cluster import KMeans

In [48]:
# Load the previously scraped data
toronto_data = pd.read_csv('toronto_data.csv')
capetown_data = pd.read_csv('cape_town_data.csv')
newyork_data = pd.read_csv('newyork_data.csv')
newyork_data.drop(columns=['Unnamed: 0'], inplace=True)
newyork_data.rename(columns={'Neighborhood':'Neighbourhood'}, inplace=True) # In South Africa we speak a different type of English :)
toronto_data = toronto_data[['Borough', 'Neighbourhood', 'Latitude', 'Longitude']]
capetown_data = capetown_data[['Borough', 'Neighbourhood', 'Latitude', 'Longitude']]
print(toronto_data.head(1))
print('_________________________________________________________________________')
print(newyork_data.head(1))
print('_________________________________________________________________________')
print(capetown_data.head(1))

       Borough   Neighbourhood   Latitude  Longitude
0  Scarborough  Rouge, Malvern  43.806686 -79.194353
_________________________________________________________________________
  Borough Neighbourhood   Latitude  Longitude
0   Bronx     Wakefield  40.894705 -73.847201
_________________________________________________________________________
      Borough Neighbourhood   Latitude  Longitude
0  Cape Flats         Delft -33.965556  18.644444
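The three CSV files have slightly different schemas: the New York file carries an extra `Unnamed: 0` index column and uses the US spelling `Neighborhood`. A minimal sketch of folding those clean-up steps into one helper so every city goes through the same code path; `harmonise_columns` is a hypothetical name, and the example runs on a one-row in-memory DataFrame built from the New York output above rather than on the real CSV files:

```python
import pandas as pd

def harmonise_columns(df):
    """Return a copy with only the columns used for all three cities (hypothetical helper)."""
    # Drop the index column pandas adds when a CSV was saved with its index; ignore if absent
    df = df.drop(columns=['Unnamed: 0'], errors='ignore')
    # Use the 'Neighbourhood' spelling consistently across cities
    df = df.rename(columns={'Neighborhood': 'Neighbourhood'})
    return df[['Borough', 'Neighbourhood', 'Latitude', 'Longitude']].reset_index(drop=True)

# One row taken from the New York output above
raw = pd.DataFrame({'Unnamed: 0': [0], 'Borough': ['Bronx'], 'Neighborhood': ['Wakefield'],
                    'Latitude': [40.894705], 'Longitude': [-73.847201]})
clean = harmonise_columns(raw)
print(clean)
```

Applied to each of `toronto_data`, `capetown_data` and `newyork_data`, this would replace the separate `drop`, `rename` and column-selection lines.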


In [119]:
# Function to get the nearby venues for each neighbourhood from the Foursquare explore endpoint
def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryids=None):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # optionally restrict the results to the given category IDs (comma-separated)
        if categoryids:
            url = url + '&categoryId={}'.format(','.join(categoryids))

        # make the GET request; skip the neighbourhood if the response has no venue groups
        try:
            results = get(url).json()['response']['groups'][0]['items']
        except (KeyError, IndexError, ValueError):
            continue

        # keep only the relevant information for each nearby venue
        for v in results:
            venue = v['venue']
            categories = venue.get('categories', [])
            if not categories:
                continue  # skip venues without a category instead of dropping the whole neighbourhood
            venues_list.append((
                name, 
                lat, 
                lng, 
                venue['name'], 
                venue['location']['lat'], 
                venue['location']['lng'], 
                categories[0]['name']))

    # passing the column names directly also gives the expected columns when no venues were found
    nearby_venues = pd.DataFrame(venues_list, columns=['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category'])

    return nearby_venues
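Most failures in `getNearbyVenues` surface while parsing the response: the nested `['response']['groups'][0]['items']` lookup and the per-venue fields. A minimal sketch that isolates the parsing step so it can be checked without calling the API; `parse_explore_items` is a hypothetical helper, the neighbourhood coordinates come from the Toronto output above, and the sample response is a hand-made dictionary with illustrative venue values that only mimics the fields the function reads, not real Foursquare output:

```python
def parse_explore_items(name, lat, lng, response_json):
    """Extract (neighbourhood, lat, lng, venue, venue lat, venue lng, category) tuples (hypothetical helper)."""
    try:
        items = response_json['response']['groups'][0]['items']
    except (KeyError, IndexError):
        return []  # no venue groups in the response, e.g. an error payload
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        if not categories:
            continue  # skip venues without a category
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-made response with the nesting the function expects (illustrative values only)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Bakery', 'location': {'lat': 43.80, 'lng': -79.19},
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Uncategorised Venue', 'location': {'lat': 43.81, 'lng': -79.20},
               'categories': []}},
]}]}}

print(parse_explore_items('Rouge, Malvern', 43.806686, -79.194353, sample))
print(parse_explore_items('Rouge, Malvern', 43.806686, -79.194353, {'meta': {'code': 400}}))
```

Keeping the parsing in a pure function like this lets the API call stay a one-liner while the fragile part is tested offline.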

In [32]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (do not publish real credentials)
VERSION = '20191005' # Foursquare API version
LIMIT = 200
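Keeping the Foursquare credentials out of the notebook avoids publishing them along with the code on GitHub. A minimal sketch that reads them from environment variables instead; the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions chosen for this example, not names defined by Foursquare or the course:

```python
import os

# Hypothetical environment variable names; set them in the shell before starting Jupyter,
# e.g. export FOURSQUARE_CLIENT_ID=...
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')
VERSION = '20191005'  # Foursquare API version, as above
LIMIT = 200

if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; API requests will fail.')
```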

In [120]:
african = '4bf58dd8d48988d1c8941735' # African Restaurant: the main category we are looking for
bakeries = '4bf58dd8d48988d16a941735' # Bakery; bakeries, butchers and farmers markets are checked as possible suppliers in each area
butchers = '4bf58dd8d48988d11d951735' # Butcher
farmersmarkets = '4bf58dd8d48988d1fa941735' # Farmers Market
categories = [african, bakeries, butchers, farmersmarkets]
toronto_neighbourhoods = toronto_data[toronto_data['Borough'] == 'Central Toronto'].reset_index(drop=True)
capetown_neighbourhoods = capetown_data[capetown_data['Borough'] == 'Atlantic Seaboard'].reset_index(drop=True)
newyork_neighbourhoods = newyork_data[newyork_data['Borough'] == 'Manhattan'].reset_index(drop=True)
central_toronto_venues = getNearbyVenues(names=toronto_neighbourhoods['Neighbourhood'], 
                                             latitudes=toronto_neighbourhoods['Latitude'],
                                             longitudes=toronto_neighbourhoods['Longitude'],
                                             radius=3000, categoryids=categories)
cpt_atlantic_venues = getNearbyVenues(names=capetown_neighbourhoods['Neighbourhood'], 
                                             latitudes=capetown_neighbourhoods['Latitude'],
                                             longitudes=capetown_neighbourhoods['Longitude'],
                                             radius=6000, categoryids=categories) # Larger radius: Cape Town is less densely populated than the other two cities
ny_manhattan_venues = getNearbyVenues(names=newyork_neighbourhoods['Neighbourhood'], 
                                             latitudes=newyork_neighbourhoods['Latitude'],
                                             longitudes=newyork_neighbourhoods['Longitude'],
                                             radius=1000, categoryids=categories) # Smaller radius: New York is more densely populated than the other two cities

Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park, Summerhill East
Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West
Roselawn
Forest Hill North, Forest Hill West
The Annex, North Midtown, Yorkville
Hout Bay, Imizamo Yethu, Llandudno
Bakoven, Bantry Bay, Camps Bay, Clifton, Fresnaye, Green Point, Mouille Point, Sea Point, Three Anchor Bay
Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
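The category filter is a comma-separated list of category IDs appended to the request as `categoryId`, and the radius is chosen per city. A minimal sketch of building the same explore query with `urllib.parse.urlencode`, which takes care of escaping; the parameter names mirror the URL in `getNearbyVenues`, `build_explore_url` is a hypothetical helper, and `'ID'`/`'SECRET'` are placeholders. The example uses the Delft coordinates from the Cape Town output and the 6000 m Cape Town radius:

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit, categoryids=()):
    """Build the explore request URL (hypothetical helper mirroring getNearbyVenues)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    if categoryids:
        # Several category IDs are sent as a single comma-separated value
        params['categoryId'] = ','.join(categoryids)
    return EXPLORE_URL + '?' + urlencode(params)

example_categories = ['4bf58dd8d48988d1c8941735', '4bf58dd8d48988d16a941735']  # African Restaurant, Bakery
url = build_explore_url('ID', 'SECRET', '20191005', -33.965556, 18.644444, 6000, 200, example_categories)
print(url)
# Decoding the query string shows the commas survive the round trip
print(parse_qs(urlparse(url).query)['categoryId'])
```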


In [121]:
print('There are {} unique categories for Central Toronto.'.format(len(central_toronto_venues['Venue Category'].unique())))
print('There are {} unique categories for Manhattan.'.format(len(ny_manhattan_venues['Venue Category'].unique())))
print('There are {} unique categories for Cape Town.'.format(len(cpt_atlantic_venues['Venue Category'].unique())))

There are 15 unique categories for Central Toronto.
There are 37 unique categories for Manhattan.
There are 16 unique categories for Cape Town.
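Counting unique categories only gives the size of each city's venue mix, not which categories the cities have in common. A minimal sketch of comparing the category sets with `Series.nunique` and Python set operations; the two tiny DataFrames are illustrative stand-ins for `central_toronto_venues` and `cpt_atlantic_venues`, not the real results:

```python
import pandas as pd

# Illustrative stand-ins with the same 'Venue Category' column as the real results
toronto = pd.DataFrame({'Venue Category': ['Bakery', 'Bakery', 'African Restaurant', 'Butcher']})
cape_town = pd.DataFrame({'Venue Category': ['Bakery', 'African Restaurant', 'Malay Restaurant']})

# Same count as len(df['Venue Category'].unique()) when there are no missing categories
print('Toronto unique categories:', toronto['Venue Category'].nunique())

toronto_cats = set(toronto['Venue Category'])
cape_town_cats = set(cape_town['Venue Category'])
shared = sorted(toronto_cats & cape_town_cats)
only_cape_town = sorted(cape_town_cats - toronto_cats)
print('Shared:', shared)
print('Only in Cape Town:', only_cape_town)
```

Note that `nunique` drops missing values by default, whereas `unique` keeps them, so the two counts can differ if some venues lack a category.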


In [122]:
# Encode the data for each City
# one hot encoding
toronto_onehot = pd.get_dummies(central_toronto_venues[['Venue Category']], prefix="", prefix_sep="")
manhattan_onehot = pd.get_dummies(ny_manhattan_venues[['Venue Category']], prefix="", prefix_sep="")
cpt_onehot = pd.get_dummies(cpt_atlantic_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighbourhood column back to dataframe
toronto_onehot['Neighbourhood'] = central_toronto_venues['Neighbourhood'] 
manhattan_onehot['Neighbourhood'] = ny_manhattan_venues['Neighbourhood'] 
cpt_onehot['Neighbourhood'] = cpt_atlantic_venues['Neighbourhood'] 
# move neighbourhood column to the first column
tor_fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[tor_fixed_columns]
ny_fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[ny_fixed_columns]
cpt_fixed_columns = [cpt_onehot.columns[-1]] + list(cpt_onehot.columns[:-1])
cpt_onehot = cpt_onehot[cpt_fixed_columns]
print(toronto_onehot.head(1))
print(manhattan_onehot.head(1))
print(cpt_onehot.head(1))

   Neighbourhood  African Restaurant  Bakery  Butcher  Chocolate Shop  \
0  Lawrence Park                   0       1        0               0   

   Ethiopian Restaurant  Farmers Market  Food & Drink Shop  Food Court  \
0                     0               0                  0           0   

   Grocery Store  Hardware Store  Market  Portuguese Restaurant  Restaurant  \
0              0               0       0                      0           0   

   Supermarket  Tea Room  
0            0         0  
  Neighbourhood  African Restaurant  Asian Restaurant  BBQ Joint  Bakery  Bar  \
0   Marble Hill                   0                 0          0       1    0   

   Bookstore  Butcher  Café  Coffee Shop  ...  Pub  Sandwich Place  School  \
0          0        0     0            0  ...    0               0       0   

   Shopping Mall  Steakhouse  Supermarket  Tea Room  Vietnamese Restaurant  \
0              0           0            0         0                      0   

   Wine Bar  W
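The one-hot step above appends `Neighbourhood` and then rebuilds the column list to move it to the front. A minimal sketch of the same encoding using `DataFrame.insert`, which places the column at position 0 directly; `one_hot_by_neighbourhood` is a hypothetical helper and the `venues` frame is illustrative data, not the real Foursquare results. `dtype=int` keeps 0/1 integers, since newer pandas versions return booleans from `get_dummies` by default:

```python
import pandas as pd

def one_hot_by_neighbourhood(venues):
    """One-hot encode 'Venue Category' with 'Neighbourhood' as the first column (hypothetical helper)."""
    # One column per distinct category, holding 0/1 integers
    onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
    # Put the neighbourhood label in front without rebuilding the column list
    onehot.insert(0, 'Neighbourhood', venues['Neighbourhood'])
    return onehot

# Illustrative venues in the layout getNearbyVenues returns
venues = pd.DataFrame({
    'Neighbourhood': ['Lawrence Park', 'Lawrence Park', 'Roselawn'],
    'Venue Category': ['Bakery', 'Butcher', 'Bakery'],
})
onehot = one_hot_by_neighbourhood(venues)
print(onehot)

# One row per venue; one column per unique category plus 'Neighbourhood'
assert onehot.shape == (len(venues), venues['Venue Category'].nunique() + 1)
```

The same relationship explains the shapes printed for each city: 15 + 1 = 16, 37 + 1 = 38 and 16 + 1 = 17 columns.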

In [123]:
print('Toronto shape: ',toronto_onehot.shape)
print('Manhattan shape: ',manhattan_onehot.shape)
print('Cape Town shape: ', cpt_onehot.shape)

Toronto shape:  (436, 16)
Manhattan shape:  (2163, 38)
Cape Town shape:  (49, 17)


In [126]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
manhattan_grouped = manhattan_onehot.groupby('Neighbourhood').mean().reset_index()
cpt_grouped = cpt_onehot.groupby('Neighbourhood').mean().reset_index()

Mean venue-category frequencies for the Cape Town (Atlantic Seaboard) neighbourhoods:

| Venue category | Bakoven, Bantry Bay, Camps Bay, Clifton, Fresn... | Hout Bay, Imizamo Yethu, Llandudno |
|---|---|---|
| African Restaurant | 0.222222 | 0.75 |
| Bakery | 0.377778 | 0.25 |
| Bar | 0.022222 | 0.0 |
| Butcher | 0.022222 | 0.0 |
| Café | 0.066667 | 0.0 |
| Coffee Shop | 0.022222 | 0.0 |
| Deli / Bodega | 0.022222 | 0.0 |
| Ethiopian Restaurant | 0.044444 | 0.0 |
| Flea Market | 0.022222 | 0.0 |
| Gastropub | 0.022222 | 0.0 |
| Grocery Store | 0.022222 | 0.0 |
| Hotel | 0.044444 | 0.0 |
| Indian Restaurant | 0.022222 | 0.0 |
| Malay Restaurant | 0.022222 | 0.0 |
| Market | 0.022222 | 0.0 |
| Restaurant | 0.022222 | 0.0 |
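With the mean frequencies per neighbourhood in hand, a natural next step before clustering is to list each neighbourhood's most frequent venue categories. A minimal sketch, assuming a grouped frame with a `Neighbourhood` column followed by one frequency column per category; `top_venue_categories` is a hypothetical helper, and `example_grouped` holds a subset of the Cape Town values from the table above (named so it does not overwrite `cpt_grouped`):

```python
import pandas as pd

def top_venue_categories(grouped, n=3):
    """Return the n most frequent venue categories per neighbourhood (hypothetical helper)."""
    rows = []
    for _, row in grouped.iterrows():
        # Frequencies for this neighbourhood, highest first
        freqs = row.drop('Neighbourhood').astype(float).sort_values(ascending=False)
        rows.append([row['Neighbourhood']] + list(freqs.index[:n]))
    columns = ['Neighbourhood'] + ['Top {} category'.format(i + 1) for i in range(n)]
    return pd.DataFrame(rows, columns=columns)

# Subset of the Cape Town frequencies from the table above
example_grouped = pd.DataFrame({
    'Neighbourhood': ['Bakoven, Bantry Bay, Camps Bay, Clifton, Fresn...',
                      'Hout Bay, Imizamo Yethu, Llandudno'],
    'African Restaurant': [0.222222, 0.75],
    'Bakery': [0.377778, 0.25],
    'Café': [0.066667, 0.0],
    'Hotel': [0.044444, 0.0],
})
result = top_venue_categories(example_grouped, n=2)
print(result)
```

Run on the full `cpt_grouped`, `toronto_grouped` and `manhattan_grouped` frames, this gives a side-by-side view of what dominates each neighbourhood before the frequencies are fed to `KMeans`.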
