# Capstone Project: The Battle of the Neighbourhoods (Week 1)
### Course 9: Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)

## Introduction: Business Problem <a name="introduction"></a>

In this project, we will try to build a model that finds the optimal neighbourhood for opening a new business. As an example, we take the business type to be an **Italian restaurant**.

Since London is a huge city with a lot of restaurants, we will try to identify the optimal neighbourhood in which to open a new Italian restaurant. We will rank neighbourhoods on the following criteria:
* Fewer **highly rated** Italian restaurants.
* Fewer Italian restaurants.
* Fewer restaurants overall.
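
As a rough illustration of how the three criteria could be combined, the sketch below sorts neighbourhoods lexicographically: first by the number of highly rated Italian restaurants, then by Italian restaurants, then by all restaurants. The priority order, the function name `rank_neighbourhoods` and all counts are assumptions made up for this example; they are not results from this project.

```python
# Hypothetical counts per neighbourhood, for illustration only:
# (highly rated Italian restaurants, Italian restaurants, all restaurants)
toy_counts = {
    "Neighbourhood A": (2, 5, 40),
    "Neighbourhood B": (0, 3, 55),
    "Neighbourhood C": (0, 3, 20),
}

def rank_neighbourhoods(counts):
    """Return neighbourhood names sorted so that fewer competitors come first."""
    # Tuples compare element by element, so this sorts by the first
    # criterion and breaks ties with the second, then the third.
    return sorted(counts, key=lambda name: counts[name])

ranking = rank_neighbourhoods(toy_counts)
print(ranking)
```

A weighted score would be another reasonable choice; the lexicographic sort is used here only because it needs no extra parameters.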


## Data <a name="data"></a>

Based on the description of the [Business Problem](#introduction), we will need the following data:
* List of neighbourhoods in London with their latitudes, longitudes and area.
* Number of restaurants (of any type) in each neighbourhood.
* Number of Italian restaurants in each neighbourhood.
* Rating of each Italian restaurant in each neighbourhood.

We will extract the required data as follows:
* We will get the list of neighbourhoods in London and their geo-location from the [Wikipedia page](https://en.wikipedia.org/wiki/List_of_London_boroughs).
* We will get the restaurant information for each neighbourhood from the **Foursquare API**.
* Also, for the map visualization, we will get the location of London using the **geopy API**.

#### Assumptions

First, let's import the required libraries.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import re # library for regular expressions

import geocoder

import folium

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import configparser #Library to read config file

import json

import requests

import math

### Neighbourhood Data
We will retrieve the list of London neighbourhoods from the Wikipedia page [List of London boroughs](https://en.wikipedia.org/wiki/List_of_London_boroughs), which contains the London neighbourhoods and their geo-location information, along with other irrelevant information that will be excluded.

In [2]:
# Function to convert an area from square miles to square metres
# (2590000 is a rounded factor; 1 sq mi is about 2,589,988 square metres)
def convert_mi2_m2(row) :
    mi_2 = row["Area (sq mi)"]
    m_2 = mi_2 * 2590000
    return m_2

# Function to calculate the radius in metres of a circle with the given area in square metres
def convert_area_radius(row) :
    area = row["Area"]
    radius = math.sqrt((area / math.pi))
    return radius

# read html tables from the Wikipedia page
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_London_boroughs') 
print('List of London boroughs loaded')

# we get 2 dataframes so we concat them into one dataframe 
df = pd.concat([dfs[0], dfs[1]])

# Calculate Area and radius in metric measurements
df["Area"] = df.apply(lambda row : convert_mi2_m2(row), axis=1)
df["Area-Radius"] = df.apply(lambda row : convert_area_radius(row), axis=1)

# extract only the columns we are interested in
london_df = df[['Borough', 'Co-ordinates', 'Area', 'Area-Radius']].reset_index(drop=True)
london_df.rename(columns={'Borough' : 'Neighbourhood'}, inplace = True)

# clean Borough from [note 1]
note_regex = re.compile(r"\[note \d]")
london_df['Neighbourhood'] = london_df['Neighbourhood'].str.replace(pat=note_regex, repl='', regex=True)

#Split the Co-ordinates column into 2 columns for DMS and Decimal geo-formats
london_df[['Co-ordinates_DMS', 'Co-ordinates_DEC']] = london_df['Co-ordinates'].str.split(pat=' / ', expand=True)
london_df[['Latitude', 'Longitude']] = london_df['Co-ordinates_DEC'].str.split(expand=True)

# extract and clean Latitude and Longitude from Co-ordinates_DEC
def clean_dec(dec_str):
    
    sign = -1 if re.search('[swSW]', dec_str) else 1
    dec_str = re.sub(r'°.', '', dec_str)
    dec_str = re.sub(' ', '', dec_str)
    dec_str = re.sub(u'\ufeff', '', dec_str)
    
    return sign * (float(dec_str))

london_df['Latitude'] = london_df['Latitude'].apply(clean_dec)
london_df['Longitude'] = london_df['Longitude'].apply(clean_dec)

# extract only needed columns
london_df = london_df[['Neighbourhood', 'Latitude', 'Longitude', 'Area', 'Area-Radius']]

london_df.head()


List of London boroughs loaded


| | Neighbourhood | Latitude | Longitude | Area | Area-Radius |
|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 51.5607 | 0.1557 | 36078700.0 | 3388.835625 |
| 1 | Barnet | 51.6252 | -0.1517 | 86739100.0 | 5254.513588 |
| 2 | Bexley | 51.4549 | 0.1505 | 60554200.0 | 4390.330342 |
| 3 | Brent | 51.5588 | -0.2817 | 43253000.0 | 3710.506368 |
| 4 | Bromley | 51.4039 | 0.0198 | 150142300.0 | 6913.1598 |
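
To make the coordinate clean-up easier to follow, here is a minimal standalone sketch of the same idea: keep the decimal half of the cell, strip the degree sign, hemisphere letter and byte-order marks, and negate South and West values. The function name `parse_decimal_coordinates` is hypothetical, and the sample string is an assumption about the cell layout, inferred from the cleaning steps used in this notebook.

```python
import re

def parse_decimal_coordinates(cell):
    """Parse a 'DMS / decimal' coordinate cell into (latitude, longitude)."""
    # Keep only the decimal part after the ' / ' separator.
    decimal_part = cell.split(' / ')[1]
    values = []
    for token in decimal_part.split():
        # South and West hemispheres become negative numbers.
        sign = -1 if re.search('[swSW]', token) else 1
        # Drop the degree sign with the hemisphere letter, then any byte-order marks.
        number = re.sub(r'°.', '', token).replace('\ufeff', '')
        values.append(sign * float(number))
    return tuple(values)

# Assumed cell layout, using the Barnet coordinates for illustration.
sample = '51°37′31″N 0°09′06″W\ufeff / \ufeff51.6252°N 0.1517°W'
print(parse_decimal_coordinates(sample))
```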


Each neighbourhood in London has a different area. We assume that the latitude and longitude we got from Wikipedia mark the centre of each neighbourhood.

Since we are going to pass a **radius** in our Foursquare API calls, we associated each neighbourhood with the radius of a circle whose area equals the neighbourhood's area.
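
The radius follows from solving the circle area formula for r, which gives r = sqrt(area / pi). The sketch below reproduces the calculation for a single value, using the same rounded conversion factor as the notebook; the names `SQ_MI_TO_SQ_M` and `equal_area_radius` are hypothetical and only used here.

```python
import math

SQ_MI_TO_SQ_M = 2590000  # rounded conversion factor used in this notebook

def equal_area_radius(area_sq_mi):
    """Radius in metres of a circle whose area equals the given area in square miles."""
    area_sq_m = area_sq_mi * SQ_MI_TO_SQ_M
    return math.sqrt(area_sq_m / math.pi)

# Barking and Dagenham: 36078700 square metres in the table above is 13.93 sq mi.
radius = equal_area_radius(13.93)
print(round(radius, 1))  # about 3388.8, in line with the Area-Radius column
```

Because real boroughs are not circular, this radius is only an approximation: a search circle will miss some corners of a neighbourhood and overlap its neighbours in other places.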

### London City Location

Now, let's get the location of London using the geopy API.

In [3]:
def get_geo_location(addr) :
    geolocator = Nominatim(user_agent="uk_explorer")
    location = geolocator.geocode(addr)
    latitude = location.latitude
    longitude = location.longitude
    #print('The geographical coordinates of London City are {}, {}.'.format(latitude, longitude))
    return [latitude, longitude]

address = 'London, UK'
london_location = get_geo_location(address)
print('Coordinate of {}: {}'.format(address, london_location))

Coordinate of London, UK: [51.5073219, -0.1276474]


Now we can visualize the map of London and its neighbourhoods. Each neighbourhood centre is surrounded by a circle whose area equals the neighbourhood's area.

In [5]:


map_london = folium.Map(location=london_location, zoom_start=10)


folium.CircleMarker(
    london_location,
    radius=3,
    color='red',
    popup=address,
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(map_london)

for lat, lng, radius, label in zip(london_df.Latitude, london_df.Longitude, london_df['Area-Radius'] , london_df.Neighbourhood) :
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(map_london)

    folium.Circle(
        [lat, lng],
        radius=radius,
        color='Yellow'
    ).add_to(map_london)

map_london

### Restaurants Data
#### Foursquare

In [None]:
# Read Foursquare authentication info from hidden config file
config = configparser.ConfigParser()
config.read('secrets.cfg')
CLIENT_ID = config['4square_personal']['CLIENT_ID']
CLIENT_SECRET = config['4square_personal']['CLIENT_SECRET']
VERSION = '20201103' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

REQUEST_DEFAULT_PARAMS = dict(
    client_id = CLIENT_ID,
    client_secret = CLIENT_SECRET,
    v=VERSION
)
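
For readers who want to reproduce this step, the sketch below shows one layout of `secrets.cfg` that would satisfy the lookups above. The section and key names come from the code; the values are placeholders, and the file contents are an assumption, since the real file is not part of this notebook.

```python
import configparser

# Hypothetical contents of secrets.cfg; the values are placeholders, not real credentials.
EXAMPLE_SECRETS = """
[4square_personal]
CLIENT_ID = your-client-id
CLIENT_SECRET = your-client-secret
"""

example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_SECRETS)  # same parser, but reading from a string

# Build the request parameters the same way as the cell above.
example_params = dict(
    client_id=example_config['4square_personal']['CLIENT_ID'],
    client_secret=example_config['4square_personal']['CLIENT_SECRET'],
    v='20201103',
)
print(sorted(example_params))
```

Keeping the credentials in a separate file that is excluded from version control avoids publishing them together with the notebook.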




In [None]:
def get_food_category_config() :
    URL_4SQU_CATEGORIES = 'https://api.foursquare.com/v2/venues/categories'

    categories_params = REQUEST_DEFAULT_PARAMS.copy()

    results = requests.get(url=URL_4SQU_CATEGORIES, params=categories_params).json()["response"]["categories"]

    FOOD_CAT_ID = ''
    FOOD_CAT_Name = 'Food'
    restaurant_category_ids = []

    # find the top-level 'Food' category and collect the IDs of its restaurant sub-categories
    for cat in results :
        if cat["name"] == FOOD_CAT_Name :
            FOOD_CAT_ID = cat["id"]
            for sub_cat in cat["categories"] :
                if "Restaurant" in sub_cat["name"] :
                    restaurant_category_ids.append(sub_cat["id"])
            break

    return FOOD_CAT_ID, FOOD_CAT_Name, restaurant_category_ids


food_cat_id, food_cat_name, restaurants_cat_list = get_food_category_config()
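
The category lookup can be tried offline with a small mock payload. The sketch below assumes the nested shape that the function above relies on (a list of top-level categories, each with `id`, `name` and a `categories` list); the IDs and the helper name `restaurant_ids` are made up for illustration.

```python
# Hypothetical, heavily trimmed categories payload; the IDs are made up.
mock_categories = [
    {"id": "cat-arts", "name": "Arts & Entertainment", "categories": []},
    {"id": "cat-food", "name": "Food", "categories": [
        {"id": "sub-italian", "name": "Italian Restaurant"},
        {"id": "sub-bakery", "name": "Bakery"},
        {"id": "sub-indian", "name": "Indian Restaurant"},
    ]},
]

def restaurant_ids(categories, parent_name="Food"):
    """Return the parent category ID and the IDs of its '...Restaurant' sub-categories."""
    for cat in categories:
        if cat["name"] == parent_name:
            ids = [sub["id"] for sub in cat["categories"] if "Restaurant" in sub["name"]]
            return cat["id"], ids
    return "", []  # parent category not found

food_id, ids = restaurant_ids(mock_categories)
print(food_id, ids)
```

Note that the substring test skips food venues such as bakeries or cafés, so the resulting list covers only sub-categories with "Restaurant" in their name.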


In [None]:
URL_4SQU_EXPLORE = 'https://api.foursquare.com/v2/venues/explore'

lat = 51.6252
lng = -0.1517
explore_params = REQUEST_DEFAULT_PARAMS.copy()
explore_params.update({
    "ll" : '{}, {}'.format(lat, lng), 
    "limit" : 1000,
    "radius" : 250,
    "section" : 'food',
    "sortByPopularity" : 1
    })

results = requests.get(url=URL_4SQU_EXPLORE, params=explore_params).json()
type(results)

In [None]:
URL_4SQU_SEARCH = 'https://api.foursquare.com/v2/venues/search'

lat = 51.6252
lng = -0.1517
search_params = REQUEST_DEFAULT_PARAMS.copy()
search_params.update({
    "ll" : '{}, {}'.format(lat, lng), 
    "limit" : 50,
    "radius" : 250,
    "categoryId" : ",".join(restaurants_cat_list)
    })

search_results = requests.get(url=URL_4SQU_SEARCH, params=search_params).json()
type(search_results)

In [None]:
groups = results["response"]['groups'][0]['items']
results
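
As a possible next step, the items of an explore response can be flattened into simple rows for counting. The sketch below runs on a mock payload: only the `response` / `groups` / `items` path is taken from the cell above, while the `venue`, `location` and `categories` fields, the venue names and the helper name `flatten_items` are assumptions for illustration.

```python
# Hypothetical, trimmed explore response; venue names and coordinates are made up.
mock_results = {
    "response": {"groups": [{"items": [
        {"venue": {"id": "v1", "name": "Example Trattoria",
                   "location": {"lat": 51.6250, "lng": -0.1520},
                   "categories": [{"name": "Italian Restaurant"}]}},
        {"venue": {"id": "v2", "name": "Example Curry House",
                   "location": {"lat": 51.6255, "lng": -0.1510},
                   "categories": [{"name": "Indian Restaurant"}]}},
    ]}]}
}

def flatten_items(results):
    """Turn the nested explore items into a list of flat dictionaries."""
    rows = []
    for item in results["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append({
            "id": venue["id"],
            "name": venue["name"],
            "lat": venue["location"]["lat"],
            "lng": venue["location"]["lng"],
            # Use the first listed category, if any, as the venue type.
            "category": categories[0]["name"] if categories else None,
        })
    return rows

venues = flatten_items(mock_results)
italian = [v["name"] for v in venues if v["category"] == "Italian Restaurant"]
print(len(venues), italian)
```

A list of flat dictionaries like this can be passed directly to `pd.DataFrame` to count restaurants per neighbourhood.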