# Peer-graded Assignment: Battle of Neighbourhoods - Week 1
By Jordan

## 1. Introduction
### 1.1 Description of the Problem
London has many fantastic restaurants serving Asian, Middle Eastern, Latin and American cuisine. However, one will struggle to find a place to enjoy the finest West African cuisine: Nigerian, Ghanaian, Cameroonian, Senegalese and more.

### 1.2 Discussion of the Background
A successful restaurant chain in Africa is looking to expand its operations into Europe through London. They want to create a high-end organic African restaurant. Their target market is not only West Africans but also anyone who favours organic food and healthy eating.
Since London's population is so large, my client needs deeper insight from the available data in order to decide where to establish its first European “palace” restaurant. This company spends a lot on research and provides customers with data insight into the ingredients used at its restaurants.

### 1.3 Target Audience
London is highly multicultural and home to people of many different backgrounds. Even so, high-end African restaurants are in short supply. The target audience is therefore broad: Londoners, tourists and anyone passionate about organic food.



## 2. Data
### 2.1 Description of Data
This project will rely on public data from Wikipedia and Foursquare.

### 2.1.1 Dataset 1:
In this project, "London" is used synonymously with the "Greater London Area". Only some areas of Greater London fall within the London postcode area, and this project focuses on the neighbourhoods that do.

The London Area consists of 32 boroughs and the "City of London". The data comes from the Wikipedia page for the Greater London Area: <https://en.wikipedia.org/wiki/List_of_areas_of_London>

A sample of the data scraped from the Wikipedia page for the Greater London Area is provided below:


In [6]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

In [7]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
wikipedia_page = requests.get(wikipedia_link, headers = headers)
wikipedia_page

<Response [200]>

In [8]:
# Parses the HTML content of the page
soup = BeautifulSoup(wikipedia_page.content, 'html.parser')
# This extracts the "tbody" within the table where class is "wikitable sortable"
table = soup.find('table', {'class':'wikitable sortable'}).tbody
# table

In [9]:
# Extracts all "tr" (table rows) within the table above
rows = table.find_all('tr')
# rows

In [10]:
# Extracts the column headers and removes any '\n' from the text of each "th" tag
columns = [i.text.replace('\n', '')
           for i in rows[0].find_all('th')]
# columns

In [11]:
# Creates an empty pd dataframe with the extracted column headers
df = pd.DataFrame(columns = columns)
# df

In [12]:
# Extracts every data row and cleans the text of each "td" cell
# by removing newlines ('\n') and non-breaking spaces ('\xa0')
# Please note that the first row (rows[0]) is skipped because it is already the header
# Rows whose cell count does not match the header are skipped

records = []
for row in rows[1:]:
    tds = row.find_all('td')
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in tds]
    if len(values) == len(columns):
        records.append(values)

# Rebuilds "df" from the collected rows in one step
# (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(records, columns = columns)

In [13]:
df.head()

| | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon[8] | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon[8] | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |


In [14]:
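# Renames the scraped columns to shorter names ('\xa0' is the non-breaking space in the original headers)
df = df.rename(index=str, columns = {'Location': 'Location', 'London\xa0borough': 'Borough', 'Post town': 'Post-town', 'Postcode\xa0district': 'Postcode', 'Dial\xa0code': 'Dial-code', 'OS grid ref': 'OSGridRef'})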

In [15]:
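# Removes trailing footnote markers such as "[8]" and any leftover trailing space from the borough names
df['Borough'] = df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('[').rstrip())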
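
The footnote clean-up in the cell above can also be written as a single regular expression. The following is a minimal sketch rather than the notebook's own code: the helper name `strip_footnotes` is hypothetical, and it assumes that markers always appear as digits in square brackets at the end of the cell text.

```python
import re

# Matches one or more trailing markers such as "[7]" or " [8]",
# together with any whitespace around them.
_FOOTNOTE = re.compile(r'(\s*\[\d+\])+\s*$')

def strip_footnotes(text):
    """Return text without trailing footnote markers (hypothetical helper)."""
    return _FOOTNOTE.sub('', text)

# Sample values taken from the "London borough" column shown above
samples = ['Bexley, Greenwich [7]', 'Croydon[8]', 'Bexley']
cleaned = [strip_footnotes(s) for s in samples]
print(cleaned)
```

With pandas, such a helper could be passed to `df['Borough'].map(...)` in place of the lambda.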

In [16]:
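# Splits rows with several postcode districts (e.g. "W3, W4") into one row per district
# and strips the space left after each comma
df0 = df.drop('Postcode', axis=1).join(df['Postcode'].str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True).rename('Postcode'))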
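
The split-and-stack step above turns a row with several postcode districts into one row per district. The following pure-Python sketch illustrates the same reshaping on two rows from the scraped table. The function name `explode_postcodes` is hypothetical, and the sketch also strips the space that follows each comma.

```python
def explode_postcodes(rows):
    """Yield one (location, borough, postcode) tuple per postcode district."""
    for location, borough, postcodes in rows:
        for postcode in postcodes.split(','):
            # strip() removes the space left after the comma in "W3, W4"
            yield (location, borough, postcode.strip())

# Two rows taken from the scraped table shown earlier
rows = [
    ('Abbey Wood', 'Bexley, Greenwich', 'SE2'),
    ('Acton', 'Ealing, Hammersmith and Fulham', 'W3, W4'),
]
exploded = list(explode_postcodes(rows))
for row in exploded:
    print(row)
```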

In [17]:
df1 = df0[['Location', 'Borough', 'Postcode', 'Post-town']].reset_index(drop=True)

In [18]:
df1.head()

| | Location | Borough | Postcode | Post-town |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | LONDON |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 | LONDON |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 | LONDON |
| 3 | Angel | Islington | EC1 | LONDON |
| 4 | Angel | Islington | N1 | LONDON |


In [19]:
df2 = df1

In [20]:
# Keeps only the rows whose post town contains 'LONDON' (str.contains returns a True/False mask)
df21 = df2[df2['Post-town'].str.contains('LONDON')]

In [21]:
df21.head(10)

| | Location | Borough | Postcode | Post-town |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | LONDON |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 | LONDON |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 | LONDON |
| 3 | Angel | Islington | EC1 | LONDON |
| 4 | Angel | Islington | N1 | LONDON |
| 5 | Church End | Brent | NW10 | LONDON |
| 6 | Church End | Barnet | N3 | LONDON |
| 7 | Clapham | Lambeth, Wandsworth | SW4 | LONDON |
| 8 | Clerkenwell | Islington | EC1 | LONDON |
| 10 | Colindale | Barnet | NW9 | LONDON |


In [22]:
df3 = df21[['Location', 'Borough', 'Postcode']].reset_index(drop=True)

In [23]:
df_london = df3

In [24]:
!pip -q install geocoder
import geocoder

In [25]:

# Geocoder starts here
# Defining a function to use --> get_latlng()
def get_latlng(postal_code):
    
    # Initialize the location (lat. and long.) to "None"
    lat_lng_coords = None
    
    # The while loop repeats the request until the ArcGIS geocoder returns coordinates
    # Note: it does not terminate if a postcode can never be geocoded
    
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postal_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords
# Geocoder ends here
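
Because the loop in `get_latlng` repeats until the geocoder answers, a postcode that can never be resolved would keep it running indefinitely. The following is a minimal sketch of a bounded variant, not the notebook's own method. The name `get_latlng_bounded`, the `max_attempts` and `pause_seconds` parameters, and the stand-in `fake_geocode` function are all hypothetical. The geocoding call is passed in as a plain function so that the sketch runs without network access.

```python
import time

def get_latlng_bounded(postal_code, geocode, max_attempts=3, pause_seconds=0.0):
    """Return [lat, lng] for postal_code, or None after max_attempts failures.

    geocode is any callable that takes a query string and
    returns [lat, lng] or None.
    """
    query = '{}, London, United Kingdom'.format(postal_code)
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(pause_seconds)  # wait before trying again
    return None

# Stand-in geocoder for illustration only: fails once, then returns a fixed point
calls = {'count': 0}

def fake_geocode(query):
    calls['count'] += 1
    return None if calls['count'] < 2 else [51.49245, 0.12127]

result = get_latlng_bounded('SE2', fake_geocode)
never = get_latlng_bounded('XX0', lambda query: None)
print(result, never)
```

With the real library, the callable could be `lambda q: geocoder.arcgis(q).latlng`, which is the call used inside `get_latlng`. Postcodes that come back as `None` would then need to be dropped or handled before the coordinates are joined to the dataframe.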

In [26]:
postal_codes = df_london['Postcode']    
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]

In [27]:
# This will store the London dataframe with coordinates
df_london_loc = df_london

# The obtained coordinates (latitude and longitude) are joined with the dataframe as shown
df_london_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_london_loc['Latitude'] = df_london_coordinates['Latitude']
df_london_loc['Longitude'] = df_london_coordinates['Longitude']

In [28]:
df_london_loc.head()

| | Location | Borough | Postcode | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | 51.49245 | 0.12127 |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 | 51.51324 | -0.26746 |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 | 51.48944 | -0.26194 |
| 3 | Angel | Islington | EC1 | 51.52361 | -0.09877 |
| 4 | Angel | Islington | N1 | 51.53792 | -0.09983 |


The data output df_london_loc shows the data format that will be used for further analysis in week 2.

### 2.1.2 Dataset 2:
The Foursquare API will be used to obtain the geographical location data for the London Area. This data will be used to explore the venues in the neighbourhoods of London.

The venues will provide the categories needed for the analysis and eventually, these will be used to determine the viability of selected locations for the restaurant.
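
As a rough illustration of how a venue request might be put together, the sketch below only builds a request URL and does not call the API. It assumes the legacy Foursquare v2 `venues/explore` endpoint. The function name `build_explore_url`, the version date, the 500 metre radius and the placeholder credentials are all assumptions rather than settings taken from this project.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=500, limit=100):
    """Build a request URL for the legacy v2 venues/explore endpoint (assumed)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                    # API version date (assumed value)
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the neighbourhood
        'radius': radius,                # search radius in metres (assumed value)
        'limit': limit,                  # maximum number of venues requested
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Coordinates of the first row of df_london_loc; the credentials are placeholders
url = build_explore_url(51.49245, 0.12127, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
print(url)
```

The resulting URL could then be fetched with `requests.get(url).json()`, in the same way the Wikipedia page is fetched earlier.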


### 2.2 How data will be used to solve the problem
The data from datasets 1 and 2 will be explored by considering the venues within the neighbourhoods of the London postcode areas. The restaurants in these areas will be examined by type within a fixed radius of each neighbourhood. Due to Foursquare restrictions, the number of venues will be limited to 100. Proximity to transport connections and other amenities will also be taken into account, as will accessibility and the ease of supplying organic ingredients.
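
One way the restaurant-type check could look is sketched below under stated assumptions. Venues are filtered to a radius around a neighbourhood's coordinates using the haversine distance, and the remaining restaurant categories are counted. The venue records are made up for illustration and are not Foursquare results. The one-mile radius, the function names and the rule that a restaurant category contains the word "Restaurant" are also assumptions.

```python
import math
from collections import Counter

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two latitude/longitude points."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))

def restaurant_counts(centre, venues, radius_miles=1.0):
    """Count restaurant categories among venues within radius_miles of centre."""
    counts = Counter()
    for name, category, lat, lng in venues:
        near = haversine_miles(centre[0], centre[1], lat, lng) <= radius_miles
        if near and 'Restaurant' in category:
            counts[category] += 1
    return counts

# Made-up venues for illustration only
venues = [
    ('Venue A', 'Indian Restaurant', 51.5240, -0.0990),
    ('Venue B', 'Italian Restaurant', 51.5230, -0.0980),
    ('Venue C', 'Indian Restaurant', 51.5250, -0.1000),
    ('Venue D', 'Coffee Shop', 51.5236, -0.0988),
    ('Venue E', 'African Restaurant', 51.6500, -0.2000),  # well outside one mile
]
angel_ec1 = (51.52361, -0.09877)  # Angel, EC1 coordinates from df_london_loc
counts = restaurant_counts(angel_ec1, venues)
print(counts)
```

A neighbourhood with many restaurants nearby but no African ones would, on this reading, be a candidate location worth a closer look.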