# Part 1 - A description of the problem and a discussion of the background.
## 1.1 Description of the Problem

The population of London has grown considerably over the last few decades, and the city is very diverse. It is often described as a reflection of the old British Empire. In London, you can even get fresh food supplies from Africa, which makes one wonder at the efficiency of the supply mechanism.

The real issue is that, although London has many fine restaurants (Asian, Middle Eastern, Latin and American), you can struggle to find a good place to enjoy the finest West African cuisine, which combines Nigerian, Ghanaian, Cameroonian, Senegalese and other traditions.

## 1.2 Discussion of the Background

My client, a successful restaurant chain in Africa, is looking to expand its operations into Europe through London. They want to create a high-end restaurant with an organic and healthy menu. Their target market is not only West Africans but anyone who favours organic food and healthy eating. To them, every meal counts, and every diner should feel like royalty.

Since London's population is so large, my client needs deeper insight from the available data in order to decide where to establish its first European “palace” restaurant. The company spends a lot on research and provides customers with data insight into the ingredients used at restaurants.

## 1.3 Target Audience

Given its diversity, London has a strong multicultural character; people of many backgrounds live there. Even so, anyone searching for a high-end African restaurant will find a marked shortage.


# Part 2 - A description of the data and how it will be used to solve the problem
## 2.1 Description of Data

This project will rely on public data from Wikipedia and Foursquare.

### 2.1.1 Dataset 1:

In this project, "London" is used as a synonym for the "Greater London Area". Within the Greater London Area, some areas fall within the London postcode area. The focus of this project is the neighbourhoods that are within the London postcode area.

The London Area consists of 32 boroughs and the "City of London". Our data will come from the Wikipedia page listing the areas of Greater London: <https://en.wikipedia.org/wiki/List_of_areas_of_London>

A sample of the data scraped from the Wikipedia page for the Greater London Area is provided below:


In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

In [2]:
!pip -q install geocoder
import geocoder

In [3]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
wikipedia_page = requests.get(wikipedia_link)
wikipedia_page

<Response [200]>

In [4]:
soup = BeautifulSoup(wikipedia_page.content, 'html.parser')

table = soup.find(class_='wikitable sortable')

In [5]:
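from io import StringIO  # recent pandas versions expect a file-like object rather than a literal HTML string
df = pd.read_html(StringIO(str(table)), header=0)[0]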

In [6]:
df.head()

|  | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [1] | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[2] | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon[2] | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon[2] | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |


In [7]:
df.columns=['Location', 'Borough', 'Post-town', 'Postcode', 'Dial-code', 'OSGridRef']

In [8]:
df.head()

|  | Location | Borough | Post-town | Postcode | Dial-code | OSGridRef |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [1] | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[2] | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon[2] | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon[2] | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |


In [9]:
df['Borough'] = df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
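
As an alternative to chaining `rstrip` calls, the footnote markers could be removed with one regular expression, which also trims the space that precedes a marker such as ` [1]`. The following is a minimal sketch on a small hypothetical frame; the name `sample` and its three rows are assumptions copied from the output above, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample mirroring the 'Borough' values shown above.
sample = pd.DataFrame({
    'Borough': ['Bexley, Greenwich [1]',
                'Ealing, Hammersmith and Fulham[2]',
                'Bexley']
})

# Remove a trailing "[n]" footnote marker together with any whitespace before it.
sample['Borough'] = sample['Borough'].str.replace(r'\s*\[\d+\]$', '', regex=True)

print(sample['Borough'].tolist())
```

Rows without a footnote marker, such as `Bexley`, pass through unchanged because the pattern only matches at the end of the string.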

In [10]:
df.drop(['Dial-code','OSGridRef' ], axis=1, inplace=True)

In [11]:
df.head()

|  | Location | Borough | Post-town | Postcode |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3, W4 |
| 2 | Addington | Croydon | CROYDON | CR0 |
| 3 | Addiscombe | Croydon | CROYDON | CR0 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 |


In [12]:
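# Split multi-postcode rows into one row per postcode; strip() removes the space left after each comma
df = df.drop('Postcode', axis=1).join(df['Postcode'].str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True).rename('Postcode'))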
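
A possibly more readable way to obtain one row per postcode is `DataFrame.explode` (available from pandas 0.25). The sketch below is an illustration on a hypothetical two-row frame named `sample`, not the notebook's own method; it also strips the space that follows each comma.

```python
import pandas as pd

# Hypothetical sample: the second location has two postcode districts.
sample = pd.DataFrame({
    'Location': ['Abbey Wood', 'Acton'],
    'Postcode': ['SE2', 'W3, W4'],
})

# Turn each comma-separated string into a list, then give every list item its own row.
exploded = (
    sample
    .assign(Postcode=sample['Postcode'].str.split(','))
    .explode('Postcode')
)

# Remove surrounding whitespace and renumber the rows.
exploded['Postcode'] = exploded['Postcode'].str.strip()
exploded = exploded.reset_index(drop=True)

print(exploded)
```

The other columns are repeated for each postcode, so `Acton` appears once for `W3` and once for `W4`.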

In [13]:
df.head()

|  | Location | Borough | Post-town | Postcode |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W4 |
| 2 | Addington | Croydon | CROYDON | CR0 |
| 3 | Addiscombe | Croydon | CROYDON | CR0 |


In [14]:
df = df[df['Post-town'].str.contains('LONDON')]

In [15]:
df.head()

|  | Location | Borough | Post-town | Postcode |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W4 |
| 6 | Aldgate | City | LONDON | EC3 |
| 7 | Aldwych | Westminster | LONDON | WC2 |


In [16]:
df.reset_index(drop=True, inplace=True)

In [17]:
df.head()

|  | Location | Borough | Post-town | Postcode |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3 |
| 2 | Acton | Ealing, Hammersmith and Fulham | LONDON | W4 |
| 3 | Aldgate | City | LONDON | EC3 |
| 4 | Aldwych | Westminster | LONDON | WC2 |


In [38]:
def get_latlng(postal_code):
    """Return [latitude, longitude] for a London postcode district, or None if geocoding fails."""
    # Initialise the coordinates (lat. and long.) to None
    lat_lng_coords = None
    max_iter = 10
    i = 0
    # Retry the ArcGIS lookup until it returns coordinates or max_iter attempts have been made
    while lat_lng_coords is None and i < max_iter:
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postal_code))
        lat_lng_coords = g.latlng
        i = i + 1
    return lat_lng_coords
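
The retry logic can be exercised without network access by passing the lookup in as a function. This is only a sketch under stated assumptions: `geocode_with_retry` and `flaky_lookup` are hypothetical names introduced for illustration, and the stand-in lookup simply fails twice before returning a fixed coordinate pair.

```python
from typing import Callable, List, Optional


def geocode_with_retry(
    query: str,
    lookup: Callable[[str], Optional[List[float]]],
    max_iter: int = 10,
) -> Optional[List[float]]:
    """Call `lookup` until it returns coordinates or `max_iter` attempts are used."""
    for _ in range(max_iter):
        coords = lookup(query)
        if coords is not None:
            return coords
    return None


# Hypothetical stand-in for a geocoding service: fails twice, then succeeds.
attempts = {'count': 0}


def flaky_lookup(query: str) -> Optional[List[float]]:
    attempts['count'] += 1
    if attempts['count'] < 3:
        return None
    return [51.49245, 0.12127]


result = geocode_with_retry('SE2, London, United Kingdom', flaky_lookup)
print(result, attempts['count'])

# A lookup that never succeeds yields None after max_iter attempts.
never = geocode_with_retry('unknown', lambda q: None, max_iter=3)
print(never)
```

In the notebook, the lookup could be something like `lambda q: geocoder.arcgis(q).latlng`, which is the call made inside `get_latlng`.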

In [19]:
df_london = df.copy()  # work on a copy so that adding columns later does not modify df

In [46]:
postal_codes = df_london['Postcode']

In [45]:
postal_codes.shape

(380,)

In [41]:
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]

In [51]:
coordinates[:5]

[[51.492450000000076, 0.12127000000003818],
 [51.51324000000005, -0.2674599999999714],
 [51.48944000000006, -0.26193999999992457],
 [51.511990000000026, -0.0805899999999724],
 [51.51651000000004, -0.11966999999992822]]

In [52]:
postal_codes.shape

(380,)

In [53]:
df_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_london['Latitude']  = df_coordinates['Latitude']
df_london['Longitude'] = df_coordinates['Longitude']
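
If a lookup ever fails, `get_latlng` returns `None` for that postcode, and a list containing `None` cannot be turned directly into a two-column frame. The following is a hedged sketch of one way to handle that case; the names `sample_coordinates`, `postcodes` and `located` and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical geocoding results; the second lookup failed and returned None.
sample_coordinates = [[51.49245, 0.12127], None, [51.48944, -0.26194]]
postcodes = pd.DataFrame({'Postcode': ['SE2', 'W3', 'W4']})

# Replace failed lookups with a pair of missing values so every row has two entries.
cleaned = [c if c is not None else [None, None] for c in sample_coordinates]

# Reuse the postcode frame's index so that rows line up when joined.
df_coords = pd.DataFrame(cleaned, columns=['Latitude', 'Longitude'],
                         index=postcodes.index)
located = postcodes.join(df_coords)

# List the postcodes that still need coordinates.
missing = located[located['Latitude'].isna()]
print(missing['Postcode'].tolist())
```

Building the coordinate frame on the same index as the postcode frame keeps each coordinate pair attached to the right row.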

In [54]:
df_london.head()

|  | Location | Borough | Post-town | Postcode | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 | 51.49245 | 0.12127 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3 | 51.51324 | -0.26746 |
| 2 | Acton | Ealing, Hammersmith and Fulham | LONDON | W4 | 51.48944 | -0.26194 |
| 3 | Aldgate | City | LONDON | EC3 | 51.51199 | -0.08059 |
| 4 | Aldwych | Westminster | LONDON | WC2 | 51.51651 | -0.11967 |


In [55]:
df_london.to_csv('london_df.csv')


### 2.1.2 Dataset 2:

The Foursquare API will be used to obtain the geographical location data for the London Area. This data will be used to explore the venues in the neighbourhoods of London.

The venues will provide the categories needed for the analysis, and these will eventually be used to determine the viability of selected locations for the restaurant.
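
To make the venue lookup concrete, the sketch below builds a request URL for one pair of coordinates. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` parameters; the text above does not say which endpoint is used, and Foursquare's current API may differ. The function name `build_explore_url`, the default values and the placeholder credentials are hypothetical. No request is sent.

```python
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret,
                      version='20180605', radius=500, limit=100):
    """Build a URL for the (assumed) legacy v2 venues/explore endpoint."""
    base = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                     # API version date, YYYYMMDD
        'll': '{},{}'.format(lat, lng),   # latitude,longitude of the search centre
        'radius': radius,                 # search radius in metres
        'limit': limit,                   # maximum number of venues to return
    }
    return '{}?{}'.format(base, urlencode(params))


# Placeholder credentials; real values would come from a Foursquare developer account.
url = build_explore_url(51.49245, 0.12127, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
print(url)
```

Keeping the URL construction in one function makes it easy to call once per row of the postcode table.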



## 2.2 How data will be used to solve the problem

The data from datasets 1 and 2 will be explored by considering the venues within the neighbourhoods of the London postcode areas. The restaurants in these areas will be examined by type within a set radius, measured in miles. Due to Foursquare restrictions, the number of venues returned will be limited to 100. The proximity of each area to transport connections and other amenities will be correlated with the restaurant data. Accessibility and the ease of supplying organic ingredients will also be considered.
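
As an illustration of the kind of summary this analysis relies on, the sketch below counts restaurant types per postcode and converts a radius in miles to metres. The `venues` frame, its column names and its rows are entirely hypothetical, as are the helper `miles_to_metres` and the half-mile radius; real category names would come from the venue data.

```python
import pandas as pd

METRES_PER_MILE = 1609.344


def miles_to_metres(miles):
    """Convert a distance in miles to metres."""
    return miles * METRES_PER_MILE


# Hypothetical venue records, one row per venue.
venues = pd.DataFrame({
    'Postcode': ['SE2', 'SE2', 'W3', 'W3', 'W3'],
    'Category': ['African Restaurant', 'Café', 'Indian Restaurant',
                 'African Restaurant', 'Bus Stop'],
})

# Keep only restaurant categories, then count each type per postcode.
restaurants = venues[venues['Category'].str.contains('Restaurant')]
counts = (
    restaurants
    .groupby(['Postcode', 'Category'])
    .size()
    .unstack(fill_value=0)
)

print(counts)
print(round(miles_to_metres(0.5)))
```

A table of this shape makes it straightforward to spot postcodes with few or no restaurants of a given type.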
