# Which borough for a London restaurant

## Introduction

### A description of the problem and a discussion of the background.

We want to choose the best place in London to open a restaurant. 
London is divided into 32 boroughs, plus the City of London. So the question is which of these boroughs would best suit a restaurant. 

### A description of the data and how it will be used to solve the problem.

To build our data set, we first need the list of London's boroughs. We can get it from Wikipedia: 
'https://en.wikipedia.org/wiki/London_boroughs'. 
To pull the data out of the HTML page we will use **BeautifulSoup**. 
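
As a minimal sketch of this scraping step, the snippet below parses a small hand-written HTML table instead of the live Wikipedia page, so it runs offline. The sample markup and the helper name `table_rows` are assumptions made for illustration; the layout of the real page may differ.

```python
from bs4 import BeautifulSoup

# Hand-written sample standing in for the Wikipedia table (an assumption,
# not the real page markup).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Borough</th><th>Designation</th></tr>
  <tr><td>Greenwich[note 1]
</td><td>Inner
</td></tr>
  <tr><td>Barnet
</td><td>Outer
</td></tr>
</table>
"""


def table_rows(html):
    """Return the cleaned <td> texts of each data row in the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr"):
        # Header rows contain only <th> cells, so they yield no <td> texts.
        # Text after '[' (footnote markers) and surrounding whitespace is dropped.
        cells = [td.get_text().split("[")[0].strip() for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
print(rows)
```

The built-in `html.parser` is used here only to avoid an extra dependency; any parser supported by BeautifulSoup should behave the same on markup this simple.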

Once we have the list of boroughs, we use the **geopy** library to get the latitude and longitude of each one. 

Finally, we use **Foursquare** to get up to 100 of the top venues near each borough's coordinates. We end up with a table with these columns:
- Borough
- Borough Latitude
- Borough Longitude
- Venue
- Venue Latitude
- Venue Longitude
- Venue Category


We will use this data to find the best borough for a restaurant. 
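
One simple way to read such a table, sketched here on a few made-up rows, is to measure what share of each borough's venues are restaurants. The sample rows and the rule "the category name contains 'Restaurant'" are assumptions for illustration, not the analysis itself.

```python
import pandas as pd

# Made-up rows using some of the venue table's columns (illustrative only).
sample = pd.DataFrame(
    [
        ("Greenwich", "Venue A", "Pub"),
        ("Greenwich", "Venue B", "Italian Restaurant"),
        ("Greenwich", "Venue C", "Museum"),
        ("Greenwich", "Venue D", "Garden"),
        ("Hackney", "Venue E", "Vietnamese Restaurant"),
        ("Hackney", "Venue F", "Indian Restaurant"),
        ("Hackney", "Venue G", "Park"),
        ("Hackney", "Venue H", "Bakery"),
    ],
    columns=["Borough", "Venue", "Venue Category"],
)

# Flag venues whose category mentions "Restaurant" (an assumed rule).
is_restaurant = sample["Venue Category"].str.contains("Restaurant")

# The mean of a boolean column per borough is the share of restaurants there.
restaurant_share = is_restaurant.groupby(sample["Borough"]).mean()
print(restaurant_share)
```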
       
#### Methodology

Because we are working with unlabeled data, we will use **k-means** clustering. Using the **Venue Category** of each venue, we will separate London's boroughs into groups with similar characteristics. Then, within each group, we will look for the boroughs that are most suitable for a restaurant.  
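
The clustering idea can be sketched on toy data as follows. This is only an illustration under stated assumptions: the rows are made up, scikit-learn's `KMeans` is assumed as the k-means implementation, and `k=2` is an arbitrary choice for three toy boroughs.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up venue rows (illustrative only).
venues = pd.DataFrame(
    {
        "Borough": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
        "Venue Category": ["Pub", "Pub", "Bakery", "Pub", "Pub", "Pub",
                           "Park", "Park", "Museum"],
    }
)

# One-hot encode the categories, then average per borough to get the
# frequency of each category in each borough.
one_hot = pd.get_dummies(venues["Venue Category"], dtype=float)
profiles = one_hot.groupby(venues["Borough"]).mean()

# Cluster the borough profiles; k=2 is arbitrary for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
clusters = pd.Series(kmeans.labels_, index=profiles.index, name="Cluster")
print(clusters)
```

Boroughs A and B are dominated by pubs, so they should land in the same cluster, while C (parks and a museum) ends up on its own. The cluster numbers themselves carry no meaning.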



## Table of Contents

1. <a href="#item1">Import necessary Libraries</a>    
2. <a href="#item2">Use BeautifulSoup to pull data out of the HTML and XML files</a> 
3. <a href="#item3">Use geopy library to get the latitude and longitude values</a> 
4. <a href="#item4">Use Foursquare to get the top 100 venues from each Borough</a> 


### 1. Import necessary Libraries

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

from bs4 import BeautifulSoup # library for pulling data out of HTML and XML files
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values


print('Libraries imported.')

Libraries imported.


### 2. Use BeautifulSoup to pull data out of the HTML and XML files

In [4]:
# url
url = 'https://en.wikipedia.org/wiki/London_boroughs'
data_xml = requests.get(url).text
soup_postal_codes = BeautifulSoup(data_xml,'lxml')

# find the boroughs table in the HTML
Table_boroughs = soup_postal_codes.find('table',{'class':'wikitable sortable'})

# column names for the scraped table
column_names = ['Borough', 'Designation', 'Former areas 1', 'Former areas 2', 'Former areas 3', 'Former areas 4', 'Former areas 5'] 

# collect the text of every <td> cell, one list per table row
Table_boroughs = Table_boroughs.find_all('tr')
boroughs_list = []

for balise_tr in Table_boroughs:
    balise_td = balise_tr.find_all('td')
    row = [i.text for i in balise_td]
    boroughs_list.append(row)

# drop the header row
del boroughs_list[0]

# build the dataframe in one step from the first seven cells of each row
# (DataFrame.append was removed in pandas 2.0)
Borough_table = pd.DataFrame([data[:7] for data in boroughs_list], columns=column_names)

# clean data: strip the surrounding newlines
Borough_table['Borough'] = Borough_table['Borough'].str.strip('\n')
Borough_table['Designation'] = Borough_table['Designation'].str.strip('\n')

# keep only the text before the first '[' (removes the [notes] markers)
Borough_table['Borough'] = Borough_table['Borough'].str.split('[').str[0]

Borough_table_IO = Borough_table[['Borough','Designation']]



In [6]:
Borough_table_IO.head()

|   | Borough | Designation |
|---|---|---|
| 0 | Greenwich | Inner |
| 1 | Hackney | Inner |
| 2 | Hammersmith | Inner |
| 3 | Islington | Inner |
| 4 | Kensington and Chelsea | Inner |


### 3. Use geopy library to get the latitude and longitude values

In [7]:
# function that geocodes an address (returns None when nothing is found)
def get_coordinate(address):
    
    geolocator = Nominatim(user_agent="explorer")
    location = geolocator.geocode(address)
    return location

In [8]:
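Borough_list_IO = []

for Borough in Borough_table_IO.Borough:
    LD_loc = get_coordinate(Borough + ', London, UK')
    if LD_loc is None:
        # no match from the geocoder: skip instead of failing on .latitude
        print('no coordinates found for', Borough)
        continue
    row = [Borough, LD_loc.latitude, LD_loc.longitude]
    Borough_list_IO.append(row)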
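
The loop above calls the geocoding service once per borough. The sketch below shows one possible way to make that step more forgiving: it pauses between calls and records names that could not be geocoded. The helper `geocode_boroughs`, its parameters, and the stand-in `fake_geocode` with placeholder coordinates are all hypothetical, introduced only so the example runs offline.

```python
import time
from types import SimpleNamespace


def geocode_boroughs(names, geocode, delay_seconds=1.0, suffix=", London, UK"):
    """Geocode each name and return (rows, missing).

    `geocode` is any callable that takes an address string and returns an
    object with `latitude` and `longitude` attributes, or None when nothing
    is found.
    """
    rows, missing = [], []
    for i, name in enumerate(names):
        if i and delay_seconds:
            time.sleep(delay_seconds)  # pause between calls to spare the service
        location = geocode(name + suffix)
        if location is None:
            missing.append(name)
        else:
            rows.append([name, location.latitude, location.longitude])
    return rows, missing


# Stand-in geocoder with placeholder coordinates so the sketch runs offline.
def fake_geocode(address):
    known = {"Greenwich, London, UK": SimpleNamespace(latitude=51.48, longitude=-0.005)}
    return known.get(address)


rows, missing = geocode_boroughs(["Greenwich", "Nowhere"], fake_geocode, delay_seconds=0)
print(rows, missing)
```

With geopy, the `geocode` argument could be `Nominatim(user_agent="explorer").geocode`. geopy also ships `geopy.extra.rate_limiter.RateLimiter`, which can wrap that method to add the delay instead of the manual `time.sleep`.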


In [9]:
# column names for the coordinates dataframe

column_names = ['Borough', 'latitude', 'longitude'] 


In [10]:
# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
Borough_lat_long = pd.DataFrame(Borough_list_IO, columns=column_names)


In [11]:
Borough_lat_long.head()

|   | Borough | latitude | longitude |
|---|---|---|---|
| 0 | Greenwich | 51.482084 | -0.004542 |
| 1 | Hackney | 51.54324 | -0.049362 |
| 2 | Hammersmith | 51.492038 | -0.22364 |
| 3 | Islington | 51.538429 | -0.099905 |
| 4 | Kensington and Chelsea | 51.498995 | -0.199123 |


### 4. Use Foursquare to get the top 100 venues from each Borough

In [13]:
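# The code was removed by Watson Studio for sharing.
# (CLIENT_ID and CLIENT_SECRET, used below, are not defined anywhere else, so this hidden cell presumably set them.)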

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT= 100, VERSION = '20120618' ):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        # (the category is None when a venue lists no categories)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    # flatten the per-borough lists into one table; passing the column names
    # here also works when no venue was returned at all
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Borough', 
                                          'Borough Latitude', 
                                          'Borough Longitude', 
                                          'Venue', 
                                          'Venue Latitude', 
                                          'Venue Longitude', 
                                          'Venue Category'])
    
    return nearby_venues
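
To make the response handling in the function above easier to follow, here is a small offline sketch that flattens one response into table rows. The helper name `venue_rows` and the hand-made payload are assumptions for illustration; the payload only mimics the nesting that the function reads (`response` → `groups` → `items` → `venue`) and is not a real API response.

```python
def venue_rows(borough, lat, lng, payload):
    """Flatten one explore-style response into table rows.

    `payload` is assumed to have the nesting
    payload["response"]["groups"][0]["items"][i]["venue"].
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category listed.
        category = categories[0]["name"] if categories else None
        rows.append((borough, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-made payload mimicking that nesting (not a real API response).
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 51.5, "lng": -0.1},
               "categories": [{"name": "Pub"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 51.6, "lng": -0.2},
               "categories": []}},
]}]}}

rows = venue_rows("Greenwich", 51.48, -0.005, fake_payload)
print(rows)
```

Keeping the parsing separate from the HTTP call like this makes it possible to check the row layout without credentials or network access.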

In [15]:
london_venues = getNearbyVenues(names=Borough_lat_long['Borough'],
                                   latitudes=Borough_lat_long['latitude'],
                                   longitudes=Borough_lat_long['longitude']
                                  )

Greenwich
Hackney
Hammersmith
Islington
Kensington and Chelsea
Lambeth
Lewisham
Southwark
Tower Hamlets
Wandsworth
Westminster
Barking
Barnet
Bexley
Brent
Bromley
Croydon
Ealing
Enfield
Haringey
Harrow
Havering
Hillingdon
Hounslow
Kingston upon Thames
Merton
Newham
Redbridge
Richmond upon Thames
Sutton
Camden
Waltham Forest


In [16]:
london_venues.head()

|   | Borough | Borough Latitude | Borough Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Greenwich | 51.482084 | -0.004542 | Old Royal Naval College | 51.483234 | -0.005579 | Historic Site |
| 1 | Greenwich | 51.482084 | -0.004542 | Painted Hall | 51.482889 | -0.00642 | Museum |
| 2 | Greenwich | 51.482084 | -0.004542 | National Maritime Museum | 51.481329 | -0.005581 | History Museum |
| 3 | Greenwich | 51.482084 | -0.004542 | Greenwich Naval College Gardens | 51.483007 | -0.008362 | Garden |
| 4 | Greenwich | 51.482084 | -0.004542 | The Plume of Feathers | 51.481945 | -0.001126 | Pub |
