# IBM Capstone Project 

## Data Acquisition
The first phase of the project is to acquire the data it needs. The initial data can be broken down into three separate data sets:

1. The Foursquare top restaurant/cafe venues to visit in London
2. For each top site, a list of restaurants in the surrounding area
3. The UK Government London borough income-level dataset for the last year

In [1]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import json 
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)
from bs4 import BeautifulSoup
from urllib.request import urlopen
import ssl
import csv

from geopy.geocoders import Nominatim 
# import k-means for the clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Libraries imported.


## Exploring City Data with the Foursquare API

## Where is the best place to open a restaurant in London?

Where in London would it be best to open a new restaurant?

This notebook leverages the Foursquare API to explore neighborhoods in London. To explore this problem we need to segment the London neighborhood data appropriately. This entails segmenting by the average income of an area, by which types of restaurants have historically been popular, and by how many restaurants there are in an area. We can then map the restaurants by location (latitude and longitude) and segment them by type. This analysis addresses the following questions: Which restaurants receive the best reviews? Which restaurants are most popular, and which part of London are they in? Is there a type of restaurant that is consistently more popular than others?
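
As a rough illustration of the segmentation described above, the sketch below counts restaurants per neighborhood and per category using only the standard library. The venue records, the field names `neighborhood` and `category`, and the helper `count_by` are hypothetical placeholders, not data or code from this notebook.

```python
from collections import Counter

# Hypothetical venue records; real rows would come from the Foursquare response.
sample_venues = [
    {"name": "Venue A", "neighborhood": "Soho", "category": "Italian Restaurant"},
    {"name": "Venue B", "neighborhood": "Soho", "category": "Café"},
    {"name": "Venue C", "neighborhood": "Camden", "category": "Italian Restaurant"},
]

def count_by(venues, field):
    """Count venues by the value of one field, skipping missing values."""
    return Counter(v[field] for v in venues if v.get(field))

restaurants_per_area = count_by(sample_venues, "neighborhood")
restaurants_per_type = count_by(sample_venues, "category")

print(restaurants_per_area.most_common())
print(restaurants_per_type.most_common())
```

Counts like these could later be combined with the borough income data, though how to weight the two is left open here.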

**Business Problem:**
This analysis will help anyone who wants to open a restaurant in London and wants to know whether their intended location is a viable option.


### Foursquare Top Restaurant Venues in London

In [2]:
# Get longitude and latitude for London
address = 'London, London'

geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.
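
`geolocator.geocode` returns `None` when Nominatim finds no match, in which case `location.latitude` raises an `AttributeError`. A minimal sketch of a guard follows; `get_coordinates` is a hypothetical helper, and the stand-in geocoder exists only so the example runs without network access.

```python
from types import SimpleNamespace

def get_coordinates(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing is found.

    `geocode` is any callable shaped like `Nominatim(...).geocode`.
    """
    location = geocode(address)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude

# Stand-in geocoder (an assumption for illustration); in the notebook,
# pass `geolocator.geocode` instead.
def fake_geocode(address):
    known = {
        "London, London": SimpleNamespace(latitude=51.5073219, longitude=-0.1276474),
    }
    return known.get(address)

print(get_coordinates(fake_geocode, "London, London"))
```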


In [3]:
# Set up Foursquare
CLIENT_ID = 'K2D5QVKQ0QPUPIKECQU2JIB3CHVFT4ZMTJQC5WVTRNYTDD5Z'
CLIENT_SECRET = '5ZVFLDC04W0W5SHCBZ11MDRG0MELPIER3QIFUOAWFAIZKJNK'
VERSION = '20200808' # Foursquare API version
LIMIT = 500
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: K2D5QVKQ0QPUPIKECQU2JIB3CHVFT4ZMTJQC5WVTRNYTDD5Z
CLIENT_SECRET:5ZVFLDC04W0W5SHCBZ11MDRG0MELPIER3QIFUOAWFAIZKJNK
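
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. One possible alternative, sketched below, reads them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are assumptions for illustration, not part of the Foursquare API.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping of environment variables."""
    try:
        client_id = env["FOURSQUARE_CLIENT_ID"]          # assumed variable name
        client_secret = env["FOURSQUARE_CLIENT_SECRET"]  # assumed variable name
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * (len(secret) - visible)

# Dummy mapping so the sketch runs without real credentials.
demo_id, demo_secret = load_credentials({
    "FOURSQUARE_CLIENT_ID": "demo-client-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-client-secret",
})
print("CLIENT_ID: " + mask(demo_id))
```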


**Do not run the next cell unless the process below does not work.**

In [None]:
# Use the Requests get method to request the top sites in London
page = requests.get(
    "https://foursquare.com/explore?cat=topPicks&mode=url&near=London%2C%20Greater%20London%2C%20United%20Kingdom&nearGeoId=72057594040571679")

# Convert the HTML response into a BeautifulSoup Object
soup = BeautifulSoup(page.content, 'html.parser')

# Use the BeautifulSoup find_all method to extract each top site venue details.
top_venues = soup.find_all('div', class_='venueDetails')

In [4]:
search_query = 'Restaurant'
radius = 50000
print(search_query + ' .... OK!')

Restaurant .... OK!


In [None]:
# Endpoint used below: GET https://api.foursquare.com/v2/venues/trending

In [5]:
url = 'https://api.foursquare.com/v2/venues/trending?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/trending?client_id=K2D5QVKQ0QPUPIKECQU2JIB3CHVFT4ZMTJQC5WVTRNYTDD5Z&client_secret=5ZVFLDC04W0W5SHCBZ11MDRG0MELPIER3QIFUOAWFAIZKJNK&ll=51.5073219,-0.1276474&v=20200808&query=Restaurant&radius=50000&limit=500'
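
Building the query string by hand with `str.format` does not escape special characters. A small sketch using the standard library's `urlencode` is shown below; `build_trending_url` is a hypothetical helper and the credentials passed to it are dummies. Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, which should decode to the same value on the server.

```python
from urllib.parse import urlencode

TRENDING_ENDPOINT = "https://api.foursquare.com/v2/venues/trending"

def build_trending_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the request URL from the same parameters the notebook uses."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes each value, so spaces or symbols cannot break the URL.
    return TRENDING_ENDPOINT + "?" + urlencode(params)

demo_url = build_trending_url(
    "ID", "SECRET", 51.5073219, -0.1276474, "20200808", "Restaurant", 50000, 500
)
print(demo_url)
```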

In [6]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ed7c6e2e826ac001b866a80'},
 'response': {'venues': []}}
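
The response above has status code 200 but an empty `venues` list, so later cells that expect venue columns will fail. A hedged sketch of an early check follows; `extract_venues` is a hypothetical helper, and the dictionary mirrors the response shown above.

```python
def extract_venues(results):
    """Return the venue list from a response, raising on errors or empty results."""
    code = results.get("meta", {}).get("code")
    if code != 200:
        raise RuntimeError("Foursquare request failed with code {}".format(code))
    venues = results.get("response", {}).get("venues", [])
    if not venues:
        raise ValueError("The request succeeded but returned no venues")
    return venues

empty_results = {
    "meta": {"code": 200, "requestId": "5ed7c6e2e826ac001b866a80"},
    "response": {"venues": []},
}

try:
    extract_venues(empty_results)
except ValueError as err:
    print(err)
```

Failing at this point makes the cause visible before any dataframe is built.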

#### From this API call we extract the following data:

- Location 
- Postal code
- Restaurant name
- Restaurant category
- Location ID

In [7]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()


In [8]:
Neighbourhoods = list(dataframe.columns.values)
Neighbourhoods = Neighbourhoods[5:]
print(Neighbourhoods)

[]


In [9]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so the column assignment below works on an independent frame

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

KeyError: "None of [Index(['name', 'categories', 'id'], dtype='object')] are in the [columns]"
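
The `KeyError` above arises because the dataframe built from an empty venue list has no columns at all. One way to surface this earlier, sketched under that assumption, is to compare the required column names against the ones present before calling `.loc`. `missing_columns` is a hypothetical helper, and the populated column list is illustrative only.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, preserving order."""
    present = set(available)
    return [name for name in required if name not in present]

required = ["name", "categories", "id"]

# Column names for an empty response: none at all.
print(missing_columns([], required))

# Illustrative column names for a populated response.
populated = ["id", "name", "categories", "location.lat", "location.lng"]
print(missing_columns(populated, required))
```

In the notebook this could be called as `missing_columns(dataframe.columns, required)` before the filtering step.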

In [47]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on London

# add a red circle marker to represent the centre of London
folium.CircleMarker(
    [latitude, longitude],
    radius=15,
    color='red',
    popup='London',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the restaurants as circle markers (green outline, blue fill), labelled by category
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

In [20]:
dataframe_filtered.shape

(50, 16)

In [21]:
dataframe_filtered.dtypes

name                 object
categories           object
address              object
crossStreet          object
lat                 float64
lng                 float64
labeledLatLngs       object
distance              int64
postalCode           object
cc                   object
city                 object
state                object
country              object
formattedAddress     object
neighborhood         object
id                   object
dtype: object