# London Park Analysis
###### By Joshua Montgomery
The aim of this project is to determine the extent to which various factors contribute to the success of a park, measured by visitor numbers. A model will then be built to predict how many visitors a new park with a defined set of characteristics would see in an average year.

I decided to perform this analysis on London because it has many parks, and no single template has been applied to all of them.


### Part 1: Collecting location data on the parks

In [2]:
import requests 
import pandas as pd 
import numpy as np 
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium 



In [3]:
CLIENT_ID = '4NWCNROL5FW1VDYTLSNQDLARLYEIWUPGLJ25NIMR24TVWNNR' 
CLIENT_SECRET = '_____________________________________________' 
VERSION = '20180604'
LIMIT = 1000
# CLIENT_SECRET has been removed so that my account cannot be accessed. To run this code, create your own Foursquare account and supply your own credentials.

#The coordinates of Buckingham Palace seem like a reasonable place to define as the center of London.
latitude=51.5014
longitude=-0.1419

In [4]:

search_query = 'park'
radius = 10000
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()

In [5]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head(10)

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Hyde Park,Park,Serpentine Rd,GB,London,United Kingdom,,1587,"[Serpentine Rd, London, Greater London, W2 2TP...","[{'label': 'display', 'lat': 51.50778087767913...",51.507781,-0.162392,,W2 2TP,Greater London,4ac518d2f964a52026a720e3
1,St James's Park,Park,The Mall,GB,London,United Kingdom,Horse Guards Rd,650,"[The Mall (Horse Guards Rd), London, Greater L...","[{'label': 'display', 'lat': 51.50325316049429...",51.503253,-0.132995,,SW1A 2BJ,Greater London,4ac518cdf964a520f2a520e3
2,Green Park,Park,Piccadilly,GB,London,United Kingdom,Constitution Hill,385,"[Piccadilly (Constitution Hill), London, Great...","[{'label': 'display', 'lat': 51.50465559886703...",51.504656,-0.143788,,SW1A 1BW,Greater London,4b96b2bbf964a520c2de34e3
3,Battersea Park,Park,Albert Bridge Rd,GB,Battersea,United Kingdom,,2651,"[Albert Bridge Rd, Battersea, Greater London, ...","[{'label': 'display', 'lat': 51.47951201381755...",51.479512,-0.156984,,SW11 4NJ,Greater London,4ac518cef964a52015a620e3
4,Regent's Park,Park,Chester Rd,GB,London,United Kingdom,,3339,"[Chester Rd, London, Greater London, NW1 4NR, ...","[{'label': 'display', 'lat': 51.53047945949403...",51.530479,-0.153766,,NW1 4NR,Greater London,4b233922f964a520785424e3
5,Green Park London Underground Station,Metro Station,Piccadilly,GB,London,United Kingdom,at Stratton St,595,"[Piccadilly (at Stratton St), London, Greater ...","[{'label': 'display', 'lat': 51.5067341345345,...",51.506734,-0.14263,,W1J 9DZ,Greater London,4b54b78bf964a520a8c827e3
6,St James's Park Lake,Lake,Horse Guards Rd,GB,London,United Kingdom,,631,"[Horse Guards Rd, London, SW1A 2BJ, United Kin...","[{'label': 'display', 'lat': 51.50270552998373...",51.502706,-0.133038,,SW1A 2BJ,,5058c4c9e4b0bc64c9103f52
7,Victoria Park,Park,Grove Rd,GB,London,United Kingdom,,8460,"[Grove Rd, London, Greater London, E3 5TB, Uni...","[{'label': 'display', 'lat': 51.53849910020006...",51.538499,-0.03529,Old Ford,E3 5TB,Greater London,4ac518cef964a5201da620e3
8,Hyde Park Corner Bus Stop F,Bus Stop,,GB,London,United Kingdom,,615,"[London, Greater London, SW1W 0QH, United King...","[{'label': 'display', 'lat': 51.502087, 'lng':...",51.502087,-0.150721,Green Park,SW1W 0QH,Greater London,574b47d2498eb823b9601a0f
9,St. James's Park London Underground Station,Metro Station,Petty France,GB,London,United Kingdom,,566,"[Petty France, London, Greater London, SW1H 0B...","[{'label': 'display', 'lat': 51.4997101149314,...",51.49971,-0.134187,,SW1H 0BD,Greater London,5036b36dcc6417d4bcd9a24b


dataframe_filtered now contains the location data for the parks in London, but it also includes every venue with "park" in its name. St James's Park is an actual park in London, but St. James's Park London Underground Station is irrelevant to this analysis and must be filtered out. Furthermore, most of the columns are of no use in this analysis, so they too must be dropped.

In [6]:
# keep only the rows that Foursquare categorises as a park
raw_parks = dataframe_filtered[dataframe_filtered.categories == 'Park'].copy()
# strip the possessive "'s" from the names (e.g. "Regent's Park" becomes "Regent Park")
raw_parks['name'] = raw_parks['name'].astype(str).str.replace("'s", "", regex=False)
# keep only the columns needed for the analysis, indexed by park name
parks = raw_parks[['name', 'distance', 'lat', 'lng', 'postalCode']].set_index('name')
parks

name,distance,lat,lng,postalCode
Hyde Park,1587,51.507781,-0.162392,W2 2TP
St James Park,650,51.503253,-0.132995,SW1A 2BJ
Green Park,385,51.504656,-0.143788,SW1A 1BW
Battersea Park,2651,51.479512,-0.156984,SW11 4NJ
Regent Park,3339,51.530479,-0.153766,NW1 4NR
Victoria Park,8460,51.538499,-0.03529,E3 5TB
Holland Park,4318,51.503148,-0.204153,W14
Clissold Park,7639,51.561438,-0.088457,N16 9HJ
Finsbury Park,8179,51.570321,-0.100937,N 4 2
Greenwich Park,10245,51.477521,0.000858,SE10 9NF


The parks dataframe now contains the location data for all the parks in the search area. This is not enough for a proper analysis, however, as there is no dependent variable to measure and too few independent variables to build a suitable prediction model. Part 2 will focus on gathering additional data.

It's worth mentioning that the possessive "'s" was removed from the names to prevent parsing errors. It's not great English, but otherwise it would be impossible to label the data.
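
The distance column is taken straight from the Foursquare response. Assuming it is the great-circle distance in metres from the search coordinates (the notebook does not state this), it can be sanity-checked against the lat and lng columns with a haversine calculation. The sketch below is an illustration rather than part of the original analysis: haversine_m is a hypothetical helper, and the Earth radius of 6,371 km is an approximation.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# search centre used in the notebook (Buckingham Palace)
centre = (51.5014, -0.1419)

# (name, lat, lng, distance reported by Foursquare), copied from the table above
rows = [
    ("Hyde Park", 51.507781, -0.162392, 1587),
    ("Green Park", 51.504656, -0.143788, 385),
]
for name, lat, lng, reported in rows:
    computed = haversine_m(centre[0], centre[1], lat, lng)
    print(f"{name}: computed {computed:.0f} m, reported {reported} m")
```

If the assumption about the column holds, the computed and reported values should agree to within a few metres.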

In [7]:
parks_map = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, label in zip(parks.lat, parks.lng, parks.index):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        parse_html=True,
        fill_opacity=0.6
        ).add_to(parks_map )
parks_map

### Part 2: Gathering the dependent variable

In [8]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0


client_f428788a31e145d59d01f27487b0e144 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='_______________________________________', # Again, I've removed sensitive data
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.eu-geo.objectstorage.service.networklayer.com')

body = client_f428788a31e145d59d01f27487b0e144.get_object(Bucket='capstone-donotdelete-pr-bo2ec3iljwcjnj',Key='Parks in London.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

parks_visits = pd.read_csv(body)
parks_visits.set_index('name',inplace=True)
parks_visits.head()

#This section of code is automatically generated to retrieve a file from this project, and then load it into this notebook.

name,visitors,size
Hyde Park,10.3,350.0
St James Park,13.0,58.0
Green Park,10.9,47.0
Battersea Park,3.0,200.0
Regent Park,6.7,410.0


In [9]:
parks2=parks.join(parks_visits)
parks2=parks2.dropna()
parks2

name,distance,lat,lng,postalCode,visitors,size
Hyde Park,1587,51.507781,-0.162392,W2 2TP,10.3,350.0
St James Park,650,51.503253,-0.132995,SW1A 2BJ,13.0,58.0
Green Park,385,51.504656,-0.143788,SW1A 1BW,10.9,47.0
Battersea Park,2651,51.479512,-0.156984,SW11 4NJ,3.0,200.0
Regent Park,3339,51.530479,-0.153766,NW1 4NR,6.7,410.0
Victoria Park,8460,51.538499,-0.03529,E3 5TB,9.0,213.0
Holland Park,4318,51.503148,-0.204153,W14,5.26,54.0
Clissold Park,7639,51.561438,-0.088457,N16 9HJ,3.0,55.8
Finsbury Park,8179,51.570321,-0.100937,N 4 2,1.5,110.0
Greenwich Park,10245,51.477521,0.000858,SE10 9NF,3.9,180.0


We have now added the number of visitors per year (in millions) and the size of each park (in acres). Unfortunately, a suitable data table couldn't be found online, so I had to find the visitor and size figures manually.

### Part 3: Forming a model

In [57]:
# keep only the dependent variable and the two predictors
regression = parks2[['visitors', 'size', 'distance']]
# randomly assign roughly 80% of the parks to the training set
msk = np.random.rand(len(regression)) < 0.8
train = regression[msk]
test = regression[~msk]

from sklearn import linear_model
regr = linear_model.LinearRegression()
x = np.asanyarray(train[['size', 'distance']])
y = np.asanyarray(train[['visitors']])
regr.fit(x, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)
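
With only ten parks, the random mask above produces a different split on every run, and can even leave the test set empty, so the scores will vary between runs. A minimal sketch of a reproducible alternative, assuming a fixed test fraction is acceptable, is to shuffle the row positions with a seeded generator and slice them. The helper name split_positions and the seed value are hypothetical choices, not part of the original notebook.

```python
import random

def split_positions(n_rows, test_fraction=0.2, seed=0):
    """Return (train, test) lists of row positions with a fixed-size test set."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # seeded, so the split is repeatable
    n_test = max(1, round(n_rows * test_fraction))  # always hold out at least one row
    return sorted(positions[n_test:]), sorted(positions[:n_test])

train_pos, test_pos = split_positions(10)
print("train rows:", train_pos)
print("test rows: ", test_pos)
```

The positions could then be applied as parks2.iloc[train_pos] and parks2.iloc[test_pos].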

In [58]:
x = np.asanyarray(test[['size','distance']])
y = np.asanyarray(test[['visitors']])
y_hat = regr.predict(x)
print('Coefficients: ', regr.coef_)
# the mean of the squared residuals is the mean squared error, not the residual sum of squares
print("Mean squared error: %.2f" % np.mean((y_hat - y) ** 2))
# score() returns the coefficient of determination (R^2) on the test set
print('Variance score: %.2f' % regr.score(x, y))

Coefficients:  [[ 0.00094031 -0.00051789]]
Mean squared error: 6.85
Variance score: 0.03


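The size coefficient is positive and the distance coefficient is negative, so in this fit parks that are larger and closer to the center of London tend to attract more people. The variance score of 0.03 on the test set shows, however, that the model explains very little of the variation in visitor numbers, so these trends should be treated as indicative only.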
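
To make the coefficients concrete, the sketch below converts them into visitors per unit and compares the effect of each variable across the range seen in the data. It assumes that visitors are in millions per year and size is in acres, as described in Part 2, and that distance is in metres (the notebook does not state the unit). The ranges are read off the joined table in Part 2.

```python
# coefficients printed by the notebook, in the order [size, distance]
coef_size = 0.00094031       # million visitors per acre
coef_distance = -0.00051789  # million visitors per metre (unit assumed)

# per-unit effects expressed in visitors per year
print(f"per extra acre:  {coef_size * 1e6:+.0f} visitors")
print(f"per extra metre: {coef_distance * 1e6:+.0f} visitors")

# observed ranges in the joined table
size_min, size_max = 47.0, 410.0  # Green Park to Regent Park (acres)
dist_min, dist_max = 385, 10245   # Green Park to Greenwich Park

# change in predicted visitors when moving across each observed range
size_effect = coef_size * (size_max - size_min)
distance_effect = coef_distance * (dist_max - dist_min)
print(f"smallest to largest park: {size_effect:+.2f} million visitors")
print(f"nearest to farthest park: {distance_effect:+.2f} million visitors")
```

Under these assumptions, moving across the observed distance range shifts the prediction by roughly 5 million visitors, against roughly 0.3 million for the observed size range, so the distance term carries most of the weight in this model.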

### Part 4: Prediction of visitor numbers

Now that the model has been formed, we can use it to predict how popular hypothetical parks might be. The three parks we shall evaluate are:
* If the grounds of Buckingham Palace became a park
* A small park near the centre of London
* A gigantic park on the outskirts of London

In [69]:
hypothetical_parks=pd.DataFrame()
hypothetical_parks['name']=['Buckingham Palace','Small park','Large park']
hypothetical_parks['distance']=[0,100,10000]
hypothetical_parks['size']=[39,10,800]

In [70]:
hypothetical_visitors=regr.predict(hypothetical_parks[['size','distance']])

In [71]:
hypothetical_visitors

array([[8.80896402],
       [8.72990607],
       [4.34563601]])
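
These predictions can be reproduced by hand from the linear form visitors = intercept + coef_size * size + coef_distance * distance. The notebook never prints the intercept, so the sketch below infers it from the Buckingham Palace prediction, where distance is 0. That inferred value is an assumption and is only as accurate as the printed coefficients; predict_visitors is a hypothetical helper.

```python
coef_size = 0.00094031       # printed coefficient for size
coef_distance = -0.00051789  # printed coefficient for distance

# infer the intercept from the Buckingham Palace row: size 39, distance 0
intercept = 8.80896402 - coef_size * 39

def predict_visitors(size, distance):
    """Evaluate the fitted plane by hand (millions of visitors per year)."""
    return intercept + coef_size * size + coef_distance * distance

for name, size, distance in [("Small park", 10, 100), ("Large park", 800, 10000)]:
    print(f"{name}: {predict_visitors(size, distance):.2f}")
```

The two values should match the array above to about two decimal places, which confirms that the model is a plain linear combination of size and distance.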

The model suggests that the dominant factor in the success of a park is its location. If it is positioned nearer the city centre, it will likely see more visitors.
If you were to actually build a new park, the model suggests that spending the money on a prime location is the most important thing.