### The Battle of Neighborhoods 

Here I will show the boroughs in London and the incomes by boroughs. Accordingly, I will visualize the boroughs with the lowest, medium and highest incomes, and show which are the most promising areas for living in London.

## Data section
Data on London borough profiles was extracted from the HM Land Registry (http://landregistry.data.gov.uk/),'london-borough-profiles'. The following fields comprise: Area_name, Inner/_Outer_London, Average_Age,Proportion_of_population_of_working-age, Youth_Unemployment_(claimant)_rate_18-24_(Dec-15), Gross_Annual_Pay,_(2016)...

## Methodology 
The Methodology section will describe the main components of our analysis and predication system. The Methodology section comprises four stages:

1. Collect Inspection Data
2. Explore and Understand Data
3. Data preparation and preprocessing 
4. Plotting and modeling

In [1]:
import os # Operating System
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium #import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


### Importing necessary libraries
Download csv file 'london-borough-profiles'.

In [171]:
filename ='https://data.london.gov.uk/download/london-borough-profiles/c1693b82-68b1-44ee-beb2-3decf17dc1f8/london-borough-profiles.csv'
df_london = pd.read_csv(filename, encoding='latin1')
print('Data loaded')

Data loaded


In [172]:
df_london

Unnamed: 0,Code,Area_name,Inner/_Outer_London,GLA_Population_Estimate_2017,GLA_Household_Estimate_2017,Inland_Area_(Hectares),Population_density_(per_hectare)_2017,"Average_Age,_2017","Proportion_of_population_aged_0-15,_2015","Proportion_of_population_of_working-age,_2015",...,Happiness_score_2011-14_(out_of_10),Anxiety_score_2011-14_(out_of_10),Childhood_Obesity_Prevalance_(%)_2015/16,People_aged_17+_with_diabetes_(%),Mortality_rate_from_causes_considered_preventable_2012/14,Political_control_in_council,Proportion_of_seats_won_by_Conservatives_in_2014_election,Proportion_of_seats_won_by_Labour_in_2014_election,Proportion_of_seats_won_by_Lib_Dems_in_2014_election,Turnout_at_2014_local_elections
0,E09000001,City of London,Inner London,8800,5326,290,30.3,43.2,11.4,73.1,...,6.0,5.6,,2.6,129,.,.,.,.,.
1,E09000002,Barking and Dagenham,Outer London,209000,78188,3611,57.9,32.9,27.2,63.1,...,7.1,3.1,28.5,7.3,228,Lab,0,100,0,36.5
2,E09000003,Barnet,Outer London,389600,151423,8675,44.9,37.3,21.1,64.9,...,7.4,2.8,20.7,6.0,134,Cons,50.8,.,1.6,40.5
3,E09000004,Bexley,Outer London,244300,97736,6058,40.3,39.0,20.6,62.9,...,7.2,3.3,22.7,6.9,164,Cons,71.4,23.8,0,39.6
4,E09000005,Brent,Outer London,332100,121048,4323,76.8,35.6,20.9,67.8,...,7.2,2.9,24.3,7.9,169,Lab,9.5,88.9,1.6,36.3
5,E09000006,Bromley,Outer London,327900,140602,15013,21.8,40.2,19.9,62.6,...,7.4,3.3,16,5.2,148,Cons,85,11.7,0,40.8
6,E09000007,Camden,Inner London,242500,107654,2179,111.3,36.4,17.3,71.0,...,7.1,3.6,21.3,3.9,164,Lab,22.2,74.1,1.9,38.7
7,E09000008,Croydon,Outer London,386500,159010,8650,44.7,37.0,22.0,64.9,...,7.2,3.3,24.5,6.5,178,Lab,42.9,57.1,0,38.6
8,E09000009,Ealing,Outer London,351600,132663,5554,63.3,36.2,21.4,66.8,...,7.3,3.6,23.8,6.9,164,Lab,17.4,76.8,5.8,41.2
9,E09000010,Enfield,Outer London,333000,130328,8083,41.2,36.3,22.8,64.4,...,7.3,2.6,25.2,7.0,152,Lab,34.9,65.1,0,38.2


Drop unnecessary columns

In [173]:
df_london.drop(['Code', 'Population_density_(per_hectare)_2017', 'Inner/_Outer_London', 'GLA_Population_Estimate_2017', 'GLA_Household_Estimate_2017', 'Inland_Area_(Hectares)', 'Average_Age,_2017', 'Proportion_of_population_aged_0-15,_2015', 'Proportion_of_population_of_working-age,_2015', 'Proportion_of_population_aged_65_and_over,_2015', 'Net_internal_migration_(2015)', 'Net_international_migration_(2015)', 'Net_natural_change_(2015)', '%_of_resident_population_born_abroad_(2015)', 'Largest_migrant_population_by_country_of_birth_(2011)', '%_of_largest_migrant_population_(2011)', 'Second_largest_migrant_population_by_country_of_birth_(2011)', '%_of_second_largest_migrant_population_(2011)', 'Third_largest_migrant_population_by_country_of_birth_(2011)', '%_of_third_largest_migrant_population_(2011)', '%_of_population_from_BAME_groups_(2016)', '%_people_aged_3+_whose_main_language_is_not_English_(2011_Census)', 'Overseas_nationals_entering_the_UK_(NINo),_(2015/16)', 'New_migrant_(NINo)_rates,_(2015/16)', 'Largest_migrant_population_arrived_during_2015/16', 'Second_largest_migrant_population_arrived_during_2015/16', 'Third_largest_migrant_population_arrived_during_2015/16', 'Employment_rate_(%)_(2015)', 'Male_employment_rate_(2015)', 'Female_employment_rate_(2015)', 'Unemployment_rate_(2015)', 'Youth_Unemployment_(claimant)_rate_18-24_(Dec-15)', 'Proportion_of_16-18_year_olds_who_are_NEET_(%)_(2014)', 'Proportion_of_the_working-age_population_who_claim_out-of-work_benefits_(%)_(May-2016)', '%_working-age_with_a_disability_(2015)', 'Proportion_of_working_age_people_with_no_qualifications_(%)_2015', 'Proportion_of_working_age_with_degree_or_equivalent_and_above_(%)_2015', 'Gross_Annual_Pay_-_Male_(2016)', 'Gross_Annual_Pay_-_Female_(2016)', 'Modelled_Household_median_income_estimates_2012/13', '%_adults_that_volunteered_in_past_12_months_(2010/11_to_2012/13)', 'Number_of_jobs_by_workplace_(2014)', '%_of_employment_that_is_in_public_sector_(2014)', 'Jobs_Density,_2015', 'Number_of_active_businesses,_2015', 'Two-year_business_survival_rates_(started_in_2013)', 'Crime_rates_per_thousand_population_2014/15', 'Fires_per_thousand_population_(2014)', 'Ambulance_incidents_per_hundred_population_(2014)', 'Median_House_Price,_2015', 'Average_Band_D_Council_Tax_charge_(£),_2015/16', 'New_Homes_(net)_2015/16_(provisional)', 'Homes_Owned_outright,_(2014)_%', 'Being_bought_with_mortgage_or_loan,_(2014)_%', 'Rented_from_Local_Authority_or_Housing_Association,_(2014)_%', 'Rented_from_Private_landlord,_(2014)_%', '%_of_area_that_is_Greenspace,_2005', 'Total_carbon_emissions_(2014)', 'Household_Waste_Recycling_Rate,_2014/15', 'Number_of_cars,_(2011_Census)', 'Number_of_cars_per_household,_(2011_Census)', '%_of_adults_who_cycle_at_least_once_per_month,_2014/15', 'Average_Public_Transport_Accessibility_score,_2014', 'Achievement_of_5_or_more_A*-_C_grades_at_GCSE_or_equivalent_including_English_and_Maths,_2013/14', 'Rates_of_Children_Looked_After_(2016)', '%_of_pupils_whose_first_language_is_not_English_(2015)', '%_children_living_in_out-of-work_households_(2015)', 'Male_life_expectancy,_(2012-14)', 'Female_life_expectancy,_(2012-14)', 'Teenage_conception_rate_(2014)', 'Life_satisfaction_score_2011-14_(out_of_10)', 'Worthwhileness_score_2011-14_(out_of_10)', 'Happiness_score_2011-14_(out_of_10)', 'Anxiety_score_2011-14_(out_of_10)', 'Childhood_Obesity_Prevalance_(%)_2015/16', 'People_aged_17+_with_diabetes_(%)', 'Mortality_rate_from_causes_considered_preventable_2012/14', 'Political_control_in_council', 'Proportion_of_seats_won_by_Conservatives_in_2014_election', 'Proportion_of_seats_won_by_Labour_in_2014_election', 'Proportion_of_seats_won_by_Lib_Dems_in_2014_election', 'Turnout_at_2014_local_elections'], inplace=True, axis=1)

In [174]:
df_london

Unnamed: 0,Area_name,"Gross_Annual_Pay,_(2016)"
0,City of London,.
1,Barking and Dagenham,27886
2,Barnet,33443
3,Bexley,34350
4,Brent,29812
5,Bromley,37682
6,Camden,39796
7,Croydon,32696
8,Ealing,31331
9,Enfield,31603


In [175]:
print(df_london.columns.values)

['Area_name' 'Gross_Annual_Pay,_(2016)']


Rename columns

In [176]:
df_london.rename(columns={"Area_name": "Boroughs", "Gross_Annual_Pay,_(2016)": "Income-2016"}, inplace=True)

In [177]:
df_london

Unnamed: 0,Boroughs,Income-2016
0,City of London,.
1,Barking and Dagenham,27886
2,Barnet,33443
3,Bexley,34350
4,Brent,29812
5,Bromley,37682
6,Camden,39796
7,Croydon,32696
8,Ealing,31331
9,Enfield,31603


In [178]:
print(df_london.columns.values)

['Boroughs' 'Income-2016']


In [179]:
Boroughs = df_london['Boroughs']
Boroughs

0             City of London
1       Barking and Dagenham
2                     Barnet
3                     Bexley
4                      Brent
5                    Bromley
6                     Camden
7                    Croydon
8                     Ealing
9                    Enfield
10                 Greenwich
11                   Hackney
12    Hammersmith and Fulham
13                  Haringey
14                    Harrow
15                  Havering
16                Hillingdon
17                  Hounslow
18                 Islington
19    Kensington and Chelsea
20      Kingston upon Thames
21                   Lambeth
22                  Lewisham
23                    Merton
24                    Newham
25                 Redbridge
26      Richmond upon Thames
27                 Southwark
28                    Sutton
29             Tower Hamlets
30            Waltham Forest
31                Wandsworth
32               Westminster
33              Inner London
34            

In [180]:
import pandas as pd
import numpy as np
import datetime as DT
import hmac
from geopy.geocoders import Nominatim
from geopy.distance import vincenty
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [181]:
geolocator = Nominatim()

  if __name__ == '__main__':


In [182]:
df_london['Latitude'] = df_london['Boroughs'].apply(geolocator.geocode).apply(lambda x: x.latitude)
df_london['Longitude'] = df_london['Boroughs'].apply(geolocator.geocode).apply(lambda x: x.longitude)
df_london

Unnamed: 0,Boroughs,Income-2016,Latitude,Longitude
0,City of London,.,51.515618,-0.091998
1,Barking and Dagenham,27886,51.554117,0.150504
2,Barnet,33443,51.65309,-0.200226
3,Bexley,34350,39.969238,-82.936864
4,Brent,29812,32.937346,-87.164718
5,Bromley,37682,51.402805,0.014814
6,Camden,39796,39.94484,-75.119891
7,Croydon,32696,51.371305,-0.101957
8,Ealing,31331,51.512655,-0.305195
9,Enfield,31603,51.652085,-0.081018


In [183]:
df_london = df_london[df_london["Income-2016"] != '.']

In [184]:
df_london

Unnamed: 0,Boroughs,Income-2016,Latitude,Longitude
1,Barking and Dagenham,27886,51.554117,0.150504
2,Barnet,33443,51.65309,-0.200226
3,Bexley,34350,39.969238,-82.936864
4,Brent,29812,32.937346,-87.164718
5,Bromley,37682,51.402805,0.014814
6,Camden,39796,39.94484,-75.119891
7,Croydon,32696,51.371305,-0.101957
8,Ealing,31331,51.512655,-0.305195
9,Enfield,31603,51.652085,-0.081018
10,Greenwich,32415,51.482084,-0.004542


In [185]:
address = 'London, UK'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of London are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of London are 51.5073219, -0.1276474.


In [186]:
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, income2016 in zip(df_london['Latitude'], df_london['Longitude'], df_london['Boroughs'], df_london['Income-2016']):
    label = '{}, {}'.format(Boroughs, int(income2016))
    label = folium.Popup(label, parse_html=True)
    if int(income2016) < 30000:
        color='red' 
    elif int(income2016)>40000:
        color='lightgreen'
    else:
        color='blue'
    folium.CircleMarker(
        [lat, lng],
        radius=int(income2016) / 1000,
        popup=label,
        color=color,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

In [187]:
#Define Foursquare Credentials and Version

CLIENT_ID = 'KI3TR0QO4JOKMFELOMF3WSOOI3HFNBF5YLW354MYWBKDHEX3' # Foursquare ID
CLIENT_SECRET = 'QF4ZBLJRBV4BQX52DVWUPEHJ14A2UJABPCZARZQZYTKIISUD' # Foursquare Secret
VERSION = '20181206' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: KI3TR0QO4JOKMFELOMF3WSOOI3HFNBF5YLW354MYWBKDHEX3
CLIENT_SECRET:QF4ZBLJRBV4BQX52DVWUPEHJ14A2UJABPCZARZQZYTKIISUD


In [188]:
def getNearbyBoroughs(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    boroughs_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby borough
        boroughs_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_boroughs = pd.DataFrame([item for borough_list in boroughs_list for item in borough_list])
    nearby_boroughs.columns = ['Boroughs', 
                  'Borough Latitude', 
                  'Borough Longitude'
                  'Boroughs', 
                  'Venues', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    location_boroughs
    return(nearby_boroughs)

In [190]:
#Run the above function on each location and create a new dataframe called location_boroughs and display it.
location_boroughs = getNearbyBoroughs(names=df_london['Boroughs'],
                                   latitudes=df_london['Latitude'],
                                   longitudes=df_london['Longitude']
                                  )

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster
London
England
United Kingdom


In [191]:
location_boroughs

Unnamed: 0,Boroughs,Borough Latitude,Borough LongitudeBoroughs,Venues,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,51.554117,0.150504,Tesco Express,51.551536,0.152784,Grocery Store
1,Barking and Dagenham,51.554117,0.150504,Connor Road Bus Stop,51.554345,0.147162,Bus Stop
2,Barking and Dagenham,51.554117,0.150504,Oglethorpe Road Bus Stop,51.555221,0.147136,Bus Stop
3,Barking and Dagenham,51.554117,0.150504,Five Elms Off Licence,51.553878,0.145531,Liquor Store
4,Barking and Dagenham,51.554117,0.150504,Post office,51.551411,0.155003,Convenience Store
5,Barnet,51.653090,-0.200226,Ye Old Mitre Inne,51.652940,-0.199507,Pub
6,Barnet,51.653090,-0.200226,Joie de Vie,51.653659,-0.201288,Bakery
7,Barnet,51.653090,-0.200226,Caffè Nero,51.654861,-0.201743,Coffee Shop
8,Barnet,51.653090,-0.200226,The Black Horse,51.653075,-0.206719,Pub
9,Barnet,51.653090,-0.200226,Waterstones,51.655368,-0.202607,Bookstore


In [192]:
df_london

Unnamed: 0,Boroughs,Income-2016,Latitude,Longitude
1,Barking and Dagenham,27886,51.554117,0.150504
2,Barnet,33443,51.65309,-0.200226
3,Bexley,34350,39.969238,-82.936864
4,Brent,29812,32.937346,-87.164718
5,Bromley,37682,51.402805,0.014814
6,Camden,39796,39.94484,-75.119891
7,Croydon,32696,51.371305,-0.101957
8,Ealing,31331,51.512655,-0.305195
9,Enfield,31603,51.652085,-0.081018
10,Greenwich,32415,51.482084,-0.004542


In [193]:
df_london.shape

(34, 4)

In [197]:
df_london.dtypes

Boroughs        object
Income-2016     object
Latitude       float64
Longitude      float64
dtype: object

In [199]:
london_grouped = df_london.groupby('Boroughs').mean().reset_index()
london_grouped

Unnamed: 0,Boroughs,Latitude,Longitude
0,Barking and Dagenham,51.554117,0.150504
1,Barnet,51.65309,-0.200226
2,Bexley,39.969238,-82.936864
3,Brent,32.937346,-87.164718
4,Bromley,51.402805,0.014814
5,Camden,39.94484,-75.119891
6,Croydon,51.371305,-0.101957
7,Ealing,51.512655,-0.305195
8,Enfield,51.652085,-0.081018
9,England,52.795479,-0.54024


In [201]:
kclusters = 5

london_grouped_clustering = london_grouped.drop('Boroughs', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 1, 1, 0, 2, 0, 0, 0, 0], dtype=int32)

In [212]:
london_grouped=df_london
london_grouped.head(10)

Unnamed: 0,Boroughs,Income-2016,Latitude,Longitude
1,Barking and Dagenham,27886,51.554117,0.150504
2,Barnet,33443,51.65309,-0.200226
3,Bexley,34350,39.969238,-82.936864
4,Brent,29812,32.937346,-87.164718
5,Bromley,37682,51.402805,0.014814
6,Camden,39796,39.94484,-75.119891
7,Croydon,32696,51.371305,-0.101957
8,Ealing,31331,51.512655,-0.305195
9,Enfield,31603,51.652085,-0.081018
10,Greenwich,32415,51.482084,-0.004542


By analyzing income and type and size of boroughs, we were able to show the best places to live in London. Certain places have a higher standard of living, and therefore better conditions and safety, while poorer places have quite insecure and difficult living conditions. Since I have no venues, but boroughs I just tried to bring the clusters and comparison closer.