## Business Problem

Despite London's depiction as a global city full of opportunity for all (Taylor et al., 2010), the quality of life it offers is highly spatially segregated (Higgins et al., 2014). This is partly a result of the borough system that makes up London. Each borough's council is controlled by a particular political party, so council tax rates and spending vary between boroughs and, consequently, so can the quality and types of local facilities. Combine this with London's high property prices and the lack of time Londoners have to visit each prospective area (Jarvis, 2005), and it is clear that the pressure of choosing where to live in London can be intense. 

As such, this project aims first to compare neighbourhoods within London based on their facilities and characteristics (e.g. the number and type of restaurants, schools and gyms) using Foursquare data. The current average house price for each area will then be added to the data. This will allow an individual to identify a neighbourhood they would like to live in based on quality of living, as proxied by the types of facilities present, and then to compare similar neighbourhoods on the cost of housing. 


### Audience

My target audience for this project is anyone who is looking to rent, buy, move to, or live in London. 



## Data required

To tackle this project there are three key pieces of data that I require. 

#### 1. Identifying London Neighbourhoods

I will use the London boroughs as my neighbourhoods. To add further detail to the exploration, I will also add the political control and approximate population of each borough to the dataframe. Although this information is at the borough level rather than the neighbourhood level, it will be useful to the target audience, as it can give insight into council tax, quality of schooling and potential overcrowding. This information can be found here: https://en.wikipedia.org/wiki/List_of_London_boroughs 


#### 2. Finding similar Neighbourhoods using Foursquare data
Next, I will require data from Foursquare to identify the facilities present within each neighbourhood. This is available via the Foursquare API. 

I will use this data to explore, for example, the number of schools and restaurants in each area. 
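A minimal sketch of the Foursquare step, assuming the legacy v2 `venues/explore` endpoint (Foursquare has since introduced a newer Places API, so check the current documentation before relying on this). The helper names `build_explore_url` and `parse_venues`, the `radius`, `limit` and `v` values, and the assumed response layout (`response`, then `groups`, then `items`, then `venue`) are illustrative assumptions rather than the project's actual code.

```python
import urllib.parse

# Assumed base URL of the legacy Foursquare v2 "explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180605", radius=1000, limit=100):
    """Build a request URL for venues around one borough's coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (placeholder value)
        "ll": f"{lat},{lng}",  # latitude,longitude of the borough
        "radius": radius,      # search radius in metres (hypothetical value)
        "limit": limit,        # maximum number of venues to return
    }
    return EXPLORE_URL + "?" + urllib.parse.urlencode(params)


def parse_venues(borough, response_json):
    """Flatten an (assumed) explore response into (borough, venue, category, lat, lng) rows."""
    rows = []
    for item in response_json["response"]["groups"][0]["items"]:
        venue = item["venue"]
        # Some venues may have no category; fall back to a placeholder label.
        categories = venue.get("categories") or [{"name": "Unknown"}]
        rows.append((
            borough,
            venue["name"],
            categories[0]["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
        ))
    return rows
```

The HTTP call itself (for example `requests.get(url).json()`) is left out so the parsing can be tested offline; the rows for every borough could then be collected into a single venues dataframe.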

#### 3. Historical and Average House Prices
Finally, I will require house price data for each borough, which can be found at the link below. 

Average house prices over time (by borough):
https://data.london.gov.uk/dataset/average-house-prices

## Methodology
**Data cleaning**

Because much of the data had to be scraped from web pages, it needed to be cleaned before it could be processed. Most of this work was required for the London Borough data. 

Because the London Borough data was read from a Wikipedia table, it contained a number of footnote reference markers, which prevented the data from being parsed easily and so had to be removed. In addition, the coordinates were read in as a string containing much irrelevant information, so this was stripped out along with any extraneous whitespace. 
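The cleaning described above might look like the following sketch. It assumes that footnote markers appear in square brackets (e.g. `[1]`) and that each scraped coordinate string contains a decimal pair such as `51.5607°N 0.1557°E`; both formats, and the function names, are assumptions.

```python
import re

# Wikipedia-style footnote markers such as "[1]" or "[note 2]" (assumed format).
FOOTNOTE = re.compile(r"\[[^\]]*\]")
# Decimal coordinate pair such as "51.5607°N 0.1557°E" (assumed format).
DECIMAL_COORDS = re.compile(r"(\d+(?:\.\d+)?)°\s*([NS])\s+(\d+(?:\.\d+)?)°\s*([EW])")


def clean_name(raw):
    """Remove footnote markers and surrounding whitespace from a borough name."""
    return FOOTNOTE.sub("", raw).strip()


def parse_coordinates(raw):
    """Return (latitude, longitude) as signed floats from a scraped coordinate string."""
    match = DECIMAL_COORDS.search(raw)
    if match is None:
        raise ValueError(f"no decimal coordinates found in {raw!r}")
    lat, ns, lng, ew = match.groups()
    # South and West are negative in decimal degrees.
    lat = float(lat) * (1 if ns == "N" else -1)
    lng = float(lng) * (1 if ew == "E" else -1)
    return lat, lng
```

On a dataframe these would typically be applied column-wise, e.g. `df["Borough"] = df["Borough"].map(clean_name)` (column name hypothetical).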

All the data was then read into dataframes and merged on borough name to produce a master dataframe for analysis. Foursquare venue data was then retrieved for each borough to build a picture of the types of facilities present. 
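A hedged sketch of the merge step. The `Borough` column name and the `build_master` helper are hypothetical. Passing `validate` and `indicator` to `DataFrame.merge` makes the merge fail loudly when a borough appears in one source but not the other (for example because of a spelling difference), instead of silently dropping it.

```python
import pandas as pd


def build_master(boroughs, prices):
    """Merge borough attributes with house prices on the borough name."""
    boroughs = boroughs.copy()
    prices = prices.copy()
    # Strip stray whitespace so the join keys match exactly.
    for df in (boroughs, prices):
        df["Borough"] = df["Borough"].str.strip()
    master = boroughs.merge(prices, on="Borough", how="left",
                            validate="one_to_one", indicator=True)
    # Rows present only on the left had no matching price record.
    unmatched = master.loc[master["_merge"] == "left_only", "Borough"].tolist()
    if unmatched:
        raise ValueError(f"no price data for: {unmatched}")
    return master.drop(columns="_merge")
```

Names that differ in wording (say "and" versus "&") would still need to be harmonised by hand before merging.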

**Data types**

For both the house price data and the London Borough data, all of the columns were read in with the `object` dtype. To perform any numerical analysis, particularly on the house prices, these columns had to be converted to `int64` or `float64`, which was done using the `.convert_objects` method. 
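`DataFrame.convert_objects` has since been deprecated and removed from pandas, so on current versions `pd.to_numeric` is the usual replacement. The sketch below assumes price strings may contain a `£` sign, thousands separators or stray spaces; that format and the helper name are assumptions.

```python
import pandas as pd


def to_numeric_columns(df, columns):
    """Convert object columns such as "£1,234,567" to numbers; unparseable values become NaN."""
    out = df.copy()
    for col in columns:
        cleaned = (
            out[col]
            .astype(str)
            .str.replace(r"[£,\s]", "", regex=True)  # drop currency sign, separators, spaces
        )
        # errors="coerce" turns anything still non-numeric into NaN instead of raising.
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out
```

Coercing to NaN rather than raising keeps the conversion in one pass; the NaN rows can then be inspected or dropped explicitly.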

The political control of each borough also needed to be encoded as a numerical category, with each political party converted to 1, 2 or 3. This was done by creating a dictionary holding the conversion and iterating through the list of boroughs.
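A sketch of the party encoding, using a vectorised `Series.map` instead of an explicit loop. The column name `Political control` and the number assigned to each party are hypothetical. Because these codes are nominal labels rather than an ordered scale, the choice of numbering matters for any correlation computed on them.

```python
import pandas as pd

# Hypothetical coding: which party receives which number is an assumption.
PARTY_CODES = {"Labour": 1, "Conservative": 2, "Liberal Democrat": 3}


def encode_party(df, column="Political control"):
    """Add an integer 'Party code' column, failing on any party not in PARTY_CODES."""
    out = df.copy()
    out["Party code"] = out[column].map(PARTY_CODES)
    missing = out.loc[out["Party code"].isna(), column].unique()
    if len(missing):
        raise ValueError(f"unmapped parties: {list(missing)}")
    out["Party code"] = out["Party code"].astype(int)
    return out
```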

**Exploratory data analysis: Boroughs**

With the Foursquare data combined with the master dataframe, I produced a summary of the 10 most common types of facility in each borough. This gave a better understanding of what was present in each borough before clustering. The summary was then read into a dataframe so that the clustering analysis could be performed on only the most common facilities, rather than on all 100 that had been read in for each location. 
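One common way to build the top-10 summary, sketched under assumptions: one-hot encode each venue's category, average the encodings per borough to get the share of venues in each category, then sort. The column names (`Borough`, `Venue Category`) and helper names are hypothetical.

```python
import pandas as pd


def category_frequencies(venues):
    """One-hot encode venue categories and average them per borough (share of venues)."""
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    onehot.insert(0, "Borough", venues["Borough"].values)
    return onehot.groupby("Borough").mean()


def top_categories(freqs, n=10):
    """Return the n most common venue categories for each borough, most common first."""
    rows = {}
    for borough, row in freqs.iterrows():
        row = row[row > 0]  # ignore categories absent from this borough
        top = row.sort_values(ascending=False).head(n).index.tolist()
        rows[borough] = top + [None] * (n - len(top))  # pad boroughs with few categories
    columns = [f"Top {i + 1}" for i in range(n)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)
```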

To better understand and compare neighbourhoods, K-means clustering was applied to the top-10 facility dataframe for each borough. The clusters were then mapped in different colours so the results could be read at a glance. This also gave geographical context to the relationships between boroughs, such as their distances from one another and their location within London as a whole. 
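A minimal K-means sketch using scikit-learn. K-means needs numeric features, so this assumes the per-borough category frequencies (the share of venues in each category) are clustered rather than the category names themselves. `k=10` matches the ten clusters reported in the results, and the fixed `random_state` is an assumption added for reproducibility; the function name is hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_boroughs(freqs, k=10, seed=0):
    """Assign each borough (a row of the frequency table) to one of k clusters, numbered 1..k."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(freqs.to_numpy())
    # Shift scikit-learn's 0-based labels so clusters are numbered from 1.
    return pd.Series(labels + 1, index=freqs.index, name="Cluster")
```

K-means results depend on initialisation, so several restarts (`n_init`) and a fixed seed help make the cluster assignment repeatable between runs.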

**Understanding House Prices**

In the second part of the project, it was hypothesised that a borough's political control could affect its house prices, on the reasoning that council spending in the borough could change the quality of its facilities and hence its prices. 

To explore this, three methods were used. First, the average house price for each borough was plotted on a choropleth map, chosen because it makes price comparisons easy to see. Second, a boxplot was used to explore the range of prices by political control. Finally, the correlation between political control and price was calculated and reported as an R² value. 

![Correlation.PNG](attachment:Correlation.PNG)
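Because party codes such as 1, 2 and 3 are nominal, a Pearson R² computed on them changes if the parties are numbered differently. The sketch below contrasts that with the correlation ratio η² (the between-group sum of squares divided by the total sum of squares, i.e. the R² of a one-way ANOVA), which depends only on which borough belongs to which party. This is a suggested alternative, not the method used in the project, and the function names are hypothetical.

```python
import numpy as np


def pearson_r_squared(codes, prices):
    """Squared Pearson correlation; treats the party codes as if they were ordered numbers."""
    r = np.corrcoef(np.asarray(codes, dtype=float), np.asarray(prices, dtype=float))[0, 1]
    return r ** 2


def eta_squared(groups, prices):
    """Correlation ratio: share of price variance explained by group membership."""
    groups = np.asarray(groups)
    prices = np.asarray(prices, dtype=float)
    grand_mean = prices.mean()
    ss_total = ((prices - grand_mean) ** 2).sum()
    # Between-group sum of squares: group size times squared deviation of the group mean.
    ss_between = sum(
        (groups == g).sum() * (prices[groups == g].mean() - grand_mean) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_total
```

With only two groups the two measures coincide, since η² then equals the squared point-biserial correlation; with three or more unordered groups only η² is independent of the numbering.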



## Results and Discussion 

**Cluster analysis**

![Clusters.png](attachment:Clusters.png)

|Cluster  | Boroughs    |  Defining facilities  |
|:-------:|:-----------:|:---------------------:|
|1|Barnet, Enfield, Haringey|Turkish, Greek and Mediterranean restaurants|
|2|Camden, Kensington and Chelsea, Westminster|Hotels, Art Galleries, Theatres, Cocktail Bars|
|3|Kingston upon Thames, Merton, Tower Hamlets, Waltham Forest|Pubs, Cafes, Sushi Bars|
|4|Bromley, Croydon, Redbridge, Sutton|Pubs, Grocery stores, Gyms, Indian Restaurants|
|5|Brent, Harrow|Indian Restaurants, Portuguese Restaurants, Gyms|
|6|Hackney, Hammersmith and Fulham, Lambeth, Wandsworth|Parks, Gastropub, Cafes, Gym, Cocktail Bar|
|7|Ealing, Greenwich, Newham|Pubs, Hotels, Historic Sites, International Food|
|8|Barking and Dagenham, Bexley, Havering, Hillingdon|Grocery Stores, Fast Food Restaurants, Clothing Stores|
|9|Islington, Southwark|Hotels, Scenic Views, Street Food, Cafes|
|10|Hounslow, Lewisham, Richmond upon Thames|Pubs, Parks, Rugby Stadiums, Asian Restaurants|

Analysing each cluster of boroughs revealed distinct trends that set each cluster apart from the others. For example, cluster 2 was clearly defined by typically higher-end facilities such as hotels, cocktail bars and theatres, whereas cluster 10 was dominated by parks, pubs and rugby stadiums. These two examples alone show how strongly a borough's facilities can shape the feel of a neighbourhood. 

The spatial distribution of the clusters shows a loose geographical pattern. For clusters such as number 2 this makes sense, as galleries and theatres tend to concentrate in the same parts of a city, and hotels want to be located nearby to attract tourists and charge higher prices. The clusters also appear to be arranged in bands moving out from the city centre and, with few exceptions (cluster 3, for instance, groups inner Tower Hamlets with outer Kingston upon Thames), the dominant characteristics of inner London boroughs do not match those on the outer edges. This supports the notion that geographical location influences the types of facilities present in each borough. It also highlights how different each borough feels from the next, which is important for anyone looking to buy property in London. 


**House Prices**

![Choropleth.png](attachment:Choropleth.png)

As is typical of large cities, the choropleth map of average house prices follows the general pattern that the further a borough is from the city centre, the lower its property prices. There is also a distinct difference between East and West London, with the West, and the South West in particular, commanding higher property prices. The clusters in West London were defined more by higher-end restaurants, entertainment venues and the like, whereas those in the East were defined more by international cuisines and grocery stores. Although further investigation is needed, this suggests that certain types of facilities are associated with higher property prices. 

![Boxplot.png](attachment:Boxplot.png)

Although the correlation analysis showed only a weak relationship between political control and house prices, the boxplot still offers useful insight. It shows that the median house price is very similar irrespective of political control. However, house prices in Conservative boroughs clearly have a much wider range and include the most expensive housing on offer. The two outlying price points for Labour boroughs also still fall within the 3rd quartile of the Conservative prices. This suggests that housing in Conservative-controlled boroughs can be considerably more expensive. Perhaps this is linked to differences in political thought, with Labour encouraging social mobility and Conservative thought favouring the free market. Alternatively, Conservative voters may tend to be wealthier and so live in more expensive areas, which would produce the same pattern. However, the current data cannot distinguish between these explanations.  
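The quartile and outlier comparisons above can be checked numerically rather than read off the plot. This sketch assumes columns named `Political control` and `Average price` (hypothetical names) and applies the 1.5 × IQR whisker rule used by default in matplotlib and seaborn boxplots.

```python
import pandas as pd


def box_summary(df, group_col="Political control", value_col="Average price"):
    """Quartiles per group plus the values a 1.5 x IQR boxplot would flag as outliers."""
    rows = []
    for party, prices in df.groupby(group_col)[value_col]:
        q1, median, q3 = prices.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits
        outliers = sorted(prices[(prices < low) | (prices > high)].tolist())
        rows.append({group_col: party, "Q1": q1, "Median": median,
                     "Q3": q3, "Outliers": outliers})
    return pd.DataFrame(rows).set_index(group_col)
```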

## Conclusion

Taking the results and analysis above together, it can be concluded that, as someone who likes the borough of Southwark, I should also consider Islington because of its similar facilities. However, layering in the average house price data shows that Southwark is, on average, cheaper and therefore more suitable to buy in. 
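The comparison in this conclusion can be generalised with a small helper that lists the boroughs sharing a given borough's cluster, cheapest first. It assumes a master dataframe with `Borough`, `Cluster` and `Average price` columns; those names and the function are hypothetical.

```python
import pandas as pd


def similar_boroughs_by_price(master, borough,
                              cluster_col="Cluster", price_col="Average price"):
    """Boroughs in the same cluster as `borough`, sorted from cheapest to most expensive."""
    matches = master.loc[master["Borough"] == borough, cluster_col]
    if matches.empty:
        raise KeyError(f"unknown borough: {borough!r}")
    same = master[master[cluster_col] == matches.iloc[0]]
    return same.sort_values(price_col)[["Borough", price_col]].reset_index(drop=True)
```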

This exploration has also shown a clear East-West divide within London in both average house prices and the facilities available in each borough. Further, the results suggest that a borough's geography strongly influences its characteristics. As the facility profiles of the clusters showed, the dominant use of an area (e.g. central London as a cultural region and West London as a sports hub) also influences the surrounding areas. 

This analysis has also highlighted further data that could be included in future research to add granularity. For example, demographic data for each borough could be added so that factors such as average income, ethnicity and age could be explored alongside the clusters. 