# How does London vote?

by Sofia Faqir
***
<p><a href="#Problem"> 1. Problem and Background </a></p>
<p><a href="#Data"> 2. Data gathering </a></p>
<p><a href="#Use"> 3. How to use the data? </a></p>

***

***
<h2 id="Problem">1. Problem and Background</h2>

British people have been called to the polls many times in recent years: the Brexit referendum, general elections, snap general elections, mayoral elections, local elections, etc. Voter fatigue has been increasing, which makes it even more important to understand where it is worth spending more energy (and money) on canvassing and campaigning.

The Greater London area has 73 parliamentary constituencies, while the UK as a whole has 650. Wards are the “building blocks” of constituencies (according to the Boundary Commission for England).
Understanding what drives a constituency to vote for a certain party can be helpful in many ways.
First, constituency boundaries are regularly reviewed and amended to make them fairer and more equal. Fairness is a subjective measure, so proposed boundaries should be tested in a range of innovative ways to make sure the governing party is not pushing through boundaries that favour it.
Second, data science can help optimize campaigns. This is particularly relevant given the rising costs of campaigns (and the growing number of votes…).

Here, the question I will ask is:
Using the results of the 2017 General Election, can data on the venues in a specific constituency help predict how it will vote?

This approach can be extended to ward level, for local or parliamentary elections, and to other big questions such as Brexit.
It can also be extended to the rest of the UK.


***
<h2 id="Data">2. Data gathering</h2>

The British government has clearly made a big effort towards transparency and has opened up many data sources that are readily available and easily accessible.
However, the data needed a fair amount of reworking to make it usable for my objective.

For the purpose of this project, I will need the following data:

1. List of constituencies and the party that they have elected: 

The 2017 General Election results are available on the London Datastore, which "has been created by the Greater London Authority (GLA) as a first step towards freeing London’s data":
https://data.london.gov.uk/download/general-election-results-2017/26ee40ae-becf-4839-bb0c-509024e61bfd/2017%20General%20Election%20Results.xls

2. List of wards belonging to each constituency:

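I need ward-level data to increase granularity when compiling the list of venues in a constituency. Constituencies are fairly large areas, so searching for venues around a single point per constituency would miss most of them; wards split each constituency into smaller areas that can each be queried.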

A better way would be to use Ordnance Survey Open Data, but this was too involved for this project.
Instead, I scraped https://www.electoralcalculus.co.uk/, which lists the wards of each constituency (and more data…).
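A quick back-of-the-envelope check illustrates why a single search point per constituency is not enough. This is a sketch under stated assumptions: the area of Greater London (roughly 1,572 km²) is an outside figure not taken from the datasets above, and the 500 m radius is the one used for the Foursquare queries in the methodology.

```python
import math

# Assumption: Greater London covers roughly 1,572 km² (rounded outside figure).
GREATER_LONDON_KM2 = 1572
# From the text: 73 parliamentary constituencies in Greater London.
N_CONSTITUENCIES = 73
# Search radius used for the Foursquare venue queries (500 m).
RADIUS_KM = 0.5

def coverage_ratio(total_km2: float, n_areas: int, radius_km: float) -> float:
    """Area of an average constituency divided by the area of one search circle.

    This compares areas only; it ignores overlap and circle packing, so the
    real number of search points needed would be somewhat higher.
    """
    avg_area = total_km2 / n_areas           # average constituency size, km²
    circle_area = math.pi * radius_km ** 2   # area of one 500 m search, km²
    return avg_area / circle_area

ratio = coverage_ratio(GREATER_LONDON_KM2, N_CONSTITUENCIES, RADIUS_KM)
print(f"~{ratio:.0f} search circles per constituency")  # ~27 search circles per constituency
```

Under these assumptions, one query covers only a few percent of an average constituency, which is why querying around each ward is more representative.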

3. The coordinates of each ward:

The coordinates for all the wards in England are recorded here:
http://geoportal.statistics.gov.uk/datasets/07194e4507ae491488471c84b23a90f2_0
It includes the ward code, the ward name, and the longitude and latitude of each ward.

However, the same name can apply to different wards (in different constituencies). Luckily, ward names are unique within a constituency, so the constituency/ward pair is unique. To get the list of ward codes, I used the data below and processed it further in Excel:
https://data.london.gov.uk/download/excel-mapping-template-for-london-boroughs-and-wards/58f59b22-946e-43e9-96fd-c0a4fa27f76a/Mapping-template-for-London-boroughs.xls
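The uniqueness argument above can be made concrete: a ward name on its own is ambiguous, but a (constituency, ward) tuple works as a key. The sketch below uses hypothetical placeholder names and codes, not real London wards.

```python
from collections import Counter

# Hypothetical rows: (constituency, ward, ward_code). All values are placeholders.
rows = [
    ("Constituency A", "Village", "CODE1"),
    ("Constituency B", "Village", "CODE2"),
    ("Constituency B", "Park", "CODE3"),
]

# Ward names alone collide across constituencies...
name_counts = Counter(ward for _, ward, _ in rows)
duplicated = sorted(name for name, n in name_counts.items() if n > 1)

# ...but the (constituency, ward) pair is unique, so it can serve as a key.
code_by_pair = {(constit, ward): code for constit, ward, code in rows}
assert len(code_by_pair) == len(rows), "constituency/ward pairs must be unique"

print(duplicated)                                   # ['Village']
print(code_by_pair[("Constituency B", "Village")])  # CODE2
```

The length check doubles as a data-quality test: if two rows ever shared the same pair, the dictionary would silently keep only one of them.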

4. The venues surrounding each ward, and hence the venues in each constituency:

I will call the Foursquare API for this purpose.

At the end of the data filtering, I will have the following data:
* Constituency
* Ward
* Unique pair: Constituency/Ward
* Ward coordinates
* Venues in the relevant ward, with their coordinates and venue category
* Party elected at the 2017 General Election
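One way to hold the fields listed above is a small record type per ward, keyed by the constituency/ward pair. This is a hypothetical sketch of the data model (class and field names are illustrative, not the author's), using placeholder values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Venue:
    name: str
    category: str
    lat: float
    lng: float

@dataclass
class WardRecord:
    constituency: str
    ward: str
    lat: float                  # ward coordinates
    lng: float
    party: str                  # party elected in the constituency in 2017
    venues: List[Venue] = field(default_factory=list)

    @property
    def key(self) -> Tuple[str, str]:
        """The unique constituency/ward pair."""
        return (self.constituency, self.ward)

# Usage with placeholder values.
rec = WardRecord("Constituency A", "Village", 51.50, -0.12, "Labour")
rec.venues.append(Venue("Cafe X", "Café", 51.501, -0.121))
print(rec.key, len(rec.venues))  # ('Constituency A', 'Village') 1
```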

***
***
<h2 id="Use">3. How to use the data?</h2>

I will be using machine learning techniques such as clustering (unsupervised) or decision trees (supervised classification) on a subset of the data.
I will then check how good my results are, and whether there is a clear voting pattern.

There are a few avenues that I will be exploring, for example:
* Number of venues in any given constituency
* Clustering as a function of the venues in the constituency
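Both avenues can start from the same per-constituency summary: a venue count and the share of each venue category, i.e. a normalized frequency vector that a clustering method such as k-means, or a decision tree, could take as input. The sketch below assumes hypothetical (constituency, category) rows rather than the real venue table.

```python
from collections import Counter, defaultdict

# Hypothetical venue rows: (constituency, venue_category).
venues = [
    ("Constituency A", "Pub"),
    ("Constituency A", "Café"),
    ("Constituency A", "Pub"),
    ("Constituency B", "Park"),
]

def constituency_features(rows):
    """Return {constituency: (venue_count, {category: share_of_venues})}."""
    by_constit = defaultdict(Counter)
    for constit, category in rows:
        by_constit[constit][category] += 1
    features = {}
    for constit, counts in by_constit.items():
        total = sum(counts.values())
        shares = {cat: n / total for cat, n in counts.items()}
        features[constit] = (total, shares)
    return features

feats = constituency_features(venues)
print(feats["Constituency A"][0])                   # 3
print(round(feats["Constituency A"][1]["Pub"], 2))  # 0.67
```

Using shares rather than raw counts keeps constituencies with many venues from dominating the category features; the raw count remains available as a separate feature.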

***

## 4. Methodology

### 4.1. Data gathering and preprocessing

As previously mentioned, a fair amount of data is readily and freely available on the Internet from reliable government sources.
It was, however, challenging to sift through all the websites and assemble data in differing formats.

**a. The election results per constituency:**

The data in the CSV file covered the whole of England and included election results for each constituency at party level.
I limited the selection to the London region and to the winning party, and dropped all quantitative data on the election results.
I obtained the following:

<img src='df_elect2.PNG'>
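The filtering step can be sketched as: keep London rows, pick the party with the most votes in each constituency, then drop the vote counts. The column names (`constituency`, `region`, `party`, `votes`) and the rows are hypothetical stand-ins for the results file.

```python
# Hypothetical rows standing in for the results file: one row per
# constituency/party. Column names and values are assumptions.
results = [
    {"constituency": "Constituency A", "region": "London", "party": "Labour", "votes": 25000},
    {"constituency": "Constituency A", "region": "London", "party": "Conservative", "votes": 18000},
    {"constituency": "Constituency B", "region": "London", "party": "Liberal Democrat", "votes": 21000},
    {"constituency": "Constituency B", "region": "London", "party": "Labour", "votes": 20500},
    {"constituency": "Constituency C", "region": "South East", "party": "Conservative", "votes": 30000},
]

def london_winners(rows):
    """Keep London rows only and return {constituency: winning party}."""
    best = {}
    for row in rows:
        if row["region"] != "London":
            continue
        current = best.get(row["constituency"])
        # Strictly greater: on an exact tie the first row seen is kept.
        if current is None or row["votes"] > current["votes"]:
            best[row["constituency"]] = row
    # Drop the vote counts: only the winning party is kept.
    return {constit: row["party"] for constit, row in best.items()}

print(london_winners(results))
# {'Constituency A': 'Labour', 'Constituency B': 'Liberal Democrat'}
```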

**b. The list of wards per constituency:**

This required scraping the website https://www.electoralcalculus.co.uk/.
Here is an extract of the table that I put together:

<img src='Ward per Constit.PNG'>
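Scraping an HTML table of wards can be sketched with Python's standard `html.parser`. The HTML snippet below is hypothetical: the real page layout on electoralcalculus.co.uk is not reproduced here, and in practice the page would first be downloaded (for example with `urllib.request`) before being fed to the parser.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            self._row[-1] = self._row[-1].strip()
        elif tag == "tr" and self._row is not None:
            if self._row:      # skip header rows made only of <th> cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

# Hypothetical HTML: placeholder wards and boroughs, not the real page.
html = """
<table>
  <tr><th>Ward</th><th>Borough</th></tr>
  <tr><td>Ward One</td><td>Borough X</td></tr>
  <tr><td>Ward Two</td><td>Borough X</td></tr>
</table>
"""
parser = TableCellParser()
parser.feed(html)
wards = [row[0] for row in parser.rows]
print(wards)  # ['Ward One', 'Ward Two']
```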

After checking the data, it appears that ward names alone are not sufficient to determine location, since there are duplicates.

I worked in Excel and used the data in the file below to get the **geocode of each ward**, so that I could look up its longitude/latitude in a file containing the coordinates of all the wards in the UK.
https://londondatastore-upload.s3.amazonaws.com/dataset/excel-mapping-template-for-london-boroughs-and-wards/Mapping-template-london-ward-map-2014.xls

I put together a **CSV file** with that information. I also reduced the number of wards in the City of London, which was disproportionately large, to avoid skewing the results.

The file is: 'GeoCode_wards_Constit.csv'

<img src='Ward per Constit2.PNG'>

**c. Getting the coordinates of each ward**

The coordinates of all the wards, indexed by geocode, were available here:
https://opendata.arcgis.com/datasets/07194e4507ae491488471c84b23a90f2_0.csv

After further processing, the coordinates of each ward per constituency were ready:

<img src='Ldn ward coord.PNG'>
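Attaching coordinates to each constituency/ward pair amounts to a join on the ward geocode. In pandas this would typically be a `DataFrame.merge` on the code column; the stdlib sketch below shows the same idea with plain dictionaries and also reports any ward whose code has no coordinates. Codes and coordinates are hypothetical placeholders.

```python
# Hypothetical inputs: wards keyed by (constituency, ward) with their geocode,
# and a coordinate lookup keyed by geocode. All values are placeholders.
ward_codes = {
    ("Constituency A", "Village"): "CODE1",
    ("Constituency B", "Village"): "CODE2",
}
coords_by_code = {
    "CODE1": (51.50, -0.12),
    "CODE2": (51.45, -0.05),
    "CODE9": (53.48, -2.24),   # a ward outside London, dropped by the join
}

def attach_coordinates(codes, coords):
    """Inner join on geocode; also return the pairs whose code is missing."""
    joined, missing = {}, []
    for pair, code in codes.items():
        if code in coords:
            joined[pair] = coords[code]   # (latitude, longitude)
        else:
            missing.append(pair)
    return joined, missing

joined, missing = attach_coordinates(ward_codes, coords_by_code)
print(joined[("Constituency B", "Village")])  # (51.45, -0.05)
print(missing)                                # []
```

Checking that `missing` is empty is a cheap guard against codes that changed between the two source files.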

**d. The venues from Foursquare**

I downloaded from Foursquare the venues surrounding each ward, with a limit of 100 venues within a radius of 500 m.

I only kept the venue name, its category and its coordinates.
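The query and the trimming step can be sketched as below, assuming the legacy Foursquare v2 `venues/explore` endpoint with `ll`, `radius` and `limit` parameters; that API generation has since been superseded, and the version date, credentials and sample payload are hypothetical. Only URL construction and response parsing are shown, so the sketch runs without network access.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 "explore" endpoint (assumption: superseded since).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def explore_url(client_id, client_secret, lat, lng,
                radius=500, limit=100, version="20180605"):
    """Build a request URL for up to `limit` venues within `radius` metres."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # hypothetical API version date (YYYYMMDD)
        "ll": f"{lat},{lng}",  # latitude,longitude of the ward
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

def parse_venues(payload):
    """Keep only the venue name, its first category and its coordinates."""
    venues = []
    for group in payload["response"]["groups"]:
        for item in group["items"]:
            v = item["venue"]
            cats = v.get("categories") or []
            category = cats[0]["name"] if cats else None
            venues.append((v["name"], category,
                           v["location"]["lat"], v["location"]["lng"]))
    return venues

# Usage with placeholder credentials and a hand-written response payload.
url = explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 51.5, -0.12)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe X", "categories": [{"name": "Café"}],
               "location": {"lat": 51.501, "lng": -0.121}}},
]}]}}
print(parse_venues(sample))  # [('Cafe X', 'Café', 51.501, -0.121)]
```

Because neighbouring wards' 500 m circles can overlap, the same venue may be returned for more than one ward; deduplicating on name and coordinates before counting is worth considering.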

The resulting table included 12,310 rows and looked as follows:

<img src='venues.PNG'>

***
***

### 4.2. Data Visualization


To get a better sense of what we are working with, it helps to visualize the data on maps:

**Map showing the wards, each constituency has a different color**

<img src='map_wards_constit.PNG'>

**Map showing the wards: red is Labour, blue is Tory, yellow is Lib Dem**

<img src='map_wards_party.PNG'>
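A map like the one above can be fed from a GeoJSON file of ward points coloured by party. The sketch below is an assumption-laden illustration, not how these maps were produced: party labels are guessed, and the `marker-color` property follows the simplestyle convention that some GeoJSON viewers recognise.

```python
import json

# Hex colours per party (assumption: party labels as they might appear in the
# results file); any other party falls back to grey.
PARTY_COLOURS = {
    "Labour": "#ff0000",
    "Conservative": "#0000ff",
    "Liberal Democrat": "#ffff00",
}

def wards_to_geojson(wards):
    """wards: iterable of (constituency, ward, lat, lng, party) tuples."""
    features = []
    for constit, ward, lat, lng, party in wards:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [lng, lat]},
            "properties": {
                "constituency": constit,
                "ward": ward,
                "party": party,
                "marker-color": PARTY_COLOURS.get(party, "#808080"),
            },
        })
    return {"type": "FeatureCollection", "features": features}

# Usage with a placeholder ward.
geo = wards_to_geojson([("Constituency A", "Village", 51.50, -0.12, "Labour")])
print(json.dumps(geo, indent=2))
```

The longitude-first order is the most common pitfall when moving between GeoJSON and latitude/longitude tables.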



## References and sources:

The London Datastore, by the Greater London Authority (GLA): https://data.london.gov.uk/

Electoral Calculus, by Martin Baxter: https://www.electoralcalculus.co.uk/

Open Geography Portal, by the Office for National Statistics (ONS): http://geoportal.statistics.gov.uk/

Boundary Commission for England: https://boundarycommissionforengland.independent.gov.uk/