# Window functions

General Elections were held in the UK in 2015 and 2017. Every citizen votes in a constituency. The candidate who gains the most votes becomes MP for that constituency.

All these results are recorded in the table ge:

yr	| firstName	| lastName	| constituency	| party	| votes
---:|-----------|-----------|---------------|-------|------:
2015	| Ian	| Murray	| S14000024	| Labour	| 19293
2015	| Neil	| Hay	| S14000024	| Scottish National Party	| 16656
2015	| Miles	| Briggs	| S14000024	| Conservative | 8626
2015	| Phyl	| Meyer	| S14000024	| Green	| 2090
2015	| Pramod	| Subbaraman	| S14000024	| Liberal Democrat	| 1823
2015	| Paul	| Marshall	| S14000024	| UK Independence Party	 | 601
2015	| Colin	| Fox	| S14000024	| Scottish Socialist Party	| 197
2017	| Ian	| MURRAY	| S14000024	| Labour	| 26269
2017	| Jim	| EADIE	| S14000024	| SNP	| 10755
2017	| Stephanie Jane Harley	| SMITH	| S14000024	| Conservative	| 9428
2017	| Alan Christopher	| BEAL	| S14000024	| Liberal Democrats	| 1388


In [1]:
import os
import pandas as pd
import findspark
os.environ['SPARK_HOME'] =  '/opt/spark'
findspark.init()

from pyspark.sql import SparkSession
sc = (SparkSession.builder.appName('app09-')
      .config('spark.sql.warehouse.dir', 'hdfs://quickstart.cloudera:8020/user/hive/warehouse')
      .config('hive.metastore.uris', 'thrift://quickstart.cloudera:9083')
      .enableHiveSupport().getOrCreate())

## 1. Warming up

Show the **lastName, party** and **votes** for the **constituency** 'S14000024' in 2017.

In [2]:
ge = sc.read.table('sqlzoo.ge')

In [3]:
(ge.filter((ge['constituency']=='S14000024') & (ge['yr']==2017))
    .select('lastname', 'party', 'votes')
    .toPandas())

lastname | party | votes
---------|-------|------:
BEAL | Liberal Democrats | 1388
MURRAY | Labour | 26269
EADIE | SNP | 10755
SMITH | Conservative | 9428


## 2. Who won?

You can use the RANK function to see the order of the candidates. If you RANK using (ORDER BY votes DESC) then the candidate with the most votes has rank 1.

**Show the party and RANK for constituency S14000024 in 2017. List the output by party.**

In [4]:
# Import only the names used below rather than everything in the module
from pyspark.sql.functions import col, rank
from pyspark.sql import Window
(ge.filter((ge['constituency']=='S14000024') & (ge['yr']==2017))
 .select('party', 'votes')
 .withColumn('rank', rank().over(Window.orderBy(col('votes').desc())))
 .orderBy('party')
 .toPandas())

party | votes | rank
------|------:|-----:
Conservative | 9428 | 3
Labour | 26269 | 1
Liberal Democrats | 1388 | 4
SNP | 10755 | 2
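
To make the RANK rule concrete, here is a minimal plain-Python sketch of what ranking by votes in descending order computes. It is not how Spark implements it; the helper name `rank_desc` is made up for illustration, and the vote counts are the 2017 rows for S14000024 from the table at the top. A row's rank is one plus the number of rows with strictly more votes, so tied rows share a rank and the next rank is skipped.

```python
# Plain-Python sketch of RANK() OVER (ORDER BY votes DESC).
# rank_desc is a hypothetical helper, not part of PySpark.
votes_2017 = {
    "Labour": 26269,
    "SNP": 10755,
    "Conservative": 9428,
    "Liberal Democrats": 1388,
}

def rank_desc(scores):
    """Return {key: rank}, where rank = 1 + number of strictly larger scores."""
    return {
        key: 1 + sum(other > value for other in scores.values())
        for key, value in scores.items()
    }

ranks = rank_desc(votes_2017)
for party in sorted(ranks):  # list the output by party
    print(party, votes_2017[party], ranks[party])
```

With tied scores, for example `rank_desc({"a": 5, "b": 5, "c": 1})`, both `a` and `b` get rank 1 and `c` gets rank 3.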


## 3. PARTITION BY

The 2015 election is a different PARTITION to the 2017 election. We only care about the order of votes for each year.

**Use PARTITION to show the ranking of each party in S14000021 in each year. Include yr, party, votes and ranking (the party with the most votes is 1).**

In [5]:
(ge.filter(ge['constituency']=='S14000021')
 .withColumn('posn', rank().over(
     Window.partitionBy('yr').orderBy(col('votes').desc())))
 .select('yr', 'party', 'votes', 'posn')
 .orderBy('party', 'yr')
 .toPandas())

yr | party | votes | posn
---:|-------|------:|-----:
2015 | Conservative | 12465 | 3
2017 | Conservative | 21496 | 1
2019 | Conservative | 19451 | 2
2015 | Labour | 19295 | 2
2017 | Labour | 14346 | 2
2019 | Labour | 6855 | 3
2015 | Liberal Democrats | 1069 | 4
2017 | Liberal Democrats | 1112 | 3
2019 | Liberal Democrats | 4174 | 4
2015 | SNP | 23013 | 1
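
The effect of partitioning can be sketched in plain Python as well: the rows are split by year and the rank restarts at 1 inside each group. This is only an illustration under some assumptions: `rank_within` is a hypothetical helper, and the rows are a subset of the S14000024 results from the table at the top (not S14000021), because those values appear in full in this document.

```python
from collections import defaultdict

# (yr, party, votes) rows for S14000024, a subset of the table at the top.
rows = [
    (2015, "Labour", 19293),
    (2015, "Scottish National Party", 16656),
    (2015, "Conservative", 8626),
    (2017, "Labour", 26269),
    (2017, "SNP", 10755),
    (2017, "Conservative", 9428),
]

def rank_within(rows, key_index, value_index):
    """Append a rank to each row, restarting at 1 for every partition key."""
    # Collect the values of each partition, e.g. all vote counts per year.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row[value_index])
    # A row's rank is 1 + the number of larger values in its own partition.
    return [
        row + (1 + sum(v > row[value_index] for v in partitions[row[key_index]]),)
        for row in rows
    ]

ranked = rank_within(rows, key_index=0, value_index=2)
for yr, party, votes, posn in ranked:
    print(yr, party, votes, posn)
```

Without the partition step, the 2017 Labour row (26269 votes) would outrank every 2015 row, which is not a meaningful comparison between two separate elections.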


## 4. Edinburgh Constituency

Edinburgh constituencies are numbered S14000021 to S14000026.

**Use PARTITION BY constituency to show the ranking of each party in Edinburgh in 2017. Order your results so the winners are shown first, then ordered by constituency.**

In [6]:
(ge.filter((ge['constituency'].between('S14000021', 'S14000026')) &
       (ge['yr']==2017))
 .withColumn('posn', rank().over(
     Window.partitionBy('constituency').orderBy(col('votes').desc())))
 .select('constituency', 'party', 'votes', 'posn')
 .orderBy('posn', 'constituency')
 .toPandas())

constituency | party | votes | posn
-------------|-------|------:|-----:
S14000021 | Conservative | 21496 | 1
S14000022 | SNP | 18509 | 1
S14000023 | SNP | 19243 | 1
S14000024 | Labour | 26269 | 1
S14000025 | SNP | 17575 | 1
S14000026 | Liberal Democrats | 18108 | 1
S14000021 | Labour | 14346 | 2
S14000022 | Labour | 15084 | 2
S14000023 | Labour | 17618 | 2
S14000024 | SNP | 10755 | 2


## 5. Winners Only

You can use [SELECT within SELECT](https://sqlzoo.net/wiki/SELECT_within_SELECT_Tutorial) to pick out only the winners in Edinburgh.

**Show the parties that won for each Edinburgh constituency in 2017.**

In [7]:
(ge.filter((ge['constituency'].between('S14000021', 'S14000026')) & (ge['yr']==2017))
 .withColumn('posn', rank().over(Window.partitionBy('constituency').orderBy(col('votes').desc())))
 .filter(col('posn')==1)
 .select('constituency', 'party')
 .orderBy('constituency')
 .toPandas())

constituency | party
-------------|------
S14000021 | Conservative
S14000022 | SNP
S14000023 | SNP
S14000024 | Labour
S14000025 | SNP
S14000026 | Liberal Democrats
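
The query above works in two stages: an inner step that ranks every candidate within a constituency, and an outer step that keeps only the rows ranked 1. A rough plain-Python sketch of the same two stages follows. The helper `posn` is hypothetical, and the rows are the first- and second-placed results for S14000021 to S14000024 taken from the output in section 4.

```python
from collections import defaultdict

# (constituency, party, votes), copied from the section 4 output.
rows = [
    ("S14000021", "Conservative", 21496),
    ("S14000021", "Labour", 14346),
    ("S14000022", "SNP", 18509),
    ("S14000022", "Labour", 15084),
    ("S14000023", "SNP", 19243),
    ("S14000023", "Labour", 17618),
    ("S14000024", "Labour", 26269),
    ("S14000024", "SNP", 10755),
]

# Inner step: gather the votes per constituency so each row can be ranked
# against the other rows in its own partition.
votes_by_seat = defaultdict(list)
for constituency, _party, votes in rows:
    votes_by_seat[constituency].append(votes)

def posn(row):
    """Rank of a row inside its constituency (1 = most votes)."""
    constituency, _party, votes = row
    return 1 + sum(v > votes for v in votes_by_seat[constituency])

# Outer step: keep only the rows ranked first, ordered by constituency.
winners = sorted((c, p) for (c, p, v) in rows if posn((c, p, v)) == 1)
for constituency, party in winners:
    print(constituency, party)
```

Filtering on rank 1 keeps every tied candidate if two share the top vote count, which a simple "take the maximum" loop would not.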


## 6. Scottish seats

You can use **COUNT** and **GROUP BY** to see how each party did in Scotland. Scottish constituencies start with 'S'.

**Show how many seats each party won in Scotland in 2017.**

In [8]:
(ge.filter((ge['constituency'].startswith('S')) & (ge['yr']==2017))
 .withColumn('posn', rank().over(
     Window.partitionBy('constituency').orderBy(col('votes').desc())))
 .filter(col('posn')==1)
 .groupBy('party')
 .count()
 .toPandas())

party | count
------|-----:
SNP | 34
Labour | 9
Conservative | 12
Liberal Democrats | 4
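
The final grouping step amounts to counting winners per party. As a small illustration, the sketch below applies `collections.Counter` to the six Edinburgh winners listed in section 5 rather than to all Scottish seats, so the totals here differ from the table above.

```python
from collections import Counter

# Winning party per Edinburgh constituency in 2017, from the section 5 output.
winners = {
    "S14000021": "Conservative",
    "S14000022": "SNP",
    "S14000023": "SNP",
    "S14000024": "Labour",
    "S14000025": "SNP",
    "S14000026": "Liberal Democrats",
}

# Equivalent in spirit to GROUP BY party with COUNT(*): one tally per party.
seats = Counter(winners.values())
for party, count in seats.most_common():
    print(party, count)
```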


In [9]:
sc.stop()