### Introduction

In [1]:
import spanish_elections as sp # should already be installed

### Reading November 2019 Election Data

Note that the codebook for all the data is in `generate_data/README.md`. In this section, I will simply use the shortcut functions in the `load_data` module to read the data.

#### Extract general data about provinces

First, I wish to extract some general data about each province, such as its name and the number of seats allocated in the November 2019 elections. We can do that by invoking the `load_general_data` function:

In [2]:
general_data = sp.load_data.load_general_data()

general_data.head()

Unnamed: 0,comunidad,código de provincia,provincia,población,número de mesas,censo electoral sin cera,censo cera,total censo electoral,solicitudes voto cera aceptadas,total votantes cer,total votantes cera,total votantes,votos válidos,votos a candidaturas,votos en blanco,votos nulos,diputados
0,Andalucía,4,Almería,709340,809,460639,41988,502627,2923,303481,1933,305414,302424,299763,2661,2990,6
1,Andalucía,11,Cádiz,1238714,1520,973238,29057,1002295,3485,621965,2230,624195,616079,606858,9221,8116,9
2,Andalucía,14,Córdoba,785240,935,630033,18308,648341,2358,450124,1651,451775,444376,438971,5405,7399,6
3,Andalucía,18,Granada,912075,1100,704847,50160,755007,4827,487734,3168,490902,482779,478251,4528,8123,7
4,Andalucía,21,Huelva,519932,650,391497,7522,399019,914,254766,563,255329,250681,247336,3345,4648,5


The key columns here are `provincia` (province) and `diputados` (seats per province). In the future, I may add a dictionary translating all the column names, together with documentation on what each variable means.

Anyhow, let us now check the dimensions of this dataframe:

In [3]:
general_data.shape

(52, 17)

It has 52 rows: one per province in Spain.

#### Extract results of November 2019 elections

For this, we will use a different function, `load_results`:

In [4]:
results = sp.load_data.load_results('votes')

results.head()

Unnamed: 0,provincia,party,votes
0,Almería,PSOE,89295
1,Cádiz,PSOE,188271
2,Córdoba,PSOE,146761
3,Granada,PSOE,160190
4,Huelva,PSOE,91656


Here we see that each row provides the following information:
- provincia: province
- party: political party
- votes: number of votes obtained by that party in that province

In [5]:
results.shape

(3484, 3)

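We can already find out how many political parties ran in the general election of November 2019. Since the table has one row per province and party, dividing the number of rows by the number of provinces gives the number of parties: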

In [6]:
results.shape[0] // 52  # 52 is the number of provinces in Spain

67

There were 67 political parties taking part in the event.
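The division above relies on the table holding exactly one row for every province and party combination. As a cross-check, the distinct party labels can also be counted directly. The sketch below uses a tiny made-up table (the parties "A" and "B" and their vote counts are hypothetical, not real election data) with the same column names as `results`:

```python
import pandas as pd

# Toy long-format table: hypothetical parties and made-up vote counts
toy_results = pd.DataFrame({
    "provincia": ["Almería", "Cádiz", "Almería", "Cádiz"],
    "party": ["A", "A", "B", "B"],
    "votes": [10, 30, 40, 20],
})

# Count distinct labels instead of dividing row counts
n_provinces = toy_results["provincia"].nunique()
n_parties = toy_results["party"].nunique()

# On a complete province x party grid, both approaches agree
rows_per_province = len(toy_results) // n_provinces
print(n_parties, rows_per_province)
```

On the real data, `results['party'].nunique()` should give the same answer as the division, provided every party has a row in every province.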

In the notebook `explore_the_data`, I will walk you through some interesting facts that we can already gather from this data, such as the most populated provinces in Spain or where a party won its highest percentage of votes cast.

### Simulate the Results of the Spanish Elections

In this section, we want to derive the number of seats won by each political party in each province from the voting data. (Of course, it would be trivial to read it directly from the dataframe `results_by_province`, as it already contains this information!)

To simulate the results, we only need to call `dhondt_rule_long`. The name of this function is no coincidence: the [D'Hondt rule](https://en.wikipedia.org/wiki/D%27Hondt_method) is the rule used to allocate seats at the province level, and "long" refers to the shape of the results dataframe (one row per province and party, rather than one column per political party).
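To make the rule concrete, here is a minimal sketch of a D'Hondt allocation for a single province. It is not the code inside `spanish_elections`: the function name `dhondt_single_province` and the toy vote counts are made up for illustration, and the sketch ignores electoral thresholds and official tie-breaking rules.

```python
def dhondt_single_province(votes, n_seats):
    """Allocate n_seats among parties with the D'Hondt highest-averages rule.

    votes: dict mapping party name -> number of votes (hypothetical input format).
    Returns a dict mapping party name -> seats won.
    """
    seats = {party: 0 for party in votes}
    for _ in range(n_seats):
        # Each party's current quotient is votes / (seats already won + 1);
        # the next seat goes to the party with the highest quotient.
        # On an exact tie, max() simply picks the first party in dict order.
        winner = max(votes, key=lambda party: votes[party] / (seats[party] + 1))
        seats[winner] += 1
    return seats


# Toy example with made-up vote counts (not real election data)
toy_votes = {"A": 100_000, "B": 80_000, "C": 30_000, "D": 20_000}
toy_seats = dhondt_single_province(toy_votes, n_seats=8)
print(toy_seats)  # {'A': 4, 'B': 3, 'C': 1, 'D': 0}
```

The seats go, in order, to the quotients 100000, 80000, 50000, 40000, 33333, 30000, 26667 and 25000, which is why party D, whose best quotient is 20000, is left without a seat.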

But to allocate seats per province, we need to know how many seats are assigned to each province. We can get that for the November 2019 elections by calling:

In [7]:
seats_per_prov = sp.load_data.load_seats_by_province()

Now we can call the function that runs the D'Hondt rule. But let us first have a look at its signature:

In [8]:
help(sp.dhondt_rule_long)

Help on function dhondt_rule_long in module spanish_elections.dhondt_rule:

dhondt_rule_long(results:pandas.core.frame.DataFrame, n_seats:dict, province_col='province', party_col='party', votes_col='votes', seats_col='seats') -> pandas.core.frame.DataFrame
    Runs `dhondt_rule_long_single province` for each province in results dataframe.
    
    Never works in place.



We see that the function can be adapted to run on a dataframe with different column names. In our case, most default values are correct (e.g. `votes_col` is "votes"), except for the province column, so we will let the function know that it is actually named "provincia":

In [9]:
results_with_seats = sp.dhondt_rule_long(results, seats_per_prov, province_col='provincia')

Compared with `results`, the returned dataframe shows a number of changes:
1. The order of the rows has been modified. This is due to an internal call to pandas groupby.
2. Most importantly, two additional columns have been added: `total_seats` and `seats`. `total_seats` is the total number of seats in the province, and `seats` is the number of seats won by the political party in that province.
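The reordering in the first point is consistent with how pandas groupby behaves by default: group keys are sorted unless `sort=False` is passed. The sketch below shows this on a made-up table (hypothetical parties and vote counts); it only illustrates pandas itself and makes no claim about how `dhondt_rule_long` calls groupby internally.

```python
import pandas as pd

# Toy table in which Cádiz appears before Almería (made-up data)
toy = pd.DataFrame({
    "provincia": ["Cádiz", "Almería", "Cádiz", "Almería"],
    "party": ["A", "A", "B", "B"],
    "votes": [30, 10, 20, 40],
})

# Default behaviour: group keys come back sorted
sorted_keys = list(toy.groupby("provincia")["votes"].sum().index)

# sort=False keeps the order in which the keys first appear
original_keys = list(toy.groupby("provincia", sort=False)["votes"].sum().index)

print(sorted_keys)    # ['Almería', 'Cádiz']
print(original_keys)  # ['Cádiz', 'Almería']
```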

### Aggregate results

We can use `sp.summarise.agg_results` to add up the seats won by each party across all provinces:

In [10]:
agg_results = sp.summarise.agg_results(results_with_seats,
                                       result_col='seats')
agg_results

party
BNG                       1
CCa-PNC-NC                2
CUP-PR                    2
Cs                       10
EAJ-PNV                   6
ECP-GUANYEM EL CANVI      7
EH Bildu                  5
ERC-SOBIRANISTES         13
JxCAT-JUNTS               8
MÁS PAÍS-EQUO             2
MÉS COMPROMÍS             1
NA+                       2
PODEMOS-EU                2
PODEMOS-IU               26
PP                       89
PRC                       1
PSOE                    120
VOX                      52
¡TERUEL EXISTE!           1
Name: seats, dtype: int64
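The output above looks like a per-party sum of the `seats` column that keeps only the parties with at least one seat. A rough pandas equivalent, written under that assumption (it is a guess at the behaviour, not the actual implementation of `agg_results`) and run on made-up data with hypothetical parties, could be:

```python
import pandas as pd

# Toy long-format table with seats per province and party (made-up data)
toy_with_seats = pd.DataFrame({
    "provincia": ["Almería", "Almería", "Almería", "Cádiz", "Cádiz", "Cádiz"],
    "party": ["A", "B", "C", "A", "B", "C"],
    "seats": [2, 1, 0, 3, 2, 0],
})

# Sum seats per party across provinces
toy_totals = toy_with_seats.groupby("party")["seats"].sum()

# Assumption: parties without any seat are dropped from the summary
toy_totals = toy_totals[toy_totals > 0]
print(toy_totals.to_dict())  # {'A': 5, 'B': 3}
```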

We can check that the results are correct by loading the seats from the actual elections:

In [11]:
actual_results = sp.load_data.load_results(type='seats')
actual_results = sp.summarise.agg_results(actual_results,
                                         result_col='seats')
actual_results

party
BNG                       1
CCa-PNC-NC                2
CUP-PR                    2
Cs                       10
EAJ-PNV                   6
ECP-GUANYEM EL CANVI      7
EH Bildu                  5
ERC-SOBIRANISTES         13
JxCAT-JUNTS               8
MÁS PAÍS-EQUO             2
MÉS COMPROMÍS             1
NA+                       2
PODEMOS-EU                2
PODEMOS-IU               26
PP                       89
PRC                       1
PSOE                    120
VOX                      52
¡TERUEL EXISTE!           1
Name: seats, dtype: int64

In [12]:
(actual_results == agg_results).all()

True
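An element-wise `==` between two Series expects both to carry the same index labels in the same order, and recent pandas versions raise an error otherwise. A slightly more defensive comparison, sketched here on made-up Series with hypothetical parties, sorts both by label first:

```python
import pandas as pd

# Two toy Series with the same content but a different label order
toy_simulated = pd.Series({"A": 5, "B": 3}, name="seats")
toy_actual = pd.Series({"B": 3, "A": 5}, name="seats")

# Sort by party label so that the comparison does not depend on row order
same = toy_simulated.sort_index().equals(toy_actual.sort_index())
print(same)  # True

# Alternative: raises an AssertionError describing any mismatch
pd.testing.assert_series_equal(toy_simulated.sort_index(), toy_actual.sort_index())
```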

We got them: the simulated seat counts match the official ones! In the next section, we will see how to use blocks to explore how merging political parties would change the election results :)