### Introduction

In [1]:
import spanish_elections as sp_elect # should already be installed

### Reading November 2019 Election Data

The official results of the November 2019 Spanish general election are stored at the relative path `../data/input/PROV_02_201911_1.xlsx`. In future versions of this package, I may collect data from several elections and prepare it beforehand, so that these functions only need to read the prepared data. This part of the package is therefore likely to change considerably.

In [None]:
official_file_location = '../data/input/PROV_02_201911_1.xlsx'

#### Extract general data about provinces

First, I wish to extract some general data about each province, such as its name and the number of seats allocated in the November 2019 elections. We can do that by invoking the `extract_general_data` function:

In [5]:
general_data = sp_elect.extract_general_data(official_file_location)

general_data.head()

Unnamed: 0,comunidad,código de provincia,provincia,población,número de mesas,censo electoral sin cera,censo cera,total censo electoral,solicitudes voto cera aceptadas,total votantes cer,total votantes cera,total votantes,votos válidos,votos a candidaturas,votos en blanco,votos nulos,diputados
0,Andalucía,4,Almería,709340,809,460639,41988,502627,2923,303481,1933,305414,302424,299763,2661,2990,6
1,Andalucía,11,Cádiz,1238714,1520,973238,29057,1002295,3485,621965,2230,624195,616079,606858,9221,8116,9
2,Andalucía,14,Córdoba,785240,935,630033,18308,648341,2358,450124,1651,451775,444376,438971,5405,7399,6
3,Andalucía,18,Granada,912075,1100,704847,50160,755007,4827,487734,3168,490902,482779,478251,4528,8123,7
4,Andalucía,21,Huelva,519932,650,391497,7522,399019,914,254766,563,255329,250681,247336,3345,4648,5


The key columns are `provincia` (province) and `diputados` (seats per province). In the future, I may add a dictionary that translates all the column names and explains what each variable means.
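As a rough idea of what such a translation dictionary could look like, here is a minimal sketch. The English names and the helper `translate_columns` are my own assumptions for illustration; they are not part of `spanish_elections`.

```python
# Hypothetical English names for the 17 columns shown above.
# CERA is the electoral roll of Spanish voters resident abroad.
COLUMN_TRANSLATIONS = {
    "comunidad": "autonomous_community",
    "código de provincia": "province_code",
    "provincia": "province",
    "población": "population",
    "número de mesas": "polling_stations",
    "censo electoral sin cera": "electoral_roll_excluding_cera",
    "censo cera": "cera_roll",
    "total censo electoral": "total_electoral_roll",
    "solicitudes voto cera aceptadas": "accepted_cera_vote_requests",
    "total votantes cer": "total_cer_voters",
    "total votantes cera": "total_cera_voters",
    "total votantes": "total_voters",
    "votos válidos": "valid_votes",
    "votos a candidaturas": "votes_for_candidacies",
    "votos en blanco": "blank_votes",
    "votos nulos": "null_votes",
    "diputados": "seats",
}


def translate_columns(columns):
    """Translate known column names; leave unknown names untouched."""
    return [COLUMN_TRANSLATIONS.get(name, name) for name in columns]


translated = translate_columns(["provincia", "diputados", "unknown column"])
print(translated)
```

With the real dataframe, the same dictionary could be passed to pandas as `general_data.rename(columns=COLUMN_TRANSLATIONS)`.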

Anyhow, let us now check the dimensions of this dataframe:

In [6]:
general_data.shape

(52, 17)

It has 52 rows, one per constituency: the 50 provinces of Spain plus the autonomous cities of Ceuta and Melilla.

#### Extract results of November 2019 elections

For this, we use a different function, `extract_results_by_province`:

In [8]:
results = sp_elect.extract_results_by_province(official_file_location)

results.head()

Unnamed: 0,provincia,party,result,value
0,Almería,PSOE,votos,89295
1,Cádiz,PSOE,votos,188271
2,Córdoba,PSOE,votos,146761
3,Granada,PSOE,votos,160190
4,Huelva,PSOE,votos,91656


Here we see that each row provides the following information:
- provincia: province
- party: political party
- result: either 'votos' (votes) or 'diputados' (seats)
- value: result obtained by each political party in each province
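To make this long format concrete, the sketch below rebuilds the five rows printed above as plain Python records and selects the vote counts of one party. The variable names are illustrative only; with the real dataframe, the same selection would be a boolean mask on `results`.

```python
# The five rows shown in the output above, as plain records.
sample_rows = [
    {"provincia": "Almería", "party": "PSOE", "result": "votos", "value": 89295},
    {"provincia": "Cádiz", "party": "PSOE", "result": "votos", "value": 188271},
    {"provincia": "Córdoba", "party": "PSOE", "result": "votos", "value": 146761},
    {"provincia": "Granada", "party": "PSOE", "result": "votos", "value": 160190},
    {"provincia": "Huelva", "party": "PSOE", "result": "votos", "value": 91656},
]

# Keep only vote rows for PSOE and index them by province.
psoe_votes = {
    row["provincia"]: row["value"]
    for row in sample_rows
    if row["party"] == "PSOE" and row["result"] == "votos"
}

# Total PSOE votes over these five provinces only.
psoe_total = sum(psoe_votes.values())
print(psoe_votes["Granada"], psoe_total)
```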

In the notebook `explore_the_data`, I walk through some interesting facts that we can already gather from this data, such as the most populated provinces in Spain or where a party gained its highest percentage of votes cast.

### Simulate the Results of the Spanish Elections

In this section, we want to compute the number of seats won by each political party in each province from the voting data alone. (Of course, it would be trivial to read it directly from the `results` dataframe, which already contains this information!)
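For orientation, seats in Spanish general elections are allocated within each province by the D'Hondt highest-averages method. The sketch below is a minimal, independent illustration of that rule for a single province. The function `dhondt` and the vote counts are made up for this example and say nothing about how the package implements it; the sketch also ignores the 3% threshold applied in each province and breaks ties by dictionary order.

```python
def dhondt(votes_by_party, seats):
    """Allocate `seats` among parties with the D'Hondt rule (illustrative sketch)."""
    allocation = {party: 0 for party in votes_by_party}
    for _ in range(seats):
        # A party's current quotient is votes / (seats already won + 1);
        # the next seat goes to the party with the highest quotient.
        winner = max(
            votes_by_party,
            key=lambda party: votes_by_party[party] / (allocation[party] + 1),
        )
        allocation[winner] += 1
    return allocation


# Hypothetical vote counts for one province with 8 seats.
example_votes = {"A": 100_000, "B": 80_000, "C": 30_000, "D": 20_000}
example_allocation = dhondt(example_votes, seats=8)
print(example_allocation)  # {'A': 4, 'B': 3, 'C': 1, 'D': 0}
```

Running the same function once per province, with that province's value of `diputados` as `seats`, would give a full simulation.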

In [None]:
votes = results[results.result == 'votos']
votes.shape

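We can already find out how many political parties stood in the November 2019 general election. Assuming `votes` holds one row per party and province, dividing its row count (3484) by the 52 provinces gives the number of parties: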

In [10]:
3484 // 52

67

There were 67 political parties taking part in the event.
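The division above relies on every party having exactly one row per province. A sketch that counts distinct party labels directly avoids that assumption. The records below are toy data invented for illustration; on the real dataframe, the pandas equivalent would be `votes['party'].nunique()`.

```python
# Toy records in the same long format (made-up parties and values).
toy_votes = [
    {"provincia": "P1", "party": "Party A", "result": "votos", "value": 10},
    {"provincia": "P1", "party": "Party B", "result": "votos", "value": 7},
    {"provincia": "P2", "party": "Party A", "result": "votos", "value": 4},
    {"provincia": "P2", "party": "Party B", "result": "votos", "value": 9},
    {"provincia": "P2", "party": "Party C", "result": "votos", "value": 1},
]

# Count distinct party labels, regardless of how many provinces each appears in.
n_parties = len({row["party"] for row in toy_votes})
print(n_parties)  # 3
```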

To simulate the results, we only need to call `dhondt_rule_long`. The name of this function is no coincidence: 