# Study Catalan Elections Dataset

Load libraries:

In [22]:
import pandas as pd

Load the dataset:

In [23]:
df = pd.read_csv('../../data/catalan-elections-data.csv')

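Columns such as `territori_codi` and `mesa` hold identifiers rather than quantities, so it can help to tell `read_csv` to read them as strings. The sketch below uses a tiny in-memory CSV with made-up rows instead of the real file. Which columns to pin is an assumption, not something the notebook prescribes.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the real CSV; the rows are invented for illustration.
sample_csv = io.StringIO(
    "id_eleccio,territori_codi,mesa,vots\n"
    "A19801,08019,A,120\n"
    "M19791,17079,B,45\n"
)

# Reading identifier-like columns as strings keeps leading zeros
# and spares pandas from guessing a type for them.
sample = pd.read_csv(
    sample_csv,
    dtype={"id_eleccio": str, "territori_codi": str, "mesa": str},
)
print(sample.dtypes)
```

The same `dtype` mapping could be passed when loading the full file.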


Inspect the structure of the dataset:

In [28]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17518284 entries, 0 to 17518283
Data columns (total 21 columns):
 #   Column                   Dtype  
---  ------                   -----  
 0   index_autonumeric        int64  
 1   id_eleccio               object 
 2   nom_eleccio              object 
 3   id_nivell_territorial    object 
 4   nom_nivell_territorial   object 
 5   territori_codi           object 
 6   territori_nom            object 
 7   secci_                   float64
 8   candidatura_codi         int64  
 9   candidatura_denominacio  object 
 10  candidatura_sigles       object 
 11  vots                     int64  
 12  escons                   float64
 13  candidatura_color        object 
 14  candidatura_logotip      object 
 15  districte                float64
 16  mesa                     object 
 17  candidat_posicio         float64
 18  agrupacio_codi           float64
 19  agrupacio_denominacio    object 
 20  agrupacio_sigles         object 
dtypes: flo

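The columns are described in the following table (note that `SECCIÓ` appears as `secci_` in the DataFrame):

| Column name               | Description                                            | Type      |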
|---------------------------|--------------------------------------------------------|-----------|
| INDEX_AUTONUMERIC         | Autonumeric index identifier for the row               | Plain Text|
| ID_ELECCIO                | Identifier of the election (Type+Year+Sequential)      | Plain Text|
| NOM_ELECCIO               | Name of the electoral process                          | Plain Text|
| ID_NIVELL_TERRITORIAL     | Identifier of the territorial level (Municipality, Vegueria, County...) | Plain Text|
| NOM_NIVELL_TERRITORIAL    | Name of the territorial level of the record (Municipality, County...) | Plain Text|
| TERRITORI_CODI            | Territory code                                         | Plain Text|
| TERRITORI_NOM             | Name of the territory                                  | Plain Text|
| DISTRICTE                 | Electoral district                                     | Plain Text|
| SECCIÓ                    | Electoral section                                      | Plain Text|
| MESA                      | Electoral table                                        | Plain Text|
| CANDIDATURA_CODI          | Code of the candidacy                                  | Plain Text|
| CANDIDATURA_DENOMINACIO   | Name of the candidacy                                  | Plain Text|
| CANDIDATURA_SIGLES        | Acronym of the candidacy                               | Plain Text|
| CANDIDAT_POSICIO          | Position of the candidate in the list                  | Plain Text|
| AGRUPACIO_CODI            | Code of the group of candidacies                       | Plain Text|
| AGRUPACIO_DENOMINACIO     | Name of the group of candidacies                       | Plain Text|
| AGRUPACIO_SIGLES          | Acronym of the group of candidacies                    | Plain Text|
| VOTS                      | Votes of the candidacy                                 | Number    |
| ESCONS                    | Seats of the candidacy                                 | Number    |
| CANDIDATURA_COLOR         | Color of the candidacy                                 | Plain Text|
| CANDIDATURA_LOGOTIP       | Logo of the candidacy                                  | Plain Text|
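The table lists fields such as `DISTRICTE` as plain text, while the `df.info()` output above reports `districte` as `float64`. One possible explanation, which would need checking on the real data, is missing values: pandas stores a numeric column that contains `NaN` as float. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset.
toy = pd.DataFrame({
    "districte": [1, None, 3],   # a missing value forces float64
    "mesa": ["A", "B", None],
    "vots": [10, 20, 30],
})

# Count missing values per column.
missing = toy.isna().sum()
print(toy.dtypes)
print(missing)
```

Running `df.isna().sum()` on the full DataFrame would show which columns are affected.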

Split `id_eleccio` into `type` (the first character), `year` (the next four characters) and `sequential` (the remainder):

In [36]:
df['type'] = df['id_eleccio'].str[:1]
df['year'] = df['id_eleccio'].str[1:5].astype(int)
df['sequential'] = df['id_eleccio'].str[5:]
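The same split can be done in one pass with a regular expression, which also makes the assumed layout (one letter, four digits, then the rest) explicit. The identifiers below are illustrative, built from the Type+Year+Sequential description. This is a sketch of an alternative, not the notebook's method.

```python
import pandas as pd

# Illustrative identifiers following the Type+Year+Sequential layout.
ids = pd.Series(["A19801", "M19791", "G19771"], name="id_eleccio")

# Named groups become the column names of the resulting DataFrame.
parts = ids.str.extract(r"^(?P<type>[A-Z])(?P<year>\d{4})(?P<sequential>.*)$")
parts["year"] = parts["year"].astype(int)
print(parts)
```

Rows that do not match the pattern come back as `NaN`, which makes malformed identifiers easy to spot before the `astype(int)` conversion.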

Show the types of elections:

In [43]:
types = df[['type', 'nom_eleccio']].groupby(['type']).first()
print(types)
print(len(types))

                                   nom_eleccio
type                                          
A     Eleccions al Parlament de Catalunya 1980
C          Eleccions a Consells Comarcals 1987
D     Eleccions a Diputacions Provincials 2007
E          Eleccions al Parlament Europeu 1987
G                    Eleccions al Congrés 1979
M                    Eleccions Municipals 1979
S                      Eleccions al Senat 1993
V     Eleccions al Consell General d'Aran 1991
8


Now we know that the dataset contains data from 8 different types of elections:

| Type | Election Type Name                          |
|------|---------------------------------------------|
| A    | Elections to the Parliament of Catalonia    |
| C    | Elections to the County Councils            |
| D    | Elections to the Provincial Councils        |
| E    | Elections to the European Parliament        |
| G    | Elections to the Congress                   |
| M    | Municipal Elections                         |
| S    | Elections to the Senate                     |
| V    | Elections to the General Council of Aran    |
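If a readable label is needed later, the table above can be turned into a lookup. The dictionary name `TYPE_NAMES` and the sample codes are hypothetical; the labels are copied from the table.

```python
import pandas as pd

# English labels copied from the table; the dictionary name is hypothetical.
TYPE_NAMES = {
    "A": "Elections to the Parliament of Catalonia",
    "C": "Elections to the County Councils",
    "D": "Elections to the Provincial Councils",
    "E": "Elections to the European Parliament",
    "G": "Elections to the Congress",
    "M": "Municipal Elections",
    "S": "Elections to the Senate",
    "V": "Elections to the General Council of Aran",
}

# "X" is a deliberately unknown code to show what happens to unmapped values.
codes = pd.Series(["A", "M", "V", "X"])
names = codes.map(TYPE_NAMES)
print(names)
```

Codes missing from the dictionary map to `NaN`, so an unexpected election type would stand out.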

We only want data from the elections to the Parliament of Catalonia ('A'), municipal elections ('M'), elections to the European Parliament ('E') and elections to the Congress ('G'), so we filter the dataset:

In [45]:
df = df[df['type'].isin(['M', 'E', 'A', 'G'])]

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12339330 entries, 0 to 12359469
Data columns (total 24 columns):
 #   Column                   Dtype  
---  ------                   -----  
 0   index_autonumeric        int64  
 1   id_eleccio               object 
 2   nom_eleccio              object 
 3   id_nivell_territorial    object 
 4   nom_nivell_territorial   object 
 5   territori_codi           object 
 6   territori_nom            object 
 7   secci_                   float64
 8   candidatura_codi         int64  
 9   candidatura_denominacio  object 
 10  candidatura_sigles       object 
 11  vots                     int64  
 12  escons                   float64
 13  candidatura_color        object 
 14  candidatura_logotip      object 
 15  districte                float64
 16  mesa                     object 
 17  candidat_posicio         float64
 18  agrupacio_codi           float64
 19  agrupacio_denominacio    object 
 20  agrupacio_sigles         object 
 21  type       
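As the output shows, the filtered DataFrame keeps its original row labels (12339330 entries, with labels running up to 12359469). If a contiguous index is preferred, the filter can be followed by `reset_index`. The example uses a toy frame with made-up vote counts.

```python
import pandas as pd

# Toy frame with made-up vote counts.
toy = pd.DataFrame({
    "type": ["A", "C", "M", "V", "G"],
    "vots": [1, 2, 3, 4, 5],
})

# .copy() makes the result independent of the original frame,
# so later column assignments do not trigger chained-assignment warnings.
filtered = toy[toy["type"].isin(["M", "E", "A", "G"])].copy()
print(filtered.index.tolist())  # original labels are kept: [0, 2, 4]

# Renumber the rows from zero.
kept = filtered.reset_index(drop=True)
print(kept.index.tolist())
```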

Display the different elections in the dataset and count them:

In [48]:
elections = df[['type', 'year', 'sequential', 'nom_eleccio']].drop_duplicates().sort_values(['year'])
print(elections)

print(len(elections))

         type  year sequential                               nom_eleccio
5765157     G  1977          1                 Eleccions al Congrés 1977
10920114    M  1979          1                 Eleccions Municipals 1979
2266502     G  1979          1                 Eleccions al Congrés 1979
0           A  1980          1  Eleccions al Parlament de Catalunya 1980
2849748     G  1982          1                 Eleccions al Congrés 1982
11078000    M  1983          1                 Eleccions Municipals 1983
172580      A  1984          1  Eleccions al Parlament de Catalunya 1984
7842303     G  1986          1                 Eleccions al Congrés 1986
1544        E  1987          1       Eleccions al Parlament Europeu 1987
11114132    M  1987          1                 Eleccions Municipals 1987
388258      A  1988          1  Eleccions al Parlament de Catalunya 1988
762184      E  1989          1       Eleccions al Parlament Europeu 1989
8111467     G  1989          1                 Elec
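To see how many elections of each type are present, the deduplicated frame can be tallied by `type`. The sketch rebuilds four rows from the listing above; the frame name `elections_sample` is hypothetical.

```python
import pandas as pd

# Four rows copied from the listing above.
elections_sample = pd.DataFrame({
    "type": ["G", "M", "G", "A"],
    "year": [1977, 1979, 1979, 1980],
    "sequential": ["1", "1", "1", "1"],
    "nom_eleccio": [
        "Eleccions al Congrés 1977",
        "Eleccions Municipals 1979",
        "Eleccions al Congrés 1979",
        "Eleccions al Parlament de Catalunya 1980",
    ],
})

# Number of distinct elections per election type.
per_type = elections_sample["type"].value_counts()
print(per_type)
```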

Count the distinct candidacies (`candidatura_denominacio`):

In [26]:
candidatures = df.candidatura_denominacio.unique()
print(candidatures)

print(len(candidatures))

['Conservadors de Catalunya' 'Partit Socialista Unificat de Catalunya'
 'Partit dels Socialistes de Catalunya (PSC-PSOE)' ...
 'Coalició Republicano Socialista' 'Nación y Revolución'
 'Entesa Catalana de Progrés (PSC-ERC-ICV-EUiA)']
7441


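Check the number of distinct candidacies per election. The output below still lists elections to the General Council of Aran, so this cell was evidently run before the filter above was applied: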

In [29]:
elections_num_candidatures = df.groupby('nom_eleccio').candidatura_denominacio.nunique().sort_values(ascending=False)
print(elections_num_candidatures)

nom_eleccio
Eleccions Municipals 2019                   2228
Eleccions Municipals 2015                   1095
Eleccions Municipals 2011                    940
Eleccions Municipals 2007                    818
Eleccions Municipals 1979                    665
                                            ... 
Eleccions al Consell General d'Aran 1995       5
Eleccions al Consell General d'Aran 2015       5
Eleccions al Consell General d'Aran 2007       4
Eleccions al Consell General d'Aran 2011       4
Eleccions al Consell General d'Aran 1991       4
Name: candidatura_denominacio, Length: 85, dtype: int64
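To make the `groupby(...).nunique()` step concrete, here is the same computation on a toy frame. The party names are made up; the election names are taken from the dataset.

```python
import pandas as pd

# Toy rows: party names are invented, election names come from the dataset.
toy = pd.DataFrame({
    "nom_eleccio": ["Eleccions Municipals 2019"] * 3
                   + ["Eleccions al Congrés 1979"] * 2,
    "candidatura_denominacio": ["Party A", "Party B", "Party A",
                                "Party A", "Party A"],
})

# nunique() counts each candidacy name once per election,
# no matter how many rows (tables, sections) it appears in.
per_election = (
    toy.groupby("nom_eleccio")["candidatura_denominacio"]
       .nunique()
       .sort_values(ascending=False)
)
print(per_election)
```

Note that `nunique()` ignores missing names by default and treats differently spelled names as different candidacies.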
