# Modeling a territory using PCA

In this notebook we model a territory, in this case the whole of Spain, using PCA on the November 2019 general election.

In the previous notebook we relied on sections chosen from the province of Zaragoza to model that province. Here we select the sections from a short list of municipalities, so we are being quite restrictive from the start. Some of the municipalities are located in autonomous communities (CCAA) with regionalist or nationalist parties, in an attempt to capture the particularities of the vote in those territories.

The method up to the point where PCA is applied is exactly the same as the one used in the linear regression notebook, so we will not comment on it in as much detail as we did there. We will also apply the resulting model to see how well it predicts the results of the June 2016 election from the equivalent sections in that election.

We start by loading the libraries and the November 2019 dataset.

In [1]:
import pandas as pd
import numpy as np
import random

In [2]:
import boto3

BUCKET_NAME = 'electomedia' 

# Replace with your own access credentials.
s3 = boto3.resource('s3', aws_access_key_id = 'xxxxxxxxxxxx', 
                          aws_secret_access_key= 'xxxxxxxxxxxxxxxxxxx')

In [3]:
import botocore.exceptions

KEY = 'datos-elecciones-generales-unificados/gen_N19_unif_cols_prov_copia.txt' 

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'gen_N19_unif_cols_prov_copia.txt')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

In [4]:
strings = {'Sección' : 'str', 'cod_ccaa' : 'str', 'cod_prov' : 'str', 'cod_mun' : 'str', 'cod_sec' : 'str'}

In [5]:
df_eleccion_comp = pd.read_csv('gen_N19_unif_cols_prov_copia.txt', dtype = strings)

In [6]:
df_eleccion_comp

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,...,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
0,022019111010400101001,01,04,04001,0400101001,Andalucía,Almería,Abla,1002,717,...,20172.0,19546.0,5574.0,4833.0,3286.0,3082.0,403.0,471.0,"{'PP': 193, 'PSOE': 310, 'Cs': 47, 'UP': 30, '...","[('PSOE', 310), ('PP', 193), ('VOX', 122), ('C..."
1,022019111010400201001,01,04,04002,0400201001,Andalucía,Almería,Abrucena,1013,711,...,17841.0,17115.0,4640.0,4048.0,3418.0,2770.0,568.0,620.0,"{'PP': 111, 'PSOE': 349, 'Cs': 45, 'UP': 42, '...","[('PSOE', 349), ('VOX', 147), ('PP', 111), ('C..."
2,022019111010400301001,01,04,04003,0400301001,Andalucía,Almería,Adra,667,484,...,26498.0,24688.0,5121.0,4795.0,2499.0,2301.0,337.0,333.0,"{'PP': 176, 'PSOE': 128, 'Cs': 15, 'UP': 34, '...","[('PP', 176), ('PSOE', 128), ('VOX', 116), ('U..."
3,022019111010400301002,01,04,04003,0400301002,Andalucía,Almería,Adra,1306,909,...,25677.0,23400.0,5381.0,4837.0,1815.0,1724.0,343.0,464.0,"{'PP': 251, 'PSOE': 220, 'Cs': 51, 'UP': 58, '...","[('VOX', 312), ('PP', 251), ('PSOE', 220), ('U..."
4,022019111010400301003,01,04,04003,0400301003,Andalucía,Almería,Adra,1551,975,...,22051.0,19687.0,5224.0,4044.0,1170.0,1198.0,416.0,476.0,"{'PP': 292, 'PSOE': 202, 'Cs': 73, 'UP': 52, '...","[('VOX', 327), ('PP', 292), ('PSOE', 202), ('C..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36297,022019111195200108011,19,52,52001,5200108011,Melilla,Melilla,Melilla,1638,1021,...,66352.0,62632.0,11378.0,11119.0,1508.0,1274.0,167.0,166.0,"{'PP': 303, 'PSOE': 140, 'Cs': 30, 'UP': 28, '...","[('Otros', 348), ('PP', 303), ('VOX', 158), ('..."
36298,022019111195200108012,19,52,52001,5200108012,Melilla,Melilla,Melilla,1676,1057,...,50730.0,50839.0,13272.0,13038.0,2763.0,2445.0,169.0,177.0,"{'PP': 463, 'PSOE': 205, 'Cs': 36, 'UP': 35, '...","[('PP', 463), ('VOX', 210), ('PSOE', 205), ('O..."
36299,022019111195200108013,19,52,52001,5200108013,Melilla,Melilla,Melilla,1132,638,...,37816.0,36729.0,10102.0,9640.0,1807.0,1615.0,234.0,252.0,"{'PP': 208, 'PSOE': 113, 'Cs': 31, 'UP': 25, '...","[('PP', 208), ('VOX', 144), ('PSOE', 113), ('O..."
36300,022019111195200108014,19,52,52001,5200108014,Melilla,Melilla,Melilla,899,527,...,29898.0,31384.0,5923.0,6061.0,2463.0,2136.0,244.0,284.0,"{'PP': 200, 'PSOE': 87, 'Cs': 13, 'UP': 12, 'I...","[('PP', 200), ('VOX', 126), ('PSOE', 87), ('Ot..."


We select the sections to be modeled, which are those of the whole of Spain. With the three filter lists left empty, no filter is applied.

In [7]:
ccaa_mod = []

provincia_mod = []

municipio_mod = []

secciones_mod = df_eleccion_comp

In [8]:
if len(ccaa_mod) > 0:
    secciones_mod = secciones_mod.loc[secciones_mod['CCAA'].isin(ccaa_mod)]

if len(provincia_mod) > 0:
    secciones_mod = secciones_mod.loc[secciones_mod['Provincia'].isin(provincia_mod)]

if len(municipio_mod) > 0:
    secciones_mod = secciones_mod.loc[secciones_mod['Municipio'].isin(municipio_mod)]



In [9]:
secciones_mod

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,...,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
0,022019111010400101001,01,04,04001,0400101001,Andalucía,Almería,Abla,1002,717,...,20172.0,19546.0,5574.0,4833.0,3286.0,3082.0,403.0,471.0,"{'PP': 193, 'PSOE': 310, 'Cs': 47, 'UP': 30, '...","[('PSOE', 310), ('PP', 193), ('VOX', 122), ('C..."
1,022019111010400201001,01,04,04002,0400201001,Andalucía,Almería,Abrucena,1013,711,...,17841.0,17115.0,4640.0,4048.0,3418.0,2770.0,568.0,620.0,"{'PP': 111, 'PSOE': 349, 'Cs': 45, 'UP': 42, '...","[('PSOE', 349), ('VOX', 147), ('PP', 111), ('C..."
2,022019111010400301001,01,04,04003,0400301001,Andalucía,Almería,Adra,667,484,...,26498.0,24688.0,5121.0,4795.0,2499.0,2301.0,337.0,333.0,"{'PP': 176, 'PSOE': 128, 'Cs': 15, 'UP': 34, '...","[('PP', 176), ('PSOE', 128), ('VOX', 116), ('U..."
3,022019111010400301002,01,04,04003,0400301002,Andalucía,Almería,Adra,1306,909,...,25677.0,23400.0,5381.0,4837.0,1815.0,1724.0,343.0,464.0,"{'PP': 251, 'PSOE': 220, 'Cs': 51, 'UP': 58, '...","[('VOX', 312), ('PP', 251), ('PSOE', 220), ('U..."
4,022019111010400301003,01,04,04003,0400301003,Andalucía,Almería,Adra,1551,975,...,22051.0,19687.0,5224.0,4044.0,1170.0,1198.0,416.0,476.0,"{'PP': 292, 'PSOE': 202, 'Cs': 73, 'UP': 52, '...","[('VOX', 327), ('PP', 292), ('PSOE', 202), ('C..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36297,022019111195200108011,19,52,52001,5200108011,Melilla,Melilla,Melilla,1638,1021,...,66352.0,62632.0,11378.0,11119.0,1508.0,1274.0,167.0,166.0,"{'PP': 303, 'PSOE': 140, 'Cs': 30, 'UP': 28, '...","[('Otros', 348), ('PP', 303), ('VOX', 158), ('..."
36298,022019111195200108012,19,52,52001,5200108012,Melilla,Melilla,Melilla,1676,1057,...,50730.0,50839.0,13272.0,13038.0,2763.0,2445.0,169.0,177.0,"{'PP': 463, 'PSOE': 205, 'Cs': 36, 'UP': 35, '...","[('PP', 463), ('VOX', 210), ('PSOE', 205), ('O..."
36299,022019111195200108013,19,52,52001,5200108013,Melilla,Melilla,Melilla,1132,638,...,37816.0,36729.0,10102.0,9640.0,1807.0,1615.0,234.0,252.0,"{'PP': 208, 'PSOE': 113, 'Cs': 31, 'UP': 25, '...","[('PP', 208), ('VOX', 144), ('PSOE', 113), ('O..."
36300,022019111195200108014,19,52,52001,5200108014,Melilla,Melilla,Melilla,899,527,...,29898.0,31384.0,5923.0,6061.0,2463.0,2136.0,244.0,284.0,"{'PP': 200, 'PSOE': 87, 'Cs': 13, 'UP': 12, 'I...","[('PP', 200), ('VOX', 126), ('PSOE', 87), ('Ot..."


Next we sum the results of all the sections in Spain, normalize by the total census, and create the column that will serve as the vector 'y' in the model.

In [10]:
secciones_mod_lista = list(secciones_mod['Sección']) 

In [11]:
cols_validas_mod = ['Censo_Esc', 'Votos_Total', 'Nulos', 'Votos_Válidos', 'Blanco', 'V_Cand', 'PP', 'PSOE', 'Cs', 'UP',
       'IU', 'VOX', 'UPyD', 'MP', 'CiU', 'ERC', 'JxC', 'CUP', 'DiL', 'PNV',
       'Bildu', 'Amaiur', 'CC', 'FA', 'TE', 'BNG', 'PRC', 'GBai', 'Compromis',
       'PACMA', 'Otros']

In [12]:
secciones_mod = secciones_mod[cols_validas_mod]

In [13]:
secciones_mod

Unnamed: 0,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,...,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros
0,1002,717,7,710,3,707,193,310,47,30,...,0,0,0,0,0,0,0,0,3,2
1,1013,711,12,699,1,698,111,349,45,42,...,0,0,0,0,0,0,0,0,2,2
2,667,484,7,477,5,472,176,128,15,34,...,0,0,0,0,0,0,0,0,3,0
3,1306,909,3,906,5,901,251,220,51,58,...,0,0,0,0,0,0,0,0,6,3
4,1551,975,12,963,9,954,292,202,73,52,...,0,0,0,0,0,0,0,0,5,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36297,1638,1021,3,1018,11,1007,303,140,30,28,...,0,0,0,0,0,0,0,0,0,348
36298,1676,1057,9,1048,2,1046,463,205,36,35,...,0,0,0,0,0,0,0,0,0,97
36299,1132,638,5,633,4,629,208,113,31,25,...,0,0,0,0,0,0,0,0,0,108
36300,899,527,4,523,0,523,200,87,13,12,...,0,0,0,0,0,0,0,0,0,85


In [14]:
censo_mod = secciones_mod['Censo_Esc'].sum()

In [15]:
censo_mod

34871714

In [16]:
modelizacion = pd.DataFrame(secciones_mod.sum(), columns = ['Modelización'])

In [17]:
modelizacion['Modelización'] = modelizacion['Modelización'] / modelizacion['Modelización']['Censo_Esc']

This will be the 'y' column of the model, once we remove the census row, which is always 1 by construction:

In [18]:
modelizacion

Unnamed: 0,Modelización
Censo_Esc,1.0
Votos_Total,0.698612
Nulos,0.007127
Votos_Válidos,0.691485
Blanco,0.006201
V_Cand,0.685284
PP,0.146826
PSOE,0.193633
Cs,0.046953
UP,0.088816


In [19]:
modelizacion = modelizacion.drop(['Censo_Esc']) 

In [20]:
modelizacion

Unnamed: 0,Modelización
Votos_Total,0.698612
Nulos,0.007127
Votos_Válidos,0.691485
Blanco,0.006201
V_Cand,0.685284
PP,0.146826
PSOE,0.193633
Cs,0.046953
UP,0.088816
IU,0.0


In [21]:
modelizacion.shape

(30, 1)
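
The construction of the target column can be sketched on a toy table. The data here are invented for illustration (three hypothetical sections and two parties); only the steps mirror the cells above: sum over sections, divide by the total census, and drop the census row.

```python
import pandas as pd

# Toy data (invented): three sections with their census and vote counts.
toy = pd.DataFrame({
    'Censo_Esc':   [1000, 500, 1500],
    'Votos_Total': [700, 300, 900],
    'PP':          [300, 100, 350],
    'PSOE':        [400, 200, 550],
})

# Sum every column over all sections, then divide by the total census.
totales = toy.sum()
toy_modelizacion = (totales / totales['Censo_Esc']).drop('Censo_Esc')

print(toy_modelizacion)
```

Each remaining entry is a fraction of the total census, so turnout and party shares are directly comparable across territories of different size.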

Now we define the municipalities from which the sections will be selected. We choose a few small municipalities in provinces or autonomous communities where regionalist or nationalist parties stood.

In [22]:
ccaa_select = []

provincia_select = []

municipio_select = ['Reus', 'Eibar', 'Laredo', 'Teruel', 'Paterna', 'Telde', 'Calatayud', 'Lucena', 'Verín']

secciones_select = df_eleccion_comp

In [23]:
if len(ccaa_select) > 0:
    secciones_select = secciones_select.loc[secciones_select['CCAA'].isin(ccaa_select)]

if len(provincia_select) > 0:
    secciones_select = secciones_select.loc[secciones_select['Provincia'].isin(provincia_select)]

if len(municipio_select) > 0:
    secciones_select = secciones_select.loc[secciones_select['Municipio'].isin(municipio_select)]
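
The same three-step filter is applied twice in this notebook, once for the territory to model and once for the sections to select from. As a sketch only, it could be wrapped in a helper; `filtrar_secciones` is a hypothetical name that does not appear in the original notebook, and the toy data are invented.

```python
import pandas as pd

def filtrar_secciones(df, ccaa=(), provincias=(), municipios=()):
    """Hypothetical helper: apply each non-empty filter in turn."""
    filtros = {'CCAA': ccaa, 'Provincia': provincias, 'Municipio': municipios}
    for columna, valores in filtros.items():
        if len(valores) > 0:
            df = df.loc[df[columna].isin(valores)]
    return df

# Toy data (invented) with the three columns used by the filters.
toy = pd.DataFrame({
    'CCAA':      ['Aragón', 'Aragón', 'Cataluña'],
    'Provincia': ['Teruel', 'Zaragoza', 'Tarragona'],
    'Municipio': ['Teruel', 'Calatayud', 'Reus'],
})

todas = filtrar_secciones(toy)                                    # no filter: all rows kept
elegidas = filtrar_secciones(toy, municipios=['Reus', 'Teruel'])  # municipality filter only
print(elegidas)
```

An empty filter leaves the DataFrame untouched, which matches how the empty lists behave in the cells above.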



In [24]:
secciones_select

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,...,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
1791,022019111011403801001,01,14,14038,1403801001,Andalucía,Córdoba,Lucena,913,699,...,29436.0,28262.0,6487.0,5455.0,3588.0,3238.0,225.0,238.0,"{'PP': 263, 'PSOE': 97, 'Cs': 65, 'UP': 53, 'I...","[('PP', 263), ('VOX', 205), ('PSOE', 97), ('Cs..."
1792,022019111011403801002,01,14,14038,1403801002,Andalucía,Córdoba,Lucena,1023,725,...,22534.0,20780.0,5473.0,4532.0,2351.0,2333.0,353.0,482.0,"{'PP': 157, 'PSOE': 171, 'Cs': 79, 'UP': 46, '...","[('VOX', 237), ('PSOE', 171), ('PP', 157), ('C..."
1793,022019111011403801003,01,14,14038,1403801003,Andalucía,Córdoba,Lucena,1697,1218,...,19944.0,18935.0,5166.0,3985.0,2004.0,1940.0,378.0,483.0,"{'PP': 259, 'PSOE': 357, 'Cs': 124, 'UP': 126,...","[('PSOE', 357), ('VOX', 299), ('PP', 259), ('U..."
1794,022019111011403801004,01,14,14038,1403801004,Andalucía,Córdoba,Lucena,1872,1366,...,20709.0,19069.0,6228.0,4956.0,1207.0,1194.0,434.0,483.0,"{'PP': 233, 'PSOE': 339, 'Cs': 158, 'UP': 124,...","[('VOX', 456), ('PSOE', 339), ('PP', 233), ('C..."
1795,022019111011403801005,01,14,14038,1403801005,Andalucía,Córdoba,Lucena,844,647,...,28191.0,20660.0,5770.0,4814.0,3693.0,3470.0,258.0,295.0,"{'PP': 217, 'PSOE': 102, 'Cs': 54, 'UP': 67, '...","[('PP', 217), ('VOX', 179), ('PSOE', 102), ('U..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35281,022019111174619001044,17,46,46190,4619001044,La Rioja,Valencia,Paterna,1178,930,...,33791.0,33686.0,12562.0,12689.0,555.0,610.0,230.0,356.0,"{'PP': 172, 'PSOE': 202, 'Cs': 140, 'UP': 101,...","[('VOX', 219), ('PSOE', 202), ('PP', 172), ('C..."
35282,022019111174619001045,17,46,46190,4619001045,La Rioja,Valencia,Paterna,691,517,...,32873.0,27525.0,14561.0,11718.0,345.0,628.0,237.0,357.0,"{'PP': 101, 'PSOE': 101, 'Cs': 71, 'UP': 78, '...","[('PP', 101), ('PSOE', 101), ('VOX', 79), ('UP..."
35283,022019111174619001046,17,46,46190,4619001046,La Rioja,Valencia,Paterna,1194,789,...,29076.0,26104.0,10740.0,9220.0,718.0,983.0,247.0,424.0,"{'PP': 170, 'PSOE': 158, 'Cs': 83, 'UP': 130, ...","[('PP', 170), ('VOX', 166), ('PSOE', 158), ('U..."
35284,022019111174619001047,17,46,46190,4619001047,La Rioja,Valencia,Paterna,1260,965,...,36736.0,33819.0,13832.0,12580.0,580.0,798.0,274.0,263.0,"{'PP': 199, 'PSOE': 200, 'Cs': 132, 'UP': 105,...","[('VOX', 230), ('PSOE', 200), ('PP', 199), ('C..."


We check that we have the sections of the nine municipalities.

In [25]:
secciones_select['Municipio'].unique()

array(['Lucena', 'Teruel', 'Calatayud', 'Telde', 'Laredo', 'Reus',
       'Verín', 'Eibar', 'Paterna'], dtype=object)

We keep the sections with more than 500 registered voters; in this case almost all of them exceed this limit.

In [26]:
secciones_select = secciones_select.loc[secciones_select['Censo_Esc'] > 500]

In [27]:
secciones_select

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,...,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
1791,022019111011403801001,01,14,14038,1403801001,Andalucía,Córdoba,Lucena,913,699,...,29436.0,28262.0,6487.0,5455.0,3588.0,3238.0,225.0,238.0,"{'PP': 263, 'PSOE': 97, 'Cs': 65, 'UP': 53, 'I...","[('PP', 263), ('VOX', 205), ('PSOE', 97), ('Cs..."
1792,022019111011403801002,01,14,14038,1403801002,Andalucía,Córdoba,Lucena,1023,725,...,22534.0,20780.0,5473.0,4532.0,2351.0,2333.0,353.0,482.0,"{'PP': 157, 'PSOE': 171, 'Cs': 79, 'UP': 46, '...","[('VOX', 237), ('PSOE', 171), ('PP', 157), ('C..."
1793,022019111011403801003,01,14,14038,1403801003,Andalucía,Córdoba,Lucena,1697,1218,...,19944.0,18935.0,5166.0,3985.0,2004.0,1940.0,378.0,483.0,"{'PP': 259, 'PSOE': 357, 'Cs': 124, 'UP': 126,...","[('PSOE', 357), ('VOX', 299), ('PP', 259), ('U..."
1794,022019111011403801004,01,14,14038,1403801004,Andalucía,Córdoba,Lucena,1872,1366,...,20709.0,19069.0,6228.0,4956.0,1207.0,1194.0,434.0,483.0,"{'PP': 233, 'PSOE': 339, 'Cs': 158, 'UP': 124,...","[('VOX', 456), ('PSOE', 339), ('PP', 233), ('C..."
1795,022019111011403801005,01,14,14038,1403801005,Andalucía,Córdoba,Lucena,844,647,...,28191.0,20660.0,5770.0,4814.0,3693.0,3470.0,258.0,295.0,"{'PP': 217, 'PSOE': 102, 'Cs': 54, 'UP': 67, '...","[('PP', 217), ('VOX', 179), ('PSOE', 102), ('U..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35281,022019111174619001044,17,46,46190,4619001044,La Rioja,Valencia,Paterna,1178,930,...,33791.0,33686.0,12562.0,12689.0,555.0,610.0,230.0,356.0,"{'PP': 172, 'PSOE': 202, 'Cs': 140, 'UP': 101,...","[('VOX', 219), ('PSOE', 202), ('PP', 172), ('C..."
35282,022019111174619001045,17,46,46190,4619001045,La Rioja,Valencia,Paterna,691,517,...,32873.0,27525.0,14561.0,11718.0,345.0,628.0,237.0,357.0,"{'PP': 101, 'PSOE': 101, 'Cs': 71, 'UP': 78, '...","[('PP', 101), ('PSOE', 101), ('VOX', 79), ('UP..."
35283,022019111174619001046,17,46,46190,4619001046,La Rioja,Valencia,Paterna,1194,789,...,29076.0,26104.0,10740.0,9220.0,718.0,983.0,247.0,424.0,"{'PP': 170, 'PSOE': 158, 'Cs': 83, 'UP': 130, ...","[('PP', 170), ('VOX', 166), ('PSOE', 158), ('U..."
35284,022019111174619001047,17,46,46190,4619001047,La Rioja,Valencia,Paterna,1260,965,...,36736.0,33819.0,13832.0,12580.0,580.0,798.0,274.0,263.0,"{'PP': 199, 'PSOE': 200, 'Cs': 132, 'UP': 105,...","[('VOX', 230), ('PSOE', 200), ('PP', 199), ('C..."


We follow the same process as in the linear regression notebook. We initially normalize all the sections, before working out which of them will be kept.

In [28]:
col_validas_select = ['Sección', 'Censo_Esc', 'Votos_Total', 'Nulos', 'Votos_Válidos', 'Blanco', 'V_Cand', 'PP', 'PSOE', 'Cs', 'UP',
       'IU', 'VOX', 'UPyD', 'MP', 'CiU', 'ERC', 'JxC', 'CUP', 'DiL', 'PNV',
       'Bildu', 'Amaiur', 'CC', 'FA', 'TE', 'BNG', 'PRC', 'GBai', 'Compromis',
       'PACMA', 'Otros']

In [29]:
secciones_select = secciones_select[col_validas_select]

In [30]:
secciones_select

Unnamed: 0,Sección,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,...,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros
1791,022019111011403801001,913,699,9,690,3,687,263,97,65,...,0,0,0,0,0,0,0,0,3,1
1792,022019111011403801002,1023,725,15,710,13,697,157,171,79,...,0,0,0,0,0,0,0,0,3,4
1793,022019111011403801003,1697,1218,19,1199,14,1185,259,357,124,...,0,0,0,0,0,0,0,0,10,10
1794,022019111011403801004,1872,1366,18,1348,17,1331,233,339,158,...,0,0,0,0,0,0,0,0,8,13
1795,022019111011403801005,844,647,13,634,4,630,217,102,54,...,0,0,0,0,0,0,0,0,9,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35281,022019111174619001044,1178,930,6,924,12,912,172,202,140,...,0,0,0,0,0,0,0,0,12,2
35282,022019111174619001045,691,517,1,516,6,510,101,101,71,...,0,0,0,0,0,0,0,0,4,11
35283,022019111174619001046,1194,789,4,785,7,778,170,158,83,...,0,0,0,0,0,0,0,0,13,7
35284,022019111174619001047,1260,965,1,964,10,954,199,200,132,...,0,0,0,0,0,0,0,0,6,4


In [31]:
secciones_select_norm = secciones_select.copy()

In [32]:
secciones_select_norm

Unnamed: 0,Sección,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,...,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros
1791,022019111011403801001,913,699,9,690,3,687,263,97,65,...,0,0,0,0,0,0,0,0,3,1
1792,022019111011403801002,1023,725,15,710,13,697,157,171,79,...,0,0,0,0,0,0,0,0,3,4
1793,022019111011403801003,1697,1218,19,1199,14,1185,259,357,124,...,0,0,0,0,0,0,0,0,10,10
1794,022019111011403801004,1872,1366,18,1348,17,1331,233,339,158,...,0,0,0,0,0,0,0,0,8,13
1795,022019111011403801005,844,647,13,634,4,630,217,102,54,...,0,0,0,0,0,0,0,0,9,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35281,022019111174619001044,1178,930,6,924,12,912,172,202,140,...,0,0,0,0,0,0,0,0,12,2
35282,022019111174619001045,691,517,1,516,6,510,101,101,71,...,0,0,0,0,0,0,0,0,4,11
35283,022019111174619001046,1194,789,4,785,7,778,170,158,83,...,0,0,0,0,0,0,0,0,13,7
35284,022019111174619001047,1260,965,1,964,10,954,199,200,132,...,0,0,0,0,0,0,0,0,6,4


In [33]:
set_cols = ['Sección', 'Censo_Esc']

In [34]:
for col in secciones_select_norm.columns:

  if col not in set_cols:
    
    secciones_select_norm[col] = secciones_select_norm[col] / secciones_select_norm['Censo_Esc']

secciones_select_norm = secciones_select_norm.set_index('Sección')
secciones_select_norm = secciones_select_norm.drop('Censo_Esc', axis = 1)

secciones_select_norm = secciones_select_norm.T

In [35]:
secciones_select_norm

Sección,022019111011403801001,022019111011403801002,022019111011403801003,022019111011403801004,022019111011403801005,022019111011403801006,022019111011403802001,022019111011403802002,022019111011403802003,022019111011403802004,...,022019111174619001039,022019111174619001040,022019111174619001041,022019111174619001042,022019111174619001043,022019111174619001044,022019111174619001045,022019111174619001046,022019111174619001047,022019111174619001048
Votos_Total,0.765608,0.7087,0.717737,0.729701,0.766588,0.713405,0.697704,0.67842,0.70041,0.665417,...,0.304175,0.791696,0.77459,0.73357,0.784703,0.789474,0.748191,0.660804,0.765873,0.749107
Nulos,0.009858,0.014663,0.011196,0.009615,0.015403,0.01849,0.006378,0.012694,0.009362,0.02343,...,0.002982,0.002111,0.004098,0.006217,0.008499,0.005093,0.001447,0.00335,0.000794,0.001786
Votos_Válidos,0.75575,0.694037,0.706541,0.720085,0.751185,0.694915,0.691327,0.665726,0.691047,0.641987,...,0.301193,0.789585,0.770492,0.727353,0.776204,0.78438,0.746744,0.657454,0.765079,0.747321
Blanco,0.003286,0.012708,0.00825,0.009081,0.004739,0.004622,0.005102,0.004231,0.007607,0.003749,...,0.003976,0.008445,0.004918,0.002664,0.011331,0.010187,0.008683,0.005863,0.007937,0.008036
V_Cand,0.752464,0.681329,0.698291,0.711004,0.746445,0.690293,0.686224,0.661495,0.683441,0.638238,...,0.297217,0.78114,0.765574,0.724689,0.764873,0.774194,0.738061,0.651591,0.757143,0.739286
PP,0.288061,0.15347,0.152622,0.124466,0.257109,0.177196,0.193878,0.138223,0.147455,0.093721,...,0.030815,0.166784,0.17459,0.095915,0.19169,0.14601,0.146165,0.142379,0.157937,0.176786
PSOE,0.106243,0.167155,0.210371,0.18109,0.120853,0.181818,0.167092,0.165021,0.190755,0.221181,...,0.143141,0.147783,0.14918,0.219361,0.152975,0.171477,0.146165,0.132328,0.15873,0.1375
Cs,0.071194,0.077224,0.07307,0.084402,0.063981,0.046225,0.05102,0.060649,0.061439,0.051546,...,0.00497,0.076003,0.117213,0.076377,0.114259,0.118846,0.10275,0.069514,0.104762,0.098214
UP,0.05805,0.044966,0.074249,0.066239,0.079384,0.069337,0.053571,0.054302,0.063195,0.066542,...,0.042744,0.132301,0.081967,0.118117,0.067044,0.085739,0.11288,0.108878,0.083333,0.085714
IU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
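
The per-section normalization and the transposition can also be written without an explicit loop. This is a minimal sketch on invented data, using `DataFrame.div` with `axis=0` to divide every row by that row's census; it is an alternative formulation, not the code the notebook runs.

```python
import pandas as pd

# Toy data (invented): two sections.
toy = pd.DataFrame({
    'Sección':     ['A', 'B'],
    'Censo_Esc':   [1000, 500],
    'Votos_Total': [700, 300],
    'PP':          [300, 100],
})

# Use the section code as index, then divide every count by that row's census.
indexada = toy.set_index('Sección')
toy_norm = indexada.drop(columns='Censo_Esc').div(indexada['Censo_Esc'], axis=0)

# Transpose so that rows are result categories and columns are sections.
toy_norm = toy_norm.T
print(toy_norm)
```

After the transpose, each column is one section and each row one result category, which is the orientation used for the matrix X later on.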


Now we define the two functions. The first removes the columns of parties that received no votes in the selected sections, normalizes by the census, transposes, and shuffles the order of the sections. The second uses the correlation matrix to progressively remove the sections that are too similar to an earlier one.

In [36]:
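def preparacion_sec(eleccion):
    # Columns that identify the section or hold the census; they are not normalized.
    set_cols = ['Sección', 'Censo_Esc']
    
    # Note: columns are normalized in place until the first drop,
    # so the DataFrame passed by the caller may be modified.
    for col in eleccion.columns:
        if eleccion[col].sum() == 0:
            # No votes in any selected section: drop the column.
            eleccion = eleccion.drop([col], axis = 1)
        elif col not in set_cols:
            # Express the count as a fraction of the section's census.
            eleccion[col] = eleccion[col] / eleccion['Censo_Esc']
        
    eleccion = eleccion.set_index('Sección')
    eleccion = eleccion.drop('Censo_Esc', axis = 1)

    # Transpose: rows become result categories, columns become sections.
    df_elec_transpose = eleccion.T

    # Shuffle the order of the sections (columns).
    lista_sec = list(df_elec_transpose.columns)
    random.shuffle(lista_sec)

    df_elec_transpose = df_elec_transpose[lista_sec]

    return df_elec_transpose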


In [37]:
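def secciones_corr(dummy, threshold = 0.995):
    # Keep a reference to the full correlation matrix; 'dummy' is pruned as we go.
    # (The original relied on a global variable 'm' instead of the argument.)
    m = dummy
    # Note: with this range the last section is never tested.
    for ind in range(2, m.shape[0]):
        # Correlations of section ind-1 with all the sections before it.
        previas = m.iloc[ind-1, 0:ind-1]
        
        # Drop the section if it is too similar to any earlier one.
        if (previas > threshold).any():
            dummy = dummy.drop(m.columns[ind-1], axis = 0)
            dummy = dummy.drop(m.columns[ind-1], axis = 1)
        
    return dummy.columns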
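
The idea of pruning near-duplicate sections can be illustrated on invented data. This is a sketch of a closely related variant, not the notebook's function: `filtrar_por_correlacion` is a hypothetical name, and it compares each column only with the columns already kept, whereas `secciones_corr` compares it with every earlier column, including ones already discarded.

```python
import numpy as np
import pandas as pd

def filtrar_por_correlacion(df, threshold=0.999):
    """Hypothetical variant: keep a column only if its correlation with
    every previously kept column does not exceed the threshold."""
    corr = df.corr()
    kept = []
    for col in df.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return kept

# Toy data (invented): 'b' is a linear function of 'a', 'c' is independent noise.
rng = np.random.default_rng(0)
a = rng.random(30)
toy = pd.DataFrame({'a': a, 'b': 2 * a + 0.1, 'c': rng.random(30)})

kept = filtrar_por_correlacion(toy)
print(kept)
```

Because `b` is perfectly correlated with `a`, it is discarded, while `c` survives. The higher the threshold, the fewer sections are removed.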


In [38]:
# Pass a copy so that the in-place normalization inside preparacion_sec
# does not modify secciones_select (this avoids pandas' SettingWithCopyWarning).
secc = preparacion_sec(secciones_select.copy())


In [39]:
secc

Sección,022019111094312307001,022019111025006702002,022019111024421603002,022019111063903501003,022019111025006701006,022019111024421603005,022019111053502602002,022019111053502606003,022019111113208501009,022019111094312302014,...,022019111053502603018,022019111094312305005,022019111094312303010,022019111094312306011,022019111024421602005,022019111174619001045,022019111113208501002,022019111174619001040,022019111053502603023,022019111053502606014
Votos_Total,0.409565,0.778311,0.730196,0.674004,0.700724,0.79703,0.573657,0.6,0.585835,0.779385,...,0.410667,0.590504,0.64813,0.787718,0.594123,0.748191,0.641855,0.791696,0.531401,0.567901
Nulos,0.002609,0.015355,0.001569,0.006289,0.009654,0.002829,0.003466,0.004918,0.010008,0.001808,...,0.008889,0.003956,0.005753,0.000758,0.000918,0.001447,0.014024,0.002111,0.008454,0.009877
Votos_Válidos,0.406957,0.762956,0.728627,0.667715,0.69107,0.794201,0.570191,0.595082,0.575828,0.777577,...,0.401778,0.586548,0.642378,0.78696,0.593205,0.746744,0.627832,0.789585,0.522947,0.558025
Blanco,0.00087,0.005758,0.003137,0.003145,0.013677,0.002829,0.008666,0.002459,0.003079,0.009042,...,0.003556,0.0,0.002876,0.003033,0.003673,0.008683,0.006472,0.008445,0.01087,0.003704
V_Cand,0.406087,0.757198,0.72549,0.66457,0.677393,0.791372,0.561525,0.592623,0.572748,0.768535,...,0.398222,0.586548,0.639501,0.783927,0.589532,0.738061,0.621359,0.78114,0.512077,0.554321
PP,0.046957,0.25144,0.159216,0.169811,0.207562,0.201556,0.117851,0.109016,0.198614,0.068716,...,0.073778,0.053412,0.078619,0.065959,0.096419,0.146165,0.231931,0.166784,0.061594,0.096296
PSOE,0.117391,0.21977,0.131765,0.144654,0.167337,0.145686,0.169844,0.172951,0.182448,0.162749,...,0.120889,0.167161,0.194631,0.108415,0.135904,0.146165,0.192017,0.147783,0.179952,0.17284
Cs,0.031304,0.064299,0.025098,0.028302,0.060338,0.026167,0.017331,0.054098,0.026944,0.0434,...,0.017778,0.03363,0.044104,0.036391,0.019284,0.10275,0.028047,0.076003,0.027778,0.028395
UP,0.034783,0.067179,0.018824,0.062893,0.049879,0.016973,0.084922,0.086885,0.04542,0.077758,...,0.086222,0.070227,0.074784,0.056861,0.02663,0.11288,0.058252,0.132301,0.099034,0.088889
VOX,0.117391,0.12572,0.086275,0.069182,0.149638,0.08133,0.091854,0.077869,0.066205,0.057866,...,0.036444,0.04451,0.099712,0.025019,0.078972,0.114327,0.072276,0.155524,0.074879,0.095062


To see how effective PCA is, we have chosen an undemanding threshold (0.999), so only nearly identical sections are discarded and we obtain a fairly long list of mutually distinct sections: 76.

In [40]:
m = secc.corr()
lista_sec = secciones_corr(m, 0.999)

In [41]:
lista_sec

Index(['022019111094312307001', '022019111025006702002',
       '022019111024421603002', '022019111063903501003',
       '022019111025006701006', '022019111053502602002',
       '022019111053502606003', '022019111113208501009',
       '022019111094312302014', '022019111011403802004',
       '022019111053502604003', '022019111011403803005',
       '022019111053502602011', '022019111053502603008',
       '022019111063903502003', '022019111094312309002',
       '022019111011403803007', '022019111025006703001',
       '022019111024421602001', '022019111174619001018',
       '022019111094312303001', '022019111174619001034',
       '022019111053502604001', '022019111094312301001',
       '022019111053502603012', '022019111011403803004',
       '022019111174619001037', '022019111174619001039',
       '022019111024421602004', '022019111011403805001',
       '022019111094312303011', '022019111142003004003',
       '022019111053502603005', '022019111025006701001',
       '022019111024421603004',

In [42]:
lista_sec.shape

(76,)

In [43]:
lista_sec = np.sort(lista_sec)

Now we keep only the sections listed above.

In [44]:
secciones_select_norm = secciones_select_norm[lista_sec]

In [45]:
secciones_select_norm

Sección,022019111011403801003,022019111011403801006,022019111011403802004,022019111011403803003,022019111011403803004,022019111011403803005,022019111011403803007,022019111011403804001,022019111011403805001,022019111011403805002,...,022019111174619001015,022019111174619001016,022019111174619001018,022019111174619001023,022019111174619001033,022019111174619001034,022019111174619001037,022019111174619001039,022019111174619001043,022019111174619001045
Votos_Total,0.717737,0.713405,0.665417,0.725291,0.764157,0.711735,0.753255,0.630907,0.740924,0.690541,...,0.687005,0.640807,0.630065,0.607973,0.338078,0.724638,0.805575,0.304175,0.784703,0.748191
Nulos,0.011196,0.01849,0.02343,0.010174,0.003331,0.01148,0.011719,0.007024,0.024752,0.012162,...,0.003962,0.003879,0.001307,0.001661,0.003559,0.004026,0.006272,0.002982,0.008499,0.001447
Votos_Válidos,0.706541,0.694915,0.641987,0.715116,0.760826,0.700255,0.741536,0.623883,0.716172,0.678378,...,0.683043,0.636928,0.628758,0.606312,0.33452,0.720612,0.799303,0.301193,0.776204,0.746744
Blanco,0.00825,0.004622,0.003749,0.001453,0.005996,0.01148,0.00651,0.005747,0.00495,0.006757,...,0.009509,0.002327,0.003922,0.006645,0.003559,0.002415,0.004878,0.003976,0.011331,0.008683
V_Cand,0.698291,0.690293,0.638238,0.713663,0.75483,0.688776,0.735026,0.618135,0.711221,0.671622,...,0.673534,0.6346,0.624837,0.599668,0.330961,0.718196,0.794425,0.297217,0.764873,0.738061
PP,0.152622,0.177196,0.093721,0.133721,0.263158,0.128827,0.175781,0.095147,0.133663,0.085135,...,0.083201,0.058185,0.120261,0.068106,0.030842,0.152174,0.242509,0.030815,0.19169,0.146165
PSOE,0.210371,0.181818,0.221181,0.25436,0.131246,0.161352,0.159505,0.113027,0.313531,0.347297,...,0.213946,0.240497,0.223529,0.262458,0.179122,0.171498,0.15331,0.143141,0.152975,0.146165
Cs,0.07307,0.046225,0.051546,0.063953,0.067288,0.070791,0.083333,0.05364,0.052805,0.054054,...,0.075277,0.032583,0.043137,0.021595,0.005931,0.068438,0.083624,0.00497,0.114259,0.10275
UP,0.074249,0.069337,0.066542,0.043605,0.0493,0.079719,0.069661,0.066411,0.051155,0.108108,...,0.156101,0.099302,0.095425,0.114618,0.035587,0.105475,0.088502,0.042744,0.067044,0.11288
IU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [46]:
secciones_select_norm.shape

(30, 76)

Now we import the PCA class from scikit-learn.

In [47]:
from sklearn.decomposition import PCA

We define the matrix X and convert it into a NumPy array, as usual. Its 30 rows are the result categories and its 76 columns are the selected sections.

In [48]:
X = secciones_select_norm.values

In [49]:
X

array([[0.71773718, 0.71340524, 0.66541706, ..., 0.30417495, 0.78470255,
        0.74819103],
       [0.01119623, 0.01848998, 0.02343018, ..., 0.00298211, 0.00849858,
        0.00144718],
       [0.70654095, 0.69491525, 0.64198688, ..., 0.30119284, 0.77620397,
        0.74674385],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.00589275, 0.0046225 , 0.00374883, ..., 0.00994036, 0.01038716,
        0.00578871],
       [0.00589275, 0.01848998, 0.00468604, ..., 0.00397614, 0.00566572,
        0.01591896]])

In [50]:
X.shape

(30, 76)

We instantiate the PCA model with 4 components.

In [51]:
pca = PCA(n_components = 4)

We run the fit and the transform.

In [52]:
pca.fit(X)

PCA(n_components=4)

In [53]:
X_pca = pca.transform(X)

As expected, the first principal component accounts for a far larger share of the variance than the rest: about 97%.

In [54]:
print(pca.explained_variance_ratio_)

[0.97069316 0.00934672 0.00694305 0.00483519]


In [55]:
print(pca.singular_values_)

[9.39184547 0.92159404 0.79430119 0.66285248]
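
The two outputs above are related: each explained-variance ratio is proportional to the square of the corresponding singular value. The sketch below checks this on invented data. It keeps every component so that the squared singular values sum to the total variance; with only 4 components retained, as in this notebook, the denominator would also need the discarded components.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data (invented): 30 rows, 5 columns.
rng = np.random.default_rng(0)
X_toy = rng.random((30, 5))

# Keep every component so the ratios sum to 1.
pca_toy = PCA(n_components=5).fit(X_toy)

# Squared singular values, normalized to sum to 1.
s2 = pca_toy.singular_values_ ** 2
ratio_from_singular = s2 / s2.sum()

print(np.allclose(ratio_from_singular, pca_toy.explained_variance_ratio_))
```

The same proportionality can be seen in the notebook's own numbers: the ratio of the second to the first squared singular value matches the ratio of the second to the first explained-variance ratio.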


We define the vector 'y'.

In [56]:
y = modelizacion['Modelización'].values

In [57]:
y

array([6.98612262e-01, 7.12735256e-03, 6.91484910e-01, 6.20127247e-03,
       6.85283637e-01, 1.46825935e-01, 1.93632983e-01, 4.69532699e-02,
       8.88163685e-02, 0.00000000e+00, 1.04393406e-01, 0.00000000e+00,
       1.70579513e-02, 0.00000000e+00, 2.49424505e-02, 1.51066850e-02,
       7.02380158e-03, 0.00000000e+00, 1.08254501e-02, 7.93006618e-03,
       0.00000000e+00, 3.55525972e-03, 0.00000000e+00, 5.64956457e-04,
       3.42962781e-03, 1.96663691e-03, 3.61955251e-04, 0.00000000e+00,
       6.48608784e-03, 5.41074637e-03])

Now we load the linear regression class from scikit-learn...

In [58]:
from sklearn.linear_model import LinearRegression

...and fit it, obtaining an R² of about 0.9994 using only the first 4 components.

In [59]:
reg = LinearRegression(fit_intercept = True).fit(X_pca, y)

In [60]:
reg.score(X_pca, y)

0.9994289429762898
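
Since both the PCA projection and the regression are linear, the fitted model can be folded into a single weight per section plus an intercept. The following is a sketch on invented data; the names `pesos` and `termino_indep` are illustrative and do not come from the original notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Toy data (invented): 30 result rows, 8 "sections", and a target vector.
rng = np.random.default_rng(0)
X_toy = rng.random((30, 8))
y_toy = X_toy @ rng.random(8) / 8

pca_toy = PCA(n_components=4).fit(X_toy)
reg_toy = LinearRegression().fit(pca_toy.transform(X_toy), y_toy)

# Fold the projection and the regression into one weight per original column.
pesos = pca_toy.components_.T @ reg_toy.coef_
termino_indep = reg_toy.intercept_ - pca_toy.mean_ @ pesos

# Both routes give the same predictions.
pred_pipeline = reg_toy.predict(pca_toy.transform(X_toy))
pred_directa = X_toy @ pesos + termino_indep
print(np.allclose(pred_pipeline, pred_directa))
```

Read this way, the model estimates the territory's result as a weighted combination of the selected sections, with the weights constrained to the subspace spanned by the retained components.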

Now we can make this a little more sophisticated by using a pipeline that includes a scaler, a PCA with 10 components, and a linear regression. We can also compute the R² coefficient explicitly.

In [61]:
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

In [62]:
from sklearn.metrics import mean_squared_error

In [63]:
from sklearn.metrics import r2_score

We define the pipeline.

In [64]:
pipe_modelado = make_pipeline(StandardScaler(), PCA(n_components = 10), LinearRegression(fit_intercept=True))


We run the fit.

In [65]:
pipe_modelado.fit(X=X, y=y)

Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=10)),
                ('linearregression', LinearRegression())])
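
For readers without access to the election data, the same kind of pipeline can be tried on synthetic data of the same shape (30 rows, 76 columns). The data below is invented purely for illustration, and the names `X_demo`, `y_demo` and `pipe_demo` are not part of the notebook. Note that `n_components` cannot exceed min(n_samples, n_features), so 10 components is valid for a 30 x 76 matrix.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 30 "vote categories" (rows) x 76 "sections" (columns).
# Every column is a noisy copy of one common profile, loosely mimicking
# sections that all resemble the national result.
profile = rng.random(30)
X_demo = profile[:, None] * (1 + 0.1 * rng.standard_normal((30, 76)))
y_demo = profile

pipe_demo = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    LinearRegression(fit_intercept=True),
)
pipe_demo.fit(X_demo, y_demo)

pred_demo = pipe_demo.predict(X_demo)
r2_demo = pipe_demo.score(X_demo, y_demo)  # in-sample R2
print(pred_demo.shape, round(r2_demo, 4))
```

With 30 rows and 10 components the score is measured on the same rows used for fitting, so it should be read as a goodness-of-fit figure rather than as evidence of predictive power.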

We generate the predictions with the predict() method.

In [66]:
predicciones = pipe_modelado.predict(X=X)
predicciones = predicciones.flatten()

In [67]:
r2 = r2_score(
            y_true  = y,
            y_pred  = predicciones
           )

The R2 coefficient turns out to be extremely high: 99.992%.

In [68]:
r2

0.9999176507073542
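
`r2_score` computes the coefficient of determination, R2 = 1 - SS_res / SS_tot. A minimal NumPy sketch of that formula follows; the helper `r2_manual` and the sample values are hypothetical and not taken from the notebook.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Illustrative values only, not taken from the notebook.
y_true_demo = np.array([0.70, 0.01, 0.69, 0.15, 0.19])
y_pred_demo = np.array([0.69, 0.02, 0.68, 0.16, 0.18])
r2_value = r2_manual(y_true_demo, y_pred_demo)
print(r2_value)
```

Because SS_tot is dominated by the few large rows (total votes, valid votes, candidate votes), a very high R2 mainly says those rows are well reproduced; small parties can still be off by a noticeable relative margin.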

Now we can compare the predictions with the real data. First we undo the normalization by multiplying by the electoral census of Spain.

In [69]:
est = predicciones * censo_mod

In [70]:
df = pd.DataFrame(est, index = secciones_select_norm.index, columns = ['Estimación']).astype('int32')

In [71]:
df

Unnamed: 0,Estimación
Votos_Total,24371169
Nulos,244013
Votos_Válidos,24125286
Blanco,208514
V_Cand,23914900
PP,5039252
PSOE,6746729
Cs,1747112
UP,2859651
IU,-1870


In [72]:
df1 = pd.DataFrame(secciones_mod.sum(), columns = ['Real']).drop('Censo_Esc')

The comparison with the real data is quite satisfactory: the differences are very small, under one percentage point for every row shown. Bear in mind that we selected only 76 sections, themselves drawn from arbitrarily chosen municipalities.

In [73]:
df['Real'] = df1['Real']

In [74]:
df['pc Estimación'] = df['Estimación'] / df['Estimación'].loc['Votos_Válidos'] * 100

In [75]:
df['pc Real'] = df['Real'] / df['Real'].loc['Votos_Válidos'] * 100

In [76]:
df['dif. Real-Est.'] = df['pc Real'] - df['pc Estimación']

In [77]:
df

Unnamed: 0,Estimación,Real,pc Estimación,pc Real,dif. Real-Est.
Votos_Total,24371169,24361807,101.019192,101.030731,0.011539
Nulos,244013,248543,1.011441,1.030731,0.019291
Votos_Válidos,24125286,24113264,100.0,100.0,0.0
Blanco,208514,216249,0.864296,0.896805,0.032509
V_Cand,23914900,23897015,99.127944,99.103195,-0.024749
PP,5039252,5120072,20.887844,21.233426,0.345582
PSOE,6746729,6752314,27.965385,28.002489,0.037104
Cs,1747112,1637341,7.241829,6.790209,-0.45162
UP,2859651,3097179,11.853335,12.844296,0.990961
IU,-1870,0,-0.007751,0.0,0.007751
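
The percentage columns are each count divided by the valid votes of the same column. A compact sketch reproducing them for a few rows, with the counts copied from the table above (the names `comparison` and `pct` are introduced here for illustration):

```python
import pandas as pd

# Estimated and real vote counts copied from the table above (November 2019).
comparison = pd.DataFrame(
    {
        "Estimación": [24125286, 5039252, 6746729, 1747112, 2859651],
        "Real": [24113264, 5120072, 6752314, 1637341, 3097179],
    },
    index=["Votos_Válidos", "PP", "PSOE", "Cs", "UP"],
)

# Select the denominator by label; dividing a DataFrame by a Series
# aligns on column names, so each column uses its own valid-vote total.
pct = comparison / comparison.loc["Votos_Válidos"] * 100
pct["dif. Real-Est."] = pct["Real"] - pct["Estimación"]
print(pct.round(3))
```

Selecting `Votos_Válidos` by label keeps the calculation correct even if the row order changes.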


## Modeling the 2016 elections

Now we check how well or badly the pipeline model fits when applied to the June 2016 elections, using the sections equivalent to the 76 used to estimate the November 2019 result.

We start by loading the dataset of section equivalences.

In [78]:
import botocore.exceptions

KEY = 'datos-elecciones-generales-unificados/similitud_secciones_def_REF.csv' 

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'similitud_secciones_def_REF.csv')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

In [79]:
sim_secciones = pd.read_csv('similitud_secciones_def_REF.csv', dtype = 'str')

Now we select the rows corresponding to the 76 sections found in the previous chapter...

In [80]:
sec_select_J16 = sim_secciones.loc[sim_secciones['cod_sec_ref'].isin(lista_sec)]

In [81]:
sec_select_J16.shape

(76, 12)

... and take their equivalents in the 2016 elections, which are these 76:

In [82]:
list_sec_J16 = list(sec_select_J16['cercana J16_ref'])

In [83]:
list_sec_J16 = np.sort(list_sec_J16)

In [84]:
list_sec_J16.shape

(76,)

In [85]:
secciones_select_norm = secciones_select_norm[lista_sec]

In [86]:
secciones_select_norm

Sección,022019111011403801003,022019111011403801006,022019111011403802004,022019111011403803003,022019111011403803004,022019111011403803005,022019111011403803007,022019111011403804001,022019111011403805001,022019111011403805002,...,022019111174619001015,022019111174619001016,022019111174619001018,022019111174619001023,022019111174619001033,022019111174619001034,022019111174619001037,022019111174619001039,022019111174619001043,022019111174619001045
Votos_Total,0.717737,0.713405,0.665417,0.725291,0.764157,0.711735,0.753255,0.630907,0.740924,0.690541,...,0.687005,0.640807,0.630065,0.607973,0.338078,0.724638,0.805575,0.304175,0.784703,0.748191
Nulos,0.011196,0.01849,0.02343,0.010174,0.003331,0.01148,0.011719,0.007024,0.024752,0.012162,...,0.003962,0.003879,0.001307,0.001661,0.003559,0.004026,0.006272,0.002982,0.008499,0.001447
Votos_Válidos,0.706541,0.694915,0.641987,0.715116,0.760826,0.700255,0.741536,0.623883,0.716172,0.678378,...,0.683043,0.636928,0.628758,0.606312,0.33452,0.720612,0.799303,0.301193,0.776204,0.746744
Blanco,0.00825,0.004622,0.003749,0.001453,0.005996,0.01148,0.00651,0.005747,0.00495,0.006757,...,0.009509,0.002327,0.003922,0.006645,0.003559,0.002415,0.004878,0.003976,0.011331,0.008683
V_Cand,0.698291,0.690293,0.638238,0.713663,0.75483,0.688776,0.735026,0.618135,0.711221,0.671622,...,0.673534,0.6346,0.624837,0.599668,0.330961,0.718196,0.794425,0.297217,0.764873,0.738061
PP,0.152622,0.177196,0.093721,0.133721,0.263158,0.128827,0.175781,0.095147,0.133663,0.085135,...,0.083201,0.058185,0.120261,0.068106,0.030842,0.152174,0.242509,0.030815,0.19169,0.146165
PSOE,0.210371,0.181818,0.221181,0.25436,0.131246,0.161352,0.159505,0.113027,0.313531,0.347297,...,0.213946,0.240497,0.223529,0.262458,0.179122,0.171498,0.15331,0.143141,0.152975,0.146165
Cs,0.07307,0.046225,0.051546,0.063953,0.067288,0.070791,0.083333,0.05364,0.052805,0.054054,...,0.075277,0.032583,0.043137,0.021595,0.005931,0.068438,0.083624,0.00497,0.114259,0.10275
UP,0.074249,0.069337,0.066542,0.043605,0.0493,0.079719,0.069661,0.066411,0.051155,0.108108,...,0.156101,0.099302,0.095425,0.114618,0.035587,0.105475,0.088502,0.042744,0.067044,0.11288
IU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


We now load the results of the June 2016 elections.

In [87]:
import botocore.exceptions

KEY = 'datos-elecciones-generales-unificados/gen_J16_unif_cols_prov_copia.txt' 

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'gen_J16_unif_cols_prov_copia.txt')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

In [88]:
df_eleccion_comp_J16 = pd.read_csv('gen_J16_unif_cols_prov_copia.txt', dtype = strings)

We select the sections to model, which are naturally those of the whole of Spain.

In [89]:
secciones_mod = df_eleccion_comp_J16

if len(ccaa_mod) > 0:
    secciones_mod = secciones_mod.loc[secciones_mod['CCAA'].isin(ccaa_mod)]

if len(provincia_mod) > 0:
    secciones_mod = secciones_mod.loc[secciones_mod['Provincia'].isin(provincia_mod)]

if len(municipio_mod) > 0:
    secciones_mod = secciones_mod.loc[secciones_mod['Municipio'].isin(municipio_mod)]

In [90]:
secciones_mod

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,...,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
0,022016061010400101001,01,04,04001,0400101001,Andalucía,Almería,Abla,1062,823,...,20172.0,19546.0,5574.0,4833.0,3286.0,3082.0,403.0,471.0,"{'PP': 267, 'PSOE': 356, 'Cs': 110, 'UP': 65, ...","[('PSOE', 356), ('PP', 267), ('Cs', 110), ('UP..."
1,022016061010400201001,01,04,04002,0400201001,Andalucía,Almería,Abrucena,1040,748,...,17841.0,17115.0,4640.0,4048.0,3418.0,2770.0,568.0,620.0,"{'PP': 212, 'PSOE': 342, 'Cs': 93, 'UP': 79, '...","[('PSOE', 342), ('PP', 212), ('Cs', 93), ('UP'..."
2,022016061010400301001,01,04,04003,0400301001,Andalucía,Almería,Adra,666,487,...,26498.0,24688.0,5121.0,4795.0,2499.0,2301.0,337.0,333.0,"{'PP': 266, 'PSOE': 112, 'Cs': 48, 'UP': 46, '...","[('PP', 266), ('PSOE', 112), ('Cs', 48), ('UP'..."
3,022016061010400301002,01,04,04003,0400301002,Andalucía,Almería,Adra,1264,867,...,25677.0,23400.0,5381.0,4837.0,1815.0,1724.0,343.0,464.0,"{'PP': 436, 'PSOE': 211, 'Cs': 102, 'UP': 101,...","[('PP', 436), ('PSOE', 211), ('Cs', 102), ('UP..."
4,022016061010400301003,01,04,04003,0400301003,Andalucía,Almería,Adra,1439,952,...,22051.0,19687.0,5224.0,4044.0,1170.0,1198.0,416.0,476.0,"{'PP': 512, 'PSOE': 214, 'Cs': 111, 'UP': 85, ...","[('PP', 512), ('PSOE', 214), ('Cs', 111), ('UP..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36188,022016061195200108011,19,52,52001,5200108011,Melilla,Melilla,Melilla,1510,860,...,66352.0,62632.0,11378.0,11119.0,1508.0,1274.0,167.0,166.0,"{'PP': 401, 'PSOE': 172, 'Cs': 158, 'UP': 98, ...","[('PP', 401), ('PSOE', 172), ('Cs', 158), ('UP..."
36189,022016061195200108012,19,52,52001,5200108012,Melilla,Melilla,Melilla,1692,1109,...,50730.0,50839.0,13272.0,13038.0,2763.0,2445.0,169.0,177.0,"{'PP': 646, 'PSOE': 175, 'Cs': 155, 'UP': 81, ...","[('PP', 646), ('PSOE', 175), ('Cs', 155), ('UP..."
36190,022016061195200108013,19,52,52001,5200108013,Melilla,Melilla,Melilla,1167,627,...,37816.0,36729.0,10102.0,9640.0,1807.0,1615.0,234.0,252.0,"{'PP': 317, 'PSOE': 133, 'Cs': 93, 'UP': 58, '...","[('PP', 317), ('PSOE', 133), ('Cs', 93), ('UP'..."
36191,022016061195200108014,19,52,52001,5200108014,Melilla,Melilla,Melilla,947,486,...,29898.0,31384.0,5923.0,6061.0,2463.0,2136.0,244.0,284.0,"{'PP': 279, 'PSOE': 100, 'Cs': 43, 'UP': 39, '...","[('PP', 279), ('PSOE', 100), ('Cs', 43), ('UP'..."


In [91]:
censo_mod = secciones_mod['Censo_Esc'].sum()

In [92]:
censo_mod

34595051

We proceed in the same way: we sum the results, normalize them and store them in a df.

In [93]:
secciones_mod = secciones_mod[cols_validas_mod]

In [94]:
modelizacion = pd.DataFrame(secciones_mod.sum(), columns = ['Modelización'])
modelizacion['Modelización'] = modelizacion['Modelización'] / modelizacion['Modelización']['Censo_Esc']
modelizacion = modelizacion.drop(['Censo_Esc']) 

In [95]:
modelizacion

Unnamed: 0,Modelización
Votos_Total,0.698307
Nulos,0.006491
Votos_Válidos,0.691816
Blanco,0.005161
V_Cand,0.686655
PP,0.228552
PSOE,0.156789
Cs,0.090286
UP,0.146014
IU,0.0
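
The normalization is simply each national total divided by the electoral census, and multiplying back by the census undoes it. A short sketch with the June 2016 figures that appear in this notebook's outputs (the names `censo_total`, `totales`, `shares` and `recovered` are introduced here for illustration):

```python
import pandas as pd

# National totals for June 2016, copied from the notebook outputs.
censo_total = 34595051
totales = pd.Series(
    {"Votos_Total": 24157982, "PP": 7906761, "PSOE": 5424130,
     "Cs": 3123436, "UP": 5051345}
)

# Normalize: each total becomes a share of the electoral census.
shares = totales / censo_total
print(shares.round(6))

# Undo the normalization: multiplying by the census recovers the counts.
recovered = (shares * censo_total).round().astype("int64")
print(recovered)
```

The notebook casts with `astype('int32')`, which truncates toward zero; rounding first, as in this sketch, avoids losing up to one vote per row.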


In [96]:
modelizacion.shape

(30, 1)

We check that the number of equivalent sections is what the model expects. This matters because in one of our trials it was not: two of the 2019 sections had the same 2016 section as their equivalent. In that case we could not use the .isin() method, which returns each matching row only once, and resorted to a for loop instead.

In [97]:
np.unique(list_sec_J16).shape

(76,)

In [98]:
secciones_select2 = df_eleccion_comp_J16.loc[df_eleccion_comp_J16['Sección'].isin(list_sec_J16)]

In [99]:
secciones_select2 = secciones_select2[col_validas_select]

In [100]:
secciones_select2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 76 entries, 1767 to 35171
Data columns (total 32 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Sección        76 non-null     object
 1   Censo_Esc      76 non-null     int64 
 2   Votos_Total    76 non-null     int64 
 3   Nulos          76 non-null     int64 
 4   Votos_Válidos  76 non-null     int64 
 5   Blanco         76 non-null     int64 
 6   V_Cand         76 non-null     int64 
 7   PP             76 non-null     int64 
 8   PSOE           76 non-null     int64 
 9   Cs             76 non-null     int64 
 10  UP             76 non-null     int64 
 11  IU             76 non-null     int64 
 12  VOX            76 non-null     int64 
 13  UPyD           76 non-null     int64 
 14  MP             76 non-null     int64 
 15  CiU            76 non-null     int64 
 16  ERC            76 non-null     int64 
 17  JxC            76 non-null     int64 
 18  CUP            76 non-null

In [101]:
secciones_select2.shape

(76, 32)

A brief aside to explain what we did when .isin() could not be used.

In [102]:
secciones_select2.iloc[0, ]

Sección          022016061011403801003
Censo_Esc                         1743
Votos_Total                       1180
Nulos                                9
Votos_Válidos                     1171
Blanco                              15
V_Cand                            1156
PP                                 450
PSOE                               377
Cs                                 170
UP                                 144
IU                                   0
VOX                                  0
UPyD                                 2
MP                                   0
CiU                                  0
ERC                                  0
JxC                                  0
CUP                                  0
DiL                                  0
PNV                                  0
Bildu                                0
Amaiur                               0
CC                                   0
FA                                   0
TE                       

In that case we create a dummy dataframe, dff, with an initial row that defines the columns.



In [103]:
dff = pd.DataFrame(secciones_select2.iloc[0, ]).T

In [104]:
dff

Unnamed: 0,Sección,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,...,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros
1767,022016061011403801003,1743,1180,9,1171,15,1156,450,377,170,...,0,0,0,0,0,0,0,0,8,5


... and use a for loop to select the rows of the equivalent sections.

In [105]:
for sec in list_sec_J16:
    row = df_eleccion_comp_J16.loc[df_eleccion_comp_J16['Sección'] == sec]
    # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
    dff = pd.concat([dff, row])



In [106]:
dff

Unnamed: 0,Sección,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,...,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
1767,022016061011403801003,1743,1180,9,1171,15,1156,450,377,170,...,,,,,,,,,,
1767,022016061011403801003,1743,1180,9,1171,15,1156,450,377,170,...,19944.0,18935.0,5166.0,3985.0,2004.0,1940.0,378.0,483.0,"{'PP': 450, 'PSOE': 377, 'Cs': 170, 'UP': 144,...","[('PP', 450), ('PSOE', 377), ('Cs', 170), ('UP..."
1770,022016061011403801006,673,448,0,448,1,447,190,121,77,...,19281.0,19901.0,5199.0,4306.0,2269.0,2247.0,371.0,392.0,"{'PP': 190, 'PSOE': 121, 'Cs': 77, 'UP': 51, '...","[('PP', 190), ('PSOE', 121), ('Cs', 77), ('UP'..."
1774,022016061011403802004,1050,678,12,666,5,661,200,252,103,...,17705.0,16493.0,4644.0,3522.0,1420.0,1343.0,494.0,595.0,"{'PP': 200, 'PSOE': 252, 'Cs': 103, 'UP': 96, ...","[('PSOE', 252), ('PP', 200), ('Cs', 103), ('UP..."
1783,022016061011403803003,710,472,4,468,2,466,179,194,54,...,22725.0,21997.0,5411.0,4386.0,2477.0,2413.0,397.0,430.0,"{'PP': 179, 'PSOE': 194, 'Cs': 54, 'UP': 33, '...","[('PSOE', 194), ('PP', 179), ('Cs', 54), ('UP'..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35160,022016061174619001034,1214,917,3,914,4,910,297,150,152,...,42246.0,41518.0,12075.0,11009.0,3697.0,3321.0,180.0,289.0,"{'PP': 297, 'PSOE': 150, 'Cs': 152, 'UP': 280,...","[('PP', 297), ('UP', 280), ('Cs', 152), ('PSOE..."
35163,022016061174619001037,1458,1190,4,1186,8,1178,488,138,243,...,50974.0,53220.0,13897.0,13206.0,4196.0,3544.0,170.0,264.0,"{'PP': 488, 'PSOE': 138, 'Cs': 243, 'UP': 283,...","[('PP', 488), ('UP', 283), ('Cs', 243), ('PSOE..."
35165,022016061174619001039,983,365,5,360,3,357,63,115,26,...,12087.0,11388.0,2253.0,1349.0,809.0,750.0,385.0,399.0,"{'PP': 63, 'PSOE': 115, 'Cs': 26, 'UP': 139, '...","[('UP', 139), ('PSOE', 115), ('PP', 63), ('Cs'..."
35169,022016061174619001043,997,786,3,783,3,780,286,105,226,...,34541.0,32552.0,13233.0,11589.0,842.0,863.0,175.0,307.0,"{'PP': 286, 'PSOE': 105, 'Cs': 226, 'UP': 141,...","[('PP', 286), ('Cs', 226), ('UP', 141), ('PSOE..."


... and we keep all the rows except the first, which we used to create the dummy.

In [107]:
secciones_select = dff.iloc[1:,]

In [108]:
secciones_select.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 76 entries, 1767 to 35171
Data columns (total 97 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Sección                           76 non-null     object 
 1   Censo_Esc                         76 non-null     object 
 2   Votos_Total                       76 non-null     object 
 3   Nulos                             76 non-null     object 
 4   Votos_Válidos                     76 non-null     object 
 5   Blanco                            76 non-null     object 
 6   V_Cand                            76 non-null     object 
 7   PP                                76 non-null     object 
 8   PSOE                              76 non-null     object 
 9   Cs                                76 non-null     object 
 10  UP                                76 non-null     object 
 11  IU                                76 non-null     object 
 12  VOX 

In this case, however, that is not necessary.
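
When the list of equivalent sections does contain repeats, one alternative to the loop is to index by label: `.loc` with a list returns one row per requested label, in the requested order, whereas `.isin()` returns each matching row once. A small sketch with invented section codes and counts (none of these names or values come from the notebook):

```python
import pandas as pd

# Toy stand-in for the 2016 results; codes and counts are invented.
df_demo = pd.DataFrame(
    {
        "Sección": ["A001", "A002", "A003", "A004"],
        "Censo_Esc": [1000, 800, 1200, 950],
        "PP": [300, 200, 350, 100],
    }
)

# Two 2019 sections map to the same 2016 section ("A002").
equivalentes = ["A003", "A002", "A002"]

# .isin() keeps each matching row once and in the frame's own order.
con_isin = df_demo.loc[df_demo["Sección"].isin(equivalentes)]

# Label-based lookup repeats rows and follows the order of the list.
con_loc = df_demo.set_index("Sección").loc[equivalentes].reset_index()

print(len(con_isin), len(con_loc))
print(con_loc["Sección"].tolist())
```

Order matters here because each section is a feature column of X: the 2016 sections must line up with the 2019 sections the pipeline was fitted on. The notebook appears to rely on both lists being in sorted code order for that alignment.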

In [109]:
secciones_select

Unnamed: 0,Sección,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,...,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
1767,022016061011403801003,1743,1180,9,1171,15,1156,450,377,170,...,19944.0,18935.0,5166.0,3985.0,2004.0,1940.0,378.0,483.0,"{'PP': 450, 'PSOE': 377, 'Cs': 170, 'UP': 144,...","[('PP', 450), ('PSOE', 377), ('Cs', 170), ('UP..."
1770,022016061011403801006,673,448,0,448,1,447,190,121,77,...,19281.0,19901.0,5199.0,4306.0,2269.0,2247.0,371.0,392.0,"{'PP': 190, 'PSOE': 121, 'Cs': 77, 'UP': 51, '...","[('PP', 190), ('PSOE', 121), ('Cs', 77), ('UP'..."
1774,022016061011403802004,1050,678,12,666,5,661,200,252,103,...,17705.0,16493.0,4644.0,3522.0,1420.0,1343.0,494.0,595.0,"{'PP': 200, 'PSOE': 252, 'Cs': 103, 'UP': 96, ...","[('PSOE', 252), ('PP', 200), ('Cs', 103), ('UP..."
1783,022016061011403803003,710,472,4,468,2,466,179,194,54,...,22725.0,21997.0,5411.0,4386.0,2477.0,2413.0,397.0,430.0,"{'PP': 179, 'PSOE': 194, 'Cs': 54, 'UP': 33, '...","[('PSOE', 194), ('PP', 179), ('Cs', 54), ('UP'..."
1784,022016061011403803004,1456,1056,11,1045,9,1036,612,178,154,...,29800.0,27120.0,7324.0,6148.0,2559.0,2416.0,300.0,310.0,"{'PP': 612, 'PSOE': 178, 'Cs': 154, 'UP': 81, ...","[('PP', 612), ('PSOE', 178), ('Cs', 154), ('UP..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35160,022016061174619001034,1214,917,3,914,4,910,297,150,152,...,42246.0,41518.0,12075.0,11009.0,3697.0,3321.0,180.0,289.0,"{'PP': 297, 'PSOE': 150, 'Cs': 152, 'UP': 280,...","[('PP', 297), ('UP', 280), ('Cs', 152), ('PSOE..."
35163,022016061174619001037,1458,1190,4,1186,8,1178,488,138,243,...,50974.0,53220.0,13897.0,13206.0,4196.0,3544.0,170.0,264.0,"{'PP': 488, 'PSOE': 138, 'Cs': 243, 'UP': 283,...","[('PP', 488), ('UP', 283), ('Cs', 243), ('PSOE..."
35165,022016061174619001039,983,365,5,360,3,357,63,115,26,...,12087.0,11388.0,2253.0,1349.0,809.0,750.0,385.0,399.0,"{'PP': 63, 'PSOE': 115, 'Cs': 26, 'UP': 139, '...","[('UP', 139), ('PSOE', 115), ('PP', 63), ('Cs'..."
35169,022016061174619001043,997,786,3,783,3,780,286,105,226,...,34541.0,32552.0,13233.0,11589.0,842.0,863.0,175.0,307.0,"{'PP': 286, 'PSOE': 105, 'Cs': 226, 'UP': 141,...","[('PP', 286), ('Cs', 226), ('UP', 141), ('PSOE..."


In [110]:
secciones_select_norm = secciones_select2.copy()

And now we simply normalize and transpose.

In [111]:
for col in secciones_select_norm.columns:
    if col not in set_cols:
        secciones_select_norm[col] = secciones_select_norm[col]/secciones_select_norm['Censo_Esc']

secciones_select_norm = secciones_select_norm.set_index('Sección')
secciones_select_norm = secciones_select_norm.drop('Censo_Esc', axis = 1)

secciones_select_norm = secciones_select_norm.T
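
The same normalize-and-transpose step can also be written without a loop, using `DataFrame.div` with `axis=0` to divide every vote column by the section's census. A sketch on toy data with invented values (column names follow the notebook; `toy`, `vote_cols` and `toy_norm` are introduced here for illustration):

```python
import pandas as pd

# Toy data with invented values; column names follow the notebook.
toy = pd.DataFrame(
    {
        "Sección": ["S1", "S2", "S3"],
        "Censo_Esc": [1000, 500, 2000],
        "Votos_Total": [700, 300, 1500],
        "PP": [200, 100, 600],
    }
)

toy = toy.set_index("Sección")
vote_cols = toy.columns.drop("Censo_Esc")

# Divide every vote column by the section's census, row by row.
toy_norm = toy[vote_cols].div(toy["Censo_Esc"], axis=0)

# Transpose so that rows are vote categories and columns are sections.
toy_norm = toy_norm.T
print(toy_norm)
```

After the transpose each section is a column, which is the layout the PCA pipeline expects: 30 rows of vote categories by 76 section features.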

In [112]:
secciones_select_norm

Sección,022016061011403801003,022016061011403801006,022016061011403802004,022016061011403803003,022016061011403803004,022016061011403803005,022016061011403803007,022016061011403804001,022016061011403805001,022016061011403805002,...,022016061174619001015,022016061174619001016,022016061174619001018,022016061174619001023,022016061174619001033,022016061174619001034,022016061174619001037,022016061174619001039,022016061174619001043,022016061174619001045
Votos_Total,0.676994,0.665676,0.645714,0.664789,0.725275,0.671994,0.699287,0.629556,0.746622,0.717105,...,0.724026,0.676839,0.662198,0.635762,0.391254,0.755354,0.816187,0.371312,0.788365,0.788679
Nulos,0.005164,0.0,0.011429,0.005634,0.007555,0.008339,0.004537,0.004639,0.006757,0.013158,...,0.00487,0.00626,0.006702,0.003311,0.002301,0.002471,0.002743,0.005086,0.003009,0.002516
Votos_Válidos,0.67183,0.665676,0.634286,0.659155,0.71772,0.663655,0.69475,0.624917,0.739865,0.703947,...,0.719156,0.670579,0.655496,0.63245,0.388953,0.752883,0.813443,0.366226,0.785356,0.786164
Blanco,0.008606,0.001486,0.004762,0.002817,0.006181,0.005559,0.00324,0.005964,0.010135,0.001316,...,0.008117,0.002347,0.004021,0.003311,0.005754,0.003295,0.005487,0.003052,0.003009,0.003774
V_Cand,0.663224,0.66419,0.629524,0.656338,0.711538,0.658096,0.69151,0.618953,0.72973,0.702632,...,0.711039,0.668232,0.651475,0.629139,0.383199,0.749588,0.807956,0.363174,0.782347,0.78239
PP,0.258176,0.282318,0.190476,0.252113,0.42033,0.234885,0.292936,0.247184,0.211149,0.143421,...,0.157468,0.119718,0.193029,0.107616,0.079402,0.244646,0.334705,0.06409,0.286861,0.188679
PSOE,0.216294,0.179792,0.24,0.273239,0.122253,0.165393,0.16267,0.15507,0.381757,0.331579,...,0.144481,0.197966,0.191689,0.225166,0.120829,0.123558,0.09465,0.116989,0.105316,0.116981
Cs,0.097533,0.114413,0.098095,0.076056,0.105769,0.131341,0.125729,0.117296,0.069257,0.040789,...,0.112825,0.092332,0.046917,0.033113,0.025316,0.125206,0.166667,0.02645,0.22668,0.223899
UP,0.082616,0.07578,0.091429,0.046479,0.055632,0.113968,0.09138,0.083499,0.050676,0.177632,...,0.260552,0.229264,0.198391,0.236755,0.132336,0.230643,0.194102,0.141404,0.141424,0.225157
IU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [113]:
secciones_select_norm.shape

(30, 76)

We can now apply the model; to do so we define the matrix X and the vector y.

In [114]:
secciones_select_norm['Modelización'] = modelizacion['Modelización']

In [115]:
X = secciones_select_norm.drop('Modelización', axis = 1).values
y = secciones_select_norm['Modelización'].values

We apply the pipeline model, without refitting it, of course.

In [116]:
predicciones = pipe_modelado.predict(X=X)
predicciones = predicciones.flatten()

In [117]:
r2 = r2_score(
            y_true  = y,
            y_pred  = predicciones
           )

The R2 we obtain is very good: 99.987%.

In [118]:
r2

0.9998686184816051
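
The key point of this step is that the pipeline is reused as fitted: the scaler statistics, the PCA directions and the regression coefficients all come from the 2019 data. A sketch of that fit-once, predict-elsewhere pattern on synthetic data follows. Everything in it is invented for illustration; the two "elections" are just two noisy matrices built from related profiles.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_rows, n_sections = 30, 76

# Per-section multipliers shared by both synthetic "elections".
section_effect = 1 + 0.1 * rng.standard_normal(n_sections)

def make_election(profile):
    """Build a (rows x sections) matrix of noisy copies of a national profile."""
    noise = 0.02 * rng.standard_normal((n_rows, n_sections))
    return profile[:, None] * section_effect[None, :] + noise

# First "election" and a perturbed second one.
profile_a = rng.random(n_rows)
profile_b = np.clip(profile_a + 0.05 * rng.standard_normal(n_rows), 0, 1)

X_a, X_b = make_election(profile_a), make_election(profile_b)

pipe = make_pipeline(StandardScaler(), PCA(n_components=10), LinearRegression())
pipe.fit(X_a, profile_a)      # fit once, on the first election only

pred_b = pipe.predict(X_b)    # no refit: only transform + predict
r2_b = r2_score(profile_b, pred_b)
print(round(r2_b, 4))
```

The transfer works in this sketch because the relation between sections and the national profile is the same in both synthetic elections; in the real data that stability is the assumption being tested.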

Now we show the results, starting by undoing the normalization.

In [119]:
est = predicciones * censo_mod

In [120]:
df = pd.DataFrame(est, index = secciones_select_norm.index, columns = ['Estimación']).astype('int32')

In [121]:
df

Unnamed: 0,Estimación
Votos_Total,24308874
Nulos,238563
Votos_Válidos,24068455
Blanco,187345
V_Cand,23879253
PP,7620332
PSOE,5566977
Cs,3233498
UP,5162874
IU,-1855


We add the column with the real 2016 data.

In [122]:
df1 = pd.DataFrame(secciones_mod.sum(), columns = ['Real']).drop('Censo_Esc')

In [123]:
df['Real'] = df1['Real']

In [124]:
df

Unnamed: 0,Estimación,Real
Votos_Total,24308874,24157982
Nulos,238563,224564
Votos_Válidos,24068455,23933418
Blanco,187345,178559
V_Cand,23879253,23754859
PP,7620332,7906761
PSOE,5566977,5424130
Cs,3233498,3123436
UP,5162874,5051345
IU,-1855,0


We add columns with the vote percentages to make the table easier to interpret.

In [125]:
df['porcentaje_Estimación'] = df['Estimación'] / df['Estimación'].loc['Votos_Válidos'] * 100

In [126]:
df['porcentaje_Real'] = df['Real'] / df['Real'].loc['Votos_Válidos'] * 100

In [127]:
df['dif_Real-Est.'] = df['porcentaje_Real'] - df['porcentaje_Estimación']

In [128]:
df

Unnamed: 0,Estimación,Real,porcentaje_Estimación,porcentaje_Real,dif_Real-Est.
Votos_Total,24308874,24157982,100.998897,100.938286,-0.06061
Nulos,238563,224564,0.991185,0.938286,-0.052899
Votos_Válidos,24068455,23933418,100.0,100.0,0.0
Blanco,187345,178559,0.778384,0.746066,-0.032318
V_Cand,23879253,23754859,99.213901,99.253934,0.040034
PP,7620332,7906761,31.661077,33.036489,1.375412
PSOE,5566977,5424130,23.129765,22.663416,-0.466349
Cs,3233498,3123436,13.434589,13.050522,-0.384067
UP,5162874,5051345,21.450791,21.105824,-0.344967
IU,-1855,0,-0.007707,0.0,0.007707


In [129]:
df.to_csv('modelacion_nacional_pca.csv')

We see that the fit is not quite as good as for November 2019, as expected, but it is still satisfactory overall. The only notable gap is for the PP, whose vote share is underestimated by about 1.4 percentage points.

We think this is a creditable result, given the almost minimalist way in which the sources of the selected sections were chosen: a handful of municipalities picked more or less at random, mostly in provinces with nationalist parties.

In [130]:
# To save the file to S3:

import logging

from botocore.exceptions import ClientError

s3_client = boto3.client(
    's3',
    aws_access_key_id='xxxxxxxxxxxxx',
    aws_secret_access_key='xxxxxxxxxxxxxxxxxxxxx',    
)

def upload_file(file_name, bucket, object_name=None):
    """Upload a file to a bucket
    :param file_name: file to upload
    :param bucket: bucket to upload to
    :param object_name: S3 object name, including the destination folder. If omitted, file_name is used
    :return: True if the file was uploaded, else False
    """

    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    #s3_client = boto3.client('s3')
    try:
        response = s3_client.upload_file(file_name, bucket, object_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True

In [131]:
upload_file('modelacion_nacional_pca.csv',
            'electomedia',
            object_name = "resultados_modelos/" + 'modelacion_nacional_pca.csv'
           )

True