# Modelling a territory with linear regression, without PCA

In this notebook we show an example of modelling a territory, here the province of Zaragoza, using electoral sections chosen from that same province. The modelling uses linear regression without PCA.

We first choose the sections using a single election, the one held in November 2019. We then take the chosen sections and use their counterparts from the 2016 election to check whether they also model the province of Zaragoza in that election.

## Modelling the November 2019 election

We start by loading the required libraries and the dataset for the November 2019 election.

In [None]:
import pandas as pd
import numpy as np
import random

In [None]:
# Read the identifier columns as strings so that leading zeros (e.g. '02') are preserved.
strings = {'Sección' : 'str', 'cod_ccaa' : 'str', 'cod_prov' : 'str', 'cod_mun' : 'str', 'cod_sec' : 'str'}

In [None]:
df_eleccion_comp = pd.read_csv('/content/drive/MyDrive/Proyecto_KeepCoding - Propio/Data/Gen-19-Nov/gen_N19_unif_cols_prov.txt', dtype = strings)
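The reason for passing a `dtype` mapping can be seen on a tiny in-memory CSV. This is only an illustrative sketch: the CSV text below is made up to mimic the code columns of the dataset, and it is read from a string rather than from the real file.

```python
import io
import pandas as pd

# Made-up CSV text that mimics the territorial code columns of the dataset.
csv_text = "cod_ccaa,cod_prov,cod_mun\n02,50,50001\n01,04,04001\n"

# Without a dtype mapping pandas infers integers, so leading zeros are lost.
as_default = pd.read_csv(io.StringIO(csv_text))

# With the mapping the codes stay as text, exactly as written in the file.
code_types = {'cod_ccaa': 'str', 'cod_prov': 'str', 'cod_mun': 'str'}
as_text = pd.read_csv(io.StringIO(csv_text), dtype=code_types)

print(as_default['cod_mun'].tolist())  # integer codes
print(as_text['cod_mun'].tolist())     # string codes with leading zeros kept
```

Keeping the codes as strings matters whenever they are later compared or concatenated, because `'04001'` and `4001` no longer match once the zero is dropped.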

In [None]:
df_eleccion_comp

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,Participación,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros,...,30-34,35-39,40-44,45-49,50-54,55-59,60-64,65-69,70-74,75-79,80-84,85-89,90-94,95-99,100 y más,Población Total,Hombres,Mujeres,% mayores 65 años,% 20-64 años,% menores 19 años,Afiliados SS Minicipio,% Afiliados SS autónomos,% Afiliados SS / Población,Paro Registrado Municipio,% Paro Hombres,% Paro mayores 45,% Paro s/ Afiliados SS Municipio,Renta persona 2017,Renta persona 2015,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
0,022019111010400101001,01,04,04001,0400101001,Andalucía,Almería,Abla,1002,717,0.715569,7,710,3,707,193,310,47,30,0,122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,...,73.0,80.0,89.0,81.0,94.0,87.0,91.0,77.0,72.0,42.0,67.0,56.0,19.0,4.0,0.0,1249.0,635.0,614.0,0.269816,0.590072,0.140112,304.0,0.223684,0.243395,140.0,0.421429,0.550000,0.315315,9159.0,8788.0,20172.0,19546.0,5574.0,4833.0,3286.0,3082.0,403.0,471.0,"{'PP': 193, 'PSOE': 310, 'Cs': 47, 'UP': 30, '...","[('PSOE', 310), ('PP', 193), ('VOX', 122), ('C..."
1,022019111010400201001,01,04,04002,0400201001,Andalucía,Almería,Abrucena,1013,711,0.701876,12,699,1,698,111,349,45,42,0,147,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,...,60.0,75.0,70.0,70.0,108.0,101.0,99.0,86.0,61.0,64.0,61.0,46.0,14.0,2.0,1.0,1202.0,637.0,565.0,0.278702,0.609817,0.111481,298.0,0.251678,0.247920,179.0,0.379888,0.625698,0.375262,8827.0,8107.0,17841.0,17115.0,4640.0,4048.0,3418.0,2770.0,568.0,620.0,"{'PP': 111, 'PSOE': 349, 'Cs': 45, 'UP': 42, '...","[('PSOE', 349), ('VOX', 147), ('PP', 111), ('C..."
2,022019111010400301001,01,04,04003,0400301001,Andalucía,Almería,Adra,667,484,0.725637,7,477,5,472,176,128,15,34,0,116,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,...,54.0,54.0,54.0,61.0,82.0,75.0,67.0,48.0,37.0,40.0,26.0,15.0,3.0,1.0,0.0,892.0,435.0,457.0,0.190583,0.643498,0.165919,7968.0,0.382530,8.932735,2525.0,0.432871,0.473663,0.240637,8965.0,8267.0,26498.0,24688.0,5121.0,4795.0,2499.0,2301.0,337.0,333.0,"{'PP': 176, 'PSOE': 128, 'Cs': 15, 'UP': 34, '...","[('PP', 176), ('PSOE', 128), ('VOX', 116), ('U..."
3,022019111010400301002,01,04,04003,0400301002,Andalucía,Almería,Adra,1306,909,0.696018,3,906,5,901,251,220,51,58,0,312,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,3,...,108.0,158.0,162.0,150.0,140.0,119.0,103.0,67.0,49.0,37.0,30.0,14.0,7.0,1.0,0.0,1752.0,865.0,887.0,0.117009,0.647260,0.235731,7968.0,0.382530,4.547945,2525.0,0.432871,0.473663,0.240637,8599.0,7941.0,25677.0,23400.0,5381.0,4837.0,1815.0,1724.0,343.0,464.0,"{'PP': 251, 'PSOE': 220, 'Cs': 51, 'UP': 58, '...","[('VOX', 312), ('PP', 251), ('PSOE', 220), ('U..."
4,022019111010400301003,01,04,04003,0400301003,Andalucía,Almería,Adra,1551,975,0.628627,12,963,9,954,292,202,73,52,0,327,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,3,...,189.0,178.0,215.0,227.0,164.0,110.0,96.0,61.0,58.0,41.0,40.0,27.0,4.0,4.0,0.0,2240.0,1094.0,1146.0,0.104911,0.647768,0.247321,7968.0,0.382530,3.557143,2525.0,0.432871,0.473663,0.240637,8076.0,7150.0,22051.0,19687.0,5224.0,4044.0,1170.0,1198.0,416.0,476.0,"{'PP': 292, 'PSOE': 202, 'Cs': 73, 'UP': 52, '...","[('VOX', 327), ('PP', 292), ('PSOE', 202), ('C..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36297,022019111195200108011,19,52,52001,5200108011,Melilla,Melilla,Melilla,1638,1021,0.623321,3,1018,11,1007,303,140,30,28,0,158,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,348,...,181.0,185.0,171.0,164.0,165.0,180.0,155.0,97.0,38.0,34.0,19.0,16.0,4.0,3.0,0.0,2480.0,1244.0,1236.0,0.085081,0.623387,0.291532,23931.0,0.190548,9.649597,12737.0,0.366177,0.403627,0.347360,16433.0,15847.0,66352.0,62632.0,11378.0,11119.0,1508.0,1274.0,167.0,166.0,"{'PP': 303, 'PSOE': 140, 'Cs': 30, 'UP': 28, '...","[('Otros', 348), ('PP', 303), ('VOX', 158), ('..."
36298,022019111195200108012,19,52,52001,5200108012,Melilla,Melilla,Melilla,1676,1057,0.630668,9,1048,2,1046,463,205,36,35,0,210,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,97,...,160.0,175.0,184.0,162.0,188.0,162.0,147.0,106.0,99.0,67.0,49.0,38.0,8.0,2.0,0.0,2334.0,1173.0,1161.0,0.158098,0.612682,0.229220,23931.0,0.190548,10.253213,12737.0,0.366177,0.403627,0.347360,17350.0,16792.0,50730.0,50839.0,13272.0,13038.0,2763.0,2445.0,169.0,177.0,"{'PP': 463, 'PSOE': 205, 'Cs': 36, 'UP': 35, '...","[('PP', 463), ('VOX', 210), ('PSOE', 205), ('O..."
36299,022019111195200108013,19,52,52001,5200108013,Melilla,Melilla,Melilla,1132,638,0.563604,5,633,4,629,208,113,31,25,0,144,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,108,...,179.0,172.0,123.0,117.0,123.0,127.0,113.0,68.0,44.0,24.0,23.0,17.0,2.0,1.0,0.0,1828.0,976.0,852.0,0.097921,0.663567,0.238512,23931.0,0.190548,13.091357,12737.0,0.366177,0.403627,0.347360,12553.0,11823.0,37816.0,36729.0,10102.0,9640.0,1807.0,1615.0,234.0,252.0,"{'PP': 208, 'PSOE': 113, 'Cs': 31, 'UP': 25, '...","[('PP', 208), ('VOX', 144), ('PSOE', 113), ('O..."
36300,022019111195200108014,19,52,52001,5200108014,Melilla,Melilla,Melilla,899,527,0.586207,4,523,0,523,200,87,13,12,0,126,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,85,...,115.0,124.0,81.0,59.0,69.0,70.0,90.0,71.0,59.0,42.0,25.0,8.0,1.0,0.0,0.0,1298.0,634.0,664.0,0.158706,0.577042,0.264253,23931.0,0.190548,18.436826,12737.0,0.366177,0.403627,0.347360,8906.0,8937.0,29898.0,31384.0,5923.0,6061.0,2463.0,2136.0,244.0,284.0,"{'PP': 200, 'PSOE': 87, 'Cs': 13, 'UP': 12, 'I...","[('PP', 200), ('VOX', 126), ('PSOE', 87), ('Ot..."


First we specify the territory we want to model, in this case the province of Zaragoza. We leave the CCAA (autonomous community) and municipality options empty. The filters are applied together, so they must be consistent with one another: if we chose a municipality, it would have to belong to the province of Zaragoza.

In [None]:
ccaa_mod = []

provincia_mod = ['Zaragoza']

municipio_mod = []

secciones_mod = df_eleccion_comp

In [None]:
# Apply each non-empty filter in turn; an empty list leaves the data unrestricted.
if len(ccaa_mod) > 0:
  secciones_mod = secciones_mod.loc[secciones_mod['CCAA'].isin(ccaa_mod)]

if len(provincia_mod) > 0:
  secciones_mod = secciones_mod.loc[secciones_mod['Provincia'].isin(provincia_mod)]

if len(municipio_mod) > 0:
  secciones_mod = secciones_mod.loc[secciones_mod['Municipio'].isin(municipio_mod)]
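Because the same three filters are applied again later when selecting candidate sections, the logic could be wrapped in a small function. The sketch below is one possible way to do so; `filter_territory` is a hypothetical helper that does not exist in the notebook, and the toy DataFrame contains invented rows that only reuse the real column names.

```python
import pandas as pd

def filter_territory(df, ccaa=(), provincia=(), municipio=()):
    """Keep the rows that match every non-empty filter (hypothetical helper)."""
    filters = {'CCAA': ccaa, 'Provincia': provincia, 'Municipio': municipio}
    for column, values in filters.items():
        if len(values) > 0:  # an empty filter means "no restriction"
            df = df.loc[df[column].isin(values)]
    return df

# Toy data standing in for df_eleccion_comp (rows are invented).
toy = pd.DataFrame({
    'CCAA': ['Aragón', 'Aragón', 'Andalucía'],
    'Provincia': ['Zaragoza', 'Huesca', 'Almería'],
    'Municipio': ['Zuera', 'Jaca', 'Adra'],
    'Censo_Esc': [610, 900, 667],
})

toy_zaragoza = filter_territory(toy, provincia=['Zaragoza'])

# Inconsistent filters (a municipality outside the province) leave nothing.
toy_mismatch = filter_territory(toy, provincia=['Zaragoza'], municipio=['Adra'])
```

The second call illustrates why the options must agree with one another: the filters are combined with a logical AND, so a municipality that lies outside the chosen province yields an empty result.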



We see that the province of Zaragoza has 880 electoral sections.

In [None]:
secciones_mod

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,Participación,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros,...,30-34,35-39,40-44,45-49,50-54,55-59,60-64,65-69,70-74,75-79,80-84,85-89,90-94,95-99,100 y más,Población Total,Hombres,Mujeres,% mayores 65 años,% 20-64 años,% menores 19 años,Afiliados SS Minicipio,% Afiliados SS autónomos,% Afiliados SS / Población,Paro Registrado Municipio,% Paro Hombres,% Paro mayores 45,% Paro s/ Afiliados SS Municipio,Renta persona 2017,Renta persona 2015,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
6553,022019111025000101001,02,50,50001,5000101001,Aragón,Zaragoza,Abanto,89,68,0.764045,0,68,0,68,42,13,1,0,0,10,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,3.0,5.0,5.0,7.0,7.0,10.0,10.0,7.0,11.0,5.0,4.0,8.0,4.0,0.0,1.0,100.0,59.0,41.0,0.400000,0.510000,0.090000,16.0,0.750000,0.160000,0.0,0.500000,0.350000,0.000000,11234.267197,11184.000000,28322.021999,21149.000000,7855.336603,5134.000000,3217.875711,4987.000000,293.331625,139.000000,"{'PP': 42, 'PSOE': 13, 'Cs': 1, 'UP': 0, 'IU':...","[('PP', 42), ('PSOE', 13), ('VOX', 10), ('MP',..."
6554,022019111025000201001,02,50,50002,5000201001,Aragón,Zaragoza,Acered,125,91,0.728000,5,86,0,86,43,19,4,0,0,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,6.0,7.0,12.0,13.0,20.0,6.0,6.0,17.0,14.0,12.0,14.0,12.0,2.0,1.0,0.0,166.0,98.0,68.0,0.433735,0.487952,0.078313,43.0,0.558140,0.259036,10.0,0.500000,0.800000,0.188679,9448.000000,9665.000000,18895.000000,20525.000000,3494.000000,2873.000000,4611.000000,3968.000000,84.000000,233.000000,"{'PP': 43, 'PSOE': 19, 'Cs': 4, 'UP': 0, 'IU':...","[('PP', 43), ('VOX', 20), ('PSOE', 19), ('Cs',..."
6555,022019111025000301001,02,50,50003,5000301001,Aragón,Zaragoza,Agón,117,89,0.760684,0,89,1,88,23,39,2,2,0,20,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,6.0,6.0,14.0,12.0,12.0,10.0,8.0,11.0,15.0,10.0,6.0,4.0,1.0,0.0,0.0,137.0,80.0,57.0,0.343066,0.532847,0.124088,29.0,0.379310,0.211679,7.0,0.285714,0.714286,0.194444,12298.000000,12334.000000,27578.000000,27753.000000,5804.000000,5694.000000,5604.000000,5250.000000,161.000000,247.000000,"{'PP': 23, 'PSOE': 39, 'Cs': 2, 'UP': 2, 'IU':...","[('PSOE', 39), ('PP', 23), ('VOX', 20), ('Cs',..."
6556,022019111025000401001,02,50,50004,5000401001,Aragón,Zaragoza,Aguarón,475,360,0.757895,4,356,2,354,96,155,17,19,0,44,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,...,36.0,27.0,47.0,49.0,56.0,69.0,34.0,32.0,32.0,47.0,34.0,31.0,11.0,1.0,0.0,636.0,337.0,299.0,0.295597,0.589623,0.114780,149.0,0.469799,0.234277,21.0,0.285714,0.714286,0.123529,11280.000000,10229.000000,25421.000000,23879.000000,7039.000000,6056.000000,3502.000000,3246.000000,208.000000,253.000000,"{'PP': 96, 'PSOE': 155, 'Cs': 17, 'UP': 19, 'I...","[('PSOE', 155), ('PP', 96), ('VOX', 44), ('MP'..."
6557,022019111025000501001,02,50,50005,5000501001,Aragón,Zaragoza,Aguilón,228,185,0.811404,1,184,2,182,84,34,13,12,0,35,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,...,7.0,11.0,20.0,13.0,13.0,24.0,31.0,25.0,17.0,18.0,12.0,19.0,4.0,5.0,0.0,251.0,136.0,115.0,0.398406,0.549801,0.051793,21.0,0.571429,0.083665,5.0,0.200000,0.600000,0.192308,14168.000000,13341.000000,31410.000000,29687.000000,8651.000000,8019.000000,5616.000000,4816.000000,108.000000,191.000000,"{'PP': 84, 'PSOE': 34, 'Cs': 13, 'UP': 12, 'IU...","[('PP', 84), ('VOX', 35), ('PSOE', 34), ('Cs',..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7428,022019111025029802001,02,50,50298,5029802001,Aragón,Zaragoza,Zuera,610,482,0.790164,2,480,3,477,134,139,45,50,0,82,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,...,24.0,44.0,91.0,56.0,63.0,55.0,47.0,42.0,48.0,37.0,20.0,19.0,11.0,4.0,0.0,762.0,400.0,362.0,0.237533,0.570866,0.191601,5632.0,0.097656,7.391076,442.0,0.384615,0.459276,0.072769,12085.000000,12273.000000,31542.000000,31419.000000,9774.000000,8326.000000,3118.000000,3365.000000,213.000000,395.000000,"{'PP': 134, 'PSOE': 139, 'Cs': 45, 'UP': 50, '...","[('PSOE', 139), ('PP', 134), ('VOX', 82), ('UP..."
7429,022019111025090101001,02,50,50901,5090101001,Aragón,Zaragoza,Biel,133,96,0.721805,0,96,0,96,18,33,7,8,0,21,0,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,...,6.0,4.0,8.0,10.0,16.0,19.0,10.0,19.0,17.0,7.0,7.0,7.0,3.0,1.0,0.0,154.0,96.0,58.0,0.396104,0.571429,0.032468,38.0,0.289474,0.246753,6.0,0.000000,0.666667,0.136364,16414.000000,16613.000000,25367.000000,26506.000000,13108.000000,9636.000000,7146.000000,7398.000000,145.000000,214.000000,"{'PP': 18, 'PSOE': 33, 'Cs': 7, 'UP': 8, 'IU':...","[('PSOE', 33), ('VOX', 21), ('PP', 18), ('UP',..."
7430,022019111025090201001,02,50,50902,5090201001,Aragón,Zaragoza,Marracos,77,65,0.844156,3,62,0,62,29,15,4,3,0,10,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,5.0,1.0,9.0,11.0,4.0,6.0,5.0,5.0,8.0,10.0,6.0,0.0,1.0,1.0,0.0,86.0,53.0,33.0,0.360465,0.534884,0.104651,9.0,0.777778,0.104651,1.0,0.000000,0.000000,0.100000,11234.267197,10618.182737,28322.021999,26938.114416,7855.336603,6845.948425,3217.875711,2985.302533,293.331625,347.217589,"{'PP': 29, 'PSOE': 15, 'Cs': 4, 'UP': 3, 'IU':...","[('PP', 29), ('PSOE', 15), ('VOX', 10), ('Cs',..."
7431,022019111025090301001,02,50,50903,5090301001,Aragón,Zaragoza,Villamayor de Gállego,1143,844,0.738408,5,839,10,829,160,226,64,133,0,160,0,65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,17,...,71.0,67.0,88.0,106.0,128.0,111.0,107.0,79.0,69.0,52.0,46.0,50.0,14.0,5.0,1.0,1368.0,682.0,686.0,0.230994,0.603070,0.165936,723.0,0.300138,0.528509,131.0,0.404580,0.557252,0.153396,13087.000000,12022.000000,34050.000000,31945.000000,9707.000000,8721.000000,3872.000000,3239.000000,162.000000,287.000000,"{'PP': 160, 'PSOE': 226, 'Cs': 64, 'UP': 133, ...","[('PSOE', 226), ('PP', 160), ('VOX', 160), ('U..."


We only want to model the election results, so we keep only those columns.

In [None]:
secciones_mod_lista = list(secciones_mod['Sección']) 

In [None]:
cols_validas_mod = ['Censo_Esc', 'Votos_Total', 'Nulos', 'Votos_Válidos', 'Blanco', 'V_Cand', 'PP', 'PSOE', 'Cs', 'UP',
       'IU', 'VOX', 'UPyD', 'MP', 'CiU', 'ERC', 'JxC', 'CUP', 'DiL', 'PNV',
       'Bildu', 'Amaiur', 'CC', 'FA', 'TE', 'BNG', 'PRC', 'GBai', 'Compromis',
       'PACMA', 'Otros']

In [None]:
secciones_mod = secciones_mod[cols_validas_mod]

In [None]:
secciones_mod

Unnamed: 0,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros
6553,89,68,0,68,0,68,42,13,1,0,0,10,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6554,125,91,5,86,0,86,43,19,4,0,0,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6555,117,89,0,89,1,88,23,39,2,2,0,20,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6556,475,360,4,356,2,354,96,155,17,19,0,44,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
6557,228,185,1,184,2,182,84,34,13,12,0,35,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7428,610,482,2,480,3,477,134,139,45,50,0,82,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13
7429,133,96,0,96,0,96,18,33,7,8,0,21,0,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
7430,77,65,3,62,0,62,29,15,4,3,0,10,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7431,1143,844,5,839,10,829,160,226,64,133,0,160,0,65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,17


Next we need the aggregate election results of the territory we want to model. We first store the territory's census, then build a DataFrame with the summed results and finally, which is very important, normalise those results by dividing by the census, so that the size of the territory being modelled does not matter.

In [None]:
censo_mod = secciones_mod['Censo_Esc'].sum()

In [None]:
modelizacion = pd.DataFrame(secciones_mod.sum(), columns = ['Modelización'])

In [None]:
# Divide every total by the territory's census so the values become shares of the census.
modelizacion['Modelización'] = modelizacion['Modelización'] / modelizacion.loc['Censo_Esc', 'Modelización']

We obtain a single-column DataFrame with the election results normalised by the census.

In [None]:
modelizacion

Unnamed: 0,Modelización
Censo_Esc,1.0
Votos_Total,0.719466
Nulos,0.006076
Votos_Válidos,0.713389
Blanco,0.006958
V_Cand,0.706431
PP,0.166932
PSOE,0.220048
Cs,0.065202
UP,0.080233


The first row is always 1, since it is the census divided by itself, so we can drop it.

In [None]:
modelizacion = modelizacion.drop(['Censo_Esc']) 

In [None]:
modelizacion

Unnamed: 0,Modelización
Votos_Total,0.719466
Nulos,0.006076
Votos_Válidos,0.713389
Blanco,0.006958
V_Cand,0.706431
PP,0.166932
PSOE,0.220048
Cs,0.065202
UP,0.080233
IU,0.0


In [None]:
modelizacion.shape

(30, 1)
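The aggregation steps above (sum the sections, divide by the summed census, drop the census row) can be condensed into a single function. This is a minimal sketch under stated assumptions: `normalised_totals` is a hypothetical name, and the two toy sections are invented so that the arithmetic is easy to follow.

```python
import pandas as pd

def normalised_totals(df, census_col='Censo_Esc'):
    """Sum every column and divide by the summed census (hypothetical helper)."""
    totals = df.sum()
    shares = totals / totals[census_col]
    # The census row would always be 1, so it is dropped.
    return shares.drop(census_col).to_frame('Modelización')

# Two invented sections of very different size.
toy = pd.DataFrame({
    'Censo_Esc': [100, 900],
    'Votos_Total': [80, 620],
    'PP': [40, 160],
    'PSOE': [30, 270],
})

toy_model = normalised_totals(toy)
print(toy_model)
```

In this toy case PP obtains 200 votes out of a census of 1000, a share of 0.2. That is not the plain average of the two per-section shares (40/100 and 160/900): summing before dividing weights each section by its census, which is the behaviour wanted for the territory as a whole.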

Now we must look for the sections that will model the province of Zaragoza which, as mentioned, come from the province itself.

In [None]:
ccaa_select = []

provincia_select = ['Zaragoza']

municipio_select = []

secciones_select = df_eleccion_comp

In [None]:
# Same filtering logic as before, now for the pool of candidate sections.
if len(ccaa_select) > 0:
  secciones_select = secciones_select.loc[secciones_select['CCAA'].isin(ccaa_select)]

if len(provincia_select) > 0:
  secciones_select = secciones_select.loc[secciones_select['Provincia'].isin(provincia_select)]

if len(municipio_select) > 0:
  secciones_select = secciones_select.loc[secciones_select['Municipio'].isin(municipio_select)]



In [None]:
secciones_select

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,Participación,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros,...,30-34,35-39,40-44,45-49,50-54,55-59,60-64,65-69,70-74,75-79,80-84,85-89,90-94,95-99,100 y más,Población Total,Hombres,Mujeres,% mayores 65 años,% 20-64 años,% menores 19 años,Afiliados SS Minicipio,% Afiliados SS autónomos,% Afiliados SS / Población,Paro Registrado Municipio,% Paro Hombres,% Paro mayores 45,% Paro s/ Afiliados SS Municipio,Renta persona 2017,Renta persona 2015,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
6553,022019111025000101001,02,50,50001,5000101001,Aragón,Zaragoza,Abanto,89,68,0.764045,0,68,0,68,42,13,1,0,0,10,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,3.0,5.0,5.0,7.0,7.0,10.0,10.0,7.0,11.0,5.0,4.0,8.0,4.0,0.0,1.0,100.0,59.0,41.0,0.400000,0.510000,0.090000,16.0,0.750000,0.160000,0.0,0.500000,0.350000,0.000000,11234.267197,11184.000000,28322.021999,21149.000000,7855.336603,5134.000000,3217.875711,4987.000000,293.331625,139.000000,"{'PP': 42, 'PSOE': 13, 'Cs': 1, 'UP': 0, 'IU':...","[('PP', 42), ('PSOE', 13), ('VOX', 10), ('MP',..."
6554,022019111025000201001,02,50,50002,5000201001,Aragón,Zaragoza,Acered,125,91,0.728000,5,86,0,86,43,19,4,0,0,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,6.0,7.0,12.0,13.0,20.0,6.0,6.0,17.0,14.0,12.0,14.0,12.0,2.0,1.0,0.0,166.0,98.0,68.0,0.433735,0.487952,0.078313,43.0,0.558140,0.259036,10.0,0.500000,0.800000,0.188679,9448.000000,9665.000000,18895.000000,20525.000000,3494.000000,2873.000000,4611.000000,3968.000000,84.000000,233.000000,"{'PP': 43, 'PSOE': 19, 'Cs': 4, 'UP': 0, 'IU':...","[('PP', 43), ('VOX', 20), ('PSOE', 19), ('Cs',..."
6555,022019111025000301001,02,50,50003,5000301001,Aragón,Zaragoza,Agón,117,89,0.760684,0,89,1,88,23,39,2,2,0,20,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,6.0,6.0,14.0,12.0,12.0,10.0,8.0,11.0,15.0,10.0,6.0,4.0,1.0,0.0,0.0,137.0,80.0,57.0,0.343066,0.532847,0.124088,29.0,0.379310,0.211679,7.0,0.285714,0.714286,0.194444,12298.000000,12334.000000,27578.000000,27753.000000,5804.000000,5694.000000,5604.000000,5250.000000,161.000000,247.000000,"{'PP': 23, 'PSOE': 39, 'Cs': 2, 'UP': 2, 'IU':...","[('PSOE', 39), ('PP', 23), ('VOX', 20), ('Cs',..."
6556,022019111025000401001,02,50,50004,5000401001,Aragón,Zaragoza,Aguarón,475,360,0.757895,4,356,2,354,96,155,17,19,0,44,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,...,36.0,27.0,47.0,49.0,56.0,69.0,34.0,32.0,32.0,47.0,34.0,31.0,11.0,1.0,0.0,636.0,337.0,299.0,0.295597,0.589623,0.114780,149.0,0.469799,0.234277,21.0,0.285714,0.714286,0.123529,11280.000000,10229.000000,25421.000000,23879.000000,7039.000000,6056.000000,3502.000000,3246.000000,208.000000,253.000000,"{'PP': 96, 'PSOE': 155, 'Cs': 17, 'UP': 19, 'I...","[('PSOE', 155), ('PP', 96), ('VOX', 44), ('MP'..."
6557,022019111025000501001,02,50,50005,5000501001,Aragón,Zaragoza,Aguilón,228,185,0.811404,1,184,2,182,84,34,13,12,0,35,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,...,7.0,11.0,20.0,13.0,13.0,24.0,31.0,25.0,17.0,18.0,12.0,19.0,4.0,5.0,0.0,251.0,136.0,115.0,0.398406,0.549801,0.051793,21.0,0.571429,0.083665,5.0,0.200000,0.600000,0.192308,14168.000000,13341.000000,31410.000000,29687.000000,8651.000000,8019.000000,5616.000000,4816.000000,108.000000,191.000000,"{'PP': 84, 'PSOE': 34, 'Cs': 13, 'UP': 12, 'IU...","[('PP', 84), ('VOX', 35), ('PSOE', 34), ('Cs',..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7428,022019111025029802001,02,50,50298,5029802001,Aragón,Zaragoza,Zuera,610,482,0.790164,2,480,3,477,134,139,45,50,0,82,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,...,24.0,44.0,91.0,56.0,63.0,55.0,47.0,42.0,48.0,37.0,20.0,19.0,11.0,4.0,0.0,762.0,400.0,362.0,0.237533,0.570866,0.191601,5632.0,0.097656,7.391076,442.0,0.384615,0.459276,0.072769,12085.000000,12273.000000,31542.000000,31419.000000,9774.000000,8326.000000,3118.000000,3365.000000,213.000000,395.000000,"{'PP': 134, 'PSOE': 139, 'Cs': 45, 'UP': 50, '...","[('PSOE', 139), ('PP', 134), ('VOX', 82), ('UP..."
7429,022019111025090101001,02,50,50901,5090101001,Aragón,Zaragoza,Biel,133,96,0.721805,0,96,0,96,18,33,7,8,0,21,0,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,...,6.0,4.0,8.0,10.0,16.0,19.0,10.0,19.0,17.0,7.0,7.0,7.0,3.0,1.0,0.0,154.0,96.0,58.0,0.396104,0.571429,0.032468,38.0,0.289474,0.246753,6.0,0.000000,0.666667,0.136364,16414.000000,16613.000000,25367.000000,26506.000000,13108.000000,9636.000000,7146.000000,7398.000000,145.000000,214.000000,"{'PP': 18, 'PSOE': 33, 'Cs': 7, 'UP': 8, 'IU':...","[('PSOE', 33), ('VOX', 21), ('PP', 18), ('UP',..."
7430,022019111025090201001,02,50,50902,5090201001,Aragón,Zaragoza,Marracos,77,65,0.844156,3,62,0,62,29,15,4,3,0,10,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,5.0,1.0,9.0,11.0,4.0,6.0,5.0,5.0,8.0,10.0,6.0,0.0,1.0,1.0,0.0,86.0,53.0,33.0,0.360465,0.534884,0.104651,9.0,0.777778,0.104651,1.0,0.000000,0.000000,0.100000,11234.267197,10618.182737,28322.021999,26938.114416,7855.336603,6845.948425,3217.875711,2985.302533,293.331625,347.217589,"{'PP': 29, 'PSOE': 15, 'Cs': 4, 'UP': 3, 'IU':...","[('PP', 29), ('PSOE', 15), ('VOX', 10), ('Cs',..."
7431,022019111025090301001,02,50,50903,5090301001,Aragón,Zaragoza,Villamayor de Gállego,1143,844,0.738408,5,839,10,829,160,226,64,133,0,160,0,65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,17,...,71.0,67.0,88.0,106.0,128.0,111.0,107.0,79.0,69.0,52.0,46.0,50.0,14.0,5.0,1.0,1368.0,682.0,686.0,0.230994,0.603070,0.165936,723.0,0.300138,0.528509,131.0,0.404580,0.557252,0.153396,13087.000000,12022.000000,34050.000000,31945.000000,9707.000000,8721.000000,3872.000000,3239.000000,162.000000,287.000000,"{'PP': 160, 'PSOE': 226, 'Cs': 64, 'UP': 133, ...","[('PSOE', 226), ('PP', 160), ('VOX', 160), ('U..."


We now make a somewhat arbitrary decision: we keep only the sections with more than 500 registered voters, because we would rather not depend on sections that are too small, where purely local factors can sway the result. That leaves 661 of the 880 sections, which is not a large reduction.

In [None]:
secciones_select = secciones_select.loc[secciones_select['Censo_Esc'] > 500]
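Since the 500-voter cut-off is a judgment call, it may help to see how the number of sections and the share of the census that survive change with the threshold. The sketch below is one way to tabulate that; `threshold_summary` is a hypothetical helper and the census values in the toy frame are illustrative, not the real distribution.

```python
import pandas as pd

def threshold_summary(df, thresholds, census_col='Censo_Esc'):
    """For each threshold, count the sections kept and the census share they hold."""
    total_census = df[census_col].sum()
    rows = []
    for t in thresholds:
        kept = df.loc[df[census_col] > t]  # same strict comparison as in the notebook
        rows.append({
            'threshold': t,
            'sections_kept': len(kept),
            'census_share': kept[census_col].sum() / total_census,
        })
    return pd.DataFrame(rows).set_index('threshold')

# Illustrative census sizes; the real input would be the filtered sections.
toy = pd.DataFrame({'Censo_Esc': [89, 125, 475, 610, 1143, 1558]})

summary = threshold_summary(toy, thresholds=[0, 500, 1000])
print(summary)
```

In this toy case, half of the sections fall below 500 voters but they hold well under a fifth of the census, which is the kind of evidence that would support dropping them.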

In [None]:
secciones_select

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,Participación,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros,...,30-34,35-39,40-44,45-49,50-54,55-59,60-64,65-69,70-74,75-79,80-84,85-89,90-94,95-99,100 y más,Población Total,Hombres,Mujeres,% mayores 65 años,% 20-64 años,% menores 19 años,Afiliados SS Minicipio,% Afiliados SS autónomos,% Afiliados SS / Población,Paro Registrado Municipio,% Paro Hombres,% Paro mayores 45,% Paro s/ Afiliados SS Municipio,Renta persona 2017,Renta persona 2015,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
6558,022019111025000601001,02,50,50006,5000601001,Aragón,Zaragoza,Ainzón,913,670,0.733844,16,654,13,641,140,282,44,59,0,91,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,...,64.0,52.0,68.0,75.0,98.0,94.0,80.0,55.0,44.0,51.0,43.0,50.0,20.0,7.0,1.0,1077.0,542.0,535.0,0.251625,0.591458,0.156917,212.0,0.495283,0.196843,64.0,0.421875,0.578125,0.231884,11458.0,10682.0,28032.0,26577.0,8319.0,6628.0,3506.0,3683.0,261.0,258.0,"{'PP': 140, 'PSOE': 282, 'Cs': 44, 'UP': 59, '...","[('PSOE', 282), ('PP', 140), ('VOX', 91), ('UP..."
6560,022019111025000801001,02,50,50008,5000801001,Aragón,Zaragoza,Alagón,882,484,0.548753,3,481,10,471,65,162,46,72,0,91,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,4,...,66.0,76.0,104.0,104.0,96.0,99.0,98.0,85.0,56.0,43.0,16.0,7.0,1.0,0.0,0.0,1173.0,580.0,593.0,0.177323,0.638534,0.184143,2801.0,0.159229,2.387894,527.0,0.436433,0.531309,0.158353,11727.0,11251.0,30871.0,29924.0,9155.0,7807.0,2856.0,2812.0,271.0,403.0,"{'PP': 65, 'PSOE': 162, 'Cs': 46, 'UP': 72, 'I...","[('PSOE', 162), ('VOX', 91), ('UP', 72), ('PP'..."
6561,022019111025000801002,02,50,50008,5000801002,Aragón,Zaragoza,Alagón,1353,856,0.632668,9,847,13,834,127,313,58,151,0,148,0,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,7,...,124.0,152.0,134.0,151.0,151.0,138.0,86.0,72.0,78.0,65.0,64.0,63.0,36.0,6.0,0.0,1858.0,927.0,931.0,0.206674,0.610334,0.182992,2801.0,0.159229,1.507535,527.0,0.436433,0.531309,0.158353,11113.0,10578.0,28145.0,26651.0,9036.0,7514.0,2858.0,2896.0,250.0,328.0,"{'PP': 127, 'PSOE': 313, 'Cs': 58, 'UP': 151, ...","[('PSOE', 313), ('UP', 151), ('VOX', 148), ('P..."
6562,022019111025000801003,02,50,50008,5000801003,Aragón,Zaragoza,Alagón,1758,1138,0.647327,25,1113,7,1106,165,385,98,191,0,192,0,57,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,12,...,164.0,170.0,204.0,218.0,168.0,166.0,141.0,108.0,96.0,80.0,73.0,60.0,24.0,9.0,1.0,2395.0,1185.0,1210.0,0.188309,0.612944,0.198747,2801.0,0.159229,1.169520,527.0,0.436433,0.531309,0.158353,10969.0,10149.0,28058.0,25554.0,8652.0,7190.0,2872.0,2771.0,254.0,380.0,"{'PP': 165, 'PSOE': 385, 'Cs': 98, 'UP': 191, ...","[('PSOE', 385), ('VOX', 192), ('UP', 191), ('P..."
6563,022019111025000801004,02,50,50008,5000801004,Aragón,Zaragoza,Alagón,1194,856,0.716918,9,847,7,840,139,257,74,114,0,199,0,36,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,13,...,104.0,127.0,196.0,156.0,115.0,109.0,127.0,70.0,61.0,41.0,33.0,22.0,3.0,5.0,1.0,1737.0,880.0,857.0,0.135866,0.628094,0.236039,2801.0,0.159229,1.612550,527.0,0.436433,0.531309,0.158353,12367.0,11672.0,34736.0,33041.0,10159.0,8951.0,2698.0,2456.0,210.0,273.0,"{'PP': 139, 'PSOE': 257, 'Cs': 74, 'UP': 114, ...","[('PSOE', 257), ('VOX', 199), ('PP', 139), ('U..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7426,022019111025029801002,02,50,50298,5029801002,Aragón,Zaragoza,Zuera,1618,1226,0.757726,18,1208,18,1190,331,419,94,92,0,215,0,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,7,...,214.0,175.0,250.0,202.0,208.0,195.0,155.0,107.0,85.0,64.0,61.0,54.0,16.0,5.0,0.0,2617.0,1348.0,1269.0,0.149790,0.659916,0.190294,5632.0,0.097656,2.152083,442.0,0.384615,0.459276,0.072769,11152.0,10691.0,29618.0,28599.0,9758.0,8291.0,2169.0,2263.0,183.0,249.0,"{'PP': 331, 'PSOE': 419, 'Cs': 94, 'UP': 92, '...","[('PSOE', 419), ('PP', 331), ('VOX', 215), ('C..."
7427,022019111025029801003,02,50,50298,5029801003,Aragón,Zaragoza,Zuera,1437,1016,0.707029,7,1009,10,999,200,279,98,135,0,238,0,40,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,6,...,278.0,326.0,251.0,177.0,160.0,115.0,88.0,72.0,55.0,43.0,45.0,37.0,21.0,4.0,0.0,2551.0,1367.0,1184.0,0.108585,0.664837,0.226578,5632.0,0.097656,2.207762,442.0,0.384615,0.459276,0.072769,10926.0,10495.0,29236.0,26961.0,10628.0,8834.0,1513.0,1768.0,166.0,284.0,"{'PP': 200, 'PSOE': 279, 'Cs': 98, 'UP': 135, ...","[('PSOE', 279), ('VOX', 238), ('PP', 200), ('U..."
7428,022019111025029802001,02,50,50298,5029802001,Aragón,Zaragoza,Zuera,610,482,0.790164,2,480,3,477,134,139,45,50,0,82,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,...,24.0,44.0,91.0,56.0,63.0,55.0,47.0,42.0,48.0,37.0,20.0,19.0,11.0,4.0,0.0,762.0,400.0,362.0,0.237533,0.570866,0.191601,5632.0,0.097656,7.391076,442.0,0.384615,0.459276,0.072769,12085.0,12273.0,31542.0,31419.0,9774.0,8326.0,3118.0,3365.0,213.0,395.0,"{'PP': 134, 'PSOE': 139, 'Cs': 45, 'UP': 50, '...","[('PSOE', 139), ('PP', 134), ('VOX', 82), ('UP..."
7431,022019111025090301001,02,50,50903,5090301001,Aragón,Zaragoza,Villamayor de Gállego,1143,844,0.738408,5,839,10,829,160,226,64,133,0,160,0,65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,17,...,71.0,67.0,88.0,106.0,128.0,111.0,107.0,79.0,69.0,52.0,46.0,50.0,14.0,5.0,1.0,1368.0,682.0,686.0,0.230994,0.603070,0.165936,723.0,0.300138,0.528509,131.0,0.404580,0.557252,0.153396,13087.0,12022.0,34050.0,31945.0,9707.0,8721.0,3872.0,3239.0,162.0,287.0,"{'PP': 160, 'PSOE': 226, 'Cs': 64, 'UP': 133, ...","[('PSOE', 226), ('PP', 160), ('VOX', 160), ('U..."


Again we keep only the dataset columns that describe the election result.

In [None]:
col_validas_select = ['Sección', 'Censo_Esc', 'Votos_Total', 'Nulos', 'Votos_Válidos', 'Blanco', 'V_Cand', 'PP', 'PSOE', 'Cs', 'UP',
       'IU', 'VOX', 'UPyD', 'MP', 'CiU', 'ERC', 'JxC', 'CUP', 'DiL', 'PNV',
       'Bildu', 'Amaiur', 'CC', 'FA', 'TE', 'BNG', 'PRC', 'GBai', 'Compromis',
       'PACMA', 'Otros']

In [None]:
secciones_select = secciones_select[col_validas_select]

In [None]:
secciones_select

Unnamed: 0,Sección,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros
6558,022019111025000601001,913,670,16,654,13,641,140,282,44,59,0,91,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1
6560,022019111025000801001,882,484,3,481,10,471,65,162,46,72,0,91,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,4
6561,022019111025000801002,1353,856,9,847,13,834,127,313,58,151,0,148,0,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,7
6562,022019111025000801003,1758,1138,25,1113,7,1106,165,385,98,191,0,192,0,57,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,12
6563,022019111025000801004,1194,856,9,847,7,840,139,257,74,114,0,199,0,36,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7426,022019111025029801002,1618,1226,18,1208,18,1190,331,419,94,92,0,215,0,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,7
7427,022019111025029801003,1437,1016,7,1009,10,999,200,279,98,135,0,238,0,40,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,6
7428,022019111025029802001,610,482,2,480,3,477,134,139,45,50,0,82,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13
7431,022019111025090301001,1143,844,5,839,10,829,160,226,64,133,0,160,0,65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,17


In [None]:
secciones_select_norm = secciones_select.copy()

In [None]:
secciones_select_norm

Unnamed: 0,Sección,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros
6558,022019111025000601001,913,670,16,654,13,641,140,282,44,59,0,91,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1
6560,022019111025000801001,882,484,3,481,10,471,65,162,46,72,0,91,0,21,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,4
6561,022019111025000801002,1353,856,9,847,13,834,127,313,58,151,0,148,0,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,7
6562,022019111025000801003,1758,1138,25,1113,7,1106,165,385,98,191,0,192,0,57,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,12
6563,022019111025000801004,1194,856,9,847,7,840,139,257,74,114,0,199,0,36,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7426,022019111025029801002,1618,1226,18,1208,18,1190,331,419,94,92,0,215,0,30,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,7
7427,022019111025029801003,1437,1016,7,1009,10,999,200,279,98,135,0,238,0,40,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,6
7428,022019111025029802001,610,482,2,480,3,477,134,139,45,50,0,82,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13
7431,022019111025090301001,1143,844,5,839,10,829,160,226,64,133,0,160,0,65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,17


We now do some light data preparation. We normalize each section's results by dividing by its electoral roll ('Censo_Esc'), and then transpose the dataset so that the sections become the columns and the normalized results the rows, just as we did with the dataset we want to model.

In [None]:
set_cols = ['Sección', 'Censo_Esc']

In [None]:
for col in secciones_select_norm.columns:

  if col not in set_cols:
    
    secciones_select_norm[col] = secciones_select_norm[col] / secciones_select_norm['Censo_Esc']

secciones_select_norm = secciones_select_norm.set_index('Sección')
secciones_select_norm = secciones_select_norm.drop('Censo_Esc', axis = 1)

secciones_select_norm = secciones_select_norm.T
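
The same normalization can also be written without an explicit loop. The following is a minimal sketch on a toy frame (`toy` and `toy_norm` are names invented here; the two rows are copied from the table shown earlier), assuming that only result columns remain once 'Sección' and 'Censo_Esc' are set aside:

```python
import pandas as pd

# Toy stand-in for 'secciones_select' (two sections, a few columns only)
toy = pd.DataFrame({
    "Sección": ["022019111025000601001", "022019111025000801001"],
    "Censo_Esc": [913, 882],
    "Votos_Total": [670, 484],
    "PP": [140, 65],
    "PSOE": [282, 162],
})

# Divide every result column by the section's electoral roll, row by row
votos = toy.set_index("Sección")
toy_norm = votos.drop(columns="Censo_Esc").div(votos["Censo_Esc"], axis=0)

# Transpose: sections become columns, normalized results become rows
toy_norm = toy_norm.T
```

`DataFrame.div(..., axis=0)` aligns the divisor on the row index, so each section is divided by its own census in a single step.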

In [None]:
secciones_select_norm

Sección,022019111025000601001,022019111025000801001,022019111025000801002,022019111025000801003,022019111025000801004,022019111025001701001,022019111025001801001,022019111025002001001,022019111025002201001,022019111025002401001,022019111025002501001,022019111025002501002,022019111025002502001,022019111025002901001,022019111025003401001,022019111025003801001,022019111025004501001,022019111025004501002,022019111025005101001,022019111025005301001,022019111025005501001,022019111025005502001,022019111025005701001,022019111025005901001,022019111025006201001,022019111025006601001,022019111025006601002,022019111025006701001,022019111025006701002,022019111025006701003,022019111025006701004,022019111025006701005,022019111025006701006,022019111025006702001,022019111025006702002,022019111025006702003,022019111025006702004,022019111025006702005,022019111025006703001,022019111025006704001,...,022019111025029710085,022019111025029710086,022019111025029710087,022019111025029710088,022019111025029710089,022019111025029710090,022019111025029710091,022019111025029711001,022019111025029711002,022019111025029711003,022019111025029711004,022019111025029711006,022019111025029711007,022019111025029711008,022019111025029711009,022019111025029711010,022019111025029711011,022019111025029711012,022019111025029711013,022019111025029711015,022019111025029711016,022019111025029711017,022019111025029711018,022019111025029711019,022019111025029712001,022019111025029712002,022019111025029712003,022019111025029712004,022019111025029712005,022019111025029712006,022019111025029712007,022019111025029712008,022019111025029712009,022019111025029712010,022019111025029801001,022019111025029801002,022019111025029801003,022019111025029802001,022019111025090301001,022019111025090301002
Votos_Total,0.733844,0.548753,0.632668,0.647327,0.716918,0.723632,0.704417,0.689308,0.740079,0.736648,0.702381,0.682283,0.666382,0.727273,0.672751,0.729345,0.720611,0.745552,0.687032,0.74028,0.643129,0.620959,0.700977,0.734478,0.706631,0.727438,0.714058,0.644315,0.703963,0.760684,0.721134,0.743415,0.700724,0.733143,0.778311,0.62167,0.763314,0.755891,0.619552,0.630673,...,0.739782,0.791816,0.70483,0.833954,0.827751,0.776575,0.832797,0.629969,0.574757,0.721954,0.701856,0.633534,0.662791,0.698718,0.732444,0.71078,0.728814,0.794504,0.701435,0.790933,0.703067,0.738796,0.760068,0.763042,0.665166,0.735724,0.66859,0.627744,0.658354,0.651982,0.655271,0.655263,0.587117,0.695172,0.749154,0.757726,0.707029,0.790164,0.738408,0.747556
Nulos,0.017525,0.003401,0.006652,0.014221,0.007538,0.010152,0.006795,0.003774,0.007937,0.005525,0.003307,0.007539,0.006826,0.001684,0.006519,0.012108,0.00916,0.012456,0.001247,0.015552,0.015209,0.014493,0.008264,0.006605,0.012732,0.006057,0.005591,0.015549,0.011655,0.001899,0.004727,0.011895,0.009654,0.014245,0.015355,0.006217,0.008284,0.01508,0.005889,0.00626,...,0.006812,0.002872,0.002683,0.002234,0.003828,0.007874,0.005359,0.006881,0.005825,0.00965,0.00232,0.002008,0.015116,0.008242,0.007111,0.006135,0.00339,0.008363,0.003828,0.010133,0.005236,0.006402,0.005105,0.007958,0.002814,0.0,0.011538,0.007024,0.009975,0.003524,0.002849,0.006579,0.006748,0.005517,0.01071,0.011125,0.004871,0.003279,0.004374,0.008889
Votos_Válidos,0.71632,0.545351,0.626016,0.633106,0.70938,0.71348,0.697622,0.685535,0.732143,0.731123,0.699074,0.674744,0.659556,0.725589,0.666232,0.717236,0.71145,0.733096,0.685786,0.724728,0.62792,0.606466,0.692712,0.727873,0.693899,0.721381,0.708466,0.628766,0.692308,0.758784,0.716408,0.731521,0.69107,0.718898,0.762956,0.615453,0.75503,0.740811,0.613663,0.624413,...,0.73297,0.788945,0.702147,0.83172,0.823923,0.768701,0.827438,0.623089,0.568932,0.712304,0.699536,0.631526,0.647674,0.690476,0.725333,0.704645,0.725424,0.786141,0.697608,0.7808,0.697831,0.732394,0.754963,0.755084,0.662352,0.735724,0.657051,0.62072,0.648379,0.648458,0.652422,0.648684,0.580368,0.689655,0.738444,0.746601,0.702157,0.786885,0.734033,0.738667
Blanco,0.014239,0.011338,0.009608,0.003982,0.005863,0.007332,0.005663,0.005031,0.02381,0.005525,0.003968,0.014001,0.005973,0.005051,0.013038,0.010684,0.010687,0.005338,0.002494,0.004666,0.008691,0.011148,0.007513,0.006605,0.007958,0.006663,0.00639,0.007775,0.005828,0.00095,0.007427,0.005098,0.013677,0.008547,0.005758,0.00444,0.010651,0.004713,0.004711,0.004695,...,0.010218,0.008615,0.005367,0.014892,0.019139,0.008858,0.004287,0.009939,0.015534,0.007841,0.00464,0.002008,0.006977,0.005495,0.008889,0.004382,0.000847,0.007168,0.006699,0.004267,0.004488,0.007682,0.010777,0.025641,0.002814,0.002656,0.010256,0.00439,0.00665,0.005286,0.008547,0.007895,0.008589,0.004138,0.005637,0.011125,0.006959,0.004918,0.008749,0.005333
V_Cand,0.702081,0.534014,0.616408,0.629124,0.703518,0.706148,0.691959,0.680503,0.708333,0.725599,0.695106,0.660743,0.653584,0.720539,0.653194,0.706553,0.700763,0.727758,0.683292,0.720062,0.619229,0.595318,0.685199,0.721268,0.685942,0.714718,0.702077,0.620991,0.68648,0.757835,0.70898,0.726423,0.677393,0.710351,0.757198,0.611012,0.744379,0.736098,0.608952,0.619718,...,0.722752,0.78033,0.69678,0.816828,0.804785,0.759843,0.823151,0.61315,0.553398,0.704463,0.694896,0.629518,0.640698,0.684982,0.716444,0.700263,0.724576,0.778973,0.690909,0.776533,0.693343,0.724712,0.744186,0.729443,0.659539,0.733068,0.646795,0.61633,0.641729,0.643172,0.643875,0.640789,0.571779,0.685517,0.732807,0.735476,0.695198,0.781967,0.725284,0.733333
PP,0.153341,0.073696,0.093865,0.093857,0.116415,0.135928,0.184598,0.138365,0.210317,0.145488,0.205026,0.172321,0.174915,0.148148,0.142112,0.172365,0.247328,0.229537,0.204489,0.245723,0.154807,0.136566,0.174305,0.171731,0.145889,0.144155,0.148562,0.124393,0.250583,0.240266,0.170831,0.218352,0.207562,0.209877,0.25144,0.165187,0.249704,0.224317,0.133098,0.208138,...,0.101499,0.071788,0.099284,0.162323,0.180861,0.139764,0.133976,0.135321,0.08932,0.130881,0.110209,0.14257,0.111628,0.156593,0.134222,0.148116,0.116949,0.290323,0.180861,0.1328,0.084518,0.169014,0.150312,0.152962,0.138436,0.357238,0.114744,0.100966,0.068163,0.115419,0.119658,0.110526,0.134969,0.17931,0.199549,0.204574,0.139179,0.219672,0.139983,0.144889
PSOE,0.308872,0.183673,0.231338,0.218999,0.215243,0.178229,0.321631,0.242767,0.263889,0.368324,0.191799,0.207324,0.196246,0.402357,0.259452,0.292735,0.178626,0.279359,0.302993,0.29549,0.183596,0.193423,0.189331,0.268164,0.216976,0.142944,0.15016,0.215743,0.210956,0.230769,0.232275,0.220051,0.167337,0.226971,0.21977,0.128774,0.222485,0.193214,0.254417,0.156495,...,0.21049,0.228284,0.230769,0.251675,0.239234,0.261811,0.26045,0.133792,0.192233,0.193004,0.227378,0.167671,0.189535,0.18956,0.193778,0.182296,0.229661,0.143369,0.152153,0.220267,0.246073,0.190781,0.237096,0.223696,0.182893,0.069057,0.227564,0.213345,0.264339,0.253744,0.24359,0.218421,0.089571,0.136552,0.233935,0.258962,0.194154,0.227869,0.197725,0.184
Cs,0.048193,0.052154,0.042868,0.055745,0.061977,0.078962,0.032843,0.056604,0.053571,0.042357,0.059524,0.06839,0.054608,0.043771,0.061278,0.040598,0.035115,0.044484,0.024938,0.037325,0.049973,0.053512,0.052592,0.048877,0.066844,0.054512,0.075879,0.050534,0.054779,0.059829,0.072248,0.059473,0.060338,0.05793,0.064299,0.047957,0.069822,0.065975,0.058893,0.043818,...,0.08515,0.087581,0.068873,0.117647,0.114833,0.088583,0.098607,0.04893,0.064078,0.082027,0.064965,0.043173,0.045349,0.070513,0.083556,0.081507,0.069492,0.09319,0.061244,0.0928,0.065071,0.058899,0.074305,0.081344,0.051773,0.061089,0.055769,0.042142,0.05985,0.037885,0.052707,0.063158,0.053374,0.062069,0.055242,0.058096,0.068198,0.07377,0.055993,0.066667
UP,0.064622,0.081633,0.111604,0.108646,0.095477,0.087422,0.026048,0.055346,0.027778,0.062615,0.054894,0.042542,0.047782,0.045455,0.061278,0.070513,0.064122,0.032028,0.033666,0.043546,0.059207,0.066332,0.03456,0.046235,0.085942,0.092065,0.079073,0.048591,0.031469,0.05793,0.049966,0.04503,0.049879,0.050332,0.067179,0.044405,0.046154,0.06032,0.038869,0.045383,...,0.098093,0.13855,0.109123,0.07446,0.080383,0.087598,0.099678,0.060398,0.066019,0.086852,0.096288,0.070281,0.081395,0.056777,0.093333,0.07099,0.127119,0.04779,0.085167,0.0896,0.1092,0.088348,0.087918,0.055703,0.082161,0.029216,0.067308,0.08604,0.086451,0.088987,0.08547,0.092105,0.066258,0.078621,0.063698,0.05686,0.093946,0.081967,0.11636,0.102222
IU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


At this point we still do not know which sections we will end up using.

We will select the sections that are least correlated with each other. However, some rows of 'secciones_select_norm' are entirely zero, so computing the correlation matrix directly from that dataset might give us an error.
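
As a rough check of how pandas treats degenerate data, the sketch below uses invented toy values (`toy`, `sec_a`, `sec_b` and `sec_zero` are hypothetical names). It only illustrates the library's behaviour and is not a statement about the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three sections (columns) and four result rows
toy = pd.DataFrame(
    {"sec_a": [0.70, 0.20, 0.30, 0.10],
     "sec_b": [0.65, 0.25, 0.20, 0.15],
     "sec_zero": [0.0, 0.0, 0.0, 0.0]},   # a column with zero variance
    index=["Votos_Total", "PP", "PSOE", "Cs"],
)

corr = toy.corr()
# A zero-variance column has no defined correlation: pandas returns NaN
nan_for_constant = bool(np.isnan(corr.loc["sec_a", "sec_zero"]))

# All-zero *rows* do not produce NaN; they only add (0, 0) points
# to every pairwise comparison between sections
with_zero_rows = toy[["sec_a", "sec_b"]].copy()
with_zero_rows.loc["IU"] = [0.0, 0.0]
with_zero_rows.loc["ERC"] = [0.0, 0.0]
corr_rows = with_zero_rows.corr()
finite_with_zero_rows = bool(np.isfinite(corr_rows.loc["sec_a", "sec_b"]))
```

Dropping the all-zero entries before computing correlations, as described next, avoids both situations.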

Although it is somewhat redundant, we start again from the dataset before normalization, 'secciones_select'. To this df we apply the function 'preparacion_sec' defined below. In essence, it:

- Drops the columns (votes for parties) that are all zeros, that is, the parties that did not stand in Zaragoza in this case.

- Normalizes by the electoral roll.

- Shuffles the sections into a random order. This matters so that we do not systematically give one section more weight than another when we select them.

- Transposes the result, as we saw before.

In [None]:
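def preparacion_sec(eleccion):

  set_cols = ['Sección', 'Censo_Esc']

  # Work on a copy so the caller's DataFrame is not modified in place
  # (this also avoids the SettingWithCopyWarning from pandas)
  eleccion = eleccion.copy()

  for col in eleccion.columns:

    if col in set_cols:
      continue

    if eleccion[col].sum() == 0:
      # Party with no votes in any section: drop the column
      eleccion = eleccion.drop([col], axis = 1)

    else:
      # Normalize the votes by each section's electoral roll
      eleccion[col] = eleccion[col] / eleccion['Censo_Esc']

  eleccion = eleccion.set_index('Sección')
  eleccion = eleccion.drop('Censo_Esc', axis = 1)

  # Sections become columns, normalized results become rows
  df_elec_transpose = eleccion.T

  # Put the sections in a random order
  lista_sec = list(df_elec_transpose.columns)
  random.shuffle(lista_sec)

  df_elec_transpose = df_elec_transpose[lista_sec]

  return df_elec_transpose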


With the output of this function (we will see an example below) we can now select the sections. After computing the correlation matrix of all the sections, we pass it to the next function, 'secciones_corr', which goes through each section's correlations with the others one by one, starting with the first, which as we saw was chosen at random.

For each section we check whether its maximum correlation with the sections examined before it is above or below a limit, the threshold:

- If it is above, the section is too correlated with one we have already reviewed, so we drop it.

- If it is below, we keep it.

After going through all the sections, we are left with those that are weakly correlated with each other. The threshold has to be chosen so that we keep a handful of sections but not too many, typically fewer than 10.

The selection depends on the order in which the sections are examined, which the previous function randomizes, so each run will almost certainly give different sections unless we fix a seed.
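
A minimal sketch of how fixing a seed makes the random order reproducible. The helper `orden_aleatorio` and the section codes are hypothetical and only serve as illustration:

```python
import random

# Hypothetical section codes, only for illustration
secciones = ["sec_%02d" % i for i in range(10)]

def orden_aleatorio(codigos, semilla):
    """Return a shuffled copy of codigos; the same seed gives the same order."""
    rng = random.Random(semilla)   # local generator, leaves the global state alone
    copia = list(codigos)
    rng.shuffle(copia)
    return copia

orden_1 = orden_aleatorio(secciones, semilla=42)
orden_2 = orden_aleatorio(secciones, semilla=42)
```

With the module-level `random.shuffle` used in this notebook, calling `random.seed(...)` with a fixed value before shuffling should have the same effect.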

In [None]:
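def secciones_corr(m, threshold = 0.995):

  # m is the correlation matrix between sections (a square DataFrame)
  dummy = m

  # Note: the range stops at m.shape[0] - 1, so the last section is never examined
  for ind in range(2, m.shape[0]):
    s = m.iloc[0:ind, 0:ind]

    # Drop section ind-1 if it is too correlated with any earlier section
    if (s.iloc[ind-1, 0:ind-1] > threshold).any():
      dummy = dummy.drop(m.columns[ind-1], axis = 0)
      dummy = dummy.drop(m.columns[ind-1], axis = 1)

  return dummy.columns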
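
To make the selection rule concrete, here is a small self-contained sketch on invented data. `filtrar_correlacion` is a hypothetical helper written for this illustration, not the notebook's function, and it examines every column after the first:

```python
import pandas as pd

def filtrar_correlacion(datos, umbral):
    """Keep a column unless it correlates above umbral with an earlier column."""
    corr = datos.corr()
    descartadas = []
    for pos in range(1, corr.shape[0]):
        # Correlations of this column with all columns that come before it
        anteriores = corr.iloc[pos, :pos]
        if (anteriores > umbral).any():
            descartadas.append(corr.columns[pos])
    return [c for c in datos.columns if c not in descartadas]

# Hypothetical normalized results: rows are result types, columns are sections
toy = pd.DataFrame({
    "sec_1": [0.70, 0.15, 0.30, 0.05],
    "sec_2": [0.71, 0.15, 0.31, 0.05],   # nearly identical to sec_1
    "sec_3": [0.60, 0.35, 0.10, 0.08],   # different vote profile
})
elegidas = filtrar_correlacion(toy, umbral=0.995)
```

In this toy case `sec_2` is discarded because it is almost a copy of `sec_1`, while `sec_3` is kept.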


The first function returns a normalized, transposed dataset whose rows are never entirely zero.

In [None]:
secc = preparacion_sec(secciones_select)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  del sys.path[0]


In [None]:
secc

Sección,022019111025029701041,022019111025016301003,022019111025029710073,022019111025029703084,022019111025029702006,022019111025009502003,022019111025029711007,022019111025029707018,022019111025029712005,022019111025029702031,022019111025029706003,022019111025029710031,022019111025029703078,022019111025029702051,022019111025029710048,022019111025007401001,022019111025006201001,022019111025023002001,022019111025014701001,022019111025022201002,022019111025006701001,022019111025009902001,022019111025029703032,022019111025029704018,022019111025029706011,022019111025029703045,022019111025024001001,022019111025029709028,022019111025025101008,022019111025029710026,022019111025005301001,022019111025029702037,022019111025029710087,022019111025029703019,022019111025029706026,022019111025029708022,022019111025029705008,022019111025024101001,022019111025000801002,022019111025005101001,...,022019111025023501001,022019111025029703034,022019111025000801003,022019111025029706009,022019111025029705043,022019111025029703058,022019111025029703060,022019111025029702003,022019111025029702018,022019111025020901001,022019111025029704021,022019111025029703052,022019111025025203001,022019111025029701031,022019111025029709001,022019111025027201004,022019111025029703054,022019111025009401001,022019111025029710042,022019111025029703087,022019111025025101003,022019111025029702027,022019111025029704064,022019111025029710047,022019111025029703036,022019111025029703039,022019111025029704011,022019111025007404001,022019111025029710023,022019111025029707004,022019111025029703061,022019111025029709016,022019111025029704039,022019111025029710001,022019111025008901002,022019111025029708009,022019111025029710039,022019111025029702021,022019111025028801002,022019111025029704058
Votos_Total,0.599628,0.76589,0.749329,0.805466,0.798471,0.716887,0.662791,0.832536,0.658354,0.764531,0.688446,0.73,0.819209,0.838975,0.795102,0.564494,0.706631,0.694215,0.682796,0.706452,0.644315,0.527458,0.633533,0.734973,0.747837,0.65848,0.734889,0.83553,0.634441,0.681979,0.74028,0.771403,0.70483,0.639191,0.574545,0.730473,0.683544,0.71134,0.632668,0.687032,...,0.705263,0.545703,0.647327,0.700357,0.683466,0.661509,0.602504,0.734072,0.80213,0.748705,0.741007,0.689573,0.706186,0.714599,0.620419,0.62131,0.629808,0.712925,0.781421,0.65047,0.702899,0.791878,0.782977,0.77439,0.669574,0.671093,0.773455,0.642292,0.71364,0.730275,0.642259,0.578256,0.592219,0.715729,0.774887,0.570048,0.756326,0.791332,0.735363,0.837079
Nulos,0.0,0.001059,0.001791,0.001608,0.002085,0.009106,0.015116,0.009569,0.009975,0.002981,0.00239,0.002222,0.001412,0.009149,0.009495,0.009709,0.012732,0.003306,0.002688,0.01129,0.015549,0.007663,0.005988,0.002732,0.008653,0.002339,0.010604,0.010315,0.007553,0.002356,0.015552,0.003643,0.002683,0.006221,0.001818,0.0022,0.009283,0.013746,0.006652,0.001247,...,0.007368,0.004093,0.014221,0.004756,0.004421,0.003868,0.0,0.00277,0.0,0.011226,0.006295,0.010664,0.006014,0.006586,0.007853,0.00576,0.003606,0.013605,0.003903,0.021944,0.007246,0.003807,0.007964,0.004355,0.002907,0.00715,0.007628,0.012846,0.006323,0.008257,0.00523,0.002389,0.002882,0.003608,0.005656,0.008454,0.009372,0.00321,0.017564,0.004815
Votos_Válidos,0.599628,0.764831,0.747538,0.803859,0.796386,0.707781,0.647674,0.822967,0.648379,0.76155,0.686056,0.727778,0.817797,0.829826,0.785607,0.554785,0.693899,0.690909,0.680108,0.695161,0.628766,0.519796,0.627545,0.73224,0.739184,0.65614,0.724284,0.825215,0.626888,0.679623,0.724728,0.76776,0.702147,0.63297,0.572727,0.728273,0.674262,0.697595,0.626016,0.685786,...,0.697895,0.54161,0.633106,0.6956,0.679045,0.65764,0.602504,0.731302,0.80213,0.737478,0.734712,0.67891,0.700172,0.708013,0.612565,0.615551,0.626202,0.69932,0.777518,0.628527,0.695652,0.788071,0.775012,0.770035,0.666667,0.663943,0.765828,0.629447,0.707317,0.722018,0.637029,0.575866,0.589337,0.712121,0.769231,0.561594,0.746954,0.788122,0.717799,0.832263
Blanco,0.009311,0.002119,0.006267,0.008039,0.00417,0.005795,0.006977,0.00638,0.00665,0.00149,0.005578,0.004444,0.0,0.010979,0.008496,0.004161,0.007958,0.01157,0.00672,0.008065,0.007775,0.002554,0.003593,0.005464,0.007417,0.004678,0.012725,0.013181,0.012085,0.002356,0.004666,0.002732,0.005367,0.004666,0.003636,0.006601,0.007595,0.003436,0.009608,0.002494,...,0.007368,0.001364,0.003982,0.002378,0.007073,0.003868,0.00626,0.0,0.004437,0.008636,0.000899,0.010664,0.012027,0.0,0.001309,0.008639,0.003606,0.004082,0.008587,0.007837,0.014493,0.005076,0.013937,0.006969,0.002907,0.006129,0.003814,0.001976,0.006323,0.002752,0.003138,0.008363,0.001441,0.007215,0.007353,0.003623,0.004686,0.014446,0.009368,0.002408
V_Cand,0.590317,0.762712,0.741271,0.79582,0.792217,0.701987,0.640698,0.816587,0.641729,0.76006,0.680478,0.723333,0.817797,0.818847,0.777111,0.550624,0.685942,0.679339,0.673387,0.687097,0.620991,0.517241,0.623952,0.726776,0.731768,0.651462,0.711559,0.812034,0.614804,0.677267,0.720062,0.765027,0.69678,0.628305,0.569091,0.721672,0.666667,0.694158,0.616408,0.683292,...,0.690526,0.540246,0.629124,0.693222,0.671972,0.653772,0.596244,0.731302,0.797693,0.728843,0.733813,0.668246,0.688144,0.708013,0.611257,0.606911,0.622596,0.695238,0.768931,0.62069,0.681159,0.782995,0.761075,0.763066,0.66376,0.657814,0.762014,0.62747,0.700994,0.719266,0.633891,0.567503,0.587896,0.704906,0.761878,0.557971,0.742268,0.773676,0.708431,0.829856
PP,0.098696,0.141949,0.130707,0.302251,0.386379,0.148179,0.111628,0.139553,0.068163,0.248882,0.140239,0.147778,0.214689,0.308326,0.104948,0.07767,0.145889,0.193388,0.123656,0.13871,0.124393,0.058748,0.116168,0.248634,0.159456,0.138012,0.158006,0.087106,0.169184,0.088339,0.245723,0.29326,0.099284,0.135303,0.063636,0.122112,0.140084,0.25945,0.093865,0.204489,...,0.169474,0.081855,0.093857,0.136742,0.138815,0.12766,0.131455,0.301939,0.350488,0.199482,0.223921,0.117299,0.219931,0.211855,0.123037,0.11879,0.135817,0.190476,0.189696,0.144201,0.21558,0.359137,0.095072,0.101916,0.166667,0.152196,0.320366,0.136364,0.152665,0.133945,0.118201,0.069295,0.116715,0.108947,0.177036,0.054348,0.155576,0.309791,0.201405,0.272873
PSOE,0.170391,0.141949,0.271262,0.189711,0.079917,0.268212,0.189535,0.256778,0.264339,0.154993,0.239841,0.283333,0.210452,0.182983,0.264868,0.237171,0.216976,0.231405,0.298387,0.243548,0.215743,0.274585,0.256287,0.17623,0.271941,0.240936,0.284199,0.218911,0.243202,0.293286,0.29549,0.139344,0.230769,0.222395,0.230909,0.221122,0.237131,0.19244,0.231338,0.302993,...,0.195789,0.223738,0.218999,0.317479,0.233422,0.272727,0.228482,0.126039,0.120674,0.219344,0.186151,0.28673,0.213058,0.15258,0.181937,0.182865,0.260817,0.263946,0.214676,0.211599,0.240942,0.10533,0.210055,0.343206,0.227713,0.249234,0.161709,0.205534,0.197832,0.327523,0.243724,0.215054,0.246398,0.242424,0.136878,0.205314,0.238988,0.158909,0.13466,0.142857
Cs,0.050279,0.090042,0.057296,0.083601,0.070883,0.058775,0.045349,0.082935,0.05985,0.105812,0.048606,0.048889,0.112994,0.095151,0.090455,0.056865,0.066844,0.041322,0.043011,0.056452,0.050534,0.028097,0.047904,0.061475,0.084054,0.044444,0.068929,0.135244,0.030211,0.04947,0.037325,0.093807,0.068873,0.043546,0.047273,0.063806,0.059072,0.042955,0.042868,0.024938,...,0.068421,0.038199,0.055745,0.047562,0.082228,0.032882,0.050078,0.065097,0.108252,0.056995,0.071942,0.065166,0.049828,0.069155,0.048429,0.044636,0.050481,0.057143,0.08587,0.061129,0.052536,0.07868,0.102539,0.057491,0.061047,0.054137,0.073227,0.066206,0.086721,0.053211,0.062762,0.051374,0.030259,0.082251,0.086538,0.032609,0.08716,0.05297,0.063232,0.098716
UP,0.128492,0.103814,0.112802,0.046624,0.030577,0.058775,0.081395,0.108453,0.086451,0.067064,0.077291,0.098889,0.070621,0.05398,0.112444,0.04577,0.085942,0.061157,0.071237,0.054839,0.048591,0.058748,0.05509,0.084699,0.069221,0.08655,0.065748,0.11404,0.086103,0.084806,0.043546,0.050091,0.109123,0.097978,0.087273,0.105611,0.070886,0.036082,0.111604,0.033666,...,0.074737,0.079127,0.108646,0.061831,0.06985,0.083172,0.068858,0.062327,0.042591,0.068221,0.066547,0.06872,0.039519,0.09989,0.116492,0.086393,0.066106,0.044898,0.078845,0.075235,0.059783,0.035533,0.130413,0.101916,0.075581,0.07763,0.048818,0.043478,0.097561,0.068807,0.084728,0.094385,0.072046,0.097403,0.059955,0.097826,0.093721,0.05297,0.067916,0.058587
VOX,0.085661,0.228814,0.111907,0.130225,0.202224,0.131623,0.175581,0.150718,0.112219,0.157973,0.112351,0.1,0.162429,0.147301,0.148926,0.115118,0.132095,0.133884,0.086022,0.158065,0.139942,0.088123,0.112575,0.110656,0.100124,0.094737,0.098621,0.176504,0.057402,0.109541,0.080871,0.154827,0.127013,0.076205,0.103636,0.155116,0.11308,0.130584,0.109387,0.105985,...,0.144211,0.075034,0.109215,0.090369,0.111406,0.102515,0.086072,0.127424,0.144632,0.149396,0.131295,0.100711,0.135739,0.118551,0.081152,0.13103,0.075721,0.084354,0.156909,0.092476,0.083333,0.173858,0.135391,0.103659,0.094961,0.085802,0.128909,0.12747,0.106594,0.09633,0.077406,0.081243,0.100865,0.117605,0.246606,0.137681,0.111528,0.158909,0.211944,0.215891


We now compute the correlation matrix and pass it to the second function together with the threshold value. We obtain seven sections, which we know are not too strongly correlated with each other.

In [None]:
m = secc.corr()
lista_sec = secciones_corr(m, 0.996)

In [None]:
lista_sec

Index(['022019111025029701041', '022019111025016301003',
       '022019111025029703084', '022019111025029702006',
       '022019111025029711007', '022019111025029702009',
       '022019111025029704058'],
      dtype='object', name='Sección')

In [None]:
lista_sec = np.sort(lista_sec)

In [None]:
lista_sec

array(['022019111025016301003', '022019111025029701041',
       '022019111025029702006', '022019111025029702009',
       '022019111025029703084', '022019111025029704058',
       '022019111025029711007'], dtype=object)

Now that we know which sections we have chosen, we can select them from the normalized dataset that contained the 661 sections of Zaragoza, including the rows that are all zeros.

In [None]:
secciones_select_norm = secciones_select_norm[lista_sec]

In [None]:
secciones_select_norm

Sección,022019111025016301003,022019111025029701041,022019111025029702006,022019111025029702009,022019111025029703084,022019111025029704058,022019111025029711007
Votos_Total,0.76589,0.599628,0.798471,0.739445,0.805466,0.837079,0.662791
Nulos,0.001059,0.0,0.002085,0.001206,0.001608,0.004815,0.015116
Votos_Válidos,0.764831,0.599628,0.796386,0.738239,0.803859,0.832263,0.647674
Blanco,0.002119,0.009311,0.00417,0.002413,0.008039,0.002408,0.006977
V_Cand,0.762712,0.590317,0.792217,0.735826,0.79582,0.829856,0.640698
PP,0.141949,0.098696,0.386379,0.126659,0.302251,0.272873,0.111628
PSOE,0.141949,0.170391,0.079917,0.171291,0.189711,0.142857,0.189535
Cs,0.090042,0.050279,0.070883,0.074789,0.083601,0.098716,0.045349
UP,0.103814,0.128492,0.030577,0.072376,0.046624,0.058587,0.081395
IU,0.0,0.0,0.0,0.0,0.0,0.0,0.0


We see that it has the same 30 rows as the normalized data for the province of Zaragoza that we want to model. We can add that data as a new column, so that everything we pass to the regression model sits in a single df.

In [None]:
secciones_select_norm.shape

(30, 7)

In [None]:
secciones_select_norm['Modelización'] = modelizacion['Modelización']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.


In [None]:
secciones_select_norm

Sección,022019111025016301003,022019111025029701041,022019111025029702006,022019111025029702009,022019111025029703084,022019111025029704058,022019111025029711007,Modelización
Votos_Total,0.76589,0.599628,0.798471,0.739445,0.805466,0.837079,0.662791,0.719466
Nulos,0.001059,0.0,0.002085,0.001206,0.001608,0.004815,0.015116,0.006076
Votos_Válidos,0.764831,0.599628,0.796386,0.738239,0.803859,0.832263,0.647674,0.713389
Blanco,0.002119,0.009311,0.00417,0.002413,0.008039,0.002408,0.006977,0.006958
V_Cand,0.762712,0.590317,0.792217,0.735826,0.79582,0.829856,0.640698,0.706431
PP,0.141949,0.098696,0.386379,0.126659,0.302251,0.272873,0.111628,0.166932
PSOE,0.141949,0.170391,0.079917,0.171291,0.189711,0.142857,0.189535,0.220048
Cs,0.090042,0.050279,0.070883,0.074789,0.083601,0.098716,0.045349,0.065202
UP,0.103814,0.128492,0.030577,0.072376,0.046624,0.058587,0.081395,0.080233
IU,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
secciones_select_norm.index

Index(['Votos_Total', 'Nulos', 'Votos_Válidos', 'Blanco', 'V_Cand', 'PP',
       'PSOE', 'Cs', 'UP', 'IU', 'VOX', 'UPyD', 'MP', 'CiU', 'ERC', 'JxC',
       'CUP', 'DiL', 'PNV', 'Bildu', 'Amaiur', 'CC', 'FA', 'TE', 'BNG', 'PRC',
       'GBai', 'Compromis', 'PACMA', 'Otros'],
      dtype='object')

We can now model using linear regression. We load the required libraries and define the matrix X and the vector y.

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

In [None]:
X = secciones_select_norm.drop('Modelización', axis = 1).values

In [None]:
y = secciones_select_norm['Modelización'].values

In [None]:
X

array([[7.65889831e-01, 5.99627561e-01, 7.98471161e-01, 7.39445115e-01,
        8.05466238e-01, 8.37078652e-01, 6.62790698e-01],
       [1.05932203e-03, 0.00000000e+00, 2.08478110e-03, 1.20627262e-03,
        1.60771704e-03, 4.81540931e-03, 1.51162791e-02],
       [7.64830508e-01, 5.99627561e-01, 7.96386379e-01, 7.38238842e-01,
        8.03858521e-01, 8.32263242e-01, 6.47674419e-01],
       [2.11864407e-03, 9.31098696e-03, 4.16956220e-03, 2.41254524e-03,
        8.03858521e-03, 2.40770465e-03, 6.97674419e-03],
       [7.62711864e-01, 5.90316574e-01, 7.92216817e-01, 7.35826297e-01,
        7.95819936e-01, 8.29855538e-01, 6.40697674e-01],
       [1.41949153e-01, 9.86964618e-02, 3.86379430e-01, 1.26658625e-01,
        3.02250804e-01, 2.72873194e-01, 1.11627907e-01],
       [1.41949153e-01, 1.70391061e-01, 7.99166088e-02, 1.71290712e-01,
        1.89710611e-01, 1.42857143e-01, 1.89534884e-01],
       [9.00423729e-02, 5.02793296e-02, 7.08825573e-02, 7.47889023e-02,
        8.36012862e-02, 9

In [None]:
y

array([0.71946552, 0.00607642, 0.7133891 , 0.00695846, 0.70643064,
       0.16693179, 0.22004842, 0.06520238, 0.08023338, 0.        ,
       0.12857079, 0.        , 0.03213501, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.00499869, 0.00831018])

We fit the model with X and y. We set the intercept to zero so that no votes appear for parties that did not stand in Zaragoza. This is mostly a cosmetic choice.

In [None]:
reg = LinearRegression(fit_intercept = False).fit(X, y)

In [None]:
reg.intercept_*censo_mod

0.0
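
The effect of dropping the intercept can be seen on a small invented example. This sketch uses `numpy.linalg.lstsq` as a stand-in for `LinearRegression(fit_intercept = False)`, and all the numbers are hypothetical:

```python
import numpy as np

# Hypothetical normalized data: 4 result rows x 2 sections.
# The last row is a party that did not stand (all zeros).
X = np.array([[0.70, 0.74],
              [0.20, 0.16],
              [0.25, 0.30],
              [0.00, 0.00]])
y = np.array([0.72, 0.18, 0.27, 0.00])

# Least squares without intercept
coef_sin, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_sin = X @ coef_sin

# Least squares with intercept: append a column of ones
X1 = np.column_stack([X, np.ones(len(X))])
coef_con, *_ = np.linalg.lstsq(X1, y, rcond=None)
pred_con = X1 @ coef_con
intercepto = coef_con[-1]
```

Without an intercept, an all-zero row is always predicted as exactly zero. With an intercept, the prediction for that row equals the intercept itself, so any non-zero intercept would assign votes to a party that did not stand.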

The fit looks excellent: the R² score is about 99.999%.

In [None]:
reg.score(X, y)

0.9999859921001674

These are the coefficients. It is not surprising that they sum to almost 1, since after normalization we are modelling quantities of the same order of magnitude.

In [None]:
reg.coef_

array([ 0.13345587,  0.09708634, -0.26023849, -0.04128157,  0.78075225,
       -0.15702478,  0.45451262])

In [None]:
reg.coef_.sum()

1.007262242428102
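
One informal way to see why the coefficients sum to roughly 1 is sketched below. It rests on a simplifying assumption made here for illustration, which the notebook does not verify: that for the dominant rows (such as 'Votos_Total') the values of the selected sections are close to the provincial value.

```latex
\hat{y}_k = \sum_{i=1}^{7} w_i \, x_{k,i},
\qquad
x_{k,i} \approx y_k \ \text{for every section } i
\;\Longrightarrow\;
y_k \approx y_k \sum_{i=1}^{7} w_i
\;\Longrightarrow\;
\sum_{i=1}^{7} w_i \approx 1 .
```

Here $x_{k,i}$ is the normalized result in row $k$ for section $i$, $w_i$ is the corresponding entry of `reg.coef_`, and $y_k$ is the 'Modelización' value for row $k$.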

Now we can look at the results predicted by our model. We undo the normalization by multiplying again by the total electoral roll of the province of Zaragoza, and store the result in a df.

In [None]:
est = reg.predict(X)*censo_mod

In [None]:
df = pd.DataFrame(est, index = secciones_select_norm.index, columns = ['Estimación']).astype('int32')
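
Note that converting with `.astype('int32')` truncates the fractional part rather than rounding. A small sketch with invented values (the numbers in `est` are hypothetical) shows the difference:

```python
import numpy as np

# Hypothetical predictions already multiplied by the census
est = np.array([515640.9, 4949.2, -0.4])

truncado = est.astype('int32')               # drops the fractional part (towards zero)
redondeado = np.rint(est).astype('int32')    # rounds to the nearest integer first
```

The difference is at most one vote per row, so either choice is reasonable for a comparison at this scale.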

In [None]:
df

Unnamed: 0,Estimación
Votos_Total,515640
Nulos,4949
Votos_Válidos,510690
Blanco,6489
V_Cand,504201
PP,119196
PSOE,156991
Cs,47034
UP,56931
IU,0


Now we take the actual data that we wanted to model and put it in another df.

In [None]:
df1 = pd.DataFrame(secciones_mod.sum(), columns = ['Real']).drop('Censo_Esc')

In [None]:
df1

Unnamed: 0,Real
Votos_Total,514697
Nulos,4347
Votos_Válidos,510350
Blanco,4978
V_Cand,505372
PP,119421
PSOE,157420
Cs,46645
UP,57398
IU,0


We compare both dfs. Given such a high fit, it was to be expected that they would be quite similar, especially for the main parties. The fit certainly looks impressive given that we used only 7 electoral sections of the province, which as we saw has 880 in total.

In [None]:
df['Real'] = df1['Real']

In [None]:
df

Unnamed: 0,Estimación,Real
Votos_Total,515640,514697
Nulos,4949,4347
Votos_Válidos,510690,510350
Blanco,6489,4978
V_Cand,504201,505372
PP,119196,119421
PSOE,156991,157420
Cs,47034,46645
UP,56931,57398
IU,0,0
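
To quantify the comparison, the sketch below computes absolute and relative errors from the rows shown in the table above. The column names 'Error' and 'Error_%' are introduced here for illustration, and 'IU' is left out because its real value is zero:

```python
import pandas as pd

# Values copied from the comparison table above (rows shown, excluding IU)
comparacion = pd.DataFrame(
    {"Estimación": [515640, 4949, 510690, 6489, 504201, 119196, 156991, 47034, 56931],
     "Real":       [514697, 4347, 510350, 4978, 505372, 119421, 157420, 46645, 57398]},
    index=["Votos_Total", "Nulos", "Votos_Válidos", "Blanco", "V_Cand",
           "PP", "PSOE", "Cs", "UP"],
)

# Absolute error in votes and relative error as a percentage of the real value
comparacion["Error"] = comparacion["Estimación"] - comparacion["Real"]
comparacion["Error_%"] = 100 * comparacion["Error"] / comparacion["Real"]
```

On these rows the four main parties are all within 1% of the real result, while 'Blanco' and 'Nulos' deviate considerably more in relative terms.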


## Modelling the 2016 elections

A natural question is how valid the selection of electoral sections made for 2019 remains if we use their equivalents in the 2016 elections. That is what we address in this chapter. Recall the selected sections:

In [None]:
lista_sec

array(['022019111025016301003', '022019111025029701041',
       '022019111025029702006', '022019111025029702009',
       '022019111025029703084', '022019111025029704058',
       '022019111025029711007'], dtype=object)

Those sections are the 2019 ones, so we have to find their equivalent, or most similar, sections in 2016. To do so we load the section-similarity df, which covers all the sections from the last 5 elections.

In [None]:
sim_secciones = pd.read_csv('/content/drive/MyDrive/Proyecto_KeepCoding - Propio/Data/similitud_secciones_def_REF.csv', dtype = 'str')

In [None]:
sim_secciones

Unnamed: 0,cod_sec_ref,CUSEC,CUMUN,CPRO,Elección,cod_ccaa_orig,cod_ccaa_ref,cercana N11_ref,cercana D15_ref,cercana J16_ref,cercana A19_ref,cercana N19_ref
0,022019041140100901001,0100901001,01009,01,02201904,16,14,022011111140100901001,022015121140100901001,022016061140100901001,022019041140100901001,022019111140100901001
1,022019041140101001002,0101001002,01010,01,02201904,16,14,022011111140101001002,022015121140101001002,022016061140101001002,022019041140101001002,022019111140101001002
2,022019041140103101001,0103101001,01031,01,02201904,16,14,022011111140103101001,022015121140103101001,022016061140103101001,022019041140103101001,022019111140103101001
3,022019041140103301001,0103301001,01033,01,02201904,16,14,022011111140103301001,022015121140103301001,022016061140103301001,022019041140103301001,022019111140103301001
4,022019041140103701001,0103701001,01037,01,02201904,16,14,022011111140103701001,022015121140103701001,022016061140103701001,022019041140103701001,022019111140103701001
...,...,...,...,...,...,...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,02201111,19,19,022011111195200108010,022015121195200108010,022016061195200108010,022019041195200108010,022019111195200108010
181035,022011111195200108011,5200108011,52001,52,02201111,19,19,022011111195200108011,022015121195200108011,022016061195200108011,022019041195200108011,022019111195200108011
181036,022011111195200108012,5200108012,52001,52,02201111,19,19,022011111195200108012,022015121195200108012,022016061195200108012,022019041195200108012,022019111195200108012
181037,022011111195200108013,5200108013,52001,52,02201111,19,19,022011111195200108013,022015121195200108013,022016061195200108013,022019041195200108013,022019111195200108013


Now we select the rows for the Zaragoza sections that we found in the previous chapter...

In [None]:
sec_select_J16 = sim_secciones.loc[sim_secciones['cod_sec_ref'].isin(lista_sec)]

In [None]:
sec_select_J16

Unnamed: 0,cod_sec_ref,CUSEC,CUMUN,CPRO,Elección,cod_ccaa_orig,cod_ccaa_ref,cercana N11_ref,cercana D15_ref,cercana J16_ref,cercana A19_ref,cercana N19_ref
67969,022019111025029701041,5029701041,50297,50,2201911,2,2,022011111025029701041,022015121025029701041,022016061025029701041,022019041025029701041,022019111025029701041
68003,022019111025029702009,5029702009,50297,50,2201911,2,2,022011111025029702009,022015121025029702009,022016061025029702009,022019041025029702009,022019111025029702009
68044,022019111025029702006,5029702006,50297,50,2201911,2,2,022011111025029702006,022015121025029702006,022016061025029702006,022019041025029702006,022019111025029702006
68112,022019111025029703084,5029703084,50297,50,2201911,2,2,022011111025029703084,022015121025029703084,022016061025029703084,022019041025029703084,022019111025029703084
68152,022019111025029704058,5029704058,50297,50,2201911,2,2,022011111025029704058,022015121025029704058,022016061025029704058,022019041025029704058,022019111025029704058
70003,022019111025029711007,5029711007,50297,50,2201911,2,2,022011111025029711007,022015121025029711007,022016061025029711007,022019041025029711007,022019111025029711007
71190,022019111025016301003,5016301003,50163,50,2201911,2,2,022011111025016301002,022015121025016301002,022016061025016301002,022019041025016301003,022019111025016301003


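... and take their equivalents in the 2016 elections from the `cercana J16_ref` column. There are seven of them; note that section `022019111025016301003` maps to a different code in 2016, `022016061025016301002`: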

In [None]:
list_sec_J16 = list(sec_select_J16['cercana J16_ref'])

In [None]:
list_sec_J16

['022016061025029701041',
 '022016061025029702009',
 '022016061025029702006',
 '022016061025029703084',
 '022016061025029704058',
 '022016061025029711007',
 '022016061025016301002']

In [None]:
list_sec_J16 = np.sort(list_sec_J16)

In [None]:
list_sec_J16

array(['022016061025016301002', '022016061025029701041',
       '022016061025029702006', '022016061025029702009',
       '022016061025029703084', '022016061025029704058',
       '022016061025029711007'], dtype='<U21')

In [None]:
lista_sec

array(['022019111025016301003', '022019111025029701041',
       '022019111025029702006', '022019111025029702009',
       '022019111025029703084', '022019111025029704058',
       '022019111025029711007'], dtype=object)
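After sorting, the 2016 list lines up position by position with `lista_sec`. If that alignment had to be guaranteed without relying on sort order, one option is an explicit lookup. The following is only a minimal sketch: `sim`, `lista_2019`, `mapa` and `lista_2016` are hypothetical names, and the two rows are copied from the similarity table shown above.

```python
import pandas as pd

# Hypothetical two-row stand-in for sim_secciones (values copied from the table above).
sim = pd.DataFrame({
    'cod_sec_ref': ['022019111025029701041', '022019111025016301003'],
    'cercana J16_ref': ['022016061025029701041', '022016061025016301002'],
})

# 2019 sections in the order we want to preserve.
lista_2019 = ['022019111025016301003', '022019111025029701041']

# Index the table by the 2019 code, then look the codes up in the desired order.
mapa = sim.set_index('cod_sec_ref')['cercana J16_ref']
lista_2016 = mapa.loc[lista_2019].tolist()

print(lista_2016)
```

With this approach the i-th 2016 code always corresponds to the i-th 2019 code, whatever the row order of the similarity table.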

We now load the results of the June 2016 elections.

In [None]:
df_eleccion_comp_J16 = pd.read_csv('/content/drive/MyDrive/Proyecto_KeepCoding - Propio/Data/Gen-16-Jun/gen_J16_unif_cols_prov.txt', dtype = strings)

We select the sections to model, which are naturally those of the province of Zaragoza.

In [None]:
secciones_mod = df_eleccion_comp_J16

# Apply each territorial filter only when its list is non-empty
if len(ccaa_mod) > 0:
  secciones_mod = secciones_mod.loc[secciones_mod['CCAA'].isin(ccaa_mod)]

if len(provincia_mod) > 0:
  secciones_mod = secciones_mod.loc[secciones_mod['Provincia'].isin(provincia_mod)]

if len(municipio_mod) > 0:
  secciones_mod = secciones_mod.loc[secciones_mod['Municipio'].isin(municipio_mod)]


In [None]:
secciones_mod

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,Participación,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros,...,30-34,35-39,40-44,45-49,50-54,55-59,60-64,65-69,70-74,75-79,80-84,85-89,90-94,95-99,100 y más,Población Total,Hombres,Mujeres,% mayores 65 años,% 20-64 años,% menores 19 años,Afiliados SS Minicipio,% Afiliados SS autónomos,% Afiliados SS / Población,Paro Registrado Municipio,% Paro Hombres,% Paro mayores 45,% Paro s/ Afiliados SS Municipio,Renta persona 2017,Renta persona 2015,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
6494,022016061025000101001,02,50,50001,5000101001,Aragón,Zaragoza,Abanto,100,77,0.770000,0,77,0,77,52,15,4,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,3.0,4.0,8.0,2.0,6.0,9.0,7.0,13.0,9.0,5.0,11.0,11.0,6.0,2.0,1.0,106.0,59.0,47.0,0.547170,0.433962,0.018868,20.0,0.800000,0.188679,4.0,1.000000,0.750000,0.166667,11234.267197,11184.000000,28322.021999,21149.000000,7855.336603,5134.000000,3217.875711,4987.000000,293.331625,139.000000,"{'PP': 52, 'PSOE': 15, 'Cs': 4, 'UP': 6, 'IU':...","[('PP', 52), ('PSOE', 15), ('UP', 6), ('Cs', 4..."
6495,022016061025000201001,02,50,50002,5000201001,Aragón,Zaragoza,Acered,143,108,0.755245,4,104,1,103,72,22,8,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,15.0,11.0,15.0,16.0,15.0,7.0,14.0,20.0,14.0,15.0,16.0,11.0,2.0,1.0,0.0,223.0,134.0,89.0,0.354260,0.591928,0.053812,76.0,0.302632,0.340807,7.0,0.857143,0.428571,0.084337,9448.000000,9665.000000,18895.000000,20525.000000,3494.000000,2873.000000,4611.000000,3968.000000,84.000000,233.000000,"{'PP': 72, 'PSOE': 22, 'Cs': 8, 'UP': 1, 'IU':...","[('PP', 72), ('PSOE', 22), ('Cs', 8), ('UP', 1..."
6496,022016061025000301001,02,50,50003,5000301001,Aragón,Zaragoza,Agón,128,92,0.718750,1,91,1,90,34,32,15,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,...,5.0,9.0,16.0,15.0,10.0,8.0,10.0,16.0,10.0,12.0,10.0,7.0,1.0,0.0,0.0,155.0,87.0,68.0,0.361290,0.503226,0.135484,28.0,0.428571,0.180645,9.0,0.444444,0.666667,0.243243,12298.000000,12334.000000,27578.000000,27753.000000,5804.000000,5694.000000,5604.000000,5250.000000,161.000000,247.000000,"{'PP': 34, 'PSOE': 32, 'Cs': 15, 'UP': 6, 'IU'...","[('PP', 34), ('PSOE', 32), ('Cs', 15), ('UP', ..."
6497,022016061025000401001,02,50,50004,5000401001,Aragón,Zaragoza,Aguarón,522,390,0.747126,2,388,0,388,146,148,43,49,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,...,35.0,41.0,47.0,64.0,53.0,52.0,40.0,24.0,46.0,48.0,49.0,28.0,7.0,1.0,0.0,686.0,358.0,328.0,0.295918,0.591837,0.112245,156.0,0.448718,0.227405,20.0,0.600000,0.550000,0.113636,11280.000000,10229.000000,25421.000000,23879.000000,7039.000000,6056.000000,3502.000000,3246.000000,208.000000,253.000000,"{'PP': 146, 'PSOE': 148, 'Cs': 43, 'UP': 49, '...","[('PSOE', 148), ('PP', 146), ('UP', 49), ('Cs'..."
6498,022016061025000501001,02,50,50005,5000501001,Aragón,Zaragoza,Aguilón,229,189,0.825328,0,189,1,188,111,37,23,12,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,...,8.0,16.0,16.0,12.0,18.0,24.0,24.0,21.0,21.0,14.0,21.0,19.0,8.0,1.0,0.0,247.0,134.0,113.0,0.425101,0.542510,0.032389,27.0,0.481481,0.109312,1.0,0.000000,1.000000,0.035714,14168.000000,13341.000000,31410.000000,29687.000000,8651.000000,8019.000000,5616.000000,4816.000000,108.000000,191.000000,"{'PP': 111, 'PSOE': 37, 'Cs': 23, 'UP': 12, 'I...","[('PP', 111), ('PSOE', 37), ('Cs', 23), ('UP',..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7366,022016061025029802001,02,50,50298,5029802001,Aragón,Zaragoza,Zuera,619,492,0.794830,2,490,6,484,188,109,86,92,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,2,...,35.0,77.0,72.0,59.0,64.0,45.0,43.0,46.0,45.0,29.0,25.0,26.0,9.0,1.0,1.0,771.0,408.0,363.0,0.236057,0.582361,0.181582,3792.0,0.312236,4.918288,466.0,0.433476,0.515021,0.109441,12085.000000,12273.000000,31542.000000,31419.000000,9774.000000,8326.000000,3118.000000,3365.000000,213.000000,395.000000,"{'PP': 188, 'PSOE': 109, 'Cs': 86, 'UP': 92, '...","[('PP', 188), ('PSOE', 109), ('UP', 92), ('Cs'..."
7367,022016061025090101001,02,50,50901,5090101001,Aragón,Zaragoza,Biel,138,105,0.760870,0,105,0,105,42,33,11,17,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,...,4.0,8.0,5.0,9.0,15.0,12.0,13.0,19.0,15.0,8.0,13.0,5.0,5.0,1.0,0.0,143.0,84.0,59.0,0.461538,0.524476,0.013986,32.0,0.375000,0.223776,5.0,0.800000,0.600000,0.135135,16414.000000,16613.000000,25367.000000,26506.000000,13108.000000,9636.000000,7146.000000,7398.000000,145.000000,214.000000,"{'PP': 42, 'PSOE': 33, 'Cs': 11, 'UP': 17, 'IU...","[('PP', 42), ('PSOE', 33), ('UP', 17), ('Cs', ..."
7368,022016061025090201001,02,50,50902,5090201001,Aragón,Zaragoza,Marracos,87,79,0.908046,0,79,2,77,50,14,8,4,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,4.0,7.0,9.0,7.0,6.0,5.0,7.0,5.0,15.0,8.0,5.0,2.0,1.0,0.0,0.0,91.0,51.0,40.0,0.395604,0.549451,0.054945,8.0,0.750000,0.087912,2.0,0.000000,1.000000,0.200000,11234.267197,10618.182737,28322.021999,26938.114416,7855.336603,6845.948425,3217.875711,2985.302533,293.331625,347.217589,"{'PP': 50, 'PSOE': 14, 'Cs': 8, 'UP': 4, 'IU':...","[('PP', 50), ('PSOE', 14), ('Cs', 8), ('UP', 4..."
7369,022016061025090301001,02,50,50903,5090301001,Aragón,Zaragoza,Villamayor de Gállego,1133,846,0.746690,1,845,5,840,276,188,136,211,0,4,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,9,...,61.0,85.0,95.0,127.0,116.0,112.0,103.0,77.0,63.0,40.0,66.0,39.0,10.0,3.0,2.0,1402.0,704.0,698.0,0.213980,0.600571,0.185449,685.0,0.303650,0.488588,148.0,0.398649,0.513514,0.177671,13087.000000,12022.000000,34050.000000,31945.000000,9707.000000,8721.000000,3872.000000,3239.000000,162.000000,287.000000,"{'PP': 276, 'PSOE': 188, 'Cs': 136, 'UP': 211,...","[('PP', 276), ('UP', 211), ('PSOE', 188), ('Cs..."


In [None]:
censo_mod = secciones_mod['Censo_Esc'].sum()

We proceed as before: we sum the results, normalise them by the electoral census and store them in a DataFrame.

In [None]:
censo_mod

712841

In [None]:
secciones_mod = secciones_mod[cols_validas_mod]

In [None]:
modelizacion = pd.DataFrame(secciones_mod.sum(), columns = ['Modelización'])
modelizacion['Modelización'] = modelizacion['Modelización'] / modelizacion['Modelización']['Censo_Esc']
modelizacion = modelizacion.drop(['Censo_Esc']) 

In [None]:
modelizacion

Unnamed: 0,Modelización
Votos_Total,0.721988
Nulos,0.005557
Votos_Válidos,0.716432
Blanco,0.005759
V_Cand,0.710673
PP,0.250529
PSOE,0.175263
Cs,0.120522
UP,0.145055
IU,0.0


In [None]:
modelizacion.shape

(30, 1)
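Dividing the summed results by the summed census expresses every row as a fraction of the whole electorate rather than of the votes cast. As a small illustration of that step, here is a sketch on a hypothetical two-section frame `toy` whose numbers are copied from the Abanto and Acered rows above (only three columns are kept):

```python
import pandas as pd

# Hypothetical miniature of secciones_mod: Abanto and Acered, three columns only.
toy = pd.DataFrame({
    'Censo_Esc': [100, 143],
    'Votos_Total': [77, 108],
    'PP': [52, 72],
})

# Sum over sections, then divide every total by the total census.
totales = toy.sum()
cuotas = (totales / totales['Censo_Esc']).drop('Censo_Esc')

print(cuotas)
```

Here `cuotas['Votos_Total']` is the turnout of the two sections taken together (185 of 243 registered voters) and `cuotas['PP']` is the PP vote as a share of the census.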

This time we do not need to select the sections of the province of Zaragoza, because we already know them: they are the 7 seen above. We do need to store the results they obtained in 2016.

In [None]:
secciones_select = df_eleccion_comp_J16.loc[df_eleccion_comp_J16['Sección'].isin(list_sec_J16)]

In [None]:
secciones_select = secciones_select[col_validas_select]

In [None]:
secciones_select

Unnamed: 0,Sección,Censo_Esc,Votos_Total,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros
6704,022016061025016301002,1976,1496,12,1484,18,1466,489,201,385,351,0,3,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,19
6900,022016061025029701041,1098,651,4,647,4,643,165,124,88,254,0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,5
6907,022016061025029702006,1400,1161,4,1157,4,1153,774,77,205,79,0,6,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,4
6909,022016061025029702009,810,599,3,596,2,594,263,93,109,104,0,6,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,5
7023,022016061025029703084,635,527,6,521,5,516,246,97,103,61,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,4
7079,022016061025029704058,1250,1042,1,1041,5,1036,463,148,270,128,0,7,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,6
7341,022016061025029711007,877,556,6,550,4,546,199,122,91,123,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,5


In [None]:
secciones_select_norm = secciones_select.copy()

Now we simply normalise and transpose.

In [None]:
for col in secciones_select_norm.columns:

  if col not in set_cols:
    
    secciones_select_norm[col] = secciones_select_norm[col] / secciones_select_norm['Censo_Esc']

secciones_select_norm = secciones_select_norm.set_index('Sección')
secciones_select_norm = secciones_select_norm.drop('Censo_Esc', axis = 1)

secciones_select_norm = secciones_select_norm.T

In [None]:
secciones_select_norm

Sección,022016061025016301002,022016061025029701041,022016061025029702006,022016061025029702009,022016061025029703084,022016061025029704058,022016061025029711007
Votos_Total,0.757085,0.592896,0.829286,0.739506,0.829921,0.8336,0.633979
Nulos,0.006073,0.003643,0.002857,0.003704,0.009449,0.0008,0.006842
Votos_Válidos,0.751012,0.589253,0.826429,0.735802,0.820472,0.8328,0.627138
Blanco,0.009109,0.003643,0.002857,0.002469,0.007874,0.004,0.004561
V_Cand,0.741903,0.58561,0.823571,0.733333,0.812598,0.8288,0.622577
PP,0.24747,0.150273,0.552857,0.324691,0.387402,0.3704,0.22691
PSOE,0.101721,0.112933,0.055,0.114815,0.152756,0.1184,0.139111
Cs,0.194838,0.080146,0.146429,0.134568,0.162205,0.216,0.103763
UP,0.177632,0.23133,0.056429,0.128395,0.096063,0.1024,0.140251
IU,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
secciones_select_norm.shape

(30, 7)
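The column loop above divides each result column by the census of its own section. An equivalent vectorised form, shown here only as a sketch on a hypothetical two-section frame `sel` (numbers copied from the first two rows of `secciones_select`), uses `DataFrame.div` with `axis=0`:

```python
import pandas as pd

# Hypothetical miniature of secciones_select with two sections and two result columns.
sel = pd.DataFrame({
    'Sección': ['022016061025016301002', '022016061025029701041'],
    'Censo_Esc': [1976, 1098],
    'Votos_Total': [1496, 651],
    'PP': [489, 165],
}).set_index('Sección')

# Divide every result column by the census of its row, then transpose
# so that sections become columns and result types become rows.
norm = sel.drop(columns='Censo_Esc').div(sel['Censo_Esc'], axis=0).T

print(norm)
```

The resulting values agree with the normalised table above (for example 0.757085 for `Votos_Total` in the first section).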

We can now model. As before, we define the matrix X and the vector y.

In [None]:
secciones_select_norm['Modelización'] = modelizacion['Modelización']

In [None]:
X = secciones_select_norm.drop('Modelización', axis = 1).values
y = secciones_select_norm['Modelización'].values

In [None]:
X

array([[7.57085020e-01, 5.92896175e-01, 8.29285714e-01, 7.39506173e-01,
        8.29921260e-01, 8.33600000e-01, 6.33979475e-01],
       [6.07287449e-03, 3.64298725e-03, 2.85714286e-03, 3.70370370e-03,
        9.44881890e-03, 8.00000000e-04, 6.84150513e-03],
       [7.51012146e-01, 5.89253188e-01, 8.26428571e-01, 7.35802469e-01,
        8.20472441e-01, 8.32800000e-01, 6.27137970e-01],
       [9.10931174e-03, 3.64298725e-03, 2.85714286e-03, 2.46913580e-03,
        7.87401575e-03, 4.00000000e-03, 4.56100342e-03],
       [7.41902834e-01, 5.85610200e-01, 8.23571429e-01, 7.33333333e-01,
        8.12598425e-01, 8.28800000e-01, 6.22576967e-01],
       [2.47469636e-01, 1.50273224e-01, 5.52857143e-01, 3.24691358e-01,
        3.87401575e-01, 3.70400000e-01, 2.26909920e-01],
       [1.01720648e-01, 1.12932605e-01, 5.50000000e-02, 1.14814815e-01,
        1.52755906e-01, 1.18400000e-01, 1.39110604e-01],
       [1.94838057e-01, 8.01457195e-02, 1.46428571e-01, 1.34567901e-01,
        1.62204724e-01, 2

... and we score the model that was fitted in the previous section; we do not refit it here.

The score is excellent, above 99.9%:

In [None]:
reg.score(X, y)

0.9995506895213787
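Assuming `reg` is a scikit-learn linear regressor, its `score` method returns the coefficient of determination R² of the predictions on the data passed in. Because the coefficients were fitted on 2019 data, this value measures how well they transfer to 2016. The sketch below reproduces the calculation with NumPy only; the function name `r2_coef_fijos` and the toy numbers are hypothetical.

```python
import numpy as np

def r2_coef_fijos(coef, intercept, X, y):
    """R^2 of a linear model with fixed (already fitted) coefficients."""
    y_pred = X @ coef + intercept
    ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy data: 3 result rows, 2 sections, coefficients assumed to come from an earlier fit.
X_toy = np.array([[0.7, 0.8], [0.2, 0.4], [0.1, 0.1]])
coef = np.array([0.5, 0.5])

y_exacto = X_toy @ coef                          # target the model reproduces exactly
y_ruido = y_exacto + np.array([0.01, -0.01, 0.0])  # slightly perturbed target

r2_exacto = r2_coef_fijos(coef, 0.0, X_toy, y_exacto)
r2_ruido = r2_coef_fijos(coef, 0.0, X_toy, y_ruido)
print(r2_exacto, r2_ruido)
```

A value of 1 means the fixed coefficients reproduce the target exactly; any mismatch between elections pushes it below 1.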

If we now check the prediction against the real data, we see that the differences are small, on the order of one percentage point, and therefore comparable to or below the margin of error of a typical poll. And we achieved this with only 7 sections of the province, selected using the data of a different election...

In [None]:
est = reg.predict(X) * censo_mod
df = pd.DataFrame(est, index = secciones_select_norm.index, columns = ['Estimación']).astype('int32')
df1 = pd.DataFrame(secciones_mod.sum(), columns = ['Real']).drop('Censo_Esc')
df['Real'] = df1['Real']

In [None]:
df

Unnamed: 0,Estimación,Real
Votos_Total,511447,514663
Nulos,7576,3961
Votos_Válidos,503871,510702
Blanco,5928,4105
V_Cand,497942,506597
PP,169494,178587
PSOE,120745,124935
Cs,92674,85913
UP,106104,103401
IU,0,0


Below we show the comparison between the real and the estimated result, expressed as percentages of valid votes. In the rows shown, the gap stays within about 1.6 pp; the largest are Cs (-1.57 pp) and PP (+1.33 pp). The negative estimate for Vox is due to the low share of the vote it obtained in 2016.

In [None]:
# Percentage over valid votes; select the row by label rather than by position
df['pc Estimación'] = df['Estimación'] / df['Estimación'].loc['Votos_Válidos'] * 100

In [None]:
# Same base (valid votes) for the real result
df['pc Real'] = df['Real'] / df['Real'].loc['Votos_Válidos'] * 100


In [None]:
df['dif. Real-Est.'] = df['pc Real'] - df['pc Estimación']

In [None]:
df

Unnamed: 0,Estimación,Real,pc Estimación,pc Real,dif. Real-Est.
Votos_Total,511447,514663,101.503559,100.775599,-0.72796
Nulos,7576,3961,1.503559,0.775599,-0.72796
Votos_Válidos,503871,510702,100.0,100.0,0.0
Blanco,5928,4105,1.176492,0.803796,-0.372696
V_Cand,497942,506597,98.82331,99.196204,0.372895
PP,169494,178587,33.638372,34.968925,1.330553
PSOE,120745,124935,23.963475,24.463386,0.499911
Cs,92674,85913,18.392406,16.822531,-1.569875
UP,106104,103401,21.057771,20.246837,-0.810934
IU,0,0,0.0,0.0,0.0
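To make the size of the deviations explicit, one can compute the largest absolute gap in percentage points. The following sketch rebuilds a small part of the table above (three rows, values copied from it) in a hypothetical frame `res`, and selects the base row by label:

```python
import pandas as pd

# Hypothetical subset of the comparison table: valid votes plus two parties.
res = pd.DataFrame(
    {'Estimación': [503871, 169494, 92674], 'Real': [510702, 178587, 85913]},
    index=['Votos_Válidos', 'PP', 'Cs'],
)

# Percentages over valid votes, using the row label as the base.
pc_est = res['Estimación'] / res.loc['Votos_Válidos', 'Estimación'] * 100
pc_real = res['Real'] / res.loc['Votos_Válidos', 'Real'] * 100
dif = pc_real - pc_est

# Party with the largest absolute deviation, and its size in percentage points.
peor = dif.abs().idxmax()
print(peor, round(dif[peor], 2))
```

On these three rows the largest deviation is the Cs row, which the estimate overshoots by about 1.57 pp.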
