# Evolution and temporal equivalence of electoral sections

Electoral sections are not constant over time. As the distribution of the population changes, so do the sections. Their geometric definition is revised every year, and new sections may be created or existing ones merged.

As a result, the sections in one election are not the same as in another, so we need to define a way of relating the sections of one election to those of another.

The criterion we have chosen is geographic proximity: given a section X in one election, its equivalent in another election is the section geographically closest to X. We require equivalent sections to belong to the same municipality; if the municipality no longer exists or has been newly created (which does happen), we require them to share the same province instead.
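To make the rule concrete, here is a minimal sketch of the selection logic using only the standard library. The function name `nearest_equivalent`, the dictionary keys and the toy coordinates are assumptions made for illustration. This is not the implementation used in this notebook, which works on centroids computed with Geopandas.

```python
import math

def nearest_equivalent(section, candidates):
    """Return the code of the candidate closest to `section`.

    Candidates are first restricted to the same municipality; if there are
    none, to the same province. Returns None when both filters are empty.
    """
    same_mun = [c for c in candidates if c["mun"] == section["mun"]]
    pool = same_mun or [c for c in candidates if c["prov"] == section["prov"]]
    if not pool:
        return None
    # Euclidean distance between centroids (projected coordinates, in metres)
    best = min(pool, key=lambda c: math.dist(section["xy"], c["xy"]))
    return best["code"]

# Toy data: invented codes and coordinates, for illustration only
x = {"code": "X", "mun": "01001", "prov": "01", "xy": (0.0, 0.0)}
other_election = [
    {"code": "A", "mun": "01001", "prov": "01", "xy": (300.0, 400.0)},
    {"code": "B", "mun": "01001", "prov": "01", "xy": (30.0, 40.0)},
    {"code": "C", "mun": "01002", "prov": "01", "xy": (1.0, 1.0)},
]
print(nearest_equivalent(x, other_election))  # B
```

Section C is closer to X than B, but it belongs to a different municipality, so it is only considered when no candidate shares the municipality of X.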

Naturally, we need the geometric definition provided by the INE shapefiles, which we will read with Geopandas.

We therefore start by loading the required libraries, including all those that Geopandas depends on.

In [None]:
# We keep this notebook as an example of the ETL process carried out on the original INE data, but we do not update the result files in S3 because, as explained below, they are not entirely valid

In [None]:
import pandas as pd
import numpy as np

In [None]:
%%time 

# GDAL, a core dependency of many Python geospatial libraries
!apt install gdal-bin python-gdal python3-gdal 
# Install rtree - Geopandas requirement
!apt install python3-rtree 
# Install Geopandas (https URL: GitHub no longer serves the unauthenticated git:// protocol)
!pip install git+https://github.com/geopandas/geopandas.git
# Install descartes - Geopandas requirement
!pip install descartes 
# Install Folium for geographic data visualization
!pip install folium
# Install Plotly Express
!pip install plotly_express

Reading package lists... Done
Building dependency tree       
Reading state information... Done
gdal-bin is already the newest version (2.2.3+dfsg-2).
python-gdal is already the newest version (2.2.3+dfsg-2).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  python3-numpy
Suggested packages:
  python-numpy-doc python3-nose python3-numpy-dbg
The following NEW packages will be installed:
  python3-gdal python3-numpy
0 upgraded, 2 newly installed, 0 to remove and 40 not upgraded.
Need to get 2,288 kB of archives.
After this operation, 13.2 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 python3-numpy amd64 1:1.13.3-2ubuntu1 [1,943 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 python3-gdal amd64 2.2.3+dfsg-2 [346 kB]
Fetched 2,288 kB in 1s (1,949 kB/s)
Selecting previously unselected

In [None]:
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
import matplotlib
import matplotlib.pyplot as plt 
import folium
import plotly_express as px

Now we load the shapefile for the 2019 elections, which is the same for the April and November elections.

In [None]:
secciones_A19 = gpd.read_file('SECC_CE_20190101.shp')

In [None]:
secciones_A19

Unnamed: 0,OBJECTID,CUSEC,CUMUN,CSEC,CDIS,CMUN,CPRO,CCA,CUDIS,CLAU2,NPRO,NCA,CNUT0,CNUT1,CNUT2,CNUT3,ESTADO,OBS,Shape_Leng,Shape_area,Shape_len,SUPERF_M2,NMUN,geometry
0,1,0100901001,01009,001,01,009,01,16,0100901,01009,Araba/Álava,País Vasco,ES,2,1,1,I,,51725.353538,6.326080e+07,51725.353538,63260804,Asparrena,"MULTIPOLYGON (((556453.835 4752758.332, 556460..."
1,2,0101001002,01010,002,01,010,01,16,0101001,01010,Araba/Álava,País Vasco,ES,2,1,1,I,,13350.774728,7.332951e+06,13350.774728,7332951,Ayala/Aiara,"POLYGON ((502035.230 4771813.197, 502048.071 4..."
2,3,0103101001,01031,001,01,031,01,16,0103101,01031,Araba/Álava,País Vasco,ES,2,1,1,I,,87711.717051,8.041601e+07,87711.717051,80416015,Laguardia,"MULTIPOLYGON (((538984.636 4718139.608, 538985..."
3,4,0103301001,01033,001,01,033,01,16,0103301,01033,Araba/Álava,País Vasco,ES,2,1,1,I,,12331.494377,5.950453e+06,12331.494377,5950453,Lapuebla de Labarca,"POLYGON ((537063.531 4703664.589, 536887.844 4..."
4,6,0103701001,01037,001,01,037,01,16,0103701,01037,Araba/Álava,País Vasco,ES,2,1,1,I,,60761.315212,1.227207e+08,60761.315212,122720687,Arraia-Maeztu,"POLYGON ((551570.951 4739269.962, 551570.889 4..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36312,10673,1509302002,15093,002,02,093,15,12,1509302,15093,"Coruña, A",Galicia,ES,1,1,1,M,,22560.055555,4.284650e+07,40586.196381,24043055,Zas,"POLYGON ((17713.727 4798559.174, 17794.388 479..."
36313,10659,1590203001,15902,001,03,902,15,12,1590203,902,"Coruña, A",Galicia,ES,1,1,1,M,,39748.841339,8.001400e+07,48119.741688,51117047,Oza-Cesuras,"POLYGON ((75420.668 4796270.576, 75442.668 479..."
36314,10189,1503103003,15031,003,03,031,15,12,1503103,15031,"Coruña, A",Galicia,ES,1,1,1,I,,21911.587016,1.727445e+07,21911.587016,465839,Culleredo,"POLYGON ((59718.470 4802760.627, 59735.222 480..."
36315,13100,2101701002,21017,002,01,017,21,01,2101701,21017,Huelva,Andalucía,ES,6,1,5,I,,103495.806720,2.784394e+08,103495.806720,238041207,Calañas,"POLYGON ((151385.134 4180084.902, 151402.134 4..."


The first step is to build the full section code, which includes the election.

Here we made a mistake: we trusted that the INE and the Ministry of the Interior used the same codes for the autonomous communities (CCAA), which is not the case. We noticed late and had to fix it in another notebook. The differences were not significant, so what we do in this notebook remains valid.
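As an illustration of how the 21-character code is assembled, the sketch below rebuilds it from its parts. The helper `build_cod_sec` and its parameter names are hypothetical. The ninth character (`1`) is copied from the literal prefixes used in this notebook; its meaning is not stated here, so it is treated as an opaque suffix. All parts must be kept as strings, otherwise leading zeros are lost.

```python
def build_cod_sec(year, month, cca, cusec, election_type="02", suffix="1"):
    """Assemble election type + year + month + suffix + CCA + CUSEC.

    `cca` (2 characters) and `cusec` (10 characters) must be strings so that
    leading zeros are preserved. `suffix` is an assumption: the notebook
    hard-codes it as '1' without explaining it.
    """
    return f"{election_type}{year:04d}{month:02d}{suffix}{cca}{cusec}"

code = build_cod_sec(2019, 4, "16", "0100901001")
print(code)       # 022019041160100901001
print(len(code))  # 21
```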

In [None]:
secciones_A19['cod_sec'] = '022019041' + secciones_A19['CCA'] + secciones_A19['CUSEC']

In [None]:
secciones_A19['CCA'].value_counts()

01    5981
09    5071
13    4417
07    3543
10    3472
12    2169
08    1948
16    1711
02    1450
05    1381
14    1225
11     965
03     850
04     662
15     562
06     467
17     343
18      56
19      44
Name: CCA, dtype: int64

In [None]:
secciones_A19.head()

Unnamed: 0,OBJECTID,CUSEC,CUMUN,CSEC,CDIS,CMUN,CPRO,CCA,CUDIS,CLAU2,NPRO,NCA,CNUT0,CNUT1,CNUT2,CNUT3,ESTADO,OBS,Shape_Leng,Shape_area,Shape_len,SUPERF_M2,NMUN,geometry,cod_sec
0,1,100901001,1009,1,1,9,1,16,100901,1009,Araba/Álava,País Vasco,ES,2,1,1,I,,51725.353538,63260800.0,51725.353538,63260804,Asparrena,"MULTIPOLYGON (((556453.835 4752758.332, 556460...",022019041160100901001
1,2,101001002,1010,2,1,10,1,16,101001,1010,Araba/Álava,País Vasco,ES,2,1,1,I,,13350.774728,7332951.0,13350.774728,7332951,Ayala/Aiara,"POLYGON ((502035.230 4771813.197, 502048.071 4...",022019041160101001002
2,3,103101001,1031,1,1,31,1,16,103101,1031,Araba/Álava,País Vasco,ES,2,1,1,I,,87711.717051,80416010.0,87711.717051,80416015,Laguardia,"MULTIPOLYGON (((538984.636 4718139.608, 538985...",022019041160103101001
3,4,103301001,1033,1,1,33,1,16,103301,1033,Araba/Álava,País Vasco,ES,2,1,1,I,,12331.494377,5950453.0,12331.494377,5950453,Lapuebla de Labarca,"POLYGON ((537063.531 4703664.589, 536887.844 4...",022019041160103301001
4,6,103701001,1037,1,1,37,1,16,103701,1037,Araba/Álava,País Vasco,ES,2,1,1,I,,60761.315212,122720700.0,60761.315212,122720687,Arraia-Maeztu,"POLYGON ((551570.951 4739269.962, 551570.889 4...",022019041160103701001


In [None]:
len(secciones_A19['cod_sec'].unique())

36317

We load the dataset for the April 2019 elections.

In [None]:
strings = {'Sección' : 'str', 'cod_ccaa' : 'str', 'cod_prov' : 'str', 'cod_mun' : 'str', 'cod_sec' : 'str'}

In [None]:
df_eleccion_A19 = pd.read_csv('gen_A19_unif_cols_prov.txt', dtype = strings)

In [None]:
df_eleccion_A19

Unnamed: 0,Sección,cod_ccaa,cod_prov,cod_mun,cod_sec,CCAA,Provincia,Municipio,Censo_Esc,Votos_Total,Participación,Nulos,Votos_Válidos,Blanco,V_Cand,PP,PSOE,Cs,UP,IU,VOX,UPyD,MP,CiU,ERC,JxC,CUP,DiL,PNV,Bildu,Amaiur,CC,FA,TE,BNG,PRC,GBai,Compromis,PACMA,Otros,...,30-34,35-39,40-44,45-49,50-54,55-59,60-64,65-69,70-74,75-79,80-84,85-89,90-94,95-99,100 y más,Población Total,Hombres,Mujeres,% mayores 65 años,% 20-64 años,% menores 19 años,Afiliados SS Minicipio,% Afiliados SS autónomos,% Afiliados SS / Población,Paro Registrado Municipio,% Paro Hombres,% Paro mayores 45,% Paro s/ Afiliados SS Municipio,Renta persona 2017,Renta persona 2015,Renta hogar 2017,Renta hogar 2015,Renta Salarios 2018,Renta Salarios 2015,Renta Pensiones 2018,Renta Pensiones 2015,Renta Desempleo 2018,Renta Desempleo 2015,dict_res,dict_res_ord
0,022019041010400101001,01,04,04001,0400101001,Andalucía,Almería,Abla,1014,768,0.757396,5,763,9,754,149,326,131,44,0,88,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,4,...,73,80,89,81,94,87,91,77,72,42,67,56,19,4,0,1249,635,614,0.269816,0.590072,0.140112,291.0,0.243986,0.232986,143.0,0.419580,0.559441,0.329493,9159.0,8788.0,20172.0,19546.0,5574.0,4833.0,3286.0,3082.0,403.0,471.0,"{'PP': 149, 'PSOE': 326, 'Cs': 131, 'UP': 44, ...","[('PSOE', 326), ('PP', 149), ('Cs', 131), ('VO..."
1,022019041010400201001,01,04,04002,0400201001,Andalucía,Almería,Abrucena,1039,798,0.768046,6,792,7,785,127,380,91,60,0,113,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,7,...,60,75,70,70,108,101,99,86,61,64,61,46,14,2,1,1202,637,565,0.278702,0.609817,0.111481,323.0,0.238390,0.268719,158.0,0.367089,0.601266,0.328482,8827.0,8107.0,17841.0,17115.0,4640.0,4048.0,3418.0,2770.0,568.0,620.0,"{'PP': 127, 'PSOE': 380, 'Cs': 91, 'UP': 60, '...","[('PSOE', 380), ('PP', 127), ('VOX', 113), ('C..."
2,022019041010400301001,01,04,04003,0400301001,Andalucía,Almería,Adra,671,519,0.773472,4,515,1,514,162,131,68,44,0,103,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,1,...,54,54,54,61,82,75,67,48,37,40,26,15,3,1,0,892,435,457,0.190583,0.643498,0.165919,7571.0,0.393871,8.487668,3036.0,0.399868,0.459157,0.286226,8965.0,8267.0,26498.0,24688.0,5121.0,4795.0,2499.0,2301.0,337.0,333.0,"{'PP': 162, 'PSOE': 131, 'Cs': 68, 'UP': 44, '...","[('PP', 162), ('PSOE', 131), ('VOX', 103), ('C..."
3,022019041010400301002,01,04,04003,0400301002,Andalucía,Almería,Adra,1282,954,0.744150,13,941,7,934,239,241,166,62,0,218,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,6,2,...,108,158,162,150,140,119,103,67,49,37,30,14,7,1,0,1752,865,887,0.117009,0.647260,0.235731,7571.0,0.393871,4.321347,3036.0,0.399868,0.459157,0.286226,8599.0,7941.0,25677.0,23400.0,5381.0,4837.0,1815.0,1724.0,343.0,464.0,"{'PP': 239, 'PSOE': 241, 'Cs': 166, 'UP': 62, ...","[('PSOE', 241), ('PP', 239), ('VOX', 218), ('C..."
4,022019041010400301003,01,04,04003,0400301003,Andalucía,Almería,Adra,1535,1087,0.708143,20,1067,6,1061,274,252,170,67,0,282,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,4,...,189,178,215,227,164,110,96,61,58,41,40,27,4,4,0,2240,1094,1146,0.104911,0.647768,0.247321,7571.0,0.393871,3.379911,3036.0,0.399868,0.459157,0.286226,8076.0,7150.0,22051.0,19687.0,5224.0,4044.0,1170.0,1198.0,416.0,476.0,"{'PP': 274, 'PSOE': 252, 'Cs': 170, 'UP': 67, ...","[('VOX', 282), ('PP', 274), ('PSOE', 252), ('C..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36312,022019041195200108011,19,52,52001,5200108011,Melilla,Melilla,Melilla,1605,1118,0.696573,9,1109,4,1105,250,194,168,38,0,167,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,288,...,181,185,171,164,165,180,155,97,38,34,19,16,4,3,0,2480,1244,1236,0.085081,0.623387,0.291532,24290.0,0.193413,9.794355,11827.0,0.381331,0.398326,0.327464,16433.0,15847.0,66352.0,62632.0,11378.0,11119.0,1508.0,1274.0,167.0,166.0,"{'PP': 250, 'PSOE': 194, 'Cs': 168, 'UP': 38, ...","[('Otros', 288), ('PP', 250), ('PSOE', 194), (..."
36313,022019041195200108012,19,52,52001,5200108012,Melilla,Melilla,Melilla,1676,1207,0.720167,12,1195,7,1188,421,238,200,42,0,212,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,75,...,160,175,184,162,188,162,147,106,99,67,49,38,8,2,0,2334,1173,1161,0.158098,0.612682,0.229220,24290.0,0.193413,10.407027,11827.0,0.381331,0.398326,0.327464,17350.0,16792.0,50730.0,50839.0,13272.0,13038.0,2763.0,2445.0,169.0,177.0,"{'PP': 421, 'PSOE': 238, 'Cs': 200, 'UP': 42, ...","[('PP', 421), ('PSOE', 238), ('VOX', 212), ('C..."
36314,022019041195200108013,19,52,52001,5200108013,Melilla,Melilla,Melilla,1156,738,0.638408,3,735,6,729,175,168,123,39,0,144,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,80,...,179,172,123,117,123,127,113,68,44,24,23,17,2,1,0,1828,976,852,0.097921,0.663567,0.238512,24290.0,0.193413,13.287746,11827.0,0.381331,0.398326,0.327464,12553.0,11823.0,37816.0,36729.0,10102.0,9640.0,1807.0,1615.0,234.0,252.0,"{'PP': 175, 'PSOE': 168, 'Cs': 123, 'UP': 39, ...","[('PP', 175), ('PSOE', 168), ('VOX', 144), ('C..."
36315,022019041195200108014,19,52,52001,5200108014,Melilla,Melilla,Melilla,905,593,0.655249,6,587,3,584,222,98,64,23,0,103,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,74,...,115,124,81,59,69,70,90,71,59,42,25,8,1,0,0,1298,634,664,0.158706,0.577042,0.264253,24290.0,0.193413,18.713405,11827.0,0.381331,0.398326,0.327464,8906.0,8937.0,29898.0,31384.0,5923.0,6061.0,2463.0,2136.0,244.0,284.0,"{'PP': 222, 'PSOE': 98, 'Cs': 64, 'UP': 23, 'I...","[('PP', 222), ('VOX', 103), ('PSOE', 98), ('Ot..."


Here we verify that the CCAA codes differ between the INE and the Ministry of the Interior. We correct this in another notebook.

That said, the cells that follow were added after this notebook was written; what we originally did was carry on building the full section codes for the remaining shapefiles.

In [None]:
df_eleccion_A19['parte'] = df_eleccion_A19['Sección'].str[0:9]

In [None]:
df_eleccion_A19['parte'].value_counts()

022019041    36317
Name: parte, dtype: int64

In [None]:
secciones_A19['cod_sec'].str[0:9].value_counts()

022019041    36317
Name: cod_sec, dtype: int64

In [None]:
df_eleccion_A19['cod_ccaa'].value_counts()

01    5981
09    5071
12    4417
08    3543
17    3472
11    2169
07    1948
14    1711
02    1450
05    1381
15    1225
10     965
03     850
04     662
13     562
06     467
16     343
18      56
19      44
Name: cod_ccaa, dtype: int64

In [None]:
secciones_A19['CCA'].value_counts()

01    5981
09    5071
13    4417
07    3543
10    3472
12    2169
08    1948
16    1711
02    1450
05    1381
14    1225
11     965
03     850
04     662
15     562
06     467
17     343
18      56
19      44
Name: CCA, dtype: int64
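The two counts above can be compared programmatically to see which codes differ. The sketch below pairs INE and Interior codes that have the same number of sections. This is only a heuristic: it works here because every count is distinct and both tables describe the same 36317 sections. It is not the correction applied in the other notebook. The function name `match_by_count` is hypothetical.

```python
# Section counts per CCAA code, copied from the two outputs above
ine_counts = {
    "01": 5981, "09": 5071, "13": 4417, "07": 3543, "10": 3472, "12": 2169,
    "08": 1948, "16": 1711, "02": 1450, "05": 1381, "14": 1225, "11": 965,
    "03": 850, "04": 662, "15": 562, "06": 467, "17": 343, "18": 56, "19": 44,
}
interior_counts = {
    "01": 5981, "09": 5071, "12": 4417, "08": 3543, "17": 3472, "11": 2169,
    "07": 1948, "14": 1711, "02": 1450, "05": 1381, "15": 1225, "10": 965,
    "03": 850, "04": 662, "13": 562, "06": 467, "16": 343, "18": 56, "19": 44,
}

def match_by_count(src, dst):
    """Map each code in `src` to the code in `dst` with the same count."""
    if len(set(src.values())) != len(src) or len(set(dst.values())) != len(dst):
        raise ValueError("counts are not unique; the heuristic does not apply")
    by_count = {n: code for code, n in dst.items()}
    return {code: by_count[n] for code, n in src.items()}

ine_to_interior = match_by_count(ine_counts, interior_counts)
differing = {k: v for k, v in ine_to_interior.items() if k != v}
print(sorted(differing.items()))
```

A mapping built on names (for example `NCA` against `CCAA`) would be more robust than one built on counts, at the cost of having to reconcile spelling differences between the two sources.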

So we now load the November 2019 shapefile and build the section code, which is wrong, although we did not know it at the time.

In [None]:
secciones_N19 = gpd.read_file('SECC_CE_20190101.shp')

In [None]:
secciones_N19['cod_sec'] = '022019111' + secciones_N19['CCA'] + secciones_N19['CUSEC']

In [None]:
secciones_N19

Unnamed: 0,OBJECTID,CUSEC,CUMUN,CSEC,CDIS,CMUN,CPRO,CCA,CUDIS,CLAU2,NPRO,NCA,CNUT0,CNUT1,CNUT2,CNUT3,ESTADO,OBS,Shape_Leng,Shape_area,Shape_len,SUPERF_M2,NMUN,geometry,cod_sec
0,1,0100901001,01009,001,01,009,01,16,0100901,01009,Araba/Álava,País Vasco,ES,2,1,1,I,,51725.353538,6.326080e+07,51725.353538,63260804,Asparrena,"MULTIPOLYGON (((556453.835 4752758.332, 556460...",022019111160100901001
1,2,0101001002,01010,002,01,010,01,16,0101001,01010,Araba/Álava,País Vasco,ES,2,1,1,I,,13350.774728,7.332951e+06,13350.774728,7332951,Ayala/Aiara,"POLYGON ((502035.230 4771813.197, 502048.071 4...",022019111160101001002
2,3,0103101001,01031,001,01,031,01,16,0103101,01031,Araba/Álava,País Vasco,ES,2,1,1,I,,87711.717051,8.041601e+07,87711.717051,80416015,Laguardia,"MULTIPOLYGON (((538984.636 4718139.608, 538985...",022019111160103101001
3,4,0103301001,01033,001,01,033,01,16,0103301,01033,Araba/Álava,País Vasco,ES,2,1,1,I,,12331.494377,5.950453e+06,12331.494377,5950453,Lapuebla de Labarca,"POLYGON ((537063.531 4703664.589, 536887.844 4...",022019111160103301001
4,6,0103701001,01037,001,01,037,01,16,0103701,01037,Araba/Álava,País Vasco,ES,2,1,1,I,,60761.315212,1.227207e+08,60761.315212,122720687,Arraia-Maeztu,"POLYGON ((551570.951 4739269.962, 551570.889 4...",022019111160103701001
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36312,10673,1509302002,15093,002,02,093,15,12,1509302,15093,"Coruña, A",Galicia,ES,1,1,1,M,,22560.055555,4.284650e+07,40586.196381,24043055,Zas,"POLYGON ((17713.727 4798559.174, 17794.388 479...",022019111121509302002
36313,10659,1590203001,15902,001,03,902,15,12,1590203,902,"Coruña, A",Galicia,ES,1,1,1,M,,39748.841339,8.001400e+07,48119.741688,51117047,Oza-Cesuras,"POLYGON ((75420.668 4796270.576, 75442.668 479...",022019111121590203001
36314,10189,1503103003,15031,003,03,031,15,12,1503103,15031,"Coruña, A",Galicia,ES,1,1,1,I,,21911.587016,1.727445e+07,21911.587016,465839,Culleredo,"POLYGON ((59718.470 4802760.627, 59735.222 480...",022019111121503103003
36315,13100,2101701002,21017,002,01,017,21,01,2101701,21017,Huelva,Andalucía,ES,6,1,5,I,,103495.806720,2.784394e+08,103495.806720,238041207,Calañas,"POLYGON ((151385.134 4180084.902, 151402.134 4...",022019111012101701002


We do the same with the 2015 one.

In [None]:
secciones_D15 = gpd.read_file('SECC_CE_20150101_01_R_INE.shp')

In [None]:
secciones_D15['cod_sec'] = '022015121' + secciones_D15['CCA'] + secciones_D15['CUSEC']

In [None]:
secciones_D15

Unnamed: 0,CUSEC,CUMUN,CSEC,CDIS,CMUN,CPRO,CCA,CUDIS,CLAU2,NPRO,NCA,CNUT0,CNUT1,CNUT2,CNUT3,Shape_Leng,Shape_Area,NMUN,geometry,cod_sec
0,0100101001,01001,001,01,001,01,16,0100101,01001,Araba/Álava,País Vasco,ES,2,1,1,24584.481049,1.110145e+07,Alegría-Dulantzi,"POLYGON ((541571.209 4745050.120, 541581.897 4...",022015121160100101001
1,0100101002,01001,002,01,001,01,16,0100101,01001,Araba/Álava,País Vasco,ES,2,1,1,18936.987581,8.823450e+06,Alegría-Dulantzi,"MULTIPOLYGON (((539559.740 4745571.157, 539562...",022015121160100101002
2,0100201001,01002,001,01,002,01,16,0100201,01002,Araba/Álava,País Vasco,ES,2,1,1,47871.683383,3.478989e+07,Amurrio,"MULTIPOLYGON (((503618.553 4759559.798, 503620...",022015121160100201001
3,0100201002,01002,002,01,002,01,16,0100201,01002,Araba/Álava,País Vasco,ES,2,1,1,31662.368886,3.930400e+07,Amurrio,"POLYGON ((506902.217 4767250.185, 506918.093 4...",022015121160100201002
4,0100201003,01002,003,01,002,01,16,0100201,01002,Araba/Álava,País Vasco,ES,2,1,1,4244.249418,8.494741e+05,Amurrio,"POLYGON ((499919.497 4766600.281, 499849.092 4...",022015121160100201003
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36224,5200108011,52001,011,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,7347.338748,1.529558e+06,Melilla,"POLYGON ((503855.575 3905845.331, 503855.894 3...",022015121195200108011
36225,5200108012,52001,012,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,1476.345581,8.996462e+04,Melilla,"POLYGON ((505732.345 3904250.409, 505638.076 3...",022015121195200108012
36226,5200108013,52001,013,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,4691.060767,5.129825e+05,Melilla,"POLYGON ((506069.418 3903737.345, 506070.096 3...",022015121195200108013
36227,5200108014,52001,014,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,790.887241,3.585715e+04,Melilla,"POLYGON ((504042.988 3905478.740, 504039.153 3...",022015121195200108014


In [None]:
secciones_D15['CCA'].value_counts()

01    5926
09    5050
13    4341
07    3553
10    3473
12    2196
08    1952
16    1735
02    1448
05    1367
14    1232
11     966
03     871
04     653
15     562
06     468
17     338
18      54
19      44
Name: CCA, dtype: int64

And with the 2016 one.

In [None]:
secciones_J16 = gpd.read_file('SECC_CE_20160101.shp')

In [None]:
secciones_J16['cod_sec'] = '022016061' + secciones_J16['CCA'] + secciones_J16['CUSEC']

In [None]:
secciones_J16

Unnamed: 0,CUSEC,CUMUN,CSEC,CDIS,CMUN,CPRO,CCA,CUDIS,CLAU2,NPRO,NCA,CNUT0,CNUT1,CNUT2,CNUT3,NMUN,Shape_Leng,SUPERF_M2,Shape_Le_1,Shape_area,Shape_len,geometry,cod_sec
0,0100101001,01001,001,01,001,01,16,0100101,01001,Araba/Álava,País Vasco,ES,2,1,1,Alegría-Dulantzi,24584.481049,1.110145e+07,24584.481049,1.110145e+07,24584.481049,"POLYGON ((543234.050 4744039.066, 543233.377 4...",022016061160100101001
1,0100101002,01001,002,01,001,01,16,0100101,01001,Araba/Álava,País Vasco,ES,2,1,1,Alegría-Dulantzi,18936.987581,8.823450e+06,18936.987581,8.823450e+06,18936.987581,"MULTIPOLYGON (((541370.963 4745058.623, 541371...",022016061160100101002
2,0100201001,01002,001,01,002,01,16,0100201,01002,Araba/Álava,País Vasco,ES,2,1,1,Amurrio,47871.683383,3.478989e+07,47871.683383,3.478989e+07,47871.683383,"MULTIPOLYGON (((502019.579 4753948.366, 502018...",022016061160100201001
3,0100201002,01002,002,01,002,01,16,0100201,01002,Araba/Álava,País Vasco,ES,2,1,1,Amurrio,31662.368886,3.930400e+07,31662.368886,3.930400e+07,31662.368886,"POLYGON ((508942.568 4765890.674, 508947.599 4...",022016061160100201002
4,0100201003,01002,003,01,002,01,16,0100201,01002,Araba/Álava,País Vasco,ES,2,1,1,Amurrio,4244.249418,8.494741e+05,4244.249418,8.494741e+05,4244.249418,"POLYGON ((499919.497 4766600.281, 499849.092 4...",022016061160100201003
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36210,5200108011,52001,011,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,Melilla,7347.338748,1.529558e+06,7347.338748,1.529558e+06,7347.338748,"POLYGON ((503747.814 3905961.511, 503842.345 3...",022016061195200108011
36211,5200108012,52001,012,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,Melilla,1476.345581,8.996462e+04,1476.345581,8.996462e+04,1476.345581,"POLYGON ((505732.345 3904250.409, 505638.076 3...",022016061195200108012
36212,5200108013,52001,013,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,Melilla,4691.060767,5.129825e+05,4691.060767,5.129825e+05,4691.060767,"POLYGON ((505958.610 3902872.793, 505937.912 3...",022016061195200108013
36213,5200108014,52001,014,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,Melilla,790.887241,3.585715e+04,790.887241,3.585715e+04,790.887241,"POLYGON ((503912.677 3905564.529, 503950.922 3...",022016061195200108014


In [None]:
secciones_J16['CCA'].value_counts()

01    5927
09    5046
13    4341
07    3556
10    3470
12    2187
08    1953
16    1735
02    1448
05    1367
14    1232
11     966
03     868
04     653
15     562
06     468
17     338
18      54
19      44
Name: CCA, dtype: int64

And finally with the 2011 one.

In [None]:
secciones_N11 = gpd.read_file('SECC_CE_20110101_03_R_INE.shp')

In [None]:
secciones_N11['cod_sec'] = '022011111' + secciones_N11['CCA'] + secciones_N11['CUSEC']

In [None]:
secciones_N11

Unnamed: 0,OBJECTID,CUSEC,CUMUN,CSEC,CDIS,CMUN,CPRO,CCA,Shape_Leng,CUDIS,OBS,CNUT0,CNUT1,CNUT2,CNUT3,CLAU2,NPRO,NCA,NMUN,Shape_Le_1,Shape_area,Shape_len,geometry,cod_sec
0,1.0,0100101001,01001,001,01,001,01,16,34474.734278,0100101,,ES,2,1,1,01001,Araba/Álava,País Vasco,Alegría-Dulantzi,34474.734278,1.551393e+07,34474.734278,"MULTIPOLYGON (((541571.209 4745050.120, 541581...",022011111160100101001
1,2.0,0100101002,01001,002,01,001,01,16,8620.042319,0100101,,ES,2,1,1,01001,Araba/Álava,País Vasco,Alegría-Dulantzi,8620.042319,4.410972e+06,8620.042319,"POLYGON ((541370.963 4745058.623, 541371.018 4...",022011111160100101002
2,3.0,0100201001,01002,001,01,002,01,16,47379.027701,0100201,,ES,2,1,1,01002,Araba/Álava,País Vasco,Amurrio,47379.027701,3.535737e+07,47379.027700,"MULTIPOLYGON (((503618.553 4759559.798, 503620...",022011111160100201001
3,4.0,0100201002,01002,002,01,002,01,16,31169.713203,0100201,,ES,2,1,1,01002,Araba/Álava,País Vasco,Amurrio,31169.713203,3.873652e+07,31169.713203,"POLYGON ((508942.568 4765890.674, 508947.599 4...",022011111160100201002
4,5.0,0100201003,01002,003,01,002,01,16,4244.249418,0100201,,ES,2,1,1,01002,Araba/Álava,País Vasco,Amurrio,4244.249418,8.494741e+05,4244.249418,"POLYGON ((499919.497 4766600.281, 499849.092 4...",022011111160100201003
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35956,35958.0,5200108010,52001,010,08,001,52,19,868.395859,5200108,,ES,6,4,0,52001,Melilla,Melilla,Melilla,868.395859,3.685583e+04,868.395859,"POLYGON ((504318.195 3904957.402, 504318.033 3...",022011111195200108010
35957,35959.0,5200108011,52001,011,08,001,52,19,7347.338748,5200108,,ES,6,4,0,52001,Melilla,Melilla,Melilla,7347.338748,1.529558e+06,7347.338748,"POLYGON ((503855.015 3905844.061, 503855.334 3...",022011111195200108011
35958,35960.0,5200108012,52001,012,08,001,52,19,1571.615853,5200108,,ES,6,4,0,52001,Melilla,Melilla,Melilla,1571.615853,9.849240e+04,1571.615853,"POLYGON ((505731.785 3904249.139, 505637.515 3...",022011111195200108012
35959,35961.0,5200108013,52001,013,08,001,52,19,4691.060767,5200108,,ES,6,4,0,52001,Melilla,Melilla,Melilla,4691.060767,5.129825e+05,4691.060767,"POLYGON ((506068.859 3903736.075, 506069.536 3...",022011111195200108013


In [None]:
secciones_N11['CCA'].value_counts()

01    5796
09    5019
13    4271
07    3538
10    3478
12    2249
08    1942
16    1739
02    1451
05    1330
14    1220
11     969
03     878
04     630
15     558
06     463
17     336
18      53
19      41
Name: CCA, dtype: int64

Since we will have to find the equivalent sections for each one, and in theory they are all different, the first thing we do is merge the datasets from the five shapefiles.

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
df_unif = pd.concat([secciones_A19, secciones_N19], ignore_index=True)

In [None]:
df_unif = pd.concat([df_unif, secciones_J16], ignore_index=True)

In [None]:
df_unif = pd.concat([df_unif, secciones_D15], ignore_index=True)

In [None]:
df_unif = pd.concat([df_unif, secciones_N11], ignore_index=True)

In [None]:
df_unif

Unnamed: 0,OBJECTID,CUSEC,CUMUN,CSEC,CDIS,CMUN,CPRO,CCA,CUDIS,CLAU2,NPRO,NCA,CNUT0,CNUT1,CNUT2,CNUT3,ESTADO,OBS,Shape_Leng,Shape_area,Shape_len,SUPERF_M2,NMUN,geometry,cod_sec,Shape_Le_1,Shape_Area
0,1.0,0100901001,01009,001,01,009,01,16,0100901,01009,Araba/Álava,País Vasco,ES,2,1,1,I,,51725.353538,6.326080e+07,51725.353538,63260804.0,Asparrena,"MULTIPOLYGON (((556453.835 4752758.332, 556460...",022019041160100901001,,
1,2.0,0101001002,01010,002,01,010,01,16,0101001,01010,Araba/Álava,País Vasco,ES,2,1,1,I,,13350.774728,7.332951e+06,13350.774728,7332951.0,Ayala/Aiara,"POLYGON ((502035.230 4771813.197, 502048.071 4...",022019041160101001002,,
2,3.0,0103101001,01031,001,01,031,01,16,0103101,01031,Araba/Álava,País Vasco,ES,2,1,1,I,,87711.717051,8.041601e+07,87711.717051,80416015.0,Laguardia,"MULTIPOLYGON (((538984.636 4718139.608, 538985...",022019041160103101001,,
3,4.0,0103301001,01033,001,01,033,01,16,0103301,01033,Araba/Álava,País Vasco,ES,2,1,1,I,,12331.494377,5.950453e+06,12331.494377,5950453.0,Lapuebla de Labarca,"POLYGON ((537063.531 4703664.589, 536887.844 4...",022019041160103301001,,
4,6.0,0103701001,01037,001,01,037,01,16,0103701,01037,Araba/Álava,País Vasco,ES,2,1,1,I,,60761.315212,1.227207e+08,60761.315212,122720687.0,Arraia-Maeztu,"POLYGON ((551570.951 4739269.962, 551570.889 4...",022019041160103701001,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
181034,35958.0,5200108010,52001,010,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,,,868.395859,3.685583e+04,868.395859,,Melilla,"POLYGON ((504318.195 3904957.402, 504318.033 3...",022011111195200108010,868.395859,
181035,35959.0,5200108011,52001,011,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,,,7347.338748,1.529558e+06,7347.338748,,Melilla,"POLYGON ((503855.015 3905844.061, 503855.334 3...",022011111195200108011,7347.338748,
181036,35960.0,5200108012,52001,012,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,,,1571.615853,9.849240e+04,1571.615853,,Melilla,"POLYGON ((505731.785 3904249.139, 505637.515 3...",022011111195200108012,1571.615853,
181037,35961.0,5200108013,52001,013,08,001,52,19,5200108,52001,Melilla,Melilla,ES,6,4,0,,,4691.060767,5.129825e+05,4691.060767,,Melilla,"POLYGON ((506068.859 3903736.075, 506069.536 3...",022011111195200108013,4691.060767,


We check that we still have a geospatial dataset by asking for its projection.

In [None]:
df_unif.crs

<Projected CRS: EPSG:25830>
Name: ETRS89 / UTM zone 30N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Europe between 6°W and 0°W: Faroe Islands offshore; Ireland - offshore; Jan Mayen - offshore; Norway including Svalbard - offshore; Spain - onshore and offshore.
- bounds: (-5.9999999999999, 35.265663028, 1.7053025658242e-13, 80.489344496333)
Coordinate Operation:
- name: UTM zone 30N
- method: Transverse Mercator
Datum: European Terrestrial Reference System 1989 ensemble
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

The shapefiles contain a lot of metadata that will not be useful to us. We keep only the section codes and their geometric definition.

In [None]:
df_unif.columns

Index(['OBJECTID', 'CUSEC', 'CUMUN', 'CSEC', 'CDIS', 'CMUN', 'CPRO', 'CCA',
       'CUDIS', 'CLAU2', 'NPRO', 'NCA', 'CNUT0', 'CNUT1', 'CNUT2', 'CNUT3',
       'ESTADO', 'OBS', 'Shape_Leng', 'Shape_area', 'Shape_len', 'SUPERF_M2',
       'NMUN', 'geometry', 'cod_sec', 'Shape_Le_1', 'Shape_Area'],
      dtype='object')

In [None]:
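# .copy() gives an independent frame, so adding columns later does not raise SettingWithCopyWarning
df_unif = df_unif[['cod_sec', 'CUSEC', 'CUMUN', 'CPRO', 'geometry']].copy()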

In [None]:
df_unif

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry
0,022019041160100901001,0100901001,01009,01,"MULTIPOLYGON (((556453.835 4752758.332, 556460..."
1,022019041160101001002,0101001002,01010,01,"POLYGON ((502035.230 4771813.197, 502048.071 4..."
2,022019041160103101001,0103101001,01031,01,"MULTIPOLYGON (((538984.636 4718139.608, 538985..."
3,022019041160103301001,0103301001,01033,01,"POLYGON ((537063.531 4703664.589, 536887.844 4..."
4,022019041160103701001,0103701001,01037,01,"POLYGON ((551570.951 4739269.962, 551570.889 4..."
...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,"POLYGON ((504318.195 3904957.402, 504318.033 3..."
181035,022011111195200108011,5200108011,52001,52,"POLYGON ((503855.015 3905844.061, 503855.334 3..."
181036,022011111195200108012,5200108012,52001,52,"POLYGON ((505731.785 3904249.139, 505637.515 3..."
181037,022011111195200108013,5200108013,52001,52,"POLYGON ((506068.859 3903736.075, 506069.536 3..."


In [None]:
df_unif.dtypes

cod_sec       object
CUSEC         object
CUMUN         object
CPRO          object
geometry    geometry
dtype: object

To compute the position of a section, the most convenient approach is to define, or rather compute, its centroid. Geopandas has a dedicated attribute for this. We create a new column with this value, which is simply a point.
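For reference, the centroid of a simple polygon is its area-weighted centre, which can be computed from the vertices with the shoelace formula. The sketch below is only illustrative. The function `polygon_centroid` is hypothetical and handles a single ring without holes, whereas Geopandas also deals with multipolygons and holes.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0  # twice the signed area of this edge's triangle
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Signed area is area2 / 2, so the divisor 6 * area becomes 3 * area2
    return cx / (3 * area2), cy / (3 * area2)

rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(polygon_centroid(rectangle))  # (2.0, 1.0)
```

Note that the centroid of a concave polygon can fall outside the polygon itself, which is acceptable here because we only use it to measure distances between sections.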

In [None]:
from shapely.geometry import Polygon, LineString, Point

In [None]:
df_unif['Centroide'] = df_unif['geometry'].centroid






In [None]:
df_unif

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry,Centroide
0,022019041160100901001,0100901001,01009,01,"MULTIPOLYGON (((556453.835 4752758.332, 556460...",POINT (555853.735 4749228.022)
1,022019041160101001002,0101001002,01010,01,"POLYGON ((502035.230 4771813.197, 502048.071 4...",POINT (500019.424 4771906.777)
2,022019041160103101001,0103101001,01031,01,"MULTIPOLYGON (((538984.636 4718139.608, 538985...",POINT (535052.148 4710767.820)
3,022019041160103301001,0103301001,01033,01,"POLYGON ((537063.531 4703664.589, 536887.844 4...",POINT (535297.829 4704792.205)
4,022019041160103701001,0103701001,01037,01,"POLYGON ((551570.951 4739269.962, 551570.889 4...",POINT (545408.849 4733097.078)
...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,"POLYGON ((504318.195 3904957.402, 504318.033 3...",POINT (504226.917 3904997.784)
181035,022011111195200108011,5200108011,52001,52,"POLYGON ((503855.015 3905844.061, 503855.334 3...",POINT (503591.236 3905281.004)
181036,022011111195200108012,5200108012,52001,52,"POLYGON ((505731.785 3904249.139, 505637.515 3...",POINT (505545.192 3904295.175)
181037,022011111195200108013,5200108013,52001,52,"POLYGON ((506068.859 3903736.075, 506069.536 3...",POINT (506046.231 3903459.669)


For convenience we create another column with the code that identifies the election: 02 denotes a general election, followed by the year and the month.

In [None]:
df_unif['Elección'] = df_unif['cod_sec'].str[0:8] 



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
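
To make the structure of that prefix explicit, here is a minimal sketch that splits the first eight characters of a section code into its parts. `split_section_code` is a hypothetical helper, not something defined in this notebook, and it only decodes the prefix described in the text.

```python
def split_section_code(cod_sec):
    """Split the leading 8 characters of a section code into its parts."""
    election = cod_sec[0:8]   # e.g. '02201904'
    kind = election[0:2]      # '02' marks a general election
    year = int(election[2:6])
    month = int(election[6:8])
    return election, kind, year, month

print(split_section_code('022019041160100901001'))
```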



In [None]:
df_unif

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry,Centroide,Elección
0,022019041160100901001,0100901001,01009,01,"MULTIPOLYGON (((556453.835 4752758.332, 556460...",POINT (555853.735 4749228.022),02201904
1,022019041160101001002,0101001002,01010,01,"POLYGON ((502035.230 4771813.197, 502048.071 4...",POINT (500019.424 4771906.777),02201904
2,022019041160103101001,0103101001,01031,01,"MULTIPOLYGON (((538984.636 4718139.608, 538985...",POINT (535052.148 4710767.820),02201904
3,022019041160103301001,0103301001,01033,01,"POLYGON ((537063.531 4703664.589, 536887.844 4...",POINT (535297.829 4704792.205),02201904
4,022019041160103701001,0103701001,01037,01,"POLYGON ((551570.951 4739269.962, 551570.889 4...",POINT (545408.849 4733097.078),02201904
...,...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,"POLYGON ((504318.195 3904957.402, 504318.033 3...",POINT (504226.917 3904997.784),02201111
181035,022011111195200108011,5200108011,52001,52,"POLYGON ((503855.015 3905844.061, 503855.334 3...",POINT (503591.236 3905281.004),02201111
181036,022011111195200108012,5200108012,52001,52,"POLYGON ((505731.785 3904249.139, 505637.515 3...",POINT (505545.192 3904295.175),02201111
181037,022011111195200108013,5200108013,52001,52,"POLYGON ((506068.859 3903736.075, 506069.536 3...",POINT (506046.231 3903459.669),02201111


As we saw, the equivalent of a section X from one election, say 2015, in another election, say 2016, is the section of the latter election whose centroid is closest to that of X, provided it lies in the same municipality or, failing that, the same province.

To compute distances, Geopandas also provides a `.distance()` method.

Here is an example of the distance between the centroids of two sections. The result is in metres, so in this case it is about 520 kilometres.

In [None]:
a = df_unif['Centroide'][586]

In [None]:
b = df_unif['Centroide'][10587]

In [None]:
a.distance(b)

520189.72612155194
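
For point geometries in a projected coordinate system this distance is the ordinary straight-line distance. A minimal sketch, assuming metre units (as the 520 km figure suggests) and using a hypothetical `centroid_distance` helper:

```python
import math

def centroid_distance(p, q):
    """Straight-line distance between two (x, y) points in the same projected CRS."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Centroids of the first two rows of the table above (rounded as displayed)
p = (555853.735, 4749228.022)
q = (500019.424, 4771906.777)
print(centroid_distance(p, q) / 1000)  # kilometres, assuming metre units
```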

We now take the section at index 10587, which belongs to the April 2019 election. It turns out to be in the municipality with code 17062.

In [None]:
secc = df_unif['cod_sec'][10587]
secc

'022019041091706201003'

In [None]:
cent = df_unif['Centroide'][10587]


In [None]:
mun = df_unif['CUMUN'][10587]
mun

'17062'

Now we want to find its equivalent in the 2011 election. We select the sections of that municipality in that election and obtain a DataFrame, df_aux.

In [None]:
elect = '02201111'

In [None]:
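# Combine both conditions in one mask; copy so a column can be added later without warnings
df_aux = df_unif.loc[(df_unif['Elección'] == elect) & (df_unif['CUMUN'] == mun)].copy()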

In [None]:
df_aux

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry,Centroide,Elección
157092,022011111091706201001,1706201001,17062,17,"POLYGON ((1007377.593 4681795.498, 1007298.240...",POINT (1007162.974 4681913.689),02201111
157093,022011111091706201002,1706201002,17062,17,"POLYGON ((1007500.680 4681641.821, 1007541.675...",POINT (1007318.018 4680574.329),02201111
157094,022011111091706201003,1706201003,17062,17,"POLYGON ((1005026.720 4683600.827, 1005102.716...",POINT (1005071.260 4682131.829),02201111
157095,022011111091706201004,1706201004,17062,17,"POLYGON ((1006416.219 4681461.429, 1006417.065...",POINT (1006808.489 4680368.964),02201111
157096,022011111091706201005,1706201005,17062,17,"POLYGON ((1010200.679 4680255.833, 1009523.664...",POINT (1008853.836 4680441.263),02201111


The equivalent section is the one whose centroid is closest to the centroid of section 10587. As the distances computed next show, this is the one at index 157094, barely half a metre away.

In [None]:
df_aux['dist'] = df_aux['Centroide'].apply(lambda x : x.distance(cent))

In [None]:
df_aux

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry,Centroide,Elección,dist
157092,022011111091706201001,1706201001,17062,17,"POLYGON ((1007377.593 4681795.498, 1007298.240...",POINT (1007162.974 4681913.689),02201111,2102.533307
157093,022011111091706201002,1706201002,17062,17,"POLYGON ((1007500.680 4681641.821, 1007541.675...",POINT (1007318.018 4680574.329),02201111,2733.28814
157094,022011111091706201003,1706201003,17062,17,"POLYGON ((1005026.720 4683600.827, 1005102.716...",POINT (1005071.260 4682131.829),02201111,0.541192
157095,022011111091706201004,1706201004,17062,17,"POLYGON ((1006416.219 4681461.429, 1006417.065...",POINT (1006808.489 4680368.964),02201111,2474.517126
157096,022011111091706201005,1706201005,17062,17,"POLYGON ((1010200.679 4680255.833, 1009523.664...",POINT (1008853.836 4680441.263),02201111,4142.634792


In [None]:
ind = df_aux['dist'].idxmin()
ind

157094

In [None]:
sec = df_aux['cod_sec'][ind]
sec

'022011111091706201003'
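
The selection just carried out is a nearest-neighbour search over a handful of candidates. The following sketch reproduces it in plain Python with the five 2011 centroids from the table above; the query point and the `nearest_section` helper are hypothetical, chosen only to illustrate the step.

```python
import math

# 2011 centroids of municipality 17062, copied from the table above
candidates = {
    '022011111091706201001': (1007162.974, 4681913.689),
    '022011111091706201002': (1007318.018, 4680574.329),
    '022011111091706201003': (1005071.260, 4682131.829),
    '022011111091706201004': (1006808.489, 4680368.964),
    '022011111091706201005': (1008853.836, 4680441.263),
}

def nearest_section(point, candidates):
    """Return the code of the candidate whose centroid is closest to `point`."""
    return min(candidates, key=lambda code: math.dist(point, candidates[code]))

# Hypothetical query point, placed near the third centroid for illustration
query = (1005071.0, 4682132.0)
print(nearest_section(query, candidates))
```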

The idea is to repeat the process we have just seen for every section of the five elections, about 180 thousand in total. Naturally, the equivalent of a section within its own election is the section itself.

We define one column per election to store the closest section in that election.

In [None]:
df_unif['cercana N11'] = ''
df_unif['cercana D15'] = ''
df_unif['cercana J16'] = ''
df_unif['cercana A19'] = ''
df_unif['cercana N19'] = ''



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [None]:
df_unif

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry,Centroide,Elección,cercana N11,cercana D15,cercana J16,cercana A19,cercana N19
0,022019041160100901001,0100901001,01009,01,"MULTIPOLYGON (((556453.835 4752758.332, 556460...",POINT (555853.735 4749228.022),02201904,,,,,
1,022019041160101001002,0101001002,01010,01,"POLYGON ((502035.230 4771813.197, 502048.071 4...",POINT (500019.424 4771906.777),02201904,,,,,
2,022019041160103101001,0103101001,01031,01,"MULTIPOLYGON (((538984.636 4718139.608, 538985...",POINT (535052.148 4710767.820),02201904,,,,,
3,022019041160103301001,0103301001,01033,01,"POLYGON ((537063.531 4703664.589, 536887.844 4...",POINT (535297.829 4704792.205),02201904,,,,,
4,022019041160103701001,0103701001,01037,01,"POLYGON ((551570.951 4739269.962, 551570.889 4...",POINT (545408.849 4733097.078),02201904,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,"POLYGON ((504318.195 3904957.402, 504318.033 3...",POINT (504226.917 3904997.784),02201111,,,,,
181035,022011111195200108011,5200108011,52001,52,"POLYGON ((503855.015 3905844.061, 503855.334 3...",POINT (503591.236 3905281.004),02201111,,,,,
181036,022011111195200108012,5200108012,52001,52,"POLYGON ((505731.785 3904249.139, 505637.515 3...",POINT (505545.192 3904295.175),02201111,,,,,
181037,022011111195200108013,5200108013,52001,52,"POLYGON ((506068.859 3903736.075, 506069.536 3...",POINT (506046.231 3903459.669),02201111,,,,,


In [None]:
secciones_N19.dtypes

OBJECTID         int64
CUSEC           object
CUMUN           object
CSEC            object
CDIS            object
CMUN            object
CPRO            object
CCA             object
CUDIS           object
CLAU2           object
NPRO            object
NCA             object
CNUT0           object
CNUT1           object
CNUT2           object
CNUT3           object
ESTADO          object
OBS             object
Shape_Leng     float64
Shape_area     float64
Shape_len      float64
SUPERF_M2        int64
NMUN            object
geometry      geometry
cod_sec         object
dtype: object

We must bear in mind that the municipalities are not the same in every election, so we need to know which ones change and apply the same-province criterion to them.

In the cells that follow we look for the municipalities that do not appear in all five elections. To do so we compare the municipality codes of the first and last elections (November 2011 and November 2019). Some municipalities were created and others disappeared; there are 23 in total. We obtain their codes with set operations.

In [None]:
N19 = set(secciones_N19['CUMUN'])

In [None]:
len(N19)

8131

In [None]:
N19

{'37128',
 '07039',
 '47169',
 '04053',
 '21060',
 '16094',
 '22217',
 '05069',
 '12060',
 '24175',
 '34010',
 '23069',
 '42115',
 '10142',
 '04022',
 '19267',
 '47060',
 '42145',
 '25093',
 '26052',
 '13062',
 '35033',
 '28030',
 '08021',
 '16250',
 '13029',
 '37165',
 '46183',
 '16145',
 '10073',
 '49267',
 '14055',
 '17018',
 '10027',
 '09277',
 '10082',
 '04023',
 '18069',
 '16177',
 '40204',
 '03055',
 '19306',
 '37302',
 '25132',
 '50175',
 '44017',
 '37252',
 '04071',
 '31065',
 '26003',
 '49171',
 '15012',
 '07060',
 '48036',
 '12138',
 '37251',
 '39079',
 '50082',
 '11039',
 '49031',
 '09072',
 '26088',
 '31194',
 '16265',
 '18111',
 '18035',
 '24021',
 '14062',
 '31076',
 '08120',
 '10034',
 '33035',
 '18056',
 '35019',
 '40054',
 '50205',
 '44241',
 '12117',
 '20007',
 '10018',
 '08014',
 '09414',
 '10011',
 '05154',
 '40219',
 '22042',
 '10214',
 '43124',
 '41047',
 '03119',
 '09424',
 '02078',
 '10196',
 '40107',
 '13073',
 '22908',
 '29008',
 '25247',
 '37361',
 '09329',


In [None]:
N11 = set(secciones_N11['CUMUN'])

In [None]:
len(N11)

8116

In [None]:
dif = N19.difference(N11)

In [None]:
len(dif)

19

In [None]:
dif

{'04904',
 '06903',
 '10904',
 '10905',
 '11903',
 '14901',
 '14902',
 '15902',
 '18065',
 '18077',
 '18106',
 '18914',
 '18915',
 '18916',
 '21902',
 '29903',
 '29904',
 '36902',
 '41904'}

In [None]:
dif1 = N11.difference(N19)

In [None]:
dif1

{'15026', '15063', '36011', '36012'}

In [None]:
'04904' not in dif

False

In [None]:
dif = dif.union(dif1)

In [None]:
len(dif)

23
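
The two differences followed by a union amount to a symmetric difference of the two code sets. A small sketch with toy sets (the real ones come from the `CUMUN` columns):

```python
# Toy code sets for illustration only
codes_2011 = {'15026', '15063', '17062'}
codes_2019 = {'04904', '17062'}

created = codes_2019 - codes_2011   # present only in 2019
removed = codes_2011 - codes_2019   # present only in 2011
changed = codes_2011 ^ codes_2019   # symmetric difference of both sets

print(sorted(changed))
print(changed == created | removed)
```

Either form gives the same set; the step-by-step version has the advantage of showing which municipalities were created and which were removed.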

Next we create a groupby object keyed on municipality code and election, which makes finding the equivalent sections much more efficient.

Since there are a few cases where we must use the province instead, we create an analogous groupby object keyed on province and election.

In [None]:
grouped = df_unif.groupby(["CUMUN", "Elección"])

In [None]:
grouped.groups

{('01001', '02201111'): [145078, 145079], ('01001', '02201512'): [108849, 108850], ('01001', '02201606'): [72634, 72635], ('01001', '02201904'): [32923, 32961], ('01001', '02201911'): [69240, 69278], ('01002', '02201111'): [145080, 145081, 145082, 145083, 145084, 145085, 145086], ('01002', '02201512'): [108851, 108852, 108853, 108854, 108855, 108856, 108857], ('01002', '02201606'): [72636, 72637, 72638, 72639, 72640, 72641, 72642], ('01002', '02201904'): [11, 13, 20, 21, 38, 39, 48], ('01002', '02201911'): [36328, 36330, 36337, 36338, 36355, 36356, 36365], ('01003', '02201111'): [145087], ('01003', '02201512'): [108858], ('01003', '02201606'): [72643], ('01003', '02201904'): [49], ('01003', '02201911'): [36366], ('01004', '02201111'): [145088], ('01004', '02201512'): [108859], ('01004', '02201606'): [72644], ('01004', '02201904'): [12], ('01004', '02201911'): [36329], ('01006', '02201111'): [145089], ('01006', '02201512'): [108860], ('01006', '02201606'): [72645], ('01006', '02201904')

In [None]:
grouped['cod_sec']

<pandas.core.groupby.generic.SeriesGroupBy object at 0x7fb985bbc450>

In [None]:
a = grouped['Centroide']

In [None]:
grouped.get_group(('01001', '02201111'))

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry,Centroide,Elección,cercana N11,cercana D15,cercana J16,cercana A19,cercana N19
145078,022011111160100101001,0100101001,01001,01,"MULTIPOLYGON (((541571.209 4745050.120, 541581...",POINT (539937.648 4741231.730),02201111,,,,,
145079,022011111160100101002,0100101002,01001,01,"POLYGON ((541370.963 4745058.623, 541371.018 4...",POINT (539970.480 4744471.147),02201111,,,,,


In [None]:
grouped_prov = df_unif.groupby(["CPRO", "Elección"])

In [None]:
grouped_prov.groups

{('01', '02201111'): [145078, 145079, 145080, 145081, 145082, 145083, 145084, 145085, 145086, 145087, 145088, 145089, 145090, 145091, 145092, 145093, 145094, 145095, 145096, 145097, 145098, 145099, 145100, 145101, 145102, 145103, 145104, 145105, 145106, 145107, 145108, 145109, 145110, 145111, 145112, 145113, 145114, 145115, 145116, 145117, 145118, 145119, 145120, 145121, 145122, 145123, 145124, 145125, 145126, 145127, 145128, 145129, 145130, 145131, 145132, 145133, 145134, 145135, 145136, 145137, 145138, 145139, 145140, 145141, 145142, 145143, 145144, 145145, 145146, 145147, 145148, 145149, 145150, 145151, 145152, 145153, 145154, 145155, 145156, 145157, 145158, 145159, 145160, 145161, 145162, 145163, 145164, 145165, 145166, 145167, 145168, 145169, 145170, 145171, 145172, 145173, 145174, 145175, 145176, 145177, ...], ('01', '02201512'): [108849, 108850, 108851, 108852, 108853, 108854, 108855, 108856, 108857, 108858, 108859, 108860, 108861, 108862, 108863, 108864, 108865, 108866, 108867,
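
The benefit of grouping up front is that each lookup becomes a dictionary access rather than a scan over all rows. A minimal pure-Python sketch of the same idea, with made-up rows and a hypothetical `build_index` helper (this is an analogy, not how pandas implements groupby internally):

```python
from collections import defaultdict

def build_index(rows, key_fields):
    """Map each key tuple to the list of row positions that share it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[tuple(row[f] for f in key_fields)].append(pos)
    return index

# Made-up rows for illustration only
rows = [
    {'CUMUN': '01001', 'CPRO': '01', 'Elección': '02201111'},
    {'CUMUN': '01001', 'CPRO': '01', 'Elección': '02201111'},
    {'CUMUN': '01002', 'CPRO': '01', 'Elección': '02201111'},
    {'CUMUN': '01001', 'CPRO': '01', 'Elección': '02201904'},
]

by_mun = build_index(rows, ('CUMUN', 'Elección'))
by_prov = build_index(rows, ('CPRO', 'Elección'))
print(by_mun[('01001', '02201111')])   # rows of that municipality and election
print(by_prov[('01', '02201111')])     # rows of the whole province
```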

Next we define the function that finds the equivalent sections for a given election (elect) and stores them in the column (col).

It iterates over each of the 180 thousand sections. First it selects an auxiliary DataFrame with the sections of the same municipality in election elect. If the municipality is one of the 23 that changed, it takes the whole province instead.

It then computes the distances between centroids, keeps the minimum, identifies the corresponding section by its index, and finally writes that section's code into column col.

In short, it is the iteration of the process we saw earlier.

In [None]:
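def cercanos(elect, col):
  # For every section, find the closest section (by centroid) in election `elect`
  # and store its code in column `col`.
  for ind in range(len(df_unif)):
    cent = df_unif['Centroide'][ind]
    mun = df_unif['CUMUN'][ind]
    prov = df_unif['CPRO'][ind]

    if mun not in dif:
      # Stable municipality: search only within it
      df_aux = grouped.get_group((mun, elect))
    else:
      # Municipality created or removed: fall back to the whole province
      df_aux = grouped_prov.get_group((prov, elect))

    # Keep the distances in a separate Series so the group itself is not modified
    dist = df_aux['Centroide'].apply(lambda x : x.distance(cent))
    idx = dist.idxmin()
    sec = df_aux['cod_sec'][idx]

    # .at writes into df_unif itself; df_unif[col][ind] = sec may write to a copy
    df_unif.at[ind, col] = sec

    if ind % 2500 == 0:
      print(ind)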
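
The write-back step deserves a note. Chained indexing such as `df[col][ind] = value` may assign into a temporary copy, which is what the `SettingWithCopyWarning` messages shown earlier in this notebook refer to. A small sketch on a made-up DataFrame, using `DataFrame.at` for the scalar write:

```python
import pandas as pd

# Made-up frame standing in for df_unif
df = pd.DataFrame({'cod_sec': ['A', 'B', 'C'], 'cercana': ['', '', '']})

# Label-based scalar assignment: writes into df itself
df.at[1, 'cercana'] = 'B-2011'

print(df['cercana'].tolist())
```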

  
  

This is a slow process, taking about 50 minutes per election. The first time we run it only for the 2011 election.

In [None]:
cercanos(elect = '02201111', col = 'cercana N11')

0
2500
5000
7500
10000
12500
15000
17500
20000
22500
25000
27500
30000
32500
35000
37500
40000
42500
45000
47500
50000
52500
55000
57500
60000
62500
65000
67500
70000
72500
75000
77500
80000
82500
85000
87500
90000
92500
95000
97500
100000
102500
105000
107500
110000
112500
115000
117500
120000
122500
125000
127500
130000
132500
135000
137500
140000
142500
145000
147500
150000
152500
155000
157500
160000
162500
165000
167500
170000
172500
175000
177500
180000
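
Much of the running time comes from handling one section at a time. One possible way to speed this up, not used in this notebook, is to compute all the distances between two groups of centroids at once with NumPy broadcasting. The sketch below uses made-up coordinates and a hypothetical `nearest_indices` helper; it has not been benchmarked against the real data.

```python
import numpy as np

def nearest_indices(src, dst):
    """For each point in src (n, 2), index of the closest point in dst (m, 2)."""
    diff = src[:, None, :] - dst[None, :, :]      # shape (n, m, 2)
    dist = np.hypot(diff[..., 0], diff[..., 1])   # shape (n, m)
    return dist.argmin(axis=1)

# Made-up centroids of one municipality in two elections
src = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 0.0]])
dst = np.array([[9.0, 9.0], [0.5, 0.0], [5.0, 1.0]])
print(nearest_indices(src, dst))
```

Applied once per municipality (or per province for the 23 changed municipalities), this would replace the inner row-by-row distance computation with a single array operation.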


In [None]:
df_unif.to_csv('similitud_secciones_1.csv', sep = ',', index = False)

In the next session we run it for the other four elections, starting with 2015...

In [None]:
cercanos(elect = '02201512', col = 'cercana D15')

0
2500
5000
7500
10000
12500
15000
17500
20000
22500
25000
27500
30000
32500
35000
37500
40000
42500
45000
47500
50000
52500
55000
57500
60000
62500
65000
67500
70000
72500
75000
77500
80000
82500
85000
87500
90000
92500
95000
97500
100000
102500
105000
107500
110000
112500
115000
117500
120000
122500
125000
127500
130000
132500
135000
137500
140000
142500
145000
147500
150000
152500
155000
157500
160000
162500
165000
167500
170000
172500
175000
177500
180000


Then 2016...

In [None]:
cercanos(elect = '02201606', col = 'cercana J16')

0
2500
5000
7500
10000
12500
15000
17500
20000
22500
25000
27500
30000
32500
35000
37500
40000
42500
45000
47500
50000
52500
55000
57500
60000
62500
65000
67500
70000
72500
75000
77500
80000
82500
85000
87500
90000
92500
95000
97500
100000
102500
105000
107500
110000
112500
115000
117500
120000
122500
125000
127500
130000
132500
135000
137500
140000
142500
145000
147500
150000
152500
155000
157500
160000
162500
165000
167500
170000
172500
175000
177500
180000


...April 2019...

In [None]:
cercanos(elect = '02201904', col = 'cercana A19')

0
2500
5000
7500
10000
12500
15000
17500
20000
22500
25000
27500
30000
32500
35000
37500
40000
42500
45000
47500
50000
52500
55000
57500
60000
62500
65000
67500
70000
72500
75000
77500
80000
82500
85000
87500
90000
92500
95000
97500
100000
102500
105000
107500
110000
112500
115000
117500
120000
122500
125000
127500
130000
132500
135000
137500
140000
142500
145000
147500
150000
152500
155000
157500
160000
162500
165000
167500
170000
172500
175000
177500
180000


...and November 2019.

In [None]:
cercanos(elect = '02201911', col = 'cercana N19')

0
2500
5000
7500
10000
12500
15000
17500
20000
22500
25000
27500
30000
32500
35000
37500
40000
42500
45000
47500
50000
52500
55000
57500
60000
62500
65000
67500
70000
72500
75000
77500
80000
82500
85000
87500
90000
92500
95000
97500
100000
102500
105000
107500
110000
112500
115000
117500
120000
122500
125000
127500
130000
132500
135000
137500
140000
142500
145000
147500
150000
152500
155000
157500
160000
162500
165000
167500
170000
172500
175000
177500
180000


This was the result of the second session. It lacks the equivalent sections for the 2011 election, which we had already computed in the first session.

In [None]:
df_unif

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry,Centroide,Elección,cercana N11,cercana D15,cercana J16,cercana A19,cercana N19
0,022019041160100901001,0100901001,01009,01,"MULTIPOLYGON (((556453.835 4752758.332, 556460...",POINT (555853.735 4749228.022),02201904,,022015121160100901001,022016061160100901001,022019041160100901001,022019111160100901001
1,022019041160101001002,0101001002,01010,01,"POLYGON ((502035.230 4771813.197, 502048.071 4...",POINT (500019.424 4771906.777),02201904,,022015121160101001002,022016061160101001002,022019041160101001002,022019111160101001002
2,022019041160103101001,0103101001,01031,01,"MULTIPOLYGON (((538984.636 4718139.608, 538985...",POINT (535052.148 4710767.820),02201904,,022015121160103101001,022016061160103101001,022019041160103101001,022019111160103101001
3,022019041160103301001,0103301001,01033,01,"POLYGON ((537063.531 4703664.589, 536887.844 4...",POINT (535297.829 4704792.205),02201904,,022015121160103301001,022016061160103301001,022019041160103301001,022019111160103301001
4,022019041160103701001,0103701001,01037,01,"POLYGON ((551570.951 4739269.962, 551570.889 4...",POINT (545408.849 4733097.078),02201904,,022015121160103701001,022016061160103701001,022019041160103701001,022019111160103701001
...,...,...,...,...,...,...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,"POLYGON ((504318.195 3904957.402, 504318.033 3...",POINT (504226.917 3904997.784),02201111,,022015121195200108010,022016061195200108010,022019041195200108010,022019111195200108010
181035,022011111195200108011,5200108011,52001,52,"POLYGON ((503855.015 3905844.061, 503855.334 3...",POINT (503591.236 3905281.004),02201111,,022015121195200108011,022016061195200108011,022019041195200108011,022019111195200108011
181036,022011111195200108012,5200108012,52001,52,"POLYGON ((505731.785 3904249.139, 505637.515 3...",POINT (505545.192 3904295.175),02201111,,022015121195200108012,022016061195200108012,022019041195200108012,022019111195200108012
181037,022011111195200108013,5200108013,52001,52,"POLYGON ((506068.859 3903736.075, 506069.536 3...",POINT (506046.231 3903459.669),02201111,,022015121195200108013,022016061195200108013,022019041195200108013,022019111195200108013


In [None]:
df_unif['Elección'].unique()

array(['02201904', '02201911', '02201606', '02201512', '02201111'],
      dtype=object)

In fact, we no longer need the geometry or centroid columns, only the sections, so we keep just those columns.

In [None]:
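# .copy() makes this an independent frame, so later column assignments do not trigger warnings
df_unit_simp = df_unif[['cod_sec', 'CUSEC', 'CUMUN', 'CPRO', 'Elección', 'cercana N11', 'cercana D15', 'cercana J16', 'cercana A19', 'cercana N19']].copy()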

In [None]:
df_unit_simp

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,Elección,cercana N11,cercana D15,cercana J16,cercana A19,cercana N19
0,022019041160100901001,0100901001,01009,01,02201904,,022015121160100901001,022016061160100901001,022019041160100901001,022019111160100901001
1,022019041160101001002,0101001002,01010,01,02201904,,022015121160101001002,022016061160101001002,022019041160101001002,022019111160101001002
2,022019041160103101001,0103101001,01031,01,02201904,,022015121160103101001,022016061160103101001,022019041160103101001,022019111160103101001
3,022019041160103301001,0103301001,01033,01,02201904,,022015121160103301001,022016061160103301001,022019041160103301001,022019111160103301001
4,022019041160103701001,0103701001,01037,01,02201904,,022015121160103701001,022016061160103701001,022019041160103701001,022019111160103701001
...,...,...,...,...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,02201111,,022015121195200108010,022016061195200108010,022019041195200108010,022019111195200108010
181035,022011111195200108011,5200108011,52001,52,02201111,,022015121195200108011,022016061195200108011,022019041195200108011,022019111195200108011
181036,022011111195200108012,5200108012,52001,52,02201111,,022015121195200108012,022016061195200108012,022019041195200108012,022019111195200108012
181037,022011111195200108013,5200108013,52001,52,02201111,,022015121195200108013,022016061195200108013,022019041195200108013,022019111195200108013


In [None]:
df_unit_simp.to_csv('similitud_secciones_1.csv', sep = ',', index = False)

The 2011 sections are missing, so we load them from their file.

In [None]:
cols_str = {'cercana N11' : 'str'}

In [None]:
unif_prov = pd.read_csv('similitud_secciones.csv', dtype = cols_str)

In [None]:
unif_prov

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,geometry,Centroide,Elección,cercana N11,cercana D15,cercana J16,cercana A19,cercana N19
0,022019041160100901001,100901001,1009,1,MULTIPOLYGON (((556453.8348000003 4752758.3318...,POINT (555853.7350834189 4749228.022200624),2201904,022011111160100901001,,,,
1,022019041160101001002,101001002,1010,1,"POLYGON ((502035.2303999998 4771813.1971, 5020...",POINT (500019.4240826656 4771906.776683071),2201904,022011111160101001002,,,,
2,022019041160103101001,103101001,1031,1,MULTIPOLYGON (((538984.6357000005 4718139.6081...,POINT (535052.1484135857 4710767.819587744),2201904,022011111160103101001,,,,
3,022019041160103301001,103301001,1033,1,"POLYGON ((537063.5311000003 4703664.589, 53688...",POINT (535297.8292420629 4704792.204954741),2201904,022011111160103301001,,,,
4,022019041160103701001,103701001,1037,1,"POLYGON ((551570.9506000001 4739269.962300001,...",POINT (545408.8488544315 4733097.077845995),2201904,022011111160103701001,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,"POLYGON ((504318.1946 3904957.401799999, 50431...",POINT (504226.9167456334 3904997.784367722),2201111,022011111195200108010,,,,
181035,022011111195200108011,5200108011,52001,52,"POLYGON ((503855.0149999997 3905844.061000001,...",POINT (503591.2360677876 3905281.003590522),2201111,022011111195200108011,,,,
181036,022011111195200108012,5200108012,52001,52,"POLYGON ((505731.7847999996 3904249.139, 50563...",POINT (505545.1919586309 3904295.175303707),2201111,022011111195200108012,,,,
181037,022011111195200108013,5200108013,52001,52,"POLYGON ((506068.8585000001 3903736.0748, 5060...",POINT (506046.2308284579 3903459.668678103),2201111,022011111195200108013,,,,
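
Only `cercana N11` was read as text, which is why short code columns such as `CUMUN` and `CPRO` appear above without their leading zeros. A small sketch of the effect, using an in-memory CSV with made-up contents:

```python
import io
import pandas as pd

# Made-up CSV contents for illustration only
csv_text = "CUMUN,CPRO\n01009,01\n"

# Without a dtype, pandas infers integers and the leading zero is lost
inferred = pd.read_csv(io.StringIO(csv_text))
# With dtype=str every column is kept as text
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred['CUMUN'][0], as_text['CUMUN'][0])
```

This does not matter here because only the `cercana N11` column is taken from this file, but it would if the other code columns were reused.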


In [None]:
df_unit_simp

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,Elección,cercana N11,cercana D15,cercana J16,cercana A19,cercana N19
0,022019041160100901001,0100901001,01009,01,02201904,,022015121160100901001,022016061160100901001,022019041160100901001,022019111160100901001
1,022019041160101001002,0101001002,01010,01,02201904,,022015121160101001002,022016061160101001002,022019041160101001002,022019111160101001002
2,022019041160103101001,0103101001,01031,01,02201904,,022015121160103101001,022016061160103101001,022019041160103101001,022019111160103101001
3,022019041160103301001,0103301001,01033,01,02201904,,022015121160103301001,022016061160103301001,022019041160103301001,022019111160103301001
4,022019041160103701001,0103701001,01037,01,02201904,,022015121160103701001,022016061160103701001,022019041160103701001,022019111160103701001
...,...,...,...,...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,02201111,,022015121195200108010,022016061195200108010,022019041195200108010,022019111195200108010
181035,022011111195200108011,5200108011,52001,52,02201111,,022015121195200108011,022016061195200108011,022019041195200108011,022019111195200108011
181036,022011111195200108012,5200108012,52001,52,02201111,,022015121195200108012,022016061195200108012,022019041195200108012,022019111195200108012
181037,022011111195200108013,5200108013,52001,52,02201111,,022015121195200108013,022016061195200108013,022019041195200108013,022019111195200108013


We simply copy the 2011 column.

In [None]:
df_unit_simp['cercana N11'] = unif_prov['cercana N11']

In [None]:
df_unit_simp

Unnamed: 0,cod_sec,CUSEC,CUMUN,CPRO,Elección,cercana N11,cercana D15,cercana J16,cercana A19,cercana N19
0,022019041160100901001,0100901001,01009,01,02201904,022011111160100901001,022015121160100901001,022016061160100901001,022019041160100901001,022019111160100901001
1,022019041160101001002,0101001002,01010,01,02201904,022011111160101001002,022015121160101001002,022016061160101001002,022019041160101001002,022019111160101001002
2,022019041160103101001,0103101001,01031,01,02201904,022011111160103101001,022015121160103101001,022016061160103101001,022019041160103101001,022019111160103101001
3,022019041160103301001,0103301001,01033,01,02201904,022011111160103301001,022015121160103301001,022016061160103301001,022019041160103301001,022019111160103301001
4,022019041160103701001,0103701001,01037,01,02201904,022011111160103701001,022015121160103701001,022016061160103701001,022019041160103701001,022019111160103701001
...,...,...,...,...,...,...,...,...,...,...
181034,022011111195200108010,5200108010,52001,52,02201111,022011111195200108010,022015121195200108010,022016061195200108010,022019041195200108010,022019111195200108010
181035,022011111195200108011,5200108011,52001,52,02201111,022011111195200108011,022015121195200108011,022016061195200108011,022019041195200108011,022019111195200108011
181036,022011111195200108012,5200108012,52001,52,02201111,022011111195200108012,022015121195200108012,022016061195200108012,022019041195200108012,022019111195200108012
181037,022011111195200108013,5200108013,52001,52,02201111,022011111195200108013,022015121195200108013,022016061195200108013,022019041195200108013,022019111195200108013
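
Copying a column this way relies on both frames sharing the same index and row order. A more cautious variant, sketched here on made-up frames, checks that the section codes line up before assigning:

```python
import pandas as pd

# Made-up frames standing in for df_unit_simp and unif_prov
left = pd.DataFrame({'cod_sec': ['A', 'B'], 'cercana N11': ['', '']})
right = pd.DataFrame({'cod_sec': ['A', 'B'], 'cercana N11': ['A11', 'B11']})

# Only copy if every row refers to the same section in both frames
if left['cod_sec'].equals(right['cod_sec']):
    left['cercana N11'] = right['cercana N11']

print(left['cercana N11'].tolist())
```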


Finally we save the CSV file with all the sections. Recall that the section codes had to be adjusted slightly because the autonomous community (CCAA) codes differ between the INE and the Ministry of the Interior, which we did in another notebook.

In [None]:
df_unit_simp.to_csv('similitud_secciones_def.csv', sep = ',', index = False)