# Data extraction
We downloaded the dataset with demographic information for each of the 8132 municipalities in Spain from the MITECO (Spanish Ministry for the Ecological Transition and the Demographic Challenge) website. We also downloaded the results of the July 2023 and November 2019 general elections in Spain from the Spanish Ministry of the Interior, which are provided in Excel format.

Most of the data extraction functions are implemented in the `src/data_extraction.py` file.

In [1]:
import sys, os
sys.path.insert(0, os.path.abspath('..'))

%load_ext autoreload
%autoreload 2

from src import data_extraction as de

## `load_miteco` Function

The `load_miteco` function extracts the demographic data from the MITECO dataset and merges it with the geographical data of the municipalities in Spain, returning a single GeoDataFrame. It contains the following columns:

### Geographical Identifiers
- `ccaa`: Autonomous community where the municipality is located
- `cod_ccaa`: Code of the autonomous community
- `cod_nut1`: Eurostat NUTS1 code (groups of autonomous communities)
- `cod_nut2`: Eurostat NUTS2 code (autonomous communities)
- `cod_nut3`: Eurostat NUTS3 code (provinces)
- `cod_pro`: Province code
- `codmun_ine`: INE code of the municipality (five digits, of which the first two are the province code)
- `natcode`: National code of the municipality from the IGN (Instituto Geográfico Nacional)
- `nombre`: Name of the municipality
- `provincia`: Province where the municipality is located
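
In the sample rows printed further down, `natcode` looks like a concatenation of a two-digit country prefix (`34`), `cod_ccaa`, `cod_pro` and `codmun_ine`. The sketch below splits a `natcode` on that assumption. The helper name `split_natcode` is hypothetical (it is not part of `src/data_extraction.py`), and the layout is inferred from the visible rows only, so it may not hold for every municipality.

```python
def split_natcode(natcode: str) -> dict:
    """Split an 11-digit natcode into its apparent components.

    Assumed layout (inferred from the sample rows, not from official docs):
    2 digits country + 2 digits cod_ccaa + 2 digits cod_pro + 5 digits codmun_ine.
    """
    if len(natcode) != 11 or not natcode.isdigit():
        raise ValueError(f"expected an 11-digit string, got {natcode!r}")
    return {
        "country": natcode[0:2],
        "cod_ccaa": natcode[2:4],
        "cod_pro": natcode[4:6],
        "codmun_ine": natcode[6:11],
    }


# natcode of Òrrius, taken from the sample output
parts = split_natcode("34090808153")
print(parts)
```

Keeping these codes as strings matters: values such as `08153` lose their leading zero if they are converted to integers.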

### Demographic Variables
- `afi_1000`: Number of people affiliated to social security per 1000 inhabitants
- `dens_pob`: Population density (inhabitants per km²)
- `edad_media`: Mean age of the population
- `pob_14`: Population in 2014
- `pob_14_23`: Variation in population between 2014 and 2023
- `pob_23`: Population in 2023
- `pob_esup16`: Percentage of the population older than 16 with higher education (university studies or equivalent)
- `pob_npmun`: Share of the population living in the municipality in which they were born
- `porc_pob65`: Percentage of the population aged 65 years or older
- `porc_pob_e`: Percentage of foreign population living in the municipality
- `paro_1000`: Number of unemployed people per 1000 inhabitants
- `rat_mascul`: Masculinity ratio (defined as the number of men per 100 women)
- `rta_nt_med`: Mean Net Income (Renta Neta Media) of the municipality
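
Several of these variables are simple ratios. As a rough illustration of how they can be read, the sketch below computes a percentage change (the apparent form of `pob_14_23`), a rate per 1000 inhabitants (as in `afi_1000` and `paro_1000`) and a masculinity ratio (as in `rat_mascul`). The function names are hypothetical and the input numbers are illustrative, not taken from the dataset. That `pob_14_23` is a percentage change relative to 2014 is an assumption based on the values in the output.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new` (assumed form of pob_14_23)."""
    return (new - old) * 100 / old


def per_1000(count: float, population: float) -> float:
    """Rate per 1000 inhabitants (form of afi_1000 and paro_1000)."""
    return count * 1000 / population


def masculinity_ratio(men: float, women: float) -> float:
    """Number of men per 100 women (form of rat_mascul)."""
    return men * 100 / women


# Illustrative inputs only (not real municipality figures)
print(pct_change(old=1000, new=950))
print(per_1000(count=250, population=1000))
print(masculinity_ratio(men=480, women=500))
```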

In [2]:
gdf_miteco = de.load_miteco()
print(gdf_miteco.columns)
gdf_miteco

Index(['objectid', 'natcode', 'nombre', 'cod_nut1', 'cod_nut2', 'cod_nut3',
       'codmun_ine', 'ccaa', 'cod_ccaa', 'cod_pro', 'provincia', 'pob_npmun',
       'geometry', 'pob_23', 'dens_pob', 'edad_media', 'afi_1000', 'paro_1000',
       'porc_pob65', 'pob_esup16', 'porc_pob_e', 'rat_mascul', 'rta_nt_med',
       'pob_14', 'pob_14_23'],
      dtype='str')


Unnamed: 0,objectid,natcode,nombre,cod_nut1,cod_nut2,cod_nut3,codmun_ine,ccaa,cod_ccaa,cod_pro,...,edad_media,afi_1000,paro_1000,porc_pob65,pob_esup16,porc_pob_e,rat_mascul,rta_nt_med,pob_14,pob_14_23
0,7123,34123232003,A Arnoia,ES1,ES11,ES113,32003,Galicia,12,32,...,56.36,355.871886,67.615658,40.886700,19.795222,2.068966,95.321637,27744,1040,-3.653846
1,6948,34121515007,A Baña,ES1,ES11,ES111,15007,Galicia,12,15,...,55.33,271.789582,81.485302,39.303930,19.052133,0.540054,89.857550,29441,3754,-11.241343
2,7085,34123232014,A Bola,ES1,ES11,ES113,32014,Galicia,12,32,...,58.81,168.576105,109.656301,43.106618,14.705882,5.147059,91.921005,22124,1350,-20.814815
3,6966,34121515018,A Capela,ES1,ES11,ES111,15018,Galicia,12,15,...,52.58,516.170763,75.032342,31.755102,27.586207,2.204082,96.517413,33768,1356,-12.610619
4,7233,34123636009,A Cañiza,ES1,ES11,ES114,36009,Galicia,12,36,...,50.55,414.064964,100.914538,31.476051,15.183246,2.756598,95.693964,20760,5342,-4.717334
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77724,5184,34090808153,Òrrius,ES5,ES51,ES511,08153,Cataluña-Catalunya,09,08,...,41.63,301.094891,43.795620,15.510719,45.370370,2.395965,102.278481,40607,690,15.797101
77725,3652,34074242134,Ólvega,ES4,ES41,ES417,42134,Castilla y León,07,42,...,44.81,1109.604957,49.186677,19.166894,20.338983,15.682004,109.877913,33292,3814,-0.839014
77726,483,34011818147,Órgiva,ES6,ES61,ES614,18147,Andalucía,01,18,...,45.56,354.516679,142.439737,20.186496,21.621622,26.713866,99.894958,20724,5393,5.859447
77727,744,34012323092,Úbeda,ES6,ES61,ES616,23092,Andalucía,01,23,...,44.19,440.287466,117.860917,19.338266,31.703341,2.862427,95.050190,28148,35177,-3.886062


In [None]:
# Drop geometry (keep only tabular data) and save to Excel.
# to_excel() returns None, so keep the table in its own variable first.
table_miteco = gdf_miteco.drop(columns='geometry')
table_miteco.to_excel('data_miteco.xlsx', index=False)