# Demographic Sex Ratio (Masculinity Ratio)
TThis notebook computes the **masculinity (sex) ratio** for each Spanish municipality
and year using sex-disaggregated population data from the municipal census
(*Padrón Municipal*).

The masculinity ratio is generated as a **derived demographic attribute** at the
municipality–year level and exported as a standalone dataset to ensure
methodological clarity, traceability, and reproducibility.

The resulting dataset will later be integrated with other demographic, services,
agricultural, and land-use indicators.

## Definition

According to the Spanish National Statistics Institute (INE), the masculinity
ratio is defined as the number of men per 100 women in a given population.

**Formula:**

$$
\text{Sex Ratio} =
\frac{\text{Number of men}}{\text{Number of women}}
\times 100
$$

- If $\text{Sex Ratio} > 100$, the male population exceeds the female population.  
- If $\text{Sex Ratio} < 100$, the female population exceeds the male population.

**Source:**  
INE – *Indicadores Demográficos Básicos*  
https://www.ine.es/DEFIne/concepto.htm?c=5058



In [13]:
"""
Notebook: 03_demography_sex_ratio.ipynb
Purpose: Compute masculinity (sex) ratio per municipality and year
Input: 01_padron_clean_1996_2024.csv
Output: demography_sex_ratio_1996_2024.csv
Author: Juan Zotes
Last updated: 2026-02-03
"""


'\nNotebook: 03_demography_sex_ratio.ipynb\nPurpose: Compute masculinity (sex) ratio per municipality and year\nInput: 01_padron_clean_1996_2024.csv\nOutput: demography_sex_ratio_1996_2024.csv\nAuthor: Juan Zotes\nLast updated: 2026-02-03\n'

## 1. Load cleaned demographic base data

This step loads the cleaned historical municipal census dataset.
The dataset represents the **demographic base table** of the project and is not
modified in place.


In [2]:
# Standard library
from pathlib import Path

# Third-party libraries
import pandas as pd

In [3]:
# Base data directory (portable across Windows, Linux, Codespaces)
DATA_DIR = Path(
    r"/workspaces/rural-migration-land-use-spain/data/demography/processed"
)

DATA_DIR


PosixPath('/workspaces/rural-migration-land-use-spain/data/demography/processed')

In [4]:
# Load cleaned municipal census data
padron_file = DATA_DIR / "01_padron_clean_1996_2024.csv"

df = pd.read_csv(
    padron_file,
    dtype={"Mun_Code": str}
)

df.head()


Unnamed: 0,Mun_Code,Mun,Cat,Year,Pop
0,44001,Ababuj,Total,2024,74.0
1,44001,Ababuj,Total,2023,70.0
2,44001,Ababuj,Total,2022,72.0
3,44001,Ababuj,Total,2021,76.0
4,44001,Ababuj,Total,2020,77.0


## 2. Validate sex-disaggregated population records

Before restructuring the dataset, we verify that the cleaned census data contains
the necessary fields and categories required to compute the demographic sex ratio.


In [5]:
# Validate required base columns
required_cols = ["Year", "Mun_Code", "Mun", "Cat", "Pop"]
missing = [col for col in required_cols if col not in df.columns]
missing


[]

## 3. Data Restructuring for Sex Ratio Computation

The cleaned municipal census dataset is stored in **long format**, where population
counts are recorded under a categorical variable (`Cat`) with the following values:

- `Total`
- `Hombres`
- `Mujeres`

As a consequence, **male and female population counts are not stored as separate
columns**, but as separate records.

Before computing the masculinity (sex) ratio, the dataset must therefore be
**temporarily reshaped** to obtain one row per municipality–year combination with
distinct population fields for men and women.

This transformation involves:

1. Filtering the dataset to retain only records where `Cat` equals  
   **`Hombres`** or **`Mujeres`**
2. Pivoting the data from long to wide format to create two explicit fields:
   - `Population_Male`
   - `Population_Female`

This restructuring step is performed **only within this notebook** and does not
modify the original cleaned dataset, preserving its integrity and reproducibility.


In [6]:
# Keep only male and female population records
df_sex = df[df["Cat"].isin(["Hombres", "Mujeres"])].copy()

df_sex.head()

Unnamed: 0,Mun_Code,Mun,Cat,Year,Pop
28,44001,Ababuj,Hombres,2024,44.0
29,44001,Ababuj,Hombres,2023,43.0
30,44001,Ababuj,Hombres,2022,44.0
31,44001,Ababuj,Hombres,2021,44.0
32,44001,Ababuj,Hombres,2020,46.0


In [7]:
# Pivot to wide format: one row per municipality-year
sex_wide = df_sex.pivot_table(
    index=["Year", "Mun_Code", "Mun"],
    columns="Cat",
    values="Pop",
    aggfunc="sum"
).reset_index()

sex_wide.head()


Cat,Year,Mun_Code,Mun,Hombres,Mujeres
0,1996,1001,Alegría-Dulantzi,640.0,594.0
1,1996,1002,Amurrio,4866.0,4892.0
2,1996,1003,Aramaio,704.0,641.0
3,1996,1004,Artziniega,648.0,645.0
4,1996,1006,Armiñón,73.0,66.0


## 4. Compute masculinity (sex) ratio

The masculinity ratio is calculated as the number of males per 100 females.
Municipality–year combinations with zero female population are explicitly set to
null to avoid invalid values.


In [8]:
# Compute masculinity (sex) ratio
sex_wide["Sex_Ratio"] = (
    sex_wide["Hombres"] / sex_wide["Mujeres"]
) * 100


## 5. Quality Control

We inspect basic statistics and extreme values to ensure the indicator behaves as
expected across municipalities and years.


In [9]:
sex_wide["Sex_Ratio"].describe()


  sqr = _ensure_numeric((avg - values) ** 2)


count    2.272440e+05
mean              inf
std               NaN
min      2.647406e+01
25%      9.965524e+01
50%      1.055556e+02
75%      1.165026e+02
max               inf
Name: Sex_Ratio, dtype: float64

In [10]:
sex_wide.head()

Cat,Year,Mun_Code,Mun,Hombres,Mujeres,Sex_Ratio
0,1996,1001,Alegría-Dulantzi,640.0,594.0,107.744108
1,1996,1002,Amurrio,4866.0,4892.0,99.46852
2,1996,1003,Aramaio,704.0,641.0,109.828393
3,1996,1004,Artziniega,648.0,645.0,100.465116
4,1996,1006,Armiñón,73.0,66.0,110.606061


In [11]:
# Select final output columns
sex_ratio_df = sex_wide[
    ["Year", "Mun_Code", "Mun", "Sex_Ratio"]
].copy()



## 6. Export derived dataset

The masculinity ratio is exported as a standalone CSV file.
This modular structure allows future integration with other demographic,
agricultural, and land-use indicators without compromising traceability.


In [12]:
DERIVED_DIR = Path(
    r"/workspaces/rural-migration-land-use-spain/data/demography/derived"
)

# Export derived indicator
output_file = DERIVED_DIR / "demography_sex_ratio_1996_2024.csv"

sex_ratio_df.to_csv(output_file, index=False)
