# Data Discovery: fact_educacion & fact_calidad_ambiental

This notebook documents the research and validation of data sources for education levels and environmental quality (air quality and noise) in Barcelona.

## 1. Education Data (fact_educacion)

**Source:** Open Data BCN - Padró Municipal d'Habitants

### Sample Validation

In [None]:
import pandas as pd

# Load education sample
df_edu = pd.read_csv('../../docs/data_sources/samples/educacion_sample_2023.csv')
print(f"Columns: {df_edu.columns.tolist()}")
print(f"Samples:\n{df_edu.head()}")

# Check granularity
print(f"Number of barrios: {df_edu['Codi_Barri'].nunique()}")
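
The granularity check above only reports a count. As a rough sketch, the helper below also lists which barrio codes are absent. It assumes that `Codi_Barri` holds the municipal numbering 1 to 73 (Barcelona has 73 barrios); the function name `missing_barrio_codes` is hypothetical, and a sample file may legitimately cover only a subset.

```python
def missing_barrio_codes(codes, expected_count=73):
    """Return the expected barrio codes that do not appear in `codes`."""
    # Assumption: valid codes are the integers 1..expected_count.
    expected = set(range(1, expected_count + 1))
    # int() lets the check work whether the CSV column was parsed as int or str.
    seen = {int(code) for code in codes}
    return sorted(expected - seen)


# Toy input standing in for df_edu['Codi_Barri'].
sample_codes = [1, 2, 2, 3, 5]
print(missing_barrio_codes(sample_codes, expected_count=5))  # prints [4]
```

In the notebook this would be called as `missing_barrio_codes(df_edu['Codi_Barri'])`; an empty list means every barrio is represented.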

### Confirmed Mappings

| Code | Level (Catalan) | Level (Spanish) | Description |
|------|-----------------|-----------------|-------------|
| 1 | Sense estudis | Sin estudios | Illiterate/No studies |
| 2 | Estudis primaris, certificat d'escolaritat, EGB | Estudios primarios, certificado de escolaridad, EGB | Primary education |
| 3 | Batxillerat elemental, graduat escolar, ESO, FPI | Bachillerato elemental, graduado escolar, ESO, FPI | Lower secondary education |
| 4 | Batxillerat superior, BUP, COU, FPII, CFGM grau mitjà | Bachillerato superior, BUP, COU, FPII, CFGM grado medio | Upper secondary education |
| 5 | Estudis universitaris, CFGS grau superior | Estudios universitarios, CFGS grado superior | Tertiary education (University/Higher Vocational) |
| 6 | No consta | No consta | Not available |
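
The mapping above could be carried into code as a plain lookup table. This is a minimal sketch: the names `EDUCATION_LEVELS` and `describe_level` are hypothetical, and the English labels are shortened from the Description column.

```python
# Education level codes from the table above (English labels shortened).
EDUCATION_LEVELS = {
    1: "No studies",
    2: "Primary education",
    3: "Lower secondary education",
    4: "Upper secondary education",
    5: "Tertiary education",
    6: "Not available",
}


def describe_level(code):
    """Translate a level code into its English label."""
    try:
        return EDUCATION_LEVELS[int(code)]
    except (KeyError, ValueError, TypeError):
        # Codes outside the confirmed mapping are flagged, not guessed.
        return "Unknown code"


print(describe_level(3))    # prints Lower secondary education
print(describe_level("5"))  # prints Tertiary education
print(describe_level(9))    # prints Unknown code
```

Returning an explicit "Unknown code" keeps unexpected values visible during validation instead of silently dropping them.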

## 2. Air Quality Data (fact_calidad_aire)

**Source:** ASPB via Open Data BCN

### Sample Validation

In [None]:
# Load air quality sample
df_air = pd.read_csv('../../docs/data_sources/samples/calidad_aire_sample_2023.csv')
print(f"Columns: {df_air.columns.tolist()}")
print(f"Samples:\n{df_air.head()}")

### Confirmed Contaminant Codes

| Code | Contaminant | Unit |
|------|-------------|------|
| 1 | SO2 | µg/m³ |
| 6 | CO | mg/m³ |
| 7 | NO | µg/m³ |
| 8 | NO2 | µg/m³ |
| 9 | PM2.5 | µg/m³ |
| 10 | PM10 | µg/m³ |
| 12 | NOx | µg/m³ |
| 14 | O3 | µg/m³ |
| 22 | Black Carbon | µg/m³ |
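
Note that CO is the only contaminant reported in mg/m³; every other code uses µg/m³. The sketch below encodes the table and shows one way to bring values onto a common unit. The names `CONTAMINANTS`, `label_measurement`, and `to_micrograms_per_m3` are hypothetical, and the conversion relies only on 1 mg = 1000 µg.

```python
# Contaminant codes from the table above: code -> (name, unit).
CONTAMINANTS = {
    1: ("SO2", "µg/m³"),
    6: ("CO", "mg/m³"),
    7: ("NO", "µg/m³"),
    8: ("NO2", "µg/m³"),
    9: ("PM2.5", "µg/m³"),
    10: ("PM10", "µg/m³"),
    12: ("NOx", "µg/m³"),
    14: ("O3", "µg/m³"),
    22: ("Black Carbon", "µg/m³"),
}


def label_measurement(code, value):
    """Format a reading with its contaminant name and unit."""
    name, unit = CONTAMINANTS[code]
    return f"{name}: {value} {unit}"


def to_micrograms_per_m3(code, value):
    """Express a reading in µg/m³, converting from mg/m³ where needed."""
    _, unit = CONTAMINANTS[code]
    return value * 1000 if unit == "mg/m³" else value


print(label_measurement(8, 41))    # prints NO2: 41 µg/m³
print(to_micrograms_per_m3(6, 2))  # prints 2000
```

Keeping the unit next to each code makes it harder to compare CO against the other contaminants without converting first.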

## 3. Noise Data (fact_soroll)

**Source:** Mapa Estratègic de Soroll

The data is published in GeoPackage (GPKG) format for 2017 and for other years in the five-year mapping cycle. It reports noise levels at building facades.
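
Because a GeoPackage file is an SQLite database, its layers can be listed with the standard library before committing to a geospatial reader. The sketch below queries the `gpkg_contents` table defined by the GeoPackage standard; the function name `list_gpkg_layers` and the file path in the comment are hypothetical, and the layer names in the actual noise map have not been verified here.

```python
import sqlite3
from contextlib import closing


def list_gpkg_layers(path):
    """Return (table_name, data_type) pairs registered in a GeoPackage."""
    # closing() makes sure the connection is released after the query.
    with closing(sqlite3.connect(path)) as conn:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents "
            "ORDER BY table_name"
        ).fetchall()
    return rows


# Hypothetical usage; the path is a placeholder, not a confirmed file name:
# for table_name, data_type in list_gpkg_layers("noise_map_2017.gpkg"):
#     print(table_name, data_type)
```

Vector layers appear with `data_type` equal to `features`. If the path does not point to a valid GeoPackage, the query raises `sqlite3.OperationalError` because `gpkg_contents` is missing.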