<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/marco-canas/5_ml_dl_g_lideres/blob/main/lectura_geron_pytorch/part_1_the_fundamentals_of_machine_learning/chapter_1_the_machine_learning_landscape/2_types_of_machine_learning_systems.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
  <td>
    <a target="_blank" href="https://kaggle.com/kernels/welcome?src=https://github.com/marco-canas/5_ml_dl_g_lideres/blob/main/lectura_geron_pytorch/part_1_the_fundamentals_of_machine_learning/chapter_1_the_machine_learning_landscape/2_types_of_machine_learning_systems.ipynb"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" /></a>
  </td>
</table>

# Obtaining and exploring socioeconomic data

GDP per capita and life satisfaction (OECD / World Bank)


## Objective
Learn to download, understand, and explore the real-world datasets that Géron uses in *Hands-On Machine Learning*.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

## Preparing the data directory

In [None]:
data_dir = Path("datasets/lifesat")
data_dir.mkdir(parents=True, exist_ok=True)
data_dir

## Downloading the datasets automatically (urllib.request)

In [None]:
import urllib.request

root = "https://github.com/ageron/data/raw/main/lifesat/"
for fname in ("oecd_bli.csv", "gdp_per_capita.csv"):
    out = data_dir / fname
    if not out.is_file():  # skip files that were already downloaded
        print(f"Downloading {fname}...")
        urllib.request.urlretrieve(root + fname, out)
print("Done.")
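The loop above skips files that already exist, so re-running the cell does not download them again. The same idea can be packaged as a reusable function. This is a minimal sketch: the helper name `fetch_if_missing` and its injectable `fetch` argument (defaulting to `urllib.request.urlretrieve`) are hypothetical choices, made so the logic can be exercised without network access.

```python
import tempfile
import urllib.request
from pathlib import Path


def fetch_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Hypothetical helper: download `url` to `dest` only if `dest` is not a file yet.

    Returns True if a download was attempted, False if the file was already there.
    """
    dest = Path(dest)
    if dest.is_file():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
    fetch(url, dest)
    return True


# Fake fetcher for the demo: writes a local file instead of using the network
def fake_fetch(url, dest):
    Path(dest).write_text("Country,Value\n")


tmp = Path(tempfile.mkdtemp()) / "lifesat" / "oecd_bli.csv"
first = fetch_if_missing("https://example.invalid/oecd_bli.csv", tmp, fetch=fake_fetch)
second = fetch_if_missing("https://example.invalid/oecd_bli.csv", tmp, fetch=fake_fetch)
print(first, second)  # True False
```

In the notebook, the real call would pass `root + fname` and `data_dir / fname` and keep the default `fetch`.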

## Loading the data

In [None]:
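# In the ageron/data repository both files are plain comma-separated CSVs, so the
# default read_csv settings suffice (the tab/latin1 options match the older IMF
# GDP file used in the book's 2nd edition, not this one).
oecd_bli = pd.read_csv(data_dir / "oecd_bli.csv")
gdp = pd.read_csv(data_dir / "gdp_per_capita.csv")

display(oecd_bli.head())
display(gdp.head())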
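`pd.read_csv` has parameters that control parsing: `sep` (with `delimiter` as an alias) sets the field separator, and `na_values` adds strings to treat as missing on top of pandas' defaults (which already include `"n/a"`). A small sketch on an in-memory, hypothetical tab-separated sample shows their effect, including what a wrong separator produces:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample: "n/a" and "--" both mark missing GDP values
raw = "Country\tGDP\nCountry A\t41000\nCountry B\tn/a\nCountry C\t--\n"

# sep="\t" splits on tabs; na_values adds "--" to the default NA strings
df = pd.read_csv(io.StringIO(raw), sep="\t", na_values=["--"])
print(df["GDP"].isna().tolist())  # [False, True, True]
print(df["GDP"].dtype)            # float64, because NaN is a float

# With the wrong separator (default ","), each line lands in a single column
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.shape)                # (3, 1)
```

A one-column frame whose header contains the whole first line is a quick sign that the separator does not match the file.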

## Inspecting the columns

In [None]:
oecd_bli.info()
gdp.info()
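`info()` reports each column's dtype and its non-null count. A common complement is `isna().sum()`, which gives the number of missing values per column directly. A sketch on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value
toy = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "Value": [7.1, np.nan, 6.4],
})

missing = toy.isna().sum()        # number of NaN values per column
print(missing)
print(toy["Value"].describe())    # "count" excludes the NaN
```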

## Filtering the life satisfaction indicator

In [None]:
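# Keep only the values for the total population (INEQUALITY == "TOT")
oecd_tot = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
# Keep the "Life satisfaction" rows and the two columns we need
life_sat = (oecd_tot.loc[oecd_tot["Indicator"] == "Life satisfaction", ["Country", "Value"]]
            .rename(columns={"Value": "Life satisfaction"}))
life_sat.head()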
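The OECD Better Life Index file is in long format: each row holds one value for a (country, indicator, inequality group) combination, and `INEQUALITY == "TOT"` selects the value for the total population. The sketch below uses a hypothetical miniature of that layout (all numbers are illustrative). It combines the two masks with `&` and shows `pivot` as an alternative that turns every indicator into its own column:

```python
import pandas as pd

# Hypothetical miniature of the long OECD layout (values are illustrative only)
bli = pd.DataFrame({
    "Country":    ["Country A", "Country A", "Country A", "Country B", "Country B"],
    "Indicator":  ["Life satisfaction", "Life satisfaction", "Employment rate",
                   "Life satisfaction", "Employment rate"],
    "INEQUALITY": ["TOT", "MN", "TOT", "TOT", "TOT"],
    "Value":      [7.0, 6.9, 70.0, 6.0, 65.0],
})

# 1) Both conditions in one mask (each comparison wrapped in parentheses)
mask = (bli["INEQUALITY"] == "TOT") & (bli["Indicator"] == "Life satisfaction")
life = bli.loc[mask, ["Country", "Value"]].rename(columns={"Value": "Life satisfaction"})
print(life)

# 2) Alternative: keep the TOT rows and pivot, one column per indicator
wide = bli[bli["INEQUALITY"] == "TOT"].pivot(index="Country", columns="Indicator",
                                             values="Value")
print(wide)
```

`pivot` requires each (index, column) pair to be unique, which is why the `TOT` filter comes first.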

## Preparing GDP per capita

In [None]:
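# gdp_per_capita.csv from ageron/data is in long format: columns Entity, Code,
# Year and the GDP value, with one row per country and year (there is no "2015" column).
gdp_year = 2020  # reference year; assumed to match Géron's 3rd-edition notebook
gdp_pc = gdp[gdp["Year"] == gdp_year].drop(columns=["Code", "Year"])
gdp_pc.columns = ["Country", "GDP per capita"]
gdp_pc.head()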
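A GDP table in long format (one row per country and year) must be reduced to a single year before merging, otherwise each country would appear several times. A sketch on a hypothetical toy table, assuming columns named `Entity`, `Year` and `GDP per capita`, uses `is_unique` to confirm the result has one row per country:

```python
import pandas as pd

# Hypothetical long-format GDP table: one row per (country, year)
toy_gdp_long = pd.DataFrame({
    "Entity": ["Country A", "Country A", "Country B", "Country B"],
    "Year": [2019, 2020, 2019, 2020],
    "GDP per capita": [40000.0, 39000.0, 30000.0, 29500.0],
})
print(toy_gdp_long["Entity"].is_unique)   # False: several years per country

year = 2020  # hypothetical reference year
toy_gdp_year = (toy_gdp_long[toy_gdp_long["Year"] == year]
                .drop(columns="Year")
                .rename(columns={"Entity": "Country"}))
print(toy_gdp_year["Country"].is_unique)  # True: one row per country
```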

## Merging the datasets

In [None]:
country_stats = pd.merge(life_sat, gdp_pc, on="Country")
country_stats.head()
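`pd.merge` performs an inner join by default, so a country survives only if its name is spelled identically in both tables. Any mismatch, for example a different naming convention between the OECD and World Bank sources, silently drops rows. A sketch on hypothetical toy tables shows how an outer merge with `indicator=True` lists the unmatched names:

```python
import pandas as pd

# Hypothetical toy tables; "Country C" and "Country D" have no partner
toy_life = pd.DataFrame({"Country": ["Country A", "Country B", "Country C"],
                         "Life satisfaction": [7.0, 6.0, 5.5]})
toy_gdp = pd.DataFrame({"Country": ["Country A", "Country B", "Country D"],
                        "GDP per capita": [39000.0, 29500.0, 12000.0]})

inner = pd.merge(toy_life, toy_gdp, on="Country")  # default how="inner"
print(len(inner))  # 2

# The "_merge" column tells whether each row came from both tables or only one
check = pd.merge(toy_life, toy_gdp, on="Country", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Country", "_merge"]]
print(unmatched)
```

Running the same check on `life_sat` and `gdp_pc` is a quick way to spot countries lost to spelling differences.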

## Exploratory visualization

In [None]:
country_stats.plot(kind="scatter", x="GDP per capita", y="Life satisfaction",
                   title="GDP per capita vs. life satisfaction")
plt.show()
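Labeling each point makes it easier to see which countries sit far from the cloud. A sketch using Matplotlib's `Axes.annotate` on a hypothetical toy table with the same column names. With dozens of countries the labels can overlap, so labeling only a selected subset is a common variation.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy table with the same column names as country_stats
toy_stats = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "GDP per capita": [39000.0, 29500.0, 12000.0],
    "Life satisfaction": [7.0, 6.0, 5.5],
})

ax = toy_stats.plot(kind="scatter", x="GDP per capita", y="Life satisfaction",
                    title="GDP per capita vs. life satisfaction")
# Write each country's name slightly above and to the right of its point
for name, gdp_value, ls_value in zip(toy_stats["Country"],
                                     toy_stats["GDP per capita"],
                                     toy_stats["Life satisfaction"]):
    ax.annotate(name, (gdp_value, ls_value),
                textcoords="offset points", xytext=(5, 5))
plt.show()
```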

## Reflection
- Is there a correlation between wealth and well-being?
- Which countries deviate from the general pattern?
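Both questions can be approached numerically. The minimal sketch below uses hypothetical toy data. `Series.corr` gives the Pearson correlation coefficient: a value close to +1 indicates a strong positive linear relationship, though that alone says nothing about causation. The residuals of a least-squares line fitted with `np.polyfit` then point to the countries that deviate most from the overall trend.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not real country statistics)
toy_stats = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country D"],
    "GDP per capita": [10000.0, 20000.0, 30000.0, 40000.0],
    "Life satisfaction": [5.0, 6.5, 6.0, 7.5],
})
x = toy_stats["GDP per capita"]
y = toy_stats["Life satisfaction"]

r = x.corr(y)                               # Pearson correlation by default
slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line

# Residual = observed value minus the value predicted by the line
toy_stats["Residual"] = y - (slope * x + intercept)
top = toy_stats.loc[toy_stats["Residual"].abs().nlargest(2).index,
                    ["Country", "Residual"]]
print(f"r = {r:.3f}")
print(top)
```

On the real data the same lines apply to `country_stats.dropna()`, since `np.polyfit` cannot handle missing values.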
