# 1. Census Data: Demographics & Income

**The Problem:** Census data is usually trapped in massive FTP CSVs or complex SQL tables. Matching it to geometry (tracts) is painful.

**The Solution:** AtlasBR unifies this into a single command.

**What you will learn:**
1.  **Strategies:** How to switch between the Cloud (BigQuery) and FTP download strategies by changing a single argument.
2.  **Harmonization:** Why `habitantes` is always `habitantes`, regardless of the source or the Census year.
3.  **Spatial Aggregation:** Using H3 hexagons to even out irregular Census tracts.

# 1. Setup & Authentication

In [1]:
import sys
import os
from pathlib import Path

# --- DEVELOPER SETUP (Optional) ---
# If running locally without 'pip install', we add the '../src' folder to path.
current_path = Path(os.getcwd())
if current_path.name == "tutorials":
    # Go up one level to root, then into 'src' (if using src-layout) or just root (flat-layout)
    root_dir = current_path.parent
    src_dir = root_dir / "src"
    
    if src_dir.exists():
        sys.path.append(str(src_dir))
    else:
        sys.path.append(str(root_dir))

import atlasbr

In [2]:
from atlasbr.app.census import load_census

# VISUAL TIP: We use logging to show you exactly what's happening under the hood.
atlasbr.configure_logging()

# CREDENTIALS CHECK: the BigQuery strategy needs a Google Cloud project ID.
project_id = os.getenv("GOOGLE_CLOUD_PROJECT") or os.getenv("GCLOUD_PROJECT_ID")
if not project_id:
    print("⚠️  Warning: No Google Cloud project ID found. We will demonstrate the FTP strategy.")
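If you want the notebook to choose a backend automatically, the check above can be turned into a small helper. This is a minimal sketch: `pick_strategy` is a hypothetical function, not part of AtlasBR, and it only assumes the two environment variables and the two strategy names (`bd_table`, `ftp_csv`) used in this tutorial.

```python
import os
from typing import Mapping, Optional


def pick_strategy(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "bd_table" when a Google Cloud project ID is set, else "ftp_csv".

    Hypothetical helper for this tutorial; AtlasBR does not ship it.
    """
    env = os.environ if env is None else env
    project = env.get("GOOGLE_CLOUD_PROJECT") or env.get("GCLOUD_PROJECT_ID")
    return "bd_table" if project else "ftp_csv"


# Passing a plain dict makes the behaviour easy to inspect without touching os.environ.
print(pick_strategy({"GOOGLE_CLOUD_PROJECT": "my-project"}))
print(pick_strategy({}))
```

The result can then be passed as `strategy=pick_strategy()` in the `load_census` calls that follow.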

In [8]:
MUNICIPALITY = "Rio de Janeiro, RJ"

# 2. The Cloud Strategy (BigQuery)
*Best for: Speed, low memory usage, and querying specific subsets.*

We fetch data directly from the Base dos Dados data lake on BigQuery. Note that we request specific **themes** (`basic`, `income`) rather than tables. AtlasBR resolves each theme to the table and columns it needs.
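To make the idea of a theme concrete, here is a rough sketch of how a theme catalog could work. Everything in it is an assumption for illustration: `CatalogEntry`, `CATALOG` and `resolve_themes` are hypothetical names, and the column assignments are inferred from this tutorial's output rather than taken from AtlasBR's internals.

```python
from typing import Dict, List, NamedTuple, Tuple


class CatalogEntry(NamedTuple):
    table: str
    columns: List[str]


# Hypothetical catalog keyed by (year, strategy, theme). Not AtlasBR's real structure.
BASIC_2010 = "basedosdados.br_ibge_censo_demografico.setor_censitario_basico_2010"
CATALOG: Dict[Tuple[int, str, str], CatalogEntry] = {
    (2010, "bd_table", "basic"): CatalogEntry(BASIC_2010, ["habitantes", "domicilios"]),
    (2010, "bd_table", "income"): CatalogEntry(BASIC_2010, ["rendimento_medio"]),
}


def resolve_themes(year: int, strategy: str, themes: List[str]) -> List[CatalogEntry]:
    """Look up each requested theme, failing loudly when one is not catalogued."""
    entries = []
    for theme in themes:
        key = (year, strategy, theme)
        if key not in CATALOG:
            available = sorted(t for (y, s, t) in CATALOG if (y, s) == (year, strategy))
            raise ValueError(
                f"No catalog entry for {year}/{strategy}/{theme!r}. Available: {available}"
            )
        entries.append(CATALOG[key])
    return entries


for entry in resolve_themes(2010, "bd_table", ["basic", "income"]):
    print(entry.table, entry.columns)
```

Keying the lookup by year and strategy as well as theme is what lets the same theme name point to different physical sources.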

In [4]:
try:
    gdf_cloud = load_census(
        places=[MUNICIPALITY],
        year=2010,
        themes=["basic", "income"],
        strategy="bd_table",
        geometry="tract"
    )
    print(f"✅ Loaded {len(gdf_cloud)} tracts from BigQuery.")
    
    # Let's see the standardized columns
    display(gdf_cloud[["habitantes", "domicilios", "rendimento_medio"]].head())

except Exception as e:
    print(f"❌ Cloud load skipped: {e}")

2025-12-16 18:57:45,172 - atlasbr - INFO -     🌍 Fetching municipality metadata from geobr...
2025-12-16 18:57:45,818 - atlasbr - INFO -     ℹ️  Resolved 'Niterói, RJ' -> 3303302
2025-12-16 18:57:45,819 - atlasbr - INFO - 🔄 Resolved 1 inputs into 1 unique municipalities.
2025-12-16 18:57:45,819 - atlasbr - INFO - Fetching Census Tracts for 1 municipalities (Year 2010)...
2025-12-16 18:57:48,847 - atlasbr - INFO -     ✂️  Clipping to Urban Area...
2025-12-16 18:57:48,848 - atlasbr - INFO -     ⬇️  Fetching Urban Areas (Epoch 2019) from IBGE (cached)...
2025-12-16 18:57:54,054 - atlasbr - INFO -        -> Retained 907 tracts after clip.
2025-12-16 18:57:54,055 - atlasbr - INFO -     📦 Loading theme: 'basic'...
2025-12-16 18:57:54,804 - atlasbr - INFO -     ☁️  Fetching 2 columns from basedosdados.br_ibge_censo_demografico.setor_censitario_basico_2010...


Downloading: 100%|██████████|

2025-12-16 18:57:56,013 - atlasbr - INFO -     📦 Loading theme: 'income'...
2025-12-16 18:57:56,014 - atlasbr - INFO -     ☁️  Fetching 1 columns from basedosdados.br_ibge_censo_demografico.setor_censitario_basico_2010...



Downloading: 100%|██████████|

2025-12-16 18:57:56,989 - atlasbr - INFO - ✅ Loaded Census 2010 for 1 municipalities.



✅ Loaded 907 tracts from BigQuery.


| id_setor_censitario | habitantes | domicilios | rendimento_medio |
|---|---|---|---|
| 330330205000634 | 525.0 | 198.0 | 1855.16 |
| 330330205000827 | 513.0 | 161.0 | 2743.5 |
| 330330205000686 | 430.0 | 151.0 | 2729.93 |
| 330330205000687 | 683.0 | 218.0 | 3026.49 |
| 330330205000828 | 420.0 | 127.0 | 1259.64 |


# 3. The FTP Strategy
*Best for: Users without cloud credentials.*

Set `strategy="ftp_csv"`. AtlasBR will:
1.  Download the official ZIPs from IBGE (cached locally).
2.  Extract and parse the CSVs.
3.  **Rename columns** to match the BigQuery schema exactly.
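The extract, parse and rename steps can be sketched with the standard library alone. This is an illustration, not AtlasBR's implementation: `RENAME_MAP`, `read_harmonized`, the raw column names (`CD_SETOR`, `V_POP`, `V_RENDA`), the `;` delimiter and the UTF-8 encoding are all assumptions. The real IBGE files use their own codes and may differ by year.

```python
import csv
import io
import zipfile
from typing import Dict, List

# Hypothetical raw-to-harmonized mapping. Real IBGE column codes differ.
RENAME_MAP = {
    "CD_SETOR": "id_setor_censitario",
    "V_POP": "habitantes",
    "V_RENDA": "rendimento_medio",
}


def read_harmonized(zip_bytes: bytes, member: str, delimiter: str = ";") -> List[Dict[str, str]]:
    """Read one CSV from a ZIP archive and rename its columns via RENAME_MAP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter=delimiter)
            # Columns missing from RENAME_MAP keep their original name.
            return [
                {RENAME_MAP.get(name, name): value for name, value in row.items()}
                for row in reader
            ]


# Build a tiny in-memory ZIP so the sketch runs without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("basico.csv", "CD_SETOR;V_POP;V_RENDA\n330455705060003;196;1790.86\n")

rows = read_harmonized(buffer.getvalue(), "basico.csv")
print(rows[0])
```

Because the renaming happens at load time, downstream code only ever sees the harmonized names.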


In [9]:
gdf_ftp = load_census(
    places=[MUNICIPALITY],
    year=2022, # AtlasBR automatically finds the 2022 FTP URLs
    themes=["basic", "income"],
    strategy="ftp_csv",
    geometry="tract"
)

# CHECK: The schemas should be identical.
# You can write your analysis code once, and it runs on both backends.
if "gdf_cloud" in locals():
    print("Columns match?", set(gdf_ftp.columns) == set(gdf_cloud.columns))
else:
    print("Columns match? Skipped Cloud")
display(gdf_ftp[["habitantes", "rendimento_medio"]].head())

2025-12-16 23:22:02,694 - atlasbr - INFO -     ℹ️  Resolved 'Rio de Janeiro, RJ' -> 3304557
2025-12-16 23:22:02,696 - atlasbr - INFO -     📦 Loading theme: 'basic' via ftp_csv...
2025-12-16 23:22:02,697 - atlasbr - INFO -     ⬇️  Fetching basic (BR) from IBGE FTP...
2025-12-16 23:22:04,981 - atlasbr - INFO -     📦 Loading theme: 'income' via ftp_csv...
2025-12-16 23:22:04,982 - atlasbr - INFO -     ⬇️  Fetching income (BR) from IBGE FTP...
2025-12-16 23:22:05,950 - atlasbr - INFO -     🗺️  Fetching Tract Geometries...
2025-12-16 23:22:05,951 - atlasbr - INFO - Fetching Census Tracts for 1 municipalities (Year 2022)...


Columns match? Skipped Cloud


| id_setor_censitario | habitantes | rendimento_medio |
|---|---|---|
| 330455705060003 | 196 | 1790.86 |
| 330455705060004 | 196 | 1618.88 |
| 330455705060005 | 666 | 1583.03 |
| 330455705060007 | 301 | 1531.0 |
| 330455705060008 | 332 | 6171.16 |
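A boolean "match / no match" answer hides which columns differ. The sketch below reports the difference explicitly. `schema_diff` is a hypothetical helper, and the column lists in the example are made up for illustration.

```python
from typing import Dict, Iterable, List


def schema_diff(left_columns: Iterable[str], right_columns: Iterable[str]) -> Dict[str, List[str]]:
    """Return the column names that appear on only one side."""
    left, right = set(left_columns), set(right_columns)
    return {
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
    }


# Illustrative column lists, not real AtlasBR output.
ftp_columns = ["habitantes", "rendimento_medio", "geometry"]
cloud_columns = ["habitantes", "domicilios", "rendimento_medio", "geometry"]
print(schema_diff(ftp_columns, cloud_columns))
```

With real data you would call `schema_diff(gdf_ftp.columns, gdf_cloud.columns)`; two empty lists mean the schemas agree.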


# 4. Solving the "MAUP" with H3
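**The Problem:** Census tracts vary wildly in size. Some are tiny city blocks; others are massive rural areas. Because results depend on how these units are drawn (the Modifiable Areal Unit Problem, or MAUP), maps built on tracts carry a visual bias.

**The Solution:** We aggregate data to a regular **H3 Hexagonal Grid**.
* **Extensive vars** (e.g. population) are redistributed in proportion to the area each tract shares with a hexagon.
* **Intensive vars** (e.g. income) are averaged.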
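The two rules above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the tract/hexagon overlap areas are given up front (in practice they come from a geometric intersection), `aggregate_to_hexes` is a hypothetical function, and the income average is weighted by the population moved into each hexagon. AtlasBR may use a different weighting.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical inputs: tract attributes and tract/hexagon overlap areas (km²).
tracts = {
    "A": {"habitantes": 800, "rendimento_medio": 2000.0, "area": 0.5},
    "B": {"habitantes": 200, "rendimento_medio": 4000.0, "area": 0.25},
}
overlaps: List[Tuple[str, str, float]] = [
    ("A", "hex1", 0.25),
    ("A", "hex2", 0.25),
    ("B", "hex2", 0.25),
]


def aggregate_to_hexes(tracts: dict, overlaps: List[Tuple[str, str, float]]) -> Dict[str, dict]:
    """Split population by area share; average income weighted by moved population."""
    population = defaultdict(float)
    income_sum = defaultdict(float)
    for tract_id, hex_id, overlap_area in overlaps:
        tract = tracts[tract_id]
        share = overlap_area / tract["area"]   # fraction of the tract inside this hexagon
        moved = tract["habitantes"] * share    # extensive: redistributed by area
        population[hex_id] += moved
        income_sum[hex_id] += tract["rendimento_medio"] * moved
    return {
        hex_id: {
            "habitantes": population[hex_id],
            # intensive: weighted mean, undefined for empty hexagons
            "rendimento_medio": income_sum[hex_id] / population[hex_id] if population[hex_id] else None,
        }
        for hex_id in population
    }


hexes = aggregate_to_hexes(tracts, overlaps)
print(hexes)
```

Tract A is split evenly between the two hexagons, so the total population (1,000) is preserved after aggregation, while income is never summed across tracts.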


In [12]:
gdf_hex = load_census(
    places=[MUNICIPALITY],
    year=2022,
    # 'age' has no 2022/ftp_csv catalog entry (available: 'basic', 'income', 'race'),
    # so we request only themes that exist for this year and strategy.
    themes=["basic", "income"],
    strategy="ftp_csv", # Using FTP to ensure it runs for everyone
    geometry="h3",      # <--- The Magic Switch
    h3_res=9            # Resolution 9 (~0.1 km² per hex)
)

print(f"⬢ Transformed {len(gdf_ftp)} irregular tracts into {len(gdf_hex)} regular hexagons.")

2025-12-17 00:22:14,889 - atlasbr - INFO -     ℹ️  Resolved 'Rio de Janeiro, RJ' -> 3304557
2025-12-17 00:22:14,897 - atlasbr - INFO -     📦 Loading theme: 'basic' via ftp_csv...
2025-12-17 00:22:14,900 - atlasbr - INFO -     ⬇️  Fetching basic (BR) from IBGE FTP...
2025-12-17 00:22:17,495 - atlasbr - INFO -     📦 Loading theme: 'income' via ftp_csv...
2025-12-17 00:22:17,496 - atlasbr - INFO -     ⬇️  Fetching income (BR) from IBGE FTP...
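Choosing `h3_res` is a trade-off between detail and the number of cells to process. The back-of-the-envelope sketch below estimates how many hexagons cover a given area. The average cell areas are rounded from the published H3 resolution table, `estimate_hex_count` is a hypothetical helper, and the 100 km² figure is an arbitrary example rather than the area of any municipality in this tutorial.

```python
# Average H3 cell area in km², rounded from the published H3 resolution table.
H3_AVG_AREA_KM2 = {7: 5.161, 8: 0.737, 9: 0.105, 10: 0.015}


def estimate_hex_count(area_km2: float, h3_res: int) -> int:
    """Rough number of hexagons needed to cover area_km2 at the given resolution."""
    return round(area_km2 / H3_AVG_AREA_KM2[h3_res])


# Each step up in resolution multiplies the cell count by roughly seven.
for res in sorted(H3_AVG_AREA_KM2):
    print(f"res {res}: ~{estimate_hex_count(100.0, res)} hexagons per 100 km²")
```

The estimate ignores partial cells along the boundary, so treat it as an order-of-magnitude guide.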

# 5. Interactive Exploration
Notice how the hexagons reveal spatial patterns in income that irregular tracts might hide.


In [11]:
# Explore Income Distribution
gdf_hex.explore(
    column="rendimento_medio",
    cmap="magma",
    tiles="CartoDB DarkMatter",
    tooltip=["h3_index", "habitantes", "rendimento_medio"],
    style_kwds={"fillOpacity": 0.6, "weight": 0},
    legend_kwds={"caption": "Average Income (R$)"}
)