# AtlasBR: Unifying Brazilian Spatial Data

**A Guide to Reproducible Urban Analytics**

Welcome to **AtlasBR**. This library addresses the fragmentation of Brazilian socio-economic data. Instead of writing custom SQL for BigQuery, parsing inconsistent FTP schemas, and manually fixing geometries, you work through a unified, domain-driven interface that loads:

1.  **Census 2010/2022** (Demographics, Income, Race, Built Environment)
2.  **RAIS** (Formal Employment)
3.  **CNES** (Health Infrastructure)
4.  **INEP** (School Census)

**Key Features Demonstrated:**

  * **Employment:** Filling the "public sector gap" in RAIS by injecting data from Schools and Hospitals.
  * **Hybrid Geocoding:** Combining Lat/Lon (Schools), CEPs (Firms), and Tracts (Census).
  * **Spatial Harmonization:** Re-aggregating data to H3 grids for comparison.


## 1\. Setup & Configuration

First, make sure the library can be imported. If you are running this notebook from the `tutorials/` folder without having installed the package, the following cell adds the source directory to your Python path.



In [None]:
import sys
import os
from pathlib import Path

# --- DEVELOPER SETUP (Optional) ---
# If running locally without 'pip install', we add the '../src' folder to path.
current_path = Path(os.getcwd())
if current_path.name == "tutorials":
    # Go up one level to root, then into 'src' (if using src-layout) or just root (flat-layout)
    root_dir = current_path.parent
    src_dir = root_dir / "src"
    
    if src_dir.exists():
        sys.path.append(str(src_dir))
    else:
        sys.path.append(str(root_dir))

import atlasbr
import pandas as pd
import logging
from getpass import getpass

# Enable library logging (shows progress emojis)
atlasbr.configure_logging(level=logging.INFO)

print(f"✅ AtlasBR version {getattr(atlasbr, '__version__', 'dev')} loaded.")

### 1.1 Authentication

AtlasBR requires a Google Cloud Project ID to query the *Base dos Dados* data lake.

In [None]:

# --- 🔑 SECURE AUTHENTICATION ---
# Strategy: Try environment variable first, fallback to prompt.

project_id = os.getenv("GOOGLE_CLOUD_PROJECT") or os.getenv("GCLOUD_PROJECT_ID")

if not project_id:
    # This masks your input so secrets aren't saved in the notebook
    project_id = getpass("Enter Google Cloud Project ID: ")

try:
    atlasbr.set_billing_id(project_id.strip())
    print("✅ Billing ID configured.")
except Exception as e:
    print(f"❌ Configuration failed: {e}")

# Check where our cache lives
info = atlasbr.infra.cache.get_cache_info()
print(f"📂 Local Cache: {info['location']} ({info['size_mb']} MB used)")
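
The "environment variable first, prompt second" strategy can also be written as a small reusable function. The sketch below is illustrative only: `resolve_project_id` is a hypothetical helper, not part of AtlasBR, and it takes the environment and the prompt as arguments so it can be exercised without touching real credentials. Unlike the cell above, it raises an error when both sources are empty rather than passing an empty string on.

```python
import os
from typing import Callable, Mapping, Optional, Sequence

# Variable names mirror the cell above; the helper itself is hypothetical.
ENV_VARS: Sequence[str] = ("GOOGLE_CLOUD_PROJECT", "GCLOUD_PROJECT_ID")


def resolve_project_id(
    env: Mapping[str, str] = os.environ,
    prompt: Optional[Callable[[], str]] = None,
) -> str:
    """Return the first non-empty project ID from env, else ask via prompt."""
    for name in ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    if prompt is not None:
        value = prompt().strip()
        if value:
            return value
    raise ValueError("No Google Cloud Project ID found in environment or prompt.")


# Example with an explicit mapping, so nothing is read from the real environment.
print(resolve_project_id({"GCLOUD_PROJECT_ID": " my-project "}))
```

In a notebook, passing `prompt=lambda: getpass("Enter Google Cloud Project ID: ")` would reproduce the masked fallback used above.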


## 2\. The Census: Demographics & Tracts

The core of urban analysis is the Census. AtlasBR fetches geometry (Tracts), attributes (Income/Pop), and cleans the result (UTM projection + Urban Clipping) in one go.


### 2.1 Basic Usage

We request data for **Niterói** and **São Gonçalo** (RJ). The library resolves these names to IBGE IDs automatically.

In [None]:

# Load Census 2010 (Basic + Income), clipped to urban footprint
gdf_census = atlasbr.load_census(
    places=["Niterói, RJ", "São Gonçalo, RJ"],
    year=2010,
    themes=["basic", "income"],
    clip_urban=True  # ✂️ Cuts away forests/water automatically
)

print(f"Loaded {len(gdf_census)} tracts.")
gdf_census.head()
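
How AtlasBR resolves place names internally is not shown in this tutorial. As a rough mental model, name resolution usually means normalizing the text (case, accents, spacing) and looking it up against the IBGE municipality list. The sketch below assumes a tiny hard-coded table with the 7-digit IBGE codes for the two cities used here; `normalize` and `resolve_place` are hypothetical names.

```python
import unicodedata

# Illustrative lookup table; a real resolver would use the full IBGE municipality list.
IBGE_CODES = {
    ("niteroi", "RJ"): 3303302,
    ("sao goncalo", "RJ"): 3304904,
}


def normalize(name: str) -> str:
    """Lowercase a name, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.casefold().split())


def resolve_place(place: str) -> int:
    """Turn 'City, UF' into a 7-digit IBGE municipality code."""
    city, _, uf = place.rpartition(",")
    if not city:
        raise ValueError(f"Expected 'City, UF', got {place!r}")
    key = (normalize(city), uf.strip().upper())
    try:
        return IBGE_CODES[key]
    except KeyError:
        raise ValueError(f"Unknown place: {place!r}") from None


print(resolve_place("Niterói, RJ"))
print(resolve_place("sao goncalo, rj"))
```

Including the state abbreviation in the key matters because several Brazilian municipalities share a name across states.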

### 2.2 Interactive Mapping

The output is a standard `GeoDataFrame`, ready for `explore()`.

In [None]:
# Map Average Income
gdf_census.explore(
    column="rendimento_medio",
    cmap="viridis",
    tiles="CartoDB positron",
    style_kwds={"stroke": False},
    legend_kwds={"caption": "Average Income (2010)"}
)

## 3\. The "RAIS+" Pipeline: Unified Employment

Official RAIS data (Formal Employment) is excellent for the private sector but often misses public servants (*statutários*).

AtlasBR offers a **Federated Pipeline** (`include_public_sector=True`). It:

1.  Fetches private jobs (RAIS).
2.  Fetches schools (INEP) and health units (CNES).
3.  Harmonizes columns and assigns proxy CNAE codes.
4.  Merges everything into one geospatial table.

In [None]:
# Load Unified Jobs (Private + Public)
gdf_jobs = atlasbr.load_rais(
    places=["Niterói, RJ"],
    year=2022,
    include_public_sector=True, # <--- The Magic Switch
    geocode=True                # <--- Calculates lat/lon via CEP or source coords
)

print(f"✅ Total Establishments: {len(gdf_jobs)}")
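
To make the four pipeline steps concrete, here is a minimal sketch in plain Python of the harmonize-and-merge idea. It is not AtlasBR's implementation. The input field names (`cnpj`, `vinculos`, `id_escola`, `id_cnes`, `profissionais`), the `"Empresa (RAIS)"` label, and the output schema are assumptions made for illustration. The proxy codes are the division-level CNAE 2.0 codes for education (85) and human health (86); the library may assign finer-grained codes.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Division-level CNAE 2.0 codes used as proxies (85 = education, 86 = human health).
PROXY_CNAE = {"Escola (INEP)": "85", "Saude (CNES)": "86"}


def harmonize(rows: List[Record], tipo: str, id_field: str, jobs_field: str) -> List[Record]:
    """Map one source's rows onto a shared schema, filling a proxy CNAE if absent."""
    return [
        {
            "id": str(row[id_field]),
            "tipo_estabelecimento": tipo,
            "cnae": row.get("cnae") or PROXY_CNAE.get(tipo),
            "jobs": int(row[jobs_field]),
        }
        for row in rows
    ]


# Toy inputs standing in for the three fetched sources (hypothetical field names).
rais = [{"cnpj": "001", "cnae": "47", "vinculos": 12}]
inep = [{"id_escola": "900", "quantidade_docente_educacao_basica": 30}]
cnes = [{"id_cnes": "700", "profissionais": 85}]

# Harmonize each source, then concatenate into one table.
unified = (
    harmonize(rais, "Empresa (RAIS)", "cnpj", "vinculos")
    + harmonize(inep, "Escola (INEP)", "id_escola", "quantidade_docente_educacao_basica")
    + harmonize(cnes, "Saude (CNES)", "id_cnes", "profissionais")
)

for row in unified:
    print(row)
```

Keeping a `tipo_estabelecimento` column on every row is what makes it possible to audit where each record came from after the merge.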

### 3.1 Inspecting the Integration

We can verify the source of the data. Notice how "Escola (INEP)" and "Saude (CNES)" appear alongside standard RAIS types.

In [None]:
# Breakdown by source type
gdf_jobs["tipo_estabelecimento"].value_counts()

In [None]:
# Map the public-sector establishments (schools in blue, health units in red)
m = gdf_jobs[gdf_jobs["tipo_estabelecimento"] == "Escola (INEP)"].explore(
    color="blue", name="Schools"
)
m = gdf_jobs[gdf_jobs["tipo_estabelecimento"] == "Saude (CNES)"].explore(
    m=m, color="red", name="Health"
)
m # Display map

## 4\. Specialized Datasets

Sometimes you need the raw, domain-specific metrics (e.g., number of MRI machines or classroom counts) rather than just job counts.

### 4.1 Schools (INEP)

Note that Schools use high-precision Lat/Lon coordinates natively, unlike the CEP centroids used for businesses.

In [None]:
gdf_schools = atlasbr.load_schools(
    places=["Niterói, RJ"],
    year=2023,
    gcp_billing=project_id,
    as_gdf=True
)

cols = ["rede", "quantidade_matricula_fundamental", "quantidade_docente_educacao_basica"]
gdf_schools[cols].head()
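
The difference between native coordinates and CEP centroids suggests a simple fallback rule: use the source's own latitude and longitude when present, otherwise look up the centroid of the postal code. The sketch below shows that rule under stated assumptions: `geocode` and `normalize_cep` are hypothetical helpers, the record keys are made up, and the CEP table holds placeholder values rather than real centroids.

```python
from typing import Any, Dict, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def normalize_cep(cep: str) -> str:
    """Keep digits only and zero-pad to 8 characters, e.g. '1234-567' -> '01234567'."""
    digits = "".join(ch for ch in cep if ch.isdigit())
    return digits.zfill(8)


def geocode(record: Dict[str, Any], cep_centroids: Dict[str, Coord]) -> Optional[Tuple[Coord, str]]:
    """Return ((lat, lon), method), preferring source coordinates over CEP centroids."""
    lat, lon = record.get("latitude"), record.get("longitude")
    if lat is not None and lon is not None:
        return (float(lat), float(lon)), "source"
    cep = record.get("cep")
    if cep:
        hit = cep_centroids.get(normalize_cep(str(cep)))
        if hit is not None:
            return hit, "cep_centroid"
    return None  # Not geocodable with the available information


# Placeholder centroid table (values are illustrative, not real coordinates).
centroids = {"12345678": (-22.9, -43.1)}

school = {"latitude": -22.88, "longitude": -43.12, "cep": "12345-678"}
firm = {"cep": "12345-678"}
unknown = {"cep": "99999-999"}

print(geocode(school, centroids))
print(geocode(firm, centroids))
print(geocode(unknown, centroids))
```

Recording the method alongside the coordinates lets later analysis treat centroid-based points as less precise than source-provided ones.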

### 4.2 Health Units (CNES)

CNES data includes specific infrastructure metrics like bed counts.

In [None]:
gdf_health = atlasbr.load_cnes(
    places=["Niterói, RJ"],
    year=2023,
    month=9,
    gcp_billing=project_id,
    geocode=True
)

gdf_health[["total_leitos_internacao", "total_consultorios"]].head()

## 5\. Spatial Harmonization (H3 Grids)

Comparing Census Tracts (irregular polygons) with other data is difficult. AtlasBR simplifies this by offering **H3 Interpolation** directly in the loader.

It uses **Tobler's Areal Weighting** to transfer data from Tracts to a regular Hexagonal Grid.


In [None]:

# Load Census data, but output as H3 hexagons (resolution 8)
gdf_hex = atlasbr.load_census(
    places=["Niterói, RJ"],
    year=2010,
    themes=["income", "basic"],
    clip_urban=True,
    geometry="h3",  # <--- Change output format
    h3_res=8        # ~0.7 km² hexagons at resolution 8
)

print(f"⬢ Generated {len(gdf_hex)} hexagons.")
gdf_hex.explore(column="rendimento_medio", tiles="CartoDB DarkMatter")
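
Areal weighting itself is simple to state: each source polygon's count is split across the target cells in proportion to the share of the source's area that falls inside each cell. The sketch below illustrates this for a count variable such as population. To stay dependency-free it uses axis-aligned rectangles in place of tracts and hexagons, so it is a toy model of the technique, not the library's code; all names are hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    """Area of an axis-aligned rectangle."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def areal_weighting(
    sources: Dict[str, Tuple[Rect, float]], targets: Dict[str, Rect]
) -> Dict[str, float]:
    """Split each source count across targets in proportion to shared area."""
    result = {name: 0.0 for name in targets}
    for geom, value in sources.values():
        src_area = area(geom)
        if src_area == 0:
            continue  # Degenerate source geometry contributes nothing
        for name, cell in targets.items():
            result[name] += value * overlap(geom, cell) / src_area
    return result


tracts = {
    "tract_a": ((0.0, 0.0, 2.0, 1.0), 100.0),  # 2x1 tract, 100 residents
    "tract_b": ((2.0, 0.0, 4.0, 1.0), 300.0),  # 2x1 tract, 300 residents
}
cells = {
    "cell_1": (0.0, 0.0, 1.0, 1.0),
    "cell_2": (1.0, 0.0, 3.0, 1.0),  # Straddles both tracts
    "cell_3": (3.0, 0.0, 4.0, 1.0),
}

estimates = areal_weighting(tracts, cells)
print(estimates)  # {'cell_1': 50.0, 'cell_2': 200.0, 'cell_3': 150.0}
```

The total (400 residents) is preserved because the cells fully cover the tracts. Averages such as mean income are not counts and should be combined as area-weighted (or population-weighted) means rather than summed.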

## 6\. Cache Management

To avoid re-downloading large files, AtlasBR caches data locally. You can manage this cache easily.
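
For intuition about what a cache report and a cache reset involve, here is a generic sketch using only `pathlib`. The two helpers are hypothetical and are not AtlasBR functions; the returned keys simply mirror the `location` and `size_mb` fields printed in the setup section. The demo runs against a temporary directory, so no real cache is touched.

```python
import tempfile
from pathlib import Path
from typing import Dict, Union


def cache_info(cache_dir: Path) -> Dict[str, Union[str, float]]:
    """Report a directory's path and total file size in MB."""
    total_bytes = sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())
    return {"location": str(cache_dir), "size_mb": round(total_bytes / 1024**2, 2)}


def clear_cache(cache_dir: Path) -> int:
    """Delete every file under cache_dir and return how many were removed."""
    removed = 0
    for p in list(cache_dir.rglob("*")):  # Materialize first, then delete
        if p.is_file():
            p.unlink()
            removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical cached file: 2 MB of zero bytes.
    (root / "census_2010.parquet").write_bytes(b"\0" * (2 * 1024**2))
    print(cache_info(root)["size_mb"])  # 2.0
    print(clear_cache(root))            # 1
    print(cache_info(root)["size_mb"])  # 0.0
```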