# %% [markdown]
# # 1. Census Data: Demographics & Income
#
# **The Problem:** Census data is usually trapped in massive FTP CSVs or complex SQL tables. Matching it to geometry (tracts) is painful.
#
# **The Solution:** AtlasBR unifies this into a single command.
#
# **What you will learn:**
# 1.  **Strategies:** How to switch between Cloud (BigQuery) and Offline (FTP) modes seamlessly.
# 2.  **Harmonization:** Why `habitantes` is always `habitantes`, regardless of the source year.
# 3.  **Spatial Aggregation:** Using H3 Hexagons to fix irregular Census tracts.

# %%
# --- 1. Setup & Authentication ---
# Install the library if running in Colab:
# !pip install "atlasbr[bd,geo,viz]"

import os
import atlasbr
from atlasbr.app.census import load_census

# VISUAL TIP: We use logging to show you exactly what's happening under the hood.
atlasbr.configure_logging()

# CREDENTIALS CHECK: GOOGLE_CLOUD_PROJECT only names a project; it does not prove you are authenticated.
if not os.getenv("GOOGLE_CLOUD_PROJECT"):
    print("⚠️  Warning: GOOGLE_CLOUD_PROJECT is not set. We will demonstrate the FTP strategy.")
else:
    print(f"✅ Using Google Cloud project: {os.getenv('GOOGLE_CLOUD_PROJECT')}")

MUNICIPALITY = "Niterói, RJ"
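
# %% [markdown]
# A minimal sketch of choosing the strategy automatically from the same environment variable checked above. `pick_strategy` is a hypothetical helper written for this notebook, not part of AtlasBR; it only returns one of the two strategy strings used in the sections below.

# %%
import os


def pick_strategy(env=None):
    """Return "bd_table" when a Google Cloud project is configured, else "ftp_csv".

    Hypothetical helper (not an AtlasBR function). `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    return "bd_table" if env.get("GOOGLE_CLOUD_PROJECT") else "ftp_csv"


STRATEGY = pick_strategy()
print(f"Selected strategy: {STRATEGY}")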

# %% [markdown]
# ## 2. The Cloud Strategy (BigQuery)
# *Best for: Speed, low memory usage, and querying specific subsets.*
#
# We fetch data directly from the Data Lake. Note that we request specific **themes** (`basic`, `income`). AtlasBR knows exactly which tables to join.

# %%
try:
    gdf_cloud = load_census(
        places=[MUNICIPALITY],
        year=2010,
        themes=["basic", "income"],
        strategy="bd_table",
        geometry="tract"
    )
    print(f"✅ Loaded {len(gdf_cloud)} tracts from BigQuery.")
    
    # Let's see the standardized columns
    display(gdf_cloud[["habitantes", "domicilios", "rendimento_medio"]].head())

except Exception as e:
    print(f"❌ Cloud load skipped: {e}")

# %% [markdown]
# ## 3. The Offline Strategy (FTP)
# *Best for: Users without cloud credentials or for reproducibility.*
#
# Change `strategy="ftp_csv"`. AtlasBR will:
# 1.  Download the official ZIPs from IBGE (cached locally).
# 2.  Extract and parse the CSVs.
# 3.  **Rename columns** to match the BigQuery schema exactly.

# %%
gdf_ftp = load_census(
    places=[MUNICIPALITY],
    year=2010, # AtlasBR automatically finds the 2010 FTP URLs
    themes=["basic", "income"],
    strategy="ftp_csv",
    geometry="tract"
)

# CHECK: Both strategies should return the same column names,
# so analysis code written once runs on either backend.
if "gdf_cloud" in globals():
    print("Columns match?", set(gdf_ftp.columns) == set(gdf_cloud.columns))
else:
    print("Columns match? Skipped (the cloud load did not run).")
display(gdf_ftp[["habitantes", "rendimento_medio"]].head())
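
# %% [markdown]
# The renaming step from section 3 can be pictured as a lookup table from source-specific column codes to the shared names. The sketch below is an illustration only: `RAW_TO_STANDARD`, `harmonize`, `tract_id`, and the sample row are hypothetical, and the raw codes (`Cod_setor`, `V001`, `V002`, `V005`) are assumed to be the relevant columns of the 2010 `Basico` file. AtlasBR's real mapping may differ.

# %%
# Hypothetical mapping from raw column codes to standardized names (assumption).
RAW_TO_STANDARD = {
    "Cod_setor": "tract_id",
    "V001": "domicilios",
    "V002": "habitantes",
    "V005": "rendimento_medio",
}


def harmonize(rows, mapping=RAW_TO_STANDARD):
    """Rename mapped keys in each row dict and drop columns that are not mapped."""
    return [
        {mapping[key]: value for key, value in row.items() if key in mapping}
        for row in rows
    ]


# Made-up sample row, for illustration only (not real census values).
raw_rows = [
    {"Cod_setor": "000000000000001", "V001": 120, "V002": 350, "V005": 1500.0, "V003": 2.9}
]
harmonized_rows = harmonize(raw_rows)
print(harmonized_rows[0])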

# %% [markdown]
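# ## 4. Mitigating the "MAUP" with H3
# **The Problem:** Census tracts vary wildly in size. Some are tiny city blocks; others are massive rural areas. Because statistics depend on how these boundaries are drawn (the Modifiable Areal Unit Problem, or MAUP), large tracts dominate the map and bias it visually.
#
# **The Solution:** We aggregate data to a regular **H3 Hexagonal Grid**.
# * **Extensive vars** (counts such as population) are redistributed in proportion to overlapping area.
# * **Intensive vars** (means or rates such as income) are averaged.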

# %%
gdf_hex = load_census(
    places=[MUNICIPALITY],
    year=2010,
    themes=["basic", "income"],
    strategy="ftp_csv", # Using FTP to ensure it runs for everyone
    geometry="h3",      # <--- The Magic Switch
    h3_res=9            # Resolution 9 (~0.1 km² per hex)
)

print(f"⬢ Transformed {len(gdf_ftp)} irregular tracts into {len(gdf_hex)} regular hexagons.")
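
# %% [markdown]
# The two redistribution rules from section 4 can be sketched in plain Python. This is a toy illustration under stated assumptions, not AtlasBR's implementation: zones and cells are axis-aligned rectangles `(xmin, ymin, xmax, ymax)` instead of tracts and hexagons, population is assumed uniform within each zone, and the intensive variable is averaged with population weights (the text above only says "averaged"). All names and numbers are made up.

# %%
def overlap_area(a, b):
    """Area of the intersection of two rectangles (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def interpolate(zones, cells):
    """Area-weighted transfer of zone values onto target cells.

    zones: list of (rect, population, mean_income) tuples.
    cells: list of rects.
    Returns one (population, mean_income) tuple per cell.
    """
    results = []
    for cell in cells:
        pop_total = 0.0
        income_sum = 0.0
        for rect, population, mean_income in zones:
            zone_area = (rect[2] - rect[0]) * (rect[3] - rect[1])
            share = overlap_area(rect, cell) / zone_area
            pop_part = population * share         # extensive: split by area share
            pop_total += pop_part
            income_sum += mean_income * pop_part  # intensive: weight by allocated population
        mean = income_sum / pop_total if pop_total else None
        results.append((pop_total, mean))
    return results


# Two made-up zones of different sizes, re-cut into two equal cells.
toy_zones = [((0, 0, 2, 1), 100, 1000.0), ((2, 0, 3, 1), 50, 4000.0)]
toy_cells = [(0, 0, 1.5, 1), (1.5, 0, 3, 1)]
toy_result = interpolate(toy_zones, toy_cells)
print(toy_result)

# Sanity check: an extensive variable should be conserved when cells cover all zones.
print("Population before:", sum(z[1] for z in toy_zones), "after:", sum(r[0] for r in toy_result))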

# %% [markdown]
# ## 5. Interactive Exploration
# Notice how the hexagons reveal spatial patterns in income that irregular tracts might hide.

# %%
# Explore Income Distribution
gdf_hex.explore(
    column="rendimento_medio",
    cmap="magma",
    tiles="CartoDB DarkMatter",
    tooltip=["h3_index", "habitantes", "rendimento_medio"],
    style_kwds={"fillOpacity": 0.6, "weight": 0},
    legend_kwds={"caption": "Average Income (R$)"}
)