# PM2.5 Santiago - Quick Start Guide

**Author:** Francisco Parrao  
**Date:** 2025-11-10  
**Project:** Spatiotemporal PM2.5 Prediction for Santiago using Satellite Data and ML

---

## Objective

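This notebook introduces the project and walks through the basic workflow: initializing Google Earth Engine (GEE), defining the Santiago study area, checking Sentinel-5P NO₂ coverage, and mapping SINCA monitoring stations.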

In [None]:
# Import libraries
import os
import sys
from pathlib import Path

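# Add the project root to sys.path so that `src` can be imported
# (assumes this notebook sits one directory below the project root)
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))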

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import folium

# Google Earth Engine
import ee
import geemap

# Project modules
from src.data_acquisition import GEEDataDownloader

# Display settings
%matplotlib inline
sns.set_style('darkgrid')
plt.rcParams['figure.figsize'] = (12, 8)

print("Libraries imported successfully!")

## 1. Initialize Google Earth Engine

In [None]:
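# Initialize GEE (run this once per session).
# Recent earthengine-api releases may require a Cloud project: ee.Initialize(project='<your-project-id>')
try:
    ee.Initialize()
    print("✓ Google Earth Engine initialized successfully!")
except Exception as e:
    print(f"Error initializing GEE: {e}")
    print("Authenticate first (run `earthengine authenticate` in a terminal, "
          "or ee.Authenticate() in this notebook), then re-run this cell.")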
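
Because recent `earthengine-api` versions may expect a Cloud project ID, it can help to keep that ID out of the notebook. The sketch below assumes a hypothetical environment variable `EE_PROJECT` (not defined anywhere in this project) and falls back to `ee.Authenticate()` once if initialization fails; it is one possible pattern, not the project's own setup code.

```python
import os
from typing import Optional


def resolve_ee_project(env_var: str = "EE_PROJECT") -> Optional[str]:
    """Return the Cloud project ID stored in `env_var`, or None if unset or blank.

    EE_PROJECT is a hypothetical variable name chosen for this sketch.
    """
    value = os.environ.get(env_var, "").strip()
    return value or None


def init_earth_engine(env_var: str = "EE_PROJECT") -> None:
    """Initialize Earth Engine, running the authentication flow once if needed."""
    import ee  # imported here so resolve_ee_project() works without ee installed

    project = resolve_ee_project(env_var)  # None lets ee use its own default
    try:
        ee.Initialize(project=project)
    except Exception:
        # Typically no stored credentials yet: authenticate, then retry once
        ee.Authenticate()
        ee.Initialize(project=project)
```

Usage: set `EE_PROJECT` in the shell before launching Jupyter, then call `init_earth_engine()` in place of the bare `ee.Initialize()`.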

## 2. Define Study Area (Santiago Metropolitan Region)

In [None]:
# Santiago bounding box (longitude/latitude in decimal degrees, WGS84)
santiago_bbox = {
    'west': -71.0,
    'south': -33.8,
    'east': -70.4,
    'north': -33.2
}

# Create ee.Geometry; Rectangle takes [xMin, yMin, xMax, yMax] = [west, south, east, north]
santiago_geometry = ee.Geometry.Rectangle([
    santiago_bbox['west'], santiago_bbox['south'],
    santiago_bbox['east'], santiago_bbox['north']
])

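# Visualize on an interactive map centered on the bounding-box midpoint.
# Add the basemap before the study-area layer so it is drawn underneath.
Map = geemap.Map(center=[-33.5, -70.7], zoom=10)
Map.add_basemap('OpenStreetMap')
Map.addLayer(santiago_geometry, {'color': 'red'}, 'Santiago Study Area')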
Map
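
The map centre `[-33.5, -70.7]` is the midpoint of the bounding box, and later steps need to know whether a station lies inside the box. A minimal sketch of both checks, plus the box area on a spherical Earth (an approximation that assumes a mean radius of 6371 km); the helper names are illustrative, not part of `src/`:

```python
import math
from typing import Dict, Tuple

# Same box as the cell above (WGS84 decimal degrees)
santiago_bbox = {'west': -71.0, 'south': -33.8, 'east': -70.4, 'north': -33.2}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def bbox_center(bbox: Dict[str, float]) -> Tuple[float, float]:
    """Return the box midpoint as (lat, lon), the order geemap and folium expect."""
    return ((bbox['south'] + bbox['north']) / 2, (bbox['west'] + bbox['east']) / 2)


def bbox_contains(bbox: Dict[str, float], lat: float, lon: float) -> bool:
    """True if (lat, lon) lies inside the box or on its edge."""
    return bbox['south'] <= lat <= bbox['north'] and bbox['west'] <= lon <= bbox['east']


def bbox_area_km2(bbox: Dict[str, float]) -> float:
    """Area of a lat/lon box on a sphere: R^2 * dlon * (sin(lat_north) - sin(lat_south))."""
    d_lon = math.radians(bbox['east'] - bbox['west'])
    return EARTH_RADIUS_KM ** 2 * d_lon * (
        math.sin(math.radians(bbox['north'])) - math.sin(math.radians(bbox['south']))
    )


print(bbox_center(santiago_bbox))           # approximately (-33.5, -70.7)
print(round(bbox_area_km2(santiago_bbox)))  # roughly 3,700 km²
```

Deriving the centre from the box, rather than hard-coding it, keeps the maps consistent if the study area is later enlarged.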

## 3. Quick Data Check - Sentinel-5P NO₂

In [None]:
# Load Sentinel-5P NO₂ for the last 30 days
# (OFFL products are published with a delay, so the most recent days may be missing)
from datetime import datetime, timedelta

end_date = datetime.now()
start_date = end_date - timedelta(days=30)

start_str = start_date.strftime('%Y-%m-%d')
end_str = end_date.strftime('%Y-%m-%d')

print(f"Loading NO₂ data: {start_str} to {end_str}")

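# Load the offline (OFFL) Level-3 NO₂ collection, keep images intersecting the study area,
# and select the total-column band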
no2_collection = (ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_NO2')
                  .filterDate(start_str, end_str)
                  .filterBounds(santiago_geometry)
                  .select('NO2_column_number_density'))

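# Count the images in the window; an empty collection leaves nothing to average
count = no2_collection.size().getInfo()
print(f"Found {count} Sentinel-5P images")
if count == 0:
    print("No images found: widen the date range or check the collection ID.")

# Per-pixel mean over the 30-day window
no2_mean = no2_collection.mean()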

# Visualization parameters (values in mol/m²)
vis_params = {
    'min': 0,
    'max': 0.0002,
    'palette': ['blue', 'green', 'yellow', 'orange', 'red']
}

# Add to map
Map2 = geemap.Map(center=[-33.5, -70.7], zoom=10)
Map2.addLayer(no2_mean, vis_params, 'NO₂ (30-day mean)')
Map2.addLayer(santiago_geometry, {'color': 'red'}, 'Santiago')
Map2.add_colorbar(vis_params, label='NO₂ column density (mol/m²)')
Map2
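
The band is reported in mol/m², whereas NO₂ columns are often quoted in molecules/cm². The conversion needs only Avogadro's number and the area factor 1 m² = 10⁴ cm²; a minimal sketch (helper names are illustrative):

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact since the 2019 SI redefinition)
CM2_PER_M2 = 1.0e4        # 1 m² = 10,000 cm²


def mol_m2_to_molec_cm2(value_mol_m2: float) -> float:
    """Convert a column density from mol/m² to molecules/cm²."""
    return value_mol_m2 * AVOGADRO / CM2_PER_M2


def molec_cm2_to_mol_m2(value_molec_cm2: float) -> float:
    """Inverse conversion: molecules/cm² back to mol/m²."""
    return value_molec_cm2 * CM2_PER_M2 / AVOGADRO


# Upper end of the colour scale used above (0.0002 mol/m²)
print(f"{mol_m2_to_molec_cm2(0.0002):.3e} molecules/cm²")  # 1.204e+16 molecules/cm²
```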

## 4. SINCA Station Locations

In [None]:
# Illustrative station locations with approximate coordinates (replace with actual SINCA station metadata)
stations = pd.DataFrame({
    'station': ['Cerrillos', 'Cerro Navia', 'El Bosque', 'Independencia',
                'La Florida', 'Las Condes', 'Pudahuel', 'Puente Alto'],
    'lat': [-33.50, -33.42, -33.56, -33.41, -33.52, -33.40, -33.44, -33.61],
    'lon': [-70.71, -70.74, -70.69, -70.66, -70.58, -70.58, -70.77, -70.58]
})

# Create map
m = folium.Map(location=[-33.5, -70.7], zoom_start=11)

# Add one marker per station (the row index is not needed)
for _, row in stations.iterrows():
    folium.Marker(
        location=[row['lat'], row['lon']],
        popup=row['station'],
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(m)

# Add bounding box
folium.Rectangle(
    bounds=[[santiago_bbox['south'], santiago_bbox['west']],
            [santiago_bbox['north'], santiago_bbox['east']]],
    color='blue',
    fill=False
).add_to(m)

m
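
Pairing stations with satellite values means finding the pixel each station falls in. For a regular, north-up lat/lon grid covering the bounding box, that is simple arithmetic on the box edges. The sketch below assumes such a grid and uses a synthetic 60 × 60 array (0.01° cells) rather than a real NO₂ raster; `station_pixel_indices` is a hypothetical helper, not a function from `src/`. A station lying exactly on a cell edge lands in one of the two adjacent cells depending on floating-point rounding.

```python
from typing import Dict, Tuple

import numpy as np
import pandas as pd

# Same box as Section 2 (WGS84 decimal degrees)
santiago_bbox = {'west': -71.0, 'south': -33.8, 'east': -70.4, 'north': -33.2}


def station_pixel_indices(lat, lon, bbox: Dict[str, float],
                          n_rows: int, n_cols: int) -> Tuple[np.ndarray, np.ndarray]:
    """Return (row, col) indices of the grid cell containing each point.

    Assumes a north-up grid: row 0 touches the northern edge, col 0 the western edge.
    Points outside the box are clipped to the nearest edge cell.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    cell_h = (bbox['north'] - bbox['south']) / n_rows
    cell_w = (bbox['east'] - bbox['west']) / n_cols
    rows = np.floor((bbox['north'] - lat) / cell_h).astype(int)
    cols = np.floor((lon - bbox['west']) / cell_w).astype(int)
    return np.clip(rows, 0, n_rows - 1), np.clip(cols, 0, n_cols - 1)


# Synthetic grid standing in for a real raster
demo_grid = np.arange(60 * 60, dtype=float).reshape(60, 60)
demo_stations = pd.DataFrame({
    'station': ['Cerrillos', 'Las Condes', 'Puente Alto'],
    'lat': [-33.50, -33.40, -33.61],
    'lon': [-70.71, -70.58, -70.58],
})
rows, cols = station_pixel_indices(demo_stations['lat'], demo_stations['lon'],
                                   santiago_bbox, 60, 60)
demo_stations['grid_value'] = demo_grid[rows, cols]
print(demo_stations)
```

Inside Earth Engine itself, the equivalent extraction is usually done server-side with `ee.Image.sampleRegions` or `ee.Image.reduceRegions`; the local version is handy once rasters have been exported.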

## 5. Next Steps

Once the cells above run without errors, continue with the notebooks in order:

1. **Notebook 01:** Data Exploration
2. **Notebook 02:** SINCA Data Analysis
3. **Notebook 03:** Satellite Data Extraction
4. **Notebook 04:** Feature Engineering
5. **Notebook 05:** Baseline Models
6. **Notebook 06:** ML Modeling
7. **Notebook 07:** Model Evaluation
8. **Notebook 08:** Spatial Analysis
9. **Notebook 09:** Population Exposure

---

## Useful Commands

```bash
# Download Sentinel-5P data
python src/data_acquisition/gee_downloader.py --start-date 2019-01-01 --end-date 2025-11-10 --monthly

# Download SINCA data (manual)
python src/data_acquisition/sinca_scraper.py --start-date 2019-01-01 --end-date 2025-11-10

# Run all tests
pytest tests/
```
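
The download commands span 2019-01-01 to 2025-11-10. If that range has to be requested in monthly pieces (one plausible reading of the `--monthly` flag, whose exact behaviour is not documented here), calendar-month windows can be generated with the standard library. `monthly_windows` is a hypothetical helper, not part of `gee_downloader.py`; end dates are exclusive, matching how `ee.ImageCollection.filterDate` treats its end argument.

```python
from datetime import date
from typing import List, Tuple


def monthly_windows(start: date, end: date) -> List[Tuple[date, date]]:
    """Split [start, end) into calendar-month windows as (start, end) pairs."""
    windows = []
    current = start
    while current < end:
        # First day of the following month
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The final window is truncated at `end`
        windows.append((current, min(next_month, end)))
        current = next_month
    return windows


windows = monthly_windows(date(2019, 1, 1), date(2025, 11, 10))
print(len(windows), windows[0], windows[-1])
```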