# Urban Flood Vulnerability Assessment - Sri Lanka

**Assignment 2 - Scientific Programming for Geospatial Sciences**

This notebook demonstrates the complete flood vulnerability analysis workflow:
1. Data Loading
2. NumPy Array Operations (raster processing)
3. PyTorch Tensor Operations (with performance comparison)
4. Vector Processing (GeoPandas/Shapely)
5. Xarray Data Cubes
6. Raster-Vector Integration
7. Visualization

In [None]:
# imports
import numpy as np
import pandas as pd
import xarray as xr
import geopandas as gpd
import torch
import matplotlib.pyplot as plt
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# our modules
import sys
sys.path.append('..')
from src import data_loading, raster_analysis, tensor_operations, vector_analysis, integration, visualization

print("All imports successful!")

## 1. Data Loading

Load the required datasets for the analysis.
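
Before loading anything, it can help to fail fast when an expected file is missing. The sketch below is one way to do that with the standard library. `find_missing_files` is a hypothetical helper (it is not part of `src.data_loading`), and the only file name used is the CHIRPS file from the commented example in this section.

```python
from pathlib import Path


def find_missing_files(data_dir, expected_names):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected_names if not (data_dir / name).is_file()]


# File name taken from the commented CHIRPS example; extend the list as needed.
expected = ["chirps_2020_srilanka.nc"]
missing = find_missing_files("../data", expected)
if missing:
    print(f"Missing data files: {missing} (see data/README.md)")
else:
    print("All expected data files found.")
```

Returning a list rather than raising keeps the helper usable both for a friendly message in a notebook and for a hard check in a script.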

In [None]:
# define data paths
DATA_DIR = Path('../data')

# check if data exists
# NOTE: you need to download the data first, see data/README.md
print("Data directory contents:")
if DATA_DIR.exists():
    for f in DATA_DIR.iterdir():
        print(f"  - {f.name}")
else:
    print("  Data directory not found. Please create it and add data files.")

In [None]:
# example: load rainfall data (CHIRPS)
# uncomment when you have the data

# rainfall = data_loading.load_chirps_data(
#     DATA_DIR / 'chirps_2020_srilanka.nc',
#     time_slice=('2020-01-01', '2020-12-31')
# )
# print(f"Rainfall data shape: {rainfall.shape}")
# print(f"Rainfall dimensions: {rainfall.dims}")

## 2. NumPy Array Operations

Demonstrate array-based raster processing operations.
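
The `raster_analysis` helpers called in this section are project code whose source is not shown in the notebook. As a rough guide to what such helpers typically do, here is a minimal NumPy-only sketch. The function names and bodies are assumptions for illustration, not the module's actual implementation.

```python
import numpy as np


def extreme_mask(rainfall, threshold):
    """Boolean array marking values strictly above the threshold."""
    return rainfall > threshold


def count_extremes(rainfall, threshold):
    """Number of time steps above the threshold at each pixel (time is axis 0)."""
    return extreme_mask(rainfall, threshold).sum(axis=0)


def percentile_over_time(rainfall, percentile):
    """Per-pixel percentile along the time axis."""
    return np.percentile(rainfall, percentile, axis=0)


def minmax_normalize(values):
    """Rescale values to [0, 1]; a constant array maps to all zeros."""
    lo, hi = np.nanmin(values), np.nanmax(values)
    if hi == lo:
        return np.zeros_like(values, dtype=float)
    return (values - lo) / (hi - lo)


rng = np.random.default_rng(42)
demo = rng.exponential(scale=30, size=(365, 4, 4))  # (time, rows, cols)

counts = count_extremes(demo, threshold=100)
p95 = percentile_over_time(demo, 95)
p95_norm = minmax_normalize(p95)
print(counts.shape, p95.shape, p95_norm.min(), p95_norm.max())
```

A quick sanity check for the counts: for exponential draws with scale 30, the probability of exceeding 100 is exp(-100/30) ≈ 0.036, so roughly 13 of 365 days per pixel should be flagged.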

In [None]:
# create sample rainfall data for demonstration
# in practice this would come from CHIRPS
np.random.seed(42)
sample_rainfall = np.random.exponential(scale=30, size=(365, 100, 100))
print(f"Sample rainfall shape: {sample_rainfall.shape}")
print(f"Max value: {sample_rainfall.max():.2f} mm")

In [None]:
# operation 1: create extreme rainfall mask
extreme_mask = raster_analysis.create_extreme_rainfall_mask(sample_rainfall, threshold=100)
print(f"Extreme rainfall events (>100mm): {extreme_mask.sum()} occurrences")

In [None]:
# operation 2: count extreme events per pixel
extreme_counts = raster_analysis.count_extreme_events(sample_rainfall, threshold=100)
print(f"Max extreme events at a location: {extreme_counts.max()}")

# visualize
plt.figure(figsize=(8, 6))
plt.imshow(extreme_counts, cmap='Reds')
plt.colorbar(label='Number of extreme events')
plt.title('Extreme Rainfall Events per Location')
plt.show()

In [None]:
# operation 3: calculate 95th percentile rainfall
p95_rainfall = raster_analysis.calculate_percentile_rainfall(sample_rainfall, percentile=95)
print(f"95th percentile range: {p95_rainfall.min():.2f} - {p95_rainfall.max():.2f} mm")

In [None]:
# operation 4: normalize for vulnerability calculation
rainfall_norm = raster_analysis.normalize_array(p95_rainfall, method='minmax')
print(f"Normalized range: {rainfall_norm.min():.2f} - {rainfall_norm.max():.2f}")

## 3. PyTorch Tensor Operations

Demonstrate GPU-aware tensor operations and a NumPy vs. PyTorch performance comparison.
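
A fair timing comparison usually needs a warm-up run, several repetitions, and a spread estimate. The sketch below shows one way to collect a mean and standard deviation with the standard library and NumPy. `time_callable` and its parameters are hypothetical, and this is not necessarily how `tensor_operations.compare_numpy_vs_torch` measures time. On a GPU, work is launched asynchronously, so a real benchmark would also call `torch.cuda.synchronize()` before reading the clock.

```python
import time

import numpy as np


def time_callable(fn, num_iterations=10, warmup=1):
    """Return (mean, std) wall-clock seconds over num_iterations calls of fn."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (caches, lazy initialisation)
    durations = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return float(np.mean(durations)), float(np.std(durations))


data = np.random.default_rng(0).random((100, 100))

# Two stand-in workloads; in the notebook these would be the SciPy and
# PyTorch smoothing calls.
baseline_mean, baseline_std = time_callable(lambda: np.sort(data, axis=None))
candidate_mean, candidate_std = time_callable(lambda: data.sum())

# Speedup here means "baseline time divided by candidate time".
speedup = baseline_mean / candidate_mean
print(f"baseline:  {baseline_mean * 1000:.3f} ms ± {baseline_std * 1000:.3f} ms")
print(f"candidate: {candidate_mean * 1000:.3f} ms ± {candidate_std * 1000:.3f} ms")
print(f"speedup:   {speedup:.2f}x")
```

On small arrays such as the 100 x 100 grid used here, transfer and launch overhead can easily outweigh any GPU advantage, so a speedup below 1 would not be surprising.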

In [None]:
# check GPU availability
tensor_operations.print_gpu_info()

In [None]:
# convert to tensor
rainfall_tensor = tensor_operations.numpy_to_tensor(p95_rainfall, device='auto')
print(f"Tensor device: {rainfall_tensor.device}")
print(f"Tensor shape: {rainfall_tensor.shape}")

In [None]:
# smooth with a Gaussian kernel so broad storm centers stand out from pixel noise
smoothed_tensor = tensor_operations.apply_gaussian_convolution(
    rainfall_tensor, kernel_size=5, sigma=1.5
)

# convert back for visualization
smoothed = tensor_operations.tensor_to_numpy(smoothed_tensor)

# compare original vs smoothed
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(p95_rainfall, cmap='Blues')
axes[0].set_title('Original Rainfall')
axes[1].imshow(smoothed, cmap='Blues')
axes[1].set_title('Smoothed (PyTorch Convolution)')
plt.tight_layout()
plt.show()

In [None]:
# PERFORMANCE COMPARISON: NumPy vs PyTorch
# this is required by the assignment

print("Running performance comparison (this may take a moment)...")
perf_results = tensor_operations.compare_numpy_vs_torch(
    p95_rainfall, 
    kernel_size=5, 
    sigma=1.5, 
    num_iterations=10
)

print("\n" + "="*50)
print("PERFORMANCE COMPARISON RESULTS")
print("="*50)
print(f"Array size: {p95_rainfall.shape}")
print("Kernel size: 5x5 Gaussian")
print()
print(f"NumPy (scipy.ndimage): {perf_results['numpy_time']*1000:.2f} ms ± {perf_results['numpy_std']*1000:.2f} ms")
print(f"PyTorch ({perf_results['device']}):    {perf_results['torch_time']*1000:.2f} ms ± {perf_results['torch_std']*1000:.2f} ms")
print()
print(f"Speedup: {perf_results['speedup']:.2f}x")
print("="*50)

## 4. Vector Processing (GeoPandas/Shapely)

Demonstrate three geospatial operations (the assignment requires at least three): spatial join, buffer analysis, and density calculation.
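
The sample geometries in this section use EPSG:4326, so any buffer distance or area is expressed in degrees rather than metres. The sketch below gives a rough sense of scale using a spherical-Earth approximation. `degrees_to_metres` is a hypothetical helper and the 6,371 km mean radius is an approximation. For real measurements, reprojecting to a suitable projected CRS would be the more robust route.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def degrees_to_metres(distance_deg, latitude_deg):
    """Approximate ground length of a distance in degrees at a given latitude.

    Returns (north_south_m, east_west_m); east-west shrinks with cos(latitude).
    """
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = distance_deg * metres_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# 0.01 degrees is the road buffer distance used in this section; 6.85 is
# roughly the middle latitude of the sample bounding box.
ns_m, ew_m = degrees_to_metres(0.01, latitude_deg=6.85)
print(f"0.01 deg ~ {ns_m:.0f} m north-south, {ew_m:.0f} m east-west")
```

Near the equator the two directions differ by under one percent, so a 0.01 degree buffer is roughly 1.1 km in every direction. At higher latitudes the east-west figure shrinks noticeably and a degree-based buffer becomes visibly elliptical on the ground.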

In [None]:
# create sample data for demonstration
# in practice this would come from Google Buildings and OSM

from shapely.geometry import box, Point, LineString

# sample admin boundaries (3 districts)
admin_boundaries = gpd.GeoDataFrame({
    'district_id': ['D001', 'D002', 'D003'],
    'district_name': ['Colombo', 'Gampaha', 'Kalutara'],
    'geometry': [
        box(79.8, 6.8, 80.0, 7.0),
        box(79.9, 7.0, 80.1, 7.2),
        box(79.7, 6.5, 79.9, 6.8)
    ]
}, crs='EPSG:4326')

# sample buildings (100 random points converted to polygons)
np.random.seed(42)
building_points = [
    Point(np.random.uniform(79.7, 80.1), np.random.uniform(6.5, 7.2))
    for _ in range(100)
]
buildings = gpd.GeoDataFrame({
    'building_id': [f'B{i:03d}' for i in range(100)],
    'geometry': [p.buffer(0.002) for p in building_points]  # make small polygons
}, crs='EPSG:4326')

# sample roads
roads = gpd.GeoDataFrame({
    'highway': ['primary', 'secondary', 'primary'],
    'geometry': [
        LineString([(79.7, 6.8), (80.1, 6.8)]),
        LineString([(79.9, 6.5), (79.9, 7.2)]),
        LineString([(79.8, 7.0), (80.0, 7.0)])
    ]
}, crs='EPSG:4326')

print(f"Admin boundaries: {len(admin_boundaries)} districts")
print(f"Buildings: {len(buildings)} footprints")
print(f"Roads: {len(roads)} segments")

In [None]:
# OPERATION 1: Spatial Join - assign district to each building
buildings_joined = vector_analysis.spatial_join_buildings_to_admin(
    buildings, admin_boundaries, admin_id_col='district_id'
)
print("Operation 1: Spatial Join")
print(buildings_joined[['building_id', 'district_id']].head(10))

In [None]:
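# OPERATION 2: Buffer Analysis - create road buffers
# NOTE: the data is in EPSG:4326, so buffer_distance is in degrees
# (0.01 degrees is roughly 1.1 km near Sri Lanka)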
road_buffers = vector_analysis.create_road_buffers(
    roads, buffer_distance=0.01, road_types=['primary']
)
print("\nOperation 2: Buffer Analysis")
print(f"Created {len(road_buffers)} road buffers")

# plot
fig, ax = plt.subplots(figsize=(10, 8))
admin_boundaries.plot(ax=ax, alpha=0.3, edgecolor='black')
road_buffers.plot(ax=ax, alpha=0.5, color='yellow')
buildings.plot(ax=ax, color='red')  # footprints are polygons, so markersize does not apply
ax.set_title('Road Buffers and Buildings')
plt.show()

In [None]:
# OPERATION 3: Density Calculation
admin_with_density = vector_analysis.calculate_building_density(
    buildings, admin_boundaries, admin_id_col='district_id'
)
print("\nOperation 3: Building Density Calculation")
print(admin_with_density[['district_name', 'building_count', 'building_density']])

## 5. Xarray Data Cubes

Demonstrate multi-dimensional data handling.
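
To make explicit what a month-wise `groupby` computes, the sketch below reproduces the monthly mean with plain NumPy on a small synthetic cube. It is an illustrative equivalent that assumes a `(time, latitude, longitude)` layout with daily steps starting on 2020-01-01, not a replacement for the xarray call.

```python
import numpy as np

# 365 daily time stamps starting 2020-01-01 (2020 is a leap year, so the
# series ends on 2020-12-30).
dates = np.datetime64("2020-01-01") + np.arange(365)
months = dates.astype("datetime64[M]").astype(int) % 12 + 1  # 1 = January

rng = np.random.default_rng(42)
cube = rng.exponential(scale=20, size=(365, 5, 5))  # (time, lat, lon)

# One (lat, lon) slice per calendar month, averaged over that month's days.
monthly_mean = np.stack(
    [cube[months == m].mean(axis=0) for m in range(1, 13)]
)
print(monthly_mean.shape)  # (12, 5, 5)
```

The xarray version does the same bookkeeping from the `time` coordinate and keeps the latitude and longitude labels attached to the result, which is the main reason to prefer it for real data.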

In [None]:
# create sample xarray dataset (simulating CHIRPS)
times = pd.date_range('2020-01-01', periods=365, freq='D')
lats = np.linspace(6.5, 7.2, 50)
lons = np.linspace(79.7, 80.1, 50)

# create rainfall data cube
np.random.seed(42)
rainfall_data = np.random.exponential(scale=20, size=(365, 50, 50))

rainfall_cube = xr.DataArray(
    data=rainfall_data,
    dims=['time', 'latitude', 'longitude'],
    coords={
        'time': times,
        'latitude': lats,
        'longitude': lons
    },
    name='precipitation'
)

print("Data Cube Information:")
print(rainfall_cube)

In [None]:
# temporal operations

# annual maximum
annual_max = rainfall_cube.groupby('time.year').max(dim='time')
print(f"Annual maximum shape: {annual_max.shape}")

# monthly mean
monthly_mean = rainfall_cube.groupby('time.month').mean(dim='time')
print(f"Monthly mean shape: {monthly_mean.shape}")

In [None]:
# temporal slicing - extract monsoon season
monsoon = rainfall_cube.sel(time=slice('2020-05-01', '2020-09-30'))
print(f"Monsoon period: {monsoon.time.min().values} to {monsoon.time.max().values}")
print(f"Monsoon days: {len(monsoon.time)}")
print(f"Max monsoon rainfall: {monsoon.max().values:.2f} mm")

In [None]:
# temporal aggregation: total rainfall per pixel over the year
total_rainfall = rainfall_cube.sum(dim='time')

plt.figure(figsize=(8, 6))
total_rainfall.plot(cmap='Blues')
plt.title('Total Annual Rainfall (mm)')
plt.show()

## 6. Raster-Vector Integration

The core requirement: bidirectional raster-vector operations, namely zonal statistics (raster to vector) and rasterization (vector to raster).
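
For intuition, zonal statistics over an axis-aligned rectangular zone can be written in a few lines of NumPy: locate the pixel centres, keep those inside the zone, and summarise them. The sketch below assumes a north-up raster whose first row is the northern edge. `pixel_centres` and `box_zonal_stats` are hypothetical helpers for illustration only. Real district polygons need a polygon-aware routine.

```python
import numpy as np


def pixel_centres(bounds, shape):
    """Return (xs, ys) pixel-centre coordinates for a north-up raster."""
    west, south, east, north = bounds
    height, width = shape
    xres = (east - west) / width
    yres = (north - south) / height
    xs = west + (np.arange(width) + 0.5) * xres
    ys = north - (np.arange(height) + 0.5) * yres  # row 0 is the northern edge
    return xs, ys


def box_zonal_stats(raster, bounds, zone):
    """Mean, min and max of pixels whose centres fall inside a box zone."""
    xs, ys = pixel_centres(bounds, raster.shape)
    zone_west, zone_south, zone_east, zone_north = zone
    col_mask = (xs >= zone_west) & (xs <= zone_east)
    row_mask = (ys >= zone_south) & (ys <= zone_north)
    values = raster[np.ix_(row_mask, col_mask)]
    if values.size == 0:
        return {"mean": float("nan"), "min": float("nan"), "max": float("nan")}
    return {
        "mean": float(values.mean()),
        "min": float(values.min()),
        "max": float(values.max()),
    }


# 4 x 4 toy raster covering x in [0, 4], y in [0, 4]
toy = np.arange(16, dtype=float).reshape(4, 4)
stats = box_zonal_stats(toy, bounds=(0, 0, 4, 4), zone=(0, 2, 2, 4))
print(stats)  # {'mean': 2.5, 'min': 0.0, 'max': 5.0}
```

The zone covers the top-left 2 x 2 block (values 0, 1, 4, 5), which gives the mean of 2.5. Selecting by pixel centre is one common convention; other tools may include every pixel the zone touches, so results near zone edges can differ.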

In [None]:
# for demonstration, save sample raster to file
import rasterio
from rasterio.transform import from_bounds

# save the 95th-percentile rainfall grid as a GeoTIFF
output_dir = Path('../outputs')
output_dir.mkdir(exist_ok=True)

rainfall_sample = p95_rainfall
raster_path = output_dir / 'sample_rainfall.tif'

# derive the grid size from the array so the transform always matches the data
height, width = rainfall_sample.shape
bounds = (79.7, 6.5, 80.1, 7.2)  # (west, south, east, north)
transform = from_bounds(*bounds, width, height)

with rasterio.open(
    raster_path, 'w',
    driver='GTiff',
    height=height, width=width,
    count=1, dtype='float32',
    crs='EPSG:4326',
    transform=transform
) as dst:
    dst.write(rainfall_sample.astype('float32'), 1)

print(f"Saved sample raster to: {raster_path}")

In [None]:
# RASTER -> VECTOR: Zonal Statistics
admin_with_rainfall = integration.extract_zonal_statistics(
    admin_boundaries,
    raster_path,
    stats=['mean', 'max', 'min'],
    prefix='rainfall_'
)

print("Zonal Statistics (Raster -> Vector):")
print(admin_with_rainfall[['district_name', 'rainfall_mean', 'rainfall_max']])

In [None]:
# VECTOR -> RASTER: Rasterize building density
# recompute building density so this cell can run on its own
admin_with_density = vector_analysis.calculate_building_density(
    buildings, admin_boundaries, admin_id_col='district_id'
)

# rasterize
density_raster = integration.rasterize_vector(
    admin_with_density,
    value_column='building_density',
    resolution=(-0.01, 0.01)
)

print("Rasterized Building Density (Vector -> Raster):")
print(f"Shape: {density_raster.shape}")

plt.figure(figsize=(8, 6))
density_raster.plot(cmap='Reds')
plt.title('Building Density Raster')
plt.show()

## 7. Visualization

Create maps and charts for the final output.

In [None]:
# add placeholder vulnerability scores for visualization
# (random values; in practice these would come from the full integration pipeline)
np.random.seed(42)  # keep the placeholder scores reproducible
admin_with_density['vulnerability_score'] = np.random.uniform(0.3, 0.9, len(admin_with_density))
admin_with_density['id'] = admin_with_density['district_id']

In [None]:
# create interactive vulnerability map
vuln_map = visualization.create_vulnerability_map(
    admin_with_density,
    value_column='vulnerability_score',
    title='Flood Vulnerability Score'
)

# display in notebook
vuln_map

In [None]:
# save map to HTML
map_path = output_dir / 'vulnerability_map.html'
vuln_map.save(map_path)
print(f"Map saved to: {map_path}")

In [None]:
# create ranking chart
ranking_chart = visualization.create_vulnerability_ranking_chart(
    admin_with_density,
    name_column='district_name',
    value_column='vulnerability_score',
    top_n=10,
    title='Top Vulnerable Districts'
)

ranking_chart.show()

In [None]:
# create static map for report
fig = visualization.create_static_map(
    admin_with_density,
    value_column='vulnerability_score',
    title='Flood Vulnerability Assessment - Sri Lanka'
)

# save for report
fig.savefig(output_dir / 'vulnerability_map.png', dpi=150, bbox_inches='tight')
print(f"Figure saved to: {output_dir / 'vulnerability_map.png'}")

## Summary

This notebook demonstrated all required technical components:

| Component | Implementation |
|-----------|----------------|
| NumPy Arrays | Masking, event counting, percentile calculation, normalization |
| PyTorch Tensors | Gaussian convolution with GPU awareness, performance comparison |
| Vector Processing | Spatial join, buffer analysis, density calculation (3+ operations) |
| Xarray Data Cubes | Temporal slicing, aggregation, groupby operations |
| Raster-Vector Integration | Zonal statistics (R→V), rasterization (V→R) |