# Amazon Rainforest Moisture Recycling Cascade

**Author:** Jason Holt  
**Date:** December 2025  
**Research Question:** How does moisture recycling create cascade dynamics in the Amazon?

---

## Background

The Amazon rainforest generates a large share of its own rainfall, roughly 50% by commonly cited estimates, through evapotranspiration and moisture recycling. When a forest cell tips (through deforestation or dieback), it supplies less moisture to downwind cells, which can push them below their own rainfall threshold and trigger a cascade.
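
The cascade mechanism can be illustrated with a minimal discrete sketch. It is not the PyCascades model, which evolves continuous tipping elements. The function `toy_cascade`, the three-cell chain, and all rainfall numbers are hypothetical, chosen only to show how losing one upwind cell can tip its downwind neighbors.

```python
def toy_cascade(base_rain, recycled, r_crit, initially_tipped=()):
    """Return a list of booleans: True where a cell ends up tipped.

    base_rain[i]: rainfall cell i receives from outside the forest (mm/year)
    recycled[j][i]: rainfall cell j supplies to cell i while j is forested
    """
    n = len(base_rain)
    tipped = [i in initially_tipped for i in range(n)]
    changed = True
    while changed:  # repeat until no further cell tips
        changed = False
        for i in range(n):
            if tipped[i]:
                continue
            # Only cells that are still forested contribute recycled moisture
            rain = base_rain[i] + sum(
                recycled[j][i] for j in range(n) if not tipped[j]
            )
            if rain < r_crit:
                tipped[i] = True
                changed = True
    return tipped


# Hypothetical chain: wind carries moisture from cell 0 to cell 1 to cell 2.
base_rain = [2000, 1200, 1100]
recycled = [
    [0, 500, 0],
    [0, 0, 500],
    [0, 0, 0],
]

intact = toy_cascade(base_rain, recycled, r_crit=1500)
after_clearing = toy_cascade(base_rain, recycled, r_crit=1500, initially_tipped={0})
print(intact)
print(after_clearing)
```

With the forest intact, every cell stays above the threshold. Clearing cell 0 removes the moisture that kept cell 1 above 1500 mm/year, and the loss of cell 1 in turn tips cell 2.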

This is a **spatial tipping cascade**: unlike the 4-element Earth system model, it couples hundreds of forest cells.

### Key Concept: Critical Rainfall (r_crit)

Each forest cell has a critical annual rainfall threshold, r_crit. If a cell's rainfall drops below r_crit, its forest transitions to savanna. Typical values are 1000-1700 mm/year.
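
One way to picture a threshold like r_crit is a bistable element with a fold point. The sketch below integrates the standard cusp normal form `dx/dt = -x**3 + x + c`, where `x < 0` stands for forest and `x > 0` for the tipped state, in line with the `-1.0` initial state and the `> 0` test used in this notebook. The forcing `c` is only a stand-in for rainfall deficit. How `pc.amazon.generate_network` maps rainfall and r_crit onto its elements is not shown here, so the mapping, the helper name `equilibrate_cusp`, and the step sizes are all assumptions.

```python
import math

# Fold point of dx/dt = -x**3 + x + c: the forest state vanishes for c > C_CRIT
C_CRIT = math.sqrt(4.0 / 27.0)


def equilibrate_cusp(c, x0=-1.0, dt=0.01, steps=20000):
    """Integrate the cusp normal form with explicit Euler steps."""
    x = x0
    for _ in range(steps):
        x += dt * (-x**3 + x + c)
    return x


below = equilibrate_cusp(0.2)   # forcing below the fold: stays on the forest branch
above = equilibrate_cusp(0.5)   # forcing above the fold: jumps to the tipped branch
print(f"critical forcing: {C_CRIT:.4f}")
print(f"c = 0.2 -> x = {below:.3f} (forest if negative)")
print(f"c = 0.5 -> x = {above:.3f} (tipped if positive)")
```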

### Data

This notebook uses the test data shipped with PyCascades: monthly moisture transport for 2003 at 1° resolution.

In [None]:
import sys
import numpy as np
import networkx as nx
import glob
import pycascades as pc
import matplotlib.pyplot as plt
import matplotlib.colors as colors
from netCDF4 import Dataset

# Optional cartopy for maps (if available)
try:
    import cartopy.crs as ccrs
    import cartopy.feature as cfeature
    HAS_CARTOPY = True
except ImportError:
    HAS_CARTOPY = False
    print("Cartopy not available - using simple plots")

print(f"PyCascades version: {getattr(pc, '__version__', 'unknown')}")

In [None]:
# Connect to Dask cluster for parallel computation
from dask.distributed import Client, as_completed
import time

try:
    client = Client('cascades-dask-scheduler:8786', timeout='10s')
    print(f"✓ Connected to Dask cluster")
    print(f"  Dashboard: http://localhost:30787")
    print(f"  Workers: {len(client.scheduler_info()['workers'])}")
    print(f"  Total cores: {sum(w['nthreads'] for w in client.scheduler_info()['workers'].values())}")
    DASK_AVAILABLE = True
except Exception as e:
    print(f"⚠ Dask not available: {e}")
    print("  Falling back to sequential execution")
    DASK_AVAILABLE = False

## 1. Load and Explore the Data

In [None]:
# Path to test data (check possible locations)
import os
possible_paths = [
    "/opt/research-local/external/pycascades/amazon_rainforest/data/",
    "/opt/research/pycascades/amazon_rainforest/data/",
    "/workspace/external/pycascades/amazon_rainforest/data/"
]
data_path = None
for p in possible_paths:
    if os.path.exists(p):
        data_path = p
        break
if data_path is None:
    raise FileNotFoundError(f"Amazon data not found in: {possible_paths}")
print(f"Using data path: {data_path}")

year = "2003"

# Load all monthly files
data_files = sorted(glob.glob(f"{data_path}*{year}*.nc"))
print(f"Found {len(data_files)} monthly data files:")
for f in data_files:
    print(f"  {os.path.basename(f)}")

In [None]:
# Explore the structure of one data file
sample_data = Dataset(data_files[0])

print("Variables in dataset:")
for var in sample_data.variables:
    shape = sample_data.variables[var].shape
    print(f"  {var}: shape={shape}")

# Get coordinates
lon = sample_data.variables['lon'][:]
lat = sample_data.variables['lat'][:]
rain = sample_data.variables['rain'][:]
network = sample_data.variables['network'][:]

print(f"\nSpatial extent:")
print(f"  Longitude: {lon.min():.1f} to {lon.max():.1f}")
print(f"  Latitude: {lat.min():.1f} to {lat.max():.1f}")
print(f"  Number of cells: {len(lon)}")
print(f"\nRainfall range: {np.nanmin(rain):.0f} to {np.nanmax(rain):.0f} mm")
print(f"Network matrix shape: {network.shape}")
sample_data.close()  # the arrays above were already read into memory

In [None]:
# Compute annual rainfall (sum of monthly)
annual_rain = np.zeros(len(lon))
for f in data_files:
    # Close each file after reading to avoid leaking file handles
    with Dataset(f) as ds:
        annual_rain += ds.variables['rain'][:]

print(f"Annual rainfall statistics:")
print(f"  Min: {np.nanmin(annual_rain):.0f} mm/year")
print(f"  Max: {np.nanmax(annual_rain):.0f} mm/year")
print(f"  Mean: {np.nanmean(annual_rain):.0f} mm/year")
print(f"  Std: {np.nanstd(annual_rain):.0f} mm/year")

In [None]:
# Visualize annual rainfall
fig, ax = plt.subplots(figsize=(10, 8))

scatter = ax.scatter(lon, lat, c=annual_rain, cmap='YlGnBu', s=50, edgecolors='none')
cbar = plt.colorbar(scatter, label='Annual Rainfall [mm/year]')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title('Amazon Basin Annual Rainfall (2003)')
ax.set_aspect('equal')
plt.tight_layout()
plt.show()

# Histogram
fig, ax = plt.subplots(figsize=(10, 4))
ax.hist(annual_rain, bins=30, edgecolor='black', alpha=0.7)
ax.axvline(x=1500, color='r', linestyle='--', label='r_crit = 1500 mm')
ax.axvline(x=1700, color='orange', linestyle='--', label='r_crit = 1700 mm')
ax.set_xlabel('Annual Rainfall [mm/year]')
ax.set_ylabel('Number of cells')
ax.set_title('Distribution of Amazon Cell Rainfall')
ax.legend()
plt.tight_layout()
plt.show()

## 2. Build the Moisture Recycling Network

In [None]:
# Critical rainfall threshold
r_crit = 1700  # mm/year - cells below this will tip

print(f"Building Amazon network with r_crit = {r_crit} mm/year...")
print(f"Cells currently below threshold: {np.sum(annual_rain < r_crit)} / {len(annual_rain)}")

# Generate network using PyCascades
no_cpl_dummy = False  # True = no coupling (control), False = with coupling
net = pc.amazon.generate_network(r_crit, data_files, no_cpl_dummy)

In [None]:
# Network statistics
n_nodes = net.number_of_nodes()
n_edges = net.number_of_edges()
avg_degree = n_edges / n_nodes  # directed graph: mean in-degree = mean out-degree
avg_clustering = nx.average_clustering(net)

print("Network Statistics:")
print(f"  Nodes (forest cells): {n_nodes}")
print(f"  Edges (moisture links): {n_edges}")
print(f"  Average in-/out-degree: {avg_degree:.2f}")
print(f"  Average clustering: {avg_clustering:.4f}")

# Degree distribution
degrees = [d for n, d in net.degree()]
in_degrees = [d for n, d in net.in_degree()]
out_degrees = [d for n, d in net.out_degree()]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].hist(degrees, bins=30, edgecolor='black', alpha=0.7)
axes[0].set_xlabel('Total Degree')
axes[0].set_ylabel('Count')
axes[0].set_title('Degree Distribution')

axes[1].hist(in_degrees, bins=30, edgecolor='black', alpha=0.7, color='green')
axes[1].set_xlabel('In-Degree (receives moisture)')
axes[1].set_title('In-Degree Distribution')

axes[2].hist(out_degrees, bins=30, edgecolor='black', alpha=0.7, color='orange')
axes[2].set_xlabel('Out-Degree (sends moisture)')
axes[2].set_title('Out-Degree Distribution')

plt.tight_layout()
plt.show()

## 3. Run Cascade Simulation

In [None]:
def run_cascade(net, initial_state=None):
    """
    Run cascade simulation until equilibrium.
    Returns: (convergence_time, num_tipped, tip_states, times, states)
    """
    n_nodes = net.number_of_nodes()
    
    if initial_state is None:
        initial_state = np.full(n_nodes, -1.0)  # All cells start untipped
    
    ev = pc.evolve(net, initial_state)
    
    tolerance = 0.01
    t_step = 1
    realtime_break = 300  # seconds timeout
    
    # Check initial state
    if not ev.is_equilibrium(tolerance):
        print("Note: Initial state is not equilibrium - will evolve")
    
    # Evolve to equilibrium
    try:
        ev.equilibrate(tolerance, t_step, realtime_break)
    except Exception as e:
        print(f"Warning: {e}")
    
    times, states = ev.get_timeseries()
    final_state = states[-1]
    
    conv_time = times[-1] - times[0]
    num_tipped = net.get_number_tipped(final_state)
    tip_states = net.get_tip_states(final_state)
    
    return conv_time, num_tipped, tip_states, times, states

print("Running cascade simulation (this may take a minute)...")
conv_time, num_tipped, tip_states, times, states = run_cascade(net)

print(f"\nResults:")
print(f"  Convergence time: {conv_time:.2f} time units")
print(f"  Cells tipped: {num_tipped} / {net.number_of_nodes()} ({100*num_tipped/net.number_of_nodes():.1f}%)")

In [None]:
# Visualize cascade result
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Map of tipped vs untipped cells
ax = axes[0]
colors_map = ['green' if not t else 'brown' for t in tip_states]
ax.scatter(lon, lat, c=colors_map, s=50, edgecolors='black', linewidths=0.5)
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title(f'Amazon Cascade Result (r_crit={r_crit} mm)\nGreen=Forest, Brown=Tipped')
ax.set_aspect('equal')

# Time evolution of cascade
ax = axes[1]
n_tipped_over_time = [np.sum(states[i] > 0) for i in range(len(times))]
ax.plot(times, n_tipped_over_time, 'b-', linewidth=2)
ax.set_xlabel('Time')
ax.set_ylabel('Number of Tipped Cells')
ax.set_title('Cascade Propagation Over Time')
ax.axhline(y=num_tipped, color='r', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()

## 4. Critical Rainfall Sensitivity Analysis

How does the cascade size depend on r_crit?
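
As a baseline for this scan, the uncoupled case can be approximated by counting the cells whose rainfall lies below each threshold. The sketch assumes that, without coupling, a cell tips exactly when its annual rainfall is below r_crit. The rainfall values are made up for illustration, and `count_direct_tips` is a hypothetical helper, not part of PyCascades.

```python
def count_direct_tips(annual_rainfall, r_crit):
    """Count cells whose annual rainfall is strictly below r_crit."""
    return sum(1 for r in annual_rainfall if r < r_crit)


# Made-up annual rainfall for seven cells (mm/year)
toy_rain = [900, 1300, 1450, 1600, 1750, 2100, 2400]

# Same threshold grid as np.arange(1200, 1900, 100)
toy_thresholds = range(1200, 1900, 100)

baseline = {r_c: count_direct_tips(toy_rain, r_c) for r_c in toy_thresholds}
for r_c, n_below in baseline.items():
    print(f"r_crit = {r_c} mm/year: {n_below} cells below threshold")
```

Any cells that tip beyond this baseline in the coupled run are what the scan reports as cascade amplification.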

In [None]:
# Scan over different r_crit values - PARALLELIZED VERSION
r_crit_values = np.arange(1200, 1900, 100)

def run_single_r_crit(r_c, data_files_list):
    """Run cascade for a single r_crit value (with and without coupling)."""
    import pycascades as pc
    import numpy as np
    
    def run_cascade_simple(net):
        n_nodes = net.number_of_nodes()
        initial_state = np.full(n_nodes, -1.0)
        ev = pc.evolve(net, initial_state)
        try:
            ev.equilibrate(0.01, 1, 300)
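        except Exception as e:
            # Report timeouts or solver errors instead of silently ignoring them
            print(f"Warning (r_crit={r_c}): {e}")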
        times, states = ev.get_timeseries()
        final_state = states[-1]
        conv_time = times[-1] - times[0]
        num_tipped = net.get_number_tipped(final_state)
        tip_states = net.get_tip_states(final_state)
        return conv_time, num_tipped, tip_states
    
    # With coupling
    net_cpl = pc.amazon.generate_network(r_c, data_files_list, no_cpl_dummy=False)
    conv_time, num_tipped, _ = run_cascade_simple(net_cpl)
    
    # Without coupling
    net_nocpl = pc.amazon.generate_network(r_c, data_files_list, no_cpl_dummy=True)
    _, num_tipped_nocpl, _ = run_cascade_simple(net_nocpl)
    
    return {
        'r_crit': r_c,
        'tipped_with_coupling': num_tipped,
        'tipped_no_coupling': num_tipped_nocpl,
        'cascade_amplification': num_tipped - num_tipped_nocpl,
        'conv_time': conv_time
    }

print(f"Scanning {len(r_crit_values)} r_crit values...")
start_time = time.time()

if DASK_AVAILABLE and len(client.scheduler_info()['workers']) > 0:
    # PARALLEL EXECUTION
    print(f"Using Dask parallel execution ({len(client.scheduler_info()['workers'])} workers)")
    
    # Submit all jobs
    futures = []
    for r_c in r_crit_values:
        future = client.submit(run_single_r_crit, r_c, data_files)
        futures.append(future)
    
    # Gather results as they complete
    results = []
    for future in as_completed(futures):
        result = future.result()
        results.append(result)
        print(f"  r_crit = {result['r_crit']} mm/year... tipped: {result['tipped_with_coupling']} (cascade adds {result['cascade_amplification']})")
    
    # Sort by r_crit
    results = sorted(results, key=lambda x: x['r_crit'])
else:
    # SEQUENTIAL FALLBACK
    print("Using sequential execution (Dask not available)")
    results = []
    for r_c in r_crit_values:
        print(f"  r_crit = {r_c} mm/year...", end=" ")
        result = run_single_r_crit(r_c, data_files)
        results.append(result)
        print(f"tipped: {result['tipped_with_coupling']} (cascade adds {result['cascade_amplification']})")

elapsed = time.time() - start_time
print(f"\n✓ Done in {elapsed:.1f} seconds")
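
The submit-then-gather pattern of the parallel branch can be tried without a cluster. The sketch below uses the standard library's `concurrent.futures` as a stand-in for Dask (the two APIs are similar in spirit but not identical), and `fake_scan` is a hypothetical placeholder for the real simulation. It shows why the results are sorted afterwards: `as_completed` yields futures in completion order, not submission order.

```python
import concurrent.futures as cf


def fake_scan(r_c):
    """Placeholder for an expensive simulation; returns a result dict."""
    return {"r_crit": r_c, "tipped": r_c // 100}


r_values = list(range(1200, 1900, 100))

with cf.ThreadPoolExecutor(max_workers=4) as pool:
    toy_futures = [pool.submit(fake_scan, r_c) for r_c in r_values]
    # Completion order is not guaranteed to match submission order
    gathered = [fut.result() for fut in cf.as_completed(toy_futures)]

# Restore a deterministic order before plotting or tabulating
gathered.sort(key=lambda res: res["r_crit"])
print([res["r_crit"] for res in gathered])
```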

In [None]:
# Plot sensitivity results
import pandas as pd
df = pd.DataFrame(results)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Tipped cells vs r_crit
ax = axes[0]
ax.plot(df['r_crit'], df['tipped_with_coupling'], 'b-o', linewidth=2, markersize=8, label='With moisture coupling')
ax.plot(df['r_crit'], df['tipped_no_coupling'], 'r--s', linewidth=2, markersize=8, label='No coupling (control)')
ax.fill_between(df['r_crit'], df['tipped_no_coupling'], df['tipped_with_coupling'], alpha=0.3, label='Cascade effect')
ax.set_xlabel('Critical Rainfall r_crit [mm/year]')
ax.set_ylabel('Number of Tipped Cells')
ax.set_title('Cascade Size vs Critical Rainfall Threshold')
ax.legend()
ax.grid(True, alpha=0.3)

# Cascade amplification
ax = axes[1]
ax.bar(df['r_crit'], df['cascade_amplification'], width=80, color='orange', edgecolor='black')
ax.set_xlabel('Critical Rainfall r_crit [mm/year]')
ax.set_ylabel('Additional Cells Tipped by Cascade')
ax.set_title('Cascade Amplification Effect')
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Summary table
print("\nSummary Table:")
print(df.to_string(index=False))

## 5. Comparison: With vs Without Coupling

Visualize which cells tip due to cascade effects (wouldn't tip without moisture coupling).
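
The comparison reduces to a lookup on each cell's pair of outcomes (tips with coupling, tips without coupling). Below is a minimal sketch with made-up tip states. The table `CATEGORY` and the helper `categorize` are hypothetical names, and the labels follow the ones used in this section.

```python
from collections import Counter

# (tips with coupling, tips without coupling) -> category label
CATEGORY = {
    (False, False): "stable",   # never tips
    (True, True): "direct",     # tips from its own rainfall deficit
    (True, False): "cascade",   # tips only because of lost upwind moisture
    (False, True): "unknown",   # not expected; flag for inspection
}


def categorize(tip_cpl, tip_nocpl):
    """Label each cell by comparing its coupled and uncoupled outcomes."""
    return [CATEGORY[(bool(a), bool(b))] for a, b in zip(tip_cpl, tip_nocpl)]


# Made-up outcomes for five cells
toy_cpl = [False, True, True, True, False]
toy_nocpl = [False, True, False, False, True]

toy_labels = categorize(toy_cpl, toy_nocpl)
toy_counts = Counter(toy_labels)
print(toy_labels)
print(dict(toy_counts))
```

Casting to `bool` keeps the lookup working if the tip states arrive as NumPy booleans rather than Python ones.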

In [None]:
# Run both simulations at r_crit = 1500 - PARALLELIZED VERSION
r_crit_compare = 1500

def run_comparison_sim(r_c, data_files_list, with_coupling):
    """Run single comparison simulation."""
    import pycascades as pc
    import numpy as np
    
    net = pc.amazon.generate_network(r_c, data_files_list, no_cpl_dummy=(not with_coupling))
    n_nodes = net.number_of_nodes()
    initial_state = np.full(n_nodes, -1.0)
    ev = pc.evolve(net, initial_state)
    try:
        ev.equilibrate(0.01, 1, 300)
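    except Exception as e:
        # Report timeouts or solver errors instead of silently ignoring them
        print(f"Warning (r_crit={r_c}, coupling={with_coupling}): {e}")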
    times, states = ev.get_timeseries()
    final_state = states[-1]
    tip_states = net.get_tip_states(final_state)
    return tip_states

print(f"Running comparison at r_crit = {r_crit_compare} mm/year...")
start_time = time.time()

if DASK_AVAILABLE and len(client.scheduler_info()['workers']) > 0:
    # PARALLEL EXECUTION - run both simultaneously
    print("Using Dask parallel execution")
    future_cpl = client.submit(run_comparison_sim, r_crit_compare, data_files, True)
    future_nocpl = client.submit(run_comparison_sim, r_crit_compare, data_files, False)
    
    tip_cpl = future_cpl.result()
    tip_nocpl = future_nocpl.result()
else:
    # SEQUENTIAL FALLBACK
    print("Using sequential execution")
    tip_cpl = run_comparison_sim(r_crit_compare, data_files, True)
    tip_nocpl = run_comparison_sim(r_crit_compare, data_files, False)

# Categorize cells
categories = []
for i in range(len(tip_cpl)):
    if not tip_cpl[i] and not tip_nocpl[i]:
        categories.append('stable')  # Never tips
    elif tip_cpl[i] and tip_nocpl[i]:
        categories.append('direct')  # Tips due to direct rainfall deficit
    elif tip_cpl[i] and not tip_nocpl[i]:
        categories.append('cascade')  # Tips only due to cascade
    else:
        categories.append('unknown')  # Shouldn't happen (tips without coupling but not with)

# Count
from collections import Counter
counts = Counter(categories)

elapsed = time.time() - start_time
print(f"\n✓ Done in {elapsed:.1f} seconds")
print(f"\nCell categories at r_crit = {r_crit_compare}:")
print(f"  Stable (forest): {counts.get('stable', 0)}")
print(f"  Direct tipping: {counts.get('direct', 0)}")
print(f"  Cascade tipping: {counts.get('cascade', 0)}")
if counts.get('unknown', 0) > 0:
    print(f"  Anomalous: {counts.get('unknown', 0)}")

In [None]:
# Visualize the three categories
color_map = {'stable': 'green', 'direct': 'brown', 'cascade': 'red', 'unknown': 'gray'}
colors_list = [color_map[c] for c in categories]

fig, ax = plt.subplots(figsize=(12, 10))
scatter = ax.scatter(lon, lat, c=colors_list, s=60, edgecolors='black', linewidths=0.5)

# Legend
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='green', edgecolor='black', label=f'Stable forest ({counts.get("stable", 0)})'),
    Patch(facecolor='brown', edgecolor='black', label=f'Direct tipping ({counts.get("direct", 0)})'),
    Patch(facecolor='red', edgecolor='black', label=f'Cascade tipping ({counts.get("cascade", 0)})'),
]
# Add unknown to legend if present
if counts.get('unknown', 0) > 0:
    legend_elements.append(
        Patch(facecolor='gray', edgecolor='black', label=f'Anomalous ({counts.get("unknown", 0)})')
    )
    print(f"⚠️ Found {counts['unknown']} 'unknown' cells - tips without coupling but not with coupling")
    print("   This may indicate numerical edge cases near the tipping threshold")

ax.legend(handles=legend_elements, loc='upper left', fontsize=12)

ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title(f'Amazon Tipping Mechanisms (r_crit = {r_crit_compare} mm/year)\nRed cells tip ONLY due to moisture cascade')
ax.set_aspect('equal')
plt.tight_layout()
plt.show()

## 6. Key Findings & Conclusions

*Record observations after running:*

### Observations

1. **Network structure:**
   - 

2. **Cascade behavior:**
   - 

3. **Critical threshold sensitivity:**
   - 

4. **Cascade amplification:**
   - 

### Comparison with 4-Element Earth System Model

| Aspect | Earth System (4 elements) | Amazon (N cells) |
|--------|---------------------------|------------------|
| Network size | 4 nodes | Hundreds of nodes |
| Timescales | 1-98 years | Similar (fast) |
| Coupling type | Abstract probabilities | Physical moisture transport |
| Noise | Lévy/Gaussian | Deterministic |
| Cascade driver | External GMT forcing | Internal rainfall dynamics |

---

*This notebook completes Phase 2 exploration of cascade dynamics in PyCascades.*