# Transport Case Study: Group 0 (Demo)

**Scenario**: Industrial Trichloroethylene (TCE) Spill

**Group Members**: 
- TODO: Add your names

**Date**: TODO

---

## 1. Overview and Learning Objectives

### Problem Statement
A 30-day trichloroethylene (TCE) spill from an industrial facility has contaminated the groundwater in the Limmat Valley. Your task is to:
1. Model the transport of TCE through the aquifer over a 2-year period
2. Analyze how the existing well field (pumping and injection wells) affects plume migration
3. Assess whether contamination reaches the Limmat River or compliance monitoring points
4. **(Optional)** Verify your numerical model against analytical solutions

### Learning Objectives
By completing this case study, you will:
- Set up a coupled MODFLOW-MT3D transport model
- Apply the telescope approach to refine resolution around source and wells
- Define contaminant source terms using the SSM package
- Analyze well-contaminant interactions (capture zones, spreading)
- **(Optional)** Verify numerical results with analytical solutions
- Communicate findings in a professional modeling report

### Deliverables
1. This completed notebook with all code executed and results displayed
2. Completed `case_config_transport.yaml` with justified parameter choices
3. Professional report (3-4 pages PDF) summarizing methods, results, and conclusions
4. **(Optional)** Analytical comparison section with plots and discussion

### Key Questions to Answer
- Will the pumping wells capture the TCE plume before it reaches the Limmat River?
- What is the maximum extent of the contamination (area where C > 5 mg/L)?
- When will contamination reach monitoring locations?
- How do injection wells/Sickergalerie affect plume spreading?
- **(Optional)** How well does a 1D analytical solution predict plume behavior compared to the full 2D/3D numerical model?

---
## 2. Workflow Summary

### Transport Case Study Workflow

This case study follows a simpler workflow than the flow case study (no scenario variations):

```
1. Load fresh base parent model (independent from your flow case results)
   ↓
2. Load well locations from flow case study (case_config.yaml)
   ↓
3. Define transport submodel domain (around wells and source area)
   ↓
4. Create refined grid for submodel (5m cells for better resolution)
   ↓
5. Extract boundary conditions from parent model
   ↓
6. Set up MODFLOW submodel with wells (steady-state flow)
   ↓
7. Run flow model and verify convergence
   ↓
8. Set up MT3D transport model (transient concentrations)
   ↓
9. Define contaminant source term (SSM package)
   ↓
10. Run 2-year transport simulation
    ↓
11. Post-process: concentration maps, breakthrough curves, mass balance
    ↓
12. Quality checks: physical reasonableness, mass balance
    ↓
13. OPTIONAL: Analytical comparison 
    ↓
14. Interpret results and write professional report
```

### Key Concept: Steady Flow + Transient Transport

We assume **steady-state flow** (heads don't change with time) but **transient transport** (concentrations evolve over time). This is the standard approach for long-term contamination problems because:
- Groundwater flow reaches equilibrium quickly (days to weeks)
- Contaminant transport is much slower (months to years)
- Allows us to focus on transport processes without rerunning flow at each time step

### Differences from Flow Case Study

| Aspect | Flow Case Study | Transport Case Study |
|--------|----------------|---------------------|
| Starting model | Base parent model | Same fresh base parent model |
| Wells | Student implements from concession | **Reuse from case_config.yaml** |
| Scenarios | 3 stages + parameter variations | **Single run: wells + transport** |
| Complexity | 3-stage workflow | **Simpler: 1-stage** |
| Focus | Flow system response | **Contaminant fate and transport** |
| Analysis | Drawdown, river leakage | **Plume migration, breakthrough** |
| Time dimension | Steady-state | **Steady flow + transient transport** |
| Analytical check | Not applicable | **Optional** |

### Time Estimate
- Setup and configuration: 2-3 hours
- Model execution and debugging: 2-3 hours
- Analysis and visualization: 2-3 hours
- **(Optional) Analytical comparison: 0.5-1 hour** 
- Report writing: 2-3 hours
- **Total: ~8-12 hours** (or ~8.5-13 hours with the optional section)

---
## 3. Configuration and Setup

### Import Libraries

Import all necessary Python libraries for modeling, analysis, and visualization.

In [None]:
# Import required libraries
import sys
import os
import numpy as np
import pickle
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import matplotlib.patches as mpatches
from shapely.geometry import Point, Polygon
from shapely.affinity import rotate
import flopy
from flopy.discretization import StructuredGrid

# print current working directory
print("Current working directory: ", os.getcwd())

# Add the support repo to the path
sys.path.append(os.path.abspath('../../../SUPPORT_REPO/src'))
sys.path.append(os.path.abspath('../../../SUPPORT_REPO/src/scripts/scripts_exercises'))

# Import local modules
import case_utils 
from data_utils import download_named_file, get_default_data_folder
import grid_utils
from print_images import display_image
import plot_utils

### Load Configuration

Load the flow case configuration from `case_config.yaml`, then merge the transport scenario configuration from `case_config_transport.yaml` into it.
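The cell below combines both files with `dict.update`, which is a shallow merge: a top-level key present in both files is taken entirely from the transport file. The sketch below illustrates this behavior, assuming `case_utils.load_yaml` returns plain dictionaries. The keys and values are made up for illustration and are not the contents of the course YAML files.

```python
# Illustrative dictionaries only; the real YAML files contain other keys.
flow_cfg = {
    "group": {"number": 0},
    "model": {"workspace": "~/flow_ws", "namefile": "flow.nam"},
}
transport_cfg = {
    "model": {"workspace": "~/transport_ws"},
    "transport_scenarios": {"options": []},
}

merged = dict(flow_cfg)       # copy so flow_cfg itself is not modified
merged.update(transport_cfg)  # shallow merge: top-level keys are replaced

print(merged["model"])                # {'workspace': '~/transport_ws'}
print("namefile" in merged["model"])  # False: nested flow keys are dropped
print(merged["group"])                # {'number': 0}
```

If a nested key from the flow configuration is still needed after the merge, read it before calling `update` or keep a separate reference to the flow configuration.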

In [None]:
# Load flow case study configuration
CASE_YAML = 'case_config.yaml'
cfg = case_utils.load_yaml(CASE_YAML)

# Get group configuration
group_number = cfg['group'].get('number', 0)
if not isinstance(group_number, int) or group_number < 0 or group_number > 8:
    raise ValueError("Group number must be an integer between 0 and 8.")

print(f"Group number: {group_number}")

# Load and merge the transport configuration
TRANSPORT_YAML = 'case_config_transport.yaml'
cfg_transport = case_utils.load_yaml(TRANSPORT_YAML)

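# Merge transport config into cfg (transport-specific keys added to main config).
# dict.update is a shallow merge: top-level keys that exist in both files
# (e.g. 'model') are replaced entirely by the transport version.
# This keeps both configs accessible from a single object.
cfg.update(cfg_transport)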

print(f"Configuration loaded successfully. Available sections: {list(cfg.keys())}")

---
## 4. Load Parent Flow Model

### Download and Load Base Model

Load the fresh base parent model (independent from your flow case study results). This ensures a known-good starting point for transport modeling.

**Important**: We use the baseline model, not your modified flow case study model, to:
- Avoid error propagation from flow modeling
- Start from a verified, converged flow field
- Simplify the workflow

In [None]:
# Download parent base model and save it to the transport workspace
# To make sure not to carry over any unintended changes from the flow case study, 
# we download a fresh copy of the base model specified in the transport configuration.

# After cfg.update(cfg_transport), cfg['model'] now contains the transport model config
# with workspace pointing to the transport subdirectory
parent_base_model_name = cfg['model']['data_name']
parent_workspace = os.path.expanduser(cfg['model']['workspace'])

# Download to the transport-specific directory
parent_base_model_path = download_named_file(
    parent_base_model_name, 
    dest_folder=parent_workspace,
    data_type=None,  # Don't append additional subdirectory
)

# Handle zip file extraction if needed
if parent_base_model_path.endswith('.zip'):
    import zipfile
    extract_path = os.path.dirname(parent_base_model_path)
    with zipfile.ZipFile(parent_base_model_path, 'r') as zip_ref:
        zip_ref.extractall(extract_path)
    parent_base_model_path = os.path.join(extract_path, cfg['model']['namefile'])

print(f'Downloaded the parent base model to: {parent_base_model_path}')
print(f'Model workspace: {parent_workspace}')

### Verify Parent Model

Load the parent model, run it if no heads file exists yet, and check that it produces a reasonable flow field.

In [None]:
# ----- Load model results ----- #
parent_base_namefile = os.path.basename(parent_base_model_path)
m_parent_base = flopy.modflow.Modflow.load(
    parent_base_namefile, 
    model_ws=parent_workspace, 
    check=False, 
    forgive=False, 
    exe_name='mfnwt'
)
# Check if heads file exists, if not run the model
parent_hds_path = os.path.join(parent_workspace, f"{m_parent_base.name}.hds")
if not os.path.exists(parent_hds_path):
    print("Parent model heads file not found. Running parent model...")
    success, buff = m_parent_base.run_model(silent=True, report=True)
    if not success:
        raise RuntimeError("Parent model failed to run")
    print("✓ Parent model run completed")

# Load and visualize groundwater heads
headobj = flopy.utils.HeadFile(parent_hds_path)
print(f'Heads loaded from {parent_hds_path}')
heads = headobj.get_data()[0]  # Layer 0 of the last saved time step (get_data default)


# ----- Create visualization ----- #
fig, ax = plt.subplots(figsize=(16, 12))

# Plot model with heads
pmv = flopy.plot.PlotMapView(model=m_parent_base, ax=ax)

# Plot head distribution as colored background
heads_masked = np.ma.masked_where(m_parent_base.bas6.ibound.array[0] <= 0, heads)
im = pmv.plot_array(heads_masked, alpha=0.6, cmap='Blues')

# Add head contours
contour_levels = np.linspace(np.nanmin(heads_masked), np.nanmax(heads_masked), 15)
cont = pmv.contour_array(heads_masked, levels=contour_levels, colors='black', 
                        linewidths=1.5, linestyles='-')
ax.clabel(cont, inline=True, fontsize=9, fmt='%.1f m')

# Plot model grid (light)
pmv.plot_grid(color='gray', alpha=0.3, linewidth=0.5)

# Add colorbar
cbar = plt.colorbar(im, ax=ax, shrink=0.3, pad=0.02)
cbar.set_label('Hydraulic Head (m a.s.l.)', fontsize=12)

# Formatting
ax.set_title(f'Parent Base Model - Groundwater Flow Field\nGroup {group_number} - Hydraulic Heads', 
             fontsize=14, fontweight='bold')
ax.set_xlabel('X Coordinate (m)', fontsize=12)
ax.set_ylabel('Y Coordinate (m)', fontsize=12)
ax.set_aspect('equal')

# Add text box with model info
info_text = f'Model: {m_parent_base.name}\nGrid: {m_parent_base.nrow}×{m_parent_base.ncol}\nCell size: {m_parent_base.dis.delr[0]:.0f}m'
ax.text(0.02, 0.98, info_text, transform=ax.transAxes, fontsize=10,
        verticalalignment='top', bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.tight_layout()
plt.show()

---
## 5. Load Well Data from Flow Case Study

### Load Wells from case_config.yaml

Load the well locations and pumping/injection rates from your flow case study. Wells are **reused**, not reimplemented.

**Key Tasks:**
1. Read well data from flow case `case_config.yaml` or concession CSV
2. Identify which wells are **pumping** (negative Q) vs **injection** (positive Q)
3. Map well locations to parent model grid
4. Visualize well locations relative to planned source location
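Task 2 can be sketched as follows. The well records, names, and rates are hypothetical placeholders. In practice the rates come from your flow case configuration or the concession data.

```python
# Hypothetical well records; names and rates are illustrative only.
wells = [
    {"name": "W1", "q_m3_per_day": -1200.0},  # extraction
    {"name": "W2", "q_m3_per_day": -800.0},   # extraction
    {"name": "W3", "q_m3_per_day": 1500.0},   # injection
]

# MODFLOW WEL convention: negative Q removes water, positive Q adds water.
pumping = [w for w in wells if w["q_m3_per_day"] < 0]
injection = [w for w in wells if w["q_m3_per_day"] > 0]

# Net flux of the well field (negative = net extraction).
net_q = sum(w["q_m3_per_day"] for w in wells)

print("Pumping wells:  ", [w["name"] for w in pumping])
print("Injection wells:", [w["name"] for w in injection])
print(f"Net well flux: {net_q:+.1f} m3/day")
```

Checking the net flux is a quick way to see whether the well field extracts more water than it returns to the aquifer.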

In [None]:
# TODO: Load well data from case_config.yaml
# Parse well information
# Separate pumping vs injection wells
# Map to grid coordinates

# Load the rotated model grid for visualization
modelgrid_path = os.path.join(os.path.dirname(parent_base_model_path), f"{parent_base_namefile.replace('.nam', '')}_modelgrid.pkl")
print(f"Loading model grid from: {modelgrid_path}")
with open(modelgrid_path, 'rb') as f:
    parent_modelgrid = pickle.load(f)

print(f"Parent model grid loaded")
print(f"Grid rotation: {parent_modelgrid.angrot} degrees")
print(f"Grid extent: X [{parent_modelgrid.extent[0]:.1f}, {parent_modelgrid.extent[1]:.1f}]")
print(f"             Y [{parent_modelgrid.extent[2]:.1f}, {parent_modelgrid.extent[3]:.1f}]")

# Get well locations for the specified group
scenario = case_utils.get_scenario_for_group(CASE_YAML, group_number)
concession_id = scenario.get('concession', None)
if concession_id is None:
    raise ValueError(f"Concession ID not defined for group {group_number}")

# Load and filter wells by concession
well_data_path = download_named_file(name='wells', data_type='gis')
wells_gdf = gpd.read_file(well_data_path, layer='GS_GRUNDWASSERFASSUNGEN_OGD_P')
wells_gdf = case_utils.filter_wells_by_concession(wells_gdf, concession_id)

print(f"\nWells for concession {concession_id}:")
print(wells_gdf[['GWR_ID', 'GWR_PREFIX', 'FASSART']])

# Identify source location from transport scenario
# Navigate through transport_scenarios -> options -> find id matching group_number
transport_scenarios = cfg.get('transport_scenarios', {})
scenario_options = transport_scenarios.get('options', [])

# Find the scenario matching the group number
scenario_config = None
for option in scenario_options:
    if option.get('id') == group_number:
        scenario_config = option
        break

if scenario_config is None:
    raise ValueError(f"No transport scenario found for group {group_number}")

print(f"\nTransport scenario: {scenario_config.get('title', 'Unknown')}")
print(f"Contaminant: {scenario_config.get('contaminant', 'Unknown')}")

# Extract source location (relative coordinates from config)
source_config = scenario_config.get('source', {})
source_location = source_config.get('location', {})
source_easting_relative = source_location.get('easting', None)
source_northing_relative = source_location.get('northing', None)

if source_easting_relative is None or source_northing_relative is None:
    raise ValueError(f"Source location not defined for group {group_number} in case_config_transport.yaml")

# Convert relative coordinates to absolute Swiss coordinates
# Use the first well of the concession as the reference point
reference_well = wells_gdf.iloc[0]
reference_easting = reference_well.geometry.x
reference_northing = reference_well.geometry.y

source_easting = reference_easting + source_easting_relative
source_northing = reference_northing + source_northing_relative

print(f"\nSource location:")
print(f"  Reference well: {reference_well['GWR_PREFIX']} at ({reference_easting:.1f}, {reference_northing:.1f})")
print(f"  Relative offset: ({source_easting_relative:+.1f}, {source_northing_relative:+.1f}) m")
print(f"  Absolute coordinates (Swiss LV03/95): ({source_easting:.1f}, {source_northing:.1f}) m")

# Calculate distance and direction from source to each well
wells_gdf['distance_to_source'] = np.sqrt(
    (wells_gdf.geometry.x - source_easting)**2 + 
    (wells_gdf.geometry.y - source_northing)**2
)

# Calculate bearing from source to well (degrees from North)
wells_gdf['bearing_from_source'] = np.degrees(
    np.arctan2(wells_gdf.geometry.x - source_easting, 
               wells_gdf.geometry.y - source_northing)
)

print(f"\nWell positions relative to source:")
for idx, row in wells_gdf.iterrows():
    print(f"  {row['GWR_PREFIX']} ({row['FASSART']}): "
          f"{row['distance_to_source']:.1f} m at {row['bearing_from_source']:.1f}° from N")

# Identify closest well to source
closest_well_idx = wells_gdf['distance_to_source'].idxmin()
closest_well = wells_gdf.loc[closest_well_idx]
print(f"\nClosest well to source: {closest_well['GWR_PREFIX']} "
      f"({closest_well['distance_to_source']:.1f} m away)")

# Create source point geometry for plotting
source_point = gpd.GeoDataFrame(
    {'id': ['source'], 'type': ['contamination_source']},
    geometry=[Point(source_easting, source_northing)],
    crs=wells_gdf.crs
)

### Visualize Well Field

Plot well locations on model grid to understand spatial arrangement.

In [None]:
# Visualize wells and source on parent model
case_utils.plot_wells_on_model(m_parent_base, modelgrid=parent_modelgrid, wells_gdf=wells_gdf, 
                               concession_id=concession_id, source_point=source_point)

---
## 6. Define Transport Submodel Domain

### Estimate Plume Travel Distance

Before defining the submodel domain, estimate how far the plume might travel in 2 years:

**Travel distance** = velocity × time = (K·i/n) × t

Where:
- K = hydraulic conductivity (m/day) - from parent model
- i = hydraulic gradient (m/m) - from parent model heads
- n = effective porosity - from config (typically 0.25)
- t = simulation time (2 years = 730 days)

**Rule of thumb**: Buffer should be at least 1.5× to 2× estimated travel distance to ensure plume stays within domain.
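As a quick sanity check before running the cell below, the formula can be evaluated by hand. This is a minimal sketch with assumed, illustrative values for K and i. They are not taken from the parent model, and the function name is a placeholder.

```python
def travel_distance_m(k_m_per_day, gradient, porosity, days):
    """Advective travel distance from the seepage velocity v = K*i/n."""
    seepage_velocity = k_m_per_day * gradient / porosity  # m/day
    return seepage_velocity * days

# Assumed, illustrative values (not taken from the parent model).
distance = travel_distance_m(k_m_per_day=50.0, gradient=0.002,
                             porosity=0.25, days=730)

# Rule-of-thumb buffer range: 1.5x to 2x the travel distance.
buffer_low, buffer_high = 1.5 * distance, 2.0 * distance

print(f"Travel distance: {distance:.0f} m")
print(f"Suggested buffer: {buffer_low:.0f} to {buffer_high:.0f} m")
```

With these assumed values the seepage velocity is 0.4 m/day, which gives roughly 292 m of advective travel in 2 years. The estimate ignores dispersion and the influence of the wells on the local gradient, so treat it as an order-of-magnitude check.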

In [None]:
# Extract hydraulic conductivity from parent model
lpf = m_parent_base.get_package('LPF')
if lpf is None:
    upw = m_parent_base.get_package('UPW')
    if upw is None:
        raise ValueError("No LPF or UPW package found in parent model")
    hk = upw.hk.array  # Hydraulic conductivity (m/day)
else:
    hk = lpf.hk.array  # Hydraulic conductivity (m/day)

# Get average K in layer 0 near the source location
# Use the source location to find nearby cells
layer = 0
source_row, source_col = parent_modelgrid.intersect(source_easting, source_northing)[:2]

# Sample K in a 3x3 neighborhood around source
row_start = max(0, source_row - 1)
row_end = min(hk.shape[1], source_row + 2)
col_start = max(0, source_col - 1)
col_end = min(hk.shape[2], source_col + 2)

k_local = hk[layer, row_start:row_end, col_start:col_end]
k_avg = np.mean(k_local[k_local > 0])  # Average non-zero K values

print(f"\nHydraulic Conductivity near source:")
print(f"  Layer {layer}, Row {source_row}, Col {source_col}")
print(f"  Local K range: {np.min(k_local[k_local > 0]):.2f} to {np.max(k_local):.2f} m/day")
print(f"  Average K: {k_avg:.2f} m/day")

# Calculate hydraulic gradient from head distribution
# Note: 'heads' is already 2D (layer 0 extracted in previous cell)
# Sample gradient along flow direction near source
dx = parent_modelgrid.delr[source_col]  # Cell size in x-direction
dy = parent_modelgrid.delc[source_row]  # Cell size in y-direction

# Calculate gradient using central differences
if source_col > 0 and source_col < heads.shape[1] - 1:
    dh_dx = (heads[source_row, source_col + 1] - 
             heads[source_row, source_col - 1]) / (2 * dx)
else:
    dh_dx = 0.0

if source_row > 0 and source_row < heads.shape[0] - 1:
    dh_dy = (heads[source_row + 1, source_col] - 
             heads[source_row - 1, source_col]) / (2 * dy)
else:
    dh_dy = 0.0

# Hydraulic gradient magnitude
gradient = np.sqrt(dh_dx**2 + dh_dy**2)

print(f"\nHydraulic Gradient near source:")
print(f"  dh/dx: {dh_dx:.6f}")
print(f"  dh/dy: {dh_dy:.6f}")
print(f"  Gradient magnitude: {gradient:.6f}")

# Get porosity from transport configuration
porosity = scenario_config['transport']['porosity']
print(f"\nEffective porosity: {porosity}")

# Calculate seepage velocity (Darcy velocity / porosity)
darcy_velocity = k_avg * gradient  # m/day
seepage_velocity = darcy_velocity / porosity  # m/day

print(f"\nVelocity calculations:")
print(f"  Darcy velocity (q): {darcy_velocity:.4f} m/day")
print(f"  Seepage velocity (v): {seepage_velocity:.4f} m/day ({seepage_velocity * 365:.1f} m/year)")

# Estimate 2-year travel distance
simulation_time_days = 730  # 2 years
travel_distance = seepage_velocity * simulation_time_days  # meters

print(f"\n2-year plume travel distance estimate:")
print(f"  Distance = velocity × time")
print(f"  Distance = {seepage_velocity:.4f} m/day × {simulation_time_days} days")
print(f"  Distance ≈ {travel_distance:.0f} m")

# Suggest buffer distances for submodel
# Rule of thumb: 1.5× to 2× travel distance
buffer_min = 1.5 * travel_distance
buffer_max = 2.0 * travel_distance

print(f"\nRecommended submodel buffer:")
print(f"  Minimum: {buffer_min:.0f} m (1.5× travel distance)")
print(f"  Maximum: {buffer_max:.0f} m (2.0× travel distance)")
print(f"  Suggested: {int(np.ceil(buffer_min / 100) * 100)} m (rounded up to nearest 100 m)")

# Check against configured buffer (only the north buffer is compared here)
config_buffer = scenario_config['submodel']['buffer_north_m']
print(f"\nConfigured buffer in case_config_transport.yaml: {config_buffer} m")
if config_buffer < buffer_min:
    print(f"  ⚠️  WARNING: Configured buffer may be insufficient!")
    print(f"  Consider increasing to at least {int(np.ceil(buffer_min / 100) * 100)} m")
elif config_buffer > buffer_max:
    print(f"  ✓ Buffer is conservative (larger than needed)")
else:
    print(f"  ✓ Buffer is adequate for 2-year simulation")

### Define Submodel Extent

Define the submodel domain boundaries. The domain should:
1. Include the contaminant source location
2. Include all wells from the well field
3. Have sufficient buffer for 2-year plume migration
4. Avoid placing boundaries where steep gradients are expected
5. Align with parent grid cells for easier boundary extraction

**Typical domain**: 600×600 m around source and wells (manageable grid: ~120×120 cells with 5m spacing)
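The cell below applies the buffers in the parent grid's local coordinates, using flopy's `get_local_coords` and `get_coords`. The sketch below shows the underlying geometry, assuming a counter-clockwise rotation about the grid's lower-left corner. It is not flopy's implementation, and the origin, angle, and function names are illustrative.

```python
import math

def to_local(x, y, xoff, yoff, angrot_deg):
    """Real-world -> local grid coordinates: remove the offset, then undo the rotation."""
    theta = math.radians(angrot_deg)
    dx, dy = x - xoff, y - yoff
    return (dx * math.cos(theta) + dy * math.sin(theta),
            -dx * math.sin(theta) + dy * math.cos(theta))

def to_world(xl, yl, xoff, yoff, angrot_deg):
    """Local grid -> real-world coordinates: rotate, then add the offset."""
    theta = math.radians(angrot_deg)
    return (xoff + xl * math.cos(theta) - yl * math.sin(theta),
            yoff + xl * math.sin(theta) + yl * math.cos(theta))

# Illustrative origin and rotation, not the parent model's values.
xoff, yoff, angrot = 1000.0, 2000.0, 30.0
xl, yl = to_local(1100.0, 2050.0, xoff, yoff, angrot)
xw, yw = to_world(xl, yl, xoff, yoff, angrot)
print(f"local: ({xl:.2f}, {yl:.2f})  back to world: ({xw:.2f}, {yw:.2f})")
```

Working in local coordinates keeps the submodel edges parallel to the parent grid rows and columns, which is why the buffers are added there rather than in real-world coordinates.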

In [None]:
# Define submodel parameters
sub_cell_size = scenario_config['submodel']['cell_size_m']  # from config
parent_cell_size = parent_modelgrid.delr[0]  # parent model cell size

print(f"Parent model cell size: {parent_cell_size} m")
print(f"Submodel cell size: {sub_cell_size} m") 
print(f"Refinement ratio: {parent_cell_size/sub_cell_size}×")

# Get buffer distances from configuration
buffer_north_m = scenario_config['submodel']['buffer_north_m']
buffer_south_m = scenario_config['submodel']['buffer_south_m']
buffer_east_m = scenario_config['submodel']['buffer_east_m']
buffer_west_m = scenario_config['submodel']['buffer_west_m']

print(f"\nBuffer distances from config:")
print(f"  North: {buffer_north_m} m")
print(f"  South: {buffer_south_m} m")
print(f"  East: {buffer_east_m} m")
print(f"  West: {buffer_west_m} m")

# Combine wells and source for extent calculation
# Include source location in the extent calculation
all_points_x = list(wells_gdf.geometry.x) + [source_easting]
all_points_y = list(wells_gdf.geometry.y) + [source_northing]

points_x_min = min(all_points_x)
points_x_max = max(all_points_x)
points_y_min = min(all_points_y)
points_y_max = max(all_points_y)

print(f"\nWells + Source extent:")
print(f"  X: [{points_x_min:.1f}, {points_x_max:.1f}] (span: {points_x_max - points_x_min:.1f} m)")
print(f"  Y: [{points_y_min:.1f}, {points_y_max:.1f}] (span: {points_y_max - points_y_min:.1f} m)")

# Calculate submodel bounds in real-world coordinates (Swiss coordinate system)
submodel_xmin = points_x_min - buffer_west_m
submodel_xmax = points_x_max + buffer_east_m  
submodel_ymin = points_y_min - buffer_south_m
submodel_ymax = points_y_max + buffer_north_m

print(f"\nSubmodel extent with buffers (real-world coordinates):")
print(f"  X: [{submodel_xmin:.1f}, {submodel_xmax:.1f}] (span: {submodel_xmax - submodel_xmin:.1f} m)")
print(f"  Y: [{submodel_ymin:.1f}, {submodel_ymax:.1f}] (span: {submodel_ymax - submodel_ymin:.1f} m)")

# Calculate expected grid dimensions
expected_ncol = int(np.ceil((submodel_xmax - submodel_xmin) / sub_cell_size))
expected_nrow = int(np.ceil((submodel_ymax - submodel_ymin) / sub_cell_size))

print(f"\nExpected submodel grid dimensions:")
print(f"  {expected_nrow} rows × {expected_ncol} cols")
print(f"  Total cells: {expected_nrow * expected_ncol:,}")

# Get parent model grid parameters for coordinate transformation
parent_xll = parent_modelgrid.xoffset
parent_yll = parent_modelgrid.yoffset
parent_rotation = parent_modelgrid.angrot

print(f"\nParent model grid parameters:")
print(f"  Origin (xll, yll): ({parent_xll:.1f}, {parent_yll:.1f})")
print(f"  Rotation angle: {parent_rotation} degrees")

# Convert wells and source from real-world to local model coordinates
# This removes the rotation and offset, putting points in the model's local coordinate system
points_local_coords = []

# Add wells
for idx, well in wells_gdf.iterrows():
    local_x, local_y = parent_modelgrid.get_local_coords(well.geometry.x, well.geometry.y)
    points_local_coords.append((local_x, local_y))
    print(f"Well {well.get('GWR_PREFIX', idx)}: ({well.geometry.x:.1f}, {well.geometry.y:.1f}) -> local ({local_x:.1f}, {local_y:.1f})")

# Add source
source_local_x, source_local_y = parent_modelgrid.get_local_coords(source_easting, source_northing)
points_local_coords.append((source_local_x, source_local_y))
print(f"Source: ({source_easting:.1f}, {source_northing:.1f}) -> local ({source_local_x:.1f}, {source_local_y:.1f})")

# Calculate submodel bounds in local coordinates
points_local_x = [coord[0] for coord in points_local_coords]
points_local_y = [coord[1] for coord in points_local_coords]

local_points_x_min = min(points_local_x)
local_points_x_max = max(points_local_x)
local_points_y_min = min(points_local_y)  
local_points_y_max = max(points_local_y)

print(f"\nWells + Source extent in local coordinates:")
print(f"  X: [{local_points_x_min:.1f}, {local_points_x_max:.1f}]")
print(f"  Y: [{local_points_y_min:.1f}, {local_points_y_max:.1f}]")

# Add buffers in local coordinates
submodel_local_xmin = local_points_x_min - buffer_west_m
submodel_local_xmax = local_points_x_max + buffer_east_m
submodel_local_ymin = local_points_y_min - buffer_south_m  
submodel_local_ymax = local_points_y_max + buffer_north_m

print(f"\nSubmodel bounds in local coordinates (with buffers):")
print(f"  X: [{submodel_local_xmin:.1f}, {submodel_local_xmax:.1f}]")
print(f"  Y: [{submodel_local_ymin:.1f}, {submodel_local_ymax:.1f}]")

# Convert submodel boundary back to real-world coordinates
# Create corner points in local coordinates
local_corners = [
    (submodel_local_xmin, submodel_local_ymin),  # SW
    (submodel_local_xmax, submodel_local_ymin),  # SE  
    (submodel_local_xmax, submodel_local_ymax),  # NE
    (submodel_local_xmin, submodel_local_ymax),  # NW
]

# Transform back to real-world coordinates
real_world_corners = []
for local_x, local_y in local_corners:
    real_x, real_y = parent_modelgrid.get_coords(local_x, local_y)
    real_world_corners.append((real_x, real_y))
    
print(f"\nSubmodel corners in real-world coordinates:")
corner_labels = ['SW', 'SE', 'NE', 'NW']
for label, (x, y) in zip(corner_labels, real_world_corners):
    print(f"  {label}: ({x:.1f}, {y:.1f})")

# Create the properly aligned submodel boundary polygon
real_world_boundary_coords = real_world_corners + [real_world_corners[0]]
submodel_boundary_poly = Polygon(real_world_boundary_coords)

# Clip the submodel boundary to the parent model boundary
parent_model_boundary_file = download_named_file(
    name='model_boundary',
    data_type='gis'
)
parent_model_boundary = gpd.read_file(parent_model_boundary_file)
clipped_submodel_boundary = submodel_boundary_poly.intersection(parent_model_boundary.geometry.iloc[0])

# Create GeoDataFrame for the aligned submodel boundary
submodel_boundary_gdf = gpd.GeoDataFrame(
    [{'geometry': clipped_submodel_boundary, 'name': 'transport_submodel_domain'}],
    crs=parent_modelgrid.crs
)

print(f"\nAligned submodel boundary created:")
print(f"  Area: {clipped_submodel_boundary.area / 1e6:.3f} km²")
print(f"  Perimeter: {clipped_submodel_boundary.length / 1e3:.2f} km")

### Visualize Submodel Domain

Plot the submodel extent on the parent model grid.

In [None]:
# Visualize the aligned submodel domain with source and wells
fig, ax = plt.subplots(figsize=(16, 14))

# Plot parent model
pmv = flopy.plot.PlotMapView(model=m_parent_base, modelgrid=parent_modelgrid, ax=ax)
pmv.plot_grid(color='lightgrey', alpha=0.3, linewidth=0.5)

# Plot ibound
pmv.plot_array(m_parent_base.bas6.ibound.array, alpha=0.3, cmap='RdYlBu', vmin=-1, vmax=1)

# Plot head contours for reference
contour_levels = np.linspace(np.nanmin(heads_masked), np.nanmax(heads_masked), 10)
cont = pmv.contour_array(heads_masked, levels=contour_levels, colors='blue', 
                        linewidths=1, linestyles='--', alpha=0.5)
ax.clabel(cont, inline=True, fontsize=8, fmt='%.0f m')

# Plot aligned submodel boundary
submodel_boundary_gdf.plot(ax=ax, facecolor='orange', alpha=0.2, edgecolor='red', 
                          linewidth=3, label='Transport Submodel Domain', zorder=3)

# Plot wells
wells_gdf.plot(ax=ax, color='blue', markersize=150, label='Wells', zorder=5,
               marker='o', edgecolors='white', linewidth=2)

# Plot source location
source_point.plot(ax=ax, color='red', markersize=300, label='Contamination Source', 
                 zorder=6, marker='*', edgecolors='darkred', linewidth=2)

# Add well labels
for idx, well in wells_gdf.iterrows():
    ax.annotate(well['GWR_PREFIX'], 
                xy=(well.geometry.x, well.geometry.y),
                xytext=(10, 10), textcoords='offset points',
                fontsize=9, color='blue', fontweight='bold',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.7))

# Add scale information text box
domain_info = (f"Domain: {submodel_xmax - submodel_xmin:.0f}m × {submodel_ymax - submodel_ymin:.0f}m\n"
               f"Grid: {expected_nrow}×{expected_ncol} cells ({sub_cell_size}m)\n"
               f"Total cells: {expected_nrow * expected_ncol:,}\n"
               f"2-year travel: ~{travel_distance:.0f}m")
ax.text(0.02, 0.02, domain_info, transform=ax.transAxes, fontsize=10,
        verticalalignment='bottom', bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.9))

# Create legend
legend_handles = [
    mpatches.Patch(facecolor='orange', alpha=0.3, edgecolor='red', linewidth=2, 
                   label='Transport Submodel Domain'),
    mlines.Line2D([], [], marker='*', color='red', markeredgecolor='darkred', 
                  markeredgewidth=2, markersize=15, linestyle='None', label='Contamination Source'),
    mlines.Line2D([], [], marker='o', color='blue', markeredgecolor='white', 
                  markeredgewidth=2, markersize=10, linestyle='None', label='Wells')
]

ax.set_title(f'Transport Submodel Domain - Group {group_number}\n'
             f'Grid-Aligned Domain for 2-Year Plume Migration Analysis',
             fontsize=14, fontweight='bold')
ax.set_xlabel('X Coordinate (m)', fontsize=12)
ax.set_ylabel('Y Coordinate (m)', fontsize=12)
ax.legend(handles=legend_handles, loc='upper right', fontsize=11)
ax.set_aspect('equal')

plt.tight_layout()
plt.show()

# Print summary
print("\n" + "="*60)
print("SUBMODEL DOMAIN SUMMARY")
print("="*60)
print(f"Purpose: 2-year TCE transport simulation")
print(f"Domain size: {submodel_xmax - submodel_xmin:.0f} × {submodel_ymax - submodel_ymin:.0f} m")
print(f"Grid: {expected_nrow} rows × {expected_ncol} cols at {sub_cell_size}m resolution")
print(f"Total cells: {expected_nrow * expected_ncol:,}")
print(f"Estimated 2-year plume travel: ~{travel_distance:.0f} m")
print(f"Buffer adequacy: {'✓ Adequate' if buffer_north_m >= buffer_min else '⚠ May be insufficient'}")
print("="*60)

---
## 7. Create Telescope Submodel for Flow

### Why Telescope?

Transport modeling requires finer grid resolution than flow modeling because:
- Need to resolve source area (sharp concentration gradients)
- Peclet number constraint: Δx ≤ 2·αL (for αL=10m → Δx ≤ 20m)
- Parent model cells (~50×50 m) are too coarse for accurate transport
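The Peclet constraint above can be checked for candidate cell sizes. This is a minimal sketch, assuming dispersion dominates over molecular diffusion so that the grid Peclet number reduces to Δx/αL. The function name is a placeholder.

```python
def grid_peclet(cell_size_m, alpha_l_m):
    """Grid Peclet number Pe = dx / alpha_L (molecular diffusion neglected)."""
    return cell_size_m / alpha_l_m

alpha_l = 10.0  # longitudinal dispersivity (m), the example value used above
for dx in (50.0, 20.0, 5.0):
    pe = grid_peclet(dx, alpha_l)
    status = "ok" if pe <= 2.0 else "too coarse"
    print(f"dx = {dx:4.0f} m -> Pe = {pe:.1f} ({status})")
```

For αL = 10 m, a 50 m cell gives Pe = 5 and violates the constraint, while a 5 m cell gives Pe = 0.5 and satisfies it with a comfortable margin.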

The telescope approach:
1. Uses coarse parent model for regional flow (efficient)
2. Creates refined submodel only where needed (5m cells)
3. Extracts boundary conditions from parent model
4. Runs transport on refined grid (accurate)

We'll apply the exact same grid generation workflow from notebook 4, using our submodel boundary polygon as the domain definition. This ensures consistency with the established methodology.

### Create Refined Grid

Generate a refined grid for the submodel domain with 5m cell spacing.

#### Rotate the submodel boundary polygon for regular grid alignment

In [None]:
# Buffer the model boundary gdf
submodel_boundary_gdf['geometry'] = submodel_boundary_gdf['geometry'].buffer(10)

# Define the rotation angle in degrees (identified by trial and error; adjust it
# to minimize the number of cells outside the boundary)
grid_rotation_angle = 30
origin_rotation = Point(0, 0)  # Origin for rotation, can be adjusted as needed
# Rotate the model boundary polygon
submodel_boundary_gdf_rotated = submodel_boundary_gdf.copy()

submodel_boundary_gdf_rotated['geometry'] = submodel_boundary_gdf_rotated['geometry'].apply(
    lambda geom: rotate(geom, grid_rotation_angle, origin=origin_rotation)
)
# Get the bounding box of the rotated geometry
xmin_rotated, ymin_rotated, xmax_rotated, ymax_rotated = submodel_boundary_gdf_rotated.total_bounds
# Plot the rotated boundary to verify
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
submodel_boundary_gdf_rotated.plot(ax=ax, facecolor='none', edgecolor='blue', linewidth=2)
ax.set_title("Figure 2: Rotated Model Boundary.")
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
plt.show()

#### Create Sub-model Grid 

In [None]:
# --- 2. Create a new model grid based on the rotated model boundary ---
# The rotated boundary gives us a new bounding box. We first build a regular,
# axis-aligned grid in this rotated coordinate system and use the bounding box
# to define the grid dimensions. The grid origin is rotated back to the
# original coordinate system in the next step.
# Calculate the new grid dimensions based on the rotated bounding box
width_rotated = xmax_rotated - xmin_rotated
height_rotated = ymax_rotated - ymin_rotated

# Calculate the number of rows and columns based on the rotated bounding box
ncol_rotated = int(np.ceil(width_rotated / sub_cell_size)) - 1 # Based on visual inspection of rotated grid.
nrow_rotated = int(np.ceil(height_rotated / sub_cell_size))

# Compare number of rows and columns with the original grid
print(f"Rotated Grid: {ncol_rotated} columns, {nrow_rotated} rows")

# Define the delr and delc for the rotated grid
delr_rotated = np.full(ncol_rotated, sub_cell_size)
delc_rotated = np.full(nrow_rotated, sub_cell_size)
nlay = parent_modelgrid.nlay

# Plot the rotated grid and the rotated boundary to verify
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
# Create a new StructuredGrid with the rotated dimensions
rotated_grid = StructuredGrid(
    delr=delr_rotated,
    delc=delc_rotated,
    top=np.ones((nrow_rotated, ncol_rotated)) * 100,  # Example top elevation
    botm=np.ones((nlay, nrow_rotated, ncol_rotated)) * 50,  # Example bottom elevation
    xoff=xmin_rotated,  # Use the lower-left of the rotated extent
    yoff=ymin_rotated,  # Use the lower-left of the rotated extent
    angrot=0,  # We are currently in the rotated coordinate system, so no additional rotation is needed
    lenuni=2,  # Length unit code: 2 for meters
    crs=submodel_boundary_gdf_rotated.crs.to_string()  # Automatically get CRS from geopackage
)
pmv = flopy.plot.PlotMapView(modelgrid=rotated_grid, ax=ax)
pc = pmv.plot_array(rotated_grid.top, alpha=0.5, cmap='terrain')
pmv.plot_grid()
ax.set_aspect('equal', adjustable='box') # Ensure correct aspect ratio
ax.set_title("Figure 3: Rotated FloPy Grid with Rotated Boundary.")

#### Rotate the new submodel grid to align with parent grid rotation

In [None]:
# --- 3. Rotation of the new Model Grid in the CH Coordinate System ---
# Now we need to rotate the lower-left corner of the rotated grid back to the 
# original coordinate system.
# The lower-left corner of the rotated bounding box
# Create points from the rotated bounding box coordinates
min_point_rotated = Point(xmin_rotated, ymin_rotated)
max_point_rotated = Point(xmax_rotated, ymax_rotated)

# Apply inverse rotation (negative angle) around the same origin
min_point_original = rotate(min_point_rotated, -grid_rotation_angle, 
                            origin=origin_rotation)
max_point_original = rotate(max_point_rotated, -grid_rotation_angle, 
                            origin=origin_rotation)

# Extract the coordinates
xmin_original = min_point_original.x
ymin_original = min_point_original.y
xmax_original = max_point_original.x
ymax_original = max_point_original.y

print(f"Original coordinates after inverse rotation:")
print(f"xmin: {xmin_original:.2f}, ymin: {ymin_original:.2f}")
print(f"xmax: {xmax_original:.2f}, ymax: {ymax_original:.2f}")

xll = xmin_original
yll = ymin_original

print(f"Corrected grid lower-left corner:")
print(f"xll = {xll:.2f}")
print(f"yll = {yll:.2f}")
print(f"Number of cells in the rotated grid: {nrow_rotated * ncol_rotated * nlay}")

# Create the FloPy structured grid with the rotated bounding box
sub_modelgrid = StructuredGrid(
    delr=delr_rotated,
    delc=delc_rotated,
    xoff=xmin_original,  # Use the lower-left of the rotated extent
    yoff=ymin_original,  # Use the lower-left of the rotated extent
    angrot=-grid_rotation_angle,  # Apply the desired rotation to the grid
    lenuni=2,  # Length unit code: 2 for meters
    crs=submodel_boundary_gdf.crs.to_string()  # Automatically get CRS from geopackage
)

# Update grid polygons, tag active cells (≥50% inside), and get IBOUND
grid_gdf, ibound = grid_utils.build_grid_gdf_and_ibound(
    modelgrid=sub_modelgrid,
    boundary_gdf=submodel_boundary_gdf,        # your boundary GeoDataFrame
    frac_threshold=0.5,      # change if needed
    nlay=nlay                 # use your model's nlay
)
# Count the number of active cells (IBOUND > 0)
active_cells = np.count_nonzero(ibound > 0)
print(f"Total number of active cells in the grid: {active_cells}")

print("Model grid created with the following parameters:")
print(sub_modelgrid)

# Plot the rotated grid and the model_boundary to check alignment
fig, ax = plt.subplots(1, 1, figsize=(12, 12))
pmv = flopy.plot.PlotMapView(modelgrid=sub_modelgrid, ax=ax)
pmv.plot_grid() 
submodel_boundary_gdf.plot(ax=ax, facecolor='none', edgecolor='red', linewidth=2)
red_line = mlines.Line2D([], [], color='red', linewidth=2, label='Model Boundary')
ax.legend(handles=[red_line], loc='upper right')
ax.set_title("Figure 4: Correctly Rotated Grid with Model Boundaries")
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
ax.set_aspect('equal', adjustable='box') # Ensure correct aspect ratio (must be set before plt.show())
plt.show()

### Interpolate Aquifer Properties

Interpolate hydraulic conductivity, layer elevations, and other properties from parent to refined grid.
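
One simple way to carry a parent-grid property over to a finer grid is nearest-neighbour sampling at the submodel cell centres. The sketch below uses synthetic coordinates and made-up K values; all array names are placeholders rather than variables from this notebook, and this is only one option, not necessarily the approach used in the cells that follow.

```python
import numpy as np
from scipy.spatial import cKDTree

# Synthetic parent grid: 3 x 3 cells of 50 m, described by cell-centre coordinates
parent_x, parent_y = np.meshgrid(np.arange(25.0, 150.0, 50.0),
                                 np.arange(25.0, 150.0, 50.0))
parent_k = np.arange(1.0, 10.0).reshape(3, 3)  # made-up K values [m/day]

# Synthetic submodel grid: 5 m cells covering the same 150 m x 150 m area
sub_x, sub_y = np.meshgrid(np.arange(2.5, 150.0, 5.0),
                           np.arange(2.5, 150.0, 5.0))

# For every submodel cell centre, find the index of the closest parent cell centre
tree = cKDTree(np.column_stack([parent_x.ravel(), parent_y.ravel()]))
_, nearest = tree.query(np.column_stack([sub_x.ravel(), sub_y.ravel()]))

# Copy the parent value into each submodel cell
sub_k = parent_k.ravel()[nearest].reshape(sub_x.shape)

print(f"Parent grid shape:   {parent_k.shape}")
print(f"Submodel grid shape: {sub_k.shape}")
print(f"K range on submodel: {sub_k.min():.1f} to {sub_k.max():.1f} m/day")
```

Nearest-neighbour sampling preserves the parent values exactly but produces blocky fields; whichever method you use, plot the result and compare its range with the parent model.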


In [None]:
# TODO: Check the property fields interpolated from parent to submodel below:
# - Hydraulic conductivity (K)
# - Layer top and bottom elevations
# - Specific storage (if transient)
# Visualize interpolated K field

#### Model Top

In [None]:
# Load and resample DEM to submodel grid (following notebook 4)
dem_path = download_named_file('dem_hres', data_type='gis')
rio = flopy.utils.Raster.load(dem_path)

print(f"DEM loaded:")
print(f"  CRS: {rio.crs}")
print(f"  Bounds: {rio.bounds}")

# Resample DEM to submodel grid
print("Resampling DEM to submodel grid...")
import time
t0 = time.time()
submodel_top = rio.resample_to_grid(sub_modelgrid, band=rio.bands[0], method="nearest")
resample_time = time.time() - t0

# Clean up the resampled data
submodel_top = np.round(submodel_top, 1)  # Round to 10 cm
valid = np.isfinite(submodel_top) & (submodel_top > 0)

if not np.any(valid):
    raise RuntimeError("No valid DEM data found in submodel area")

print(f"DEM resampling completed in {resample_time:.2f} seconds")
print(f"  Elevation range: {submodel_top[valid].min():.1f} to {submodel_top[valid].max():.1f} m")

# Plot the submodel_top on the submodel grid
fig, ax = plt.subplots(figsize=(10, 10))
pmv = flopy.plot.PlotMapView(modelgrid=sub_modelgrid, ax=ax)
im = pmv.plot_array(submodel_top, cmap='terrain', vmin=np.nanmin(submodel_top), vmax=np.nanmax(submodel_top))
pmv.plot_grid(color='grey', alpha=0.2)
cbar = plt.colorbar(im, ax=ax, shrink=0.5)
cbar.set_label('Elevation (m a.s.l.)')
ax.set_title('Resampled DEM on Submodel Grid')
ax.set_aspect('equal', adjustable='box') # Ensure correct aspect ratio
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
plt.show()

#### Model Bottom

In [None]:
# Define submodel bottom based on groundwater levels and aquifer thickness
# Load groundwater levels from file & interpolate to submodel grid
isolines = download_named_file('groundwater_map_norm', data_type='gis')
gdf_isolines = gpd.read_file(isolines, layer='GS_GW_ISOHYPSE_MW_L')
gw_elevations = grid_utils.interpolate_isohypses_to_grid(gdf_isolines, sub_modelgrid)

'''# Optional, plot gw_elevations for verification
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
im = ax.imshow(gw_elevations, extent=sub_modelgrid.extent, origin='upper', cmap='Blues')
ax.set_title("Figure 5: Interpolated Groundwater Elevations on Submodel Grid")
ax.set_xlabel("X-coordinate")
ax.set_ylabel("Y-coordinate")
plt.colorbar(im, ax=ax, label="Groundwater Elevation (m a.s.l.)")
ax.set_aspect('equal', adjustable='box')
plt.show()'''

# Load groundwater thickness from file, requires 4_model_implementation.ipynb to have been run once first
workspace = os.path.join(get_default_data_folder(), 'limmat_valley_model')
thickness_path = os.path.join(workspace, 'aquifer_thickness_contours.gpkg')
aquifer_thickness_gdf = gpd.read_file(thickness_path, layer='aquifer_thickness_contours')
# Interpolate aquifer thickness to submodel grid
aquifer_thickness_resampled = grid_utils.interpolate_aquifer_thickness_to_grid_with_contour_densification(
    contour_gdf=aquifer_thickness_gdf,
    modelgrid=sub_modelgrid,
    thickness_column='aquifer_thickness',
    contour_interval=2.0,  # Create intermediate contours every 2m
    plot_intermediate=False,  # Set to True to show the contour densification step
    plot_points=False,  # Set to True if you want to see final interpolation points
    buffer_distance=300
)
# Smooth the resampled aquifer thickness to remove small-scale noise
from scipy.ndimage import gaussian_filter
aquifer_thickness_resampled = gaussian_filter(aquifer_thickness_resampled, sigma=4)

'''# Plot aquifer thickness for verification
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
im = ax.imshow(aquifer_thickness_resampled, extent=sub_modelgrid.extent, origin='upper', cmap='YlOrBr')
ax.set_title("Figure 6: Interpolated Aquifer Thickness on Submodel Grid")
ax.set_xlabel("X-coordinate")
ax.set_ylabel("Y-coordinate")
plt.colorbar(im, ax=ax, label="Aquifer Thickness (m)")
ax.set_aspect('equal', adjustable='box')
plt.show()'''

# Calculate bottom elevation
submodel_bottom = gw_elevations - aquifer_thickness_resampled

# Ensure bottom is 3D array format
if submodel_bottom.ndim == 2:
    submodel_bottom = submodel_bottom[np.newaxis, :, :]

print(f"Submodel bottom calculated:")
print(f"  Bottom range: {submodel_bottom[0][valid].min():.1f} to {submodel_bottom[0][valid].max():.1f} m")

# Define the delr and delc for the submodel grid
delr = np.full(sub_modelgrid.ncol, sub_cell_size)
delc = np.full(sub_modelgrid.nrow, sub_cell_size)

# Update the submodel grid with real elevations
submodel_grid = StructuredGrid(
    delr=delr,
    delc=delc,
    top=submodel_top,
    botm=submodel_bottom,
    nlay=nlay,
    xoff=xll,
    yoff=yll,
    angrot=-grid_rotation_angle
)

print("Submodel grid updated with DEM elevations")

# Plot the final submodel grid with bottom elevations
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
pmv = flopy.plot.PlotMapView(modelgrid=submodel_grid, ax=ax)
im = pmv.plot_array(submodel_grid.botm, cmap='terrain', vmin=np.nanmin(submodel_grid.botm), vmax=np.nanmax(submodel_grid.botm))
pmv.plot_grid(color='grey', alpha=0.2)
cbar = plt.colorbar(im, ax=ax, shrink=0.5)
cbar.set_label('Elevation (m a.s.l.)')
ax.set_title('Submodel Bottom Elevations')
ax.set_aspect('equal', adjustable='box') # Ensure correct aspect ratio
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
plt.show()

#### Sub-model DIS package

In [None]:
# Define working directory for submodel
# Create a dedicated output workspace for the transport submodel
transport_base_ws = os.path.expanduser(cfg['output']['workspace'])
# Add group number to path and add sub-directory for the sub_base model
sub_base_ws = transport_base_ws + str(group_number) + "/sub_base"
case_utils.ensure_dir(sub_base_ws)

print(f"Transport submodel workspace: {sub_base_ws}")

# Create the sub_base model
m_sub_base = flopy.modflow.Modflow(
    parent_base_namefile.replace('.nam', ''), 
    model_ws=sub_base_ws,
    version='mfnwt',
    exe_name='mfnwt.exe'  # Ensure the executable is correctly specified
)

# ⚠️ IMPORTANT: Set temporal discretization to match transport model (2 stress periods)
# The flow model needs the same number of stress periods as the transport model
# for MT3D-USGS to read flow data correctly for each transport period.
# 
# Both periods are steady-state with identical conditions since flow is constant.
# Period 1: Source active (30 days)
# Period 2: Source off, monitoring (700+ days)

# Get transport simulation time from config
simulation_time_days = scenario_config['simulation']['duration_days']
source_time_days = scenario_config['source']['duration_days']

print(f"Configuring MODFLOW temporal discretization to match transport model:")
print(f"  Period 1 (source active): {source_time_days} days")
print(f"  Period 2 (source off): {simulation_time_days} days")
print(f"  Total simulation time: {source_time_days + simulation_time_days} days")

# Define temporal discretization for submodel (2 periods to match transport)
nper = 2
perlen = [source_time_days, simulation_time_days]  # Days
nstp = [1, 1]  # 1 timestep per period (steady-state flow)
tsmult = [1.0, 1.0]  # No time step multiplier needed for steady-state
steady = [True, True]  # Both periods are steady-state (flow doesn't change)

sub_base_dis = flopy.modflow.ModflowDis(
    model=m_sub_base,
    model_ws=sub_base_ws,
    nlay=nlay,
    nrow=sub_modelgrid.nrow,
    ncol=sub_modelgrid.ncol,
    delr=delr,
    delc=delc,
    xul=xll,
    yul=yll + (sub_modelgrid.nrow * sub_cell_size),  # Upper-left y-coordinate
    angrot=-grid_rotation_angle,
    crs=sub_modelgrid.crs.to_string(),
    top=submodel_top,
    botm=submodel_bottom,
    nper=nper,
    perlen=perlen,
    nstp=nstp,
    tsmult=tsmult,
    steady=steady,
    itmuni=4,  # Time unit: days
    lenuni=2,  # Length unit: meters
)

print(f"✓ DIS package created with {nper} stress periods matching transport model")

# Plot sub_base_grid for verification
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
pmv = flopy.plot.PlotMapView(model=m_sub_base, ax=ax)
pmv.plot_grid(color='grey', alpha=0.2)
ax.set_title("Submodel Grid Verification")
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
ax.set_aspect('equal', adjustable='box') # Ensure correct aspect ratio
plt.show()

#### UPW package

In [None]:
# Get the UPW package from parent model
parent_upw = m_parent_base.upw
parent_ibound = m_parent_base.bas6.ibound.array  # Get parent model IBOUND

# Extract hydraulic conductivity arrays from parent model
parent_hk = parent_upw.hk.array  # Horizontal hydraulic conductivity
parent_vka = parent_upw.vka.array  # Vertical hydraulic conductivity (or anisotropy ratio)
parent_sy = parent_upw.sy.array  # Specific yield
parent_ss = parent_upw.ss.array  # Specific storage

print(f"Parent model aquifer parameters:")
print(f"  HK range: {parent_hk.min():.2f} to {parent_hk.max():.2f} m/day")
print(f"  VKA range: {parent_vka.min():.6f} to {parent_vka.max():.6f}")

# Check parent model active cells
if parent_ibound.ndim == 3:
    parent_active_cells = (parent_ibound[0, :, :] == 1).sum()
    active_mask_parent = parent_ibound[0, :, :] == 1
    print(f"  Active cells in parent model: {parent_active_cells:,}")
else:
    parent_active_cells = (parent_ibound == 1).sum()
    active_mask_parent = parent_ibound == 1
    print(f"  Active cells in parent model: {parent_active_cells:,}")

# Use representative uniform values from active parent cells instead of interpolation
print("\nUsing uniform parameter values from active parent model cells...")
print(f"Submodel grid shape: {sub_modelgrid.nrow} x {sub_modelgrid.ncol}")

# Extract statistics from active cells only (exclude zeros and inactive cells)
active_hk = parent_hk[0][active_mask_parent]
active_vka = parent_vka[0][active_mask_parent]

# Filter out zeros and get representative values
valid_hk = active_hk[active_hk > 0]
valid_vka = active_vka[active_vka > 0]

# Use median values for uniform parameters (more robust than mean)
uniform_hk = np.median(valid_hk) if len(valid_hk) > 0 else 20.0  # Default for gravel aquifer
uniform_vka = np.median(valid_vka) if len(valid_vka) > 0 else 2.0  # Default VKA

print(f"\nRepresentative uniform values from active cells:")
print(f"  Uniform HK: {uniform_hk:.2f} m/day (from {len(valid_hk)} active cells)")
print(f"  Uniform VKA: {uniform_vka:.6f} (from {len(valid_vka)} active cells)")

# Create uniform parameter arrays for submodel
sub_hk = np.full((nlay, sub_modelgrid.nrow, sub_modelgrid.ncol), uniform_hk)
sub_vka = np.full((nlay, sub_modelgrid.nrow, sub_modelgrid.ncol), uniform_vka)

# Ensure all values are physically realistic (positive)
sub_hk = np.maximum(sub_hk, 0.1)  # Minimum 0.1 m/day
sub_vka = np.maximum(sub_vka, 0.001)  # Minimum VKA

print(f"\nSubmodel aquifer parameters (uniform values, all >0):")
print(f"  HK: {sub_hk.min():.2f} to {sub_hk.max():.2f} m/day (uniform)")
print(f"  VKA: {sub_vka.min():.6f} to {sub_vka.max():.6f} (uniform)")

# Verify array dimensions match submodel grid
print(f"\nArray dimension verification:")
print(f"  sub_hk shape: {sub_hk.shape}")
print(f"  Expected shape: ({nlay}, {sub_modelgrid.nrow}, {sub_modelgrid.ncol})")

# Create arrays with submodel dimensions for parent parameters
# For parameters that don't need interpolation, create uniform arrays
sub_laytyp = np.ones(nlay, dtype=int) * parent_upw.laytyp.array[0]  # Use first layer value

# Extract scalar value from parent hani array - it might be 2D or 3D
if parent_upw.hani.array.ndim == 3:
    hani_value = parent_upw.hani.array[0, 0, 0]  # 3D array: [layer, row, col]
elif parent_upw.hani.array.ndim == 2:
    hani_value = parent_upw.hani.array[0, 0]     # 2D array: [row, col]
else:
    hani_value = parent_upw.hani.array[0]        # 1D array: [layer]

# Ensure hani_value is reasonable (>0)
if hani_value <= 0:
    hani_value = 1.0  # Default isotropic
    print(f"  Warning: Parent HANI ≤0, using default value: {hani_value}")

print(f"  hani_value used: {hani_value}")

# Get specific yield and storage if needed (not used in UPW but used for transport)
sub_sy = np.full((nlay, sub_modelgrid.nrow, sub_modelgrid.ncol), np.mean(parent_upw.sy.array[0]))
sub_ss = np.full((nlay, sub_modelgrid.nrow, sub_modelgrid.ncol), np.mean(parent_upw.ss.array[0]))

# Get porosity from transport model config
porosity_value = scenario_config['transport']['porosity']

if np.all(sub_sy <= 0):
    # No usable specific yield in the parent model: fall back to the transport porosity
    sub_sy.fill(porosity_value)
    print(f"\nUsing porosity value from transport config: {porosity_value}")

if np.all(sub_ss <= 0):
    sub_ss.fill(1e-5)  # Typical value for unconfined aquifer
    print("Using default specific storage value: 1e-5")

# Verify specific yield and storage values
print(f"  SY: {sub_sy.min():.6f} to {sub_sy.max():.6f} (uniform)")
print(f"  SS: {sub_ss.min():.6e} to {sub_ss.max():.6e} (uniform)")

# Create UPW package for submodel with uniform arrays
sub_base_upw = flopy.modflow.ModflowUpw(
    m_sub_base,
    laytyp=sub_laytyp,
    hk=sub_hk,
    hani=hani_value,  # Use scalar value
    vka=sub_vka,
    sy=sub_sy,
    ss=sub_ss,
    ipakcb=53  # Save cell-by-cell budget
)

# Visualize uniform hydraulic conductivity
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Plot HK
im1 = axes[0].imshow(sub_hk[0], extent=sub_modelgrid.extent, origin='upper', cmap='viridis')
axes[0].set_title('Hydraulic Conductivity (HK)\n[m/day]')
plt.colorbar(im1, ax=axes[0], shrink=0.7)

# Plot VKA
im2 = axes[1].imshow(sub_vka[0], extent=sub_modelgrid.extent, origin='upper', cmap='plasma')
axes[1].set_title('Vertical Hydraulic Conductivity (VKA)\n[m/day or ratio]')
plt.colorbar(im2, ax=axes[1], shrink=0.7)

# Format axes
for ax in axes.flat:
    ax.set_xlabel('X Coordinate (m)')
    ax.set_ylabel('Y Coordinate (m)')
    ax.set_aspect('equal')

plt.suptitle('Uniform Aquifer Parameters on Submodel Grid', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("✓ UPW package created with uniform parameters from parent model statistics")
print("✓ All hydraulic conductivity values are >0 and physically realistic")

### Extract Boundary Conditions

#### BAS package

Extract heads from parent model along submodel boundaries to use as boundary conditions.
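
Boundary heads can be estimated from nearby parent cells by inverse-distance weighting (IDW). The sketch below shows the idea on four synthetic points. The function name `idw_heads`, the coordinates, and the head values are illustrative assumptions, not data from the parent model.

```python
import numpy as np
from scipy.spatial import cKDTree


def idw_heads(parent_xy, parent_heads, target_xy, k=3, eps=1e-10):
    """Inverse-distance-weighted heads at target points from the k nearest parent cells."""
    k = min(k, len(parent_heads))
    tree = cKDTree(parent_xy)
    dist, idx = tree.query(target_xy, k=k)
    # query() returns 1D arrays when k == 1, so restore the neighbour axis
    dist = np.asarray(dist).reshape(len(target_xy), k)
    idx = np.asarray(idx).reshape(len(target_xy), k)
    # Clip distances so a target that coincides with a parent cell does not divide by zero
    weights = 1.0 / np.maximum(dist, eps)
    weights /= weights.sum(axis=1, keepdims=True)
    return np.sum(parent_heads[idx] * weights, axis=1)


# Synthetic parent cell centres [m] and heads [m a.s.l.]
parent_xy = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
parent_heads = np.array([400.0, 401.0, 402.0, 403.0])

# One target in the middle of the four cells, one exactly on a parent cell centre
targets = np.array([[50.0, 50.0], [0.0, 0.0]])
heads = idw_heads(parent_xy, parent_heads, targets, k=4)

for (x, y), h in zip(targets, heads):
    print(f"Target ({x:.0f}, {y:.0f}): interpolated head = {h:.2f} m")
```

A target equidistant from all neighbours receives their plain average, while a target sitting on a parent cell centre effectively takes that cell's head.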

In [None]:
# Extract parent model heads for boundary interpolation
print("Extracting parent model heads for boundary condition interpolation...")

# Load parent model heads
parent_hds_path = os.path.join(parent_workspace, f"{m_parent_base.name}.hds")
if not os.path.exists(parent_hds_path):
    print("Parent model heads file not found. Running parent model...")
    success, buff = m_parent_base.run_model(silent=True, report=True)
    if not success:
        raise RuntimeError("Parent model failed to run")
    print("✓ Parent model run completed")

# Load parent heads
headobj_parent = flopy.utils.HeadFile(parent_hds_path)
parent_heads = headobj_parent.get_data()[0]  # Layer 0 of the last saved time step (get_data() default)

# Get parent model grid coordinates and active cells
parent_ibound = m_parent_base.bas6.ibound.array[0]  # Layer 0
parent_active_mask = parent_ibound == 1

# Extract coordinates of active parent cells
parent_x_centers = parent_modelgrid.xcellcenters[parent_active_mask]
parent_y_centers = parent_modelgrid.ycellcenters[parent_active_mask]
parent_heads_active = parent_heads[parent_active_mask]

# Filter out any invalid heads (NaN, inf, or unrealistic values)
valid_head_mask = (np.isfinite(parent_heads_active) & 
                   (parent_heads_active > 300) &  # Reasonable lower bound for elevation
                   (parent_heads_active < 600))   # Reasonable upper bound for elevation

parent_coords_valid = np.column_stack([
    parent_x_centers[valid_head_mask],
    parent_y_centers[valid_head_mask]
])
parent_heads_valid = parent_heads_active[valid_head_mask]

print(f"Parent model head extraction:")
print(f"  Total parent cells: {parent_heads.size:,}")
print(f"  Active parent cells: {np.sum(parent_active_mask):,}")
print(f"  Valid heads for interpolation: {len(parent_heads_valid):,}")
print(f"  Head range: {parent_heads_valid.min():.1f} to {parent_heads_valid.max():.1f} m")

# Create KDTree for efficient nearest neighbor search
from scipy.spatial import cKDTree
parent_tree = cKDTree(parent_coords_valid)

print("✓ Parent model heads prepared for boundary interpolation")

In [None]:
# Create complete CHD boundary
def create_complete_boundary_ibound(submodel_grid, boundary_polygon, boundary_thickness=1):
    """
    Create IBOUND array with complete CHD boundary around submodel domain.
    
    Parameters:
    -----------
    submodel_grid : flopy.discretization.StructuredGrid
        Submodel grid
    boundary_polygon : shapely.geometry.Polygon
        Clipped boundary polygon
    boundary_thickness : int
        Number of cell layers to mark as CHD from domain edge
    
    Returns:
    --------
    ibound : numpy.ndarray
        IBOUND array with CHD boundaries
    boundary_cells : list
        List of boundary cell information
    """
    from shapely.geometry import Point
    
    # Initialize IBOUND as all active cells
    ibound = np.ones((nlay, submodel_grid.nrow, submodel_grid.ncol), dtype=int)
    boundary_cells = []
    
    # Method 1: Mark cells outside or on boundary of clipped polygon as CHD
    for i in range(submodel_grid.nrow):
        for j in range(submodel_grid.ncol):
            # Get cell center coordinates
            x_center = submodel_grid.xcellcenters[i, j]
            y_center = submodel_grid.ycellcenters[i, j]
            cell_point = Point(x_center, y_center)
            
            # Check if cell center is outside boundary or very close to boundary
            distance_to_boundary = cell_point.distance(boundary_polygon.boundary)
            
            # Mark as CHD if:
            # 1. Cell is outside the boundary polygon, OR
            # 2. Cell is very close to boundary (within half cell size)
            if (not boundary_polygon.contains(cell_point) or 
                distance_to_boundary < sub_cell_size * 0.5):
                
                # Only mark as CHD if cell has valid coordinates (not completely outside model domain)
                if (submodel_grid.extent[0] <= x_center <= submodel_grid.extent[1] and
                    submodel_grid.extent[2] <= y_center <= submodel_grid.extent[3]):
                    
                    ibound[0, i, j] = -1  # CHD cell
                    boundary_cells.append({
                        'submodel_row': i,
                        'submodel_col': j,
                        'x': x_center,
                        'y': y_center,
                        'distance_to_boundary': distance_to_boundary
                    })
    
    # Method 2: Ensure edge cells are CHD (safety measure)
    edge_thickness = max(1, boundary_thickness)
    
    # Top and bottom edges
    ibound[:, :edge_thickness, :] = -1
    ibound[:, -edge_thickness:, :] = -1
    
    # Left and right edges  
    ibound[:, :, :edge_thickness] = -1
    ibound[:, :, -edge_thickness:] = -1
    
    # Add edge cells to boundary_cells list if not already included.
    # A set of (row, col) pairs keeps the membership test fast on fine grids.
    existing_cells = {(cell['submodel_row'], cell['submodel_col']) for cell in boundary_cells}
    for i in range(submodel_grid.nrow):
        for j in range(submodel_grid.ncol):
            if ibound[0, i, j] == -1 and (i, j) not in existing_cells:
                boundary_cells.append({
                    'submodel_row': i,
                    'submodel_col': j,
                    'x': submodel_grid.xcellcenters[i, j],
                    'y': submodel_grid.ycellcenters[i, j],
                    'distance_to_boundary': 0.0  # Edge cell
                })
    
    return ibound, boundary_cells

# Detect the complete boundary and create IBOUND
print("Creating complete CHD boundary around submodel domain...")
submodel_ibound_complete, boundary_cells_complete = create_complete_boundary_ibound(
    submodel_grid, clipped_submodel_boundary, boundary_thickness=1
)

print(f"Complete boundary detection results:")
print(f"  Total boundary cells: {len(boundary_cells_complete)}")
print(f"  CHD cells: {np.sum(submodel_ibound_complete == -1):,}")
print(f"  Active cells: {np.sum(submodel_ibound_complete == 1):,}")

# Head interpolation for all boundary cells
print("Interpolating heads for all boundary cells...")

# Use the same KDTree approach but for all boundary cells
sub_boundary_coords_complete = []
for cell in boundary_cells_complete:
    sub_boundary_coords_complete.append([cell['x'], cell['y']])

sub_boundary_coords_complete = np.array(sub_boundary_coords_complete)

# Perform inverse distance weighted interpolation
k_neighbors = min(5, len(parent_heads_valid))
distances, neighbor_indices = parent_tree.query(sub_boundary_coords_complete, k=k_neighbors)

# Handle zero distances
distances = np.maximum(distances, 1e-10)

# Calculate weights
weights = 1.0 / distances
weights = weights / weights.sum(axis=1, keepdims=True)

# Weighted average
interpolated_heads_complete = np.sum(
    parent_heads_valid[neighbor_indices] * weights, axis=1
)

# Create CHD package data for all boundary cells
chd_data_complete = []
for i, cell in enumerate(boundary_cells_complete):
    interpolated_head = float(interpolated_heads_complete[i])
    
    chd_data_complete.append([
        0,  # Layer 0
        cell['submodel_row'],
        cell['submodel_col'],
        interpolated_head,
        interpolated_head
    ])

print(f"Complete CHD data created: {len(chd_data_complete)} cells")

# Update variables for consistency with rest of notebook
submodel_ibound_clipped = submodel_ibound_complete
boundary_cells = boundary_cells_complete
submodel_chd_data = chd_data_complete

# Visualize the complete boundary conditions
fig, ax = plt.subplots(figsize=(12, 10))

# Define colormap for IBOUND visualization  
import matplotlib.colors as mcolors
cmap = mcolors.ListedColormap(['blue', 'white'])  # CHD=-1: blue, Active=1: white
bounds = [-1.5, -0.5, 1.5]
norm = mcolors.BoundaryNorm(bounds, cmap.N)

pmv = flopy.plot.PlotMapView(modelgrid=submodel_grid, ax=ax)
im = pmv.plot_array(submodel_ibound_complete[0], cmap=cmap, norm=norm)
pmv.plot_grid(color='gray', alpha=0.3, linewidth=0.3)

# Plot original clipped boundary for reference
if hasattr(clipped_submodel_boundary, 'exterior'):
    boundary_x, boundary_y = clipped_submodel_boundary.exterior.xy
    ax.plot(boundary_x, boundary_y, 'red', linewidth=2, label='Original Boundary')

# Plot wells
wells_gdf.plot(ax=ax, color='red', markersize=80, label='Wells', zorder=5,
               edgecolors='white', linewidth=1)

# Colorbar
cbar = plt.colorbar(im, ax=ax, shrink=0.3, ticks=[-1, 1])
cbar.ax.set_yticklabels(["CHD (-1)", "Active (1)"])
cbar.set_label("IBOUND")

ax.set_title('Complete Submodel Boundary Conditions\n(Blue: CHD boundary, White: Active cells)')
ax.legend()
ax.set_aspect('equal')
plt.tight_layout()
plt.show()

print("✓ Complete CHD boundary created around entire submodel domain")

In [None]:
# Create CHD package with data for BOTH stress periods
# Flow is steady-state, so same heads apply to both periods
sub_base_chd = flopy.modflow.ModflowChd(
    model=m_sub_base,
    stress_period_data={
        0: submodel_chd_data,  # Period 1 (source active)
        1: submodel_chd_data   # Period 2 (source off) - same heads
    },
    ipakcb=53,
    model_ws=sub_base_ws
)
print(f"✓ CHD package created with boundary data for {nper} stress periods")

# Extract CHD package data and visualize with continuous colormap
fig, ax = plt.subplots(figsize=(12, 10))

pmv = flopy.plot.PlotMapView(model=m_sub_base, ax=ax)
pmv.plot_grid(color='gray', alpha=0.3, linewidth=0.3)

# Extract CHD data
chd_package = m_sub_base.chd
chd_data = chd_package.stress_period_data[0]  # First stress period

# Create arrays to hold CHD head values for plotting
chd_array = np.full((m_sub_base.nrow, m_sub_base.ncol), np.nan)
chd_coords_x = []
chd_coords_y = []
chd_heads = []

for chd_cell in chd_data:
    layer, row, col, start_head, end_head = chd_cell
    chd_array[row, col] = start_head
    
    # Also collect coordinates for scatter plot option
    x = m_sub_base.modelgrid.xcellcenters[row, col]
    y = m_sub_base.modelgrid.ycellcenters[row, col]
    chd_coords_x.append(x)
    chd_coords_y.append(y)
    chd_heads.append(start_head)

# Method 1: Plot CHD as array (shows cells as squares)
chd_masked = np.ma.masked_where(np.isnan(chd_array), chd_array)
im = pmv.plot_array(chd_masked, alpha=0.8, cmap='viridis')

# Method 2: Alternative - plot as scatter points (optional, comment out if using array method)
# scatter = ax.scatter(chd_coords_x, chd_coords_y, 
#                     c=chd_heads, 
#                     s=30,  # Size of points
#                     cmap='viridis',
#                     edgecolors='white',
#                     linewidth=0.5,
#                     alpha=0.8)

# Add colorbar
cbar = plt.colorbar(im, ax=ax, shrink=0.7)
cbar.set_label('CHD Head (m a.s.l.)', fontsize=12)

# Add contours of CHD heads for better visualization
if len(chd_heads) > 3:  # Need at least a few points for contouring
    from scipy.interpolate import griddata
    
    # Create a regular grid for interpolation
    xi = np.linspace(ax.get_xlim()[0], ax.get_xlim()[1], 50)
    yi = np.linspace(ax.get_ylim()[0], ax.get_ylim()[1], 50)
    xi_grid, yi_grid = np.meshgrid(xi, yi)
    
    # Interpolate CHD values
    zi_grid = griddata((chd_coords_x, chd_coords_y), chd_heads, 
                       (xi_grid, yi_grid), method='linear')
    
    # Add contour lines
    contours = ax.contour(xi_grid, yi_grid, zi_grid, 
                         levels=8, colors='white', linewidths=1, alpha=0.8)
    ax.clabel(contours, inline=True, fontsize=9, fmt='%.1f m')

ax.set_title('CHD Package - Specified Heads with Continuous Colormap')
ax.set_xlabel('X Coordinate (m)')
ax.set_ylabel('Y Coordinate (m)')
ax.set_aspect('equal')
plt.tight_layout()
plt.show()

# Print CHD statistics
print(f"CHD Package Summary:")
print(f"  Number of CHD cells: {len(chd_data)}")
print(f"  Head range: {min(chd_heads):.2f} to {max(chd_heads):.2f} m")
print(f"  Mean head: {np.mean(chd_heads):.2f} m")
print(f"  Standard deviation: {np.std(chd_heads):.2f} m")

sub_base_bas = flopy.modflow.ModflowBas(m_sub_base, ibound=submodel_ibound_clipped, strt=gw_elevations)

# Plot IBOUND and starting heads
fig, ax = plt.subplots(1, 1, figsize=(16, 8))
pmv = flopy.plot.PlotMapView(model=m_sub_base, ax=ax)
im = pmv.plot_ibound()
plt.colorbar(im, ax=ax, shrink=0.7, ticks=[-1, 0, 1])
pmv.plot_grid(color='gray', alpha=0.3, linewidth=0.3)
ax.set_title('IBOUND Array')
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
ax.set_aspect('equal')  

#### RCH package

In [None]:
# Extract recharge from parent model
parent_rch = m_parent_base.rch
if parent_rch is not None:
    parent_recharge = parent_rch.rech.array
    
    # Get representative recharge value from parent model
    # FloPy transient 2D arrays are typically shaped (nper, 1, nrow, ncol),
    # so take the first stress period and squeeze out singleton axes
    if parent_recharge.ndim >= 3:
        # 3D or 4D array: first index is the stress period
        parent_rech_values = np.squeeze(parent_recharge[0])
    else:
        # 2D array: [row, col]
        parent_rech_values = parent_recharge
    
    # Get active cells in parent model for statistics
    # Make sure we match the dimensions correctly
    if parent_ibound.ndim == 3:
        parent_active_mask = parent_ibound[0] == 1  # Use first layer
        print(f"Parent IBOUND shape: {parent_ibound.shape}, using layer 0")
    else:
        parent_active_mask = parent_ibound == 1
        print(f"Parent IBOUND shape: {parent_ibound.shape}")
    
    print(f"Parent recharge shape: {parent_rech_values.shape}")
    print(f"Parent active mask shape: {parent_active_mask.shape}")
    
    # Ensure dimensions match
    if parent_rech_values.shape != parent_active_mask.shape:
        print(f"Warning: Dimension mismatch between recharge {parent_rech_values.shape} and active mask {parent_active_mask.shape}")
        # If recharge is 1D (single value), expand it to match grid
        if parent_rech_values.ndim == 0 or (parent_rech_values.ndim == 1 and len(parent_rech_values) == 1):
            uniform_recharge = float(parent_rech_values) if parent_rech_values.ndim == 0 else parent_rech_values[0]
            print(f"Using scalar recharge value: {uniform_recharge:.6f} m/day")
        else:
            # Use fallback value
            uniform_recharge = 0.110 / 365.25  # 110 mm/year converted to m/day
            print(f"Dimension mismatch, using default recharge: {uniform_recharge:.6f} m/day")
    else:
        # Calculate representative recharge from active cells
        active_recharge_values = parent_rech_values[parent_active_mask]
        valid_recharge = active_recharge_values[active_recharge_values > 0]
        
        if len(valid_recharge) > 0:
            uniform_recharge = np.median(valid_recharge)
            print(f"Extracted recharge from parent model:")
            print(f"  Active cells with recharge: {len(valid_recharge):,}")
            print(f"  Recharge range: {valid_recharge.min():.6f} to {valid_recharge.max():.6f} m/day")
            print(f"  Median recharge: {uniform_recharge:.6f} m/day ({uniform_recharge*365.25*1000:.1f} mm/year)")
        else:
            # Fallback to typical values for Swiss conditions
            uniform_recharge = 0.110 / 365.25  # 110 mm/year converted to m/day
            print(f"No valid recharge found in parent model, using default: {uniform_recharge:.6f} m/day")
        
else:
    # No recharge package in parent model - use default
    uniform_recharge = 0.110 / 365.25  # m/day (110 mm/year)
    print(f"No RCH package in parent model, using default recharge: {uniform_recharge:.6f} m/day")

# Create uniform recharge array for submodel
sub_recharge_array = np.full((sub_modelgrid.nrow, sub_modelgrid.ncol), uniform_recharge)

# Create RCH package for submodel
sub_base_rch = flopy.modflow.ModflowRch(
    m_sub_base,
    rech=sub_recharge_array,
    nrchop=3  # Apply recharge to highest active cell
)

print(f"\nSubmodel RCH package created:")
print(f"  Uniform recharge rate: {uniform_recharge:.6f} m/day ({uniform_recharge*365.25*1000:.1f} mm/year)")
print(f"  Applied to grid: {sub_modelgrid.nrow} × {sub_modelgrid.ncol} cells")

# Visualize recharge distribution
fig, ax = plt.subplots(figsize=(10, 8))

pmv = flopy.plot.PlotMapView(modelgrid=submodel_grid, ax=ax)
im = pmv.plot_array(sub_recharge_array * 365.25 * 1000, cmap='Blues', alpha=0.7)  # Convert to mm/year for display
pmv.plot_grid(color='gray', alpha=0.3, linewidth=0.3)

# Plot wells for reference
wells_gdf.plot(ax=ax, color='red', markersize=60, label='Wells', zorder=5,
               edgecolors='white', linewidth=1)

cbar = plt.colorbar(im, ax=ax, shrink=0.3)
cbar.set_label('Recharge Rate (mm/year)')

ax.set_title(f'Uniform Recharge on Submodel Grid\n{uniform_recharge*365.25*1000:.1f} mm/year')
ax.set_xlabel('X Coordinate (m)')
ax.set_ylabel('Y Coordinate (m)')
ax.legend()
ax.set_aspect('equal')
plt.tight_layout()
plt.show()

print("✓ RCH package created with uniform recharge from parent model")

#### WEL package
Here, we reuse the well locations and rates loaded from the flow case study.

TODO: Update well rates in the cell below based on actual concessioned rates
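
The unit conversion and the even split of a concessioned rate across wells can be checked in isolation before running the cell below. This is a minimal sketch: the helper names `lps_to_m3_per_day` and `rate_per_well` and the count of four wells are illustrative assumptions, not part of the notebook. The sign convention (negative for extraction, positive for injection) follows the MODFLOW WEL package.

```python
SECONDS_PER_DAY = 86_400


def lps_to_m3_per_day(rate_lps: float) -> float:
    """Convert a rate in litres per second to cubic metres per day."""
    return rate_lps / 1000.0 * SECONDS_PER_DAY


def rate_per_well(total_m3d: float, n_wells: int, injection: bool) -> float:
    """Split a total rate evenly; WEL uses negative values for extraction."""
    if n_wells <= 0:
        raise ValueError("n_wells must be positive")
    sign = 1.0 if injection else -1.0
    return sign * total_m3d / n_wells


# 20% of 1400 l/s, the reduction used for the transport simulation
total = lps_to_m3_per_day(1400) * 0.2
print(f"Total rate: {total:,.0f} m3/day")

# Hypothetical well count, for illustration only
per_well = rate_per_well(total, n_wells=4, injection=False)
print(f"Per pumping well: {per_well:,.0f} m3/day")
```

Because the same total is applied to extraction and injection, the net extraction printed at the end of the WEL cell should be close to zero when every well type is mapped to active cells.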

In [None]:
# Get the well rates from the Zurich GIS browser (re-use the rates from your 
# group's flow case study)

# TODO: Update well rates based on actual concessioned rates 
# Hint: To simplify the transport simulation, we can reduce the total pumping 
# rates. This will reduce the flow velocities in the cells near the wells, 
# which helps to reduce numerical dispersion issues in MT3DMS.
well_rates_m3d = 1400 / 1000 * 86400  # 1400 l/s converted to m3/day
# Reduce to 20% for transport simulation
well_rates_m3d *= 0.2
print(f"Well rate used for transport: {round(well_rates_m3d):,} m³/day")

# Map wells to submodel grid cells
from flopy.utils.gridintersect import GridIntersect
from scipy.spatial import cKDTree

# Create GridIntersect object for the submodel
gi = GridIntersect(submodel_grid, method='vertex', rtree=True)

# Also prepare KDTree for fallback nearest-cell lookup
xc_sub = submodel_grid.xcellcenters
yc_sub = submodel_grid.ycellcenters
centers_flat = np.column_stack([xc_sub.ravel(), yc_sub.ravel()])
kdtree = cKDTree(centers_flat)

well_cells = []
for idx, well in wells_gdf.iterrows():
    well_x, well_y = well.geometry.x, well.geometry.y
    
    # Try GridIntersect first
    try:
        result = gi.intersect(Point(well_x, well_y))
        if len(result) > 0:
            # Extract row, col from intersection result
            if hasattr(result, 'iloc'):  # DataFrame
                row = int(result.iloc[0]['row'])  
                col = int(result.iloc[0]['col'])
            else:  # Other formats
                row = int(result[0]['row'])
                col = int(result[0]['col'])
        else:
            raise ValueError("No intersection found")
    except Exception:
        # Fallback to nearest cell center
        dist, idx_flat = kdtree.query([well_x, well_y])
        row, col = np.unravel_index(idx_flat, xc_sub.shape)
    
    # Check if cell is active (not CHD)
    if submodel_ibound_clipped[0, row, col] == 1:  # Active cell
        well_cells.append({
            'well_idx': idx,
            'well_id': well.get('GWR_ID', f'well_{idx}'),
            'x': well_x,
            'y': well_y,
            'layer': 0,
            'row': row, 
            'col': col,
            'fassart': well.get('FASSART', 'Unknown')
        })
        print(f"Well {well.get('GWR_ID', idx)}: mapped to cell (L0, R{row}, C{col})")
    else:
        print(f"Warning: Well {well.get('GWR_ID', idx)} mapped to CHD cell (L0, R{row}, C{col}) - skipping")

print(f"\nMapped {len(well_cells)} wells to active submodel cells")

# Count wells per FASSART type
from collections import Counter
fassart_counts = Counter(w['fassart'] for w in well_cells)
print(f"\nWells per FASSART type:")
for fassart, count in fassart_counts.items():
    print(f"  {fassart}: {count} wells")

# Define pumping rates based on scenario
# Divide the total concessioned rate by the number of wells for each FASSART
pumping_rates = {}
for fassart, count in fassart_counts.items():
    if 'Entnahme' in fassart:
        # Pumping wells - negative rate, divided by number of wells
        pumping_rates[fassart] = -well_rates_m3d / count
    elif 'Rückgabe' in fassart or 'Sickergalerie' in fassart:
        # Injection/infiltration wells - positive rate, divided by number of wells
        pumping_rates[fassart] = +well_rates_m3d / count
    elif 'Sickergalierie' in fassart:
        # Common misspelling handling
        pumping_rates[fassart] = +well_rates_m3d / count
    else:
        # Default for unknown types
        pumping_rates[fassart] = -100 / count

print(f"\nPumping rates per well (divided by number of wells per FASSART):")
for fassart, rate in pumping_rates.items():
    print(f"  {fassart}: {rate:.1f} m³/day per well")

# Create WEL stress period data
wel_data = []
for well_info in well_cells:
    fassart = well_info['fassart']
    
    # Get the rate for this FASSART type (already divided by number of wells)
    rate = pumping_rates.get(fassart, -100)
    
    wel_data.append([well_info['layer'], well_info['row'], well_info['col'], rate])
    print(f"  {well_info['well_id']} ({fassart}): {rate:.1f} m³/day at (L{well_info['layer']}, R{well_info['row']}, C{well_info['col']})")

# Create WEL package
# Wells operate at the same rate for the whole simulation. Data given for
# period 0 is reused by MODFLOW for any later stress period, so this works
# with one or several periods.
wel = flopy.modflow.ModflowWel(
    model=m_sub_base,
    stress_period_data={0: wel_data},
    ipakcb=53
)

print(f"\nWEL package created with {len(wel_data)} wells for {nper} stress periods")
total_pumping = sum(rate for _, _, _, rate in wel_data if rate < 0)
total_injection = sum(rate for _, _, _, rate in wel_data if rate > 0)
print(f"  Total pumping: {total_pumping:,.0f} m³/day")
print(f"  Total injection: {total_injection:,.0f} m³/day")
print(f"  Net extraction: {total_pumping + total_injection:,.0f} m³/day")

#### Stress periods and DIS update

In [None]:
# Define stress periods for BOTH MODFLOW and MT3D (must match!)
# This is needed because MT3D needs to know the temporal discretization
source_config = scenario_config['source']
source_time_days = source_config['duration_days']
simulation_time_days = scenario_config['simulation']['duration_days']

# WORKAROUND for SIGILL error: Use SINGLE stress period instead of two
# MT3D-USGS can still handle time-varying sources via SSM package
print("="*70)
print("USING SINGLE STRESS PERIOD (WORKAROUND FOR SIGILL ERROR)")
print("="*70)
nper = 1
total_days = source_time_days + simulation_time_days
perlen = [total_days]
nstp = [int(total_days)]  # One time step per day
steady = [False]  # Transient for transport
tsmult = [1.0]  # No time step multiplication for single period

print(f"\nStress period configuration (for MODFLOW and MT3D):")
print(f"  Period 0 (entire simulation): {total_days} days, {nstp[0]} timesteps")
print(f"  Total duration: {total_days} days ({total_days/365:.2f} years)")
print(f"  Steady-state: {steady}")
print(f"\nNote: SSM will handle time-varying source:")
print(f"  - Source active: days 0-{source_time_days}")
print(f"  - Source OFF: days {source_time_days}-{total_days}")
print("="*70)

# Update the MODFLOW DIS package to match transport periods
print("\nUpdating MODFLOW DIS package to match transport simulation...")

# IMPORTANT: Save grid parameters BEFORE removing DIS package
# (these attributes come from DIS and will be lost after removal)
old_dis = m_sub_base.dis
nlay_save = old_dis.nlay
nrow_save = old_dis.nrow
ncol_save = old_dis.ncol
delr_save = old_dis.delr.array.copy()
delc_save = old_dis.delc.array.copy()
top_save = old_dis.top.array.copy()
botm_save = old_dis.botm.array.copy()

print(f"  Saved grid parameters: {nlay_save} layers, {nrow_save} rows, {ncol_save} cols")

# Remove old DIS package
m_sub_base.remove_package('DIS')

# Create new DIS package with transport-compatible stress periods
sub_base_dis = flopy.modflow.ModflowDis(
    m_sub_base,
    nlay=nlay_save,
    nrow=nrow_save,
    ncol=ncol_save,
    delr=delr_save,
    delc=delc_save,
    top=top_save,
    botm=botm_save,
    nper=nper,
    perlen=perlen,
    nstp=nstp,
    steady=steady,
    tsmult=tsmult,
    itmuni=4,  # Time units: days
    lenuni=2   # Length units: meters
)

print(f"\n✓ MODFLOW DIS package updated")
print(f"  Number of stress periods: {nper}")
print(f"  Period lengths: {perlen}")
print(f"  Timesteps per period: {nstp}")
print(f"  Steady-state: {steady}")

#### NWT solver and output control

**CRITICAL:** MODFLOW and MT3D must have the same stress period structure. The DIS package was updated above to a single transient stress period that matches the transport scenario; the solver and output control defined here complete the flow model.

In [None]:
# NWT Solver
nwt = flopy.modflow.ModflowNwt(
    m_sub_base,
    headtol=0.01,      # Head tolerance
    fluxtol=5.0,       # Flux tolerance  
    maxiterout=100,    # Maximum outer iterations
    thickfact=1e-05,   # Thickness factor for dry cells
    linmeth=1,         # Linear solution method (1=GMRES, 2=XMD)
    iprnwt=1,          # Print flag
    ibotav=0,          # Bottom averaging flag
    options='COMPLEX'  # Use complex option for difficult problems
)

# Output Control
# Note: the flow-transport link file read by MT3D-USGS is written by the LMT
# package (next cell), not by Output Control. OC only controls which heads
# and budgets are saved for post-processing and quality checks.

print("Configuring Output Control...")
oc_stress_period_data = {}

# Save every 10 timesteps to reduce file size while keeping enough temporal
# resolution for head plots and budget checks
save_interval = 10

for per in range(nper):
    # Always save first and last timestep of each period
    timesteps_to_save = [0, nstp[per]-1]
    
    # Add intermediate saves every save_interval timesteps
    for stp in range(save_interval, nstp[per]-1, save_interval):
        timesteps_to_save.append(stp)
    
    # Create OC entries
    for stp in timesteps_to_save:
        oc_stress_period_data[(per, stp)] = ['save head', 'save budget']
    
    print(f"  Period {per}: Saving {len([s for p, s in oc_stress_period_data.keys() if p == per])} timesteps out of {nstp[per]}")

oc = flopy.modflow.ModflowOc(
    m_sub_base,
    stress_period_data=oc_stress_period_data,
    compact=True
)

print(f"\n✓ Output Control configured")
print(f"  Total save points: {len(oc_stress_period_data)}")
print(f"  Heads and budgets are saved at these points for post-processing")

#### Link-MT3D (LMT) Package

**CRITICAL:** The LMT package tells MODFLOW to write the `mt3d_link.ftl` file that MT3D-USGS needs.

Without this package, MODFLOW will NOT create the link file, even if Output Control is properly configured.

In [None]:
# Create LMT package - this is what actually triggers creation of mt3d_link.ftl
# Must be created BEFORE MT3D model object is created
lmt = flopy.modflow.ModflowLmt(
    m_sub_base,
    output_file_name='mt3d_link.ftl',
    output_file_unit=54,
    output_file_header='extended',
    output_file_format='formatted', 
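    # package_flows only applies to SFR, LAK and UZF flows; RCH, CHD and WEL
    # flows are always written to the link file, so no entries are needed here
    package_flows=[]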
)

print("✓ LMT (Link-MT3D) package created")
print(f"  Output file: mt3d_link.ftl")
print(f"  This package tells MODFLOW to write flow data for MT3D-USGS")

### Run Submodel Flow Simulation

Run the flow simulation on the refined submodel (a single transient stress period with constant stresses).

In [None]:
# Write input files and run the submodel
print("Writing submodel input files...")
m_sub_base.write_input()

# Check model setup
print("\nChecking submodel setup...")
chk = m_sub_base.check(f=None, verbose=False)
if chk.summary_array is not None and len(chk.summary_array) > 0:
    print("Model check warnings found - reviewing...")
    for warning in chk.summary_array:
        print(f"  Warning: {warning}")
else:
    print("Model check passed")

# Run the submodel
print(f"\nRunning refined submodel...")
success, buff = m_sub_base.run_model(silent=False, report=True)

### Verify Submodel Flow Results

Check that submodel flow field is reasonable and consistent with parent model.
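
The cell below reads only the first `PERCENT DISCREPANCY` line in the listing file. With several hundred time steps it can be useful to collect all of them and report the worst one. The following is a minimal sketch under stated assumptions: the function name `percent_discrepancies` is hypothetical, and the sample text only imitates the two-column layout (cumulative on the left, current time step on the right) of a MODFLOW budget summary; it is not output from this model.

```python
import re

# Matches the number that follows each "PERCENT DISCREPANCY =" label
_PATTERN = re.compile(
    r"PERCENT DISCREPANCY\s*=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
)


def percent_discrepancies(listing_text: str) -> list[tuple[float, float]]:
    """Return (cumulative, time-step) percent discrepancies per budget block."""
    pairs = []
    for line in listing_text.splitlines():
        values = [float(v) for v in _PATTERN.findall(line)]
        if len(values) == 2:
            pairs.append((values[0], values[1]))
    return pairs


# Illustrative text only, not real model output
sample = """\
            IN - OUT =       1.2500                IN - OUT =       0.0125
 PERCENT DISCREPANCY =           0.01     PERCENT DISCREPANCY =           0.02
            IN - OUT =      -3.5000                IN - OUT =      -0.0350
 PERCENT DISCREPANCY =          -0.03     PERCENT DISCREPANCY =          -1.50
"""

pairs = percent_discrepancies(sample)
worst_step = max(abs(step) for _, step in pairs)
print(f"Budget blocks found: {len(pairs)}")
print(f"Worst time-step discrepancy: {worst_step:.2f}%")
```

In the notebook, the text would come from reading the `.list` file in the submodel workspace; a worst value above 1% points to a time step worth inspecting.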

In [None]:
# Quality checks and verification
print("\n=== Model Quality Checks ===")

# 1. Check mass balance error
listing_file = os.path.join(sub_base_ws, f"{m_sub_base.name}.list")
if os.path.exists(listing_file):
    with open(listing_file, 'r') as f:
        lines = f.readlines()
        for i, line in enumerate(lines):
            if 'PERCENT DISCREPANCY' in line:
                print(f"✓ {line.strip()}")
                # Extract percent value
                try:
                    percent = float(line.split('=')[-1].strip())
                    if abs(percent) < 1.0:
                        print(f"  Mass balance OK: {abs(percent):.3f}% < 1%")
                    else:
                        print(f"  WARNING: Mass balance error {abs(percent):.3f}% exceeds 1%")
                except ValueError:
                    pass
                break
else:
    print("  Warning: Listing file not found")

# 2. Check for dry cells or convergence issues
if success:
    print("✓ Model converged successfully")
else:
    print("✗ Model failed to converge")

# Load heads for visualization
hds_path = os.path.join(sub_base_ws, f"{m_sub_base.name}.hds")
hds = flopy.utils.HeadFile(hds_path)
heads = hds.get_data()

# Check for dry cells
dry_cells = np.sum(heads < -1e10)
if dry_cells == 0:
    print(f"✓ No dry cells detected")
else:
    print(f"✗ Warning: {dry_cells} dry cells detected")

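# Exclude inactive cells (filled with HNOFLO) and dry cells from the statistics
valid_heads = heads[(m_sub_base.bas6.ibound.array != 0) & (heads > -1e10)]
print(f"\nHead statistics:")
print(f"  Min: {valid_heads.min():.2f} m")
print(f"  Max: {valid_heads.max():.2f} m")
print(f"  Mean: {valid_heads.mean():.2f} m")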

# 3. Visualizations
print("\n=== Creating Visualizations ===")

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Head contours with flow vectors
ax1 = axes[0]
pmv = flopy.plot.PlotMapView(model=m_sub_base, ax=ax1, layer=0)
pmv.plot_grid(color='gray', alpha=0.2, linewidth=0.3)

# Plot heads
heads_masked = np.ma.masked_where(m_sub_base.bas6.ibound.array[0] <= 0, heads[0])
im1 = pmv.plot_array(heads_masked, alpha=0.6, cmap='Blues')
plt.colorbar(im1, ax=ax1, label='Head (m)', shrink=0.8)

# Add contours
contour_levels = np.linspace(heads_masked.min(), heads_masked.max(), 10)
cs = pmv.contour_array(heads_masked, levels=contour_levels, colors='black', linewidths=0.5, alpha=0.5)
ax1.clabel(cs, inline=True, fontsize=8, fmt='%.1f')

ax1.set_xlabel('X (m)')
ax1.set_ylabel('Y (m)')
ax1.set_title('Hydraulic Head')
ax1.set_aspect('equal')

# Plot 2: CHD boundary analysis - comparing specified vs simulated vs parent
ax2 = axes[1]

# Extract CHD boundary cells
chd_package = m_sub_base.chd
chd_data = chd_package.stress_period_data[0]

specified_chd_heads = []  # What we specified in CHD package
simulated_sub_heads = []  # What the submodel computed
parent_heads_at_boundary = []  # Parent model heads at same location

# Load parent heads
parent_hds_path = os.path.join(parent_workspace, f"{m_parent_base.name}.hds")
headobj_parent = flopy.utils.HeadFile(parent_hds_path)
parent_heads_data = headobj_parent.get_data()

# Iterate through CHD cells
for idx, chd_cell in enumerate(chd_data):
    lay, row, col, start_head, end_head = chd_cell
    
    # Get simulated submodel head at this CHD location
    sub_head = heads[int(lay), int(row), int(col)]
    
    # Only use valid heads (> 0 to exclude inactive cells)
    if sub_head > 0 and start_head > 0:
        # Get corresponding parent head
        x = m_sub_base.modelgrid.xcellcenters[int(row), int(col)]
        y = m_sub_base.modelgrid.ycellcenters[int(row), int(col)]
        
        try:
            p_row, p_col = parent_modelgrid.intersect(x, y)
            if (0 <= p_row < parent_heads_data.shape[1] and 
                0 <= p_col < parent_heads_data.shape[2]):
                parent_head = parent_heads_data[0, p_row, p_col]
                # Only include if parent head is also valid (> 0)
                if parent_head > 0:
                    specified_chd_heads.append(start_head)
                    simulated_sub_heads.append(sub_head)
                    parent_heads_at_boundary.append(parent_head)
        except Exception:
            pass

# Plot comparison: Specified CHD vs Parent heads
if len(specified_chd_heads) > 0:
    ax2.scatter(parent_heads_at_boundary, specified_chd_heads, 
                alpha=0.4, s=15, c='green', label='Specified CHD vs Parent')
    #ax2.scatter(parent_heads_at_boundary, simulated_sub_heads, 
    #            alpha=0.4, s=15, c='blue', label='Simulated Sub vs Parent')
    
    # Add 1:1 line
    all_heads = specified_chd_heads + simulated_sub_heads + parent_heads_at_boundary
    min_val = min(all_heads)
    max_val = max(all_heads)
    ax2.plot([min_val, max_val], [min_val, max_val], 'r--', label='1:1 line', linewidth=2)
    
    # Calculate differences
    chd_parent_diff = np.array(specified_chd_heads) - np.array(parent_heads_at_boundary)
    sim_parent_diff = np.array(simulated_sub_heads) - np.array(parent_heads_at_boundary)
    
    rmse_specified = np.sqrt(np.mean(chd_parent_diff**2))
    rmse_simulated = np.sqrt(np.mean(sim_parent_diff**2))
    
    info_text = f'Valid CHD cells: {len(specified_chd_heads)}\n'
    info_text += f'Specified CHD RMSE: {rmse_specified:.3f} m\n'
    info_text += f'Simulated Sub RMSE: {rmse_simulated:.3f} m\n'
    info_text += f'Max |diff| (CHD-Parent): {np.abs(chd_parent_diff).max():.2f} m\n'
    info_text += f'Max |diff| (Sim-Parent): {np.abs(sim_parent_diff).max():.2f} m'
    
    ax2.text(0.02, 0.98, info_text, transform=ax2.transAxes, 
             verticalalignment='top', fontsize=9,
             bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
    
    ax2.set_xlabel('Parent Model Head (m)')
    ax2.set_ylabel('Submodel Head (m)')
    ax2.set_title('CHD Boundary Analysis\n(Green: Specified CHD vs Parent)')
    ax2.legend(loc='lower right', fontsize=9)
    ax2.grid(True, alpha=0.3)
    ax2.set_aspect('equal')
    
    print(f"\n✓ CHD Boundary Analysis:")
    print(f"  - Valid CHD cells: {len(specified_chd_heads)} (from {len(chd_data)} total)")
    print(f"  - Specified CHD vs Parent RMSE: {rmse_specified:.3f} m")
    print(f"  - Simulated vs Parent RMSE: {rmse_simulated:.3f} m")
    print(f"  - Mean difference (Specified CHD - Parent): {chd_parent_diff.mean():.3f} m")
    
    if rmse_specified > 1.0:
        print(f"\n  ⚠ NOTE: CHD boundary heads differ from parent model by ~{rmse_specified:.1f}m")
        print(f"    This is due to k-nearest neighbor interpolation (k=5) with grid resolution mismatch:")
        print(f"    - Parent grid: ~50m cells")
        print(f"    - Submodel grid: 5m cells at rotated boundary")
        print(f"    Impact on transport: Boundary effects decay within ~2-3 parent cell widths (~100-150m)")
        print(f"    Since transport focuses on well field area (interior), this is acceptable for this exercise.")
else:
    ax2.text(0.5, 0.5, 'No boundary data available', 
             ha='center', va='center', transform=ax2.transAxes)

plt.tight_layout()
plt.show()

print("\n✓ Quality checks complete. Ready for transport modeling.")

---
## 8. Create MT3D-USGS Transport Model

### IMPORTANT: Single Stress Period Configuration

**Workaround for SIGILL error:** We're using a **single stress period** (760 days total) instead of two separate periods. This avoids the crash that occurs when MODFLOW-NWT transitions between stress periods on certain Mac architectures.

**Source implementation:** For a pulse source with a single stress period, we use **initial concentration** at the source location rather than time-varying SSM. The contaminant is placed at the source cell at t=0 and then naturally disperses and advects away.
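
With an initial-concentration source, the released mass is fixed by the concentration, the porosity, and the saturated volume of the source cell. It is worth comparing that mass with the spill you intend to represent. A minimal sketch, assuming a 5 m × 5 m cell, a porosity of 0.25, and hypothetical values for the concentration (100 mg/L) and the saturated thickness (10 m); the function name is illustrative:

```python
def initial_pulse_mass_kg(conc_mg_per_l, porosity, dx_m, dy_m, sat_thickness_m):
    """Dissolved mass (kg) held in one cell at the given initial concentration."""
    conc_kg_per_m3 = conc_mg_per_l / 1000.0  # 1 mg/L = 1 g/m³ = 0.001 kg/m³
    pore_volume_m3 = porosity * dx_m * dy_m * sat_thickness_m
    return conc_kg_per_m3 * pore_volume_m3


# Hypothetical concentration and thickness, for illustration only
mass = initial_pulse_mass_kg(
    conc_mg_per_l=100.0, porosity=0.25, dx_m=5.0, dy_m=5.0, sat_thickness_m=10.0
)
print(f"Mass released by the initial-concentration pulse: {mass:.2f} kg")
```

If the mass is much smaller than a 30-day release would deliver, the initial-concentration approach underestimates the plume and the source concentration or the number of source cells may need adjusting.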

**Key parameters:**
- `nper = 1` (single stress period)
- `perlen = [760]` (total simulation time)
- `nstp = [760]` (one timestep per day)
- Source implemented via initial concentration in `sconc_array`
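
The parameters listed above can be derived from the scenario durations and checked for consistency before they are passed to both MODFLOW and MT3D. This is a minimal sketch; `single_period_config` and `check_periods` are hypothetical helper names, and the durations (30 days of source plus 730 days afterwards) are chosen to reproduce the 760-day total.

```python
def single_period_config(source_days, post_source_days, step_days=1.0):
    """Build nper/perlen/nstp/tsmult for one stress period covering the run."""
    total_days = source_days + post_source_days
    nstp = max(1, round(total_days / step_days))
    return {"nper": 1, "perlen": [total_days], "nstp": [nstp], "tsmult": [1.0]}


def check_periods(cfg):
    """Raise if any per-period list does not have exactly nper entries."""
    for key in ("perlen", "nstp", "tsmult"):
        if len(cfg[key]) != cfg["nper"]:
            raise ValueError(f"{key} has {len(cfg[key])} entries, expected {cfg['nper']}")


cfg = single_period_config(source_days=30, post_source_days=730)
check_periods(cfg)
print(cfg)
```

Passing the same dictionary values to the DIS and BTN packages keeps the two models in step.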

In [None]:
# Create MT3D-USGS model linked to the MODFLOW submodel
print("Creating MT3D-USGS transport model...")

# Note: Stress periods (nper, perlen, nstp, tsmult) were already defined above
# and used to update the MODFLOW DIS package. They must match!

print(f"Transport simulation structure (matching MODFLOW):")
print(f"  Single stress period: {perlen[0]} days, {nstp[0]} timesteps")
print(f"  Total duration: {sum(perlen)} days ({sum(perlen)/365:.2f} years)")
print(f"  Source active: days 0-{source_time_days}")
print(f"  Source OFF: days {source_time_days}-{total_days}")

# Create MT3D model object
mt = flopy.mt3d.Mt3dms(
    modelname=parent_base_namefile.replace('nwt.nam', 'mt3d'),
    modflowmodel=m_sub_base,
    model_ws=sub_base_ws,
    version='mt3d-usgs',
    exe_name='mt3dusgs'
)

print(f"\n✓ MT3D-USGS model created")
print(f"  Model name: {mt.name}")
print(f"  Workspace: {mt.model_ws}")
print(f"  Version: mt3d-usgs")
print(f"  Grid dimensions: {mt.nrow} rows × {mt.ncol} cols × {mt.nlay} layers")
print(f"  Total cells: {mt.nrow * mt.ncol * mt.nlay:,}")
print(f"\nStress periods automatically inherited from MODFLOW:")
print(f"  nper = {mt.nper}")
# Note: MT3D doesn't have a dis attribute - discretization comes from MODFLOW
print(f"  perlen = {perlen}")
print(f"  nstp = {nstp}")

### Basic Transport Package (BTN)

Set up the BTN package with porosity, initial conditions, and time stepping.

**Key parameters:**
- `prsity`: Effective porosity (0.25 for sand/gravel)
- `sconc`: Initial concentration (zero everywhere except the source cell)
- `icbund`: Active transport cells (1=active, -1=constant, 0=inactive)
- `nprs`: Number of times to save concentration output
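
The time-stepping choice in BTN can be sanity-checked with the Courant criterion: the seepage velocity is $v = q/n$, so a Courant number of at most 1 requires $\Delta t \le \Delta x \, n / q$. A minimal sketch, assuming a Darcy flux of about 1 m/day near the wells (an order-of-magnitude assumption) and the 5 m cells and porosity of 0.25 used in this notebook; the function name is illustrative:

```python
def max_transport_step(dx_m, porosity, darcy_flux_m_per_day, courant=1.0):
    """Largest time step (days) that keeps the Courant number at or below `courant`."""
    if darcy_flux_m_per_day <= 0:
        raise ValueError("darcy_flux_m_per_day must be positive")
    seepage_velocity = darcy_flux_m_per_day / porosity  # v = q / n
    return courant * dx_m / seepage_velocity


dt_max = max_transport_step(dx_m=5.0, porosity=0.25, darcy_flux_m_per_day=1.0)
print(f"Courant-limited transport step: {dt_max:.2f} days")
```

Velocities are highest next to the pumping wells, so the flux there controls the limit for the whole grid.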

In [None]:
# Set up Basic Transport Package (BTN)
print("Setting up BTN package...")

# Get transport parameters from config
transport_params = scenario_config['transport']
porosity = float(transport_params['porosity'])  # Ensure it's a scalar float
print(f"Transport parameters:")
print(f"  Porosity: {porosity}")

# Get IBOUND from MODFLOW model to create ICBUND
ibound = m_sub_base.bas6.ibound.array

# Define the porosity array
porosity_array = np.full_like(ibound, porosity, dtype=float)

# Create ICBUND array (active transport cells) with same shape as IBOUND
# 1 = active transport cell (concentration can vary)
# 0 = inactive (no transport)
# -1 = constant concentration cell (for continuous sources only)
icbund = np.ones_like(ibound, dtype=int)

# Inactive transport cells where flow is inactive (ibound <= 0)
icbund[ibound <= 0] = 0

# Plot ICBUND
'''
fig, ax = plt.subplots(figsize=(8, 6))
pmv = flopy.plot.PlotMapView(model=m_sub_base, ax=ax, layer=0)
im = pmv.plot_array(icbund[0], cmap='viridis', alpha=0.7)
pmv.plot_grid(color='gray', alpha=0.3, linewidth=0.3)
cbar = plt.colorbar(im, ax=ax, shrink=0.3)
cbar.set_label('ICBUND Values')
ax.set_title('Transport ICBUND Array (Layer 0)')
ax.set_xlabel('X Coordinate (m)')
ax.set_ylabel('Y Coordinate (m)')
ax.set_aspect('equal')
plt.tight_layout()
plt.show()
'''

# For PULSE source: Keep source cell as ACTIVE (icbund = 1)
# The pulse is introduced below as an initial concentration at the source cell
# DO NOT set icbund = -1 for pulse sources (that's for continuous sources)

print(f"\nTransport grid:")
print(f"  Active transport cells: {np.sum(icbund == 1):,}")
print(f"  Inactive transport cells: {np.sum(icbund == 0):,}")
print(f"  Constant concentration cells: {np.sum(icbund == -1):,}")

# Initial concentration (kg/m³): zero everywhere; for the INSTANTANEOUS PULSE
# the source cell is overwritten below
sconc = 0.0
sconc_array = np.full_like(ibound, sconc, dtype=float)

# Get source location from config
source_location = source_config['location']
source_easting_relative = source_location['easting']
source_northing_relative = source_location['northing']

# Calculate source position in submodel
# The configured easting/northing are offsets relative to the first mapped well
first_well = well_cells[0]
source_x = first_well['x'] + source_easting_relative
source_y = first_well['y'] + source_northing_relative
source_gdf = gpd.GeoDataFrame(geometry=[Point(source_x, source_y)], crs=wells_gdf.crs)

# Get source cell indices
source_row, source_col = sub_modelgrid.intersect(source_x, source_y)
source_layer = 0  # Assuming source is in top layer

# Set initial concentration at source cell
source_concentration_mgL = float(source_config['concentration_mg_L'])  # mg/L
source_concentration_kgm3 = source_concentration_mgL / 1000.0  # Convert to kg/m³
sconc_array[source_layer, source_row, source_col] = source_concentration_kgm3  

print(f"\nInitial conditions (INSTANTANEOUS PULSE):")
print(f"  Source cell: Layer {source_layer}, Row {source_row}, Col {source_col}")
print(f"  Source location: ({source_x:.1f}, {source_y:.1f})")
print(f"  Initial concentration at source: {source_concentration_mgL} mg/L = {source_concentration_kgm3:.6f} kg/m³")
print(f"  Initial concentration elsewhere: 0.0 kg/m³")

# Set the transport time step
# Rule of thumb: Courant number ≤ 1, so Δt ≤ Δx·n/q (q = Darcy flux)
# With 5 m cells, n = 0.25, q ≈ 1 m/day → Δt ≤ 1.25 days
# dt0 = 5 days exceeds this estimate; a transport step cannot be longer than
# the flow time step (1 day here), so the effective step is at most 1 day
dt0 = 5.0  # Initial transport time step (days)
# Specify the output times (accumulated), here, we choose to save after every 
# 30 days.  
timprs = scenario_config['simulation']['output_times_days']
nprs = len(timprs)  # Number of output times (NPRS > 0: save at the times in timprs)
print(f"  Transport output times (days): {timprs}")

# Create BTN package
btn = flopy.mt3d.Mt3dBtn(
    mt,
    ncomp=1,           # Number of species (1 for single contaminant)
    mcomp=1,           # Number of mobile species
    tunit='D',         # Time unit: Days
    lunit='M',         # Length unit: Meters
    munit='KG',        # Mass unit: Kilograms
    prsity=porosity_array,   # Effective porosity (array)
    icbund=icbund,     # Active transport cells
    sconc=sconc_array,       # Initial concentration
    dt0=dt0,           # Initial transport time step
    nprs=nprs,         # Save output at nprs times
    timprs=timprs,     # Specific output times
    nper=nper,         # Number of stress periods
    perlen=perlen,     # Length of each stress period
    nstp=nstp,         # Number of time steps in each period
    tsmult=tsmult,     # Time step multiplier
    chkmas=True,      # Check mass balance
    nprmas=30,        # Print mass balance every nprmas steps
    obs=None           # No observation points for now
)

print(f"\n✓ BTN package created")
print(f"  Number of stress periods: {nper}")
print(f"  Stress period lengths: {perlen}")
print(f"  Total simulation time: {sum(perlen):.0f} days ({sum(perlen)/365:.2f} years)")
print(f"  Initial transport time step: {dt0} days")
print(f"  Concentration outputs: {len(timprs)} time points")
print(f"  Source type: PULSE (initial concentration at the source cell)")

### Advection Package (ADV)

Configure the advection scheme. Recommended: TVD (Total Variation Diminishing) for accuracy and stability.

**Options for mixelm:**
- `mixelm=0`: Standard finite-difference method 
- `mixelm=1`: MOC (Forward tracking method of characteristics)
- `mixelm=2`: MMOC (Modified MOC)
- `mixelm=3`: HMOC (Hybrid MOC)
- `mixelm=-1`: TVD - **recommended** (accurate, stable) 

In [None]:
# NOTE: mixelm=3 selects HMOC, not TVD. Set mixelm=-1 to use the TVD scheme recommended above.
# nadvfd=1 selects upstream weighting; it only applies to the finite-difference scheme (mixelm=0).
adv = flopy.mt3d.Mt3dAdv(mt, mixelm=3, nadvfd=1)

### Dispersion Package (DSP)

Set dispersivity values from configuration.

**Typical values:**
- Longitudinal dispersivity (αL): 10 m (scale-dependent, ~10% of travel distance)
- Transverse dispersivity (αT): 1 m (typically αL/10)
- Vertical dispersivity (αV): 0.1 m (typically αL/100)

**Molecular diffusion** is usually negligible compared to mechanical dispersion.
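
The DSP package takes the transverse and vertical dispersivities as ratios to αL rather than as absolute values. A quick check with the typical values listed above (used here as illustrative numbers, not the project configuration; the velocity is also assumed):

```python
# Illustrative values from the "Typical values" list (assumed, not read from the config)
alpha_L = 10.0   # longitudinal dispersivity (m)
alpha_T = 1.0    # horizontal transverse dispersivity (m)
alpha_V = 0.1    # vertical transverse dispersivity (m)

# The DSP package expects ratios relative to alpha_L
trpt = alpha_T / alpha_L   # horizontal transverse ratio
trpv = alpha_V / alpha_L   # vertical transverse ratio

# Mechanical dispersion coefficient along the flow direction, D_L = alpha_L * v
v = 1.0                    # assumed seepage velocity (m/day)
D_L = alpha_L * v          # m²/day

print(f"trpt = {trpt:.3f}, trpv = {trpv:.3f}, D_L = {D_L:.1f} m²/day")
```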

In [None]:
# TODO: Check DSP package
# Load dispersivity values from config
# al = longitudinal dispersivity
# trpt = ratio of transverse to longitudinal (αT/αL)
# trpv = ratio of vertical to longitudinal (αV/αL)
aL = transport_params['longitudinal_dispersivity_m']
aT = transport_params['transverse_dispersivity_m']
aV = transport_params['vertical_dispersivity_m']
trpt = aT / aL
trpv = aV / aL
dsp = flopy.mt3d.Mt3dDsp(mt, al=aL, trpt=trpt, trpv=trpv)

### Reaction Package (RCT) - If Applicable

For Group 0, TCE is treated as a **conservative tracer** (no sorption or decay), so RCT is not needed.

**For other groups with reactions:**
- **Sorption**: `isothm=1` (linear), `sp1=Kd` (distribution coefficient)
- **Decay**: `rc1=λ` (first-order decay constant, 1/day)

**Note**: Groups 3, 4, 7, 8 will need this package.
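
For groups that need RCT, two quantities are worth computing before filling in the package: the retardation factor implied by linear sorption, R = 1 + ρb·Kd/n, and the first-order rate constant from a half-life, λ = ln(2)/t½. The sketch below uses made-up parameter values and hypothetical function names purely for illustration; take the real values from your group's configuration.

```python
import math

def retardation_factor(bulk_density, kd, porosity):
    """R = 1 + rho_b * Kd / n for linear equilibrium sorption."""
    return 1.0 + bulk_density * kd / porosity

def decay_constant(half_life_days):
    """First-order rate constant (1/day) from a half-life in days."""
    return math.log(2.0) / half_life_days

# Hypothetical example values (not from the case study configuration)
rho_b = 1600.0     # bulk density (kg/m³)
kd = 1.0e-4        # distribution coefficient (m³/kg)
n = 0.25           # effective porosity (-)
half_life = 365.0  # half-life (days)

R = retardation_factor(rho_b, kd, n)
lam = decay_constant(half_life)
print(f"R = {R:.2f}  (solute front moves at v/R)")
print(f"lambda = {lam:.5f} 1/day")
```

Units must be consistent with the BTN settings (kg, m, days), so Kd is in m³/kg and ρb in kg/m³.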

In [None]:
# TODO: Check if RCT package is needed & set up accordingly
# For Group 0: No RCT package needed (conservative tracer)

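# For groups with reactions (example; bulk_density is a placeholder name for rhob,
# which is required for sorption; ireact=1 activates first-order decay;
# igetsc=0 means no initial sorbed concentration is read):
# rct = flopy.mt3d.Mt3dRct(mt, isothm=1, ireact=1, igetsc=0, rhob=bulk_density,
#                          sp1=Kd, rc1=lambda_decay)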

### GCG Solver Package

Configure the Generalized Conjugate Gradient solver for MT3D.

In [None]:
# TODO: Create GCG package
gcg = flopy.mt3d.Mt3dGcg(mt, mxiter=50, iter1=50, cclose=1e-3)

---
## 9. Define Source Term (SSM Package)

### Map Source Location to Grid

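Convert the source location to submodel grid indices (layer, row, column). In the configuration the source position is given as an easting/northing offset in meters from the first well, so it is first converted to absolute coordinates (Swiss LV03/LV95) and then mapped to the grid.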
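
The mapping from a coordinate to a cell index can be sketched for the simplest case, an unrotated grid with uniform square cells. The function name, the origin convention (upper-left corner), and the numbers are assumptions for illustration; in the notebook the model grid's own `intersect` method does this job.

```python
import math

def xy_to_row_col(x, y, x_min, y_max, cell_size, nrow, ncol):
    """Return (row, col) of the cell containing (x, y), or None if outside.

    Assumes an unrotated grid with uniform square cells whose upper-left
    corner is at (x_min, y_max); row 0 is the northernmost row.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if 0 <= row < nrow and 0 <= col < ncol:
        return row, col
    return None

# Hypothetical 100 x 100 grid of 5 m cells
inside = xy_to_row_col(12.0, 488.0, x_min=0.0, y_max=500.0,
                       cell_size=5.0, nrow=100, ncol=100)
outside = xy_to_row_col(-3.0, 488.0, x_min=0.0, y_max=500.0,
                        cell_size=5.0, nrow=100, ncol=100)
print(inside, outside)
```

Returning `None` for points outside the grid makes it easy to catch a misplaced source before it is written into a package.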

In [None]:
# TODO: Check loading of source location from config
# source_easting, source_northing from case_config_transport.yaml
# Convert to grid indices (layer, row, col)
# Verify source is within active model domain

# Location relative to pumping well (first in list if cluster)
source_easting = source_config['location']['easting']
source_northing = source_config['location']['northing']

# Add source_easting and source_northing to first well 
first_well = well_cells[0]
source_x = first_well['x'] + source_easting
source_y = first_well['y'] + source_northing
source_gdf = gpd.GeoDataFrame(geometry=[Point(source_x, source_y)], crs=wells_gdf.crs)
source_row, source_col = submodel_grid.intersect(source_x, source_y)
source_layer = 0  # Assuming source is in top layer

print(f"Source location:")
print(f"  Relative to well {first_well['well_id']}: ({source_easting}, {source_northing}) m")
print(f"  Absolute coordinates: ({source_x:.2f}, {source_y:.2f}) m")
print(f"  Mapped to cell: (L{source_layer}, R{source_row}, C{source_col})")

# Verify cell is active for transport
if icbund[source_layer, source_row, source_col] != 1:
    print(f"  Warning: Source cell is not active for transport (ICBUND={icbund[source_layer, source_row, source_col]})")    

### Visualize Source Location

Plot source location on submodel grid to verify correct placement.

In [None]:
# TODO: Check the map to verify source and well locations

# Visualize active cells with wells and source location
fig, ax = plt.subplots(figsize=(10, 8))

pmv = flopy.plot.PlotMapView(modelgrid=submodel_grid, ax=ax)
im = pmv.plot_bc(package=sub_base_chd)
pmv.plot_grid(color='gray', alpha=0.3, linewidth=0.3)

# Plot wells for reference
wells_gdf.plot(ax=ax, color='red', markersize=60, label='Wells', zorder=5,
               edgecolors='white', linewidth=1)

# Plot source location
source_gdf.plot(ax=ax, color='yellow', markersize=100, marker='*', label='Source', zorder=6,
                 edgecolors='black', linewidth=1)

ax.set_title(f'CHD boundary cells in blue, Wells in red, Source in yellow')
ax.set_xlabel('X Coordinate (m)')
ax.set_ylabel('Y Coordinate (m)')
ax.legend()
ax.set_aspect('equal')
plt.tight_layout()
plt.show()

### Set Up SSM Package for Time-Varying Source

**CORRECTED APPROACH for 30-day Pulse Source:**

For a contamination source that is **active for 30 days then stops**, we need to implement it differently than a continuous source.

**Two Implementation Options:**

**Option 1: Using Constant Concentration Cells (icbund = -1) - SIMPLER**
- Period 0 (0-30 days): `icbund[source] = -1`, `sconc[source] = C_source`
- Period 1 (30+ days): the source cell would need `sconc[source] = 0.0`
- **Limitation**: `icbund` and `sconc` are set once in BTN; `sconc` only defines the initial condition, so the concentration cannot be switched off after 30 days
- **Workaround**: Use a very small injection well with SSM (Option 2)

**Option 2: Using Injection Well with SSM (itype = 2) - RECOMMENDED FOR PULSE**
- Add a tiny injection well (e.g., 0.001 m³/day) at the source location
- Period 0: SSM specifies concentration at well
- Period 1: SSM specifies zero concentration (or remove well)
- **Advantage**: Properly handles time-varying concentrations

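**For Group 0, the code below uses a simplified variant:**
- Keep `icbund = 1` (active transport cell, NOT constant concentration)
- Set the source concentration as the initial concentration of the source cell, which treats the spill as an instantaneous release rather than a 30-day injection
- No SSM source entry is needed; afterwards the plume evolves by advection and dispersion only (no decay for a conservative tracer)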
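
Whichever option is used, it helps to know how much mass the source represents, so the result can be compared with the mass balance later. A rough sketch with hypothetical numbers and function names: an SSM well releases mass Q·C·t over the pulse, while an initial-concentration source holds C·V·n in the cell at t = 0.

```python
def mass_from_injection(q_m3_per_day, conc_kg_per_m3, duration_days):
    """Mass (kg) released by a well injecting at constant rate and concentration."""
    return q_m3_per_day * conc_kg_per_m3 * duration_days

def mass_from_initial_conc(conc_kg_per_m3, cell_volume_m3, porosity):
    """Dissolved mass (kg) in one cell with a given initial concentration."""
    return conc_kg_per_m3 * cell_volume_m3 * porosity

# Hypothetical values for illustration only
conc = 100.0 / 1000.0            # 100 mg/L expressed in kg/m³
q = 0.001                        # tiny injection rate (m³/day), as in Option 2
duration = 30.0                  # pulse length (days)
cell_volume = 5.0 * 5.0 * 10.0   # 5 m x 5 m cell, 10 m thick (m³)
porosity = 0.25

m_well = mass_from_injection(q, conc, duration)
m_init = mass_from_initial_conc(conc, cell_volume, porosity)
print(f"Injection well:        {m_well:.4f} kg over {duration:.0f} days")
print(f"Initial concentration: {m_init:.4f} kg at t = 0")
```

For these numbers the two representations differ by several orders of magnitude, which is why the chosen source representation and its total mass should be stated in the report.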

In [None]:
# SSM Package - Source/Sink Mixing for PULSE SOURCE
print("Setting up SSM (Source/Sink Mixing) package for 30-day pulse...")
print("="*70)
print("IMPORTANT: Using INITIAL CONCENTRATION approach for pulse source")
print("="*70)

# Get source configuration
source_concentration_mgL = source_config['concentration_mg_L']
source_concentration_kgm3 = source_concentration_mgL / 1000.0  # Convert mg/L to kg/m3

print(f"\nSource configuration:")
print(f"  Concentration: {source_concentration_mgL} mg/L = {source_concentration_kgm3} kg/m³")
print(f"  Location: Layer {source_layer}, Row {source_row}, Col {source_col}")
print(f"  Duration: {source_time_days} days")

# Calculate cell properties
cell_volume = delr[source_col] * delc[source_row] * (
    m_sub_base.dis.top.array[source_row, source_col] - 
    m_sub_base.dis.botm.array[source_layer, source_row, source_col]
)
pore_volume = cell_volume * porosity

print(f"\nCell properties:")
print(f"  Cell volume: {cell_volume:.2f} m³")
print(f"  Pore volume: {pore_volume:.2f} m³")

# APPROACH: Set initial concentration in source cell instead of using SSM
# This is simpler for a pulse source with single stress period
# The contaminant will naturally disperse and be transported away

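# Update initial concentration at source location
# NOTE: the BTN package was already created from sconc_array earlier in the notebook.
# If FloPy copied the array at that point, this assignment does not reach the model
# input; verify btn.sconc[0].array at the source cell, or set the value before creating BTN.
sconc_array[source_layer, source_row, source_col] = source_concentration_kgm3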

print(f"\n✓ Initial concentration set at source cell")
print(f"  Initial mass in source: {source_concentration_kgm3 * pore_volume:.6f} kg")

# SSM PACKAGE - For pumping wells, we DON'T need to specify them in SSM
# MT3D will automatically use the ambient concentration at well cells
# This allows us to see what concentration is actually being pumped

print("\n" + "="*70)
print("Setting up SSM package...")
print("="*70)
print("NOTE: Wells are NOT included in SSM (they will extract ambient concentration)")
print("      This allows breakthrough curves to show actual pumped concentrations")

# Create empty SSM data (no sources/sinks to specify)
# SSM package is still created but with empty stress period data
ssm_data = {}

for per in range(nper):
    # Format: (layer, row, col, css, itype); css = concentration, itype = 1 is a constant-head cell
    ssm_data[per] = [[0, 0, 0, 0.0, 1]]  # Placeholder entry with zero concentration

print(f"\n✓ SSM package configured for {nper} stress period(s)")
print(f"  No concentration boundary conditions specified")
print(f"  Wells will extract ambient aquifer concentration")

# Create SSM package
ssm = flopy.mt3d.Mt3dSsm(mt, stress_period_data=ssm_data)

print(f"\n✓ SSM package created")
print(f"  Pumping wells extract water at the local aquifer concentration; injected water has concentration 0")
print(f"\n" + "="*70)
print(f"PULSE SOURCE IMPLEMENTATION")
print(f"="*70)
print(f"Method: Initial concentration in source cell")
print(f"  • Initial concentration: {source_concentration_kgm3} kg/m³")
print(f"  • Initial mass: {source_concentration_kgm3 * pore_volume:.6f} kg")
print(f"  • Contaminant will naturally disperse and advect")
print(f"  • Wells pump ambient water")
print(f"  • Source cell remains active (icbund = 1)")
print(f"="*70)

In [None]:
# Validate SSM implementation
print("="*70)
print("SSM PACKAGE VALIDATION")
print("="*70)

print(f"\nStress period configuration:")
for per in range(nper):
    if per in ssm_data and len(ssm_data[per]) > 0:
        n_sources = len(ssm_data[per])
        source_info = ssm_data[per][0]
        lay, row, col, css, itype = source_info[:5]
        print(f"  Period {per} ({perlen[per]:.0f} days):")
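        print(f"    SSM entries: {n_sources}")
        print(f"    Location: Layer {lay}, Row {row}, Col {col}")
        print(f"    Type: itype = {itype}")
        # css is a concentration (kg/m³), except for itype = 15 (mass loading, kg/day)
        if itype == 15:
            print(f"    Mass loading rate: {css:.6f} kg/day")
            total_mass_period = css * perlen[per]
            print(f"    Total mass this period: {total_mass_period:.4f} kg")
        else:
            print(f"    Concentration: {css:.6f} kg/m³")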
    else:
        print(f"  Period {per}: No SSM sources (using initial concentration)")

print(f"\n" + "="*70)
print(f"EXPECTED BEHAVIOR:")
print(f"="*70)
print(f"• Initial mass placed at source location: {source_concentration_kgm3 * pore_volume:.6f} kg")
print(f"• Contaminant will advect with groundwater flow")
print(f"• Contaminant will disperse (spread out)")
print(f"• Concentration at source will decrease as plume moves away")
print(f"• Peak concentration will migrate downstream")
print(f"• Plume extent will grow over {total_days} days")
print(f"="*70)

---
## 10. Run Transport Simulation

### Write MT3D Input Files

Write all MT3D package files to disk.

In [None]:
mt.write_input()

mt.check(f=None, verbose=True)

### Run MT3D

Execute the transport simulation. This may take several minutes depending on grid size and time steps.

**Monitor for:**
- Convergence at each time step
- Mass balance errors
- Warnings about negative concentrations (instability)
- Courant/Peclet number violations

In [None]:
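success, buff = mt.run_model(silent=False)
if not success:
    raise RuntimeError("MT3D did not terminate normally; check the listing file before loading results.")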

### Load Concentration Results

Read the UCN (concentration) file to extract results at different times.

In [None]:
# Load concentration results from the transport model workspace
import os
ucn_path = os.path.join(sub_base_ws, 'MT3D001.UCN')
ucn = flopy.utils.UcnFile(ucn_path)
# Get times available
times = ucn.get_times()
print(f"Concentration file loaded: {ucn_path}")
print(f"Available times: {times}")

---
## 11. Post-Processing and Visualization

### Concentration Maps at Multiple Times

Create concentration contour maps showing plume evolution over time.

In [None]:
# Create concentration contour maps for all output times
import matplotlib.pyplot as plt
import matplotlib
import numpy as np

print("Creating concentration contour maps...")
print(f"Available times: {times}")

# Get which times to plot (exclude t=0)
plot_times = [t for t in times if t > 0]
n_times = len(plot_times)
print(f"Plotting {n_times} time steps")

# Create multi-panel figure
n_cols = 2
n_rows = int(np.ceil(n_times / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(18, 6*n_rows))

# Handle single row case
if n_rows == 1:
    axes = axes.reshape(1, -1)
axes_flat = axes.flatten()

# Plot concentration for each time
for idx, time in enumerate(plot_times):
    ax = axes_flat[idx]
    
    # Get concentration at this time (layer 0)
    conc = ucn.get_data(totim=time)[0, :, :]  # kg/m³
    conc_mgL = conc * 1000.0  # Convert to mg/L
    
    # Mask inactive cells
    conc_masked = np.ma.masked_where(ibound[0] <= 0, conc_mgL)
    max_conc = np.max(conc_masked)
    
    # Create map view
    pmv = flopy.plot.PlotMapView(modelgrid=sub_modelgrid, ax=ax, layer=0)
    
    # Plot concentration; the colormap is prepared first so masked cells appear in light gray
    cmap = matplotlib.colormaps['YlOrRd'].copy()
    cmap.set_bad(color='lightgray')
    pc = pmv.plot_array(conc_masked, cmap=cmap, vmin=0, vmax=0.100, alpha=0.7)
    
    # Add contours
    if max_conc > 1:
        levels = [0.001, 0.005, 0.010, 0.025, 0.050, 0.075, 0.100]
        levels = [l for l in levels if l <= max_conc]
        pmv.contour_array(conc_masked, levels=levels, colors='black', 
                         linewidths=0.8, alpha=0.6)
    
    # Overlay features
    pmv.plot_grid(lw=0.2, alpha=0.3)
    wells_gdf.plot(ax=ax, color='blue', markersize=60, label='Wells', zorder=5,
                   edgecolors='white', linewidth=1)
    #ax.plot(source_x, source_y, 'r*', markersize=20, label='Source', zorder=10)
    source_gdf.plot(ax=ax, color='red', markersize=100, marker='*', label='Source', zorder=10,
                     edgecolors='black', linewidth=1)

    # Colorbar and labels
    plt.colorbar(pc, ax=ax, label='Concentration (mg/L)', shrink=0.6)
    ax.set_title(f't = {time:.1f} days\nmax c = {max_conc:.2f} mg/L', 
                fontsize=12, fontweight='bold')
    ax.set_xlabel('Easting (m)')
    ax.set_ylabel('Northing (m)')
    ax.legend(loc='upper right', fontsize=8)
    ax.set_aspect('equal')

# Hide unused subplots
for idx in range(n_times, len(axes_flat)):
    axes_flat[idx].set_visible(False)

plt.tight_layout()
plt.show()

print(f"\n✓ Created concentration maps for {n_times} time steps")

### Mass Balance Analysis

Track the fate of contaminant mass:
- Mass remaining in aquifer
- Mass exported through boundaries
- Mass captured by pumping wells
- Mass lost to decay (if applicable)

In [None]:
# Calculate mass balance over time in the submodel

import numpy as np
import matplotlib.pyplot as plt

print("=" * 80)
print("MASS BALANCE ANALYSIS")
print("=" * 80)

# Get concentration data for all times
times = ucn.get_times()
print(f"\nAnalyzing mass balance for {len(times)} time steps...")

# Calculate cell volumes for all active cells
# Get thickness for each cell (handle multi-dimensional arrays)
if submodel_bottom.ndim > 1:
    thickness = (submodel_top - submodel_bottom)[0]  # Layer 0
else:
    thickness = submodel_top[0] - submodel_bottom[0]

# Cell volumes (m³) - can vary by cell if thickness varies
cell_volumes = delr[0] * delc[0] * thickness
print(f"Cell volume range: {np.min(cell_volumes):.2f} to {np.max(cell_volumes):.2f} m³")
print(f"Mean cell volume: {np.mean(cell_volumes):.2f} m³")

# Initialize arrays to store mass components
mass_in_aquifer = []

# Loop through all time steps
for idx, time in enumerate(times):
    # Get concentration at this time (kg/m³)
    conc = ucn.get_data(totim=time)[0, :, :]  # Layer 0
    
    # Mask inactive cells
    conc_active = np.where(ibound[0] > 0, conc, 0.0)
    
    # Calculate mass in aquifer (kg)
    # Mass = concentration × volume × porosity
    mass_in_cells = conc_active * cell_volumes * porosity
    total_mass = np.sum(mass_in_cells)
    
    mass_in_aquifer.append(total_mass)
    
    if idx == 0:
        print(f"\nTime {time:.1f} days: Mass in aquifer = {total_mass:.4f} kg")
    elif idx == len(times) - 1:
        print(f"Time {time:.1f} days: Mass in aquifer = {total_mass:.4f} kg")

# Calculate mass balance metrics
peak_mass = np.max(mass_in_aquifer)
final_mass = mass_in_aquifer[-1]
peak_time_idx = np.argmax(mass_in_aquifer)
peak_time = times[peak_time_idx]

print(f"\n{'─' * 80}")
print("MASS BALANCE SUMMARY")
print(f"{'─' * 80}")
print(f"Peak mass in aquifer: {peak_mass:.4f} kg at t = {peak_time:.1f} days")
print(f"Final mass in aquifer: {final_mass:.4f} kg at t = {times[-1]:.1f} days")
print(f"Mass reduction: {((peak_mass - final_mass) / peak_mass * 100):.2f}%")

# Calculate mass loss rate (kg/day)
if len(times) > 1 and peak_time < times[-1]:
    mass_loss_rate = (peak_mass - final_mass) / (times[-1] - peak_time)
    print(f"Average mass loss rate: {mass_loss_rate:.6f} kg/day")

# Plot mass in aquifer over time
fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(times, mass_in_aquifer, 'b-', linewidth=2, label='Mass in aquifer')
ax.axhline(y=peak_mass, color='r', linestyle='--', alpha=0.5, 
           label=f'Peak mass ({peak_mass:.4f} kg)')
ax.axhline(y=final_mass, color='g', linestyle='--', alpha=0.5,
           label=f'Final mass ({final_mass:.4f} kg)')
ax.axvline(x=peak_time, color='orange', linestyle=':', alpha=0.5,
           label=f'Peak time ({peak_time:.1f} days)')

ax.set_xlabel('Time (days)', fontsize=12, fontweight='bold')
ax.set_ylabel('Mass in Aquifer (kg)', fontsize=12, fontweight='bold')
ax.set_title('Contaminant Mass Balance Over Time', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.legend(loc='best', fontsize=10)

plt.tight_layout()
plt.show()

print(f"\n✓ Mass balance calculation complete")
print("=" * 80)

---
## 12. Quality Checks

### Mass Balance Verification

Ensure mass is conserved throughout the simulation (error < 1%).
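
The percent discrepancy can also be computed by hand from cumulative inflow and outflow mass. The sketch below assumes the definition commonly given for MT3DMS, (IN − OUT) relative to the mean of IN and OUT; treat the formula, the function name, and the numbers as assumptions and check them against the MT3DMS manual.

```python
def mass_discrepancy_percent(total_in, total_out):
    """Percent discrepancy: 100 * (IN - OUT) / (0.5 * (IN + OUT)).

    Both arguments are cumulative masses given as positive magnitudes.
    """
    mean_flux = 0.5 * (total_in + total_out)
    if mean_flux == 0.0:
        return 0.0
    return 100.0 * (total_in - total_out) / mean_flux

# Hypothetical cumulative masses (kg)
err = mass_discrepancy_percent(total_in=6.25, total_out=6.20)
status = "PASS" if abs(err) < 1.0 else "CHECK"
print(f"Discrepancy: {err:.3f} %  ->  {status}")
```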

In [None]:
# TODO: Extract mass balance from MT3D output
# Check cumulative mass balance error
# Flag if error > 1%

# Extract mass balance from MT3D output and verify accuracy

import os
import re

print("=" * 80)
print("MASS BALANCE VERIFICATION FROM MT3D OUTPUT")
print("=" * 80)

# Read the MT3D mass budget summary file (.MAS)
list_file = os.path.join(transport_base_ws + str(group_number), 'sub_base', 'MT3D001.MAS')

if os.path.exists(list_file):
    print(f"\nReading mass balance file: {list_file}")
    
    with open(list_file, 'r') as f:
        content = f.read()
    lines = content.split('\n')
    
    # The .MAS file is a table with two header lines and one row per transport step.
    # Assumed column order (standard MT3DMS layout): TIME, TOTAL IN, TOTAL OUT, SOURCES,
    # SINKS, NET MASS FROM FLUID-STORAGE, TOTAL MASS IN AQUIFER,
    # DISCREPANCY % (TOTAL IN-OUT), DISCREPANCY % (ALTERNATIVE)
    mas = np.loadtxt(list_file, skiprows=2, ndmin=2)
    last = mas[-1]  # Cumulative values at the end of the simulation
    
    mass_in = abs(last[1])
    mass_out = abs(last[2])      # abs() in case outflow is reported as a negative number
    mass_stored = last[6]
    discrepancy = mass_in - mass_out
    percent_error = abs(last[7])
    
    print("\n" + "─" * 80)
    print("CUMULATIVE MASS BALANCE (from MT3D)")
    print("─" * 80)
    print(f"Total mass IN:        {mass_in:.6f} kg")
    print(f"Total mass OUT:       {mass_out:.6f} kg")
    print(f"Mass in storage:      {mass_stored:.6f} kg")
    print(f"Discrepancy:          {discrepancy:.6f} kg")
    
    # Calculate percent error if not already found
    if percent_error == 0.0 and mass_in != 0.0:
        percent_error = abs(discrepancy / mass_in * 100)
    
    print(f"Percent error:        {percent_error:.4f} %")
    print("─" * 80)
    
    # Verify mass balance quality
    threshold = 1.0  # 1% error threshold
    
    if percent_error < threshold:
        print(f"\n✓ PASS: Mass balance error ({percent_error:.4f}%) is below {threshold}% threshold")
        print("  Mass is well conserved in the simulation.")
    else:
        print(f"\n✗ WARNING: Mass balance error ({percent_error:.4f}%) exceeds {threshold}% threshold")
        print("  Consider:")
        print("  - Reducing time step size")
        print("  - Checking Courant and Peclet numbers")
        print("  - Reviewing solver convergence criteria")
        print("  - Using implicit solution scheme (GCG package)")
    
    # Print last few lines of mass balance file for reference
    print("\n" + "─" * 80)
    print("MASS BALANCE FILE EXCERPT (last 30 lines):")
    print("─" * 80)
    for line in lines[-30:]:
        if line.strip():
            print(line)
    
else:
    print(f"\n✗ ERROR: Mass balance file not found: {list_file}")
    print("  The MT3D simulation may not have completed successfully.")
    print("  Check the MT3D listing file for errors.")

print("\n" + "=" * 80)

### Stability Criteria

Check Courant and Peclet numbers to verify numerical stability.

**Courant number (advective stability):**
```
Cr = v·Δt / Δx ≤ 1
```

**Peclet number (dispersive stability):**
```
Pe = v·Δx / D ≤ 2-4
```

Where:
- v = pore water velocity (m/day)
- Δt = time step (days)
- Δx = cell size (m)
- D = dispersion coefficient = αL·v
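
A worked example with round numbers (5 m cells, v = 1 m/day, Δt = 5 days, αL = 10 m, all assumed for illustration) shows how the two criteria are evaluated and how the largest admissible time step follows from Cr ≤ 1:

```python
def courant(v, dt, dx):
    """Courant number Cr = v * dt / dx."""
    return v * dt / dx

def peclet(dx, alpha_L):
    """Grid Peclet number Pe = v * dx / D with D = alpha_L * v, i.e. dx / alpha_L."""
    return dx / alpha_L

def max_time_step(v, dx, cr_max=1.0):
    """Largest dt (days) that keeps the Courant number at or below cr_max."""
    return cr_max * dx / v

# Assumed illustrative values
dx = 5.0        # cell size (m)
v = 1.0         # seepage velocity (m/day)
dt = 5.0        # transport time step (days)
alpha_L = 10.0  # longitudinal dispersivity (m)

cr = courant(v, dt, dx)
pe = peclet(dx, alpha_L)
dt_near_well = max_time_step(4.0, dx)  # faster flow near a well, v = 4 m/day

print(f"Cr = {cr:.2f}")
print(f"Pe = {pe:.2f}")
print(f"dt_max for v = 4 m/day: {dt_near_well:.2f} days")
```

Because D = αL·v, the velocity cancels in Pe, so the Peclet criterion depends only on grid spacing and dispersivity, while the Courant criterion tightens wherever the velocity is high.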

In [None]:
# STABILITY ANALYSIS: COURANT AND PECLET NUMBERS

print("=" * 80)
print("NUMERICAL STABILITY ANALYSIS")
print("=" * 80)

# ============================================================================
# 1. EXTRACT VELOCITY FROM FLOW MODEL
# ============================================================================
print("\n" + "─" * 80)
print("1. EXTRACTING VELOCITY FROM FLOW MODEL")
print("─" * 80)

# Get cell budget file from flow model
cbc_file = os.path.join(sub_base_ws, os.path.splitext(parent_base_namefile)[0] + '.cbc')

if os.path.exists(cbc_file):
    # Load cell budget file
    cbb = flopy.utils.CellBudgetFile(cbc_file)
    
    # Get specific discharge (Darcy velocity) in each direction
    # FloPy returns face flows, we need to convert to velocity
    qx = cbb.get_data(text='FLOW RIGHT FACE')[0]  # Flow in x-direction
    qy = cbb.get_data(text='FLOW FRONT FACE')[0]  # Flow in y-direction
    qz = cbb.get_data(text='FLOW LOWER FACE')[0] if nlay > 1 else None  # Flow in z-direction
    
    # Convert face flows to Darcy velocities
    # qx is flow through right face (m³/day), divide by face area to get velocity
    # Face area in x-direction: delc * thickness
    darcy_vx = np.zeros_like(qx, dtype=float)
    darcy_vy = np.zeros_like(qy, dtype=float)
    darcy_vz = np.zeros_like(qz, dtype=float) if qz is not None else None
    
    for lay in range(nlay):
        for row in range(nrow_rotated):
            for col in range(ncol_rotated):
                # Get cell thickness
                if lay == 0:
                    top = sub_base_dis.top.array[row, col]
                else:
                    top = sub_base_dis.botm.array[lay-1, row, col]
                bot = sub_base_dis.botm.array[lay, row, col]
                thickness = top - bot
                
                if thickness > 0:
                    # Darcy velocity in x-direction (qx is flow through right face)
                    face_area_x = delc_rotated[row] * thickness
                    darcy_vx[lay, row, col] = qx[lay, row, col] / face_area_x if face_area_x > 0 else 0
                    
                    # Darcy velocity in y-direction (qy is flow through front face)
                    face_area_y = delr_rotated[col] * thickness
                    darcy_vy[lay, row, col] = qy[lay, row, col] / face_area_y if face_area_y > 0 else 0
                    
                    # Darcy velocity in z-direction (if 3D)
                    if darcy_vz is not None:
                        face_area_z = delr_rotated[col] * delc_rotated[row]
                        darcy_vz[lay, row, col] = qz[lay, row, col] / face_area_z if face_area_z > 0 else 0
    
    # Calculate Darcy velocity magnitude
    if darcy_vz is not None:
        darcy_v_magnitude = np.sqrt(darcy_vx**2 + darcy_vy**2 + darcy_vz**2)
    else:
        darcy_v_magnitude = np.sqrt(darcy_vx**2 + darcy_vy**2)
    
    # Convert Darcy velocity to seepage velocity: v_seepage = v_darcy / n
    seepage_v_magnitude = darcy_v_magnitude / porosity
    
    # Get statistics for active cells
    active_darcy_v = darcy_v_magnitude[icbund == 1]
    active_seepage_v = seepage_v_magnitude[icbund == 1]
    
    print(f"\n✓ Velocity extracted from cell budget file")
    print(f"\nDarcy velocity statistics (active cells):")
    print(f"  Minimum:  {np.min(active_darcy_v):.6f} m/day")
    print(f"  Maximum:  {np.max(active_darcy_v):.6f} m/day")
    print(f"  Mean:     {np.mean(active_darcy_v):.6f} m/day")
    print(f"  Median:   {np.median(active_darcy_v):.6f} m/day")
    
    print(f"\nSeepage velocity statistics (active cells):")
    print(f"  Porosity: {porosity}")
    print(f"  Minimum:  {np.min(active_seepage_v):.6f} m/day")
    print(f"  Maximum:  {np.max(active_seepage_v):.6f} m/day")
    print(f"  Mean:     {np.mean(active_seepage_v):.6f} m/day")
    print(f"  Median:   {np.median(active_seepage_v):.6f} m/day")
    
    # ============================================================================
    # 2. CALCULATE COURANT NUMBER
    # ============================================================================
    print("\n" + "─" * 80)
    print("2. COURANT NUMBER CALCULATION")
    print("─" * 80)
    
    # Courant number: Cr = v * dt / dx
    # v = seepage velocity (m/day)
    # dt = transport time step (days)
    # dx = grid cell size (m)
    
    # Get transport time step from BTN package
    dt_transport = dt0  # Initial transport time step
    
    # Grid spacing (assuming uniform in submodel)
    dx = sub_cell_size
    dy = sub_cell_size
    
    # Calculate Courant numbers in each direction
    courant_x = (seepage_v_magnitude * dt_transport) / dx
    courant_y = (seepage_v_magnitude * dt_transport) / dy
    
    # Get maximum Courant number (worst case)
    courant_max = np.max(courant_x[icbund == 1])
    courant_mean = np.mean(courant_x[icbund == 1])
    courant_median = np.median(courant_x[icbund == 1])
    
    print(f"\nCourant Number (Cr = v·Δt/Δx):")
    print(f"  Grid spacing (Δx): {dx} m")
    print(f"  Time step (Δt):    {dt_transport} days")
    print(f"  Maximum Cr:        {courant_max:.4f}")
    print(f"  Mean Cr:           {courant_mean:.4f}")
    print(f"  Median Cr:         {courant_median:.4f}")
    
    # Check Courant criterion
    courant_threshold = 1.0
    courant_recommended = 0.75
    
    print(f"\nCourant Number Criteria:")
    if courant_max <= courant_recommended:
        print(f"  ✓ EXCELLENT: Max Cr ({courant_max:.4f}) ≤ {courant_recommended} (recommended)")
        print(f"    Numerical stability is excellent.")
    elif courant_max <= courant_threshold:
        print(f"  ✓ ACCEPTABLE: Max Cr ({courant_max:.4f}) ≤ {courant_threshold} (threshold)")
        print(f"    Numerical stability is adequate.")
        print(f"    Consider reducing time step for better accuracy (target Cr ≤ {courant_recommended}).")
    else:
        print(f"  ✗ WARNING: Max Cr ({courant_max:.4f}) > {courant_threshold}")
        print(f"    Risk of numerical instability and oscillations!")
        print(f"    REQUIRED ACTIONS:")
        recommended_dt = dx * courant_recommended / np.max(active_seepage_v)
        print(f"    - Reduce time step to ≤ {recommended_dt:.2f} days")
        print(f"    - OR increase grid spacing (current: {dx} m)")
        print(f"    - OR use implicit solver (current transport uses explicit scheme)")
    
    # ============================================================================
    # 3. CALCULATE PECLET NUMBER
    # ============================================================================
    print("\n" + "─" * 80)
    print("3. PECLET NUMBER CALCULATION")
    print("─" * 80)
    
    # Peclet number: Pe = v * dx / D = dx / alpha_L
    # where D = alpha_L * v (dispersion coefficient)
    # Simplified: Pe = dx / alpha_L
    
    # Get dispersivity from transport config
    aL_value = aL  # Longitudinal dispersivity from DSP package
    
    # Calculate grid Peclet number
    peclet_grid = dx / aL_value
    
    print(f"\nGrid Peclet Number (Pe = Δx/αL):")
    print(f"  Grid spacing (Δx):          {dx} m")
    print(f"  Longitudinal dispersivity (αL): {aL_value} m")
    print(f"  Grid Peclet number:         {peclet_grid:.4f}")
    
    # Alternative: velocity-based Peclet number
    # Pe = v * dx / (alpha_L * v) = dx / alpha_L (same as above)
    # But sometimes defined as Pe = v * dx / D_L where D_L = alpha_L * v + D_molecular
    # For groundwater, molecular diffusion is negligible
    
    peclet_mean_velocity = np.mean(active_seepage_v) * dx / (aL_value * np.mean(active_seepage_v))
    print(f"  Velocity-based Pe (mean):   {peclet_mean_velocity:.4f}")
    
    # Check Peclet criterion
    peclet_threshold = 4.0
    peclet_recommended = 2.0
    
    print(f"\nPeclet Number Criteria:")
    if peclet_grid <= peclet_recommended:
        print(f"  ✓ EXCELLENT: Pe ({peclet_grid:.4f}) ≤ {peclet_recommended} (recommended)")
        print(f"    Numerical dispersion is minimal.")
    elif peclet_grid <= peclet_threshold:
        print(f"  ✓ ACCEPTABLE: Pe ({peclet_grid:.4f}) ≤ {peclet_threshold} (threshold)")
        print(f"    Numerical dispersion is acceptable.")
        print(f"    Consider refining grid for better accuracy (target Pe ≤ {peclet_recommended}).")
    else:
        print(f"  ✗ WARNING: Pe ({peclet_grid:.4f}) > {peclet_threshold}")
        print(f"    Excessive numerical dispersion - plume will be artificially smeared!")
        print(f"    REQUIRED ACTIONS:")
        recommended_dx = aL_value * peclet_recommended
        print(f"    - Reduce grid spacing to ≤ {recommended_dx:.2f} m (current: {dx} m)")
        print(f"    - OR increase dispersivity (current: {aL_value} m)")
        print(f"    - Grid refinement is strongly recommended")
    
    # ============================================================================
    # 4. SUMMARY AND RECOMMENDATIONS
    # ============================================================================
    print("\n" + "=" * 80)
    print("STABILITY ANALYSIS SUMMARY")
    print("=" * 80)
    
    print(f"\nModel Configuration:")
    print(f"  Grid spacing:        {dx} m")
    print(f"  Time step:           {dt_transport} days")
    print(f"  Porosity:            {porosity}")
    print(f"  Dispersivity (αL):   {aL_value} m")
    print(f"  Max seepage vel.:    {np.max(active_seepage_v):.6f} m/day")
    
    print(f"\nStability Numbers:")
    print(f"  Courant (max):       {courant_max:.4f}  [Threshold: ≤ 1.0, Recommended: ≤ 0.75]")
    print(f"  Peclet (grid):       {peclet_grid:.4f}  [Threshold: ≤ 4.0, Recommended: ≤ 2.0]")
    
    # Overall assessment
    courant_ok = courant_max <= courant_threshold
    peclet_ok = peclet_grid <= peclet_threshold
    
    print(f"\nOverall Assessment:")
    if courant_ok and peclet_ok:
        print(f"  ✓ PASS: Both stability criteria are satisfied")
        print(f"  The numerical solution should be stable and accurate.")
        if courant_max > courant_recommended or peclet_grid > peclet_recommended:
            print(f"\n  Note: While acceptable, consider these improvements:")
            if courant_max > courant_recommended:
                print(f"    • Reduce time step for better accuracy")
            if peclet_grid > peclet_recommended:
                print(f"    • Refine grid for less numerical dispersion")
    else:
        print(f"  ✗ FAIL: One or more stability criteria are violated")
        print(f"\n  CRITICAL: Model configuration needs adjustment!")
        if not courant_ok:
            print(f"    • Courant number too high - risk of instability")
        if not peclet_ok:
            print(f"    • Peclet number too high - excessive numerical dispersion")
    
    print("\n" + "=" * 80)
    
else:
    print(f"\n✗ ERROR: Cell budget file not found: {cbc_file}")
    print("  Cannot extract velocities without CBC file.")
    print("  Ensure MODFLOW model ran successfully.")
    print("  Skipping Courant and Peclet number calculations.")

As noted above, flow velocities near the injection and pumping wells are high, so Courant numbers there are significantly greater than 1. The chosen transport time step may therefore be too large for stable advection-dominated transport near the wells. The most direct remedy is to reduce the transport time step. Refining the grid around these high-velocity zones lowers the Peclet number but, for a fixed time step, raises the Courant number (Cr = v·Δt/Δx), so any further refinement must be paired with a smaller time step.
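
As a rough illustration of the time-step remedy, the sketch below inverts the Courant definition Cr = v·Δt/Δx to estimate the largest transport time step that keeps Cr at or below a target value. The function name `max_transport_dt` is hypothetical (it is not part of FloPy or MT3D), and a uniform grid spacing is assumed.

```python
def max_transport_dt(dx, v_max, courant_target=1.0):
    """Largest time step (days) that keeps Cr = v * dt / dx <= courant_target.

    dx             : grid spacing in the flow direction (m), assumed uniform
    v_max          : maximum seepage velocity in the active domain (m/day)
    courant_target : desired upper bound on the Courant number
    """
    if dx <= 0 or v_max <= 0 or courant_target <= 0:
        raise ValueError("dx, v_max and courant_target must all be positive")
    return courant_target * dx / v_max
```

Calling it with the grid spacing and the maximum seepage velocity printed in the summary above (for example `max_transport_dt(dx, np.max(active_seepage_v), courant_recommended)`) gives a first estimate of the time step to try.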

### Physical Reasonableness Checks

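Verify that the results make physical sense: concentrations should be non-negative and should not exceed the source concentration, the plume should migrate in the direction of groundwater flow, and the transport mass balance error should be small.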
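
Two of the simplest checks are that simulated concentrations stay non-negative and never exceed the source concentration. The sketch below counts violations of both; the function name `check_concentrations` and the tolerance are illustrative assumptions, and the input is any flat sequence of concentrations from active cells.

```python
def check_concentrations(values, c_source, tol=1e-6):
    """Summarize basic physical bounds for simulated concentrations.

    values   : flat iterable of concentrations from active cells
    c_source : source concentration in the same units
    tol      : small tolerance so that round-off is not flagged
    """
    values = [float(v) for v in values]
    if not values:
        raise ValueError("no concentration values supplied")
    return {
        "min": min(values),
        "max": max(values),
        # Negative values usually point to numerical oscillations
        "n_negative": sum(v < -tol for v in values),
        # Values above the source cannot arise from physical transport alone
        "n_above_source": sum(v > c_source + tol for v in values),
    }
```

A NumPy concentration array can be passed as `conc[active_mask].ravel()`, where `active_mask` is whatever boolean mask of active cells is already available in the notebook.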

---
## 13. OPTIONAL: Analytical Verification

### Why Consider Analytical Verification?

Analytical verification demonstrates professional modeling practice and deepens your understanding of when simple vs. complex methods are needed. **This section is optional but highly recommended** for:
- Understanding the limitations of analytical solutions
- Learning when numerical modeling is truly necessary
- Demonstrating verification best practices

**If you choose to skip this section**, proceed directly to Section 15 (Sensitivity Analysis). Your report should focus on the numerical model results and well-contaminant interactions.

**If you choose to complete this section**, follow the Tier 1 requirements below.

---

### Tier 1 Requirements (Group 0 - Conservative Tracer)

Group 0 (conservative tracer) can perform a full 1D comparison:
1. Extract 1D transect from MT3D results along flow direction
2. Implement 1D analytical solution
3. Plot comparison at multiple times
4. Calculate and plot breakthrough curves
5. Discuss discrepancies and their causes

**Time estimate**: 30-60 minutes with provided templates

---

### Extract 1D Transect from MT3D

Extract concentrations along a transect that starts at the source and follows the direction of flow.
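
One possible way to sample such a transect is nearest-cell lookup along a straight line, sketched below. It assumes square cells of uniform size and a single 2D concentration slice indexed as `conc[row][col]` (a nested list or one layer of a NumPy array). The function name `sample_transect` and its arguments are hypothetical; a curved flow path near the wells would need particle tracking instead.

```python
import math

def sample_transect(conc, start, direction, cell_size, n_steps):
    """Sample concentrations along a straight line through a 2D grid.

    conc      : 2D concentrations, indexed conc[row][col]
    start     : (row, col) index of the source cell
    direction : (d_row, d_col) flow direction in index space, any length
    cell_size : cell width in metres (square, uniform cells assumed)
    n_steps   : maximum number of one-cell-length steps to take
    Returns (distances, values) as two lists of equal length.
    """
    norm = math.hypot(direction[0], direction[1])
    if norm == 0:
        raise ValueError("direction must be a non-zero vector")
    u_row, u_col = direction[0] / norm, direction[1] / norm
    n_rows, n_cols = len(conc), len(conc[0])

    distances, values = [], []
    for k in range(n_steps + 1):
        # Step one cell length at a time and snap to the nearest cell
        row = int(round(start[0] + k * u_row))
        col = int(round(start[1] + k * u_col))
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            break  # stop once the line leaves the grid
        distances.append(k * cell_size)
        values.append(float(conc[row][col]))
    return distances, values
```

Repeating the call for each saved output time gives the C(x) profiles needed for the comparison. Because of the nearest-cell snapping, an oblique transect can sample the same cell twice.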

In [None]:
# TODO: Define transect line
# - Start at source
# - Extend in flow direction (use velocity vector)
# - Sample concentrations along transect at multiple times
# - Store distance from source and concentration

### Implement Analytical Solution

Calculate the 1D analytical solution (e.g., Ogata-Banks) using the same parameters as the MT3D model.
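
A minimal sketch of one common choice, the Ogata-Banks solution for a continuous constant-concentration source at x = 0, with the finite pulse handled by subtracting a time-shifted copy (superposition). It assumes uniform velocity, D = alpha_L * v, and no retardation or decay. The function names are illustrative. The second term is evaluated in log space, which stays accurate for x/alpha_L up to a few hundred; beyond that, a scaled complementary error function such as `scipy.special.erfcx` would be more robust.

```python
import math

def ogata_banks(x, t, v, D):
    """Relative concentration C/C0 for a continuous source at x = 0.

    x : distance from source (m), t : time (days)
    v : pore water velocity (m/day), D : dispersion coefficient (m^2/day)
    """
    if t <= 0.0:
        return 0.0
    denom = 2.0 * math.sqrt(D * t)
    first = math.erfc((x - v * t) / denom)
    tail = math.erfc((x + v * t) / denom)
    # exp(v*x/D) can be huge while erfc(...) is tiny, so combine them in log space
    second = math.exp(v * x / D + math.log(tail)) if tail > 0.0 else 0.0
    return 0.5 * (first + second)

def pulse_concentration(x, t, v, D, c0, t_pulse):
    """Concentration for a source that is active only for 0 <= t <= t_pulse."""
    c_rel = ogata_banks(x, t, v, D)
    if t > t_pulse:
        # Switch the source off by subtracting the same solution shifted by t_pulse
        c_rel -= ogata_banks(x, t - t_pulse, v, D)
    return c0 * max(c_rel, 0.0)
```

With `t_pulse = 30` and `D = aL_value * v`, evaluating `pulse_concentration` at each transect distance and output time gives profiles that can be compared directly with the MT3D results.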


In [None]:
# TODO: Implement analytical solution function (or use existing library)
# Input parameters from MT3D setup:
#   - v = pore water velocity (K*i/n)
#   - D = dispersion coefficient (alpha_L * v)
#   - C0 = source concentration
#   - t_pulse = 30 days

# Calculate analytical solution for same times as MT3D
# Handle pulse source with superposition

### Compare Concentration Profiles

Plot analytical vs. numerical concentrations along the transect at multiple times.

In [None]:
# TODO: Create comparison plots
# For several output times within the 2-year simulation, e.g. times = [90 d, 365 d, 730 d]:
#   - Plot C(x) analytical (solid line)
#   - Plot C(x) MT3D (points)
#   - Add legend, labels, title
# Create multi-panel figure showing evolution

### Discuss Discrepancies

Analyze differences between analytical and numerical results.

**Expected differences due to:**
1. **2D/3D spreading**: Numerical model allows transverse dispersion, analytical is 1D only
2. **Grid discretization**: MT3D uses discrete cells, analytical is continuous
3. **Numerical dispersion**: Some artificial spreading from numerical scheme
4. **Well influence**: Wells affect flow field in 2D/3D but not in 1D analytical
5. **Boundary effects**: Numerical model has finite boundaries
6. **Heterogeneity**: Numerical model may have varying K, analytical assumes uniform

**Questions to address:**
- Where is agreement best? (near source vs far field, early vs late time)
- Are differences within acceptable uncertainty?
- When is analytical solution "good enough" vs when is numerical needed?
- What aspects cannot be captured by analytical methods?
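
To put numbers on the agreement, one option is the root-mean-square error over the transect plus the relative error in peak concentration. The sketch below assumes both profiles have already been sampled at the same distances and passed as equal-length sequences; the function names are illustrative.

```python
import math

def rmse(analytical, numerical):
    """Root-mean-square error between two equal-length concentration profiles."""
    if len(analytical) != len(numerical) or len(analytical) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    squared = sum((a - n) ** 2 for a, n in zip(analytical, numerical))
    return math.sqrt(squared / len(analytical))

def relative_peak_error(analytical, numerical):
    """(numerical peak - analytical peak) / analytical peak.

    A negative value means the numerical peak is lower, which is the
    typical signature of extra (transverse or numerical) spreading.
    """
    peak_analytical = max(analytical)
    if peak_analytical == 0:
        raise ValueError("analytical peak is zero; relative error undefined")
    return (max(numerical) - peak_analytical) / peak_analytical
```

Computing both metrics at each comparison time and collecting them in a small table shows when and where the agreement degrades.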

In [None]:
# TODO: Quantify differences
# Calculate RMSE between analytical and numerical
# Calculate relative error at peak concentration
# Identify times/locations with largest discrepancies
# Create table summarizing agreement metrics

### Conclusions from Analytical Comparison

Synthesize findings and answer: **When is analytical sufficient, and when is numerical modeling necessary?**

**If you completed this optional section, write 2-3 paragraphs discussing:**

1. **Quality of agreement**: How well did analytical match numerical? Where/when was agreement best?

2. **Causes of discrepancies**: Which factors (2D effects, wells, boundaries, etc.) caused the largest differences?

3. **When to use each method**: 
   - When would you recommend analytical solution for this problem?
   - What aspects required numerical modeling?
   - How would you approach a similar problem in the future?

**Bonus credit value**: This section can earn you +5-10% extra credit depending on depth of analysis and quality of discussion.

---
## 15. Sensitivity Analysis

### Parameter Sensitivity

Test how results change with ±50% variation in dispersivity.

**Rationale**: Dispersivity is scale-dependent and uncertain. Understanding sensitivity helps assess prediction reliability.
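
A small sketch of how the three cases and their comparison could be organized. It only generates the parameter values and a percent-change metric; rebuilding and rerunning the transport model for each case is left to the existing model setup code. Both function names are hypothetical.

```python
def dispersivity_cases(base_alpha_l, fraction=0.5):
    """Return low/base/high longitudinal dispersivity values (m).

    fraction is the relative variation, 0.5 for the +/-50% used here.
    """
    if base_alpha_l <= 0 or not 0 < fraction < 1:
        raise ValueError("base_alpha_l must be > 0 and fraction in (0, 1)")
    return {
        "low": base_alpha_l * (1.0 - fraction),
        "base": base_alpha_l,
        "high": base_alpha_l * (1.0 + fraction),
    }

def percent_change(base_value, case_value):
    """Change of a result (e.g. plume area) relative to the base case, in %."""
    if base_value == 0:
        raise ValueError("base value is zero; percent change undefined")
    return 100.0 * (case_value - base_value) / base_value
```

Looping over `dispersivity_cases(10.0).items()`, running the model once per value, and passing each key result to `percent_change` gives a compact sensitivity table for the report.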

In [None]:
# TODO: Run MT3DMS with:
# - Base case: αL = 10 m
# - Low case: αL = 5 m (less spreading)
# - High case: αL = 15 m (more spreading)

# Compare:
# - Plume extent at 2 years
# - Breakthrough time at monitoring point
# - Maximum concentration at compliance point

# Plot all three scenarios on same map

### Other Parameter Sensitivities (Optional)

If time permits, test sensitivity to:
- Porosity (affects velocity)
- Source concentration (linear scaling)
- Source duration (mass loading)
- Well pumping rates (capture efficiency)

---
## 16. Summary and Conclusions

### Key Findings

Summarize the main results of your transport modeling study.

**TODO: Summarize (bullet points):**

**Plume Behavior:**
- Maximum plume extent: X m² at Y months/years
- Maximum concentration at monitoring point: X mg/L at Y months/years
- Time to reach river/compliance point: X months/years (or "not reached in 2 years")

**Well Interactions:**
- Pumping wells captured X% of contaminant mass
- [Effect of injection wells on plume spreading]
- [Net protective or spreading effect of well field]

**Analytical Comparison (if completed):**
- Agreement quality: [good/moderate/poor] along flow transect
- Key discrepancies due to: [2D effects, wells, boundaries, etc.]
- Analytical would be sufficient for: [screening-level assessment]
- Numerical was needed for: [well interactions, accurate predictions]

**Sensitivity:**
- Results most sensitive to: [dispersivity, porosity, well rates]
- Uncertainty range on key predictions: ±X%

### Interpretation and Implications

Discuss what the results mean for the scenario.

**TODO: Write 2-3 paragraphs addressing:**

1. **Risk assessment**: Does contamination reach sensitive receptors? Are regulatory thresholds exceeded? What is the timeline for impact?

2. **Well field implications**: How do current pumping/injection operations affect contamination? Should well operations be modified?

3. **Uncertainty and limitations**: What are the main sources of uncertainty? What additional data would reduce uncertainty? What are model limitations?

4. **(If analytical comparison completed)** What insights did the analytical verification provide about when simple vs. complex models are needed?

### Recommendations

Provide actionable recommendations based on your findings.

**TODO: Recommend (bullet points):**

**Monitoring:**
- [Install monitoring wells at specific locations]
- [Sampling frequency and parameters]

**Well Operations:**
- [Adjust pumping rates if needed for capture]
- [Consider turning off/relocating injection if spreading contamination]

**Further Investigation:**
- [Additional site characterization needed]
- [Model refinements or alternative scenarios]

**Remediation (if applicable):**
- [Natural attenuation feasible? Time frame?]
- [Active remediation needed? What approach?]

---
## 17. Professional Report Preparation

### Report Structure (3-4 pages)

Your final deliverable is a professional PDF report with the following structure:

**1. Problem Statement and Objectives (0.5 page)**
- Brief description of TCE spill and well field
- Modeling objectives (key questions)
- Why numerical modeling was needed

**2. Methodology (0.75 page)**
- Transport model setup summary (domain, grid, parameters)
- Key assumptions and justification
- **(If completed)** Brief mention of analytical verification approach

**3. Results (1.5-2 pages, mostly figures)**
- **Figure 1**: Concentration map at 2 years with wells
- **Figure 2**: Breakthrough curve at critical location
- **Figure 3** (optional): Analytical vs numerical comparison (if completed)
- Brief text: Maximum concentrations, breakthrough times, plume extent
- Well-contaminant interaction summary

**4. Discussion and Conclusions (0.75-1 page)**
- Interpretation: What do results mean for scenario?
- **(If completed)** Analytical comparison: When is simple method sufficient?
- Parameter uncertainty: Which parameters matter most?
- Recommendations: Well management, monitoring, remediation

### Report Template

A Word/LaTeX template is provided in the course materials. Use it to ensure consistent formatting and professional appearance.

### Figure Quality Guidelines

- High resolution (at least 300 DPI)
- Clear labels and legends
- Descriptive captions
- Referenced in text
- Consistent color schemes

### Writing Tips

- **Be concise**: Every sentence should add value
- **Be specific**: "100 mg/L" not "high concentration"
- **Past tense**: "The model predicted..." not "The model predicts..."
- **Professional tone**: Technical but accessible
- **Cite sources**: FloPy, MT3DMS, parameter references

### Checklist Before Submission

- [ ] All figures have captions and are referenced in text
- [ ] Key results quantified (not just "the plume moved downgradient")
- [ ] **(If completed)** Analytical comparison included with discussion
- [ ] Recommendations are specific and actionable
- [ ] Report is 3-4 pages (not 10 pages!)
- [ ] Spell-checked and proofread
- [ ] PDF format (not Word doc)
- [ ] File named: `transport_report_group0.pdf`

### Grading Notes

**Assignment (100%):**
- Technical implementation: 50%
- Professional report: 50%

---
## Acknowledgments and References

### Software
- FloPy: Python package for MODFLOW (Bakker et al., 2016)
- MODFLOW-NWT: Groundwater flow model (Niswonger et al., 2011)
- MT3DMS: Multi-species transport model (Zheng and Wang, 1999)

### References

**TODO: Add references for:**
- TCE properties and parameter values
- Dispersivity scale relationships
- Ogata-Banks analytical solution
- Any other sources used

### Course Materials

This notebook is based on:
- Teaching notebook: `4b_transport_model_implementation.ipynb`
- Transport planning document: `transport_planning.md`
- Support repository: Analytical solution functions and plotting utilities

---

**End of Case Study**