# River Input Data Checker
**Author: Jun Sasaki | Created: 2025-09-04 | Updated: 2025-09-07**

**Purpose:** Check and visualize FVCOM river input files without requiring simulation output

This notebook reads:
- FVCOM grid file (.grd or .dat)
- River namelist file (.nml)
- River NetCDF file (.nc)

It visualizes:
- River node locations on mesh map
- Time series of river discharge, temperature, and salinity
- Interactive plots using plotly (optional)

In [None]:
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.io.img_tiles import GoogleTiles

# Import xfvcom modules
from xfvcom import (
    FvcomInputLoader,
    FvcomPlotter, 
    FvcomPlotConfig,
    FvcomPlotOptions,
    parse_river_namelist,
    make_node_marker_post,
    read_fvcom_river_nc,  # Use this instead of manual decoding
)

# Create directory for output images
png_dir = Path("PNG")
png_dir.mkdir(exist_ok=True)

print("Notebook initialized successfully")

## 1. Define Input File Paths
Modify these paths to point to your input files
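
The existence checks in the next cell can also be written as one small helper that reports every missing file at once. This is a minimal sketch using only `pathlib`; the helper name `find_missing` and the relative paths are hypothetical and not part of xfvcom.

```python
from pathlib import Path


def find_missing(paths: dict[str, Path]) -> list[str]:
    """Return the labels of all paths that do not exist on disk."""
    return [label for label, path in paths.items() if not Path(path).exists()]


# Hypothetical locations; replace them with your own project layout.
inputs = {
    "grid": Path("input/TokyoBay18_grd.dat"),
    "river_nc": Path("input/2020/tb_sewer.nc"),
    "river_nml": Path("input/2020/RIVERS_NAMELIST10_sewer.nml"),
}
missing = find_missing(inputs)
print("All input files found" if not missing else f"Missing: {missing}")
```

Collecting the labels first makes it easy to stop early, for example by raising an error when the list is not empty.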

In [None]:
# Define base directory for your FVCOM project
base_path = Path("~/Github/TB-FVCOM/goto2023").expanduser()

# Grid file (required)
grid_file = "TokyoBay18_grd.dat"  # or a .grd file
grid_path = base_path / "input" / grid_file

# River input files
river_nc_file = "tb_sewer.nc"
river_nml_file = "RIVERS_NAMELIST10_sewer.nml"
river_nc_path = base_path / "input/2020" / river_nc_file
river_nml_path = base_path / "input/2020" / river_nml_file

# UTM zone for Tokyo Bay (used to convert the grid's UTM coordinates to lon/lat)
utm_zone = 54

# Check if files exist and provide helpful messages
files_ok = True

if not grid_path.exists():
    print(f"⚠ Grid file not found: {grid_path}")
    print("  Please update 'grid_path' to point to your FVCOM grid file")
    files_ok = False
else:
    print(f"✓ Grid file found: {grid_path}")

if not river_nc_path.exists():
    print(f"⚠ River NC file not found: {river_nc_path}")
    print("  Please update 'river_nc_path' to point to your river NetCDF file")
    # Try to find alternative in notebooks directory
    local_river = Path("river.nc")
    if local_river.exists():
        print(f"  Found alternative: {local_river}")
        river_nc_path = local_river
    else:
        files_ok = False
else:
    print(f"✓ River NC file found: {river_nc_path}")

if not river_nml_path.exists():
    print(f"⚠ River NML file not found: {river_nml_path}")
    print("  Please update 'river_nml_path' to point to your river namelist file")
    files_ok = False
else:
    print(f"✓ River NML file found: {river_nml_path}")

if not files_ok:
    print("\n" + "="*60)
    print("Please update the file paths above before running the remaining cells.")
    print("Cells that read a missing file will raise an error.")
    print("="*60)

## 2. Load Grid Data
Load the FVCOM grid file to get mesh structure and node coordinates

In [None]:
# Load grid using FvcomInputLoader
loader = FvcomInputLoader(
    grid_path=grid_path,
    utm_zone=utm_zone,
    add_dummy_time=False,  # We don't need dummy time for this
    add_dummy_siglay=False  # We don't need dummy sigma layers
)

# Get the dataset and grid object
grid_ds = loader.ds
grid_obj = loader.grid

print("Grid loaded successfully")
print(f"Number of nodes: {grid_obj.node}")
print(f"Number of elements: {grid_obj.nele}")
print("Coordinate range:")
print(f"  Longitude: {grid_ds.lon.min().values:.3f} - {grid_ds.lon.max().values:.3f}")
print(f"  Latitude: {grid_ds.lat.min().values:.3f} - {grid_ds.lat.max().values:.3f}")

## 3. Load River Configuration
Parse the river namelist file to get river names and node locations
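
For readers without xfvcom at hand, the idea behind the parsing can be sketched with regular expressions. This sketch assumes the usual FVCOM layout of repeated `&NML_RIVER ... /` groups containing `RIVER_NAME`, `RIVER_FILE`, and `RIVER_GRID_LOCATION`; the sample text and the helper `parse_rivers` are hypothetical and do not reproduce `parse_river_namelist` itself. FVCOM node numbers are 1-based, so the helper subtracts 1 to obtain indices usable with NumPy arrays.

```python
import re

# Hypothetical namelist content for illustration only.
SAMPLE_NML = """
&NML_RIVER
  RIVER_NAME          = 'RiverA',
  RIVER_FILE          = 'river.nc',
  RIVER_GRID_LOCATION = 120,
  RIVER_VERTICAL_DISTRIBUTION = 'uniform'
/
&NML_RIVER
  RIVER_NAME          = 'RiverB',
  RIVER_FILE          = 'river.nc',
  RIVER_GRID_LOCATION = 45,
  RIVER_VERTICAL_DISTRIBUTION = 'uniform'
/
"""

# One group: everything between "&NML_RIVER" and a line that starts with "/".
GROUP_RE = re.compile(r"&NML_RIVER\b(.*?)^\s*/", re.DOTALL | re.MULTILINE | re.IGNORECASE)
# One field: KEY = 'quoted value' or KEY = bare_value
FIELD_RE = re.compile(r"(\w+)\s*=\s*(?:'([^']*)'|([^,\s/]+))")


def parse_rivers(text, to_zero_based=True):
    """Return a list of dicts with name, file, and grid_location per river."""
    rivers = []
    for block in GROUP_RE.findall(text):
        fields = {}
        for key, quoted, bare in FIELD_RE.findall(block):
            fields[key.upper()] = quoted if quoted else bare
        node = int(fields["RIVER_GRID_LOCATION"])
        rivers.append({
            "name": fields["RIVER_NAME"].strip(),
            "file": fields["RIVER_FILE"].strip(),
            "grid_location": node - 1 if to_zero_based else node,
        })
    return rivers


rivers = parse_rivers(SAMPLE_NML)
for river in rivers:
    print(river)
```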

In [None]:
# Parse river namelist
river_df = parse_river_namelist(river_nml_path, to_zero_based=True)

print(f"Number of rivers: {len(river_df)}")
print("\nRiver configuration:")
print(river_df[['name', 'grid_location', 'file']].to_string())

## 4. Load River NetCDF Data
Read the river forcing NetCDF file to get time series data
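
FVCOM forcing files commonly encode time as Modified Julian Day, split into an integer day count (`Itime`) and milliseconds since midnight (`Itime2`). The sketch below shows that conversion with the standard library only. It illustrates the encoding; whether a given river file uses these variables, and how `read_fvcom_river_nc` decodes them internally, are assumptions to verify against your own file.

```python
from datetime import datetime, timedelta

# Modified Julian Day 0 corresponds to 1858-11-17 00:00.
MJD_EPOCH = datetime(1858, 11, 17)


def mjd_to_datetime(itime: int, itime2: int = 0) -> datetime:
    """Convert integer MJD days plus milliseconds since midnight to a datetime."""
    return MJD_EPOCH + timedelta(days=int(itime), milliseconds=int(itime2))


# Hypothetical values: two records one hour apart.
stamps = [mjd_to_datetime(58849, ms) for ms in (0, 3_600_000)]
for stamp in stamps:
    print(stamp.isoformat())
```

Comparing the first and last decoded stamps with the intended simulation period is a quick way to catch a forcing file that starts too late or ends too early.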

In [None]:
# Load river NetCDF file using xfvcom's built-in function
print(f"Loading river NetCDF file: {river_nc_path}")

# Use xfvcom's read_fvcom_river_nc for automatic decoding
river_data = read_fvcom_river_nc(river_nc_path)

# Convert to more convenient format
river_ds = river_data  # The function returns a dictionary with DataFrames

print("River NetCDF variables:")
print(f"  Time range: {river_data['datetime'][0]} to {river_data['datetime'][-1]}")
print(f"  Number of time steps: {len(river_data['datetime'])}")

# Check number of rivers
n_rivers_nc = len(river_data.get('river_names', []))
if n_rivers_nc == 0 and 'river_flux' in river_data:
    n_rivers_nc = river_data['river_flux'].shape[1]

print(f"\nNumber of rivers in NetCDF: {n_rivers_nc}")

# Get river names (already decoded by read_fvcom_river_nc)
if 'river_names' in river_data:
    river_names_nc = river_data['river_names']
    print(f"Rivers: {', '.join(river_names_nc)}")
else:
    river_names_nc = [f"River_{i+1}" for i in range(n_rivers_nc)]
    print("No river names in file, using generic names")

# Check if number matches namelist
if n_rivers_nc != len(river_df):
    print(f"WARNING: Number mismatch with namelist ({len(river_df)} rivers)")

## 5. Plot River Nodes on Mesh Map
Visualize where the rivers are located on the mesh
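
Both the name labels and the shown/hidden report in the next cell rely on the same test: does a node's longitude and latitude fall inside `xlim` and `ylim`? A minimal sketch of that test as a reusable helper follows; the function name `split_by_bounds` and the sample coordinates are hypothetical.

```python
def split_by_bounds(points, xlim, ylim):
    """Partition (name, lon, lat) tuples into names inside and outside a view box."""
    shown, hidden = [], []
    for name, lon, lat in points:
        inside = xlim[0] <= lon <= xlim[1] and ylim[0] <= lat <= ylim[1]
        (shown if inside else hidden).append(name)
    return shown, hidden


# Hypothetical river nodes as (name, longitude, latitude).
nodes = [
    ("RiverA", 139.75, 35.64),
    ("RiverB", 139.90, 35.64),  # east of the view box
    ("RiverC", 139.80, 35.70),  # north of the view box
]
shown, hidden = split_by_bounds(nodes, xlim=(139.72, 139.82), ylim=(35.62, 35.67))
print("shown:", shown)
print("hidden:", hidden)
```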

In [None]:
# Create plotter for visualization
cfg = FvcomPlotConfig()
plotter = FvcomPlotter(grid_ds, cfg)

# Option to show river names (set to False to show only numbers)
show_river_names = True

# Set map domain (adjust these to zoom in on specific areas)
# Example: Focus on central Tokyo Bay
xlim=(139.72, 139.82)
ylim=(35.62, 35.67)

# Define marker and text styling
mkw = {"marker": "o", "color": "red", "markersize": 3, "zorder": 4}
tkw = {"fontsize": 10, "color": "yellow", "ha": "center", "va": "bottom",
       "zorder": 5, "clip_on": True}

# Convert 0-based indices to 1-based for make_node_marker_post
# (river_df.grid_location is already 0-based from parse_river_namelist)
river_nodes_1based = river_df.grid_location + 1

# Create post-processing function using make_node_marker_post
# Note: respect_bounds=True (default) will only show markers within xlim/ylim
pp = make_node_marker_post(
    river_nodes_1based,  # Pass 1-based indices
    plotter,
    marker_kwargs=mkw,
    text_kwargs=tkw,
    index_base=1,
    respect_bounds=True,  # Only show markers within the specified bounds
)

# Add river names if enabled
if show_river_names:
    # Create a wrapper function that calls both the marker function and adds names
    def pp_with_names(ax):
        # First call the original post-processing function
        pp(ax)
        
        # Then add river names next to the numbers (only for rivers within bounds)
        for idx, row in river_df.iterrows():
            node_idx = row['grid_location']  # 0-based index
            lon = plotter.ds.lon.values[node_idx]
            lat = plotter.ds.lat.values[node_idx]
            
            # Only add names for rivers within the specified bounds
            if xlim[0] <= lon <= xlim[1] and ylim[0] <= lat <= ylim[1]:
                # Add river name text (slightly offset from the number)
                ax.text(lon, lat - 0.002, f"{row['name']}", 
                        fontsize=10, color='yellow', 
                        transform=ccrs.PlateCarree(), 
                        ha='left', va='top', zorder=5,
                        bbox=dict(boxstyle='round,pad=0.2', facecolor='black', alpha=0))
    
    post_process_func = pp_with_names
else:
    post_process_func = pp

# Plot options
opts = FvcomPlotOptions(
    figsize=(10, 12),
    add_tiles=True,
    tile_provider=GoogleTiles(style="satellite"),
    mesh_color="#ffffff",
    mesh_linewidth=0.3,
    title="River Input Nodes on FVCOM Mesh (Zoomed View)",
    xlim=xlim,
    ylim=ylim,
)

# Create the plot
ax = plotter.plot_2d(da=None, post_process_func=post_process_func, opts=opts)

# Save figure
ax.figure.savefig(png_dir / "river_nodes_map.png", dpi=300, bbox_inches='tight')
print("River nodes map saved to PNG/river_nodes_map.png")

# Report which rivers are shown vs hidden
rivers_shown = []
rivers_hidden = []
for idx, row in river_df.iterrows():
    node_idx = row['grid_location']
    lon = plotter.ds.lon.values[node_idx]
    lat = plotter.ds.lat.values[node_idx]
    if xlim[0] <= lon <= xlim[1] and ylim[0] <= lat <= ylim[1]:
        rivers_shown.append(row['name'])
    else:
        rivers_hidden.append(row['name'])

print(f"\nRivers shown in this view: {len(rivers_shown)}")
print(f"Rivers outside view bounds: {len(rivers_hidden)}")
print("\nTo see all rivers, set respect_bounds=False in make_node_marker_post()")

### 5.1 Full Map View (All Rivers)
Show all river nodes on the complete mesh without bounds restriction

In [None]:
# Create a full map view showing all rivers
# This demonstrates the use of respect_bounds=False

# Create post-processing function that shows ALL markers
pp_all = make_node_marker_post(
    river_nodes_1based,
    plotter,
    marker_kwargs={"marker": "o", "color": "red", "markersize": 2, "zorder": 4},
    text_kwargs={"fontsize": 6, "color": "yellow", "ha": "center", "va": "bottom", "zorder": 5},
    index_base=1,
    respect_bounds=False,  # Show all markers regardless of xlim/ylim
)

# Full map extent
full_xlim = (float(grid_ds.lon.min()), float(grid_ds.lon.max()))
full_ylim = (float(grid_ds.lat.min()), float(grid_ds.lat.max()))

# Plot options for full view
opts_full = FvcomPlotOptions(
    figsize=(12, 10),
    add_tiles=True,
    tile_provider=GoogleTiles(style="satellite"),
    mesh_color="#ffffff",
    mesh_linewidth=0.2,
    title="All River Input Nodes on FVCOM Mesh (Full View)",
    xlim=full_xlim,
    ylim=full_ylim,
)

# Create the full map plot
ax_full = plotter.plot_2d(da=None, post_process_func=pp_all, opts=opts_full)

# Save figure
ax_full.figure.savefig(png_dir / "river_nodes_map_full.png", dpi=300, bbox_inches='tight')
print(f"Full map with all {len(river_df)} rivers saved to PNG/river_nodes_map_full.png")

## 6. Plot River Time Series
Visualize discharge, temperature, and salinity for each river

In [None]:
# Create figure with subplots for time series
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)
fig.suptitle('River Input Time Series', fontsize=14, fontweight='bold')

# River names are already decoded by read_fvcom_river_nc
river_names_nc = river_data.get('river_names', [f"River {i+1}" for i in range(n_rivers_nc)])

print(f"Plotting time series for {n_rivers_nc} rivers")

# Plot 1: River discharge
ax = axes[0]
if 'river_flux' in river_data:
    flux_df = river_data['river_flux']
    for i in range(min(n_rivers_nc, flux_df.shape[1])):
        river_name = river_names_nc[i] if i < len(river_names_nc) else f"River {i+1}"
        ax.plot(flux_df.index, flux_df.iloc[:, i].values, label=river_name, linewidth=1)
    ax.set_ylabel('Discharge (m³/s)', fontsize=12)
    ax.set_title('River Discharge', fontsize=12)
    ax.grid(True, alpha=0.3)
    ax.legend(ncol=3, loc='upper right', fontsize=8)

# Plot 2: River temperature
ax = axes[1]
if 'river_temp' in river_data:
    temp_df = river_data['river_temp']
    for i in range(min(n_rivers_nc, temp_df.shape[1])):
        river_name = river_names_nc[i] if i < len(river_names_nc) else f"River {i+1}"
        ax.plot(temp_df.index, temp_df.iloc[:, i].values, label=river_name, linewidth=1)
    ax.set_ylabel('Temperature (°C)', fontsize=12)
    ax.set_title('River Temperature', fontsize=12)
    ax.grid(True, alpha=0.3)
    ax.legend(ncol=3, loc='upper right', fontsize=8)

# Plot 3: River salinity
ax = axes[2]
if 'river_salt' in river_data:
    salt_df = river_data['river_salt']
    for i in range(min(n_rivers_nc, salt_df.shape[1])):
        river_name = river_names_nc[i] if i < len(river_names_nc) else f"River {i+1}"
        ax.plot(salt_df.index, salt_df.iloc[:, i].values, label=river_name, linewidth=1)
    ax.set_ylabel('Salinity (PSU)', fontsize=12)
    ax.set_title('River Salinity', fontsize=12)
    ax.grid(True, alpha=0.3)
    ax.legend(ncol=3, loc='upper right', fontsize=8)

# Format x-axis
axes[-1].set_xlabel('Time', fontsize=12)
axes[-1].tick_params(axis='x', rotation=45)

# Adjust layout and save
plt.tight_layout()
fig.savefig(png_dir / "river_timeseries.png", dpi=300, bbox_inches='tight')
plt.show()
print("River time series saved to PNG/river_timeseries.png")

### 6.1 Interactive Time Series (Optional)
Create interactive plots using plotly for better exploration of the data

In [None]:
# Optional: Create interactive plots using plotly
try:
    from xfvcom.plot import plot_timeseries_multi_variable, print_plotly_instructions
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    # Create interactive plot for all rivers
    print("Creating interactive visualization...")

    # Prepare data for plotly
    n_rivers_to_plot = min(10, n_rivers_nc)  # Limit to 10 rivers for clarity

    # Create subplots
    fig = make_subplots(
        rows=3, cols=1,
        subplot_titles=('River Discharge', 'River Temperature', 'River Salinity'),
        shared_xaxes=True,
        vertical_spacing=0.1
    )

    # Define colors for rivers
    colors = ['blue', 'red', 'green', 'orange', 'purple',
              'brown', 'pink', 'gray', 'olive', 'cyan']

    # Plot discharge
    if 'river_flux' in river_data:
        flux_df = river_data['river_flux']
        for i in range(min(n_rivers_to_plot, flux_df.shape[1])):
            river_name = river_names_nc[i] if i < len(river_names_nc) else f"River {i+1}"
            fig.add_trace(
                go.Scatter(
                    x=flux_df.index,
                    y=flux_df.iloc[:, i].values,
                    mode='lines',
                    name=river_name,
                    line=dict(color=colors[i % len(colors)]),
                    legendgroup=river_name,
                    showlegend=True
                ),
                row=1, col=1
            )

    # Plot temperature
    if 'river_temp' in river_data:
        temp_df = river_data['river_temp']
        for i in range(min(n_rivers_to_plot, temp_df.shape[1])):
            river_name = river_names_nc[i] if i < len(river_names_nc) else f"River {i+1}"
            fig.add_trace(
                go.Scatter(
                    x=temp_df.index,
                    y=temp_df.iloc[:, i].values,
                    mode='lines',
                    name=river_name,
                    line=dict(color=colors[i % len(colors)]),
                    legendgroup=river_name,
                    showlegend=False
                ),
                row=2, col=1
            )

    # Plot salinity
    if 'river_salt' in river_data:
        salt_df = river_data['river_salt']
        for i in range(min(n_rivers_to_plot, salt_df.shape[1])):
            river_name = river_names_nc[i] if i < len(river_names_nc) else f"River {i+1}"
            fig.add_trace(
                go.Scatter(
                    x=salt_df.index,
                    y=salt_df.iloc[:, i].values,
                    mode='lines',
                    name=river_name,
                    line=dict(color=colors[i % len(colors)]),
                    legendgroup=river_name,
                    showlegend=False
                ),
                row=3, col=1
            )

    # Update axes labels
    fig.update_yaxes(title_text="Discharge (m³/s)", row=1, col=1)
    fig.update_yaxes(title_text="Temperature (°C)", row=2, col=1)
    fig.update_yaxes(title_text="Salinity (PSU)", row=3, col=1)
    fig.update_xaxes(title_text="Date", row=3, col=1)

    # Update layout
    fig.update_layout(
        height=800,
        title_text="River Input Time Series (Interactive)",
        hovermode='x unified',
        showlegend=True
    )

    # Show the plot
    fig.show()

    # Print instructions
    print_plotly_instructions()

    if n_rivers_to_plot < n_rivers_nc:
        print(f"\nNote: Showing first {n_rivers_to_plot} rivers for clarity.")
        print(f"      Total rivers in file: {n_rivers_nc}")

except ImportError:
    print("Plotly not installed. Skipping interactive visualization.")
    print("Install with: pip install plotly")

## 7. Individual River Analysis
Select and analyze individual rivers in detail
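
The statistics printed in the next cell come from pandas `Series` methods. For reference, the same quantities can be reproduced with the standard library. Note that pandas `std()` defaults to the sample standard deviation (`ddof=1`), which corresponds to `statistics.stdev` rather than `statistics.pstdev`. The helper name `describe` and the discharge values are made up for illustration.

```python
import statistics


def describe(values):
    """Return mean, min, max, and sample standard deviation of a sequence."""
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.stdev(values),  # sample std, like pandas Series.std()
    }


# Hypothetical discharge values in m³/s.
discharge = [2.0, 4.0, 4.0, 6.0]
stats = describe(discharge)
print({key: round(value, 3) for key, value in stats.items()})
```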

In [None]:
# Select a river to analyze (change this index as needed)
river_index = 0  # First river (0-based index)

# River names are already decoded
river_names_nc = river_data.get('river_names', [f"River_{i+1}" for i in range(n_rivers_nc)])

if river_index < n_rivers_nc:
    river_name_nc = river_names_nc[river_index]
    print(f"Analyzing River from NetCDF: {river_name_nc}")
    
    # Try to find matching river in NML file
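    # regex=False treats the name as literal text, so characters such as '(' or '.' are not read as a pattern
    matching_rivers = river_df[river_df['name'].str.contains(river_name_nc, case=False, na=False, regex=False)]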
    if not matching_rivers.empty:
        river_info = matching_rivers.iloc[0]
        print(f"Matched with NML entry: {river_info['name']}")
        print(f"Node location (0-based): {river_info['grid_location']}")
        print(f"Input file: {river_info['file']}")
        
        # Get node coordinates
        node_idx = river_info['grid_location']
        lon = grid_ds.lon.values[node_idx]
        lat = grid_ds.lat.values[node_idx]
        print(f"Coordinates: ({lon:.4f}°E, {lat:.4f}°N)")
    else:
        print(f"No matching river found in NML file for '{river_name_nc}'")
    
    # Get time series data using DataFrames from read_fvcom_river_nc
    if 'river_flux' in river_data:
        flux = river_data['river_flux'].iloc[:, river_index]
        print("\nDischarge Statistics:")
        print(f"  Mean: {flux.mean():.2f} m³/s")
        print(f"  Min: {flux.min():.2f} m³/s")
        print(f"  Max: {flux.max():.2f} m³/s")
        print(f"  Std: {flux.std():.2f} m³/s")
    
    if 'river_temp' in river_data:
        temp = river_data['river_temp'].iloc[:, river_index]
        print("\nTemperature Statistics:")
        print(f"  Mean: {temp.mean():.2f} °C")
        print(f"  Min: {temp.min():.2f} °C")
        print(f"  Max: {temp.max():.2f} °C")
    
    if 'river_salt' in river_data:
        salt = river_data['river_salt'].iloc[:, river_index]
        print("\nSalinity Statistics:")
        print(f"  Mean: {salt.mean():.3f} PSU")
        print(f"  Min: {salt.min():.3f} PSU")
        print(f"  Max: {salt.max():.3f} PSU")
    
    # Create detailed plot for this river
    fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)
    fig.suptitle(f"River: {river_name_nc}", fontsize=14, fontweight='bold')
    
    # Discharge
    if 'river_flux' in river_data:
        flux = river_data['river_flux'].iloc[:, river_index]
        axes[0].plot(flux.index, flux.values, 'b-', linewidth=1.5)
        axes[0].set_ylabel('Discharge (m³/s)')
        axes[0].grid(True, alpha=0.3)
        axes[0].axhline(y=flux.mean(), color='r', linestyle='--', alpha=0.5, label='Mean')
        axes[0].legend()
    
    # Temperature
    if 'river_temp' in river_data:
        temp = river_data['river_temp'].iloc[:, river_index]
        axes[1].plot(temp.index, temp.values, 'r-', linewidth=1.5)
        axes[1].set_ylabel('Temperature (°C)')
        axes[1].grid(True, alpha=0.3)
    
    # Salinity
    if 'river_salt' in river_data:
        salt = river_data['river_salt'].iloc[:, river_index]
        axes[2].plot(salt.index, salt.values, 'g-', linewidth=1.5)
        axes[2].set_ylabel('Salinity (PSU)')
        axes[2].grid(True, alpha=0.3)
    
    axes[-1].set_xlabel('Time')
    axes[-1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    safe_name = river_name_nc.replace(' ', '_').replace('/', '_')
    fig.savefig(png_dir / f"river_{river_index}_{safe_name}.png", 
               dpi=300, bbox_inches='tight')
    plt.show()
    print(f"\nDetailed plot saved to PNG/river_{river_index}_{safe_name}.png")
else:
    print(f"River index {river_index} is out of range (0-{n_rivers_nc-1})")

## 8. Summary and Export
Create a summary report of all river inputs
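
The summary cell matches each NetCDF river name to a namelist entry. Substring matching can pair the wrong rivers when one name is contained in another, so a stricter alternative is an exact comparison after normalizing case and whitespace. The sketch below illustrates that alternative; `reconcile_names` and the sample names are hypothetical.

```python
def _norm(name):
    """Normalize a river name for comparison."""
    return name.strip().lower()


def reconcile_names(nc_names, nml_names):
    """Match names exactly after normalization.

    Returns (matched, only_in_nc, only_in_nml), each in input order.
    """
    nml_lookup = {_norm(name): name for name in nml_names}
    matched, only_in_nc = [], []
    for name in nc_names:
        key = _norm(name)
        if key in nml_lookup:
            matched.append((name, nml_lookup[key]))
        else:
            only_in_nc.append(name)
    matched_keys = {_norm(nc_name) for nc_name, _ in matched}
    only_in_nml = [name for name in nml_names if _norm(name) not in matched_keys]
    return matched, only_in_nc, only_in_nml


# Hypothetical names: "Ara" is a substring of "arakawa" but not equal to it.
nc = ["Arakawa ", "Ara", "Tamagawa"]
nml = ["arakawa", "Tamagawa", "Edogawa"]
matched, only_in_nc, only_in_nml = reconcile_names(nc, nml)
print("matched:", matched)
print("only in NetCDF:", only_in_nc)
print("only in namelist:", only_in_nml)
```

Listing the unmatched names on both sides gives more to act on than a count mismatch alone.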

In [None]:
# Create summary DataFrame using simplified data
river_names_nc = river_data.get('river_names', [f"River {i+1}" for i in range(n_rivers_nc)])

summary_data = []
for i in range(n_rivers_nc):
    river_name_nc = river_names_nc[i] if i < len(river_names_nc) else f"River {i+1}"
    
    # Try to find matching river in NML file for location info
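    # regex=False treats the name as literal text, so characters such as '(' or '.' are not read as a pattern
    matching_rivers = river_df[river_df['name'].str.contains(river_name_nc, case=False, na=False, regex=False)]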
    
    if not matching_rivers.empty:
        river_info = matching_rivers.iloc[0]
        node_idx = river_info['grid_location']
        lon = grid_ds.lon.values[node_idx]
        lat = grid_ds.lat.values[node_idx]
    else:
        node_idx = -1
        lon = np.nan
        lat = np.nan
    
    # Get statistics using pandas DataFrame methods
    row_data = {
        'River Name': river_name_nc,
        'Node Index': node_idx if node_idx >= 0 else 'Not found',
        'Longitude': lon,
        'Latitude': lat,
    }
    
    # Add statistics for each variable
    if 'river_flux' in river_data:
        flux = river_data['river_flux'].iloc[:, i]
        row_data.update({
            'Mean Discharge (m³/s)': flux.mean(),
            'Max Discharge (m³/s)': flux.max(),
        })
    
    if 'river_temp' in river_data:
        temp = river_data['river_temp'].iloc[:, i]
        row_data['Mean Temp (°C)'] = temp.mean()
    
    if 'river_salt' in river_data:
        salt = river_data['river_salt'].iloc[:, i]
        row_data['Mean Salinity (PSU)'] = salt.mean()
    
    summary_data.append(row_data)

summary_df = pd.DataFrame(summary_data)

# Display summary
print("River Input Summary:")
print("="*80)
print(summary_df.to_string())

# Export to CSV
csv_path = png_dir / "river_summary.csv"
summary_df.to_csv(csv_path, index=False)
print(f"\nSummary exported to {csv_path}")

# Calculate totals
if 'Mean Discharge (m³/s)' in summary_df.columns:
    total_discharge = summary_df['Mean Discharge (m³/s)'].sum()
    print(f"\nTotal mean discharge from all rivers: {total_discharge:.2f} m³/s")

# Show warning about mismatch if applicable
print(f"\nNote: NetCDF file contains {n_rivers_nc} rivers")
print(f"      NML file references {len(river_df)} rivers")
if n_rivers_nc != len(river_df):
    print("WARNING: Number mismatch between NetCDF and NML files!")

## Notes

This notebook checks FVCOM river input files directly, without requiring simulation output.

### Key Features:
1. **Direct Input File Reading**: Reads grid, river namelist, and river NetCDF files directly
2. **Spatial Visualization**: Shows river node locations on the mesh
3. **Time Series Analysis**: Plots discharge, temperature, and salinity for all rivers
4. **Individual River Analysis**: Detailed analysis of selected rivers
5. **Summary Export**: Creates CSV summary of all river inputs

### Customization:
- Modify file paths in Section 1 to point to your input files
- Adjust plot bounds and styling in Section 5
- Change the river index in Section 7 to analyze different rivers
- Add additional analysis as needed

### Requirements:
- xfvcom package with FvcomInputLoader support
- Grid file in .grd or .dat format
- River namelist file (.nml)
- River NetCDF file with standard FVCOM river variables