# River Time Series Extender using xfvcom (demonstration)
**Author: Jun Sasaki | Created: 2025-09-05 | Updated: 2025-09-06**

**Purpose:** Extend FVCOM river input time series using xfvcom's new extension utilities

This notebook demonstrates the new xfvcom utilities for river time series extension:
- Simple one-line extension with `extend_river_nc_file()`
- Multiple extension methods (forward fill, linear, seasonal)
- Full integration with xfvcom's existing functionality

## 1. Setup and Imports

In [None]:
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Import xfvcom utilities
import xfvcom
from xfvcom import (
    extend_river_nc_file,
    read_fvcom_river_nc,
    write_fvcom_river_nc,
    extend_timeseries_ffill,
    extend_timeseries_linear,
    extend_timeseries_seasonal,
)

# Create output directory
output_dir = Path("extended_river_files")
output_dir.mkdir(exist_ok=True)

print(f"xfvcom version: {xfvcom.__version__}")
print("Setup complete")

## 2. Simple One-Line Extension and Export to NetCDF

The simplest way to extend a river NetCDF file using xfvcom:

In [None]:
# Define input and output paths
base_path = Path("~/Github/TB-FVCOM/goto2023").expanduser()
input_file = base_path / "input/2020" / "TokyoBay2020kisarazufinal_sewer.nc"

# Stop early if the input file is missing; every later cell depends on it
if not input_file.exists():
    raise FileNotFoundError(
        f"Input file not found at {input_file}. "
        "Update `input_file` to point to your river NetCDF file, "
        "e.g. input_file = Path('your_river_file.nc')"
    )
print(f"Input file found: {input_file}")

# Extend the river file with one function call!
output_ffill = output_dir / "river_extended_ffill.nc"

# Forward fill extension (constant values after original data)
extend_river_nc_file(
    input_path=input_file,
    output_path=output_ffill,
    extend_to="2022-01-01 00:00:00",
    method='ffill'
)

print(f"\nExtended file created: {output_ffill}")

## 3. Different Extension Methods

xfvcom provides three extension methods:
1. **Forward Fill** - Constant values (last value repeated)
2. **Linear** - Linear extrapolation based on trend
3. **Seasonal** - Repeat seasonal patterns
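To make the difference between the first two methods concrete, here is a minimal pandas sketch of what forward fill and linear extrapolation do to an hourly series. `sketch_ffill`, `sketch_linear`, and `_extended_index` are hypothetical helpers written for illustration; they are not xfvcom's implementation, and they assume a regular time step inferred from the last two samples.

```python
import numpy as np
import pandas as pd


def _extended_index(df: pd.DataFrame, extend_to) -> pd.DatetimeIndex:
    """Regular index from the first original time up to `extend_to`.
    Assumes a constant time step, inferred from the last two rows."""
    step = df.index[-1] - df.index[-2]
    return pd.date_range(df.index[0], pd.Timestamp(extend_to), freq=step)


def sketch_ffill(df: pd.DataFrame, extend_to) -> pd.DataFrame:
    """Hypothetical helper: repeat the last original value of each column.
    Only the appended rows are filled, so gaps in the original data stay as-is."""
    out = df.reindex(_extended_index(df, extend_to))
    new_rows = out.index[len(df):]
    for col in df.columns:
        out.loc[new_rows, col] = df[col].iloc[-1]
    return out


def sketch_linear(df: pd.DataFrame, extend_to, lookback_periods: int = 60) -> pd.DataFrame:
    """Hypothetical helper: least-squares line through the last
    `lookback_periods` rows of each column, evaluated at the new times."""
    out = df.reindex(_extended_index(df, extend_to))
    new_rows = out.index[len(df):]
    tail = df.iloc[-lookback_periods:]
    hours = pd.Timedelta(hours=1)
    # Elapsed hours since the first sample serve as the regression variable
    t_tail = np.asarray((tail.index - df.index[0]) / hours, dtype=float)
    t_new = np.asarray((new_rows - df.index[0]) / hours, dtype=float)
    for col in df.columns:
        slope, intercept = np.polyfit(t_tail, tail[col].to_numpy(dtype=float), deg=1)
        out.loc[new_rows, col] = slope * t_new + intercept
    return out


# Toy hourly series rising by 1 unit per hour, extended by three hours
idx = pd.date_range("2020-12-31 18:00", periods=6, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"river_1": np.arange(6, dtype=float)}, index=idx)
print(sketch_ffill(toy, "2021-01-01 02:00")["river_1"].tolist())
print(sketch_linear(toy, "2021-01-01 02:00", lookback_periods=3)["river_1"].round(6).tolist())
```

The linear sketch fits one line per river, so the choice of `lookback_periods` matters: a long window smooths out short-term noise, while a short one follows the most recent trend and, if the record ends on a falling limb, may extrapolate discharge below zero.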

In [None]:
# Read the original data
original_data = read_fvcom_river_nc(input_file)

print(f"Original data range: {original_data['datetime'][0]} to {original_data['datetime'][-1]}")
print(f"Number of rivers: {len(original_data.get('river_names', []))}")
if 'river_names' in original_data:
    print(f"Rivers: {', '.join(original_data['river_names'])}")

# Demonstrate different extension methods on the first river's discharge
if 'river_flux' in original_data:
    # Get first river's data
    river_name = original_data['river_names'][0] if 'river_names' in original_data else 'River_1'
    
    # Create figure showing different extension methods
    fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=True)
    
    # Original data
    orig_flux = original_data['river_flux'].iloc[:, 0]
    
    # Method 1: Forward Fill
    ffill_data = extend_timeseries_ffill(
        original_data['river_flux'].iloc[:, [0]], 
        '2022-01-01 00:00:00'
    )
    axes[0].plot(orig_flux, 'b-', label='Original', linewidth=2)
    axes[0].plot(ffill_data, 'r--', label='Forward Fill', alpha=0.7)
    axes[0].set_ylabel('Discharge (m³/s)')
    axes[0].set_title(f'{river_name} - Forward Fill Extension')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Method 2: Linear Extrapolation
    linear_data = extend_timeseries_linear(
        original_data['river_flux'].iloc[:, [0]], 
        '2022-01-01 00:00:00',
        lookback_periods=60  # Use last 60 time steps for trend
    )
    axes[1].plot(orig_flux, 'b-', label='Original', linewidth=2)
    axes[1].plot(linear_data, 'g--', label='Linear Extrapolation', alpha=0.7)
    axes[1].set_ylabel('Discharge (m³/s)')
    axes[1].set_title(f'{river_name} - Linear Extension')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    # Method 3: Seasonal Pattern (repeat last year)
    # Handles hourly input data
    seasonal_data = extend_timeseries_seasonal(
        original_data['river_flux'].iloc[:, [0]], 
        '2022-01-01 00:00:00',
        period='1Y'  # Repeat yearly pattern
    )
    axes[2].plot(orig_flux, 'b-', label='Original', linewidth=2)
    axes[2].plot(seasonal_data, 'm--', label='Seasonal Repeat', alpha=0.7)
    axes[2].set_ylabel('Discharge (m³/s)')
    axes[2].set_title(f'{river_name} - Seasonal Extension')
    axes[2].legend()
    axes[2].grid(True, alpha=0.3)
    
    # Format x-axis
    axes[-1].set_xlabel('Date')
    for ax in axes:
        ax.axvline(x=original_data['datetime'][-1], color='red',
                   linestyle=':', alpha=0.5, label='Extension Start')
        ax.legend()  # redraw so 'Extension Start' also appears in the legend
    
    plt.tight_layout()
    plt.show()
else:
    print("No river_flux data found in the file")
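A seasonal repeat can be sketched as copying the last full period forward. The hypothetical `sketch_seasonal` below takes the period as a number of time steps (for example 24 for a daily cycle in hourly data) rather than a calendar string like `'1Y'`, because a calendar year is not a fixed number of hours (8760 in a common year, 8784 in a leap year). This is an assumption of the sketch, not a description of how `extend_timeseries_seasonal` resolves `period`.

```python
import numpy as np
import pandas as pd


def sketch_seasonal(df: pd.DataFrame, extend_to, period_steps: int) -> pd.DataFrame:
    """Hypothetical helper: tile the last `period_steps` rows forward.
    Assumes a regular time step and at least one full period of data."""
    if len(df) < period_steps:
        raise ValueError("need at least one full period of data")
    step = df.index[-1] - df.index[-2]
    new_index = pd.date_range(df.index[0], pd.Timestamp(extend_to), freq=step)
    out = df.reindex(new_index)
    n_orig, n_new = len(df), len(new_index) - len(df)
    if n_new > 0:
        template = df.iloc[-period_steps:].to_numpy()
        # New row j copies the value exactly one period earlier: template[j % period_steps]
        out.iloc[n_orig:, :] = template[np.arange(n_new) % period_steps]
    return out


# Toy hourly series with a 4-hour cycle, observed twice, extended by six hours
idx = pd.date_range("2021-01-01 00:00", periods=8, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"river_1": np.tile([1.0, 2.0, 3.0, 4.0], 2)}, index=idx)
print(sketch_seasonal(toy, "2021-01-01 13:00", period_steps=4)["river_1"].tolist())
```

For an annual cycle in hourly data, this approach forces an explicit choice between 8760 and 8784 steps; shifting by a calendar offset instead would avoid that choice but needs a rule for 29 February.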

### 3.1 Interactive Plot with Plotly

In [None]:
# Alternative: interactive plot with Plotly (hover, zoom, pan);
# falls back to a static matplotlib plot if Plotly is not installed
# First install if needed: pip install plotly

try:
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots
    plotly_available = True
    print("Using Plotly for interactive plots")
except ImportError:
    plotly_available = False
    print("Plotly not available. Install with: pip install plotly")

if plotly_available and 'river_flux' in original_data:
    # Get data
    river_name = original_data['river_names'][0] if 'river_names' in original_data else 'River_1'
    orig_flux = original_data['river_flux'].iloc[:, 0]
    
    # Generate extensions
    ffill_data = extend_timeseries_ffill(
        original_data['river_flux'].iloc[:, [0]], 
        '2021-12-31 23:00:00'
    )
    linear_data = extend_timeseries_linear(
        original_data['river_flux'].iloc[:, [0]], 
        '2021-12-31 23:00:00',
        lookback_periods=60
    )
    seasonal_data = extend_timeseries_seasonal(
        original_data['river_flux'].iloc[:, [0]], 
        '2021-12-31 23:00:00',
        period='1Y'
    )
    
    # Create subplots
    fig = make_subplots(
        rows=3, cols=1,
        subplot_titles=(
            f'{river_name} - Forward Fill Extension',
            f'{river_name} - Linear Extension',
            f'{river_name} - Seasonal Extension'
        ),
        vertical_spacing=0.1
    )
    
    # Add original data to all subplots
    for row in [1, 2, 3]:
        fig.add_trace(
            go.Scatter(
                x=orig_flux.index, 
                y=orig_flux.values,
                mode='lines',
                name='Original',
                line=dict(color='blue', width=2),
                showlegend=(row == 1)
            ),
            row=row, col=1
        )
        
        # Add vertical line for extension start
        fig.add_vline(
            x=original_data['datetime'][-1], 
            line_dash="dot", 
            line_color="red",
            opacity=0.5,
            row=row, col=1
        )
    
    # Add extended data
    fig.add_trace(
        go.Scatter(
            x=ffill_data.index,
            y=ffill_data.values.flatten(),
            mode='lines',
            name='Forward Fill',
            line=dict(color='red', width=1.5, dash='dash'),
            opacity=0.7
        ),
        row=1, col=1
    )
    
    fig.add_trace(
        go.Scatter(
            x=linear_data.index,
            y=linear_data.values.flatten(),
            mode='lines',
            name='Linear',
            line=dict(color='green', width=1.5, dash='dash'),
            opacity=0.7
        ),
        row=2, col=1
    )
    
    fig.add_trace(
        go.Scatter(
            x=seasonal_data.index,
            y=seasonal_data.values.flatten(),
            mode='lines',
            name='Seasonal',
            line=dict(color='magenta', width=1.5, dash='dash'),
            opacity=0.7
        ),
        row=3, col=1
    )
    
    # Update layout
    fig.update_xaxes(title_text="Date", row=3, col=1)
    fig.update_yaxes(title_text="Discharge (m³/s)", row=1, col=1)
    fig.update_yaxes(title_text="Discharge (m³/s)", row=2, col=1)
    fig.update_yaxes(title_text="Discharge (m³/s)", row=3, col=1)
    
    fig.update_layout(
        height=800,
        title_text="River Extension Methods - Interactive Plot<br><sub>Hover for values, drag to zoom, double-click to reset</sub>",
        hovermode='x unified'
    )
    
    # Show the plot
    fig.show()
    
    print("\n" + "="*60)
    print("Plotly Interactive Features:")
    print("="*60)
    print("  📍 Mouse Interactions:")
    print("    • Hover: View exact values at cursor position")
    print("    • Click & Drag: Zoom into selected area")
    print("    • Double-click: Reset view to original zoom")
    print("    • Scroll: Zoom in/out (over plot area)")
    print("")
    print("  🔧 Toolbar Controls (top right):")
    print("    • Camera icon: Download plot as PNG")
    print("    • Zoom: Click and drag to zoom")
    print("    • Pan: Move around the plot")
    print("    • Box Select: Select rectangular region")
    print("    • Lasso Select: Freeform selection")
    print("    • Zoom in/out: Fixed increment zoom")
    print("    • Autoscale: Fit all data in view")
    print("    • Reset axes: Return to default view")
    print("")
    print("  📊 Legend Interactions:")
    print("    • Single-click: Toggle series visibility")
    print("    • Double-click: Isolate/show all series")
    
elif not plotly_available:
    # Fallback: static matplotlib plot at higher resolution (plt is imported in the setup cell)
    
    print("Creating high-resolution static plot instead...")
    
    if 'river_flux' in original_data:
        river_name = original_data['river_names'][0] if 'river_names' in original_data else 'River_1'
        orig_flux = original_data['river_flux'].iloc[:, 0]
        
        # Generate extensions
        ffill_data = extend_timeseries_ffill(
            original_data['river_flux'].iloc[:, [0]], 
            '2021-12-31 23:00:00'
        )
        linear_data = extend_timeseries_linear(
            original_data['river_flux'].iloc[:, [0]], 
            '2021-12-31 23:00:00',
            lookback_periods=60
        )
        seasonal_data = extend_timeseries_seasonal(
            original_data['river_flux'].iloc[:, [0]], 
            '2021-12-31 23:00:00',
            period='1Y'
        )
        
        # Create high-res static plot
        fig, axes = plt.subplots(3, 1, figsize=(14, 10), dpi=100)
        
        # Plot each method
        axes[0].plot(orig_flux, 'b-', label='Original', linewidth=2)
        axes[0].plot(ffill_data, 'r--', label='Forward Fill', alpha=0.7, linewidth=1.5)
        axes[0].set_title(f'{river_name} - Forward Fill Extension')
        
        axes[1].plot(orig_flux, 'b-', label='Original', linewidth=2)
        axes[1].plot(linear_data, 'g--', label='Linear', alpha=0.7, linewidth=1.5)
        axes[1].set_title(f'{river_name} - Linear Extension')
        
        axes[2].plot(orig_flux, 'b-', label='Original', linewidth=2)
        axes[2].plot(seasonal_data, 'm--', label='Seasonal', alpha=0.7, linewidth=1.5)
        axes[2].set_title(f'{river_name} - Seasonal Extension')
        
        for ax in axes:
            ax.axvline(x=original_data['datetime'][-1], color='red',
                       linestyle=':', alpha=0.5, label='Extension Start')
            ax.legend(loc='best')
            ax.grid(True, alpha=0.3)
            ax.set_ylabel('Discharge (m³/s)')
        
        axes[-1].set_xlabel('Date')
        plt.tight_layout()
        plt.show()
        
        print("\nStatic plot displayed. For interactivity, install plotly:")
        print("  pip install plotly")

## 4. Advanced Usage: Custom Processing

For more control, you can read, process, and write the data separately:

In [None]:
# Read the data
data = read_fvcom_river_nc(input_file)

# Apply different extension methods to different variables
extend_to = pd.Timestamp('2021-12-31 23:00:00')

# Forward fill for discharge (conservative approach)
if 'river_flux' in data:
    data['river_flux'] = extend_timeseries_ffill(data['river_flux'], extend_to)
    print(f"Extended river_flux to {extend_to} using forward fill")

# Seasonal pattern for temperature (realistic variation)
if 'river_temp' in data:
    data['river_temp'] = extend_timeseries_seasonal(
        data['river_temp'], 
        extend_to,
        period='1Y'  # Repeat annual temperature cycle
    )
    print(f"Extended river_temp to {extend_to} using seasonal pattern")

# Forward fill for salinity (usually constant for rivers)
if 'river_salt' in data:
    data['river_salt'] = extend_timeseries_ffill(data['river_salt'], extend_to)
    print(f"Extended river_salt to {extend_to} using forward fill")

# Update datetime index
data['datetime'] = data['river_flux'].index if 'river_flux' in data else data['river_temp'].index

# Write the custom extended data
output_custom = output_dir / "river_extended_custom.nc"
write_fvcom_river_nc(output_custom, data)

print(f"\nCustom extended file created: {output_custom}")
print(f"Extended from {original_data['datetime'][0]} to {data['datetime'][-1]}")
print(f"Total time steps: {len(data['datetime'])} (added {len(data['datetime']) - len(original_data['datetime'])} steps)")
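The cell above takes `data['datetime']` from `river_flux` (or `river_temp` if flux is absent). When variables are extended with different methods, it is worth confirming that they all ended up on the same time axis before writing. A minimal sketch, assuming (as the cell does) that each variable is a pandas DataFrame indexed by time; `check_common_time_index` is a hypothetical helper:

```python
import pandas as pd


def check_common_time_index(data: dict, var_names=("river_flux", "river_temp", "river_salt")) -> pd.DatetimeIndex:
    """Hypothetical helper: return the time index shared by all present
    variables, or raise ValueError if any of them disagree."""
    present = [name for name in var_names if name in data]
    if not present:
        raise KeyError(f"none of {var_names} found in data")
    ref = data[present[0]].index
    for name in present[1:]:
        if not data[name].index.equals(ref):
            raise ValueError(f"{name} time index differs from {present[0]}")
    return ref


# Toy example: flux and temperature share an hourly axis, salinity is one step short
idx = pd.date_range("2021-12-31 20:00", periods=4, freq=pd.Timedelta(hours=1))
ok = {"river_flux": pd.DataFrame({"r1": [1.0] * 4}, index=idx),
      "river_temp": pd.DataFrame({"r1": [15.0] * 4}, index=idx)}
print(check_common_time_index(ok)[-1])
bad = dict(ok, river_salt=pd.DataFrame({"r1": [0.0] * 3}, index=idx[:3]))
try:
    check_common_time_index(bad)
except ValueError as err:
    print(err)
```

In the cell above, the conditional assignment could then become `data['datetime'] = check_common_time_index(data)`, which fails loudly instead of silently writing mismatched variables.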

## 5. Batch Processing Multiple Files

Process multiple river files at once:

In [None]:
# Example: Process all river NC files in a directory
# (Uncomment and modify paths as needed)

# input_dir = Path("path/to/river/files")
# batch_output_dir = Path("extended_files")  # separate name so the notebook's output_dir is not overwritten
# batch_output_dir.mkdir(exist_ok=True)
# 
# for nc_file in input_dir.glob("*river*.nc"):
#     output_file = batch_output_dir / f"{nc_file.stem}_extended{nc_file.suffix}"
#     
#     try:
#         extend_river_nc_file(
#             input_path=nc_file,
#             output_path=output_file,
#             extend_to="2025-12-31 23:00:00",
#             method='ffill'
#         )
#         print(f"✓ Extended {nc_file.name}")
#     except Exception as e:
#         print(f"✗ Failed to extend {nc_file.name}: {e}")

#print("Batch processing example (commented out - modify paths as needed)")
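Before running the batch loop, a dry run that only lists the planned input/output pairs can catch path mistakes without touching any file. The hypothetical `plan_batch` helper below uses the same `*river*.nc` pattern and `_extended` naming as the loop above, and additionally skips files whose stem already ends in `_extended`, so pointing it at a directory that holds earlier outputs does not schedule them for a second extension.

```python
import tempfile
from pathlib import Path


def plan_batch(input_dir, output_dir, pattern="*river*.nc", suffix="_extended"):
    """Hypothetical helper: return (input, output) path pairs; nothing is written."""
    pairs = []
    for nc_file in sorted(Path(input_dir).glob(pattern)):
        if nc_file.stem.endswith(suffix):
            continue  # already an extended output; skip it
        pairs.append((nc_file, Path(output_dir) / f"{nc_file.stem}{suffix}{nc_file.suffix}"))
    return pairs


# Demo on a throwaway directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for name in ["a_river.nc", "b_river_extended.nc", "notes.txt"]:
        (tmp / name).touch()
    demo_names = [(src.name, dst.name) for src, dst in plan_batch(tmp, tmp / "out")]
print(demo_names)
```

Once the printed plan looks right, the same pairs can be fed to `extend_river_nc_file` inside the `try`/`except` loop shown above.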

## 6. Verification and Comparison

Compare original and extended files:

In [None]:
# Read the extended file
extended_data = read_fvcom_river_nc(output_ffill)

# Create comparison plot
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Select first river for visualization
river_idx = 0
river_name = original_data['river_names'][river_idx] if 'river_names' in original_data else f"River {river_idx+1}"

variables = [
    ('river_flux', 'Discharge (m³/s)', axes[0]),
    ('river_temp', 'Temperature (°C)', axes[1]),
    ('river_salt', 'Salinity (PSU)', axes[2])
]

for var_name, ylabel, ax in variables:
    if var_name in original_data:
        # Plot original
        orig = original_data[var_name].iloc[:, river_idx]
        ax.plot(orig.index, orig.values, 'b-', label='Original', linewidth=2)
        
        # Plot extended
        ext = extended_data[var_name].iloc[:, river_idx]
        ax.plot(ext.index, ext.values, 'r--', label='Extended', alpha=0.7)
        
        # Mark extension point
        ax.axvline(x=original_data['datetime'][-1], color='green', 
                   linestyle=':', alpha=0.5, label='Extension Start')
        
        ax.set_ylabel(ylabel)
        ax.set_title(var_name.replace('river_', '').title())
        ax.grid(True, alpha=0.3)
        ax.legend(loc='best', fontsize=8)
        
        # Format dates on x-axis
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
        ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
        plt.setp(ax.xaxis.get_majorticklabels(), rotation=45, ha='right')

fig.suptitle(f'River: {river_name} - Original vs Extended Data', fontsize=14)
plt.tight_layout()
plt.show()

# Summary statistics
print("\nExtension Summary:")
print("=" * 50)
print(f"Original time range: {original_data['datetime'][0]} to {original_data['datetime'][-1]}")
print(f"Extended time range: {extended_data['datetime'][0]} to {extended_data['datetime'][-1]}")
print(f"Original time steps: {len(original_data['datetime'])}")
print(f"Extended time steps: {len(extended_data['datetime'])}")
print(f"Added time steps: {len(extended_data['datetime']) - len(original_data['datetime'])}")

# Calculate extension duration
orig_duration = (original_data['datetime'][-1] - original_data['datetime'][0]).total_seconds() / 86400
ext_duration = (extended_data['datetime'][-1] - extended_data['datetime'][0]).total_seconds() / 86400
added_days = ext_duration - orig_duration

print("\nDuration:")
print(f"  Original: {orig_duration:.1f} days")
print(f"  Extended: {ext_duration:.1f} days")
print(f"  Added: {added_days:.1f} days ({added_days/365:.1f} years)")
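The number of added steps can also be checked against what the time step predicts: for a regular record, the added count is the gap between the old end time and `extend_to`, divided by the step. A small sketch with a hypothetical hourly record ending at 2020-12-31 23:00 (2021 is a common year, so one extra year is 365 × 24 = 8760 hours); `expected_added_steps` is a hypothetical helper:

```python
import pandas as pd


def expected_added_steps(orig_index: pd.DatetimeIndex, extend_to) -> int:
    """Number of rows a regular-step extension should append.
    Assumes a constant step, inferred from the last two timestamps."""
    step = orig_index[-1] - orig_index[-2]
    return (pd.Timestamp(extend_to) - orig_index[-1]) // step


# Hypothetical hourly record covering the last day of 2020
orig_index = pd.date_range("2020-12-31 00:00", "2020-12-31 23:00", freq=pd.Timedelta(hours=1))
print(expected_added_steps(orig_index, "2021-12-31 23:00"))  # 8760: one common year
print(expected_added_steps(orig_index, "2022-01-01 00:00"))  # 8761: one step more
```

In this notebook the result would be compared with `len(extended_data['datetime']) - len(original_data['datetime'])`, provided `original_data['datetime']` is a pandas `DatetimeIndex`.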

## 7. Data Integrity Check

Verify that the original data is preserved and that the extension is applied correctly:

In [None]:
import numpy as np

# Check data integrity
print("Data Integrity Check")
print("=" * 50)

# Check if original portion is preserved
orig_len = len(original_data['datetime'])
integrity_ok = True

for var_name in ['river_flux', 'river_temp', 'river_salt']:
    if var_name in original_data and var_name in extended_data:
        orig_values = original_data[var_name].values
        ext_values = extended_data[var_name].values[:orig_len]
        
        if np.allclose(orig_values, ext_values, rtol=1e-6, atol=1e-8, equal_nan=True):
            print(f"✓ {var_name}: Original data preserved correctly")
        else:
            print(f"✗ {var_name}: Data mismatch detected!")
            integrity_ok = False
            
            # Find where differences occur
            diff_mask = ~np.isclose(orig_values, ext_values, rtol=1e-6, atol=1e-8, equal_nan=True)
            if diff_mask.any():
                diff_indices = np.where(diff_mask)
                print(f"  Differences at indices: {diff_indices}")

# Check forward fill (for ffill method)
if output_ffill.exists():
    print("\nForward Fill Verification:")
    for var_name in ['river_flux', 'river_temp', 'river_salt']:
        if var_name in extended_data:
            # Get last original value and extended values
            last_orig = extended_data[var_name].iloc[orig_len-1, :]
            extended_portion = extended_data[var_name].iloc[orig_len:, :]
            
            # Check that every extended value equals the last original value
            # (broadcasts the last row across all extended rows)
            all_match = np.all(extended_portion.to_numpy() == last_orig.to_numpy())
            
            if all_match:
                print(f"✓ {var_name}: Forward fill applied correctly")
            else:
                print(f"✗ {var_name}: Extended values differ from the last original value")
                integrity_ok = False

if integrity_ok:
    print("\n✓ All integrity checks passed!")
else:
    print("\n✗ Some integrity checks failed. Please review the data.")
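The checks above can be folded into one reusable function. This is a sketch, assuming each variable is a time-by-river DataFrame as this notebook treats them; `check_extension` is a hypothetical name. Unlike the cell above, it compares the appended rows with the last *original* row using a tolerance and treats NaN as equal to NaN, which is more forgiving if values were stored at reduced precision.

```python
import numpy as np
import pandas as pd


def check_extension(orig: pd.DataFrame, ext: pd.DataFrame, expect_ffill: bool = True,
                    rtol: float = 1e-6, atol: float = 1e-8) -> bool:
    """Return True if `ext` starts with `orig` and, when `expect_ffill` is set,
    every appended row matches the last original row within tolerance."""
    n = len(orig)
    if len(ext) < n or ext.shape[1] != orig.shape[1]:
        return False
    orig_vals = orig.to_numpy(dtype=float)
    ext_vals = ext.to_numpy(dtype=float)
    # Original portion must be unchanged (NaN == NaN counts as a match)
    prefix_ok = np.allclose(orig_vals, ext_vals[:n], rtol=rtol, atol=atol, equal_nan=True)
    if not expect_ffill:
        return bool(prefix_ok)
    # Each appended row is compared with the last original row via broadcasting
    tail_ok = np.allclose(ext_vals[n:], orig_vals[-1], rtol=rtol, atol=atol, equal_nan=True)
    return bool(prefix_ok and tail_ok)


# Toy example: two rivers, three original hours, two appended hours
idx = pd.date_range("2021-01-01", periods=5, freq=pd.Timedelta(hours=1))
orig = pd.DataFrame({"r1": [1.0, 2.0, 3.0], "r2": [10.0, 20.0, 30.0]}, index=idx[:3])
good = pd.DataFrame({"r1": [1.0, 2.0, 3.0, 3.0, 3.0],
                     "r2": [10.0, 20.0, 30.0, 30.0, 30.0]}, index=idx)
bad = good.copy()
bad.iloc[4, 1] = 31.0  # corrupt one appended value
print(check_extension(orig, good), check_extension(orig, bad))
```

For the linear or seasonal methods, call it with `expect_ffill=False` so only the preserved original portion is checked.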

## Summary

This notebook demonstrated the new xfvcom utilities for river time series extension:

### Key Features:
1. **Simple one-line extension**: `extend_river_nc_file()`
2. **Multiple methods**: Forward fill, linear extrapolation, seasonal patterns
3. **Preserves FVCOM format**: Uses netCDF4 directly, maintains all attributes
4. **Flexible processing**: Can handle each variable differently
5. **Batch capable**: Process multiple files easily

### New xfvcom Functions:
- `extend_river_nc_file()` - High-level extension function
- `read_fvcom_river_nc()` - Read river NetCDF files
- `write_fvcom_river_nc()` - Write FVCOM-compatible NetCDF
- `extend_timeseries_ffill()` - Forward fill extension
- `extend_timeseries_linear()` - Linear extrapolation
- `extend_timeseries_seasonal()` - Seasonal pattern repetition

### Usage Examples:
```python
# Simple extension
extend_river_nc_file('input.nc', 'output.nc', '2025-12-31', method='ffill')

# Custom processing
data = read_fvcom_river_nc('input.nc')
data['river_flux'] = extend_timeseries_ffill(data['river_flux'], '2025-12-31')
write_fvcom_river_nc('output.nc', data)
```

These utilities are now part of xfvcom and can be imported directly:
```python
from xfvcom import extend_river_nc_file
```