# River Time Series Extender
**Author: Jun Sasaki | Created: 2025-09-04 | Updated: 2025-09-04**

**Purpose:** Extend FVCOM river input time series using the forward fill (ffill) method

This notebook:
1. Reads an existing river NetCDF file
2. Extends the time series to a specified end date using forward fill
3. Writes the extended data to a new NetCDF file using netCDF4 (preserving FVCOM format)
4. Visualizes the original and extended time series for verification

## 1. Setup and Imports

In [None]:
from pathlib import Path
from datetime import datetime

import numpy as np
import pandas as pd
import netCDF4 as nc
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Suppress FutureWarning messages (e.g. from pandas) to keep notebook output readable
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

# Create directory for output files
output_dir = Path("extended_river_files")
output_dir.mkdir(exist_ok=True)

print("Setup complete")

## 2. Define Input/Output Paths and Parameters

In [None]:
# Input file path - modify this to your river NetCDF file
base_path = Path("~/Github/TB-FVCOM/goto2023").expanduser()
input_nc_path = base_path / "input/2020" / "TokyoBay2020kisarazufinal_sewer.nc"

# Output file path
output_nc_path = output_dir / "extended_river.nc"

# Extension parameters
# Specify the end datetime for extension (format: 'YYYY-MM-DD HH:MM:SS')
extend_to_datetime = "2021-01-01 00:00:00"  # Extend through the end of 2020 (to 00:00 on 2021-01-01)

# Time interval (hours) - will be detected from input file if not specified
time_interval_hours = None  # Set to None to auto-detect, or specify like 1, 24, etc.

# Verify input file exists
if not input_nc_path.exists():
    raise FileNotFoundError(f"Input file not found: {input_nc_path}")
    
print(f"Input file: {input_nc_path}")
print(f"Output file: {output_nc_path}")
print(f"Extend to: {extend_to_datetime}")

## 3. Functions for Reading and Writing River NetCDF

In [None]:
def decode_fvcom_time(itime, itime2):
    """
    Decode FVCOM time format to pandas datetime.
    
    Parameters
    ----------
    itime : array-like
        Modified Julian Day values
    itime2 : array-like
        Milliseconds since midnight
    
    Returns
    -------
    pd.DatetimeIndex
        Decoded datetime values
    """
    # Modified Julian Day epoch
    base_date = pd.Timestamp('1858-11-17')
    
    times = []
    for day, ms in zip(itime, itime2):
        dt = base_date + pd.Timedelta(days=int(day)) + pd.Timedelta(milliseconds=int(ms))
        times.append(dt)
    
    return pd.DatetimeIndex(times)


def encode_fvcom_time(datetimes):
    """
    Encode datetime to FVCOM time format.
    
    Parameters
    ----------
    datetimes : pd.DatetimeIndex
        Datetime values to encode
    
    Returns
    -------
    tuple
        (itime, itime2, times_str) for FVCOM format
    """
    base_date = pd.Timestamp('1858-11-17')
    
    itime = []
    itime2 = []
    times_str = []
    
    for dt in datetimes:
        # Calculate days since base date
        delta = dt - base_date
        days = delta.days
        
        # Calculate milliseconds since midnight with integer arithmetic
        # (avoids float truncation, e.g. 999.9999 -> 999)
        midnight = dt.normalize()
        ms_since_midnight = (dt - midnight) // pd.Timedelta(milliseconds=1)
        
        itime.append(days)
        itime2.append(int(ms_since_midnight))
        
        # Format time string (YYYY-MM-DD HH:MM:SS.SSS)
        time_str = dt.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]  # Keep milliseconds
        times_str.append(time_str)
    
    return np.array(itime, dtype=np.int32), np.array(itime2, dtype=np.int32), times_str


def read_river_nc(filepath):
    """
    Read FVCOM river NetCDF file.
    
    Parameters
    ----------
    filepath : Path or str
        Path to river NetCDF file
    
    Returns
    -------
    dict
        Dictionary containing river data and metadata
    """
    data = {}
    
    with nc.Dataset(filepath, 'r') as ds:
        # Store global attributes
        data['global_attrs'] = {attr: ds.getncattr(attr) for attr in ds.ncattrs()}
        
        # Store dimensions
        data['dimensions'] = {dim: len(ds.dimensions[dim]) for dim in ds.dimensions}
        
        # Read time variables
        data['Itime'] = ds.variables['Itime'][:]
        data['Itime2'] = ds.variables['Itime2'][:]
        
        # Decode time to datetime
        data['datetime'] = decode_fvcom_time(data['Itime'], data['Itime2'])
        
        # Read Times string if exists
        if 'Times' in ds.variables:
            data['Times'] = ds.variables['Times'][:]
        
        # Read river names
        river_dim = 'river' if 'river' in ds.dimensions else 'rivers'
        data['river_dim'] = river_dim
        
        if 'river_names' in ds.variables:
            river_names_raw = ds.variables['river_names'][:]
            # Decode river names
            river_names = []
            for i in range(data['dimensions'][river_dim]):
                if river_names_raw.ndim == 1:
                    name = river_names_raw[i]
                else:
                    name = river_names_raw[i, :]
                
                # Convert to string
                if isinstance(name, bytes):
                    name_str = name.decode('utf-8').strip()
                elif hasattr(name, 'tobytes'):
                    name_str = name.tobytes().decode('utf-8').strip('\x00').strip()
                else:
                    name_str = ''.join([chr(c) if isinstance(c, (int, np.integer)) else str(c) 
                                       for c in name.flatten()]).strip('\x00').strip()
                river_names.append(name_str)
            data['river_names'] = river_names
        
        # Read river data variables
        for var_name in ['river_flux', 'river_temp', 'river_salt']:
            if var_name in ds.variables:
                var = ds.variables[var_name]
                data[var_name] = var[:]
                # Store variable attributes
                data[f'{var_name}_attrs'] = {attr: var.getncattr(attr) for attr in var.ncattrs()}
    
    return data

## 4. Read Original River NetCDF File

In [None]:
# Read the input file
print(f"Reading river NetCDF file: {input_nc_path}")
river_data = read_river_nc(input_nc_path)

# Display information about the data
print("\nFile dimensions:")
for dim, size in river_data['dimensions'].items():
    print(f"  {dim}: {size}")

print("\nTime range:")
print(f"  Start: {river_data['datetime'][0]}")
print(f"  End: {river_data['datetime'][-1]}")
print(f"  Number of time steps: {len(river_data['datetime'])}")

# Detect time interval
if len(river_data['datetime']) > 1:
    detected_interval = (river_data['datetime'][1] - river_data['datetime'][0]).total_seconds() / 3600
    print(f"  Detected time interval: {detected_interval} hours")
    if time_interval_hours is None:
        time_interval_hours = detected_interval
else:
    if time_interval_hours is None:
        time_interval_hours = 1  # Default to 1 hour
        print(f"  Using default interval: {time_interval_hours} hours")

if 'river_names' in river_data:
    print(f"\nRivers ({len(river_data['river_names'])}):")    
    for i, name in enumerate(river_data['river_names']):
        print(f"  {i+1}. {name}")

## 5. Extend Time Series with Forward Fill

In [None]:
def extend_timeseries_ffill(data, extend_to_str, interval_hours):
    """
    Extend time series data using forward fill.
    
    Parameters
    ----------
    data : dict
        River data dictionary from read_river_nc
    extend_to_str : str
        End datetime string (YYYY-MM-DD HH:MM:SS)
    interval_hours : float
        Time interval in hours
    
    Returns
    -------
    dict
        Extended river data
    """
    # Parse end datetime
    end_dt = pd.Timestamp(extend_to_str)
    
    # Get original time range
    orig_start = data['datetime'][0]
    orig_end = data['datetime'][-1]
    
    print(f"Original time range: {orig_start} to {orig_end}")
    print(f"Extending to: {end_dt}")
    
    if end_dt <= orig_end:
        print("Warning: Target end time is not after the original end time.")
        print("No extension needed.")
        return data
    
    # Create extended time array; a Timedelta step avoids building a frequency string by hand
    step = pd.Timedelta(hours=float(interval_hours))
    extended_time = pd.date_range(start=orig_start, end=end_dt, freq=step)
    
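    # Copy data dictionary; the nested dimensions dict is copied too,
    # so updating the time dimension below does not mutate the input data
    extended_data = data.copy()
    extended_data['dimensions'] = dict(data['dimensions'])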
    
    # Update time-related fields
    extended_data['datetime'] = extended_time
    extended_data['Itime'], extended_data['Itime2'], times_str = encode_fvcom_time(extended_time)
    
    # Encode Times string array
    max_str_len = 30  # FVCOM typically uses 26-30 chars for time strings
    times_array = np.zeros((len(extended_time), max_str_len), dtype='S1')
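    for i, ts in enumerate(times_str):
        ts_bytes = ts.encode('utf-8')
        # Assign one byte per element; a list of ints would be cast via str() and corrupt the text
        times_array[i, :len(ts_bytes)] = np.frombuffer(ts_bytes, dtype='S1')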
    extended_data['Times'] = times_array
    
    # Extend river data variables using forward fill
    n_orig_times = len(data['datetime'])
    n_extended_times = len(extended_time)
    
    for var_name in ['river_flux', 'river_temp', 'river_salt']:
        if var_name in data:
            orig_values = data[var_name]
            
            # Create DataFrame for easier forward fill
            df_orig = pd.DataFrame(orig_values, index=data['datetime'])
            
            # Reindex to extended time and forward fill
            df_extended = df_orig.reindex(extended_time, method='ffill')
            
            # Convert back to numpy array
            extended_data[var_name] = df_extended.values
            
            print(f"Extended {var_name}: {orig_values.shape} -> {extended_data[var_name].shape}")
    
    # Update dimensions
    extended_data['dimensions']['time'] = n_extended_times
    
    print("\nExtension complete:")
    print(f"  Original time steps: {n_orig_times}")
    print(f"  Extended time steps: {n_extended_times}")
    print(f"  Added time steps: {n_extended_times - n_orig_times}")
    
    return extended_data


# Perform the extension
extended_data = extend_timeseries_ffill(river_data, extend_to_datetime, time_interval_hours)

## 6. Visualize Original vs Extended Data

In [None]:
# Select rivers to visualize (max 4 for clarity)
n_rivers = extended_data['dimensions'][extended_data['river_dim']]
rivers_to_plot = min(4, n_rivers)

# Create figure with subplots
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)
fig.suptitle('Original vs Extended River Time Series', fontsize=14, fontweight='bold')

# Original data time range
orig_end_idx = len(river_data['datetime'])

# Plot each variable
variables = [
    ('river_flux', 'Discharge (m³/s)', axes[0]),
    ('river_temp', 'Temperature (°C)', axes[1]),
    ('river_salt', 'Salinity (PSU)', axes[2])
]

for var_name, ylabel, ax in variables:
    if var_name in extended_data:
        for i in range(rivers_to_plot):
            river_name = extended_data['river_names'][i] if 'river_names' in extended_data else f"River {i+1}"
            
            # Plot original data
            ax.plot(river_data['datetime'], 
                   river_data[var_name][:, i], 
                   label=f"{river_name} (original)",
                   linewidth=1.5,
                   alpha=0.8)
            
            # Plot extended part with different style
            ax.plot(extended_data['datetime'][orig_end_idx-1:], 
                   extended_data[var_name][orig_end_idx-1:, i],
                   '--',
                   label=f"{river_name} (extended)",
                   linewidth=1.2,
                   alpha=0.6)
        
        # Add vertical line at extension point
        ax.axvline(x=river_data['datetime'][-1], color='red', linestyle=':', 
                   linewidth=1, alpha=0.5, label='Extension start')
        
        ax.set_ylabel(ylabel, fontsize=11)
        ax.grid(True, alpha=0.3)
        ax.legend(ncol=3, loc='upper right', fontsize=8)

# Format x-axis
axes[-1].set_xlabel('Time', fontsize=11)
axes[-1].xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
axes[-1].xaxis.set_major_locator(mdates.MonthLocator(interval=2))
plt.setp(axes[-1].xaxis.get_majorticklabels(), rotation=45, ha='right')

plt.tight_layout()
plt.show()

print(f"Visualization shows {rivers_to_plot} out of {n_rivers} rivers")
if n_rivers > rivers_to_plot:
    print(f"Note: Only first {rivers_to_plot} rivers are shown for clarity")

## 7. Write Extended Data to New NetCDF File

In [None]:
def write_river_nc(filepath, data):
    """
    Write river data to FVCOM-format NetCDF file using netCDF4.
    
    Parameters
    ----------
    filepath : Path or str
        Output file path
    data : dict
        River data dictionary
    """
    river_dim = data['river_dim']
    
    with nc.Dataset(filepath, 'w', format='NETCDF4_CLASSIC') as ds:
        # Create dimensions
        ds.createDimension('time', data['dimensions']['time'])
        ds.createDimension(river_dim, data['dimensions'][river_dim])
        
        # Handle string dimensions (length counted in bytes, since names are stored UTF-8 encoded)
        if 'river_names' in data:
            max_name_len = max(len(name.encode('utf-8')) for name in data['river_names']) + 1
            ds.createDimension('namelen', max_name_len)
        
        if 'Times' in data:
            ds.createDimension('DateStrLen', data['Times'].shape[1])
        
        # Set global attributes
        if 'global_attrs' in data:
            for attr, value in data['global_attrs'].items():
                ds.setncattr(attr, value)
        
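        # Add modification note, keeping any existing history entry instead of overwriting it
        history_note = f"Extended with forward fill to {data['datetime'][-1]} on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}"
        prev_history = data.get('global_attrs', {}).get('history')
        ds.setncattr('history', f"{prev_history}\n{history_note}" if prev_history else history_note)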
        
        # Create and write time variables
        itime_var = ds.createVariable('Itime', 'i4', ('time',))
        itime_var.units = 'days since 1858-11-17 00:00:00'
        itime_var.format = 'modified julian day (MJD)'
        itime_var.time_zone = 'UTC'
        itime_var[:] = data['Itime']
        
        itime2_var = ds.createVariable('Itime2', 'i4', ('time',))
        itime2_var.units = 'msec since 00:00:00'
        itime2_var.time_zone = 'UTC'
        itime2_var[:] = data['Itime2']
        
        # Write Times string array
        if 'Times' in data:
            times_var = ds.createVariable('Times', 'c', ('time', 'DateStrLen'))
            times_var.time_zone = 'UTC'
            times_var[:] = data['Times']
        
        # Write river names
        if 'river_names' in data:
            names_var = ds.createVariable('river_names', 'c', (river_dim, 'namelen'))
            # Convert river names to character array
            for i, name in enumerate(data['river_names']):
                name_chars = np.zeros(max_name_len, dtype='S1')
                name_bytes = name.encode('utf-8')
                # np.frombuffer yields one byte per element; a list of ints would be cast via str()
                name_chars[:len(name_bytes)] = np.frombuffer(name_bytes, dtype='S1')
                names_var[i, :] = name_chars
        
        # Write river data variables
        for var_name in ['river_flux', 'river_temp', 'river_salt']:
            if var_name in data:
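                # _FillValue can only be set at creation time, so pass it to createVariable
                attrs_key = f'{var_name}_attrs'
                attrs = dict(data.get(attrs_key, {}))
                fill_value = attrs.pop('_FillValue', None)
                var = ds.createVariable(var_name, 'f4', ('time', river_dim), fill_value=fill_value)
                
                # Set attributes if available
                if attrs_key in data:
                    for attr, value in attrs.items():
                        var.setncattr(attr, value)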
                else:
                    # Set default attributes
                    if var_name == 'river_flux':
                        var.long_name = 'river runoff volume flux'
                        var.units = 'm^3/s'
                    elif var_name == 'river_temp':
                        var.long_name = 'river runoff temperature'
                        var.units = 'degrees Celsius'
                    elif var_name == 'river_salt':
                        var.long_name = 'river runoff salinity'
                        var.units = 'PSU'
                
                # Write data
                var[:] = data[var_name]
    
    print(f"\nExtended river NetCDF file written to: {filepath}")


# Write the extended data to file
write_river_nc(output_nc_path, extended_data)

## 8. Verify Output File

In [None]:
# Read the output file to verify
print("Verifying output file...")
print("="*60)

# Read back the written file
verify_data = read_river_nc(output_nc_path)

# Check dimensions
print("\nOutput file dimensions:")
for dim, size in verify_data['dimensions'].items():
    print(f"  {dim}: {size}")

# Check time range
print("\nOutput time range:")
print(f"  Start: {verify_data['datetime'][0]}")
print(f"  End: {verify_data['datetime'][-1]}")
print(f"  Number of time steps: {len(verify_data['datetime'])}")

# Verify data integrity
print("\nData integrity check:")

# Check if original data is preserved
orig_len = len(river_data['datetime'])
for var_name in ['river_flux', 'river_temp', 'river_salt']:
    if var_name in river_data and var_name in verify_data:
        orig_values = river_data[var_name]
        verify_values = verify_data[var_name][:orig_len]
        
        if np.allclose(orig_values, verify_values, rtol=1e-6, atol=1e-8):
            print(f"  ✓ {var_name}: Original data preserved")
        else:
            print(f"  ✗ {var_name}: Data mismatch detected!")

# Check forward fill worked correctly
print("\nForward fill verification:")
for var_name in ['river_flux', 'river_temp', 'river_salt']:
    if var_name in verify_data:
        # Check last original value equals all extended values
        last_orig_values = verify_data[var_name][orig_len-1, :]
        extended_values = verify_data[var_name][orig_len:, :]
        
        # Check if all extended values match the last original value for each river
        all_match = True
        for river_idx in range(verify_data['dimensions'][verify_data['river_dim']]):
            if not np.all(extended_values[:, river_idx] == last_orig_values[river_idx]):
                all_match = False
                break
        
        if all_match:
            print(f"  ✓ {var_name}: Forward fill applied correctly")
        else:
            print(f"  ✗ {var_name}: Forward fill may have issues")

print("\n" + "="*60)
print("Extension complete!")
print(f"Original file: {input_nc_path}")
print(f"Extended file: {output_nc_path}")
print(f"Time extended from {river_data['datetime'][-1]} to {verify_data['datetime'][-1]}")

## 9. Summary Statistics

In [None]:
# Calculate and display summary statistics
print("Summary Statistics")
print("="*60)

# Time extension statistics
orig_duration = (river_data['datetime'][-1] - river_data['datetime'][0]).total_seconds() / 86400
extended_duration = (extended_data['datetime'][-1] - extended_data['datetime'][0]).total_seconds() / 86400
extension_days = extended_duration - orig_duration

print("\nTime Extension:")
print(f"  Original duration: {orig_duration:.1f} days")
print(f"  Extended duration: {extended_duration:.1f} days")
print(f"  Extension added: {extension_days:.1f} days")
if orig_duration > 0:  # Guard against a single-time-step input file
    print(f"  Extension percentage: {(extension_days/orig_duration)*100:.1f}%")

# Data statistics for each river
if 'river_names' in extended_data:
    print("\nRiver Statistics (using extended values):")
    print("-" * 60)
    
    for i, river_name in enumerate(extended_data['river_names']):
        print(f"\n{river_name}:")
        
        if 'river_flux' in extended_data:
            flux_values = extended_data['river_flux'][:, i]
            print(f"  Discharge: {flux_values[-1]:.3f} m³/s (constant after extension)")
        
        if 'river_temp' in extended_data:
            temp_values = extended_data['river_temp'][:, i]
            print(f"  Temperature: {temp_values[-1]:.2f} °C (constant after extension)")
        
        if 'river_salt' in extended_data:
            salt_values = extended_data['river_salt'][:, i]
            print(f"  Salinity: {salt_values[-1]:.3f} PSU (constant after extension)")

print("\n" + "="*60)
print("\nNotebook execution completed successfully!")
print(f"Extended river file saved to: {output_nc_path.absolute()}")

## Notes and Usage

### How to Use This Notebook

1. **Modify Input Parameters** (Section 2):
   - Set `input_nc_path` to your river NetCDF file
   - Set `extend_to_datetime` to your desired end date
   - Optionally set `time_interval_hours` (auto-detected if None)

2. **Run All Cells**: Execute the notebook from top to bottom

3. **Check Output**: The extended file will be saved to the `extended_river_files/` directory

### Forward Fill Method

The forward fill (ffill) method:
- Takes the last available value for each river and time series variable
- Propagates these values forward to fill the extended time period
- Maintains constant discharge, temperature, and salinity after the original data ends
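
As a minimal sketch of this behaviour, assuming a toy two-river discharge table (the column names and values below are illustrative and not taken from any FVCOM file), `DataFrame.reindex` with `method='ffill'` carries the last row forward over the added time steps:

```python
import pandas as pd

# Toy discharge series for two hypothetical rivers (values are illustrative only)
step = pd.Timedelta(hours=1)
orig_index = pd.date_range("2020-12-31 21:00", periods=3, freq=step)
df_orig = pd.DataFrame(
    {"river_a": [10.0, 11.0, 12.0], "river_b": [1.0, 1.5, 2.0]},
    index=orig_index,
)

# Target index covers the original period plus three extra hourly steps
extended_index = pd.date_range(start=orig_index[0], end="2021-01-01 02:00", freq=step)

# Rows past the last original timestamp take the last original values
df_extended = df_orig.reindex(extended_index, method="ffill")
print(df_extended)
```

Note that this only reproduces the original rows exactly when the original timestamps already lie on the regular target grid; irregularly spaced input would be resampled by the same call.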

### FVCOM Format Preservation

This notebook:
- Uses the netCDF4 package directly rather than xarray
- Preserves FVCOM time format (Itime, Itime2, Times)
- Copies the global attributes (updating `history`) and the attributes of `river_flux`, `river_temp`, and `river_salt`
- Follows FVCOM conventions for river forcing files
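
To illustrate the time convention, here is a minimal sketch using only the standard library. The helper names `to_fvcom_time` and `from_fvcom_time` are hypothetical and are not part of this notebook. `Itime` holds whole days since the Modified Julian Day epoch (1858-11-17 00:00:00) and `Itime2` holds milliseconds since midnight:

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # Modified Julian Day 0


def to_fvcom_time(dt):
    """Split a naive datetime into (Itime, Itime2) = (MJD day, msec since midnight)."""
    delta = dt - MJD_EPOCH
    # Integer arithmetic avoids floating-point truncation of the millisecond count
    msec = delta.seconds * 1000 + delta.microseconds // 1000
    return delta.days, msec


def from_fvcom_time(itime, itime2):
    """Rebuild the datetime from the two integer components."""
    return MJD_EPOCH + timedelta(days=itime, milliseconds=itime2)


stamp = datetime(2020, 1, 1, 12, 0, 0)
itime, itime2 = to_fvcom_time(stamp)
print(itime, itime2)  # 58849 43200000
assert from_fvcom_time(itime, itime2) == stamp
```

Keeping both components as integers means a round trip through the file should return the same timestamp to millisecond precision.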

### Customization Options

- To use different interpolation methods instead of forward fill, modify the `extend_timeseries_ffill` function
- To add seasonal variation or trends, enhance the extension logic in Section 5
- To process multiple files in batch, wrap the main logic in a loop
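
As one possible customization, the sketch below replaces the constant forward fill with a repetition of the final cycle of the record (for example, the last 24 hourly steps), so that a periodic signal continues into the extension. The function name `extend_by_repeating_cycle` and its parameters are hypothetical, and the approach assumes the last `cycle_steps` rows are representative of the period being filled:

```python
import numpy as np


def extend_by_repeating_cycle(values, n_extra, cycle_steps):
    """Append n_extra rows by repeating the last cycle_steps rows of values.

    values has shape (time, river); the result has shape (time + n_extra, river).
    """
    values = np.asarray(values, dtype=float)
    if not 1 <= cycle_steps <= values.shape[0]:
        raise ValueError("cycle_steps must be between 1 and the number of time steps")
    last_cycle = values[-cycle_steps:]
    # Tile enough whole cycles to cover n_extra rows, then trim the surplus
    n_repeats = -(-n_extra // cycle_steps)  # ceiling division
    extension = np.tile(last_cycle, (n_repeats, 1))[:n_extra]
    return np.vstack([values, extension])


# Two rivers, four time steps; repeat the last two steps to add three more
flux = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
extended = extend_by_repeating_cycle(flux, n_extra=3, cycle_steps=2)
print(extended[:, 0])  # [1. 2. 3. 4. 3. 4. 3.]
```

With `cycle_steps=1` this reduces to a plain forward fill of the last row.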

### Requirements

- Python packages: numpy, pandas, netCDF4, matplotlib
- FVCOM river NetCDF file with standard variables (river_flux, river_temp, river_salt)
- Write permissions in the output directory