# Gap Filling Workflow

This notebook demonstrates the gap filling workflow for the Snow Drought Index package. It covers loading SWE data, filling gaps with quantile mapping, and evaluating the performance of the gap filling method.

In [None]:
# Import required packages
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

# Import snowdroughtindex package
from snowdroughtindex.core import data_preparation, gap_filling
from snowdroughtindex.utils import visualization, io

## 1. Data Loading

First, we'll load the SWE data that needs gap filling.

In [None]:
# Define data paths
swe_path = '../data/input_data/SWE_data.nc'

# Load data using the implemented functions
swe_data = data_preparation.load_swe_data(swe_path)

# Convert to DataFrame for gap filling
swe_df = data_preparation.preprocess_swe(swe_data)

# Set the index to time for time-series operations
if 'time' in swe_df.columns:
    swe_df = swe_df.set_index('time')

## 2. Data Exploration

Let's explore the data to understand the extent of missing values.

In [None]:
# Count missing values per station
missing_values = swe_df.isna().sum()

# Calculate percentage of missing values per station
missing_percentage = (missing_values / len(swe_df)) * 100

# Display stations with missing data
print("Stations with missing data:")
print(missing_percentage[missing_percentage > 0].sort_values(ascending=False))

# Plot missing data percentage
plt.figure(figsize=(12, 6))
missing_percentage[missing_percentage > 0].sort_values(ascending=False).plot(kind='bar')
plt.title('Percentage of Missing Values by Station')
plt.ylabel('Missing Values (%)')
plt.xlabel('Station ID')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
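
The missing-value percentage does not show whether a station has a few long outages or many short gaps, which matters when judging how hard the gaps will be to fill. The sketch below computes the length of each run of consecutive missing values on a small synthetic series. The helper name `missing_run_lengths` is hypothetical and is not part of the `snowdroughtindex` package.

```python
import numpy as np
import pandas as pd

def missing_run_lengths(series: pd.Series) -> pd.Series:
    """Return the length of each run of consecutive NaNs in `series`.

    Hypothetical helper for illustration; not part of snowdroughtindex.
    """
    is_na = series.isna()
    # A new run id starts every time the missing/observed state changes
    run_id = is_na.astype(int).diff().ne(0).cumsum()
    # Summing the boolean mask per run gives the NaN count (0 for observed runs)
    lengths = is_na.groupby(run_id).sum()
    return lengths[lengths > 0].reset_index(drop=True)

# Synthetic series with one gap of length 2 and one of length 3
demo = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0, np.nan, np.nan, np.nan, 9.0])
runs = missing_run_lengths(demo)
print(runs.tolist())
```

Applied per station (for example `swe_df[station_id]`), the longest run gives a quick sense of the worst outage at that station.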

## 3. Gap Filling Parameters

Define parameters for the gap filling process.

In [None]:
# Parameters for gap filling
window_days = 15  # Number of days around a given day of year (doy) from which data are selected
min_obs_corr = 10  # Minimum number of overlapping observations required to calculate a correlation
min_obs_cdf = 5  # Minimum number of observations required to calculate a station's CDF
min_corr = 0.7  # Minimum correlation required to keep a donor station
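
To make the role of `window_days` concrete, the sketch below selects all dates that fall within a window around a target day of year, wrapping around the end of the year. It assumes a symmetric window of plus or minus `window_days`; the package may define the window differently. The helper name `select_doy_window` is hypothetical.

```python
import numpy as np
import pandas as pd

def select_doy_window(index: pd.DatetimeIndex, target_doy: int, window_days: int) -> np.ndarray:
    """Boolean mask of dates within +/- window_days of target_doy.

    Hypothetical helper; assumes a symmetric, circular window.
    """
    doy = index.dayofyear.to_numpy()
    diff = np.abs(doy - target_doy)
    # Circular distance so late December and early January count as neighbours.
    # Using 366 as the period is an approximation for non-leap years.
    circular = np.minimum(diff, 366 - diff)
    return circular <= window_days

dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
mask = select_doy_window(dates, target_doy=5, window_days=15)
selected = dates[mask]
print(f"{mask.sum()} of {len(dates)} dates fall inside the window")
```

Pooling observations from the same window across all years gives each day of year a larger sample for correlations and CDFs than a single calendar date would.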

## 4. Perform Gap Filling

Now we'll use the quantile mapping method to fill gaps in the SWE data.

In [None]:
# Perform gap filling
gapfilled_data, data_type_flags, donor_stationIDs = gap_filling.qm_gap_filling(
    swe_df, window_days, min_obs_corr, min_obs_cdf, min_corr
)

# Display summary of gap filling results
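filled_gaps = int((data_type_flags == 1).sum().sum())
total_gaps = int(swe_df.isna().sum().sum())
print(f"Total gaps in original data: {total_gaps}")
print(f"Gaps filled: {filled_gaps}")
# Guard against division by zero when the input has no gaps
if total_gaps > 0:
    print(f"Percentage of gaps filled: {filled_gaps / total_gaps * 100:.2f}%")
else:
    print("No gaps in original data; nothing to fill.")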
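
The idea behind quantile mapping can be sketched in a few lines: a donor station's value is converted to a non-exceedance probability using the donor's empirical CDF, and the target station's value at that same probability becomes the estimate. The code below is a minimal illustration under that assumption, not the implementation inside `qm_gap_filling()`. The function name `quantile_map` and the plotting positions `i / (n + 1)` are choices made for this example.

```python
import numpy as np

def quantile_map(donor_value, donor_obs, target_obs):
    """Map a donor value onto the target station's distribution.

    Illustrative sketch only; assumes both samples have no NaNs or ties.
    """
    donor_sorted = np.sort(np.asarray(donor_obs, dtype=float))
    target_sorted = np.sort(np.asarray(target_obs, dtype=float))
    # Empirical non-exceedance probabilities (plotting positions i / (n + 1))
    donor_p = np.arange(1, donor_sorted.size + 1) / (donor_sorted.size + 1)
    target_p = np.arange(1, target_sorted.size + 1) / (target_sorted.size + 1)
    # Step 1: donor value -> probability in the donor's CDF
    p = np.interp(donor_value, donor_sorted, donor_p)
    # Step 2: probability -> value in the target's inverse CDF
    return float(np.interp(p, target_p, target_sorted))

# Synthetic samples: the target station records half the donor's SWE
donor_obs = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
target_obs = np.array([50.0, 100.0, 150.0, 200.0, 250.0])
estimate = quantile_map(250.0, donor_obs, target_obs)
print(f"Estimated target SWE: {estimate:.1f} mm")
```

Because the mapping works on ranks rather than raw values, it preserves the target station's own distribution even when donor and target differ systematically in magnitude.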

## 5. Visualize Gap Filling Results

Let's visualize the results of the gap filling process for a few selected stations.

In [None]:
# Select the three stations with the most filled gaps for visualization
filled_counts = (data_type_flags == 1).sum()
stations_with_filled_gaps = filled_counts[filled_counts > 0].sort_values(ascending=False).index[:3]

# Plot original and gap-filled data for selected stations
for station in stations_with_filled_gaps:
    plt.figure(figsize=(12, 6))
    
    # Plot original data
    plt.plot(swe_df.index, swe_df[station], 'b-', label='Original Data')
    
    # Plot gap-filled data
    filled_mask = data_type_flags[station] == 1
    plt.scatter(gapfilled_data.loc[filled_mask].index, 
                gapfilled_data.loc[filled_mask, station], 
                color='r', marker='o', label='Gap-Filled Data')
    
    plt.title(f'Gap Filling Results for Station {station}')
    plt.xlabel('Date')
    plt.ylabel('SWE (mm)')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

## 6. Evaluate Gap Filling Performance

We'll evaluate the performance of the gap filling method using artificial gaps.
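
An artificial gap is an observed value that is withheld on purpose, so that the filled value can be compared with the known truth. The sketch below shows one way to do this for a single series. It assumes gaps are drawn uniformly at random from the observed values, which may differ from how `artificial_gap_filling()` selects them. The helper name `make_artificial_gaps` is hypothetical.

```python
import numpy as np
import pandas as pd

def make_artificial_gaps(series: pd.Series, gap_perc: float, seed: int = 0):
    """Withhold gap_perc % of the observed values; return (gappy series, withheld truth).

    Hypothetical helper; random uniform selection is an assumption.
    """
    rng = np.random.default_rng(seed)
    # Only existing observations can become artificial gaps
    observed_pos = np.flatnonzero(series.notna().to_numpy())
    n_remove = int(round(observed_pos.size * gap_perc / 100))
    removed_pos = rng.choice(observed_pos, size=n_remove, replace=False)
    gappy = series.copy()
    gappy.iloc[removed_pos] = np.nan
    return gappy, series.iloc[removed_pos]

# Synthetic series with 10 observations and 1 pre-existing gap
demo = pd.Series([10.0, 20.0, np.nan, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0])
gappy, truth = make_artificial_gaps(demo, gap_perc=20)
print(f"Withheld {len(truth)} values; series now has {gappy.isna().sum()} gaps")
```

Repeating this with different seeds corresponds to the `iterations` parameter: each iteration withholds a different random subset, so the scores are less sensitive to one particular draw.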

In [None]:
# Parameters for artificial gap filling evaluation
iterations = 3  # Number of iterations for artificial gap filling
artificial_gap_perc = 20  # Percentage of data to remove for artificial gap filling
min_obs_KGE = 5  # Minimum number of observations for KGE calculation

# Perform artificial gap filling evaluation
evaluation = gap_filling.artificial_gap_filling(
    swe_df, iterations, artificial_gap_perc, window_days, 
    min_obs_corr, min_obs_cdf, min_corr, min_obs_KGE, flag=0
)

# Plot evaluation results
evaluation_plot = gap_filling.plots_artificial_gap_evaluation(evaluation)
plt.show()
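
The `min_obs_KGE` parameter refers to the Kling-Gupta efficiency (KGE), which scores filled values against the withheld observations by combining correlation, variability ratio, and bias ratio. A score of 1 is a perfect match. The sketch below implements the original 2009 formulation as an illustration; the package may use a different KGE variant, and the function name `kge` is an assumption made for this example.

```python
import numpy as np

def kge(obs, sim):
    """Kling-Gupta efficiency (original 2009 formulation).

    Illustrative sketch; the package may use a different variant.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    # Use only pairs where both values are present
    valid = ~np.isnan(obs) & ~np.isnan(sim)
    obs, sim = obs[valid], sim[valid]
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

# Synthetic check: a perfect fill and a fill that doubles every value
obs = np.array([120.0, 180.0, 240.0, 310.0, 150.0])
print(f"Perfect fill: {kge(obs, obs):.3f}")
print(f"Doubled fill: {kge(obs, 2 * obs):.3f}")
```

With very few pairs the correlation and standard deviations become unstable, which is why a minimum number of observations is required before the score is computed.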

## 7. Save Gap-Filled Data

Save the gap-filled data for use in subsequent analyses.

In [None]:
from pathlib import Path

# Create the output directory if it does not exist yet
output_dir = Path('../data/processed')
output_dir.mkdir(parents=True, exist_ok=True)

# Convert gap-filled data back to xarray Dataset
gapfilled_dataset = xr.Dataset.from_dataframe(gapfilled_data)

# Save gap-filled data
gapfilled_dataset.to_netcdf(output_dir / 'swe_gapfilled.nc')

# Save data type flags and donor station IDs for reference
data_type_flags.to_csv(output_dir / 'data_type_flags.csv')
donor_stationIDs.to_csv(output_dir / 'donor_stationIDs.csv')

print("Gap-filled data and metadata saved successfully.")

## 8. Summary

In this notebook, we've demonstrated the gap filling workflow for the Snow Drought Index package. We've loaded SWE data, performed gap filling using quantile mapping, visualized the results, evaluated the performance of the gap filling method, and saved the gap-filled data for use in subsequent analyses.

The workflow uses the following key functions from the `gap_filling` module:
- `qm_gap_filling()` for filling gaps in the data using quantile mapping
- `artificial_gap_filling()` for evaluating the performance of the gap filling method
- `plots_artificial_gap_evaluation()` for visualizing the evaluation results

These functions provide a standardized and reusable way to fill gaps in SWE data for the Snow Drought Index calculations.