# Data Preparation Workflow

This notebook demonstrates the data preparation workflow for the Snow Drought Index package. It covers loading data, preprocessing, station extraction and filtering, and data availability assessment.

In [None]:
# Import required packages
import numpy as np
import pandas as pd
import xarray as xr
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

# Import snowdroughtindex package
from snowdroughtindex.core import data_preparation
from snowdroughtindex.utils import visualization, io

## 1. Data Loading

First, we load the snow water equivalent (SWE) data, the precipitation data, and the basin shapefile.

In [None]:
# Define data paths
swe_path = '../data/input_data/SWE_data.nc'
precip_path = '../data/input_data/precip_data.nc'
basin_path = '../data/input_data/basin_shapefile.shp'

# Load data using the implemented functions
swe_data = data_preparation.load_swe_data(swe_path)
precip_data = data_preparation.load_precip_data(precip_path)
basin_data = data_preparation.load_basin_data(basin_path)
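
Before calling the loaders, it can help to confirm that the input files are actually present. The sketch below is one way to do that with the standard library only. `find_missing_inputs` is a hypothetical helper, not part of `snowdroughtindex`. It also checks the `.shx` and `.dbf` sidecar files that the shapefile format requires alongside the `.shp` file.

```python
from pathlib import Path

# Sidecar files that must accompany a .shp file in the shapefile format.
SHAPEFILE_SIDECARS = (".shx", ".dbf")


def find_missing_inputs(paths):
    """Return the input files that do not exist, as a list of Path objects.

    Hypothetical helper: for every .shp path, the .shx and .dbf sidecars
    are checked as well.
    """
    missing = []
    for raw in paths:
        path = Path(raw)
        candidates = [path]
        if path.suffix.lower() == ".shp":
            candidates += [path.with_suffix(ext) for ext in SHAPEFILE_SIDECARS]
        missing.extend(p for p in candidates if not p.is_file())
    return missing


# Paths copied from the loading cell above.
input_paths = [
    "../data/input_data/SWE_data.nc",
    "../data/input_data/precip_data.nc",
    "../data/input_data/basin_shapefile.shp",
]
for missing_path in find_missing_inputs(input_paths):
    print(f"Missing input file: {missing_path}")
```

An empty result means every file was found. Anything printed here is worth fixing before running the loading cell.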

## 2. Data Preprocessing

Next, we preprocess the SWE and precipitation data, then convert the processed SWE data to a GeoDataFrame for the spatial operations in the next step.

In [None]:
# Preprocess SWE data
swe_processed = data_preparation.preprocess_swe(swe_data)

# Preprocess precipitation data
precip_processed = data_preparation.preprocess_precip(precip_data)

# Convert to GeoDataFrame for spatial operations
swe_gdf = data_preparation.convert_to_geodataframe(swe_processed)

## 3. Station Extraction and Filtering

Now, we extract the stations that fall within the basin of interest and filter the SWE dataset to those stations.

In [None]:
# Define basin ID
basin_id = 'example_basin'  # Replace with actual basin ID

# Extract stations within the basin
stations_in_basin, basin_buffer = data_preparation.extract_stations_in_basin(swe_gdf, basin_data, basin_id)

# Filter the SWE dataset to the stations in the basin
# (note: this subsets the original swe_data, not swe_processed)
station_ids = stations_in_basin['station_id'].tolist()
swe_basin = data_preparation.filter_stations(swe_data, station_ids)
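
To make the idea of "stations within a basin" concrete, here is a dependency-free sketch of a point-in-polygon test using ray casting. It is not the implementation of `extract_stations_in_basin()`, which is not shown in this notebook and presumably relies on geopandas. The basin outline, station coordinates, and both function names are hypothetical. The sketch ignores the basin buffer and any coordinate reference system handling.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices; the ring is closed implicitly.
    Points exactly on the boundary are not treated consistently.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def stations_within(stations, polygon):
    """Return the IDs of stations whose (lon, lat) falls inside the polygon."""
    return [
        station_id
        for station_id, (lon, lat) in stations.items()
        if point_in_polygon(lon, lat, polygon)
    ]


# Hypothetical rectangular basin outline and station coordinates (lon, lat)
basin_outline = [(-116.0, 50.0), (-114.0, 50.0), (-114.0, 52.0), (-116.0, 52.0)]
example_stations = {
    "A": (-115.0, 51.0),  # inside the outline
    "B": (-113.0, 51.0),  # east of the outline
    "C": (-115.5, 50.5),  # inside the outline
}
selected_ids = stations_within(example_stations, basin_outline)
print(selected_ids)  # ['A', 'C']
```

The resulting list of IDs plays the same role as `station_ids` in the cell above: it is the input to the filtering step.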

## 4. Data Availability Assessment

Finally, we'll assess the availability of data for the stations in the basin.

In [None]:
# Assess data availability
availability = data_preparation.assess_data_availability(swe_basin)

# Visualize data availability (assuming this function exists in the visualization module)
# visualization.plot_data_availability(availability)

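# Alternative: basic visualization through xarray's matplotlib wrapper.
# Assumes `availability` is a 2-D xarray.DataArray. xarray then draws its own
# colorbar, so the label is passed via cbar_kwargs rather than a separate
# plt.colorbar() call, which would add a second colorbar.
fig, ax = plt.subplots(figsize=(10, 6))
availability.plot(ax=ax, cmap='viridis', cbar_kwargs={'label': 'Data Availability (%)'})
ax.set_title('SWE Data Availability by Station')
ax.set_xlabel('Station ID')
ax.set_ylabel('Variable')
fig.tight_layout()
plt.show()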
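
As a rough illustration of what an availability assessment measures, the sketch below computes the percentage of non-missing values per station with pandas. It is not the implementation of `assess_data_availability()`, whose internals are not shown here. The sample values, station names, function name, and the 75 % threshold are all made up for the example.

```python
import numpy as np
import pandas as pd


def percent_available(df):
    """Percentage of non-missing values in each column (one column per station)."""
    return df.notna().mean() * 100.0


# Hypothetical daily SWE values (mm) for three stations; NaN marks a gap
dates = pd.date_range("2020-01-01", periods=4, freq="D")
swe_sample = pd.DataFrame(
    {
        "station_1": [10.0, 12.0, np.nan, 15.0],
        "station_2": [np.nan, np.nan, 5.0, 6.0],
        "station_3": [1.0, 2.0, 3.0, 4.0],
    },
    index=dates,
)

availability_pct = percent_available(swe_sample)
print(availability_pct)

# Keep only stations with at least 75 % of records present (illustrative threshold)
well_covered = availability_pct[availability_pct >= 75.0].index.tolist()
print(well_covered)
```

In this sample, `station_2` has only half of its records and is dropped by the threshold, while the other two stations are kept.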

## 5. Save Processed Data

Save the processed data for use in subsequent analyses.

In [None]:
# Save processed data (assuming this function exists in the io module)
# io.save_processed_data(swe_basin, '../data/processed/swe_basin_processed.nc')

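# Alternative: save with xarray's built-in method (assumes swe_basin is an xarray object)
from pathlib import Path
out_path = Path('../data/processed/swe_basin_processed.nc')
out_path.parent.mkdir(parents=True, exist_ok=True)  # create the output directory if it is missing
swe_basin.to_netcdf(out_path)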

## 6. Summary

This notebook walked through the data preparation workflow for the Snow Drought Index package: loading data, preprocessing it, extracting the stations within the basin of interest, assessing data availability, and saving the processed data for use in subsequent analyses.

The workflow uses the following key functions from the `data_preparation` module:
- `load_swe_data()`, `load_precip_data()`, `load_basin_data()` for data loading
- `preprocess_swe()`, `preprocess_precip()` for data preprocessing
- `convert_to_geodataframe()` for converting data to a GeoDataFrame
- `extract_stations_in_basin()` for extracting stations within a basin
- `filter_stations()` for filtering data by station
- `assess_data_availability()` for assessing data availability

These functions provide a standardized and reusable way to prepare data for the Snow Drought Index calculations.
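
For reuse outside the notebook, the SWE steps can be chained into a single function. The sketch below calls the functions exactly as they are called in the cells above. The wrapper name `prepare_basin_swe` and its `dp` parameter are hypothetical, and the return types of the package functions are assumed to match what the notebook cells rely on.

```python
def prepare_basin_swe(dp, swe_path, basin_path, basin_id):
    """Run the notebook's SWE steps in order.

    `dp` is the `snowdroughtindex.core.data_preparation` module, or any object
    exposing the same functions; passing it in keeps this sketch testable
    with a stub. Returns the basin subset and its availability assessment.
    """
    # 1. Load the SWE data and the basin geometry
    swe_data = dp.load_swe_data(swe_path)
    basin_data = dp.load_basin_data(basin_path)

    # 2. Preprocess and convert to a GeoDataFrame for the spatial step
    swe_gdf = dp.convert_to_geodataframe(dp.preprocess_swe(swe_data))

    # 3. Find the stations in the basin and subset the SWE dataset
    stations_in_basin, _basin_buffer = dp.extract_stations_in_basin(
        swe_gdf, basin_data, basin_id
    )
    station_ids = stations_in_basin["station_id"].tolist()
    swe_basin = dp.filter_stations(swe_data, station_ids)

    # 4. Assess data availability for the subset
    return swe_basin, dp.assess_data_availability(swe_basin)
```

With the imports from the first cell, a call would look like `swe_basin, availability = prepare_basin_swe(data_preparation, swe_path, basin_path, basin_id)`.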