# **Magpie Workflow v0.4**

Magpie is an open-source Python and R model configuration workflow, developed in Google Colab, that assists with geospatial data preparation and model construction for Raven.<br>

Users can upload their own data or use the workflow's data collection section to access open-source data available for North America, such as topographic, land use, soil, and climatic forcing data. The user-uploaded or workflow-acquired data is then formatted and passed to a series of tools: BasinMaker, which discretizes the basin into subbasins and hydrologic response units, and the RavenR software library, which generates the necessary configuration inputs for a Raven model. The Magpie workflow also runs the Raven executable and generates visualizations of the model outputs. <br>

The goal of Magpie is to provide a user-friendly experience with transparent and reproducible scientific outputs. The workflow significantly reduces model configuration time for experienced modellers and allows for open modification and customization, while also providing a welcoming platform for those new to hydrologic modelling.<br>

For more information, please review the <font color=#5559AB>"Overview"</font> subsection which provides a general overview of each subsection.


New to Google Colab? Check out the following [Overview of Google Colab Features video](https://youtu.be/rpr0PAd_0Dc) that demonstrates some of the Google Colab features used in the Magpie Workflow.

If this is your first time using Magpie, check out the [Overview and Python Library Installation video](https://youtu.be/umPQJJstr8A) to see how to finish setting up the Magpie Workflow.

If you run into an error, restart your workflow (“Runtime” -> “Restart
Runtime”). After the workflow has been restarted, be sure to run “Workflow
Set up (Mandatory)” -> “Run to Set up Workflow”.

If you run into any issues, please email: hburdett@uwaterloo.ca

## <font color=#5559AB> Overview </font>


#### <font color=grey>Python Library Installation</font>



> The "Python Libraries Installation" section of the Magpie Workflow guide streamlines the installation process by saving specific versions of Python libraries directly to the user's Google Drive. This one-time setup significantly reduces future installation times and guarantees consistent library versions, enhancing reproducibility. Users are advised to execute this subsection only **once** during their initial use of the Magpie Workflow. While this step is optional and time-consuming, it is highly recommended for frequent users. Undertaking this longer initial installation eliminates the need for repetitive library installations with each notebook reconnection, saving considerable time in the long run.
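
The version pinning described above can be checked with a few lines of standard-library Python. This is a minimal sketch, not part of the Magpie notebook: the helper names `split_requirement`, `installed_version`, and `check_pins` are hypothetical, and it assumes that a pin is written as `name==version`, as in the installation cell.

```python
from importlib import metadata


def split_requirement(spec):
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, _, version = spec.partition("==")
    return name.strip(), (version.strip() or None)


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(specs):
    """Map each spec to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for spec in specs:
        name, wanted = split_requirement(spec)
        found = installed_version(name)
        if found is None:
            report[spec] = "missing"
        elif wanted is None or found == wanted:
            report[spec] = "ok"
        else:
            report[spec] = f"mismatch (found {found})"
    return report


# Example: report on two of the pins used in the installation cell
for spec, status in check_pins(["pandas==1.5.3", "netCDF4==1.6.3"]).items():
    print(spec, "->", status)
```

Running a check like this after reconnecting the notebook is one way to confirm that the libraries loaded from the drive still match the versions the workflow expects.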


#### <font color=grey>R Library Installation</font>



> The "R Libraries Installation" section is designed to optimize the setup process in the Magpie Workflow by installing specific versions of R libraries onto the user's personal Google Drive. This strategy is aimed at minimizing installation times for subsequent uses of the workflow and ensuring consistent results through reproducibility. The installed R libraries are conveniently accessible directly from the user's drive. It's important to note that this procedure is required only **once**, specifically during the first initiation of the Magpie Workflow. Although this step is optional and initially time-consuming, it is strongly recommended for those who anticipate regular use of the workflow. Completing this one-time, extensive installation process prevents the need for repeated library installations each time the notebook is reconnected, thereby offering significant time savings over repeated uses.



#### <font color=grey>Workflow Set up (Mandatory)</font>

> The "Workflow Set Up" is an essential and mandatory section that needs to be executed every time the notebook is disconnected or restarted. This crucial subsection links the Google Colab Notebook to the user's Google Drive, enabling seamless data transfer and storage between the notebook and the drive. It also takes care of loading the required libraries and establishing the working directories. It's important to remember that if there are any changes to the workflow's name, corresponding updates must be made to the paths for `folder_name`, `model_name`, and `final_output_folder`. This ensures that the workflow accurately references and accesses the correct directories and data within the user's drive.
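
To illustrate how `folder_name`, `model_name`, and `final_output_folder` relate to locations on the drive, here is a small sketch. The function `build_workflow_paths` is hypothetical, and the layout of the output directory (a `workflow_outputs` folder inside the main directory) is an assumption based on the description in section 5.0 rather than a statement of the notebook's exact behaviour.

```python
import os


def build_workflow_paths(folder_name, model_name, final_output_folder,
                         drive_root="/content/google_drive/MyDrive"):
    """Derive the main directories and Raven file names from the three settings."""
    # The main directory mirrors the workflow folder on the mounted drive
    main_dir = os.path.join(drive_root, folder_name)
    # Assumption: outputs live in <main_dir>/workflow_outputs/<final_output_folder>
    output_dir = os.path.join(main_dir, "workflow_outputs", final_output_folder)
    # All Raven input files share the model name and differ only by extension
    raven_files = [f"{model_name}.{ext}" for ext in ("rvi", "rvh", "rvp", "rvt", "rvc")]
    return {"main_dir": main_dir, "output_dir": output_dir, "raven_files": raven_files}


paths = build_workflow_paths("Magpie_Workflow", "petawawa", "outputA")
print(paths["main_dir"])
print(paths["output_dir"])
print(paths["raven_files"])
```

If the workflow folder is renamed on the drive, only `folder_name` needs to change for every derived path to follow.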

#### <font color=grey>1.0 Data Collection</font>

> In section 1.0 of Magpie, users have complete flexibility in generating inputs for BasinMaker. Each subsection offers two options: uploading personal data or utilizing the Magpie workflow to acquire data from Google Earth Engine. The workflow can provide all necessary data for a model or supplement missing input layers for BasinMaker.

For users opting to upload their data, here are the specific accepted formats for each subsection:
- *Shapefile_study_area*: Polygon shapefile (.shp) representing the study area.
- *DEM*: Digital Elevation Model as a raster file (.tif).
- *Elevation_band*: Polygon shapefile (.shp).
- *Land_cover*: Either a polygon shapefile (.shp) for direct use or a raster file (.tif).
- *Soil*: Similar to Land Cover, this can be a polygon shapefile (.shp) for direct use or a raster file (.tif).

Please note that it is not mandatory to have all these files; any missing files can be generated within the Magpie Workflow.
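
The accepted formats listed above can be expressed as a simple lookup. The sketch below is illustrative only: the dictionary `ACCEPTED_EXTENSIONS` and the helper `is_accepted` are hypothetical names, and the check looks only at the file extension, not at the file contents.

```python
import os

# Accepted upload formats per subsection, as listed in the text above
ACCEPTED_EXTENSIONS = {
    "Shapefile_study_area": {".shp"},
    "DEM": {".tif"},
    "Elevation_band": {".shp"},
    "Land_cover": {".shp", ".tif"},
    "Soil": {".shp", ".tif"},
}


def is_accepted(subsection, filename):
    """Return True if the file extension is accepted for the given subsection."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in ACCEPTED_EXTENSIONS.get(subsection, set())


print(is_accepted("DEM", "elevation.tif"))
print(is_accepted("Elevation_band", "bands.tif"))
```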

Each section clearly outlines the necessary inputs and data sources, and guides users on saving the data to their drive. It's important to remember that the land cover, vegetation, and soil classifications used in the Raven RVP and RVH files are produced in subsection 1f) Classification. Users have the flexibility to define these classifications by manually inputting their choices. Separate the values with commas and do not include spaces.
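
The comma-separated, no-spaces input rule can be sketched as a small validator. The function `parse_class_list` is hypothetical, and the class names in the example are placeholders, not values taken from the workflow.

```python
def parse_class_list(raw):
    """Split a comma-separated class list, rejecting spaces and empty entries."""
    if " " in raw:
        raise ValueError("Remove spaces: separate values with commas only.")
    values = raw.split(",")
    if any(value == "" for value in values):
        raise ValueError("Empty entry found: check for doubled or trailing commas.")
    return values


# Placeholder class names, for illustration only
print(parse_class_list("FOREST,URBAN,WATER"))
```

Rejecting malformed input early, instead of silently stripping spaces, keeps the class names that reach the RVP and RVH files identical to what the user typed.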


#### <font color=grey>2.0 Discretize Basin</font>

> BasinMaker, a tool created by Han et al. (2021), constructs vector-based hydrological lake-river routing networks efficiently and automatically. It features versatile discretization options, enabling the representation of any number of lakes within a watershed. Users can either upload their own files or use subsection 1.0 Data Collection to create the necessary input files. Potential inputs for discretization include DEM, elevation bands, aspect, landcover, and soil layers. Users are not required to have all of these layers for the discretization process.

> Please note that the Magpie workflow uses BasinMaker light, which pulls from pre-existing routing products. Users have the option to upload subbasin and/or HRU layers derived from BasinMaker full, which is run through a GIS interface such as ArcGIS or QGIS. Magpie can then be used at any step to finalize the discretization and generate the RVH and RVP files.

#### <font color=grey>3.0 Forcing Data</font>

> Raven supports gridded or station-based forcing inputs exclusively in NetCDF format (*.nc files). This subsection offers users access to five different forcing input options that include downloading the forcing data and formatting it into a Raven RVT file:
* *3a) CaSPAr Data* - the Canadian Surface Prediction Archive (CaSPAr) is an archive of numerical weather predictions issued by Environment and Climate Change Canada.
* *3b) Gridded Weights Generator* - produces a grid weights file that can be used in the hydrologic modelling framework Raven to handle gridded NetCDF inputs and map them to the subbasins/HRUs a distributed model is based on.
* *3c) DayMet Data* - Daymet provides long-term, continuous, gridded estimates of daily weather and climatology variables by interpolating and extrapolating ground-based observations through statistical modelling techniques; includes daily, monthly, and annual values
* *3d) Environment Canada Climate Data* - gauge data - accesses historical climate data, such as temperature, precipitation, degree days, relative humidity, wind speed and direction; includes hourly, daily, and monthly values
* *3e) Format Uploaded Observational Data* - Gauge Data - Users can upload their own meteorological or flow data to be formatted into an RVT file. Each subsection demonstrates how the data should be formatted.
* *3f) Hydrometric Data (HYDAT)* - gauge data - historical water level and flow (discharge) data collected at over 7700 hydrometric stations across Canada; includes daily and monthly flow levels.

#### <font color=grey>4.0 Raven Input Files</font>

>This section, written in R, focuses on generating the RVI, RVH, and RVC input files for Raven using RavenR, developed by Chlumsky et al. (2022). <br>
<br>
For the RVI, there are several template options available; for more information, please see the Raven Manual. After creating the RVI, the user must open the RVI file (which can be done directly in Google Colab) and modify the simulation details, such as “:StartDate”, “:Duration”, and “:TimeStep”.
A .rvp_temp.rvp file generated in BasinMaker is required to produce an RVP file. <br>
<br>
A blank RVC file is produced and can be modified directly in Google Colab to set the initial storage conditions of the simulation. For more information on the RVC file please visit the Raven Manual.
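
For users who prefer to script the RVI edits instead of changing the file by hand, the idea can be sketched in Python as a plain text substitution. The helper `set_rvi_commands` is hypothetical and is not part of Magpie or RavenR. It assumes each simulation detail sits on its own line as a keyword followed by its value, matches keywords case-insensitively, and uses sample RVI text that is illustrative only.

```python
def set_rvi_commands(rvi_text, updates):
    """Return rvi_text with the value of each listed command replaced.

    Commands that are not found in the text are appended at the end.
    """
    remaining = {name.lower(): (name, value) for name, value in updates.items()}
    new_lines = []
    for line in rvi_text.splitlines():
        stripped = line.strip()
        keyword = stripped.split(None, 1)[0] if stripped else ""
        if keyword.lower() in remaining:
            _, value = remaining.pop(keyword.lower())
            # Keep the keyword spelling already used in the file
            new_lines.append(f"{keyword} {value}")
        else:
            new_lines.append(line)
    # Append any requested commands that the file did not contain
    for name, value in remaining.values():
        new_lines.append(f"{name} {value}")
    return "\n".join(new_lines) + "\n"


# Illustrative RVI fragment, not a complete Raven input file
sample = ":StartDate 2000-01-01 00:00:00\n:Duration 365\n:TimeStep 1.0\n"
print(set_rvi_commands(sample, {":StartDate": "2010-10-01 00:00:00", ":Duration": "730"}))
```

Reading the file, passing its text through a function like this, and writing the result back keeps every other line of the template untouched.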

#### <font color=grey>5.0 Run Raven</font>

>The Raven model is ready to be run! Define the model name and ensure that the RVI, RVH, RVC, RVP, and RVT files all have the same name. Additionally, users are required to specify an output folder name. This name is used to store the model's outputs within the “workflow_outputs” folder, organizing the results for easy access and review. For those who prefer to run the Raven model on their local machine or a different server, the workflow includes a convenient feature: `Download Raven Input Files`. When selected, this option compiles all necessary Raven input files and forcing data into a single compressed (zipped) file. This file can then be easily downloaded, offering users the flexibility to run the model in an environment of their choosing.
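
The naming requirement above (all five input files sharing the model name) can be checked before a run, and the same files can be bundled for download. This is a hedged sketch using only the standard library: `missing_model_files` and `zip_model_files` are hypothetical helpers, not the code behind `Download Raven Input Files`, and unlike that feature the sketch does not include forcing data.

```python
import os
import zipfile

RAVEN_EXTENSIONS = (".rvi", ".rvh", ".rvc", ".rvp", ".rvt")


def missing_model_files(model_dir, model_name):
    """List the expected Raven input files that are absent from model_dir."""
    return [
        model_name + ext
        for ext in RAVEN_EXTENSIONS
        if not os.path.isfile(os.path.join(model_dir, model_name + ext))
    ]


def zip_model_files(model_dir, model_name, zip_path):
    """Bundle the five Raven input files into one zip archive."""
    missing = missing_model_files(model_dir, model_name)
    if missing:
        raise FileNotFoundError(f"Missing Raven input files: {missing}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for ext in RAVEN_EXTENSIONS:
            file_name = model_name + ext
            # Store each file at the archive root under its own name
            archive.write(os.path.join(model_dir, file_name), arcname=file_name)
    return zip_path
```

Calling `missing_model_files(model_dir, model_name)` first gives a quick list of files that still need to be generated or renamed before Raven is run.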

#### <font color=grey>6.0 Visualization of Raven Model Outputs</font>


> This section will be improved at a later date. Currently, it offers an interactive plot feature for visualizing hydrographs generated by the Raven model. The custom outputs from the model, particularly for each Hydrologic Response Unit (HRU), are formatted to be compatible with Pandas, the Python data analysis library. This compatibility allows for the effective visualization of these outputs as line graphs.

**RavenView**
> RavenView is an online tool for visualizing Raven model output. This subsection generates a zipped file that can be downloaded and then dragged and dropped into RavenView to examine Raven model inputs and outputs more thoroughly.

#### <font color=grey>Run into an Error within the Workflow?</font>

First, try restarting the workflow: "Runtime" -> "Restart Runtime".

The cells are run in real time, so if the issue persists and you are familiar with coding, the script can be adjusted and run again.

If the issue persists, please feel free to email me at hburdett@uwaterloo.ca with the subject "Magpie Issue".

## <font color=#5559AB>Python Library Installation</font>

Only needs to be run the **first time** the Magpie Workflow is being set-up

Once the cell is done running, restart the workflow ("Runtime" --> "Restart runtime") or “Ctrl+M”

Check out the [Getting Started with Magpie Workflow: Overview and Library Installations](https://youtu.be/umPQJJstr8A) short video for more information

In [None]:
#@markdown <font size="+2"><font color=#5559AB> **Run to Set up Python Libraries for Workflow** </font>

#@markdown To avoid long installation times each time the Magpie Workflow is used, the “Python Library Installation” subsection installs several Python libraries onto the user's personal Google Drive. The Python libraries can then be loaded directly from the user's drive;
#@markdown be sure to run each of the cells in this section.

#@markdown Estimated execution time: 12min

import os, sys
from google.colab import drive

#mounting google drive allows us to work with its contents
drive.mount('/content/google_drive')

# define output directory path
main_dir = "/content"
packages_dir = os.path.join(main_dir, 'google_drive', 'MyDrive', 'Packages')
packages_path = os.path.isdir(packages_dir)

if not packages_path:
  os.makedirs(packages_dir)
  print("created  folder: ", packages_dir)
else:
  print(packages_dir, "folder already exists")

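# changes path to files
nb_path = '/content/notebooks'
# create the symlink only once so the cell can be re-run without a FileExistsError
if not os.path.islink(nb_path):
  os.symlink(packages_dir, nb_path)
sys.path.insert(0,nb_path)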

!pip install --target=$nb_path pandas==1.5.3 &> /dev/null
!pip install --target=$nb_path netCDF4==1.6.3 &> /dev/null

!pip install --target=$nb_path wget &> /dev/null
!pip install --target=$nb_path rasterio==1.3.7 &> /dev/null
!pip install --target=$nb_path rioxarray==0.14.1 &> /dev/null

!pip install --target=$nb_path fiona==1.9.4.post1 &> /dev/null
!pip install --target=$nb_path pytest==7.2.2 &> /dev/null
!pip install --target=$nb_path ipyleaflet==0.17.2 &> /dev/null

!pip install --target=$nb_path scipy==1.10.1 &> /dev/null
!pip install --target=$nb_path joblib==1.2.0 &> /dev/null

!pip install --target=$nb_path ipywidgets==7.7.1 &> /dev/null
!pip install --target=$nb_path shapely==2.0.1 &> /dev/null
!pip install --target=$nb_path pyproj==3.5.0 &> /dev/null

!pip install --target=$nb_path rtree==1.0.1 &> /dev/null
!pip install --target=$nb_path --upgrade --no-cache-dir gdown &> /dev/null
!pip install --target=$nb_path git+https://github.com/python-visualization/folium &> /dev/null

print('Please restart the workflow (Runtime -> Restart session (Ctrl+M))')

## <font color=#5559AB>R Library Installation</font>

Only needs to be run the **first time** the Magpie Workflow is being set-up

Once the cell is done running, restart the workflow ("Runtime" --> "Restart runtime")

Check out the [Getting Started with Magpie Workflow: Overview and Library Installations](https://youtu.be/umPQJJstr8A) short video for more information

In [None]:
#@markdown <font size="+2"><font color=#5559AB> **Run to Set up R Libraries for Workflow** </font>

#@markdown Connect to google drive in order to store libraries

%load_ext rpy2.ipython

from google.colab import drive
drive.mount('/content/google_drive')

import os

# define output directory path
main_dir = "/content"
packages_dir = os.path.join(main_dir, 'google_drive', 'MyDrive', 'R_Packages')
packages_path = os.path.isdir(packages_dir)

if not packages_path:
  os.makedirs(packages_dir)
  print("created  folder: ", packages_dir)
else:
  print(packages_dir, "folder already exists")


In [None]:
#@markdown <font color=grey> **RavenR Library Installation** </font>

#@markdown To avoid long installation times each time the Magpie Workflow is used, the “RavenR Library Installation” subsection installs several R libraries onto the user's personal Google Drive. The R libraries can then be loaded directly from the user's drive;
#@markdown be sure to run each of the cells in this section.

#@markdown Estimated execution time: 12.5min

%%R

# Start measuring time
start_time <- Sys.time()

# Install igraph package with a specific version
devtools::install_version("igraph", version = "1.2.6", repos = "https://cloud.r-project.org/", lib = "/content/google_drive/MyDrive/R_Packages")

# Install RavenR package
install.packages("RavenR", lib = "/content/google_drive/MyDrive/R_Packages")

# End measuring time
end_time <- Sys.time()

# Calculate the elapsed time in minutes
elapsed_time_minutes <- as.numeric(difftime(end_time, start_time, units = "mins"))

cat(paste("Installation time:", elapsed_time_minutes, "minutes\n"))



In [None]:
#@markdown <font color=grey> **R Library Installation** </font>

#@markdown Only need to install if using section **Generate DEM - R elevatr Package**

#@markdown Estimated execution time: 40min

%%R

# Start measuring time
start_time <- Sys.time()

# Define the library path and packages to check
library_path <- "/content/google_drive/MyDrive/R_Packages"
packages <- c("terra", "sf", "slippymath", "raster", "s2", "elevatr")

# Create the library path if it doesn't exist
if (!dir.exists(library_path)) {
  dir.create(library_path, recursive = TRUE)
}

# Function to check and install missing packages
check_and_install <- function(package_name) {
  if (!require(package_name, character.only = TRUE, lib.loc = library_path)) {
    install.packages(package_name, lib = library_path, repos = "http://cran.r-project.org")
    library(package_name, lib.loc = library_path, character.only = TRUE)
  }
}

# Check and install each package
for (pkg in packages) {
  check_and_install(pkg)
}
# End measuring time
end_time <- Sys.time()

# Calculate the elapsed time in minutes
elapsed_time_minutes <- as.numeric(difftime(end_time, start_time, units = "mins"))

cat(paste("Installation time:", elapsed_time_minutes, "minutes\n"))


# **Workflow Set up (Mandatory)**

The "Workflow Set up" section is mandatory as it mounts the user's Google Drive in the Magpie workflow. This allows data to be transferred between the user's Google Drive and the workflow so that when the notebook becomes disconnected, the data is not lost. Additionally, this section loads required packages and defines the working and output directories.

For more information visit the [Mandatory Workflow Set up video](https://youtu.be/krakrlHOpkU)

**MUST BE RUN EACH TIME THE WORKFLOW BECOMES DISCONNECTED OR IS RESTARTED**

In [None]:
import sys
import warnings
import importlib

#@markdown <font size="+3"><font color=#5559AB> **Run to Set up Workflow** </font> <br>

# connect to google drive
from google.colab import drive
drive.mount('/content/google_drive')

sys.path.append('/content/google_drive/MyDrive/Packages')
warnings.filterwarnings('ignore')

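# function to be used throughout workflow to check if packages are installed
def check_and_install_libraries(library_list):
    for library_name in library_list:
        # strip any version pin (e.g. "geopandas==0.13.3") to get the module name to import;
        # this assumes the import name matches the package name
        module_name = library_name.split("==")[0]
        try:
            importlib.import_module(module_name)
            print(f"{module_name} is already installed.")
        except ImportError:
            print(f"{library_name} is not installed. Installing...")
            try:
                import subprocess
                # install with the notebook's own interpreter, keeping the version pin
                subprocess.check_call([sys.executable, "-m", "pip", "install", library_name])
                print(f"{library_name} has been successfully installed.")
            except Exception as e:
                print(f"Failed to install {library_name}. Error: {str(e)}")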

# check libraries
libraries_to_check = ["geopandas==0.13.3","rasterio==1.3.7","rasterstats==0.19.0"]
check_and_install_libraries(libraries_to_check)

import pandas as pd
import numpy as np
import os
import wget
import geopandas as gpd
import rioxarray as rxr
import xarray as xr
import rasterio
from glob import glob
import json
import shutil
import sys
import subprocess
import matplotlib.pyplot as plt

#@markdown **Define name of workflow folder:**
folder_name = "Magpie_Workflow" #@param ["Magpie_Workflow"] {allow-input: true}
#@markdown the name of the folder where Magpie is saved

#@markdown ****

#@markdown **Define model name:**
model_name = "petawawa" #@param ["Magpie_Workflow"] {allow-input: true}
#@markdown the model name will be used to name the RVI, RVP, RVH, RVT, RVC files to keep naming consistent

#@markdown ****

#@markdown **Define output folder name:**
final_output_folder = "outputA" #@param ["Magpie_Workflow"] {allow-input: true}
#@markdown define the folder name for Raven model outputs to be saved

# define main output directory
main_dir = os.path.join('/content', 'google_drive', 'MyDrive', folder_name)

print("\n-----------------------------------------------------")
print(f"Main Directory: {main_dir}")
print("-----------------------------------------------------")

# temporary directory
temporary_dir = os.path.join('/content',"temporary_data")

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file:
  data = {
      "_comment": "------------SET UP MAIN DIRECTORY---------------",
      "main_dir": f"{main_dir}",
      "temporary_dir": f"{temporary_dir}",
      "model_name": f"{model_name}",
      "output_folder_name": f"{final_output_folder}",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path, "0_main_setup.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


# **1.0 Data Collection**

This section allows users to collect and format data needed to run the HRU discretization tool in BasinMaker.

Each section is optional, so users can pick and choose which sections best suit their needs.

Users have the option to upload or generate the data within each section.

## <font color=#5559AB> 1a) Study Area </font>

This section utilizes BasinMaker, developed by Han et al. (2021), to extract/format a shapefile of your study area to be used throughout the Magpie Workflow. The shapefile can be generated by either the gauge name or latitude and longitude. Don't have that information? No problem, the first section of the workbook helps to identify the coordinates and gauge name.


### <font color=grey> **Upload Study Area** </font>

In [None]:
# check libraries
libraries_to_check = ["folium"]
check_and_install_libraries(libraries_to_check)

import folium
from IPython.display import display

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below in your notebook, making them available for use in later cells. </font> <br>

def format_shapefile(shp_file_path, main_dir):
    """
    Formats a shapefile by dissolving its boundaries and saving the result as 'studyArea_outline.shp'.

    Parameters:
    - shp_file_path (str): The path to the folder containing the shapefile.
    - main_dir (str): The main directory where the formatted shapefile will be saved.

    Returns:
    None
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('Format Shapefile')
    print('-----------------------------------------------------------------------------------------------')

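    # Find the name of the shapefile in the given path
    shp_file_name = None
    for shp_file in os.listdir(shp_file_path):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Stop early with a clear message if no shapefile was uploaded
    if shp_file_name is None:
        raise FileNotFoundError(f"No .shp file found in {shp_file_path}")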

    # Read the shapefile into a GeoDataFrame
    studyArea_bound = gpd.read_file(os.path.join(shp_file_path, shp_file_name))

    # Add a temporary column 'dissolve' with a constant value for dissolving boundaries
    studyArea_bound["dissolve"] = 1

    # Extract relevant columns for boundary dissolve
    boundary = studyArea_bound[['dissolve', 'geometry']]

    # Dissolve boundaries based on the 'dissolve' column
    cont_studyArea = boundary.dissolve(by='dissolve')

    # Remove all contents in the folder containing shapefiles
    for f in glob(os.path.join(shp_file_path, '*')):
        os.remove(f)

    # Save the dissolved boundary GeoDataFrame to a new shapefile
    cont_studyArea.to_file(os.path.join(main_dir, 'shapefile', 'studyArea_outline.shp'))

def visualize_shapefile(shp_file_path):
    """
    Visualizes a shapefile on a Folium map and checks/reprojects the coordinate system if needed.

    Parameters:
    - shp_file_path (str): The path to the shapefile.

    Returns:
    None
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('Visualize Shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Read the shapefile into a GeoDataFrame
    shp_boundary = gpd.read_file(os.path.join(shp_file_path))

    # Check the projection of the shapefile
    if shp_boundary.crs != 'EPSG:4326':
        # Reproject the shapefile to EPSG:4326 if the coordinate systems don't match
        shp_lyr_crs = shp_boundary.to_crs(epsg=4326)
        print('Shapefile layer has been reprojected to match EPSG:4326.')
    else:
        shp_lyr_crs = shp_boundary
        print('Coordinate systems match!')

    # Determine the overall bounds of the (reprojected) shapefile
    west, south, east, north = shp_lyr_crs.total_bounds
    map_center = [(south + north) / 2, (west + east) / 2]

    # Create a Folium map centered on the shapefile bounds
    shp_map = folium.Map(location=map_center, zoom_start=10)

    # Add the reprojected geometry so it lines up with the EPSG:4326 basemap
    folium.GeoJson(data=shp_lyr_crs["geometry"]).add_to(shp_map)

    # Display the Folium map
    display(shp_map)

    print('\n-----------------------------------------------------------------------------------------------')
    print('Shapefile visualization complete!')
    print('-----------------------------------------------------------------------------------------------')

In [None]:
#@markdown <font color=#5559AB> **Upload Study Area Shapefile** </font> <br>

#@markdown Here users can upload a shapefile (.shp) of their study area

# generate drive folder
shp_dir = os.path.join(main_dir, 'shapefile')
shp_path = os.path.isdir(shp_dir)
if not shp_path:
  os.makedirs(shp_dir)
  print("created  folder: ", shp_dir)
print('\n-----------------------------------------------------------------------------------------------')
print('Upload Shapefile')
print('-----------------------------------------------------------------------------------------------')
print(f'drag-and-drop shapefile into the following folder: {shp_dir}')
input_response = input("Have you uploaded the study area (.shp) file (yes or no): ")
if input_response == 'yes':
  shp_file_path = os.path.join(main_dir, 'shapefile')
  # format shapefile
  format_shapefile(shp_file_path, main_dir)
  # visualize final shapefile
  visualize_shapefile(shp_file_path)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

# only write the configuration file when the user has ticked the box above
if write_to_configuration_file:
  data = {
      "_comment": "------------1a) STUDY AREA---------------",
      "generate_shapefile": "no",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1a_studyArea.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=1)  # indent parameter for pretty formatting (optional)


### <font color=grey> **Generate Study Area** </font>

Check out the short video [Generating Study Area Shapefile in Magpie Workflow](https://youtu.be/AfKEiR6Ms6Y) for more information

In [None]:
# check libraries
libraries_to_check = ["geopy", "folium","simpledbf","branca","time"]
check_and_install_libraries(libraries_to_check)

#!python -m pip install https://github.com/dustming/basinmaker/archive/fix_head_water_riv_segment.zip &> /dev/null
!python -m pip install https://github.com/dustming/basinmaker/archive/master.zip

import time
from geopy.geocoders import Nominatim
import folium
import branca
import simpledbf
from IPython.display import display
from basinmaker import basinmaker
from shapely.geometry import box, Point
from basinmaker.postprocessing.plotleaflet import plot_routing_product_with_ipyleaflet
from basinmaker.postprocessing.downloadpd import Download_Routing_Product_For_One_Gauge
from basinmaker.postprocessing.downloadpdptspurepy import Download_Routing_Product_From_Points_Or_LatLon
from basinmaker.postprocessing.downloadpdptspurepy import Extract_Routing_Product

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below in your notebook, making them available for use in later cells. </font> <br>


def studyArea_location(city_name, define_lat, define_lon):
    """
    Collects study area location information using either city name or defined coordinates.

    Parameters:
    - city_name (str): The name of the city to determine coordinates.
    - define_lat (str): Defined latitude (if available).
    - define_lon (str): Defined longitude (if available).

    Returns:
    Tuple of latitude and longitude.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Collect study area location information')
    print('-----------------------------------------------------------------------------------------------')

    # Determine coordinates based on city name using geopy.geocoders library
    if city_name != 'NA' and define_lat == 'NA':
        geolocator = Nominatim(user_agent="Magpie")
        location = geolocator.geocode(city_name)
        print(location)
        found_lat, found_lon = location.latitude, location.longitude
        lat, lon = found_lat, found_lon
        print("Study area coordinates:", lat, lon)
        return lat, lon

    # Use defined coordinates
    elif city_name == 'NA' and define_lat != 'NA':
        lat, lon = float(define_lat), float(define_lon)
        print("Study area coordinates:", lat, lon)
        return lat, lon

    # If neither a city name nor coordinates are defined (or both are), encourage the user to define a gauge name
    else:
        print("Define gauge name")
        return None, None

def interactive_gauge_map(lat, lon, main_dir):
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Interactive gauge plot')
    print('-----------------------------------------------------------------------------------------------')

    # Read in CSV with gauge information
    gauge_info = pd.read_csv(os.path.join(main_dir, 'extras', 'subbasin_plots', 'obs_gauges_NA_v2-1.csv'))

    # Format fancy folium pop-up
    def fancy_html(row):
        subId_val, obs_gauge, lat_info, lon_info, sub_Reg = (
            gauge_info['SubId'].iloc[row],
            gauge_info['Obs_NM'].iloc[row],
            gauge_info['POINT_Y'].iloc[row],
            gauge_info['POINT_X'].iloc[row],
            gauge_info['Sub_Region'].iloc[row]
        )
        html = f"""<!DOCTYPE html>
            <html>
            <p>SubID: {subId_val}</p>
            <p>Obs Gauge: {obs_gauge}</p>
            <p>Lat: {lat_info}</p>
            <p>Lon: {lon_info}</p>
            <p>Sub Region: {sub_Reg}</p>
            </html>
            """
        return html

    # Generate map with rectangular guide to assist in identifying the ideal subbasin/gauges to use
    if lat is None:
        print("Define gauge name in cell above")
    else:
        grid_pt = (lat, lon)
        W, E, N, S = grid_pt[1] - 0.5, grid_pt[1] + 0.5, grid_pt[0] + 0.5, grid_pt[0] - 0.5
        upper_left, upper_right, lower_right, lower_left = (N, W), (N, E), (S, E), (S, W)
        line_color, fill_color, weight, text = 'red', 'red', 2, 'text'
        edges = [upper_left, upper_right, lower_right, lower_left]

        map_osm = folium.Map(location=[lat, lon], zoom_start=9)
        folium.LatLngPopup().add_to(map_osm)

        for i in range(len(gauge_info)):
            html = fancy_html(i)
            iframe = branca.element.IFrame(html=html, width=200, height=200)
            popup = folium.Popup(iframe, parse_html=True)
            # Adds markers for each gauge
            folium.Marker([gauge_info['POINT_Y'].iloc[i], gauge_info['POINT_X'].iloc[i]], popup=popup).add_to(map_osm)

        # Displays interactive map
        display(map_osm.add_child(folium.vector_layers.Polygon(locations=edges, color=line_color, fill_color=fill_color,
                                                               weight=weight, popup=folium.Popup(text))))

def download_routing_product_lat_lon(lat, lon, product_name):
    """
    Download BasinMaker routing product using latitude and longitude.

    Parameters:
    - lat (float): Latitude coordinate.
    - lon (float): Longitude coordinate.
    - product_name (str): Name of the routing product to be downloaded.

    Returns:
    - str: Path to the downloaded routing product.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Download routing product')
    print('-----------------------------------------------------------------------------------------------')
    print('product_name: ', product_name)

    # Download routing product using provided coordinates
    # (the BasinMaker function returns the outlet subbasin ID and the product path)
    subid, product_path = Download_Routing_Product_From_Points_Or_LatLon(
        product_name=product_name, Lat=[lat], Lon=[lon]
    )

    print('Successfully downloaded routing product using lat and lon!')
    return product_path

def download_routing_product_gauge(product_name, gauge_name):
    """
    Download BasinMaker routing product using a gauge name.

    Parameters:
    - product_name (str): Name of the routing product to be downloaded.
    - gauge_name (str): Name of the gauge for which the routing product is downloaded.

    Returns:
    - str: Path to the downloaded routing product.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Download routing product')
    print('-----------------------------------------------------------------------------------------------')

    # Download routing product using the provided gauge name
    subid, product_path = Download_Routing_Product_For_One_Gauge(gauge_name=gauge_name, product_name=product_name)

    print('Successfully downloaded routing product using the gauge name!')
    return product_path


# extracts drainage area
def extract_drainage_area(product_path,most_down_stream_subbasin_ids,
                          most_up_stream_subbasin_ids,temporary_dir,version_num,main_dir):
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Extract drainage area and simplify drainage product')
    print('-----------------------------------------------------------------------------------------------')
    most_down_stream_subbasin_ids_lst = [most_down_stream_subbasin_ids]
    most_up_stream_subbasin_ids_lst = [most_up_stream_subbasin_ids]

    # define the folder path for the downloaded and unzipped lake-river routing product folder, where several GIS files exist
    unzip_routing_product_folder = product_path

    # define another folder that will save the outputs
    folder_product_for_interested_gauges = os.path.join(temporary_dir, f'catchment_extraction_{most_down_stream_subbasin_ids}')
    # create the folder if it doesn't exist
    os.makedirs(folder_product_for_interested_gauges, exist_ok=True)

    # Initialize the basinmaker
    start = time.time()
    bm = basinmaker.postprocess()

    # extract subregion of the routing product
    bm.Select_Subregion_Of_Routing_Structure(
        path_output_folder = folder_product_for_interested_gauges,
        routing_product_folder = unzip_routing_product_folder,
        most_down_stream_subbasin_ids=most_down_stream_subbasin_ids_lst,
        most_up_stream_subbasin_ids=most_up_stream_subbasin_ids_lst,               # -1: extract to the most-upstream (headwater) subbasin; other subbasin ID: extract the areas from the outlet to the provided subbasin.
        gis_platform="purepy",
    )
    end = time.time()
    print("This section took  ", end - start, " seconds")

    # read in study area
    studyArea_bound = gpd.read_file(os.path.join(folder_product_for_interested_gauges,f'catchment_without_merging_lakes_{version_num}.shp'))
    studyArea_bound["dissolve"] = 1

    boundary = studyArea_bound[['dissolve', 'geometry']]
    cont_studyArea = boundary.dissolve(by='dissolve')

    # generate drive folder
    shp_dir = os.path.join(main_dir, 'shapefile')
    shp_path = os.path.isdir(shp_dir)
    if not shp_path:
      os.makedirs(shp_dir)
      print("created  folder: ", shp_dir)

    # save to drive
    cont_studyArea.to_file(os.path.join(shp_dir,'studyArea_outline.shp'))

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Shapefile complete!')
    print('-----------------------------------------------------------------------------------------------')

def format_shapefile(shp_file_path):
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Format Shapefile')
  print('-----------------------------------------------------------------------------------------------')
  # find name of shapefile
  for shp_file in os.listdir(os.path.join(shp_file_path)):
      if shp_file.endswith(".shp"):
        shp_file_name = shp_file

  studyArea_bound = gpd.read_file(os.path.join(shp_file_path,shp_file_name))
  studyArea_bound["dissolve"] = 1

  boundary = studyArea_bound[['dissolve', 'geometry']]
  cont_studyArea = boundary.dissolve(by='dissolve')

  # remove contents in folder
  for f in glob(os.path.join(shp_file_path, '*')):
    os.remove(f)

  cont_studyArea.to_file(os.path.join(main_dir, 'shapefile','studyArea_outline.shp'))

def visualize_shapefile(shp_file_path):
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Visualize Shapefile')
  print('-----------------------------------------------------------------------------------------------')

  shp_boundary = gpd.read_file(os.path.join(shp_file_path))

  # check projection (folium expects geographic coordinates, EPSG:4326)
  if shp_boundary.crs !=  'EPSG:4326':
    # reproject
    shp_lyr_crs = shp_boundary.to_crs(epsg=4326)
    print('Shapefile layer has been reprojected to EPSG:4326')
  else:
    shp_lyr_crs = shp_boundary
    print('Shapefile is already in EPSG:4326')

  # determine the bounds of the provided shapefile and centre the map on them
  west, south, east, north = (float(v) for v in shp_lyr_crs.total_bounds)
  map_centre = [(south + north) / 2, (west + east) / 2]

  study_area_map = folium.Map(location=map_centre, zoom_start=10)
  # plot the reprojected layer so the outline lines up with the basemap
  folium.GeoJson(data=shp_lyr_crs["geometry"]).add_to(study_area_map)
  display(study_area_map)

  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Shapefile complete!')
  print('-----------------------------------------------------------------------------------------------')

def remove_temp_data(main_dir, temporary_dir, product_path):
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Remove unnecessary files')
  print('-----------------------------------------------------------------------------------------------')

  # Remove the temporary directory if it exists
  if os.path.exists(temporary_dir):
      shutil.rmtree(temporary_dir)
      print(f"Deleted temporary directory: {temporary_dir}")

  # Remove the product directory if it exists
  if os.path.exists(product_path):
      shutil.rmtree(product_path)
      print(f"Deleted product directory: {product_path}")

  # Remove all .zip files in the current directory
  zip_files_rm = glob("*.zip")
  for files_rm in zip_files_rm:
      os.remove(files_rm)
      print(f"Deleted zip file: {files_rm}")

  # Remove folders that start with "drainage_region" in /content
  for item in os.listdir('/content'):
      item_path = os.path.join('/content', item)
      if os.path.isdir(item_path) and item.startswith("drainage_region"):
          shutil.rmtree(item_path)
          print(f"Deleted folder: {item_path}")

In [None]:
#@markdown <font color=#5559AB> **Option A** </font> <br>
#@markdown <font color=grey> example, "Waterloo, ON, Canada" <br>
#@markdown *If you would rather enter your own coordinates, enter "NA" into city_name* </font> <br>

city_name = 'Waterloo, ON, Canada' #@param {type:"string"}

#@markdown <font color=#5559AB> **Option B** </font> <br>
#@markdown <font color=grey> example, 43.4652699 -80.5222961 <br>
#@markdown *If you would rather use the coordinates derived from the city name, leave this as "NA" (be sure to include " " around NA)* </font> <br>

define_lat = 'NA' #@param {type:"string"}
define_lon = 'NA' #@param {type:"string"}

# assign lat and lon to either found or identified coordinates
if city_name != 'NA' and define_lat == 'NA':
    lat, lon = studyArea_location(city_name, define_lat, define_lon)
elif city_name == 'NA' and define_lat != 'NA':
    lat, lon = float(define_lat), float(define_lon)
    print("Study area coordinates:", lat, lon)
elif city_name == 'NA' and define_lat == 'NA':
    # neither option was provided, so a gauge name is needed instead
    lat, lon = None, None
    print("Define gauge name in cell below")

In [None]:
#@markdown <font color=#5559AB> **Identifying observation gauges** </font> <br><br>
#@markdown  Check "visualize" to produce an interactive map with an overlaid rectangle that helps identify which observation gauges are located near your study area. Each gauge pop-up also reports the subbasin ID and sub-region the gauge belongs to. <br><br>
#@markdown The map loads a large number of gauges, so give it some time to render even after the cell has finished running. <br><br>

map_visualize = True #@param {type:"boolean"}

# generate map with rectangular guide to assist in identifying the ideal subbasin/gauges to use
if map_visualize:
  interactive_map_response = 'yes'
  if lat is None:
    print("Define gauge name in cell above")
  else:
    interactive_gauge_map(lat,lon, main_dir)
else:
  interactive_map_response = 'no'
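
The rectangular guide drawn on the interactive gauge map can be sketched as follows: the corners are found by offsetting the study area coordinates by 0.5 degrees in each direction. The helper name `guide_rectangle` and its `half_width` parameter are illustrative assumptions and are not part of Magpie.

```python
def guide_rectangle(lat, lon, half_width=0.5):
    """Return the (lat, lon) corners of a square guide centred on a point.

    Hypothetical helper: corners are ordered upper-left, upper-right,
    lower-right, lower-left, the same order used for the map polygon.
    """
    north, south = lat + half_width, lat - half_width
    east, west = lon + half_width, lon - half_width
    return [(north, west), (north, east), (south, east), (south, west)]


corners = guide_rectangle(43.5, -80.5)
print(corners)
```

A list like this can be passed as the `locations` of a folium polygon, which is how the workflow draws the red guide.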

<font color=#5559AB> **Download routing product for catchment of interest** </font> <br>
The North American Lake-River Routing Product (version 2.1) covers the main drainage regions across North America (Canada and the USA). The Ontario Lake-River Routing Product (version 1.0) covers the main drainage regions across the province of Ontario, Canada. <br>
Both the NALRP and the OLRRP can be downloaded by sub-region. <br>
BasinMaker has two built-in functions for downloading a routing product: Download_Routing_Product_For_One_Gauge and Download_Routing_Product_From_Points_Or_LatLon.<br><br>
This leaves two options for data download: <br>

<font color=#5559AB> **Option #1**: Provide the function Download_Routing_Product_For_One_Gauge with the gauge ID. </font> <br>

<font color=#5559AB>**Option #2**: Provide the function Download_Routing_Product_From_Points_Or_LatLon with the outlet coordinates (latitude and longitude in decimal degrees). </font><br><br>

*Both options let BasinMaker determine the subbasin ID of the outlet subbasin, which it then uses to extract the drainage area.*<br>

The map above or the following link can be used to help identify gauges of interest:
https://wateroffice.ec.gc.ca/search/historical_e.html
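
As a rough sketch of how the two options differ, the snippet below picks the BasinMaker function name and keyword arguments from the inputs, without calling BasinMaker itself. The helper `select_download_call` is a hypothetical name used only for illustration; the function names and keyword arguments are the ones used elsewhere in this notebook.

```python
def select_download_call(product_name, gauge_name="NA", lat=None, lon=None):
    """Return the BasinMaker function name and keyword arguments to use.

    Hypothetical helper for illustration only; it does not call BasinMaker.
    """
    if gauge_name != "NA":
        # Option 1: the gauge ID identifies the outlet subbasin
        return ("Download_Routing_Product_For_One_Gauge",
                {"gauge_name": gauge_name, "product_name": product_name})
    if lat is None or lon is None:
        raise ValueError("Provide a gauge name or outlet coordinates")
    # Option 2: outlet coordinates in decimal degrees, passed as lists
    return ("Download_Routing_Product_From_Points_Or_LatLon",
            {"product_name": product_name, "Lat": [float(lat)], "Lon": [float(lon)]})


print(select_download_call("NALRP", gauge_name="02GA024"))
print(select_download_call("NALRP", lat=43.4652699, lon=-80.5222961))
```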

In [None]:
#@markdown <font color=#5559AB> **Define the product name** </font><br>
#@markdown [OLRRP](https://uwaterloo-olrrp.shinyapps.io/OLRRP-V2/) to use the Ontario Lake-River Routing Product (version 1.0)<br>
#@markdown [NALRP](https://hydrology.uwaterloo.ca/basinmaker/download_regional.html) to use the North American Lake-River Routing Product (version 2.1)<br>

product_name = "NALRP"  #@param ["NALRP", "OLRRP"]


#@markdown <font color= #5559AB> **Option 1:**</font> Download routing product using gauge name<br>
#@markdown <font color=grey>example, "02GA024" </font><br>

gauge_name = "02GA024" #@param {type:"string"}

#@markdown <font color= #5559AB> **Option 2:**</font> Download routing product using lat and lon <br>
#@markdown <font color=grey>*The previously defined latitude and longitude will be used if gauge_name is set to "NA"* </font><br>

# downloads BasinMaker routing product with either the gauge name or lat and lon

if product_name == "NALRP":
    if gauge_name == "NA":
      # use the coordinates resolved in the study area location cell (from the city name or Option B)
      lat_val = [float(lat)]
      lon_val = [float(lon)]
      subid,product_path = Download_Routing_Product_From_Points_Or_LatLon(product_name = product_name,Lat = lat_val,Lon = lon_val)
    else:
      product_path = download_routing_product_gauge(product_name,gauge_name)

if product_name == "OLRRP":
    if gauge_name != "NA":

      print('\n-----------------------------------------------------------------------------------------------')
      print('( ) Download routing product')
      print('-----------------------------------------------------------------------------------------------')
      print('product_name: ', product_name)

      # define path
      product_path = os.path.join('/content','drainage_region_olrrp')

      # Download routing product using the provided gauge name
      Extract_Routing_Product(version='v2-0', by='Obs_NM', obs_nm=gauge_name,output_path=product_path)
    else:
      print('Downloading the routing product with lat and lon for OLRRP is currently unavailable')


In [None]:
def map_with_gauges_and_clipped_shapefile(subbasin_path, river_path, gauge_csv_path, subbasin_of_interest=None):
    """
    Display a map with gauges, clipped subbasins, rivers, and the grey outline of the original shapefile.
    """

    import geopandas as gpd
    import pandas as pd
    import folium
    from shapely.geometry import box
    from IPython.display import display

    # Load subbasin and river shapefiles
    try:
        gdf = gpd.read_file(subbasin_path)
        river_gdf = gpd.read_file(river_path)
    except Exception as e:
        print(f"Error loading shapefiles: {e}")
        return

    # Load gauge data
    try:
        gauge_df = pd.read_csv(gauge_csv_path)
        gauge_gdf = gpd.GeoDataFrame(
            gauge_df,
            geometry=gpd.points_from_xy(gauge_df['POINT_X'], gauge_df['POINT_Y']),
            crs="EPSG:4326"
        )
    except Exception as e:
        print(f"Error loading gauge data: {e}")
        return

    # Ensure CRS consistency
    gdf = gdf.to_crs(epsg=4326)
    river_gdf = river_gdf.to_crs(epsg=4326)
    gauge_gdf = gauge_gdf.to_crs(epsg=4326)

    # Define rectangle bounds based on the extent of the subbasin shapefile or a specific subbasin
    if subbasin_of_interest:
        subbasin = gdf[gdf['SubId'] == subbasin_of_interest]
        if not subbasin.empty:
            bounds = subbasin.total_bounds  # [minx, miny, maxx, maxy]
        else:
            print(f"Subbasin of interest {subbasin_of_interest} not found.")
            return
    else:
        bounds = gdf.total_bounds  # [minx, miny, maxx, maxy]

    # Expand bounds for visualization
    minx, miny, maxx, maxy = bounds
    W, E, S, N = minx - 0.2, maxx + 0.2, miny - 0.2, maxy + 0.2
    rectangle_polygon = box(W, S, E, N)

    # Clip shapefiles
    gdf_clipped = gpd.clip(gdf, rectangle_polygon)
    river_clipped = gpd.clip(river_gdf, rectangle_polygon)
    gauge_clipped = gpd.clip(gauge_gdf, rectangle_polygon)

    # Center map at the centroid of the rectangle
    center_lat = (S + N) / 2
    center_lon = (W + E) / 2

    # Create the map
    map_osm = folium.Map(
        location=[center_lat, center_lon],
        zoom_start=12,
        tiles="https://{s}.basemaps.cartocdn.com/light_nolabels/{z}/{x}/{y}{r}.png",
        attr="© OpenStreetMap contributors © CARTO"
    )

    # Add the grey outline of the original shapefile
    folium.GeoJson(
        gdf.to_json(),
        name="Original Subbasin Outline",
        style_function=lambda feature: {'color': 'grey', 'weight': 1, 'fillOpacity': 0},
    ).add_to(map_osm)

    # Add clipped rivers
    folium.GeoJson(
        river_clipped.to_json(),
        name="Clipped Rivers",
        style_function=lambda feature: {'color': 'lightblue', 'weight': 3},
    ).add_to(map_osm)

    # Add subbasins and highlight subbasin of interest
    for _, row in gdf_clipped.iterrows():
        centroid = row.geometry.centroid
        subid = row.get("SubId", "Unknown")

        if subbasin_of_interest and subid == subbasin_of_interest:
            folium.GeoJson(
                row.geometry,
                style_function=lambda feature: {'fillColor': 'yellow', 'color': 'orange', 'weight': 3, 'fillOpacity': 0.5},
                name=f"Highlighted Subbasin {subbasin_of_interest}"
            ).add_to(map_osm)

        # Add subbasin ID
        folium.Marker(
            location=[centroid.y, centroid.x],
            icon=folium.DivIcon(html=f'<div style="font-size: 10px;">{subid}</div>'),
        ).add_to(map_osm)

    # Add clipped gauges
    for _, row in gauge_clipped.iterrows():
        folium.Marker(
            location=[row.geometry.y, row.geometry.x],
            popup=row.get("Obs_NM", "Unnamed Gauge"),
            icon=folium.Icon(color="blue", icon="info-sign"),
        ).add_to(map_osm)

    # Add the clipped subbasin shapefile
    folium.GeoJson(
        gdf_clipped.to_json(),
        name="Clipped Subbasins",
        style_function=lambda feature: {
            'fillColor': 'grey',
            'color': 'grey',
            'weight': 0.5,
            'fillOpacity': 0,
        }
    ).add_to(map_osm)

    # Add red rectangle
    folium.vector_layers.Polygon(
        locations=[(N, W), (N, E), (S, E), (S, W)],
        color='darkred',
        weight=3,
        fill=False
    ).add_to(map_osm)

    # Layer control
    folium.LayerControl().add_to(map_osm)

    # Display the map
    display(map_osm)

#@markdown <font color= #5559AB> **View Subbasin of Interest** </font><br>
#@markdown Check box and enter name of subbasin of interest to view the location of the subbasin <br>
#@markdown <font color=grey>example, 3086525 </font><br>

view_subbasin = True # @param {type:"boolean"}

if view_subbasin:
    # define  subbasin of interest
    subbasin_of_interest = 3086525 #@param

    if product_name == 'OLRRP':
        subbasin_path = '/content/drainage_region_olrrp/finalcat_info_v2-0.shp'
        river_path = '/content/drainage_region_olrrp/finalcat_info_riv_v2-0.shp'
    elif product_name == 'NALRP':

        search_prefix = "drainage_region_"
        base_dir = "/content"

        # Find the first matching folder
        found_folder = next(
            (os.path.join(root, dir_name)
            for root, dirs, _ in os.walk(base_dir)
            for dir_name in dirs if dir_name.startswith(search_prefix)),
            None
        )

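        # stop early with a clear message if the routing product has not been downloaded yet
        if found_folder is None:
            raise FileNotFoundError(f"No folder starting with '{search_prefix}' was found in {base_dir}")

        subbasin_path = os.path.join(found_folder, 'finalcat_info_v2-1.shp')
        river_path = os.path.join(found_folder, 'finalcat_info_riv_v2-1.shp')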

    # path to gauge ID
    gauge_csv_path = os.path.join(main_dir, "extras", "subbasin_plots","obs_gauges_NA_v2-1.csv")  # Path to gauge CSV file
    map_with_gauges_and_clipped_shapefile(
        subbasin_path=subbasin_path,
        river_path=river_path,
        gauge_csv_path=gauge_csv_path,
        subbasin_of_interest=subbasin_of_interest  # Optional
    )

In [None]:
#@markdown <font color= #5559AB> **Extract drainage area and simplify drainage product** </font><br>
#@markdown BasinMaker needs the ID of the subbasin (SubId) in which the gauge is situated. Set most_up_stream_subbasin_ids to -1 to extract up to the most-upstream (headwater) subbasin <br>
#@markdown <font color=grey>example, 3086525 </font><br>

most_down_stream_subbasin_ids = 3086525 #@param

most_up_stream_subbasin_ids = -1 #@param

if product_name == "NALRP":
  version_num = "v2-1"
elif product_name == "OLRRP":
  version_num = "v2-0"

extract_drainage_area(product_path,most_down_stream_subbasin_ids,
                          most_up_stream_subbasin_ids,temporary_dir,version_num,main_dir)


In [None]:
#@markdown <font color= grey> **Visualize shapefile of study area**</font><br>
#@markdown produces an interactive map for users to visualize the generated shapefile boundary area

shp_file_path = os.path.join(main_dir, 'shapefile')
visualize_shapefile(shp_file_path)

In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to save space on the drive

#@markdown if users want to keep any of the temporary data, they can right-click on the file in the folder directory and select download

# delete temporary folder
remove_temp_data(main_dir, temporary_dir,product_path)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File** </font>

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file:
  data = {
      "_comment": "------------1a) STUDY AREA---------------",
      "generate_DEM": "yes",
      "define_lat_shp": f"{define_lat}",
      "define_lon_shp": f"{define_lon}",
      "city_name_shp": f"{city_name}",

      "interactive_map_response": f"{interactive_map_response}",

      "product_name_shp": f"{product_name}",
      "gauge_name_shp": f"{gauge_name}",

      "version_num_shp": f"{version_num}",

      "most_down_stream_subbasin_ids_shp": most_down_stream_subbasin_ids,
      "most_up_stream_subbasin_ids_shp": most_up_stream_subbasin_ids,
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1a_studyArea.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)
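
To illustrate why writing the decisions to JSON supports reproducibility, the sketch below writes a few example decisions to a file and reads them back unchanged. The values are the examples used in this notebook, and the temporary folder is an assumption standing in for the `configuration_files` folder; how Magpie Developer consumes the file is not shown here.

```python
import json
import os
import tempfile

# Example decisions; the keys mirror a few of those written by the workflow
decisions = {
    "product_name_shp": "NALRP",
    "gauge_name_shp": "02GA024",
    "most_down_stream_subbasin_ids_shp": 3086525,
    "most_up_stream_subbasin_ids_shp": -1,
}

# Write to a throwaway folder (assumption: stands in for configuration_files)
config_dir = tempfile.mkdtemp()
config_file = os.path.join(config_dir, "1a_studyArea.json")
with open(config_file, "w") as json_file:
    json.dump(decisions, json_file, indent=2)

# Read the file back and confirm the decisions are unchanged
with open(config_file) as json_file:
    reloaded = json.load(json_file)

print(reloaded == decisions)
```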


**References**

Han, M., H. Shen, B. A. Tolson, J. R. Craig, J. Mai, S. Lin, N. B. Basu, F. Awol. (2021). BasinMaker 3.0: a GIS toolbox for distributed watershed delineation of complex lake-river routing networks. Environmental Modelling and Software.

## <font color=#5559AB> 1b) DEM  </font>

Magpie utilizes the [MERIT DEM](https://developers.google.com/earth-engine/datasets/catalog/MERIT_DEM_v1_0_3) available through Google Earth Engine. MERIT DEM is a high-accuracy global DEM at 3 arc-second resolution (~90 m at the equator) produced by eliminating major error components from existing DEMs (NASA SRTM3 DEM, JAXA AW3D DEM, Viewfinder Panoramas DEM).

In this section, the DEM layer is downloaded, clipped, and saved to the mounted Google Drive.

Only a <font color=red>shapefile of the study area</font> is required to run this subsection.
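
The "~90 m at the equator" figure for a 3 arc-second cell can be checked with a short calculation, which also shows how the east-west cell width shrinks away from the equator. This is a rough sketch on a spherical approximation: the constant of 111,320 m per degree and the helper name `cell_size_metres` are assumptions for illustration and are not part of Magpie.

```python
import math

METRES_PER_DEGREE = 111_320  # assumed approximate length of one degree at the equator


def cell_size_metres(latitude_deg, arc_seconds=3):
    """Approximate north-south and east-west cell size in metres."""
    cell_deg = arc_seconds / 3600
    north_south = cell_deg * METRES_PER_DEGREE
    # East-west distance shrinks with the cosine of latitude
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = cell_size_metres(43.4652699)
print(f"north-south: {ns:.1f} m, east-west: {ew:.1f} m")
```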


### <font color=grey> **Upload DEM** </font>

In [None]:
#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions in the notebook so that they are available for use in the cells below. <br>

def check_projection(shp_file_path, main_dir,temporary_dir):
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Check projection of shapefile')
  print('-----------------------------------------------------------------------------------------------')
  # find name of shapefile
  for shp_file in os.listdir(os.path.join(shp_file_path)):
      if shp_file.endswith(".shp"):
        shp_file_name = shp_file
  print('Shapefile name: ', shp_file_name)
  print('Shapefile path: ', os.path.join(main_dir, "shapefile",shp_file_name))
  shp_lyr_check = gpd.read_file(os.path.join(main_dir, "shapefile",shp_file_name))
  print('Shapefile CRS: ', shp_lyr_check.crs)

  if shp_lyr_check.crs !=  'EPSG:4326':
    # reproject and overwrite the shapefile that was read above
    shp_lyr_crs = shp_lyr_check.to_crs(epsg=4326)
    shp_lyr_crs.to_file(os.path.join(main_dir, "shapefile",shp_file_name))
    print('Shapefile layer has been reprojected to EPSG:4326')
  else:
    shp_lyr_crs = shp_lyr_check
    print('Shapefile is already in EPSG:4326')

def format_and_visualize_dem(shp_file_path, temporary_dir, main_dir):
    """
    Clip DEM to the study area, reproject if necessary, and save the clipped DEM to drive.

    Args:
        shp_file_path (str): Path to the directory containing the shapefile.
        temporary_dir (str): Temporary directory path.
        main_dir (str): Main directory path.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Clip DEM to study area')
    print('-----------------------------------------------------------------------------------------------')

    # Find name of shapefile
    for shp_file in os.listdir(os.path.join(shp_file_path)):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Find name of DEM file
    for dem_file in os.listdir(os.path.join(temporary_dir, 'DEM')):
        if dem_file.endswith(".tif"):
            dem_file_name = dem_file

    # Open raster DEM layer
    dem_lyr = rxr.open_rasterio(os.path.join(temporary_dir, 'DEM', dem_file_name), masked=True).squeeze()

    # Load shapefile (crop extent)
    crop_extent = gpd.read_file(os.path.join(main_dir, 'shapefile', shp_file_name))

    print('Shapefile CRS:', crop_extent.crs)
    print('DEM CRS:', dem_lyr.rio.crs)

    # Check if CRS match and reproject if necessary
    if crop_extent.crs != dem_lyr.rio.crs:
        dem_lyr = dem_lyr.rio.reproject(crop_extent.crs)
        print('\nDEM layer has been reprojected to match shapefile')
    else:
        print('\nCoordinate systems match')

    # Buffer the study area boundary slightly (0.001 in the units of the shapefile CRS) so edge cells are retained
    crop_extent_buffered = crop_extent.buffer(0.001)

    # Clip the DEM layer
    lidar_clipped = dem_lyr.rio.clip(crop_extent_buffered, crop_extent_buffered.crs)
    print('\nDEM layer has been clipped')

    # Define output directory path
    dem_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM')

    if not os.path.exists(dem_output_dir):
        os.makedirs(dem_output_dir)

    # Save clipped DEM layer to drive
    path_to_tif_file = os.path.join(dem_output_dir, 'dem.tif')
    lidar_clipped.rio.to_raster(path_to_tif_file)
    print('\nDEM layer has been saved to drive')


def visualize_dem(DEM_visualize, Aspect_visualize, save_aspect, main_dir, temporary_dir):
    """
    Visualize the DEM layer, produce the aspect layer, and save the outputs to drive.

    Args:
        DEM_visualize (str): 'yes' to visualize DEM layer, 'no' otherwise.
        Aspect_visualize (str): 'yes' to calculate and visualize aspect layer, 'no' otherwise.
        save_aspect (str): 'yes' to save shapefile of aspect layer, 'no' otherwise.
        main_dir (str): Main directory path.
        temporary_dir (str): Temporary directory path.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize DEM layer and produce slope and aspect layers')
    print('-----------------------------------------------------------------------------------------------')

    # Define output directory path for slope and aspect
    slope_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Slope')
    aspect_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Aspect')

    # Create output directories if they don't exist
    for output_dir in [slope_output_dir, aspect_output_dir]:
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)

    if DEM_visualize == 'yes':
        for dem_file in os.listdir(os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM')):
            if dem_file.endswith(".tif"):
                print('---------------------------------------------- DEM Layer ----------------------------------------------')
                # Open and visualize DEM layer
                dem_lyr = rxr.open_rasterio(os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM', dem_file),
                                            masked=True).squeeze()
                f, ax = plt.subplots(figsize=(8, 10))
                dem_lyr.plot(ax=ax)
                ax.set_axis_off()
                plt.show()

                # Calculate and print statistics of DEM layer
                dem_min = int(dem_lyr.min())
                dem_max = int(dem_lyr.max())
                dem_mean = int(dem_lyr.mean())
                print('\nMinimum Elevation:', dem_min)
                print('Maximum Elevation:', dem_max)
                print('Mean Elevation:', dem_mean)

    def calculate_aspect(dem_file, output_path):
        """
        Calculate the aspect of a DEM and save the aspect raster.

        :param dem_file: Path to the input DEM file.
        :param output_path: Path to save the aspect output.
        """
        with rasterio.open(dem_file) as src:
            # Read DEM as a 2D numpy array (cast to float so no-data cells can hold NaN)
            dem = src.read(1, resampling=Resampling.bilinear).astype('float32')
            transform = src.transform
            no_data = src.nodata if src.nodata is not None else -9999  # Default to -9999 if no_data is None

            # Handle no_data values
            dem[dem == no_data] = np.nan

            # Calculate gradients; np.gradient returns the row (axis 0) derivative first,
            # then the column (axis 1) derivative, with one spacing per axis in that order
            d_row, d_col = np.gradient(dem, np.abs(transform[4]), np.abs(transform[0]))

            # Calculate aspect
            aspect = np.arctan2(d_col, -d_row)  # Angle in radians
            aspect = np.degrees(aspect)  # Convert to degrees
            aspect = (aspect + 360) % 360  # Normalize to [0, 360)

            # Save aspect as a GeoTIFF
            profile = src.profile
            profile.update(dtype=rasterio.float32, count=1, nodata=np.nan)
            with rasterio.open(output_path, 'w', **profile) as dst:
                dst.write(aspect.astype(rasterio.float32), 1)

        print(f"Aspect saved to {output_path}")
        return aspect

    if Aspect_visualize == 'yes':
        dem_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM')
        aspect_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Aspect')
        os.makedirs(aspect_output_dir, exist_ok=True)

        for dem_file in os.listdir(dem_dir):
            if dem_file.endswith(".tif"):
                print('---------------------------------------------- Aspect Layer ----------------------------------------------')
                dem_path = os.path.join(dem_dir, dem_file)
                aspect_path = os.path.join(aspect_output_dir, "aspect.tif")

                # Calculate aspect
                aspect_array = calculate_aspect(dem_path, aspect_path)

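                # Visualize aspect (imshow expects extent as left, right, bottom, top)
                with rasterio.open(dem_path) as dem_src:
                    dem_bounds = dem_src.bounds
                plt.figure(figsize=(8, 9))
                plt.imshow(aspect_array, cmap='jet',
                           extent=(dem_bounds.left, dem_bounds.right, dem_bounds.bottom, dem_bounds.top))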
                plt.colorbar(label="Aspect (degrees)")
                plt.title("Aspect Layer")
                plt.xlabel("Longitude")
                plt.ylabel("Latitude")
                plt.show()

                # Calculate statistics
                aspect_min = int(np.nanmin(aspect_array))
                aspect_max = int(np.nanmax(aspect_array))
                aspect_mean = int(np.nanmean(aspect_array))
                print('\nMinimum Aspect:', aspect_min)
                print('Maximum Aspect:', aspect_max)
                print('Mean Aspect:', aspect_mean)

    if save_aspect == "yes":
        # Save shapefile of aspect layer
        aspect_rast = os.path.join(aspect_output_dir, 'aspect.tif')
        aspect_shp = os.path.join(aspect_output_dir, 'aspect.shp')

        # Use gdal_polygonize to convert raster to vector (shapefile)
        with open(os.path.join(temporary_dir, 'polygon.sh'), 'w') as f3:
            print(f'gdal_polygonize.py "{aspect_rast}" "{aspect_shp}" -b 1 -f "ESRI Shapefile"', file=f3)

        sh_file = os.path.join(temporary_dir, 'polygon.sh')
        subprocess.run(['bash', sh_file])

        # Open and format the shapefile
        aspect_shp_gdf = gpd.read_file(aspect_shp)
        aspect_shp_gdf["O_ID_2"] = list(range(1, (len(aspect_shp_gdf.index) + 1)))
        aspect_shp_gdf["Aspect"] = aspect_shp_gdf.DN

        # Save the final aspect shapefile to drive
        aspect_shp_gdf.to_file(aspect_shp)
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) DEM Complete!')
    print('-----------------------------------------------------------------------------------------------')

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))


In [None]:
#@markdown <font color=grey> **Upload DEM File** </font>

#@markdown drag-and-drop the DEM file into the specified folder

dem_temp_dir = os.path.join(temporary_dir,'DEM')
if not os.path.exists(dem_temp_dir):
  os.makedirs(dem_temp_dir)
print('\n-----------------------------------------------------------------------------------------------')
print('( ) Upload DEM')
print('-----------------------------------------------------------------------------------------------')
print(f'Drag and drop the DEM file into the following folder: {dem_temp_dir}')
response = input("Have you uploaded the DEM (.tif) file (yes or no): ")
if response.strip().lower() == "yes":
  shp_file_path = os.path.join(main_dir, 'shapefile')
  # check projection
  check_projection(shp_file_path, main_dir,temporary_dir)
  # format and visualize
  format_and_visualize_dem(shp_file_path,temporary_dir,main_dir)

In [None]:
#@markdown <font color=#5559AB> **Check the box to visualize clipped DEM layer** </font>

#@markdown Aspect is computed from the DEM layer to give users additional information about their study area <br>
DEM_visualize = True #@param {type:"boolean"}
Aspect_visualize = True #@param {type:"boolean"}

#@markdown Check box to save a shapefile layer of aspect to drive  </font> <br>
save_aspect = False #@param {type:"boolean"}

if DEM_visualize == True:
  DEM_visualize = "yes"
else:
  DEM_visualize = "no"
if Aspect_visualize == True:
  Aspect_visualize = "yes"
else:
  Aspect_visualize = "no"
if save_aspect == True:
  save_aspect = "yes"
else:
  save_aspect = "no"

visualize_dem(DEM_visualize, Aspect_visualize, save_aspect, main_dir, temporary_dir)


In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download


# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

# Only write the file when the user has ticked the checkbox above
if write_to_configuration_file == True:
  data = {
      "_comment": "------------ 1b) DEM ---------------",
      "generate_DEM": "no",
      "upload_DEM": "yes",
      "DEM_visualize": f"{DEM_visualize}",
      "Aspect_visualize": f"{Aspect_visualize}",
      "save_aspect": f"{save_aspect}",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1b_dem.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)
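
Because the configuration file is plain JSON, the recorded decisions can be read back and checked before they are reused. The sketch below writes a small dictionary shaped like the one above to a temporary folder and reloads it. The temporary directory and the `to_yes_no` helper are assumptions made for illustration and are not Magpie functions.

```python
import json
import os
import tempfile


def to_yes_no(flag):
    """Map a checkbox value (bool or 'yes'/'no' string) to 'yes' or 'no'.

    Hypothetical helper, not part of Magpie.
    """
    return "yes" if flag in (True, "yes") else "no"


decisions = {
    "_comment": "------------ 1b) DEM ---------------",
    "generate_DEM": "no",
    "upload_DEM": "yes",
    "DEM_visualize": to_yes_no(True),
    "Aspect_visualize": to_yes_no("yes"),
    "save_aspect": to_yes_no(False),
}

# Write to a throwaway folder, then read the file back
with tempfile.TemporaryDirectory() as config_path:
    file_path = os.path.join(config_path, "1b_dem.json")
    with open(file_path, "w") as json_file:
        json.dump(decisions, json_file, indent=2)
    with open(file_path) as json_file:
        reloaded = json.load(json_file)

print(reloaded["save_aspect"])
```

Storing every decision as a `"yes"`/`"no"` string keeps the file consistent with the string comparisons used by the workflow functions.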


### <font color=grey> **Generate DEM - Google Earth Engine** </font>

Check out the short video [Creating DEM Layer in Magpie Workflow](https://youtu.be/S8rh7aovu_0) for more information
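
Before the download, the workflow pads the bounding box of the study area shapefile by a small fraction of its width and height so that the DEM fully covers the study area. A minimal sketch of that padding step in plain Python is shown here. The function name `buffered_bbox` and the sample coordinates are illustrative assumptions, not part of Magpie.

```python
def buffered_bbox(west, south, east, north, buffer_fraction=0.01):
    """Pad a (west, south, east, north) box by a fraction of its width and height.

    Hypothetical helper for illustration. The padding is computed once, before
    any edge is moved, so each side grows by the same amount.
    """
    pad_x = buffer_fraction * (east - west)
    pad_y = buffer_fraction * (north - south)
    return (west - pad_x, south - pad_y, east + pad_x, north + pad_y)


# Made-up extent in decimal degrees (EPSG:4326)
box = buffered_bbox(-81.0, 43.0, -80.0, 44.0)
print(box)
```

The four returned values are in the same order as the arguments, which is also the order used when building the Earth Engine bounding box in `download_DEM`.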

In [None]:
import ee
import requests
from rasterio.enums import Resampling
#import richdem as rd

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions used in this subsection, making them available to the cells that follow. <br>

# Trigger the authentication flow.
service_account = 'magpie-developer@magpie-id-409519.iam.gserviceaccount.com'
credentials = ee.ServiceAccountCredentials(service_account, os.path.join(main_dir,'extras','magpie-key.json'))

# Initialize the library.
ee.Initialize(credentials)

def check_projection(shp_file_path, main_dir,temporary_dir):
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Check projection of shapefile')
  print('-----------------------------------------------------------------------------------------------')
  # find name of shapefile
  for shp_file in os.listdir(os.path.join(shp_file_path)):
      if shp_file.endswith(".shp"):
        shp_file_name = shp_file
  print('Shapefile name: ', shp_file_name)
  print('Shapefile path: ', os.path.join(main_dir, "shapefile",shp_file_name))
  shp_lyr_check = gpd.read_file(os.path.join(main_dir, "shapefile",shp_file_name))
  print('Shapefile CRS: ', shp_lyr_check.crs)

  if shp_lyr_check.crs !=  'EPSG:4326':
    # reproject
    shp_lyr_crs = shp_lyr_check.to_crs(epsg=4326)
    shp_lyr_crs.to_file(os.path.join(main_dir, "shapefile",shp_file_name))
    print('Shapefile has been reprojected to EPSG:4326')
  else:
    shp_lyr_crs = shp_lyr_check
    print('Coordinate systems match!')

def download_DEM(shp_file_path, data_source, band_name, scale, main_dir, temporary_dir):
    """
    Download a Digital Elevation Model (DEM) based on the provided shapefile and parameters.

    Args:
        shp_file_path (str): Path to the directory containing the shapefile.
        data_source: Earth Engine data source.
        band_name (str): Name of the band in the DEM.
        scale (float): Scale for the DEM download.
        main_dir (str): Main directory path.
        temporary_dir (str): Temporary directory path.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Download DEM')
    print('-----------------------------------------------------------------------------------------------')

    # Define buffer size
    buffer_size = 0.01

    # Find name of shapefile
    shp_file_path = os.path.join(main_dir, 'shapefile')
    for shp_file in os.listdir(os.path.join(shp_file_path)):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Determine the boundary of the provided shapefile (all features combined)
    shp_layer = gpd.read_file(os.path.join(main_dir, 'shapefile', shp_file_name))
    west, south, east, north = (float(v) for v in shp_layer.total_bounds)

    # Pad each side by a fraction of the original width and height
    pad_x = buffer_size * (east - west)
    pad_y = buffer_size * (north - south)
    west, east = west - pad_x, east + pad_x
    south, north = south - pad_y, north + pad_y

    img = ee.Image(data_source)
    region = ee.Geometry.BBox(west, south, east, north)

    # Request a download URL for a GeoTIFF containing the selected band
    url = img.getDownloadUrl({
        'bands': [band_name],
        'region': region,
        'scale': scale,
        'format': 'GEO_TIFF'
    })

    # Define output directory for DEM
    dem_dir = os.path.join(temporary_dir, 'DEM')
    if not os.path.exists(dem_dir):
        os.makedirs(dem_dir)

    # Download DEM using requests
    response = requests.get(url)
    response.raise_for_status()  # Stop here if the server returned an error instead of a GeoTIFF
    with open(os.path.join(dem_dir, 'dem.tif'), 'wb') as fd:
        fd.write(response.content)

def format_and_visualize_dem(shp_file_path, temporary_dir, main_dir):
    """
    Clip DEM to the study area, reproject if necessary, and save the clipped DEM to drive.

    Args:
        shp_file_path (str): Path to the directory containing the shapefile.
        temporary_dir (str): Temporary directory path.
        main_dir (str): Main directory path.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Clip DEM to study area')
    print('-----------------------------------------------------------------------------------------------')

    # Find name of shapefile
    for shp_file in os.listdir(os.path.join(shp_file_path)):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Find name of DEM file
    for dem_file in os.listdir(os.path.join(os.path.join(temporary_dir, 'DEM'))):
        if dem_file.endswith(".tif"):
            dem_file_name = dem_file

    # Open raster DEM layer
    dem_lyr = rxr.open_rasterio(os.path.join(temporary_dir, 'DEM', dem_file_name), masked=True).squeeze()

    # Load shapefile (crop extent)
    crop_extent = gpd.read_file(os.path.join(main_dir, 'shapefile', shp_file_name))

    print('Shapefile CRS:', crop_extent.crs)
    print('DEM CRS:', dem_lyr.rio.crs)

    # Check if CRS match and reproject if necessary
    if crop_extent.crs != dem_lyr.rio.crs:
        dem_lyr = dem_lyr.rio.reproject(crop_extent.crs)
        print('\nDEM layer has been reprojected to match shapefile')
    else:
        print('\nCoordinate systems match')

    # Open crop extent (study area extent boundary)
    crop_extent_buffered = crop_extent.buffer(0.001)

    # Clip the DEM layer
    lidar_clipped = dem_lyr.rio.clip(crop_extent_buffered, crop_extent_buffered.crs)
    print('\nDEM layer has been clipped')

    # Define output directory path
    dem_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM')

    if not os.path.exists(dem_output_dir):
        os.makedirs(dem_output_dir)

    # Save clipped DEM layer to drive
    path_to_tif_file = os.path.join(dem_output_dir, 'dem.tif')
    lidar_clipped.rio.to_raster(path_to_tif_file)
    print('\nDEM layer has been saved to drive')


def visualize_dem(DEM_visualize, Aspect_visualize, save_aspect, main_dir, temporary_dir):
    """
    Visualize the DEM layer, compute the aspect layer, and optionally save the aspect layer as a shapefile.

    Args:
        DEM_visualize (str): 'yes' to visualize DEM layer, 'no' otherwise.
        Aspect_visualize (str): 'yes' to visualize aspect layer, 'no' otherwise.
        save_aspect (str): 'yes' to save shapefile of aspect layer, 'no' otherwise.
        main_dir (str): Main directory path.
        temporary_dir (str): Temporary directory path.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize DEM layer and produce slope and aspect layers')
    print('-----------------------------------------------------------------------------------------------')

    # Define output directory path for slope and aspect
    slope_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Slope')
    aspect_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Aspect')

    # Create output directories if they don't exist
    for output_dir in [slope_output_dir, aspect_output_dir]:
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)

    if DEM_visualize == 'yes':
        for dem_file in os.listdir(os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM')):
            if dem_file.endswith(".tif"):
                print('---------------------------------------------- DEM Layer ----------------------------------------------')
                # Open and visualize DEM layer
                dem_lyr = rxr.open_rasterio(os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM', dem_file),
                                            masked=True).squeeze()
                f, ax = plt.subplots()
                dem_lyr.plot(ax=ax)
                ax.set_axis_off()
                plt.show()

                # Calculate and print statistics of DEM layer
                dem_min = int(dem_lyr.min())
                dem_max = int(dem_lyr.max())
                dem_mean = int(dem_lyr.mean())
                print('\nMinimum Elevation:', dem_min)
                print('Maximum Elevation:', dem_max)
                print('Mean Elevation:', dem_mean)

    def calculate_aspect(dem_file, output_path):
        """
        Calculate the aspect of a DEM and save the aspect raster.

        :param dem_file: Path to the input DEM file.
        :param output_path: Path to save the aspect output.
        """
        with rasterio.open(dem_file) as src:
            # Read DEM as a 2D numpy array
            dem = src.read(1, resampling=Resampling.bilinear)
            transform = src.transform
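            no_data = src.nodata if src.nodata is not None else -9999  # Default to -9999 if no_data is None

            # Handle no_data values (cast to float first so NaN can be stored)
            dem = dem.astype("float64")
            dem[dem == no_data] = np.nan

            # Calculate gradients; np.gradient returns the row (north-south) axis first
            dz_drow, dz_dcol = np.gradient(dem, np.abs(transform[4]), np.abs(transform[0]))

            # Calculate aspect as the compass bearing of the downslope direction
            # (0 = north, 90 = east); rows run north to south in a north-up raster
            aspect = np.arctan2(-dz_dcol, dz_drow)  # Aspect in radians
            aspect = np.degrees(aspect)  # Convert to degrees
            aspect = (aspect + 360) % 360  # Normalize to [0, 360)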

            # Save aspect as a GeoTIFF
            profile = src.profile
            profile.update(dtype=rasterio.float32, count=1, nodata=np.nan)
            with rasterio.open(output_path, 'w', **profile) as dst:
                dst.write(aspect.astype(rasterio.float32), 1)

        print(f"Aspect saved to {output_path}")
        return aspect

    if Aspect_visualize == 'yes':
        dem_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM')
        aspect_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Aspect')
        os.makedirs(aspect_output_dir, exist_ok=True)

        for dem_file in os.listdir(dem_dir):
            if dem_file.endswith(".tif"):
                print('---------------------------------------------- Aspect Layer ----------------------------------------------')
                dem_path = os.path.join(dem_dir, dem_file)
                aspect_path = os.path.join(aspect_output_dir, "aspect.tif")

                # Calculate aspect
                aspect_array = calculate_aspect(dem_path, aspect_path)

                # Visualize aspect
                plt.figure(figsize=(8, 9))
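                with rasterio.open(dem_path) as dem_src:
                    dem_bounds = dem_src.bounds
                # imshow expects extent as (left, right, bottom, top)
                plt.imshow(aspect_array, cmap='jet',
                           extent=(dem_bounds.left, dem_bounds.right, dem_bounds.bottom, dem_bounds.top))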
                plt.colorbar(label="Aspect (degrees)")
                plt.title("Aspect Layer")
                plt.xlabel("Longitude")
                plt.ylabel("Latitude")
                plt.show()

                # Calculate statistics
                aspect_min = int(np.nanmin(aspect_array))
                aspect_max = int(np.nanmax(aspect_array))
                aspect_mean = int(np.nanmean(aspect_array))
                print('\nMinimum Aspect:', aspect_min)
                print('Maximum Aspect:', aspect_max)
                print('Mean Aspect:', aspect_mean)

    if save_aspect == "yes":
        # Save shapefile of aspect layer
        aspect_rast = os.path.join(aspect_output_dir, 'aspect.tif')
        aspect_shp = os.path.join(aspect_output_dir, 'aspect.shp')

        # Use gdal_polygonize to convert raster to vector (shapefile)
        with open(os.path.join(temporary_dir, 'polygon.sh'), 'w') as f3:
            print(f'gdal_polygonize.py "{aspect_rast}" "{aspect_shp}" -b 1 -f "ESRI Shapefile"', file=f3)

        sh_file = os.path.join(temporary_dir, 'polygon.sh')
        subprocess.run(['bash', sh_file])

        # Open and format the shapefile
        aspect_shp_gdf = gpd.read_file(aspect_shp)
        aspect_shp_gdf["O_ID_2"] = list(range(1, (len(aspect_shp_gdf.index) + 1)))
        aspect_shp_gdf["Aspect"] = aspect_shp_gdf.DN

        # Save the final aspect shapefile to drive
        aspect_shp_gdf.to_file(aspect_shp)
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) DEM Complete!')
    print('-----------------------------------------------------------------------------------------------')

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))


In [None]:
#@markdown <font color=#5559AB> **Download DEM data** </font>

#@markdown By default, Magpie is set up to download the MERIT DEM. However, by changing the data source, band name, and scale (information available in Earth Engine Data Catalog) users can download other DEM datasets available on Google Earth Engine

#@markdown Please refer to the [Earth Engine Data Catalog](https://developers.google.com/earth-engine/datasets/catalog) to view other data source options

#@markdown If Earth Engine cannot download the extent of your study area, increase size of scale

#@markdown <font color=grey> for example, adjust scale to "90"

# shapefile path
shp_file_path = os.path.join(main_dir, 'shapefile')

data_source = "MERIT/DEM/v1_0_3" #@param {type:"string"}

band_name = "dem" #@param {type:"string"}

scale = 90 #@param

# check projection
check_projection(shp_file_path, main_dir,temporary_dir)
# download DEM
download_DEM(shp_file_path, data_source, band_name, scale, main_dir, temporary_dir)

In [None]:
#@markdown <font color=grey> **Clip DEM layer to study area** </font>

format_and_visualize_dem(shp_file_path,temporary_dir,main_dir)

In [None]:
#@markdown <font color=#5559AB> **Check the box to visualize clipped DEM layer** </font>

#@markdown Aspect is computed from the DEM layer to give users additional information about their study area <br>
DEM_visualize = True #@param {type:"boolean"}
Aspect_visualize = True #@param {type:"boolean"}

#@markdown Check box to save a shapefile layer of aspect to drive  </font> <br>
save_aspect = False #@param {type:"boolean"}

if DEM_visualize == True:
  DEM_visualize = "yes"
else:
  DEM_visualize = "no"
if Aspect_visualize == True:
  Aspect_visualize = "yes"
else:
  Aspect_visualize = "no"
if save_aspect == True:
  save_aspect = "yes"
else:
  save_aspect = "no"

visualize_dem(DEM_visualize, Aspect_visualize, save_aspect, main_dir, temporary_dir)

In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download

# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file == True:
  data = {
      "_comment": "------------ 1b) DEM ---------------",
      "generate_DEM": "yes",
      "data_source_dem": f"{data_source}",
      "band_name_dem": f"{band_name}",
      "scale_dem": f"{scale}",
      "DEM_visualize": f"{DEM_visualize}",
      "Aspect_visualize": f"{Aspect_visualize}",
      "save_aspect": f"{save_aspect}",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1b_dem.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


**References**

Barnes, Richard. 2016. RichDEM: Terrain Analysis Software. http://github.com/r-barnes/richdem

Barnes, R., Lehman, C., Mulla, D., 2014. Priority-flood: An optimal depression-filling and watershed-labeling algorithm for digital elevation models. Computers & Geosciences 62, 117–127. doi:10.1016/j.cageo.2013.04.024

Yamazaki, D., Ikeshima, D., Tawatari, R., Yamaguchi, T., O'Loughlin, F., Neal, J. C., ... & Bates, P. D. (2017). A high‐accuracy map of global terrain elevations. Geophysical Research Letters, 44(11), 5844-5853.

### <font color=grey> **Generate DEM - R elevatr Package** </font>

Check out the short video [Creating DEM Layer in Magpie Workflow](https://youtu.be/S8rh7aovu_0) for more information
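
This subsection derives slope and aspect from the downloaded DEM with R's `terrain` function. As a rough illustration of what those two quantities mean, the sketch below estimates them for the centre cell of a 3x3 elevation window using simple central differences. The function name, the window values, and the 30 m cell size are assumptions made for illustration; `terrain` uses its own neighbourhood algorithm and may return slightly different numbers.

```python
import math


def slope_aspect_center(window, cell_size):
    """Slope and aspect (degrees) at the centre of a 3x3 elevation window.

    window[0] is the northern row. Aspect is the compass bearing of the
    downslope direction (0 = north, 90 = east). Hypothetical helper.
    """
    # Central differences: elevation change per metre towards east and north
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)
    dz_dy = (window[0][1] - window[2][1]) / (2 * cell_size)

    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    # The downslope vector is (-dz_dx, -dz_dy); bearing = atan2(east, north)
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect


# Terrain rising towards the east, so the surface faces west
window = [[100, 110, 120],
          [100, 110, 120],
          [100, 110, 120]]
slope, aspect = slope_aspect_center(window, cell_size=30.0)
print(round(slope, 1), round(aspect, 1))
```

On perfectly flat cells the aspect is undefined, so any value returned there should be treated as a placeholder rather than a direction.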

In [None]:
#@markdown <font color=#5559AB> **Check the box to generate/visualize the following layers** </font>

#@markdown Slope and aspect are computed from the DEM layer to provide users additional information about their study area </font> <br>
DEM_visualize = True #@param {type:"boolean"}
Slope_visualize = True #@param {type:"boolean"}
Aspect_visualize = True #@param {type:"boolean"}

#@markdown Check box to save a shapefile layer of aspect to drive  </font> <br>
save_aspect = False #@param {type:"boolean"}

# check libraries
libraries_to_check = ["rpy2==3.5.1","requests"]
check_and_install_libraries(libraries_to_check)

%load_ext rpy2.ipython

import requests

# File path for output
output_file = "/content/variable_info.txt"

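# Writing to a file ("yes"/"no" strings are what the R cells compare against)
with open(output_file, "w") as file:
    file.write("main_dir = '{}'\n".format(main_dir))
    file.write("DEM_visualize = '{}'\n".format("yes" if DEM_visualize in (True, "yes") else "no"))
    file.write("Slope_visualize = '{}'\n".format("yes" if Slope_visualize in (True, "yes") else "no"))
    file.write("Aspect_visualize = '{}'\n".format("yes" if Aspect_visualize in (True, "yes") else "no"))
    file.write("save_aspect_shp = '{}'\n".format("yes" if save_aspect in (True, "yes") else "no"))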

In [None]:
#@markdown <font color=grey> **Install and Load packages**

%%R

# Define the library path and packages to check
library_path <- "/content/google_drive/MyDrive/R_Packages"
packages <- c("terra", "sf", "slippymath", "raster", "s2", "elevatr")

# Create the library path if it doesn't exist
if (!dir.exists(library_path)) {
  dir.create(library_path, recursive = TRUE)
}

# Function to check and install missing packages
check_and_install <- function(package_name) {
  if (!require(package_name, character.only = TRUE, lib.loc = library_path)) {
    install.packages(package_name, lib = library_path, repos = "http://cran.r-project.org")
    library(package_name, lib.loc = library_path, character.only = TRUE)
  }
}

# Check and install each package
for (pkg in packages) {
  check_and_install(pkg)
}

# Example of loading a specific library after the check
library(elevatr, lib.loc = library_path)
library(terra, lib.loc = library_path)


In [None]:
#@markdown <font color=grey> **Set paths for variables**

%%R

# Define the file path
file_path <- "/content/variable_info.txt"

# Read the file
lines <- readLines(file_path)

# Initialize variables
main_dir <- NULL
DEM_visualize <- NULL
Slope_visualize <- NULL
Aspect_visualize <- NULL
save_aspect_shp <- NULL

# Extract values from the lines
for (line in lines) {
  if (grepl("main_dir", line)) {
    main_dir <- sub("main_dir = '(.*)'", "\\1", line)
  } else if (grepl("DEM_visualize", line)) {
    DEM_visualize <- sub("DEM_visualize = '(.*)'", "\\1", line)
  } else if (grepl("Slope_visualize", line)) {
    Slope_visualize <- sub("Slope_visualize = '(.*)'", "\\1", line)
  } else if (grepl("Aspect_visualize", line)) {
    Aspect_visualize <- sub("Aspect_visualize = '(.*)'", "\\1", line)
  } else if (grepl("save_aspect_shp", line)) {
    save_aspect_shp <- sub("save_aspect_shp = '(.*)'", "\\1", line)
  }
}

In [None]:
#@markdown <font color=grey> **Download DEM layer**

%%R

# Construct the full path to shapefile
shp_path <- file.path(main_dir, 'shapefile', 'studyArea_outline.shp')

# Read the polygon shapefile
shp_file <- st_read(shp_path)

# Fetch elevation data using elevatr
# Ensure the CRS of the shapefile is appropriate for the elevatr package (e.g., WGS84 or EPSG:4326)
study_area_elev <- st_transform(shp_file, crs = 4326)  # Transform to WGS84 if needed
shp_buffer <- st_buffer(study_area_elev, 500)

# download DEM
elevation_data <- get_elev_raster(locations = shp_buffer,
                                  z = 10,  # Adjust zoom level for desired resolution
                                  prj = st_crs(shp_buffer)$wkt,
                                  clip = "locations")

# overlay shapefile on DEM
cat('Overlay shapefile on DEM')
plot(elevation_data)
plot(st_geometry(study_area_elev), add = TRUE, col = "black")

In [None]:
#@markdown <font color=grey> **Save and visualization outputs**

%%R

# DEM
if (DEM_visualize == 'yes') {
  # Plot elevation data
  plot(elevation_data, main = "Elevation")
}

# Construct the full path to DEM
elev_path <- file.path(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM', 'dem.tif')

# Check if the output directory exists, and create it if it doesn't
if (!dir.exists(dirname(elev_path))) {
  dir.create(dirname(elev_path), recursive = TRUE)
}

# Save the results as a file
writeRaster(elevation_data, elev_path, overwrite = TRUE)

# compute slope
slope_raster <- terrain(elevation_data, opt = "slope", unit = "degrees")

if (Slope_visualize == 'yes') {
  # plot
  plot(slope_raster, main = "Slope (Degrees)")
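  # save (create the output directory first if it doesn't exist)
  slope_path <- file.path(main_dir, 'workflow_outputs', '1_HRU_data', 'Slope', 'slope.tif')
  dir.create(dirname(slope_path), recursive = TRUE, showWarnings = FALSE)
  writeRaster(slope_raster, slope_path, overwrite = TRUE)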
}

# compute aspect
aspect_raster <- terrain(elevation_data, opt = "aspect", unit = "degrees")

# Create the aspect output directory if it doesn't exist
aspect_dir <- file.path(main_dir, 'workflow_outputs', '1_HRU_data', 'Aspect')
dir.create(aspect_dir, recursive = TRUE, showWarnings = FALSE)

if (Aspect_visualize == 'yes') {
  # plot
  plot(aspect_raster, main = "Aspect (Degrees)")
  # save
  aspect_path <- file.path(aspect_dir, 'aspect.tif')
  writeRaster(aspect_raster, aspect_path, overwrite = TRUE)
}

# Checked separately so the shapefile can be saved whether or not aspect is plotted
if (save_aspect_shp == 'yes') {
  # Convert aspect raster to polygons
  aspect_path_shp <- file.path(aspect_dir, 'aspect.shp')
  aspect_polygons <- as.polygons(terra::rast(aspect_raster))
  # Save aspect polygons as a shapefile
  st_write(st_as_sf(aspect_polygons), aspect_path_shp, delete_layer = TRUE)
}


In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download

# delete temporary file
os.remove('/content/variable_info.txt')

## <font color=#5559AB> 1c) Elevation Bands </font>

The elevation band layer is a polygon shapefile derived from the DEM layer. It groups the study area into elevation ranges so that precipitation and temperature can be distributed more accurately in the model. Users have the option to define the increment of the elevation bands.

All that is required is a <font color=red>DEM layer</font> of the study area.
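
To make the idea of an elevation band concrete, the sketch below splits an elevation range into bands of a fixed increment and looks up the band that contains a given elevation. It is plain Python with made-up numbers; the function names and the 100 m increment are illustrative assumptions, and the workflow itself operates on the DEM raster and produces a polygon shapefile.

```python
import math


def band_edges(min_elev, max_elev, increment):
    """Return band boundaries covering [min_elev, max_elev] in steps of `increment`."""
    lower = math.floor(min_elev / increment) * increment
    upper = math.ceil(max_elev / increment) * increment
    count = max(1, int(round((upper - lower) / increment)))
    return [lower + i * increment for i in range(count + 1)]


def band_index(elevation, edges):
    """Index of the band containing `elevation`; bands are [low, high)."""
    increment = edges[1] - edges[0]
    index = int((elevation - edges[0]) // increment)
    return min(index, len(edges) - 2)  # keep the top edge in the last band


edges = band_edges(min_elev=237, max_elev=512, increment=100)
print(edges)                   # [200, 300, 400, 500, 600]
print(band_index(345, edges))  # 1
```

A smaller increment gives more, narrower bands, which resolves elevation-dependent forcing in more detail at the cost of more polygons.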



### <font color=grey> **Upload Elevation Bands** </font>

In [None]:
#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions used in this subsection, making them available to the cells that follow. <br>

def upload_elev_bands(reclassify_dir, main_dir):
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Upload Elevation Band Shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Find the name of the shapefile in the 'shapefile' directory
    shp_file_name = next((shp_file for shp_file in os.listdir(os.path.join(main_dir, 'shapefile')) if shp_file.endswith(".shp")), None)

    # Find the name of the elevation band shapefile in the reclassify directory
    elevB_file_name = next((elevB_file for elevB_file in os.listdir(reclassify_dir) if elevB_file.endswith(".shp")), None)

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Check Projection')
    print('-----------------------------------------------------------------------------------------------')

    # Load the study area shapefile
    shp_extent = gpd.read_file(os.path.join(main_dir, 'shapefile', shp_file_name))

    # Load the elevation band shapefile
    elevB_extent = gpd.read_file(os.path.join(reclassify_dir, elevB_file_name))

    if shp_extent.crs != elevB_extent.crs:
        # Reproject the elevation band layer to match the study area shapefile
        elevB_extent = elevB_extent.to_crs(shp_extent.crs)
        print('\nElevation Band layer has been reprojected to match the shapefile')
    else:
        print('\nCoordinate systems match')

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize and save elevation bands')
    print('-----------------------------------------------------------------------------------------------')

    # Display the table
    display(elevB_extent)

    # Visualize elevation bands
    f, ax = plt.subplots(figsize=(9, 10))
    elevB_extent.plot(categorical=False, legend=True, ax=ax)
    ax.set(title="Elevation Bands")
    ax.set_axis_off()
    plt.show()

    # Save the final elevation band shapefile to drive
    elev_band_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Elevation_band')
    os.makedirs(elev_band_output_dir, exist_ok=True)
    elevB_extent.to_file(os.path.join(elev_band_output_dir, 'studyArea_elev_band.shp'))

def remove_temp_data(temporary_dir):
    if os.path.exists(os.path.join(temporary_dir)):
      shutil.rmtree(os.path.join(temporary_dir))


In [None]:
#@markdown <font color=grey> **Upload Elevation Band Layer** </font>

#@markdown drag-and-drop the elevation band shapefile (.shp) layer into the specified folder

# set directory/path
reclassify_dir = os.path.join(temporary_dir, 'elev_band')
if not os.path.exists(reclassify_dir):
  os.makedirs(reclassify_dir)
print('\n-----------------------------------------------------------------------------------------------')
print('( ) Upload elevation band layer')
print('-----------------------------------------------------------------------------------------------')
print(f'Drag and drop the elevation band file into the following folder: {reclassify_dir}')
response = input("Have you uploaded the elevation band (.shp) file (yes or no): ")
if response.strip().lower() == "yes":
  upload_elev_bands(reclassify_dir, main_dir)
  remove_temp_data(temporary_dir)


In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

# Only write the file when the user has ticked the checkbox above
if write_to_configuration_file == True:
  data = {
      "_comment": "------------ 1c) ELEVATION BANDS ---------------",
      "generate_elev_bands": "no",
      "upload_elev_bands": "yes",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1c_elevationBand.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


### <font color=grey> **Generate Elevation Bands** </font>

Check out the short video [Creating Elevation Bands in Magpie Workflow](https://youtu.be/F5bH0wtPPhM) for more information.
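
As a rough illustration of how the band thresholds in this subsection are derived, the sketch below isolates the increment loop used in `reclassify_dem`. The helper name `band_thresholds` is hypothetical and not part of Magpie. It only mirrors the idea of starting at `min_elev` and stepping by `increment_val` while the current threshold is still more than one increment below `dem_max`.

```python
def band_thresholds(min_elev, increment_val, dem_max):
    """Return the list of elevation thresholds that delimit the bands.

    Hypothetical helper (not part of Magpie) that mirrors the while-loop
    in reclassify_dem.
    """
    if increment_val <= 0:
        raise ValueError("increment_val must be positive")

    thresholds = [min_elev]
    current = min_elev
    # Keep stepping while the current threshold is more than one
    # increment below the DEM maximum
    while current < (dem_max - increment_val):
        current += increment_val
        thresholds.append(current)
    return thresholds


# Example: DEM maximum of 750 m, bands starting at 300 m in 100 m steps
print(band_thresholds(300, 100, 750))  # [300, 400, 500, 600, 700]
```

Printing the thresholds before running the reclassification is a quick way to check that the chosen `min_elev` and `increment_val` give a sensible number of bands for the DEM range reported by `min_max_elev`.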

In [None]:
# check libraries
libraries_to_check = ["zipfile"]
check_and_install_libraries(libraries_to_check)

import zipfile

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Loading functions in Python involves defining them in your script or notebook, making the functions available for use. <br>

def min_max_elev(reclassify_dir, main_dir):
    """
    Determine the minimum and maximum elevation of a Digital Elevation Model (DEM).

    Args:
        reclassify_dir (str): Directory for temporary files, including the downloaded package.
        main_dir (str): Main directory path.

    Returns:
        int: Maximum elevation of the DEM.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Determine minimum and maximum elevation')
    print('-----------------------------------------------------------------------------------------------')

    # Create reclassify_dir if it doesn't exist
    if not os.path.exists(reclassify_dir):
        os.makedirs(reclassify_dir)

    # Download and extract the gdal_reclassify package
    reclassify_url = 'https://github.com/hburdett1/Magpie_Workflow_Developer/archive/refs/heads/main.zip'
    wget.download(reclassify_url, out=reclassify_dir)

    # Unzip the downloaded package
    extension = ".zip"
    os.chdir(reclassify_dir)
    for item in os.listdir(reclassify_dir):
        if item.endswith(extension):
            file_name = os.path.abspath(item)
            zip_ref = zipfile.ZipFile(file_name)
            zip_ref.extractall(reclassify_dir)
            zip_ref.close()
            os.remove(file_name)

    # Find name of DEM file
    dem_path = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM')
    for dem_file in os.listdir(os.path.join(dem_path)):
        if dem_file.endswith(".tif"):
            dem_file_name = dem_file

    # Open DEM layer using rasterio and rioxarray
    src = rxr.open_rasterio(os.path.join(dem_path, dem_file_name), masked=True).squeeze()

    # Determine and print minimum elevation of DEM layer
    dem_min = int(src.min())
    print('\nMinimum Elevation: ', dem_min)

    # Determine and print maximum elevation of DEM layer
    dem_max = int(src.max())
    print('Maximum Elevation: ', dem_max)

    return dem_max

def reclassify_dem(increment_val, min_elev, dem_max, reclassify_dir, main_dir, temporary_dir):
    """
    Reclassify a Digital Elevation Model (DEM) based on elevation bands and convert it to a shapefile.

    Args:
        increment_val (float): Increment value for elevation bands.
        min_elev (float): Minimum elevation value.
        dem_max (float): Maximum elevation value.
        reclassify_dir (str): Directory for temporary files, including the reclassify script.
        main_dir (str): Main directory path.
        temporary_dir (str): Temporary directory path.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Reclassify DEM')
    print('-----------------------------------------------------------------------------------------------')

    # Initialize variables for elevation bands
    min_val = min_elev
    inc_lst = ['<=0', f'<{min_elev}']

    # Generate elevation bands
    while min_elev < (dem_max - increment_val):
        min_elev = min_elev + increment_val
        inc_lst.append(f'<{min_elev}')

    # Join elevation bands for gdal_reclassify command
    band_range = ','.join(str(x) for x in inc_lst)
    print('Elevation Band Increments: ', band_range)

    # Create band IDs
    id_vals = list(range(len(inc_lst)))
    band_id = ','.join(str(i) for i in id_vals)
    print('Elevation Band IDs: ', band_id)
    print('\n')

    # Define paths for gdal commands
    gdal_reclassify = os.path.join(reclassify_dir, 'Magpie_Workflow_Developer-main', '1_Data_Collection','gdal_reclassify.py')
    elev_band_rast_path = os.path.join(reclassify_dir, 'elev_bands_rast.tif')
    elev_band_shp_path = os.path.join(reclassify_dir, 'elev_bands.shp')

    # Find name of DEM file
    dem_path = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'DEM')
    for dem_file in os.listdir(dem_path):
        if dem_file.endswith(".tif"):
            dem_file_name = dem_file
    dem_path_full = os.path.join(dem_path, dem_file_name)

    # Write bash command to reclassify
    with open(os.path.join(temporary_dir, 'reclass.sh'), 'w') as f1:
        print(f'python "{gdal_reclassify}" "{dem_path_full}" "{elev_band_rast_path}" -c "{band_range}" -r "{band_id}" -d 0 -n true -p "COMPRESS=LZW"', file=f1)

    reclass_sh_file = os.path.join(temporary_dir, 'reclass.sh')
    subprocess.run(['bash', reclass_sh_file])

    # Write bash command to convert raster to shapefile
    with open(os.path.join(temporary_dir, 'polygonize.sh'), 'w') as f2:
        print(f'gdal_polygonize.py "{elev_band_rast_path}" "{elev_band_shp_path}" -b 1 -f "ESRI Shapefile"', file=f2)

    poly_sh_file = os.path.join(temporary_dir, 'polygonize.sh')
    subprocess.run(['bash', poly_sh_file])

def visualize_elev_bands_func(increment_val, min_elev, temporary_dir, main_dir, dem_max):
    """
    Visualize and save elevation bands based on a shapefile with elevation band information.

    Args:
        increment_val (float): Increment value for elevation bands.
        min_elev (float): Minimum elevation value.
        temporary_dir (str): Temporary directory path.
        main_dir (str): Main directory path.
        dem_max (float): Maximum elevation value.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize and save elevation bands')
    print('-----------------------------------------------------------------------------------------------')

    # Load polygon shapefile
    elev_band_shp_path = os.path.join(temporary_dir, 'elev_band', 'elev_bands.shp')
    elev_band_shp = gpd.read_file(elev_band_shp_path)

    # Aggregate elevation bands based on ID values
    elev_band_dissolve = elev_band_shp.dissolve(by='DN')
    elev_band_dissolve.to_file(os.path.join(temporary_dir, 'elev_bands_dissolved.shp'))

    # Create lists of minimum and maximum contour values
    inc_min = [min_elev]
    while min_elev < (dem_max - increment_val):
        min_elev = min_elev + increment_val
        inc_min.append(min_elev)
    inc_min_lst = inc_min[:-1]
    max_inc = [max_val + increment_val for max_val in inc_min_lst]

    print('\nMinimum Contour Lines: ', inc_min_lst)
    print('Maximum Contour Lines: ', max_inc)

    # Format shapefile with contour values
    elev_band_final = gpd.read_file(os.path.join(temporary_dir, 'elev_bands_dissolved.shp'))
    elev_band_final["O_ID_1"] = list(range(1, (len(elev_band_final["DN"]) + 1)))  # Define new ID column
    elev_band_final["ContourMin"] = inc_min_lst  # Assign minimum contour values
    elev_band_final["ContourMax"] = [max_val + increment_val for max_val in inc_min_lst]  # Create list of maximum contour values
    del elev_band_final["DN"]  # Remove old ID column

    # Display the table
    display(elev_band_final)

    # Visualize elevation bands
    f, ax = plt.subplots(figsize=(9, 10))
    elev_band_shp.plot(column='DN', categorical=False, legend=True, ax=ax)
    ax.set(title="Elevation Bands")
    ax.set_axis_off()
    plt.show()

    # Save the final elevation band shapefile to drive
    elev_band_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Elevation_band')
    if not os.path.exists(elev_band_output_dir):
        os.makedirs(elev_band_output_dir)
    elev_band_final.to_file(os.path.join(elev_band_output_dir, 'studyArea_elev_band.shp'))

    # Remove temporary directory
    shutil.rmtree(os.path.join(temporary_dir))

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Elevation Bands Complete!')
    print('-----------------------------------------------------------------------------------------------')

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))


In [None]:
#@markdown <font color=grey> **Determine minimum and maximum elevation** </font>

# set directory/path
reclassify_dir = os.path.join(temporary_dir, 'elev_band')
if not os.path.exists(reclassify_dir):
  os.makedirs(reclassify_dir)

dem_max = min_max_elev(reclassify_dir, main_dir)

In [None]:
#@markdown <font color=#5559AB> **Define minimum elevation value** </font><br>
#@markdown <font color=grey> e.g., 100 </font>

min_elev = 300 #@param

#@markdown <font color=#5559AB> **Define increment of elevation band values** </font><br>
#@markdown <font color=grey> e.g., 100 </font>

increment_val = 100 #@param

reclassify_dem(increment_val, min_elev, dem_max, reclassify_dir, main_dir, temporary_dir)

In [None]:
#@markdown <font color=grey> **Visualize elevation bands** </font>

visualize_elev_bands_func(increment_val, min_elev, temporary_dir, main_dir, dem_max)
print('Elevation band shapefile saved to drive')


In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download

# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File** </font>

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file == True:
  data = {
      "_comment": "------------ 1c) ELEVATION BANDS ---------------",
      "generate_elev_bands": "yes",
      "upload_elev_bands": "no",
      "min_elev": f"{min_elev}",
      "increment_val": f"{increment_val}",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1c_elevationBand.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)
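

The configuration file written above stores every decision as a string. The sketch below shows one way such a file could be read back and converted to usable values. How Magpie Developer actually parses the file is not shown in this notebook, so the conversion step is an assumption, and the temporary folder is a hypothetical stand-in for `main_dir/configuration_files`.

```python
import json
import os
import tempfile

# Hypothetical stand-in for os.path.join(main_dir, "configuration_files")
config_path = tempfile.mkdtemp()
file_path = os.path.join(config_path, "1c_elevationBand.json")

# Same structure as the dictionary written by the cell above (example values)
data = {
    "_comment": "------------ 1c) ELEVATION BANDS ---------------",
    "generate_elev_bands": "yes",
    "upload_elev_bands": "no",
    "min_elev": "300",
    "increment_val": "100",
}
with open(file_path, "w") as json_file:
    json.dump(data, json_file, indent=2)

# Read the file back and convert the string values (assumed conversion)
with open(file_path) as json_file:
    loaded = json.load(json_file)

min_elev = float(loaded["min_elev"])
increment_val = float(loaded["increment_val"])
generate = loaded["generate_elev_bands"] == "yes"

print(min_elev, increment_val, generate)  # 300.0 100.0 True
```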


## <font color=#5559AB> 1d) Landcover </font>

The landcover layer is a polygon shapefile derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Type ([MCD12Q1](https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD12Q1#description)) Version 6.1 data set available through Google Earth Engine. This dataset provides a harmonized view of the physical cover of Earth’s surface based on supervised classifications of MODIS Terra and Aqua reflectance data at yearly intervals.

In this section, users can download the MODIS MCD12Q1 land cover product, clip it to the study area, and save the resulting landcover shapefile to their drive.

Only a <font color=red>shapefile of the study area</font> is required to run this subsection.
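
The functions in this subsection also report the share of the study area covered by each landcover class. The sketch below shows that calculation on plain Python data, without GeoPandas. The helper name `area_percent_by_class` and the sample values are hypothetical. In the workflow itself the areas come from the polygon geometries and the class IDs from the `Landuse_ID` column.

```python
from collections import defaultdict


def area_percent_by_class(polygons):
    """Return {landuse_id: percent of total area} for (landuse_id, area) pairs.

    Hypothetical helper (not part of Magpie) illustrating the
    group-and-normalize step.
    """
    totals = defaultdict(float)
    for landuse_id, area in polygons:
        totals[landuse_id] += area

    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total area is zero")

    return {
        landuse_id: round(100 * area / grand_total, 2)
        for landuse_id, area in sorted(totals.items())
    }


# Made-up polygons: (Landuse_ID, area)
sample = [(1, 2.0), (1, 1.0), (5, 1.0), (10, 4.0)]
print(area_percent_by_class(sample))  # {1: 37.5, 5: 12.5, 10: 50.0}
```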

### <font color=grey> **Upload Landcover** </font>

In [None]:
# check libraries
libraries_to_check = ["shapely"]
check_and_install_libraries(libraries_to_check)

from shapely.geometry import box
from IPython.display import display

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Loading functions in Python involves defining them in your script or notebook, making the functions available for use. <br>

def check_projection(shp_file_path, main_dir, temporary_dir):
    """
    Check the projection of a shapefile and reproject if necessary.

    Parameters:
    - shp_file_path: Path to the shapefile.
    - main_dir: Main directory containing the shapefile.
    - temporary_dir: Temporary directory for storing reprojected shapefiles.

    Returns:
    - shp_file_name: Name of the shapefile.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Check projection of shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Find name of shapefile
    for shp_file in os.listdir(os.path.join(shp_file_path)):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    print('Shapefile name: ', shp_file_name)
    print('Shapefile path: ', os.path.join(main_dir, "shapefile", shp_file_name))

    shp_lyr_check = gpd.read_file(os.path.join(main_dir, "shapefile", shp_file_name))
    print('Shapefile CRS: ', shp_lyr_check.crs)

    temp_dir = os.path.join(temporary_dir)
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)

    if shp_lyr_check.crs != 'EPSG:4326':
        # Reproject
        shp_lyr_crs = shp_lyr_check.to_crs(epsg=4326)
        shp_lyr_crs.to_file(os.path.join(temporary_dir, shp_file_name))
        print('Shapefile layer has been reprojected to EPSG:4326')
    else:
        shp_lyr_crs = shp_lyr_check
        print('Coordinate systems match!')
        shp_lyr_crs.to_file(os.path.join(temporary_dir, shp_file_name))

    return shp_file_name

def overlay_shp_on_landcov(shp_file_path, shp_file_name, temporary_dir):
    """
    Overlay shapefile on landcover data and visualize the result.

    Parameters:
    - shp_file_path: Path to the shapefile.
    - shp_file_name: Name of the shapefile.
    - temporary_dir: Temporary directory for storing landcover data.

    Returns:
    - landcov_lyr: Landcover layer.
    - crop_extent: Crop extent layer.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Overlay shapefile on landcover')
    print('-----------------------------------------------------------------------------------------------')

    # Find name of raster
    for tif_file in os.listdir(os.path.join(temporary_dir, 'Landcover')):
        if tif_file.endswith(".tif"):
            tif_file_name = tif_file

    # Open raster
    landcov_lyr = rxr.open_rasterio(os.path.join(temporary_dir, 'Landcover', tif_file_name), masked=True).squeeze()
    # Load shapefile
    crop_extent = gpd.read_file(os.path.join(temporary_dir, shp_file_name))

    print('Shapefile CRS: ', crop_extent.crs)
    print('Landcover CRS: ', landcov_lyr.rio.crs)

    if crop_extent.crs != landcov_lyr.rio.crs:
        # Reproject
        landcov_lyr = landcov_lyr.rio.reproject(crop_extent.crs)
        print('Landcover layer has been reprojected to match the shapefile')
    else:
        print('Coordinate systems match!')

    f, ax = plt.subplots(figsize=(10, 5))
    landcov_lyr.plot.imshow(ax=ax)

    crop_extent.plot(ax=ax, alpha=.8, color="black")
    ax.set(title="Raster Layer with Shapefile Overlayed")
    ax.set_axis_off()

    landcov_shapfile_visualization = True

    if landcov_shapfile_visualization:
        plt.show()

    return landcov_lyr, crop_extent

def clip_and_format(landcov_dir, landcov_lyr, crop_extent, temporary_dir):
    """
    Clip and format landcover into a shapefile.

    Parameters:
    - landcov_dir: Directory to save the processed data.
    - landcov_lyr: Landcover layer to be clipped.
    - crop_extent: Study area extent boundary.
    - temporary_dir: Temporary directory to store intermediate files.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Clip and format landcover into shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Buffer the crop extent (the study area boundary) slightly before clipping
    crop_extent1 = crop_extent.buffer(.001)

    # Clip the landcover layer
    lidar_clipped = landcov_lyr.rio.clip(crop_extent1, crop_extent1.crs)
    print('Landcover layer has been clipped')

    # Save clipped landcover layer to drive
    path_to_tif_file = os.path.join(landcov_dir, 'clipped_lyr.tif')
    lidar_clipped.rio.to_raster(path_to_tif_file)
    print('Layer has been saved to temporary folder')

    # Define pathways necessary for GDAL Commands
    clip_name = 'studyArea_outline'
    clipped = os.path.join(landcov_dir, 'clipped.tif')
    clipped_lyr = os.path.join(landcov_dir, 'clipped_lyr.tif')
    landcov_shp = os.path.join(landcov_dir, 'studyArea_landcov.shp')
    bash_dir = os.path.join(temporary_dir, 'bash_scripts')

    # Create the bash directory if it doesn't exist
    if not os.path.exists(bash_dir):
        os.makedirs(bash_dir)
        print("Created folder: ", bash_dir)

    # GDAL polygonize
    with open(os.path.join(bash_dir, 'polygon.sh'), 'w') as f3:
        print(f'gdal_polygonize.py "{clipped_lyr}" "{landcov_shp}" -b 1 -f "ESRI Shapefile"', file=f3)

    # Define bash command path
    polygon_sh = os.path.join(bash_dir, 'polygon.sh')

    # Run bash command
    subprocess.run(['bash', polygon_sh])

def format_google_earth_data(main_dir, temporary_dir):
    """
    Format landcover attribute names.

    Parameters:
    - main_dir: Main directory where the workflow is located.
    - temporary_dir: Temporary directory to store intermediate files.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Format landcover attribute names')
    print('-----------------------------------------------------------------------------------------------')

    landcov_dir = os.path.join(temporary_dir, 'Landcover')

    # Find the name of the shapefile
    for land_shp_file in os.listdir(landcov_dir):
        if land_shp_file.endswith(".shp"):
            land_shp_file_name = land_shp_file

    landcov_shp = gpd.read_file(os.path.join(landcov_dir, land_shp_file_name))
    landcov_dissolve = landcov_shp.dissolve(by='DN')
    landcov_dissolve["Landuse_ID"] = landcov_dissolve.index
    landcov_final = landcov_dissolve.reset_index()
    landcov_final = landcov_final.drop('DN', axis=1)

    land_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Landcover')
    if not os.path.exists(land_output_dir):
        os.makedirs(land_output_dir)

    # Save the final landcover shapefile to drive
    landcov_final.to_file(os.path.join(land_output_dir, 'studyArea_landcover.shp'))

def visualize_landcover(main_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize landcover outputs')
    print('-----------------------------------------------------------------------------------------------')

    # Create output directory if it doesn't exist
    landcov_dir_out = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Landcover')
    if not os.path.exists(landcov_dir_out):
        os.makedirs(landcov_dir_out)

    # Find name of shapefile in the output directory
    for land_shp_file in os.listdir(landcov_dir_out):
        if land_shp_file.endswith(".shp"):
            land_shp_file_name1 = land_shp_file

    # Read the shapefile into a GeoDataFrame
    landcov_final = gpd.read_file(os.path.join(landcov_dir_out, land_shp_file_name1))

    # Display GeoDataFrame table
    display(landcov_final)

    # Plot landcover using GeoDataFrame
    f, ax = plt.subplots(figsize=(8, 9))
    landcov_final.plot(column='Landuse_ID', categorical=True, legend=True, ax=ax)
    ax.set_axis_off()
    plt.show()

    # Calculate and display area for each landcover type
    landcov_final['area'] = landcov_final.area
    df_L1 = pd.DataFrame(landcov_final.drop(columns='geometry'))
    df_landcov = df_L1.groupby('Landuse_ID').sum()
    df_landcov.reset_index(inplace=True)

    # Collect unique landcover IDs
    unique_landcov = df_landcov['Landuse_ID'].unique()

    # Compute general area percentage for each landcover type
    for val_L in unique_landcov:
        area_poly_L = df_landcov.loc[df_landcov['Landuse_ID'] == val_L, 'area']
        # Take the single matching value explicitly instead of calling float() on a Series
        landcov_area = float(area_poly_L.iloc[0] / df_landcov['area'].sum() * 100)
        print(f'ID ({val_L}): {round(landcov_area, 2)}%')

    # Print section footer
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Landcover is complete!')
    print('-----------------------------------------------------------------------------------------------')

def upload_landcover(land_temp_dir, main_dir, temporary_dir):
    # Find the name of the study area shapefile
    for shp_file in os.listdir(os.path.join(main_dir, 'shapefile')):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Find the name of the landcover shapefile
    for landcov_file in os.listdir(land_temp_dir):
        if landcov_file.endswith(".shp"):
            landcov_file_name = landcov_file

    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Check Projection')
    print('-----------------------------------------------------------------------------------------------')

    # Load study area shapefile
    shp_extent = gpd.read_file(os.path.join(main_dir, 'shapefile', shp_file_name))
    # Load landcover shapefile
    landcov_extent = gpd.read_file(os.path.join(land_temp_dir, landcov_file_name))

    # Check if the landcover shapefile has the required column 'Landuse_ID'
    if 'Landuse_ID' in landcov_extent:
        # Check if coordinate systems match, reproject if necessary
        if shp_extent.crs != landcov_extent.crs:
            # Reproject landcover layer to match the study area shapefile
            landcov_lyr = landcov_extent.to_crs(shp_extent.crs)
            print('\nLandcover layer has been reprojected to match the shapefile')
        else:
            landcov_lyr = landcov_extent
            print('\nCoordinate systems match')

        # Print section header
        print('\n-----------------------------------------------------------------------------------------------')
        print('( ) Visualize and save landcover layer')
        print('-----------------------------------------------------------------------------------------------')

        # Display landcover GeoDataFrame table
        display(landcov_lyr)

        # Visualize landcover
        f, ax = plt.subplots(figsize=(8, 9))
        landcov_lyr.plot(column='Landuse_ID', categorical=True, legend=True, ax=ax)
        ax.set_axis_off()
        plt.show()

        # Calculate and display area for each landcover type
        landcov_lyr['area'] = landcov_lyr.area
        df_L1 = pd.DataFrame(landcov_lyr.drop(columns='geometry'))
        df_landcov = df_L1.groupby('Landuse_ID').sum()
        df_landcov.reset_index(inplace=True)

        # Collect unique landcover IDs
        unique_landcov = df_landcov['Landuse_ID'].unique()

        # Compute general area percentage for each landcover type
        for val_L in unique_landcov:
            area_poly_L = df_landcov.loc[df_landcov['Landuse_ID'] == val_L, 'area']
            # Take the single matching value explicitly instead of calling float() on a Series
            landcov_area = float(area_poly_L.iloc[0] / df_landcov['area'].sum() * 100)
            print(f'ID ({val_L}): {round(landcov_area, 2)}%')

        # Save the final landcover shapefile to drive
        landcov_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Landcover')
        if not os.path.exists(landcov_output_dir):
            os.makedirs(landcov_output_dir)
        landcov_lyr.to_file(os.path.join(landcov_output_dir, 'studyArea_landcover.shp'))
    else:
        # Print an error message if the required column name is not found
        print('--- Invalid landuse column name ---\n')
        print('Please change the landuse column name to Landuse_ID and run again')

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))

In [None]:
#@markdown <font color=grey> **Upload Landcover File** </font>

#@markdown Drag-and-drop the landcover file into the specified folder. Both raster (.tif) and shapefile (.shp) files are accepted.

#@markdown If the layer is a raster, it will be clipped, formatted, and converted into a shapefile layer of the study area.

#@markdown If the layer is a shapefile, Magpie checks that the columns required by BasinMaker are present and checks the projection.


land_temp_dir = os.path.join(temporary_dir,'Landcover')
if not os.path.exists(land_temp_dir):
  os.makedirs(land_temp_dir)

print('\n-----------------------------------------------------------------------------------------------')
print('( ) Upload Landcover')
print('-----------------------------------------------------------------------------------------------')
print(f'drag-and-drop landcover file into following folder: {land_temp_dir}')
response = input("Have you uploaded the landcover file (yes or no): ")
if response == "yes":
  for land_file in os.listdir(land_temp_dir):
    if land_file.endswith(".tif"):
      shp_file_path = os.path.join(main_dir, 'shapefile')
      # step 1
      shp_file_name = check_projection(shp_file_path,main_dir,temporary_dir)
      # step 2
      landcov_lyr, crop_extent = overlay_shp_on_landcov(shp_file_path, shp_file_name,temporary_dir)
      # step 3
      # define output directory
      landcov_dir = os.path.join(temporary_dir,'Landcover')
      if not os.path.exists(landcov_dir):
        os.makedirs(landcov_dir)
      clip_and_format(landcov_dir, landcov_lyr, crop_extent,temporary_dir)
      format_google_earth_data(main_dir,temporary_dir)
      # step 4
      visualize_landcover(main_dir)
      remove_temp_data(temporary_dir)
    if land_file.endswith(".shp"):
      upload_landcover(land_temp_dir,main_dir,temporary_dir)
      remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File** </font>

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

# Only write the configuration file when the user has ticked the checkbox above
if write_to_configuration_file:
  data = {
      "_comment": "------------ 1d) LANDCOVER ---------------",
      "generate_landcover": "no",
      "upload_landcover": "yes",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path, "1d_landcover.json")

  # Write data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent gives readable formatting (optional)


### <font color=grey> **Generate Landcover** </font>

Check out the short video [Generating Landcover Data with Magpie Workflow](https://youtu.be/JlAD33ox_wk) for more information.
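
The download step requests landcover for a bounding box around the study area, enlarged by a small buffer in degrees. The sketch below shows that bounding-box arithmetic on its own. The helper name `buffered_bbox` is hypothetical, and applying the buffer symmetrically on all four sides is an assumption made for illustration. The default of 0.3 matches the `buffer_size` value used in `download_landcover`.

```python
def buffered_bbox(bounds, buffer_size=0.3):
    """Expand a (west, south, east, north) box by buffer_size degrees.

    Hypothetical helper (not part of Magpie). Assumes geographic
    coordinates (EPSG:4326) and a symmetric buffer on every side.
    """
    west, south, east, north = bounds
    if west >= east or south >= north:
        raise ValueError("bounds must satisfy west < east and south < north")
    if buffer_size < 0:
        raise ValueError("buffer_size must be non-negative")

    return (
        west - buffer_size,
        south - buffer_size,
        east + buffer_size,
        north + buffer_size,
    )


# Made-up study area bounds in decimal degrees
west, south, east, north = buffered_bbox((-80.5, 43.0, -80.0, 43.5))
print('Bounding box: ', west, south, east, north)
```

The four values can then be passed to `ee.Geometry.BBox(west, south, east, north)`, as done in `download_landcover`.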

In [None]:
# check libraries
libraries_to_check = ["requests", "shapely"]
check_and_install_libraries(libraries_to_check)

import ee
import requests
from shapely.geometry import box
from IPython.display import display

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Loading functions in Python involves defining them in your script or notebook, making the functions available for use. <br>

# Trigger the authentication flow.
service_account = 'magpie-developer@magpie-id-409519.iam.gserviceaccount.com'
credentials = ee.ServiceAccountCredentials(service_account, os.path.join(main_dir,'extras','magpie-key.json'))

# Initialize the library.
ee.Initialize(credentials)

def check_projection(shp_file_path, main_dir, temporary_dir):
    """
    Check the projection of a shapefile and reproject if necessary.

    Parameters:
    - shp_file_path: Path to the shapefile.
    - main_dir: Main directory containing the shapefile.
    - temporary_dir: Temporary directory for storing reprojected shapefiles.

    Returns:
    - shp_file_name: Name of the shapefile.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Check projection of shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Find name of shapefile
    for shp_file in os.listdir(os.path.join(shp_file_path)):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    print('Shapefile name: ', shp_file_name)
    print('Shapefile path: ', os.path.join(main_dir, "shapefile", shp_file_name))

    shp_lyr_check = gpd.read_file(os.path.join(main_dir, "shapefile", shp_file_name))
    print('Shapefile CRS: ', shp_lyr_check.crs)

    temp_dir = os.path.join(temporary_dir)
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)

    if shp_lyr_check.crs != 'EPSG:4326':
        # Reproject
        shp_lyr_crs = shp_lyr_check.to_crs(epsg=4326)
        shp_lyr_crs.to_file(os.path.join(temporary_dir, shp_file_name))
        print('Shapefile layer has been reprojected to EPSG:4326')
    else:
        shp_lyr_crs = shp_lyr_check
        print('Coordinate systems match!')
        shp_lyr_crs.to_file(os.path.join(temporary_dir, shp_file_name))

    return shp_file_name

def download_landcover(shp_file_path, shp_file_name, data_source, year_of_interest, band_name, scale, temporary_dir):
    """
    Download landcover data based on the given shapefile bounding box.

    Parameters:
    - shp_file_path: Path to the shapefile.
    - shp_file_name: Name of the shapefile.
    - data_source: Source of landcover data.
    - year_of_interest: Year of the landcover data.
    - band_name: Name of the band.
    - scale: Scale of the download.
    - temporary_dir: Temporary directory for storing downloaded landcover data.

    Returns:
    - landcov_dir: Directory path where the landcover data is stored.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Download Landcover')
    print('-----------------------------------------------------------------------------------------------')


    # Define buffer
    buffer_size = 0.3

    # Determine the boundary of the provided shapefile
    bounds = gpd.read_file(os.path.join(temporary_dir, shp_file_name)).bounds
    west, south, east, north = bounds.loc[0]
    # Expand the bounding box by the buffer on all four sides
    west -= buffer_size
    south -= buffer_size
    east += buffer_size
    north += buffer_size

    print('Bounding box: ', west, south, east, north)

    if data_source == 'USGS':
        full_data_source = "USGS/NLCD_RELEASES/2020_REL/NALCMS"
        band_name = 'landcover'
        year_of_interest = '2020'
    elif data_source == 'MODIS':
        full_data_source = f"MODIS/061/MCD12Q1/{year_of_interest}_01_01"
        band_name = 'LC_Type1'

    # Concatenate input data to generate full path
    img = ee.Image(full_data_source)
    region = ee.Geometry.BBox(west, south, east, north)

    # Request a download URL for the selected band as a GeoTIFF file
    url = img.getDownloadUrl({
        'bands': band_name,
        'region': region,
        'scale': scale,
        'format': 'GEO_TIFF'
    })

    # Define output directory
    landcov_dir = os.path.join(temporary_dir, 'Landcover')
    if not os.path.exists(landcov_dir):
        os.makedirs(landcov_dir)

    # Download the GeoTIFF and stop early if the request failed
    response = requests.get(url)
    response.raise_for_status()
    with open(os.path.join(landcov_dir, 'study_area_landcov.tif'), 'wb') as fd:
        fd.write(response.content)

    return landcov_dir

def overlay_shp_on_landcov(shp_file_path, shp_file_name, temporary_dir):
    """
    Overlay shapefile on landcover data and visualize the result.

    Parameters:
    - shp_file_path: Path to the shapefile.
    - shp_file_name: Name of the shapefile.
    - temporary_dir: Temporary directory for storing landcover data.

    Returns:
    - landcov_lyr: Landcover layer.
    - crop_extent: Crop extent layer.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Overlay shapefile on landcover')
    print('-----------------------------------------------------------------------------------------------')

    # Find name of raster
    for tif_file in os.listdir(os.path.join(temporary_dir, 'Landcover')):
        if tif_file.endswith(".tif"):
            tif_file_name = tif_file

    # Open raster
    landcov_lyr = rxr.open_rasterio(os.path.join(temporary_dir, 'Landcover', tif_file_name), masked=True).squeeze()
    # Load shapefile
    crop_extent = gpd.read_file(os.path.join(temporary_dir, shp_file_name))

    print('Shapefile CRS: ', crop_extent.crs)
    print('Landcover CRS: ', landcov_lyr.rio.crs)

    if crop_extent.crs != landcov_lyr.rio.crs:
        # Reproject
        landcov_lyr = landcov_lyr.rio.reproject(crop_extent.crs)
        print('Landcover layer has been reprojected to match the shapefile')
    else:
        print('Coordinate systems match!')

    f, ax = plt.subplots(figsize=(10, 5))
    landcov_lyr.plot.imshow(ax=ax)

    crop_extent.plot(ax=ax, alpha=.8, color="black")
    ax.set(title="Raster Layer with Shapefile Overlaid")
    ax.set_axis_off()

    landcov_shapfile_visualization = True

    if landcov_shapfile_visualization:
        plt.show()

    return landcov_lyr, crop_extent

def clip_and_format(landcov_dir, landcov_lyr, crop_extent, temporary_dir):
    """
    Clip and format landcover into a shapefile.

    Parameters:
    - landcov_dir: Directory to save the processed data.
    - landcov_lyr: Landcover layer to be clipped.
    - crop_extent: Study area extent boundary.
    - temporary_dir: Temporary directory to store intermediate files.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Clip and format landcover into shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Buffer the study area boundary slightly before clipping
    crop_extent_buffered = crop_extent.buffer(.001)

    # Clip the landcover layer
    landcov_clipped = landcov_lyr.rio.clip(crop_extent_buffered, crop_extent_buffered.crs)
    print('Landcover layer has been clipped')

    # Save clipped landcover layer to the temporary folder
    clipped_lyr = os.path.join(landcov_dir, 'clipped_lyr.tif')
    landcov_clipped.rio.to_raster(clipped_lyr)
    print('Layer has been saved to temporary folder')

    # Define paths needed for the GDAL command
    landcov_shp = os.path.join(landcov_dir, 'studyArea_landcov.shp')
    bash_dir = os.path.join(temporary_dir, 'bash_scripts')

    # Create the bash directory if it doesn't exist
    if not os.path.exists(bash_dir):
        os.makedirs(bash_dir)
        print("Created folder: ", bash_dir)

    # GDAL polygonize
    with open(os.path.join(bash_dir, 'polygon.sh'), 'w') as f3:
        print(f'gdal_polygonize.py "{clipped_lyr}" "{landcov_shp}" -b 1 -f "ESRI Shapefile"', file=f3)

    # Define bash command path
    polygon_sh = os.path.join(bash_dir, 'polygon.sh')

    # Run bash command
    subprocess.run(['bash', polygon_sh])

def format_google_earth_data(main_dir, temporary_dir):
    """
    Format landcover attribute names.

    Parameters:
    - main_dir: Main directory where the workflow is located.
    - temporary_dir: Temporary directory to store intermediate files.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Format landcover attribute names')
    print('-----------------------------------------------------------------------------------------------')

    landcov_dir = os.path.join(temporary_dir, 'Landcover')

    # Find the name of the shapefile
    for land_shp_file in os.listdir(landcov_dir):
        if land_shp_file.endswith(".shp"):
            land_shp_file_name = land_shp_file

    landcov_shp = gpd.read_file(os.path.join(landcov_dir, land_shp_file_name))
    landcov_dissolve = landcov_shp.dissolve(by='DN')
    landcov_dissolve["Landuse_ID"] = landcov_dissolve.index
    landcov_final = landcov_dissolve.reset_index()
    landcov_final = landcov_final.drop('DN', axis=1)

    land_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Landcover')
    if not os.path.exists(land_output_dir):
        os.makedirs(land_output_dir)

    # Save the final landcover shapefile to drive
    landcov_final.to_file(os.path.join(land_output_dir, 'studyArea_landcover.shp'))

def visualize_landcover(main_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize landcover outputs')
    print('-----------------------------------------------------------------------------------------------')

    # Create output directory if it doesn't exist
    landcov_dir_out = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Landcover')
    if not os.path.exists(landcov_dir_out):
        os.makedirs(landcov_dir_out)

    # Find name of shapefile in the output directory
    for land_shp_file in os.listdir(landcov_dir_out):
        if land_shp_file.endswith(".shp"):
            land_shp_file_name1 = land_shp_file

    # Read the shapefile into a GeoDataFrame
    landcov_final = gpd.read_file(os.path.join(landcov_dir_out, land_shp_file_name1))

    # Display GeoDataFrame table
    display(landcov_final)

    # Plot landcover using GeoDataFrame
    f, ax = plt.subplots(figsize=(8, 9))
    landcov_final.plot(column='Landuse_ID', categorical=True, legend=True, ax=ax)
    ax.set_axis_off()
    plt.show()

    # Calculate and display area for each landcover type
    landcov_final['area'] = landcov_final.area
    df_L1 = pd.DataFrame(landcov_final.drop(columns='geometry'))
    df_landcov = df_L1.groupby('Landuse_ID').sum()
    df_landcov.reset_index(inplace=True)

    # Collect unique landcover IDs
    unique_landcov = df_landcov['Landuse_ID'].unique()

    # Compute the percentage of total area for each landcover type
    total_area = df_landcov['area'].sum()
    for val_L in unique_landcov:
        area_poly_L = df_landcov.loc[df_landcov['Landuse_ID'] == val_L, 'area'].iloc[0]
        landcov_area = float(area_poly_L / total_area * 100)
        print(f'ID ({val_L}): {round(landcov_area, 2)}%')

    # Print section footer
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Landcover is complete!')
    print('-----------------------------------------------------------------------------------------------')

def remove_temp_data(temporary_dir):
    # Delete the temporary directory and everything in it
    if os.path.exists(temporary_dir):
        shutil.rmtree(temporary_dir)
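
The download step pads the study-area bounding box before requesting data from Earth Engine. The sketch below illustrates that padding on its own, assuming coordinates in decimal degrees (EPSG:4326) and the same buffer on all four sides. `buffered_bbox` is a hypothetical helper written for illustration and is not part of Magpie.

```python
def buffered_bbox(west, south, east, north, buffer_size=0.3):
    """Expand a lon/lat bounding box by buffer_size degrees on every side.

    The result is clamped to valid longitude and latitude ranges.
    """
    if west > east or south > north:
        raise ValueError("expected west <= east and south <= north")
    return (
        max(west - buffer_size, -180.0),
        max(south - buffer_size, -90.0),
        min(east + buffer_size, 180.0),
        min(north + buffer_size, 90.0),
    )


# Illustrative coordinates only (not taken from the workflow)
print(buffered_bbox(-80.5, 43.0, -80.0, 43.5))
```

Padding the box gives the later clipping step some margin around the basin outline.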

In [None]:
#@markdown <font color=#5559AB> **Download landcover data** </font>

#@markdown Magpie offers two landcover options from Google Earth Engine:

#@markdown >[USGS Landcover](https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD_RELEASES_2020_REL_NALCMS#bands), which is only available for 2020, at 30 m resolution

#@markdown >[MODIS Landcover](https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD12Q1), which is available at a yearly interval, at 500 m resolution

#@markdown By changing the data source, band name, and scale (information available in the Earth Engine Data Catalog), users can download other landcover datasets available on Google Earth Engine

#@markdown Please refer to the [Earth Engine Data Catalog](https://developers.google.com/earth-engine/datasets/catalog) to view other data source options

#@markdown If Earth Engine cannot download the extent of your study area, increase the scale value (a coarser resolution)

#@markdown <font color=grey> for example, adjust scale to "90" </font>

# define paths
shp_file_path = os.path.join(main_dir, 'shapefile')
shp_file_name = check_projection(shp_file_path,main_dir,temporary_dir)

# define input variables
data_source = "USGS" #@param ["USGS", "MODIS"]

year_of_interest = "2020" #@param {type:"string"}

band_name = "landcover" #@param {type:"string"}

scale = 60 #@param

# download landcover
landcov_dir = download_landcover(shp_file_path, shp_file_name, data_source,
                                 year_of_interest, band_name, scale, temporary_dir)
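
A larger `scale` helps because it shrinks the raster that Earth Engine has to return. The sketch below estimates the grid size for a bounding box at a given scale and steps the scale up until the grid fits under a limit. It assumes roughly 111,320 m per degree of latitude. `max_dim` is a placeholder, not Earth Engine's actual limit, and both function names are hypothetical.

```python
import math

METRES_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def estimate_grid(west, south, east, north, scale):
    """Approximate (columns, rows) of a raster covering the box at `scale` metres per pixel."""
    mid_lat = math.radians((south + north) / 2.0)
    width_m = (east - west) * METRES_PER_DEGREE * math.cos(mid_lat)
    height_m = (north - south) * METRES_PER_DEGREE
    return math.ceil(width_m / scale), math.ceil(height_m / scale)


def smallest_scale_that_fits(west, south, east, north, scale, max_dim, step=10):
    """Raise `scale` in `step`-metre increments until both dimensions are <= max_dim."""
    cols, rows = estimate_grid(west, south, east, north, scale)
    while max(cols, rows) > max_dim:
        scale += step
        cols, rows = estimate_grid(west, south, east, north, scale)
    return scale, cols, rows


# A 1 x 1 degree box near the equator, starting at 30 m with a placeholder limit
print(smallest_scale_that_fits(0, 0, 1, 1, scale=30, max_dim=2000))
```

Doubling the scale roughly halves each dimension, so the number of pixels drops by about a factor of four.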





In [None]:
#@markdown <font color=grey> **Clip and Format** </font>

landcov_dir = os.path.join(temporary_dir, 'Landcover')

landcov_lyr, crop_extent = overlay_shp_on_landcov(shp_file_path, shp_file_name,temporary_dir)
# step 3
clip_and_format(landcov_dir, landcov_lyr, crop_extent,temporary_dir)
format_google_earth_data(main_dir,temporary_dir)

In [None]:
#@markdown <font color=grey> **Visualize Landcover Layer** </font>


# step 4
visualize_landcover(main_dir)
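
The visualization step ends by printing the share of the study area covered by each landcover ID. The same calculation is sketched below in plain Python, assuming the input is a list of `(class_id, area)` pairs. `area_percentages` and the sample values are hypothetical.

```python
from collections import defaultdict


def area_percentages(records):
    """Return {class_id: percent of total area} for (class_id, area) pairs."""
    totals = defaultdict(float)
    for class_id, area in records:
        totals[class_id] += area
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total area is zero")
    return {cid: round(100.0 * a / grand_total, 2) for cid, a in sorted(totals.items())}


# Made-up polygons: two of class 1 and one of class 5
sample = [(1, 2.0), (5, 1.0), (1, 1.0)]
for class_id, pct in area_percentages(sample).items():
    print(f"ID ({class_id}): {pct}%")
```

In the workflow the layer is in EPSG:4326, so the areas are in square degrees. The printed percentages are therefore best read as a general guide rather than exact area shares.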

In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download


# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File** </font>

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file:
  data = {
      "_comment": "------------ 1d) LANDCOVER ---------------",
      "generate_landcover": "yes",
      "upload_landcover": "no",

      "data_source_landcover": f"{data_source}",
      "year_of_interest": f"{year_of_interest}",
      "band_name_landcover": f"{band_name}",
      "scale_landcover": scale,
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1d_landcover.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)
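
A section file written this way can be read back with the standard `json` module. The sketch below does a round trip in a throwaway directory. How Magpie Developer actually consumes these files is not shown here, so `load_section_config` and its handling of the `_comment` key are assumptions made for illustration.

```python
import json
import os
import tempfile


def load_section_config(config_dir, file_name):
    """Read one JSON section file and return its settings without keys starting with '_'."""
    with open(os.path.join(config_dir, file_name)) as json_file:
        data = json.load(json_file)
    return {key: value for key, value in data.items() if not key.startswith("_")}


# Round trip in a temporary directory so no real workflow files are touched
with tempfile.TemporaryDirectory() as tmp:
    example = {
        "_comment": "------------ 1d) LANDCOVER ---------------",
        "data_source_landcover": "USGS",
        "scale_landcover": 60,
    }
    with open(os.path.join(tmp, "1d_landcover.json"), "w") as json_file:
        json.dump(example, json_file, indent=2)
    settings = load_section_config(tmp, "1d_landcover.json")

print(settings)
```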


**References**

Friedl, M. A., Sulla-Menashe, D., Tan, B., Schneider, A., Ramankutty, N., Sibley, A., et al. (2010). MODIS collection 5 global land cover: Algorithm refinements and characterization of new datasets. Remote Sensing of Environment, 114, 168–182.

## <font color=#5559AB> 1e) Soil </font>

The soil layer is a polygon shapefile derived from the [soil texture classes](https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_CLAY-WFRACTION_USDA-3A1A1A_M_v02#description) (USDA system) data set, which is based on predicted soil texture fractions and hosted on Google Earth Engine.

Only a <font color=red>shapefile of the study area</font> is required to run this subsection.

### <font color=grey> **Upload Soil** </font>

In [None]:
# check libraries
libraries_to_check = ["requests", "shapely"]
check_and_install_libraries(libraries_to_check)

import requests
from shapely.geometry import box
from IPython.display import display

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell loads the functions into your notebook, making them available for use in the cells below. <br>

def check_projection(shp_file_path, main_dir, temporary_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Check projection of shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Find the name of the shapefile in the given path
    for shp_file in os.listdir(shp_file_path):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Print shapefile information
    print('Shapefile name: ', shp_file_name)
    print('Shapefile path: ', os.path.join(main_dir, "shapefile", shp_file_name))

    # Read the shapefile into a GeoDataFrame to check its CRS
    shp_lyr_check = gpd.read_file(os.path.join(main_dir, "shapefile", shp_file_name))
    print('Shapefile CRS: ', shp_lyr_check.crs)

    # Create a temporary directory if it doesn't exist
    temp_dir = os.path.join(temporary_dir)
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)

    # Check if the CRS is EPSG:4326, if not, reproject the shapefile
    if shp_lyr_check.crs != 'EPSG:4326':
        # Reproject the shapefile layer to match EPSG:4326
        shp_lyr_crs = shp_lyr_check.to_crs(epsg=4326)
        shp_lyr_crs.to_file(os.path.join(temporary_dir, shp_file_name))
        print('Shapefile layer has been reprojected to match EPSG:4326')
    else:
        shp_lyr_crs = shp_lyr_check
        print('Coordinate systems match!')
        shp_lyr_crs.to_file(os.path.join(temporary_dir, shp_file_name))

    # Return the name of the reprojected shapefile
    return shp_file_name

def overlay_shp_on_soil(temporary_dir, shp_file_name):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Overlay shapefile on soil')
    print('-----------------------------------------------------------------------------------------------')

    # Find the name of the raster file in the temporary directory
    for tif_file in os.listdir(os.path.join(temporary_dir, 'Soil')):
        if tif_file.endswith(".tif"):
            tif_file_name = tif_file

    # Open the soil raster
    soil_lyr = rxr.open_rasterio(os.path.join(temporary_dir, 'Soil', tif_file_name), masked=True).squeeze()

    # Load the shapefile
    crop_extent = gpd.read_file(os.path.join(temporary_dir, shp_file_name))

    # Print CRS information
    print('Shapefile CRS: ', crop_extent.crs)
    print('Soil CRS: ', soil_lyr.rio.crs)

    # Check if coordinate systems match, reproject if necessary
    if crop_extent.crs != soil_lyr.rio.crs:
        soil_lyr = soil_lyr.rio.reproject(crop_extent.crs)
        print('Soil layer has been reprojected to match the shapefile')
    else:
        print('Coordinate systems match!')

    # Plot the overlay
    f, ax = plt.subplots(figsize=(10, 5))
    soil_lyr.plot.imshow(ax=ax)
    crop_extent.plot(ax=ax, alpha=0.8, color="black")
    ax.set(title="Raster Layer with Shapefile Overlaid")
    ax.set_axis_off()

    soil_shapfile_visualization = True

    if soil_shapfile_visualization:
        plt.show()

    # Return the soil layer and crop extent
    return soil_lyr, crop_extent

def clip_and_format(soil_dir, soil_lyr, crop_extent, temporary_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Clip and format soil into shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Clip soil product to the study area
    # Buffer the study area boundary slightly before clipping
    crop_extent_buffered = crop_extent.buffer(.001)

    # Clip the soil layer
    soil_clipped = soil_lyr.rio.clip(crop_extent_buffered, crop_extent_buffered.crs)
    print('Soil layer has been clipped')

    # Save the clipped soil layer to the temporary folder
    path_to_tif_file = os.path.join(soil_dir, 'clipped_lyr.tif')
    soil_clipped.rio.to_raster(path_to_tif_file)
    print('Layer has been saved to the temporary folder')

    # Define necessary paths
    clipped_lyr = os.path.join(soil_dir, 'clipped_lyr.tif')
    soil_shp = os.path.join(soil_dir, 'studyArea_soil.shp')

    # Define the bash script directory
    bash_dir = os.path.join(temporary_dir, 'bash_scripts')
    if not os.path.exists(bash_dir):
        os.makedirs(bash_dir)
        print("Created folder: ", bash_dir)

    # GDAL polygonize
    with open(os.path.join(bash_dir, 'polygon.sh'), 'w') as f3:
        print(f'gdal_polygonize.py "{clipped_lyr}" "{soil_shp}" -b 1 -f "ESRI Shapefile"', file=f3)

    # Define the path to the bash command
    polygon_sh = os.path.join(bash_dir, 'polygon.sh')

    # Run the bash command
    subprocess.run(['bash', polygon_sh])

def visualize_soil(soil_dir, main_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize soil outputs')
    print('-----------------------------------------------------------------------------------------------')

    # Find the name of the shapefile in the soil directory
    for soil_shp_file in os.listdir(soil_dir):
        if soil_shp_file.endswith(".shp"):
            soil_shp_file_name = soil_shp_file

    # Read the soil shapefile
    soil_shp = gpd.read_file(os.path.join(soil_dir, soil_shp_file_name))

    # Dissolve the soil shapefile by 'DN' field
    soil_dissolve = soil_shp.dissolve(by='DN')
    soil_dissolve["Soil_ID"] = soil_dissolve.index
    soil_final = soil_dissolve.reset_index()
    soil_final = soil_final.drop('DN', axis=1)

    # Display the dissolved soil shapefile
    display(soil_final)

    # Create the output directory for soil shapefile
    soil_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Soil')
    if not os.path.exists(soil_output_dir):
        os.makedirs(soil_output_dir)

    # Save the final soil shapefile to the output directory
    soil_final.to_file(os.path.join(soil_output_dir, 'studyArea_soil.shp'))

    print('------------------------------------- Soil -------------------------------------')

    # Read the final soil shapefile
    soil_final = gpd.read_file(os.path.join(soil_output_dir, 'studyArea_soil.shp'))

    # Plot the soil shapefile
    f, ax = plt.subplots(figsize=(8, 9))
    soil_final.plot(column='Soil_ID', categorical=True, legend=True, ax=ax)
    ax.set_axis_off()
    plt.show()

    # Calculate and print the percentage area for each soil type
    soil_final['area'] = soil_final.area
    df_S1 = pd.DataFrame(soil_final.drop(columns='geometry'))
    df_soil = df_S1.groupby('Soil_ID').sum()
    df_soil.reset_index(inplace=True)

    # Print the percentage of total area for each soil type
    total_area = df_soil['area'].sum()
    for val_s in df_soil['Soil_ID'].unique():
        area_poly_S = df_soil.loc[df_soil['Soil_ID'] == val_s, 'area'].iloc[0]
        soil_area = float(area_poly_S / total_area * 100)
        print(f'ID ({val_s}): {round(soil_area, 2)}%')

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Soil is complete!')
    print('-----------------------------------------------------------------------------------------------')

def upload_soil(soil_temp_dir, main_dir):
    # Find the name of the shapefile in the shapefile directory
    for shp_file in os.listdir(os.path.join(main_dir, 'shapefile')):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Find the name of the shapefile in the soil temporary directory
    for soil_file in os.listdir(soil_temp_dir):
        if soil_file.endswith(".shp"):
            soil_file_name = soil_file

    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Check Projection')
    print('-----------------------------------------------------------------------------------------------')

    # Load the study area shapefile
    shp_extent = gpd.read_file(os.path.join(main_dir, 'shapefile', shp_file_name))

    # Load the soil shapefile
    soil_extent = gpd.read_file(os.path.join(soil_temp_dir, soil_file_name))

    # Check if 'Soil_ID' column exists in soil shapefile
    if 'Soil_ID' in soil_extent:
        # Check if coordinate systems match, reproject if necessary
        if shp_extent.crs != soil_extent.crs:
            # Reproject soil layer to match the study area shapefile
            soil_lyr = soil_extent.to_crs(shp_extent.crs)
            print('\nSoil layer has been reprojected to match the shapefile')
        else:
            soil_lyr = soil_extent
            print('\nCoordinate systems match')

        # Print section header for visualization
        print('\n-----------------------------------------------------------------------------------------------')
        print('( ) Visualize and save soil layer')
        print('-----------------------------------------------------------------------------------------------')

        # Display the soil layer table
        display(soil_lyr)

        # Visualize the soil layer
        f, ax = plt.subplots(figsize=(8, 9))
        soil_lyr.plot(column='Soil_ID', categorical=True, legend=True, ax=ax)
        ax.set_axis_off()
        plt.show()

        # Calculate the area for each soil type
        soil_lyr['area'] = soil_lyr.area
        df_S1 = pd.DataFrame(soil_lyr.drop(columns='geometry'))
        df_soil = df_S1.groupby('Soil_ID').sum()
        df_soil.reset_index(inplace=True)

        # Print the percentage of total area for each soil type
        total_area = df_soil['area'].sum()
        for val_s in df_soil['Soil_ID'].unique():
            area_poly_S = df_soil.loc[df_soil['Soil_ID'] == val_s, 'area'].iloc[0]
            soil_area = float(area_poly_S / total_area * 100)
            print(f'ID ({val_s}): {round(soil_area, 2)}%')

        # Save the final soil shapefile to the output directory
        soil_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Soil')
        if not os.path.exists(soil_output_dir):
            os.makedirs(soil_output_dir)
        soil_lyr.to_file(os.path.join(soil_output_dir, 'studyArea_soil.shp'))
    else:
        print('--- Invalid soil column name ---\n')
        print('Please change the soil column name to Soil_ID and run again')

def remove_temp_data(temporary_dir):
    # Delete the temporary directory and everything in it
    if os.path.exists(temporary_dir):
        shutil.rmtree(temporary_dir)
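
Several functions above locate a file by scanning a folder for a given extension. In those loops the name variable is never set when no file matches, so the failure only shows up later as a `NameError`. The sketch below shows one way to make a missing file explicit. `find_file_with_suffix` is a hypothetical helper, and unlike the loops above it returns the first match in alphabetical order rather than the last one encountered.

```python
import os
import tempfile


def find_file_with_suffix(directory, suffix):
    """Return the name of the first file (alphabetically) ending in `suffix`."""
    matches = sorted(
        name for name in os.listdir(directory) if name.lower().endswith(suffix)
    )
    if not matches:
        raise FileNotFoundError(f"no '{suffix}' file found in {directory}")
    return matches[0]


# Demonstration with empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("basin.shp", "basin.dbf"):
        open(os.path.join(demo_dir, name), "w").close()
    print(find_file_with_suffix(demo_dir, ".shp"))
```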

In [None]:
#@markdown <font color=grey> **Upload Soil File** </font>

#@markdown Drag-and-drop the soil file into the specified folder, both raster (.tif) and shapefile (.shp) files are accepted

#@markdown If the layer is a raster, it will be clipped, formatted, and converted into a shapefile layer of the study area.

#@markdown If the layer is a shapefile, Magpie checks that the columns required by BasinMaker are present and checks the projection

# define the temporary soil directory (uploaded files are placed here)
soil_dir = os.path.join(temporary_dir, 'Soil')
os.makedirs(soil_dir, exist_ok=True)
soil_temp_dir = soil_dir

print('\n-----------------------------------------------------------------------------------------------')
print('( ) Upload Soil')
print('-----------------------------------------------------------------------------------------------')
print(f'drag-and-drop soil file into following folder: {soil_temp_dir}')
response = input("Have you uploaded the soil file (yes or no): ")
if response.strip().lower() == "yes":
    for soil_file in os.listdir(soil_temp_dir):
        if soil_file.endswith(".tif"):
            shp_file_path = os.path.join(main_dir, 'shapefile')
            # step 1
            shp_file_name = check_projection(shp_file_path, main_dir, temporary_dir)
            # step 2
            soil_lyr, crop_extent = overlay_shp_on_soil(temporary_dir, shp_file_name)
            # step 3
            clip_and_format(soil_dir, soil_lyr, crop_extent, temporary_dir)
            # step 4
            visualize_soil(soil_dir, main_dir)
            remove_temp_data(temporary_dir)
            # the temporary folder has been deleted, so stop scanning it
            break
        elif soil_file.endswith(".shp"):
            upload_soil(soil_temp_dir, main_dir)
            remove_temp_data(temporary_dir)
            break
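
The upload cell branches on the type of file found in the soil folder: rasters are clipped and polygonized, while shapefiles are only checked and reprojected. That decision is sketched below as a small standalone function. `classify_soil_upload` is hypothetical, and giving rasters precedence when both types are present is an assumption rather than documented Magpie behaviour.

```python
import os


def classify_soil_upload(file_names):
    """Return 'raster' if a .tif is present, 'vector' if a .shp is present, else None."""
    suffixes = {os.path.splitext(name)[1].lower() for name in file_names}
    if ".tif" in suffixes:
        return "raster"
    if ".shp" in suffixes:
        return "vector"
    return None


print(classify_soil_upload(["soil.shp", "soil.dbf", "soil.prj"]))
print(classify_soil_upload(["soil.tif"]))
print(classify_soil_upload(["notes.txt"]))
```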


In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File** </font>

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file:
  data = {
      "_comment": "------------ 1e) SOIL ---------------",
      "generate_soil": "no",
      "upload_soil": "yes",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1e_soil.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


### <font color=grey> **Generate Soil** </font>

Check out the short video [Handling Soil Data with Magpie Workflow](https://youtu.be/vobTEKVu9iw) for more information.

In [None]:
# check libraries
libraries_to_check = ["requests", "shapely"]
check_and_install_libraries(libraries_to_check)

import ee
import requests
from shapely.geometry import box
from IPython.display import display

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell loads the functions into your notebook, making them available for use in the cells below. <br>

# Trigger the authentication flow.
service_account = 'magpie-developer@magpie-id-409519.iam.gserviceaccount.com'
credentials = ee.ServiceAccountCredentials(service_account, os.path.join(main_dir,'extras','magpie-key.json'))

# Initialize the library.
ee.Initialize(credentials)

def check_projection(shp_file_path, main_dir, temporary_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Check projection of shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Find the name of the shapefile in the given path
    for shp_file in os.listdir(shp_file_path):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Print shapefile information
    print('Shapefile name: ', shp_file_name)
    print('Shapefile path: ', os.path.join(main_dir, "shapefile", shp_file_name))

    # Read the shapefile into a GeoDataFrame to check its CRS
    shp_lyr_check = gpd.read_file(os.path.join(main_dir, "shapefile", shp_file_name))
    print('Shapefile CRS: ', shp_lyr_check.crs)

    # Create a temporary directory if it doesn't exist
    temp_dir = os.path.join(temporary_dir)
    if not os.path.exists(temp_dir):
        os.makedirs(temp_dir)

    # Check if the CRS is EPSG:4326, if not, reproject the shapefile
    if shp_lyr_check.crs != 'EPSG:4326':
        # Reproject the shapefile layer to match EPSG:4326
        shp_lyr_crs = shp_lyr_check.to_crs(epsg=4326)
        shp_lyr_crs.to_file(os.path.join(temporary_dir, shp_file_name))
        print('Shapefile layer has been reprojected to match EPSG:4326')
    else:
        shp_lyr_crs = shp_lyr_check
        print('Coordinate systems match!')
        shp_lyr_crs.to_file(os.path.join(temporary_dir, shp_file_name))

    # Return the name of the reprojected shapefile
    return shp_file_name

def download_soil(shp_file_path, shp_file_name, data_source, band_name, scale, temporary_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Download Soil')
    print('-----------------------------------------------------------------------------------------------')

    # Define buffer size (decimal degrees) added on every side of the study area
    buffer_size = 0.3

    # Determine the boundary of the provided shapefile (first feature)
    bounds = gpd.read_file(os.path.join(temporary_dir, shp_file_name)).bounds
    west, south, east, north = bounds.loc[0]
    west -= buffer_size
    south -= buffer_size
    east += buffer_size
    north += buffer_size
    print('Bounding box: ', west, south, east, north)

    # Use the data source as the full Earth Engine image ID
    full_data_source = data_source

    # Create an Earth Engine image
    img = ee.Image(full_data_source)
    region = ee.Geometry.BBox(west, south, east, north)

    # Get the download URL for the image
    url = img.getDownloadUrl({
        'bands': [band_name],
        'region': region,
        'scale': scale,
        'format': 'GEO_TIFF'
    })

    # Define the output directory
    soil_dir = os.path.join(temporary_dir, 'Soil')
    if not os.path.exists(soil_dir):
        os.makedirs(soil_dir)

    # Download the image using the generated URL and stop early if the request failed
    response = requests.get(url)
    response.raise_for_status()
    with open(os.path.join(soil_dir, 'study_area_soil.tif'), 'wb') as fd:
        fd.write(response.content)

    # Define paths
    clipped_bounds = os.path.join(soil_dir, 'study_area_soil.tif')
    soil_filled = os.path.join(soil_dir, 'soil_filled.tif')

    # Define the directory for bash scripts
    bash_dir = os.path.join(temporary_dir, 'bash_scripts')
    if not os.path.exists(bash_dir):
        os.makedirs(bash_dir)
        print("Created folder:", bash_dir)

    # GDAL fill no data
    with open(os.path.join(bash_dir, 'fillnodata.sh'), 'w') as f1:
        print(f'gdal_fillnodata.py -md 10 -b 1 -of GTiff "{clipped_bounds}" "{soil_filled}"', file=f1)

    # Format and run the bash command
    fill_data = os.path.join(bash_dir, 'fillnodata.sh')
    subprocess.run(['bash', fill_data])

    # Remove the old unfilled soil layer
    os.remove(clipped_bounds)

    return soil_dir

def overlay_shp_on_soil(temporary_dir, shp_file_name):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Overlay shapefile on soil')
    print('-----------------------------------------------------------------------------------------------')

    # Find the name of the raster file in the temporary directory
    for tif_file in os.listdir(os.path.join(temporary_dir, 'Soil')):
        if tif_file.endswith(".tif"):
            tif_file_name = tif_file

    # Open the soil raster
    soil_lyr = rxr.open_rasterio(os.path.join(temporary_dir, 'Soil', tif_file_name), masked=True).squeeze()

    # Load the shapefile
    crop_extent = gpd.read_file(os.path.join(temporary_dir, shp_file_name))

    # Print CRS information
    print('Shapefile CRS: ', crop_extent.crs)
    print('Soil CRS: ', soil_lyr.rio.crs)

    # Check if coordinate systems match, reproject if necessary
    if crop_extent.crs != soil_lyr.rio.crs:
        soil_lyr = soil_lyr.rio.reproject(crop_extent.crs)
        print('Soil layer has been reprojected to match the shapefile')
    else:
        print('Coordinate systems match!')

    # Plot the overlay
    f, ax = plt.subplots(figsize=(10, 5))
    soil_lyr.plot.imshow(ax=ax)
    crop_extent.plot(ax=ax, alpha=0.8, color="black")
    ax.set(title="Raster Layer with Shapefile Overlayed")
    ax.set_axis_off()

    soil_shapefile_visualization = True

    if soil_shapefile_visualization:
        plt.show()

    # Return the soil layer and crop extent
    return soil_lyr, crop_extent

def clip_and_format(soil_dir, soil_lyr, crop_extent, temporary_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Clip and format soil into shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Clip soil product to the study area
    # Open the crop extent (the study area extent boundary)
    crop_extent_buffered = crop_extent.buffer(.001)

    # Clip the soil layer
    soil_clipped = soil_lyr.rio.clip(crop_extent_buffered, crop_extent_buffered.crs)
    print('Soil layer has been clipped')

    # Save the clipped soil layer to the temporary folder
    path_to_tif_file = os.path.join(soil_dir, 'clipped_lyr.tif')
    soil_clipped.rio.to_raster(path_to_tif_file)
    print('Layer has been saved to the temporary folder')

    # Define necessary paths
    clip_name = 'studyArea_outline'
    clipped_lyr = os.path.join(soil_dir, 'clipped_lyr.tif')
    soil_shp = os.path.join(soil_dir, 'studyArea_soil.shp')

    # Define the bash script directory
    bash_dir = os.path.join(temporary_dir, 'bash_scripts')
    if not os.path.exists(bash_dir):
        os.makedirs(bash_dir)
        print("Created folder: ", bash_dir)

    # GDAL polygonize
    with open(os.path.join(bash_dir, 'polygon.sh'), 'w') as f3:
        print(f'gdal_polygonize.py "{clipped_lyr}" "{soil_shp}" -b 1 -f "ESRI Shapefile"', file=f3)

    # Define the path to the bash command
    polygon_sh = os.path.join(bash_dir, 'polygon.sh')

    # Run the bash command
    subprocess.run(['bash', polygon_sh])

def visualize_soil(soil_dir, main_dir):
    # Print section header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize soil outputs')
    print('-----------------------------------------------------------------------------------------------')

    # Find the name of the shapefile in the soil directory
    for soil_shp_file in os.listdir(soil_dir):
        if soil_shp_file.endswith(".shp"):
            soil_shp_file_name = soil_shp_file

    # Read the soil shapefile
    soil_shp = gpd.read_file(os.path.join(soil_dir, soil_shp_file_name))

    # Dissolve the soil shapefile by 'DN' field
    soil_dissolve = soil_shp.dissolve(by='DN')
    soil_dissolve["Soil_ID"] = soil_dissolve.index
    soil_final = soil_dissolve.reset_index()
    soil_final = soil_final.drop('DN', axis=1)

    # Display the dissolved soil shapefile
    display(soil_final)

    # Create the output directory for soil shapefile
    soil_output_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Soil')
    if not os.path.exists(soil_output_dir):
        os.makedirs(soil_output_dir)

    # Save the final soil shapefile to the output directory
    soil_final.to_file(os.path.join(soil_output_dir, 'studyArea_soil.shp'))

    print('------------------------------------- Soil -------------------------------------')

    # Read the final soil shapefile
    soil_final = gpd.read_file(os.path.join(soil_output_dir, 'studyArea_soil.shp'))

    # Plot the soil shapefile
    f, ax = plt.subplots(figsize=(8, 9))
    soil_final.plot(column='Soil_ID', categorical=True, legend=True, ax=ax)
    ax.set_axis_off()
    plt.show()

    # Calculate and print the percentage area for each soil type
    soil_final['area'] = soil_final.area
    df_S1 = pd.DataFrame(soil_final.drop(columns='geometry'))
    df_soil = df_S1.groupby('Soil_ID').sum()
    df_soil.reset_index(inplace=True)

    # Print the percentage of the total area covered by each soil ID
    # (after the groupby above, each Soil_ID appears exactly once)
    total_area = df_soil['area'].sum()
    for val_s, area_s in zip(df_soil['Soil_ID'], df_soil['area']):
        soil_area = float(area_s / total_area * 100)
        print(f'ID ({val_s}): {round(soil_area, 2)}%')

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Soil is complete!')
    print('-----------------------------------------------------------------------------------------------')

def remove_temp_data(temporary_dir):
    # Delete the temporary folder and everything in it
    if os.path.exists(temporary_dir):
        shutil.rmtree(temporary_dir)

In [None]:
#@markdown <font color=#5559AB> **Download OpenLandMap Soil Texture Class data** </font>

#@markdown By default, Magpie is set up to download OpenLandMap Soil Texture Class data, which is available at a yearly interval. However, by changing the data source, band name, and scale (information available in the Earth Engine Data Catalog), users can download other soil datasets available on Google Earth Engine.

#@markdown Please refer to the [Earth Engine Data Catalog](https://developers.google.com/earth-engine/datasets/catalog) to view other data source options.

#@markdown **Note** - users may need to adjust the scale size to 90 if downloading for a larger study area.

data_source = "OpenLandMap/SOL/SOL_TEXTURE-CLASS_USDA-TT_M/v02" #@param {type:"string"}

#year_of_interest = 2000 #@param {type:"string"}

band_name = "b0" #@param {type:"string"}

scale = 90 #@param

# define path
shp_file_path = os.path.join(main_dir, 'shapefile')

# determine shapefile name
shp_file_name = check_projection(shp_file_path, main_dir, temporary_dir)

soil_dir = download_soil(shp_file_path, shp_file_name, data_source, band_name, scale, temporary_dir)


In [None]:
#@markdown <font color=grey> **Clip and Format** </font>

soil_lyr, crop_extent = overlay_shp_on_soil(temporary_dir, shp_file_name)
# step 3
clip_and_format(soil_dir, soil_lyr, crop_extent, temporary_dir)

In [None]:
#@markdown <font color=grey> **Visualize Soil Layer** </font>

visualize_soil(soil_dir, main_dir)


In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download


# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File** </font>

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file == True:
  data = {
      "_comment": "------------ 1e) SOIL ---------------",
      "generate_soil": "yes",
      "upload_soil": "no",
      "data_source_landcover": f"{data_source}",
      "band_name_landcover": f"{band_name}",
      "scale_landcover": scale,
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1e_soil.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


**References**

Hengl, T. (2018). Clay Content in % (kg/kg) at 6 Standard Depths (0, 10, 30, 60, 100 and 200 cm) at 250 m Resolution (Version v02) [Data set].

## <font color=#5559AB> 1f) Classification </font>

This section allows users to define land cover, vegetation, and soil classifications. The data is saved in a CSV file and is used to classify the RVH and RVP Raven model files.

Users can choose a classification option from the drop-down menu (arrow) or double-click to enter their own classifications.
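
The ID and classification strings entered in the forms below are paired position by position, so both lists need the same number of entries. The following is a minimal sketch of that pairing using only the standard library; the helper name `pair_ids_with_classes` is hypothetical and not part of Magpie, which builds its tables with pandas.

```python
import csv
import io


def pair_ids_with_classes(ids, classes):
    """Pair comma-separated IDs with comma-separated class names (hypothetical helper)."""
    id_list = [value.strip() for value in ids.split(",")]
    class_list = [value.strip() for value in classes.split(",")]
    if len(id_list) != len(class_list):
        raise ValueError(
            f"{len(id_list)} IDs but {len(class_list)} classifications were provided"
        )
    return list(zip(id_list, class_list))


# Example: three landcover IDs plus the -1 lake ID
rows = pair_ids_with_classes("1,2,3,-1", "FOREST,FOREST,SHRUBLAND,LAKE")

# Write the table in the same two-column layout as landcover_info.csv
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Landuse_ID", "LAND_USE_C"])
writer.writerows(rows)
print(buffer.getvalue())
```

Checking the lengths before building the table gives a clearer message than the error raised later when the two lists do not line up.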


### <font color=grey> **Upload Classifications** </font>

In [None]:
#@markdown <font color=grey> **Upload Landcover Classification** </font> <br>

#@markdown Here users can upload a landcover classification (.csv) of their study area

# define landcover directory
land_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Landcover')
# create it if it does not exist
if not os.path.exists(land_dir):
    os.makedirs(land_dir)
    print("created  folder: ", land_dir)
print('\n-----------------------------------------------------------------------------------------------')
print('Upload landcover classification')
print('-----------------------------------------------------------------------------------------------')
print(f'Please drag-and-drop the file into the following folder: {land_dir}')



In [None]:
#@markdown <font color=grey> **Upload Vegetation Classification** </font> <br>

#@markdown Here users can upload a vegetation classification (.csv) of their study area

# define vegetation directory
veg_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Vegetation')
# create it if it does not exist
if not os.path.exists(veg_dir):
    os.makedirs(veg_dir)
    print("created  folder: ", veg_dir)
print('\n-----------------------------------------------------------------------------------------------')
print('Upload vegetation classification')
print('-----------------------------------------------------------------------------------------------')
print(f'Please drag-and-drop the file into the following folder: {veg_dir}')


In [None]:
#@markdown <font color=grey> **Upload Soil Classification** </font> <br>

#@markdown Here users can upload a soil classification (.csv) of their study area

# define soil directory
soil_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Soil')
# create it if it does not exist
if not os.path.exists(soil_dir):
    os.makedirs(soil_dir)
    print("created  folder: ", soil_dir)
print('\n-----------------------------------------------------------------------------------------------')
print('Upload soil classification')
print('-----------------------------------------------------------------------------------------------')
print(f'Please drag-and-drop the file into the following folder: {soil_dir}')

### <font color=grey> **Generate Classifications** </font>

Check out the short video [Preparing Data Classifications with Magpie Workflow](https://youtu.be/W9udFXIZNyw) for more information

In [None]:
#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below, making them available for use in the rest of the notebook. <br>

def landcover_classification(main_dir, landuse_ID, landuse_Classifications):
    """
    Classify landcover based on provided information.

    Parameters:
    - main_dir (str): Main directory path.
    - landuse_ID (str): Comma-separated string of landuse IDs.
    - landuse_Classifications (str): Comma-separated string of landuse classifications.
    """

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Landcover classification')
    print('-----------------------------------------------------------------------------------------------')

    land_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Landcover')

    if not os.path.exists(land_dir):
        os.makedirs(land_dir)

    if landuse_ID == "NA":
        print('Please upload landcover csv file to: ', land_dir)
    else:
        landID_val = list(landuse_ID.split(","))
        landclass_val = list(landuse_Classifications.split(","))

        land_class = {
            'Landuse_ID': landID_val,
            'LAND_USE_C': landclass_val
        }

        land_class_df = pd.DataFrame(land_class)
        land_class_df['Landuse_ID'] = land_class_df['Landuse_ID'].astype(str)

        print(land_class_df)

        land_class_df.to_csv(os.path.join(land_dir, 'landcover_info.csv'), index=False)

def vegetation_classification(main_dir, veg_ID, veg_Classifications):
    """
    Classify vegetation based on provided information.

    Parameters:
    - main_dir (str): Main directory path.
    - veg_ID (str): Comma-separated string of vegetation IDs.
    - veg_Classifications (str): Comma-separated string of vegetation classifications.
    """

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Vegetation classification')
    print('-----------------------------------------------------------------------------------------------')

    veg_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Landcover')

    if not os.path.exists(veg_dir):
        os.makedirs(veg_dir)

    if veg_ID == "NA":
        print('Please upload vegetation csv file to: ', veg_dir)
    else:
        vegID_val = list(veg_ID.split(","))
        vegClass_val = list(veg_Classifications.split(","))

        veg_class = {
            'Veg_ID':  vegID_val,
            'VEG_C': vegClass_val
        }

        veg_class_df = pd.DataFrame(veg_class)
        veg_class_df['Veg_ID'] = veg_class_df['Veg_ID'].astype(str)

        print(veg_class_df)

        veg_class_df.to_csv(os.path.join(veg_dir, 'veg_info.csv'), index=False)

def soil_classification(main_dir, soil_ID, soil_Classifications):
    """
    Classify soil based on provided information.

    Parameters:
    - main_dir (str): Main directory path.
    - soil_ID (str): Comma-separated string of soil IDs.
    - soil_Classifications (str): Comma-separated string of soil classifications.
    """

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Soil classification')
    print('-----------------------------------------------------------------------------------------------')

    soil_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'Soil')

    if not os.path.exists(soil_dir):
        os.makedirs(soil_dir)

    if soil_ID == "NA":
        print('Please upload soil csv file to: ', soil_dir)
    else:
        soilID_val = list(soil_ID.split(","))
        soilClass_val = list(soil_Classifications.split(","))

        soil_class = {
            'Soil_ID':  soilID_val,
            'SOIL_PROF': soilClass_val
        }

        soil_class_df = pd.DataFrame(soil_class)
        soil_class_df.to_csv(os.path.join(soil_dir, 'soil_info.csv'), index=False)

        print(soil_class_df)

#### <font color=#5559AB> **Generate Landcover Classification Table** </font>

##### <font color=grey> **USGS Landcover Classification Table** </font>

In [None]:
#@markdown **Define landcover classification IDs:**

#@markdown include -1 for BasinMaker for lake classification
landuse_ids = '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,-1' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

#@markdown **Define landcover classifications:**
landuse_classifications = 'FOREST,FOREST,FOREST,FOREST,FOREST,FOREST,SHRUBLAND,SHRUBLAND,GRASSLAND,GRASSLAND,SHRUBLAND,GRASSLAND,BARREN,WETLAND,CROPLAND,BARREN,URBAN,WATER,WATER,LAKE' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

# run function
landcover_classification(main_dir, landuse_ids, landuse_classifications)

##### <font color=grey> **MODIS Landcover Classification Table** </font>

In [None]:
#@markdown **Define landcover classification IDs:**
landuse_ids = '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,-1' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

#@markdown **Define landcover classifications:**
landuse_classifications = 'FOREST,FOREST,FOREST,FOREST,FOREST,SHRUBLAND,SHRUBLAND,GRASSLAND,GRASSLAND,GRASSLAND,WETLAND,CROPLAND,BARREN,URBAN,CROPLAND,WATER,WATER,LAKE' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

# run function
landcover_classification(main_dir, landuse_ids, landuse_classifications)


##### <font color=grey> **Other Landcover Type Classification Table** </font>

In [None]:
#@markdown **Define landcover classification IDs:**
landuse_ids = '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,-1' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

#@markdown **Define landcover classifications:**
landuse_classifications = 'FOREST,FOREST,FOREST,FOREST,FOREST,SHRUBLAND,SHRUBLAND,GRASSLAND,GRASSLAND,GRASSLAND,WETLAND,CROPLAND,BARREN,URBAN,CROPLAND,WATER,WATER,LAKE' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

# run function
landcover_classification(main_dir, landuse_ids, landuse_classifications)

#### <font color=#5559AB> **Generate Vegetation Classification Table** </font>

##### <font color=grey> **USGS Vegetation Classification Table** </font>

In [None]:
#@markdown **Define vegetation classification IDs:**

#@markdown include -1 for BasinMaker for lake classification
veg_ids = '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,-1' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

#@markdown **Define vegetation classifications:**
veg_classifications = 'FOREST,FOREST,FOREST,FOREST,FOREST,FOREST,SHRUBLAND,SHRUBLAND,GRASSLAND,GRASSLAND,SHRUBLAND,GRASSLAND,BARREN,WETLAND,CROPLAND,BARREN,URBAN,WATER,WATER,LAKE' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

# run function
vegetation_classification(main_dir, veg_ids, veg_classifications)

##### <font color=grey> **MODIS Vegetation Classification Table** </font>

In [None]:
#@markdown **Define vegetation classification IDs:**
veg_ids = '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,-1' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

#@markdown **Define vegetation classifications:**
veg_classifications = 'FOREST,FOREST,FOREST,FOREST,FOREST,SHRUBLAND,SHRUBLAND,GRASSLAND,GRASSLAND,GRASSLAND,WETLAND,CROPLAND,BARREN,URBAN,CROPLAND,WATER,WATER,LAKE' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

# run function
vegetation_classification(main_dir, veg_ids, veg_classifications)


##### <font color=grey> **Other Vegetation Type Classification Table** </font>

In [None]:
#@markdown **Define vegetation classification IDs:**
veg_ids = '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,-1' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

#@markdown **Define vegetation classifications:**
veg_classifications = 'FOREST,FOREST,FOREST,FOREST,FOREST,SHRUBLAND,SHRUBLAND,GRASSLAND,GRASSLAND,GRASSLAND,WETLAND,CROPLAND,BARREN,URBAN,CROPLAND,WATER,WATER,LAKE' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

# run function
vegetation_classification(main_dir, veg_ids, veg_classifications)

#### <font color=#5559AB> **Generate Soil Classification Table** </font>

In [None]:
#@markdown **Define soil classification IDs:**
soil_ids = '1,2,3,4,5,6,7,8,9,10,11,12,-1' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

#@markdown **Define soil classifications:**
soil_classifications = 'SAND,LOAMY_SAND,SANDY_LOAM,SILT_LOAM,SILT,LOAM,SANDY_CLAY_LOAM,SILT_CLAY_LOAM,CLAY_LOAM,SANDY_CLAY,SILT_CLAY,CLAY,LAKE' # @param {type:"string"}
#@markdown _include commas in between values, no spaces_

# run function
soil_classification(main_dir, soil_ids, soil_classifications)


#### <font color=grey> **Write Model Decisions to Configuration File** </font>

In [None]:
#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file:
  # Each comment needs its own key: repeated keys in a dict overwrite each other
  data = {
      "_comment": "------------ 1f) CLASSIFICATION ---------------",
      "_comment_landcover": "--- Landcover ---",
      "landuse_ID": f"{landuse_ids}",
      "landuse_Classifications": f"{landuse_classifications}",

      "_comment_vegetation": "--- Vegetation ---",
      "veg_ID": f"{veg_ids}",
      "veg_Classifications": f"{veg_classifications}",

      "_comment_soil": "--- Soil ---",
      "soil_ID": f"{soil_ids}",
      "soil_Classifications": f"{soil_classifications}",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"1f_classification.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


# **2.0 Discretize Basin**

Semi-distributed/distributed hydrological models are built on the concept of the hydrological response unit (HRU). An HRU is the smallest unit containing unique geospatial information. Defining HRUs is therefore the process of overlaying and unioning different geospatial data layers, such as land cover, soil, and elevation bands.

To do so, Magpie uses the light version of BasinMaker, developed by Han et al. (2021), to generate HRUs. Please visit the [BasinMaker website](http://hydrology.uwaterloo.ca/basinmaker/) for more information.

Inputs required for BasinMaker to generate HRUs are flexible and include:
*   DEM
*   elevation bands
*   aspect
*   landcover
*   soil layers
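
The overlay-and-union idea can be illustrated with a toy example. The sketch below assigns one HRU ID to every unique combination of landcover, soil, and elevation band on a small made-up grid. It is a conceptual illustration only: the grids and values are invented, and it does not reflect how BasinMaker implements HRU generation.

```python
# Toy rasters on the same 2 x 3 grid (values are made-up class IDs)
landcover = [[1, 1, 2],
             [1, 2, 2]]
soil = [[5, 5, 5],
        [6, 6, 5]]
elev_band = [[0, 0, 0],
             [0, 1, 1]]

hru_lookup = {}  # (landcover, soil, elevation band) -> HRU ID
hru_grid = []

for lc_row, soil_row, band_row in zip(landcover, soil, elev_band):
    hru_row = []
    for combo in zip(lc_row, soil_row, band_row):
        # Assign the next free ID the first time a combination appears
        hru_id = hru_lookup.setdefault(combo, len(hru_lookup) + 1)
        hru_row.append(hru_id)
    hru_grid.append(hru_row)

print(hru_grid)         # [[1, 1, 2], [3, 4, 5]]
print(len(hru_lookup))  # 5 unique combinations, so 5 HRUs
```

Adding more input layers increases the number of unique combinations, which is why the choice of layers controls how finely the basin is discretized.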

Be sure to visit section "1f) Classification" to define the landcover, vegetation, and soil classifications that are stored in a CSV and used in the RVH and RVP Raven model files.

If users plan to use BasinMaker, it is **strongly encouraged** that the shapefile be generated in section 2.1 Study Area and that the corresponding input layers match or exceed the extent of that shapefile.

Check out the short video [Basin Discretization with BasinMaker Light: Magpie Tutorial](https://youtu.be/h5qO7jB0rxA) for more information

<font color=#5559AB> **Download Routing Product for Catchment of Interest** </font> <br>

The North American Lake-River Routing Product (version 2.1) covers the main drainage regions across North America (Canada and the USA). The Ontario Lake-River Routing Product (version 1.0) covers the main drainage regions across the province of Ontario, Canada. <br>
Both the North American routing product and the OLRRP are available for download by sub-region. <br>
BasinMaker has two built-in functions for routing product downloads: Download_Routing_Product_For_One_Gauge and Download_Routing_Product_From_Points_Or_LatLon.<br><br>
This leaves two options for data download: <br>

<font color=#5559AB> **Option #1**: Provide function Download_Routing_Product_For_One_Gauge with gauge ID. </font> <br>

<font color=#5559AB>**Option #2**: Provide function Download_Routing_Product_From_Points_Or_LatLon with the outlet coordinates (lat-lon in decimal degrees). </font><br><br>

*Both options allow BasinMaker to determine the subbasin ID of the outlet subbasin, which it then uses to extract the drainage area.*<br>
<font color=grey>The map above or the following link can be used to help identify gauges of interest: </font>
https://wateroffice.ec.gc.ca/search/historical_e.html
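
The choice between the two options can be sketched as a small input check. The helper below is hypothetical and is not part of BasinMaker or Magpie; it assumes the string `"NA"` marks an input that was left blank, as in the download function later in this section, and the gauge ID shown is a placeholder.

```python
def choose_download_option(gauge_id="NA", lat="NA", lon="NA"):
    """Return which download option applies (hypothetical helper, not BasinMaker API)."""
    if gauge_id != "NA":
        # Option #1: a gauge ID takes priority when it is provided
        return ("gauge", gauge_id)
    if lat != "NA" and lon != "NA":
        # Option #2: outlet coordinates in decimal degrees
        lat_value, lon_value = float(lat), float(lon)
        if not (-90 <= lat_value <= 90 and -180 <= lon_value <= 180):
            raise ValueError("Coordinates must be in decimal degrees")
        return ("latlon", (lat_value, lon_value))
    raise ValueError("Provide either a gauge ID or outlet coordinates")


print(choose_download_option(gauge_id="GAUGE_ID_HERE"))
print(choose_download_option(lat="43.47", lon="-80.54"))
```

Validating the inputs up front avoids starting a download with neither option filled in.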

### <font color=#5559AB>2a) Subbasins</font>

#### <font color=grey> **Upload Subbasin Derived from Full BasinMaker** </font>

Magpie offers post-processing with BasinMaker light. Users can run the full installation of BasinMaker on their local machines, through either the ArcGIS Pro or GRASS/QGIS Python environments, and then upload their derived products to Magpie to complete post-processing steps such as HRU delineation or RVH/RVP generation.

Instructions on how to install the full version of BasinMaker are available [here](https://basinmaker.readthedocs.io/en/latest/installation.html#full-installation).
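
A shapefile is made up of several files that share one name, and the `.shp`, `.shx`, and `.dbf` files must all be uploaded together. The following is a minimal sketch of a check that could be run on the upload folder before continuing; the helper name `find_incomplete_shapefiles` is hypothetical and not part of Magpie or BasinMaker.

```python
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")  # .prj is optional but recommended


def find_incomplete_shapefiles(folder):
    """Map each .shp stem in `folder` to the required extensions it is missing."""
    folder = Path(folder)
    missing = {}
    for shp in sorted(folder.glob("*.shp")):
        absent = [ext for ext in REQUIRED_EXTENSIONS
                  if not shp.with_suffix(ext).exists()]
        if absent:
            missing[shp.stem] = absent
    return missing
```

An empty dictionary means every `.shp` file in the folder has its companion files; otherwise the result lists what still needs to be uploaded.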

In [None]:
# check libraries
libraries_to_check = ["ipyleaflet"]
check_and_install_libraries(libraries_to_check)

from ipyleaflet import Map, GeoJSON

# Define another folder that will save the outputs
folder_product_after_increase_catchment_drainage_area = os.path.join(temporary_dir, 'drainage_area')

def remove_temp_data(product_path,temporary_dir):
    """
    Remove temporary data and zip files.

    Parameters:
    - product_path (str): Path to the product folder.
    - temporary_dir (str): Path to the temporary folder.
    """
    # Remove zip files first, while the product folder still exists
    zip_files_rm = glob(os.path.join(product_path, "*.zip"))
    for files_rm in zip_files_rm:
        os.remove(files_rm)
    if os.path.exists(temporary_dir):
        shutil.rmtree(temporary_dir)
    if os.path.exists(product_path):
        shutil.rmtree(product_path)

# Define the product name
#@markdown <font color=#5559AB> **Define the product name** </font><br>
#@markdown 'OLRRP' to use the Ontario Lake-River Routing Product (version 2.0)<br>
#@markdown 'NALRP' to use the North American Lake-River Routing Product (version 2.1)<br>
# define version number
version_num = 'v2-1' #@param ["v2-1", "v2-0"] {type:"raw"}

# define product path
product_path = 'None'

routing_temp_dir = os.path.join(main_dir, 'workflow_outputs', 'routing_product')
if not os.path.exists(routing_temp_dir):
    os.makedirs(routing_temp_dir)

print('\n-----------------------------------------------------------------------------------------------')
print('( ) Upload routing product')
print('-----------------------------------------------------------------------------------------------')
print(f'drag-and-drop routing product files into following folder: {routing_temp_dir}')
response = input("Have you uploaded the related subbasin shapefiles (.shp) file (yes or no): ")

if response.lower() == "yes":
    # -- Visualize --
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Visualize Simplified Routing Product')
    print('-----------------------------------------------------------------------------------------------')
    # define routing number
    routing_product_version_number = version_num
    # plot product
    display(plot_routing_product_with_ipyleaflet(path_to_product_folder=routing_temp_dir, version_number=routing_product_version_number))

    # Remove temporary data after visualization
    remove_temp_data(product_path,temporary_dir)


In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File** </font>

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

# Only write the file when the user has ticked the box
if write_to_configuration_file:
  # Each comment needs its own key: repeated keys in a dict overwrite each other
  data = {
      "_comment": "------------ 2) DISCRETIZATION ---------------",
      "_comment_2a": "------------ 2a) BASINMAKER ROUTING PRODUCT ---------------",

      "upload_routing_product": "yes",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"2a_routing_product.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


#### <font color=grey> **Derive Subbasin with Light BasinMaker** </font>

In [None]:
# check libraries
libraries_to_check = ["folium","simpledbf","branca",
                      "time","ipyleaflet","ipywidgets","pathlib"]
check_and_install_libraries(libraries_to_check)

!python -m pip install https://github.com/dustming/basinmaker/archive/master.zip

import time
import simpledbf
import zipfile
from pathlib import Path
from IPython.display import display
from basinmaker import basinmaker
from ipywidgets import HTML,Layout,IntSlider, ColorPicker, jslink ## only needed to plot figures
from ipyleaflet import Map, GeoData, basemaps, LayersControl,Popup,Marker,Polygon,Choropleth,WidgetControl## only needed to plot figures
from basinmaker.postprocessing.plotleaflet import plot_routing_product_with_ipyleaflet
from basinmaker.postprocessing.downloadpd import Download_Routing_Product_For_One_Gauge
from basinmaker.postprocessing.downloadpdptspurepy import Download_Routing_Product_From_Points_Or_LatLon
from basinmaker.postprocessing.downloadpdptspurepy import Extract_Routing_Product

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below, making them available for use in the rest of the notebook. <br>

# Define functions
def download_routing_product(product_name, gauge_name, define_lat, define_lon):
    """
    Download routing product based on provided information.

    Parameters:
    - product_name (str): Name of the routing product.
    - gauge_name (str): Gauge name for downloading by gauge, or "NA" to skip.
    - define_lat (str): Latitude for downloading by specified coordinates, or "NA" to skip.
    - define_lon (str): Longitude for downloading by specified coordinates.

    Returns:
    - product_path (str): Path to the downloaded routing product.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Download routing product')
    print('-----------------------------------------------------------------------------------------------')

    if gauge_name != "NA":
        subid, product_path = Download_Routing_Product_For_One_Gauge(gauge_name=gauge_name, product_name=product_name)
        print('Successfully downloaded routing product using the gauge name!')
    elif define_lat != 'NA':
        lat, lon = float(define_lat), float(define_lon)
        print("Study area coordinates:", lat, lon)
        coords = pd.DataFrame({'lat': [lat], 'lon': [lon]})
        subid, product_path = Download_Routing_Product_From_Points_Or_LatLon(product_name=product_name,
                                                                             Lat=coords['lat'], Lon=coords['lon'])
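        print('Successfully downloaded routing product using the specified coordinates!')
    else:
        # Without this branch, product_path would be undefined below
        raise ValueError('Provide either a gauge name or outlet coordinates (lat/lon).')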

    # Check if the product path exists
    if os.path.exists(product_path):
        print('Product path exists:', product_path)
    else:
        print('Product path does not exist:', product_path)
        zip_path = f'{product_path}.zip'

        # Check if a zip file with the same name exists
        if os.path.exists(zip_path):
            print('Zip file found. Extracting:', zip_path)

            # Extract the zip file to the same directory as the product path
            with zipfile.ZipFile(zip_path, 'r') as zip_ref:
                zip_ref.extractall(os.path.dirname(product_path))
            print('Extraction complete.')
        else:
            print('No zip file found with the same name:', zip_path)

    return product_path

def extract_drainage_area(product_path, subid_of_interested_gauges, most_up_stream_subbasin_ids):
    """
    Extract drainage area based on provided subbasin IDs.

    Parameters:
    - product_path (str): Path to the downloaded and unzipped lake-river routing product folder.
    - subid_of_interested_gauges (int): Subbasin ID where the gauge is situated.
    - most_up_stream_subbasin_ids (int): Most upstream subbasin ID for extraction (-1 extracts up to the headwaters).

    Returns:
    - folder_product_for_interested_gauges (str): Path to the folder containing the extracted drainage area.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Extract drainage area')
    print('-----------------------------------------------------------------------------------------------')

    # Define subbasin ID lists
    subid_of_interested_gauges_lst = [subid_of_interested_gauges]
    most_up_stream_subbasin_ids_lst = [most_up_stream_subbasin_ids]

    # Define the folder path for downloaded and unzipped lake-river routing product folder
    unzip_routing_product_folder = product_path

    # Define another folder that will save the outputs
    folder_product_for_interested_gauges = os.path.join(temporary_dir, 'catchment_extraction')

    # Initialize the basinmaker
    start = time.time()
    bm = basinmaker.postprocess()

    # Extract subregion of the routing product
    bm.Select_Subregion_Of_Routing_Structure(
        path_output_folder=folder_product_for_interested_gauges,
        routing_product_folder=unzip_routing_product_folder,
        most_down_stream_subbasin_ids=subid_of_interested_gauges_lst,
        most_up_stream_subbasin_ids=most_up_stream_subbasin_ids_lst,
        gis_platform="purepy",
    )
    end = time.time()
    print("This section took ", end - start, " seconds")

    return folder_product_for_interested_gauges

def remove_small_lakes(lake_size):
    """
    Remove small lakes based on the provided lake size threshold.

    Parameters:
    - lake_size (float): Lake size threshold (unit: km^2).

    Returns:
    - folder_product_after_filter_lakes (str): Path to the folder containing the drainage network after removing small lakes.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Remove small lakes')
    print('-----------------------------------------------------------------------------------------------')

    # Find the connected-lake shapefiles in the extraction folder.
    # glob() expands the wildcards; os.path.exists() treats them literally and would always return False.
    lake_ID_file_paths = glob(os.path.join(temporary_dir, 'catchment_extraction*', 'sl_connected_lake_v*.shp'))
    if lake_ID_file_paths:
      for lake_ID_file_path in lake_ID_file_paths:
          print(lake_ID_file_path)
          # Read the lake ID file
          lake_ID_file = gpd.read_file(lake_ID_file_path)

          # Keep lakes at or above the size threshold (only smaller lakes are removed)
          large_lakes = lake_ID_file[lake_ID_file["Lake_area"] >= lake_size]

          # Extract the IDs of the lakes to keep, excluding the ID 0
          lake_list = (large_lakes['Hylak_id'].unique()).tolist()
          lake_IDs = [i for i in lake_list if i != 0]
          print("Lake IDs: ", lake_IDs)

          # Define the output folder for interested gauges
          folder_product_for_interested_gauges = os.path.join(temporary_dir, 'catchment_extraction')

          # Define the input product folder path, which is the output folder of the previous section
          input_routing_product_folder = folder_product_for_interested_gauges

          # Define a list containing HyLakeId IDs of lakes of interest
          interested_lake_ids = lake_IDs

          # Update the variable inside the if block
          folder_product_after_filter_lakes = os.path.join(temporary_dir, 'filter_lakes')

          start = time.time()

          # Remove small lakes using BasinMaker
          bm = basinmaker.postprocess()

          bm.Remove_Small_Lakes(
              path_output_folder=folder_product_after_filter_lakes,
              routing_product_folder=input_routing_product_folder,
              connected_lake_area_thresthold=lake_size,
              non_connected_lake_area_thresthold=lake_size,
              selected_lake_ids=interested_lake_ids,
              gis_platform="purepy",
          )
          end = time.time()
          print("This section took  ", end - start, " seconds")
    else:
      # Define the output folder for interested gauges
      folder_product_for_interested_gauges = os.path.join(temporary_dir, 'catchment_extraction')
      folder_product_after_filter_lakes = folder_product_for_interested_gauges

    return folder_product_after_filter_lakes

def simplify_drainage_area(minimum_subbasin_drainage_area):
    """
    Simplify the drainage product by increasing the size of subbasins.

    Parameters:
    - minimum_subbasin_drainage_area (float): Minimum drainage area of subbasins (unit: km^2).

    Returns:
    - folder_product_after_increase_catchment_drainage_area (str): Path to the folder containing the drainage network after simplification.
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Simplify drainage product')
    print('-----------------------------------------------------------------------------------------------')

    # Print information about the process
    print(f"Simplifying drainage product with a minimum subbasin drainage area of {minimum_subbasin_drainage_area} km^2")

    # Define the input folder path, which is the output folder of the previous section.
    # glob() is used because os.path.exists() does not expand wildcards.
    if glob(os.path.join(temporary_dir, 'catchment_extraction*', 'sl_connected_lake_v*.shp')):
      folder_product_after_filter_lakes = os.path.join(temporary_dir, 'filter_lakes')
    else:
      folder_product_after_filter_lakes = os.path.join(temporary_dir, 'catchment_extraction')
    input_routing_product_folder = folder_product_after_filter_lakes

    # Define the output folder after increasing catchment drainage area
    folder_product_after_increase_catchment_drainage_area = os.path.join(temporary_dir, 'drainage_area')

    # Initialize the basinmaker
    start = time.time()
    bm = basinmaker.postprocess()

    # Remove river reaches and increase the size of subbasins
    bm.Decrease_River_Network_Resolution(
        path_output_folder=folder_product_after_increase_catchment_drainage_area,
        routing_product_folder=input_routing_product_folder,
        minimum_subbasin_drainage_area=minimum_subbasin_drainage_area,
        gis_platform="purepy",
    )
    end = time.time()
    print("This section took  ", end - start, " seconds")

    return folder_product_after_increase_catchment_drainage_area

def save_routing_product(version_number):
    """
    Save the simplified routing product to the drive.

    Parameters:
    - version_number (str): Version number of the routing product.

    Returns:
    - None
    """
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Save routing product')
    print('-----------------------------------------------------------------------------------------------')

    # Define the routing directory
    routing_dir = os.path.join(main_dir, 'workflow_outputs', 'routing_product')

    # Create the routing directory if it doesn't exist
    if not os.path.exists(routing_dir):
        os.makedirs(routing_dir)
        print("created folder:", routing_dir)

    # Define paths for different routing product files
    finalcat_info_riv_path = Path(os.path.join(temporary_dir, 'drainage_area', f'finalcat_info_riv_{version_number}.shp'))
    finalcat_info_path = Path(os.path.join(temporary_dir, 'drainage_area', f'finalcat_info_{version_number}.shp'))
    sl_connected_lake_path = Path(os.path.join(temporary_dir, 'drainage_area', f'sl_connected_lake_{version_number}.shp'))
    sl_non_connected_lake_path = Path(os.path.join(temporary_dir, 'drainage_area', f'sl_non_connected_lake_{version_number}.shp'))

    # Helper function to reproject a shapefile to EPSG:4326 and save it
    def reproject_and_save(input_path, output_path):
        # input_path is a pathlib.Path; files that do not exist are skipped
        if input_path.exists():
            gdf_data = gpd.read_file(input_path)
            # Reproject if needed
            if gdf_data.crs != 'EPSG:4326':
                gdf_data = gdf_data.to_crs(epsg=4326)
            # Save the GeoDataFrame
            gdf_data.to_file(output_path)

    # Reproject and save each routing product file
    reproject_and_save(finalcat_info_riv_path, os.path.join(routing_dir, f'finalcat_info_riv_{version_number}.shp'))
    reproject_and_save(finalcat_info_path, os.path.join(routing_dir, f'finalcat_info_{version_number}.shp'))
    reproject_and_save(sl_connected_lake_path, os.path.join(routing_dir, f'sl_connected_lake_{version_number}.shp'))
    reproject_and_save(sl_non_connected_lake_path, os.path.join(routing_dir, f'sl_non_connected_lake_{version_number}.shp'))

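def remove_temp_data(main_dir, temporary_dir, product_path):
  """
  Remove the temporary directory, the downloaded product folder, zip files in the
  current directory, and any "drainage_region*" folders under /content.
  """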
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Remove unnecessary files')
  print('-----------------------------------------------------------------------------------------------')

  # Remove the temporary directory if it exists
  if os.path.exists(temporary_dir):
      shutil.rmtree(temporary_dir)
      print(f"Deleted temporary directory: {temporary_dir}")

  # Remove the product directory if it exists
  if os.path.exists(product_path):
      shutil.rmtree(product_path)
      print(f"Deleted product directory: {product_path}")

  # Remove all .zip files in the current directory
  zip_files_rm = glob("*.zip")
  for files_rm in zip_files_rm:
      os.remove(files_rm)
      print(f"Deleted zip file: {files_rm}")

  # Remove folders that start with "drainage_region" in /content
  for item in os.listdir('/content'):
      item_path = os.path.join('/content', item)
      if os.path.isdir(item_path) and item.startswith("drainage_region"):
          shutil.rmtree(item_path)
          print(f"Deleted folder: {item_path}")

In [None]:
#@markdown <font color=#5559AB> **Define the product name** </font><br>
#@markdown [OLRRP](https://uwaterloo-olrrp.shinyapps.io/OLRRP-V2/) to use the Ontario Lake-River Routing Product (version 2.0)<br>
#@markdown [NALRP](https://hydrology.uwaterloo.ca/basinmaker/download_regional.html) to use the North American Lake-River Routing Product (version 2.1)<br>

product_name = "NALRP"  #@param ["NALRP", "OLRRP"]

# Define version number
version_number = None

if product_name == "NALRP":
  version_number = 'v2-1'
elif product_name == "OLRRP":
  version_number = 'v2-0'

#@markdown <font color= #5559AB> **Option 1:**</font> Download routing product using gauge name<br>
#@markdown <font color=grey>Example: "02GA024" <br>
#@markdown *If you would rather use coordinates, enter "NA" for gauge_name* </font> <br>

gauge_name = "02GA024" #@param {type:"string"}

#@markdown <font color=#5559AB> **Option 2:** </font>
#@markdown Download routing product using coordinates<br>

#@markdown <font color=grey> Example: 43.4652699 -80.5222961 <br>
#@markdown *If you would rather use the coordinates derived from the city name, leave these as "NA" (be sure to include " " around NA)* </font> <br>

define_lat = "NA" #@param {type:"string"}
define_lon = "NA" #@param {type:"string"}


if product_name == "NALRP":
    # download routing product
    product_path = download_routing_product(product_name, gauge_name, define_lat, define_lon)

if product_name == "OLRRP":
  if gauge_name != "NA":
      # define path
      product_path = os.path.join('/content','drainage_region_olrrp')

      # Download routing product using the provided gauge name
      Extract_Routing_Product(version='v2-0', by='Obs_NM', obs_nm=gauge_name, output_path=product_path)
  else:
      print('Downloading the routing product with lat and lon for OLRRP is currently unavailable')
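
The two download options above can be resolved before any download is attempted. The following is a minimal sketch, assuming the same "NA" convention used in the form fields; `choose_download_mode` is a hypothetical helper and not part of Magpie or BasinMaker.

```python
def choose_download_mode(gauge_name, define_lat, define_lon):
    """Return ("gauge", name) or ("coords", (lat, lon)); raise if neither is usable.

    Hypothetical helper: the "NA" sentinel mirrors the form fields above.
    """
    # Option 1: a gauge name takes priority when it is provided
    if gauge_name != "NA":
        return "gauge", gauge_name

    # Option 2: both coordinates must be provided and must be valid numbers
    if define_lat != "NA" and define_lon != "NA":
        lat, lon = float(define_lat), float(define_lon)
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"Coordinates out of range: {lat}, {lon}")
        return "coords", (lat, lon)

    raise ValueError("Provide a gauge name or both define_lat and define_lon")


print(choose_download_mode("02GA024", "NA", "NA"))
print(choose_download_mode("NA", "43.4652699", "-80.5222961"))
```

Checking the inputs first gives a clear error message when both options are left as "NA", rather than a failure later in the workflow.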

<font color= #5559AB> **Extract drainage area** </font><br>

The drainage area is extracted by utilizing the BasinMaker function `Select_Subregion_Of_Routing_Structure` to extract the routing product based on the subbasin ID.

A more detailed description of the function `Select_Subregion_Of_Routing_Structure` can be found [here](https://basinmaker.readthedocs.io/en/latest/basinmaker_tools.html#extract-the-region-of-interest).

The output of this function is the following GIS files, which cover only the study area domain:

* finalcat_info : subbasin polygons respecting lakes (all subbasins in the figure below come from this file)
* finalcat_info_riv : river network polylines in each subbasin polygon
* obs_gauges : streamflow observation gauges included in the routing product
* sl_connected_lake : polygons of lakes that are connected by the river network in finalcat_info_riv.shp
* sl_non_connected_lake : polygons of lakes that are not connected by the river network in finalcat_info_riv.shp

An explanation of these GIS files can be found on the [BasinMaker website](http://hydrology.uwaterloo.ca/basinmaker/index.html).
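
To illustrate what extracting a region by subbasin ID means, the sketch below traces every subbasin that drains to a chosen outlet subbasin. It is a simplified illustration and not BasinMaker's implementation; it assumes the routing product stores a downstream link for each subbasin (for example, `SubId` and `DowSubId` columns), and the network used here is hypothetical.

```python
from collections import defaultdict


def upstream_subbasins(downstream_of, outlet_id):
    """Return the outlet subbasin plus every subbasin that drains to it."""
    # Invert the downstream links: for each subbasin, list what flows into it
    drains_into = defaultdict(list)
    for sub_id, down_id in downstream_of.items():
        drains_into[down_id].append(sub_id)

    # Walk upstream from the outlet, collecting each subbasin once
    selected, stack = set(), [outlet_id]
    while stack:
        current = stack.pop()
        if current in selected:
            continue
        selected.add(current)
        stack.extend(drains_into[current])
    return selected


# Hypothetical network: 1 and 2 drain to 3; 3 and 5 drain to 4; 4 is the basin outlet (-1)
links = {1: 3, 2: 3, 3: 4, 5: 4, 4: -1}
print(sorted(upstream_subbasins(links, 3)))  # [1, 2, 3]
```

Choosing subbasin 3 as the outlet returns only subbasins 1, 2, and 3, which is the kind of subset the extracted GIS files would cover.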




In [None]:
#@markdown <font color= #5559AB> **Define subbasin ID** </font><br>
#@markdown BasinMaker needs the ID of the subbasin (subId) in which the gauge is situated, for example, 3086525 <br>

subid_of_interested_gauges = 3086525 #@param
most_up_stream_subbasin_ids = -1 #@param

#@markdown A value of -1 for most_up_stream_subbasin_ids extracts everything up to the most upstream (headwater) subbasins, whereas any other subbasin ID extracts the area from the outlet up to the provided subbasin.

# step 2
folder_product_for_interested_gauges = extract_drainage_area(product_path,subid_of_interested_gauges,most_up_stream_subbasin_ids)

# Define the path to the routing product folder
path_to_input_routing_product_folder = folder_product_for_interested_gauges
# plot product
plot_routing_product_with_ipyleaflet(path_to_product_folder = path_to_input_routing_product_folder,version_number = version_number)

In [None]:
#@markdown <font color= #5559AB> **Remove Smaller Lakes** </font>

#@markdown This section filters lakes to simplify the network.

#@markdown The BasinMaker function `Remove_Small_Lakes` will be used for this purpose.

#@markdown A more detailed description of the function `Remove_Small_Lakes` can be found [here](https://basinmaker.readthedocs.io/en/latest/basinmaker_tools.html#filter-lakes).

#@markdown For example, if a user enters "5" (unit is $km^2$), BasinMaker removes lakes with an area less than $5 km^2$ (lakes equal to $5 km^2$ will not be removed).

lake_size = 5 #@param

folder_product_after_filter_lakes_derived = remove_small_lakes(lake_size)

# Define the path to the routing product folder
path_to_input_routing_product_folder = folder_product_after_filter_lakes_derived # example Path_to_input_routing_product_folder = folder_product_after_filter_lakes

# plot product
plot_routing_product_with_ipyleaflet(path_to_product_folder = path_to_input_routing_product_folder,version_number = version_number)
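
The lake selection described in this cell can be sketched without any GIS libraries. The example below is only an illustration with hypothetical lake records; it uses the `Hylak_id` and `Lake_area` field names from the connected-lake shapefile and keeps lakes whose area is at or above the threshold, as the description above states.

```python
def lakes_to_keep(lakes, lake_size):
    """Return the sorted, unique IDs of lakes whose area meets the threshold."""
    ids = {
        lake["Hylak_id"]
        for lake in lakes
        # Lakes exactly at the threshold are kept; an ID of 0 is skipped
        if lake["Lake_area"] >= lake_size and lake["Hylak_id"] != 0
    }
    return sorted(ids)


# Hypothetical lake records (areas in km^2)
lakes = [
    {"Hylak_id": 101, "Lake_area": 12.0},
    {"Hylak_id": 102, "Lake_area": 5.0},
    {"Hylak_id": 103, "Lake_area": 0.8},
    {"Hylak_id": 0, "Lake_area": 9.0},
]
print(lakes_to_keep(lakes, 5))  # [101, 102]
```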


In [None]:
#@markdown <font color= #5559AB> **Simplify drainage product** </font>

#@markdown In this section, the network is further simplified by increasing the size of subbasins.

#@markdown For example, the minimum drainage area of subbasins can be adjusted to 50 $km^2$.

#@markdown The BasinMaker function `Decrease_River_Network_Resolution` will be used for this purpose. This function merges any subbasin whose drainage area at its outlet is smaller than the given threshold with its downstream subbasin. It starts at the headwater subbasins and moves downstream, repeating this throughout the domain being processed.

#@markdown A more detailed description of the function `Decrease_River_Network_Resolution` can be found [here](https://basinmaker.readthedocs.io/en/latest/basinmaker_tools.html#increase-catchment-area).

minimum_subbasin_drainage_area = 50.0 #@param

# run function to simplify drainage basin
folder_product_after_increase_catchment_drainage_area = simplify_drainage_area(minimum_subbasin_drainage_area)

# plot product
plot_routing_product_with_ipyleaflet(path_to_product_folder = folder_product_after_increase_catchment_drainage_area,version_number = version_number)
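
The merging rule described in this cell can be approximated with a short sketch. This is a simplification under stated assumptions and not the BasinMaker algorithm: it ignores lakes, gauges, and river geometry, and the drainage areas and downstream links are hypothetical. Each subbasin whose drainage area is below the threshold is assigned to the first downstream subbasin that meets it.

```python
def merge_targets(drainage_area, downstream_of, min_area):
    """Map each subbasin to the subbasin it would be merged into."""
    target = {}
    for sub_id in drainage_area:
        current = sub_id
        # Walk downstream until the drainage area is large enough
        # or there is no further downstream subbasin in the domain
        while drainage_area[current] < min_area and downstream_of[current] in drainage_area:
            current = downstream_of[current]
        target[sub_id] = current
    return target


# Hypothetical domain: drainage area (km^2) at each subbasin outlet and downstream links
drainage_area = {1: 20.0, 2: 35.0, 3: 80.0, 4: 140.0}
downstream_of = {1: 3, 2: 3, 3: 4, 4: -1}
print(merge_targets(drainage_area, downstream_of, 50.0))
# {1: 3, 2: 3, 3: 3, 4: 4}
```

With a threshold of 50 km^2, the two headwater subbasins are absorbed into subbasin 3, so the simplified network has two subbasins instead of four.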


In [None]:
#@markdown <font color=grey>**Save routing product to drive** <br>

save_routing_product(version_number)

In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown Remove temporary data to save space on the drive.

#@markdown If users want to keep any of the temporary data, they can right-click on the layer in the folder directory and select download.


# delete temporary folder
remove_temp_data(main_dir, temporary_dir, product_path)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file == True:
  data = {
      # Dictionary keys must be unique: a second "_comment" key would silently replace the first
      "_comment_section": "------------ 2) DISCRETIZATION ---------------",

      "_comment": "------------ 2a) BASINMAKER ROUTING PRODUCT ---------------",

      "upload_routing_product": "no",

      "product_name": f"{product_name}",
      "gauge_name": f"{gauge_name}",

      "define_lat": f"{define_lat}",
      "define_lon": f"{define_lon}",

      "version_num": f"{version_number}",

      "most_down_stream_subbasin_ids": subid_of_interested_gauges,
      "most_up_stream_subbasin_ids": most_up_stream_subbasin_ids,

      "lake_size": lake_size,
      "minimum_subbasin_drainage_area": minimum_subbasin_drainage_area
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"2a_routing_product.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)
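
As a quick check of the configuration file, the decisions can be written and then read back to confirm that nothing changed in the round trip. This is a minimal sketch using a temporary folder and a reduced set of hypothetical values; it does not show how Magpie Developer reads the file.

```python
import json
import os
import tempfile

# A reduced, hypothetical set of model decisions
decisions = {
    "product_name": "NALRP",
    "gauge_name": "02GA024",
    "lake_size": 5,
    "minimum_subbasin_drainage_area": 50.0,
}

with tempfile.TemporaryDirectory() as config_dir:
    file_path = os.path.join(config_dir, "2a_routing_product.json")

    # Write the decisions to the JSON file
    with open(file_path, "w") as json_file:
        json.dump(decisions, json_file, indent=2)

    # Read the file back and compare it with the original dictionary
    with open(file_path) as json_file:
        reloaded = json.load(json_file)

print(reloaded == decisions)  # True
```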


### <font color=#5559AB>2b) Hydrologic Response Units (HRUs)

#### <font color=grey> **Upload HRUs Derived from BasinMaker Light** </font>

In [None]:
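# Note: this redefines remove_temp_data with a different argument order than the routing product section
def remove_temp_data(product_path,temporary_dir):
    """
    Remove temporary data and zip files.

    Parameters:
    - product_path (str): Path to the product folder.
    - temporary_dir (str): Path to the temporary working directory.
    """
    # Delete zip files first, while the product folder still exists
    zip_files_rm = glob(os.path.join(product_path, "*.zip"))
    for files_rm in zip_files_rm:
        os.remove(files_rm)
    if os.path.exists(temporary_dir):
        shutil.rmtree(temporary_dir)
    if os.path.exists(product_path):
        shutil.rmtree(product_path)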

#@markdown <font color=#5559AB> **Define the product name** </font><br>
#@markdown 'OLRRP'  to use the Ontario Lake-River Routing Product (version 2.0)<br>
#@markdown 'NALRP' to use the North American Lake-River Routing Product (version 2.1)<br>

# define version number (named version_number to match the lake shapefile paths used below)
version_number = 'v2-1' #@param ["v2-1", "v2-0"] {type:"raw"}

# define working directory path
HRU_output_folder = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'maps')
if not os.path.exists(HRU_output_folder):
  os.makedirs(HRU_output_folder)

routing_dir = os.path.join(main_dir,'workflow_outputs','routing_product')

print('\n-----------------------------------------------------------------------------------------------')
print('( ) Upload HRUs')
print('-----------------------------------------------------------------------------------------------')
print(f'drag-and-drop HRU files into following folder: {HRU_output_folder}')
response = input("Have you uploaded the related HRU shapefiles (.shp) file (yes or no): ")
if response == "yes":
  # -- Visualize --
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Visualize Uploaded HRUs')
  print('-----------------------------------------------------------------------------------------------')
  hru_polygon = gpd.read_file(os.path.join(HRU_output_folder, "finalcat_hru_info.shp"))
  # define paths
  path_connect_lake_polygon_dir = Path(os.path.join(routing_dir, "sl_connected_lake_"+version_number+".shp"))
  path_non_connect_lake_polygon_dir = Path(os.path.join(routing_dir, "sl_non_connected_lake_"+version_number+".shp"))

  ax = hru_polygon.plot(linewidth = 1,edgecolor='black',facecolor="none",zorder=0,figsize=(11,12))

  if path_connect_lake_polygon_dir.exists():
    sl_lake_ply = gpd.read_file(os.path.join(routing_dir, "sl_connected_lake_"+version_number+".shp")).to_crs(hru_polygon.crs)
    sl_lake_ply.plot(ax = ax, linewidth = 0.00001,edgecolor='black',alpha=0.6,zorder=1)

  if path_non_connect_lake_polygon_dir.exists():
    nsl_lake_ply = gpd.read_file(os.path.join(routing_dir, "sl_non_connected_lake_"+version_number+".shp")).to_crs(hru_polygon.crs)
    nsl_lake_ply.plot(ax = ax, linewidth = 0.00001,edgecolor='black',alpha=0.6,zorder=1)

  # remove temporary files
  remove_temp_data(product_path,temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

# Only write the file when the user has opted in
if write_to_configuration_file:
  data = {
      "_comment": "------------ 2b) BASINMAKER DERIVE HRUS ---------------",

      "upload_HRU_files": "yes",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"2b_derive_hrus.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


#### <font color=grey> **Derive HRUs with BasinMaker Light** </font>

In [None]:
# check libraries
libraries_to_check = ["geopy", "folium","simpledbf","branca",
                      "time","ipyleaflet","ipywidgets","pathlib"]
check_and_install_libraries(libraries_to_check)

!python -m pip install https://github.com/dustming/basinmaker/archive/master.zip

import time
import shutil
import simpledbf
from pathlib import Path
from IPython.display import display
from basinmaker import basinmaker
import matplotlib.pyplot as plt ## only needed to plot figures
from ipywidgets import HTML,Layout,IntSlider, ColorPicker, jslink ## only needed to plot figures
from ipyleaflet import Map, GeoData, basemaps, LayersControl,Popup,Marker,Polygon,Choropleth,WidgetControl## only needed to plot figures
from basinmaker.postprocessing.plotleaflet import plot_routing_product_with_ipyleaflet
from basinmaker.postprocessing.downloadpd import Download_Routing_Product_For_One_Gauge
from basinmaker.postprocessing.downloadpdptspurepy import Download_Routing_Product_From_Points_Or_LatLon

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below, making them available for use in the rest of the notebook. <br>

def define_hrus(landcover, soil, elevation_bands, dem, aspect,
                area_ratio_thresholds, pixel_size, version_number):
    """
    Define Hydrologic Response Units (HRUs) using BasinMaker.

    Parameters:
    - landcover (bool): Flag indicating whether to include landcover data.
    - soil (bool): Flag indicating whether to include soil data.
    - elevation_bands (bool): Flag indicating whether to include elevation band data.
    - dem (bool): Flag indicating whether to include Digital Elevation Model (DEM) data.
    - aspect (bool): Flag indicating whether to include aspect data.
    - area_ratio_thresholds (list): List of area ratio thresholds.
    - pixel_size (float): Pixel size for processing.
    - version_number (str): Version number for the output files.
    """

    # Define working directory paths
    hru_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data')
    routing_dir = os.path.join(main_dir, 'workflow_outputs', 'routing_product')

    # Generate maps folder
    HRU_output_folder = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'maps')

    # Create the HRU output folder if it doesn't exist
    if not os.path.isdir(HRU_output_folder):
        os.makedirs(HRU_output_folder)
        print("created folder:", HRU_output_folder)

    delin_list = []

    # Paths for DEM data
    if dem == True:
      for dem_file in os.listdir(os.path.join(hru_dir, 'DEM')):
        if dem_file.endswith(".tif"):
          dem_file_name = dem_file
      # Adjust the following paths based on your data locations
      path_to_dem = os.path.join(hru_dir, 'DEM',dem_file_name)
      print(path_to_dem)
    else:
      path_to_dem = '#'

    # Paths for Elevation Bands data
    if elevation_bands == True:
      for elev_file in os.listdir(os.path.join(hru_dir, 'Elevation_band')):
        if elev_file.endswith(".shp"):
          elev_file_name = elev_file
      # Adjust the following paths based on your data locations
      path_other_polygon_1 = os.path.join(hru_dir, 'Elevation_band',elev_file_name)
      other_polygon = gpd.read_file(path_other_polygon_1)
      print(path_other_polygon_1)
      delin_list.append('O_ID_1')
    else:
      path_other_polygon_1 = '#'

    # Paths for Aspect data
    if aspect == True:
        for aspect_file in os.listdir(os.path.join(hru_dir, 'Aspect')):
          if aspect_file.endswith(".shp"):
            aspect_file_name = aspect_file
        # Adjust the following paths based on your data locations
        path_to_aspect = os.path.join(hru_dir, 'Aspect',aspect_file_name)
        print(path_to_aspect)
    else:
      path_to_aspect = '#'

    # Paths for Landcover data
    if landcover == True:
      for landcov_file in os.listdir(os.path.join(hru_dir, 'Landcover')):
        if landcov_file.endswith(".shp"):
          landcov_file_name = landcov_file
      # Adjust the following paths based on your data locations
      path_landuse_polygon = os.path.join(hru_dir,'Landcover',landcov_file_name)
      path_landuse_info = os.path.join(hru_dir,'Landcover','landcover_info.csv')
      path_veg_info = os.path.join(hru_dir, 'Landcover', "veg_info.csv")
      #landuse = gpd.read_file(path_landuse_polygon)
      landuse_info = pd.read_csv(path_landuse_info)
      print(path_landuse_polygon)
      delin_list.append('Landuse_ID')
      delin_list.append('Veg_ID')
    else:
      path_landuse_polygon = '#'
      path_landuse_info = os.path.join(hru_dir,'Landcover','landcover_info.csv')
      path_veg_info = os.path.join(hru_dir, 'Landcover', "veg_info.csv")

    # Paths for Soil data
    if soil == True:
      for soil_file in os.listdir(os.path.join(hru_dir, 'Soil')):
        if soil_file.endswith(".shp"):
          soil_file_name = soil_file
      # Adjust the following paths based on your data locations
      path_soil_polygon = os.path.join(hru_dir,'Soil',soil_file_name)
      path_soil_info = os.path.join(hru_dir,'Soil',"soil_info.csv")
      #soil = gpd.read_file(path_soil_polygon)
      soil_info = pd.read_csv(path_soil_info)
      print(path_soil_polygon)
      delin_list.append('Soil_ID')
    else:
      path_soil_polygon = '#'
      path_soil_info = os.path.join(hru_dir,'Soil',"soil_info.csv")

    # Define the input folder path
    input_routing_product_folder = routing_dir

    # Define paths for connected and non-connected lake polygons
    path_connect_lake_polygon_dir = Path(os.path.join(routing_dir, f"sl_connected_lake_{version_number}.shp"))
    path_non_connect_lake_polygon_dir = Path(os.path.join(routing_dir, f"sl_non_connected_lake_{version_number}.shp"))

    # Check if connected lake polygon exists
    path_connect_lake_polygon = path_connect_lake_polygon_dir if path_connect_lake_polygon_dir.exists() else '#'

    # Check if non-connected lake polygon exists
    path_non_connect_lake_polygon = path_non_connect_lake_polygon_dir if path_non_connect_lake_polygon_dir.exists() else '#'

    # Run BasinMaker
    bm = basinmaker.postprocess()
    start = time.time()

    bm.Generate_HRUs(
        path_output_folder=HRU_output_folder,
        path_subbasin_polygon=os.path.join(routing_dir, f"finalcat_info_{version_number}.shp"),
        path_connect_lake_polygon=path_connect_lake_polygon,
        path_non_connect_lake_polygon=path_non_connect_lake_polygon,
        path_landuse_polygon=path_landuse_polygon,
        path_soil_polygon=path_soil_polygon,
        path_other_polygon_1=path_other_polygon_1,
        path_other_polygon_2=path_to_aspect,
        path_landuse_info=path_landuse_info,
        path_soil_info=path_soil_info,
        path_veg_info=path_veg_info,
        path_to_dem=path_to_dem,
        area_ratio_thresholds=area_ratio_thresholds,
        gis_platform="purepy",
        projected_epsg_code='EPSG:3161',
        pixel_size=pixel_size
    )

    end = time.time()
    print("This section took ", end - start, " seconds\n")

    # Visualize HRUs
    hru_polygon = gpd.read_file(os.path.join(HRU_output_folder, "finalcat_hru_info.shp"))

    # Define paths for connected and non-connected lake polygons
    path_connect_lake_polygon_dir = Path(os.path.join(routing_dir, f"sl_connected_lake_{version_number}.shp"))
    path_non_connect_lake_polygon_dir = Path(os.path.join(routing_dir, f"sl_non_connected_lake_{version_number}.shp"))

    ax = hru_polygon.plot(linewidth=1, edgecolor='black', facecolor="none", zorder=0, figsize=(11, 12))

    # Plot connected lake polygons if exists
    if path_connect_lake_polygon_dir.exists():
        sl_lake_ply = gpd.read_file(path_connect_lake_polygon_dir).to_crs(hru_polygon.crs)
        sl_lake_ply.plot(ax=ax, linewidth=0.00001, edgecolor='black', alpha=0.6, zorder=1)

    # Plot non-connected lake polygons if exists
    if path_non_connect_lake_polygon_dir.exists():
        nsl_lake_ply = gpd.read_file(path_non_connect_lake_polygon_dir).to_crs(hru_polygon.crs)
        nsl_lake_ply.plot(ax=ax, linewidth=0.00001, edgecolor='black', alpha=0.6, zorder=1)

    plt.show()


In [None]:
#@markdown <font color=#5559AB> **Select which layers to include in discretization** <br>
landcover = True #@param {type:"boolean"}
soil = False #@param {type:"boolean"}
elevation_bands = True #@param {type:"boolean"}
dem = True #@param {type:"boolean"}
aspect = False #@param {type:"boolean"}

#@markdown <font color=#5559AB> **Define area ratio thresholds** <br></font>
#@markdown Use [0.1, 0.2, 0.2] to get the default HRU map; be sure to include square brackets around the values <br>
#@markdown Smaller values (closer to 0) provide a finer delineation; values are entered in the same format, ex: [0.1, 0.2, 0.3] <br>

area_ratio_thresholds = [0.1,0.2,0.2] #@param

#@markdown <font color=#5559AB> **Define pixel size** <br></font>
#@markdown User-defined grid size in m. It is recommended to use 30 m for OLRRP and 90 m for NALRP. The unit follows the coordinate system of the routing network polygons. <br>

pixel_size = 30 #@param

#@markdown <font color=#5559AB> **Define the product name** </font><br>
#@markdown 'OLRRP'  to use the Ontario Lake-River Routing Product (version 2.0)<br>
#@markdown 'NALRP' to use the North American Lake-River Routing Product (version 2.1)<br>

product_name = "NALRP"  #@param ["NALRP", "OLRRP"]

# Define version number
version_number = None

if product_name == "NALRP":
  version_number = 'v2-1'
elif product_name == "OLRRP":
  version_number = 'v2-0'

define_hrus(landcover,soil,elevation_bands,dem,aspect,
                area_ratio_thresholds,pixel_size,version_number)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

# Store results in a dictionary
result_dict = {
    "dem_res": "yes" if dem else "no",
    "elevation_bands_res": "yes" if elevation_bands else "no",
    "aspect_res": "yes" if aspect else "no",
    "landcover_res": "yes" if landcover else "no",
    "soil_res": "yes" if soil else "no"
}

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file:
  # Include these results in the configuration file data
  data = {
      "_comment": "------------ 2b) BASINMAKER DERIVE HRUS ---------------",
      "upload_HRU_files": "yes",
      **result_dict,  # Using dictionary unpacking for a cleaner approach
      "area_ratio_thresholds": area_ratio_thresholds,
      "pixel_size": pixel_size,
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path, "2b_derive_hrus.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)
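
The cell above stores each layer choice as a "yes"/"no" string. How Magpie Developer reads the file back is not shown here, so the following is only a minimal sketch of a round trip, assuming that every key ending in `_res` maps back to a boolean layer flag. The helper names `flags_to_config` and `config_to_flags` are hypothetical and not part of Magpie.

```python
import json
import os
import tempfile


def flags_to_config(flags):
    """Convert boolean layer flags into the "yes"/"no" strings used in the JSON file."""
    return {f"{name}_res": "yes" if value else "no" for name, value in flags.items()}


def config_to_flags(config):
    """Recover boolean flags from every key that ends in "_res" (assumed convention)."""
    suffix = "_res"
    return {key[:-len(suffix)]: value == "yes"
            for key, value in config.items() if key.endswith(suffix)}


flags = {"dem": True, "elevation_bands": True, "aspect": False,
         "landcover": True, "soil": False}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "2b_derive_hrus.json")
    data = {
        "upload_HRU_files": "yes",
        **flags_to_config(flags),
        "area_ratio_thresholds": [0.1, 0.2, 0.2],
        "pixel_size": 30,
    }
    # write the decisions, then read them back as a later run might
    with open(file_path, "w") as json_file:
        json.dump(data, json_file, indent=2)
    with open(file_path) as json_file:
        restored = json.load(json_file)

restored_flags = config_to_flags(restored)
print(restored_flags)
```

Because lists and numbers survive JSON serialization unchanged, only the boolean flags need converting back.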


### <font color=#5559AB> 2c) RVH and RVP Files </font>

#### <font color=grey> **Upload RVH and/or RVP File(s)** </font>

In [None]:
#@markdown <font color=grey>**Upload Raven RVP and RVH input files**</font>

# create the folder that receives the uploaded RVP and RVH files
rvp_rvh_temp_dir = os.path.join(main_dir,'workflow_outputs','RavenInput')
os.makedirs(rvp_rvh_temp_dir, exist_ok=True)
print('\n-----------------------------------------------------------------------------------------------')
print('( ) Upload RVP and RVH files')
print('-----------------------------------------------------------------------------------------------')
print(f'drag-and-drop the RVP and RVH files into the following folder: {rvp_rvh_temp_dir}')

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file:
  data = {
      "_comment": "------------ 2c) BASINMAKER GENERATED RVP AND RVH FILES ---------------",

      "upload_RVP_RVH": "yes",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"2c_rvp_rvh.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)

#### <font color=grey> **Generate RVH and/or RVP File(s)** </font>

In [None]:
# check libraries
libraries_to_check = ["geopy", "folium","simpledbf","branca",
                      "time","ipyleaflet","ipywidgets","pathlib"]
check_and_install_libraries(libraries_to_check)

!python -m pip install https://github.com/dustming/basinmaker/archive/master.zip

import time
import shutil
import simpledbf
from pathlib import Path
from IPython.display import display
from basinmaker import basinmaker
import matplotlib.pyplot as plt ## only needed to plot figures

#@markdown <font color=grey> **Use BasinMaker to Generate RVH File and RVP Template**</font> <br>

def rvp_rvh_generate(model_name,main_dir,temporary_dir):
  # Generate Raven RVP and RVH input files
  # save final HRU shapefile to Geojson format for RavenView
  HRU_output_folder = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'maps')
  hru_polygon = gpd.read_file(os.path.join(HRU_output_folder, "finalcat_hru_info.shp"))
  hru_polygon.to_file(os.path.join(main_dir, 'shapefile', 'myshpfile.geojson'), driver='GeoJSON')

  # generate Raven files
  bm = basinmaker.postprocess()

  bm.Generate_Raven_Model_Inputs(
      path_hru_polygon             = os.path.join(HRU_output_folder, "finalcat_hru_info.shp"),
      model_name                   = model_name,          # used for naming the output files
      subbasingroup_names_channel  = ["Allsubbasins"],    # subbasin group created in the RVH file for simultaneous manipulation in Raven
      subbasingroup_length_channel = [-1],
      subbasingroup_name_lake      = ["AllLakesubbasins"],
      subbasingroup_area_lake      = [-1],
      path_output_folder           = temporary_dir,       # temporary output folder
      aspect_from_gis              = 'purepy',
  )

  # save to drive
  for file in glob(os.path.join(temporary_dir, 'RavenInput','*')):
      shutil.move(file, os.path.join(main_dir, 'workflow_outputs', 'RavenInput'))

# remove temporary data
def remove_temp_data(temporary_dir):
  if os.path.exists(temporary_dir):
    shutil.rmtree(temporary_dir)

# generate RVH file
rvp_rvh_generate(model_name,main_dir,temporary_dir)

remove_temp_data(temporary_dir)
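
After the generated files have been moved, it can help to confirm that the expected Raven inputs are in place before continuing. The sketch below is an illustration only: the helper `missing_raven_inputs` is hypothetical, and it assumes the outputs are named `<model_name>.rvh` and `<model_name>.rvp`, which should be checked against the files BasinMaker actually writes.

```python
import os
import tempfile


def missing_raven_inputs(raven_input_dir, model_name, extensions=(".rvh", ".rvp")):
    """Return the expected file names that are absent from raven_input_dir.

    Assumes outputs are named <model_name><extension>; adjust if the
    generated files follow a different naming scheme.
    """
    expected = [model_name + ext for ext in extensions]
    return [name for name in expected
            if not os.path.isfile(os.path.join(raven_input_dir, name))]


# demonstration in a throwaway folder that contains only an RVH file
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "demo_model.rvh"), "w").close()
    missing = missing_raven_inputs(demo_dir, "demo_model")

print(missing)
```

In the workflow, the folder would be the `RavenInput` directory under `workflow_outputs`, and an empty list would mean both files were found.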

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False # @param {type:"boolean"}

if write_to_configuration_file:
  data = {
      "_comment": "------------ 2c) BASINMAKER GENERATED RVP AND RVH FILES ---------------",

      "upload_RVP_RVH": "no",
  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path,"2c_rvp_rvh.json")

  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)

**References**

Han, M., H. Shen, B. A. Tolson, J. R. Craig, J. Mai, S. Lin, N. B. Basu, F. Awol. (2021). BasinMaker 3.0: a GIS toolbox for distributed watershed delineation of complex lake-river routing networks. Environmental Modelling and Software.

Ming Han, Hongren Shen, Bryan A. Tolson, & Robert A. Metcalfe. (2022). Ontario Lake-River Routing Product version 1.0 (v1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6536085

Ming Han, Hongren Shen, Bryan A. Tolson, James R. Craig, Juliane Mai, Simon Lin, Nandita Basu, & Frezer Awol. (2020). North American Lake-River Routing Product v 2.1, derived by BasinMaker GIS Toolbox (v2.1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4728185

# **3.0 Forcing Data**

Raven supports gridded or station-based forcing inputs exclusively in NetCDF format (*.nc files). This section provides the following data downloads and formatting tools:

* 3a) CaSPAr Data
* 3b) Gridded Weights Generator
* 3c) DayMet Data
* 3d) Environment Canada Climate Data
* 3e) Format Uploaded Observational Data
* 3f) Hydrometric data (HYDAT)

Once downloaded, the data is formatted into a Raven time series/forcing function file (RVT file).

In subsection 3e, users can upload their own climatic data as a CSV file, which is then formatted into a Raven RVT file.

Please keep in mind that a Raven model does not require data from all of these sources. The user must determine which type of forcing data best complements their model objectives.

### <font color=#5559AB> 3a) CaSPAr Data </font>



> <font color=red>Temporarily unavailable</font>






The Canadian Surface Prediction Archive (CaSPAr), developed by Mai et al. (2020), is an archive of numerical weather predictions issued by Environment and Climate Change Canada. More information about the CaSPAr dataset is available [here](https://github.com/julemai/CaSPAr).

This subsection of the Magpie workflow walks users through downloading CaSPAr data and formatting it so that Raven can read it.

**References**

Mai et al. (2020).
The Canadian Surface Prediction Archive (CaSPAr): A Platform to Enhance Environmental Modeling in Canada and Globally.
Bulletin of the American Meteorological Society, https://doi.org/10.1175/BAMS-D-19-0143.1

**Acknowledgements**

Dr. Jonathan Cuellar, Qiutong Yu, and others at the University of Waterloo assisted with the temporal adjustment/aggregation script.

### <font color=#5559AB> 3b) Gridded Weights Generator </font>

The grid weights generator for Raven, developed by Mai (2022), generates weights of grid cells contributing to individual shapes. Grid weights can be used to spatially aggregate a gridded model output to an irregularly shaped domain or, more generally, to map grid cells to a polygon shape. More information can be found [here](https://github.com/julemai/GridWeightsGenerator).
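
To make the idea of spatial aggregation concrete, the following sketch applies a small set of made-up grid weights to made-up grid-cell values. It assumes each weight record is a triple of HRU ID, grid-cell ID, and weight, and that the weights of each HRU sum to about 1. The function `aggregate_to_hrus` and all numbers are illustrative and are not part of the generator.

```python
def aggregate_to_hrus(grid_values, grid_weights, tolerance=1e-6):
    """Area-weighted average of grid-cell values for each HRU.

    grid_values  : dict mapping grid-cell ID -> value (e.g. precipitation in mm)
    grid_weights : iterable of (hru_id, cell_id, weight) triples
    """
    totals = {}
    weight_sums = {}
    for hru_id, cell_id, weight in grid_weights:
        totals[hru_id] = totals.get(hru_id, 0.0) + weight * grid_values[cell_id]
        weight_sums[hru_id] = weight_sums.get(hru_id, 0.0) + weight

    # every HRU should be fully covered by the grid cells assigned to it
    for hru_id, weight_sum in weight_sums.items():
        if abs(weight_sum - 1.0) > tolerance:
            raise ValueError(f"weights of HRU {hru_id} sum to {weight_sum}, expected 1")
    return totals


# illustrative values: three grid cells and two HRUs
precip = {0: 2.0, 1: 4.0, 2: 10.0}
weights = [(1, 0, 0.25), (1, 1, 0.75), (2, 2, 1.0)]

hru_precip = aggregate_to_hrus(precip, weights)
print(hru_precip)
```

HRU 1 receives a quarter of its value from cell 0 and three quarters from cell 1, while HRU 2 lies entirely within cell 2.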

Layout example for general use:

<font color=grey> **python derive_grid_weights.py -i [your-nc-file] -d [dim-names] -v [var-names] -r [tlbx-shp-file] -s [subbasin-id] -b [gauge-id] -f [shp-attr-name] -o [weights-file]**	</font>

A <font color=red>NetCDF file</font> and a <font color=red>shapefile of HRUs</font> are required.

Check out the short video [4.3 Gridded Weights Generator - Magpie Workflow](https://youtu.be/zskW_9qczdk) for more information.

If there is an error loading the netCDF4 library, go to "Runtime" -> "Disconnect and delete runtime", run the "Mandatory Workflow Set up", and re-run this section.

In [None]:
#@markdown <font color=#5559AB>**Define NetCDF File Path**</font>

#@markdown <font color=grey> **-i [your-nc-file]**<br>
#@markdown filename of NetCDF file or shapefile that contains how you discretized your model
forcing_var = "/content/google_drive/MyDrive/Magpie_Workflow/workflow_outputs/RavenInput/input/mean_prcp_daily.nc" #@param {type:"string"}


In [None]:
#@markdown <font color=grey> **Overview of the NetCDF forcing data** </font>

check_casp = xr.open_dataset(forcing_var)
display(check_casp)

In [None]:
#@markdown <font color=#5559AB>**Define Dimension and Variable Names**</font>

#@markdown <font color=grey> **-d [dim-names]**<br>
#@markdown names of NetCDF dimensions of longitude (x) and latitude (y) in this order, e.g. "rlon,rlat"<br>
#@markdown if you are using RDRS data, input: "rlon,rlat"
dim_names = "rlon,rlat" #@param {type:"string"}

#@markdown <font color=grey> **-v [var-names]**<br>
#@markdown names of 2D NetCDF variables containing longitudes and latitudes (in this order) of centroids of grid cells, e.g. "lon,lat"<br>
#@markdown if you are using RDRS data, input: "lon,lat"
var_names = "lon,lat" #@param {type:"string"}
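
The order given in `dim_names` matters because the generator compares it with the dimension order of the longitude and latitude variables and transposes the fields when needed. The sketch below mirrors that comparison in isolation; `classify_dimension_order` is a hypothetical helper, and the dimension tuples are examples rather than values read from a file.

```python
def classify_dimension_order(found_dims, dim_names):
    """Compare the dimensions of a 2D coordinate variable with the "-d" setting.

    found_dims : dimension names of the variable, in file order
    dim_names  : comma-separated "x,y" names as entered in the form, e.g. "rlon,rlat"

    Returns "transpose" when the variable is stored as (x, y) and "ok" when it
    is stored as (y, x), which is the layout the script expects.
    """
    expected = [name.strip() for name in dim_names.split(",")]
    found = list(found_dims)
    if found == expected:
        return "transpose"
    if found == expected[::-1]:
        return "ok"
    raise ValueError(f"dimensions {found} do not match the names given with -d: {expected}")


status_yx = classify_dimension_order(("rlat", "rlon"), "rlon,rlat")
status_xy = classify_dimension_order(("rlon", "rlat"), "rlon,rlat")
print(status_yx, status_xy)
```

If the check raises an error, the dimension names shown in the dataset overview above are the ones to enter in `dim_names`.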


In [None]:
#@markdown <font color=#5559AB>**Define HRU Shapefile Path**</font>

#@markdown <font color=grey> **-r [tlbx-shp-file]**<br>
#@markdown name of the shapefile provided by the routing toolbox; the shapefile contains shapes of all land and lake HRUs, e.g. "HRUs.shp"<br>
boundary_shp = "/content/google_drive/MyDrive/Magpie_Workflow/workflow_outputs/RavenInput/maps/finalcat_hru_info.shp" #@param {type:"string"}


#@markdown located in "RavenInput"->"maps" in Google Drive

In [None]:
#@markdown <font color=grey>**OPTIONAL - Determine the Subbasin ID that the Flow Gauge is Located In**</font>

#@markdown Enter the flow gauge ID to determine the subbasin it is located in, then copy and paste the resulting ID into the gauge RVT file

#@markdown Only needed if the user would rather use the subbasin ID than the flow gauge ID, although the flow gauge ID is encouraged

flow_gauge_ID = '02GA024' #@param {type:"string"}

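gauge_csv = pd.read_csv(os.path.join(drive_dir, 'extras', 'subbasin_plots', 'obs_gauges_NA_v2-1.csv'))
select_row = gauge_csv.loc[gauge_csv['Obs_NM'] == flow_gauge_ID]
subbasin_ID = select_row['SubId'].values
# report a missing gauge instead of failing with an IndexError
if len(subbasin_ID) == 0:
  print(f'No subbasin found for flow gauge ID: {flow_gauge_ID}')
else:
  print(f'Subbasin ID: {subbasin_ID[0]}')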

In [None]:
#@markdown <font color=#5559AB>**Define Subbasin ID OR Gauge Flow ID**</font>

#@markdown <font color=grey> **-s [subbasin-id]**<br>
#@markdown <font color=red> (**either** -s [subbasin-id] or -b [gauge-id] must be set)</font> ID of the most downstream subbasin (likely a subbasin
#@markdown that contains a streamflow gauge station, but it can be any subbasin ID); the script automatically includes all subbasins upstream of the given subbasin; the corresponding
#@markdown attribute in [tlbx-shp-file] is called "SubId"; e.g. "7202"<br>
#@markdown Refer to 1.1_Shapefile.ipynb to identify the subbasin
subId = "3086525" #@param {type:"string"}
#@markdown _if you would rather define -b [gauge-id], input subId as NA_

if subId == "NA":
  subId = None

#@markdown <font color=grey> **-b [gauge-id]**<br>
#@markdown <font color=red> (**either** -b [gauge-id] or -s [subbasin-id] must be set)</font> ID of the streamflow gauging station; the corresponding attribute in [tlbx-shp-file] is called "Obs_NM"; e.g. "02LE024"
gaugeId = "NA" #@param {type:"string"}
#@markdown _if you would rather define -s [subbasin-id], input gaugeId as NA_

if gaugeId == "NA":
  gaugeId = None
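
The generator stops with an error when both identifiers are set or when neither is set, so it can be useful to check the form inputs before running it. The following is a minimal sketch of such a check. The helpers `normalize_id` and `select_outlet` are hypothetical, and treating an empty string like "NA" is an assumption that goes beyond the cell above.

```python
def normalize_id(value):
    """Treat the placeholder "NA" (and, by assumption, an empty string) as not provided."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return value.strip()


def select_outlet(sub_id, gauge_id):
    """Return the shapefile attribute to search and the list of IDs to look for.

    Exactly one of sub_id and gauge_id must be provided; both accept
    comma-separated lists, as the generator does.
    """
    sub_id, gauge_id = normalize_id(sub_id), normalize_id(gauge_id)
    if (sub_id is None) == (gauge_id is None):
        raise ValueError("Set exactly one of subId or gaugeId and enter NA for the other.")
    if sub_id is not None:
        return "SubId", [int(item.strip()) for item in sub_id.split(",")]
    return "Obs_NM", [item.strip() for item in gauge_id.split(",")]


by_subbasin = select_outlet("3086525", "NA")
by_gauge = select_outlet("NA", "02GA024, 02GA010")
print(by_subbasin, by_gauge)
```

Running the check before the generator gives a short message at the point where the inputs are entered, rather than an error partway through the long cell that follows.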

In [None]:
#@markdown <font color=#5559AB>**Define Output Name**</font>

#@markdown <font color=grey> **-o [output-name]**<br>
#@markdown define output filename <br>
output_name = "GridWeights.txt" #@param {type:"string"}

In [None]:
#@markdown <font color=grey>**Run Gridded Weight Generator**</font>

#!/usr/bin/env python
from __future__ import print_function

# Copyright 2016-2020 Juliane Mai - juliane.mai(at)uwaterloo.ca

# read netCDF files
import netCDF4 as nc4

# command line arguments
import argparse

# checking file paths and file extensions
from pathlib import Path

# to perform numerics
import numpy as np

# read shapefiles and convert to GeoJSON and WKT
import geopandas as gpd

# get equal-area projection and derive overlay
from   osgeo   import ogr
from   osgeo   import osr
from   osgeo   import __version__ as osgeo_version



input_file           = forcing_var
dimname              = dim_names
varname              = var_names
routinginfo          = boundary_shp
basin                = gaugeId  # e.g. "02LE024"
SubId                = subId  # e.g. 7202
output_file          = "GriddedForcings.txt"
doall                = False
key_colname          = "HRU_ID"
key_colname_model    = None
area_error_threshold = 0.05
dojson               = False

parser      = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter,
              description='''Convert files from ArcGIS raster format into NetCDF file usable in CaSPAr.''')
parser.add_argument('-i', '--input_file', action='store',
                    default=input_file, dest='input_file', metavar='input_file',
                    help='Either (A) Example NetCDF file containing at least 1D or 2D latitudes and 1D or 2D longitudes where grid needs to be representative of model outputs that are then required to be routed. Or (B) a shapefile that contains shapes of subbasins and one attribute that is defining its index in the NetCDF model output file (numbering needs to be [0 ... N-1]).')
parser.add_argument('-d', '--dimname', action='store',
                    default=dimname, dest='dimname', metavar='dimname',
                    help='Dimension names of longitude (x) and latitude (y) (in this order). Example: "rlon,rlat", or "x,y"')
parser.add_argument('-v', '--varname', action='store',
                    default=varname, dest='varname', metavar='varname',
                    help='Variable name of 2D longitude and latitude variables in NetCDF (in this order). Example: "lon,lat".')
parser.add_argument('-r', '--routinginfo', action='store',
                    default=routinginfo, dest='routinginfo', metavar='routinginfo',
                    help='Shapefile that contains all information for the catchment of interest (and maybe some more catchments).')
parser.add_argument('-b', '--basin', action='store',
                    default=basin, dest='basin', metavar='basin',
                    help='Basin of interest (corresponds to "Gauge_ID" in shapefile given with -r). Either this or SubId ID (-s) needs to be given. Can be a comma-separated list of basins, e.g., "02LB005,02LB008".')
parser.add_argument('-s', '--SubId', action='store',
                    default=SubId, dest='SubId', metavar='SubId',
                    help='SubId of most downstream subbasin (containing usually a gauge station) (corresponds to "SubId" in shapefile given with -r). Either this or basin ID (-b) needs to be given. Can be a comma-separated list of SubIds, e.g., "7399,7400".')
parser.add_argument('-o', '--output_file', action='store',
                    default=output_file, dest='output_file', metavar='output_file',
                    help='File that will contain grid weights for Raven.')
parser.add_argument('-a', '--doall', action='store_true',
                    default=doall, dest='doall',
                    help='If given, all HRUs found in shapefile are processed. Overwrites settings of "-b" and "-s". Default: not set (False).')
parser.add_argument('-c', '--key_colname', action='store',
                    default=key_colname, dest='key_colname', metavar='key_colname',
                    help='Name of column in shapefile containing unique key for each dataset. This key will be used in output file. This setting is only used if "-a" option is used. "Default: "HRU_ID".')
parser.add_argument('-f', '--key_colname_model', action='store',
                    default=key_colname_model, dest='key_colname_model', metavar='key_colname_model',
                    help='Attribute name in input_file shapefile (option -i) that defines the index of the shape in NetCDF model output file (numbering needs to be [0 ... N-1]). Example: "NetCDF_col".')
parser.add_argument('-e', '--area_error_threshold', action='store',
                    default=area_error_threshold, dest='area_error_threshold', metavar='area_error_threshold',
                    help='Threshold (as fraction) of allowed mismatch in areas between subbasins from shapefile (-r) and overlay with grid-cells or subbasins (-i). If error is smaller than this threshold the weights will be adjusted such that they sum up to exactly 1. Raven will exit gracefully in case weights do not sum up to at least 0.95. Default: 0.05.')
parser.add_argument('-j', '--dojson', action='store_true',
                    default=dojson, dest='dojson',
                    help='If given, the GeoJSON of grid cells contributing to at least one HRU are dumped into  GeoJSON. Default: False.')

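# Parse an empty argument list so that the notebook kernel's own command-line
# arguments (e.g. "-f <connection file>") are ignored and the defaults set above are used
args                 = parser.parse_args(args=[])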
input_file           = args.input_file
dimname              = np.array(args.dimname.split(','))
varname              = np.array(args.varname.split(','))
routinginfo          = args.routinginfo
basin                = args.basin
SubId                = args.SubId
output_file          = args.output_file
doall                = args.doall
key_colname          = args.key_colname
key_colname_model    = args.key_colname_model
area_error_threshold = float(args.area_error_threshold)
dojson               = args.dojson

if dojson:
    # write geoJSON files (eventually)
    import geojson as gjs

if not(SubId is None):

    SubId = [ int(ss.strip()) for ss in SubId.split(',') ]   # np.int was removed in NumPy 1.24

if (SubId is None) and (basin is None) and not(doall):
    raise ValueError("Either gauge ID (option -b; e.g., 02AB003) or SubId (option -s; e.g., 7173) specified in shapefile needs to be given. You specified none. This basin will be the most downstream one; grid weights of all upstream subbasins will be added automatically.")

if ( not(SubId is None) ) and ( not(basin is None) ) and not(doall):
    raise ValueError("Either gauge ID (option -b; e.g., 02AB003) or SubId (option -s; e.g., 7173) specified in shapefile needs to be given. You specified both. This basin will be the most downstream one; grid weights of all upstream subbasins will be added automatically.")

if not(doall):
    key_colname = "HRU_ID"    # overwrite any setting made for this column name in case doall is not set

del parser, args


# better not to change these
crs_lldeg = 4326        # EPSG id of lat/lon (deg) coordinate reference system (CRS)
crs_caea  = 3573        # EPSG id of equal-area    coordinate reference system (CRS)


def create_gridcells_from_centers(lat, lon):

    # create array of edges where (x,y) are always center cells
    nlon = np.shape(lon)[1]
    nlat = np.shape(lat)[0]
    lonh = np.empty((nlat+1,nlon+1), dtype=float)
    lath = np.empty((nlat+1,nlon+1), dtype=float)
    tmp1 = [ [ (lat[ii+1,jj+1]-lat[ii,jj])/2 for jj in range(nlon-1) ] + [ (lat[ii+1,nlon-1]-lat[ii,nlon-2])/2 ] for ii in range(nlat-1) ]
    tmp2 = [ [ (lon[ii+1,jj+1]-lon[ii,jj])/2 for jj in range(nlon-1) ] + [ (lon[ii+1,nlon-1]-lon[ii,nlon-2])/2 ] for ii in range(nlat-1) ]
    dlat = np.array(tmp1 + [ tmp1[-1] ])
    dlon = np.array(tmp2 + [ tmp2[-1] ])
    lonh[0:nlat,0:nlon] = lon - dlon
    lath[0:nlat,0:nlon] = lat - dlat

    # make lat and lon one column and row wider such that all cells are enclosed by edges
    lonh[nlat,0:nlon] = lonh[nlat-1,0:nlon] + (lonh[nlat-1,0:nlon] - lonh[nlat-2,0:nlon])
    lath[nlat,0:nlon] = lath[nlat-1,0:nlon] + (lath[nlat-1,0:nlon] - lath[nlat-2,0:nlon])
    lonh[0:nlat,nlon] = lonh[0:nlat,nlon-1] + (lonh[0:nlat,nlon-1] - lonh[0:nlat,nlon-2])
    lath[0:nlat,nlon] = lath[0:nlat,nlon-1] + (lath[0:nlat,nlon-1] - lath[0:nlat,nlon-2])
    lonh[nlat,nlon]   = lonh[nlat-1,nlon-1] + (lonh[nlat-1,nlon-1] - lonh[nlat-2,nlon-2])
    lath[nlat,nlon]   = lath[nlat-1,nlon-1] + (lath[nlat-1,nlon-1] - lath[nlat-2,nlon-2])

    return [lath,lonh]

def shape_to_geometry(shape_from_jsonfile,epsg=None):

    # converts shape read from shapefile to geometry
    # epsg :: integer EPSG code

    ring_shape = ogr.Geometry(ogr.wkbLinearRing)

    for ii in shape_from_jsonfile:
        ring_shape.AddPoint_2D(ii[0],ii[1])
    # close ring
    ring_shape.AddPoint_2D(shape_from_jsonfile[0][0],shape_from_jsonfile[0][1])

    poly_shape = ogr.Geometry(ogr.wkbPolygon)
    poly_shape.AddGeometry(ring_shape)

    if not( epsg is None):
        source = osr.SpatialReference()
        source.ImportFromEPSG(crs_lldeg)       # usual lat/lon projection

        target = osr.SpatialReference()
        target.ImportFromEPSG(epsg)       # any projection to convert to

        transform = osr.CoordinateTransformation(source, target)
        poly_shape.Transform(transform)

    return poly_shape

def check_proximity_of_envelops(gridcell_envelop, shape_envelop):

    # checks if two envelops are in proximity (intersect)

    # minX  --> env[0]
    # maxX  --> env[1]
    # minY  --> env[2]
    # maxY  --> env[3]

    if  ((gridcell_envelop[0] <= shape_envelop[1]) and (gridcell_envelop[1] >= shape_envelop[0]) and
         (gridcell_envelop[2] <= shape_envelop[3]) and (gridcell_envelop[3] >= shape_envelop[2])):

        grid_is_close = True

    else:

        grid_is_close = False

    return grid_is_close

def check_gridcell_in_proximity_of_shape(gridcell_edges, shape_from_jsonfile):

    # checks if a grid cell falls into the bounding box of the shape
    # does not mean it intersects but it is a quick and cheap way to
    # determine cells that might intersect

    # gridcell_edges = [(lon1,lat1),(lon2,lat2),(lon3,lat3),(lon4,lat4)]
    # shape_from_jsonfile

    min_lat_cell  = np.min([ii[1] for ii in gridcell_edges])
    max_lat_cell  = np.max([ii[1] for ii in gridcell_edges])
    min_lon_cell  = np.min([ii[0] for ii in gridcell_edges])
    max_lon_cell  = np.max([ii[0] for ii in gridcell_edges])

    lat_shape = np.array([ icoord[1] for icoord in shape_from_jsonfile ])     # is it lat???
    lon_shape = np.array([ icoord[0] for icoord in shape_from_jsonfile ])     # is it lon???

    min_lat_shape = np.min(lat_shape)
    max_lat_shape = np.max(lat_shape)
    min_lon_shape = np.min(lon_shape)
    max_lon_shape = np.max(lon_shape)

    if  ((min_lat_cell <= max_lat_shape) and (max_lat_cell >= min_lat_shape) and
         (min_lon_cell <= max_lon_shape) and (max_lon_cell >= min_lon_shape)):

        grid_is_close = True

    else:

        grid_is_close = False

    return grid_is_close


def derive_2D_coordinates(lat_1D, lon_1D):

    # nlat = np.shape(lat_1D)[0]
    # nlon = np.shape(lon_1D)[0]

    # lon_2D =              np.array([ lon_1D for ilat in range(nlat) ], dtype=float32)
    # lat_2D = np.transpose(np.array([ lat_1D for ilon in range(nlon) ], dtype=float32))

    lon_2D =              np.tile(lon_1D, (lat_1D.size, 1))
    lat_2D = np.transpose(np.tile(lat_1D, (lon_1D.size, 1)))

    return lat_2D, lon_2D


if ( Path(input_file).suffix == '.nc'):

    # -------------------------------
    # Read NetCDF
    # -------------------------------
    print(' ')
    print('   (1) Reading NetCDF (grid) data ...')

    nc_in = nc4.Dataset(input_file, "r")
    lon      = nc_in.variables[varname[0]][:]
    lon_dims = nc_in.variables[varname[0]].dimensions
    lat      = nc_in.variables[varname[1]][:]
    lat_dims = nc_in.variables[varname[1]].dimensions
    nc_in.close()


    if len(lon_dims) == 1 and len(lat_dims) == 1:

        # in case coordinates are only 1D (regular grid), derive 2D variables

        print('   >>> Generate 2D lat and lon fields. Given ones are 1D.')

        lat, lon = derive_2D_coordinates(lat,lon)
        lon_dims_2D = lat_dims + lon_dims
        lat_dims_2D = lat_dims + lon_dims
        lon_dims = lon_dims_2D
        lat_dims = lat_dims_2D

    elif len(lon_dims) == 2 and len(lat_dims) == 2:

        # Raven numbering is (numbering starts with 0 though):
        #
        #      [      1      2      3   ...     1*nlon
        #        nlon+1 nlon+2 nlon+3   ...     2*nlon
        #           ...    ...    ...   ...     ...
        #           ...    ...    ...   ...  nlat*nlon ]
        #
        # --> Making sure shape of lat/lon fields is like that
        #

        if np.all(np.array(lon_dims) == dimname[::1]):
            lon = np.transpose(lon)
            print('   >>> switched order of dimensions for variable "{0}"'.format(varname[0]))
        elif np.all(np.array(lon_dims) == dimname[::-1]):
            print('   >>> order of dimensions correct for variable "{0}"'.format(varname[0]))
        else:
            print('   >>> Dimensions found {0} do not match the dimension names specified with (-d): {1}'.format(lon_dims,dimname))
            raise ValueError('STOP')

        if np.all(np.array(lat_dims) == dimname[::1]):
            lat = np.transpose(lat)
            print('   >>> switched order of dimensions for variable "{0}"'.format(varname[1]))
        elif np.all(np.array(lat_dims) == dimname[::-1]):
            print('   >>> order of dimensions correct for variable "{0}"'.format(varname[1]))
        else:
            print('   >>> Dimensions found {0} do not match the dimension names specified with (-d): {1}'.format(lat_dims,dimname))
            raise ValueError('STOP')

    else:

        raise ValueError(
            "The coord variables must have the same number of dimensions (either 1 or 2)"
        )

    lath, lonh    = create_gridcells_from_centers(lat, lon)

    nlon       = np.shape(lon)[1]
    nlat       = np.shape(lat)[0]
    nshapes    = nlon * nlat

elif ( Path(input_file).suffix == '.shp'):

    # -------------------------------
    # Read Shapefile
    # -------------------------------
    print(' ')
    print('   (1) Reading Shapefile (grid) data ...')

    model_grid_shp     = gpd.read_file(input_file)
    model_grid_shp     = model_grid_shp.to_crs(epsg=crs_caea)           # WGS 84 / North Pole LAEA Canada

    nshapes    = model_grid_shp.geometry.count()    # number of shapes in model "discretization" shapefile (i.e. model grid-cells; not basin-discretization shapefile)
    nlon       = 1        # only for consistency
    nlat       = nshapes  # only for consistency

else:

    print("File extension found: {}".format(input_file.split('.')[-1]))
    raise ValueError('Input file needs to be either NetCDF (*.nc) or a Shapefile (*.shp).')


# -------------------------------
# Read Basin shapes and all subbasin-shapes (from toolbox)
# -------------------------------
print(' ')
print('   (2) Reading shapefile data ...')

shape     = gpd.read_file(routinginfo)
# shape     = shape.to_crs(epsg=crs_lldeg)        # this is lat/lon in degree
shape     = shape.to_crs(epsg=crs_caea)           # WGS 84 / North Pole LAEA Canada

# check that key column contains only unique values
keys = np.array(list(shape[key_colname]))
# keys_uniq = np.unique(keys)
# if len(keys_uniq) != len(keys):
#     raise ValueError("The attribute of the shapefile set to contain only unique identifiers ('{}') does contain duplicate keys. Please specify another column (option -c '<col_name>') and use the option to process all records contained in the shapefile (-a).".format(key_colname))


# select only relevant basins/sub-basins
if not(doall):

    if not(basin is None):    # if gauge ID is given

        basins     = [ bb.strip() for bb in basin.split(',') ]
        idx_basins = [ list(np.where(shape['Obs_NM']==bb)[0]) for bb in basins ]

        # find corresponding SubId
        SubId = [int(shape.loc[idx_basin].SubId.iloc[0]) for idx_basin in idx_basins]   # np.int was removed in NumPy 1.24; use the first matching row
        print("   >>> found gauge at SubId = ",SubId)

    if not(SubId is None): # if SubId is given or was derived from the gauge ID

        old_SubIds = []
        for SI in SubId:

            old_SubId     = []
            new_SubId     = [ SI ]

            while len(new_SubId) > 0:

                old_SubId.append(new_SubId)
                new_SubId = [ list(shape.loc[(np.where(shape['DowSubId']==ii))[0]].SubId) for ii in new_SubId ]  # find all upstream catchments of these new basins
                new_SubId = list(np.unique([item for sublist in new_SubId for item in sublist])) # flatten list and make entries unique

            old_SubId   = np.array([item for sublist in old_SubId for item in sublist],dtype=int)  # flatten list
            old_SubIds += list(old_SubId)

        old_SubIds = list( np.sort(np.unique(old_SubIds)) )

        idx_basins = [ list(np.where(shape['SubId']==oo)[0]) for oo in old_SubIds ]
        idx_basins = [ item for sublist in idx_basins for item in sublist ]  # flatten list
        idx_basins = list(np.unique(idx_basins))                             # getting only unique list indexes

else: # all HRUs to be processed

    idx_basins = list(np.arange(0,len(shape)))


# make sure HRUs are only once in this list
hrus = np.array( shape.loc[idx_basins][key_colname] ) #[sort_idx]

idx_basins_unique = []
hrus_unique       = []
for ihru,hru in enumerate(hrus):

    if not( hru in hrus_unique ):

        hrus_unique.append(hrus[ihru])
        idx_basins_unique.append(idx_basins[ihru])

idx_basins = idx_basins_unique
hrus       = hrus_unique


# order according to values in "key_colname"; just to make sure outputs will be sorted in the end
sort_idx = np.argsort(shape.loc[idx_basins][key_colname])
print('   >>> HRU_IDs found = ',list(np.array( shape.loc[idx_basins][key_colname] )[sort_idx]),'  (total: ',len(idx_basins),')')

# reduce the shapefile dataset now to only what we will need
shape     = shape.loc[np.array(idx_basins)[sort_idx]]

# indexes of all lines in df
keys       = shape.index
nsubbasins = len(keys)

# initialize
coord_catch_wkt = {}

# loop over all subbasins and transform coordinates into equal-area projection
for kk in keys:

    ibasin = shape.loc[kk]

    poly                = ibasin.geometry
    try:
        coord_catch_wkt[kk] = ogr.CreateGeometryFromWkt(poly.to_wkt())
    except AttributeError:
        # geometries without a to_wkt() method expose the WKT string as .wkt
        coord_catch_wkt[kk] = ogr.CreateGeometryFromWkt(poly.wkt)

# -------------------------------
# construct all grid cell polygons
# -------------------------------
if ( Path(input_file).suffix == '.nc'):

    print(' ')
    print('   (3) Generate shapes for NetCDF grid cells ...')

    grid_cell_geom_gpd_wkt_ea = [ [ [] for ilon in range(nlon) ] for ilat in range(nlat) ]
    if dojson:
        grid_cell_geom_gpd_wkt_ll = [ [ [] for ilon in range(nlon) ] for ilat in range(nlat) ]
    for ilat in range(nlat):
        if ilat%10 == 0:
            print('   >>> Latitudes done: {0} of {1}'.format(ilat,nlat))

        for ilon in range(nlon):

            # -------------------------
            # EPSG:3035   needs a swap before and after transform ...
            # -------------------------
            # gridcell_edges = [ [lath[ilat,ilon]    , lonh[ilat,  ilon]    ],            # for some reason need to switch lat/lon that transform works
            #                    [lath[ilat+1,ilon]  , lonh[ilat+1,ilon]    ],
            #                    [lath[ilat+1,ilon+1], lonh[ilat+1,ilon+1]  ],
            #                    [lath[ilat,ilon+1]  , lonh[ilat,  ilon+1]  ]]

            # tmp = shape_to_geometry(gridcell_edges, epsg=crs_caea)
            # tmp.SwapXY()              # switch lat/lon back
            # grid_cell_geom_gpd_wkt_ea[ilat][ilon] = tmp

            # -------------------------
            # EPSG:3573   does not need a swap after transform ... and is much faster than transform with EPSG:3035
            # -------------------------
            #
            # Windows            Python 3.8.5 GDAL 3.1.3 --> lat/lon (Ming)
            # MacOS 10.15.6      Python 3.8.5 GDAL 3.1.3 --> lat/lon (Julie)
            # Graham             Python 3.8.2 GDAL 3.0.4 --> lat/lon (Julie)
            # Graham             Python 3.6.3 GDAL 2.2.1 --> lon/lat (Julie)
            # Ubuntu 18.04.2 LTS Python 3.6.8 GDAL 2.2.3 --> lon/lat (Etienne)
            #
            if osgeo_version < '3.0':
                gridcell_edges = [ [lonh[ilat,  ilon]   , lath[ilat,ilon]      ],            # for some reason need to switch lat/lon that transform works
                                   [lonh[ilat+1,ilon]   , lath[ilat+1,ilon]    ],
                                   [lonh[ilat+1,ilon+1] , lath[ilat+1,ilon+1]  ],
                                   [lonh[ilat,  ilon+1] , lath[ilat,ilon+1]    ]]
            else:
                gridcell_edges = [ [lath[ilat,ilon]     , lonh[ilat,  ilon]    ],            # for some reason lat/lon order works
                                   [lath[ilat+1,ilon]   , lonh[ilat+1,ilon]    ],
                                   [lath[ilat+1,ilon+1] , lonh[ilat+1,ilon+1]  ],
                                   [lath[ilat,ilon+1]   , lonh[ilat,  ilon+1]  ]]

            tmp = shape_to_geometry(gridcell_edges, epsg=crs_caea)
            grid_cell_geom_gpd_wkt_ea[ilat][ilon] = tmp

            if dojson:
                tmp = shape_to_geometry(gridcell_edges)
                grid_cell_geom_gpd_wkt_ll[ilat][ilon] = tmp

elif ( Path(input_file).suffix == '.shp'):

    # -------------------------------
    # Grid-cells are actually polygons in a shapefile
    # -------------------------------
    print(' ')
    print('   (3) Extract shapes from shapefile ...')

    grid_cell_geom_gpd_wkt_ea = [ [ [] for ilon in range(nlon) ] for ilat in range(nlat) ]   # nlat = nshapes, nlon = 1
    for ishape in range(nshapes):

        idx = np.where(model_grid_shp[key_colname_model] == ishape)[0]
        if len(idx) == 0:
            print("Polygon ID = {} not found in '{}'. Numbering of shapefile attribute '{}' needs to be [0 ... {}-1].".format(ishape,input_file,key_colname_model,nshapes))
            raise ValueError('Polygon ID not found.')
        if len(idx) > 1:
            print("Polygon ID = {} found multiple times in '{}' but needs to be unique. Numbering of shapefile attribute '{}' needs to be [0 ... {}-1].".format(ishape,input_file,key_colname_model,nshapes))
            raise ValueError('Polygon ID not unique.')
        idx  = idx[0]
        poly = model_grid_shp.loc[idx].geometry
        try:
            grid_cell_geom_gpd_wkt_ea[ishape][0] = ogr.CreateGeometryFromWkt(poly.to_wkt())
        except AttributeError:  # geometry object has no to_wkt() method; use the wkt property instead
            grid_cell_geom_gpd_wkt_ea[ishape][0] = ogr.CreateGeometryFromWkt(poly.wkt)

else:

    print("File extension found: {}".format(input_file.split('.')[-1]))
    raise ValueError('Input file needs to be either NetCDF (*.nc) or a Shapefile (*.shp).')

# -------------------------------
# Derive overlay and calculate weights
# -------------------------------
print(' ')
print('   (4) Deriving weights ...')

filename = output_file
ff       = open(filename,'w')
ff.write(':GridWeights                     \n')
ff.write('   #                                \n')
ff.write('   # [# HRUs]                       \n')
ff.write('   :NumberHRUs       {0}            \n'.format(nsubbasins))
ff.write('   :NumberGridCells  {0}            \n'.format(nshapes))
ff.write('   #                                \n')
ff.write('   # [HRU ID] [Cell #] [w_kl]       \n')

if dojson:
    cells_to_write_to_geojson = []
    geojson = []

error_dict = {}
for ikk,kk in enumerate(keys):

    ibasin = shape.loc[kk]

    area_basin = coord_catch_wkt[kk].Area()
    enve_basin = coord_catch_wkt[kk].GetEnvelope()   # bounding box around basin (for easy check of proximity)

    area_all = 0.0
    ncells   = 0

    data_to_write = []
    for ilat in range(nlat):
        for ilon in range(nlon):

            enve_gridcell  = grid_cell_geom_gpd_wkt_ea[ilat][ilon].GetEnvelope()   # bounding box around grid-cell (for easy check of proximity)
            grid_is_close  = check_proximity_of_envelops(enve_gridcell, enve_basin)

            if grid_is_close: # this check decreases runtime DRASTICALLY (from ~6h to ~1min)

                grid_cell_area = grid_cell_geom_gpd_wkt_ea[ilat][ilon].Area()

                inter = (grid_cell_geom_gpd_wkt_ea[ilat][ilon].Buffer(0.0)).Intersection(coord_catch_wkt[kk].Buffer(0.0)) # "fake" buffer to avoid invalid polygons and weirdos dumped by ArcGIS
                area_intersect = inter.Area()

                area_all += area_intersect
                if area_intersect > 0:

                    ncells += 1
                    cell_ID = ilat*nlon+ilon
                    data_to_write.append( [int(ibasin[key_colname]),ilat,ilon,cell_ID,area_intersect/area_basin] )

                    if dojson:
                        if cell_ID not in cells_to_write_to_geojson:
                            cells_to_write_to_geojson.append( cell_ID )
                            tmp = grid_cell_geom_gpd_wkt_ll[ilat][ilon].Buffer(0.0).ExportToJson()
                            geojson.append({"type":"Feature","geometry":gjs.loads(tmp),"properties":{"CellId":cell_ID,"Area":grid_cell_area}})

    # mismatch between area of subbasin (shapefile) and sum of all contributions of grid cells (model output)
    error = (area_basin - area_all)/area_basin


    if abs(error) > area_error_threshold and area_basin > 500000.:
        # record all basins with errors larger than area_error_threshold (e.g. 5%), if the basin is larger than 0.5 km2
        error_dict[int(ibasin[key_colname])] = [ error, area_basin ]
        for idata in data_to_write:
            print("   >>> {0},{1},{2},{3},{4}".format(idata[0],idata[1],idata[2],idata[3],idata[4]))
            ff.write("   {0}   {1}   {2}\n".format(idata[0],idata[3],idata[4]))

    else:
        # adjust such that weights sum up to 1.0
        for idata in data_to_write:
            corrected = idata[4] * 1.0/(1.0-error)
            print("   >>> {0},{1},{2},{3},{4}  (corrected to {5})".format(idata[0],idata[1],idata[2],idata[3],idata[4],corrected))
            ff.write("   {0}   {1}   {2}\n".format(idata[0],idata[3],corrected))

        if error < 1.0:
            area_all *= 1.0/(1.0-error)
        error    = 0.0

    print('   >>> (Sub-)Basin: {0} ({1} of {2})'.format(int(ibasin[key_colname]),ikk+1,nsubbasins))
    print('   >>> Derived area of {0}  cells: {1}'.format(ncells,area_all))
    print('   >>> Read area from shapefile:   {0}'.format(area_basin))
    print('   >>> error:                      {0}%'.format(error*100.))
    print('   ')

ff.write(':EndGridWeights \n')
ff.close()

# write GeoJSON
if dojson:

    json_file = '.'.join(filename.split('.')[0:-1])+".json"

    geojson = {"type":"FeatureCollection","features":geojson}
    with open(json_file, 'w') as outfile:
      gjs.dump(geojson, outfile)



# print out all subbasins that have large errors
if (error_dict != {}):
    print('')
    print('WARNING :: The following (sub-)basins show large mismatches between provided model')
    print('           grid and domains specified in the shapefile. It seems that your model')
    print('           output is not covering the entire domain!')
    print("           { <basin-ID>: <error>, ... } = ")
    for attribute, value in error_dict.items():
        print('               {0} : {1:6.2f} % of {2:8.1f} km2 basin'.format(attribute, value[0]*100., value[1]/1000./1000.))
    print('')

print('')
print('Wrote: ',filename)
if dojson:
    print('Wrote: ',json_file)
print('')

**References**

Han, M., Mai, J., Tolson, B. A., Craig, J. R., Gaborit, É., Liu, H., and Lee, K. (2020a):
Subwatershed-based lake and river routing products for hydrologic and land surface models applied over Canada
Canadian Water Resources Journal, 0, 1-15. (publication)

Han, M. et al. (2020b):
An automated GIS toolbox for watershed delineation with lakes
In preparation.

Han, M., Mai, J., Tolson, B. A., Craig, J. R., Gaborit, É., Liu, H., and Lee, K. (2020c):
A catchment-based lake and river routing product for hydrologic and land surface models in Canada (Dataset)
Zenodo. (dataset)

### <font color=#5559AB> 3c) DayMet Data</font>

This section downloads Daymet data for the years of interest, merges the years together, and formats the RVT file for the Raven model.
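
The download step requests one time series per sample point from the Daymet single-pixel service. The sketch below shows how such a request URL can be assembled with the standard library. The endpoint and query keys (`lat`, `lon`, `vars`, `start`, `end`) are the ones used by the workflow code in this section; the function name `build_daymet_url` and the example coordinates are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint used by the workflow's download function
BASE_URL = "https://daymet.ornl.gov/single-pixel/api/data"

def build_daymet_url(lat, lon, variables, start_date, end_date):
    """Build a single-pixel request URL (hypothetical helper for illustration)."""
    query = urlencode(
        {
            "lat": lat,
            "lon": lon,
            "vars": ",".join(variables),  # comma-separated variable abbreviations
            "start": start_date,          # YYYY-MM-DD
            "end": end_date,              # YYYY-MM-DD
        },
        safe=",",  # keep the commas in the variable list unescaped
    )
    return f"{BASE_URL}?{query}"

# Example point (made-up coordinates) for the years 2000 to 2005
url = build_daymet_url(43.5, -80.5, ["prcp", "tmax", "tmin"], "2000-01-01", "2005-12-31")
print(url)
```

Using `urlencode` rather than manual string joining escapes any unexpected characters in the parameter values.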

Daymet provides long-term, continuous, gridded estimates of daily weather and climatology variables produced on a 1 km x 1 km gridded surface by interpolating and extrapolating ground-based observations through statistical modeling techniques.

Only a <font color=red>shapefile of the study area</font> is required.

Areas included in the dataset: <font color=blue>North America</font>

Check out the short video [4.1 DayMet Data - Magpie Workflow](https://youtu.be/wi6lUwNJ2d0) for more information.
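
Each sample point ends up as one RVT file containing a `:MultiData` block. The sketch below reproduces the layout written by the workflow code in this section (start timestamp, a time step of 1.0 day, the record count, then the parameter and unit lines). The function name `format_multidata` is hypothetical and the data values are made up for illustration.

```python
from datetime import date

def format_multidata(start, params, units, rows):
    """Return the text of a daily :MultiData block (hypothetical helper)."""
    lines = [
        ":MultiData",
        # start timestamp, time step in days, number of records
        f"{start.isoformat()} 00:00:00 1.0 {len(rows)}",
        ":Parameters," + ", ".join(params),
        ":Units," + ", ".join(units),
    ]
    # one comma-separated line per day, in the same order as the parameters
    lines += [", ".join(str(value) for value in row) for row in rows]
    lines.append(":EndMultiData")
    return "\n".join(lines)

block = format_multidata(
    date(2000, 1, 1),
    ["PRECIP", "TEMP_DAILY_MAX", "TEMP_DAILY_MIN"],
    ["mm/d", "degC", "degC"],
    [(0.0, -1.5, -9.0), (2.3, 0.5, -6.5)],  # made-up values
)
print(block)
```

The column order of the data rows has to match the order of the `:Parameters` line, which is why both are built from the same lists.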

**Daymet Variables**

Parameter | Abbr | Units | Description
--- | --- | --- | ---
Precipitation | prcp | mm | Daily total precipitation in millimeters. Sum of all forms of precipitation converted to a water-equivalent depth.
Shortwave radiation | srad | W/m2 | Incident shortwave radiation flux density in watts per square meter, taken as an average over the daylight period of the day.
Maximum air temperature | tmax | degrees C | Daily maximum 2 m air temperature in degrees Celsius.
Minimum air temperature | tmin | degrees C | Daily minimum 2 m air temperature in degrees Celsius.
Water vapor pressure | vp | Pa | Daily average partial pressure of water vapor in pascals.
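
The raw Daymet units differ from the units this workflow writes for Raven: shortwave radiation is converted from W/m2 to MJ/m^2/d, and vapor pressure (vp, in Pa, which the workflow can also download) is converted to relative humidity as a 0..1 fraction. The sketch below mirrors the constants used in the processing code of this section. The function names are hypothetical, the 0.0864 factor treats the flux as a 24-hour mean as the workflow code does, and capping relative humidity at 1.0 is a choice made for this example.

```python
import math

W_M2_TO_MJ_M2_D = 0.0864  # 86400 s per day divided by 1e6 J per MJ

def srad_to_mj_per_day(srad_w_m2):
    """Convert a mean flux in W/m2 to a daily total in MJ/m2/d."""
    return round(srad_w_m2 * W_M2_TO_MJ_M2_D, 2)

def saturation_vapor_pressure_pa(t_mean_c):
    """Saturation vapor pressure in Pa from mean air temperature in degrees C."""
    # 6.11 hPa reference value, converted to Pa by the factor 100
    return 6.11 * math.exp(17.27 * t_mean_c / (237.3 + t_mean_c)) * 100.0

def relative_humidity(vp_pa, tmax_c, tmin_c):
    """Relative humidity as a fraction (0..1) from vapor pressure and daily temperatures."""
    t_mean = (tmax_c + tmin_c) / 2.0
    rh = vp_pa / saturation_vapor_pressure_pa(t_mean)
    return min(rh, 1.0)  # cap at saturation (assumption of this sketch)

print(srad_to_mj_per_day(100.0))
print(relative_humidity(1200.0, 25.0, 15.0))
```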

In [None]:
# check libraries
libraries_to_check = ["rasterstats==0.19.0", "requests","earthengine-api"]
check_and_install_libraries(libraries_to_check)

import ee
import requests
import rasterstats as rs
from shapely.geometry import Point

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions used in this section, making them available for use in the notebook. <br>

# Authenticate with the Magpie service account (no interactive login required).
service_account = 'magpie-developer@magpie-id-409519.iam.gserviceaccount.com'
credentials = ee.ServiceAccountCredentials(service_account, os.path.join(main_dir,'extras','magpie-key.json'))

# Initialize the library.
ee.Initialize(credentials)

def read_and_sample_data(main_dir, day_indiv_dir, point_distance, buffer_distance):
    # Find name of shapefile
    shp_file_path = os.path.join(main_dir, 'shapefile')
    shp_file_name = next((f for f in os.listdir(shp_file_path) if f.endswith(".shp")), None)
    if not shp_file_name:
        raise FileNotFoundError("No shapefile found in the specified directory.")

    # Read the original shapefile
    shapefile_gdf = gpd.read_file(os.path.join(shp_file_path, shp_file_name))
    print("Original CRS:", shapefile_gdf.crs)

    # Reproject shapefile to EPSG:4326
    shapefile_gdf = shapefile_gdf.to_crs(epsg=4326)
    print("Reprojected CRS:", shapefile_gdf.crs)

    # Create a buffer around the shapefile (buffer_distance is in degrees, since the CRS is now EPSG:4326)
    buffered_shapefile = shapefile_gdf.geometry.buffer(buffer_distance)

    # Get the bounding box of the buffered shapefile
    minx, miny, maxx, maxy = buffered_shapefile.total_bounds

    # Generate a grid of points within the bounding box
    x_coords = np.arange(minx, maxx, point_distance)
    y_coords = np.arange(miny, maxy, point_distance)
    points = [Point(x, y) for x in x_coords for y in y_coords]

    grid_gdf = gpd.GeoDataFrame(geometry=points, crs="EPSG:4326")
    grid_gdf = grid_gdf[grid_gdf.within(buffered_shapefile.unary_union)]

    # Extract lat and lon from geometry
    grid_gdf['lon'] = grid_gdf.geometry.x
    grid_gdf['lat'] = grid_gdf.geometry.y

    # Save selected points to CSV
    grid_gdf[['lon', 'lat']].to_csv(os.path.join(day_indiv_dir, 'selected_lat_lon.csv'), index=False)

    # Plotting
    fig, ax = plt.subplots()
    shapefile_gdf.plot(ax=ax, color='gray', edgecolor='black', alpha=0.5, label='Original Shapefile')
    gpd.GeoDataFrame(geometry=buffered_shapefile, crs="EPSG:4326").plot(ax=ax, color='blue', edgecolor='black', alpha=0.3, label='Buffered Shapefile')
    grid_gdf.plot(ax=ax, color='red', markersize=10, label='Selected Points (Grid)')
    plt.title('Generated Grid Points over Shapefile (Clipped)')
    plt.legend()
    plt.show()

def download_daymet_data(lat, lon, variables, start_date, end_date, output_dir):
    base_url = 'https://daymet.ornl.gov/single-pixel/api/data'

    # Prepare parameters for the request
    params = {
        'lat': lat,
        'lon': lon,
        'vars': ','.join(variables),
        'start': start_date,
        'end': end_date
    }

    # Construct the URL
    url = f"{base_url}?{'&'.join([f'{key}={value}' for key, value in params.items()])}"

    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Download data using wget
    command = f"wget --content-disposition '{url}' -P {output_dir}"
    os.system(command)


def process_csv_files(csv_files_dir, min_year_val, max_year_val, params, units_lst, daymet_ind_dir):
    count = -1

    for csv_file in os.listdir(csv_files_dir):
        if csv_file.endswith(".csv"):  # Assuming only CSV files are processed; adjust if needed
            count += 1

            # Load the CSV file
            forc_input = pd.read_csv(os.path.join(csv_files_dir, csv_file), skiprows=6)

            # Build a date column from the year and the (zero-padded) day of year, then a complete date range
            forc_input['date'] = pd.to_datetime(forc_input['year'].astype(str) + forc_input['yday'].astype(str).str.zfill(3), format='%Y%j')
            full_date_range = pd.date_range(start=f"{min_year_val}-01-01", end=f"{max_year_val}-12-31", freq='D')

            # Reindex to include all dates (Daymet uses a 365-day calendar and omits 31 December in leap years)
            forc_input = forc_input.set_index('date').reindex(full_date_range).reset_index()
            forc_input.rename(columns={'index': 'date'}, inplace=True)

            # Fill missing year and yday columns
            forc_input['year'] = forc_input['date'].dt.year
            forc_input['yday'] = forc_input['date'].dt.dayofyear

            # Fill the gaps created by reindexing with the previous day's values (adjust as needed)
            forc_input.ffill(inplace=True)  # fillna(method='ffill') is deprecated in recent pandas

            if 'TEMP_DAILY_MAX' in params:
              forc_input['tmax (deg c)'] = forc_input['tmax (deg c)'].replace(-0.0, 0.0)
            if 'TEMP_DAILY_MIN' in params:
              forc_input['tmin (deg c)'] = forc_input['tmin (deg c)'].replace(-0.0, 0.0)
            if 'SW_RADIA' in params:
              forc_input['srad (W/m^2)'] = forc_input['srad (W/m^2)'].multiply(0.0864).round(2)
            if 'REL_HUMIDITY' in params:
              if 'tmax (deg c)' in forc_input and 'tmin (deg c)' in forc_input:
                # Calculate mean temperature
                forc_input['meanT'] = (forc_input['tmax (deg c)'] + forc_input['tmin (deg c)']) / 2

                # Calculate saturation vapor pressure and convert it to Pa
                forc_input['SatVap'] = 6.11 * np.exp((17.27 * forc_input['meanT']) / (237.3 + forc_input['meanT'])) * 100

                # Calculate relative humidity as a fraction (0..1); the result overwrites the 'vp (Pa)' column
                forc_input['vp (Pa)'] = (forc_input['vp (Pa)'] / forc_input['SatVap']).round(2)

                # Set relative humidity values that are exactly 0 or exceed 1 (saturation) to 1
                forc_input['vp (Pa)'] = np.where((forc_input['vp (Pa)'] == 0) | (forc_input['vp (Pa)'] > 1), 1, forc_input['vp (Pa)'])

                # Drop the helper columns as they are no longer needed
                forc_input.drop(['meanT', 'SatVap'], axis=1, inplace=True)
              else:
                print('Min and max temperature is required to compute relative humidity')

            rm_col_df = forc_input.drop(['year', 'yday', 'date'], axis=1)
            t3 = '\n'.join(rm_col_df.astype(str).apply(lambda x: ', '.join(x), axis=1))

            output_file_path = os.path.join(daymet_ind_dir, f"File{count}.rvt")
            with open(output_file_path, "a") as f:
                print(":MultiData", file=f)
                print(f"{min_year_val}-01-01 00:00:00 1.0 {len(forc_input.iloc[:, 0])}", file=f)
                print(f":Parameters,{params}", file=f)
                print(f":Units,{units_lst}", file=f)
                print(f"{t3}", file=f)
                print(":EndMultiData", file=f)

            print(f'Formatted DayMet data for Raven input saved to {output_file_path}')

def check_projection(shp_file_path, main_dir, temporary_dir):
    # Print header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Check projection of shapefile')
    print('-----------------------------------------------------------------------------------------------')

    # Find name of shapefile (stays None if the directory contains no .shp file)
    shp_file_name = None
    for shp_file in os.listdir(shp_file_path):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Check if a shapefile was found
    if not shp_file_name:
        print('No shapefile found in the specified directory.')
        return

    # Print shapefile information
    print('Shapefile name: ', shp_file_name)
    shp_file_full_path = os.path.join(main_dir, "shapefile", shp_file_name)
    print('Shapefile path: ', shp_file_full_path)

    # Read shapefile
    shp_lyr_check = gpd.read_file(shp_file_full_path)
    print('Shapefile CRS: ', shp_lyr_check.crs)

    # Check if CRS is different from EPSG:4326
    if shp_lyr_check.crs != 'EPSG:4326':
        # Reproject to EPSG:4326
        shp_lyr_crs = shp_lyr_check.to_crs(epsg=4326)
        shp_lyr_crs.to_file(shp_file_full_path)
        print('Shapefile layer has been reprojected to match EPSG:4326')
    else:
        shp_lyr_crs = shp_lyr_check
        print('Coordinate systems match!')

def download_DEM(shp_file_path, data_source, band_name, scale, temporary_dir):
    # Print header
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Download DEM')
    print('-----------------------------------------------------------------------------------------------')

    # Authenticate and initialize Earth Engine (uncomment if needed)
    # ee.Authenticate()
    # ee.Initialize()

    # Define buffer size
    buffer_size = 0.01

    # Find name of shapefile
    shp_file_path = os.path.join(main_dir, 'shapefile')
    for shp_file in os.listdir(shp_file_path):
        if shp_file.endswith(".shp"):
            shp_file_name = shp_file

    # Determine the boundary of the provided shapefile (first feature) and pad it equally on all sides
    bounds = gpd.read_file(os.path.join(main_dir, 'shapefile', shp_file_name)).bounds
    west, south, east, north = bounds.loc[0]
    dx = buffer_size * (east - west)    # compute the padding before modifying the bounds
    dy = buffer_size * (north - south)
    west -= dx
    east += dx
    south -= dy
    north += dy

    # Create Earth Engine Image and define region
    img = ee.Image(data_source)
    region = ee.Geometry.BBox(west, south, east, north)

    # Get download URL for the specified region and parameters
    url = img.getDownloadUrl({
        'bands': [band_name],
        'region': region,
        'scale': scale,
        'format': 'GEO_TIFF'
    })

    # Define output directory for downloaded DEM
    dem_dir = os.path.join(temporary_dir, 'DEM')
    if not os.path.exists(dem_dir):
        os.makedirs(dem_dir)

    # Download the DEM and save it locally
    response = requests.get(url)
    response.raise_for_status()  # stop here if the download failed
    with open(os.path.join(dem_dir, 'dem.tif'), 'wb') as fd:
        fd.write(response.content)

def extract_elev_data(temporary_dir,day_indiv_dir):
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Extract elevation data')
    print('-----------------------------------------------------------------------------------------------')

    # Find the DEM file name
    dem_files = [file for file in os.listdir(os.path.join(temporary_dir, 'DEM')) if file.endswith(".tif")]
    if not dem_files:
        raise FileNotFoundError("No DEM files found in the specified directory.")

    dem_file_name = dem_files[0]  # Assuming only one DEM file for simplicity

    # Define path/open raster
    dem_path = os.path.join(temporary_dir, 'DEM', dem_file_name)
    print(dem_path)
    dem_lyr = rxr.open_rasterio(dem_path, masked=True).squeeze()

    # Isolate lat and lon variables
    df_coords = pd.read_csv(os.path.join(day_indiv_dir, 'selected_lat_lon.csv'))
    df_coords1 = pd.DataFrame({"lon": df_coords.lon, "lat": df_coords.lat}).reset_index(drop=True)

    # Generate points shapefile of lats and lon
    gdf = gpd.GeoDataFrame(
        df_coords1, geometry=gpd.points_from_xy(df_coords1.lon, df_coords1.lat), crs="EPSG:4326")

    # Create a buffered polygon layer from the sample points
    plots_poly = gdf.copy()

    # Buffer each point by 0.2 (in degrees, since the layer is in EPSG:4326)
    # and replace the point geometry with the new buffered geometry
    plots_poly["geometry"] = gdf.buffer(0.2)

    # Export the buffered point layer as a shapefile to use in zonal stats
    plot_buffer_path = os.path.join(temporary_dir, 'DEM', 'plot_buffer.shp')
    plots_poly.to_file(plot_buffer_path)

    # Extract zonal stats
    elev_zonal = rs.zonal_stats(plot_buffer_path,
                                  dem_lyr.values,
                                  nodata=-999,
                                  affine=dem_lyr.rio.transform(),
                                  geojson_out=True,
                                  copy_properties=True,
                                  stats="mean")

    # Turn extracted data into a pandas geodataframe
    elev_zonal_df = gpd.GeoDataFrame.from_features(elev_zonal)
    df_elev = pd.DataFrame(elev_zonal_df['mean'])
    return df_elev, df_coords1

def generate_RVT_gauges(df_elev, df_coords1,model_name):
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Generate RVT for gauges')
    print('-----------------------------------------------------------------------------------------------')

    with open(os.path.join(main_dir, 'workflow_outputs','RavenInput',model_name+'.rvt'), "a") as f:
        print(f"#------------------------------------------------------------------------", file=f)
        print(f"# Climate Stations List", file=f)
        print(f"#------------------------------------------------------------------------\n#", file=f)

    rvt_path = os.path.join(main_dir, 'workflow_outputs','RavenInput',model_name+'.rvt')

    # Append one :Gauge block per sample point
    with open(rvt_path, "a") as f:
        for n in range(len(df_coords1['lon'])):
            print(f":Gauge File{n}", file=f)
            print(f":Latitude {df_coords1['lat'][n]}", file=f)
            print(f":Longitude {df_coords1['lon'][n]}", file=f)
            print(f":Elevation {df_elev['mean'][n]}", file=f)
            print(f":RedirectToFile DayMet/File{n}.rvt", file=f)
            print(":EndGauge", file=f)
            print("#", file=f)

    f = open(os.path.join(main_dir, 'workflow_outputs','RavenInput','awslist.txt'), "a")
    print(f"#------------------------------------------------------------------------", file=f)
    print(f"# Climate Stations List", file=f)
    print(f"#------------------------------------------------------------------------\n#", file=f)
    f.close()

    lst_vals = list(range(0, len(df_coords1['lon'])))

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))

In [None]:
#@markdown <font color=#5559AB> **Define years of interest** </font>

#@markdown Input the start year (min_year) and the end year (max_year)

min_year = "2000" #@param {type:"string"}
max_year = "2005" #@param {type:"string"}

#@markdown <font color=#5559AB> **Define spacing between sample points (in degrees)** </font>

#@markdown <font color=grey> for example, 0.01

point_distance = 0.01 #@param

#@markdown <font color=#5559AB> **Define buffer size around study area** </font>

#@markdown <font color=grey> for example, 0.05

buffer_distance = 0.01 #@param

# back-up output directory
output_forc_dir = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data', 'forcing_backups')
os.makedirs(output_forc_dir, exist_ok=True)
print("created folder:", output_forc_dir) if not os.path.isdir(output_forc_dir) else None

day_indiv_dir = os.path.join(temporary_dir, 'DayMet_individual_files')
os.makedirs(day_indiv_dir, exist_ok=True)
print("created folder:", day_indiv_dir) if not os.path.isdir(day_indiv_dir) else None

# generate a grid of sample points within the buffered study area
read_and_sample_data(main_dir, day_indiv_dir, point_distance, buffer_distance)

In [None]:
# Lists to store information about variables of interest
variables_of_interest = []
params = []
units = []

#@markdown <font color=#5559AB> **Select variables** </font>

# Daymet variables configuration
prcp_dayMet = True #@param {type:"boolean"}
srad_dayMet = False #@param {type:"boolean"}
tmax_dayMet = True #@param {type:"boolean"}
tmin_dayMet = True #@param {type:"boolean"}
vp_dayMet = False #@param {type:"boolean"}


# Check which Daymet variables to include and populate lists
if prcp_dayMet:
    variables_of_interest.append('prcp')
    params.append('PRECIP')
    units.append('mm/d')
if srad_dayMet:
    variables_of_interest.append('srad')
    params.append('SW_RADIA')
    units.append('MJ/m^2/d')
if tmax_dayMet:
    variables_of_interest.append('tmax')
    params.append('TEMP_DAILY_MAX')
    units.append('degC')
if tmin_dayMet:
    variables_of_interest.append('tmin')
    params.append('TEMP_DAILY_MIN')
    units.append('degC')
if vp_dayMet:
    variables_of_interest.append('vp')
    params.append('REL_HUMIDITY')
    units.append('0..1')

# Create a directory for temporary files
day_indiv_dir = os.path.join(temporary_dir, 'DayMet_individual_files')
os.makedirs(day_indiv_dir, exist_ok=True)

# Read latitude and longitude data from a CSV file
df_lat_lon = pd.read_csv(os.path.join(day_indiv_dir, 'selected_lat_lon.csv'))
latitudes = df_lat_lon['lat'].tolist()
longitudes = df_lat_lon['lon'].tolist()

# Specify Daymet download parameters
start_date = f"{min_year}-01-01"
end_date = f"{max_year}-12-31"

# Create a directory to store downloaded Daymet data
output_directory = os.path.join(temporary_dir, 'csv_files')
os.makedirs(output_directory, exist_ok=True)

# Loop through each location and download Daymet data
for lat, lon in zip(latitudes, longitudes):
    print(f"Downloading data for Latitude: {lat}, Longitude: {lon}")
    download_daymet_data(lat, lon, variables_of_interest, start_date, end_date, output_directory)

print("Data download complete.")


In [None]:
#@markdown <font color=grey> **Generate RVT files for each of the points** </font>

# Join the selected Raven parameter names and units into comma-separated strings
csv_files_directory = output_directory
parameters = ', '.join(params)
units_lst = ', '.join(units)
daymet_output_directory = os.path.join(main_dir, 'workflow_outputs','RavenInput','DayMet')
os.makedirs(daymet_output_directory, exist_ok=True)

process_csv_files(csv_files_directory, min_year, max_year, parameters, units_lst, daymet_output_directory)


In [None]:
#@markdown <font color=#5559AB> **Select Elevation Layer** </font>

#@markdown Either use the existing DEM layer saved in the 1_HRU_data folder, download a MERIT DEM using Google Earth Engine, or upload your own DEM layer.

select_elev_layer = 'use_existing' #@param ["use_existing","download", "upload"] {type:"string"}

# generate elevation data
if select_elev_layer == 'use_existing':
  elev_path = os.path.join(main_dir, 'workflow_outputs', '1_HRU_data')
  df_elev, df_coords1 = extract_elev_data(elev_path,day_indiv_dir)
elif select_elev_layer == 'download':
  # shapefile path
  shp_file_path = os.path.join(main_dir, 'shapefile')
  data_source = "MERIT/DEM/v1_0_3"
  band_name = "dem"
  scale = 90
  check_projection(shp_file_path, main_dir,temporary_dir)
  download_DEM(shp_file_path,data_source, band_name, scale, temporary_dir)
  df_elev, df_coords1 = extract_elev_data(temporary_dir,day_indiv_dir)
elif select_elev_layer == 'upload':
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Upload DEM')
  print('-----------------------------------------------------------------------------------------------')
  # Define temporary directory for DEM
  dem_temp_dir = os.path.join(temporary_dir, 'DEM')
  os.makedirs(dem_temp_dir, exist_ok=True)
  print("Created folder:", dem_temp_dir) if not os.path.isdir(dem_temp_dir) else None
  print('\n')
  print(f'drag-and-drop DEM (.tif) file into following folder: {dem_temp_dir}')
  response = input("Have you uploaded the DEM file (yes or no): ")
  if response == "yes":
    df_elev, df_coords1 = extract_elev_data(temporary_dir,day_indiv_dir)

In [None]:
#@markdown <font color=grey> **Generate Raven RVT input files for DayMet data** </font></br>

generate_RVT_gauges(df_elev, df_coords1,model_name)

In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download


# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File** </font>

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

# Define boolean variables based on user inputs
prcp_dayMet_res = 'yes' if prcp_dayMet else 'no'
srad_dayMet_res = 'yes' if srad_dayMet else 'no'
tmax_dayMet_res = 'yes' if tmax_dayMet else 'no'
tmin_dayMet_res = 'yes' if tmin_dayMet else 'no'
vp_dayMet_res = 'yes' if vp_dayMet else 'no'

# Define years of interest
min_year = "2000"
max_year = "2001"

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False  # @param {type:"boolean"}

# Create a dictionary to store configuration data
data = {
    "_comment": "------------ 3c) DAYMET ---------------",

    "prcp_dayMet": prcp_dayMet_res,
    "tmax_dayMet": tmax_dayMet_res,
    "tmin_dayMet": tmin_dayMet_res,
    "vp_dayMet": vp_dayMet_res,
    "srad_dayMet": srad_dayMet_res,

    "point_distance": point_distance,
    "buffer_distance": buffer_distance,

    "min_year_dayMet": min_year,
    "max_year_dayMet": max_year,

    "select_elev_layer": select_elev_layer,
}

# Specify the file path where you want to save the JSON file
file_path = os.path.join(config_path, "3c_daymet.json")

# Write the data to the JSON file only when the user has ticked the checkbox
if write_to_configuration_file:
    with open(file_path, 'w') as json_file:
        json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


**References**

Thornton, P. E., R. Shrestha, M. Thornton, S.-C. Kao, Y. Wei, and B. E. Wilson. (2022). Gridded daily weather data for North America with comprehensive uncertainty quantification. Scientific Data 8.

### <font color=#5559AB> 3d) Environment Canada Climate Data </font>

This Python subsection downloads Environment Canada climate station records and formats them into RVT files usable in Raven. This subsection will be replaced with "weathercan" at a later date when it is available on CRAN.

Areas included in the dataset: <font color=blue>Canada</font>

Check out the short video [Downloading and Formatting Environment Canada Climate Data for Raven: Magpie Tutorial](https://youtu.be/N1Cw4OgI7bo) for more information.
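
As a rough illustration of the output this subsection produces, the sketch below assembles a `:MultiData` block with the same layout that the functions in this subsection write to each station RVT file. The helper name `format_multidata` and the sample values are made up for this example and are not part of Magpie.

```python
def format_multidata(start, params, units, rows, step_days=1.0):
    """Return a Raven :MultiData block as a string (illustrative helper, not part of Magpie)."""
    lines = [
        ":MultiData",
        # start date, start time, time step in days, number of records
        f"{start} 00:00:00 {step_days} {len(rows)}",
        ":Parameters," + ", ".join(params),
        ":Units," + ", ".join(units),
    ]
    # one comma-separated line of values per time step
    lines += [", ".join(str(v) for v in row) for row in rows]
    lines.append(":EndMultiData")
    return "\n".join(lines)


block = format_multidata(
    start="2016-10-01",
    params=["PRECIP", "TEMP_MAX", "TEMP_MIN"],
    units=["mm/d", "C", "C"],
    rows=[(0.0, 12.5, 3.1), (4.2, 10.0, 1.8)],  # made-up sample values
)
print(block)
```

The number of records in the header line has to match the number of data rows, which is why the helper derives it from `len(rows)`.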

In [None]:
# check libraries
libraries_to_check = ["folium", "branca","requests",
                      "re", "plotly"]
check_and_install_libraries(libraries_to_check)

import folium
import branca
from IPython.display import display
import requests
from bs4 import BeautifulSoup
import re
import plotly.express as px

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below so that they are available to the rest of the notebook. <br>

def fancy_html(row):
    climName = climate_stations_updated['Name'].iloc[row]
    lat_info = climate_stations_updated['Latitude'].iloc[row]
    lon_info = climate_stations_updated['Longitude'].iloc[row]

    html = f"""<!DOCTYPE html>
        <html>
        <p>Climate Station Name: {climName}</p>
        <p>Lat: {lat_info}</p>
        <p>Lon: {lon_info}</p>
        </html>
    """
    return html

def interactive_map_climate_stations(main_dir,climate_stations_updated):
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Interactive map')
    print('-----------------------------------------------------------------------------------------------')

    shp_file_path = os.path.join(main_dir, 'shapefile')

    # Generate map with rectangular guide to assist in identifying the ideal subbasin/gauges to use
    shp_file_name = next((file for file in os.listdir(shp_file_path) if file.endswith(".shp")), None)
    if shp_file_name:
        shp_boundary = gpd.read_file(os.path.join(main_dir, 'shapefile', shp_file_name))
        # Reproject the GeoDataFrame to EPSG:4326 (WGS84)
        shp_boundary = shp_boundary.to_crs(epsg=4326)

        # Get the bounds and extract the center for map initialization
        bounds = shp_boundary.bounds.loc[0]
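        # Center of the bounding box as [latitude, longitude]
        shp_bounds = [(bounds['miny'] + bounds['maxy']) / 2, (bounds['minx'] + bounds['maxx']) / 2]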
        # Create a Folium map centered at the reprojected bounds
        map = folium.Map(location=shp_bounds, zoom_start=10)
        # Add the GeoJSON representation of the geometry to the map
        folium.GeoJson(data=shp_boundary["geometry"]).add_to(map)

        for i in range(len(climate_stations_updated)):
            html = fancy_html(i)
            iframe = branca.element.IFrame(html=html, width=200, height=200)
            popup = folium.Popup(iframe, parse_html=True)
            folium.Marker([climate_stations_updated['Latitude'].iloc[i], climate_stations_updated['Longitude'].iloc[i]],
                          popup=popup).add_to(map)

        display(map)


def station_look_up(province,start_year,temporary_dir):
    """
    Perform a station look-up for climate data and save the results to a CSV file.

    This function uses the Environment Canada website to look up climate stations based on province and start year.

    Parameters:
    - province: Province abbreviation (e.g., 'ON' for Ontario)
    - start_year: Start year for climate data retrieval
    - temporary_dir: Directory in which the 'station_lookup_dir' folder and CSV file are created

    Outputs:
    - A CSV file containing station information (StationID, Name, Intervals, Year Start, Year End)
    """

    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Station look-up')
    print('-----------------------------------------------------------------------------------------------')

    max_pages = 10  # Number of maximum pages to parse (EC's limit is 100 rows per page)

    # Store each page in a list and parse them later
    soup_frames = []

    # Download and parse each page
    for i in range(max_pages):
        start_row = 1 + i * 100
        print('Downloading Page: ', i)

        base_url = "http://climate.weather.gc.ca/historical_data/search_historic_data_stations_e.html?"
        query_province = "searchType=stnProv&timeframe=1&lstProvince={}&optLimit=yearRange&".format(province)
        query_year = "StartYear={}&EndYear=2025&Year=2025&Month=5&Day=29&selRowPerPage=100&txtCentralLatMin=0&txtCentralLatSec=0&txtCentralLongMin=0&txtCentralLongSec=0&".format(start_year)
        query_start_row = "startRow={}".format(start_row)

        response = requests.get(base_url + query_province + query_year + query_start_row)
        soup = BeautifulSoup(response.text, 'html.parser')
        soup_frames.append(soup)

    # Empty list to store the station data
    station_data = []

    # Parse each soup
    for soup in soup_frames:
        forms = soup.findAll("form", {"id": re.compile('stnRequest*')})
        for form in forms:
            try:
                # Extract station information
                station_id = form.find("input", {"name": "StationID"})['value']
                name = form.find("input", {"name": "lstProvince"}).find_next_siblings("div")[0].text
                timeframes = form.find("select", {"name": "timeframe"}).findChildren()
                intervals = [t.text for t in timeframes]
                years = form.find("select", {"name": "Year"}).findChildren()
                min_year = years[0].text
                max_year = years[-1].text

                # Store the data in an array
                data = [station_id, name, intervals, min_year, max_year]
                station_data.append(data)
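            except (AttributeError, TypeError, KeyError, IndexError):
                # Skip forms that do not contain the expected station fields
                pass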

    # create temporary dir
    station_temp_path = os.path.join(temporary_dir,'station_lookup_dir')
    os.makedirs(station_temp_path, exist_ok=True)

    # Create a pandas dataframe using the collected data and give it the appropriate column names
    stations_df = pd.DataFrame(station_data, columns=['StationID', 'Name', 'Intervals', 'Year Start', 'Year End'])
    stations_df.to_csv(os.path.join(station_temp_path, f'station_lookup_{province}.csv'))

def stationID_download(station_ID_lst, start_date, end_date, temporary_dir):
  all_files_merged = []
  for stationID in station_ID_lst:
      print('\n-----------------------------------------------------------------------------------------------')
      print(f'( ) {stationID} station download')
      print('-----------------------------------------------------------------------------------------------')
      # Define time step
      time_step = "daily"
      time_step_val = {"hourly": 1, "daily": 2, "monthly": 3}[time_step]

      # Define station folder
      station_dir = os.path.join(temporary_dir, 'weather_data', str(stationID))
      os.makedirs(station_dir, exist_ok=True)

      # Split the date string
      start_year, start_month, start_day = start_date.split('-')
      end_year, end_month, end_day = end_date.split('-')

      # Define month list
      month = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]
      # Collect years of interest and store as a list
      years_of_interest = [str(year) for year in range(int(start_year), int(end_year) + 1)]
      print('List of years:', years_of_interest)

      # Download climate data for each month and year
      for yr in years_of_interest:
          for m in month:
              base_url = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?"
              query_url = f"format=csv&stationID={stationID}&Year={yr}&Month={m}&timeframe={time_step_val}"
              api_endpoint = base_url + query_url
              wget.download(api_endpoint, out=station_dir)

      # Remove duplicate files
      file_list = glob(os.path.join(station_dir, "*(1).csv"))
      for f in file_list:
          os.remove(f)

      # Merge files
      file_pattern = f'en_climate_{time_step}_*.csv'
      file_daily = glob(os.path.join(station_dir, file_pattern), recursive=True)

      # Define merged folder
      merge_dir = os.path.join(temporary_dir, 'weather_data', 'merge')
      os.makedirs(merge_dir, exist_ok=True)

      # Merge files
      climateIDs = pd.DataFrame()
      files_merged = pd.concat([pd.read_csv(f) for f in file_daily]).reset_index()
      files_merged.to_csv(os.path.join(merge_dir, f'stationID_{stationID}_merged.csv'))

      # Convert the 'Date/Time' column to datetime format
      files_merged['Date/Time'] = pd.to_datetime(files_merged['Date/Time'], errors='coerce')

      # Filter the DataFrame for the specified date range
      files_merged = files_merged[
          (files_merged['Date/Time'] >= start_date) &
          (files_merged['Date/Time'] <= end_date)
      ]

      files_merged.to_csv(os.path.join(merge_dir, f'stationID_{stationID}_merged.csv'))

      # Display dataset
      display(files_merged)
      all_files_merged.append(files_merged)

  # Return the filtered DataFrames, one per station, in the order requested
  return all_files_merged

def generate_rvt_files(station_ID_lst, col_name_lst, param_lst, unit_lst, start_date, main_dir, temporary_dir):
    # Create directory if it doesn't exist
    obs_dir = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'climate_obs')
    os.makedirs(obs_dir, exist_ok=True)

    for stationID in station_ID_lst:
        print('\n-----------------------------------------------------------------------------------------------')
        print(f'( ) Generate Station {stationID} RVT File')
        print('-----------------------------------------------------------------------------------------------')
        start_year, start_month, start_day = start_date.split('-')
        # Define Minimum Year, Month, and Day
        min_year = str(start_year)
        month = str(start_month)
        day = str(start_day)

        # Glob pattern for finding merged files in the folder
        file_pattern = f'stationID_{stationID}_merged.csv'
        file_path = os.path.join(temporary_dir, 'weather_data', 'merge', file_pattern)
        print(file_path)

        # Check if the file exists
        if os.path.exists(file_path):
            # Read the merged file
            files_merged_station = pd.read_csv(file_path)

            # Check for NaN values in the DataFrame and replace them with -1.2345
            #files_merged_station.fillna(-1.2345, inplace=True)

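            # Define variables to incorporate into the Raven RVT file
            # (copy so the gap filling below does not modify a view of the original DataFrame)
            var_df = files_merged_station[col_name_lst].copy()
            print(var_df)

            for vars in col_name_lst:
                # Fill gaps by linear interpolation, then pad any values still missing at either end
                var_df[vars] = (var_df[vars].interpolate(method='linear', limit=None, limit_direction='both').ffill().bfill())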
                plt.figure()  # Create a new figure for each column
                plt.plot(var_df[vars])
                plt.title(f'Line Plot for {vars}')

            # Show all the plots
            plt.show()

            # Generate RVT file for each station
            file_name = f'station_{stationID}'

            # Write RVT file
            var_vals = '\n'.join(var_df.astype(str).apply(lambda x: ', '.join(x), axis=1))

            with open(os.path.join(obs_dir, f"{file_name}.rvt"), "a") as f:
                print(":MultiData", file=f)
                print(f"{min_year}-{month}-{day} 00:00:00 1.0 {len(files_merged_station)}", file=f)
                print(f":Parameters,{param_lst}", file=f)
                print(f":Units,{unit_lst}", file=f)
                print(f"{var_vals}", file=f)
                print(":EndMultiData", file=f)
        else:
            print(f"File not found for station {stationID}: {file_pattern}")

def rvt_station_generator(station_ID_lst, model_name, main_dir):
  print('\n-----------------------------------------------------------------------------------------------')
  print('( ) Generate station RVT files')
  print('-----------------------------------------------------------------------------------------------')
  # Create directory if it doesn't exist
  obs_dir = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'climate_obs')
  # generate list of station files in climate_obs
  file_lst = glob(os.path.join(obs_dir, "*.rvt"))

  lst_gauge_data = []
  for f in file_lst:
    f_name = os.path.basename(f)
    lst_gauge_data.append(f_name)

  print("Station files: ",lst_gauge_data)

  # empty list to collect climate IDs
  climateIDs = pd.DataFrame()
  # read in csv with climate station information
  climate_stations = pd.read_csv(os.path.join(main_dir, 'extras','subbasin_plots','climate_station_inventory.csv'))
  climate_stations_updated = climate_stations.dropna(subset=['Latitude'])

  with open(os.path.join(main_dir, 'workflow_outputs','RavenInput',model_name+'.rvt'), "a") as f:
    print(f"#########################################################################", file=f)
    print(f"# Climate Stations List", file=f)
    print(f"#------------------------------------------------------------------------\n#", file=f)

  rvt_path = os.path.join(main_dir, 'workflow_outputs','RavenInput',model_name+'.rvt')
  for station_val in station_ID_lst:
    # RVT file written for this station by generate_rvt_files
    gauge_file = f"station_{station_val}.rvt"
    if gauge_file not in lst_gauge_data:
      print(f"WARNING: {gauge_file} was not found in {obs_dir}; skipping station {station_val}.")
      continue
    df_search = climate_stations_updated[climate_stations_updated['Station_ID'].astype(str).str.contains(str(station_val))]
    if df_search.empty:
      # Leave placeholders for the user to fill in manually
      lat_val, lon_val, elev_val = '[]', '[]', '[]'
      print(f'WARNING: The climate station inventory CSV file available in Magpie unfortunately does not have the latitude, longitude, and elevation for {gauge_file}. Please follow the directions below to fill in the missing information.')
    else:
      lat_val = df_search['Latitude'].iloc[0]
      lon_val = df_search['Longitude'].iloc[0]
      elev_val = df_search['Elevation'].iloc[0]
    with open(rvt_path, "a") as f:
      print(f":Gauge {station_val}", file=f)
      print(f"\t:Latitude \t\t\t\t{lat_val}", file=f)
      print(f"\t:Longitude \t\t\t\t{lon_val}", file=f)
      print(f"\t:Elevation \t\t\t\t{elev_val}", file=f)
      print(f"\t:RedirectToFile \tclimate_obs/{gauge_file}", file=f)
      print(f":EndGauge", file=f)
      print(f"#", file=f)

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))

In [None]:
#@markdown <font color=#5559AB> **Visualization of Potential Climate Stations in Study Area** </font>

#@markdown This cell produces an interactive map of the study area shapefile and the climate stations, where users can click on a point to see the station name, latitude, and longitude

map_visualize = True #@param {type:"boolean"}

# Read in CSV with climate station information
climate_stations = pd.read_csv(os.path.join(main_dir, 'extras', 'subbasin_plots', 'climate_station_inventory.csv'))
climate_stations_updated = climate_stations.dropna(subset=['Latitude'])

# generate map with rectangular guide to assist in identifying the ideal subbasin/gauges to use
if map_visualize == True:
  interactive_map_response = 'yes'
  interactive_map_climate_stations(main_dir,climate_stations_updated)
else:
  interactive_map_response = 'no'


In [None]:
#@markdown <font color=grey> **Station Look-up by Province** </font>

#@markdown <font color=#5559AB> **Define Province** </font><br>
#@markdown <font color=grey> use the province abbreviation, for example: <br>
#@markdown ON, for Ontario </font>
province = "ON"   #@param ["BC", "AB", "SK", "MB", "ON", "QC", "NB", "NS", "PEI", "NL", "YT", "NT", "NU"]

#@markdown <font color=#5559AB> **Define Start Year** </font>
start_year = "2015" #@param {type:"string"}

max_pages = 10        # Number of maximum pages to parse, EC's limit is 100 rows per page, there are about 500 stations in BC with data going back to 2006

# run function
station_look_up(province,start_year,temporary_dir)



This subsection provides hourly, daily, and/or monthly downloads for gauge stations of interest. The downloaded data is then visualized and formatted into Raven RVT files that are stored in the "RavenInput" --> "climate_obs" folder.

If the user would like to download and incorporate multiple gauge stations into their Raven model, run this subsection for each station of interest, then move on to subsection 4.5.2
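
The download step requests one file per station, year, and month from the Environment Canada bulk data page. A minimal sketch of how such a request URL can be assembled is shown below; the base URL, query fields, and time-step codes are the ones used by the download function in this subsection, while the helper names `bulk_data_url` and `years_between` are hypothetical and only serve this example.

```python
from urllib.parse import urlencode

BASE_URL = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html"
# Codes used by the bulk data page for each time step
TIMEFRAME_CODES = {"hourly": 1, "daily": 2, "monthly": 3}


def bulk_data_url(station_id, year, month, time_step="daily"):
    """Build the bulk download URL for one station, year, and month (illustrative helper)."""
    query = urlencode({
        "format": "csv",
        "stationID": station_id,
        "Year": year,
        "Month": f"{int(month):02d}",  # zero-padded month, e.g. "03"
        "timeframe": TIMEFRAME_CODES[time_step],
    })
    return f"{BASE_URL}?{query}"


def years_between(start_date, end_date):
    """List every calendar year touched by two 'YYYY-MM-DD' dates (illustrative helper)."""
    return list(range(int(start_date[:4]), int(end_date[:4]) + 1))


print(bulk_data_url(32232, 2016, 10))
print(years_between("2016-10-01", "2019-09-30"))
```

Whole years are requested and the merged table is trimmed to the exact start and end dates afterwards, so the year list only needs to cover the range.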



In [None]:
#@markdown <font color=grey> **Generate Daily Climatic Variables for Selected Station** </font>

#@markdown <font color=#5559AB> **Define Station(s) of Interest** </font>
stations_of_interest = "32232,47748" #@param {type:"string"}

#@markdown <font color=#5559AB> **Define Start Date** </font><br>
#@markdown <font color=grey> format: YYYY-MM-DD </font>
start_date = "2016-10-01" #@param {type:"string"}

#@markdown <font color=#5559AB> **Define End Date** </font><br>
#@markdown <font color=grey> format: YYYY-MM-DD </font>
end_date = "2019-09-30" #@param {type:"string"}

#@markdown <font color=#5559AB> **Define Time Step** </font><br>
#@markdown <font color=grey> **note: the download function in this subsection currently requests daily data** </font>
time_step = "daily" #@param ["hourly", "daily", "monthly"]

# Split the string into a list of substrings using the commas
str_values = stations_of_interest.split(',')
# Convert each substring to an integer and create a list of integers
station_ID_lst = [int(value.strip()) for value in str_values]

# download station data
files_merged = stationID_download(station_ID_lst, start_date, end_date, temporary_dir)

In [None]:
#@markdown <font color=#5559AB> **Define variables to incorporate into the Raven RVT file** </font><br>
#@markdown ensure each variable is within quotation marks, a comma separates each variable, and the list is surrounded by square brackets <br>
#@markdown <font color=grey> for example,<br> ['Total Precip (mm)', 'Mean Temp (°C)']

col_name_lst = ['Total Precip (mm)', 'Max Temp (°C)', 'Min Temp (°C)']   #@param
#@markdown <font color=grey> recommended that users copy and paste column names from the Preview Column Names cell

#@markdown <font color=#5559AB> **Adjust Variable names for RVT file and define units** </font><br>
#@markdown ensure each variable follows the same order as presented above and a comma separates each variable <br>
#@markdown <font color=grey> for example,<br> 'PRECIP, TEMP_AVE'
param_lst = 'PRECIP, TEMP_MAX, TEMP_MIN' #@param {type:"string"}
unit_lst = 'mm/d, C, C' #@param {type:"string"}
#@markdown <font color=grey> additional naming suggestions, TEMP_MIN, TEMP_MAX, SNOWFALL, RAINFALL, PET<br></font>

#@markdown be sure to check the [Raven Manual](http://raven.uwaterloo.ca/files/v3.6/RavenManual_v3.6.pdf) for additional information on which variables can be included in RVT files, naming suggestions, and appropriate units

generate_rvt_files(station_ID_lst, col_name_lst, param_lst, unit_lst, start_date, main_dir,temporary_dir)

In [None]:
#@markdown <font color=grey> **Generate Raven RVT Input containing Station Information** </font></br>

rvt_station_generator(station_ID_lst, model_name, main_dir)


<font color=grey>**If warning is present:**</font>

The user needs to add the latitude, longitude, and elevation to the RVT file after the cell has run. Use [Search by Station Name](https://climate.weather.gc.ca/historical_data/search_historic_data_e.html) to find the latitude, longitude, and elevation of the station. The RVT file can then be opened directly in the Magpie Workflow: remove the square brackets ('[  ]') and replace them with the missing information. Press Ctrl+S to save.
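
For users who prefer to fill in the placeholders with code rather than by hand, the following is a minimal sketch of how the `[]` entries of one `:Gauge` block could be replaced. The helper `fill_gauge_placeholders` is hypothetical (it is not part of Magpie), and the coordinates in the example are made-up values, not those of a real station.

```python
def fill_gauge_placeholders(rvt_text, station_id, values):
    """Replace '[]' placeholders inside one station's :Gauge block (illustrative helper).

    values maps ':Latitude', ':Longitude', and ':Elevation' to the numbers to insert.
    """
    out, inside = [], False
    for line in rvt_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(":Gauge"):
            # only edit the block that belongs to the requested station
            inside = stripped.split()[1:] == [str(station_id)]
        elif stripped.startswith(":EndGauge"):
            inside = False
        elif inside and "[]" in line:
            key = stripped.split()[0]
            if key in values:
                line = line.replace("[]", str(values[key]))
        out.append(line)
    return "\n".join(out)


sample = "\n".join([
    ":Gauge 32232",
    "\t:Latitude \t\t\t\t[]",
    "\t:Longitude \t\t\t\t[]",
    "\t:Elevation \t\t\t\t[]",
    "\t:RedirectToFile \tclimate_obs/station_32232.rvt",
    ":EndGauge",
])
# made-up coordinates, for illustration only
filled = fill_gauge_placeholders(
    sample, 32232, {":Latitude": 45.0, ":Longitude": -75.0, ":Elevation": 100.0}
)
print(filled)
```

Blocks that belong to other stations are left untouched, so the helper can be called once per station that triggered the warning.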

In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download


# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

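col_names = ', '.join(col_name_lst)
# param_lst and unit_lst are already comma-separated strings; joining them
# would insert a separator between every character
params = param_lst
units = unit_lst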

config_path = os.path.join(main_dir, "configuration_files")
os.makedirs(config_path, exist_ok=True)

write_to_configuration_file = False  # @param {type:"boolean"}

# Create a dictionary to store configuration data
data = {
    "_comment": "------------ 3d) ENVIRONMENT CANADA CLIMATE DATA ---------------",

    "interactive_map_climate_stations": f"{interactive_map_response}",
    "station_look_up": "yes",
    "province": f"{province}",
    "climate_start_yr": f"{start_date}",
    "climate_end_yr": f"{end_date}",

    "stationID": f"{stations_of_interest}",
    "time_step": f"{time_step}",

    "col_name_lst": f"{col_names}",
    "param_lst": f"{params}",
    "unit_lst": f"{units}",

}

# Specify the file path where you want to save the JSON file
file_path = os.path.join(config_path, "3d_environment_canada_climate_data.json")

if write_to_configuration_file:
  # Writing data to the JSON file
  with open(file_path, 'w') as json_file:
      json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)

**References**

Lim, S. (2017). Exploring Environment Canada Weather Data with Python and Jupyter Notebooks, https://github.com/csianglim/weather-gc-ca-python.git.

### <font color=#5559AB> 3e) Format Uploaded Observational Data </font>

This section formats meteorological and flow observational data saved as CSV files into RVT files usable in Raven. Be sure to follow the formatting requirements discussed below.

Check out the short video [4.5 Format Uploaded Observational Data - Magpie Workflow](https://youtu.be/m4n7MMvMFNs) for more information.


#### <font color=grey> **4.5.1 Format Uploaded Meteorological Data** </font>

This section requires a <font color=red>CSV file</font> containing <font color=red> meteorological data, latitude, longitude, and elevation </font>formatted as follows:

lat | lon | elev | precip | ... | temp |
--- | --- | --- | --- | --- | --- |
50.1 | -115.1 | 360.5 | 4.1 | ... | 5.4 |
50.1 | -115.1 | 360.5 | 6.0 | ... | 6.3 |
... | ... | ... | ... | ... | ... |
50.1 | -115.1 | 360.5 | 6.6 | ... | 5.2 |
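
Because the formatting function takes a station's latitude, longitude, and elevation from the unique values in those columns, each uploaded file should hold exactly one value in each of them. The sketch below is one way to check a file before uploading it; the helper `station_metadata` is hypothetical and the sample rows mirror the table above.

```python
import csv
import io


def station_metadata(csv_text, columns=("lat", "lon", "elev")):
    """Return the single lat/lon/elev value found in a station CSV (illustrative helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {missing}")
    # collect the distinct values seen in each metadata column
    seen = {c: set() for c in columns}
    for row in reader:
        for c in columns:
            seen[c].add(row[c])
    bad = [c for c in columns if len(seen[c]) != 1]
    if bad:
        raise ValueError(f"column(s) must hold exactly one value: {bad}")
    return {c: float(next(iter(seen[c]))) for c in columns}


sample_csv = (
    "lat,lon,elev,precip,temp\n"
    "50.1,-115.1,360.5,4.1,5.4\n"
    "50.1,-115.1,360.5,6.0,6.3\n"
)
print(station_metadata(sample_csv))
```

A file that mixes rows from several stations raises a `ValueError`, which signals that it should be split into one CSV per station.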

In [None]:
# check libraries
libraries_to_check = ["ipython"]
check_and_install_libraries(libraries_to_check)

from IPython.display import display

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below so that they are available to the rest of the notebook. <br>

def uploaded_climate_data_rvt_file(forc_dir,obs_dir,latitude_name,longitude_name,elevation_name,
                                   var_lst,params_list,units_list,min_year,month,day,model_name,main_dir):
  # overview of the first uploaded forcing file
  file_name = os.listdir(forc_dir)[0]
  view_file = pd.read_csv(os.path.join(forc_dir, file_name))

  # define column names of latitude, longitude, and elevations variables from CSV
  # longitude input
  if longitude_name == "NA":
    longitude_name = input('Enter longitude column name: ')
  # latitude input
  if latitude_name == "NA":
    latitude_name = input('Enter latitude column name: ')
  # elevation column input
  if elevation_name == "NA":
    elevation_name = input('Enter elevation column name: ')

  lat_lst = []
  lon_lst = []
  elev_lst = []

  # define column names of forcing variables from CSV to incorporate into the Raven RVT file
  # variable list input
  if var_lst == "NA":
      var_lst_input = input('Enter variable list: ')
      var_lst_temp = [x for x in var_lst_input.split(',')]
      var_lst = [item.replace(" ", "") for item in var_lst_temp]
      print('CSV Variable list: ', var_lst)
  else:
      var_lst_input = var_lst
      var_lst_temp = [x for x in var_lst_input.split(',')]
      var_lst = [item.replace(" ", "") for item in var_lst_temp]
      print('CSV Variable list: ', var_lst)

  # find files
  file_path = glob(os.path.join(forc_dir, "*.csv"))

  for f in file_path:
    # open csv
    forc_df = pd.read_csv(f)
    # get list of variables from column names
    forc_vars = forc_df.columns.values

    lat_lst.append(forc_df[latitude_name].unique())
    lon_lst.append(forc_df[longitude_name].unique())
    elev_lst.append(forc_df[elevation_name].unique())

    # adjust variable names for RVT file and define units
    # ensure the list of variables follows the same order as the columns and a comma separates each variable

    # param list input (parsed once; later files reuse the parsed list)
    if isinstance(params_list, str):
      if params_list == "NA":
        params_list = input('Enter parameter list: ')
      params_list = [x for x in params_list.split(',')]
    print('Parameter list: ',params_list)

    # unit list input (parsed once; later files reuse the parsed list)
    if isinstance(units_list, str):
      if units_list == "NA":
        units_list = input('Enter unit list: ')
      units_list = [x for x in units_list.split(',')]
    print('Unit list: ',units_list)

    # additional naming suggestions:
    # TEMP_MIN, TEMP_MAX, SNOWFALL, RAINFALL, RELHUM, WINDVEL, PET
    # be sure to check the Raven Manual (http://raven.uwaterloo.ca/files/v3.6/RavenManual_v3.6.pdf) for additional information on which variables can be included in RVT files and check that variables have the appropriate units

    forc_variables_selected = forc_df[var_lst]

    # get filename
    base = os.path.basename(f)
    split_base = os.path.splitext(base)
    file_name = os.path.splitext(base)[0]

    # write rvt file
    var_vals ='\n'.join(forc_variables_selected.astype(str).apply(lambda x: ', '.join(x), axis=1))
    params = ','.join(params_list)
    units = ','.join(units_list)

    f = open(os.path.join(obs_dir, f"{file_name}.rvt"), "a")
    print(f":MultiData", file=f)
    print(f"{min_year}-{month}-{day} 00:00:00 1.0 {len(forc_variables_selected.iloc[:, 0])}",file=f)
    print(f":Parameters,{params}", file=f)
    print(f":Units,{units}", file=f)
    print(f"{var_vals}", file=f)
    print(f":EndMultiData", file=f)
    f.close()

  file_lst = glob(os.path.join(obs_dir, "*.rvt"))
  print(file_lst)

  lst_gauge_data = []
  for f in file_lst:
    f_name = os.path.basename(f)
    lst_gauge_data.append(f_name)

  lat_lst_val = np.concatenate(lat_lst, axis=0)
  lon_lst_val = np.concatenate(lon_lst, axis=0)
  elev_lst_val = np.concatenate(elev_lst, axis=0)

  # generate Raven RVT input containing station information
  # define model name (for RVT file)
  with open(os.path.join(main_dir, 'workflow_outputs','RavenInput',model_name+'.rvt'), "a") as f:
    print(f"#########################################################################", file=f)
    print(f"# Climate Stations List", file=f)
    print(f"#------------------------------------------------------------------------\n#", file=f)

  for n in range(len(file_lst)):
    base = os.path.basename(lst_gauge_data[n])
    split_base = os.path.splitext(base)
    station_val = os.path.splitext(base)[0]
    # define station vals
    lat_val = lat_lst_val[n]
    lon_val = lon_lst_val[n]
    elev_val = elev_lst_val[n]
    f = open(os.path.join(main_dir, 'workflow_outputs','RavenInput',model_name+'.rvt'), "a")
    print(f":Gauge \t\t\t\t\t\t{station_val}", file=f)
    print(f"\t:Latitude \t\t\t{lat_val}", file=f)
    print(f"\t:Longitude \t\t\t{lon_val}", file=f)
    print(f"\t:Elevation \t\t\t{elev_val}", file=f)
    print(f"\t:RedirectToFile climate_obs/{lst_gauge_data[n]}", file=f)
    print(f":EndGauge", file=f)
    print(f"#", file=f)
    f.close()

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))

In [None]:
#@markdown <font color=grey> **Generate temporary "climate_obs" folder to upload data** </font>

#@markdown Step 1) once the cell is run, go to "data_temporary" -> "forcing_data" -> "climate_obs"

#@markdown Step 2) drag and drop climate data into the "climate_obs" folder

# Define and create temporary directory for RavenInput
obs_dir = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'climate_obs')
os.makedirs(obs_dir, exist_ok=True)
print("created folder:", obs_dir)

# Define and create temporary directory for forcing data
forc_dir = os.path.join(temporary_dir, 'forcing_data', 'climate_obs')
os.makedirs(forc_dir, exist_ok=True)
print("created folder:", forc_dir)

print(f"Please drag-and-drop meteorological file(s) into the following folder: {forc_dir}")

In [None]:
#@markdown <font color=grey> **Overview of the first uploaded forcing file** </font>

# Select the first file in the directory
file_name = os.listdir(forc_dir)[0]

# Preview the content of the selected file
view_file = pd.read_csv(os.path.join(forc_dir, file_name))
display(view_file.head())

In [None]:
#@markdown <font color=#5559AB> **Define column names of latitude, longitude, and elevations variables from CSV** </font><br>
#@markdown enter each column name exactly as it appears in the CSV header <br>
#@markdown <font color=grey> for example,<br> lat, lon, elev

latitude_name = 'lat' #@param {type:"string"}
longitude_name = 'lon' #@param {type:"string"}
elevation_name = 'elev' #@param {type:"string"}

#@markdown <font color=#5559AB> **Define column names of forcing variables from CSV to incorporate into the Raven RVT file** </font><br>
#@markdown enter the column names as a comma-separated list; spaces are removed from each name, so the CSV column names should not contain spaces <br>
#@markdown <font color=grey> for example,<br> precip, temp

var_lst = 'precip, temp' #@param {type:"string"}

#@markdown <font color=#5559AB> **Adjust Variable names for RVT file and define units** </font><br>
#@markdown ensure the list of variables follows the same order as the columns, the variables are within quatotations, a comma seperates each variable, and square brackets are present on either end <br>
#@markdown <font color=grey> for example,<br> ['PRECIP', 'TEMP']
params_list = 'PRECIP, TEMP' #@param {type:"string"}
units_list = 'mm/d, C' #@param {type:"string"}
#@markdown <font color=grey> additional naming suggestions: <br>
#@markdown TEMP_MIN, TEMP_MAX, SNOWFALL, RAINFALL, RELHUM, WINDVEL, PET<br></font>
#@markdown be sure to check the [Raven Manual](http://raven.uwaterloo.ca/files/v3.6/RavenManual_v3.6.pdf) for additional information on which variables can be included in RVT files and check that variables have the appropriate units

#@markdown <font color=#5559AB> **Define Minimum Year, Month, and Day** </font><br>
min_year = '2000' #@param {type:"string"}
month = '01' #@param {type:"string"}
day = '01' #@param {type:"string"}

uploaded_climate_data_rvt_file(forc_dir,obs_dir,latitude_name,longitude_name,elevation_name,
                                   var_lst,params_list,units_list,min_year,month,day,model_name,main_dir)

In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download


# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

write_to_configuration_file = False  # @param {type:"boolean"}

if write_to_configuration_file:

    # Create a dictionary to store configuration data
    data = {
        "_comment": "------------ 3e) FORMAT UPLOADED OBSERVATIONAL DATA ---------------",

        "_comment": "-- UPLOADED METEOROLOGICAL DATA",

        "uploaded_climate_latitude": f"{latitude_name}",
        "uploaded_climate_longitude": f"{longitude_name}",
        "uploaded_climate_elevation": f"{elevation_name}",

        "uploaded_climate_csv_var_list": f"{var_lst}",
        "uploaded_climate_parameter_list": f"{params_list}",
        "uploaded_climate_unit_list": f"{units_list}",

        "uploaded_climate_min_year": f"{min_year}",
        "uploaded_climate_month": f"{month}",
        "uploaded_climate_day": f"{day}",


    }

    # Specify the file path where you want to save the JSON file
    file_path = os.path.join(config_path, "3e_upload_meteorological_data.json")

    if write_to_configuration_file:
      # Writing data to the JSON file
      with open(file_path, 'w') as json_file:
          json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)
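
As a rough illustration of how the saved decisions could be read back later, the sketch below writes a small configuration dictionary to JSON, reloads it, and splits a comma-separated variable string into a list. The helper name `split_csv_string` and the temporary folder (a stand-in for `config_path`) are assumptions made for this example and are not part of Magpie; only the key names are taken from the cell above.

```python
import json
import os
import tempfile

def split_csv_string(value):
    """Split a comma-separated string such as 'precip, temp' into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]

# Example decisions, using the same keys as the configuration cell
data = {
    "uploaded_climate_csv_var_list": "precip, temp",
    "uploaded_climate_unit_list": "mm/d, C",
    "uploaded_climate_min_year": "2000",
}

# Write to a temporary folder (hypothetical stand-in for config_path)
config_dir = tempfile.mkdtemp()
file_path = os.path.join(config_dir, "3e_upload_meteorological_data.json")
with open(file_path, "w") as json_file:
    json.dump(data, json_file, indent=2)

# Read the file back and recover the list of CSV variables
with open(file_path) as json_file:
    loaded = json.load(json_file)

csv_vars = split_csv_string(loaded["uploaded_climate_csv_var_list"])
print(csv_vars)
```

Because every value is stored as a string, any list-like entry has to be split again when the file is read, which is what the helper does here.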

#### <font color=grey> **4.5.2 Format Uploaded Flow Data** </font>

This section requires a <font color=red>CSV file</font> containing <font color=red> the flow data of only one station </font>formatted as follows:

CSV File 1

station1 |
--- |
2.1 |
... |
3.2 |


<br>If multiple flow gauges need to be formatted, upload each one as a separate CSV file. Note that the column name is used to name the output file. Additionally, ensure the flow data is in m^3/s.
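
The conversion performed in this section can be sketched as follows: a single-column flow CSV is turned into the text of a Raven `:ObservationData` block, and the column name becomes the RVT file name. The function name `format_observation_block`, the sample values, and the start date are illustrative assumptions; the block layout mirrors the function defined in the following cell, and the `1` on the second line is assumed to be the time interval in days.

```python
import csv
import io

def format_observation_block(csv_text, subbasin_id, start_date):
    """Return (station_name, block_text) for a single-column flow CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    station_name = rows[0][0].strip()
    values = [row[0].strip() for row in rows[1:] if row and row[0].strip()]
    lines = [
        f":ObservationData HYDROGRAPH {subbasin_id} m3/s",
        # start date, start time, time interval (assumed days), number of values
        f"\t{start_date} 00:00:00 1 {len(values)}",
    ]
    lines += [f"  {value}" for value in values]
    lines.append(":EndObservationData")
    return station_name, "\n".join(lines)

# Sample file content in the format described above
csv_text = "station1\n2.1\n2.8\n3.2\n"
station_name, block = format_observation_block(csv_text, "3048815", "2000-01-01")
print(f"{station_name}.rvt")
print(block)
```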

In [None]:
# check libraries
libraries_to_check = ["ipython"]
check_and_install_libraries(libraries_to_check)

from IPython.display import display

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below so that they are available to the rest of the notebook. <br>


def uploaded_flow_data_rvt_file(flow_forc_temp,forc_dir,subbasin_ID,min_year,month,day):
  # note: main_dir and model_name are read from the notebook's global scope,
  # and forc_dir is kept in the signature for compatibility with existing calls

  # ask for the subbasin ID(s) if none were provided
  if subbasin_ID == "NA":
    subbasin_ID = input('Enter subbasin ID for each station (ensure they are in the same order as the station files): ')
  # split the comma-separated string into a list of IDs
  subbasin_ID = [x.strip() for x in subbasin_ID.split(',')]
  print('Subbasin ID(s): ',subbasin_ID)

  # find the uploaded station files (sorted alphabetically so the station order is predictable)
  flow_file_path = sorted(glob(os.path.join(flow_forc_temp, "*.csv")))

  # make sure the folder for the observation RVT files exists
  flow_obs_dir = os.path.join(main_dir,'workflow_outputs','RavenInput','obs')
  os.makedirs(flow_obs_dir, exist_ok=True)

  for n in range(len(flow_file_path)):
    # open csv
    forc_df = pd.read_csv(flow_file_path[n])
    forc_col_names = forc_df.columns.values
    # format the flow values as an indented block of text
    forc_vars = forc_df.to_string(index=False, header=False)
    indent = '  '
    indented_vars = indent + forc_vars.replace('\n', '\n' + indent)
    # determine the number of observations
    num_obs = len(forc_df.index)

    # write the station RVT file, named after the first column of the CSV
    with open(os.path.join(flow_obs_dir, f"{forc_col_names[0]}.rvt"), "a") as f:
      print(f":ObservationData HYDROGRAPH {subbasin_ID[n]} m3/s", file=f)
      print(f"\t{min_year}-{month}-{day} 00:00:00 1 {num_obs}",file=f)
      print(f"{indented_vars}", file=f)
      print(":EndObservationData", file=f)

  # list the observation files now present in the obs folder
  files_input = glob(os.path.join(flow_obs_dir, "*"))
  file_names_lst = [os.path.basename(f_name) for f_name in files_input]

  # append the header and one redirect per observation file to the model RVT file
  # (append mode creates the file if it does not exist yet)
  with open(os.path.join(main_dir, 'workflow_outputs','RavenInput',model_name+'.rvt'), "a") as f:
    print("#------------------------------------------------------------------------", file=f)
    print("# Observed Discharge Data", file=f)
    print("#------------------------------------------------------------------------\n#", file=f)
    for file_name in file_names_lst:
      print(f":RedirectToFile \t\t obs/{file_name}", file=f)

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))

In [None]:
#@markdown <font color=grey> **Generate temporary "flow_obs" folder to upload data** </font>

#@markdown Step 1) once the cell is run, go to "data_temporary" -> "forcing_data" -> "flow_obs"

#@markdown Step 2) drag and drop flow data into the "flow_obs" folder

# Define and create the RavenInput directory for observation files
forc_dir = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'obs')
os.makedirs(forc_dir, exist_ok=True)
print("created folder:", forc_dir)

# Define and create temporary directory for forcing data
flow_forc_temp = os.path.join(temporary_dir, 'forcing_data', 'flow_obs')
os.makedirs(flow_forc_temp, exist_ok=True)
print("created folder:", flow_forc_temp)

print(f"Please drag-and-drop flow station file(s) into the following folder: {flow_forc_temp}")

In [None]:
#@markdown <font color=grey> **Overview of the first uploaded flow file** </font>

flow_file_name = os.listdir(flow_forc_temp)[0]
view_file = pd.read_csv(os.path.join(flow_forc_temp, flow_file_name))
display(view_file.head())

In [None]:
#@markdown <font color=#5559AB>**Determine the Subbasin ID that the Flow Gauge is Located In**</font>

subbasin_ID = '3048815' #@param {type:"string"}

#@markdown <font color=#5559AB> **Define Minimum Year, Month, and Day** </font><br>
min_year = '2000' #@param {type:"string"}
month = '01' #@param {type:"string"}
day = '01' #@param {type:"string"}

uploaded_flow_data_rvt_file(flow_forc_temp,forc_dir,subbasin_ID,min_year,month,day)

In [None]:
#@markdown <font color=grey> **Remove temporary data** </font><br>
#@markdown remove temporary data to assist in saving space on drive

#@markdown if users want to save any of the temporary data, they can right-click on the layer in the folder directory and select download


# delete temporary folder
remove_temp_data(temporary_dir)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

write_to_configuration_file = False  # @param {type:"boolean"}

if write_to_configuration_file:

    # Create a dictionary to store configuration data
    data = {
        "_comment": "------------ 3e) FORMAT UPLOADED OBSERVATIONAL DATA ---------------",

        "_comment": "-- UPLOADED FLOW DATA",

        "uploaded_flow_subbasin_ID": f"{subbasin_ID}",
        "uploaded_flow_min_yr": f"{min_year}",
        "uploaded_flow_month": f"{month}",
        "uploaded_flow_day": f"{day}",

    }

    # Specify the file path where you want to save the JSON file
    file_path = os.path.join(config_path, "3e_upload_flow_data.json")

    if write_to_configuration_file:
      # Writing data to the JSON file
      with open(file_path, 'w') as json_file:
          json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)


### <font color=#5559AB> 3f) Hydrometric Data (HYDAT) </font>

This section is run in R and uses the RavenR (Chlumsky et al., 2022) command `rvn_rvt_tidyhydat` to convert Environment Canada historical streamgauge data, accessed via the tidyhydat package, into .rvt files usable in Raven.

Check out the short video [Downloading and Processing Hydrometric Data (HYDAT) for Raven](https://youtu.be/ySqDOxj-EYA) for more information.


In [None]:
# check libraries
libraries_to_check = ["rpy2==3.5.1","folium","branca", "ipython"]
check_and_install_libraries(libraries_to_check)

%load_ext rpy2.ipython

from IPython.display import display
# # interactive map
import folium
import branca

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below so that they are available to the rest of the notebook. <br>

# if "interactive_map_response": "yes"
# an interactive map is generated to visualize which gauges are near the previously defined coords
def interactive_gauge_map(main_dir):
    print('\n-----------------------------------------------------------------------------------------------')
    print('( ) Interactive gauge plot')
    print('-----------------------------------------------------------------------------------------------')
    print('Response: ', interactive_map_response)
    # read in csv with gauge information
    flow_stations = pd.read_csv(os.path.join(main_dir, 'extras','subbasin_plots','obs_gauges_NA_v2-1.csv'))
    flow_stations_updated = flow_stations.dropna(subset=['POINT_Y'])

    # format fancy folium pop-up
    def fancy_html(row):
        i = row
        flowName = flow_stations_updated['Obs_NM'].iloc[i]
        # POINT_X holds the longitude and POINT_Y the latitude of the gauge
        pointX_info = flow_stations_updated['POINT_X'].iloc[i]
        pointY_info = flow_stations_updated['POINT_Y'].iloc[i]
        html = """<!DOCTYPE html>
    <html>
    <p>Obs_NM: {}""".format(flowName) + """</p>
    <p>POINT_X: {}""".format(pointX_info) + """</p>
    <p>POINT_Y: {}""".format(pointY_info) + """</p>
    </html>
    """
        return html

    # generate map with rectangular guide to assist in identifying the ideal subbasin/gauges to use
    shp_boundary = gpd.read_file(os.path.join(main_dir, 'shapefile','studyArea_outline.shp'))
    shp_boundary = shp_boundary.to_crs(epsg=4326)

    # determine the boundary of the provided shapefile
    bounds = shp_boundary.bounds
    west, south, east, north = bounds = bounds.loc[0]
    shp_bounds = [south,west]

    map = folium.Map(location=shp_bounds, zoom_start=10)
    folium.GeoJson(data=shp_boundary["geometry"]).add_to(map)
    #folium.LatLngPopup().add_to(map)

    for i in range(0,len(flow_stations_updated)):
        html = fancy_html(i)
        iframe = branca.element.IFrame(html=html,width=200,height=200)
        popup = folium.Popup(iframe,parse_html=True)
        folium.Marker([flow_stations_updated['POINT_Y'].iloc[i],flow_stations_updated['POINT_X'].iloc[i]],popup=popup).add_to(map)
    # displays interactive map
    display(map)

def generate_rvt_file(main_dir, model_name):
    file_names_lst = []
    files_input = glob(os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'obs', "*"))

    for f_name in files_input:
        base = os.path.basename(f_name)
        file_names_lst.append(base)

    rvt_file_path = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', f"{model_name}.rvt")

    if not os.path.exists(rvt_file_path):
        with open(rvt_file_path, "a") as f:
            f.write("#------------------------------------------------------------------------\n")
            f.write("# Observed Discharge Data\n")
            f.write("#------------------------------------------------------------------------\n#\n")

    for file_name in file_names_lst:
        with open(rvt_file_path, "a") as f:
            f.write(f":RedirectToFile \t\t obs/{file_name}\n")

def get_subbasin_id(main_dir, flow_gauge_ID):
    # Assuming main_dir is a parameter passed to the function
    gauge_csv_path = os.path.join(main_dir, 'extras', 'subbasin_plots', 'obs_gauges_NA_v2-1.csv')

    # Read the CSV file into a DataFrame
    gauge_csv = pd.read_csv(gauge_csv_path)

    # Select the row where 'Obs_NM' is equal to flow_gauge_ID
    select_row = gauge_csv.loc[gauge_csv['Obs_NM'] == flow_gauge_ID]

    # Get the 'SubId' values from the selected row
    subbasin_ID = select_row['SubId'].values

    if len(subbasin_ID) > 0:
        return subbasin_ID[0]
    else:
        return None  # Return None or any other value indicating that the subbasin ID is not found

In [None]:
#@markdown <font color=grey> **Visualization of Potential Flow Gauges in Study Area** </font>

#@markdown This cell produces an interactive plot with the study area shapefile, where users can click on the points to identify the flow gauge names, latitudes, and longitudes

#@markdown Please note this will take a minute or two to load

interactive_map_response = False # @param {type:"boolean"}

if interactive_map_response:
  interactive_gauge_map(main_dir)
  interactive_map_response_flow = 'yes'
else:
  interactive_map_response_flow = 'no'

In [None]:
#@markdown <font color=grey> **Install and Load Related R Packages** </font>

#@markdown This will take approximately 2.5 minutes to run, a good time for a coffee break ☕

%%R
# Start measuring time
start_time <- Sys.time()

cat('-----------------------------------------------------------------------------------------------\n')
cat('( ) Installing libraries\n')
cat('-----------------------------------------------------------------------------------------------\n')

install.packages("tidyhydat")
install.packages("rjson")

cat('-----------------------------------------------------------------------------------------------\n')
cat('( ) Loading libraries\n')
cat('-----------------------------------------------------------------------------------------------\n')

library(tidyhydat)
library(ggplot2)
library(rjson)
library(RavenR, lib="/content/google_drive/MyDrive/R_Packages")

# End measuring time
end_time <- Sys.time()

# Calculate the elapsed time in minutes
elapsed_time_minutes <- as.numeric(difftime(end_time, start_time, units = "mins"))

cat(paste("Script execution time:", elapsed_time_minutes, "minutes\n"))


In [None]:
#@markdown <font color=grey> **Download HYDAT Data** </font>

#@markdown please note this will take a couple of minutes to run

%%R
hy_dir()

download_hydat(dl_hydat_here="/root/.local/share/tidyhydat", ask=FALSE)
print("Download complete")

<font color=#5559AB>**Define Input Variables**</font>

To identify which historical hydrometric station corresponds with the study area, please click [here](https://wateroffice.ec.gc.ca/search/historical_e.html)
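
Before passing the inputs to R, it can help to check them in Python. The sketch below is one possible check, not part of Magpie: the helper names `parse_stations` and `check_date_range` are hypothetical, and the station pattern (two digits, two letters, three digits, as in `02KB001`) is an assumption based on the example ID used in this notebook. Note that the R cell further down splits the station string on commas without trimming spaces, so a clean, space-free list is the safest input.

```python
import re
from datetime import datetime

# Assumed pattern for station numbers such as '02KB001'
STATION_PATTERN = re.compile(r"^\d{2}[A-Z]{2}\d{3}$")

def parse_stations(stations_lst):
    """Split a comma-separated station string and check each ID's format."""
    stations = [s.strip().upper() for s in stations_lst.split(",") if s.strip()]
    bad = [s for s in stations if not STATION_PATTERN.match(s)]
    if bad:
        raise ValueError(f"Unexpected station number format: {bad}")
    return stations

def check_date_range(start_date, end_date):
    """Parse YYYY-MM-DD dates and make sure the range is not reversed."""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    if start > end:
        raise ValueError("start_date must not be later than end_date")
    return start, end

stations = parse_stations("02KB001, 02kb003")
start, end = check_date_range("2016-10-01", "2019-09-30")

# Re-join without spaces before writing the value for the R cell
stations_lst = ",".join(stations)
print(stations_lst, (end - start).days)
```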


In [None]:
#@markdown <font color=#5559AB>**List stations of interest**</font>

# Define the folder path
folder_path = os.path.join(main_dir, "workflow_outputs", "RavenInput", "obs")

# Check if the folder exists
if not os.path.exists(folder_path):
    # Create the folder
    os.makedirs(folder_path)

#@markdown use a comma to separate stations if there are multiple

stations_lst = '02KB001' #@param {type:"string"}

#@markdown <font color=#5559AB> **Define start and end date** </font><br>

#@markdown format date as follows, 2000-01-01

start_date = '2016-10-01' #@param {type:"string"}
end_date = '2019-09-30' #@param {type:"string"}

# File path for output
output_file = "/content/variable_info.txt"

# Writing to a file
with open(output_file, "w") as file:
    file.write("main_dir = '{}'\n".format(main_dir))
    file.write("stations_lst = '{}'\n".format(stations_lst))
    file.write("start_date = '{}'\n".format(start_date))
    file.write("end_date = '{}'\n".format(end_date))

In [None]:
#@markdown <font color=grey> **Formatting values** </font>

%%R

# Define the file path
file_path <- "/content/variable_info.txt"

# Read the file
lines <- readLines(file_path)

# Initialize variables
main_dir <- NULL
stations_lst <- NULL
start_date <- NULL
end_date <- NULL

# Extract values from the lines
for (line in lines) {
  if (grepl("main_dir", line)) {
    main_dir <- gsub("main_dir = '(.*)'", "\\1", line)
  } else if (grepl("stations_lst", line)) {
    stations_lst <- gsub("stations_lst = '(.*)'", "\\1", line)
  } else if (grepl("start_date", line)) {
    start_date <- gsub("start_date = '(.*)'", "\\1", line)
  } else if (grepl("end_date", line)) {
    end_date <- gsub("end_date = '(.*)'", "\\1", line)
  }
}

# Split the string by commas
stations_list <- strsplit(stations_lst, ",")[[1]]

cat('Values have been successfully formatted')

<font color=grey> **Gather station data/info using tidyhydat function** </font>

This cell downloads daily flow data. To download monthly data instead, replace `hy_daily_flows` with `hy_monthly_flows` in the script.

For more information on the available tidyhydat functions, please click [here](https://cran.r-project.org/web/packages/tidyhydat/tidyhydat.pdf)


In [None]:
#@markdown <font color=grey> **Download available HYDAT data** </font>

%%R

hd <- tidyhydat::hy_daily_flows(station_number = stations_list, start_date = start_date, end_date = end_date)
print(hd)

In [None]:
#@markdown <font color=grey> **Plot Station HYDAT Data** </font>

%%R

ggplot(hd) +
geom_line(aes(x = Date, y = Value, colour = STATION_NUMBER)) +
labs(y = "Mean daily Flow") +
scale_colour_viridis_d(option = "C") +
theme_minimal() +
theme(legend.position = "bottom")

In [None]:
#@markdown <font color=grey> **Generate RVT files for each of the available stations** </font>

%%R

# Extract the unique station numbers from the 'STATION_NUMBER' column
unique_values_list <- unique(hd$STATION_NUMBER)

for (stationID in unique_values_list){
  # download the daily flows for this station only
  hd <- tidyhydat::hy_daily_flows(station_number = stationID, start_date = start_date, end_date = end_date)

  print('-- station data saved at: ')
  tf1 <- file.path(main_dir,'workflow_outputs','RavenInput','obs',paste0(stationID,".rvt"))
  print(tf1)

  print('-- create RVT file for station')
  # Create RVT files
  # note: subIDs is set to a placeholder value of 3; the subbasin ID in the
  # resulting RVT file needs to be updated to match the gauge's subbasin
  rvn_rvt_tidyhydat(hd,
                    subIDs=c(3),
                    write_redirect = FALSE,
                    flip_number = FALSE,
                    filename=c(tf1)
                    )

}

In [None]:
#@markdown <font color=grey>**Add station RVT information to existing RVT file or generate a new RVT file**</font>


generate_rvt_file(main_dir, model_name)

# Define the file path
file_path = "/content/variable_info.txt"

# Check if the file exists
if os.path.exists(file_path):
    # Delete the file
    os.remove(file_path)

In [None]:
#@markdown <font color=red>**EXTRA STEP**</font>

#@markdown <font color=#5559AB>**Determine the Subbasin ID that the Flow Gauge is Located In**</font>

#@markdown if using the NLRP product, define the flow gauge ID to determine the subbasin it is located in, then copy and paste the subbasin ID into the gauge RVT file

flow_gauge_ID = '02KB001' #@param {type:"string"}

subbasin_ID = get_subbasin_id(main_dir, flow_gauge_ID)

if subbasin_ID is not None:
    print(f'Subbasin ID: {subbasin_ID}')
else:
    print('Subbasin ID not found for the given flow gauge ID.')

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

write_to_configuration_file = False  # @param {type:"boolean"}

if write_to_configuration_file:

  # Create a dictionary to store configuration data
  data = {
      "_comment": "------------ 3f) HYDROMETRIC DATA ---------------",

      "interactive_map_response_flow": f"{interactive_map_response_flow}",
      "flow_stations": f"{stations_lst}",
      "start_date_flow": f"{start_date}",
      "end_date_flow": f"{end_date}",

  }

  # Specify the file path where you want to save the JSON file
  file_path = os.path.join(config_path, "3f_hydrometric_data.json")

  if write_to_configuration_file:
    # Writing data to the JSON file
    with open(file_path, 'w') as json_file:
        json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)

**References**

Albers, S. J. (2017). tidyhydat: Extract and tidy Canadian hydrometric data. Journal of Open Source Software, 2(20), 511.

Chlumsky, R., Craig, J. R., Lin, S. G., Grass, S., Scantlebury, L., Brown, G., & Arabzadeh, R. (2022). RavenR v2.1.4: an open-source R package to support flexible hydrologic modelling. Geoscientific Model Development, 15(18), 7017-7030.



# **4.0 Raven Input Files**

This subsection focuses on generating the remaining Raven inputs (rvi, rvh, and rvc files) with RavenR, developed by Chlumsky et al. (2022). For more information on RavenR, please visit the [RavenR Manual](https://cran.r-project.org/web/packages/RavenR/RavenR.pdf).

For the rvp file, BasinMaker's <font color=red>.rvp_temp.rvp</font> template is required.

Check out the short video [Creating Raven Input Files with RavenR: Magpie Tutorial](https://youtu.be/vA2ozwdblWA) for more information.


In [None]:
# check libraries
libraries_to_check = ["rpy2==3.5.1","requests"]
check_and_install_libraries(libraries_to_check)

%load_ext rpy2.ipython

import requests

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the functions below so that they are available to the rest of the notebook. <br>


def download_and_build_raven(url, temporary_dir, main_dir):
    """
    Download a zip file from the given URL, unzip it, and build Raven.

    Parameters:
    - url (str): The URL of the zip file.
    - temporary_dir (str): The temporary directory to store files.
    - main_dir (str): The main directory for the build.

    Returns:
    - None
    """
    zip_file_path = os.path.join(temporary_dir, 'RavenSource')
    zip_file_unpacked = os.path.join(temporary_dir, 'unpacked')
    source_dir = os.path.join(temporary_dir, 'unpacked')
    build_dir = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'packages')

    # Download the zip file and stop early if the request failed
    response = requests.get(url, stream=True)
    response.raise_for_status()
    zip_file_path_full = os.path.join(zip_file_path, 'RavenSource_v3.8.zip')
    os.makedirs(zip_file_path, exist_ok=True)

    with open(zip_file_path_full, 'wb') as zip_file:
        shutil.copyfileobj(response.raw, zip_file)

    print(f"Downloaded {url} to {zip_file_path_full}")

    # Unzip the source code
    subprocess.run(['unzip', zip_file_path_full, '-d', zip_file_unpacked], check=True)

    # Create the build directory and change to it
    os.makedirs(build_dir, exist_ok=True)
    os.chdir(build_dir)

    # Run CMake and Make
    subprocess.run(['cmake', source_dir], check=True)
    subprocess.run(['make'], check=True)

def remove_temp_data(temporary_dir):
  if os.path.exists(os.path.join(temporary_dir)):
    shutil.rmtree(os.path.join(temporary_dir))

In [None]:
#@markdown <font color=grey> **Install and Setup Raven Executable** </font>

# define output directory path
raven_exe_dir = os.path.join(main_dir,'workflow_outputs', 'RavenInput', 'packages')
output_path = os.path.isdir(raven_exe_dir)
if not output_path:
  os.makedirs(raven_exe_dir)
  print("created  folder: ", raven_exe_dir)
else:
  print(raven_exe_dir, "folder already exists")

src = os.path.join(main_dir,'extras', 'netcdf_library')

src_files = os.listdir(src)
for file_name in src_files:
    full_file_name = os.path.join(src, file_name)
    if os.path.isfile(full_file_name):
        shutil.copy(full_file_name, raven_exe_dir)

# Define the file path to check
file_path = os.path.join(main_dir, 'workflow_outputs','RavenInput','packages','Raven')

# Remove any existing Raven executable so that it is rebuilt from source
if os.path.exists(file_path):
    os.remove(file_path)

# download the Raven source code and build the executable
url = "https://raven.uwaterloo.ca/files/v3.8/RavenSource_v3.8.zip"
download_and_build_raven(url, temporary_dir, main_dir)
remove_temp_data(temporary_dir)

print(f"\n\n-----------------------------------------------------------------------")
print(f"Final output folder previously defined as: {final_output_folder}")
print(f"-----------------------------------------------------------------------")

# File path for output
output_file = "/content/variable_info.txt"

# Writing to a file
with open(output_file, "w") as file:
    file.write("main_dir = '{}'\n".format(main_dir))
    file.write("model_name = '{}'\n".format(model_name))


<font color=#5559AB> **Define model name** </font>

Be sure to keep the model naming consistent between the rvi, rvh, rvp, rvc, and rvt files

<font color=red>**Note**</font> - If the folder directory name, "Magpie_Workflow", has been changed, the following file paths will need to be adjusted.
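
The notebook passes `main_dir` and `model_name` from Python to R through a small text file of `name = 'value'` lines. The sketch below reproduces that round trip in Python only, to show the file format; the helper names `write_variable_file` and `read_variable_file`, the example path, and the model name are hypothetical. Matching the variable name at the start of each line, as done here, avoids misreading a line whose value happens to contain another variable's name.

```python
import os
import re
import tempfile

def write_variable_file(path, variables):
    """Write one "name = 'value'" line per variable."""
    with open(path, "w") as handle:
        for name, value in variables.items():
            handle.write(f"{name} = '{value}'\n")

def read_variable_file(path):
    """Parse "name = 'value'" lines back into a dictionary."""
    pattern = re.compile(r"^(\w+) = '(.*)'$")
    variables = {}
    with open(path) as handle:
        for line in handle:
            match = pattern.match(line.rstrip("\n"))
            if match:
                variables[match.group(1)] = match.group(2)
    return variables

# Hypothetical values used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "variable_info.txt")
write_variable_file(path, {
    "main_dir": "/content/google_drive/MyDrive/Magpie_Workflow",
    "model_name": "demo_model",
})
values = read_variable_file(path)
print(values)
```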

In [None]:
#@markdown <font color=grey> **Load path information** </font>

%%R

# Define the file path
file_path <- "/content/variable_info.txt"

# Read the file
lines <- readLines(file_path)

# Initialize variables
main_dir <- NULL
model_name <- NULL

# Extract values from the lines
for (line in lines) {
  if (grepl("main_dir", line)) {
    main_dir <- gsub("main_dir = '(.*)'", "\\1", line)
  } else if (grepl("model_name", line)) {
    model_name <- gsub("model_name = '(.*)'", "\\1", line)
  }
}

In [None]:
#@markdown <font color=grey> **Install and Load Related R Packages** </font>

#@markdown This will take a couple of minutes, good time for a coffee break ☕

%%R

library(RavenR, lib="/content/google_drive/MyDrive/R_Packages")

### <font color=grey>**Generate RVI File**</font>


RavenR provides several templates that can be used: "UBCWM", "HBV-EC", "HBV-Light", "GR4J", "CdnShield", "MOHYSE", "HMETS", "HYPR", or "HYMOD". For more information on the model templates, check out the [Raven Manual](http://raven.uwaterloo.ca/files/v3.6/RavenManual_v3.6.pdf), or change `template_help` to "yes" in the cell below and run it to open the help page for `rvn_rvi_write_template`.

In [None]:
%%R

template_help <- "no"

# opens the help page for rvn_rvi_write_template if template_help is "yes"
if (template_help == "yes") {
  help(rvn_rvi_write_template)
}

In [None]:
#@markdown <font color=grey> **Set working directory** </font>

#@markdown check that the set working directory path is correct

%%R
# Use file.path to join paths
target_dir <- file.path(main_dir, "workflow_outputs/RavenInput")

# Set the working directory
setwd(target_dir)
getwd()

In [None]:
#@markdown <font color=#5559AB> **Define template name** </font><br>

#@markdown use drop-down menu to select RavenR RVI template

define_template_name = 'CdnShield' #@param ["CdnShield", "UBCWM", "HBV-EC","HBV-Light", "GR4J", "MOHYSE", "HMETS", "HYPR", "HYMOD"]

#@markdown <font color=#5559AB> **Fill in author name and user description** </font><br>

author_name = 'Magpie' #@param {type:"string"}
user_description = 'RVI file for Raven model created by RavenR' #@param {type:"string"}

# File path for output
output_file = "/content/variable_rvi_info.txt"

# Writing to a file
with open(output_file, "w") as file:
    file.write("define_template_name = '{}'\n".format(define_template_name))
    file.write("author_name = '{}'\n".format(author_name))
    file.write("user_description = '{}'\n".format(user_description))

In [None]:
#@markdown <font color=grey> **Generate RVI file** </font>

%%R

# Define the file path
file_path <- "/content/variable_rvi_info.txt"

# Read the file
lines <- readLines(file_path)

# Initialize variables
define_template_name <- NULL
author_name <- NULL
user_description <- NULL

# Extract values from the lines
for (line in lines) {
  if (grepl("define_template_name", line)) {
    define_template_name <- gsub("define_template_name = '(.*)'", "\\1", line)
  } else if (grepl("author_name", line)) {
    author_name <- gsub("author_name = '(.*)'", "\\1", line)
  } else if (grepl("user_description", line)) {
    user_description <- gsub("user_description = '(.*)'", "\\1", line)
  }
}

# create an rvi file from the selected template with RavenR
rvn_rvi_write_template(template_name=define_template_name,
   filename=file.path(getwd(), paste(model_name,"rvi", sep=".")),
   author=author_name,
   description=user_description)

In [None]:
#@markdown <font color=grey> **Write Model Decisions to Configuration File**

#@markdown To enhance model reproducibility, users can choose to document the model
#@markdown decisions made in the Magpie interface by writing them to a configuration
#@markdown file. Subsequently, this configuration file can be executed in Magpie
#@markdown Developer, facilitating the recreation of the same model with consistent results.

# delete rvi variable text file
os.remove('/content/variable_rvi_info.txt')

write_to_configuration_file = False  # @param {type:"boolean"}

#@markdown users will need to redefine the following variables as the variables defined in R cannot be easily transferred to Python

if write_to_configuration_file:
    # Create a dictionary to store configuration data
    data = {
        "_comment": "------------ 4) RAVEN INPUT ---------------",

        "define_template_name": f"{define_template_name}",
        "author_name": f"{author_name}",
        "user_description": f"{user_description}",

    }

    # Specify the file path where you want to save the JSON file
    file_path = os.path.join(config_path, "4_raven_input_files.json")

    # Writing data to the JSON file
    with open(file_path, 'w') as json_file:
        json.dump(data, json_file, indent=2)  # indent parameter for pretty formatting (optional)

### <font color=grey>**Generate RVP File**</font>

To edit the RVI file, locate it in the RavenInput folder and double-click it; it will open in an editable window.

To generate an RVP file template that can be filled in later on, copy and paste **:CreateRVPTemplate** into the RVI file, save, and run the next cell.
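
The manual edit described above can also be scripted. The following is a minimal sketch and not part of the Magpie workflow itself: `add_rvi_command` is a hypothetical helper, and appending the command to the end of the file is an assumption about where Raven accepts it.

```python
from pathlib import Path

def add_rvi_command(rvi_path, command=":CreateRVPTemplate"):
    """Append a command to an RVI file unless it is already present.

    Returns True if the file was modified, False if the command was already there.
    """
    rvi_path = Path(rvi_path)
    lines = rvi_path.read_text().splitlines()
    # Compare only the first token of each line so spacing or arguments do not matter
    already_present = any(line.split()[:1] == [command] for line in lines)
    if not already_present:
        lines.append(command)
        rvi_path.write_text("\n".join(lines) + "\n")
    return not already_present
```

For example, `add_rvi_command(os.path.join(main_dir, 'workflow_outputs', 'RavenInput', f'{model_name}.rvi'))` would target the RVI file generated earlier, assuming it sits in the RavenInput folder. Calling the helper twice leaves a single copy of the command in the file.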


In [None]:
#@markdown ### <font color=#5559AB> **Run Raven**</font>

#@markdown Run Raven to generate an RVP template that will then be filled in with RavenR.

# define paths
exe_path = os.path.join(main_dir,'workflow_outputs','RavenInput','packages','Raven')
model_path = os.path.join(main_dir,'workflow_outputs', 'RavenInput', model_name)
final_output_path = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', final_output_folder)

# define bash command
bash_command = f'"{exe_path}" "{model_path}" -o "{final_output_path}"'

# Run the Bash command and capture the output
result = subprocess.run(bash_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

# Display the output
print("Bash Command Output:")
print(result.stdout)

# Display the error, if any
if result.returncode != 0:
    print("Error (if any):")
    print(result.stderr)

In [None]:
#@markdown <font color=grey> **Load Pre-Generated RVP Template** </font>

#@markdown Within the Magpie workflow, the "rvp_temp.rvp" file can be generated by BasinMaker, or users can upload their own template directly to the "RavenInput" folder before running the following cell.

%%R
# load pre-generated template file and other model files

model_template_file <- file.path(getwd(), paste(model_name,"rvp_temp.rvp", sep="."))
rvi_file <- file.path(getwd(), paste(model_name,"rvi", sep="."))
rvh_file <- file.path(getwd(), paste(model_name,"rvh", sep="."))
rvp_out_file <- file.path(getwd(), paste(model_name,"rvp", sep="."))


<font color=grey>**Fill in RVP File**</font>

Users have the option to define the average annual runoff value before running the cell.

Please note that the values provided are not mandatory; users are encouraged to tailor these parameters to their study area by referring to relevant information.

In [None]:
#@markdown <font color=#5559AB> **Define average annual runoff** </font><br>

avg_annual_runoff = 123 #@param

# File path for output
output_file = "/content/variable_rvp_info.txt"

# Writing to a file
with open(output_file, "w") as file:
    file.write("avg_annual_runoff = '{}'\n".format(avg_annual_runoff))
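
The hand-off file written above uses one `name = 'value'` entry per line. As a minimal sketch for inspecting such a file from Python before the R cell reads it, the hypothetical helper below (`read_handoff_file` is not part of Magpie) parses that layout into a dictionary of strings.

```python
import re

def read_handoff_file(path):
    """Parse lines of the form  name = 'value'  into a dictionary of strings."""
    pattern = re.compile(r"^\s*(\w+)\s*=\s*'(.*)'\s*$")
    values = {}
    with open(path) as handle:
        for line in handle:
            match = pattern.match(line)
            if match:
                # group(1) is the variable name, group(2) the text between the quotes
                values[match.group(1)] = match.group(2)
    return values
```

A quick check such as `float(read_handoff_file("/content/variable_rvp_info.txt")["avg_annual_runoff"])` would confirm that the value can be converted to a number, which is what the R cell does with `as.double`.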

In [None]:
#@markdown <font color=grey> **Generate RVP file** </font>

%%R

# Define the file path
file_path <- "/content/variable_rvp_info.txt"

# Read the file
lines <- readLines(file_path)

# Initialize variables
avg_annual_runoff <- NULL

# Extract values from the lines
for (line in lines) {
  if (grepl("avg_annual_runoff", line)) {
    avg_annual_runoff <- gsub("avg_annual_runoff = '(.*)'", "\\1", line)
  }
}


# rewrite template with parameter values
rvn_rvp_fill_template(rvi_file=rvi_file,
                      rvh_file=rvh_file,
                      rvp_template_file=model_template_file,
                      avg_annual_runoff = as.double(avg_annual_runoff),
                      extra_commands=":RedirectToFile  channel_properties.rvp",
                      rvp_out=rvp_out_file)

In [None]:
#@markdown <font color=grey> **Preview of parameter dataframe** </font>

%%R

rvi_file %>%
  rvn_rvi_read() %>%
  rvn_rvi_getparams() %>%
  head()

In [None]:
#@markdown <font color=grey> **Remove Unnecessary Files** </font>

# delete temporary rvp variable text file
os.remove('/content/variable_rvp_info.txt')

# delete old rvp file
os.remove(os.path.join(main_dir, 'workflow_outputs','RavenInput',f'{model_name}.rvp'))

# rename RavenR generated rvp file
old_rvp_file = os.path.join(main_dir, 'workflow_outputs','RavenInput',f'{model_name}_ravenr_generated.rvp')
new_rvp_file =  os.path.join(main_dir, 'workflow_outputs','RavenInput',f'{model_name}.rvp')

os.rename(old_rvp_file, new_rvp_file)

# delete rvp template file
os.remove( os.path.join(main_dir, 'workflow_outputs','RavenInput',f'{model_name}.rvp_temp.rvp'))

### <font color=grey>**Generate RVC File**</font>

This step creates a blank RVC file that contains only header comments.

In [None]:
#@markdown <font color=grey> **Create empty RVC file** </font>

%%R

# Generate RVC file
#rvn_rvc_res(initial_percent = 0, output = file.path(getwd(),"model_name.rvc"))

rvc_file <- file.path(getwd(), paste(model_name,"rvc", sep="."))
writeLines(c("# ----------------------------------------------","# Raven Input file","# ----------------------------------------------"), rvc_file)

**References**

Chlumsky, R., Craig, J. R., Lin, S. G., Grass, S., Scantlebury, L., Brown, G., & Arabzadeh, R. (2022). RavenR v2.1.4: an open-source R package to support flexible hydrologic modelling. Geoscientific Model Development, 15(18), 7017-7030.

Craig, J. R., Brown, G., Chlumsky, R., Jenkinson, R. W., Jost, G., Lee, K., ... & Tolson, B. A. (2020). Flexible watershed simulation with the Raven hydrological modelling framework. Environmental Modelling & Software, 129, 104728.

# **5.0 Run Raven**

Raven is a robust and flexible hydrological modelling framework developed by Craig et al. (2020). Its fully object-oriented code allows for complete flexibility in spatial discretization, interpolation, process representation, and the generation of forcing functions. Raven models range from a single watershed lumped model with only a few state variables to a full semi-distributed system model with physically-based infiltration, snowmelt, and routing.

Check out the short video [Running Raven Model with Magpie Workflow](https://youtu.be/Rp3aAQjmXSA) for more information.


In [None]:
#@markdown <font color=grey> **Download Raven Input Files** </font>

#@markdown This cell zips the Raven input files so that users can then download the zipped file and run the Raven model locally

download_Raven_input_files = True #@param {type:"boolean"}

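if download_Raven_input_files:
  # folder containing the Raven input files
  input_folder = os.path.join(main_dir, 'workflow_outputs', 'RavenInput')
  # base name of the archive; make_archive appends '.zip'
  archive_base_name = os.path.join(main_dir, 'Raven_model_inputs')

  # Create '<main_dir>/Raven_model_inputs.zip' and report its location once it exists
  archive_path = shutil.make_archive(archive_base_name, 'zip', input_folder)
  print('Folder zipped:', archive_path)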

**Here is a quick overview of the required input files for Raven:**

<font color=#5559AB> .rvi </font> - the primary model input file <font color=grey> (section 5.0 Raven Input Files) </font>

<font color=#5559AB> .rvh </font> - the HRU / basin definition file <font color=grey> (section 3.0 Discretize Basin and 5.0 Raven Input Files) </font>

<font color=#5559AB> .rvt </font> - the time series/forcing function file <font color=grey> (section 4.0 Forcing Data) </font>

<font color=#5559AB> .rvp </font> - the class parameters file <font color=grey> (section 3.0 Discretize Basin) </font>

<font color=#5559AB> .rvc </font> - the initial conditions file <font color=grey> (section 5.0 Raven Input Files) </font>


**Optional Raven Inputs:**

A map of the final HRUs <font color=grey> (section 3.0 Discretize Basin) </font>

Grid Weights for NetCDF data <font color=grey> (section 4.0 Forcing Data --> 4.3 Gridded Weights Generator) </font>

More information about the files can be found in the [Raven Manual](http://raven.uwaterloo.ca/files/v3.6/RavenManual_v3.6.pdf)
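
Before running Raven, it can help to confirm that all five required files listed above are present. The following is a minimal sketch under the assumption that each file is named `<model_name>.<extension>` and stored in a single folder; `missing_raven_inputs` is a hypothetical helper and not part of Magpie.

```python
import os

REQUIRED_EXTENSIONS = ("rvi", "rvh", "rvt", "rvp", "rvc")

def missing_raven_inputs(input_dir, model_name, extensions=REQUIRED_EXTENSIONS):
    """Return the names of required Raven input files that are not found in input_dir."""
    expected = [f"{model_name}.{ext}" for ext in extensions]
    return [name for name in expected
            if not os.path.isfile(os.path.join(input_dir, name))]
```

Calling `missing_raven_inputs(os.path.join(main_dir, 'workflow_outputs', 'RavenInput'), model_name)` returns an empty list when nothing is missing, so the result can be used directly in an `if` statement.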

In [None]:
# check libraries
libraries_to_check = ["requests"]
check_and_install_libraries(libraries_to_check)

#@markdown <font color=#C41E3A> **Load Functions** </font> <br>
#@markdown Running this cell defines the helper functions used in this section, making them available for use in the notebook. <br>

import requests

def download_and_build_raven(url, temporary_dir, main_dir):
    """
    Download a zip file from the given URL, unzip it, and build Raven.

    Parameters:
    - url (str): The URL of the zip file.
    - temporary_dir (str): The temporary directory to store files.
    - main_dir (str): The main directory for the build.

    Returns:
    - None
    """
    zip_file_path = os.path.join(temporary_dir, 'RavenSource')
    zip_file_unpacked = os.path.join(temporary_dir, 'unpacked')
    source_dir = os.path.join(temporary_dir, 'unpacked')
    build_dir = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', 'packages')

    # Download the zip file and stop early if the request failed
    response = requests.get(url, stream=True)
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download {url} (Status code: {response.status_code})")

    zip_file_path_full = os.path.join(zip_file_path, 'RavenSource_v3.8.zip')
    os.makedirs(zip_file_path, exist_ok=True)

    # Stream the response body to disk
    with open(zip_file_path_full, 'wb') as zip_file:
        shutil.copyfileobj(response.raw, zip_file)

    print(f"Downloaded {url} to {zip_file_path_full}")

    # Unzip the source code
    subprocess.run(['unzip', zip_file_path_full, '-d', zip_file_unpacked], check=True)

    # Create the build directory and change to it
    os.makedirs(build_dir, exist_ok=True)
    os.chdir(build_dir)

    # Run CMake and Make
    subprocess.run(['cmake', source_dir], check=True)
    subprocess.run(['make'], check=True)


def remove_temp_data(temporary_dir):
  """Delete the temporary directory and its contents, if it exists."""
  if os.path.exists(temporary_dir):
    shutil.rmtree(temporary_dir)

In [None]:
#@markdown <font color=grey> **Load Raven Executable** </font>

# Define the path of the Raven executable
file_path = os.path.join(main_dir, 'workflow_outputs','RavenInput','packages','Raven')

# Remove any existing executable so that a fresh copy is built
if os.path.exists(file_path):
    os.remove(file_path)

# Download the Raven source code and build the executable
url = "https://raven.uwaterloo.ca/files/v3.8/RavenSource_v3.8.zip"
download_and_build_raven(url, temporary_dir, main_dir)
remove_temp_data(temporary_dir)

print(f"\n\n-----------------------------------------------------------------------")
print(f"Final output folder previously defined as: {final_output_folder}")
print(f"-----------------------------------------------------------------------")


In [None]:
#@markdown **Optional: change output folder name**

#@markdown Users have the option to change the name of their Raven output folder here.

final_output_folder = 'outputA' #@param {type:"string"}



In [None]:
#@markdown <font color=grey> **Run Raven** </font>

# define paths
exe_path = os.path.join(main_dir,'workflow_outputs','RavenInput','packages','Raven')
model_path = os.path.join(main_dir,'workflow_outputs', 'RavenInput', model_name)
final_output_path = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', final_output_folder)

# define bash command
bash_command = f'"{exe_path}" "{model_path}" -o "{final_output_path}"'

# Run the Bash command and capture the output
result = subprocess.run(bash_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

# Display the output
print("Bash Command Output:")
print(result.stdout)

# Display the error, if any
if result.returncode != 0:
    print("Error (if any):")
    print(result.stderr)
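
A zero return code does not guarantee a clean simulation, so it can be useful to look for diagnostics in the output folder as well. The sketch below assumes that Raven writes a diagnostics file named `Raven_errors.txt` to the output folder and that its lines begin with `ERROR` or `WARNING`; both the file name and the line format are assumptions here and should be checked against the Raven manual. `summarize_raven_errors` is a hypothetical helper.

```python
import os

def summarize_raven_errors(output_dir, filename="Raven_errors.txt"):
    """Collect ERROR and WARNING lines from a Raven diagnostics file, if one exists."""
    path = os.path.join(output_dir, filename)
    summary = {"errors": [], "warnings": []}
    if not os.path.isfile(path):
        # No diagnostics file found: return the empty summary
        return summary
    with open(path) as handle:
        for line in handle:
            text = line.strip()
            if text.upper().startswith("ERROR"):
                summary["errors"].append(text)
            elif text.upper().startswith("WARNING"):
                summary["warnings"].append(text)
    return summary
```

For instance, `summarize_raven_errors(final_output_path)["errors"]` would list any error messages recorded for the run defined above.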

**References**

Craig, J. R., Brown, G., Chlumsky, R., Jenkinson, R. W., Jost, G., Lee, K., ... & Tolson, B. A. (2020). Flexible watershed simulation with the Raven hydrological modelling framework. Environmental Modelling & Software, 129, 104728.

# **6.0 Raven Visualization**

The Raven model outputs are formatted and visualized in this subsection. Please keep in mind that the data visualization options are dependent on the Raven model outputs and may need to be customized by the user.

Check out the short video [7.0 Visualization of Raven Model Outputs - Magpie Workflow](https://youtu.be/NgbmsRu6Fm0) for more information.

### <font color=grey> **HydroGlyph**</font>

HydroGlyph is an online time series visualizer that displays CSV output files from the Raven Hydrological Modelling Framework, developed by the Raven Development Team at the University of Waterloo.

Hydroglyph was developed by Leland Scantlebury.

In [None]:
# check libraries
libraries_to_check = ["requests", "zipfile"]
check_and_install_libraries(libraries_to_check)

import zipfile
import requests

#@markdown <font color=#5559AB> **Define Raven output folder name** </font>

final_output_folder = "outputA" #@param {type:"string"}

#@markdown <font color=grey> **Zip and download HydroGlyph outputs** </font>

#@markdown Right click on the zipped folder and select "Download" to save to your local computer.

#@markdown Once downloaded, visit [HydroGlyph](http://raven.uwaterloo.ca/hydroglyph/) to visualize outputs


# define directory
zip_dir = os.path.join(temporary_dir)
os.makedirs(zip_dir, exist_ok=True)

def zip_and_download(file_path, zip_name):
    if os.path.exists(file_path):
        # Zip the specified file
        with zipfile.ZipFile(zip_name, 'w') as zipf:
            zipf.write(file_path, os.path.basename(file_path))
        print(f"Zipped {file_path} to {zip_name}")
    else:
        print(f'The file {file_path} does not exist.')

# Define the Raven output file to zip and the name of the zipped file
folder_path = os.path.join(main_dir,'workflow_outputs','RavenInput',final_output_folder)
file_to_zip = "Hydrographs.csv"
zip_file_name = "HydroGlyph_files.zip"

# Construct the full path to the file
file_path = os.path.join(folder_path, file_to_zip)

# Call the function to zip and download
zip_and_download(file_path, zip_file_name)

### <font color=grey> **Visualize Hydrographs with Magpie**</font>

In [None]:
import plotly.express as px

#@markdown <font color=grey> **Check output folder path** </font>

print('Output folder name: ', final_output_folder)
print('Full output folder path: ', os.path.join(main_dir, 'workflow_outputs', 'RavenInput', final_output_folder))

user_input = input("Replace output folder path? (yes/no): ").strip().lower()
if user_input == 'yes':
  new_path = input("Please enter the new output folder path: ").strip()
  final_output_folder = new_path
  print('Updated full output folder path: ', final_output_folder)



#### **Hydrograph**

In [None]:
#@markdown <font color=#5559AB> **Define hydrograph name** </font>

# File path
hydrograph_filename = 'Hydrographs.csv' # @param {type:"string"}

hydro_file = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', final_output_folder, hydrograph_filename)

# Load the CSV file
hydrograph_vals = pd.read_csv(hydro_file)
display(hydrograph_vals.head())

print('------------------------------------------------------------------------------------')

print("Column Names: ", hydrograph_vals.columns.tolist())
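
The printed column names can also be searched for matching simulated and observed series. The following is a minimal sketch that assumes observed columns carry an `(observed)` tag and otherwise share the name of their simulated counterpart (for example `sub1 [m3/s]` and `sub1 (observed) [m3/s]`); this naming pattern is an assumption and should be checked against the column list printed above. `pair_simulated_and_observed` is a hypothetical helper.

```python
def pair_simulated_and_observed(columns):
    """Map each simulated column name to its '(observed)' counterpart, if both exist."""
    # Normalize whitespace for matching, but keep the original names for the result
    by_clean_name = {" ".join(str(c).split()): c for c in columns}
    pairs = {}
    for clean, original in by_clean_name.items():
        if "(observed)" in clean:
            simulated = " ".join(clean.replace("(observed)", "").split())
            if simulated in by_clean_name:
                pairs[by_clean_name[simulated]] = original
    return pairs
```

The keys and values of `pair_simulated_and_observed(hydrograph_vals.columns)` could then be used as the simulated and observed column names in the next cell.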

In [None]:
#@markdown <font color=#5559AB> **Define simulated and observed column names** </font>

simulated_column = "sub26007397 (res. inflow) [m3/s]" # @param {type:"string"}

observed_column = "sub26007397 (res. inflow) [m3/s]" # @param {type:"string"}

# Convert the "date" column to datetime
hydrograph_vals['date'] = pd.to_datetime(hydrograph_vals['date'])

# Ensure the selected columns are numeric (duplicates are dropped in case the same column is entered twice)
columns_to_plot = list(dict.fromkeys([simulated_column, observed_column, 'precip [mm/day]']))
for col in columns_to_plot:
    hydrograph_vals[col] = pd.to_numeric(hydrograph_vals[col], errors='coerce')

# Create an interactive line plot with a custom color for precipitation
fig = px.line(
    hydrograph_vals,
    x='date',
    y=columns_to_plot,
    labels={
        "value": "Values",
        "date": "Date",
        "variable": "Legend"
    },
    title="Hydrograph: Precipitation and Discharge Over Time",
    color_discrete_map={
        simulated_column: 'red',
        observed_column: 'blue',
        'precip [mm/day]': 'grey'  # Custom color for precipitation
    }
)

# Show the plot
fig.show()


#### **Forcing Function Outputs**

> Add *:WriteForcingFunctions* to the RVI file and rerun Raven to generate the forcing functions file.

In [None]:
#@markdown <font color=#5559AB> **Define forcing function file name** </font>

# File path
forcing_functions_filename = 'ForcingFunctions.csv' # @param {type:"string"}

forcing_file = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', final_output_folder, forcing_functions_filename)

# Read the CSV file into a pandas DataFrame
forcings = pd.read_csv(forcing_file)

display(forcings.head())

print('------------------------------------------------------------------------------------')

print("Column Names: ", forcings.columns.tolist())
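
If the printed column names begin with a space (for example `' date'`), they must be typed with that space in the form fields below. As an alternative, the names can be trimmed first. This is a minimal sketch; `column_rename_map` is a hypothetical helper and not part of Magpie.

```python
def column_rename_map(columns):
    """Return {original_name: trimmed_name} for every column with surrounding whitespace."""
    return {c: c.strip() for c in columns if c != c.strip()}
```

Applying it with `forcings = forcings.rename(columns=column_rename_map(forcings.columns))` would yield names such as `date` and `temp_daily_min [C]`. The column names entered in later cells would then need to be written without the leading space.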

In [None]:
#@markdown <font color=#5559AB> **Define forcing variables to compare** </font>

col_val1 = " temp_daily_min [C]" # @param {type:"string"}

col_val2 = " temp_daily_max [C]" # @param {type:"string"}


# Parse the " date" column (note the leading space) into a datetime "date" column
forcings['date'] = pd.to_datetime(forcings[' date'])

# Create an interactive line chart using Plotly Express
fig = px.line(
    data_frame=forcings,
    x='date',
    y=[col_val1, col_val2],
    labels={'value': 'Value', 'date': 'Date', 'variable': 'Legend'},
    title="Daily Minimum and Maximum Temperatures",
)

# Customize the line width and the legend and title layout
fig.update_traces(line=dict(width=2))
fig.update_layout(
    legend=dict(title="Legend", orientation="h", x=0.5, xanchor="center"),
    title=dict(x=0.5),
)

# Display the interactive plot
fig.show()

#### **Watershed Storage Outputs**

In [None]:
#@markdown <font color=#5559AB> **Define watershed storage file name** </font>

# File path
watershed_filename = 'WatershedStorage.csv' # @param {type:"string"}

wshed_file = os.path.join(main_dir, 'workflow_outputs', 'RavenInput', final_output_folder, watershed_filename)

# Read the CSV file into a pandas DataFrame
wshed = pd.read_csv(wshed_file)

display(wshed.head())

print('------------------------------------------------------------------------------------')

print("Column Names: ", wshed.columns.tolist())

In [None]:
#@markdown <font color=#5559AB> **Define watershed storage variable to plot** </font>

col_of_interest = 'Snow [mm]' # @param {type:"string"}

fig = px.line(
    wshed,
    y=col_of_interest,
    labels={'value': 'value', 'index': 'Time'},
    template='plotly'
)

# Show the interactive plot
fig.show()

### <font color=#5559AB> **RavenView**</font>

RavenView is an online visualization tool, developed by Craig (2022), for Raven models. This subsection generates a downloadable folder of Raven inputs and outputs that can be dragged-and-dropped into RavenView to examine the model further.

RavenView is not only helpful for visualizing Raven model outputs; it can also be used beforehand to visualize Raven model input files (RVH, RVP).

Check out the short video [8.0 RavenView - Magpie Workflow](https://youtu.be/2AltLpOWe1M) for more information.

In [None]:
#@markdown <font color=#5559AB> **Define Raven output folder name** </font>

output_name = "outputA" #@param {type:"string"}

In [None]:
#@markdown <font color=grey> **Zip and download Raven model outputs** </font>

#@markdown Right click on the zipped folder and select "Download" to save to your local computer.

#@markdown Once downloaded, visit [RavenView](http://raven.uwaterloo.ca/RavenView/RavenView.html) to visualize outputs


# Define source and destination directories
ravenview_output_name = output_name+'_RavenView'
src_dir = os.path.join(main_dir, 'workflow_outputs', 'RavenInput')
dst_dir = os.path.join(src_dir, ravenview_output_name)

# Create the destination directory if it doesn't exist
os.makedirs(dst_dir, exist_ok=True)

# Copy and move HRU file
src_files = glob(os.path.join(src_dir, 'maps', '*.geojson'))
if src_files:
    dst_file = os.path.join(dst_dir, 'hru_map.geojson')
    shutil.copyfile(src_files[0], dst_file)

# Define a list of file extensions to copy and move
file_ext_lst = ('rvh', 'rvt', 'rvc', 'rvp', 'rvi')

# Copy and move files with specified extensions
for f in file_ext_lst:
    src_files = glob(os.path.join(src_dir, f"{model_name}.{f}"))
    for src_file in src_files:
        dst_file = os.path.join(dst_dir, os.path.basename(src_file))
        shutil.copyfile(src_file, dst_file)

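# Convert and copy subbasin file (river files match the same pattern, so they are excluded here)
subbasin_files = [f for f in glob(os.path.join(main_dir, 'workflow_outputs', 'routing_product', 'finalcat_info_*.shp'))
                  if not os.path.basename(f).startswith('finalcat_info_riv_')]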
if subbasin_files:
    subbasin_file = subbasin_files[0]
    subbasin_polygon = gpd.read_file(subbasin_file)
    subbasin_polygon.to_file(os.path.join(dst_dir, 'subbasin_map.geojson'), driver='GeoJSON')

# Copy and move river map
river_files = glob(os.path.join(main_dir, 'workflow_outputs', 'routing_product', 'finalcat_info_riv_*.shp'))
if river_files:
    river_file = river_files[0]
    river_polygon = gpd.read_file(river_file)
    river_polygon.to_file(os.path.join(dst_dir, 'river_map.geojson'), driver='GeoJSON')

# Zip the folder
shutil.make_archive(dst_dir, 'zip', dst_dir)
print('Folder zipped:', dst_dir + '.zip')


# **7.0 Magpie Developer - Configuration Files**

To enhance the reproducibility of the Magpie Workflow, users can use this section to compile all the configuration files generated during the workflow. The final configuration file can then be uploaded into Magpie Developer, allowing users to fully replicate their model.

Magpie Developer is a Jupyter notebook that can operate within Google Colab, be executed locally, or run on an alternative server, as long as the necessary packages are properly installed. Moreover, Magpie Developer streamlines the model development process by eliminating the need for a point-and-click interface; instead, all model decisions are retrieved from the configuration file.

If you're interested in trying out Magpie Developer for yourself, please visit Raven Utilities for more information.

In [None]:
#@markdown <font color=grey> **Merge Existing Configuration Files into One for Magpie Developer** </font>

# 1. Sort JSON files in a specific folder by name
folder_path = os.path.join(main_dir, "configuration_files")
sorted_files = sorted([f for f in os.listdir(folder_path) if f.endswith('.json')])

# 2. Combine JSON files into one and save it to a specific location
combined_data = []

for file_name in sorted_files:
    file_path = os.path.join(folder_path, file_name)
    with open(file_path, 'r') as file:
        json_data = json.load(file)
        combined_data.append(json_data)

# Specify the location where you want to save the combined JSON file
output_file_path = '/content/configuration_file.json'

with open(output_file_path, 'w') as output_file:
    json.dump(combined_data, output_file, indent=4)

print(f'Combined JSON file saved to {output_file_path}')
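
The combined file written above is a JSON list with one dictionary per workflow section. As a minimal sketch of how such a file could be read back, the hypothetical helper below flattens the sections into a single dictionary and skips the `_comment` entries. How Magpie Developer itself reads the file is not shown here, so this is an illustration only.

```python
import json

def load_merged_settings(config_file):
    """Flatten a list of section dictionaries into one dictionary of settings."""
    with open(config_file) as handle:
        sections = json.load(handle)
    merged = {}
    for section in sections:
        for key, value in section.items():
            # "_comment" entries only label each section, so they are skipped
            if key != "_comment":
                merged[key] = value
    return merged
```

For example, `load_merged_settings('/content/configuration_file.json').get('author_name')` would return the author name if it was written to a configuration file earlier. If two sections define the same key, the later section takes precedence in this sketch.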
