In [None]:
%run "0_Workspace_setup.ipynb"

from NHM_helpers.sf_data_retrieval import (
    create_nwis_sf_df,
    create_OR_sf_df,
    create_ecy_sf_df,
    create_sf_efc_df,
)

from NHM_helpers.NHM_hydrofabric import (
    create_hru_gdf,
    create_poi_df,
    create_default_gages_file,
    read_gages_file,
)

from NHM_helpers.efc import plot_efc, plot_high_low

# Introduction

Comparing simulated flows to observed flows is critical to evaluating the NHM. This notebook retrieves available streamflow observations from NWIS and from two state agencies, the Oregon Water Resources Department (OWRD) and the Washington Department of Ecology (ECY). It combines these data sets into a single daily streamflow observations file with streamflow gage information and metadata, and writes the result as a netCDF file (**sf_efc.nc**) for use in Notebook "6_Streamflow_Output_Visualization" and other notebooks in NHM-Assist. The **sf_efc.nc** file also includes environmental flow component (EFC) classifications of flows, computed with a Python workflow (also in this notebook) as described by [Risley and others, 2010](https://pubs.usgs.gov/sir/2010/5016/pdf/sir20105016.pdf). In addition, this notebook writes a default gages file (**default_gages.csv**) that lists gage information for gages in the parameter file and for other NWIS gages in the domain that have data for the simulation period.
A complete database of streamflow gages and observations in the model domain is necessary to evaluate the NHM and to identify other gages that could be included in a model recalibration to improve model performance.

Keep three facts about streamflow observations and the NHM in mind:
- Streamflow observations are NOT used when running PRMS or pywatershed. These data are meant only for comparison with simulated output.
- The NHM calibration workflow DOES use streamflow observations from NWIS, but it does not read them from the streamflow file.
- Only limited streamflow gage information is stored in the parameter file.

The parameter file has only a few parameters associated with gages (dimensioned by npoigages):
- poi_gage_id, the agency identification number,
- poi_gage_segment, the model segment identification number (nhm_seg) on which the gage falls (one gage per segment only),
- poi_type, used historically but not currently.

It is important to note that the gages in the parameter file are NOT a complete set of gages in the model domain, and were NOT all used to calibrate the model.
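
As a minimal sketch of how the gage parameters described above relate to each other, the example below builds a small table from made-up `poi_gage_id` and `poi_gage_segment` values and checks the one-gage-per-segment rule. The gage ids, segment numbers, and the names `poi_table`, `poi_id`, and `nhm_seg` are hypothetical and are not read from a real parameter file.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for arrays read from a parameter file.
# Gage ids are kept as strings so that leading zeros are preserved.
poi_gage_id = np.array(["01000001", "01000002", "01000003"])
poi_gage_segment = np.array([101, 104, 110])  # nhm_seg for each gage

# One row per gage; the number of rows corresponds to npoigages.
poi_table = pd.DataFrame({"poi_id": poi_gage_id, "nhm_seg": poi_gage_segment})

# Only one gage may fall on a segment, so segment ids should be unique.
segments_are_unique = poi_table["nhm_seg"].is_unique
npoigages = len(poi_table)
print(f"npoigages = {npoigages}; one gage per segment: {segments_are_unique}")
```

A second gage on an already occupied segment would make `segments_are_unique` false, which is one reason some gages in the domain cannot be stored in the parameter file.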

## Load NHM domain hydrofabric elements: HRUs and gages (POIs).

In [None]:
hru_gdf, hru_txt, hru_cal_level_txt = create_hru_gdf(
    NHM_dir,
    model_dir,
    GIS_format,
    param_filename,
    nhru_params,
    nhru_nmonths_params,
)

poi_df = create_poi_df(
    model_dir,
    param_filename,
    control_file_name,
    hru_gdf,
    gages_file,
    default_gages_file,
    nwis_gage_nobs_min,
)

con.print(
    f"{workspace_txt}\n",
    f"\n{gages_txt}{seg_txt}{hru_txt}",
    f"\n     {hru_cal_level_txt}\n",
    f"\n{gages_txt_nb2}",
)

# Retrieve all NWIS gage information and streamflow observations.
This function pulls time series data for all NWIS gages in the domain, filters the data to the simulation period (**nwis_gages_cache.nc**), and creates **NWISgages.csv**. Both the time series data file and **NWISgages.csv** contain site information for every gage whose period of record within the simulation period exceeds the user-specified threshold (`nwis_gage_nobs_min`, set in [notebook 0](./0_Workspace_setup.ipynb)), **AND** for ALL gages in the parameter file, even those whose period of record falls below the threshold.
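
The selection rule described above can be sketched with pandas. This is an illustration only and not the code inside `create_nwis_sf_df`; the gage ids, the `nobs` column, and the strict `>` comparison are assumptions, and `nobs_min` merely stands in for `nwis_gage_nobs_min`.

```python
import pandas as pd

# Hypothetical counts of daily observations within the simulation period.
site_info = pd.DataFrame(
    {
        "poi_id": ["01000001", "01000002", "01000003", "01000004"],
        "nobs": [7300, 120, 0, 45],
    }
)
param_file_gages = {"01000002"}  # hypothetical gage ids from the parameter file
nobs_min = 365  # stands in for nwis_gage_nobs_min

# Keep a gage if it exceeds the threshold OR if it is in the parameter file.
meets_threshold = site_info["nobs"] > nobs_min
in_param_file = site_info["poi_id"].isin(param_file_gages)
kept = site_info[meets_threshold | in_param_file].reset_index(drop=True)
print(kept)
```

In this made-up case, gage 01000002 is kept despite its short record because it is listed in the parameter file, while 01000003 and 01000004 are dropped.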

In [None]:
NWIS_df = create_nwis_sf_df(
    control_file_name,
    model_dir,
    output_netcdf_filename,
    hru_gdf,
    poi_df,
    nwis_gage_nobs_min,
)

## Make the default gages file (default_gages.csv)
The **default_gages.csv** contains gages from the parameter file and NWIS gages from the domain (**nwis_gages_cache.nc**). The gages from the parameter file are read into notebooks as `poi_df`, and the **default_gages.csv** is read into notebooks as `gages_df`. If the parameter file contains gages that are not in NWIS, the **default_gages.csv** will be missing site information for those gages and an error will be displayed below. In that case, manually update the **default_gages.csv**, rename the file **gages.csv**, and re-run this notebook.
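
To illustrate how parameter-file gages without NWIS site information might be spotted before editing the file by hand, here is a minimal sketch. The tables, gage ids, and the `latitude` and `longitude` column names are hypothetical; the actual layout of **default_gages.csv** may differ.

```python
import pandas as pd

# Hypothetical parameter-file gages and NWIS site information.
poi = pd.DataFrame({"poi_id": ["01000001", "01000002", "OR_0001"]})
nwis_sites = pd.DataFrame(
    {
        "poi_id": ["01000001", "01000002", "01000009"],
        "latitude": [45.1, 45.3, 45.6],
        "longitude": [-122.1, -122.4, -121.9],
    }
)

# An outer merge keeps every gage from either source; gages that are not
# in NWIS end up with NaN in the site-information columns.
default_gages = poi.merge(nwis_sites, on="poi_id", how="outer")
has_gap = default_gages[["latitude", "longitude"]].isna().any(axis=1)
missing_ids = default_gages.loc[has_gap, "poi_id"].tolist()
print(f"Gages needing manual site information: {missing_ids}")
```

Rows flagged this way are the ones that would need to be completed manually before the file is saved as **gages.csv**.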

In [None]:
default_gages_file = create_default_gages_file(
    model_dir,
    control_file_name,
    nwis_gage_nobs_min,
    hru_gdf,
    poi_df,
)

gages_df, gages_txt, gages_txt_nb2 = read_gages_file(
    model_dir,
    poi_df,
    gages_file,
)

con.print(
    f"\n{gages_txt}",
    f"\n{gages_txt_nb2}",
)

# Retrieve non-NWIS daily streamflow data for gages in the gages file
Here we use the gages listed in `gages_df` rather than the gages in the model parameter file (`poi_df`). The `gages_df` is created from the gages listed in the **default_gages.csv** or in the updated version, **gages.csv**. This will be useful later for adding gages to the domain model and for model validation or future calibration. Also, multiple gages cannot be associated with a single segment in the parameter file. Additional gages in the domain that cannot be stored in the model parameter file may be appended to the **default_gages.csv** and are then included in `gages_df` and **sf_efc.nc**.

## Retrieve available daily streamflow data from Oregon Water Resources Department
#### https://apps.wrd.state.or.us/apps/sw/hydro_near_real_time/

In [None]:
owrd_df = create_OR_sf_df(control, model_dir, output_netcdf_filename, hru_gdf, gages_df)

## Retrieve available daily streamflow data from Washington Department of Ecology
#### https://waecy.maps.arcgis.com/apps/Viewer/index.html?appid=832e254169e640fba6e117780e137e7b

In [None]:
ecy_df = create_ecy_sf_df(control, model_dir, output_netcdf_filename, hru_gdf, gages_df)

# Create streamflow observations file with appended EFC values (sf_efc.nc).
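
Conceptually, this step stacks the daily records from each agency into one table indexed by gage and date. The sketch below shows that idea with pandas only; it does not reproduce `create_sf_efc_df` or the EFC classification, and the gage ids, agency labels, flows, and the `make_obs` helper are all hypothetical.

```python
import pandas as pd

dates = pd.date_range("2000-01-01", periods=3, freq="D")


def make_obs(poi_id, agency, flows):
    """Build a long-format table of daily discharge for one gage."""
    return pd.DataFrame(
        {"poi_id": poi_id, "time": dates, "discharge": flows, "agency_id": agency}
    )


# Hypothetical records from the three data sources.
nwis = make_obs("01000001", "USGS", [10.0, 12.0, 11.0])
owrd = make_obs("OR_0001", "OWRD", [3.0, 3.5, 4.0])
ecy = make_obs("WA_0001", "ECY", [7.0, 6.5, 6.0])

# Stack the sources and index by gage and date.
combined = (
    pd.concat([nwis, owrd, ecy], ignore_index=True)
    .set_index(["poi_id", "time"])
    .sort_index()
)
n_gages = combined.index.get_level_values("poi_id").nunique()
print(combined)
```

If xarray is installed, `combined.to_xarray()` would turn a table like this into a dataset with `poi_id` and `time` dimensions, which is similar in shape to what the selection code later in this notebook expects.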

In [None]:
xr_streamflow = create_sf_efc_df(
    model_dir,
    control_file_name,
    nwis_gage_nobs_min,
    hru_gdf,
    output_netcdf_filename,
    owrd_df,
    ecy_df,
    NWIS_df,
    gages_df,
)

# Check streamflow observations file: plot discharge and EFC information for a selected gage.
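
Because some gages have no observations in the simulation period, it can help to pick a gage that is known to have data rather than a fixed position in the list. The following is a minimal sketch on a made-up wide table with one discharge column per gage; the gage ids and values are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=4, freq="D")

# Hypothetical daily discharge; NaN marks days without an observation.
discharge = pd.DataFrame(
    {
        "01000001": [np.nan, np.nan, np.nan, np.nan],
        "01000002": [5.0, np.nan, 6.0, 7.0],
        "01000003": [1.0, 2.0, np.nan, np.nan],
    },
    index=dates,
)

n_valid = discharge.notna().sum()  # number of observations per gage
gages_with_data = n_valid[n_valid > 0].index.tolist()
best_gage = n_valid.idxmax()  # gage with the most observations
print(f"Gages with data: {gages_with_data}; most complete: {best_gage}")
```

For the xarray dataset used in this notebook, an analogous count could probably be obtained with `xr_streamflow.discharge.notnull().sum("time")`, assuming the variable is named `discharge` and the date dimension is named `time`.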

In [None]:
cpoi_id = xr_streamflow.poi_id.values[2]
print(
    f"Quick plots below for gage: {cpoi_id}. "
    "Some gages may show no data because some gages in the parameter file "
    "have data only outside the simulation period."
)
start_date = pd.to_datetime(str(control.start_time)).strftime("%m/%d/%Y")
end_date = pd.to_datetime(str(control.end_time)).strftime("%m/%d/%Y")
ds_sub = xr_streamflow.sel(poi_id=cpoi_id, time=slice(start_date, end_date))
ds_sub = ds_sub.to_dataframe()
flow_col = "discharge"

In [None]:
plot_efc(ds_sub, flow_col)

In [None]:
plot_high_low(ds_sub, flow_col)