# Introduction

Comparing simulated flows to observed flows in the watershed is critical for evaluating the NHM. This notebook retrieves available streamflow observations from NWIS and two state agencies, the Oregon Water Resources Department (OWRD) and the Washington State Department of Ecology (ECY). It combines these data sets into a single streamflow observations file with streamflow gage information and metadata, and writes the database out as a netCDF file (.nc) for use in Notebook "6_Streamflow_Output_Visualization" and other notebooks in NHM-Assist. A complete database of streamflow gages and observations in the model domain is necessary to evaluate the NHM and to identify additional gages that could be included in a model recalibration to improve model performance.

Three key facts about streamflow observations and the NHM are worth reviewing:
- Streamflow observations are NOT used when running PRMS or pywatershed. These data are meant only for comparison with simulated output.
- The NHM DOES use streamflow observations from NWIS in the model calibration workflow, but they are not read from the streamflow observations file.
- Only limited streamflow gage information is stored in the parameter file.

The parameter file stores three gage-related parameters, each dimensioned by npoigages:
- poi_gage_id, the agency identification number of the gage;
- poi_gage_segment, the model segment identification number (nhm_seg) on which the gage falls (only one gage per segment);
- poi_type, historically used, but not currently used.

Note that the gages in the parameter file are NOT a complete set of the gages in the model domain, and NOT all of them are used to calibrate the model.
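As a minimal sketch of what these three parameters imply, the check below assumes poi_gage_id and poi_gage_segment have already been read from the parameter file into plain Python lists of length npoigages. The function name check_poi_gages and the gage IDs in the example are hypothetical. It confirms that no segment carries more than one gage and lists domain gages that the parameter file does not include.

```python
from collections import Counter


def check_poi_gages(poi_gage_id, poi_gage_segment, domain_gage_ids):
    """Compare parameter-file POI gages with a fuller list of gages in the model domain.

    poi_gage_id and poi_gage_segment are assumed to be parallel lists of length npoigages.
    domain_gage_ids is any iterable of gage IDs known to lie in the model domain.
    """
    if len(poi_gage_id) != len(poi_gage_segment):
        raise ValueError("poi_gage_id and poi_gage_segment must both have length npoigages")

    # Segments listed more than once would break the one-gage-per-segment rule.
    seg_counts = Counter(poi_gage_segment)
    repeated_segments = sorted(seg for seg, n in seg_counts.items() if n > 1)

    # Gages known in the domain that the parameter file does not include.
    missing_from_params = sorted(set(domain_gage_ids) - set(poi_gage_id))

    return {
        "npoigages": len(poi_gage_id),
        "repeated_segments": repeated_segments,
        "missing_from_params": missing_from_params,
    }


# Hypothetical example: three POI gages, but five gages in the domain.
summary = check_poi_gages(
    poi_gage_id=["10000001", "10000002", "10000003"],
    poi_gage_segment=[101, 102, 105],
    domain_gage_ids=["10000001", "10000002", "10000003", "10000004", "10000005"],
)
print(summary["missing_from_params"])  # ['10000004', '10000005']
```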

In [None]:
%run "0_Workspace_setup.ipynb"

from NHM_helpers.sf_data_retrieval import *

In [None]:
# Create hydrofabric map elements
hru_gdf, hru_cal_level_txt = create_hru_gdf(
    NHM_dir,
    model_dir,
    GIS_format,
    param_filename,
    nhru_params,
    nhru_nmonths_params,
)

# seg_gdf = create_segment_gdf(
#     model_dir,
#     GIS_format,
#     param_filename,
# )

nwis_gages_aoi = fetch_nwis_gage_info(
    model_dir,
    control_file_name,
    nwis_gage_nobs_min,
    hru_gdf,
)

poi_df = create_poi_df(
    model_dir,
    param_filename,
    control_file_name,
    hru_gdf,
    nwis_gages_aoi,
    gages_file,
)

default_gages_file = create_default_gages_file(
    model_dir,
    nwis_gages_aoi,
    poi_df,
)

gages_df = read_gages_file(
    model_dir,
    poi_df,
    nwis_gages_file,
    gages_file,
)

In [None]:
crs = 4326

# Make a list of the HUC2 region(s) that the model domain intersects, for NWIS queries.
# Both layers are put in the same CRS before clipping.
huc2_gdf = gpd.read_file("./data_dependencies/HUC2/HUC2.shp").to_crs(crs)
model_domain_regions = huc2_gdf.clip(hru_gdf.to_crs(crs))["huc2"].tolist()
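
One reason to keep a list of HUC2 regions is that, as far as we know, the legacy NWIS Site Web Service accepts only one two-digit (major) HUC per request, so a domain spanning several regions needs one query per region. The sketch below only builds the request URLs with the standard library; the function name nwis_site_urls is hypothetical, the query choices (stream sites with daily discharge, parameter code 00060, RDB output) are assumptions, and fetch_nwis_gage_info is not necessarily implemented this way. The legacy waterservices endpoints may also be retired in favor of newer USGS Water Data APIs, so check the current service documentation before relying on this.

```python
from urllib.parse import urlencode

NWIS_SITE_URL = "https://waterservices.usgs.gov/nwis/site/"


def nwis_site_urls(huc2_codes):
    """Build one NWIS site-service query URL per two-digit HUC (hypothetical helper).

    Assumes the legacy NWIS Site Web Service, which accepts a single major (2-digit)
    HUC per request. Each query asks for stream sites (siteType=ST) that have daily
    values (hasDataTypeCd=dv) of discharge (parameterCd=00060), returned as RDB text.
    """
    urls = []
    # dict.fromkeys drops duplicate codes while keeping their original order.
    for huc in dict.fromkeys(str(h).zfill(2) for h in huc2_codes):
        if len(huc) != 2 or not huc.isdigit():
            raise ValueError(f"expected a two-digit HUC2 code, got {huc!r}")
        query = {
            "format": "rdb",
            "huc": huc,
            "siteType": "ST",
            "hasDataTypeCd": "dv",
            "parameterCd": "00060",
        }
        urls.append(f"{NWIS_SITE_URL}?{urlencode(query)}")
    return urls


# For example, a domain that intersects HUC2 regions 17 and 18.
for url in nwis_site_urls(["17", "18"]):
    print(url)
```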

# Retrieve daily streamflow data for gages in the gages file
##### We will not use the list of gages in the model parameter file; instead, we use the gages listed in gages_file.csv. This makes it easier to add gages to the model later for further validation or calibration. Also, the parameter file cannot associate multiple gages with a single segment outflow, and we want a more inclusive streamflow data set.
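To illustrate why the gages file is more inclusive, the sketch below groups gages by model segment, assuming each row of gages_file.csv can be reduced to a (gage ID, nhm_seg) pair. The function names and example IDs are hypothetical. Any segment with more than one gage could not be represented through poi_gage_segment, but all of its gages can still have observations in the streamflow file.

```python
from collections import defaultdict


def gages_by_segment(gage_rows):
    """Group gage IDs by the model segment (nhm_seg) they fall on."""
    groups = defaultdict(list)
    for gage_id, nhm_seg in gage_rows:
        groups[nhm_seg].append(gage_id)
    return dict(groups)


def shared_segments(groups):
    """Return only the segments that carry more than one gage."""
    return {seg: ids for seg, ids in groups.items() if len(ids) > 1}


# Hypothetical rows: two gages fall on segment 101.
rows = [("10000001", 101), ("20000001", 101), ("10000002", 102)]
groups = gages_by_segment(rows)
print(shared_segments(groups))  # {101: ['10000001', '20000001']}
```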

## Retrieve available daily streamflow data from Oregon Water Resources Department
#### https://apps.wrd.state.or.us/apps/sw/hydro_near_real_time/

In [None]:
owrd_df = create_OR_sf_df(control, model_dir, output_netcdf_filename, hru_gdf, gages_df)

## Retrieve available daily streamflow data from Washington State Department of Ecology
#### https://waecy.maps.arcgis.com/apps/Viewer/index.html?appid=832e254169e640fba6e117780e137e7b

In [None]:
ecy_df = create_ecy_sf_df(control, model_dir, output_netcdf_filename, hru_gdf, gages_df)

## Retrieve available daily streamflow data from NWIS
#### https://waterdata.usgs.gov/nwis/sw

In [None]:
NWIS_df = create_nwis_sf_df(control, model_dir, output_netcdf_filename, gages_df)

# Create streamflow observations file with appended EFC values (sf_efc.nc).

In [None]:
xr_streamflow = create_sf_efc_df(
    output_netcdf_filename,
    owrd_df,
    ecy_df,
    NWIS_df,
    gages_df,
)
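
The step above merges the OWRD, ECY, and NWIS records. As a minimal sketch of one way such a merge could work (not necessarily what create_sf_efc_df does), assume each agency's data is a pandas DataFrame with a daily DatetimeIndex and one column per gage ID, and that earlier sources in the list take priority when two agencies report the same gage on the same day. The function name combine_agency_flows and the priority order are assumptions.

```python
import pandas as pd


def combine_agency_flows(frames):
    """Merge daily flow tables from several agencies (hypothetical helper).

    frames: ordered list of (agency_name, DataFrame) pairs. Each DataFrame has a
    DatetimeIndex of days and one column per gage ID. Earlier entries win where
    two agencies report a value for the same gage and day.
    Returns (flows, source): the merged values and the agency that supplied each one.
    """
    flows = None
    source = None
    for agency, df in frames:
        df = df.sort_index()
        # Label every non-missing value with the agency it came from.
        labels = pd.DataFrame(agency, index=df.index, columns=df.columns).where(df.notna())
        if flows is None:
            flows, source = df, labels
        else:
            # combine_first keeps existing values and fills gaps (and new rows/columns) from df.
            flows = flows.combine_first(df)
            source = source.combine_first(labels)
    return flows, source


days = pd.date_range("2020-01-01", periods=3, freq="D")
nwis = pd.DataFrame({"10000001": [1.0, None, 3.0]}, index=days)
owrd = pd.DataFrame({"10000001": [9.0, 2.0, 9.0], "20000002": [5.0, 6.0, 7.0]}, index=days)

flows, source = combine_agency_flows([("NWIS", nwis), ("OWRD", owrd)])
print(flows["10000001"].tolist())   # [1.0, 2.0, 3.0]
print(source["10000001"].tolist())  # ['NWIS', 'OWRD', 'NWIS']
```

Keeping a parallel source table makes it easy to trace which agency supplied each daily value after the merge.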

## Check the streamflow observations file: plot discharge and EFC information for a selected gage.

In [None]:
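# Select one gage to inspect (index 1 is the second gage in the file; change it to view another)
cpoi_id = xr_streamflow.poi_id.values[1]

# Subset to the period of interest and convert to a pandas DataFrame for plotting
ds_sub = xr_streamflow.sel(poi_id=cpoi_id, time=slice("1980-10-01", "2022-12-31"))
ds_sub = ds_sub.to_dataframe()
flow_col = "discharge"  # name of the streamflow variable to plot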
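
Before plotting, it can help to see how complete the record is for the selected gage. The sketch below assumes a pandas Series of daily discharge with a complete daily DatetimeIndex (for example, ds_sub[flow_col]) and reports, for each USGS water year (October 1 through September 30, named for the calendar year in which it ends), the fraction of days with a non-missing value. The function name water_year_completeness is hypothetical.

```python
import numpy as np
import pandas as pd


def water_year_completeness(flow):
    """Fraction of days with a non-missing value in each water year.

    flow: pandas Series indexed by a complete daily DatetimeIndex.
    Water year N runs from Oct 1 of year N-1 through Sep 30 of year N.
    """
    idx = pd.DatetimeIndex(flow.index)
    water_year = pd.Series(
        idx.year + (idx.month >= 10).astype(int), index=flow.index, name="water_year"
    )
    # The mean of a boolean mask is the share of days that have data.
    return flow.notna().groupby(water_year).mean()


flow = pd.Series(
    [1.0, np.nan, 2.0, 3.0],
    index=pd.date_range("2019-09-29", periods=4, freq="D"),
)
completeness = water_year_completeness(flow)
print(completeness.loc[2019], completeness.loc[2020])  # 0.5 1.0
```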

In [None]:
plot_efc(ds_sub, flow_col)

In [None]:
plot_high_low(ds_sub, flow_col)