# Exploratory Data Analysis Exercise with Pandas and HoloViews

In this exercise, you will build on the exercise from last class period and can reuse the data from the Matplotlib exercise. The data are on Canvas under:

    files -> Data -> NWIS_Streaflow -> <STATE>

After performing data cleaning and time-series alignment with Pandas (you can copy the code used in the Matplotlib exercise), we will transition to developing interactive HoloViews visualizations. The core of the assignment emphasizes the HoloViews philosophy and leverages the Matplotlib and Bokeh backends, encouraging interactive exploratory data analysis to link, overlay, and explore discharge trends across Idaho, Utah, and Wyoming.

The [USGS NWIS Mapper](https://apps.usgs.gov/nwismapper/) provides interactive mapping to locate sites and their respective metadata.

## Task 1: Select, download, and bring the data into your notebook session

Select the following sites:

Idaho:
* 13168500
* 13185000
* 13206000
* 13249500
* 13251000
* 13317000

Utah:
* 09261000
* 09266500
* 10136500
* 10163000
* 10171000
* 10215900

Wyoming:
* 06751490
* 06752280
* 09234500
* 09255000
* 10011500
* 10039500

Make a **data** directory in the getting_started folder and create a folder inside it for each state (e.g., UT, WY, ID). Drag and drop the data from Canvas (or download it first) into these folders.

In the code block below, load the data into a dictionary of Pandas DataFrames.

    ID_dict = {}

Note: use the ``os.listdir`` and ``os.path.join`` functions to load the data. Set each dictionary key to the USGS site ID and nothing else.

For example:

    # data_folder is the path to your data directory
    Idaho_files = [f for f in os.listdir(os.path.join(data_folder, "ID")) if f.endswith('.csv')]
    ID_dict = {f.split('_')[0]: pd.read_csv(os.path.join(data_folder, "ID", f)) for f in Idaho_files}
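The same pattern repeats for Utah and Wyoming, so it can be wrapped in a small helper. The following is a minimal sketch, not a required solution: the function name `load_state` is hypothetical, and it assumes every file name starts with the site ID followed by an underscore, as in the example above.

```python
import os
import pandas as pd

def load_state(data_folder, state):
    """Return {site_id: DataFrame} for every CSV in data_folder/state."""
    state_dir = os.path.join(data_folder, state)
    frames = {}
    for fname in sorted(os.listdir(state_dir)):
        if not fname.endswith(".csv"):
            continue  # skip anything that is not a CSV file
        # Assumption: file names look like "<site id>_<anything>.csv"
        site_id = fname.split("_")[0]
        frames[site_id] = pd.read_csv(os.path.join(state_dir, fname))
    return frames

# Usage, assuming data_folder = "data" with ID, UT, and WY subfolders:
# ID_dict = load_state("data", "ID")
# UT_dict = load_state("data", "UT")
# WY_dict = load_state("data", "WY")
```

Because the key comes from the file name, it stays a string, which preserves the leading zero in IDs such as 09261000.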

In [None]:
#Imports
import pandas as pd
import numpy as np
import holoviews as hv
import os
from holoviews import opts

hv.extension('bokeh', 'matplotlib')

## Task 2: Efficiently clean the data

Much of the USGS data is not needed, and some records may be missing.

Make a function called ``clean_df()`` that takes a DataFrame, a list of unwanted (null) columns, and a state name, and performs the following actions:

* Removes the unwanted columns (e.g., create a list variable of undesirable columns such as 'variable', 'measurement_unit', 'qualifiers', 'series')
* Adds a column for the state and fills it with the state's name
* Converts the date column to pandas datetime values (e.g., with pd.to_datetime())


Apply the cleaning function to every DataFrame in each state dictionary.
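One possible shape for the three actions above is sketched here on a tiny made-up frame. The `date_col` parameter and its default of `"Datetime"` are assumptions (check the column names in your own files), and the `State` column name is taken from the Task 4 code.

```python
import pandas as pd

def clean_df(df, null_cols, state, date_col="Datetime"):
    """Drop unwanted columns, tag the state, and parse the date column."""
    # errors="ignore" skips listed names that are not present in this frame
    out = df.drop(columns=null_cols, errors="ignore")
    out["State"] = state
    out[date_col] = pd.to_datetime(out[date_col])
    return out

# Tiny made-up frame, only to show the behavior
raw = pd.DataFrame({
    "Datetime": ["2020-01-01", "2020-01-02"],
    "USGS_flow": [10.5, 11.0],
    "qualifiers": ["A", "A"],
})
null_cols = ["variable", "measurement_unit", "qualifiers", "series"]
cleaned = clean_df(raw, null_cols, "ID")

# Applying it to a whole dictionary keeps the site-ID keys
ID_dict = {"13168500": raw}
ID_dict = {site: clean_df(df, null_cols, "ID") for site, df in ID_dict.items()}
```

Since `drop` returns a new DataFrame, the original `raw` frame is left unchanged, which makes it safe to rerun the cell.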

## Task 3: Assign a Location variable to each site

Look up each streamflow monitoring station by its USGS ID (a web search works) and create a new column called **Site_type** that labels each DataFrame with how you identify the station: Headwater, Tailwater, or Near_Terminus.

Once complete, combine all of the dictionary dataframes into one large DataFrame called **All_Streams**
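A minimal sketch of the labeling and combining steps follows, using made-up frames in place of the cleaned data. The `site_types` labels are placeholders, not real classifications, and adding a `USGS_ID` column from the dictionary key is an assumption about how the ID ends up in the combined table.

```python
import pandas as pd

# Made-up frames standing in for two cleaned site DataFrames
ID_dict = {
    "13168500": pd.DataFrame({"USGS_flow": [1.0, 2.0], "State": "ID"}),
    "13185000": pd.DataFrame({"USGS_flow": [3.0], "State": "ID"}),
}

# Placeholder labels only; look each station up before assigning its real type
site_types = {"13168500": "Tailwater", "13185000": "Headwater"}

frames = []
for site_id, df in ID_dict.items():
    df = df.copy()                        # avoid modifying the dictionary entry
    df["USGS_ID"] = site_id               # keep the ID as a string
    df["Site_type"] = site_types[site_id]
    frames.append(df)

# Stack every site into one long table with a fresh 0..n-1 index
All_Streams = pd.concat(frames, ignore_index=True)
```

To include all three states, extend `frames` with the Utah and Wyoming dictionaries before calling `pd.concat`.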

Double check that the All_Streams DataFrame has all of your sites by running ``All_Streams['USGS_ID'].unique()``. All of your sites should be represented.

In [None]:
All_Streams['USGS_ID'].unique()

## Task 4: Make a HoloViews object with dropdowns for USGS ID and State
The following code block should work and allow interactivity. Note, the USGS_ID and State should be dropdowns. If they are not, you will need to fix something...

For the plots to work, you will need to have the correct state selected for the respective USGS_ID.
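One likely reason for a slider appearing instead of a dropdown is that `USGS_ID` was read as a number, since HoloViews generally gives numeric key dimensions a slider and string dimensions a dropdown. The sketch below shows one way to check and fix the type with pandas; the 8-digit padding is an assumption that holds for the sites listed in Task 1.

```python
import pandas as pd

# Made-up frame where the IDs were parsed as integers,
# so the leading zero of the Utah site is lost
All_Streams = pd.DataFrame({
    "USGS_ID": [13168500, 9261000],
    "State": ["ID", "UT"],
})
print(All_Streams["USGS_ID"].dtype)  # an integer dtype here

# Cast to string and restore the 8-digit form (assumes 8-digit site IDs)
All_Streams["USGS_ID"] = All_Streams["USGS_ID"].astype(str).str.zfill(8)
```

After the cast, both `USGS_ID` and `State` hold strings, which is what the dropdown widgets expect.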


In [None]:
vdims = ['USGS_flow'] 
kdims = ['USGS_ID', 'Datetime', 'State'] # Define the key dimensions with labels, note the ordinary column names renamed to more descriptive labels

ds = hv.Dataset(All_Streams, kdims, vdims) # Create the Holoviews Dataset

layout = ds.to(hv.Curve, 'Datetime', 'USGS_flow')
# Customize the layout appearance using hv.Curve options
layout.opts(
    opts.Curve(width=600, height=250, framewise=True)) 