# Exploratory Data Analysis Exercise with Pandas and HoloViews

In this exercise, you will use the same data as in the Matplotlib exercise, but explore it interactively using the HoloViews plotting library. Filepath for the data:

    files -> Data -> NWIS_Streaflow -> <STATE>

After performing data cleaning and time-series alignment with Pandas (you can copy the exact code used in the Matplotlib exercise), you will transition to developing interactive HoloViews visualizations. The core of the assignment emphasizes the HoloViews philosophy and leveraging the Matplotlib backend, encouraging interactive exploratory data analysis to link, overlay, and explore discharge trends across Idaho, Utah, and Wyoming.

The [USGS NWIS Mapper](https://apps.usgs.gov/nwismapper/) provides interactive mapping to locate sites and their respective metadata.

## Task 1: Select, download, and bring the data into your notebook session (this can be copied from your Matplotlib exercise, with a few more sites added)

Use the [USGS NWIS Mapper](https://apps.usgs.gov/nwismapper/) to locate one site below a reservoir, one site in a headwater catchment, and one site near a river's terminus at the Great Salt Lake. In addition to these locations, ensure you have at least **2 sites in Idaho, 2 sites in Wyoming, and 2 sites in Utah.**

In the code block below, load the data into a Pandas DataFrame and inspect it as we previously did in the Pandas exercises (.head(), .describe()). Write down what you notice. Remove any outliers, NaN values, and -999 values.
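The cleaning step can be sketched roughly as follows. This is a minimal illustration on a small made-up DataFrame, not the actual NWIS files: the column names `datetime` and `discharge_cfs` are hypothetical, and the 1.5 × IQR rule is just one possible way to define an outlier.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one downloaded site file; the real files will
# have their own column names and many more rows.
raw = pd.DataFrame(
    {
        "datetime": pd.date_range("2020-01-01", periods=6, freq="D"),
        "discharge_cfs": [120.0, -999.0, np.nan, 135.0, 128.0, 5000.0],
    }
)

# Inspect the data first.
print(raw.head())
print(raw.describe())

# Treat the -999 placeholder as missing, then drop every missing row.
clean = raw.copy()
clean["discharge_cfs"] = clean["discharge_cfs"].replace(-999.0, np.nan)
clean = clean.dropna(subset=["discharge_cfs"])

# One simple outlier rule (an assumption, not the only option):
# keep values within 1.5 * IQR of the first and third quartiles.
q1, q3 = clean["discharge_cfs"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = clean["discharge_cfs"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = clean[in_range].set_index("datetime")

print(clean)
```

Whether a large value is an outlier or a real flood peak depends on the site, so it is worth checking the flagged rows before dropping them.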



## Task 3: Create a Tabular Dataset

Create a single DataFrame named All_Streams and combine all streamflow monitoring data into it. Your dataset should look like the diseases dataset in [3-Tabular_Datasets.ipynb](./getting_started/3-Tabular_Datasets.ipynb). Hint: create a column for the year, month, day, stream classification (e.g., headwater, below reservoir, GSL Terminus), state (Idaho, Utah, Wyoming), and streamflow.
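One way to build such a table is sketched below. The per-site frames, their values, and the `datetime` column name are made up for illustration; only the column layout follows the hint above.

```python
import pandas as pd

# Hypothetical cleaned per-site frames, keyed by (state, stream classification).
dates = pd.date_range("2020-01-01", periods=3, freq="D")
sites = {
    ("Utah", "GSL Terminus"): pd.DataFrame(
        {"datetime": dates, "streamflow": [50.0, 52.0, 49.0]}
    ),
    ("Idaho", "Headwater"): pd.DataFrame(
        {"datetime": dates, "streamflow": [12.0, 11.5, 13.0]}
    ),
}

frames = []
for (state, classification), df in sites.items():
    tidy = df.copy()
    # Split the timestamp into separate date columns.
    tidy["Year"] = tidy["datetime"].dt.year
    tidy["Month"] = tidy["datetime"].dt.month
    tidy["Day"] = tidy["datetime"].dt.day
    # Label every row with the site's state and classification.
    tidy["State"] = state
    tidy["Stream Classification"] = classification
    frames.append(tidy)

# Stack all sites into one long table, one row per site per day.
columns = ["Year", "Month", "Day", "State", "Stream Classification", "streamflow"]
All_Streams = pd.concat(frames, ignore_index=True)[columns]

print(All_Streams.head())
```

Keeping one row per observation, rather than one column per site, is what lets the later steps treat state and classification as dimensions.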




## Task 4: Make a HoloViews Object

You should have 6 data columns, each of which corresponds either to an independent variable that specifies a particular measurement ('Year', 'Month', 'Day', 'State', or 'Stream Classification') or to an observed/dependent variable reporting what was actually measured (i.e., streamflow).

Knowing the distinction between those two types of variables is crucial for doing visualizations, but it is often not declared. For example, plotting 'Month' against 'State' would not be meaningful, whereas 'streamflow' for each 'Stream Classification' (averaging or summing across the other dimensions) may be fine, and there's no way to deduce those constraints from the tabular format.

Your task is to make a HoloViews object called a ``Dataset`` that declares the independent variables (called key dimensions or **kdims** in HoloViews) and dependent variables (called value dimensions or **vdims**) that you want to work with.

## Task 5: Make a HoloViews Plot

The dataset has an arbitrary combination of dimensions, which prevents it from being immediately visualizable. There's no single clear mapping from these dimensions onto a two-dimensional page.

To make the data visualizable, you'll need to provide a bit more metadata by selecting one of the library of Elements that can help answer the questions you want to ask about the data. Perhaps the most obvious representation of this dataset is as a ``Curve`` displaying the streamflow for each state or stream classification. You could pull out individual columns one by one from the original dataset, but now that you have declared information about the dimensions, the cleanest approach is to map the dimensions of your ``Dataset`` onto the dimensions of an Element using ``.to``.

Task:
* Create curves of streamflow over time by mapping the ``Dataset`` onto ``Curve`` elements with ``.to``, using streamflow as the vdim and producing a separate curve for each State.