# Ex. 4 - Access NWIS with the dataretrieval package

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mrahnis/nb-streamgage/blob/main/Stream%20Gage%20Ex.%204%20-%20Access%20NWIS%20with%20dataretrieval.ipynb)

## The USGS dataretrieval package

The `dataretrieval` package lets users retrieve data through the USGS NWIS API. It can return longer time series than the NWIS web pages offer. The dataretrieval git repository is here: https://github.com/USGS-python/dataretrieval


## Preliminaries

In [1]:
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import dataretrieval.nwis as nwis

In [2]:
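# USGS site numbers mapped to short descriptive names.
# Site numbers are kept as strings so the leading zero is preserved.
gages = {'01576516':'east branch',
         '015765185':'west branch',
         '015765195':'mainstem'}

# The gage used in the rest of this exercise
gage = '015765195'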
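
The dictionary makes it easy to turn a site number back into a readable name, for example when labeling a plot. A minimal sketch follows; the loop and the `label` variable are illustrative additions, not part of the original notebook.

```python
gages = {'01576516': 'east branch',
         '015765185': 'west branch',
         '015765195': 'mainstem'}

gage = '015765195'

# List every gage and flag the one selected for this exercise.
for site_no, name in gages.items():
    marker = '<- selected' if site_no == gage else ''
    print(site_no, name, marker)

# Build a readable label from the selected site number.
label = f"{gages[gage]} ({gage})"
print(label)  # mainstem (015765195)
```

Storing the site numbers as strings matters: as integers they would lose the leading zero.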

## Reading our data

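Next we request data for our gage from NWIS using `nwis.get_record`. The `sites` argument takes the site number as a quoted string, `service='iv'` selects instantaneous values, and `start` and `end` give the date range of the request. The result is a Pandas DataFrame indexed by datetime.

If we save the result to disk for later use, parquet has some advantages over a CSV file: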

- the filesize is smaller
- it is a binary format that reads quickly, whereas CSV is text that needs to be parsed
- parquet preserves the index, including indices of datetime

In [3]:
df = nwis.get_record(sites=gage, service='iv', start='2017-12-31', end='2018-01-01')
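
The `start` and `end` arguments are date strings, so they can also be computed rather than typed. The sketch below is one way to do that with the standard `datetime` module; `date_window` and `fetch_iv` are hypothetical helper names, and `fetch_iv` simply wraps the same `nwis.get_record` call used above (it needs network access, so it is defined but not called here).

```python
import datetime


def date_window(end_date, days):
    """Return (start, end) as 'YYYY-MM-DD' strings for a window ending on end_date."""
    start_date = end_date - datetime.timedelta(days=days)
    return start_date.isoformat(), end_date.isoformat()


def fetch_iv(site, end_date, days):
    """Request instantaneous values for one site; requires network access."""
    # Imported here so that date_window can be used without the package installed.
    import dataretrieval.nwis as nwis

    start, end = date_window(end_date, days)
    return nwis.get_record(sites=site, service='iv', start=start, end=end)


start, end = date_window(datetime.date(2018, 1, 1), days=1)
print(start, end)  # 2017-12-31 2018-01-01
```

Calling `fetch_iv('015765195', datetime.date(2018, 1, 1), days=1)` would issue the same request as the cell above.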

In [4]:
df.head()

| datetime | 00010 | 00010_cd | site_no | 00060 | 00060_cd | 00065 | 00065_cd | 00095 | 00095_cd | 63680 | 63680_cd |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-12-31 00:00:00-05:00 | 3.1 | A | 15765195 | 1.05 | A | 3.09 | A | 853.0 | A | 12.5 | A |
| 2017-12-31 00:15:00-05:00 | 3.1 | A | 15765195 | 1.05 | A | 3.09 | A | 848.0 | A | 12.6 | A |
| 2017-12-31 00:30:00-05:00 | 3.0 | A | 15765195 | 1.05 | A | 3.09 | A | 848.0 | A | 12.5 | A |
| 2017-12-31 00:45:00-05:00 | 3.0 | A | 15765195 | 1.05 | A | 3.09 | A | 851.0 | A | 12.6 | A |
| 2017-12-31 01:00:00-05:00 | 3.0 | A | 15765195 | 1.05 | A | 3.09 | A | 854.0 | A | 10.9 | A |


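Looking at `df`, we see one column per NWIS parameter code, each paired with a `_cd` column that holds the qualification code for the reading. The parameter codes included here stand for:
- 00010 : Water temperature in degrees Celsius
- 00060 : Discharge in cubic feet per second
- 00065 : Gage height in feet
- 00095 : Specific conductance in microsiemens per centimeter at 25 degrees Celsius
- 63680 : Turbidity in formazin nephelometric units (FNU)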
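
Because the codes are hard to remember, it can help to rename the columns. A minimal sketch follows, using the first three rows of the output above typed in by hand rather than a live request; the names in `param_names` are hypothetical choices, not NWIS conventions.

```python
import pandas as pd

# First three rows of the output above, typed in by hand (not a live request).
sample = pd.DataFrame(
    {'00010': [3.1, 3.1, 3.0],
     '00060': [1.05, 1.05, 1.05],
     '63680': [12.5, 12.6, 12.5]},
    index=pd.to_datetime(['2017-12-31 00:00', '2017-12-31 00:15', '2017-12-31 00:30']),
)

# Hypothetical readable names; choose whatever suits your analysis.
param_names = {'00010': 'temperature_c',
               '00060': 'discharge',
               '63680': 'turbidity'}

# rename returns a new frame and leaves the original untouched.
readable = sample.rename(columns=param_names)
print(readable.columns.tolist())  # ['temperature_c', 'discharge', 'turbidity']
```

The same `rename` call would work on the full `df`, since columns missing from the mapping are left unchanged.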

We can call `describe` on the DataFrame to obtain summary statistics for the numeric columns.

In [5]:
df.describe()

| | 00010 | 00060 | 00065 | 00095 | 63680 |
|---|---|---|---|---|---|
| count | 191.0 | 192.0 | 192.0 | 190.0 | 174.0 |
| mean | 2.942932 | 1.007344 | 3.083698 | 873.805263 | 15.704023 |
| std | 0.955835 | 0.040153 | 0.006171 | 40.989339 | 8.862204 |
| min | 1.5 | 0.93 | 3.07 | 831.0 | 3.1 |
| 25% | 2.1 | 0.98 | 3.08 | 848.0 | 9.125 |
| 50% | 2.8 | 0.98 | 3.08 | 859.0 | 14.65 |
| 75% | 3.6 | 1.05 | 3.09 | 882.5 | 20.475 |
| max | 5.3 | 1.05 | 3.09 | 981.0 | 49.5 |
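
The `count` row differs between columns (192 for discharge but only 174 for turbidity) because `describe` counts only non-missing values. A minimal sketch of how to check this, using a small synthetic frame whose values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic values for illustration only; not taken from NWIS.
demo = pd.DataFrame({'00010': [3.1, np.nan, 3.0, 3.0],
                     '63680': [12.5, 12.6, np.nan, np.nan]})

counts = demo.describe().loc['count']  # non-missing values per column
missing = demo.isna().sum()            # missing values per column

print(counts)
print(missing)
```

For every column, `counts` plus `missing` equals the number of rows in the frame.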
