# Access NWIS with the USGS dataretrieval package

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mrahnis/nb-streamgage/blob/main/Streamgage-01--Access-NWIS-with-dataretrieval.ipynb)

## The USGS dataretrieval package

The `dataretrieval` package retrieves data through the USGS National Water Information System (NWIS) web services. It can return longer time series than the NWIS web pages offer. The dataretrieval git repository is here: https://github.com/USGS-python/dataretrieval
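
To give a feel for what the package wraps, the sketch below builds the kind of request URL a web-service client would send. It is only an illustration: the base URL, the query parameter names, and the helper `build_iv_url` are assumptions made for this example, not code taken from `dataretrieval`.

```python
from urllib.parse import urlencode

# Assumed base URL of the NWIS Instantaneous Values web service.
NWIS_IV_URL = "https://waterservices.usgs.gov/nwis/iv/"


def build_iv_url(site, parameter_codes, start, end):
    """Return a request URL for one site (hypothetical helper for illustration)."""
    query = {
        "format": "json",
        "sites": site,
        "parameterCd": ",".join(parameter_codes),
        "startDT": start,
        "endDT": end,
    }
    # safe="," keeps the comma-separated code list readable in the URL
    return NWIS_IV_URL + "?" + urlencode(query, safe=",")


url = build_iv_url("015765195", ["00060", "63680"], "2021-01-01", "2021-01-31")
print(url)
```

In practice `dataretrieval` handles the request and parses the response into a DataFrame, so there is no need to assemble URLs by hand.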


## Setup and imports

In [1]:
# detect whether the notebook is running in Google Colab
HOST_IS_COLAB = 'google.colab' in str(get_ipython())

if HOST_IS_COLAB:
    # the regular Colab runtime lacks dataretrieval and a few other packages, so install them
    !pip install dataretrieval --quiet --exists-action i
    !pip install pyproj --quiet --exists-action i
    !pip install xyzservices --quiet --exists-action i

In [2]:
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from dataretrieval import nwis, wqp

In [3]:
favorites = {'01576516':'east branch',
            '015765185':'west branch',
            '015765195':'mainstem',
            '01576521':'mainstem-historical',
            '01576754':'conestoga river at conestoga, pa'}
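
The site numbers above are stored as strings because several begin with a zero, which an integer cannot keep. The sketch below shows the difference. The 8 to 15 digit length range and the helper `is_site_no` are assumptions made for this example.

```python
import re

# Assumption: USGS site numbers are digit strings 8 to 15 characters long.
SITE_NO_PATTERN = re.compile(r"\d{8,15}")


def is_site_no(value):
    """True when value is a digit string of plausible site-number length."""
    return isinstance(value, str) and SITE_NO_PATTERN.fullmatch(value) is not None


site_no = "015765195"
as_int = int(site_no)  # converting to int silently drops the leading zero

print(is_site_no(site_no))  # the string form passes the check
print(str(as_int))          # the integer form no longer matches the original text
print(is_site_no(as_int))   # integers are rejected outright
```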

## Get the USGS sites in Pennsylvania

In [4]:
# parameter codes for discharge and turbidity
parameterCd = ["00060", "63680"]

# get_info() accepts these location arguments: sites, stateCd (two-letter postal code),
# huc (HUC-8 code), bBox (west, south, east, north in decimal degrees),
# and, more recently, countyCd (FIPS code)
# modifiedSince='YYYY-MM-DD' should give sites active since date
# countyCd='42071' should limit the query to Lancaster County, PA; stateCd='PA' below returns sites statewide
sites, md = nwis.get_info(
    stateCd='PA',
    parameterCd=parameterCd,
    siteType='ST',
    startDt="2011-10-01",
    endDt=datetime.date.today().isoformat()
)
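
The `countyCd` value mentioned in the comments is a five-digit FIPS code: the two-digit state code (42 for Pennsylvania) followed by the three-digit county code (071 for Lancaster County). A minimal sketch of assembling it follows. The helper `county_code` is hypothetical, and the final query dictionary is only a guess at how the arguments above could be swapped.

```python
def county_code(state_fips, county_fips):
    """Join a state and a county FIPS code into a zero-padded 5-digit string."""
    return f"{int(state_fips):02d}{int(county_fips):03d}"


lancaster = county_code(42, 71)
print(lancaster)

# Hypothetical county-level query: countyCd replaces stateCd, other arguments unchanged.
query = {
    "countyCd": lancaster,
    "parameterCd": ["00060", "63680"],
    "siteType": "ST",
}
print(query)
```

If `get_info()` accepts `countyCd` as the comments suggest, the dictionary could be passed as `nwis.get_info(**query)`.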

In [5]:
sites

Unnamed: 0,agency_cd,site_no,station_nm,site_tp_cd,lat_va,long_va,dec_lat_va,dec_long_va,coord_meth_cd,coord_acy_cd,...,local_time_fg,reliability_cd,gw_file_cd,nat_aqfr_cd,aqfr_cd,aqfr_type_cd,well_depth_va,hole_depth_va,depth_src_cd,project_no
0,USGS,01426690,"Faulkner Brook near Balls Eddy, PA",ST,415900.60,752035.70,41.983500,-75.343250,G,S,...,Y,,,,,,,,,
1,USGS,01426700,BALLS CREEK NEAR WINTERDALE PA,ST,415805.00,752011.00,41.968141,-75.336007,M,S,...,N,,Y,,,,,,,GAZETTEER
2,USGS,01427110,"Shehawken Creek near Hancock, NY",ST,415624.30,751721.70,41.940083,-75.289361,G,S,...,Y,,,,,,,,,
3,USGS,01427120,"Stockport Creek at Stockport, PA",ST,415341.60,751637.10,41.894889,-75.276972,G,S,...,Y,,,,,,,,,
4,USGS,01427190,"Factory Creek at Equinunk, PA",ST,415120.10,751341.20,41.855583,-75.228111,G,S,...,Y,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
456,USGS,400903076463301,"Unnamed Trib to Fishing Creek at Newberry, PA",ST,400902.66,764633.42,40.150739,-76.775950,N,1,...,Y,,,,,,,,,
457,USGS,400903076491601,"Fishing Cr at Bamberger Rd nr Yocumtown, PA",ST,400902.90,764916.30,40.150806,-76.821194,N,1,...,Y,,,,,,,,,
458,USGS,400942076470701,Fishing Creek abv Big Spr Run nr Yocumtown PA,ST,400942.35,764706.68,40.161764,-76.785189,N,1,...,Y,,,,,,,,,
459,USGS,400948076471001,"Big Spring Run near Yocumtown, PA",ST,400947.73,764709.80,40.163258,-76.786056,N,1,...,Y,,,,,,,,,


## Map the Sites

### Bokeh

In [6]:
from pyproj import Transformer

import xyzservices.providers as xyz

import bokeh
from bokeh.models import ColumnDataSource, OpenURL, TapTool
from bokeh.plotting import figure, show
from bokeh.io import output_notebook


def do_transform(lon, lat, transformer):
    return transformer.transform(lon, lat)

output_notebook()

WGS_TO_WEBMERCATOR = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)

x, y = do_transform(sites['dec_long_va'], sites['dec_lat_va'], WGS_TO_WEBMERCATOR)
sites['northing'] = y.tolist()
sites['easting'] = x.tolist()


# range bounds supplied in web mercator coordinates
collar = 5000

p = figure(
    x_range=(x.min()-collar, x.max()+collar),
    y_range=(y.min()-collar, y.max()+collar),
    x_axis_type="mercator",
    y_axis_type="mercator",
    tooltips = [
        ("name", "@station_nm"),
        ("number", "@site_no"),
        ("(Long, Lat)", "(@dec_long_va, @dec_lat_va)")
    ]
)

source = ColumnDataSource(sites)

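# compare the major version as an integer; reading only the first character would break at version 10
bokeh_major = int(bokeh.__version__.split('.')[0])
if bokeh_major < 3:
    p.add_tile('OpenStreetMap Mapnik')
else:
    p.add_tile(xyz.OpenStreetMap.Mapnik)
print("Using Bokeh version {}".format(bokeh_major))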

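# scatter() draws circle markers by default; circle() with size is deprecated in recent Bokeh 3 releases
p.scatter(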
    x='easting', y='northing',
    size=10,
    fill_color='blue', fill_alpha=0.6,
    line_color=None,
    source=source
)

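# the default figure tools do not include a tap tool, so add one explicitly;
# clicking a marker opens the site's USGS page (assumed URL pattern, @site_no is filled from the data source)
url = "https://waterdata.usgs.gov/monitoring-location/@site_no/"
p.add_tools(TapTool(callback=OpenURL(url=url)))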

show(p)

Using Bokeh version 3
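
The projection step above can also be written out by hand, which makes it clearer what the `easting` and `northing` columns contain. The sketch below uses the spherical Web Mercator formulas that EPSG:3857 is based on. The function name is made up for this example, and `pyproj` remains the better choice for real work. The input is the longitude and latitude listed for gage 015765195.

```python
import math

# WGS 84 semi-major axis in metres, used by Web Mercator as the sphere radius.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator metres (illustrative helper)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Big Spring Run near Mylin Corners, PA
easting, northing = lonlat_to_web_mercator(-76.264039, 39.995936)
print(round(easting), round(northing))
```

The result should agree closely with the `easting` and `northing` values that `pyproj` produced for the same site.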


## Get site information and statistics

In [7]:
gage = '015765195'
gage_info = sites[sites['site_no']==gage]

In [8]:
gage_info

Unnamed: 0,agency_cd,site_no,station_nm,site_tp_cd,lat_va,long_va,dec_lat_va,dec_long_va,coord_meth_cd,coord_acy_cd,...,gw_file_cd,nat_aqfr_cd,aqfr_cd,aqfr_type_cd,well_depth_va,hole_depth_va,depth_src_cd,project_no,northing,easting
299,USGS,15765195,"Big Spring Run near Mylin Corners, PA",ST,395945.37,761550.54,39.995936,-76.264039,N,S,...,,,,,,,,2476DFS,4865352.0,-8489674.0


In [9]:
gage_stats, _ = nwis.get_stats(sites=gage)
gage_stats

Unnamed: 0,agency_cd,site_no,parameter_cd,ts_id,loc_web_ds,month_nu,day_nu,begin_yr,end_yr,count_nu,...,mean_va,p05_va,p10_va,p20_va,p25_va,p50_va,p75_va,p80_va,p90_va,p95_va
0,USGS,015765195,10,170026,,1,1,2013,2022,10,...,7.4,,3.2,5.8,6.2,7.4,8.8,9.3,11.0,
1,USGS,015765195,10,170026,,1,2,2013,2022,10,...,7.1,,3.4,5.9,6.0,6.9,8.3,9.0,10.7,
2,USGS,015765195,10,170026,,1,3,2013,2022,10,...,6.7,,3.1,4.1,5.1,7.2,8.2,8.9,9.3,
3,USGS,015765195,10,170026,,1,4,2013,2022,10,...,6.7,,3.4,4.3,5.5,6.8,8.3,8.6,9.8,
4,USGS,015765195,10,170026,,1,5,2013,2022,10,...,6.2,,1.9,4.6,4.6,6.1,8.0,8.5,9.3,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1459,USGS,015765195,63680,214327,,12,27,2017,2022,6,...,6.4,,,1.1,1.4,4.2,10.0,15.0,,
1460,USGS,015765195,63680,214327,,12,28,2017,2022,6,...,12.0,,,1.3,1.4,4.2,28.0,33.0,,
1461,USGS,015765195,63680,214327,,12,29,2017,2022,6,...,4.3,,,1.5,2.0,4.2,6.4,7.5,,
1462,USGS,015765195,63680,214327,,12,30,2017,2022,5,...,5.6,,,1.5,1.7,4.5,10.0,12.0,,


## Get a gage record

In [10]:
start = '2021-01-01'
end = datetime.datetime.today().date()
service = 'iv' # daily value dv, or instantaneous value iv
df = nwis.get_record(sites=gage, service=service, start=start, end=end)

In [11]:
df.head()

datetime,00010,00010_cd,site_no,00060,00060_cd,00065,00065_cd,00095,00095_cd,63680,63680_cd
2021-01-01T00:00:00.000-05:00,7.6,A,15765195,2.92,A,3.6,A,661.0,A,2.0,A
2021-01-01T00:15:00.000-05:00,7.5,A,15765195,2.82,A,3.59,A,661.0,A,1.9,A
2021-01-01T00:30:00.000-05:00,7.5,A,15765195,2.82,A,3.59,A,661.0,A,2.1,A
2021-01-01T00:45:00.000-05:00,7.4,A,15765195,2.82,A,3.59,A,661.0,A,1.9,A
2021-01-01T01:00:00.000-05:00,7.4,A,15765195,2.82,A,3.59,A,661.0,A,1.9,A


Besides discharge and turbidity, `df` holds several other parameters, because no `parameterCd` filter was passed to `get_record()`. The NWIS parameter codes in this record stand for:
- 00010 : Temperature, water, degrees Celsius
- 00060 : Discharge, cubic feet per second
- 00065 : Gage height, feet
- 00095 : Specific conductance, water, unfiltered, microsiemens per centimeter at 25 degrees Celsius
- 63680 : Turbidity, water, unfiltered, monochrome near infra-red LED light, 780-900 nm, detection angle 90 ±2.5 degrees, formazin nephelometric units (FNU)
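
Numeric codes are hard to read in plots and tables, so one option is to map them to short labels. The sketch below is one way to do that. The label names and the `readable` helper are invented for this example and are not part of `dataretrieval`.

```python
# Short labels chosen for this example; the codes come from the list above.
PARAMETER_LABELS = {
    "00010": "water_temp_degC",
    "00060": "discharge_cfs",
    "00065": "gage_height_ft",
    "00095": "specific_conductance_uS_cm",
    "63680": "turbidity_FNU",
}


def readable(column):
    """Map '00060' to 'discharge_cfs' and '00060_cd' to 'discharge_cfs_cd'."""
    code, _, suffix = column.partition("_")
    label = PARAMETER_LABELS.get(code)
    if label is None:
        return column  # not a parameter column, for example 'site_no'
    return f"{label}_{suffix}" if suffix else label


columns = ["00010", "00010_cd", "site_no", "00060", "00060_cd"]
renamed = {name: readable(name) for name in columns}
print(renamed)
```

Since `DataFrame.rename()` accepts a function for `columns`, the same helper could be applied with `df.rename(columns=readable)`.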

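Columns ending in `_cd` hold the qualifier for the matching value (`A` marks approved data). Calling `describe()` returns summary statistics for the numeric columns.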

In [12]:
df.describe()

Unnamed: 0,00010,00060,00065,00095,63680
count,71113.0,65818.0,71155.0,71009.0,69565.0
mean,12.344642,2.817119,3.71735,743.863412,4.335374
std,4.124078,6.478698,0.147219,127.897877,10.76292
min,0.6,0.85,3.35,86.0,0.3
25%,8.8,1.67,3.66,720.0,1.3
50%,12.6,2.08,3.73,749.0,2.0
75%,15.6,2.46,3.77,777.0,3.7
max,25.6,260.0,7.47,3870.0,371.0


## Save as parquet

Saving a DataFrame in Parquet format has some advantages over saving to CSV. Parquet files tend to be smaller on disk and faster to read. Parquet also preserves data types, so there is no need to specify dtypes or parse datetime strings when re-reading the file. Note that `to_parquet()` needs a Parquet engine such as `pyarrow` or `fastparquet` to be installed.

In [13]:
df.to_parquet('nwis_{}_{}_{}.parquet'.format(gage, start, end), index=True)
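
To illustrate the data-type point, the sketch below round-trips two rows through CSV using only the standard library. The rows are a hand-copied subset of the record shown earlier. Every value comes back as a string, so timestamps and numbers have to be parsed again, which is the step Parquet avoids.

```python
import csv
import io
from datetime import datetime

# Two rows modelled on the record above: a timestamp, a site number, and a discharge value.
rows = [
    {"datetime": datetime(2021, 1, 1, 0, 0), "site_no": "015765195", "00060": 2.92},
    {"datetime": datetime(2021, 1, 1, 0, 15), "site_no": "015765195", "00060": 2.82},
]

# Write the rows to an in-memory CSV file, then read them back.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["datetime", "site_no", "00060"])
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
restored = list(csv.DictReader(buffer))

# Every field is now a plain string.
first = restored[0]
print({name: type(value).__name__ for name, value in first.items()})

# The original types must be rebuilt by hand.
parsed_time = datetime.fromisoformat(first["datetime"])
parsed_flow = float(first["00060"])
print(parsed_time, parsed_flow)
```

With Parquet, `pd.read_parquet()` should return the datetime index and float columns with their original types, with no parsing arguments needed.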