# Access NWIS with the USGS dataretrieval package

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mrahnis/nb-streamgage/blob/main/Streamgage-01--Access-NWIS-with-dataretrieval.ipynb)

## The USGS dataretrieval package

This package allows users to retrieve data using the USGS NWIS API. It is possible to get longer timeseries than is possible from the NWIS webpage. The dataretrieval git repository is here: https://github.com/USGS-python/dataretrieval


## Setup and imports

In [1]:
# if using the regular Colab runtime install dataretrieval and ipyleaflet
!pip install dataretrieval --quiet --exists-action i
!pip install ipyleaflet --quiet --exists-action i

In [2]:
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from dataretrieval import nwis, wqp
from ipywidgets import HTML
from ipyleaflet import AwesomeIcon, Map, Marker, Popup, basemaps, basemap_to_tiles

In [20]:
favorites = {'01576516':'east branch',
            '015765185':'west branch',
            '015765195':'mainstem',
            '01576521':'mainstem-historical',
            '01576754':'conenstoga river at conestoga, pa'}

## Get the USGS sites in Lancaster County

In [4]:
COUNTY_FIPS = 'US:42:071' # 'US:24:031' #
sites, _ = wqp.what_sites(countycode=COUNTY_FIPS, siteType='Stream')
sites

Unnamed: 0,OrganizationIdentifier,OrganizationFormalName,MonitoringLocationIdentifier,MonitoringLocationName,MonitoringLocationTypeName,MonitoringLocationDescriptionText,HUCEightDigitCode,DrainageAreaMeasure/MeasureValue,DrainageAreaMeasure/MeasureUnitCode,ContributingDrainageAreaMeasure/MeasureValue,...,AquiferName,LocalAqfrName,FormationTypeText,AquiferTypeName,ConstructionDateText,WellDepthMeasure/MeasureValue,WellDepthMeasure/MeasureUnitCode,WellHoleDepthMeasure/MeasureValue,WellHoleDepthMeasure/MeasureUnitCode,ProviderName
0,USGS-PA,USGS Pennsylvania Water Science Center,USGS-01573700,"Conewago Creek at Bellaire, PA",Stream,,2050305,20.80,sq mi,,...,,,,,,,,,,NWIS
1,USGS-PA,USGS Pennsylvania Water Science Center,USGS-01574050,"Snitz Creek near Falmouth, PA",Stream,,2050306,0.23,sq mi,,...,,,,,,,,,,NWIS
2,USGS-PA,USGS Pennsylvania Water Science Center,USGS-01574055,"Snitz Creek near Bainbridge, PA",Stream,,2050306,2.02,sq mi,,...,,,,,,,,,,NWIS
3,USGS-PA,USGS Pennsylvania Water Science Center,USGS-01574200,"Conoy Creek at Elizabethtown, PA",Stream,,2050306,3.02,sq mi,,...,,,,,,,,,,NWIS
4,USGS-PA,USGS Pennsylvania Water Science Center,USGS-01574300,"Conoy Creek Tributary at Elizabethtown, PA",Stream,,2050306,1.34,sq mi,,...,,,,,,,,,,NWIS
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
712,MDE_TMDL,TMDL Technical Development Program,MDE_TMDL-SUS0160,CP-4,River/Stream,,2050306,,,,...,,,,,,,,,,STORET
713,NARS,EPA National Aquatic Resource Survey Data,NARS-OWW04440-0242,Meetinghouse Creek,River/Stream,SITE TYPE is EASTPROB/Strahler Stream Order is...,2050306,,,,...,,,,,,,,,,STORET
714,NARSTEST,EPA National Aquatic Resources Survey (NARS),NARSTEST-FW08PA018,Tributary to West Branch Ontario Creek,River/Stream,"FW_ECO3=""EHIGH"";URBAN=""NonUrban"";STRAHLERORDER...",2050306,,,,...,,,,,,,,,,STORET
715,NARS_WQX,EPA National Aquatic Resources Survey (NARS),NARS_WQX-FW08PA018,Tributary to West Branch Ontario Creek,River/Stream,NonUrban,2050306,,,,...,,,,,,,,,,STORET


In [5]:
# make a list of USGS site_no and then get site info for these
site_ids = sites[sites['ProviderName']=='NWIS']['MonitoringLocationIdentifier'].str[5:].to_list()
site_info, _ = nwis.get_info(sites=site_ids)
site_info.head()

Unnamed: 0,agency_cd,site_no,station_nm,site_tp_cd,lat_va,long_va,dec_lat_va,dec_long_va,coord_meth_cd,coord_acy_cd,...,local_time_fg,reliability_cd,gw_file_cd,nat_aqfr_cd,aqfr_cd,aqfr_type_cd,well_depth_va,hole_depth_va,depth_src_cd,project_no
0,USGS,1573700,"Conewago Creek at Bellaire, PA",ST,401139.0,763437.0,40.19426,-76.576635,M,U,...,Y,,NNNNNNNN,,,,,,,
1,USGS,1574050,"Snitz Creek near Falmouth, PA",ST,400802.6,763917.0,40.134056,-76.654722,G,S,...,Y,,,,,,,,,2476CDH00
2,USGS,1574055,"Snitz Creek near Bainbridge, PA",ST,400728.4,763952.4,40.124556,-76.664556,G,S,...,Y,,,,,,,,,2476CDH00
3,USGS,1574200,"Conoy Creek at Elizabethtown, PA",ST,400909.0,763625.0,40.152594,-76.606634,M,U,...,Y,,NNNNNNNN,,,,,,,
4,USGS,1574300,"Conoy Creek Tributary at Elizabethtown, PA",ST,400920.0,763655.0,40.15565,-76.614968,M,U,...,Y,,NNNNNNNN,,,,,,,


## Map the sites

In [6]:
# map the stations in Lancaster County, PA
center = (
    (site_info['dec_lat_va'].min() + site_info['dec_long_va'].max())/2,
    (site_info['dec_lat_va'].min() + site_info['dec_long_va'].max())/2,
)

m = Map(
    basemap=basemap_to_tiles(basemaps.OpenStreetMap.Mapnik),
    center=center,
    zoom=9
)

default_icon = AwesomeIcon(
    name='map-marker',
    marker_color='blue',
    icon_color='black',
    spin=False
)

favorite_icon = AwesomeIcon(
    name='map-marker',
    marker_color='red',
    icon_color='black',
    spin=False
)

for ix, site in site_info.iterrows():
    
    if (site['site_no'] in favorites):
        icon = favorite_icon
        z_index_offset = 100
    else:
        icon = default_icon
        z_index_offset = 0
        
    marker = Marker(
        icon=icon,
        location=(site['dec_lat_va'], site['dec_long_va']),
        draggable=False,
        title=site['station_nm'],
        alt=site['site_no'],
        z_index_offset=z_index_offset
    )
    
    message = HTML()
    message.value = site['station_nm'] + '<br/> USGS: ' + site['site_no']

    popup = Popup(
        location=(site['dec_lat_va'], site['dec_long_va']),
        child=message,
        close_button=False,
        auto_close=False,
        close_on_escape_key=False
    )
    marker.popup = popup

    m.add_layer(marker)

m

Map(center=[-18.0792215, -18.0792215], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_tit…

## Get site information and statistics

In [11]:
gage = '015765195'

gage_info = site_info[site_info['site_no']==gage]

In [12]:
gage_stats, _ = nwis.get_stats(sites=gage)
gage_stats

Unnamed: 0,agency_cd,site_no,parameter_cd,ts_id,loc_web_ds,month_nu,day_nu,begin_yr,end_yr,count_nu,...,mean_va,p05_va,p10_va,p20_va,p25_va,p50_va,p75_va,p80_va,p90_va,p95_va
0,USGS,015765195,10,170026,,1,1,2013,2022,10,...,7.4,,3.2,5.8,6.2,7.4,8.8,9.3,11.0,
1,USGS,015765195,10,170026,,1,2,2013,2022,10,...,7.1,,3.4,5.9,6.0,6.9,8.3,9.0,10.7,
2,USGS,015765195,10,170026,,1,3,2013,2022,10,...,6.7,,3.1,4.1,5.1,7.2,8.2,8.9,9.3,
3,USGS,015765195,10,170026,,1,4,2013,2022,10,...,6.7,,3.4,4.3,5.5,6.8,8.3,8.6,9.8,
4,USGS,015765195,10,170026,,1,5,2013,2022,10,...,6.2,,1.9,4.6,4.6,6.1,8.0,8.5,9.3,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1459,USGS,015765195,63680,214327,,12,27,2017,2022,6,...,6.4,,,1.1,1.4,4.2,10.0,15.0,,
1460,USGS,015765195,63680,214327,,12,28,2017,2022,6,...,12.0,,,1.3,1.4,4.2,28.0,33.0,,
1461,USGS,015765195,63680,214327,,12,29,2017,2022,6,...,4.3,,,1.5,2.0,4.2,6.4,7.5,,
1462,USGS,015765195,63680,214327,,12,30,2017,2022,5,...,5.6,,,1.5,1.7,4.5,10.0,12.0,,


## Reading our data

Next we will read two parquet files using Pandas. The `read_parquet` function takes a quoted string representing the filesystem path to the file we want to read.

We use parquet here because it has some advantages over a CSV file:

- the filesize is smaller
- it is a binary format that reads quickly, whereas CSV is text that needs to be parsed
- parquet preserves the index, including indices of datetime

In [19]:
start = '1970-01-01'
end = datetime.datetime.today().date()
service = 'iv' # daily value dv, or instantaneous value iv
df = nwis.get_record(sites=gage, service=service, start=start, end=end)

JSONDecodeError: [Errno Expecting value] <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>503 Service Unavailable</title>
</head><body>
<h1>Service Unavailable</h1>
<p>The server is temporarily unable to service your
request due to maintenance downtime or capacity
problems. Please try again later.</p>
</body></html>
: 0

In [14]:
df.head()

Unnamed: 0_level_0,00010,00010_cd,site_no,00060,00060_cd,00065,00065_cd,00095,00095_cd,63680,63680_cd
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2022-12-03 21:00:00-05:00,9.6,P,15765195,1.23,P,3.69,P,669.0,P,2.6,P


Looking at `df` we will see it has several other codes. The NWIS codes included here stand for:
- 00010 : Temperature in degrees celcius
- 00060 : Discharge
- 63680 : Turbidity

We can describe them to obtain some summary statistics. 

In [None]:
df.describe()

In [None]:
df.index

## Save as parquet

Saving a DataFrame in Parquet format has some advantages over saving to CSV. Parquet files tend to be smaller on disk and faster to read. Parquet will maintain your data types so you do not need to specify dtypes or parse datetime strings on re-reading the file.

In [None]:
df.to_parquet('nwis_{}_{}_{}.parquet'.format(gage, start, end), index=True)