
Request site data from NWIS for stations #70

Open
mroberge opened this issue Apr 17, 2020 · 2 comments
@mroberge (Owner)

Description

It would be nice to be able to request site information for USGS stream gauges.

It is possible to get:

  • Latitude and Longitude
  • Drainage area

from waterdata.usgs.gov. The StreamStats station data site pulls drainage area from this source.

What I Did

https://waterdata.usgs.gov/nwis/inventory?search_site_no=01541200&format=sitefile_output&sitefile_output_format=rdb

returns:

#
#
# US Geological Survey
# retrieved: 2020-04-17 14:38:55 EDT
# URL: https://nwis.waterdata.usgs.gov/nwis/inventory
#
# The Site File stores location and general information about groundwater,
# surface water, and meteorological sites
# for sites in USA.
#
# The following selected fields are included in this output:
#
#  agency_cd       -- Agency
#  site_no         -- Site identification number
#  station_nm      -- Site name
#  state_cd        -- State code
#  county_cd       -- County code
#  huc_cd          -- Hydrologic unit code
#  lat_va          -- DMS latitude
#  long_va         -- DMS longitude
#  coord_acy_cd    -- Latitude-longitude accuracy
#  coord_datum_cd  -- Latitude-longitude datum
#  alt_va          -- Altitude of Gage/land surface
#  alt_acy_va      -- Altitude accuracy
#  alt_datum_cd    -- Altitude datum
#  drain_area_va   -- Drainage area
#  contrib_drain_area_va -- Contributing drainage area
#
#
# query started 2020-04-17 14:38:55 EDT
#
# there are 1 sites matching the search criteria.
#
#
agency_cd	site_no	station_nm	state_cd	county_cd	huc_cd	lat_va	long_va	coord_acy_cd	coord_datum_cd	alt_va	alt_acy_va	alt_datum_cd	drain_area_va	contrib_drain_area_va
5s	15s	50s	2s	3s	16s	11s	12s	1s	10s	8s	3s	10s	8s	8s
USGS	01541200	WB Susquehanna River near Curwensville, PA	42	033	02050201	405741	0783110	S	NAD27	 1124.66	.01	NGVD29	367	
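As a sketch of how that RDB response could be parsed with only the standard library (the inline string below is a trimmed copy of the output above, and `read_rdb` is a hypothetical helper, not an existing function in any package):

```python
import csv
import io

# Trimmed copy of the RDB response above; the real response has more columns.
rdb = """\
# US Geological Survey
# retrieved: 2020-04-17 14:38:55 EDT
agency_cd\tsite_no\tstation_nm\tdrain_area_va
5s\t15s\t50s\t8s
USGS\t01541200\tWB Susquehanna River near Curwensville, PA\t367
"""

def read_rdb(text):
    """Parse an NWIS RDB table into a list of dicts, one per site."""
    # Drop the '#' comment header lines.
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    # The second non-comment line ('5s', '15s', ...) gives column widths, not data.
    del lines[1]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))

sites = read_rdb(rdb)
# sites[0]["site_no"] is "01541200"; sites[0]["drain_area_va"] is "367"
```

Everything comes back as strings, so numeric fields like `drain_area_va` would still need converting.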
@mroberge (Owner, Author)

It turns out that different sites have different data on their StreamStats pages ("gagepages"). Some sites have lots of data.

One original source is an old dataset called the "Basin Characteristics File", first created in 1986! I found mentions of it here: https://catalog.data.gov/dataset/watstore-stream-flow-basin-characteristics-file. That page has links to the file on WATSTORE. The file is in really old formats like .e00 and SDTS! These can be opened, but it doesn't seem worth the effort.

I checked several other packages to see if they have a way to access the gagepages:

  • USGS-python/dataretrieval: it has some StreamStats code, but it doesn't touch this file. It's more about getting GeoJSON files, I think.
  • cheginit/hydrodata: it has routines that calculate land cover for a watershed from the NLCD, but nothing for slopes. The package might still be useful, though: you provide a station, and it retrieves the watershed and a lot of data for it.
  • earthlab/streamstats: this finds the HUC8 watershed when you give it a point. It uses the StreamStats service, but not the pages we want.
  • Dewberry/usgs-tools: this uses the StreamStats service too. It might be able to delineate a new watershed and calculate all of the statistics, but it doesn't use the pre-calculated statistics already published for the stations.

Since there doesn't seem to be anything that reads these pages, we could try to do it ourselves.

To read directly from the "gagepage" at StreamStats, we could use Beautiful Soup, which parses HTML. I've never used it before, but here is a nice guide: https://www.pluralsight.com/guides/extracting-data-html-beautifulsoup. It seems relatively easy to do:

```
pip install requests beautifulsoup4 lxml
```

```python
import requests
from bs4 import BeautifulSoup

url = "https://streamstatsags.cr.usgs.gov/gagepages/html/03335000.htm"
html_content = requests.get(url).text

# "lxml" requires the lxml package; the built-in "html.parser" also works.
soup = BeautifulSoup(html_content, "lxml")
```

The soup object has attributes and methods that allow you to select things:

  • soup.title returns the <title> tag.
  • soup.title.text returns just the content, not the tag.

The tutorial page has an example that pulls data from a table. It shouldn't be too hard to figure out how to modify that to pull the data we want.
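To sketch what pulling table data might look like once the page is fetched (the HTML snippet below is a made-up stand-in, since I haven't inspected the real gagepage layout yet):

```python
from bs4 import BeautifulSoup

# Hypothetical two-column statistics table standing in for a real gagepage.
html = """
<table>
  <tr><td>Drainage Area</td><td>367</td></tr>
  <tr><td>Mean Basin Slope</td><td>12.3</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect each row's first cell as a label and second cell as a value.
stats = {}
for row in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) == 2:
        stats[cells[0]] = cells[1]
```

If the real pages use a different structure (nested tables, headers in `<th>` tags), the row loop would need adjusting, but the `find_all` / `get_text` pattern should carry over.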

@mroberge (Owner, Author)

The site file is now retrieved by hf.site_file(site) as of commit 76f92cf.
It would still be useful to get the 'gagepages', because the site file mostly just contains the size of the watershed.
