# Explore point data availability

To launch this notebook interactively in a Jupyter notebook-like browser interface, please click the "Launch Binder" button below. Note that Binder may take several minutes to launch.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hydroframe/subsettools-binder/HEAD?labpath=hf_hydrodata/point/example_explore_data.ipynb)

This notebook walks through how to explore what data is available using hf_hydrodata's `get_site_variables`. Please see the full [point module](https://hf-hydrodata.readthedocs.io) documentation for information on what data is available, our data collection process, and new features we are working on! Our [Metadata Description](https://hf-hydrodata.readthedocs.io/en/latest/point_data/metadata_definitions.html) page itemizes the fields that get returned from `get_point_metadata`.

In [1]:
# Import packages
from hf_hydrodata import register_api_pin, get_point_data, get_point_metadata, get_site_variables

In [None]:
# You need to register on https://hydrogen.princeton.edu/pin 
# and run the following with your registered information
# before you can use the hydrodata utilities
register_api_pin("your_email", "your_pin")

## Example 1: What streamflow sites are available in Colorado that were operational during Water Year 2019?

The `get_site_variables` function accepts any number of supported parameters. 

In this example, we want to know information about streamflow sites, so we will supply `variable='streamflow'`. We are specifically interested in the state of Colorado, so we will supply `state='CO'`. Finally, we only want sites that were operational during Water Year 2019. We will set `date_start='2018-10-01'` and `date_end='2019-09-30'`. 
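
The two dates above follow the water-year convention used in this example: Water Year 2019 runs from October 1, 2018 through September 30, 2019. As a small sketch, the bounds for any water year can be generated as below. The helper name `water_year_bounds` is ours for illustration and is not part of `hf_hydrodata`.

```python
from datetime import date


def water_year_bounds(water_year):
    """Return (date_start, date_end) ISO strings for a water year.

    Hypothetical helper for illustration; not part of hf_hydrodata.
    """
    start = date(water_year - 1, 10, 1)   # October 1 of the previous calendar year
    end = date(water_year, 9, 30)         # September 30 of the named year
    return start.isoformat(), end.isoformat()


date_start, date_end = water_year_bounds(2019)
print(date_start, date_end)  # 2018-10-01 2019-09-30
```

The returned strings can be passed as the `date_start` and `date_end` arguments used in the next cell.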

In [2]:
# Let's use the above input parameters.
df = get_site_variables(variable="streamflow", state="CO", date_start="2018-10-01", date_end="2019-09-30")

print(f'Number of records: {len(df)}') 
df.head(5)

Number of records: 691


Unnamed: 0,site_id,site_name,site_type,agency,state,variable_name,units,first_date_data_available,last_date_data_available,record_count,latitude,longitude,site_query_url,date_metadata_last_updated,tz_cd,doi
0,6614800,"MICHIGAN RIVER NEAR CAMERON PASS, CO",stream gauge,USGS,CO,Hourly average streamflow,m3/s,1986-10-01,2023-12-02,224235,40.496094,-105.865012,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,
1,6614800,"MICHIGAN RIVER NEAR CAMERON PASS, CO",stream gauge,USGS,CO,Daily average streamflow,m3/s,1973-10-01,2023-12-01,18322,40.496094,-105.865012,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,
2,6620000,"NORTH PLATTE RIVER NEAR NORTHGATE, CO",stream gauge,USGS,CO,Hourly average streamflow,m3/s,1990-10-01,2023-12-02,177843,40.936639,-106.339194,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,
3,6620000,"NORTH PLATTE RIVER NEAR NORTHGATE, CO",stream gauge,USGS,CO,Daily average streamflow,m3/s,1904-06-01,2023-12-01,39782,40.936639,-106.339194,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,
4,6659580,SAND CREEK AT COLORADO-WYOMING STATE LINE,stream gauge,USGS,CO,Hourly average streamflow,m3/s,1996-04-01,2020-09-02,104382,40.99365,-105.759703,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,


Above, we can see that 691 records were returned. Note that a site can collect data at multiple temporal resolutions (such as daily and hourly), and the data availability range for each may differ, since a site could have started collecting hourly and daily data at different times. Each row of the DataFrame therefore represents a unique combination of `site_id` and `variable_name`.

The fields `first_date_data_available` and `last_date_data_available` refer to the earliest and latest dates we have available for the site. The `record_count` field reports on the total number of records over that entire time span and does not relate to the specific `date_start` and `date_end` parameters supplied.
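
Because `record_count` covers the whole period of record, it can help to check how much of a site's availability window falls inside the requested dates. The sketch below does this with plain date arithmetic, using the first and last dates of the Sand Creek hourly record from the table above. The helper name `overlap_days` is hypothetical, and it only compares the two windows, so it says nothing about gaps within the record.

```python
from datetime import date


def overlap_days(first_available, last_available, date_start, date_end):
    """Count the days (inclusive) shared by the availability and requested windows."""
    start = max(date.fromisoformat(first_available), date.fromisoformat(date_start))
    end = min(date.fromisoformat(last_available), date.fromisoformat(date_end))
    # A negative span means the windows do not overlap at all
    return max((end - start).days + 1, 0)


# Sand Creek hourly record (1996-04-01 to 2020-09-02) against Water Year 2019
print(overlap_days("1996-04-01", "2020-09-02", "2018-10-01", "2019-09-30"))  # 365
```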

Let's narrow our search down to only sites that have daily streamflow data by adding `temporal_resolution='daily'`. Let's further refine things so that we only get sites with a latitude between 40 and 40.5 degrees.

In [3]:
df = get_site_variables(variable="streamflow", state="CO", date_start="2018-10-01", date_end="2019-09-30", temporal_resolution="daily",
                       latitude_range=(40, 40.5))
                       
print(f'Number of records: {len(df)}') 
df.tail(5)

Number of records: 47


Unnamed: 0,site_id,site_name,site_type,agency,state,variable_name,units,first_date_data_available,last_date_data_available,record_count,latitude,longitude,site_query_url,date_metadata_last_updated,tz_cd,doi
309,9032990,MEADOW CREEK BLW MEADOW CREEK RES NR TABERNASH...,stream gauge,USGS,CO,Daily average streamflow,m3/s,2018-06-01,2023-10-31,1073,40.052028,-105.754167,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,
311,9033010,"MEADOW CREEK DIVERSION NEAR TABERNASH, CO",stream gauge,USGS,CO,Daily average streamflow,m3/s,2018-06-29,2023-10-31,1045,40.050286,-105.779706,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,
312,9040500,"TROUBLESOME CREEK NEAR TROUBLESOME, CO.",stream gauge,USGS,CO,Daily average streamflow,m3/s,1904-10-01,2023-12-01,9318,40.058664,-106.305178,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,
314,401723105400000,ANDREWS CREEK-LOCH VALE-RMNP,stream gauge,USGS,CO,Daily average streamflow,m3/s,1991-09-30,2023-12-01,11582,40.29,-105.666667,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,
316,401727105400000,ANDREWS SPRING 1,stream gauge,USGS,CO,Daily average streamflow,m3/s,2019-04-10,2023-10-09,1341,40.290818,-105.667227,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,MST,


Now we have 47 sites. We can extract this site list and pass it to the `site_ids` parameter of the `get_point_data` and `get_point_metadata` functions to get data for WY2019 for only these sites.

Note that `get_point_data` and `get_point_metadata` require the parameters `dataset`, `variable`, `temporal_resolution` and `aggregation` to be supplied to be able to uniquely identify a single data series. For daily average streamflow, we want `dataset='usgs_nwis'`, `variable='streamflow'`, `temporal_resolution='daily'`, and `aggregation='mean'`.
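
Since the same four identifying parameters go to both functions, one option is to keep them in a single dictionary and unpack it into each call. This is only a sketch: `REQUIRED_KEYS` and `check_series_filters` are illustrative names of ours, and the `hf_hydrodata` calls are left commented out so the snippet runs without an API PIN.

```python
REQUIRED_KEYS = {"dataset", "variable", "temporal_resolution", "aggregation"}


def check_series_filters(filters):
    """Raise ValueError if a parameter needed to identify a data series is missing."""
    missing = sorted(REQUIRED_KEYS - set(filters))
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    return filters


daily_streamflow = check_series_filters({
    "dataset": "usgs_nwis",
    "variable": "streamflow",
    "temporal_resolution": "daily",
    "aggregation": "mean",
})

# With hf_hydrodata imported and registered, the same filters can feed both calls
# (your_site_ids stands in for whatever list of site IDs you have built):
# data_df = get_point_data(**daily_streamflow, site_ids=your_site_ids)
# metadata_df = get_point_metadata(**daily_streamflow, site_ids=your_site_ids)
```

Keeping the filters in one place makes it harder for the data and metadata requests to drift apart.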

In [4]:
# Get list of site IDs from the above exploratory DataFrame
co_streamflow_site_ids = list(df['site_id'])
assert len(co_streamflow_site_ids) == len(df)

In [5]:
# Request point observations data
data_df = get_point_data(dataset="usgs_nwis", variable="streamflow", temporal_resolution="daily", aggregation="mean",
                        date_start="2018-09-30", date_end="2019-10-01",
                        site_ids=co_streamflow_site_ids)

# View first five records
data_df.head()

Unnamed: 0,date,06614800,06720990,06721000,06724970,06727410,06727500,06730160,06730200,06730500,...,09304200,09304500,09304800,09306222,09306255,09306290,401723105400000,401727105400000,401733105392404,402114105350101
0,2018-09-30,0.011037,1.2452,4.6695,0.044148,,0.006792,0.0,1.00465,0.51506,...,4.3582,5.7166,7.1882,0.024904,0.00849,6.2543,0.019244,,0.054053,0.185931
1,2018-10-01,0.01132,1.12634,5.0657,0.043582,,,,0.98484,0.72731,...,4.811,5.8864,7.4146,0.025187,0.00849,6.5373,0.018961,,0.050657,0.17829
2,2018-10-02,0.01132,0.4528,5.1223,0.0566,,,,0.9339,0.66222,...,5.0091,6.0562,7.5561,0.029149,0.009622,6.9052,0.019244,,0.051789,0.203194
3,2018-10-03,0.011886,0.43016,5.1223,0.04811,,,,0.81787,0.54902,...,5.5468,6.6788,8.1787,0.035375,0.011037,7.6976,0.023772,,0.056317,0.232626
4,2018-10-04,0.013867,0.40752,5.0374,0.049808,,,,0.81787,0.52072,...,6.1128,7.6976,10.4993,0.059713,0.024904,10.6408,0.022357,,0.059147,0.252153


In [6]:
# Request site-level attributes for these sites
metadata_df = get_point_metadata(dataset="usgs_nwis", variable="streamflow", temporal_resolution="daily", aggregation="mean",
                                 date_start="2018-09-30", date_end="2019-10-01",
                                 site_ids=co_streamflow_site_ids)

# View first five records
metadata_df.head()

Unnamed: 0,site_id,site_name,site_type,agency,state,latitude,longitude,first_date_data_available,last_date_data_available,record_count,...,doi,huc8,conus1_x,conus1_y,conus2_x,conus2_y,gagesii_drainage_area,gagesii_class,gagesii_site_elevation,usgs_drainage_area
0,6614800,"MICHIGAN RIVER NEAR CAMERON PASS, CO",stream gauge,USGS,CO,40.496094,-105.865012,1973-10-01,2023-12-01,18322,...,,10180001,1054.0,818.0,1481.0,1764.0,4.0284,Ref,3188.0,1.54
1,6720990,"BIG DRY CREEK AT MOUTH NEAR FORT LUPTON, CO",stream gauge,USGS,CO,40.068833,-104.831986,1991-10-01,2023-12-01,11734,...,,10190003,,,1561.0,1705.0,272.4498,Non-ref,1494.0,107.0
2,6721000,"SOUTH PLATTE RIVER AT FORT LUPTON, CO.",stream gauge,USGS,CO,40.116094,-104.818583,1929-04-29,2023-12-01,17668,...,,10190003,,,1563.0,1715.0,13064.82,Non-ref,1485.0,5043.0
3,6724970,"LEFT HAND CREEK AT HOVER ROAD NEAR LONGMONT, CO",stream gauge,USGS,CO,40.134278,-105.130819,2014-03-05,2023-12-01,3557,...,,10190005,,,1540.0,1718.0,,,,71.6
4,6727410,FOURMILE CREEK AT LOGAN MILL ROAD NEAR CRISMAN...,stream gauge,USGS,CO,40.042028,-105.364917,2011-04-01,2023-09-30,1125,...,,10190005,,,,,,,,19.2


This workflow shows how to quickly identify sites of interest based on location and temporal filters. The `state`, `date_start`, and `date_end` parameters can also be passed directly to `get_point_data` and `get_point_metadata`. The `get_site_variables` function offers a quick look-up of variable-level metadata without having to specify the variable of interest in advance (see the additional example below).

## Example 2: What data is available within the Raritan watershed in New Jersey (HUC8='02030105')?

Please see the example notebook [Filter sites by USGS HUC boundary](https://hf-hydrodata.readthedocs.io/en/latest/point_data/examples/example_shapefile.html) for a more detailed walk-through on the following procedure. You may also supply an existing shapefile and CRS if you have one for your domain.

In [7]:
# Import additional packages
import requests
from zipfile import ZipFile
from io import BytesIO
import shapefile
from shapely.geometry import shape

In [8]:
# Download and subset HUC02 boundary file from the USGS

# Send request for data
url = 'https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/WBD/HU2/Shape/WBD_02_HU2_Shape.zip'
url_response = requests.get(url)    # note this might take a minute or so to run

# In this example, we will extract only the files with the HUC8 level watersheds
# This code saves these files to the local directory where this notebook is being run
myzipfile = ZipFile(BytesIO(url_response.content))
myzipfile.extractall(members=['Shape/WBDHU8.shp', 'Shape/WBDHU8.shx', 'Shape/WBDHU8.dbf', 'Shape/WBDHU8.prj'])

# Read in shapefile
huc02_shp = shapefile.Reader('Shape/WBDHU8.shp')

# Read in projection file
with open('Shape/WBDHU8.prj') as f:
    usgs_huc_crs = f.readlines()[0]
print(f"CRS: {usgs_huc_crs}")

# We want to use the Raritan watershed, HUC8='02030105'. This is at index 72
print(huc02_shp.shapeRecord(i=72, fields=['states', 'huc8', 'name']).record)

# Extract the shape and record information for this index
raritan_shape = huc02_shp.shapeRecord(i=72).shape
raritan_record = huc02_shp.shapeRecord(i=72).record

# Save as shapefile, to be passed in to hf_hydrodata functions below
with shapefile.Writer('Shape/raritan_watershed') as w:
    w.fields = huc02_shp.fields[1:]

    w.record(raritan_record)
    w.shape(raritan_shape)

CRS: GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]
Record #72: ['NJ', '02030105', 'Raritan']
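
The hard-coded index 72 works for the file as downloaded here, but it could shift if the source file changes. A less brittle option is to look the record up by its `huc8` attribute. The sketch below shows the lookup on plain dictionaries so that it runs on its own. `find_record_index` is a hypothetical helper, the sample records are stand-ins, and the commented line assumes pyshp's `Reader.iterRecords()` and `Record.as_dict()` are available in your installed version.

```python
def find_record_index(records, field, value):
    """Return the position of the first record whose `field` equals `value`."""
    for index, record in enumerate(records):
        if record[field] == value:
            return index
    raise LookupError(f"No record with {field}={value!r}")


# Stand-in attribute records; a real shapefile would supply these instead
sample_records = [
    {"huc8": "00000000", "name": "placeholder"},
    {"huc8": "02030105", "name": "Raritan"},
]
print(find_record_index(sample_records, "huc8", "02030105"))  # 1

# With the shapefile opened as in the cell above (assumed pyshp usage):
# idx = find_record_index((r.as_dict() for r in huc02_shp.iterRecords()), "huc8", "02030105")
```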


In [9]:
with open('Shape/WBDHU8.prj') as f:
    usgs_huc_crs = f.readlines()[0]
print(f"CRS: {usgs_huc_crs}")

CRS: GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]


We will use this shapefile to see all of the sites with available data within that watershed. Again, if you have your own domain shapefile and CRS you can supply that here instead with the `polygon` and `polygon_crs` parameters.

In [10]:
df = get_site_variables(polygon="Shape/raritan_watershed.shp", polygon_crs=usgs_huc_crs)

print(f"Number of records: {len(df)}")
df.head(5)

Number of records: 1013


Unnamed: 0,site_id,site_name,site_type,agency,state,variable_name,units,first_date_data_available,last_date_data_available,record_count,latitude,longitude,site_query_url,date_metadata_last_updated,tz_cd,doi
0,1396091,South Br Raritan River at Rt 46 at Budd Lake NJ,stream gauge,USGS,NJ,Hourly average streamflow,m3/s,2008-06-01,2014-03-30,50925,40.859444,-74.760833,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,EST,
1,1396190,South Branch Raritan River at Four Bridges NJ,stream gauge,USGS,NJ,Hourly average streamflow,m3/s,1999-01-20,2012-09-30,116127,40.806111,-74.740556,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,EST,
2,1396500,South Branch Raritan River near High Bridge NJ,stream gauge,USGS,NJ,Hourly average streamflow,m3/s,1981-10-01,2023-12-02,353711,40.677778,-74.879167,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,EST,
3,1396580,Spruce Run at Glen Gardner NJ,stream gauge,USGS,NJ,Hourly average streamflow,m3/s,1981-10-01,2005-08-31,153867,40.693333,-74.939722,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,EST,
4,1396582,Spruce Run at Main Street at Glen Gardner NJ,stream gauge,USGS,NJ,Hourly average streamflow,m3/s,2005-10-01,2023-12-02,151737,40.691389,-74.936944,https://waterservices.usgs.gov/nwis/site/?form...,2023-05-30,EST,


Let's see what types of sites are included here.

In [11]:
print(f"Site types: {df['site_type'].unique()}")

Site types: ['stream gauge' 'groundwater well']


Let's look at the groundwater wells to see how many data records each well has.

In [12]:
df[df['site_type'] == 'groundwater well']

Unnamed: 0,site_id,site_name,site_type,agency,state,variable_name,units,first_date_data_available,last_date_data_available,record_count,latitude,longitude,site_query_url,date_metadata_last_updated,tz_cd,doi
63,402109074301301,230291-- Forsgate 1 Obs,groundwater well,USGS,NJ,Hourly average water table depth,m,2007-10-01,2023-08-10,138590,40.352608,-74.503209,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
64,402109074301302,230292-- Forsgate 2 Obs,groundwater well,USGS,NJ,Hourly average water table depth,m,2007-10-01,2023-08-10,138221,40.352608,-74.502931,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
65,402138074435801,210365-- Carter Rd Obs,groundwater well,USGS,NJ,Hourly average water table depth,m,2007-10-01,2023-09-29,140087,40.360662,-74.732383,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
66,402143074185201,230104-- Morrell 1 Obs,groundwater well,USGS,NJ,Hourly average water table depth,m,2007-10-01,2023-12-02,140656,40.362053,-74.313203,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
67,402151074525301,190251-- Corsalo Rd 1 Obs,groundwater well,USGS,NJ,Hourly average water table depth,m,2007-10-01,2023-12-02,141074,40.364273,-74.880999,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
967,405325074363901,271651-- 1963,groundwater well,USGS,NJ,Water table depth,m,1963-07-17,1989-09-22,4,40.890377,-74.610437,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
968,405330074363801,271123-- Kenvil Newcrete 1 Obs,groundwater well,USGS,NJ,Water table depth,m,1989-02-27,1991-01-01,20,40.891766,-74.610160,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
969,405330074363802,271124-- Kenvil Newcrete 2 Obs,groundwater well,USGS,NJ,Water table depth,m,1989-03-21,1997-10-08,20,40.891766,-74.610160,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
970,405330074363803,271183-- Kenvil Newcrete 7 Obs,groundwater well,USGS,NJ,Water table depth,m,1989-10-05,1990-09-30,13,40.891766,-74.610160,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,


It looks like some of these have a lot of records, and some have only a few sparse measurements. Let's see which sites have regular, daily water table depth observations.
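
One rough way to put a number on "a lot" versus "sparse" is to divide `record_count` by the length of the availability window. The sketch below does this for two rows from the table above, using only the standard library. The helper name `records_per_year` is hypothetical, and the figure is an average, so it cannot tell evenly spaced records from clustered ones.

```python
from datetime import date


def records_per_year(first_available, last_available, record_count):
    """Average number of records per year over a site's availability window."""
    span_days = (date.fromisoformat(last_available) - date.fromisoformat(first_available)).days + 1
    return record_count / (span_days / 365.25)


# Hourly series: 138590 records between 2007-10-01 and 2023-08-10
hourly_rate = records_per_year("2007-10-01", "2023-08-10", 138590)

# Sparse series: 4 records between 1963-07-17 and 1989-09-22
sparse_rate = records_per_year("1963-07-17", "1989-09-22", 4)

print(f"{hourly_rate:.0f} vs {sparse_rate:.2f} records per year")
```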

In [13]:
df[(df['site_type'] == 'groundwater well') & (df['variable_name'] == 'Daily average water table depth')]

Unnamed: 0,site_id,site_name,site_type,agency,state,variable_name,units,first_date_data_available,last_date_data_available,record_count,latitude,longitude,site_query_url,date_metadata_last_updated,tz_cd,doi
78,401518074223001,250216-- PW 1,groundwater well,USGS,NJ,Daily average water table depth,m,1972-10-06,1975-07-31,697,40.255111,-74.374593,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
79,401819074351601,210395-- MW-2 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1993-09-18,1994-08-28,345,40.301775,-74.5921,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
80,402015074275701,230228-- Forsgate 3 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1972-11-01,1975-02-21,772,40.337608,-74.46543,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
81,402015074275702,230229-- Forsgate 4 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1972-11-01,2005-07-16,678,40.337608,-74.46543,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
82,402023074391901,210358-- Princeton 1-Brick Rd Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1989-06-28,1990-10-21,460,40.339829,-74.65488,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
83,402032074392501,210359-- Princeton 2-Chill Pl Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1989-11-18,1992-02-16,784,40.342329,-74.656547,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
84,402058074355901,230796-- Test 5 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1986-03-14,1992-02-20,2170,40.349552,-74.599323,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
85,402058074355902,230800-- Test 9 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1986-03-14,1992-02-20,2152,40.349552,-74.599323,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
86,402109074301301,230291-- Forsgate 1 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1973-02-07,2023-08-09,7278,40.352608,-74.503209,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
87,402109074301302,230292-- Forsgate 2 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1972-12-06,2023-08-09,7435,40.352608,-74.502931,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,


This looks good, but several of these sites only have data through the 1980s or 1990s. Let's say we only want sites that have at least some data in 2000 or later. We can specify `date_start='2000-01-01'` so that we capture only sites that were operational on or after that date. Let's formalize this query and also add `variable='water_table_depth'` and `temporal_resolution='daily'` to the `get_site_variables` call.

In [14]:
df = get_site_variables(polygon="Shape/raritan_watershed.shp", polygon_crs=usgs_huc_crs,
                        variable="water_table_depth", temporal_resolution="daily", date_start="2000-01-01")

print(f"Number of records: {len(df)}")
df.head(5)

Number of records: 20


Unnamed: 0,site_id,site_name,site_type,agency,state,variable_name,units,first_date_data_available,last_date_data_available,record_count,latitude,longitude,site_query_url,date_metadata_last_updated,tz_cd,doi
0,402015074275702,230229-- Forsgate 4 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1972-11-01,2005-07-16,678,40.337608,-74.46543,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
1,402109074301301,230291-- Forsgate 1 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1973-02-07,2023-08-09,7278,40.352608,-74.503209,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
2,402109074301302,230292-- Forsgate 2 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1972-12-06,2023-08-09,7435,40.352608,-74.502931,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
3,402138074435801,210365-- Carter Rd Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1987-02-25,2023-09-28,13134,40.360662,-74.732383,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,
4,402143074185201,230104-- Morrell 1 Obs,groundwater well,USGS,NJ,Daily average water table depth,m,1985-01-24,2023-12-02,13930,40.362053,-74.313203,https://waterservices.usgs.gov/nwis/site/?form...,2023-03-08,EST,


Great! It looks like 20 sites fulfill our query. Let's request the full daily water table depth series for those sites, starting from 2000-01-01.

Again, `get_point_data` and `get_point_metadata` require the parameters `dataset`, `variable`, `temporal_resolution` and `aggregation` to be supplied to be able to uniquely identify a single data series. For daily average water table depth, we want `dataset='usgs_nwis'`, `variable='water_table_depth'`, `temporal_resolution='daily'`, and `aggregation='mean'`.

In [15]:
# This DataFrame will contain data from 2000-01-01 onwards. If sites do not have data that early, they will have NaN values
# for dates until the site became operational.
data_df = get_point_data(dataset="usgs_nwis", variable="water_table_depth", temporal_resolution="daily", aggregation="mean",
                         date_start="2000-01-01", 
                         polygon="Shape/raritan_watershed.shp", polygon_crs=usgs_huc_crs)

# View the first five records
data_df.head()

Unnamed: 0,date,402015074275702,402109074301301,402109074301302,402138074435801,402143074185201,402151074525301,402208074145201,402208074505801,402512074414301,...,403119074290301,403135074274401,403135074274402,403135074274403,403200074420601,403455074514801,403517074452501,404014074585401,404712074454701,404934074400501
0,2000-01-01,,,,3.477768,0.77724,0.798576,37.956744,,,...,,3.761232,2.706624,2.368296,,2.923032,5.660136,,16.005048,3.691128
1,2000-01-02,,,,3.465576,0.77724,0.74676,37.911024,,,...,,3.758184,2.688336,2.343912,,2.904744,5.690616,,15.995904,3.691128
2,2000-01-03,,,,3.453384,0.780288,0.658368,37.874448,,,...,,3.761232,2.688336,2.334768,,2.904744,5.727192,,16.002,3.68808
3,2000-01-04,,,,3.407664,0.765048,0.509016,37.828728,,,...,,3.742944,2.654808,2.298192,,2.883408,5.739384,,15.989808,3.681984
4,2000-01-05,,,,3.380232,0.646176,0.521208,37.853112,,,...,,3.724656,2.667,2.289048,,2.889504,5.76072,,16.029432,3.678936


In [16]:
# Request site-level attributes for these sites
metadata_df = get_point_metadata(dataset="usgs_nwis", variable="water_table_depth", temporal_resolution="daily", aggregation="mean",
                                 date_start="2000-01-01", 
                                 polygon="Shape/raritan_watershed.shp", polygon_crs=usgs_huc_crs)

# View the first five records
metadata_df.head()

Unnamed: 0,site_id,site_name,site_type,agency,state,latitude,longitude,first_date_data_available,last_date_data_available,record_count,...,conus1_x,conus1_y,conus2_x,conus2_y,usgs_nat_aqfr_cd,usgs_aqfr_cd,usgs_aqfr_type_cd,usgs_well_depth,usgs_hole_depth,usgs_hole_depth_src_cd
0,402015074275702,230229-- Forsgate 4 Obs,groundwater well,USGS,NJ,40.337608,-74.46543,1972-11-01,2005-07-16,678,...,,,4035,1964,S100NATLCP,211FRNG,C,330.0,,
1,402109074301301,230291-- Forsgate 1 Obs,groundwater well,USGS,NJ,40.352608,-74.503209,1973-02-07,2023-08-09,7278,...,,,4032,1964,S100NATLCP,211FRNG,C,203.0,,
2,402109074301302,230292-- Forsgate 2 Obs,groundwater well,USGS,NJ,40.352608,-74.502931,1972-12-06,2023-08-09,7435,...,,,4032,1964,S100NATLCP,211ODBG,C,104.0,130.0,D
3,402138074435801,210365-- Carter Rd Obs,groundwater well,USGS,NJ,40.360662,-74.732383,1987-02-25,2023-09-28,13134,...,,,4013,1960,N300ERLMZC,227PSSC,U,99.0,,Z
4,402143074185201,230104-- Morrell 1 Obs,groundwater well,USGS,NJ,40.362053,-74.313203,1985-01-24,2023-12-02,13930,...,,,4046,1970,S100NATLCP,211EGLS,U,11.0,11.0,R


Now let's visualize these points on a map to see what we ended up with. We'll use the [bokeh](https://bokeh.org/) package to do this. If you do not have `bokeh` installed in your environment, you may need to run `pip install bokeh` first.

In [17]:
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
import numpy as np

output_notebook()

# First let's define a function to convert lon/lat to mercator coordinates for Bokeh mapping procedure
def wgs84_to_web_mercator(lon, lat):

    k = 6378137
    x = lon * (k * np.pi/180.0)
    y = np.log(np.tan((90 + lat) * np.pi/360.0)) * k

    return x, y

# Calculate and add mercator coordinates to metadata DataFrame
metadata_df['x'] = metadata_df.apply(lambda x: wgs84_to_web_mercator(x['longitude'], x['latitude'])[0], axis=1)
metadata_df['y'] = metadata_df.apply(lambda x: wgs84_to_web_mercator(x['longitude'], x['latitude'])[1], axis=1)

# Define data source for point overlays
source = ColumnDataSource(
    data=dict(x=list(metadata_df['x']),
              y=list(metadata_df['y']),
              site_id=list(metadata_df['site_id']),
              site_name=list(metadata_df['site_name']))
)

# Range bounds supplied in web mercator coordinates
# Translate min and max bounding values from metadata, padded by a buffer in degrees
buffer = 0.2
x_min, y_min = wgs84_to_web_mercator(metadata_df['longitude'].min() - buffer, metadata_df['latitude'].min() - buffer)
x_max, y_max = wgs84_to_web_mercator(metadata_df['longitude'].max() + buffer, metadata_df['latitude'].max() + buffer)

# Pass each range as (min, max) so that neither axis is drawn reversed
p = figure(x_range=(x_min, x_max), y_range=(y_min, y_max),
           x_axis_type="mercator", y_axis_type="mercator")
p.add_tile('OSM')

# Overlay red circles onto map for site locations
p.circle(x="x", y="y", size=15, fill_color="red", fill_alpha=0.8, source=source)

show(p)
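
As a sanity check on the coordinate conversion used for the map, the sketch below restates the same spherical Web Mercator formula with the standard library and adds an inverse so that a point can be round-tripped. The names `to_web_mercator`, `from_web_mercator`, and `EARTH_RADIUS_M` are ours for illustration. The inverse is the standard closed-form expression and is not taken from the notebook.

```python
import math

EARTH_RADIUS_M = 6378137  # same constant as `k` in wgs84_to_web_mercator


def to_web_mercator(lon, lat):
    """Spherical Web Mercator forward transform (degrees -> meters)."""
    x = math.radians(lon) * EARTH_RADIUS_M
    y = math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)) * EARTH_RADIUS_M
    return x, y


def from_web_mercator(x, y):
    """Inverse transform (meters -> degrees)."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Round-trip one of the well locations from the metadata table
x, y = to_web_mercator(-74.46543, 40.337608)
lon, lat = from_web_mercator(x, y)
print(round(lon, 6), round(lat, 6))
```

If the round trip returns the original longitude and latitude, the forward conversion used to place the points is behaving as expected.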