## Using Siphon to query the NetCDF Subset Service


Objectives:
1. Use Siphon's NCSS class to investigate the metadata of the SIO and NDBC THREDDS Data Servers (TDS).
2. Retrieve data from the SIO and NDBC servers.
3. Plot a map using NumPy arrays, matplotlib, and cartopy.

Introduction:
Siphon is a Python package for downloading data from Unidata data technologies, such as the THREDDS Data Server (TDS). Here we compare the data available from two different TDS servers that both distribute high-frequency radar (HFR) data. We first assess the metadata available on each server, then attempt to subset identical regions of data, plot them, and compare them.

**But first!**
Bookmark these resources for when you want to use Siphon later:
+ [latest Siphon documentation](http://siphon.readthedocs.org/en/latest/)
+ [Siphon GitHub repo](https://github.com/Unidata/siphon)
+ [TDS documentation](http://www.unidata.ucar.edu/software/thredds/v4.6/tds/TDS.html)
+ [netCDF subset service documentation](http://www.unidata.ucar.edu/software/thredds/current/tds/reference/NetcdfSubsetServiceReference.html)

## Let's get started!

First, we import the TDSCatalog class from Siphon and enable inline plotting (the `%matplotlib inline` magic) so our map will show up later in the notebook. Then we construct one TDSCatalog instance per server, each pointing to the same dataset of interest: the US East Coast and Gulf of Mexico 1 km hourly HF radar data.

In [1]:
get_ipython().run_line_magic('matplotlib', 'inline')  # same as %matplotlib inline
from siphon.catalog import TDSCatalog
#starting with east coast and Gulf of Mexico 1km resolution HF Radar data 
NDBC_HFR = TDSCatalog('http://sdf.ndbc.noaa.gov/thredds/catalog.xml?dataset=hfradar_usegc_1km')
SIO_HFR = TDSCatalog('http://hfrnet.ucsd.edu/thredds/HFRADAR_USEGC_hourly_RTV.xml?dataset=HFRNet/USEGC/1km/hourly/RTV')
print('NDBC')
print(NDBC_HFR.datasets)
print('SIO')
print(SIO_HFR.datasets)

NDBC
OrderedDict([('US East Coast and Gulf of Mexico 1km resolution HF Radar data', <siphon.catalog.Dataset object at 0x00000000077257B8>)])
SIO
OrderedDict([('HFRADAR, US East and Gulf Coast, 1km Resolution, Hourly RTV', <siphon.catalog.Dataset object at 0x000000000773A198>)])


## NDBC's Metadata

In [2]:
#NDBC's Metadata
NDBC_HFR.metadata

{'authority': ['unidata.ucar.edu:'],
 'dataFormat': 'netCDF',
 'dataType': 'GRID',
 'inherited': True,
 'serviceName': 'aggregation'}

## Scripps's Metadata

In [3]:
#SIO's Metadata
SIO_HFR.metadata

{'creator': [{}],
 'dataFormat': 'netCDF',
 'dataType': 'GRID',
 'documentation': {'Rights': ['This is a research project and may contain errors. Please contact the providers of this data to ensure accurate values before making any critical judgements.'],
  'xlink': [{'href': 'http://cordc.ucsd.edu/projects/mapping/',
    'title': 'HFRNet Documentation'},
   {'href': 'http://www.sccoos.org/meta/browse',
    'title': 'View FGDC metadata for HFRADAR site installations'}]},
 'inherited': True,
 'serviceName': 'allServices',
 'timeCoverage': [{}]}

### Next, we pull out each dataset and look at its access URLs. There are many ways to access the data, and both servers offer the same services except that SIO also provides WMS access.

In [4]:
#NDBC
NDBC_ds = list(NDBC_HFR.datasets.values())[0]
NDBC_ds.access_urls

{'ISO': 'http://sdf.ndbc.noaa.gov/thredds/iso/hfradar_usegc_1km',
 'NCML': 'http://sdf.ndbc.noaa.gov/thredds/ncml/hfradar_usegc_1km',
 'NetcdfSubset': 'http://sdf.ndbc.noaa.gov/thredds/ncss/grid/hfradar_usegc_1km',
 'OPENDAP': 'http://sdf.ndbc.noaa.gov/thredds/dodsC/hfradar_usegc_1km',
 'UDDC': 'http://sdf.ndbc.noaa.gov/thredds/uddc/hfradar_usegc_1km',
 'WCS': 'http://sdf.ndbc.noaa.gov/thredds/wcs/hfradar_usegc_1km'}

In [5]:
#Scripps
SIO_ds = list(SIO_HFR.datasets.values())[0]
SIO_ds.access_urls

{'ISO': 'http://hfrnet.ucsd.edu/thredds/iso/HFRNet/USEGC/1km/hourly/RTV',
 'NCML': 'http://hfrnet.ucsd.edu/thredds/ncml/HFRNet/USEGC/1km/hourly/RTV',
 'NetcdfSubset': 'http://hfrnet.ucsd.edu/thredds/ncss/grid/HFRNet/USEGC/1km/hourly/RTV',
 'OPENDAP': 'http://hfrnet.ucsd.edu/thredds/dodsC/HFRNet/USEGC/1km/hourly/RTV',
 'UDDC': 'http://hfrnet.ucsd.edu/thredds/uddc/HFRNet/USEGC/1km/hourly/RTV',
 'WCS': 'http://hfrnet.ucsd.edu/thredds/wcs/HFRNet/USEGC/1km/hourly/RTV',
 'WMS': 'http://hfrnet.ucsd.edu/thredds/wms/HFRNet/USEGC/1km/hourly/RTV'}
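The comparison above can also be done programmatically rather than by eye. A minimal sketch: `access_urls` behaves like a dict keyed by service name, so set operations on its keys show which services only one server offers. The service names below are copied by hand from the two outputs above so the example runs offline, and `compare_services` is a hypothetical helper, not part of Siphon.

```python
# Service names copied from the access_urls outputs above (URLs omitted).
ndbc_services = {'ISO', 'NCML', 'NetcdfSubset', 'OPENDAP', 'UDDC', 'WCS'}
sio_services = {'ISO', 'NCML', 'NetcdfSubset', 'OPENDAP', 'UDDC', 'WCS', 'WMS'}


def compare_services(a, b):
    """Return (shared, only_in_a, only_in_b) as sorted lists of service names.

    Hypothetical helper: accepts any iterables of service names, e.g. the
    keys of two Siphon ``Dataset.access_urls`` dicts.
    """
    a, b = set(a), set(b)
    return sorted(a & b), sorted(a - b), sorted(b - a)


shared, only_ndbc, only_sio = compare_services(ndbc_services, sio_services)
print('shared:   ', shared)
print('NDBC only:', only_ndbc)
print('SIO only: ', only_sio)
```

With the live objects, `compare_services(NDBC_ds.access_urls, SIO_ds.access_urls)` should give the same result, since iterating a dict yields its keys.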

### The `NetcdfSubset` entry is what we're after; we'll pass it to the NCSS class. Let's import the NCSS class from Siphon and create one instance per server from its NetcdfSubset access URL.

In [9]:
from siphon.ncss import NCSS
NDBC_ncss = NCSS(NDBC_ds.access_urls['NetcdfSubset'])
SIO_ncss = NCSS(SIO_ds.access_urls['NetcdfSubset'])


### Now we can query each server for its spatial and temporal limits and the variables it offers.

### First we will check NDBC

In [16]:
NDBC_ncss.metadata.time_span

{'begin': '2016-07-19T14:00:00Z', 'end': '2016-07-24T19:00:00Z'}

In [17]:
NDBC_ncss.metadata.lat_lon_box

{'east': -57.1924, 'north': 46.4944, 'south': 21.7, 'west': -97.8838}
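A subset request only makes sense inside the extent a server advertises, so a quick offline check can confirm that a box of interest falls within `lat_lon_box` before any data is requested. A minimal sketch, assuming boxes are dicts with `north`/`south`/`east`/`west` keys in degrees and negative longitudes for the western hemisphere (as in the output above); `box_within` is a hypothetical helper, not a Siphon function.

```python
def box_within(request, extent):
    """Return True if the requested lon/lat box lies inside the server extent.

    Both arguments are dicts with 'north', 'south', 'east', 'west' keys in
    degrees (negative = west). Boxes crossing the antimeridian are not handled.
    """
    return (extent['south'] <= request['south'] <= request['north'] <= extent['north']
            and extent['west'] <= request['west'] <= request['east'] <= extent['east'])


# Extent reported by NDBC_ncss.metadata.lat_lon_box above
ndbc_extent = {'east': -57.1924, 'north': 46.4944, 'south': 21.7, 'west': -97.8838}

# The mid-Atlantic box queried later in this notebook
mid_atlantic = {'north': 42, 'south': 35, 'west': -80, 'east': -69}

print(box_within(mid_atlantic, ndbc_extent))
```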

In [18]:
NDBC_ncss.variables

{'DOPx', 'DOPy', 'u', 'v'}

In [19]:
NDBC_ncss.metadata.variables

{'DOPx': {'attributes': {'_ChunkSize': [1, 896, 1366],
   'comment': "The longitudinal dilution of precision (DOPx) represents the\ncontribution of the radars' configuration geometry to\nuncertainty in the eastward velocity estimate (u). DOPx is a\ndirect multiplier of the standard error in obtaining the\nstandard deviation for the eastward velocity estimate from the\nleast squares best fit. DOPx and DOPy are commonly used to\nobtain the geometric dilution of precision\n(GDOP = sqrt(DOPx^2 + DOPy^2)), a useful metric for filtering\nerrant velocities due to poor geometry.",
   'long_name': 'longitudinal dilution of precision'},
  'desc': 'longitudinal dilution of precision',
  'shape': 'time lat lon',
  'type': 'float'},
 'DOPy': {'attributes': {'_ChunkSize': [1, 896, 1366],
   'comment': "The latitudinal dilution of precision (DOPy) represents the\ncontribution of the radars' configuration geometry to\nuncertainty in the northward velocity estimate (v). DOPy is a\ndirect multiplier of 
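The `DOPx` comment above defines the geometric dilution of precision as GDOP = sqrt(DOPx^2 + DOPy^2) and describes it as a useful metric for filtering velocities affected by poor radar geometry. A minimal NumPy sketch of such a filter follows. It assumes `u`, `v`, `DOPx`, and `DOPy` are arrays of the same shape (with real data you would also request `'DOPx', 'DOPy'` in the query's variables), and the `max_gdop` cutoff of 1.25 is an illustrative assumption, not a value from either server's documentation. `mask_by_gdop` is a hypothetical helper.

```python
import numpy as np


def mask_by_gdop(u, v, dopx, dopy, max_gdop=1.25):
    """Mask u/v wherever GDOP = sqrt(DOPx**2 + DOPy**2) exceeds max_gdop.

    max_gdop is an illustrative threshold (assumption). Cells where GDOP
    is NaN are masked as well.
    """
    gdop = np.hypot(dopx, dopy)  # equivalent to sqrt(DOPx**2 + DOPy**2)
    bad = ~(gdop <= max_gdop)    # True where GDOP is too large or NaN
    return np.ma.masked_where(bad, u), np.ma.masked_where(bad, v)


# Small synthetic example (made-up values, not server data)
u = np.array([0.10, 0.20, 0.30])
v = np.array([0.00, -0.10, 0.05])
dopx = np.array([0.5, 1.0, 2.0])
dopy = np.array([0.5, 1.0, 0.1])

u_ok, v_ok = mask_by_gdop(u, v, dopx, dopy)
print(u_ok)
print(v_ok)
```

Returning masked arrays rather than dropping cells keeps the grid shape intact, which matters when the velocities are later plotted on a map.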

### Now let's check Scripps

In [20]:
SIO_ncss.metadata.time_span

{'begin': '2012-01-01T00:00:00Z', 'end': '2016-07-26T17:00:00Z'}

In [21]:
SIO_ncss.metadata.lat_lon_box

{'east': -57.1924, 'north': 46.4944, 'south': 21.7, 'west': -97.8838}

In [22]:
SIO_ncss.variables

{'DOPx', 'DOPy', 'u', 'v'}

In [23]:
SIO_ncss.metadata.variables

{'DOPx': {'attributes': {'_ChunkSize': [1, 896, 1366],
   'comment': "The longitudinal dilution of precision (DOPx) represents the\ncontribution of the radars' configuration geometry to\nuncertainty in the eastward velocity estimate (u). DOPx is a\ndirect multiplier of the standard error in obtaining the\nstandard deviation for the eastward velocity estimate from the\nleast squares best fit. DOPx and DOPy are commonly used to\nobtain the geometric dilution of precision\n(GDOP = sqrt(DOPx^2 + DOPy^2)), a useful metric for filtering\nerrant velocities due to poor geometry.",
   'coordinates': 'time_run time lat lon ',
   'long_name': 'longitudinal dilution of precision'},
  'desc': 'longitudinal dilution of precision',
  'shape': 'time lat lon',
  'type': 'float'},
 'DOPy': {'attributes': {'_ChunkSize': [1, 896, 1366],
   'comment': "The latitudinal dilution of precision (DOPy) represents the\ncontribution of the radars' configuration geometry to\nuncertainty in the northward velocity es

Both servers offer the same variables and spatial extent, but NDBC only serves about five days of data (2016-07-19T14:00Z to 2016-07-24T19:00Z when this notebook was run), while SIO serves data from January 1, 2012 up to the present.
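Rather than eyeballing the ISO timestamps, each server's coverage can be computed from its `time_span`. A minimal standard-library sketch, assuming the `begin`/`end` strings always use the `YYYY-MM-DDTHH:MM:SSZ` format shown above; `span_duration` is a hypothetical helper and the two dicts are copied from the outputs above.

```python
from datetime import datetime, timedelta


def span_duration(time_span):
    """Return the length of an NCSS time_span dict as a timedelta.

    Assumes 'begin'/'end' are ISO 8601 strings ending in 'Z' (UTC).
    """
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    begin = datetime.strptime(time_span['begin'], fmt)
    end = datetime.strptime(time_span['end'], fmt)
    return end - begin


# time_span values copied from the outputs above
ndbc_span = {'begin': '2016-07-19T14:00:00Z', 'end': '2016-07-24T19:00:00Z'}
sio_span = {'begin': '2012-01-01T00:00:00Z', 'end': '2016-07-26T17:00:00Z'}

print('NDBC:', span_duration(ndbc_span))
print('SIO: ', span_duration(sio_span))
# Dividing by a one-hour timedelta gives the span in hours
print('NDBC hours:', span_duration(ndbc_span) / timedelta(hours=1))
```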

Now we can use the `NCSS` objects to create query objects, which make it easy to ask the server for data.

In [24]:
NDBC_query = NDBC_ncss.query()
SIO_query = SIO_ncss.query()


We construct a query for a latitude/longitude box over the mid-Atlantic, bounded by 42 (42°N) to the north, 35 (35°N) to the south, -80 (80°W) to the west, and -69 (69°W) to the east. We also restrict the request to a one-day window, 2016-07-17 to 2016-07-18, and to the `u` and `v` velocity variables. The same query is built for both servers.

In [25]:
from datetime import datetime

# Mid-Atlantic box, one-day window (17-18 July 2016), and the u/v velocity
# components. time_range() sets the time constraint, so no separate time()
# call is needed (an earlier time() call would be overwritten anyway).
NDBC_query.lonlat_box(north=42, south=35, west=-80, east=-69)
NDBC_query.time_range(start=datetime(2016, 7, 17), end=datetime(2016, 7, 18))
NDBC_query.variables('u', 'v')

# Identical query for the SIO server
SIO_query.lonlat_box(north=42, south=35, west=-80, east=-69)
SIO_query.time_range(start=datetime(2016, 7, 17), end=datetime(2016, 7, 18))
SIO_query.variables('u', 'v')


var=u&var=v&time_start=2016-07-17T00%3A00%3A00&time_end=2016-07-18T00%3A00%3A00&west=-80&east=-69&north=42&south=35
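The string above is the URL query appended to the `NetcdfSubset` endpoint. Taking the NCSS parameter names that appear in it (`var`, `time_start`, `time_end`, `west`, `east`, `north`, `south`), the same string can be reproduced with the standard library's `urllib.parse.urlencode`, which is a handy way to see, or hand-build, what is sent to the server. This is an illustration only, not a description of how Siphon builds the query internally.

```python
from urllib.parse import urlencode

# Parameters in the same order as the query string printed above.
# A list of pairs (rather than a dict) allows the repeated 'var' key.
params = [
    ('var', 'u'),
    ('var', 'v'),
    ('time_start', '2016-07-17T00:00:00'),
    ('time_end', '2016-07-18T00:00:00'),
    ('west', -80),
    ('east', -69),
    ('north', 42),
    ('south', 35),
]

query_string = urlencode(params)  # ':' is percent-encoded as %3A
base = 'http://hfrnet.ucsd.edu/thredds/ncss/grid/HFRNet/USEGC/1km/hourly/RTV'
print(base + '?' + query_string)
```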

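We now request data from each server using its query. The `NCSS` class parses the returned NetCDF data (using the `netCDF4` module). For NDBC, however, the request fails with a 400 error: our 2016-07-17 to 2016-07-18 window does not intersect NDBC's available time range, which begins on 2016-07-19. SIO's request succeeds, and printing its variable names shows the requested variables plus the coordinate variables (`time_run`, `time`, `lat`, `lon`) that describe them.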

In [26]:
NDBCdata = NDBC_ncss.get_data(NDBC_query)
list(NDBCdata.variables.keys())

HTTPError: Error accessing http://sdf.ndbc.noaa.gov/thredds/ncss/grid/hfradar_usegc_1km?var=u&var=v&time_start=2016-07-17T00%3A00%3A00&time_end=2016-07-18T00%3A00%3A00&west=-80&east=-69&north=42&south=35: 400 NetCDF Subset Service exception handled : Requested time range does not intersect the Data Time Range = 2016-07-19T14:00:00Z to 2016-07-24T19:00:00Z
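The 400 error is the server reporting that the requested window lies outside its data. A check like the following could catch this before a request is sent. A minimal sketch, assuming the `time_span` strings have the format shown earlier and that the requested naive datetimes are in UTC; `overlaps_time_span` is a hypothetical helper, not part of Siphon.

```python
from datetime import datetime

TDS_FMT = '%Y-%m-%dT%H:%M:%SZ'  # format of the time_span strings above


def overlaps_time_span(start, end, time_span):
    """Return True if [start, end] intersects the server's [begin, end] range.

    start/end are naive datetimes assumed to be UTC, like the query above.
    """
    begin = datetime.strptime(time_span['begin'], TDS_FMT)
    finish = datetime.strptime(time_span['end'], TDS_FMT)
    return start <= finish and end >= begin


# time_span values copied from the outputs above
ndbc_span = {'begin': '2016-07-19T14:00:00Z', 'end': '2016-07-24T19:00:00Z'}
sio_span = {'begin': '2012-01-01T00:00:00Z', 'end': '2016-07-26T17:00:00Z'}
start, end = datetime(2016, 7, 17), datetime(2016, 7, 18)

print('NDBC:', overlaps_time_span(start, end, ndbc_span))
print('SIO: ', overlaps_time_span(start, end, sio_span))
```

With live objects, the dicts can be replaced by `NDBC_ncss.metadata.time_span` and `SIO_ncss.metadata.time_span`.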

In [27]:
SIOdata = SIO_ncss.get_data(SIO_query)
list(SIOdata.variables.keys())


[u'u', u'time_run', u'time', u'lat', u'lon', u'v']

In [28]:
print(SIOdata.variables['time_run'])  #note NDBC doesn't have this variable


<type 'netCDF4._netCDF4.Variable'>
float64 time_run(time)
    long_name: run times for coordinate = time
    standard_name: forecast_reference_time
    units: hours since 2012-01-01T00:00:00Z
    missing_value: nan
    _CoordinateAxisType: RunTime
unlimited dimensions: 
current shape = (25,)
filling off
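`time_run` is stored as a number of hours since 2012-01-01T00:00:00Z, per its `units` attribute. In practice `netCDF4.num2date` (or `cftime`) performs this conversion; the standard-library sketch below only shows what it amounts to, assuming units of the exact form `'hours since YYYY-MM-DDTHH:MM:SSZ'`. `decode_hours_since` is a hypothetical helper, and the hour values are constructed for illustration rather than read from the server. The 25 entries are consistent with hourly steps from 2016-07-17 00:00 through 2016-07-18 00:00 inclusive.

```python
from datetime import datetime, timedelta


def decode_hours_since(values, units):
    """Convert numbers in 'hours since <YYYY-MM-DDTHH:MM:SSZ>' units to datetimes.

    Hypothetical helper: handles only this one units format, returns naive
    UTC datetimes, and does not handle NaN fill values.
    """
    prefix = 'hours since '
    if not units.startswith(prefix):
        raise ValueError('unsupported units: %r' % units)
    origin = datetime.strptime(units[len(prefix):], '%Y-%m-%dT%H:%M:%SZ')
    return [origin + timedelta(hours=float(h)) for h in values]


units = 'hours since 2012-01-01T00:00:00Z'  # from the time_run attributes above

# 2016-07-17T00:00Z is 1659 days (39816 hours) after the origin;
# 25 hourly values span 2016-07-17 00:00 through 2016-07-18 00:00.
hours = [39816 + i for i in range(25)]
times = decode_hours_since(hours, units)
print(times[0], times[-1], len(times))
```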

