# Demonstrate some CSW query capabilities 

We will use the owslib library to construct queries and parse responses from CSW.

Specify a CSW endpoint. You can test whether it is working with a GetCapabilities request:
```
<endpoint>?request=GetCapabilities&service=CSW
```
for example:
```
http://catalog.data.gov/csw-all?service=CSW&version=2.0.2&request=GetCapabilities
```
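
The same request URL can be assembled in Python. This is a minimal sketch using only the standard library; the helper name `capabilities_url` is made up for illustration and is not part of owslib.

```python
from urllib.parse import urlencode


def capabilities_url(endpoint, version='2.0.2'):
    """Build a CSW GetCapabilities URL for the given endpoint."""
    params = {
        'service': 'CSW',
        'version': version,
        'request': 'GetCapabilities',
    }
    return '{}?{}'.format(endpoint, urlencode(params))


print(capabilities_url('http://catalog.data.gov/csw-all'))
```

Pasting the printed URL into a browser should return the capabilities XML if the endpoint is up.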

In [1]:
from owslib.csw import CatalogueServiceWeb

endpoints = [
    'http://catalog.data.gov/csw-all',  # Granule level production catalog.
    'http://geoport.whoi.edu/csw',
    'http://www.ngdc.noaa.gov/geoportal/csw',
    'https://data.ioos.us/csw',
    'https://dev-catalog.ioos.us/csw'
]

csw = CatalogueServiceWeb(endpoints[-1], timeout=60)
print(csw.version)

2.0.2


In [2]:
from owslib import fes

val = 'sea_water_salinity'

filter1 = fes.PropertyIsLike(
    propertyname='apiso:AnyText',
    literal=('*%s*' % val),
    escapeChar='\\',
    wildCard='*',
    singleChar='?'
)

csw.getrecords2(constraints=[filter1], maxrecords=100, esn='full')

print('Found {} records.\n'.format(len(csw.records.keys())))
for key, value in list(csw.records.items()):
    print('[{}]: {}'.format(value.title, key))

Found 10 records.

[UCSC294-20150430T2218]: deployments/mbari/UCSC294-20150430T2218/UCSC294-20150430T2218.nc3.nc
[ud_134-20150122T1955]: ud_134-20150122T1955
[None]: gov.noaa.nos.ioos:nanoos_regional_data_portal
[None]: gov.noaa.nos.ioos:caricoos_regional_data_portal
[Alaska Regional Data Portal For US Integrated Ocean Observing System]: gov.noaa.nos.ioos:aoos_regional_data_portal
[GLOS Data Portal]: gov.noaa.nos.ioos:glos_regional_data_portal
[NERACOOS Real-time Data Portal]: gov.noaa.nos.ioos:neracoos_regional_data_portal
[PacIOOS Voyager]: gov.noaa.nos.ioos:pacioos_regional_data_portal
[Central California Regional Data Portal For US Integrated Ocean Observing System]: gov.noaa.nos.ioos:cencoos_regional_data_portal
[SECOORA Data Portal]: gov.noaa.nos.ioos:secoora_regional_data_portal


Hmmm... In the query above we only get 10 records, even though we specified `maxrecords=100`.

What's up with that?

It turns out the CSW service specifies a `MaxRecordDefault` that cannot be exceeded.
For example, checking: https://dev-catalog.ioos.us/csw?request=GetCapabilities&service=CSW we find:
```
<ows:Constraint name="MaxRecordDefault">
    <ows:Value>10</ows:Value>
</ows:Constraint>
```
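
The limit can also be read from the capabilities document in code. The sketch below parses a trimmed, hand-written capabilities fragment with the standard library. The helper name `max_record_default` and the `ows:OperationsMetadata` wrapper in the sample are assumptions for illustration; a real response would first be fetched from the endpoint.

```python
import xml.etree.ElementTree as ET

OWS_NS = {'ows': 'http://www.opengis.net/ows'}

# Hand-written sample, trimmed to the one constraint we care about.
sample = """
<ows:OperationsMetadata xmlns:ows="http://www.opengis.net/ows">
  <ows:Constraint name="MaxRecordDefault">
    <ows:Value>10</ows:Value>
  </ows:Constraint>
</ows:OperationsMetadata>
"""


def max_record_default(capabilities_xml):
    """Return the MaxRecordDefault value as an int, or None if absent."""
    root = ET.fromstring(capabilities_xml)
    for constraint in root.iter('{http://www.opengis.net/ows}Constraint'):
        if constraint.get('name') == 'MaxRecordDefault':
            value = constraint.find('ows:Value', OWS_NS)
            if value is not None and value.text:
                return int(value.text.strip())
    return None


print(max_record_default(sample))
```
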
So we need to loop the getrecords request, incrementing the startposition:

In [3]:
from owslib.fes import SortBy, SortProperty

pagesize = 10
maxrecords = 50
sort_order = 'ASC'  # Should be 'ASC' or 'DESC' (ascending or descending).
sort_property = 'dc:title'  # A supported queryable of the CSW.

sortby = SortBy([SortProperty(sort_property, sort_order)])

In [4]:
startposition = 0

while True:
    print('getting records %d to %d' % (startposition, startposition+pagesize))
    csw.getrecords2(constraints=[filter1],
                    startposition=startposition,
                    maxrecords=pagesize,
                    sortby=sortby)
    for rec, item in csw.records.items():
        print(item.title)
    print()
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break

getting records 0 to 10
Alaska Regional Data Portal For US Integrated Ocean Observing System
AOOS/Models/High-resolution Ice/Ocean Modeling and Assimilation System (HIOMAS)
Arctic Seas Regional Climatology : sea_water_temperature January 0.25 degree
bass-20150706T151619Z
bass-20150706T151619Z
bass-20150827T1909
bass-20150827T1909
Bering Sea
blue-20150627T1242
blue-20150627T1242

getting records 10 to 20
blue-20150627T1242
blue-20160518T1525
blue-20160518T1525
blue-20160818T1448
CariCOOS Realtime Buoy Observations
CariCOOS Realtime Buoy Observations
CeNCOOS/Models/ROMS/California ROMS/California Coastal Regional Ocean Modeling System (ROMS) Forecast
CeNCOOS/Models/ROMS/California ROMS/California Coastal Regional Ocean Modeling System (ROMS) Nowcast
CeNCOOS/Models/ROMS/Monterey Bay ROMS (Oct 2010 to Jan 2013)/Monterey Bay (MB) Regional Ocean Modeling System (ROMS) Forecast
Central California Regional Data Portal For US Integrated Ocean Observing System

getting records 20 to 30
clark-201
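
The paging loop above can be wrapped in a reusable generator. The sketch below is one possible shape, not an owslib API: the name `iter_records` is hypothetical, and it assumes the server reports a usable `nextrecord` value in `csw.results`, which it follows instead of adding `pagesize` by hand. Starting at 1 rather than 0 is an assumption based on CSW record positions being 1-based; whether that explains the titles repeated across page boundaries in the output above depends on the server.

```python
def iter_records(csw, constraints, pagesize=10, maxrecords=50, **kwargs):
    """Yield (identifier, record) pairs, at most `maxrecords` in total."""
    startposition = 1  # assumption: the server counts records from 1
    fetched = 0
    while fetched < maxrecords:
        csw.getrecords2(
            constraints=constraints,
            startposition=startposition,
            maxrecords=min(pagesize, maxrecords - fetched),
            **kwargs
        )
        if not csw.records:
            break
        for identifier, record in csw.records.items():
            yield identifier, record
            fetched += 1
        nextrecord = csw.results.get('nextrecord', 0)
        # 0 means there are no more records; the second check guards
        # against a server that does not advance the position.
        if nextrecord == 0 or nextrecord <= startposition:
            break
        startposition = nextrecord
```

With the objects defined in this notebook, it could be used as `for identifier, record in iter_records(csw, [filter1], sortby=sortby): print(record.title)`.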

Okay, now let's define another query filter and combine it with the first one:

In [5]:
val = 'ROMS'

filter2 = fes.PropertyIsLike(
    propertyname='apiso:AnyText',
    literal=('*%s*' % val),
    escapeChar='\\',
    wildCard='*',
    singleChar='?'
)

filter_list = [fes.And([filter1, filter2])] 
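
To get a feel for what the combined filter asks of the server, the matching can be imitated locally. This is only a rough stand-in written for illustration: `like_to_regex` and `matches_all` are hypothetical helpers, the sample text is made up, and the real evaluation (including case sensitivity) happens on the CSW server and may differ.

```python
import re


def like_to_regex(pattern, wildcard='*', single='?', escape='\\'):
    """Translate a PropertyIsLike-style pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character is always taken literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == wildcard:
            parts.append('.*')   # zero or more characters
        elif ch == single:
            parts.append('.')    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile(''.join(parts), re.DOTALL)


def matches_all(text, patterns):
    """Local stand-in for And([...]): every pattern must match the text."""
    return all(like_to_regex(p).fullmatch(text) for p in patterns)


# Made-up text standing in for a record's searchable content.
text = 'Regional Ocean Modeling System (ROMS): Oahu, sea_water_salinity'
print(matches_all(text, ['*sea_water_salinity*', '*ROMS*']))   # True
print(matches_all(text, ['*sea_water_salinity*', '*HYCOM*']))  # False
```
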

In [6]:
startposition = 0
maxrecords = 50

while True:
    print('getting records %d to %d' % (startposition, startposition+pagesize))
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec, item in csw.records.items():
        print(item.title)
    print()
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break

getting records 0 to 10
CeNCOOS/Models/ROMS/California ROMS/California Coastal Regional Ocean Modeling System (ROMS) Forecast
CeNCOOS/Models/ROMS/California ROMS/California Coastal Regional Ocean Modeling System (ROMS) Nowcast
CeNCOOS/Models/ROMS/Monterey Bay ROMS (Oct 2010 to Jan 2013)/Monterey Bay (MB) Regional Ocean Modeling System (ROMS) Forecast
Regional Ocean Modeling System (ROMS): CNMI
Regional Ocean Modeling System (ROMS): CNMI: Data Assimilating
Regional Ocean Modeling System (ROMS): Guam
Regional Ocean Modeling System (ROMS): Main Hawaiian Islands
Regional Ocean Modeling System (ROMS): Main Hawaiian Islands: Data Assimilating
Regional Ocean Modeling System (ROMS): Oahu
Regional Ocean Modeling System (ROMS): Oahu: Data Assimilating

getting records 10 to 20
Regional Ocean Modeling System (ROMS): Oahu: Data Assimilating
Regional Ocean Modeling System (ROMS): Oahu South Shore
Regional Ocean Modeling System (ROMS): Oregon Coast
Regional Ocean Modeling System (ROMS): Samoa
Region

In [7]:
import random

choice = random.choice(list(csw.records.keys()))

print(csw.records[choice].title)

csw.records[choice].references

Regional Ocean Modeling System (ROMS): Oregon Coast


[{'scheme': 'WWW:LINK',
  'url': 'http://ona.coas.oregonstate.edu:8080/thredds/dodsC/NANOOS/OCOS.html'},
 {'scheme': 'WWW:LINK',
  'url': 'http://www.ncdc.noaa.gov/oa/wct/wct-jnlp-beta.php?singlefile=http://ona.coas.oregonstate.edu:8080/thredds/dodsC/NANOOS/OCOS'},
 {'scheme': 'OPeNDAP:OPeNDAP',
  'url': 'http://ona.coas.oregonstate.edu:8080/thredds/dodsC/NANOOS/OCOS'},
 {'scheme': 'file',
  'url': 'http://ona.coas.oregonstate.edu:8080/thredds/fileServer/NANOOS/OCOS'}]
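
Each reference is a plain dict with `scheme` and `url` keys, so picking out a particular access method is a short filter. A minimal sketch, using a subset of the list printed above; the helper name `urls_for_scheme` is made up for illustration.

```python
references = [
    {'scheme': 'WWW:LINK',
     'url': 'http://ona.coas.oregonstate.edu:8080/thredds/dodsC/NANOOS/OCOS.html'},
    {'scheme': 'OPeNDAP:OPeNDAP',
     'url': 'http://ona.coas.oregonstate.edu:8080/thredds/dodsC/NANOOS/OCOS'},
    {'scheme': 'file',
     'url': 'http://ona.coas.oregonstate.edu:8080/thredds/fileServer/NANOOS/OCOS'},
]


def urls_for_scheme(references, keyword):
    """Return the URLs whose scheme contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [ref['url'] for ref in references
            if keyword in (ref.get('scheme') or '').lower()]


print(urls_for_scheme(references, 'opendap'))
```

In the notebook, `csw.records[choice].references` could be passed in place of the hard-coded list.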

Let's see what the full XML record looks like:

In [8]:
import xml.dom.minidom

# Use a separate name so the `xml` module is not shadowed if the cell is re-run.
dom = xml.dom.minidom.parseString(csw.records[choice].xml)
print(dom.toprettyxml())

<?xml version="1.0" ?>
<csw:SummaryRecord xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gml="http://www.opengis.net/gml" xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0" xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0" xmlns:ogc="http://www.opengis.net/ogc" xmlns:os="http://a9.com/-/spec/opensearch/1.1/" xmlns:ows="http://www.opengis.net/ows" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:xlink="h

Yuck!  That's why we use OWSLib!  :-)

Now add a constraint to return only records that have either the OPeNDAP or SOS service.

Let's first see what services are advertised:

In [9]:
try:
    csw.get_operation_by_name('GetDomain')
    csw.getdomain('apiso:ServiceType', 'property')
    print(csw.results['values'])
except Exception:
    print('GetDomain not supported')

[None, 'ERDDAP OPeNDAP', 'ERDDAP tabledap,OPeNDAP,ERDDAP Subset', 'OPeNDAP:OPeNDAP', 'OPeNDAP:OPeNDAP,OGC:SOS', 'OPeNDAP:OPeNDAP,OGC:WMS,file', 'Open Geospatial Consortium Web Coverage Service (WCS),Open Geospatial Consortium Web Map Service (WMS),Open Geospatial Consortium Web Map Service - Cached (WMS-C)', 'Open Geospatial Consortium Web Feature Service (WFS),Open Geospatial Consortium Web Map Service (WMS)', 'Open Geospatial Consortium Web Feature Service (WFS),Open Geospatial Consortium Web Map Service (WMS),Open Geospatial Consortium Web Map Service - Cached (WMS-C)', 'Open Geospatial Consortium Web Map Service (WMS)', 'THREDDS OPeNDAP', 'THREDDS OPeNDAP,Open Geospatial Consortium Sensor Observation Service (SOS)', 'THREDDS OPeNDAP,Open Geospatial Consortium Sensor Observation Service (SOS),THREDDS HTTP Service', 'THREDDS OPeNDAP,Open Geospatial Consortium Web Coverage Service (WCS),Open Geospatial Consortium Sensor Observation Service (SOS),THREDDS NetCDF Subset Service', 'THREDD
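
Many of these values are comma-separated lists, and the same service shows up under several spellings. The sketch below splits a few of the complete values from the output above and counts the individual names; `count_services` is a hypothetical helper. It suggests why a wildcard match such as `*OPeNDAP*` is more practical here than an exact comparison.

```python
from collections import Counter

# A few complete values copied from the GetDomain output above.
values = [
    None,
    'ERDDAP OPeNDAP',
    'OPeNDAP:OPeNDAP',
    'OPeNDAP:OPeNDAP,OGC:SOS',
    'OPeNDAP:OPeNDAP,OGC:WMS,file',
    'THREDDS OPeNDAP',
]


def count_services(values):
    """Count individual service names across comma-separated entries."""
    counts = Counter()
    for value in values:
        if not value:
            continue  # skip None and empty strings
        for name in value.split(','):
            counts[name.strip()] += 1
    return counts


counts = count_services(values)
print(sorted(name for name in counts if 'opendap' in name.lower()))
```
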

In [10]:
services = ['OPeNDAP', 'SOS'] 

service_filt = fes.Or(
    [fes.PropertyIsLike(
            propertyname='apiso:ServiceType',
            literal=('*%s*' % val),
            escapeChar='\\',
            wildCard='*',
            singleChar='?'
        ) for val in services])
    
filter_list = [fes.And([filter1, filter2, service_filt])]

In [11]:
startposition = 0

while True:
    print('getting records %d to %d' % (startposition, startposition+pagesize))
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition,
                    maxrecords=pagesize,
                    sortby=sortby)
    for rec, item in csw.records.items():
        print(item.title)
    print()
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break  

getting records 0 to 10
CeNCOOS/Models/ROMS/California ROMS/California Coastal Regional Ocean Modeling System (ROMS) Forecast
CeNCOOS/Models/ROMS/California ROMS/California Coastal Regional Ocean Modeling System (ROMS) Nowcast
CeNCOOS/Models/ROMS/Monterey Bay ROMS (Oct 2010 to Jan 2013)/Monterey Bay (MB) Regional Ocean Modeling System (ROMS) Forecast
Regional Ocean Modeling System (ROMS): CNMI
Regional Ocean Modeling System (ROMS): CNMI: Data Assimilating
Regional Ocean Modeling System (ROMS): Guam
Regional Ocean Modeling System (ROMS): Main Hawaiian Islands
Regional Ocean Modeling System (ROMS): Main Hawaiian Islands: Data Assimilating
Regional Ocean Modeling System (ROMS): Oahu
Regional Ocean Modeling System (ROMS): Oahu: Data Assimilating

getting records 10 to 20
Regional Ocean Modeling System (ROMS): Oahu: Data Assimilating
Regional Ocean Modeling System (ROMS): Oahu South Shore
Regional Ocean Modeling System (ROMS): Oregon Coast
Regional Ocean Modeling System (ROMS): Samoa
Region

Let's try adding a search for a nonexistent service, which should return no records:

In [12]:
val = 'not_a_real_service'

filter3 = fes.PropertyIsLike(
    propertyname='apiso:ServiceType',
    literal=('*%s*' % val),
    escapeChar='\\',
    wildCard='*',
    singleChar='?'
)

filter_list = [fes.And([filter1, filter2, filter3])]

csw.getrecords2(constraints=filter_list, maxrecords=100, esn='full')

print('Found {} records.\n'.format(len(csw.records.keys())))
for key, value in list(csw.records.items()):
    print('[{}]: {}'.format(value.title, key))

Found 0 records.



Good!

Now add a bounding box constraint. We want to specify the bbox in lon,lat order so that the same bbox works with both geoportal server and pycsw requests. To do that, we request the bounding box with the CRS84 coordinate reference system. The CRS84 option is available in `pycsw 1.1.10`+, and the ability to specify the `crs` in the bounding box request is available in `owslib 0.8.12`+. For more info on the bounding box problem and how it was solved, see this [pycsw issue](https://github.com/geopython/pycsw/issues/287), this [geoportal server issue](https://github.com/Esri/geoportal-server/issues/124), and this [owslib issue](https://github.com/geopython/OWSLib/issues/201).
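
Axis-order mix-ups are easy to make, so a small sanity check can help before sending a request. The sketch below is not part of owslib: `check_lonlat_bbox` and `swap_axis_order` are hypothetical helpers, and the range check only catches swaps where a longitude falls outside the valid latitude range, as it does for the Oahu box used in this notebook.

```python
def check_lonlat_bbox(bbox):
    """Raise ValueError if bbox is not a plausible [lon_min, lat_min, lon_max, lat_max]."""
    lon_min, lat_min, lon_max, lat_max = bbox
    if not (-180 <= lon_min <= 180 and -180 <= lon_max <= 180):
        raise ValueError('longitude out of range: {}'.format(bbox))
    if not (-90 <= lat_min <= lat_max <= 90):
        raise ValueError('latitude out of range or reversed: {}'.format(bbox))
    return bbox


def swap_axis_order(bbox):
    """Convert [lon_min, lat_min, lon_max, lat_max] to lat,lon order."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return [lat_min, lon_min, lat_max, lon_max]


oahu = [-158.4, 21.24, -157.5, 21.77]  # lon,lat order
print(check_lonlat_bbox(oahu))
print(swap_axis_order(oahu))
```

Passing the swapped list back into `check_lonlat_bbox` raises a `ValueError`, because -158.4 is not a valid latitude.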

In [13]:
# [lon_min, lat_min, lon_max, lat_max]
bbox = [-158.4, 21.24, -157.5, 21.77]
bbox_filter = fes.BBox(bbox, crs='urn:ogc:def:crs:OGC:1.3:CRS84')

filter_list = [fes.And([filter1, filter2, service_filt, bbox_filter])]

startposition = 0

while True:
    print('getting records %d to %d' % (startposition, startposition+pagesize))
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec, item in csw.records.items():
        print(item.title)
    print()
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break  

getting records 0 to 10
Regional Ocean Modeling System (ROMS): Main Hawaiian Islands
Regional Ocean Modeling System (ROMS): Main Hawaiian Islands: Data Assimilating
Regional Ocean Modeling System (ROMS): Oahu
Regional Ocean Modeling System (ROMS): Oahu: Data Assimilating
Regional Ocean Modeling System (ROMS): Oahu South Shore
Regional Ocean Modeling System (ROMS): Waikiki



Now add time constraints.  Here we first define a function that builds the filters. With the default `'overlaps'` constraint, a record is returned if any of its data overlaps the specified time period:

In [14]:
def date_range(start, stop, constraint='overlaps'):
    """
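    Take start and stop datetime objects and return a pair of
    `fes.PropertyIs<>` filters (begin, end).

    'overlaps' matches records whose time extent intersects [start, stop];
    'within' matches records whose time extent lies entirely inside it.
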
    """
    start = start.strftime('%Y-%m-%d %H:%M')
    stop = stop.strftime('%Y-%m-%d %H:%M')

    if constraint == 'overlaps':
        begin = fes.PropertyIsLessThanOrEqualTo(
            propertyname='apiso:TempExtent_begin', literal=stop
        )
        end = fes.PropertyIsGreaterThanOrEqualTo(
            propertyname='apiso:TempExtent_end', literal=start
        )
    elif constraint == 'within':
        begin = fes.PropertyIsGreaterThanOrEqualTo(
            propertyname='apiso:TempExtent_begin', literal=start
        )
        end = fes.PropertyIsLessThanOrEqualTo(
            propertyname='apiso:TempExtent_end', literal=stop
        )
    else:
        raise ValueError(
            "constraint must be 'overlaps' or 'within', got {!r}".format(constraint)
        )
    return begin, end
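
The two pairs of comparisons in `date_range` encode the usual interval tests. The sketch below applies the same logic to plain datetimes so the difference between the two constraints is easy to see; the function names and the sample record extent are made up for illustration.

```python
from datetime import datetime


def overlaps(rec_begin, rec_end, query_start, query_stop):
    """True if the record extent shares any time with the query window."""
    return rec_begin <= query_stop and rec_end >= query_start


def within(rec_begin, rec_end, query_start, query_stop):
    """True if the record extent lies entirely inside the query window."""
    return rec_begin >= query_start and rec_end <= query_stop


query_start, query_stop = datetime(2016, 9, 2), datetime(2016, 9, 8)

# Made-up record that starts before the window and ends inside it.
rec_begin, rec_end = datetime(2016, 9, 1), datetime(2016, 9, 5)

print(overlaps(rec_begin, rec_end, query_start, query_stop))  # True
print(within(rec_begin, rec_end, query_start, query_stop))    # False
```
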

In [15]:
from datetime import datetime, timedelta

now = datetime.utcnow()
start = now - timedelta(days=3)
stop = now + timedelta(days=3)

print('{} to {}'.format(start, stop))

start, stop = date_range(start, stop)

2016-09-02 14:23:20.517721 to 2016-09-08 14:23:20.517721


In [16]:
filter_list = [fes.And([filter1, filter2, service_filt, bbox_filter, start, stop])]

startposition = 0
while True:
    print('getting records %d to %d' % (startposition, startposition+pagesize))
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition,
                    maxrecords=pagesize,
                    sortby=sortby)
    for rec, item in csw.records.items():
        print(item.title)
    print()
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break  

getting records 0 to 10
Regional Ocean Modeling System (ROMS): Main Hawaiian Islands
Regional Ocean Modeling System (ROMS): Oahu
Regional Ocean Modeling System (ROMS): Oahu South Shore
Regional Ocean Modeling System (ROMS): Waikiki



Now add a NOT filter to eliminate some entries:

In [17]:
kw = dict(
    wildCard='*',
    escapeChar='\\',
    singleChar='?',
    propertyname='apiso:AnyText')

not_filt = fes.Not([fes.PropertyIsLike(literal='*Waikiki*', **kw)])

In [18]:
filter_list = [fes.And([filter1, filter2, service_filt, bbox_filter, start, stop, not_filt])]

startposition = 0
while True:
    print('getting records %d to %d' % (startposition, startposition+pagesize))
    csw.getrecords2(constraints=filter_list,
                    startposition=startposition, maxrecords=pagesize, sortby=sortby)
    for rec, item in csw.records.items():
        print(item.title)
    print()
    if csw.results['nextrecord'] == 0:
        break
    startposition += pagesize
    if startposition >= maxrecords:
        break  

getting records 0 to 10
Regional Ocean Modeling System (ROMS): Main Hawaiian Islands
Regional Ocean Modeling System (ROMS): Oahu



Hopefully this notebook demonstrated some of the power (and complexity) of CSW!  ;-)