# Using the NVCL data

We're going to query and visualise the data available to us in the NVCL (National Virtual Core Library) repositories of the various state geological surveys.

In [1]:
%pylab inline

import pysiss
from pysiss.webservices import nvcl
import folium
from utilities import website, embed_map

Populating the interactive namespace from numpy and matplotlib


If you don't know which endpoints exist, you can instantiate an NVCL endpoint registry and look up what it contains:

In [2]:
registry = nvcl.NVCLEndpointRegistry()
# registry.keys()
registry['GSWA']

{'dataurl': 'http://geossdi.dmp.wa.gov.au/NVCLDataServices/',
 'downloadurl': 'http://geossdi.dmp.wa.gov.au/NVCLDownloadServices/',
 'wfsurl': 'http://geossdi.dmp.wa.gov.au/services/wfs'}
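The registry entry printed above maps each service name to a base URL. Here is a minimal sketch of pulling out the WFS URL for every endpoint. It assumes the registry behaves like a dictionary of such dictionaries. Only the GSWA entry shown above is copied in; a real registry holds whatever `registry.keys()` returns.

```python
# Copy of the GSWA entry shown above; a real registry has one entry per endpoint.
registry_snapshot = {
    'GSWA': {
        'dataurl': 'http://geossdi.dmp.wa.gov.au/NVCLDataServices/',
        'downloadurl': 'http://geossdi.dmp.wa.gov.au/NVCLDownloadServices/',
        'wfsurl': 'http://geossdi.dmp.wa.gov.au/services/wfs',
    },
}

def wfs_urls(registry):
    """Map each endpoint name to its WFS service URL, skipping entries without one."""
    return {name: urls['wfsurl']
            for name, urls in registry.items()
            if 'wfsurl' in urls}

for name, url in sorted(wfs_urls(registry_snapshot).items()):
    print(name, url)
```

If `NVCLEndpointRegistry` supports `.items()` like a dict, `wfs_urls(registry)` should work on it directly. That is an assumption and has not been checked against pysiss.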

Often what we want to do in the first instance is get all the data, so we'll make an importer for each default endpoint in the registry. We do this with the `NVCLImporter` class. It knows how to query an NVCL endpoint, parse the GeoSciML data returned by the server, and return a `Borehole` object for you to play with.

In [3]:
endpoints = {}
for ept in registry.keys():
    endpoints[ept] = nvcl.NVCLImporter(ept)
    
# Example
gswa = endpoints['GSWA']
gswa

NVCLImporter(endpoint="GSWA")

We can then query the importer to find out which boreholes are available at this endpoint:

In [4]:
for ident, url in gswa.get_borehole_idents_and_urls().items():
    print (ident, url)

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
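The `ValueError` above is lxml's error for a decoded `str` that still begins with an `<?xml ... encoding=...?>` declaration. The message itself suggests passing bytes instead. The sketch below shows that idea with the standard library's `xml.etree.ElementTree`, parsing raw response bytes into simple borehole records. The XML layout, the `BoreholeStub` class and the `parse_boreholes` function are all hypothetical simplifications. Real NVCL WFS responses use the much richer GeoSciML schema, and this is not how `NVCLImporter` is implemented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical, simplified response (NOT the GeoSciML schema); values are made up.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<boreholes>
  <borehole id="DEMO-1">
    <latitude>-31.95</latitude>
    <longitude>115.86</longitude>
  </borehole>
  <borehole id="DEMO-2">
    <latitude>-32.05</latitude>
    <longitude>115.75</longitude>
  </borehole>
</boreholes>
"""

@dataclass
class BoreholeStub:
    """Minimal, hypothetical stand-in for a borehole record."""
    ident: str
    latitude: float
    longitude: float

def parse_boreholes(payload: bytes) -> list[BoreholeStub]:
    # Parse the raw bytes so the parser reads the encoding declaration itself
    root = ET.fromstring(payload)
    return [
        BoreholeStub(
            ident=node.get('id'),
            latitude=float(node.findtext('latitude')),
            longitude=float(node.findtext('longitude')),
        )
        for node in root.iter('borehole')
    ]

for stub in parse_boreholes(RESPONSE):
    print(stub.ident, stub.latitude, stub.longitude)
```

Keep the payload as bytes. For example, with the `requests` library use `response.content` rather than `response.text`. That way the parser, not your code, handles the declared encoding.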

We can also fetch all the datasets for a single borehole as a `Borehole` instance:

In [None]:
bh = gswa.get_borehole('PDP2C')

## Borehole data

Borehole data are represented using the `pysiss.borehole.Borehole` class. These are custom Python objects, since most GIS software doesn't provide an easy way of dealing with borehole data. Each `Borehole` has an `origin_position` and a set of named datasets (e.g. `bh.point_datasets['PDP2C']` below), each indexed by depth down the borehole. These datasets can be either `pysiss.borehole.PointDatasets` (observations taken at individual depths down the borehole) or `pysiss.borehole.IntervalDatasets` (observations averaged in some way over a borehole interval). There is no dedicated way of attaching a single feature to a borehole; you can represent one with a Point- or IntervalDataset that has a single member.
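To make the point/interval distinction concrete, here is a toy model in plain Python. The class names, fields and example labels are hypothetical stand-ins, not the `pysiss.borehole` API. They show three things: point observations are looked up by exact sampled depth, interval observations by which interval contains the depth, and a single whole-borehole feature fits into a one-member interval dataset.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PointData:
    """Hypothetical stand-in: one value per sampled depth (metres down hole)."""
    name: str
    samples: dict[float, str] = field(default_factory=dict)

    def value_at(self, depth: float) -> Optional[str]:
        # Point observations only exist at depths that were actually sampled
        return self.samples.get(depth)

@dataclass
class IntervalData:
    """Hypothetical stand-in: one value per [top, bottom) depth interval."""
    name: str
    intervals: list[tuple[float, float, str]] = field(default_factory=list)

    def add(self, top: float, bottom: float, value: str) -> None:
        if bottom <= top:
            raise ValueError('interval bottom must be deeper than its top')
        self.intervals.append((top, bottom, value))

    def value_at(self, depth: float) -> Optional[str]:
        # An interval observation applies anywhere inside its depth range
        for top, bottom, value in self.intervals:
            if top <= depth < bottom:
                return value
        return None

# Made-up example values
points = PointData('mineral', {10.0: 'kaolinite', 10.5: 'chlorite'})
layers = IntervalData('group')
layers.add(0.0, 12.0, 'clay')
layers.add(12.0, 30.0, 'carbonate')

# A single feature for the whole borehole: an interval dataset with one member
whole_hole = IntervalData('hole_summary')
whole_hole.add(0.0, 30.0, 'example note')

print(points.value_at(10.5), points.value_at(11.0))  # chlorite None
print(layers.value_at(11.0), layers.value_at(12.0))  # clay carbonate
```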

In [None]:
bh.point_datasets

In [None]:
data = bh.point_datasets['PDP2C']
data.properties

In [None]:
data = data.to_dataframe()

In [None]:
data.head()

In [None]:
# Number of samples in each Grp1sTSAS class (DataFrame.ix was removed in pandas 1.0)
counts = data.groupby('Grp1sTSAS').size()
pie(asarray(counts), labels=counts.index);
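The grouping above counts how many samples fall into each `Grp1sTSAS` class. The pie chart then draws each wedge in proportion to that class's share of the total. Here is the same arithmetic using only the standard library. The class labels are made-up placeholders, not real NVCL output.

```python
from collections import Counter

# Made-up per-sample class labels standing in for the Grp1sTSAS column
labels = ['white mica', 'chlorite', 'white mica', 'carbonate', 'white mica', 'chlorite']

counts_demo = Counter(labels)             # samples per class
total = sum(counts_demo.values())
fractions = {cls: n / total for cls, n in counts_demo.items()}  # wedge shares

for cls, n in counts_demo.most_common():
    print(f'{cls}: {n} samples ({fractions[cls]:.0%})')
```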

In [None]:
bh.__dict__

OK, so let's do something useful with the borehole data: we'll recreate a basic visualisation portal in a few lines of Python. Here we're using Folium, a Python wrapper around Leaflet.js maps, with tiles from Mapbox.

In [None]:
# Loop through all boreholes at every NVCL endpoint, getting metadata but not analytes.
# Each endpoint is only asked for the borehole identifiers it serves.
collection = [ept.get_borehole(ident, get_analytes=False, raise_error=False)
              for ept in endpoints.values()
              for ident in ept.get_borehole_idents()]
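When you gather boreholes from several endpoints, ask each importer only for the identifiers it actually serves, and drop failed look-ups before mapping. Below is a sketch of that pattern using a hypothetical `FakeImporter` stand-in. It copies only the two method names used above and talks to no NVCL service. Having it return `None` on failure is an assumption about what `raise_error=False` does.

```python
class FakeImporter:
    """Hypothetical stand-in for an NVCL importer, backed by a local dict."""

    def __init__(self, boreholes):
        self._boreholes = boreholes  # ident -> (latitude, longitude), made up

    def get_borehole_idents(self):
        return list(self._boreholes)

    def get_borehole(self, ident, get_analytes=True, raise_error=True):
        # get_analytes is accepted only to mirror the call used above; it is unused
        if ident not in self._boreholes:
            if raise_error:
                raise KeyError(ident)
            return None
        return (ident, *self._boreholes[ident])

fake_endpoints = {
    'A': FakeImporter({'A-1': (-31.9, 115.8), 'A-2': (-32.0, 115.9)}),
    'B': FakeImporter({'B-1': (-34.9, 138.6)}),
}

# Pair each endpoint with its own identifiers, then drop any failed look-ups
demo_collection = [ept.get_borehole(ident, get_analytes=False, raise_error=False)
                   for ept in fake_endpoints.values()
                   for ident in ept.get_borehole_idents()]
demo_collection = [bh for bh in demo_collection if bh is not None]
print(len(demo_collection))  # 3
```

Pairing every identifier with every endpoint instead would double the number of requests here, and half of them would fail.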

In [None]:
# Generate the map
borehole_map = folium.Map(location=[-31.952199935913086, 115.86139678955078],
                          zoom_start=14, 
                          tiles='Mapbox',
                          API_key='jessrobertson.i9c8em78')

# Make a marker on the map for each borehole in the collection
for bh in collection:
    if bh:
        borehole_map.simple_marker((bh.origin_position.latitude, 
                                    bh.origin_position.longitude))
    
# Show the map in the notebook
embed_map(borehole_map)
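The map above is centred on fixed coordinates. An alternative, sketched here in plain Python, is to derive a centre and a bounding box from the borehole positions themselves. The `positions` list is made-up data. With the real collection you would build it from `bh.origin_position.latitude` and `bh.origin_position.longitude` for each loaded borehole.

```python
def centre_and_bounds(positions):
    """Return the mean (lat, lon) and [[south, west], [north, east]] bounds."""
    if not positions:
        raise ValueError('need at least one position')
    lats = [lat for lat, _ in positions]
    lons = [lon for _, lon in positions]
    # A simple mean is fine for Australian longitudes (no antimeridian crossing)
    centre = (sum(lats) / len(lats), sum(lons) / len(lons))
    bounds = [[min(lats), min(lons)], [max(lats), max(lons)]]
    return centre, bounds

# Made-up positions, not real NVCL boreholes
positions = [(-31.0, 115.0), (-33.0, 117.0), (-32.0, 119.0)]
centre, bounds = centre_and_bounds(positions)
print(centre, bounds)
```

Recent folium releases accept a bounds list in this `[[south, west], [north, east]]` form via `Map.fit_bounds`. The older folium API used in this notebook may behave differently, so check before relying on it.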