# Australian Geoscience Datacube API
This notebook describes how to connect to the datacube and run a basic query.

In [2]:
import datacube.api
from pprint import pprint

By default, the API will use the configured database connection found in the config file.

Details on setting up the config file and database can be found here:
http://agdc-v2.readthedocs.org/en/develop/db_setup.html

In [3]:
dc = datacube.api.API()

## Summary functions
* __`list_fields()`__ - lists all fields that can be used for searching
* __`list_field_values(field)`__ - lists all the values of the field found in the database

Find out what fields we can search:

In [4]:
dc.list_fields()

[u'product',
 u'lat',
 u'sat_path',
 u'platform',
 u'lon',
 u'orbit',
 'collection',
 u'instrument',
 u'sat_row',
 u'time',
 u'gsi',
 'id']

The `product` and `platform` fields look interesting. Find out more about them:

In [5]:
dc.list_field_values('product')

[u'NBAR']

In [6]:
dc.list_field_values('platform')

[u'LANDSAT_5']

## Access functions
There are several API calls that describe and provide data in different ways:

* __`get_descriptor()`__ - provides a description of the data for a given query
* __`get_data()`__ - provides the data as `xarray.DataArray`s for each variable.  This is usually called based on information returned by the `get_descriptor` call.
* __`get_data_array()`__ - returns an `xarray.DataArray` n-dimensional object, with the variables stacked along the dimension labelled `variable`.
* __`get_dataset()`__ - returns an `xarray.Dataset` object, containing an `xarray.DataArray` for each variable.

### get_descriptor
We can make a query to find out what data is available.

The query is a nested dict of search terms:

In [7]:
query = {
    'product': 'NBAR',
    'platform': 'LANDSAT_5',
}
pprint(dc.get_descriptor(query, include_storage_units=False))

{u'ls5_nbar': {'dimensions': [u'time', u'latitude', u'longitude'],
               'irregular_indices': {u'time': array(['1990-03-03T10:11:16.000000000+1100',
       '1990-05-06T09:10:28.000000000+1000',
       '1990-06-07T09:10:29.000000000+1000',
       '1990-07-25T09:10:22.000000000+1000'], dtype='datetime64[ns]')},
               'result_max': (numpy.datetime64('1990-07-25T09:10:22.000000000+1000'),
                              -33.000125,
                              151.999875),
               'result_min': (numpy.datetime64('1990-03-03T10:11:16.000000000+1100'),
                              -35.999874999999996,
                              148.000125),
               'result_shape': (4, 12000, 16000),
               'variables': {u'band_10': {'datatype_name': dtype('int16'),
                                          'nodata_value': -999},
                             u'band_20': {'datatype_name': dtype('int16'),
                                          'nodata_value': -999},
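
The descriptor fields can be used to reason about the grid before any data is loaded. As a rough worked example, the sketch below estimates the pixel spacing from `result_min`, `result_max` and `result_shape`. It assumes that the min and max values are the coordinates of the first and last pixel centres and that the grid is evenly spaced; neither assumption is confirmed by the output above. `pixel_spacing` is a hypothetical helper.

```python
# Values copied from the descriptor output above (time entries omitted).
dimensions = ['time', 'latitude', 'longitude']
result_shape = (4, 12000, 16000)
result_min = (None, -35.999874999999996, 148.000125)
result_max = (None, -33.000125, 151.999875)


def pixel_spacing(dim):
    """Coordinate step along `dim`, assuming evenly spaced pixel centres."""
    i = dimensions.index(dim)
    # N pixel centres are separated by N - 1 steps.
    return (result_max[i] - result_min[i]) / (result_shape[i] - 1)


lat_step = pixel_spacing('latitude')
lon_step = pixel_spacing('longitude')
print(lat_step, lon_step)
```

Under these assumptions both steps come out at about 0.00025 degrees, that is, 4000 pixels per degree.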

The query can be restricted to a particular range along a dimension.

For spatial queries, use the dimension names reported by the descriptor (here `latitude` and `longitude`).
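
A restricted query might be built as in the sketch below. The `dimensions` / `range` layout follows the `get_data` query used later in this notebook; that `get_descriptor` accepts the same layout is an assumption based on the two calls taking similar queries. `spatial_query` is a hypothetical helper, not part of the API.

```python
from pprint import pprint


def spatial_query(longitude, latitude, **search_terms):
    """Build a query dict restricted to a longitude/latitude box.

    `longitude` and `latitude` are (start, end) pairs in degrees; any extra
    keyword arguments (e.g. product, platform) become top-level search terms.
    """
    query = dict(search_terms)
    query['dimensions'] = {
        'longitude': {'range': tuple(longitude)},
        'latitude': {'range': tuple(latitude)},
    }
    return query


query = spatial_query((148.5, 149.5), (-34.8, -35.8),
                      product='NBAR', platform='LANDSAT_5')
pprint(query)
```

The resulting dict could then be passed on, for example as `dc.get_descriptor(query, include_storage_units=False)`.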

### get_data
This retrieves the data, usually as a subset, based on the information provided by the `get_descriptor` call.

The query is in a similar form to the `get_descriptor` call, with the addition of a `variables` parameter.  If not specified, all variables are returned.
The query also accepts an `array_range` parameter on a dimension, which selects a subset by array indices rather than by labelled coordinates.

In [9]:
query = {
    'product': 'NBAR',
    'platform': 'LANDSAT_5',
    'variables': ['band_30', 'band_40'],
    'dimensions': {
        'longitude' : {
            'range': (148.5, 149.5),
        },
        'latitude' : {
            'range': (-34.8, -35.8),
        },
    }
}
data = dc.get_data(query)
pprint(data)
data['arrays']['band_30']

KeyError: 'coordinate_reference_system'

### get_data_array
This is a convenience function that wraps the `get_data` function, returning only the data, stacked in a single `xarray.DataArray`.

The variables are stacked along the `variable` dimension.

In [10]:
nbar = dc.get_data_array(product='NBAR', platform='LANDSAT_5', latitude=(-34.95,-35.05), longitude=(148.95,149.05))
nbar

<xarray.DataArray u'ls5_nbar' (variable: 6, time: 4, latitude: 400, longitude: 400)>
dask.array<concate..., shape=(6, 4, 400, 400), dtype=float64, chunksize=(1, 1, 200, 200)>
Coordinates:
  * time       (time) datetime64[ns] 1990-03-02T23:11:16 1990-05-05T23:10:28 ...
  * latitude   (latitude) float64 -34.95 -34.95 -34.95 -34.95 -34.95 -34.95 ...
  * longitude  (longitude) float64 149.0 149.0 149.0 149.0 149.0 149.0 149.0 ...
  * variable   (variable) <U7 u'band_10' u'band_20' u'band_30' u'band_40' ...

In [11]:
nbar['variable'].values

array([u'band_10', u'band_20', u'band_30', u'band_40', u'band_50',
       u'band_70'], 
      dtype='<U7')

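We can select individual bands by label and combine them, for example to compute NDVI from `band_30` and `band_40`: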

In [14]:
red = nbar.sel(variable='band_30')
nir = nbar.sel(variable='band_40')

In [15]:
ndvi = (nir - red) / (nir + red)
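
As a quick sanity check of the formula, the scalar sketch below applies the same expression to a single pixel. The reflectance values are made up for illustration, and `ndvi_value` is a hypothetical helper; it makes the zero-denominator case explicit by returning NaN.

```python
import math


def ndvi_value(nir, red):
    """NDVI for one pixel: (nir - red) / (nir + red), NaN if the sum is zero."""
    total = nir + red
    if total == 0:
        return float('nan')
    return (nir - red) / float(total)


# Illustrative (made-up) values: strong near-infrared, weak red.
sample = ndvi_value(3000, 500)
print(sample)
```

For non-negative inputs the result always lies between -1 and 1, with larger values where near-infrared reflectance dominates.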

In [16]:
nbar.merge(ndvi)

AttributeError: 'DataArray' object has no attribute 'merge'

In [None]:
import xarray

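In [None]:
# Give NDVI a length-1 `variable` dimension labelled 'ndvi' so it lines up with the bands.
ndvi = ndvi.expand_dims('variable').assign_coords(variable=['ndvi'])

In [None]:
# DataArray has no merge(); concatenate along the existing `variable` dimension instead.
xarray.concat([nbar, ndvi], dim='variable')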