# Australian Geoscience Datacube API
This notebook describes connecting to the datacube and running a basic query.

## Set up environment (optional; use your own hacks if you have them)

To make the code work, you first have to set up the necessary environment.

- There are different ways (such as module load) to do this. 
- Below is a more explicit way of checking and setting environment parameters.
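
The same checks can also be made from Python itself, without shell escapes. The following is a minimal sketch using only the standard library; `describe_environment` is a name made up for this example and is not part of the datacube API.

```python
from __future__ import print_function

import os
import sys


def describe_environment(names=("PYTHONPATH", "GDAL_DATA")):
    """Return the interpreter path and selected environment variables.

    Hypothetical helper, not a datacube function.
    Variables that are not set are reported as None.
    """
    info = {"python": sys.executable}
    for name in names:
        info[name] = os.environ.get(name)
    return info


env_info = describe_environment()
for key in sorted(env_info):
    print("{0}: {1}".format(key, env_info[key]))
```

A value of `None` for `GDAL_DATA` or `PYTHONPATH` simply means the variable is not set in the process running the notebook.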


In [None]:
# Check which Python interpreter the IPython notebook is using, along with PYTHONPATH and GDAL_DATA.
!echo "PYTHONPATH: $PYTHONPATH || GDAL:  $GDAL_DATA"

!which python

In [None]:
import sys

def setup_env(agdc2dev):
    """Prepend the agdc-v2 source path so that it takes precedence on import."""
    sys.path.insert(0, agdc2dev)

    # Single-argument print() calls behave the same on Python 2 and 3.
    print("Now, your import search paths:")
    for p in sys.path:
        print(p)

In [None]:
# Call the setup_env() function if you want to use an updated agdc-v2 checkout instead of the canned module.
# Please change the path below.

# setup_env("/path2developer_branch/agdc-v2")

# After this you should be able to import the necessary modules from that path.

In [None]:
import datacube.api
from pprint import pprint

from IPython.display import display
from collections import defaultdict

import xarray as xr
import xarray.ufuncs

from datacube.api import API
from datacube.index import index_connect
from datacube.config import LocalConfig
from datacube.api._conversion import to_datetime
from datacube.api import make_mask, describe_flags

%matplotlib inline

In [None]:
# enable the configuration to use the right database and datastore dir
# By default, the API will use the configured database connection found in the config file.

# Details on setting up the config file and database can be found here:
# http://agdc-v2.readthedocs.org/en/develop/db_setup.html

force_prod = True
if force_prod:
    prod_config = LocalConfig.find(['/g/data/v10/public/modules/agdc-py2-prod/1.0.2/datacube.conf'])
    prod_index = index_connect(prod_config, application_name='api-WOfS-dev')
    dc = API(prod_index)
else:
    dc = API(application_name='api-WOfS-dev')

## Summary functions
* __`list_fields()`__ - lists all fields that can be used for searching
* __`list_field_values(field)`__ - lists all the values of the field found in the database
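
Conceptually, these two calls answer "which keys exist?" and "which values does a key take?". The sketch below mimics that behaviour on a small in-memory list of records. The records and both helper functions are invented for illustration, and they say nothing about how the datacube index implements the real calls.

```python
from __future__ import print_function

# Hypothetical dataset records standing in for entries in the datacube index.
RECORDS = [
    {"product": "nbar", "platform": "LANDSAT_5"},
    {"product": "nbar", "platform": "LANDSAT_7"},
    {"product": "pqa", "platform": "LANDSAT_5"},
]


def list_fields(records):
    """Return the sorted names of every field present in any record."""
    return sorted({name for record in records for name in record})


def list_field_values(records, field):
    """Return the sorted distinct values of `field` across the records."""
    return sorted({record[field] for record in records if field in record})


print(list_fields(RECORDS))
print(list_field_values(RECORDS, "platform"))
```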

Find out what fields we can search:

In [None]:
dc.list_fields()

The `product` and `platform` fields look interesting. Find out more about them:

In [None]:
dc.list_field_values('product')

In [None]:
dc.list_field_values('platform')

## Query and Access functions
There are several API calls that describe and provide data in different ways:

* __`get_descriptor()`__ - provides a description of the data for a given query.
* __`get_data()`__ - provides the data as `xarray.DataArray`s for each variable.  This is usually called based on information returned by the `get_descriptor` call.
* __`get_data_array()`__ - returns an n-dimensional `xarray.DataArray` object, with the variables stacked along the dimension labelled `variable`.
* __`get_dataset()`__ - returns an `xarray.Dataset` object, containing an `xarray.DataArray` for each variable.

### get_descriptor
We can make a query to find out about the data.

The query is a nested dict of search terms:

In [None]:
query = {
    'product': 'nbar',
    'platform': 'LANDSAT_5',
}
descriptor = dc.get_descriptor(query, include_storage_units=False)
pprint(descriptor)

The query can be restricted to provide information on a particular range along a dimension.

For spatial queries, the dimension names should be used.  The default projection for the range query values is WGS84, although a different coordinate reference system can be given for a dimension.
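
Because the query is an ordinary nested dict, it can also be assembled programmatically. The helper below is a hypothetical convenience (`build_query` is not part of the datacube API) that produces the same structure as the hand-written queries in this notebook.

```python
def build_query(product, platform, **ranges):
    """Build a nested query dict from keyword ranges.

    Hypothetical helper: each keyword (for example x, y or time) becomes an
    entry under 'dimensions' holding a 'range' tuple.
    """
    query = {"product": product, "platform": platform}
    if ranges:
        query["dimensions"] = {
            name: {"range": tuple(bounds)} for name, bounds in ranges.items()
        }
    return query


query = build_query(
    "nbar",
    "LANDSAT_5",
    x=(148.5, 149.5),
    y=(-34.8, -35.8),
    time=((1990, 6, 1), (2020, 7, 1)),
)
```

Keeping the dimension names as keyword arguments means the helper does not need to know which dimensions a product has.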

In [None]:
query = {
    'product': 'nbar',
    'platform': 'LANDSAT_5',
    'dimensions': {
        'x' : {
            'range': (148.5, 149.5),
        },
        'y' : {
            'range': (-34.8, -35.8),
        },
        'time': {
            'range': ((1990, 6, 1), (2020, 7, 1)),
        }
    }
}
pprint(dc.get_descriptor(query, include_storage_units=False))

A coordinate reference system can be provided for the spatial dimensions, either as an EPSG code or a WKT description:
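
Since the `crs` entry sits next to `range` inside each spatial dimension, an existing query can be tagged with a CRS without retyping it. This is a small sketch, assuming the `x` and `y` dimension names used in this notebook. `with_crs` is a hypothetical helper; it only annotates the query and does not reproject the range values.

```python
import copy


def with_crs(query, crs, spatial_dims=("x", "y")):
    """Return a copy of `query` with `crs` set on each spatial dimension.

    Hypothetical helper, not part of the datacube API. The range values are
    left untouched, so they must already be expressed in `crs`.
    """
    tagged = copy.deepcopy(query)
    for name in spatial_dims:
        if name in tagged.get("dimensions", {}):
            tagged["dimensions"][name]["crs"] = crs
    return tagged


base = {
    "product": "nbar",
    "platform": "LANDSAT_5",
    "dimensions": {
        "x": {"range": (1542112, 1563962)},
        "y": {"range": (-3920000.5, -3926000.5)},
        "time": {"range": ((1990, 6, 1), (1990, 7, 1))},
    },
}
albers_query = with_crs(base, "EPSG:3577")
```

The deep copy keeps `base` unchanged, so the same starting query can be reused with different CRS tags.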

In [None]:
query = {
    'product': 'nbar',
    'platform': 'LANDSAT_5',
    'dimensions': {
        'x' : {
            'range': (1542112, 1563962),
            'crs': 'EPSG:3577',
        },
        'y' : {
            'range': (-3920000.5, -3926000.5),
            'crs': 'EPSG:3577',
        },
        'time': {
            'range': ((1990, 6, 1), (1990, 7, 1)),
        }
    }
}

### get_data
This retrieves the data, usually as a subset, based on the information provided by the `get_descriptor` call.

The query is in a similar form to the `get_descriptor` call, with the addition of a `variables` parameter.  If not specified, all variables are returned.
The query also accepts an `array_range` parameter on a dimension that provides a subset based on array indices, rather than labelled coordinates.
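
The difference between the two kinds of subset can be illustrated on a plain list of coordinate labels. This sketch is not the datacube implementation. The helper names are invented, and it assumes that `range` bounds are inclusive and may be given in either order, and that `array_range` behaves like a half-open Python slice.

```python
from __future__ import print_function


def select_by_range(coords, bounds):
    """Keep coordinates whose label lies within `bounds` (assumed inclusive)."""
    low, high = min(bounds), max(bounds)
    return [value for value in coords if low <= value <= high]


def select_by_array_range(coords, index_bounds):
    """Keep coordinates by position, assuming half-open slice semantics."""
    start, stop = index_bounds
    return coords[start:stop]


# Hypothetical latitude labels, ordered north to south.
latitudes = [-34.5, -35.0, -35.5, -36.0]

by_label = select_by_range(latitudes, (-34.8, -35.8))
by_index = select_by_array_range(latitudes, (0, 1))
print(by_label)
print(by_index)
```

Label-based selection depends on the coordinate values, while index-based selection depends only on position, which is why the two calls above return different elements of the same list.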

In [None]:
query = {
    'product': 'nbar',
    'platform': 'LANDSAT_5',
    'variables': ['band_3', 'band_4'],
    'dimensions': {
        'x' : {
            'range': (148.5, 149.5),
            'array_range': (0, 1),
        },
        'y' : {
            'range': (-34.8, -35.8),
            'array_range': (0, 1),
        },
        'time': {
            'range': ((1990, 4, 1), (2020, 5, 1))
        }
    }
}
data = dc.get_data(query)
data.keys()

### get_data_array
This is a convenience function that wraps the `get_data` function, returning only the data, stacked in a single `xarray.DataArray`.

The variables are stacked along the `variable` dimension.

In [None]:
nbar = dc.get_data_array(product='nbar', platform='LANDSAT_5', y=(-34.95,-35.05), x=(148.95,149.05))
nbar

### get_dataset
This is a convenience function similar to `get_data_array`, returning the data of the query as an `xarray.Dataset` object (similar in structure to a NetCDF file).

In [None]:
dc.get_dataset(product='nbar', platform='LANDSAT_5', y=(-34.95,-35.05), x=(148.95,149.05))