# Find_Ls7_Pq_Albers_Metadata

Authors: Claire Krause, Sivaprasad Arapaut. 
Last edited: April 28, 2018

This notebook demonstrates how to find the definition, metadata and measurements of the datasets held within the AGDC (Australian Geoscience Data Cube). The code applies to any product in the datacube; only the one line given below needs to change.

>dataset = dc.index.products.get_by_name('**ls7_pq_albers**')

## Step-by-step instructions
Let us go through the process step by step.

### 1. Import modules and libraries
The main module we use here is **datacube**. The **pandas** module is used for convenience, to re-format the data.

The function **print_it()** prints the data in an even more human-readable format than **pandas** does, and also covers the cases where pandas is unable to re-format the data.

In [1]:
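import datacube
import pandas

def print_it(key, item):
    """Print a key/value pair; recurse into dictionaries, printing each nested key as a heading."""
    label = str(key).title()  # str() guards against non-string keys
    if isinstance(item, dict):
        print("{}:".format(label))
        for sub_key in item:
            print_it(sub_key, item[sub_key])
    else:
        print("    {}: {}\n".format(label, item))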
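
The output of `print_it` does not show how deeply a key is nested. As a minimal sketch (not part of the original notebook), the hypothetical helper `format_nested` below indents each level by four spaces and returns the lines instead of printing them. The sample dictionary copies one flag entry from the measurement output shown later in this notebook.

```python
def format_nested(key, item, depth=0):
    """Return a list of text lines for `item`, indented one step per nesting level."""
    pad = "    " * depth
    label = str(key).title()
    if isinstance(item, dict):
        lines = ["{}{}:".format(pad, label)]
        for sub_key, sub_item in item.items():
            lines.extend(format_nested(sub_key, sub_item, depth + 1))
        return lines
    return ["{}{}: {}".format(pad, label, item)]


# One flag entry, copied from the ls7_pq_albers measurement output
sample = {"bits": 9, "values": {"0": "sea", "1": "land"}, "description": "Land or Sea"}

for line in format_nested("land_sea", sample):
    print(line)
```

Returning the lines rather than printing them makes the helper easy to reuse, for example to write the listing to a file.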

***Modules and Classes explained***

**datacube:** This is the main module used in this demo. It contains several classes for extracting data, of which we use only one, **Datacube**. To see every name the module exposes (classes, functions and submodules), use the `print(dir(datacube))` statement.

How every class in the datacube module works is beyond the scope of this document. See [Datacube Class](https://datacube-core.readthedocs.io/en/stable/dev/api/api.html#datacube-class) for details. You might also want to look at [this page](https://softwareengineering.stackexchange.com/questions/329348/classes-vs-modules-in-python) to learn the difference between a module and a class.
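
`dir()` returns every name in a module, not only its classes. The sketch below shows one way the list could be narrowed to classes with the standard `inspect` module. The `json` module is used purely as a stand-in so that the example runs where datacube is not installed, and `list_classes` is a hypothetical helper name.

```python
import inspect
import json  # stand-in module; pass `datacube` instead where it is installed


def list_classes(module):
    """Return the sorted names of the classes visible in `module`."""
    return sorted(name for name, _ in inspect.getmembers(module, inspect.isclass))


print(list_classes(json))
```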

> Class: **Datacube**

> Methods: **list_products**, **list_measurements**

> Usage: dc = datacube.Datacube(app='Some descriptive name')

In the above call, the class 'Datacube' is instantiated with one parameter, **app**, a user-defined alphanumeric name that identifies this application. The application name is used to track down problems with database queries, so it is strongly advised to supply it. The datacube documentation describes it as required when no index is supplied, as in the above call, and ignored when an index is given; in practice, omitting it does not raise an error.

We use a number of classes and methods from the datacube module as shown below.

- Module **datacube** -> class '**Datacube**' -> (attribute '**index**' -> attribute '**products**' -> method '**get_by_name**', method '**list_products**' and method '**list_measurements**')

These calls list the products and their data, as shown in the code blocks below. They let you find out which products and measurements are available and then retrieve the corresponding details.

**pandas** is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. It is used here to re-format the data into a more human-readable format.

In [2]:
dc = datacube.Datacube(app='dc-metadata')

The above statement creates an object named **dc**, an instance of the class **Datacube**. We will reach other objects and methods through it later. The name 'dc-metadata' is not entirely accurate, as we are going to look at more than metadata, but it serves to identify this application: errors in execution, if any, will be logged under that name. As noted above, the call also works without it.

The list of available methods in **dc** is given below; the ones we use in this document are marked in bold.

> ['**__init__**', 'close', 'create_storage', 'find_datasets', 'find_datasets_lazy', 'group_datasets', '**index**', '**list_measurements**', '**list_products**', 'load', 'load_data', 'measurement_data', 'product_data', 'product_observations', 'product_sources']

- **__init__**([index, config, app, env, validate_connection]):   *Create the interface for the query and storage.* We have already called this method when creating the object. There will be no error even if you do not provide any of the parameters.

- **index**: This attribute gives access to the datacube index, through which we retrieve the details of a specified product.

- **list_products**(show_archived=False, with_pandas=True): *List products in the datacube.* We will call this in the next block of code to get a list of all products. The 'with_pandas=True' default means the data will appear in a table format. If you want the raw data instead, call '*dc.list_products(with_pandas=False)*'. You can then write your own functions to format it in any way you like, extract subsets of the data, etc.

- **list_measurements**(show_archived=False, with_pandas=True): *List measurements for each product.* We will call this method in a code block below to get the measurements for a defined product.

Parameters for the above two methods:

   - show_archived – If set to True, include products that have been archived.
   - with_pandas – If True (default), return the list as a Pandas DataFrame, otherwise as a list of dictionaries.

Return type:	`pandas.DataFrame or list(dict)`
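
When `with_pandas=False` is used, the result can be processed with ordinary Python. The following is a minimal sketch, assuming each product is returned as a dictionary keyed by column name and that fields a product does not define may be absent. `pick_columns` is a hypothetical helper, and the sample rows are trimmed copies of entries from the product table shown later in this notebook.

```python
def pick_columns(rows, columns):
    """Return one tuple per row holding only the requested columns (None if absent)."""
    return [tuple(row.get(col) for col in columns) for row in rows]


# Sample rows shaped like the assumed output of dc.list_products(with_pandas=False)
rows = [
    {"id": 36, "name": "bom_rainfall_grids", "platform": "BoM", "crs": "EPSG:4326"},
    {"id": 63, "name": "high_tide_comp_20p", "crs": "EPSG:3577"},  # no platform recorded
]

for name, platform in pick_columns(rows, ["name", "platform"]):
    print("{} | {}".format(name, platform))
```

Using `dict.get` rather than indexing means a product without a given field yields `None` instead of raising an error.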

**pandas.DataFrame**: [Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html) is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. 'DataFrame' is a [pandas class](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame) that holds data, for example from a dictionary, in a human-readable table with row and column headings. By default, the two methods we use return the data in this format.

In [3]:
def custom_list(items):
    """Print selected fields of each product, one numbered line per product."""
    print('Id | Name | Description | Product_type | Platform | CRS | Resolution | Spatial_dimensions\n\
------------------------------------------------------------------------------------')
    for n, i in enumerate(items, start=1):
        try:
            print("{}. {}  |  {}  |  {}  |  {}  |  {}  |  {} | {} | {}\n".format(n, i['id'], i['name'], i['description'], i['product_type'], i['platform'], i['crs'], i['resolution'], i['spatial_dimensions']))
        except KeyError:
            # Some products do not define every field requested above
            print("{}. ERROR: id {} is missing one or more fields (NaN in the table)\n".format(n, i['id']))
dc.list_products()  # List the products in a table format
#custom_list(dc.list_products(with_pandas=False)) # List selected columns alone from the table.


| id | name | description | sat_path | instrument | gqa_cep90 | gqa_mean_xy | gqa_ref_source | platform | time | gqa_iterative_mean_xy | ... | lat | format | gqa_error_message | gqa_abs_xy | orbit | lon | crs | resolution | tile_size | spatial_dimensions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36 | bom_rainfall_grids | Interpolated Rain Gauge Precipitation 1-Day Au... | | rain gauge | | | | BoM | | | ... | | NETCDF | | | | | EPSG:4326 | [-0.05, 0.05] | | (latitude, longitude) |
| 32 | dsm1sv10 | DSM 1sec Version 1.0 | | SIR | | | | SRTM | | | ... | | ENVI | | | | | EPSG:4326 | [-0.00027777777778, 0.00027777777778] | | (latitude, longitude) |
| 53 | gamma_ray | The 2015 radiometric or gamma-ray grid of Aust... | | gamma_ray spectrometer | | | | aircraft | | | ... | | NETCDF | | | | | GEOGCS["GEOCENTRIC DATUM of AUSTRALIA",DATUM["... | [-0.001, 0.001] | | (latitude, longitude) |
| 63 | high_tide_comp_20p | High Tide 20 percentage composites for entire ... | | | | | | | | | ... | | NetCDF | | | | | EPSG:3577 | [-25, 25] | | (y, x) |
| 64 | high_tide_comp_count | High Tide 20 percentage pixel count | | | | | | | | | ... | | NetCDF | | | | | EPSG:3577 | [-25, 25] | | (y, x) |
| 67 | item_v2 | Intertidal Extents Model | | | | | | | | | ... | | NetCDF | | | | | EPSG:3577 | [-25, 25] | | (y, x) |
| 68 | item_v2_conf | Average ndwi Standard Deviation | | | | | | | | | ... | | NetCDF | | | | | EPSG:3577 | [-25, 25] | | (y, x) |
| 65 | low_tide_comp_20p | Low Tide 20 percentage composites for entire c... | | | | | | | | | ... | | NetCDF | | | | | EPSG:3577 | [-25, 25] | | (y, x) |
| 66 | low_tide_comp_count | Low Tide 20 percentage pixel count | | | | | | | | | ... | | NetCDF | | | | | EPSG:3577 | [-25, 25] | | (y, x) |
| 69 | ls5_fc_albers | Landsat 5 Fractional Cover 25 metre, 100km til... | | TM | | | | LANDSAT_5 | | | ... | | NetCDF | | | | | EPSG:3577 | [-25, 25] | [100000.0, 100000.0] | (y, x) |


In the list above, we are currently interested only in the second column, **name**. These are the names of the products in the datacube. In the code sections below we will examine these products.

You may want to look at the other column headings too, which give more information about where the data comes from, how it was collected, etc. In particular, the value in the **crs** column (Coordinate Reference System, also known as Spatial Reference System) tells you which coordinate system the data is stored in and, by implication, the area it covers. For example, **EPSG:3577** (Australian Albers) is defined for Australia alone, whereas **EPSG:4326** covers the whole world and **EPSG:3857** covers the world excluding the polar regions.
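
To see at a glance which products share a coordinate reference system, the product names can be grouped by their CRS value. This is only an illustrative sketch: `group_by_crs` is a hypothetical helper, and the (name, crs) pairs are copied by hand from the table above rather than read from the datacube.

```python
from collections import defaultdict

# (name, crs) pairs copied from the product table above
products = [
    ("bom_rainfall_grids", "EPSG:4326"),
    ("dsm1sv10", "EPSG:4326"),
    ("item_v2", "EPSG:3577"),
    ("ls5_fc_albers", "EPSG:3577"),
]


def group_by_crs(pairs):
    """Collect product names under the CRS they are stored in."""
    groups = defaultdict(list)
    for name, crs in pairs:
        groups[crs].append(name)
    return dict(groups)


for crs, names in group_by_crs(products).items():
    print(crs, "->", ", ".join(names))
```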

If you wish to display only selected columns from the returned data, uncomment the statement that calls 'custom_list' in the code block above and comment out the other.

**Tip:** The list is too long and wide to display on one page. To limit the displayed list to a few lines, click anywhere in the first column of the output (i.e. anywhere below the text saying '**In[nnn]**' or '**Out[nnn]**').

In [4]:
dc.list_measurements()

| product | measurement | aliases | dtype | flags_definition | name | nodata | spectral_definition | units |
|---|---|---|---|---|---|---|---|---|
| bom_rainfall_grids | rainfall | | float32 | | rainfall | -999 | | mm |
| dsm1sv10 | elevation | | float32 | | elevation | | | metre |
| gamma_ray | rad_air_dose_rate_unfiltered | | float32 | | rad_air_dose_rate_unfiltered | -99999 | | nG/h |
| gamma_ray | rad_k_equiv_conc_unfiltered | | float32 | | rad_k_equiv_conc_unfiltered | -99999 | | %K |
| gamma_ray | rad_u_equiv_conc_unfiltered | | float32 | | rad_u_equiv_conc_unfiltered | -99999 | | ppm |
| gamma_ray | rad_th_equiv_conc_unfiltered | | float32 | | rad_th_equiv_conc_unfiltered | -99999 | | ppm |
| gamma_ray | rad_air_dose_rate_filtered | | float32 | | rad_air_dose_rate_filtered | -99999 | | nG/h |
| gamma_ray | rad_k_equiv_conc_filtered | | float32 | | rad_k_equiv_conc_filtered | -99999 | | %K |
| gamma_ray | rad_u_equiv_conc_filtered | | float32 | | rad_u_equiv_conc_filtered | -99999 | | ppm |
| gamma_ray | rad_th_equiv_conc_filtered | | float32 | | rad_th_equiv_conc_filtered | -99999 | | ppm |


As in the case of **list_products**, this returns the details corresponding to each product. The products are listed in column 1 and their measurements in column 2. The remaining columns give each measurement's aliases, data type, flag definitions, nodata value, spectral definition and units.

## Let us investigate a specified product's dataset in more detail

In the code block below we create an object, **dataset**, for a desired product from the above list. **Tip:** You can replace the product name in brackets with any other to examine its data. 

In [5]:
# Choose a dataset to investigate
dataset = dc.index.products.get_by_name('ls7_pq_albers')

In the code above, we create the **dataset** object by following a chain of three objects and one method.

> Object **dc** -> attribute **index** -> attribute **products** -> method **get_by_name**

The returned object, **dataset**, can be used to display details pertaining to the product.
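
Because only the product name changes, the same chain can be applied to several products in a loop. The following is a sketch under stated assumptions: `describe_products` is a hypothetical helper, it assumes that `get_by_name` returns `None` for an unknown name, and a stand-in object built from `SimpleNamespace` replaces the real `dc` so that the example runs without a database connection.

```python
from types import SimpleNamespace


def describe_products(dc, names):
    """Map each product name to its description, or None if the product is not found."""
    descriptions = {}
    for name in names:
        product = dc.index.products.get_by_name(name)
        # Assumption: an unknown name yields None rather than raising an error
        descriptions[name] = None if product is None else product.definition["description"]
    return descriptions


# Stand-in for a real Datacube object (hypothetical, for illustration only)
catalogue = {
    "ls7_pq_albers": SimpleNamespace(
        definition={"description": "Landsat 7 Pixel Quality 25 metre, 100km tile, "
                                   "Australian Albers Equal Area projection (EPSG:3577)"}
    )
}
fake_dc = SimpleNamespace(
    index=SimpleNamespace(products=SimpleNamespace(get_by_name=catalogue.get))
)

result = describe_products(fake_dc, ["ls7_pq_albers", "no_such_product"])
for name, description in result.items():
    print("{}: {}".format(name, description))
```

With a live connection, the object created by `datacube.Datacube(app=...)` would be passed in place of `fake_dc`.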

In [6]:
# Get some information about the data itself
raw = dataset.measurements
formatted_data = pandas.DataFrame.from_dict(raw)

# Uncomment one of the two statements below to display the data as raw (dict) or formatted
#print(raw)   # Display the raw data in dictionary format
#print(formatted_data)  # Print it as formatted

# Function to display it in more human readable format
for key in raw:
    print_it(key, raw[key])

# Or, see it displayed in the default format
#dataset.measurements


Pixelquality:
    Name: pixelquality

    Dtype: int16

    Units: 1

    Nodata: 0

    Aliases: ['qa_flags', 'quality']

Flags_Definition:
Land_Sea:
    Bits: 9

Values:
    0: sea

    1: land

    Description: Land or Sea

Cloud_Acca:
    Bits: 10

Values:
    0: cloud

    1: no_cloud

    Description: Cloud (ACCA)

Contiguous:
    Bits: 8

Values:
    0: False

    1: True

    Description: All bands for this pixel contain non-null values

Cloud_Fmask:
    Bits: 11

Values:
    0: cloud

    1: no_cloud

    Description: Cloud (Fmask)

Ga_Good_Pixel:
    Bits: [13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Values:
    16383: True

    Description: Best Quality Pixel

Nir_Saturated:
    Bits: 3

Values:
    0: True

    1: False

    Description: NIR band is saturated

Red_Saturated:
    Bits: 2

Values:
    0: True

    1: False

    Description: Red band is saturated

Blue_Saturated:
    Bits: 0

Values:
    0: True

    1: False

    Description: Blue band is saturated

Green_

Here, we are getting the measurements for the product specified in the code above. The data is a dictionary that can be printed as is, which suits further processing by a program, or formatted for human readers. The **pandas DataFrame**, described earlier, re-formats the data into rows and columns. You can try both by commenting/uncommenting the two 'print' statements in the code above.

Alternatively, the small Python function 'print_it', written specifically for this application, formats the data in an even more readable way. You can change this function to suit your taste, whereas the pandas output follows the library's own table layout.
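
The `flags_definition` in the output above assigns each flag a bit position in the 16-bit `pixelquality` value. The sketch below illustrates one way to read single-bit flags from a pixel value. It assumes that bit 0 is the least significant bit (consistent with `ga_good_pixel`, where bits 0 to 13 all set give 16383), and `decode_flags` is a hypothetical helper, not a datacube function. The flag entries are copied from the output above.

```python
# Single-bit flags copied from the ls7_pq_albers flags_definition
flags = {
    "contiguous": {"bits": 8, "values": {"0": False, "1": True}},
    "land_sea": {"bits": 9, "values": {"0": "sea", "1": "land"}},
    "cloud_acca": {"bits": 10, "values": {"0": "cloud", "1": "no_cloud"}},
    "cloud_fmask": {"bits": 11, "values": {"0": "cloud", "1": "no_cloud"}},
}


def decode_flags(pixel, flags_definition):
    """Translate the single-bit flags of one pixel value into their labels."""
    decoded = {}
    for name, spec in flags_definition.items():
        bit = (pixel >> spec["bits"]) & 1  # assumes bit 0 is the least significant bit
        decoded[name] = spec["values"][str(bit)]
    return decoded


best = 16383                    # every bit from 0 to 13 set
cloudy = best & ~(1 << 10)      # same pixel with the ACCA bit cleared

print(decode_flags(best, flags))
print(decode_flags(cloudy, flags))
```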

In [7]:
# Get some basic metadata
raw = dataset.metadata_doc
formatted_data = pandas.DataFrame.from_dict(raw)
#print(raw)   # Display the raw data in dictionary format
print(formatted_data)  # Print it as formatted


      format instrument   platform product_type
code     NaN        NaN  LANDSAT_7          pqa
name  NetCDF        ETM        NaN          pqa


Here, we are getting the metadata for the product specified in the code above. The `from_dict` method of **pandas DataFrame** re-formats the data into rows and columns; NaN marks a cell for which the metadata holds no value. You can try both by commenting/uncommenting the two 'print' statements in the code above.

In [8]:
raw = dataset.definition
#formatted_data = pandas.DataFrame.from_dict(dataset.definition)
#print(formatted_data)  # Print it as formatted

for key in raw:
    print_it(key, raw[key])

# Or, see it displayed in the default format
#dataset.definition

    Name: ls7_pq_albers

    Managed: True

Storage:
    Crs: EPSG:3577

Tile_Size:
    X: 100000.0

    Y: 100000.0

Resolution:
    X: 25

    Y: -25

Metadata:
Format:
    Name: NetCDF

Platform:
    Code: LANDSAT_7

Instrument:
    Name: ETM

    Product_Type: pqa

    Description: Landsat 7 Pixel Quality 25 metre, 100km tile, Australian Albers Equal Area projection (EPSG:3577)

    Measurements: [{'name': 'pixelquality', 'dtype': 'int16', 'units': '1', 'nodata': 0, 'aliases': ['qa_flags', 'quality'], 'flags_definition': {'land_sea': {'bits': 9, 'values': {'0': 'sea', '1': 'land'}, 'description': 'Land or Sea'}, 'cloud_acca': {'bits': 10, 'values': {'0': 'cloud', '1': 'no_cloud'}, 'description': 'Cloud (ACCA)'}, 'contiguous': {'bits': 8, 'values': {'0': False, '1': True}, 'description': 'All bands for this pixel contain non-null values'}, 'cloud_fmask': {'bits': 11, 'values': {'0': 'cloud', '1': 'no_cloud'}, 'description': 'Cloud (Fmask)'}, 'ga_good_pixel': {'bits': [13, 12, 11, 10

Here, we are getting the definition for the product specified in the code above. The data is a dictionary, as in the case of metadata. The **pandas DataFrame** is unusable in this case, probably because the definition mixes plain values, nested dictionaries and a list, which do not map cleanly onto rows and columns. Hence, we use the Python 'print_it' function defined in the first code block.
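
One way to make such a nested document easier to tabulate is to flatten it into a single level with dotted keys. This is a minimal sketch, not part of the original notebook: `flatten` is a hypothetical helper, and the sample is a trimmed dictionary shaped like the definition output above.

```python
def flatten(doc, prefix=""):
    """Flatten nested dictionaries into one dict whose keys are dotted paths."""
    flat = {}
    for key, value in doc.items():
        path = "{}.{}".format(prefix, key) if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Trimmed sample shaped like the ls7_pq_albers definition
definition = {
    "name": "ls7_pq_albers",
    "storage": {"crs": "EPSG:3577", "resolution": {"x": 25, "y": -25}},
    "metadata": {"platform": {"code": "LANDSAT_7"}, "product_type": "pqa"},
}

for path, value in flatten(definition).items():
    print("{}: {}".format(path, value))
```

Lists, such as the measurements entry, are left untouched by this sketch and would need separate handling.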

**The code blocks above, and their descriptions, should let you examine any product in the datacube. Enjoy!**