Update docs and terminology for current system
jeremyh committed Apr 5, 2017
1 parent ee52a89 commit 8d896f9
Showing 6 changed files with 63 additions and 42 deletions.
5 changes: 3 additions & 2 deletions docs/about/glossary.rst
@@ -6,7 +6,8 @@ Glossary
 .. glossary::
 
    AGDC
-      The Australian Geoscience Data Cube
+      The Australian Geoscience Data Cube, an Australian implementation of the
+      ODC.
 
    API
       The Data Cube Application Programming Interface gives programmers full
@@ -29,7 +30,7 @@ Glossary
       Open Data Cube
 
    PostgreSQL
-      The high performance database engine used as an index of Dataset by the
+      The high performance database engine used as an index of Datasets by the
       Data Cube. It is both a relational and document database, and the Data
       Cube schema makes use of both of these capabilities.
 
31 changes: 23 additions & 8 deletions docs/ops/config.rst
@@ -5,8 +5,8 @@ See also :ref:`create-configuration-file` for the datacube config file.

 .. _product-doc:
 
-Product
--------
+Product definition
+------------------
 Product description document defines some of the metadata common to all the datasets belonging to the products.
 It also describes the measurements that product has and some of the properties of the measurements.
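
(For orientation: a minimal product definition in the format these docs describe might look like the sketch below; the product name, band and values are hypothetical, not part of this commit.)

.. code-block:: yaml

    name: example_surface_reflectance
    description: A hypothetical surface reflectance product
    metadata_type: eo
    metadata:
        platform:
            code: LANDSAT_8
        product_type: surface_reflectance
    measurements:
        - name: red
          aliases: [band_4]
          dtype: int16
          nodata: -999
          units: '1'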

@@ -81,8 +81,8 @@ measurements
 .. _dataset-metadata-doc:
 
-Dataset
--------
+Dataset metadata document
+-------------------------
 Dataset document defines critical metadata of the dataset such as:
 
 - measurements
@@ -217,8 +217,8 @@ lineage
 .. _ingestion-config:
 
-Metadata Type
--------------
+Metadata Type Definition
+------------------------
 Metadata Type document defines searchable bits of metadata within `Dataset`_ documents.
 
 Ingestion Config
@@ -323,10 +323,25 @@ Runtime Config

 Runtime Config document specifies database connection configuration options:
 
+This is loaded from the following locations in order, if they exist, with properties from latter files
+overriding those in earlier ones:
+
+* /etc/datacube.conf
+* $DATACUBE_CONFIG_PATH
+* ~/.datacube.conf
+* datacube.conf
 
 .. code-block:: text
 
     [datacube]
-    db_hostname: 127.0.0.1
     db_database: datacube
-    db_username: cubeuser
+
+    # A blank host will use a local socket. Specify a hostname (such as localhost) to use TCP.
+    db_hostname:
+
+    # Credentials are optional: you might have other Postgres authentication configured.
+    # The default username is the current user id
+    # db_username:
+    # A blank password will fall back to default postgres driver authentication, such as reading your ~/.pgpass file.
+    # db_password:
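
(The precedence above can be made concrete with Python's standard configparser; this is a sketch of such layered loading, not necessarily the Data Cube's actual implementation.)

.. code-block:: python

    import os
    from configparser import ConfigParser

    # Lowest to highest precedence; files that don't exist are skipped.
    candidates = [
        "/etc/datacube.conf",
        os.environ.get("DATACUBE_CONFIG_PATH", ""),
        os.path.expanduser("~/.datacube.conf"),
        "datacube.conf",
    ]

    config = ConfigParser()
    # ConfigParser.read() ignores missing files and lets later files
    # override keys set by earlier ones.
    loaded = config.read(path for path in candidates if path)

    print("Config files used:", loaded)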
2 changes: 1 addition & 1 deletion docs/ops/db_setup.rst
@@ -33,7 +33,7 @@ Datacube looks for a configuration file in ~/.datacube.conf or in the location s
     [datacube]
     db_database: datacube
 
-    # A blank host will use a local socket. Specify a hostname to use TCP.
+    # A blank host will use a local socket. Specify a hostname (such as localhost) to use TCP.
     db_hostname:
 
     # Credentials are optional: you might have other Postgres authentication configured.
31 changes: 17 additions & 14 deletions docs/ops/indexing.rst
@@ -10,17 +10,17 @@
 can start to load in some data. This step is performed using the **datacube**
 command line tool.
 
 When you load data into the Data Cube, all you are doing is recording the
-existence of and detailed metadata about the data into the **database**. None of
+existence of and detailed metadata about the data into the **index**. None of
 the data itself is copied, moved or transformed. This is therefore a relatively
-safe and fase process.
+safe and fast process.
 
 Prerequisites for Indexing Data
 -------------------------------
 
 * A working Data Cube setup
 * Some *Analysis Ready Data* to load
-* A Product Type configuration loaded into the database for each Dataset
-* Dataset YAML files for each dataset
+* A Product definition added to your Data Cube for each type of dataset
+* Dataset metadata documents for each individual dataset
 
 
 Sample Earth Observation Data
@@ -50,16 +50,17 @@
 Once you have downloaded some data, it will need :ref:`metadata preparation

 .. _product-definitions:
 
-Product Definitions
--------------------
+Product Definition
+------------------
 
 The Data Cube can handle many different types of data, and requires a bit of
-information up front to know what to do with them. This is the task of the
+information up front to know what to do with them. This is the task of a
 Product Definition.
 
 A Product Definition provides a short **name**, a **description**, some basic
 source **metadata** and (optionally) a list of **measurements** describing the
-type of data that will be contained in the Datasets of it's type.
+type of data that will be contained in the Datasets of its type. In Landsat Surface
+Reflectance, for example, the measurements are the list of bands.
 
 The **measurements** is an ordered list of data, which specify a **name** and
 some **aliases**, a data type or **dtype**, and some options extras including
@@ -82,13 +83,15 @@
 To load Products into your Data Cube run::
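
(The command itself is collapsed in this view of the diff; with the CLI of this era it would typically be something like the following — file name hypothetical.)

.. code-block:: bash

    datacube product add my_product_definition.yaml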

 Dataset Documents
 -----------------
-As well as the product information loaded in the previous step, every Dataset
-requires some metadata describing what the data represents and where it has come
-from, as well has what sort of files it is stored in. We call this *blah* and it
-is expected to be stored in _YAML_ documents. It is what is loaded into the
-Database for searching, querying and accessing the data.
+Every dataset requires a metadata document describing what the data represents and where it has come
+from, as well as what format it is stored in. At a minimum, you need the dimensions or fields you want to
+search by, such as lat, lon and time, but you can include any information you deem useful.

-In the case of data from Geoscience Australia, no further steps are required.
+It is typically stored in YAML documents, but JSON is also supported. It is stored in the index
+for searching, querying and accessing the data.
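
(As an illustration of such a document — identifiers, coordinates and paths below are invented, and the exact fields depend on the metadata type in use.)

.. code-block:: yaml

    id: 10c4a9fe-2890-11e6-8ec8-a0000100fe80
    product_type: nbar
    creation_dt: 2016-05-04 11:16:21
    platform:
        code: LANDSAT_8
    format:
        name: GeoTIFF
    extent:
        center_dt: 2016-03-30 23:50:41
        coord:
            ll: {lat: -35.3, lon: 148.8}
            lr: {lat: -35.3, lon: 151.2}
            ul: {lat: -33.4, lon: 148.8}
            ur: {lat: -33.4, lon: 151.2}
    image:
        bands:
            red:
                path: scene_band_4.tif
    lineage:
        source_datasets: {}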

+The data from Geoscience Australia already comes with relevant files (named ``ga-metadata.yaml``), so
+no further steps are required for indexing them.
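
(Indexing such a file is then a single CLI call; a plausible invocation — path hypothetical.)

.. code-block:: bash

    datacube dataset add /data/scenes/ls8_example/ga-metadata.yaml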

 For third party datasets, see :ref:`prepare-scripts`.

2 changes: 1 addition & 1 deletion docs/user/guide.rst
@@ -12,7 +12,7 @@ Standalone Tools
 The Data Cube software comes with several tools that can be used for data
 exploration and exporting, without writing any code.
 
-* `datacube-search`
+* `datacube` (see ``datacube --help`` after installation)
 * `pixeldrill`
 * `movie_generator`
 
34 changes: 18 additions & 16 deletions docs/user/intro.rst
@@ -9,7 +9,7 @@ The Data Cube is a system designed to:
 * Provide a :term:`Python` based :term:`API` for high performance querying and data access
 * Give scientists and other users easy ability to perform Exploratory Data Analysis
 * Allow scalable continent scale processing of the stored data
-* Track the providence of all the contained data to allow for quality control and updates
+* Track the provenance of all the contained data to allow for quality control and updates
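
(To ground the first bullet: a minimal session with the Python API might look like this sketch; the product name and extents are hypothetical.)

.. code-block:: python

    import datacube

    # Connects using the runtime config described in docs/ops/config.rst.
    dc = datacube.Datacube(app="intro-example")

    # Load a small spatio-temporal cube as an xarray.Dataset.
    data = dc.load(product="ls8_nbar_albers",
                   x=(148.8, 151.2), y=(-35.3, -33.4),
                   time=("2016-01-01", "2016-03-31"))
    print(data)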

 Getting Started
 ===============
@@ -37,21 +37,23 @@ Types of Datasets in a Data Cube
 When using the Data Cube, it will contain records about 3 different types of
 products and datasets.
 
-========================= ============= ================
-Type of product/dataset   In Database   Data available
-========================= ============= ================
-Referenced                Yes           No
-------------------------- ------------- ----------------
-Indexed                   Yes           Maybe
-------------------------- ------------- ----------------
-Managed                   Yes           Yes
-========================= ============= ================
+================= ========== ================= ================================
+Type of dataset   In Index   Data available    Typical data
+================= ========== ================= ================================
+Referenced        Yes        No                Historic or provenance record
+----------------- ---------- ----------------- --------------------------------
+Indexed           Yes        Maybe             Created externally
+----------------- ---------- ----------------- --------------------------------
+Ingested          Yes        Yes               Created within the Data Cube
+================= ========== ================= ================================

 Referenced Datasets
 ~~~~~~~~~~~~~~~~~~~
 
-The existence of these datasets is know about through the provenance history
-of datasets, but the raw data files are not tracked by the Data Cube.
+The existence and metadata of these datasets is known but the data itself is not
+accessible to the Data Cube, i.e. a dataset without a location.
+
+These usually come from the provenance / source information of other datasets.
 
 Example:
 
@@ -60,18 +60,18 @@
 Indexed Datasets
 ~~~~~~~~~~~~~~~~
 
-Data has been available on disk at some point, with associated metadata
+Data is available (has a file location or uri), with associated metadata
 available in a format understood by the Data Cube.
 
 Example:
 
 - USGS Landsat Scenes with prepared ``agdc-metadata.yaml``
 - GA Landsat Scenes
 
-Managed Datasets
-~~~~~~~~~~~~~~~~
+Ingested Datasets
+~~~~~~~~~~~~~~~~~
 
-On disk data has been created by/and is managed by the Data Cube. The data has
+Data has been created by/and is managed by the Data Cube. The data has typically
 been copied, compressed, tiled and possibly re-projected into a shape suitable
 for analysis, and stored in NetCDF4 files.

