Update docs and terminology for current system
jeremyh committed Apr 5, 2017
1 parent ee52a89 commit 8d896f9
Showing 6 changed files with 63 additions and 42 deletions.
5 changes: 3 additions & 2 deletions docs/about/glossary.rst
@@ -6,7 +6,8 @@ Glossary
 .. glossary::
 
    AGDC
-      The Australian Geoscience Data Cube
+      The Australian Geoscience Data Cube, an Australian implementation of the
+      ODC.
 
    API
       The Data Cube Application Programming Interface gives programmers full
@@ -29,7 +30,7 @@ Glossary
       Open Data Cube
 
    PostgreSQL
-      The high performance database engine used as an index of Dataset by the
+      The high performance database engine used as an index of Datasets by the
       Data Cube. It is both a relational and document database, and the Data
       Cube schema makes use of both of these capabilities.
 
31 changes: 23 additions & 8 deletions docs/ops/config.rst
@@ -5,8 +5,8 @@ See also :ref:`create-configuration-file` for the datacube config file.

 .. _product-doc:
 
-Product
--------
+Product definition
+------------------
 Product description document defines some of the metadata common to all the datasets belonging to the products.
 It also describes the measurements that product has and some of the properties of the measurements.
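
(For orientation: a minimal product definition in the format these docs describe might look like the sketch below; the product name, band and values are hypothetical, not part of this commit.)

.. code-block:: yaml

    name: example_surface_reflectance
    description: A hypothetical surface reflectance product
    metadata_type: eo
    metadata:
        platform:
            code: LANDSAT_8
        product_type: surface_reflectance
    measurements:
        - name: red
          aliases: [band_4]
          dtype: int16
          nodata: -999
          units: '1'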

@@ -81,8 +81,8 @@ measurements
 .. _dataset-metadata-doc:
 
-Dataset
--------
+Dataset metadata document
+-------------------------
 Dataset document defines critical metadata of the dataset such as:
 
 - measurements
@@ -217,8 +217,8 @@ lineage
 .. _ingestion-config:
 
-Metadata Type
--------------
+Metadata Type Definition
+------------------------
 Metadata Type document defines searchable bits of metadata within `Dataset`_ documents.
 
 Ingestion Config
@@ -323,10 +323,25 @@ Runtime Config

 Runtime Config document specifies database connection configuration options:
 
+This is loaded from the following locations in order, if they exist, with properties from latter files
+overriding those in earlier ones:
+
+* /etc/datacube.conf
+* $DATACUBE_CONFIG_PATH
+* ~/.datacube.conf
+* datacube.conf
 
 .. code-block:: text
 
     [datacube]
-    db_hostname: 127.0.0.1
     db_database: datacube
-    db_username: cubeuser
+
+    # A blank host will use a local socket. Specify a hostname (such as localhost) to use TCP.
+    db_hostname:
+
+    # Credentials are optional: you might have other Postgres authentication configured.
+    # The default username is the current user id
+    # db_username:
+    # A blank password will fall back to default postgres driver authentication, such as reading your ~/.pgpass file.
+    # db_password:
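
(The precedence above can be made concrete with Python's standard configparser; this is a sketch of such layered loading, not necessarily the Data Cube's actual implementation.)

.. code-block:: python

    import os
    from configparser import ConfigParser

    # Lowest to highest precedence; files that don't exist are skipped.
    candidates = [
        "/etc/datacube.conf",
        os.environ.get("DATACUBE_CONFIG_PATH", ""),
        os.path.expanduser("~/.datacube.conf"),
        "datacube.conf",
    ]

    config = ConfigParser()
    # ConfigParser.read() ignores missing files and lets later files
    # override keys set by earlier ones.
    loaded = config.read(path for path in candidates if path)

    print("Config files used:", loaded)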
2 changes: 1 addition & 1 deletion docs/ops/db_setup.rst
@@ -33,7 +33,7 @@ Datacube looks for a configuration file in ~/.datacube.conf or in the location s
     [datacube]
     db_database: datacube
 
-    # A blank host will use a local socket. Specify a hostname to use TCP.
+    # A blank host will use a local socket. Specify a hostname (such as localhost) to use TCP.
     db_hostname:
 
     # Credentials are optional: you might have other Postgres authentication configured.
31 changes: 17 additions & 14 deletions docs/ops/indexing.rst
@@ -10,17 +10,17 @@
 can start to load in some data. This step is performed using the **datacube**
 command line tool.
 
 When you load data into the Data Cube, all you are doing is recording the
-existence of and detailed metadata about the data into the **database**. None of
+existence of and detailed metadata about the data into the **index**. None of
 the data itself is copied, moved or transformed. This is therefore a relatively
-safe and fase process.
+safe and fast process.
 
 Prerequisites for Indexing Data
 -------------------------------
 
 * A working Data Cube setup
 * Some *Analysis Ready Data* to load
-* A Product Type configuration loaded into the database for each Dataset
-* Dataset YAML files for each dataset
+* A Product definition added to your Data Cube for each type of dataset
+* Dataset metadata documents for each individual dataset
 
 
 Sample Earth Observation Data
@@ -50,16 +50,17 @@
 Once you have downloaded some data, it will need :ref:`metadata preparation

 .. _product-definitions:
 
-Product Definitions
--------------------
+Product Definition
+------------------
 
 The Data Cube can handle many different types of data, and requires a bit of
-information up front to know what to do with them. This is the task of the
+information up front to know what to do with them. This is the task of a
 Product Definition.
 
 A Product Definition provides a short **name**, a **description**, some basic
 source **metadata** and (optionally) a list of **measurements** describing the
-type of data that will be contained in the Datasets of it's type.
+type of data that will be contained in the Datasets of its type. In Landsat Surface
+Reflectance, for example, the measurements are the list of bands.
 
 The **measurements** is an ordered list of data, which specify a **name** and
 some **aliases**, a data type or **dtype**, and some options extras including
@@ -82,13 +83,15 @@
 To load Products into your Data Cube run::
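
(The command itself is collapsed in this view of the diff; with the CLI of this era it would typically be something like the following — file name hypothetical.)

.. code-block:: bash

    datacube product add my_product_definition.yaml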

 Dataset Documents
 -----------------
-As well as the product information loaded in the previous step, every Dataset
-requires some metadata describing what the data represents and where it has come
-from, as well has what sort of files it is stored in. We call this *blah* and it
-is expected to be stored in _YAML_ documents. It is what is loaded into the
-Database for searching, querying and accessing the data.
+Every dataset requires a metadata document describing what the data represents and where it has come
+from, as well as what format it is stored in. At a minimum, you need the dimensions or fields you want to
+search by, such as lat, lon and time, but you can include any information you deem useful.

-In the case of data from Geoscience Australia, no further steps are required.
+It is typically stored in YAML documents, but JSON is also supported. It is stored in the index
+for searching, querying and accessing the data.
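
(As an illustration of such a document — identifiers, coordinates and paths below are invented, and the exact fields depend on the metadata type in use.)

.. code-block:: yaml

    id: 10c4a9fe-2890-11e6-8ec8-a0000100fe80
    product_type: nbar
    creation_dt: 2016-05-04 11:16:21
    platform:
        code: LANDSAT_8
    format:
        name: GeoTIFF
    extent:
        center_dt: 2016-03-30 23:50:41
        coord:
            ll: {lat: -35.3, lon: 148.8}
            lr: {lat: -35.3, lon: 151.2}
            ul: {lat: -33.4, lon: 148.8}
            ur: {lat: -33.4, lon: 151.2}
    image:
        bands:
            red:
                path: scene_band_4.tif
    lineage:
        source_datasets: {}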

+The data from Geoscience Australia already comes with relevant files (named ``ga-metadata.yaml``), so
+no further steps are required for indexing them.
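
(Indexing such a file is then a single CLI call; a plausible invocation — path hypothetical.)

.. code-block:: bash

    datacube dataset add /data/scenes/ls8_example/ga-metadata.yaml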

 For third party datasets, see :ref:`prepare-scripts`.

2 changes: 1 addition & 1 deletion docs/user/guide.rst
@@ -12,7 +12,7 @@ Standalone Tools
 The Data Cube software comes with several tools that can be used for data
 exploration and exporting, without writing any code.
 
-* `datacube-search`
+* `datacube` (see ``datacube --help`` after installation)
 * `pixeldrill`
 * `movie_generator`
 
34 changes: 18 additions & 16 deletions docs/user/intro.rst
@@ -9,7 +9,7 @@ The Data Cube is a system designed to:
 * Provide a :term:`Python` based :term:`API` for high performance querying and data access
 * Give scientists and other users easy ability to perform Exploratory Data Analysis
 * Allow scalable continent scale processing of the stored data
-* Track the providence of all the contained data to allow for quality control and updates
+* Track the provenance of all the contained data to allow for quality control and updates
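
(To ground the first bullet: a minimal session with the Python API might look like this sketch; the product name and extents are hypothetical.)

.. code-block:: python

    import datacube

    # Connects using the runtime config described in docs/ops/config.rst.
    dc = datacube.Datacube(app="intro-example")

    # Load a small spatio-temporal cube as an xarray.Dataset.
    data = dc.load(product="ls8_nbar_albers",
                   x=(148.8, 151.2), y=(-35.3, -33.4),
                   time=("2016-01-01", "2016-03-31"))
    print(data)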

 Getting Started
 ===============
@@ -37,21 +37,23 @@ Types of Datasets in a Data Cube
 When using the Data Cube, it will contain records about 3 different types of
 products and datasets.
 
-========================= ============= ================
-Type of product/dataset   In Database   Data available
-========================= ============= ================
-Referenced                Yes           No
-------------------------- ------------- ----------------
-Indexed                   Yes           Maybe
-------------------------- ------------- ----------------
-Managed                   Yes           Yes
-========================= ============= ================
+================= ========== ================= ================================
+Type of dataset   In Index   Data available    Typical data
+================= ========== ================= ================================
+Referenced        Yes        No                Historic or provenance record
+----------------- ---------- ----------------- --------------------------------
+Indexed           Yes        Maybe             Created externally
+----------------- ---------- ----------------- --------------------------------
+Ingested          Yes        Yes               Created within the Data Cube
+================= ========== ================= ================================

 Referenced Datasets
 ~~~~~~~~~~~~~~~~~~~
 
-The existence of these datasets is know about through the provenance history
-of datasets, but the raw data files are not tracked by the Data Cube.
+The existence and metadata of these datasets is known but the data itself is not
+accessible to the Data Cube, i.e. a dataset without a location.
+
+These usually come from the provenance / source information of other datasets.
 
 Example:
 
@@ -60,18 +60,18 @@
 Indexed Datasets
 ~~~~~~~~~~~~~~~~
 
-Data has been available on disk at some point, with associated metadata
+Data is available (has a file location or uri), with associated metadata
 available in a format understood by the Data Cube.
 
 Example:
 
 - USGS Landsat Scenes with prepared ``agdc-metadata.yaml``
 - GA Landsat Scenes
 
-Managed Datasets
-~~~~~~~~~~~~~~~~
+Ingested Datasets
+~~~~~~~~~~~~~~~~~
 
-On disk data has been created by/and is managed by the Data Cube. The data has
+Data has been created by/and is managed by the Data Cube. The data has typically
 been copied, compressed, tiled and possibly re-projected into a shape suitable
 for analysis, and stored in NetCDF4 files.

