Update documentation #1455

Merged: 12 commits, Jun 20, 2023
32 changes: 21 additions & 11 deletions README.rst
Developer setup

- ``git clone https://github.com/opendatacube/datacube-core.git``

2. Create a Python environment for using the ODC. We recommend `Mambaforge <https://mamba.readthedocs.io/en/latest/user_guide/mamba.html>`__ as the
easiest way to handle Python dependencies.

::

   mamba env create -f conda-environment.yml
   conda activate cubeenv

3. Install a develop version of datacube-core.
pre-commit install

5. Run unit tests + PyLint

Install test dependencies using:

``pip install --upgrade -e '.[test]'``

If install for these fails, please lodge them as issues.

Run unit tests with:

``./check-code.sh``

(this script approximates what is run by GitHub Actions. You can
alternatively run ``pytest`` yourself).

6. **(or)** Run all tests, including integration tests.

``./check-code.sh integration_tests``

- Assumes a password-less Postgres database running on localhost called
  ``pgintegration``

- Otherwise copy ``integration_tests/integration.conf`` to
``~/.datacube_integration.conf`` and edit to customise.

- For instructions on setting up a password-less Postgres database, see
the `developer setup instructions <https://datacube-core.readthedocs.io/en/latest/installation/setup/ubuntu.html#postgres-database-configuration>`__.
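The ``~/.datacube_integration.conf`` file mentioned above is a standard INI-style file whose section names correspond to datacube environments. A minimal sketch of how such a file parses, using only the standard library (the contents below are an assumed example, mirroring the sample written later in ``docker/assets/with_bootstrap``):

```python
import configparser

# Assumed minimal contents of ~/.datacube_integration.conf; the section
# name ("datacube") is the environment name.
conf_text = """
[datacube]
db_hostname:
db_database: pgintegration
index_driver: default
"""

config = configparser.ConfigParser(allow_no_value=True)
config.read_string(conf_text)

# The [datacube] section configures the default environment.
print(config["datacube"]["db_database"])   # pgintegration
print(config["datacube"]["db_hostname"])   # empty -> password-less localhost
```

An empty ``db_hostname`` is how the password-less localhost setup described above is expressed in the config file.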


Alternatively, one can use the ``opendatacube/datacube-tests`` docker image to run
tests. This image includes a database server pre-configured for running integration
tests. Pass ``--with-docker`` as the first argument to the ``./check-code.sh`` script:

::

   ./check-code.sh --with-docker integration_tests


To run individual tests in a docker container

::

   docker build --tag=opendatacube/datacube-tests-local --no-cache --progress plain -f docker/Dockerfile .

   docker run -ti -v $(pwd):/code opendatacube/datacube-tests-local:latest pytest integration_tests/test_filename.py::test_function_name


Developer setup on Ubuntu
8 changes: 4 additions & 4 deletions docker/assets/with_bootstrap
launch_db () {
sudo -u postgres createuser --superuser "${dbuser}"
sudo -u postgres createdb "${dbuser}"
sudo -u postgres createdb datacube
sudo -u postgres createdb pgintegration
sudo -u postgres createdb pgisintegration
}

# Become `odc` user with UID/GID compatible to datacube-core volume
cat <<EOL > $HOME/.datacube_integration.conf
[datacube]
db_hostname:
db_database: pgintegration
index_driver: default

[experimental]
db_hostname:
db_database: pgisintegration
index_driver: postgis

[no_such_driver_env]
5 changes: 4 additions & 1 deletion docs/about-core-concepts/metadata-types.rst
Metadata Types

The metadata type yaml file must contain name, description, and dataset keys.

The dataset key must contain id, sources, creation_dt, label, and search_fields keys.

For metadata types of spatial datasets, the dataset key must also contain grid_spatial, measurements, and format keys.
Support for non-spatial datasets is likely to be dropped in version 2.0.
10 changes: 8 additions & 2 deletions docs/config_samples/metadata_types/bare_bone.yaml
name: barebone
description: A minimalist metadata type file
Contributor:
Not saying we should fix this here, but this is so minimal that it's almost irrelevant.

Practically nobody configures metadata types, I don't think. Possibly something to consider for the future, @SpacemanPaul, as the only time I change a metadata type is to make a field searchable for a product... allowing arbitrary searches would be great, and using metadata to configure indexed searchable fields would be 🚀

Member:
Functionality provided by the metadata subsystem is really handy; it's a shame it can only be configured at "product creation time". Non-index-based searching can be configured purely at runtime, as it is just about constructing a query; there is no need for an index for the query to be useful. It's just that datacube only allows configuration from the metadata document stored in the DB and linked to a given product. While "stored metadata" is handy, it does not need to be the only way.

Contributor:
The postgis driver requires metadata type documents to be "eo3 compatible" although it's not 100% clear at the moment what that means.

The only use for metadata type documents going forwards appears to be to expose user-friendly metadata aliases for various dataset metadata entries for use in searches (and ensuring those searches are fully indexed).

dataset:
  id: [id]  # No longer configurable in newer ODCs.
  sources: [lineage, source_datasets]  # No longer configurable in newer ODCs.

  creation_dt: [properties, 'odc:processing_datetime']
  label: [label]
  # The following keys are necessary if describing spatial datasets
  # grid_spatial: [grid_spatial, projection]
  # measurements: [measurements]
  # format: [properties, 'odc:file_format']

  search_fields:
    platform:
      description: Platform code
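The bracketed values in the dataset section are offsets: lists of keys traversed into a dataset metadata document to locate each field. A minimal sketch of how such an offset resolves (the dataset document below is made up for illustration; ``resolve_offset`` is a hypothetical helper, not part of datacube):

```python
from functools import reduce

def resolve_offset(document, offset):
    """Follow a list-of-keys offset (as used in metadata types) into a nested dict."""
    return reduce(lambda node, key: node[key], offset, document)

# Hypothetical minimal dataset document for illustration.
dataset_doc = {
    "label": "example_scene",
    "properties": {"odc:processing_datetime": "2023-06-20T00:00:00Z"},
}

print(resolve_offset(dataset_doc, ["label"]))
# example_scene
print(resolve_offset(dataset_doc, ["properties", "odc:processing_datetime"]))
# 2023-06-20T00:00:00Z
```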
147 changes: 0 additions & 147 deletions docs/installation/data-preparation-scripts.rst

This file was deleted.

12 changes: 12 additions & 0 deletions docs/installation/database/setup.rst
Alternately, you can configure the ODC connection to Postgres using environment
variables::

    DB_PASSWORD
    DB_DATABASE

To configure the database as a single connection URL instead of individual environment variables::

    export DATACUBE_DB_URL=postgresql://[username]:[password]@[hostname]:[port]/[database]

Alternatively, for password-less access to a database on localhost::

    export DATACUBE_DB_URL=postgresql:///[database]

Further information on database configuration can be found `here <https://github.com/opendatacube/datacube-core/wiki/ODC-EP-010---Replace-Configuration-Layer>`__.
Although the enhancement proposal details incoming changes in v1.9 and beyond, it should largely be compatible with the current behaviour, barring a few
obscure corner cases.
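The single-URL form packs the same information as the individual variables. A sketch, using only the standard library, of how the URL's components line up with the ``DB_*`` variables (the credentials below are made up for illustration):

```python
from urllib.parse import urlparse

# Hypothetical example URL in the DATACUBE_DB_URL format described above.
url = "postgresql://odc_user:secret@db.example.com:5432/datacube"

parts = urlparse(url)
# Each component corresponds to one of the individual DB_* variables:
db_settings = {
    "DB_HOSTNAME": parts.hostname,          # db.example.com
    "DB_PORT": parts.port,                  # 5432
    "DB_USERNAME": parts.username,          # odc_user
    "DB_PASSWORD": parts.password,          # secret
    "DB_DATABASE": parts.path.lstrip("/"),  # datacube
}
print(db_settings)
```

The password-less localhost form ``postgresql:///[database]`` simply leaves the username, password, host, and port components empty.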

The desired environment can be specified:

1. in code, with the ``env`` argument to the ``datacube.Datacube`` constructor;
1 change: 0 additions & 1 deletion docs/installation/index.rst
This section contains information on setting up and managing the Open Data Cube.
.. toctree::
:caption: Legacy Approaches

ingesting-data/index
13 changes: 3 additions & 10 deletions docs/installation/indexing-data/step-guide.rst
for searching, querying and accessing the data.
The data from Geoscience Australia already comes with relevant files (named ``ga-metadata.yaml``), so
no further steps are required for indexing them.

For third party datasets, see the examples detailed `here <https://github.com/opendatacube/datacube-dataset-config#documented-examples>`__.
For common distribution formats, data can be indexed using one of the tools from `odc-apps-dc-tools <https://github.com/opendatacube/odc-tools/tree/develop/apps/dc_tools>`__.
In other cases, the metadata may need to be mapped to an ODC-compatible format. You can find examples of data preparation scripts `here <https://github.com/opendatacube/datacube-dataset-config/tree/main/old-prep-scripts>`__.


Step 3. Run the Indexing process
20 changes: 2 additions & 18 deletions docs/installation/ingesting-data/index.rst
Ingesting Data
.. note::

   Ingestion is no longer recommended. While it was used as an optimised on-disk
   storage mechanism, there are a range of reasons why this is no longer ideal. For example,
   the emergence of cloud optimised storage formats means that software such
   as GDAL and Rasterio are optimised for reading many files over the network. Additionally,
   the limitation of NetCDF reading to a single thread means that reading from .TIF
   files on disk could be faster in some situations.
11 changes: 8 additions & 3 deletions docs/installation/metadata-types.rst
A Metadata Type defines which fields should be searchable in your product or dataset.

Three metadata types are added by default called ``eo``, ``telemetry`` and ``eo3``.

You can see the default metadata types in the repository at `datacube/index/default-metadata-types.yaml <https://github.com/opendatacube/datacube-core/blob/develop/datacube/index/default-metadata-types.yaml>`_.

You would create a new metadata type if you want custom fields to be searchable for your products, or
if you want to structure your metadata documents differently.


To add or alter metadata types, use ``datacube metadata add <path-to-file>``;
to update, use ``datacube metadata update <path-to-file>``. Passing ``--allow-unsafe`` allows
you to update metadata types where the changes may have unexpected consequences.

Note that the postgis driver only supports eo3-compatible metadata types, and from version 2.0 onward, support for non-eo3-compatible metadata types
will be fully deprecated.

.. literalinclude:: ../config_samples/metadata_types/bare_bone.yaml
:language: yaml

The metadata type yaml file must contain name, description, and dataset keys.

The dataset key must contain id, sources, creation_dt, label, and search_fields keys.

For metadata types of spatial datasets, the dataset key must also contain grid_spatial, measurements, and format keys.
Support for non-spatial datasets is likely to be dropped in version 2.0.
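The required keys listed above can be checked mechanically before loading a document. A validation sketch (not part of the datacube API; the ``missing_keys`` helper below is hypothetical):

```python
# Keys required at the top level and inside the dataset key, per the rules above.
REQUIRED_TOP = {"name", "description", "dataset"}
REQUIRED_DATASET = {"id", "sources", "creation_dt", "label", "search_fields"}
SPATIAL_DATASET = {"grid_spatial", "measurements", "format"}  # spatial datasets only

def missing_keys(doc, spatial=False):
    """Return the set of required keys absent from a metadata-type document."""
    missing = REQUIRED_TOP - doc.keys()
    dataset = doc.get("dataset", {})
    required = REQUIRED_DATASET | (SPATIAL_DATASET if spatial else set())
    missing |= {f"dataset.{key}" for key in required - dataset.keys()}
    return missing

# A minimal non-spatial document, shaped like the bare_bone.yaml sample.
doc = {
    "name": "barebone",
    "description": "A minimalist metadata type file",
    "dataset": {
        "id": ["id"],
        "sources": ["lineage", "source_datasets"],
        "creation_dt": ["properties", "odc:processing_datetime"],
        "label": ["label"],
        "search_fields": {},
    },
}

print(missing_keys(doc))                # set() -> valid as a non-spatial doc
print(missing_keys(doc, spatial=True))  # reports the three spatial-only keys
```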