Update documentation #1455
Merged: 12 commits, Jun 20, 2023
32 changes: 21 additions & 11 deletions README.rst
@@ -49,12 +49,12 @@ Developer setup

- ``git clone https://github.com/opendatacube/datacube-core.git``

2. Create a Python environment for using the ODC. We recommend `conda <https://docs.conda.io/en/latest/miniconda.html>`__ as the
2. Create a Python environment for using the ODC. We recommend `Mambaforge <https://mamba.readthedocs.io/en/latest/user_guide/mamba.html>`__ as the
easiest way to handle Python dependencies.

::

conda create -f conda-environment.yml
mamba env create -f conda-environment.yml
conda activate cubeenv

3. Install a develop version of datacube-core.
@@ -72,26 +72,34 @@ Developer setup
pre-commit install

5. Run unit tests + PyLint
``./check-code.sh``

(this script approximates what is run by Travis. You can
alternatively run ``pytest`` yourself). Some test dependencies may need to be installed, attempt to install these using:

Install test dependencies using:

``pip install --upgrade -e '.[test]'``

If install for these fails please lodge them as issues.
If installation of these fails, please lodge an issue.

Run unit tests with:

``./check-code.sh``

(this script approximates what is run by GitHub Actions. You can
alternatively run ``pytest`` yourself).

6. **(or)** Run all tests, including integration tests.

``./check-code.sh integration_tests``

- Assumes a password-less Postgres database running on localhost called

``agdcintegration``
``pgintegration``

- Otherwise copy ``integration_tests/agdcintegration.conf`` to
- Otherwise copy ``integration_tests/integration.conf`` to
``~/.datacube_integration.conf`` and edit to customise.

- For instructions on setting up a password-less Postgres database, see
the `developer setup instructions <https://datacube-core.readthedocs.io/en/latest/installation/setup/ubuntu.html#postgres-database-configuration>`__.
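
As a rough sketch of that setup (assuming a local password-less Postgres as described in the linked
instructions, and that you are working from the repository root)::

    createdb pgintegration
    cp integration_tests/integration.conf ~/.datacube_integration.conf
    # edit ~/.datacube_integration.conf if your database name or credentials differ
    ./check-code.sh integration_tests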


Alternatively one can use the ``opendatacube/datacube-tests`` docker image to run
tests. This docker includes database server pre-configured for running
@@ -103,11 +111,13 @@
to ``./check-code.sh`` script.
./check-code.sh --with-docker integration_tests


To run individual test in docker container
To run individual tests in a docker container

::

docker run -ti -v /home/ubuntu/datacube-core:/code opendatacube/datacube-tests:latest pytest integration_tests/test_filename.py::test_function_name
docker build --tag=opendatacube/datacube-tests-local --no-cache --progress plain -f docker/Dockerfile .

docker run -ti -v $(pwd):/code opendatacube/datacube-tests-local:latest pytest integration_tests/test_filename.py::test_function_name
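
If you need an interactive shell in the same container, for example to re-run a single failing test,
a variant of the command above (assuming the image entrypoint passes an arbitrary command through,
as it does for ``pytest``) is::

    docker run -ti -v $(pwd):/code opendatacube/datacube-tests-local:latest bash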


Developer setup on Ubuntu
8 changes: 4 additions & 4 deletions docker/assets/with_bootstrap
@@ -14,8 +14,8 @@ launch_db () {
sudo -u postgres createuser --superuser "${dbuser}"
sudo -u postgres createdb "${dbuser}"
sudo -u postgres createdb datacube
sudo -u postgres createdb agdcintegration
sudo -u postgres createdb odcintegration
sudo -u postgres createdb pgintegration
sudo -u postgres createdb pgisintegration
}

# Become `odc` user with UID/GID compatible to datacube-core volume
@@ -58,12 +58,12 @@ launch_db () {
cat <<EOL > $HOME/.datacube_integration.conf
[datacube]
db_hostname:
db_database: agdcintegration
db_database: pgintegration
index_driver: default

[experimental]
db_hostname:
db_database: odcintegration
db_database: pgisintegration
index_driver: postgis

[no_such_driver_env]
5 changes: 4 additions & 1 deletion docs/about-core-concepts/metadata-types.rst
@@ -8,4 +8,7 @@ Metadata Types

Metadata type yaml file must contain name, description and dataset keys.

Dataset key must contain id, sources, creation_dt, label and search_fields keys.
Dataset key must contain id, sources, creation_dt, label, and search_fields keys.

For metadata types of spatial datasets, the dataset key must also contain grid_spatial, measurements, and format keys.
Support for non-spatial datasets is likely to be dropped in version 2.0.
10 changes: 8 additions & 2 deletions docs/config_samples/metadata_types/bare_bone.yaml
@@ -2,10 +2,16 @@
name: barebone
description: A minimalist metadata type file
Contributor:

Not saying we should fix this here, but this is so minimal that it's almost irrelevant.

Practically nobody configures metadata types, I don't think. Possibly something to consider for the future, @SpacemanPaul as the times I change a metadata type is to make a field searchable for a product... allowing arbitrary searches would be great and using metadata to configure indexed searchable fields would be 🚀

Member:

Functionality provided by the metadata subsystem is really handy; it's a shame it can only be configured at "product creation time". Non-index-based searching can be configured purely at runtime, as it is just about constructing a query; there is no need to have an index for the query to be useful. It's just that datacube only allows configuration from the metadata document stored in the DB and linked to a given product. While "stored metadata" is handy, it does not need to be the only way.

Contributor:

The postgis driver requires metadata type documents to be "eo3 compatible" although it's not 100% clear at the moment what that means.

The only use for metadata type documents going forwards appears to be to expose user-friendly metadata aliases for various dataset metadata entries for use in searches (and ensuring those searches are fully indexed).

dataset:
id: [id] # No longer configurable in newer ODCs.
sources: [lineage, source_datasets] # No longer configurable in newer ODCs.
id: [id] # No longer configurable in newer ODCs.
sources: [lineage, source_datasets] # No longer configurable in newer ODCs.

creation_dt: [properties, 'odc:processing_datetime']
label: [label]
# The following keys are necessary if describing spatial datasets
# grid_spatial: [grid_spatial, projection]
# measurements: [measurements]
# format: [properties, 'odc:file_format']

search_fields:
platform:
description: Platform code
12 changes: 6 additions & 6 deletions docs/installation/data-preparation-scripts.rst
@@ -1,4 +1,4 @@
Data Preperation Scripts
Data Preparation Scripts
========================

.. note::
@@ -42,11 +42,11 @@ Download the USGS Collection 1 landsat scenes from any of the links below:

The prepare script for collection 1 - level 1 data is available in
`ls_usgs_prepare.py
<https://github.com/opendatacube/datacube-dataset-config/blob/master/old-prep-scripts/ls_usgs_prepare.py>`_.
<https://github.com/opendatacube/datacube-dataset-config/blob/main/old-prep-scripts/ls_usgs_prepare.py>`_.
Contributor:

Ideally, we should update all this for USGS Landsat Collection 2.

If Collection 1 is still available, it's probably safe to assume it won't be for much longer - and we should at least state this here.

Contributor:

Might be better to migrate some of the product-specific docs from over here to replace this: https://github.com/opendatacube/datacube-dataset-config

I don't know that any of the prep scripts have been used for like ... 5+ years!

Contributor Author:

Is it worth keeping this page at all, beyond linking to the datacube-dataset-config repo?

Contributor:

I don't think so! The scripts probably don't work...


::

$ wget https://github.com/opendatacube/datacube-dataset-config/raw/master/old-prep-scripts/ls_usgs_prepare.py
$ wget https://github.com/opendatacube/datacube-dataset-config/raw/main/old-prep-scripts/ls_usgs_prepare.py
$ python ls_usgs_prepare.py --help
Usage: ls_usgs_prepare.py [OPTIONS] [DATASETS]...

@@ -85,14 +85,14 @@ For Landsat collection 1 level 1 product:
To prepare downloaded USGS LEDAPS Landsat scenes for use with the Data Cube, use
the script provided in
`usgs_ls_ard_prepare.py
<https://github.com/opendatacube/datacube-dataset-config/blob/master/agdcv2-ingest/prepare_scripts/landsat_collection/usgs_ls_ard_prepare.py>`_
<https://github.com/opendatacube/datacube-dataset-config/blob/main/agdcv2-ingest/prepare_scripts/landsat_collection/usgs_ls_ard_prepare.py>`_

The following example generates the required Dataset Metadata files, named
`agdc-metadata.yaml` for three landsat scenes.

::

$ wget https://github.com/opendatacube/datacube-dataset-config/raw/master/agdcv2-ingest/prepare_scripts/landsat_collection/usgs_ls_ard_prepare.py
$ wget https://github.com/opendatacube/datacube-dataset-config/raw/main/agdcv2-ingest/prepare_scripts/landsat_collection/usgs_ls_ard_prepare.py
$ python USGS_precollection_oldscripts/usgslsprepare.py --help
Usage: usgslsprepare.py [OPTIONS] [DATASETS]...

@@ -134,7 +134,7 @@ Then :ref:`index the data <indexing>`.
To view an example of how to `index Sentinel-2 data from S3`_ check out the documentation
available in the datacube-dataset-config_ repository.

.. _`index Sentinel-2 data from S3`: https://github.com/opendatacube/datacube-dataset-config/blob/master/sentinel-2-l2a-cogs.md
.. _`index Sentinel-2 data from S3`: https://github.com/opendatacube/datacube-dataset-config/blob/main/sentinel-2-l2a-cogs.md
.. _datacube-dataset-config: https://github.com/opendatacube/datacube-dataset-config/

Custom Prepare Scripts
12 changes: 12 additions & 0 deletions docs/installation/database/setup.rst
@@ -81,6 +81,18 @@ Alternately, you can configure the ODC connection to Postgres using environment
DB_PASSWORD
DB_DATABASE

To configure a database as a single connection url instead of individual environment variables::

export DATACUBE_DB_URL=postgresql://[username]:[password]@[hostname]:[port]/[database]

Alternatively, for password-less access to a database on localhost::

export DATACUBE_DB_URL=postgresql:///[database]

Further information on database configuration can be found `here <https://github.com/opendatacube/datacube-core/wiki/ODC-EP-010---Replace-Configuration-Layer>`__.
Although the enhancement proposal details incoming changes in v1.9 and beyond, it should largely be compatible with the current behaviour, barring a few
obscure corner cases.
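
As an illustrative sketch (the database name ``pgintegration`` is only an example), setting the URL for a
password-less local database and then checking the connection with ``datacube system check``, which reports
the connection settings and attempts to connect, might look like::

    export DATACUBE_DB_URL=postgresql:///pgintegration
    datacube system check

    # or, equivalently, using the individual DB_* variables listed above
    export DB_HOSTNAME=localhost
    export DB_DATABASE=pgintegration
    datacube system check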

The desired environment can be specified:

1. in code, with the ``env`` argument to the ``datacube.Datacube`` constructor;
1 change: 0 additions & 1 deletion docs/installation/index.rst
@@ -53,5 +53,4 @@ This section contains information on setting up and managing the Open Data Cube.
.. toctree::
:caption: Legacy Approaches

data-preparation-scripts
ingesting-data/index
13 changes: 3 additions & 10 deletions docs/installation/indexing-data/step-guide.rst
@@ -72,16 +72,9 @@ for searching, querying and accessing the data.
The data from Geoscience Australia already comes with relevant files (named ``ga-metadata.yaml``), so
no further steps are required for indexing them.

For third party datasets, see :ref:`prepare-scripts`.


.. admonition:: Note

:class: info

Some metadata requires cleanup before they are ready to be loaded.

For more information see :ref:`dataset-metadata-doc`.
For third party datasets, see the examples detailed `here <https://github.com/opendatacube/datacube-dataset-config#documented-examples>`__.
For common distribution formations, data can be indexed using one of the tools from `odc-apps-dc-tools <https://github.com/opendatacube/odc-tools/tree/develop/apps/dc_tools>`__.
Contributor:

"Distribution formations" sounds wrong - I think you mean "distribution formats"?

In other cases, the metadata may need to be mapped to an ODC-compatible format. You can find examples of data preparation scripts `here <https://github.com/opendatacube/datacube-dataset-config/tree/main/old-prep-scripts>`__.
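
For example, a sketch of indexing the public Sentinel-2 COGs with the ``s3-to-dc`` tool from the
``odc-apps-dc-tools`` package mentioned above (the bucket path and product name below are illustrative;
the product definition must already have been added, and the exact flags may differ between tool versions)::

    pip install odc-apps-dc-tools
    s3-to-dc --stac --no-sign-request \
        "s3://sentinel-cogs/sentinel-s2-l2a-cogs/**/*.json" s2_l2a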


Step 3. Run the Indexing process
20 changes: 2 additions & 18 deletions docs/installation/ingesting-data/index.rst
@@ -7,25 +7,9 @@ Ingesting Data
.. note::

Ingestion is no longer recommended. While it was used as an optimised on-disk
storage mechanism, there are a range of reasons why this is no longer ideal. For example
storage mechanism, there are a range of reasons why this is no longer ideal. For example,
the emergence of cloud optimised storage formats means that software such
as GDAL and Rasterio are optimised for reading many files over the network. Additionally
the limitation of NetCDF reading to a single thread means that reading from .TIF
files on disk could be faster in some situations.

In addition to limited performance improvements, ingestion leads to duplication
of data and opinionated decisions, such as reprojection of data, which can lead
to a loss of data fidelity.

The section below is being retained for completion, but should be considered optional.


.. note::

Ingestion is no longer recommended. While it was used as an optimised on-disk
storage mechanism, there are a range of reasons why this is no longer ideal. For example
the emergence of cloud optimised storage formats means that software such
as GDAL and Rasterio are optimised for reading many files over the network. Additionally
as GDAL and Rasterio are optimised for reading many files over the network. Additionally,
the limitation of NetCDF reading to a single thread means that reading from .TIF
files on disk could be faster in some situations.

11 changes: 8 additions & 3 deletions docs/installation/metadata-types.rst
@@ -5,15 +5,17 @@ A Metadata Type defines which fields should be searchable in your product or dat

Three metadata types are added by default called ``eo``, ``telemetry`` and ``eo3``.

You can see the default metadata types in the repository at `datacube/index/default-metadata-types.yaml <https://github.com/opendatacube/datacube-core/blob/develop/datacube/index/default-metadata-types.yaml>`_.

You would create a new metadata type if you want custom fields to be searchable for your products, or
if you want to structure your metadata documents differently.

You can see the default metadata type in the repository at `datacube/index/default-metadata-types.yaml <https://github.com/opendatacube/datacube-core/blob/develop/datacube/index/default-metadata-types.yaml>`_.

To add or alter metadata types, you can use commands like: ``datacube metadata add <path-to-file>``
and to update: ``datacube metadata update <path-to-file>``. Using ``--allow-unsafe`` will allow
you to update metadata types where the changes may have unexpected consequences.
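
A brief sketch of that workflow, using a hypothetical ``my-metadata-type.yaml`` (the ``list`` subcommand is
assumed to be available in your version)::

    datacube metadata add my-metadata-type.yaml
    # after editing the document, push the changes to the index
    datacube metadata update --allow-unsafe my-metadata-type.yaml
    # review what is currently registered
    datacube metadata list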

Note that the postgis driver only supports eo3-compatible metadata types, and from version 2.0 onward, support for non-eo3-compatible metadata types
will be fully deprecated.

.. literalinclude:: ../config_samples/metadata_types/bare_bone.yaml
:language: yaml
@@ -22,4 +22,4 @@ you to update metadata types where the changes may have unexpected consequences.

Metadata type yaml file must contain name, description and dataset keys.

Dataset key must contain id, sources, creation_dt, label and search_fields keys.
Dataset key must contain id, sources, creation_dt, label, and search_fields keys.

For metadata types of spatial datasets, the dataset key must also contain grid_spatial, measurements, and format keys.
Support for non-spatial datasets is likely to be dropped in version 2.0.
88 changes: 26 additions & 62 deletions docs/installation/setup/common_install.rst
@@ -7,36 +7,32 @@ Python and packages

Python 3.8+ is required.

Anaconda Python
---------------

`Install Anaconda Python <https://www.anaconda.com/download/>`_

Add conda-forge to package channels::

conda config --append channels conda-forge
Conda environment setup
-----------------------

Conda environments are recommended for isolating your ODC development environment from your system installation and other Python environments.

Install required Python packages and create a conda environment named ``odc_env``.
We recommend you use Mambaforge to set up your conda virtual environment, as all the required packages are obtained from the conda-forge channel.
Download and install it from `here <https://github.com/conda-forge/miniforge#mambaforge>`_.

Python::
Download the latest version of the Open Data Cube from the `repository <https://github.com/opendatacube/datacube-core>`_::

conda create --name odc_env python=3.8 datacube
git clone https://github.com/opendatacube/datacube-core
cd datacube-core

Activate the ``odc_env`` conda environment::
Create a conda environment named ``cubeenv``::

conda activate odc_env
mamba env create -f conda-environment.yml

Find out more about conda environments `here <https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html>`_.
Activate the ``cubeenv`` conda environment::

Install other packages::
conda activate cubeenv

conda install jupyter matplotlib scipy pytest-cov hypothesis
Find out more about conda environments `here <https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html>`_.
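
To finish setting up a development copy, a minimal sketch (run from the cloned ``datacube-core`` directory,
mirroring the developer steps in the README) is::

    conda activate cubeenv
    pip install --upgrade -e '.[test]'
    datacube --version

``datacube --version`` should then report the locally installed development version.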


Postgres database configuration
===============================
Postgres testing database configuration
=======================================

This configuration supports local development using your login name.

@@ -52,7 +48,7 @@ Set a password for the "postgres" database role using the command::

and set the password when prompted. The password text will be hidden from the console for security purposes.

Type **Control+D** or **\q** to exit the posgreSQL prompt.
Type **Control+D** or **\\q** to exit the PostgreSQL prompt.

By default in Ubuntu, Postgresql is configured to use ``ident sameuser`` authentication for any connections from the same machine which is useful for development. Check out the excellent Postgresql documentation for more information, but essentially this means that if your Ubuntu username is ``foo`` and you add ``foo`` as a Postgresql user then you can connect to a database without requiring a password for many functions.

@@ -61,51 +57,19 @@ Since the only user who can connect to a fresh install is the postgres user, her
sudo -u postgres createuser --superuser $USER
sudo -u postgres psql

postgres=# \password $USER
postgres=# \password <foo>

Now we can create an ``agdcintegration`` database for testing::
Now we can create databases for integration testing. You will need two databases: one for the Postgres driver and one for the PostGIS driver.
By default, these databases are called ``pgintegration`` and ``pgisintegration``, but you can name them however you want::

createdb agdcintegration
postgres=# create database pgintegration;
postgres=# create database pgisintegration;

Or, directly from the bash terminal::

Connecting to your own database to try out some SQL should now be as easy as::

psql -d agdcintegration


Open Data Cube source and development configuration
===================================================
createdb pgintegration
createdb pgisintegration

Download the latest version of the software from the `repository <https://github.com/opendatacube/datacube-core>`_ ::

git clone https://github.com/opendatacube/datacube-core
cd datacube-core

We need to specify the database user and password for the ODC integration testing. To do this::

cp integration_tests/agdcintegration.conf ~/.datacube_integration.conf

Then edit the ``~/.datacube_integration.conf`` with a text editor and add the following lines replacing ``<foo>`` with your username and ``<foobar>`` with the database user password you set above (not the postgres one, your ``<foo>`` one)::

[datacube]
db_hostname: localhost
db_database: agdcintegration
db_username: <foo>
db_password: <foobar>

Note: For Ubuntu Setup the db_hostname should be set to "/var/run/postgresql". For more refer: https://github.com/opendatacube/datacube-core/issues/1329

Verify it all works
===================

Run the integration tests::

cd datacube-core
./check-code.sh integration_tests

Build the documentation::

cd datacube-core/docs
pip install -r requirements.txt
make html
Connecting to your own database to try out some SQL should now be as easy as::

Then open :file:`_build/html/index.html` in your browser to view the Documentation.
psql -d pgintegration
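
For reference, a minimal ``~/.datacube_integration.conf`` matching these database names could mirror the one
written by the docker bootstrap script earlier in this diff (an empty ``db_hostname`` means a local socket
connection; adjust names and credentials to suit)::

    cat <<EOL > ~/.datacube_integration.conf
    [datacube]
    db_hostname:
    db_database: pgintegration
    index_driver: default

    [experimental]
    db_hostname:
    db_database: pgisintegration
    index_driver: postgis
    EOL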