Update documentation #1455

Merged
merged 12 commits on Jun 20, 2023
28 changes: 19 additions & 9 deletions README.rst
@@ -49,12 +49,12 @@ Developer setup

- ``git clone https://github.com/opendatacube/datacube-core.git``

2. Create a Python environment for using the ODC. We recommend `conda <https://docs.conda.io/en/latest/miniconda.html>`__ as the
2. Create a Python environment for using the ODC. We recommend `Mambaforge <https://mamba.readthedocs.io/en/latest/user_guide/mamba.html>`__ as the
easiest way to handle Python dependencies.

::

conda create -f conda-environment.yml
mamba env create -f conda-environment.yml
conda activate cubeenv

3. Install a develop version of datacube-core.
@@ -72,14 +72,19 @@ Developer setup
pre-commit install

5. Run unit tests + PyLint
``./check-code.sh``

(this script approximates what is run by Travis. You can
alternatively run ``pytest`` yourself). Some test dependencies may need to be installed, attempt to install these using:

Install test dependencies using:

``pip install --upgrade -e '.[test]'``

If install for these fails please lodge them as issues.
If installation of any of these fails, please lodge an issue.

Run unit tests with:

``./check-code.sh``

(this script approximates what is run by Travis. You can
alternatively run ``pytest`` yourself).
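
   For example, assuming the test dependencies are installed and the repository's default ``tests/`` layout, ``pytest`` can be invoked directly (paths and options shown are illustrative)::

       pytest tests/              # run the unit test suite
       pytest -k <pattern> -v     # run only tests whose names match a pattern, verbosely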

6. **(or)** Run all tests, including integration tests.

@@ -92,6 +97,9 @@ Developer setup
- Otherwise copy ``integration_tests/agdcintegration.conf`` to
``~/.datacube_integration.conf`` and edit to customise.

- For instructions on setting up a password-less Postgres database, see
the `developer setup instructions <https://datacube-core.readthedocs.io/en/latest/installation/setup/ubuntu.html#postgres-database-configuration>`__.


Alternatively one can use the ``opendatacube/datacube-tests`` docker image to run
tests. This docker includes database server pre-configured for running
@@ -103,11 +111,13 @@ to ``./check-code.sh`` script.
./check-code.sh --with-docker integration_tests


To run individual test in docker container
To run individual tests in a docker container

::

docker run -ti -v /home/ubuntu/datacube-core:/code opendatacube/datacube-tests:latest pytest integration_tests/test_filename.py::test_function_name
docker build --tag=opendatacube/datacube-tests-local --no-cache --progress plain -f docker/Dockerfile .

docker run -ti -v $(pwd):/code opendatacube/datacube-tests-local:latest pytest integration_tests/test_filename.py::test_function_name


Developer setup on Ubuntu
2 changes: 1 addition & 1 deletion docs/about-core-concepts/metadata-types.rst
@@ -8,4 +8,4 @@ Metadata Types

Metadata type yaml file must contain name, description and dataset keys.

Dataset key must contain id, sources, creation_dt, label and search_fields keys.
Dataset key must contain id, sources, grid_spatial, measurements, creation_dt, label, format, and search_fields keys.
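
For illustration, a minimal metadata type document containing these keys might look like the following sketch (the field mappings mirror the ``bare_bone.yaml`` sample updated in this PR)::

    name: barebone
    description: A minimalist metadata type file
    dataset:
        id: [id]
        sources: [lineage, source_datasets]
        grid_spatial: [grid_spatial, projection]
        measurements: [measurements]
        creation_dt: [properties, 'odc:processing_datetime']
        label: [label]
        format: [properties, 'odc:file_format']
        search_fields:
            platform:
                description: Platform code
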
9 changes: 7 additions & 2 deletions docs/config_samples/metadata_types/bare_bone.yaml
@@ -2,10 +2,15 @@
name: barebone
description: A minimalist metadata type file
Contributor:

Not saying we should fix this here, but this is so minimal that it's almost irrelevant.

Practically nobody configures metadata types, I don't think. Possibly something to consider for the future, @SpacemanPaul as the times I change a metadata type is to make a field searchable for a product... allowing arbitrary searches would be great and using metadata to configure indexed searchable fields would be 🚀

Member:

Functionality provided by the metadata subsystem is really handy; it's a shame it can only be configured at "product creation time". Non-index-based searching can be configured purely at runtime, as it is just about constructing a query; there is no need to have an index for the query to be useful. It's just that datacube only allows configuration from the metadata document stored in the DB and linked to a given product. While "stored metadata" is handy, it does not need to be the only way.

Contributor:

The postgis driver requires metadata type documents to be "eo3 compatible" although it's not 100% clear at the moment what that means.

The only use for metadata type documents going forwards appears to be to expose user-friendly metadata aliases for various dataset metadata entries for use in searches (and ensuring those searches are fully indexed).

dataset:
id: [id] # No longer configurable in newer ODCs.
sources: [lineage, source_datasets] # No longer configurable in newer ODCs.
id: [id] # No longer configurable in newer ODCs.
sources: [lineage, source_datasets] # No longer configurable in newer ODCs.

grid_spatial: [grid_spatial, projection]
measurements: [measurements]
creation_dt: [properties, 'odc:processing_datetime']
label: [label]
format: [properties, 'odc:file_format']

search_fields:
platform:
description: Platform code
12 changes: 6 additions & 6 deletions docs/installation/data-preparation-scripts.rst
@@ -1,4 +1,4 @@
Data Preperation Scripts
Data Preparation Scripts
========================

.. note::
@@ -42,11 +42,11 @@ Download the USGS Collection 1 landsat scenes from any of the links below:

The prepare script for collection 1 - level 1 data is available in
`ls_usgs_prepare.py
<https://github.com/opendatacube/datacube-dataset-config/blob/master/old-prep-scripts/ls_usgs_prepare.py>`_.
<https://github.com/opendatacube/datacube-dataset-config/blob/main/old-prep-scripts/ls_usgs_prepare.py>`_.
Contributor:

Ideally, we should update all this for USGS Landsat Collection 2.

If Collection 1 is still available, it's probably safe to assume it won't be for much longer - and we should at least state this here.

Contributor:

Might be better to migrate some of the product-specific docs from over here to replace this: https://github.com/opendatacube/datacube-dataset-config

I don't know that any of the prep scripts have been used for like ... 5+ years!

Contributor Author:

Is it worth keeping this page at all, beyond linking to the datacube-dataset-config repo?

Contributor:

I don't think so! The scripts probably don't work...


::

$ wget https://github.com/opendatacube/datacube-dataset-config/raw/master/old-prep-scripts/ls_usgs_prepare.py
$ wget https://github.com/opendatacube/datacube-dataset-config/raw/main/old-prep-scripts/ls_usgs_prepare.py
$ python ls_usgs_prepare.py --help
Usage: ls_usgs_prepare.py [OPTIONS] [DATASETS]...

@@ -85,14 +85,14 @@ For Landsat collection 1 level 1 product:
To prepare downloaded USGS LEDAPS Landsat scenes for use with the Data Cube, use
the script provided in
`usgs_ls_ard_prepare.py
<https://github.com/opendatacube/datacube-dataset-config/blob/master/agdcv2-ingest/prepare_scripts/landsat_collection/usgs_ls_ard_prepare.py>`_
<https://github.com/opendatacube/datacube-dataset-config/blob/main/agdcv2-ingest/prepare_scripts/landsat_collection/usgs_ls_ard_prepare.py>`_

The following example generates the required Dataset Metadata files, named
`agdc-metadata.yaml` for three landsat scenes.

::

$ wget https://github.com/opendatacube/datacube-dataset-config/raw/master/agdcv2-ingest/prepare_scripts/landsat_collection/usgs_ls_ard_prepare.py
$ wget https://github.com/opendatacube/datacube-dataset-config/raw/main/agdcv2-ingest/prepare_scripts/landsat_collection/usgs_ls_ard_prepare.py
$ python USGS_precollection_oldscripts/usgslsprepare.py --help
Usage: usgslsprepare.py [OPTIONS] [DATASETS]...

@@ -134,7 +134,7 @@ Then :ref:`index the data <indexing>`.
To view an example of how to `index Sentinel-2 data from S3`_ check out the documentation
available in the datacube-dataset-config_ repository.

.. _`index Sentinel-2 data from S3`: https://github.com/opendatacube/datacube-dataset-config/blob/master/sentinel-2-l2a-cogs.md
.. _`index Sentinel-2 data from S3`: https://github.com/opendatacube/datacube-dataset-config/blob/main/sentinel-2-l2a-cogs.md
.. _datacube-dataset-config: https://github.com/opendatacube/datacube-dataset-config/

Custom Prepare Scripts
12 changes: 12 additions & 0 deletions docs/installation/database/setup.rst
@@ -81,6 +81,18 @@ Alternately, you can configure the ODC connection to Postgres using environment
DB_PASSWORD
DB_DATABASE

To configure a database as a single connection url instead of individual environment variables::

export DATACUBE_DB_URL=postgresql://[username]:[password]@[hostname]:[port]/[database]

Alternatively, for password-less access to a database on localhost::

export DATACUBE_DB_URL=postgresql:///[database]

Further information on database configuration can be found `here <https://github.com/opendatacube/datacube-core/wiki/ODC-EP-010---Replace-Configuration-Layer>`__.
Although the enhancement proposal details incoming changes in v1.9 and beyond, it should largely be compatible with the current behaviour, barring a few
slight discrepancies.
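
Once a connection is configured (whether via the individual variables or ``DATACUBE_DB_URL``), you can check that the ODC can reach the database with::

    datacube system check

(This assumes the ``datacube`` CLI is installed in the active environment and the index has been initialised, e.g. with ``datacube system init``.)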

The desired environment can be specified:

1. in code, with the ``env`` argument to the ``datacube.Datacube`` constructor;
7 changes: 4 additions & 3 deletions docs/installation/metadata-types.rst
@@ -5,15 +5,16 @@ A Metadata Type defines which fields should be searchable in your product or dat

Three metadata types are added by default called ``eo``, ``telemetry`` and ``eo3``.

You can see the default metadata types in the repository at `datacube/index/default-metadata-types.yaml <https://github.com/opendatacube/datacube-core/blob/develop/datacube/index/default-metadata-types.yaml>`_.

You would create a new metadata type if you want custom fields to be searchable for your products, or
if you want to structure your metadata documents differently.

You can see the default metadata type in the repository at `datacube/index/default-metadata-types.yaml <https://github.com/opendatacube/datacube-core/blob/develop/datacube/index/default-metadata-types.yaml>`_.

To add or alter metadata types, you can use commands like: ``datacube metadata add <path-to-file>``
and to update: ``datacube metadata update <path-to-file>``. Using ``--allow-unsafe`` will allow
you to update metadata types where the changes may have unexpected consequences.
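
For example (the file name here is illustrative)::

    datacube metadata add my_metadata_type.yaml
    datacube metadata update --allow-unsafe my_metadata_type.yaml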

Note that from version 1.9 onward, only eo3-compatible metadata types will be accepted.
Contributor:

From version 2.0 onwards.

The postgres driver (currently aka "default", from 1.9 aka "legacy") will continue to support non-eo3-compatible metadata types until it is retired in 2.0.

The postgis driver (currently aka "experimental") already only supports eo3-compatible metadata types.

Contributor Author:

Fixed


.. literalinclude:: ../config_samples/metadata_types/bare_bone.yaml
:language: yaml
@@ -22,4 +23,4 @@ you to update metadata types where the changes may have unexpected consequences.

Metadata type yaml file must contain name, description and dataset keys.

Dataset key must contain id, sources, creation_dt, label and search_fields keys.
Dataset key must contain id, sources, grid_spatial, measurements, creation_dt, label, format, and search_fields keys.
83 changes: 23 additions & 60 deletions docs/installation/setup/common_install.rst
@@ -7,36 +7,32 @@ Python and packages

Python 3.8+ is required.

Anaconda Python
---------------

`Install Anaconda Python <https://www.anaconda.com/download/>`_

Add conda-forge to package channels::

conda config --append channels conda-forge
Conda environment setup
-----------------------

Conda environments are recommended for use in isolating your ODC development environment from your system installation and other Python environments.

Install required Python packages and create a conda environment named ``odc_env``.
We recommend you use Mambaforge to set up your conda virtual environment, as all the required packages are obtained from the conda-forge channel.
Download and install it from `here <https://github.com/conda-forge/miniforge#mambaforge>`_.

Python::
Download the latest version of the Open Data Cube from the `repository <https://github.com/opendatacube/datacube-core>`_::

conda create --name odc_env python=3.8 datacube
git clone https://github.com/opendatacube/datacube-core
cd datacube-core

Activate the ``odc_env`` conda environment::
Create a conda environment named ``cubeenv``::

conda activate odc_env
mamba env create -f conda-environment.yml

Find out more about conda environments `here <https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html>`_.
Activate the ``cubeenv`` conda environment::

Install other packages::
conda activate cubeenv

conda install jupyter matplotlib scipy pytest-cov hypothesis
Find out more about conda environments `here <https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html>`_.


Postgres database configuration
===============================
Postgres testing database configuration
=======================================

This configuration supports local development using your login name.

@@ -52,7 +48,7 @@ Set a password for the "postgres" database role using the command::

and set the password when prompted. The password text will be hidden from the console for security purposes.

Type **Control+D** or **\q** to exit the posgreSQL prompt.
Type **Control+D** or **\\q** to exit the PostgreSQL prompt.

By default in Ubuntu, Postgresql is configured to use ``ident sameuser`` authentication for any connections from the same machine which is useful for development. Check out the excellent Postgresql documentation for more information, but essentially this means that if your Ubuntu username is ``foo`` and you add ``foo`` as a Postgresql user then you can connect to a database without requiring a password for many functions.

@@ -61,51 +57,18 @@ Since the only user who can connect to a fresh install is the postgres user, her
sudo -u postgres createuser --superuser $USER
sudo -u postgres psql

postgres=# \password $USER
postgres=# \password <foo>

Now we can create an ``agdcintegration`` database for testing::
Now we can create the ``agdcintegration`` and ``odcintegration`` databases for testing::


Why these names, specifically? Could they be changed to something more descriptive to help the user understand what they're for?

Contributor:

They are the databases used for integration testing. (agdcintegration for tests run against the old default/legacy/postgres index driver, and odcintegration for tests run against the new experimental/postgis index driver).

Member:

These databases are used for "integration tests", hence the name. But I agree it's not the most obvious name. Ideally this should be captured as a CLI tool, something like datacube bootstrap --test-db.


I feel people won't necessarily recognise "agdc" as the legacy option and "odc" as the new option. Would something like: postgresintegration and postgisintegration be more descriptive? Or legacyintegration and currentintegration?

Member:

By the way, that's only a convention; one could use any database names, as these things are configured via ~/.datacube_integration.conf (see docs below). And we should be testing with non-default names as well, to catch any hard-coded assumptions in tests.

Contributor:

It would be fantastic to purge this repo of any remaining references to "AGDC" - that name should no longer appear in any ODC-branded repo IMO

Contributor Author:

I believe all references to 'agdc' stem from the postgres driver which is going to be deprecated in v2 anyway. I'm not sure doing a potentially breaking rename at this stage would be worth it.


postgres=# create database agdcintegration;
postgres=# create database odcintegration;

Or, directly from the bash terminal::

createdb agdcintegration
createdb odcintegration

Connecting to your own database to try out some SQL should now be as easy as::

psql -d agdcintegration


Open Data Cube source and development configuration
===================================================

Download the latest version of the software from the `repository <https://github.com/opendatacube/datacube-core>`_ ::

git clone https://github.com/opendatacube/datacube-core
cd datacube-core

We need to specify the database user and password for the ODC integration testing. To do this::

cp integration_tests/agdcintegration.conf ~/.datacube_integration.conf

Then edit the ``~/.datacube_integration.conf`` with a text editor and add the following lines replacing ``<foo>`` with your username and ``<foobar>`` with the database user password you set above (not the postgres one, your ``<foo>`` one)::

[datacube]
db_hostname: localhost
db_database: agdcintegration
db_username: <foo>
db_password: <foobar>

Note: For Ubuntu Setup the db_hostname should be set to "/var/run/postgresql". For more refer: https://github.com/opendatacube/datacube-core/issues/1329

Verify it all works
===================

Run the integration tests::

cd datacube-core
./check-code.sh integration_tests

Build the documentation::

cd datacube-core/docs
pip install -r requirements.txt
make html

Then open :file:`_build/html/index.html` in your browser to view the Documentation.
47 changes: 47 additions & 0 deletions docs/installation/setup/macosx.rst
@@ -23,3 +23,50 @@ Postgres:


.. include:: common_install.rst


You can now specify the database user and password for ODC integration testing. To do this::

cp integration_tests/agdcintegration.conf ~/.datacube_integration.conf

Then edit the ``~/.datacube_integration.conf`` with a text editor and add the following lines, replacing ``<foo>`` with your username and ``<foobar>`` with the database user password you set above (not the postgres one, your ``<foo>`` one)::

[datacube]
db_hostname: localhost
db_database: agdcintegration
index_driver: default
db_username: <foo>
db_password: <foobar>

[experimental]
db_hostname: localhost
db_database: odcintegration
index_driver: postgis
db_username: <foo>
db_password: <foobar>


Verify it all works
===================

Install additional test dependencies::

cd datacube-core
pip install --upgrade -e '.[test]'

Run the integration tests::

./check-code.sh integration_tests

Note: if moto-based AWS-mock tests fail, you may need to unset all AWS environment variables.
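
One way to do this, for example (the exact list of variables to clear depends on your shell environment)::

    unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_DEFAULT_REGION AWS_PROFILE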

Build the documentation::

pip install --upgrade -e '.[doc]'
cd docs
pip install -r requirements.txt
sudo apt install make
sudo apt install pandoc
make html

Then open :file:`_build/html/index.html` in your browser to view the Documentation.