
Commit

Merge pull request #1513 from djhoese/feature-data-download
Add auxiliary data download API
djhoese committed Feb 12, 2021
2 parents f24a874 + f7eb5fb commit edcceb1
Showing 17 changed files with 874 additions and 60 deletions.
1 change: 1 addition & 0 deletions continuous_integration/environment.yaml
@@ -39,6 +39,7 @@ dependencies:
- fsspec
- pylibtiff
- python-geotiepoints
- pooch
- pip
- pip:
- trollsift
1 change: 1 addition & 0 deletions doc/rtd_environment.yml
@@ -10,6 +10,7 @@ dependencies:
- graphviz
- numpy
- pillow
- pooch
- pyresample
- setuptools
- setuptools_scm
1 change: 1 addition & 0 deletions doc/source/conf.py
@@ -276,4 +276,5 @@ def __getattr__(cls, name):
'xarray': ('https://xarray.pydata.org/en/stable', None),
'rasterio': ('https://rasterio.readthedocs.io/en/latest', None),
'donfig': ('https://donfig.readthedocs.io/en/latest', None),
'pooch': ('https://www.fatiando.org/pooch/latest/', None),
}
17 changes: 17 additions & 0 deletions doc/source/config.rst
@@ -133,6 +133,8 @@ merging of configuration files, they are merged in reverse order. This means
"base" configuration paths should be at the end of the list and custom/user
paths should be at the beginning of the list.

.. _data_dir_setting:

Data Directory
^^^^^^^^^^^^^^

@@ -146,6 +148,21 @@ defaults to a different path depending on your operating system following the
`appdirs <https://github.com/ActiveState/appdirs#some-example-output>`_
"user data dir".

.. _download_aux_setting:

Download Auxiliary Data
^^^^^^^^^^^^^^^^^^^^^^^

* **Environment variable**: ``SATPY_DOWNLOAD_AUX``
* **YAML/Config Key**: ``download_aux``
* **Default**: True

Whether to allow downloading of auxiliary files for certain Satpy operations.
See :doc:`dev_guide/aux_data` for more information. If ``True``, Satpy
will download and cache any necessary data files to :ref:`data_dir_setting`
when needed. If ``False``, pre-downloaded files will be used, but no
other files will be downloaded or checked for validity.
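As a rough illustration of these rules, the sketch below interprets ``SATPY_DOWNLOAD_AUX`` and decides whether a single file needs downloading. The helper names and the set of strings treated as "false" are assumptions made for this example. Satpy itself reads the setting through its donfig-based ``satpy.config`` object, and this sketch ignores hash verification entirely.

```python
import os

# Hypothetical helpers illustrating the documented ``download_aux`` semantics.
# They are not Satpy's code; Satpy reads this setting via ``satpy.config``.
_FALSE_STRINGS = {"0", "false", "no", "off"}  # assumed spellings of "False"


def download_aux_enabled(environ=None):
    """Return True if auxiliary downloads are allowed (default: True)."""
    environ = os.environ if environ is None else environ
    value = environ.get("SATPY_DOWNLOAD_AUX")
    if value is None:
        return True  # documented default
    return value.strip().lower() not in _FALSE_STRINGS


def needs_download(local_path, download_aux):
    """Decide what to do with one auxiliary file (hash checks are ignored)."""
    if os.path.exists(local_path):
        return False  # a pre-downloaded file is used as-is
    if not download_aux:
        raise FileNotFoundError(
            f"{local_path} is not cached and auxiliary downloads are disabled")
    return True  # missing and downloads are allowed: caller should download
```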

.. _component_configuration:

Component Configuration
122 changes: 122 additions & 0 deletions doc/source/dev_guide/aux_data.rst
@@ -0,0 +1,122 @@
Auxiliary Data Download
=======================

Some Satpy components need extra data files to work properly, such as
Look-Up Tables (LUTs), coefficients, or Earth model data (ex. elevations).
This covers any file too large to be included in the Satpy Python package,
which in practice means anything bigger than a small text file. To handle
these, Satpy includes utilities that download and cache such files only when
the component that needs them is used. This saves users from wasting time and
disk space on files they may never use. This functionality is built on the
`Pooch library <https://www.fatiando.org/pooch/latest/>`_.

Downloaded files are stored in the directory configured by
:ref:`data_dir_setting`.

Adding download functionality
-----------------------------

Downloading data with these utilities is a two-step process:

1. **Registering**: Tell Satpy what files might need to be downloaded and used
later.
2. **Retrieving**: Ask Satpy to download and store the files locally.

Registering
^^^^^^^^^^^

Registering a file for downloading tells Satpy the file's remote URL and,
optionally, a hash used to verify that the download succeeded.
Registering can also include a ``filename`` telling Satpy what to name the
file when it is downloaded; if not provided, the name is taken from the URL.
Once a file is registered, Satpy can be told to retrieve it (see below) by
using a "cache key". Cache keys follow the general scheme
``<component_type>/<filename>`` (ex. ``readers/README.rst``).
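A minimal sketch of the registration step, assuming a plain dictionary serves as the registry. ``register_file_sketch`` and ``REGISTRY`` are hypothetical names for illustration, not Satpy's :func:`~satpy.aux_download.register_file`; the sketch only shows how a cache key can follow the ``<component_type>/<filename>`` scheme, with the filename falling back to the last part of the URL path.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical in-memory registry: cache key -> download information.
# Not Satpy's implementation; it only illustrates the cache key scheme.
REGISTRY = {}


def register_file_sketch(url, component_type, filename=None, known_hash=None):
    """Register a remote file and return its cache key."""
    if filename is None:
        # Fall back to the last path segment of the URL (query string ignored).
        filename = posixpath.basename(urlparse(url).path)
    cache_key = f"{component_type}/{filename}"
    REGISTRY[cache_key] = {"url": url, "known_hash": known_hash}
    return cache_key
```

For the README URL used in the YAML example later in this document, ``register_file_sketch(url, "readers")`` produces the key ``readers/README.rst``.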

Satpy includes a low-level function and a high-level mixin class for
registering files. The higher-level class is recommended for Satpy components
such as readers, writers, and compositors. The lower-level
:func:`~satpy.aux_download.register_file` function can be used in any other
case.

The :class:`~satpy.aux_download.DataDownloadMixin` class is automatically
included in the :class:`~satpy.readers.yaml_reader.FileYAMLReader` and
:class:`~satpy.writers.Writer` base classes. For any other component (like
a compositor) you should include it as another parent class:

.. code-block:: python

    from satpy.aux_download import DataDownloadMixin
    from satpy.composites import GenericCompositor


    class MyCompositor(GenericCompositor, DataDownloadMixin):
        """Compositor that uses downloaded files."""

        def __init__(self, name, url=None, known_hash=None, **kwargs):
            super().__init__(name, **kwargs)
            # Register the file while the compositor is being initialized
            data_files = [{'url': url, 'known_hash': known_hash}]
            self.register_data_files(data_files)

However your code registers files, to be consistent it must do so during
initialization so that the :func:`~satpy.aux_download.find_registerable_files`
function can find them. If your component isn't a reader, writer, or
compositor, then this function will need to be updated to find and load your
registered files. See :ref:`offline_aux_downloads` below for more information.

As mentioned, the mixin class is included in the base reader and writer
classes. To register files in these cases, include a ``data_files`` section in
your YAML configuration file. For readers this goes under the ``reader``
section and for writers under the ``writer`` section. This parameter is a list
of dictionaries, each including a ``url`` and optionally a ``known_hash`` and a
``filename``. For example::

    reader:
      name: abi_l1b
      short_name: ABI L1b
      long_name: GOES-R ABI Level 1b
      ... other metadata ...
      data_files:
        - url: "https://example.com/my_data_file.dat"
        - url: "https://raw.githubusercontent.com/pytroll/satpy/master/README.rst"
          known_hash: "sha256:5891286b63e7745de08c4b0ac204ad44cfdb9ab770309debaba90308305fa759"
        - url: "https://raw.githubusercontent.com/pytroll/satpy/master/RELEASING.md"
          filename: "satpy_releasing.md"
          known_hash: null

See the :class:`~satpy.aux_download.DataDownloadMixin` for more information.
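The ``known_hash`` values above use an ``<algorithm>:<hexdigest>`` form, and ``null`` (or omitting the key) means no check. The sketch below shows how such a value could be compared against downloaded bytes with :mod:`hashlib`. ``verify_known_hash`` is a hypothetical helper, not part of Satpy (Pooch performs the real verification), and treating a bare digest as SHA256 is an assumption of this sketch.

```python
import hashlib


# Hypothetical helper, not part of Satpy: checks bytes against a known_hash.
def verify_known_hash(content, known_hash):
    """Return True if ``content`` (bytes) matches ``known_hash``.

    A ``known_hash`` of ``None`` (YAML ``null`` or omitted) skips the check.
    """
    if known_hash is None:
        return True
    algorithm, _, expected = known_hash.partition(":")
    if not expected:
        # Assumption: a digest without an algorithm prefix is SHA256.
        algorithm, expected = "sha256", algorithm
    digest = hashlib.new(algorithm.lower(), content).hexdigest()
    return digest == expected.lower()
```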

Retrieving
^^^^^^^^^^

Files that have been registered (see above) can be retrieved by calling the
:func:`~satpy.aux_download.retrieve` function. This function expects a single
argument: the cache key. Cache keys are returned by registering functions, but
can also be pre-determined by following the scheme
``<component_type>/<filename>`` (ex. ``readers/README.rst``).
Retrieving a file will download it to local disk if needed and then return
the local pathname. Data is stored locally in the :ref:`data_dir_setting`.
It is up to the caller to then open the file.
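The retrieval behaviour described above can be sketched as follows, assuming a registry that maps cache keys to URLs and an injected ``fetch(url) -> bytes`` callable standing in for the real download (Satpy uses Pooch for this). ``retrieve_sketch`` is a hypothetical name, not the signature of :func:`~satpy.aux_download.retrieve`, and it performs no hash checking.

```python
import os


# Hypothetical sketch of "download if needed, then return the local path".
def retrieve_sketch(cache_key, registry, data_dir, fetch):
    """Return the local path for ``cache_key``, downloading it if missing."""
    # Mirror the <component_type>/<filename> key as a directory layout
    local_path = os.path.join(data_dir, *cache_key.split("/"))
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        content = fetch(registry[cache_key])  # download only when not cached
        with open(local_path, "wb") as out:
            out.write(content)
    return local_path  # opening the file is left to the caller
```

Calling it a second time with the same key returns the cached path without calling ``fetch`` again.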

.. _offline_aux_downloads:

Offline Downloads
-----------------

To assist with operational environments, Satpy includes a
:func:`~satpy.aux_download.retrieve_all` function that will try to find all
files that Satpy components may need to download in the future and download
them to the directory currently specified by :ref:`data_dir_setting`.
This function allows you to specify a list of ``readers``, ``writers``, or
``composite_sensors`` to limit which components are checked for files to
download.
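A simplified sketch of how limiting ``retrieve_all`` to certain components might work, assuming the registered cache keys are already grouped by component type and by component name (sensor name for composites). ``retrieve_all_sketch`` and this grouping are hypothetical; Satpy's real function has to discover the registered files itself before downloading them.

```python
# Hypothetical sketch, not Satpy's retrieve_all implementation.
def retrieve_all_sketch(components, retrieve, readers=None, writers=None,
                        composite_sensors=None):
    """Retrieve every registered file, optionally limited to named components.

    ``components`` maps a component type ("readers", "writers", "composites")
    to ``{component_name: [cache_key, ...]}``. A filter of ``None`` means
    "check every component of that type".
    """
    filters = {"readers": readers, "writers": writers,
               "composites": composite_sensors}
    paths = []
    for component_type, by_name in components.items():
        wanted = filters.get(component_type)
        for name, cache_keys in by_name.items():
            if wanted is not None and name not in wanted:
                continue  # component not requested, skip its files
            paths.extend(retrieve(key) for key in cache_keys)
    return paths
```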

The ``retrieve_all`` function is also available through a command line script
called ``satpy_retrieve_all_aux_data``. Run the following for usage information.

.. code-block:: bash

    satpy_retrieve_all_aux_data --help

To make sure that no additional files are downloaded when running Satpy, see
:ref:`download_aux_setting`.
10 changes: 9 additions & 1 deletion doc/source/dev_guide/custom_reader.rst
@@ -571,4 +571,12 @@ One way of implementing a file handler is shown below:
# left as an exercise to the reader :)
If you have any questions, please contact the
:ref:`Satpy developers <dev_help>`.
:ref:`Satpy developers <dev_help>`.

Auxiliary File Download
-----------------------

If your reader needs additional data files for calibrations, corrections,
or anything else, see the :doc:`aux_data` document for more information on
how to download and cache these files without including them in the Satpy
Python package.
1 change: 1 addition & 0 deletions doc/source/dev_guide/index.rst
@@ -16,6 +16,7 @@ at the pages listed below.
custom_reader
plugins
satpy_internals
aux_data

Coding guidelines
=================
6 changes: 4 additions & 2 deletions satpy/_config.py
@@ -38,6 +38,7 @@
'cache_dir': _satpy_dirs.user_cache_dir,
'data_dir': _satpy_dirs.user_data_dir,
'config_path': [],
'download_aux': True,
}

# Satpy main configuration object
@@ -125,13 +126,14 @@ def config_search_paths(filename, search_dirs=None, **kwargs):
return paths[::-1]


def glob_config(pattern):
def glob_config(pattern, search_dirs=None):
"""Return glob results for all possible configuration locations.
Note: This method does not check the configuration "base" directory if the pattern includes a subdirectory.
This is done for performance since this is usually used to find *all* configs for a certain component.
"""
patterns = config_search_paths(pattern, check_exists=False)
patterns = config_search_paths(pattern, search_dirs=search_dirs,
check_exists=False)
for pattern_fn in patterns:
for path in glob.iglob(pattern_fn):
yield path
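The new ``search_dirs`` argument of ``glob_config`` is passed through to ``config_search_paths``, so callers can choose which directories are globbed. The stand-in below, ``glob_config_sketch`` (a hypothetical name), illustrates the idea by globbing a pattern inside each given directory. It is a simplification: it only searches the directories it is given and skips whatever extra locations the real ``config_search_paths`` may add.

```python
import glob
import os


# Hypothetical, simplified stand-in for satpy._config.glob_config.
def glob_config_sketch(pattern, search_dirs):
    """Yield files matching ``pattern`` inside each of ``search_dirs``."""
    for search_dir in search_dirs:
        # Like the real function, candidates are globbed lazily with iglob
        yield from glob.iglob(os.path.join(search_dir, pattern))
```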
