Optimizations to catalog_downloader. (#182)
* Optimizations to catalog_downloader.

* Downloader: add early-out optimization if collection_filter is defined.

* Add flake8-logging-format to testing requirements.txt

* Update .flake8 with enable-extensions as per flake8-logging-format.

* Fix new flake8 errors.

* Fix some errors from pylint (even though flake8 did not spot them).

* Update docs to match new default log level in Dataset.download().
Alex G Rice committed Nov 7, 2022
1 parent c80bd7c commit 5cd7813
Showing 9 changed files with 178 additions and 82 deletions.
3 changes: 2 additions & 1 deletion .flake8
@@ -13,4 +13,5 @@ exclude =
radiant_mlhub.egg-info
build
max-line-length = 140
per-file-ignores = __init__.py:F401
per-file-ignores = __init__.py:F401
enable-extensions = G
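For context, ``enable-extensions = G`` turns on the checks from flake8-logging-format, which flag log calls that format the message eagerly (for example with an f-string) instead of passing the values as arguments to the logger. The snippet below is a minimal sketch of the style those checks encourage. The logger name ``example.downloader`` and the asset count are hypothetical stand-ins, not code from this commit.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

# Hypothetical logger name, used only for this illustration.
log = logging.getLogger("example.downloader")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

asset_count = 2825  # illustrative value

# Preferred: pass arguments and let the logging module format them lazily.
# An f-string here (f"{asset_count} unique assets ...") would be flagged.
log.info("%d unique assets in stac catalog.", asset_count)

output = stream.getvalue()
```

Deferring the formatting means the string is only built when a handler actually emits the record.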
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -15,12 +15,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- Performance improvements to Dataset.download() ([182](https://github.com/radiantearth/radiant-mlhub/pull/182))
- Enable INFO level logging by default in Dataset.download() ([182](https://github.com/radiantearth/radiant-mlhub/pull/182))

### Fixed

### Deprecated

### Developer

- Add flake8-logging-format package ([182](https://github.com/radiantearth/radiant-mlhub/pull/182))

## [v0.5.3]

### Added
8 changes: 8 additions & 0 deletions docs/source/api/radiant_mlhub.client.rst
@@ -28,6 +28,14 @@ radiant\_mlhub.client.datasets module
:undoc-members:
:show-inheritance:

radiant\_mlhub.client.datetime\_utils module
--------------------------------------------

.. automodule:: radiant_mlhub.client.datetime_utils
:members:
:undoc-members:
:show-inheritance:

radiant\_mlhub.client.ml\_models module
---------------------------------------

8 changes: 8 additions & 0 deletions docs/source/api/radiant_mlhub.rst
@@ -29,6 +29,14 @@ radiant\_mlhub.if\_exists module
:undoc-members:
:show-inheritance:

radiant\_mlhub.retry\_config module
-----------------------------------

.. automodule:: radiant_mlhub.retry_config
:members:
:undoc-members:
:show-inheritance:

radiant\_mlhub.session module
-----------------------------

34 changes: 21 additions & 13 deletions docs/source/datasets.rst
@@ -204,10 +204,13 @@ The output directory is the current working directory (by default).
>>> print(nasa_marine_debris)
nasa_marine_debris: Marine Debris Dataset for Object Detection in Planetscope Imagery
>>> nasa_marine_debris.download()
nasa_marine_debris: fetch stac catalog: 258KB [00:00, 75252.46KB/s]
unarchive nasa_marine_debris.tar.gz: 100%|████████████████████████████████████| 2830/2830 [00:00<00:00, 14185.00it/s]
download assets: 100%|█████████████████████████████████████████████████████████████| 2825/2825 [00:19<00:00, 145.36it/s]
nasa_marine_debris: fetch stac catalog: 258KB [00:00, 412.53KB/s]
INFO:radiant_mlhub.client.catalog_downloader:unarchive nasa_marine_debris.tar.gz ...
unarchive nasa_marine_debris.tar.gz: 100%|████████████████████| 2830/2830 [00:00<00:00, 5772.09it/s]
INFO:radiant_mlhub.client.catalog_downloader:create stac asset list (please wait) ...
INFO:radiant_mlhub.client.catalog_downloader:2825 unique assets in stac catalog.
download assets: 100%|██████████████████████| 2825/2825 [03:27<00:00, 13.62it/s]
INFO:radiant_mlhub.client.catalog_downloader:assets saved to nasa_marine_debris
Download STAC Catalog Archive Only
----------------------------------
@@ -227,20 +230,25 @@ the assets just pass the ``catalog_only`` option to the download method:
Logging
-------
The Python logging module can be used to control the verbosity of the download. Turn in INFO or DEBUG messages to see additional messages:
The `Python logging module <https://docs.python.org/3/howto/logging.html>`_ can
be used to control the verbosity of the downloader. The default log level is
INFO.
* Turn on WARNING level to see fewer log messages.
* Set DEBUG level to see more messages. This includes verbose HTTP-level log messages.
.. code-block:: python
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> logging.basicConfig(level=logging.DEBUG)
>>> nasa_marine_debris.download()
nasa_marine_debris: fetch stac catalog: 258KB [00:00, 34940.12KB/s]
INFO:radiant_mlhub.client.catalog_downloader:unarchive nasa_marine_debris.tar.gz...
unarchive nasa_marine_debris.tar.gz: 100%|████████████████████████████████████| 2830/2830 [00:00<00:00, 14191.09it/s]
INFO:radiant_mlhub.client.catalog_downloader:create stac asset list...
INFO:radiant_mlhub.client.catalog_downloader:2825 unique assets in stac catalog.
download assets: 100%|█████████████████████████████████████████████████████████████| 2825/2825 [00:18<00:00, 152.37it/s]
INFO:radiant_mlhub.client.catalog_downloader:assets saved to /home/user/nasa_marine_debris
...
DEBUG:radiant_mlhub.client.catalog_downloader:(thread id: 123145809592320) https://radiantearth.blob.core.windows.net/mlhub/nasa-marine-debris/labels/20170326_153234_0e26_17069-29758-16.npy -> .../nasa_marine_debris/nasa_marine_debris_labels/nasa_marine_debris_labels_20170326_153234_0e26_17069-29758-16/pixel_bounds.npy
...
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): radiantearth.blob.core.windows.net:443
DEBUG:urllib3.connectionpool:https://radiantearth.blob.core.windows.net:443 "HEAD /mlhub/nasa-marine-debris/labels/20181031_095925_103b_32713-31765-16.npy HTTP/1.1" 200 0
...
(omitted many log messages here)
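As a sketch of finer-grained control, the standard ``logging`` module also allows levels to be set per logger rather than globally. The logger names below are taken from the log output shown above. Whether ``Dataset.download()`` resets these levels internally is not confirmed here, so treat this as an assumption to verify against the library.

```python
import logging

# Root configuration: show DEBUG and above for everything by default.
logging.basicConfig(level=logging.DEBUG)

# Logger names as they appear in the log output above.
downloader_log = logging.getLogger("radiant_mlhub.client.catalog_downloader")
http_log = logging.getLogger("urllib3.connectionpool")

# Keep the downloader's own DEBUG messages, but hide the verbose
# HTTP-level messages emitted by urllib3.
downloader_log.setLevel(logging.DEBUG)
http_log.setLevel(logging.WARNING)
```

With this configuration, the per-asset ``DEBUG`` lines from the downloader remain visible, while the ``urllib3`` connection messages are suppressed unless they are warnings or errors.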
Output Directory
----------------
10 changes: 7 additions & 3 deletions docs/source/getting_started.rst
@@ -170,9 +170,13 @@ is relatively small in size. The downloader can also scale up to the largest dat
>>> print(dataset.estimated_dataset_size) # OK the total dataset assets are ~77MB
77207762
>>> dataset.download()
nasa_marine_debris: fetch stac catalog: 258KB [00:00, 404.83KB/s]
unarchive nasa_marine_debris.tar.gz: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 2830/2830 [00:00<00:00, 4744.75it/s]
download assets: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2825/2825 [03:48<00:00, 12.36it/s]
nasa_marine_debris: fetch stac catalog: 258KB [00:00, 412.53KB/s]
INFO:radiant_mlhub.client.catalog_downloader:unarchive nasa_marine_debris.tar.gz ...
unarchive nasa_marine_debris.tar.gz: 100%|████████████████████| 2830/2830 [00:00<00:00, 5772.09it/s]
INFO:radiant_mlhub.client.catalog_downloader:create stac asset list (please wait) ...
INFO:radiant_mlhub.client.catalog_downloader:2825 unique assets in stac catalog.
download assets: 100%|██████████████████████| 2825/2825 [03:27<00:00, 13.62it/s]
INFO:radiant_mlhub.client.catalog_downloader:assets saved to nasa_marine_debris
The :meth:`Dataset.download <radiant_mlhub.models.Dataset.download>` method
saves the STAC catalog and assets into your current working directory (by default).
