Abstract Base Classes for index backend interface(s). #1226

SpacemanPaul · 2022-02-07T05:01:09Z

Reason for this pull request

This is a first baby step towards implementing ODC-EP02, ODC-EP03 and ODC-EP04. [^1] The ODC includes some support for defining and using alternative index backends. However, only one backend implementation actually exists, and the only definition of the interface that a new index backend is supposed to conform to is the implementation of that default index backend.

This PR makes no significant code changes.

[^1] https://github.com/opendatacube/datacube-core/wiki/enhancement-proposals

Proposed changes

I have created a family of Abstract Base Classes (abc.ABC) for the key components of the index backend. At this stage they are simply a copy of the methods on the index backend that form the API used by the rest of the ODC. The main difference is they have been fully annotated with type hints.

As work on EP02 and EP03 continues, it is likely that these ABC interfaces will evolve. A key aim is to keep the legacy default index driver backwards compatible with its existing functionality as much as possible. This PR should not result in any functional changes. The only difference is that some internal ODC classes now have new Abstract Base Classes in their class hierarchy.
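As a rough illustration of the pattern (not the actual classes in this PR; `AbstractProductStore` and `InMemoryProductStore` are hypothetical names), an abstract base class pins down the interface with type hints, and a backend only has to subclass it:

```python
from abc import ABC, abstractmethod
from typing import Iterable, Optional


class AbstractProductStore(ABC):
    """Hypothetical interface: the methods every index backend must provide."""

    @abstractmethod
    def get_by_name(self, name: str) -> Optional[str]:
        """Return the product called ``name``, or None if it is unknown."""

    @abstractmethod
    def get_all(self) -> Iterable[str]:
        """Return every product known to the index."""


class InMemoryProductStore(AbstractProductStore):
    """Toy backend, used only to show the ABC in a class hierarchy."""

    def __init__(self, products: Iterable[str]) -> None:
        self._products = list(products)

    def get_by_name(self, name: str) -> Optional[str]:
        return name if name in self._products else None

    def get_all(self) -> Iterable[str]:
        return iter(self._products)
```

Instantiating `AbstractProductStore` directly raises `TypeError`, so a backend that forgets a method fails at construction time rather than at first use.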

I've also added type hints to datacube.utils.changes - mostly for my own sanity. It's much easier to follow now, and I've fixed an historic inconsistency in the return type of the allow_any function.

Note: Some API inconsistencies have been flagged as TODOs in comments. These will be addressed in a future PR. I wanted to avoid actual API changes in the default index driver in this PR.

codecov · 2022-02-07T05:08:58Z

Codecov Report

Merging #1226 (22c5869) into develop (b062184) will increase coverage by 0.03%.
The diff coverage is 96.10%.

@@             Coverage Diff             @@
##           develop    #1226      +/-   ##
===========================================
+ Coverage    93.75%   93.79%   +0.03%     
===========================================
  Files          102      103       +1     
  Lines        10406    10597     +191     
===========================================
+ Hits          9756     9939     +183     
- Misses         650      658       +8

Impacted Files	Coverage Δ
datacube/index/abstract.py	`94.89% <94.89%> (ø)`
datacube/drivers/indexes.py	`91.30% <100.00%> (ø)`
datacube/index/_datasets.py	`94.81% <100.00%> (-0.02%)`	⬇️
datacube/index/_metadata_types.py	`94.25% <100.00%> (-0.54%)`	⬇️
datacube/index/_products.py	`93.93% <100.00%> (-0.55%)`	⬇️
datacube/index/_users.py	`100.00% <100.00%> (ø)`
datacube/index/index.py	`100.00% <100.00%> (+4.00%)`	⬆️
datacube/utils/changes.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b062184...22c5869.

woodcockr · 2022-02-07T22:16:21Z

Good to see these EPs able to get some attention now. Great!
For me, one interesting side effect of pulling the abstract interfaces out from the rest of the code was that it focused the question of what we really need next (e.g. hyperspectral and 3D spatial as part of the index and query mechanisms).
I need to go blow the dust off the EPs (and my brain) and see what comments I need to add to the suggestion list.

SpacemanPaul · 2022-02-07T22:31:13Z

Thanks Rob. I completely agree, that's why this is where I started!

Kirill888 · 2022-02-10T05:34:20Z

This looks fine, but I'm not sure about return types being Iterable[..]. I think a function that uses yield .. should be declared to be -> Iterator[..], and Iterable[..] is mostly used on input types. Kinda like -> Dict[..] if that's what you return, but : Mapping[..] if that's what you accept.
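A minimal sketch of that convention, with made-up function names (nothing here is taken from the PR's code): accept the broad abstract types, and declare the concrete thing you actually return.

```python
from typing import Dict, Iterable, Iterator, Mapping


def product_names(definitions: Iterable[Mapping[str, str]]) -> Iterator[str]:
    """Accepts any iterable of mappings; a generator function returns an Iterator."""
    for definition in definitions:
        yield definition["name"]


def index_by_name(
    definitions: Iterable[Mapping[str, str]],
) -> Dict[str, Mapping[str, str]]:
    """Accepts the broad Mapping type, but promises the concrete Dict it builds."""
    return {definition["name"]: definition for definition in definitions}
```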

SpacemanPaul · 2022-02-10T05:40:49Z

> This looks fine, but I'm not sure about return types being Iterable[..]. I think a function that uses yield .. should be declared to be -> Iterator[..], and Iterable[..] is mostly used on input types. Kinda like -> Dict[..] if that's what you return, but : Mapping[..] if that's what you accept.

In general I would agree. But here I'm deliberately trying to be a bit more minimal, so as to give driver developers more flexibility in how they implement the interface.

Kirill888 · 2022-02-10T05:52:57Z

> > This looks fine, but I'm not sure about return types being Iterable[..]. I think a function that uses yield .. should be declared to be -> Iterator[..], and Iterable[..] is mostly used on input types. Kinda like -> Dict[..] if that's what you return, but : Mapping[..] if that's what you accept.
>
> In general I would agree. But here I'm deliberately trying to be a bit more minimal, so as to give driver developers more flexibility in how they implement the interface.

Honestly, I would go the other way. Force -> List[Product|MetadataType] for those APIs that currently return -> Iterable[Product|MetadataType], I think only dataset query needs to be lazy. Product list being lazy just annoys me every time I try to use it in the notebook, and I don't see when you'd need it to be lazy.

SpacemanPaul · 2022-02-10T08:26:14Z

> > In general I would agree. But here I'm deliberately trying to be a bit more minimal, so as to give driver developers more flexibility in how they implement the interface.
>
> Honestly, I would go the other way. Force -> List[Product|MetadataType] for those APIs that currently return -> Iterable[Product|MetadataType], I think only dataset query needs to be lazy. Product list being lazy just annoys me every time I try to use it in the notebook, and I don't see when you'd need it to be lazy.

Yeah, I can see your point, and I probably will take this approach as I start to develop the new postgis index driver - but my approach at this stage is to avoid changing the behaviour of the existing legacy/default Postgres index driver as much as possible for as long as possible. (And certainly my intention was that this PR in particular should not change the behaviour of the legacy/default/postgres/only index driver at all.)

We may well revisit this question further down the track, but I don't think this PR is the right place for it.
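For readers following the trade-off discussed in this thread, here is a small sketch of the behavioural difference (the function and product names are invented for illustration and are not the index API): a generator satisfies an `Iterable[...]` return annotation but can only be consumed once, whereas a list can be re-read and indexed.

```python
from typing import Iterator, List


def lazy_products() -> Iterator[str]:
    """Generator-backed result: satisfies Iterable[str], but is single-pass."""
    yield from ("product_a", "product_b")  # hypothetical product names


def eager_products() -> List[str]:
    """List-backed result: can be indexed, measured and iterated repeatedly."""
    return list(lazy_products())


lazy = lazy_products()
first_pass = list(lazy)
second_pass = list(lazy)  # empty: the generator is already exhausted
eager = eager_products()
```

The single-pass behaviour is the kind of surprise that shows up in interactive use such as a notebook, while the lazy form leaves a driver free to stream results.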

jeremyh

Looks good to me 👍

I'm a little uneasy about leaving the old types in docstrings everywhere when they duplicate real type hints. Many of them are different to the real type hints but not any more understandable.

(I use autodoc-typehints in eodatasets' docs to try to avoid that duplication)
I suspect some of the methods marked as taking a "UUID" param can already accept a string too (ie, a "DSID"), but I haven't tested. It's not a high priority, but API consistency is nice.

To bikeshed (very optional): I'm not a huge fan of the name "DSID", as it's a non-standard contraction, and method signatures use it alongside other UUIDs (eg: the return type) which are all technically dataset ids. I don't know what name would be better: perhaps NonstrictUUID or something :)
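A hedged sketch of what such a permissive id type could look like, assuming it is simply "a `UUID` or its string form"; the names `DatasetIdLike` and `to_uuid` are placeholders for illustration, not identifiers from the PR:

```python
from typing import Union
from uuid import UUID

# Hypothetical alias: either a real UUID or its string representation.
DatasetIdLike = Union[str, UUID]


def to_uuid(dsid: DatasetIdLike) -> UUID:
    """Normalise a permissive dataset id to a UUID at the API boundary."""
    return dsid if isinstance(dsid, UUID) else UUID(dsid)
```

Normalising once at the boundary would let methods accept either form while always returning real UUIDs.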

jeremyh · 2022-02-10T22:48:25Z

datacube/index/abstract.py

+        """
+
+    @abstractmethod
+    def get_all_dataset_ids(self, archived: bool) -> Iterable[str]:


I didn't notice this before, but we have one and only one method that returns dataset ids as strings instead of uuids :/

SpacemanPaul · 2022-02-10

Haha - that one is absolutely my fault from a previous PR - I will fix.

jeremyh · 2022-02-10T22:51:58Z

datacube/index/abstract.py

+        """
+        Perform a search, returning count of results.
+
+        :param dict[str,str|float|datacube.model.Range] query:


This param was wrong in the old docs. It's repeated in many other docstrings below too.

If we keep it, it should be a union, not a dict, and have the other field types in it: datetime, int.

SpacemanPaul · 2022-02-10

Good pickup, shall review.
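One way the suggestion above could be expressed as a real type hint, sketched under assumptions: `Range` below is a stand-in for `datacube.model.Range` (assumed to be a begin/end pair), and `QueryField` and `field_types` are hypothetical names. Because the annotation on `**query` describes each value rather than the dict as a whole, a union is the right shape.

```python
import datetime
from typing import Dict, NamedTuple, Union


class Range(NamedTuple):
    """Stand-in for datacube.model.Range, assumed to be a begin/end pair."""
    begin: object
    end: object


# Hypothetical alias covering the field types mentioned in the review.
QueryField = Union[str, int, float, datetime.datetime, Range]


def field_types(**query: QueryField) -> Dict[str, str]:
    """Toy function: report the type of each search term that was passed."""
    return {name: type(value).__name__ for name, value in query.items()}
```

For example, `field_types(platform="landsat-8", cloud_cover=Range(0, 10))` type-checks because each keyword value is one member of the union.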

Abstract Base Classes for index backend interface(s).

6e5b752

SpacemanPaul added 4 commits February 10, 2022 10:42

Add typehints to utils.changes.py.

77d5827

Also fix return type of allow_any.

Typehints on IndexDriverCache

e92765c

Update whats_new.rst

f293f90

Fix circular import dependency.

4bc1df3

SpacemanPaul marked this pull request as ready for review February 10, 2022 05:10

SpacemanPaul requested review from omad, Kirill888 and jeremyh February 10, 2022 05:10

Kirill888 reviewed Feb 10, 2022

datacube/index/abstract.py (outdated)

Kirill888 reviewed Feb 10, 2022

datacube/utils/changes.py (outdated)

Respond to @Kirill888's comments.

a16cbae

SpacemanPaul requested a review from Kirill888 February 10, 2022 05:45

Kirill888 approved these changes Feb 10, 2022

jeremyh approved these changes Feb 10, 2022

Refined docstrings and some typehints based on feedback from @jeremyh

22c5869

alexgleith approved these changes Feb 11, 2022

SpacemanPaul merged commit ac9a466 into develop Feb 11, 2022

SpacemanPaul deleted the index-interface branch February 11, 2022 02:59

SpacemanPaul mentioned this pull request Feb 22, 2022

Index api consistency #1234

Merged
