Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract Base Classes for index backend interface(s). #1226

Merged
merged 7 commits into from Feb 11, 2022

Conversation

SpacemanPaul
Copy link
Contributor

@SpacemanPaul SpacemanPaul commented Feb 7, 2022

Reason for this pull request

This is a first baby step towards implementing ODC-EP02, ODC-EP03 and ODC-EP04. [^1] The ODC includes some support for defining and using alternative index backends, however, only one backend implementation actually exists, and the only definition of the interface a new index backend is supposed to conform to, is the implementation of that default index backend.

This PR makes no significant code changes.

[^1] https://github.com/opendatacube/datacube-core/wiki/enhancement-proposals

Proposed changes

I have created a family of Abstract Base Classes (abc.ABC) for the key components of the index backend. At this stage they are simply a copy of the methods on the index backend that form the API used by the rest of the ODC. The main difference is they have been fully annotated with type hints.

As work on the EP02 and EP03 continues, it is likely that these ABC interfaces will evolve. A key aim is to keep the legacy default index drive backwards compatible with its existing functionality as much as possible. This PR should not result in any functional changes. The only difference will be some internal ODC classes now have new Abstract Base Classes in their class hierarchy.

I've also added type hints to datacube.utils.changes - mostly for my own sanity. It's much easier to follow now, and I've fixed an historic inconsistency in the return type of the allow_any function.

Note: Some API inconsistencies have been flagged as TODO's in comments. These will be addressed in a future PR. I wanted to avoid actual API changes in the default index driver in this PR.

@codecov
Copy link

codecov bot commented Feb 7, 2022

Codecov Report

Merging #1226 (22c5869) into develop (b062184) will increase coverage by 0.03%.
The diff coverage is 96.10%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1226      +/-   ##
===========================================
+ Coverage    93.75%   93.79%   +0.03%     
===========================================
  Files          102      103       +1     
  Lines        10406    10597     +191     
===========================================
+ Hits          9756     9939     +183     
- Misses         650      658       +8     
Impacted Files Coverage Δ
datacube/index/abstract.py 94.89% <94.89%> (ø)
datacube/drivers/indexes.py 91.30% <100.00%> (ø)
datacube/index/_datasets.py 94.81% <100.00%> (-0.02%) ⬇️
datacube/index/_metadata_types.py 94.25% <100.00%> (-0.54%) ⬇️
datacube/index/_products.py 93.93% <100.00%> (-0.55%) ⬇️
datacube/index/_users.py 100.00% <100.00%> (ø)
datacube/index/index.py 100.00% <100.00%> (+4.00%) ⬆️
datacube/utils/changes.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b062184...22c5869. Read the comment docs.

@woodcockr
Copy link
Member

Good to see these EPs able to get some attention now. Great!
On interesting side effect of pulling out the abstract interfaces from the rest of the code for me was it provided focus to the question of what do we really need next? (e.g. hyperspectral and 3D spatial as part of index and query mechanisms).
I need to go blow the dust of the EPs (and my brain) and see what comments I need to add to the suggestion list.

@SpacemanPaul
Copy link
Contributor Author

Thanks Rob. I completely agree, that's why this is where I started!

@SpacemanPaul SpacemanPaul marked this pull request as ready for review February 10, 2022 05:10
@Kirill888
Copy link
Member

This looks fine, but I'm not sure about return types being Iterable[..]. I think a function that uses yield .. should be declared to be -> Iterator[..], and Iterable[..] is mostly used on input types. Kinda like -> Dict[..] if that's what you return, but : Mapping[..] if that's what you accept.

@SpacemanPaul
Copy link
Contributor Author

SpacemanPaul commented Feb 10, 2022

This looks fine, but I'm not sure about return types being Iterable[..]. I think a function that uses yield .. should be declared to be -> Iterator[..], and Iterable[..] is mostly used on input types. Kinda like -> Dict[..] if that's what you return, but : Mapping[..] if that's what you accept.

In general I would agree. But here I'm deliberately trying to be a bit more minimal, so as to give driver developers more flexibility in how they implement the interface.

@Kirill888
Copy link
Member

This looks fine, but I'm not sure about return types being Iterable[..]. I think a function that uses yield .. should be declared to be -> Iterator[..], and Iterable[..] is mostly used on input types. Kinda like -> Dict[..] if that's what you return, but : Mapping[..] if that's what you accept.

In general I would agree. But here I'm deliberately trying to be a bit more minimal, so as to give driver developers more flexibility in how they implement the interface.

Honestly, I would go the other way. Force -> List[Product|MetadataType] for those APIs that currently return -> Iterable[Product|MetadataType], I think only dataset query needs to be lazy. Product list being lazy just annoys me every time I try to use it in the notebook, and I don't see when you'd need it to be lazy.

@SpacemanPaul
Copy link
Contributor Author

SpacemanPaul commented Feb 10, 2022

In general I would agree. But here I'm deliberately trying to be a bit more minimal, so as to give driver developers more flexibility in how they implement the interface.

Honestly, I would go the other way. Force -> List[Product|MetadataType] for those APIs that currently return -> Iterable[Product|MetadataType], I think only dataset query needs to be lazy. Product list being lazy just annoys me every time I try to use it in the notebook, and I don't see when you'd need it to be lazy.

Yeah, I can see your point, and I probably will take this approach as I start to develop the new postgis index driver - but my approach at this stage is to avoid changing the behaviour of the existing legacy/default Postgres index driver as much as possible for as long as possible. (And certainly my intention was that this PR in particular should not change the behaviour of the legacy/default/postgres/only index driver at all.)

We may well revisit this question further down the track, but I don't think this PR is right place for it.

Copy link
Contributor

@jeremyh jeremyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

  • I'm a little uneasy about leaving the old types in docstrings everywhere when they duplicate real type hints. Many of them are different to the real type hints but not any more understandable.

    (I use autodoc-typehints in eodatasets' docs to try to avoid that duplication)

  • I suspect some of the methods marked as taking a "UUID" param can already accept a string too (ie, a "DSID"), but I haven't tested. It's not a high priority, but API consistency is nice.

  • To bikeshed (very optional): I'm not a huge fan of the name "DSID", as it's a non-standard contraction, and method signatures use it alongside other UUIDs (eg: the return type) which are all technically dataset ids. I don't know what name would be better: perhaps NonstrictUUID or something :)

"""

@abstractmethod
def get_all_dataset_ids(self, archived: bool) -> Iterable[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice this before, but we have one and only one method that returns dataset ids as strings instead of uuids :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha - that one is absolutely my fault from a previous PR - I will fix.

"""
Perform a search, returning count of results.

:param dict[str,str|float|datacube.model.Range] query:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This param was wrong in the old docs. It's repeated in many other docstrings below too.

If we keep it, it should be a union, not a dict, and have the other field types in it: datetime, int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good pickup, shall review.

@SpacemanPaul SpacemanPaul merged commit ac9a466 into develop Feb 11, 2022
@SpacemanPaul SpacemanPaul deleted the index-interface branch February 11, 2022 02:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants