fetch_all argument for dataset search methods #1553

SpacemanPaul · 2024-02-20T21:20:07Z

Reason for this pull request

Add fetch_all: bool = False argument to all (non-deprecated) dataset search methods, as per EP-13.

Proposed changes

fetch_all: bool argument (defaulting to False) added to all non-deprecated dataset search methods. When true, all search results are gathered and returned in a fully instantiated list. When false (the default), the method acts as a generator and yields a result at a time.

Notes:

This typically requires wrapper methods as simply having a yield statement in a method/function makes it act as a generator - you can't decide dynamically whether to yield or return from a function. Basic form looks like:

def method(self, ..., fetch_all: bool = False, **query):
    _results = self._method(...., **query)
    if fetch_all:
        return list(_results)   # return as instantiated list
    else:
        return _results         # return as generator.

The search_by_products() method has two layers of lazy evaluation (in generator mode, yields tuples containing a product and a nested generator that yields datasets. The fetch_all=True version fully instantiates both layers.

Tests added / passed
Fully documented, including docs/about/whats_new.rst for all changes

…eeded.

codecov · 2024-02-21T03:11:08Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2ee5434) 85.65% compared to head (8ee99fb) 85.62%.
Report is 2 commits behind head on develop-1.9.

Additional details and impacted files

@@               Coverage Diff               @@
##           develop-1.9    #1553      +/-   ##
===============================================
- Coverage        85.65%   85.62%   -0.04%     
===============================================
  Files              140      140              
  Lines            15175    15279     +104     
===============================================
+ Hits             12998    13082      +84     
- Misses            2177     2197      +20

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Ariana-B · 2024-02-22T06:25:48Z

It would be really neat if this behaviour could be implemented via a decorator rather than multiple wrapper functions that are all but identical

SpacemanPaul · 2024-02-22T22:55:39Z

It would be really neat if this behaviour could be implemented via a decorator rather than multiple wrapper functions that are all but identical

Yeah, I thought about that but I couldn't see how to do it without significantly complicating the contract between the abstract base class and the implementations. Definitely something that could be revisited though.

omad · 2024-02-28T23:10:46Z

I think we're better off without this. The added complexity both internally and in the API isn't worth it. The extra argument requires more explanation and make it less clear what the functions do.

Python is full of iterator functions (especially range() and enumerate()), that users need to call list() on if they want it fully loaded into memory. I think we should do the same with search results.

We could include examples in the documentation of using list() on the results, if we think that's a valuable thing for people to do. (I'm not sure if it is.)

omad · 2024-02-28T23:13:14Z

search_by_products() is quite a different method, so maybe it could be different. I'm not sure how and how much it is used.

SpacemanPaul · 2024-02-28T23:44:22Z

I think we're better off without this.

Yeah, I can see that. The main use case as far as I can see is internal - for tests. It's much easier to check test results if they are instantiated into a list.

The main justification is the deprecation/removal of dc.index.datasets.search_eager() - which is just an alias for list(dc.index.datasets.search() - I find the name "search_eager"`very unclear and ambiguous, but it has been heavily used historically in tests. I find this way of doing the same thing more intuitive, as well as being more generally applicable.

Having said that, I don't mind dropping it - except that I built the other more useful PR on top of this one, so dropping this one would cause me some short term pain.

omad · 2024-02-29T01:16:39Z

Yeah, I can see that. The main use case as far as I can see is internal - for tests. It's much easier to check test results if they are instantiated into a list.

The main justification is the deprecation/removal of dc.index.datasets.search_eager() - which is just an alias for list(dc.index.datasets.search() - I find the name "search_eager"`very unclear and ambiguous, but it has been heavily used historically in tests. I find this way of doing the same thing more intuitive, as well as being more generally applicable.

I completely support removing search_eager() for being confusing, and for simplifying the API.

If it's mostly for in tests, I even more think that we should be calling list() explicitly. It's such a common idiom and makes it clearer what is happening. It's also helpful for users of the API as documentation.

Having said that, I don't mind dropping it - except that I built the other more useful PR on top of this one, so dropping this one would cause me some short term pain.

Sorry!

I do think short term pain over long term pain and support though. It's hopefully not too hard to remove from the next PR.

SpacemanPaul added 10 commits February 20, 2024 13:36

Add fetch_all methods.

7908fb8

Bug/typo fixes - passing existing tests.

d42a401

Fix initial approach, test coverage.

cc3582a

Lintage.

4e70dfc

Push wrapper methods up to ABC. Still some cleanup of memory driver n…

5cbdb2e

…eeded.

Lintage and memory driver cleanup.

edab0a3

Update whats_new.rst

061cb47

Attempt to fix broken codecov action.

7fb5bfb

Which version of action is active??

906436e

Codecov aha

126ffe0

SpacemanPaul marked this pull request as ready for review February 21, 2024 03:24

SpacemanPaul requested review from omad and Ariana-B February 21, 2024 03:24

Remove redundant deprecation warning

8ee99fb

SpacemanPaul mentioned this pull request Feb 21, 2024

Fix codecov action and update whats_new.rst #1554

Merged

2 tasks

SpacemanPaul mentioned this pull request Feb 27, 2024

Search returning new args #1557

Merged

2 tasks

omad closed this Feb 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fetch_all argument for dataset search methods #1553

fetch_all argument for dataset search methods #1553

SpacemanPaul commented Feb 20, 2024 •

edited

codecov bot commented Feb 21, 2024 •

edited

Ariana-B commented Feb 22, 2024 •

edited

SpacemanPaul commented Feb 22, 2024 •

edited

omad commented Feb 28, 2024

omad commented Feb 28, 2024

SpacemanPaul commented Feb 28, 2024

omad commented Feb 29, 2024

fetch_all argument for dataset search methods #1553

fetch_all argument for dataset search methods #1553

Conversation

SpacemanPaul commented Feb 20, 2024 • edited

Reason for this pull request

Proposed changes

codecov bot commented Feb 21, 2024 • edited

Codecov Report

Ariana-B commented Feb 22, 2024 • edited

SpacemanPaul commented Feb 22, 2024 • edited

omad commented Feb 28, 2024

omad commented Feb 28, 2024

SpacemanPaul commented Feb 28, 2024

omad commented Feb 29, 2024

SpacemanPaul commented Feb 20, 2024 •

edited

codecov bot commented Feb 21, 2024 •

edited

Ariana-B commented Feb 22, 2024 •

edited

SpacemanPaul commented Feb 22, 2024 •

edited