Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetch_all argument for dataset search methods #1553

Closed
wants to merge 11 commits into from

Conversation

SpacemanPaul
Copy link
Contributor

@SpacemanPaul SpacemanPaul commented Feb 20, 2024

Reason for this pull request

Add fetch_all: bool = False argument to all (non-deprecated) dataset search methods, as per EP-13.

Proposed changes

  • fetch_all: bool argument (defaulting to False) added to all non-deprecated dataset search methods. When true, all search results are gathered and returned in a fully instantiated list. When false (the default), the method acts as a generator and yields a result at a time.

Notes:

  1. This typically requires wrapper methods as simply having a yield statement in a method/function makes it act as a generator - you can't decide dynamically whether to yield or return from a function. Basic form looks like:
def method(self, ..., fetch_all: bool = False, **query):
    _results = self._method(...., **query)
    if fetch_all:
        return list(_results)   # return as instantiated list
    else:
        return _results         # return as generator.
  1. The search_by_products() method has two layers of lazy evaluation (in generator mode, yields tuples containing a product and a nested generator that yields datasets. The fetch_all=True version fully instantiates both layers.
  • Tests added / passed
  • Fully documented, including docs/about/whats_new.rst for all changes

Copy link

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2ee5434) 85.65% compared to head (8ee99fb) 85.62%.
Report is 2 commits behind head on develop-1.9.

Additional details and impacted files
@@               Coverage Diff               @@
##           develop-1.9    #1553      +/-   ##
===============================================
- Coverage        85.65%   85.62%   -0.04%     
===============================================
  Files              140      140              
  Lines            15175    15279     +104     
===============================================
+ Hits             12998    13082      +84     
- Misses            2177     2197      +20     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SpacemanPaul SpacemanPaul marked this pull request as ready for review February 21, 2024 03:24
@Ariana-B
Copy link
Contributor

Ariana-B commented Feb 22, 2024

It would be really neat if this behaviour could be implemented via a decorator rather than multiple wrapper functions that are all but identical

@SpacemanPaul
Copy link
Contributor Author

SpacemanPaul commented Feb 22, 2024

It would be really neat if this behaviour could be implemented via a decorator rather than multiple wrapper functions that are all but identical

Yeah, I thought about that but I couldn't see how to do it without significantly complicating the contract between the abstract base class and the implementations. Definitely something that could be revisited though.

@SpacemanPaul SpacemanPaul mentioned this pull request Feb 27, 2024
2 tasks
@omad
Copy link
Member

omad commented Feb 28, 2024

I think we're better off without this. The added complexity both internally and in the API isn't worth it. The extra argument requires more explanation and make it less clear what the functions do.

Python is full of iterator functions (especially range() and enumerate()), that users need to call list() on if they want it fully loaded into memory. I think we should do the same with search results.

We could include examples in the documentation of using list() on the results, if we think that's a valuable thing for people to do. (I'm not sure if it is.)

@omad
Copy link
Member

omad commented Feb 28, 2024

search_by_products() is quite a different method, so maybe it could be different. I'm not sure how and how much it is used.

@SpacemanPaul
Copy link
Contributor Author

I think we're better off without this.

Yeah, I can see that. The main use case as far as I can see is internal - for tests. It's much easier to check test results if they are instantiated into a list.

The main justification is the deprecation/removal of dc.index.datasets.search_eager() - which is just an alias for list(dc.index.datasets.search() - I find the name "search_eager"`very unclear and ambiguous, but it has been heavily used historically in tests. I find this way of doing the same thing more intuitive, as well as being more generally applicable.

Having said that, I don't mind dropping it - except that I built the other more useful PR on top of this one, so dropping this one would cause me some short term pain.

@omad
Copy link
Member

omad commented Feb 29, 2024

Yeah, I can see that. The main use case as far as I can see is internal - for tests. It's much easier to check test results if they are instantiated into a list.

The main justification is the deprecation/removal of dc.index.datasets.search_eager() - which is just an alias for list(dc.index.datasets.search() - I find the name "search_eager"`very unclear and ambiguous, but it has been heavily used historically in tests. I find this way of doing the same thing more intuitive, as well as being more generally applicable.

I completely support removing search_eager() for being confusing, and for simplifying the API.

If it's mostly for in tests, I even more think that we should be calling list() explicitly. It's such a common idiom and makes it clearer what is happening. It's also helpful for users of the API as documentation.

Having said that, I don't mind dropping it - except that I built the other more useful PR on top of this one, so dropping this one would cause me some short term pain.

Sorry!

I do think short term pain over long term pain and support though. It's hopefully not too hard to remove from the next PR.

@omad omad closed this Feb 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants