Skip to content

ItemSearch return types overhaul (again) #285

@TomAugspurger

Description

@TomAugspurger

As mentioned in #277 (comment), we now have
ItemSearch.item_collection() and ItemSearch.item_collections(). I don't think it's immediately obvious what
the difference between the two is (spoiler: item_collections() returns an iterator of pystac.ItemCollection objects, one per page;
.item_collection() returns a single pystac.ItemCollection with all the results).

Let's take stock of the current methods for getting stuff out of an ItemSearch, ignoring some deprecated methods:

method return type Notes
items Iterator[Item]
items_as_dicts Iterator[Dict[str, Any]]
item_collections Iterator[ItemCollection]
item_collection ItemCollection
get_all_items_as_dicts Iterator[Dict[str, Any]] Deprecated in favor of get_items or get_item_collections, (?)

We have a couple of dimensions expressed in these methods:

  1. The result set:
    a. A single item (.items(), .items_as_dicts())
    b. Pages at a time (.item_collections(),)
    c. Everything (.item_collection())
  2. The result type:
    a. Plain dict (items_as_dicts, get_all_items_as_dicts)
    b. pystac object (.items(), .item_collection(), .item_collections())

I think each of these is useful.

I've personally used both single items and everything (1a and 1b). I'm not sure how commonly used getting items by page is, but I can imagine some usecases that want to carefully control when HTTP calls happen (and pystac-client uses it internally).

And getting pystac objects or plain dictionaries are both useful. pystac objects for convenience, and plain dictionaries for performance.


So the motivation for this issue is item_collections() vs. item_collection() but I think we can take this opportunity to change a bit more and get to a nice state.

My proposal: make the "by page" explicit by deprecating item_collections() in favor of .pages(). And we want full symmetry,
then we'd have a total of six methods representing the production of the result set & result type.

method return type
items Iterator[pystac.Item]
items_as_dicts Iterator[dict]
pages Iterator[pystac.ItemCollection]
pages_as_dicts Iterator[dict]
item_collection pystac.ItemCollection
item_collection_as_dict dict

The result set is expressed in the method name (items, pages, item_collection). item_collection is a bit of an odd duck there, but I think that's OK (IMO it's better than .everything())

The pystac vs. plain is expressed by the presence or absence of the as_dict suffix.

The item_collection_as_dict and the values in pages_as_dicts would (I think) be the feature collection things:

{
    "type": "FeatureCollection",
    "features": [
        ...
    ]
}

With all that, I think we'd satisfy every use-case and have a coherent naming scheme.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions