Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve discoverability of backend engine options #8447

Open
benbovy opened this issue Nov 12, 2023 · 5 comments · May be fixed by #8520
Open

Improve discoverability of backend engine options #8447

benbovy opened this issue Nov 12, 2023 · 5 comments · May be fixed by #8520

Comments

@benbovy
Copy link
Member

benbovy commented Nov 12, 2023

Is your feature request related to a problem?

Backend engine options are not easily discoverable and we need to know or figure out them before passing it as kwargs to xr.open_dataset().

Describe the solution you'd like

The solution is similar to the one proposed in #8002 for setting a new index.

The API could look like this:

import xarray as xr

ds = xr.open_dataset(
    file_or_obj,
    engine=xr.backends.engine("myengine").with_options(
        option1=True,
        option2=100,
    ),
)

where xr.backends.engine("myengine") returns the MyEngineBackendEntrypoint subclass.

We would need to extend the API for BackendEntrypoint with a .with_options() factory method:

class BackendEntrypoint:
    _open_dataset_options: dict[str, Any]

    @classmethod
    def with_options(cls):
        """This backend does not implement `with_options`."""
        raise NotImplementedError()

Such that

class MyEngineBackendEntryPoint(BackendEntrypoint):
    open_dataset_parameters = ("option1", "option2")

    @classmethod
    def with_options(
        cls,
        option1: bool = False,
        option2: int | None = None,
    ):
        """Get the backend with user-defined options.

        Parameters
        -----------
        option1 : bool, optional
            This is option1.
        option2 : int, optional
            This is option2.

        """
        obj = cls()

        # maybe validate the given input options
        if option2 is None:
            option2 = 1

        obj._options = {"option1": option1, "option2": option2}

        return obj

    def open_dataset(
        self,
        filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
        *,
        drop_variables: str | Iterable[str] | None = None,
        **kwargs,    # no static checker error (liskov substitution principle)
    ):
        # kwargs passed directly to open_dataset take precedence to options
        # or alternatively raise an error?
        option1 = kwargs.get("option1", self._options.get("option1", False))

        ...

Pros:

  • Using .with_options(...) would seamlessly work with IDE auto-completion, static type checkers (I guess? I'm not sure how static checkers support entry-points), documentation, etc.
  • There is no breaking change (xr.open_dataset(obj, engine=...) accepts either a string or a BackenEntryPoint subtype but not yet a BackendEntryPoint object) and this feature could be adopted progressively by existing 3rd-party backends.

Cons:

  • The possible duplicated declaration of options among open_dataset_parameters, .with_options() and .open_dataset() does not look super nice but I don't really know how to avoid that.

Describe alternatives you've considered

A BackendEntryPoint.with_options() factory is not really needed and we could just go with BackendEntryPoint.__init__() instead. Perhaps with_options looks a bit clearer and leaves room for more flexibility in __init__ , though?

Additional context

cc @jsignell stac-utils/pystac#846 (comment)

@max-sixty
Copy link
Collaborator

A BackendEntryPoint.with_options() factory is not really needed and we could just go with BackendEntryPoint.__init__() instead. Perhaps with_options looks a bit clearer and leaves room for more flexibility in __init__ , though?

This was my question on looking at it initially.

Traditionally the main advantage of a builder pattern like this would be to incrementally add options foo.with_option(bar='y').with_option(baz='z'), but here I imagine that's less helpful...

@benbovy
Copy link
Member Author

benbovy commented Nov 14, 2023

Yes I agree .with_options() is not very useful here.

I mostly copied and adapted the suggestion here from #8002, for which we cannot easily reuse Index.__init__() as it is already used internally in xarray for passing the labels to index in addition to the public or internal index build options.

For the backends, BackendEntryPoint.__init__() seems not yet used. It is only called with no argument to instantiate the object. Probably the only motivation of using .with_options() is consistency with #8002.

@jsignell
Copy link
Contributor

I really like the idea of making engine-specific kwargs more discoverable. It does make me a little sad though, because I really want people to be able to use the guess_can_open functionality so that they can just do xr.open_dataset(obj). I guess they still can do that though and it they are going to use engine-specific kwargs they do at some point need to know which engine they are using...

Using .with_options(...) would seamlessly work with IDE auto-completion, static type checkers (I guess? I'm not sure how static checkers support entry-points), documentation, etc.

I'm not totally sure how to test this, but I'm not confident that static type checkers will be able to deduce the kwargs for backends loaded from entrypoints. I think in order to implement this we'd need to add some kind of stubs to xarray itself so that it knows that if you do backends.engine("stac") you will get back a STACBackend. But having written that down I'm not even sure if that is possible since it is not guaranteed by xarray that only one library can register an engine with a particular name.

All this is to say that I think autocomplete will work fine in dynamic IDEs (like ipython or jupyter) but I do not think it will work in static environments. Also there might be some limitation on the amount of deduction that even an ipython will do. Not sure if this makes sense as an example, but I was just testing this out in ipython:

image

See how stacking_library is not being suggested?

@benbovy
Copy link
Member Author

benbovy commented Nov 14, 2023

Unfortunately, it looks like you are right @jsignell and I was a little too optimistic about how dev tools support entry-points.

# test.py (xpystack is installed)
import xarray as xr

engine = xr.backends.plugins.get_backend("stac")
reveal_type(engine)
$ mypy test.py
test.py:4: note: Revealed type is "xarray.backends.common.BackendEntrypoint"
Success: no issues found in 1 source file

There's still the option of explicitly importing the backend class but sadly we're losing the niceties of entry-points:

# assuming we can pass options via `BackendEntryPoint.__init__()`

import xarray as xr
import xpystack

ds = xr.open_dataset(obj, engine=xpystack.STACBackend(stacking_library="stackstac"))

@headtr1ck
Copy link
Collaborator

Oh wow, without checking open issues I have accidentally created a PR that solves exactly this #8520.

This allows passing backends instances directly as engine argument, while still supporting passing these options as kwargs directly in open_dataset. This improves discoverability of the backend options while also keeping everything backwards compatible.
Feel free to add additional comments there!

@headtr1ck headtr1ck linked a pull request Dec 12, 2023 that will close this issue
20 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants