Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't include dependents when filtering the derived variable registry #445

Closed
3 tasks done
aulemahal opened this issue Feb 10, 2022 · 0 comments · Fixed by #446
Closed
3 tasks done

Don't include dependents when filtering the derived variable registry #445

aulemahal opened this issue Feb 10, 2022 · 0 comments · Fixed by #446

Comments

@aulemahal
Copy link
Contributor

aulemahal commented Feb 10, 2022

Here's a quick checklist in what to include:

  • Include a detailed description of the bug or suggestion

  • Output of intake_esm.show_versions()

  • Minimal, self-contained copy-pastable example that generates the issue if possible. Please be concise with code posted. See guidelines below on how to provide a good bug report:

Description

If the Derived Variable Registry (DVR) includes a derived variable (dv2) that also happens to be a dependent for another derived variable (dv2), this var (dv2) gets included when the catalog search needs the second (dv1). Lets show an example:

import intake
import intake_esm
registry = intake_esm.DerivedVariableRegistry()

@registry.register(variable='FOO', query={'variable': ['FLNS', 'FLUT']})
def func(ds):
    return ds + 1

@registry.register(variable='FLNS', query={'variable': ['FOO', 'FLUT']})
def func2(ds):
    return ds - 1

cat = intake.open_esm_datastore(zarr_col_aws_cesm, registry=registry)
new_cat = cat.search(variable='FOO')

See how we provide a computation for FOO that requires FLNS and FLUT and also a computation for FLNS given FOO and FLUT. This makes a lot of sense and the goal here could be to fill missing variable that are not the same across datasets.

However, the current search method will first check for variable directly in the catalog, then parse the DVR to find if depends exist. In our case here, FOO is not available but FLNS and FLUT are. The problem resides in the subset of the DVR at the end of search method because FLNS and FLUT were appended to the variables list and thus the dependent variable for FLNS gets included in the final registry, even though it is not needed.

My suggestion would be to refactor (again) the derived variable search as to only include the dependent variable that did return results.
I can push a PR soon.

Version information: output of intake_esm.show_versions()

Paste the output of intake_esm.show_versions() here:

INSTALLED VERSIONS
------------------

cftime: 1.5.1.1
dask: 2021.12.0
fastprogress: 0.2.7
fsspec: 2021.11.1
gcsfs: 2021.11.1
intake: 0.6.4
intake_esm: 2021.8.17.post50
netCDF4: 1.5.8
pandas: 1.3.5
requests: 2.26.0
s3fs: 2021.11.1
xarray: 0.20.2
zarr: 2.10.3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant