Enable variable interpolation in the catalog with OmegaConfigLoader #2507

Closed
merelcht opened this issue Apr 12, 2023 · 2 comments · Fixed by #2621

Comments

@merelcht
Member

Description

In the new OmegaConfigLoader, templating/variable interpolation works out of the box for parameters, but not for catalog files. This is because the catalog goes through a validation step that checks whether every entry in it is a valid dataset. With the ConfigLoader or TemplatedConfigLoader, users can have entries that serve as "yaml anchors"; these must be prefixed with an underscore _ so that the catalog validation skips them.

Context

Variable interpolation is a core feature of omegaconf and allows users to make their configuration files "smarter" and more reusable. Even with the addition of parsing syntax to the catalog to minimise the number of catalog entries (#2423), it is desirable to allow users to fully leverage omegaconf variable interpolation in the catalog as well as parameters.

Possible Implementation

Require template entries to be prefixed with a special character, e.g. _, so they're not read as datasets in the catalog validation.

You could then have a catalog like this:

_pandas:
  type: pandas.CSVDataSet

example_iris_data:
  type: ${_pandas.type}
  filepath: data/01_raw/iris.csv
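
To make the proposal concrete, here is a minimal sketch of how underscore-prefixed entries could be kept out of the validation step. It is not Kedro's actual implementation: `split_templates` and `validate_datasets` are hypothetical helpers, and the "must be a mapping with a `type` key" rule is an assumed stand-in for the real dataset check. Interpolation itself is out of scope here, so the `${_pandas.type}` string is left unresolved.

```python
from typing import Any, Dict, Tuple


def split_templates(catalog: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate underscore-prefixed template entries from dataset entries."""
    templates = {k: v for k, v in catalog.items() if k.startswith("_")}
    datasets = {k: v for k, v in catalog.items() if not k.startswith("_")}
    return templates, datasets


def validate_datasets(datasets: Dict[str, Any]) -> None:
    """Hypothetical check: every dataset entry must be a mapping with a 'type' key."""
    for name, config in datasets.items():
        if not isinstance(config, dict) or "type" not in config:
            raise ValueError(f"Catalog entry '{name}' is not a valid dataset definition")


# The catalog from the example above, as it would look after YAML parsing.
catalog = {
    "_pandas": {"type": "pandas.CSVDataSet"},
    "example_iris_data": {
        "type": "${_pandas.type}",  # left unresolved in this sketch
        "filepath": "data/01_raw/iris.csv",
    },
}

templates, datasets = split_templates(catalog)
validate_datasets(datasets)  # only 'example_iris_data' is validated
```

Filtering on the key prefix keeps the rule easy to explain, at the cost of the Kedro-specific convention discussed below.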

This idea was already discussed as part of: #2175
A concern with this implementation is that it introduces special Kedro syntax to enable core omegaconf functionality. Users would then not be able to just use the omegaconf docs to find out how variable interpolation works.

Possible Alternatives

  • Another way to skip the catalog validation
  • Resolve variables earlier. Currently, resolution happens on access.

@datajoely
Contributor

datajoely commented Apr 12, 2023

Just confirming that this syntax should work for more complex objects too:

_driver_properties:
  properties:
    driver: org.postgresql.Driver

weather:
  type: spark.SparkJDBCDataSet
  table: weather_table
  url: jdbc:postgresql://localhost/test
  credentials: db_credentials
  load_args: ${_driver_properties}
  save_args: ${_driver_properties}

which should render as:

weather:
  type: spark.SparkJDBCDataSet
  table: weather_table
  url: jdbc:postgresql://localhost/test
  credentials: db_credentials
  load_args:
    properties:
      driver: org.postgresql.Driver
  save_args:
    properties:
      driver: org.postgresql.Driver
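
The rendering described above can be illustrated with a small pure-Python sketch. This is a stand-in resolver, not omegaconf: it only handles values that consist entirely of a single `${dotted.path}` reference, and it does not detect cyclic references. The helper names (`lookup`, `resolve`, `render_catalog`) are hypothetical and exist only for this illustration.

```python
import re
from typing import Any, Dict

# Matches a string that is exactly one reference, e.g. "${_driver_properties}".
_REF = re.compile(r"^\$\{([^}]+)\}$")


def lookup(root: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as '_pandas.type' from the config root."""
    node: Any = root
    for part in dotted.split("."):
        node = node[part]
    return node


def resolve(node: Any, root: Dict[str, Any]) -> Any:
    """Return a copy of `node` with whole-value references replaced by their targets."""
    if isinstance(node, dict):
        return {k: resolve(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, root) for v in node]
    if isinstance(node, str):
        match = _REF.match(node)
        if match:
            # The target may be a scalar or a whole mapping; resolve it too.
            return resolve(lookup(root, match.group(1)), root)
    return node


def render_catalog(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve references, then drop the underscore-prefixed template entries."""
    resolved = resolve(raw, raw)
    return {k: v for k, v in resolved.items() if not k.startswith("_")}


raw = {
    "_driver_properties": {"properties": {"driver": "org.postgresql.Driver"}},
    "weather": {
        "type": "spark.SparkJDBCDataSet",
        "table": "weather_table",
        "url": "jdbc:postgresql://localhost/test",
        "credentials": "db_credentials",
        "load_args": "${_driver_properties}",
        "save_args": "${_driver_properties}",
    },
}

rendered = render_catalog(raw)
```

Here `rendered` contains only the `weather` entry, with `load_args` and `save_args` each holding their own copy of the `properties` mapping.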

@merelcht
Member Author

As discussed in #2516, we will be going ahead with the _ character syntax to enable variable interpolation in the catalog.
