[KED-2143] Adding a ConfigLoader instance into hook specs params #506
Comments
Hi @takikadiri, you've highlighted a very good point. We thought about this and we've actually added a set of hooks to register library components, such as pipelines, data catalog, and config loader, with a Kedro project. I think this might solve your use case. |
Thank you @lorenabalan for the quick reply! It's really great to have the possibility to register library components such as the config loader; I will certainly use it. |
Hello @lorenabalan, I may be missing the point, but I think this is not what is at stake here; correct me if I'm wrong. I don't know if this is the best place to write this or if it should go in another issue, but here is a more detailed description of the problem and a discussion of different design decisions and potential solutions.
Description
Since hooks have been released in It is a common pattern for hooks to need access to configuration files (for instance to create a session with an external tool using credentials, to use parameters inside the hook, or, as in the case of kedro-mlflow, to use a custom config file for the plugin). I personally feel that this configuration file access must be template-independent. The hook is not supposed to assume anything about the template (which may be changed by the user) since the
Concrete use cases:
Use case 1: Accessing proprietary configuration file inside hook
from typing import Any, Dict, Iterable
from kedro.config import ConfigLoader, TemplatedConfigLoader
from kedro.framework.hooks import hook_impl
from kedro.io import DataCatalog
from kedro.pipeline.node import Node
class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
        return TemplatedConfigLoader(
            conf_paths,
            globals_pattern="*globals.yml",
            globals_dict={"param1": "pandas.CSVDataSet"},
        )
class MlflowNodeHook:
    @hook_impl
    def before_node_run(
        self,
        node: Node,
        catalog: DataCatalog,
        inputs: Dict[str, Any],
        is_async: bool,
        run_id: str,
    ) -> None:
        # get the config loader of the current context
        config_loader = get_config_loader()  # actually, config_loader is not available here, this magic function does not exist! I need to eventually get the one registered in the project
        # do whatever I want using the conf and implementing my own logic
        conf_mlflow = config_loader.get("mlflow*", "mlflow*/**")
        do_my_own_logic(conf_mlflow)
Use case 2: Accessing credentials file inside hook
Let's say that I want to create a connection with a remote server (say SAS) globally, to interact with it before/after a node and eventually inside a node.
class MlflowPipelineHook:
    @hook_impl
    def before_pipeline_run(
        self, run_params: Dict[str, Any], pipeline: Pipeline, catalog: DataCatalog
    ) -> None:
        # get the config loader of the current context
        config_loader = get_config_loader()  # actually, config_loader is not available here, this magic function does not exist!
        # do whatever I want using the conf and implementing my own logic
        credentials = config_loader.get("credentials*", "credentials*/**")
        saspy.SASsession(credentials)
@WaylonWalker @deepyaman You guys seem to develop a lot of hooks; are these use cases hitting you too? I see you sometimes use environment variables to configure your hooks, which I guess is somehow related to this.
Overview of solutions to this problem
Existing solution
Existing solution 1: recreate config loader locally
For instance, example 1 would become:
class MlflowNodeHook:
    @hook_impl
    def before_node_run(
        self,
        node: Node,
        catalog: DataCatalog,
        inputs: Dict[str, Any],
        is_async: bool,
        run_id: str,
    ) -> None:
        # recreate the config loader manually
        conf_paths = [
            str(self.project_path / self.CONF_ROOT / "base"),  # these attributes are not accessible outside the context, they must be hardcoded actually
            str(self.project_path / self.CONF_ROOT / self.env),  # suppressed
        ]
        hook_manager = get_hook_manager()
        config_loader = hook_manager.hook.register_config_loader(  # pylint: disable=no-member
            conf_paths=conf_paths
        ) or ConfigLoader(conf_paths)
        # do whatever I want using the conf and implementing my own logic
        conf_mlflow = config_loader.get("mlflow*", "mlflow*/**")
        do_my_own_logic(conf_mlflow)
Pros:
Cons:
Existing solution 2: reload context when possible
Some hook methods have access to some of the project context attributes: for instance,
Pros:
Cons:
Existing solution 3: assume the call is made at the root of the kedro project and go back to solution 2
For the hooks without access to the project_path, call
Pros:
Cons:
Potential solutions which need development on Kedro's side
Solution 1: Add config loader to all hooks
As the title of this issue states, a solution would be to pass the config loader to each
Solution 2: Use the
|
@Galileo-Galilei Steel Toes utilizes the project's context by defining your hooks as a property on the ProjectContext rather than a list.
from steel_toes import SteelToes
class ProjectContext(KedroContext):
    project_name = "kedro0160"
    project_version = "0.16.1"
    package_name = "kedro0160"
    @property
    def hooks(self):
        self._hooks = [SteelToes(self)]
        return self._hooks
You can see where the context is used inside the hook here. I do feel like this is a bit of a hack and asks users to implement hooks on their project in a non-traditional way. The next upcoming change will make the context argument not required. Note that context contains a I would really like to get access to the project's context inside of a hook, especially if we could configure hook behavior inside of |
Hello @WaylonWalker and thanks for the reply. This is a clever hack and works like a charm, but it breaks auto-discovery and configuration in By the way, it seems @tamsanh is hitting the same problem and needs to access the context inside his Some tests on |
@Galileo-Galilei you are right assuming that However, this is still a work in progress and currently it's not at the stage where we can officially announce it and freeze the design. The general idea is that |
Thanks for the insightful answer. For now I use one of the above solutions as a better-than-nothing way to achieve what I want, and wait for the |
@Galileo-Galilei We've watched your great work on |
Hi @yetudada, would you mind reopening this issue? This works perfectly with kedro==0.17.x, but the This completely prevents accessing configuration and credentials inside hooks, which was the point of this issue. @datajoely sorry to ping you directly, but is there something I am missing here? The use cases described above are still prevalent and will no longer be doable with kedro==0.18.x, which feels like a regression. Even worse, the above hacks no longer work. |
@idanov @lorenabalan would you mind chiming in here? I'm not sure I have the knowledge |
@Galileo-Galilei @takikadiri We just had a discussion about all this. Not sure what the solution will be, but curious what your thoughts are on making The other thing I'm wondering is whether it would actually be useful to have
|
I think this would solve all our problems and is very flexible because of all the attributes / methods you quote (e.g. it automatically solves the problem of accessing credentials as you suggest).
I would love having an
with KedroSession.create(project_path, env) as session:
    mlflow_config = get_mlflow_config(session)
    mlflow_config.setup()
which basically sets the mlflow tracking uri and the experiment. Most times people forget to add these two lines in their notebooks / scripts and find out that their mlflow configuration is completely messed up. Obviously when it is run through the CLI these two lines are run in the This would also be the place to instantiate global connections (e.g. what we discussed here on
Not sure what the problem is here: if you want to access the
from kedro.framework.project import settings
settings.CONFIG_LOADER_CLASS(
    conf_source=str(PROJECT_PATH / settings.CONF_SOURCE),
    env=env,
    **settings.CONFIG_LOADER_ARGS,
) |
Cool, thank you for the feedback! It has always been at the back of my mind that the current instantiation of spark via a custom As for accessing |
I think it does, and it is very confusing for users too, as shown by this discussion. The user needs mlflow to be instantiated before Spark, and creating the spark session in the context (before any hook can be executed) instead of in a hook completely prevents composing the logic nicely. The user ended up moving the spark instantiation logic to the Adding |
Very useful, thanks @Galileo-Galilei. That's a good observation about the drawback of such a hook. Possibly it would make more sense to have an Some use cases that currently spring to mind:
Overall, pending more research, my initial idea would be to do this in a series of steps, roughly in order:
Best of all, this is all non-breaking I believe. |
Adding one minor item in case I forgot.
|
Hi @AntonyMilneQB and @noklam, I really like the direction of the developments, but I still have a few concerns / comments:
I agree, but unless I misunderstand something, the current implementation in #1465 makes the context a "frozen" attrs class. This has the major drawback of preventing hooks from interoperating (e.g. by passing along an object which needs to be reused by other hooks) and forces them to be entirely independent. A simple and common example would be the following:
I think plugin developers do want a way to make hooks communicate with one another. This is currently possible by recreating the entire session, so this does sound like a regression from a plugin developer's point of view, even if the current way does not look like the right way to do it. The simplest solution would be to allow attributes to be added dynamically to the context, i.e. avoid making the context frozen.
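To make the concern concrete, here is a minimal sketch that runs without Kedro. It assumes a frozen `dataclass` as a stand-in for the frozen attrs context (attrs raises its own `FrozenInstanceError`, but the behaviour is analogous), and the names `FrozenContext`, `ConnectionHook`, `ConsumerHook` and `shared_state` are all hypothetical. It shows that a hook cannot attach a new attribute to a frozen object, and one possible workaround where cooperating hooks share a mutable object explicitly.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FrozenContext:
    """Hypothetical stand-in for a frozen context class (not Kedro's own)."""

    env: str


class ConnectionHook:
    """Creates a resource that another hook needs to reuse."""

    def __init__(self, shared: dict):
        self.shared = shared
        self.attached_to_context = False

    def after_context_created(self, context: FrozenContext) -> None:
        connection = f"connection-for-{context.env}"
        try:
            # Dynamically adding an attribute is rejected by a frozen class.
            context.connection = connection
            self.attached_to_context = True
        except FrozenInstanceError:
            # Fall back to an object that the cooperating hooks share explicitly.
            self.shared["connection"] = connection


class ConsumerHook:
    """Reuses whatever the first hook stored in the shared object."""

    def __init__(self, shared: dict):
        self.shared = shared

    def before_pipeline_run(self) -> str:
        return self.shared["connection"]


shared_state: dict = {}
producer = ConnectionHook(shared_state)
consumer = ConsumerHook(shared_state)
producer.after_context_created(FrozenContext(env="local"))
print(consumer.before_pipeline_run())
```

The workaround only works when both hooks are instantiated together with the same shared object, which is exactly the coupling that a non-frozen context would avoid.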
Lim's comment ("There used to be 2 kinds of hooks: registration & life-cycle. Managing them using the same hook managers was a mistake") and the difficulties you have in locating this new hook are in line with #904 and the "engines" design patterns: registering objects is not the same as calling them during the session lifecycle. |
Hi everyone, I got directed to this issue by @antonymilne and, since it's a bit old, I'm trying to understand what's left to do here. The https://docs.kedro.org/en/stable/kedro.framework.hooks.specs.KedroContextSpecs.html
@antonymilne expressed some more ideas above, but I feel they are beyond the scope of the original issue, which as stated by @takikadiri is complete. Do you think we can close this one? Should we open follow-up tasks? Or am I missing something and we should keep it open? |
Hi @astrojuanlu, I guess discussions arose from time to time (see slack) on various topics (credentials in hook, stateful hook for interoperability, upgrade between 0.17 and 0.18 about |
Opened #2690 about documenting stateful hooks, and proceeding to close this issue. Thanks all who participated (here and in various discussions), and don't hesitate to open new issues if needed! |
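For readers landing here, a rough sketch of the stateful hook pattern referred to above, written so that it runs without Kedro installed. `FakeConfigLoader` and `FakeContext` are hypothetical stand-ins, the credentials content is made up, and in a real project each hook method would carry Kedro's `@hook_impl` decorator. The assumption is that the context handed to `after_context_created` exposes a `config_loader` attribute.

```python
from typing import Any, Dict, Optional


class FakeConfigLoader:
    """Hypothetical stand-in that returns pre-loaded config for a pattern."""

    def __init__(self, configs: Dict[str, Dict[str, Any]]):
        self._configs = configs

    def get(self, *patterns: str) -> Dict[str, Any]:
        merged: Dict[str, Any] = {}
        for pattern in patterns:
            merged.update(self._configs.get(pattern, {}))
        return merged


class FakeContext:
    """Hypothetical stand-in for the project context."""

    def __init__(self, config_loader: FakeConfigLoader):
        self.config_loader = config_loader


class CredentialsHook:
    """Stateful hook: remembers the config loader for later hook calls."""

    def __init__(self) -> None:
        self._config_loader: Optional[FakeConfigLoader] = None
        self.session_user: Optional[str] = None

    def after_context_created(self, context: FakeContext) -> None:
        # Keep a reference so that later hooks can read configuration.
        self._config_loader = context.config_loader

    def before_pipeline_run(self, run_params: Dict[str, Any]) -> None:
        if self._config_loader is None:
            raise RuntimeError("after_context_created was never called")
        credentials = self._config_loader.get("credentials*")
        # A real hook would open its session here; we only record the user.
        self.session_user = credentials["sas"]["user"]


loader = FakeConfigLoader({"credentials*": {"sas": {"user": "alice"}}})
hook = CredentialsHook()
hook.after_context_created(FakeContext(loader))
hook.before_pipeline_run(run_params={})
print(hook.session_user)
```

The state lives on the hook instance, so this only works if the same instance receives both calls.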
Description
When developing a kedro plugin, I regularly need to access configs and potentially some plugin-specific config files. Since the plugin uses the hook mechanism, I can no longer bring whatever context attribute I need into my hook implementation (except the parameters defined in the hook specs).
Context
Here in the kedro-mlflow plugin we were forced to redefine a ConfigLoader instance inside the plugin.
That leads to inconsistency between the context ConfigLoader property and the new ConfigLoader created inside the hook.
Other plugins will need this functionality. I imagine a kedro-spark plugin that uses the hook mechanism and accesses a spark config file from the project folder path (spark.yml), or a kedro-sas plugin that does the same thing (getting configs in order to create a parametrized session).
Possible Implementation
A possible implementation is to pass the context config_loader to the hook.
hook specs
context
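A minimal sketch of this idea, assuming a toy dispatcher in place of Kedro's pluggy-based hook manager. `call_hook`, `LegacyHook` and `PluginHook` are hypothetical names, and the config content is made up. The dispatcher passes each implementation only the arguments it declares, which is how, to my understanding, pluggy lets a spec gain a new argument such as `config_loader` without breaking existing implementations.

```python
import inspect
from typing import Any, List


def call_hook(impls: List[Any], hook_name: str, **available: Any) -> List[Any]:
    """Call `hook_name` on every implementation that defines it.

    Each implementation receives only the keyword arguments it declares.
    """
    results = []
    for impl in impls:
        method = getattr(impl, hook_name, None)
        if method is None:
            continue
        wanted = inspect.signature(method).parameters
        kwargs = {name: value for name, value in available.items() if name in wanted}
        results.append(method(**kwargs))
    return results


class LegacyHook:
    """Written before `config_loader` was added to the spec."""

    def before_node_run(self, node: str) -> str:
        return f"legacy saw {node}"


class PluginHook:
    """Opts in to the new `config_loader` argument."""

    def before_node_run(self, node: str, config_loader: dict) -> str:
        return config_loader["mlflow"]["tracking_uri"]


outputs = call_hook(
    [LegacyHook(), PluginHook()],
    "before_node_run",
    node="train",
    config_loader={"mlflow": {"tracking_uri": "file:./mlruns"}},
)
print(outputs)
```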