# Hooks

Hooks are a mechanism to add extra behaviour to Kedro’s main execution in an easy and consistent manner. 

Some examples might include:
- Adding a log statement after the data catalog is loaded.
- Adding data validation to the inputs before a node runs, and to the outputs after a node has run.
- Adding machine learning metrics tracking, using MLflow, throughout a pipeline run.

Hooks Concepts: https://docs.kedro.org/en/stable/hooks/introduction.html

<b>Hook implementation</b>

To add Hooks to your Kedro project, you must:
- Define a hook implementation for a particular hook specification
- Register that hook implementation in the `src/<package_name>/settings.py` file under the `HOOKS` key
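
The two steps above can be sketched as follows. This is a minimal illustration rather than a canonical layout: the class name `CatalogLoggingHook` is hypothetical, and the `try`/`except` fallback exists only so that the snippet can be run where Kedro is not installed. In a real project the class would normally live in its own module and only the `HOOKS` assignment would appear in `settings.py`.

```python
import logging

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # fallback so the sketch still runs without Kedro installed
    def hook_impl(func):
        return func

logger = logging.getLogger(__name__)


class CatalogLoggingHook:
    """Hypothetical hook class: logs once the data catalog has been created."""

    @hook_impl
    def after_catalog_created(self, catalog):
        # Only the `catalog` argument of the hook specification is requested here.
        logger.info("Data catalog loaded: %s", type(catalog).__name__)


# In a real project this assignment lives in src/<package_name>/settings.py.
HOOKS = (CatalogLoggingHook(),)
```

Note that `HOOKS` holds an instance of the class, not the class itself.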

<b>Hook execution order</b>

Hooks follow a Last-In-First-Out (LIFO) order, which means the first registered Hook will be executed last.

Hooks are registered in the following order:
1. Project Hooks in `settings.py`: if you have `HOOKS = (hook_a, hook_b,)`, `hook_b` will be executed before `hook_a`.
2. Plugin Hooks registered in `kedro.hooks`, which follow alphabetical order.
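
The LIFO rule for project Hooks can be illustrated with a toy model. This is plain Python and not Kedro's actual dispatch code (Kedro delegates hook calls to the `pluggy` library); the helper `call_in_lifo_order` and the two stand-in hooks are made up for this sketch.

```python
from typing import Callable, List, Tuple


def call_in_lifo_order(hooks: Tuple[Callable[[], str], ...]) -> List[str]:
    """Call hooks last-registered-first and collect their return values."""
    return [hook() for hook in reversed(hooks)]


def hook_a() -> str:
    return "hook_a"


def hook_b() -> str:
    return "hook_b"


# Same registration order as in the settings.py example above.
HOOKS = (hook_a, hook_b,)
execution_order = call_in_lifo_order(HOOKS)
print(execution_order)  # ['hook_b', 'hook_a']
```

If the relative order of two Hooks matters, for example when one depends on a side effect of the other, the dependent Hook should therefore be listed first in `HOOKS`.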

Use cases: https://docs.kedro.org/en/stable/hooks/common_use_cases.html

Exercise: Implement a way to measure how long each node in a pipeline takes to run.

For this use case, we can create a class that provides hook implementations for the `before_node_run` and `after_node_run` hook specifications, plus `after_pipeline_run` to log a summary at the end.

In [1]:
import logging
import time
from collections import defaultdict
from typing import Any, Dict
from kedro.framework.hooks import hook_impl
from kedro.pipeline.node import Node

logger = logging.getLogger(__name__)

class NodeTimerHook:
    """Record how long each node takes and log a summary after the run."""

    def __init__(self):
        # Maps node name -> list of timing records, one per execution of that node.
        self.node_times = defaultdict(list)

    @hook_impl
    def before_node_run(self, node: Node, inputs: Dict[str, Any]):
        # Start a new timing record just before the node executes.
        node_name = node.name
        self.node_times[node_name].append({"start": time.perf_counter()})

    @hook_impl
    def after_node_run(self, node: Node, inputs: Dict[str, Any], outputs: Dict[str, Any]):
        # Close the most recent record for this node and compute its duration.
        node_name = node.name
        timing = self.node_times[node_name][-1]
        timing["end"] = time.perf_counter()
        timing["duration"] = timing["end"] - timing["start"]

    @hook_impl
    def after_pipeline_run(self, run_params: Dict[str, Any]):
        logger.info("Node execution timing summary:")
        for node_name, records in self.node_times.items():
            for i, record in enumerate(records):
                # A record without "duration" was started but never closed; report 0.
                duration = record.get("duration", 0)
                logger.info(f"  - {node_name} [{i+1}]: {duration:.4f} seconds")
