# Steps to debug a Kedro pipeline in a notebook
1. Read the stack trace and find the line of code that produces the error
2. Find which node this function belongs to
3. Try to rerun the pipeline from just before this node
4. If the node's input is not a persisted dataset, change it in `catalog.yml` and re-run the pipeline so that the error is thrown again
5. `session` has already been used once, so calling `session.run()` again throws an error. (So he had a wrapper function that recreates `session` and does something similar to `session.run`.)
6. Create a new session or `%reload_kedro`?
7. Now `catalog.load` that persisted dataset, i.e. `func(catalog.load("some_data"))`
8. Copy the source code of `func` into the notebook. This works if the function itself is the node function, but if it is some function buried deep down, that means a lot more copy-pasting and possibly changing imports.
9. Change the source code and make it work in the notebook
10. Rerun the pipeline to ensure everything works
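
The wrapper mentioned in step 5 might look roughly like the sketch below. The helper name `run_with_fresh_session` and the `create_session` argument are hypothetical, not part of Kedro; the only assumption about the session object is that it can be used as a context manager and has a `run` method, which is how `KedroSession` is used in its own docstring.

```python
from typing import Any, Callable


def run_with_fresh_session(create_session: Callable[[], Any], **run_kwargs: Any) -> Any:
    """Create a brand-new session, run it once, then close it.

    `create_session` is a hypothetical zero-argument factory. In a Kedro
    notebook it could be something like:
        lambda: KedroSession.create(project_path="<project_root>")
    """
    # A session can only be run once, so build a new one for every call.
    with create_session() as session:
        return session.run(**run_kwargs)
```

Each call builds its own session, so it can be repeated in the same notebook without `%reload_kedro`, for example `run_with_fresh_session(factory, from_nodes=["<node_name>"])`.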

# Running Session as Usual

In [None]:
%reload_kedro

In [None]:
session

In [None]:
pipelines

In [None]:
session.run()

1


1. Read the stack trace and find the line of code that produces the error
2. Find which node this function belongs to
3. Try to rerun the pipeline from just before this node
4. If the node's input is not a persisted dataset, change it in `catalog.yml` and re-run the pipeline so that the error is thrown again
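
Steps 2 and 3 can be partly automated. The sketch below assumes each node object exposes `name` and `func` attributes, as Kedro nodes do; the helper name `find_nodes_by_function` is made up for illustration.

```python
from typing import Any, Iterable, List


def find_nodes_by_function(nodes: Iterable[Any], func_name: str) -> List[str]:
    """Return the names of all nodes whose wrapped function is called `func_name`."""
    return [
        node.name
        for node in nodes
        # getattr guards against callables without __name__, e.g. functools.partial
        if getattr(node.func, "__name__", None) == func_name
    ]
```

In the notebook this could be called as `find_nodes_by_function(pipelines["__default__"].nodes, "report_accuracy")`, assuming `__default__` is the pipeline that failed. The returned names can then be passed to `session.run(from_nodes=...)`.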

5. `session` has already been used once, so calling `session.run()` again throws an error. (So he had a wrapper function that recreates `session` and does something similar to `session.run`.)
6. Create a new session or `%reload_kedro` and re-run?

This is not efficient because, in an interactive workflow, these intermediate variables are likely stored in the catalog already.

In [None]:
%reload_kedro

In [None]:
session.run()

1


7. Now `catalog.load` that persisted dataset, i.e. `func(catalog.load("some_data"))`

In [None]:
y_pred = catalog.load("y_pred")
y_test = catalog.load("y_test")

In [None]:
catalog.datasets.y_pred.load().head()  # Alternative way to load, using auto-discovery; this could be improved

8. Copy the source code of `func` into the notebook. This works if the function itself is the node function, but if it is some function buried deep down, that means a lot more copy-pasting and possibly changing imports.

In [None]:
def report_accuracy(y_pred: pd.Series, y_test: pd.Series):
    """Calculates and logs the accuracy.

    Args:
        y_pred: Predicted target.
        y_test: True target.
    """
    raise ValueError("Simulate some bug here")
    accuracy = (y_pred == y_test).sum() / len(y_test)
    logger = logging.getLogger(__name__)
    logger.info("Model has accuracy of %.3f on test data.", accuracy)

This won't work immediately; a couple of copy-and-paste steps are needed:

* Manually copy the imports
* Remove the function definition and copy its body into a cell instead
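
The second bullet could be scripted instead of done by hand. This is only a sketch: `extract_function_body` is a hypothetical helper built on the standard-library `ast` and `textwrap` modules, and it assumes the source string contains exactly one function definition.

```python
import ast
import textwrap


def extract_function_body(source: str) -> str:
    """Return the dedented body of a single function, without its docstring."""
    source = textwrap.dedent(source)
    func_def = ast.parse(source).body[0]
    if not isinstance(func_def, (ast.FunctionDef, ast.AsyncFunctionDef)):
        raise ValueError("expected the source of a single function")

    statements = func_def.body
    first = statements[0]
    # Skip a leading docstring so only executable statements remain.
    if (
        isinstance(first, ast.Expr)
        and isinstance(first.value, ast.Constant)
        and isinstance(first.value.value, str)
    ):
        statements = statements[1:]
    if not statements:
        return ""

    lines = source.splitlines()
    start = statements[0].lineno - 1  # ast line numbers are 1-based
    end = statements[-1].end_lineno   # inclusive, so usable as a slice end
    return textwrap.dedent("\n".join(lines[start:end]))
```

In the notebook, `print(extract_function_body(inspect.getsource(report_accuracy)))` would print a block that can be pasted into a cell. Any `return` statements still need to be edited by hand, and the imports still have to be copied separately.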



In [None]:
import pandas as pd
import logging

In [None]:
raise ValueError("Simulate some bug here")
accuracy = (y_pred == y_test).sum() / len(y_test)
logger = logging.getLogger(__name__)
logger.info("Model has accuracy of %.3f on test data.", accuracy)

Assume we know that the first line is the bug; let's remove it.

In [None]:
# raise ValueError("Simulate some bug here")
accuracy = (y_pred == y_test).sum() / len(y_test)
logger = logging.getLogger(__name__)
logger.info("Model has accuracy of %.3f on test data.", accuracy)
# It now works - let's copy this block back into the function and rerun

9. Change the source code and make it work in the notebook
10. Rerun the pipeline to ensure everything works

In [None]:
%reload_kedro
session.run()

1


It works now!

Debugging in an interactive session is not uncommon. Compared to an IDE with breakpoints:
* You can make plots and see the data
* You can intercept a variable and continue with the program, which is especially useful when the computation is intensive.

See [more comments from Antony](https://github.com/kedro-org/kedro/issues/1832#issuecomment-1242499748)

In [None]:
from inspect import getsource

In [None]:
from kedro.framework.session import KedroSession

In [None]:
# %load dummy.py
def foo():
    pass

In [None]:
print(getsource(KedroSession))

class KedroSession:
    """``KedroSession`` is the object that is responsible for managing the lifecycle
    of a Kedro run. Use `KedroSession.create()` as
    a context manager to construct a new KedroSession with session data
    provided (see the example below).



    Example:
    ::

        >>> from kedro.framework.session import KedroSession
        >>> from kedro.framework.startup import bootstrap_project
        >>> from pathlib import Path

        >>> # If you are creating a session outside of a Kedro project (i.e. not using
        >>> # `kedro run` or `kedro jupyter`), you need to run `bootstrap_project` to
        >>> # let Kedro find your configuration.
        >>> bootstrap_project(Path("<project_root>"))
        >>> with KedroSession.create() as session:
        >>>     session.run()

    """

    def __init__(
        self,
        session_id: str,
        package_name: str = None,
        project_path: Union[Path, str] = None,
        save_on_close: bool = False,
   

More to optimize
1st PoC
* `%load_node` - populate all the necessary data at the point where the node throws the error
* When the pipeline fails - raise something like `%load_node debug=True` - the traceback should include information about which node the error comes from.
* Is there anything we can use kedro-viz for? Sometimes people ask me whether kedro-viz can help with debugging too.
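
As a rough idea of what the data-loading half of a `%load_node` magic might do (the magic itself is only a proposal here), the sketch below loads every input of a node from the catalog. The helper name `load_node_inputs` is hypothetical; it assumes the node has an `inputs` list of dataset names and that the catalog offers `exists` and `load`, as Kedro's `DataCatalog` does.

```python
from typing import Any, Dict, List, Tuple


def load_node_inputs(node: Any, catalog: Any) -> Tuple[Dict[str, Any], List[str]]:
    """Load the node's inputs that the catalog can provide.

    Returns the loaded data keyed by dataset name, plus the names of
    inputs that are not available and would need to be recomputed.
    """
    loaded: Dict[str, Any] = {}
    missing: List[str] = []
    for name in node.inputs:
        if catalog.exists(name):
            loaded[name] = catalog.load(name)
        else:
            missing.append(name)
    return loaded, missing
```

If `missing` comes back empty, the node function can be called directly on the loaded values, in the same way as `func(catalog.load("some_data"))`.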


More to optimize:
* What if the error is not in the node function but somewhere deeper in the call stack?
* Handle the case when the inputs are not in the catalog - how do we recompute the necessary inputs? Potentially we can use backtracking to do it more efficiently.
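
For the first question, one possible starting point is to list every frame in the traceback so that both the node function and the deeper function that actually raised are visible. This is a sketch using only the standard-library `traceback` module; the helper name `frames_from_exception` is hypothetical.

```python
import traceback
from typing import List, Tuple


def frames_from_exception(exc: BaseException) -> List[Tuple[str, int, str]]:
    """Return (filename, line number, function name) per frame, outermost first."""
    return [
        (frame.filename, frame.lineno, frame.name)
        for frame in traceback.extract_tb(exc.__traceback__)
    ]
```

The last entry is where the error was raised. Walking backwards through the list until a function name matches a node function would identify the node to reload.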

In [None]:
%debug

> /var/folders/dv/bz0yz1dn71d2hygq110k3xhw0000gp/T/ipykernel_9928/2269451895.py(2)<cell line: 2>()
      1 import json
----> 2 import autopep8

--KeyboardInterrupt--

KeyboardInterrupt: Interrupted by user
