# Debugging

- Debugging a Kedro project within a notebook or IPython shell
- Debugging in VSCode
- Debugging with Kedro Hooks

## Debugging a Kedro project within a notebook

1. `%debug`
2. `%pdb`
3. `breakpoint()` or `import pdb; pdb.set_trace()`
4. `%load_node <name-of-failing-node>`


In [None]:
# %pdb on

In [1]:
import logging
from typing import Dict, Tuple

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def split_data(data: pd.DataFrame, parameters: Dict) -> Tuple:
    """Splits data into features and targets training and test sets.

    Args:
        data: Data containing features and target.
        parameters: Parameters defined in parameters/data_science.yml.
    Returns:
        Split data.
    """
    # breakpoint()  # may not behave reliably in Jupyter, see https://github.com/ipython/ipykernel/issues/897
    # import pdb
    # pdb.set_trace()
    
    X = data[parameters["features"]]
    y = data["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=parameters["test_size"], random_state=parameters["random_state"]
    )
    return X_train, X_test, y_train, y_test

In [None]:
# Create an example DataFrame
dummy_data = pd.DataFrame(
        {
            "engines": [1, 2, 3],
            "crew": [4, 5, 6],
            "passenger_capacity": [5, 6, 7],
            # "price": [120, 290, 30], "price" column is intentionally missing to trigger a KeyError
        }
    )

dummy_parameters = {
        "model_options": {
            "test_size": 0.2,
            "random_state": 3,
            "features": ["engines", "passenger_capacity", "crew"],
        }
}

# Call the function (this will raise KeyError)
split_data(dummy_data, dummy_parameters["model_options"])

In [None]:
# %pdb off

In [None]:
%debug
# u: Go up one frame in the traceback.
# d: Go down one frame in the traceback.
# l: List the code around the current line.
# p expr: Print the value of an expression.
# q: Quit the debugger.

<br>
You can also make the debugger start automatically whenever an exception occurs by using the `%pdb` line magic.
Enable this behaviour with `%pdb 1` or `%pdb on` before running the failing code, and disable it with `%pdb 0` or `%pdb off`.
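
Both `%debug` and `%pdb` are post-mortem tools: they open the debugger in the frame where the exception was raised. The sketch below shows the same idea without an interactive prompt, using only the standard library. `split_columns` and `innermost_locals` are hypothetical helpers written for this illustration (a dictionary-based stand-in for `split_data`, so that pandas is not required); they are not part of Kedro or IPython.

```python
import sys
import traceback


def split_columns(data: dict, parameters: dict) -> tuple:
    """Simplified stand-in for split_data that fails on a missing "price" key."""
    features = parameters["features"]
    target = data["price"]  # raises KeyError when "price" is missing
    return features, target


def innermost_locals(tb) -> dict:
    """Return the local variables of the frame where the exception was raised."""
    frame = None
    for frame, _lineno in traceback.walk_tb(tb):
        pass  # keep walking; the last frame is the innermost one
    return dict(frame.f_locals) if frame is not None else {}


dummy_data = {"engines": [1, 2, 3], "crew": [4, 5, 6]}  # no "price" key
dummy_parameters = {"features": ["engines", "crew"]}

try:
    split_columns(dummy_data, dummy_parameters)
except KeyError:
    _, _, tb = sys.exc_info()
    failing_locals = innermost_locals(tb)
    # Roughly what you would inspect with `p data` after `%debug`
    print(sorted(failing_locals["data"]))
```

Inspecting `failing_locals["data"]` shows which columns were actually available when the lookup failed, which is usually the first thing to check in a `%debug` session.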


<br>

When you use `import pdb; pdb.set_trace()`, the following commands are available at the debugger prompt:

| Command | Description                                                                     |
| ------- | ------------------------------------------------------------------------------- |
| `n`     | **Next**: Execute the current line and pause at the next one (same stack frame) |
| `s`     | **Step**: Step into a function call on the current line                         |
| `r`     | **Return**: Continue until the current function returns                         |
| `c`     | **Continue**: Run until the next breakpoint or end                              |
| `l`     | **List**: Show source code around the current line                              |
| `p var` | **Print**: Print the value of `var`                                             |
| `q`     | **Quit** the debugger                                                           |
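
To see a few of these commands in action without typing them, the sketch below feeds a scripted command sequence to `pdb.Pdb` through its `stdin` and `stdout` arguments. `add_prices` is a hypothetical function used only for illustration; in a real session you would type the commands at the `(Pdb)` prompt.

```python
import io
import pdb


def add_prices(prices):
    total = sum(prices)
    return total


# Scripted session: run the first line, print `total`, then continue to the end.
commands = io.StringIO("n\np total\nc\n")
output = io.StringIO()

# readrc=False ignores any local .pdbrc; nosigint=True leaves signal handlers alone.
debugger = pdb.Pdb(stdin=commands, stdout=output, readrc=False, nosigint=True)
result = debugger.runcall(add_prices, [120, 290, 30])

print(output.getvalue())  # transcript of what the debugger displayed
```

The transcript shows the debugger stopping on the first line of `add_prices`, moving to the `return` statement after `n`, and printing the value of `total` in response to `p total`.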


<br>
<b>%load_node line magic:</b>

This is still an [experimental](https://docs.kedro.org/en/0.19.10/notebooks_and_ipython/kedro_and_notebooks.html#load-node-line-magic) feature and is currently only available for Jupyter Notebook (>7.0), Jupyter Lab, IPython, and VS Code Notebook. 

When using this feature in Jupyter Notebook you will need to have the following requirements and minimum versions installed:
<code>
    ipylab>=1.0.0
    notebook>=7.0.0
</code>


You can load the contents of a node in your project into a series of cells using the `%load_node` line magic. To use `%load_node`, 
the node you want to load needs to fulfill two requirements:
* The node needs to have a name
* The node’s inputs need to be persisted

## Debugging in VSCode

- Create a `launch.json` file in the `.vscode` directory of your project

```json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Kedro Run",
            "type": "debugpy",
            "request": "launch",
            "console": "integratedTerminal",
            "module": "kedro",
            "cwd": "<project-dir-if-not-root>",
            "args": ["run"]
            // Pass any other arguments as additional comma-separated list items,
            // e.g. "args": ["run", "--pipeline", "pipeline_name"]
        }
    ]
}
```

- Add a breakpoint in your `pipeline.py` file
- Click the Debug button in the left pane
- Select the `Python: Kedro Run` debug configuration and click Debug (the green play button)
- Execution should stop at the breakpoint

You can also call `breakpoint()` in your code to pause execution at that line while debugging your pipelines.
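
If you leave `breakpoint()` calls in your code, Python's built-in `PYTHONBREAKPOINT` environment variable lets you switch them off without editing the source: setting it to `0` turns every `breakpoint()` call into a no-op. The sketch below illustrates this with `clean_prices`, a hypothetical node function that is not part of any Kedro project template.

```python
import os


def clean_prices(prices: list) -> list:
    """Hypothetical node function with a breakpoint left in for debugging."""
    breakpoint()  # pauses here unless PYTHONBREAKPOINT=0
    return [price for price in prices if price is not None]


# Assumption: this run should not stop, for example in an automated job.
# The default breakpoint hook reads the variable each time it is called.
os.environ["PYTHONBREAKPOINT"] = "0"

result = clean_prices([120, None, 30])
print(result)
```

In practice you would more likely set the variable in the shell that launches the run rather than inside the code.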

## Debugging with Kedro Hooks

You can use Kedro Hooks to launch a post-mortem debugging session with `pdb` whenever an error occurs during a pipeline run.

- Create a `PDBPipelineDebugHook` to debug a pipeline error

```python
import pdb
import sys
import traceback

from kedro.framework.hooks import hook_impl

class PDBPipelineDebugHook:
    """A hook class that starts a post-mortem debugging session with the PDB debugger
    whenever an error is triggered within a pipeline. The local scope from when the
    exception occurred is available within this debugging session.
    """

    @hook_impl
    def on_pipeline_error(self):
        # We don't need the actual exception since it is within this stack frame
        _, _, traceback_object = sys.exc_info()

        # Print the traceback to make the failure easier to locate
        traceback.print_tb(traceback_object)

        # Start a post-mortem debugging session at the point of failure
        pdb.post_mortem(traceback_object)
```
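
A hook like this blocks while it waits for input, which can stall a run that has no terminal attached. One possible refinement, sketched below as an assumption rather than as part of Kedro, is to open the debugger only when the session is interactive. `post_mortem_if_interactive` is a hypothetical helper that you could call from the body of `on_pipeline_error`.

```python
import pdb
import sys
import traceback


def post_mortem_if_interactive(interactive=None) -> bool:
    """Print the traceback of the exception being handled and start pdb
    only when the session is interactive. Returns True if pdb was started."""
    _, _, traceback_object = sys.exc_info()
    if traceback_object is None:
        return False  # not called while an exception is being handled

    traceback.print_tb(traceback_object)

    if interactive is None:
        # Assumption: a terminal on stdin means a person can answer the prompt
        interactive = sys.stdin is not None and sys.stdin.isatty()

    if interactive:
        pdb.post_mortem(traceback_object)
    return bool(interactive)


# Simulated failure, forced into non-interactive mode so nothing blocks
try:
    raise ValueError("simulated pipeline failure")
except ValueError:
    started = post_mortem_if_interactive(interactive=False)

print(started)
```

As in the hook above, the helper relies on being called while the exception is still being handled, which is why `sys.exc_info()` can retrieve the traceback.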

<b>Exercise:</b> Create a `PDBNodeDebugHook` to debug a node error