# 📓 The GenAI Revolution Cookbook

**Title:** How to Boost Workflow with LLM Pair Programming in Jupyter AI

**Description:** Install Jupyter AI, configure LLM providers, leverage %ai/%%ai to write Python, debug faster, and accelerate data science notebooks dramatically today.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



Jupyter AI is a JupyterLab extension that brings LLM\-powered code generation and debugging directly into your notebook cells. Instead of switching to a browser or IDE plugin, you can ask an LLM to scaffold functions, explain errors, or refactor code without leaving your analysis environment. This tutorial shows you how to install Jupyter AI, configure a provider, and use %ai and %%ai magics to generate, debug, and refine Python code in a reproducible notebook workflow.

## Prerequisites

Before you begin, make sure you have:

* Python 3\.8 or later installed locally
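* JupyterLab 3\.x or 4\.x, or Jupyter Notebook 7\.x. The chat UI is a JupyterLab extension and is not available in Google Colab; the %ai and %%ai magics only need an IPython kernel.
* An API key for at least one supported provider, such as OpenAI, Anthropic, Google, or Mistral.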
* Basic familiarity with Jupyter notebooks and Python

## Install Jupyter AI and Dependencies

Jupyter AI works with JupyterLab 3\.x and Notebook 7\.x. Run the following in a terminal to install the magics and common data science dependencies. If you use JupyterLab and want the chat UI, install the optional package.

In [None]:
# Create or activate your environment first if needed

# Core magics and helpful packages
pip install --upgrade pip
pip install jupyter-ai-magics python-dotenv pandas matplotlib

# Optional. Install the JupyterLab chat UI extension if you use JupyterLab.
pip install jupyter-ai

# Optional. Install provider SDKs so you can use their latest models.
# Install only what you plan to use.
pip install openai anthropic google-generativeai mistralai

After installation, launch JupyterLab:

In [None]:
jupyter lab

Open a new notebook to continue.

## Configure API Keys Securely

Jupyter AI reads provider API keys from environment variables. Create a .env file in your project directory and add your keys:

In [None]:
# .env
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GOOGLE_API_KEY=your_google_key_here
MISTRAL_API_KEY=your_mistral_key_here

Load the keys at the start of your notebook using the following cell:

In [None]:
from dotenv import load_dotenv
_ = load_dotenv()  # Loads variables from .env into the environment

This ensures your keys are available before loading the Jupyter AI extension.
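
Before loading the extension, it can help to confirm which keys actually reached the environment without printing their values. The helper below is a minimal sketch: `missing_keys` is a hypothetical name, and the demonstration uses a made-up mapping rather than your real environment.

```python
import os

def missing_keys(names, env=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]

# Demonstration with an explicit mapping so no real secrets are read or printed
example_env = {"OPENAI_API_KEY": "sk-example", "ANTHROPIC_API_KEY": "  "}
absent = missing_keys(
    ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "MISTRAL_API_KEY"],
    env=example_env,
)
print("Missing keys:", absent)
```

In a notebook you would call `missing_keys([...])` without the `env` argument after `load_dotenv()` has run, so the check reads the real environment.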

## Load Jupyter AI Magics

Load the Jupyter AI extension to enable %ai and %%ai magics in your notebook:

In [None]:
%load_ext jupyter_ai_magics

Verify the extension is active by running a simple query:

In [None]:
%%ai openai/gpt-4o-mini
Say hello in one short sentence.

If the extension is loaded correctly, you will see a response from the model.

## Define a Default Model

Set a default model identifier to avoid repeating it in every magic call. You can create a Python variable and interpolate it in prompts.

In [None]:
# Pick a model you have access to.
# Examples: "openai/gpt-4o-mini", "anthropic/claude-3-5-sonnet", "google/gemini-1.5-pro", "mistral/mistral-large"
DEFAULT_MODEL = "openai/gpt-4o-mini"
DEFAULT_MODEL

You can now write {DEFAULT\_MODEL} on the %%ai line of each cell instead of repeating the model identifier.

## Generate a Data Cleaning Function

Use the %%ai cell magic to generate a function that cleans a pandas DataFrame. The magic must be the first line of the cell.

In [None]:
%%ai {DEFAULT_MODEL}
You are a Python expert. Write a function named clean_dataframe(df, inplace=False) that performs these steps:
- Strip whitespace from column names.
- Drop exact duplicate rows.
- Trim leading and trailing whitespace in string columns.
- Convert obvious numeric-like columns to numeric where safe.
- Fill missing values in numeric columns with the column median.
- If inplace is True, modify df in place and return df. Otherwise, return a new cleaned DataFrame.
Return only valid Python code for the function definition. Do not include any extra text.

Copy the generated function into a new cell and execute it to make it available in your notebook.
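
Models sometimes wrap their answer in a Markdown fence or add a sentence of commentary even when asked not to. The snippet below is a minimal sketch, assuming the reply is available to you as a plain string, of pulling out the code and confirming it parses before you run it. `extract_code` is a hypothetical helper, not part of Jupyter AI.

```python
import ast
import re

FENCE = "`" * 3  # build the fence marker instead of writing it literally

def extract_code(reply: str) -> str:
    """Return the body of the first fenced block, or the whole reply if none is found."""
    pattern = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(reply)
    return (match.group(1) if match else reply).strip()

# A made-up reply that ignores the "code only" instruction
reply = (
    "Here is the function:\n"
    f"{FENCE}python\n"
    "def add(a, b):\n"
    "    return a + b\n"
    f"{FENCE}\n"
    "Hope this helps."
)
code = extract_code(reply)
ast.parse(code)  # raises SyntaxError if the extracted text is not valid Python
print(code)
```

Parsing only proves the text is syntactically valid Python. It says nothing about whether the function does what you asked, so the behavioral checks later in this tutorial still apply.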

## Refine the Function with Additional Requirements

Ask the model to add input validation, safer numeric conversion, a configurable list of columns to trim, and a docstring:

In [None]:
%%ai {DEFAULT_MODEL}
You previously wrote clean_dataframe(df, inplace=False).
Refine it with:
- Defensive checks for non-DataFrame inputs. Raise a clear TypeError.
- More careful numeric conversion: try pd.to_numeric with errors='coerce' and keep the converted column only if no new NaN values appear.
- A parameter columns_to_trim that accepts a list of column names to trim. Default trims all string columns.
- Docstring with args, returns, and examples.
Return only the updated Python function definition. No extra commentary.

Copy and run the updated function to replace the previous version.

## Use Prompt Interpolation for Context\-Aware Code

Prompt interpolation lets you embed live data, error traces, or schema details directly into your %%ai prompts. This gives the model richer context for more accurate code generation. To master this technique and understand its impact on LLM accuracy, check out our explainer on the magic of in\-context learning.

Load a sample dataset and pass its schema to the model:

In [None]:
import pandas as pd
import numpy as np

# Create a small, reproducible dataset
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "total_bill": rng.normal(20, 8, 200).round(2),
    "tip": rng.normal(3, 1, 200).round(2),
    "size": rng.integers(1, 6, 200)
}).clip(lower=0)

schema = df.dtypes.to_string()
schema

Generate a transformation function using the schema as context:

In [None]:
%%ai {DEFAULT_MODEL}
You are given this pandas DataFrame schema:
{schema}

Write a function transform_data(df) that:
- Adds a tip_pct column as tip / total_bill. Handle division by zero safely.
- Buckets size into small (1-2), medium (3-4), large (5+).
- Returns a new DataFrame with the new columns.
Return only valid Python code for the function definition.

Copy the generated function into a new cell and run it to apply the transformation.
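
The model's output will differ from run to run. For comparison, here is one hand-written sketch of a function that satisfies the prompt above. The column name `size_bucket` and the choice to produce NaN for a zero bill are assumptions; the prompt does not fix either.

```python
import numpy as np
import pandas as pd

def transform_data(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with tip_pct and size_bucket columns added."""
    out = df.copy()
    # Replace zero bills with NaN so the division yields NaN instead of inf
    out["tip_pct"] = out["tip"] / out["total_bill"].replace(0, np.nan)
    # Right-closed bins: (0, 2] -> small, (2, 4] -> medium, above 4 -> large
    out["size_bucket"] = pd.cut(
        out["size"],
        bins=[0, 2, 4, np.inf],
        labels=["small", "medium", "large"],
    )
    return out

# Tiny hand-made frame that includes a zero bill
sample = pd.DataFrame(
    {"total_bill": [20.0, 0.0, 10.0], "tip": [3.0, 1.0, 2.0], "size": [1, 4, 5]}
)
result = transform_data(sample)
print(result)
```

Comparing a generated function against a small reference like this on a handful of edge cases (zero bills, boundary party sizes) is a quick way to spot a misread requirement.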

## Debug Errors with AI Assistance

Introduce a deliberate error to demonstrate debugging:

In [None]:
# Deliberate typo in the column name to trigger a KeyError
bad_df = df.copy()
bad_df["tip_pct"] = bad_df["tip"] / bad_df["total_billl"]  # incorrect column name

Capture the traceback as a string, then pass it to the model for a fix:

In [None]:
import traceback

try:
    # Re-run to capture the traceback
    bad_df["tip_pct"] = bad_df["tip"] / bad_df["total_billl"]
except Exception:
    error_trace = traceback.format_exc()

error_trace[:600]

In [None]:
%%ai {DEFAULT_MODEL}
You are a Python debugging assistant.
Here is the traceback:
{error_trace}

Given this code that caused the error:
bad_df["tip_pct"] = bad_df["tip"] / bad_df["total_billl"]

Explain the root cause in one sentence, then give the corrected line of Python code on its own line.
Do not include any other text.

Apply the suggested fix and validate the result:

In [None]:
# Apply the correct code. If the model suggested something equivalent, use that suggestion.
bad_df["tip_pct"] = bad_df["tip"] / bad_df["total_bill"]

# Quick validation
bad_df["tip_pct"].describe()
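
Tracebacks from library code such as pandas can run to dozens of lines, and most of the useful signal is at the end. A minimal sketch, using only the standard library, of keeping just the tail before interpolating it into a prompt. `traceback_tail` is a hypothetical helper, and the example triggers the error on a plain dict so it runs without pandas.

```python
import traceback

def traceback_tail(max_lines: int = 8) -> str:
    """Return the last max_lines lines of the exception currently being handled."""
    lines = traceback.format_exc().rstrip().splitlines()
    return "\n".join(lines[-max_lines:])

try:
    {"total_bill": 1}["total_billl"]  # same kind of typo as above, on a plain dict
except KeyError:
    short_trace = traceback_tail(max_lines=3)

print(short_trace)
```

The final line of a traceback names the exception type and message, so even an aggressive cut usually leaves the model enough to identify the misspelled key.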

## Generate a Plotting Helper

Use the model to scaffold a reusable plotting function:

In [None]:
%%ai {DEFAULT_MODEL}
Write a function plot_histogram(df, column, bins=30, title=None, figsize=(6, 4)):
- Use matplotlib only.
- Validate inputs and raise a ValueError if column is missing or non-numeric.
- Show grid lines and a tight layout.
- Return the matplotlib Axes object.
Return only valid Python code for the function definition.

Copy the function into a new cell and run it to visualize the data:

In [None]:
import matplotlib.pyplot as plt

ax = plot_histogram(df, "total_bill", bins=25, title="Total Bill")
plt.show()

## Validate Generated Code

After generating a function, add minimal sanity checks to catch obvious failures before you rely on it:

In [None]:
# Sanity checks for clean_dataframe
import inspect
assert "clean_dataframe" in globals() and inspect.isfunction(clean_dataframe)

toy = pd.DataFrame({"A": [1, 1, None], "B": [" x ", " y", " z "]})
out = clean_dataframe(toy)
assert isinstance(out, pd.DataFrame)
assert "A" in out.columns and "B" in out.columns
assert out.shape[0] <= toy.shape[0]
print("clean_dataframe sanity checks passed.")

These checks catch common issues and help you trust the generated code.
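
The assertions above require the generated function to be executed first. As a complement, a static check can confirm that the generated source defines the expected function with the expected parameters before anything runs. This is a sketch built on the standard `ast` module; `defines_function` is a hypothetical helper and the `generated` string stands in for a model reply.

```python
import ast

def defines_function(source: str, name: str, required_params: list) -> bool:
    """Check that source has a top-level def `name` whose parameters include required_params."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            all_args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            params = [a.arg for a in all_args]
            return all(p in params for p in required_params)
    return False

# Stand-in for source text returned by the model
generated = "def clean_dataframe(df, inplace=False):\n    return df\n"
ok = defines_function(generated, "clean_dataframe", ["df", "inplace"])
print(ok)
```

A check like this is cheap enough to run on every regeneration, which makes it easier to notice when a refined prompt silently drops a parameter.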

## Handle Provider Errors Gracefully

API calls may fail due to rate limits or invalid keys. A %%ai cell cannot sit inside a try block, so call the magic through get\_ipython().run\_cell\_magic and wrap that call in try\-except:

In [None]:
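import logging
import time

from IPython import get_ipython
from IPython.display import display

body = "Reply with 'ok' if you received this request."
max_attempts = 2  # one initial try plus one retry
result = None

for attempt in range(1, max_attempts + 1):
    try:
        # run_cell_magic returns the magic's output rather than rendering it
        result = get_ipython().run_cell_magic("ai", DEFAULT_MODEL, body)
        break
    except Exception:
        logging.exception("AI request failed (attempt %d of %d)", attempt, max_attempts)
        if attempt < max_attempts:
            time.sleep(1.5)  # simple fixed pause before the retry

if result is not None:
    display(result)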
For production workflows, log errors and retry with exponential backoff.
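
A minimal sketch of the exponential backoff mentioned above, written against a generic callable so it can be tried without an API key. `retry_with_backoff` and its parameters are hypothetical names, and `flaky` is a stand-in that simulates two failures before succeeding.

```python
import random
import time

def retry_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, then try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

# Stand-in for a provider call that fails twice, then succeeds
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"

delays = []
# Record the delays instead of sleeping so the demonstration runs instantly
outcome = retry_with_backoff(flaky, sleep=delays.append)
print(outcome, [round(d, 1) for d in delays])
```

In a notebook, the callable could be `lambda: get_ipython().run_cell_magic("ai", DEFAULT_MODEL, body)`. Catching every `Exception` is a simplification here; in practice you would likely retry only on rate-limit or timeout errors and let authentication failures surface immediately.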

## Avoid Leaking Sensitive Data

When interpolating data into prompts, redact or truncate sensitive columns to prevent PII leakage:

In [None]:
def safe_sample(df, cols_to_redact=None, max_rows=5, truncate=4):
    """
    Return a safe preview of df for prompts.
    Truncate long strings, then redact the specified columns.
    """
    import pandas as pd

    preview = df.sample(min(len(df), max_rows), random_state=42).copy()

    # Truncate long string values
    def _truncate(x):
        if isinstance(x, str) and len(x) > truncate:
            return x[:truncate] + "..."
        return x

    # DataFrame.map replaced applymap in pandas 2.1; fall back on older versions
    method = "map" if hasattr(pd.DataFrame, "map") else "applymap"
    preview = getattr(preview, method)(_truncate)

    # Redact last so the "[REDACTED]" placeholder is not itself truncated
    for c in cols_to_redact or []:
        if c in preview.columns:
            preview[c] = "[REDACTED]"
    return preview

# Example usage. Columns that are absent from df are skipped.
safe_preview = safe_sample(df, cols_to_redact=["email", "ssn"], max_rows=5)
safe_preview

Use safe\_sample instead of the full dataset in your prompts.
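
Column-level redaction does not help when sensitive values are embedded in free text, such as a notes field. As a complement, here is a rough sketch that masks email-like substrings with a regular expression before the text goes into a prompt. `mask_emails` is a hypothetical helper, and the pattern is deliberately simple: it will miss unusual addresses and does nothing for other kinds of PII.

```python
import re

# Simplified email pattern: good enough for a preview, not a full validator
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def mask_emails(text: str, placeholder: str = "[EMAIL]") -> str:
    """Replace every email-like substring in text with the placeholder."""
    return EMAIL_RE.sub(placeholder, text)

note = "Contact jane.doe@example.com or bob@test.org about table 4."
masked = mask_emails(note)
print(masked)
```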

## End\-to\-End Runnable Example

Here is a condensed version of the workflow. Run the cells in order, pasting each generated function into a new cell and executing it before the cell that calls it:

In [None]:
# Environment and setup
from dotenv import load_dotenv
_ = load_dotenv()

%load_ext jupyter_ai_magics

import pandas as pd
import numpy as np

# Choose a model you have access to
DEFAULT_MODEL = "openai/gpt-4o-mini"

# Create a simple dataset
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "total_bill": rng.normal(20, 7, 120).round(2),
    "tip": rng.normal(3, 1, 120).round(2),
    "size": rng.integers(1, 6, 120)
}).clip(lower=0)

df.head()

Generate a cleaning function:

In [None]:
%%ai {DEFAULT_MODEL}
Write a function clean_dataframe(df, inplace=False) that:
- Validates df is a pandas DataFrame.
- Strips whitespace from column names.
- Drops duplicate rows.
- Trims whitespace in string columns.
- Converts numeric-like columns with pd.to_numeric(errors='coerce'), keeping the result only if no new NaN values appear.
- Fills NaNs in numeric columns with the column median.
- If inplace is True, modify df in place. Otherwise, return a new DataFrame.
Return only valid Python code for the function definition.

Copy the function, run it, and validate:

In [None]:
# Example usage after you paste the generated function
cleaned = clean_dataframe(df)
cleaned.info()

# Basic checks
assert not cleaned.isna().sum().sum()
assert cleaned.shape[0] <= df.shape[0]

Generate a plot:

In [None]:
%%ai {DEFAULT_MODEL}
Write a function plot_histogram(df, column, bins=30, title=None, figsize=(6, 4)):
- Use matplotlib to plot a histogram of df[column].
- Validate the column exists and is numeric.
- Label axes and add a title if provided.
- Return the Axes object.
Return only valid Python code for the function definition.

Copy the function and run it:

In [None]:
import matplotlib.pyplot as plt

ax = plot_histogram(cleaned, "total_bill", bins=25, title="Total Bill Distribution")
plt.show()

## Next Steps

When using %ai and %%ai magics, the quality of your prompt directly impacts the usefulness of the generated code or explanations. For a deeper understanding of how to design prompts that yield reliable and accurate outputs, see our guide on prompt engineering with LLM APIs.

If you want to expand your skills beyond this workflow and become more proficient in AI\-assisted development, our practical roadmap for aspiring GenAI developers outlines the essential skills and projects to accelerate your growth in this field.