# 📓 The GenAI Revolution Cookbook

**Title:** How to Boost Workflow with LLM Pair Programming in Jupyter AI

**Description:** Install Jupyter AI, configure LLM providers, leverage %ai/%%ai to write Python, debug faster, and accelerate data science notebooks dramatically today.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



Jupyter AI is a JupyterLab extension that brings LLM-powered code generation and debugging directly into your notebook cells. Instead of switching to a browser or IDE plugin, you can ask an LLM to scaffold functions, explain errors, or refactor code without leaving your analysis environment. This tutorial shows AI Builders how to install Jupyter AI, configure a provider, and use `%ai` and `%%ai` magics to generate, debug, and refine Python code in a reproducible notebook workflow.

## Prerequisites

Before you begin, ensure you have:

- Python 3.8 or later installed locally
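- JupyterLab 4.x or Jupyter Notebook 7.x for Jupyter AI 2.x (the 1.x releases target JupyterLab 3.x). The chat sidebar requires JupyterLab, while the `%ai` and `%%ai` magics used here run wherever an IPython kernel is available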
- An API key for at least one supported provider (OpenAI, Anthropic, Google, or Mistral)
- Basic familiarity with Jupyter notebooks and Python

## Install Jupyter AI and Dependencies

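Jupyter AI requires JupyterLab or Notebook 7. Run the following in a terminal to install the extension and required packages. The `[all]` extra pulls in the optional provider dependencies; if your Jupyter AI version does not define it, install `jupyter-ai` plus the package for your provider instead:

In [None]:
pip install "jupyter-ai[all]" jupyterlab python-dotenv pandas numpy matplotlib seaborn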

After installation, launch JupyterLab:

In [None]:
jupyter lab

Open a new notebook to continue.

## Configure API Keys Securely

Jupyter AI reads provider API keys from environment variables. Create a `.env` file in your project directory, add your keys, and exclude the file from version control (for example through `.gitignore`):

In [None]:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

Load the keys at the start of your notebook using the following cell:

In [None]:
from dotenv import load_dotenv
import os

load_dotenv()

required_keys = ["OPENAI_API_KEY"]  # list only the providers you actually use
missing = [k for k in required_keys if not os.getenv(k)]

if missing:
    raise EnvironmentError(f"Missing API keys: {', '.join(missing)}. Add them to your .env file.")

print("API keys loaded successfully.")

This ensures your keys are available before loading the Jupyter AI extension.
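When a key looks missing or stale, it can help to see which variables are set without printing the secrets themselves. The sketch below is one way to do that; `key_status` is a hypothetical helper written for this tutorial, not part of Jupyter AI or python-dotenv.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def key_status(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Report whether each variable is set, revealing at most its last 4 characters."""
    env = os.environ if env is None else env
    status = {}
    for name in names:
        value = env.get(name, "")
        if not value:
            status[name] = "missing"
        elif len(value) <= 8:
            # Too short to show any part of it safely.
            status[name] = "set"
        else:
            status[name] = f"set (ends in ...{value[-4:]})"
    return status


# Demo with a fake environment; omit `env` to inspect os.environ instead.
fake_env = {"OPENAI_API_KEY": "sk-example-not-a-real-key"}
print(key_status(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"], env=fake_env))
```

Passing the environment as a parameter keeps the helper easy to test without touching real credentials.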

## Load Jupyter AI Magics

Load the magics extension to enable `%ai` and `%%ai` in your notebook:

In [None]:
%load_ext jupyter_ai_magics

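Verify the extension is active by running a simple query. Prompts go in a `%%ai` cell with the model ID on the first line, and `%ai list` shows the provider and model IDs your installation recognizes:

In [None]:
%%ai openai-chat:gpt-4o-mini
What is 2 + 2?

If the extension is loaded correctly, you will see a response from the model.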

## Define a Default Model

Set a default model identifier so you do not have to repeat it in every magic call:

In [None]:
DEFAULT_MODEL = "openai-chat:gpt-4o-mini"

IPython expands `{DEFAULT_MODEL}` on the magic line, so `%%ai {DEFAULT_MODEL}` resolves to the full model ID.

## Generate a Data Cleaning Function

Use the `%%ai` cell magic to generate a function that cleans a pandas DataFrame. The magic must be the first line of the cell:

In [None]:
%%ai {DEFAULT_MODEL}
You are a Python data assistant. Output only valid Python code.

Write a function clean_users(df: "pandas.DataFrame") that:
- strips whitespace from string columns
- lowercases column names
- removes exact duplicate rows
- ensures a 'signup_date' column is datetime, errors='coerce'
Include a concise docstring and type hints. Do not import pandas inside the function.

Copy the generated function into a new cell and execute it to make it available in your notebook.
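Model output varies from run to run, so it helps to know roughly what a correct answer looks like before accepting one. The following is a hand-written sketch of a function that satisfies the prompt above; it is an illustration under those requirements, not the output of any particular model.

```python
import pandas as pd


def clean_users(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: lowercase column names, stripped strings,
    datetime 'signup_date' (invalid values become NaT), no duplicate rows."""
    out = df.copy()
    out.columns = [str(col).lower() for col in out.columns]
    for col in out.columns:
        if pd.api.types.is_object_dtype(out[col]) or pd.api.types.is_string_dtype(out[col]):
            # Strip only real strings so missing values pass through unchanged.
            out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    if "signup_date" in out.columns:
        out["signup_date"] = pd.to_datetime(out["signup_date"], errors="coerce")
    return out.drop_duplicates().reset_index(drop=True)


# Made-up sample data: padded names, an unparseable date, and a duplicate row.
raw = pd.DataFrame({
    "Name": [" Ada ", "Bob", "Bob"],
    "Signup_Date": ["2024-01-05", "not a date", "not a date"],
})
print(clean_users(raw))
```

Stripping before dropping duplicates matters here: rows that differ only by surrounding whitespace collapse into one.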

## Refine the Function with Additional Requirements

Ask the model to add error handling and inplace modification support:

In [None]:
%%ai {DEFAULT_MODEL}
Take the previous clean_users function design. Output only valid Python code.

- Add parameter strict: bool. If strict, raise ValueError when required columns ['signup_date'] are missing.
- Add parameter inplace: bool. If True, modify df in place and return df.
- Keep type hints and docstring.

Copy and run the updated function to replace the previous version.

## Use Prompt Interpolation for Context-Aware Code

Prompt interpolation lets you embed live data, error traces, or schema details directly into your `%%ai` prompts, giving the model richer context for more accurate code generation. To master this technique and understand its impact on LLM accuracy, check out our explainer on [the magic of in-context learning](/article/the-magic-of-in-context-learning-teach-your-llm-on-the-fly-3).
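Before sending an interpolated prompt, it can be useful to preview the exact text the model will receive. The sketch below approximates the substitution with Python's `str.format`; `render_prompt` and the hard-coded schema string are assumptions for illustration, and Jupyter AI's own interpolation may differ in detail.

```python
PROMPT_TEMPLATE = """You are assisting with pandas code. Output only code.

Dataset info:
Schema:
{schema}
"""


def render_prompt(template: str, **variables: str) -> str:
    """Fill {name} placeholders in the template with the given values."""
    return template.format(**variables)


# Stand-in for tips.dtypes.to_string(); the real value comes from your DataFrame.
schema = "total_bill    float64\ntip           float64\nsize            int64"

prompt = render_prompt(PROMPT_TEMPLATE, schema=schema)
print(prompt)
print(f"{len(prompt)} characters")
```

Checking the rendered length is a cheap way to notice when an interpolated value is much larger than intended.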

Load a sample dataset and pass its schema to the model:

In [None]:
import pandas as pd
import seaborn as sns

tips = sns.load_dataset("tips")
schema = tips.dtypes.to_string()

Generate a transformation function using the schema as context:

In [None]:
%%ai {DEFAULT_MODEL}
You are assisting with pandas code. Output only code. No explanations.

Dataset info:
Schema:
{schema}

Task:
- Create a function transform_tips(df) that:
  - computes total_bill_per_person = total_bill / (size if size > 0 else 1)
  - returns a DataFrame with original columns plus the new column
  - includes a short docstring and type hints

Copy the generated function into a new cell and run it to apply the transformation.
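The expression `size if size > 0 else 1` in the prompt is written for a single value, so a correct answer has to translate it into a column-wise operation. A minimal hand-written sketch of one such translation, using `Series.where`, might look like this; the sample rows are made up.

```python
import pandas as pd


def transform_tips(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with total_bill_per_person added."""
    out = df.copy()
    # Keep size where it is positive, otherwise fall back to 1.
    party_size = out["size"].where(out["size"] > 0, 1)
    out["total_bill_per_person"] = out["total_bill"] / party_size
    return out


sample = pd.DataFrame({"total_bill": [20.0, 30.0], "tip": [3.0, 4.5], "size": [2, 0]})
print(transform_tips(sample))
```

If the model returns a row-wise `apply` with a Python `if`, the result is usually still correct but slower on large frames.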

## Debug Errors with AI Assistance

Introduce a deliberate error to demonstrate debugging:

In [None]:
import traceback

broken = tips.copy()
try:
    broken["tip_rate"] = broken["tip"] / broken["totalbill"]
except Exception:
    error_text = traceback.format_exc()

print(error_text)
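Tracebacks raised inside pandas can run to dozens of lines, most of which are library internals. One option is to keep only the tail, where the exception type and message appear. The helper below is a sketch; `short_traceback` and the six-line default are assumptions, not part of Jupyter AI.

```python
import traceback


def short_traceback(exc: BaseException, max_lines: int = 6) -> str:
    """Format an exception and keep only the last max_lines lines."""
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    lines = text.rstrip().splitlines()
    return "\n".join(lines[-max_lines:])


try:
    {"total_bill": 10.0}["totalbill"]  # same misspelled key, on a plain dict
except KeyError as exc:
    short_text = short_traceback(exc)

print(short_text)
```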

Pass the traceback to the model for a fix. The prompt goes in a `%%ai` cell so the multi-line traceback can be interpolated into the body:

In [None]:
%%ai {DEFAULT_MODEL}
Explain this Python error and propose a minimal code fix that is safe for division by zero.

Error:
{error_text}

Apply the suggested fix and validate the result:

In [None]:
# Replace zero bills with NaN so the division cannot produce inf, then fill the gaps.
broken["tip_rate"] = broken["tip"] / broken["total_bill"].replace(0, float("nan"))
broken["tip_rate"] = broken["tip_rate"].fillna(0.0)

assert broken["tip_rate"].ge(0).all()
assert broken["tip_rate"].lt(1.0).mean() > 0.5

## Generate a Plotting Helper

Use the model to scaffold a reusable plotting function:

In [None]:
import matplotlib.pyplot as plt

In [None]:
%%ai {DEFAULT_MODEL}
Output only code. Create a function plot_tip_rate(df) that draws a seaborn boxplot of tip as a percent of total_bill grouped by day. Add labels and a title. Do not load the dataset. Assume seaborn as sns and matplotlib.pyplot as plt are already imported.

Copy the function into a new cell and run it to visualize the data:

In [None]:
plot_tip_rate(tips)
plt.show()

## Validate Generated Code

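After generating a function, add minimal sanity checks to ensure correctness. `clean_users` expects a `signup_date` column, which `tips` lacks, so the checks below use a small made-up users table:

In [None]:
# Hypothetical sample data: padded names, a bad date, and a duplicate row.
users = pd.DataFrame({
    "Name": [" Ada ", "Bob", "Bob"],
    "signup_date": ["2024-01-05", "not a date", "not a date"],
})
users_clean = clean_users(users)

assert pd.api.types.is_datetime64_any_dtype(users_clean["signup_date"])
assert users_clean.columns.str.islower().all()
assert not users_clean.duplicated().any()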

These checks catch common issues and help you trust the generated code.
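A cheaper check can run even earlier, before the generated text is executed at all: parse it and confirm it defines the function you asked for. The sketch below uses the standard `ast` module; `defines_function` is a hypothetical helper name, and the `generated` string stands in for model output.

```python
import ast


def defines_function(source: str, name: str) -> bool:
    """Return True if source parses and defines a top-level function called name."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(node, ast.FunctionDef) and node.name == name
        for node in tree.body
    )


# Stand-in for text copied from a model response.
generated = '''
def clean_users(df):
    """Placeholder body for the demo."""
    return df
'''

print(defines_function(generated, "clean_users"))
print(defines_function("def broken(:", "clean_users"))
```

Parsing does not prove the function is correct, but it filters out truncated responses and answers that include prose instead of code.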

## Handle Provider Errors Gracefully

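API calls may fail due to rate limits or invalid keys. A `%%ai` cell magic cannot sit inside a `try` block, so call it through IPython's `run_cell_magic` when you need to catch failures. Whether a provider error surfaces as a Python exception may depend on your Jupyter AI version:

In [None]:
from IPython.display import display

try:
    # Equivalent to a `%%ai {DEFAULT_MODEL}` cell with the prompt as its body.
    result = get_ipython().run_cell_magic("ai", DEFAULT_MODEL, "Generate a summary of this dataset.")
    display(result)
except Exception as e:
    print(f"API call failed: {e}")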

For production workflows, log errors and retry with exponential backoff.
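Exponential backoff means doubling the wait after each failed attempt, usually with a little random jitter so parallel clients do not retry in lockstep. The following is a minimal sketch using only the standard library; `retry_with_backoff`, its defaults, and the `flaky` demo function are assumptions for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base, 2*base, 4*base, plus jitter.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1 * base_delay)
            sleep(delay)
    raise RuntimeError("unreachable")


# Demo: a function that fails twice, then succeeds.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


print(retry_with_backoff(flaky, base_delay=0.01))
```

In a notebook, `call` would be a zero-argument function that performs the provider request. Catching every `Exception` is a simplification; a real workflow would retry only on errors known to be transient.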

## Avoid Leaking Sensitive Data

When interpolating data into prompts, redact or truncate sensitive columns to prevent PII leakage:

In [None]:
safe_sample = tips[["total_bill", "tip", "day"]].head(3).to_string(index=False)

Use `safe_sample` instead of the full dataset in your prompts.
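Selecting columns does not help when sensitive values hide inside free-text fields. As a second line of defense, the rendered sample can be scrubbed and truncated just before interpolation. This sketch masks email addresses only; `scrub`, the deliberately simple regular expression, and the 500-character cap are assumptions, and real PII handling needs broader rules.

```python
import re

# Simplistic email pattern, good enough for a demo but not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub(text: str, max_chars: int = 500) -> str:
    """Mask email addresses and cap the text length before it enters a prompt."""
    redacted = EMAIL_RE.sub("[EMAIL]", text)
    if len(redacted) > max_chars:
        return redacted[:max_chars] + "\n[truncated]"
    return redacted


# Made-up sample text standing in for a rendered DataFrame.
sample_text = "name,contact\nAda,ada@example.com\nBob,call the front desk"
print(scrub(sample_text))
```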

## End-to-End Runnable Example

Here is a complete, minimal workflow you can run from top to bottom:

In [None]:
from dotenv import load_dotenv
import os
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

load_dotenv()

required_keys = ["OPENAI_API_KEY"]
missing = [k for k in required_keys if not os.getenv(k)]
if missing:
    raise EnvironmentError(f"Missing API keys: {', '.join(missing)}. Add them to your .env file.")

%load_ext jupyter_ai_magics

DEFAULT_MODEL = "openai-chat:gpt-4o-mini"

tips = sns.load_dataset("tips")
schema = tips.dtypes.to_string()

Generate a cleaning function:

In [None]:
%%ai {DEFAULT_MODEL}
Output only code. Write a function clean_tips(df: "pandas.DataFrame") -> "pandas.DataFrame" that:
- strips whitespace in object columns
- coerces total_bill and tip to numeric with errors='coerce'
- drops rows where total_bill or tip is null after coercion
- returns a copy, do not modify in place
- include a docstring

Copy the function, run it, and validate:

In [None]:
tips_clean = clean_tips(tips)
assert tips_clean["total_bill"].dtype in ["float64", "int64"]
assert tips_clean["tip"].dtype in ["float64", "int64"]

Generate a plot:

In [None]:
%%ai {DEFAULT_MODEL}
Output only code. Write a function plot_tip_vs_total(df) that draws a scatter plot of total_bill vs tip with a regression line, colored by day, using seaborn. Add axis labels and a title. Assume imports exist and df is passed in.

Copy the function and run it:

In [None]:
plot_tip_vs_total(tips_clean)
plt.show()

## Next Steps

When using `%ai` and `%%ai` magics, the quality of your prompt directly impacts the usefulness of the generated code or explanations. For a deeper understanding of how to design prompts that yield reliable and accurate outputs, see our guide on [prompt engineering with LLM APIs](/article/prompt-engineering-with-llm-apis-how-to-get-reliable-outputs-4).

If you are looking to expand your skills beyond this workflow and become more proficient in AI-assisted development, our [practical roadmap for aspiring GenAI developers](/article/practical-roadmap-for-aspiring-genai-developers) outlines the essential skills and projects to accelerate your growth in this field.