# Case Study: From Notebook to Package with nbdev {#sec-notebook-case-study}

::: {.callout-note}
## Chapter Overview
This case study parallels @sec-case-study (SimpleBot), but follows a notebook-first workflow. We'll build TextKit, a text analysis library, entirely in Jupyter notebooks, then ship it as a published Python package using nbdev.
:::

## Project Overview

TextKit is a lightweight library of simple text analysis utilities. Key features include:

- Word and character statistics
- Readability scoring (Flesch Reading Ease, etc.)
- Basic sentiment indicators
- Text cleaning utilities

This project is ideal for our notebook case study because:

- **Natural notebook fit**: Text analysis involves exploration and visualization
- **Keeps the theme**: Complements SimpleBot's chatbot focus (analyzing what bots produce)
- **Real utility**: Functions you'd actually use in data analysis
- **Right size**: Small enough to complete, complex enough to demonstrate the workflow

By the end of this chapter, you'll have a package published to PyPI, built entirely from notebooks.

## Why nbdev for This Project?

In @sec-notebooks, we introduced nbdev as a way to develop libraries from notebooks. Here's why it fits TextKit:

| Traditional Workflow | nbdev Workflow |
|---------------------|----------------|
| Write code in `.py` files | Write code in notebooks |
| Write separate test files | Tests live next to code |
| Write docs separately | Docs generated from notebooks |
| Context switching | Single environment |

For exploratory, iterative work like text analysis, nbdev keeps everything together.

## 1. Setting Up the nbdev Project

### Installing nbdev

```bash
pip install nbdev
```

### Creating the Project

```bash
nbdev_new --lib_name textkit --user yourusername --author "Your Name"
cd textkit
```

This creates:

```
textkit/
├── nbs/                    # Your notebooks live here
│   ├── 00_core.ipynb       # Main module
│   ├── index.ipynb         # Becomes README and docs homepage
│   └── _quarto.yml         # Documentation config
├── textkit/                # Generated Python package (don't edit directly)
├── settings.ini            # Project configuration
├── setup.py                # Generated for pip install
└── pyproject.toml
```

### Key Insight: You Edit Notebooks, Not .py Files

The `textkit/` directory contains generated code. Your source of truth is `nbs/*.ipynb`.

## 2. Building the Core Module

### The First Notebook: `00_core.ipynb`

Open `nbs/00_core.ipynb` in Jupyter. The structure:

```python
# Cell 1: Module header
#| default_exp core
```

This directive tells nbdev: "export cells from this notebook to `textkit/core.py`".

### Exporting Functions

```python
#| export
def word_count(text: str) -> int:
    """Count words in text.

    Parameters
    ----------
    text : str
        Input text to analyze

    Returns
    -------
    int
        Number of words

    Examples
    --------
    >>> word_count("Hello world")
    2
    >>> word_count("")
    0
    """
    if not text or not text.strip():
        return 0
    return len(text.split())
```

The `#| export` directive marks this cell for inclusion in the generated module.

### Exploring as You Build

This is where notebooks shine. Between exported cells, add exploration:

```python
# Not exported - just exploration
sample_text = """
The quick brown fox jumps over the lazy dog.
This is a sample paragraph for testing our text analysis functions.
"""

print(f"Word count: {word_count(sample_text)}")
```

Your notebook becomes both implementation AND documentation of your thinking.

## 3. Adding Tests with nbdev

### Inline Doctests

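The docstring examples above double as tests. `nbdev_test` works by executing every notebook cell and failing on any exception, so the examples are checked once a cell runs them (for instance through the standard library's `doctest` module) and you invoke: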

```bash
nbdev_test
```
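
If you want the docstring examples executed explicitly from a notebook cell, Python's standard `doctest` module can do it. The sketch below is one possible wiring, not something nbdev provides: `run_doctests` is a hypothetical helper name, and `word_count` is repeated in shortened form only so the block is self-contained.

```python
import doctest


def word_count(text: str) -> int:
    """Count words in text.

    >>> word_count("Hello world")
    2
    >>> word_count("")
    0
    """
    return len(text.split())


def run_doctests(func) -> doctest.TestResults:
    """Run the examples in func's docstring and return (failed, attempted)."""
    finder = doctest.DocTestFinder(recurse=False)
    runner = doctest.DocTestRunner(verbose=False)
    # Pass globs so the examples can see the function under test
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_doctests(word_count)
# A failing example raises here, which makes the notebook cell (and the test run) fail
assert results.failed == 0, results
```

Placing a cell like this after each exported function keeps the docstring examples honest without leaving the notebook.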

### Dedicated Test Cells

For more complex tests:

```python
#| test
def test_word_count_edge_cases():
    assert word_count("") == 0
    assert word_count("   ") == 0
    assert word_count("one") == 1
    assert word_count("one two three") == 3
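    # Unicode handling
    assert word_count("café résumé") == 2

# Call the function so the assertions actually run when the cell executes
test_word_count_edge_cases()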
```

### Running Tests

```bash
# Run all tests
nbdev_test

# Run tests for specific notebook
nbdev_test --path nbs/00_core.ipynb
```

## 4. Building More Functionality

### Readability Scores

```python
#| export
def flesch_reading_ease(text: str) -> float:
    """Calculate Flesch Reading Ease score.

    Scores typically range from 0-100:
    - 90-100: Very easy (5th grade)
    - 60-70: Standard (8th-9th grade)
    - 0-30: Very difficult (college graduate)

    Examples
    --------
    >>> score = flesch_reading_ease("The cat sat on the mat.")
    >>> 90 <= score <= 120  # Simple sentence = high score
    True
    """
    words = word_count(text)
    sentences = sentence_count(text)
    syllables = syllable_count(text)

    if words == 0 or sentences == 0:
        return 0.0

    return (
        206.835
        - 1.015 * (words / sentences)
        - 84.6 * (syllables / words)
    )
```
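
To make the formula concrete, here is the arithmetic for the docstring's example sentence, using hand counts (6 words, 1 sentence, 6 syllables) instead of the helper functions. The variable names are illustrative only.

```python
# Hand counts for "The cat sat on the mat."
words, sentences, syllables = 6, 1, 6

avg_sentence_length = words / sentences      # 6.0 words per sentence
avg_syllables_per_word = syllables / words   # 1.0 syllable per word

# Flesch Reading Ease with the same constants as flesch_reading_ease()
score = 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word
print(round(score, 3))
```

The result is roughly 116, above the nominal 0-100 range. Very short sentences made of one-syllable words can do that, which is why the doctest allows scores up to 120.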

### Helper Functions

```python
#| export
def sentence_count(text: str) -> int:
    """Count sentences in text.

    Examples
    --------
    >>> sentence_count("Hello. World!")
    2
    >>> sentence_count("No punctuation here")
    1
    """
    import re
    if not text.strip():
        return 0
    # Split on sentence-ending punctuation
    sentences = re.split(r'[.!?]+', text)
    # Filter empty strings
    return len([s for s in sentences if s.strip()])
```
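
A quick exploration cell shows what this splitting rule does and does not handle. The function is repeated in compact form so the sketch runs on its own; the sample strings are made up for illustration.

```python
import re


def sentence_count(text: str) -> int:
    """Same splitting rule as the helper above, repeated for a standalone demo."""
    if not text.strip():
        return 0
    return len([s for s in re.split(r'[.!?]+', text) if s.strip()])


# Runs of punctuation collapse into a single boundary, so this counts 2
ellipsis_case = sentence_count("Wait... what?!")

# The period in "Dr." is treated as a sentence end, so this counts 3, not 2
abbreviation_case = sentence_count("Dr. Smith arrived. He sat down.")

print(ellipsis_case, abbreviation_case)
```

Abbreviations and decimal numbers inflate the count. That is acceptable for a rough readability score, but worth documenting next to the function.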

```python
#| export
def syllable_count(text: str) -> int:
    """Estimate syllable count (English approximation).

    Examples
    --------
    >>> syllable_count("hello")
    2
    >>> syllable_count("beautiful")
    3
    """
    import re
    text = text.lower()
    words = text.split()

    count = 0
    for word in words:
        word = re.sub(r'[^a-z]', '', word)
        if not word:
            continue
        # Simple heuristic: count vowel groups
        syllables = len(re.findall(r'[aeiouy]+', word))
        # Adjust for silent e
        if word.endswith('e') and syllables > 1:
            syllables -= 1
        count += max(1, syllables)

    return count
```
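
Because this is a heuristic, it helps to probe it word by word in an exploration cell. The sketch below applies the same rules to single words; `estimate_syllables` is a hypothetical per-word helper written for this demo, not part of the exported module.

```python
import re


def estimate_syllables(word: str) -> int:
    """Per-word version of the vowel-group heuristic (illustrative helper)."""
    word = re.sub(r'[^a-z]', '', word.lower())
    if not word:
        return 0
    groups = len(re.findall(r'[aeiouy]+', word))
    # Treat a trailing 'e' as silent when there is another vowel group
    if word.endswith('e') and groups > 1:
        groups -= 1
    return max(1, groups)


estimates = {w: estimate_syllables(w) for w in ["hello", "made", "table", "idea"]}
print(estimates)
```

"hello" and "made" come out right, while "table" (two syllables) and "idea" (three) are under-counted. The silent-e rule and adjacent vowels are the usual sources of error in this kind of estimate.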

## 5. Visualizations in Your Notebook

Notebooks excel at visual exploration. Add analysis cells (not exported):

```python
# Visualization - not exported, but shows in docs
import matplotlib.pyplot as plt

def visualize_readability(texts: dict[str, str]):
    """Compare readability across multiple texts."""
    names = list(texts.keys())
    scores = [flesch_reading_ease(t) for t in texts.values()]

    plt.figure(figsize=(10, 5))
    plt.barh(names, scores, color='steelblue')
    plt.xlabel('Flesch Reading Ease Score')
    plt.title('Readability Comparison')
    plt.axvline(x=60, color='red', linestyle='--', label='Standard difficulty')
    plt.legend()
    plt.tight_layout()
    plt.show()

# Demo with sample texts
samples = {
    "Children's book": "The cat sat. The dog ran. They played.",
    "News article": "The committee announced sweeping regulatory changes affecting multiple industries.",
    "Academic paper": "The epistemological ramifications of quantum indeterminacy necessitate reconceptualization.",
}

visualize_readability(samples)
```

This visualization appears in your generated documentation, showing users what the library can do.

## 6. Building the Text Analyzer Class

For a more complete API, add a class that combines functionality:

```python
#| export
class TextAnalyzer:
    """Analyze text with multiple metrics.

    Examples
    --------
    >>> analyzer = TextAnalyzer("Hello world. How are you?")
    >>> analyzer.word_count
    5
    >>> analyzer.sentence_count
    2
    """

    def __init__(self, text: str):
        self.text = text
        self._word_count = None
        self._sentence_count = None

    @property
    def word_count(self) -> int:
        if self._word_count is None:
            self._word_count = word_count(self.text)
        return self._word_count

    @property
    def sentence_count(self) -> int:
        if self._sentence_count is None:
            self._sentence_count = sentence_count(self.text)
        return self._sentence_count

    @property
    def avg_words_per_sentence(self) -> float:
        if self.sentence_count == 0:
            return 0.0
        return self.word_count / self.sentence_count

    @property
    def readability(self) -> float:
        return flesch_reading_ease(self.text)

    def summary(self) -> dict:
        """Return all metrics as a dictionary."""
        return {
            "words": self.word_count,
            "sentences": self.sentence_count,
            "avg_words_per_sentence": round(self.avg_words_per_sentence, 1),
            "flesch_reading_ease": round(self.readability, 1),
        }
```
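
The `None`-sentinel properties above cache each metric by hand. As an alternative sketch (not the chapter's implementation), `functools.cached_property` from the standard library gives the same compute-once behaviour with less code. `CachedAnalyzer` and its `computations` counter are hypothetical names used only to make the caching visible.

```python
from functools import cached_property


class CachedAnalyzer:
    """Hypothetical variant of TextAnalyzer using functools.cached_property."""

    def __init__(self, text: str):
        self.text = text
        self.computations = 0  # counts how often the metric is really computed

    @cached_property
    def word_count(self) -> int:
        self.computations += 1
        return len(self.text.split())


analyzer = CachedAnalyzer("Hello world. How are you?")
first = analyzer.word_count
second = analyzer.word_count  # served from the cache, no recomputation
print(first, second, analyzer.computations)
```

Either approach works. The explicit version makes the cache obvious to readers of the notebook, while `cached_property` removes the boilerplate.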

## 7. Adding an Interactive Widget

End with something users can interact with, demonstrating the notebook as an application:

```python
# Interactive demo (not exported - for notebook/docs only)
import ipywidgets as widgets
from IPython.display import display

def create_analyzer_widget():
    """Create an interactive text analyzer."""

    text_input = widgets.Textarea(
        value='Enter your text here...',
        placeholder='Paste text to analyze',
        description='Text:',
        layout=widgets.Layout(width='100%', height='150px')
    )

    output = widgets.Output()

    def analyze(change):
        output.clear_output()
        with output:
            if text_input.value.strip():
                analyzer = TextAnalyzer(text_input.value)
                results = analyzer.summary()
                print("📊 Analysis Results")
                print("-" * 30)
                for key, value in results.items():
                    print(f"{key.replace('_', ' ').title()}: {value}")

    text_input.observe(analyze, names='value')

    display(widgets.VBox([
        widgets.HTML("<h3>📝 Text Analyzer</h3>"),
        text_input,
        output
    ]))

# Show the widget
create_analyzer_widget()
```

When viewed in Colab or Binder, users can interact with your library without installing anything.

## 8. Generating the Package

### Export to Python Modules

```bash
nbdev_export
```

This generates `textkit/core.py` from your notebook's `#| export` cells.
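
Conceptually, the export step selects the marked cells and writes their source to a module. The following is a toy illustration of that idea only: it is not nbdev's implementation (the real exporter also handles other directives, `__all__`, and more), and `collect_export_cells` and the demo notebook are made up for this sketch.

```python
import json


def collect_export_cells(notebook_json: str) -> str:
    """Toy illustration: join the source of code cells that start with '#| export'."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        lines = "".join(cell["source"]).splitlines()
        if lines and lines[0].strip() == "#| export":
            # Drop the directive line, keep the code below it
            chunks.append("\n".join(lines[1:]))
    return "\n\n".join(chunks) + "\n"


# A minimal stand-in for the JSON structure of an .ipynb file
demo_nb = json.dumps({
    "cells": [
        {"cell_type": "code", "source": ["#| default_exp core"]},
        {"cell_type": "code", "source": [
            "#| export\n",
            "def word_count(text):\n",
            "    return len(text.split())",
        ]},
        {"cell_type": "code", "source": ["print(word_count('exploration only'))"]},
        {"cell_type": "markdown", "source": ["Notes for readers"]},
    ]
})

module_source = collect_export_cells(demo_nb)
print(module_source)
```

Only the exported function reaches the module text. The exploration cell and the markdown stay in the notebook, where they feed the documentation instead.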

### Verify Everything Works

```bash
# Run tests
nbdev_test

# Strip notebook metadata, then run export, tests, clean, and README generation in one step
nbdev_clean
nbdev_prepare
```

### The Generated Code

Look at `textkit/core.py`: it contains clean Python code generated from your notebooks, with proper imports and structure.

## 9. Documentation

### The Index Notebook

`nbs/index.ipynb` becomes both your README.md and documentation homepage. Include:

1. Installation instructions
2. Quick start example
3. Feature overview

````markdown
<!-- In nbs/index.ipynb -->

# TextKit

> Simple text analysis for Python

## Installation

```bash
pip install textkit
```

## Quick Start

```python
from textkit.core import TextAnalyzer

text = "Your text here. Analyze it easily."
analyzer = TextAnalyzer(text)
print(analyzer.summary())
```
````

### Build Documentation

```bash
nbdev_docs
```

This generates a Quarto-based documentation site in `_docs/`.

## 10. Publishing to PyPI

### Prepare for Release

```bash
# Clean and prepare
nbdev_prepare

# Build distribution (requires: pip install build twine)
python -m build
```

### Publish

```bash
# Test PyPI first
twine upload --repository testpypi dist/*

# Then real PyPI
twine upload dist/*
```

### The Result

```bash
pip install textkit
```

You've shipped a Python package, developed entirely in notebooks.

## 11. Sharing the Notebook Itself

Beyond the package, share the development notebook:

### Colab Badge

```markdown
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/username/textkit/blob/main/nbs/00_core.ipynb)
```

### Binder Badge

```markdown
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/username/textkit/main)
```

Users can:

1. **Install the package** via pip (traditional)
2. **Explore the notebook** to understand the code (educational)
3. **Run interactively** in Colab/Binder (zero-install)
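
If you maintain several notebooks, the badge markdown can be generated instead of hand-edited. This is a small sketch that reuses the URL patterns shown above; `notebook_badges` and its parameters are hypothetical names, and the default branch `main` is an assumption.

```python
def notebook_badges(user: str, repo: str, notebook: str, branch: str = "main") -> dict[str, str]:
    """Build Colab and Binder badge markdown for a notebook in a GitHub repo."""
    colab = (
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)]"
        f"(https://colab.research.google.com/github/{user}/{repo}/blob/{branch}/{notebook})"
    )
    binder = (
        "[![Binder](https://mybinder.org/badge_logo.svg)]"
        f"(https://mybinder.org/v2/gh/{user}/{repo}/{branch})"
    )
    return {"colab": colab, "binder": binder}


badges = notebook_badges("username", "textkit", "nbs/00_core.ipynb")
print(badges["colab"])
print(badges["binder"])
```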

## Comparing Workflows

Here's how this case study compares to the SimpleBot approach (@sec-case-study):

| Aspect | SimpleBot (Scripts) | TextKit (nbdev) |
|--------|---------------------|-----------------|
| Source files | `.py` in `src/` | `.ipynb` in `nbs/` |
| Tests | Separate `tests/` directory | Inline with code |
| Documentation | Separate `docs/` | Generated from notebooks |
| Exploration | Separate REPL/scratch files | Integrated in notebooks |
| Output | Package on PyPI | Package on PyPI |
| Best for | Traditional dev, teams | Exploratory, teaching |

Both workflows produce the same result: a published package. Choose based on how you like to work.

## When to Use This Workflow

The nbdev approach works best when:

- **Exploration is central**: You're figuring things out as you build
- **Teaching matters**: Others will learn from your notebooks
- **Docs should show execution**: You want live examples in documentation
- **Solo or small team**: Git conflicts in notebooks are real

Consider traditional scripts when:

- **Large teams**: Notebook diffs are harder to review
- **Complex architecture**: Many interconnected modules
- **Heavy IDE reliance**: Refactoring tools work better with `.py` files
- **Existing codebase**: Converting to nbdev is non-trivial

## Summary

- **nbdev inverts the workflow**: Notebooks are source, `.py` files are generated
- **Tests live with code**: Doctests and `#| test` cells eliminate context switching
- **Exploration becomes documentation**: Your investigative work helps users
- **Same destination**: Published package, installable via pip
- **Different journey**: Iterative, visual, integrated

## Exercises

1. **Extend TextKit**: Add a `sentiment_words()` function that counts positive/negative words from a simple word list. Include doctests.

2. **Add a notebook**: Create `01_advanced.ipynb` with functions for text comparison (e.g., similarity between two texts).

3. **Publish to TestPyPI**: Go through the full publication workflow to TestPyPI.

4. **Create a Voilà dashboard**: Convert the interactive widget section into a standalone Voilà dashboard.

5. **Compare workflows**: Take one function from TextKit and rewrite it in the traditional script workflow. Reflect on the differences.