# Chapter 31: Difflib and Textwrap

This notebook covers two standard-library modules for working with text. `difflib` compares sequences and finds similarities, while `textwrap` provides tools for wrapping, filling, dedenting, and shortening text.

## Key Concepts
- **`difflib.SequenceMatcher`**: Compute similarity ratios between strings
- **`difflib.get_close_matches()`**: Find fuzzy matches from a list of candidates
- **`difflib.unified_diff()` / `ndiff()`**: Generate human-readable diffs
- **`textwrap.wrap()` / `fill()`**: Break long text into lines of a given width
- **`textwrap.dedent()`**: Remove common leading whitespace
- **`textwrap.shorten()`**: Truncate text with a placeholder
- **`textwrap.indent()`**: Add a prefix to selected lines

## Section 1: Sequence Matching with `SequenceMatcher`

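`difflib.SequenceMatcher` compares two sequences and computes a similarity ratio between 0.0 (nothing in common) and 1.0 (identical). It works on any sequences whose elements are hashable, but is most commonly used with strings. The first argument is an optional "junk" predicate; passing `None` treats no element as junk.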

In [None]:
import difflib

# Compare two similar strings
matcher: difflib.SequenceMatcher = difflib.SequenceMatcher(None, "hello", "hallo")
ratio: float = matcher.ratio()
print(f"'hello' vs 'hallo': ratio = {ratio:.3f}")

# Compare identical strings
matcher = difflib.SequenceMatcher(None, "python", "python")
print(f"'python' vs 'python': ratio = {matcher.ratio():.3f}")

# Compare very different strings
matcher = difflib.SequenceMatcher(None, "hello", "world")
print(f"'hello' vs 'world': ratio = {matcher.ratio():.3f}")

In [None]:
import difflib

# get_matching_blocks shows where the sequences agree
a: str = "abcdef"
b: str = "abcxef"
matcher: difflib.SequenceMatcher = difflib.SequenceMatcher(None, a, b)

print(f"Ratio: {matcher.ratio():.3f}")
print("\nMatching blocks (a, b, size):")
# The final block is always a zero-size sentinel: Match(a=len(a), b=len(b), size=0)
for block in matcher.get_matching_blocks():
    print(f"  a[{block.a}:{block.a + block.size}] == b[{block.b}:{block.b + block.size}]"
          f" -> '{a[block.a:block.a + block.size]}'")
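
The ratio printed above can be reproduced by hand. As a minimal sketch, assuming the documented formula `2.0 * M / T` (where `M` is the number of matched elements and `T` is the combined length of both sequences), `M` can be summed from `get_matching_blocks()`. The names `matched`, `total`, and `manual_ratio` are illustrative only.

```python
import difflib
import math

a: str = "abcdef"
b: str = "abcxef"
matcher: difflib.SequenceMatcher = difflib.SequenceMatcher(None, a, b)

# M: total size of all matching blocks (the zero-size sentinel adds nothing)
matched: int = sum(block.size for block in matcher.get_matching_blocks())
# T: combined length of both sequences
total: int = len(a) + len(b)
manual_ratio: float = 2.0 * matched / total

print(f"M = {matched}, T = {total}, 2*M/T = {manual_ratio:.3f}")
assert math.isclose(manual_ratio, matcher.ratio())
```

When many candidates have to be screened, `quick_ratio()` and `real_quick_ratio()` return upper bounds on `ratio()` that are cheaper to compute.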

In [None]:
import difflib

# get_opcodes describes the edits needed to transform a into b
a: str = "hello world"
b: str = "hello python"
matcher: difflib.SequenceMatcher = difflib.SequenceMatcher(None, a, b)

print(f"Transform '{a}' -> '{b}'")
print()
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    print(f"  {tag:>7}  a[{i1}:{i2}] '{a[i1:i2]}'  ->  b[{j1}:{j2}] '{b[j1:j2]}'")
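
One way to check an understanding of the opcodes is to replay them. The sketch below uses a hypothetical helper, `apply_opcodes`, which is not part of `difflib`; it rebuilds `b` from `a` by following each `(tag, i1, i2, j1, j2)` tuple in order.

```python
import difflib


def apply_opcodes(a: str, b: str) -> str:
    """Rebuild b by replaying the opcodes that turn a into b (illustrative helper)."""
    matcher: difflib.SequenceMatcher = difflib.SequenceMatcher(None, a, b)
    pieces: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])  # unchanged span: copy it from a
        elif tag in ("replace", "insert"):
            pieces.append(b[j1:j2])  # new material always comes from b
        # "delete": a[i1:i2] is dropped, so nothing is appended
    return "".join(pieces)


rebuilt: str = apply_opcodes("hello world", "hello python")
print(rebuilt)
assert rebuilt == "hello python"
```

The same loop structure is a reasonable starting point for custom diff renderers, since the opcodes cover both sequences from start to end without gaps.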

## Section 2: Finding Fuzzy Matches

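`difflib.get_close_matches()` returns the best matches for a word from a list of candidates, most similar first. It uses `SequenceMatcher` internally, returns at most `n` results (default 3), and ignores candidates whose ratio falls below `cutoff` (default 0.6). Matching is case-sensitive.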

In [None]:
import difflib

# Find close matches for a misspelled word
words: list[str] = ["apple", "application", "apply", "banana", "appetite"]

matches: list[str] = difflib.get_close_matches("appli", words)
print(f"Close matches for 'appli': {matches}")

# Allow more results (n defaults to 3) and lower the cutoff (default is 0.6)
matches = difflib.get_close_matches("appli", words, n=5, cutoff=0.4)
print(f"With cutoff=0.4: {matches}")

# No matches above the threshold
matches = difflib.get_close_matches("zebra", words)
print(f"Close matches for 'zebra': {matches}")

In [None]:
import difflib

# Practical example: "Did you mean?" for command-line tools
valid_commands: list[str] = ["status", "commit", "push", "pull", "branch", "checkout", "merge"]

def suggest_command(user_input: str) -> str:
    """Suggest a valid command if the input is not recognized."""
    if user_input in valid_commands:
        return f"Running: {user_input}"
    suggestions: list[str] = difflib.get_close_matches(user_input, valid_commands, n=3)
    if suggestions:
        return f"Unknown command '{user_input}'. Did you mean: {', '.join(suggestions)}?"
    return f"Unknown command '{user_input}'. No suggestions found."

print(suggest_command("comit"))
print(suggest_command("statis"))
print(suggest_command("checkou"))
print(suggest_command("xyzzy"))
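
To see why a candidate is kept or dropped, it can help to look at the scores directly. The sketch below assumes that `get_close_matches()` scores each candidate with `SequenceMatcher(None, candidate, word).ratio()`, discards scores below `cutoff`, and returns the top `n` in descending order, which is how the CPython implementation appears to work. The helper `ranked_matches` is a hypothetical name and not part of `difflib`.

```python
import difflib


def ranked_matches(word: str, candidates: list[str], n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Illustrative re-implementation: score every candidate, then filter and rank."""
    scored: list[tuple[float, str]] = sorted(
        (
            (difflib.SequenceMatcher(None, candidate, word).ratio(), candidate)
            for candidate in candidates
        ),
        reverse=True,
    )
    for score, candidate in scored:
        status: str = "kept" if score >= cutoff else "dropped"
        print(f"  {candidate:<12} {score:.3f}  {status}")
    return [candidate for score, candidate in scored if score >= cutoff][:n]


words: list[str] = ["apple", "application", "apply", "banana", "appetite"]
result: list[str] = ranked_matches("appli", words)
assert result == difflib.get_close_matches("appli", words)
```

Printing the scores this way is a quick method for choosing a `cutoff` that suits a particular vocabulary.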

## Section 3: Generating Diffs

`difflib` provides several functions that produce human-readable diffs between two lists of lines, similar to the Unix `diff` command. Each returns a generator of output lines.

In [None]:
import difflib

# ndiff produces a line-by-line delta; '?' lines point out intraline (character-level) changes
text_a: list[str] = [
    "line one\n",
    "line two\n",
    "line three\n",
]
text_b: list[str] = [
    "line one\n",
    "line 2\n",
    "line three\n",
    "line four\n",
]

print("ndiff output:")
diff: list[str] = list(difflib.ndiff(text_a, text_b))
for line in diff:
    print(f"  {line}", end="")
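
Each line of an `ndiff` delta starts with a two-character code: `'- '` (only in the first sequence), `'+ '` (only in the second), `'  '` (common to both), or `'? '` (a hint line that appears in neither input). Because the delta keeps every line from both inputs, `difflib.restore()` can recover either one. The variable names below are illustrative.

```python
import difflib

old: list[str] = ["line one\n", "line two\n", "line three\n"]
new: list[str] = ["line one\n", "line 2\n", "line three\n", "line four\n"]

delta: list[str] = list(difflib.ndiff(old, new))

# Every delta line begins with one of the four two-character codes
codes: set[str] = {line[:2] for line in delta}
print(f"Codes used: {sorted(codes)}")

# restore(delta, 1) rebuilds the first sequence, restore(delta, 2) the second
assert list(difflib.restore(delta, 1)) == old
assert list(difflib.restore(delta, 2)) == new
```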

In [None]:
import difflib

# unified_diff produces output similar to `diff -u`
text_a: list[str] = [
    "def greet(name):\n",
    "    print('Hello ' + name)\n",
    "    return None\n",
]
text_b: list[str] = [
    "def greet(name: str) -> None:\n",
    "    print(f'Hello {name}')\n",
]

print("unified_diff output:")
diff = difflib.unified_diff(
    text_a, text_b,
    fromfile="before.py",
    tofile="after.py",
)
for line in diff:
    print(line, end="")

In [None]:
import difflib

# context_diff produces output similar to `diff -c`
text_a: list[str] = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
text_b: list[str] = ["alpha\n", "BETA\n", "gamma\n", "epsilon\n"]

print("context_diff output:")
diff = difflib.context_diff(
    text_a, text_b,
    fromfile="original.txt",
    tofile="modified.txt",
)
for line in diff:
    print(line, end="")
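
The examples above keep the trailing newline on every input line. When the lines come from `str.splitlines()` and carry no newline, the `lineterm` parameter of `unified_diff()` can be set to `""` so that the header lines are newline-free as well. A minimal sketch, with made-up file names used only as labels:

```python
import difflib

before: list[str] = "alpha\nbeta\ngamma\n".splitlines()
after: list[str] = "alpha\nBETA\ngamma\n".splitlines()

diff_lines: list[str] = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="before.txt",  # hypothetical names, shown only in the header
        tofile="after.txt",
        lineterm="",            # the inputs have no trailing newlines
    )
)
print("\n".join(diff_lines))
```

Joining with `"\n"` at the end gives a single string that is convenient for logging or for test failure messages.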

## Section 4: Text Wrapping with `textwrap`

`textwrap.wrap()` breaks a long string into a list of lines, each at most `width` characters long (default 70). `textwrap.fill()` does the same but returns a single string with the lines joined by newlines.

In [None]:
import textwrap

text: str = (
    "This is a long sentence that should be wrapped at a certain width. "
    "The textwrap module makes it easy to format text for display in "
    "terminals, emails, or any fixed-width context."
)

# wrap returns a list of lines
lines: list[str] = textwrap.wrap(text, width=40)
print("wrap(width=40):")
for i, line in enumerate(lines):
    print(f"  [{i}] '{line}'")
    assert len(line) <= 40

print(f"\nAll lines <= 40 chars: {all(len(line) <= 40 for line in lines)}")

In [None]:
import textwrap

text: str = (
    "This is a long sentence that should be wrapped at a certain width. "
    "The textwrap module makes it easy to format text for display."
)

# fill returns a single string with newlines
filled: str = textwrap.fill(text, width=50)
print("fill(width=50):")
print(filled)

# With indentation
print("\nfill with initial_indent and subsequent_indent:")
formatted: str = textwrap.fill(
    text,
    width=50,
    initial_indent="  * ",
    subsequent_indent="    ",
)
print(formatted)
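
Two details are worth checking in code. First, `fill()` is shorthand for joining the result of `wrap()` with newlines. Second, the width limit is only guaranteed while `break_long_words` keeps its default of `True`; with `break_long_words=False`, a word longer than `width` is kept whole on its own line. The sample text and URL below are made up for illustration.

```python
import textwrap

text: str = "The quick brown fox jumps over the lazy dog near the riverbank."

# fill() gives the same result as joining wrap() with newlines
assert textwrap.fill(text, width=20) == "\n".join(textwrap.wrap(text, width=20))

# A single "word" longer than the width (no spaces or hyphens to break on)
with_url: str = "See https://example.com/a/very/long/path/that/cannot/fit for details"

broken: list[str] = textwrap.wrap(with_url, width=20)
unbroken: list[str] = textwrap.wrap(with_url, width=20, break_long_words=False)

print("break_long_words=True: ", broken)
print("break_long_words=False:", unbroken)

assert all(len(line) <= 20 for line in broken)   # the long word is split to fit
assert any(len(line) > 20 for line in unbroken)  # the long word overflows the width
```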

## Section 5: Shortening Text

`textwrap.shorten()` collapses runs of whitespace into single spaces and truncates text at a word boundary to fit within a given width, appending a placeholder (default `' [...]'`, including the leading space) when truncation occurs.

In [None]:
import textwrap

text: str = "Hello World, this is a long string"

# Shorten to 20 characters
short: str = textwrap.shorten(text, width=20)
print(f"Original ({len(text)} chars): {text}")
print(f"Shortened ({len(short)} chars): {short}")
assert len(short) <= 20
assert short.endswith("[...]")

# Custom placeholder
short_custom: str = textwrap.shorten(text, width=25, placeholder="...")
print(f"Custom placeholder: {short_custom}")

# When text already fits, no truncation
no_change: str = textwrap.shorten("short", width=20)
print(f"No truncation needed: {no_change}")
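
Three behaviors of `shorten()` are easy to miss: whitespace is collapsed even when nothing is truncated, the default placeholder is `' [...]'` with a leading space, and a `ValueError` is raised when `width` is too small to hold the placeholder. The sketch below exercises each one with an arbitrary sample string.

```python
import textwrap

messy: str = "  Hello    World,\n this   is a long string  "

# Whitespace is collapsed even when the text already fits
collapsed: str = textwrap.shorten(messy, width=80)
assert collapsed == "Hello World, this is a long string"

# The default placeholder brings its own leading space
short: str = textwrap.shorten(messy, width=20)
assert short == "Hello World, [...]"

# A width smaller than the stripped placeholder "[...]" cannot be satisfied
try:
    textwrap.shorten(messy, width=4)
except ValueError as exc:
    print(f"ValueError: {exc}")
```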

## Section 6: Dedenting Text

`textwrap.dedent()` removes the leading whitespace common to all lines. Lines that contain only whitespace are ignored when the common margin is computed. This is useful for triple-quoted strings that are indented to match the surrounding code.

In [None]:
import textwrap

# A triple-quoted string indented to match surrounding code
indented: str = """\
        Hello,
        This is an indented block.
        Each line has extra whitespace.
    """

print("Before dedent:")
print(indented)

dedented: str = textwrap.dedent(indented)
print("After dedent:")
print(dedented)

In [None]:
import textwrap

# Common pattern: dedent then fill for clean multiline strings
def get_help_text() -> str:
    """Return formatted help text."""
    raw: str = """\
        This is the help text for the application. It describes
        what the program does and how to use it. The text is written
        indented in the source code for readability, but dedented
        before display.
    """
    return textwrap.fill(textwrap.dedent(raw), width=50)

print(get_help_text())
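
`dedent()` removes only the margin shared by every non-blank line, so relative indentation inside the block survives. Tabs and spaces are never treated as equivalent, which means a block that mixes them may have no common margin at all. A short sketch with made-up sample strings:

```python
import textwrap

block: str = """
    def greet():
        return "hi"
"""

dedented: str = textwrap.dedent(block)
# Only the shared 4-space margin is removed; the nested line keeps its extra indent
assert dedented == '\ndef greet():\n    return "hi"\n'
print(dedented.strip())

# One line starts with a tab, the other with spaces: no common margin, nothing changes
mixed: str = "\tone\n    two\n"
assert textwrap.dedent(mixed) == mixed
```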

## Section 7: Indenting Text

`textwrap.indent()` adds a prefix to the beginning of selected lines. By default, it adds the prefix only to lines that contain at least one non-whitespace character, so empty and whitespace-only lines are left alone.

In [None]:
import textwrap

text: str = "line one\nline two\n\nline four"

# Add prefix to every line that is not blank (the default predicate)
indented: str = textwrap.indent(text, prefix="    ")
print("Indented by 4 spaces (default predicate):")
print(indented)

# Add a custom prefix (e.g., line comments)
commented: str = textwrap.indent(text, prefix="# ")
print("\nCommented:")
print(commented)

# Use predicate to control which lines get the prefix
selective: str = textwrap.indent(
    text,
    prefix="> ",
    predicate=lambda line: "two" in line,
)
print("\nSelective (only lines containing 'two'):")
print(selective)
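
To prefix every line, including blank ones, the `predicate` can simply return `True`. This is handy for quoting text in an email-style reply, as in the sketch below (the sample text is arbitrary).

```python
import textwrap

text: str = "line one\n\nline three\n"

# Default predicate: whitespace-only lines are left untouched
default_quoted: str = textwrap.indent(text, prefix="> ")
assert default_quoted == "> line one\n\n> line three\n"

# Always-true predicate: every line gets the prefix, even the empty one
fully_quoted: str = textwrap.indent(text, prefix="> ", predicate=lambda line: True)
assert fully_quoted == "> line one\n> \n> line three\n"

print(fully_quoted, end="")
```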

## Section 8: The `TextWrapper` Class

For repeated wrapping with the same settings, use a `TextWrapper` object. This avoids passing the same parameters each time.

In [None]:
import textwrap

# Create a reusable wrapper
wrapper: textwrap.TextWrapper = textwrap.TextWrapper(
    width=45,
    initial_indent="  ",
    subsequent_indent="  ",
    break_long_words=False,
    break_on_hyphens=False,
)

paragraphs: list[str] = [
    "The textwrap module provides convenience functions and a TextWrapper class for formatting text.",
    "TextWrapper instances are reusable, so you can configure wrapping once and apply it to multiple paragraphs.",
]

for i, para in enumerate(paragraphs):
    print(f"Paragraph {i + 1}:")
    print(wrapper.fill(para))
    print()
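
A `TextWrapper` can also be reconfigured between uses by assigning to its attributes. The sketch below changes `width`, then sets `max_lines` and `placeholder` to produce a one-line preview; the sample text and the variable names are illustrative.

```python
import textwrap

text: str = (
    "TextWrapper instances are reusable, so you can configure wrapping "
    "once and apply it to multiple paragraphs."
)

wrapper: textwrap.TextWrapper = textwrap.TextWrapper(width=30)
narrow: list[str] = wrapper.wrap(text)

# Options are plain attributes and can be reassigned between calls
wrapper.width = 60
wide: list[str] = wrapper.wrap(text)
assert len(wide) < len(narrow)

# max_lines truncates the output and appends the placeholder to the last line
wrapper.max_lines = 1
wrapper.placeholder = " ..."
preview: str = wrapper.fill(text)
print(preview)
assert len(preview) <= 60 and preview.endswith(" ...")
```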

## Summary

### `difflib` -- Sequence Comparison
- **`SequenceMatcher(None, a, b)`**: Compare two sequences
  - `.ratio()` returns similarity from 0.0 to 1.0
  - `.get_matching_blocks()` shows where sequences agree
  - `.get_opcodes()` describes the edits needed
- **`get_close_matches(word, possibilities, n, cutoff)`**: Find fuzzy matches
- **`unified_diff(a, b)`**: Produce a unified diff (like `diff -u`)
- **`ndiff(a, b)`**: Line-by-line delta with `?` hint lines marking intraline differences
- **`context_diff(a, b)`**: Context diff (like `diff -c`)

### `textwrap` -- Text Formatting
- **`wrap(text, width)`**: Break text into a list of lines
- **`fill(text, width)`**: Wrap and return as a single string with newlines
- **`shorten(text, width, placeholder)`**: Collapse whitespace and truncate with a placeholder (default `' [...]'`)
- **`dedent(text)`**: Remove common leading whitespace from all lines
- **`indent(text, prefix)`**: Add prefix to selected lines
- **`TextWrapper(...)`**: Reusable wrapper object with configurable settings

### Common Patterns
- **"Did you mean?"**: Use `get_close_matches()` to suggest corrections for typos
- **Dedent + fill**: Clean up indented triple-quoted strings for display
- **Diff reports**: Use `unified_diff()` to show file changes