# Chapter 31: String Methods

This notebook covers the built-in `str` methods for splitting, joining, stripping, replacing, and transforming text. It also introduces the `string` module, which provides character-set constants and the `Template` class for safe substitution.

## Key Concepts
- **Splitting**: `split()`, `rsplit()`, `splitlines()`, `partition()`
- **Joining**: `str.join()` to reassemble sequences of strings
- **Stripping**: `strip()`, `lstrip()`, `rstrip()` for whitespace removal
- **Replacing**: `replace()` for substring substitution
- **Character mapping**: `maketrans()` and `translate()` for bulk character replacement
- **Testing**: `startswith()`, `endswith()`, `isdigit()`, `isalpha()`, and friends
- **`string` module**: `ascii_lowercase`, `digits`, `punctuation`, `Template`

## Section 1: Splitting Strings

`split()` breaks a string into a list of substrings. With no arguments it splits on runs of whitespace and never returns empty strings. With an explicit separator, adjacent separators produce empty strings in the result. You can also cap the number of splits with `maxsplit`.
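The difference between the default behavior and an explicit separator is easiest to see side by side. The sketch below uses a made-up sample string (`line`) with irregular spacing; the variable names are illustrative only.

```python
# Hypothetical sample with leading, trailing, and repeated spaces
line = "  alpha   beta  gamma "

# No argument: runs of whitespace collapse, edge whitespace is ignored
default_parts = line.split()

# Explicit " ": every single space is a boundary, so empty strings appear
explicit_parts = line.split(" ")

print(default_parts)   # ['alpha', 'beta', 'gamma']
print(explicit_parts)  # ['', '', 'alpha', '', '', 'beta', '', 'gamma', '']

# The same asymmetry applies to an empty string
print("".split())      # []
print("".split(","))   # ['']
```

When parsing delimited data such as CSV fields, the explicit-separator form is usually what you want, because an empty field is meaningful there.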

In [None]:
# Default split on whitespace
text: str = "hello world python"
words: list[str] = text.split()
print(f"Words: {words}")

# Split on a custom separator
csv_line: str = "Alice,30,Engineer"
fields: list[str] = csv_line.split(",")
print(f"Fields: {fields}")

# Limit the number of splits
path: str = "usr/local/bin/python"
parts: list[str] = path.split("/", maxsplit=2)
print(f"Parts (maxsplit=2): {parts}")

In [None]:
# rsplit splits from the right
path: str = "usr/local/bin/python"
parts: list[str] = path.rsplit("/", maxsplit=1)
print(f"rsplit (maxsplit=1): {parts}")

# splitlines splits on line boundaries
multiline: str = "line one\nline two\nline three"
lines: list[str] = multiline.splitlines()
print(f"Lines: {lines}")

## Section 2: Joining Strings

`str.join()` is the counterpart of `split()`. It is called on the separator string and concatenates an iterable of strings, placing the separator between each pair of elements. Every element must already be a `str`.
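Two details are worth a quick sketch: `join()` rejects non-string elements, and a `split()`/`join()` round trip only restores the original when the separator is fixed. The data below (`values`, `spaced`) is hypothetical.

```python
# Hypothetical mixed-type data
values = [3, 1.5, True]

error_message = ""
try:
    ", ".join(values)  # join() does not convert elements for you
except TypeError as exc:
    error_message = str(exc)

# Convert each element explicitly before joining
joined = ", ".join(str(v) for v in values)
print(joined)  # 3, 1.5, True

# Whitespace split() collapses runs, so the round trip is lossy here
spaced = "a  b"
round_trip = " ".join(spaced.split())
print(round_trip == spaced)  # False
```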

In [None]:
# Join words with different separators
words: list[str] = ["hello", "world", "python"]

print(f"Space join:     {' '.join(words)}")
print(f"Dash join:      {'-'.join(words)}")
print(f"Empty join:     {''.join(words)}")
print(f"Newline join:\n{chr(10).join(words)}")

In [None]:
# Split and join are inverses
original: str = "hello-world-python"
parts: list[str] = original.split("-")
reassembled: str = "-".join(parts)

print(f"Original:    {original}")
print(f"Split:       {parts}")
print(f"Reassembled: {reassembled}")
print(f"Equal:       {original == reassembled}")

## Section 3: Stripping Whitespace

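`strip()`, `lstrip()`, and `rstrip()` remove characters from both ends, the left end, or the right end of a string, respectively. By default they remove whitespace. If you pass a string argument, it is treated as a set of characters to strip, not as a prefix or suffix to match.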
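Because the argument is a character set, `strip()` can remove more than intended when it is used to drop a prefix or suffix. The sketch below contrasts it with `removeprefix()` and `removesuffix()`, which assume Python 3.9 or later; the filename is a made-up example.

```python
# Hypothetical filename
filename = "test_results_set.txt"

# strip("test_") removes ANY of the characters t, e, s, _ from both ends,
# so it also eats the final "t" of ".txt"
stripped = filename.strip("test_")
print(stripped)  # results_set.tx

# removeprefix / removesuffix match the exact substring (Python 3.9+)
no_prefix = filename.removeprefix("test_")
no_suffix = filename.removesuffix(".txt")
print(no_prefix)  # results_set.txt
print(no_suffix)  # test_results_set
```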

In [None]:
# Strip whitespace from both edges
s: str = "  hello  "

print(f"Original:  '{s}'")
print(f"strip():   '{s.strip()}'")
print(f"lstrip():  '{s.lstrip()}'")
print(f"rstrip():  '{s.rstrip()}'")

In [None]:
# Strip specific characters (removes any combination of those characters)
url: str = "https://example.com///"
print(f"Strip slashes: '{url.rstrip('/')}'")

tag: str = "<title>"
print(f"Strip angle brackets: '{tag.strip('<>')}'")

# Common pattern: cleaning user input
user_input: str = "  alice@example.com  \n"
clean: str = user_input.strip()
print(f"Clean email: '{clean}'")

## Section 4: Replacing and Removing Substrings

`replace()` substitutes all (or a limited number of) occurrences of a substring. To remove a substring, replace it with an empty string.
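Since strings are immutable, `replace()` always returns a new string and leaves the original untouched. A minimal sketch with an illustrative date string:

```python
# Hypothetical date string
original = "2024-01-15"

slashed = original.replace("-", "/")        # all occurrences
first_only = original.replace("-", "/", 1)  # at most one replacement
unchanged = original.replace("_", "/")      # substring absent: same text back

print(slashed)     # 2024/01/15
print(first_only)  # 2024/01-15
print(original)    # 2024-01-15 (the original is never modified)
```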

In [None]:
# Basic replacement
text: str = "hello world"
print(f"Replace 'world' with 'python': {text.replace('world', 'python')}")

# Limit the number of replacements
repeated: str = "aaa"
print(f"Replace 2 of 3 a's: {repeated.replace('a', 'b', 2)}")

# Remove a substring by replacing with empty string
messy: str = "H e l l o"
print(f"Remove spaces: {messy.replace(' ', '')}")

## Section 5: Partition

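`partition()` splits a string on the first occurrence of a separator, returning a 3-tuple: `(before, separator, after)`. If the separator is not found, it returns the whole string followed by two empty strings. This is useful when you need to split on the first delimiter only and always want the same number of parts back.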
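The fixed-size return value makes `partition()` convenient for parsing `key=value` lines without a length check. The helper below, `parse_setting`, is a hypothetical name used only for illustration.

```python
from typing import Optional


def parse_setting(line: str) -> tuple[str, Optional[str]]:
    """Split 'key=value' on the first '='; value is None if '=' is absent."""
    key, sep, value = line.partition("=")
    if not sep:  # separator missing: partition returned (line, '', '')
        return key.strip(), None
    return key.strip(), value.strip()


print(parse_setting("debug = true"))  # ('debug', 'true')
print(parse_setting("url=a=b"))       # ('url', 'a=b')
print(parse_setting("verbose"))       # ('verbose', None)

# rpartition puts the whole string LAST when the separator is missing
print("verbose".rpartition("="))      # ('', '', 'verbose')
```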

In [None]:
# Partition splits on first occurrence
config: str = "key=value=extra"
before, sep, after = config.partition("=")

print(f"Before:    '{before}'")
print(f"Separator: '{sep}'")
print(f"After:     '{after}'")

# rpartition splits on the last occurrence
path: str = "/home/user/documents/file.txt"
directory, slash, filename = path.rpartition("/")
print(f"\nDirectory: '{directory}'")
print(f"Filename:  '{filename}'")

## Section 6: Character Mapping with `translate` and `maketrans`

`str.maketrans()` builds a translation table and `translate()` applies it. This is efficient for replacing or deleting many individual characters in a single pass.
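`str.maketrans()` also accepts a single dictionary, which allows a character to map to a longer string or to `None` for deletion. The sketch below uses a small, hand-picked mapping for illustration; for real HTML escaping the standard library's `html.escape()` is the better tool.

```python
# One-argument form: keys are single characters, values are
# replacement strings (any length) or None to delete the character
table = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\r": None,  # drop carriage returns
})

escaped = "a < b && c > d\r".translate(table)
print(escaped)  # a &lt; b &amp;&amp; c &gt; d
```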

In [None]:
# Replace vowels with digits
table: dict[int, int] = str.maketrans("aeiou", "12345")
result: str = "hello".translate(table)
print(f"'hello' with vowel mapping: '{result}'")

# The third argument to maketrans specifies characters to delete
delete_vowels: dict[int, int | None] = str.maketrans("", "", "aeiou")
result = "hello world".translate(delete_vowels)
print(f"'hello world' without vowels: '{result}'")

In [None]:
# Practical example: sanitize a filename
def sanitize_filename(name: str) -> str:
    """Remove characters that are unsafe in filenames."""
    unsafe: str = '<>:"/\\|?*'
    table: dict[int, int | None] = str.maketrans("", "", unsafe)
    return name.translate(table).strip()

raw: str = 'My File: "Report" (2025).txt'
safe: str = sanitize_filename(raw)
print(f"Raw:  {raw}")
print(f"Safe: {safe}")

## Section 7: Testing String Content

Python provides many `is*()` methods for checking the content of strings, plus `startswith()` and `endswith()` for prefix/suffix checks.
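One caveat: `isdigit()` describes the characters in a string, not whether `int()` can parse it. The sketch below shows both kinds of mismatch; `parses_as_int` is a hypothetical helper, not a built-in.

```python
def parses_as_int(text: str) -> bool:
    """Return True if int(text) succeeds."""
    try:
        int(text)
    except ValueError:
        return False
    return True


# A sign is not a digit, so isdigit() rejects a valid integer literal
print("-5".isdigit(), parses_as_int("-5"))          # False True

# Superscript two counts as a digit but is not a decimal character
superscript_two = "\u00b2"
print(superscript_two.isdigit())                    # True
print(superscript_two.isdecimal())                  # False
print(parses_as_int(superscript_two))               # False
```

When the goal is to validate numeric input, attempting the conversion is usually more reliable than a character test.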

In [None]:
# Prefix and suffix checks
filename: str = "hello.py"
print(f"Starts with 'hello': {filename.startswith('hello')}")
print(f"Ends with '.py':     {filename.endswith('.py')}")

# Accept a tuple of options
test_file: str = "test.txt"
print(f"Ends with .txt or .csv: {test_file.endswith(('.txt', '.csv'))}")

In [None]:
# Content testing methods
samples: list[str] = ["hello", "123", "Hello World", "hello123", "   ", ""]

print(f"{'Value':<15} {'alpha':>6} {'digit':>6} {'alnum':>6} {'space':>6}")
print("-" * 42)
for s in samples:
    label: str = repr(s)
    if s:  # is* methods return False for empty strings
        print(
            f"{label:<15} "
            f"{str(s.isalpha()):>6} "
            f"{str(s.isdigit()):>6} "
            f"{str(s.isalnum()):>6} "
            f"{str(s.isspace()):>6}"
        )
    else:
        print(f"{label:<15} (empty string -- all return False)")

## Section 8: The `string` Module

The `string` module provides useful constants for character sets and the `Template` class for safe string substitution.
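Two pieces of `Template` syntax are easy to miss: `${name}` delimits a placeholder that is followed directly by other letters, and `$$` produces a literal dollar sign. A minimal sketch, with a made-up template and values:

```python
from string import Template

# ${item} is followed directly by "s"; $$ is an escaped dollar sign
invoice = Template("${item}s cost $$$price each")

text = invoice.substitute(item="apple", price="0.50")
print(text)  # apples cost $0.50 each

# substitute() raises KeyError naming the first missing placeholder
missing = None
try:
    invoice.substitute(item="apple")
except KeyError as exc:
    missing = exc.args[0]
print(missing)  # price
```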

In [None]:
import string

# Character set constants
print(f"ascii_lowercase: {string.ascii_lowercase}")
print(f"ascii_uppercase: {string.ascii_uppercase}")
print(f"ascii_letters:   {string.ascii_letters}")
print(f"digits:          {string.digits}")
print(f"hexdigits:       {string.hexdigits}")
print(f"punctuation:     {string.punctuation}")
print(f"whitespace:      {string.whitespace!r}")

In [None]:
# Membership checks against constants
print(f"'a' in ascii_lowercase: {'a' in string.ascii_lowercase}")
print(f"'Z' in ascii_uppercase: {'Z' in string.ascii_uppercase}")
print(f"'5' in digits:          {'5' in string.digits}")
print(f"'!' in punctuation:     {'!' in string.punctuation}")

# Practical: check if string contains only hex characters
def is_hex_string(s: str) -> bool:
    """Check if s contains only valid hexadecimal characters."""
    return all(c in string.hexdigits for c in s)

print(f"\nis_hex_string('1a2b'):  {is_hex_string('1a2b')}")
print(f"is_hex_string('xyz'):   {is_hex_string('xyz')}")

In [None]:
import string

# string.Template provides safe substitution
tmpl: string.Template = string.Template("Hello, $name! You have $count messages.")

# substitute() raises KeyError for missing keys
result: str = tmpl.substitute(name="Alice", count=5)
print(f"substitute: {result}")

# safe_substitute() leaves missing keys as-is
partial: str = tmpl.safe_substitute(name="Alice")
print(f"safe_substitute (missing 'count'): {partial}")

## Section 9: Case Conversion and Other Transforms

Python strings provide several methods for case conversion, padding, counting, and searching. Because strings are immutable, each transforming method returns a new string.
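Two behaviors are worth checking before relying on these methods: `title()` treats an apostrophe as a word boundary, and `casefold()` rather than `lower()` is the safer choice for case-insensitive comparison. The sample strings below are illustrative.

```python
import string

phrase = "they're bill's friends"

# title() capitalizes after every non-letter, including apostrophes
print(phrase.title())           # They'Re Bill'S Friends

# string.capwords() splits on whitespace only
print(string.capwords(phrase))  # They're Bill's Friends

# Case-insensitive comparison: lower() keeps the sharp s, casefold() expands it
a, b = "Stra\u00dfe", "STRASSE"
same_lower = a.lower() == b.lower()
same_casefold = a.casefold() == b.casefold()
print(same_lower, same_casefold)  # False True
```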

In [None]:
text: str = "hello WORLD python"

print(f"upper():      {text.upper()}")
print(f"lower():      {text.lower()}")
print(f"title():      {text.title()}")
print(f"capitalize(): {text.capitalize()}")
print(f"swapcase():   {text.swapcase()}")

# casefold() is more aggressive than lower() for case-insensitive comparison
german: str = "Stra\u00dfe"  # Strasse with sharp-s
print(f"\n'{german}'.lower():    '{german.lower()}'")
print(f"'{german}'.casefold(): '{german.casefold()}'")

In [None]:
# Other useful methods

# center, ljust, rjust pad to a given width, like format-spec alignment
print(f"center: '{'hello'.center(20)}'")
print(f"ljust:  '{'hello'.ljust(20)}'")
print(f"rjust:  '{'hello'.rjust(20)}'")

# zfill pads with zeros (respects sign)
print(f"\nzfill: {'42'.zfill(6)}")
print(f"zfill: {'-42'.zfill(6)}")

# count occurrences
print(f"\n'banana'.count('a'): {'banana'.count('a')}")

# find returns -1 when the substring is missing (index raises ValueError instead)
print(f"'hello'.find('ll'):  {'hello'.find('ll')}")
print(f"'hello'.find('xyz'): {'hello'.find('xyz')}")

## Summary

### Splitting and Joining
- **`split(sep, maxsplit)`**: Break string into list; defaults to whitespace
- **`rsplit(sep, maxsplit)`**: Split from the right
- **`splitlines()`**: Split on line boundaries
- **`partition(sep)`** / **`rpartition(sep)`**: Split into 3-tuple on first/last occurrence
- **`sep.join(iterable)`**: Concatenate strings with separator

### Stripping and Replacing
- **`strip()` / `lstrip()` / `rstrip()`**: Remove leading/trailing characters (default: whitespace)
- **`replace(old, new, count)`**: Substitute substrings

### Character Mapping
- **`str.maketrans(from, to, delete)`**: Build a translation table
- **`str.translate(table)`**: Apply the table in one pass

### Testing and Searching
- **`startswith()` / `endswith()`**: Check prefixes/suffixes (accept tuples)
- **`isalpha()`, `isdigit()`, `isalnum()`, `isspace()`**: Content checks
- **`find()` / `index()`**: Locate substrings (`find` returns -1; `index` raises `ValueError`)

### `string` Module
- **Constants**: `ascii_lowercase`, `ascii_uppercase`, `digits`, `punctuation`, `whitespace`
- **`Template`**: `$name` substitution with `substitute()` and `safe_substitute()`