katsuki-a/pdf-splitter

PDF Splitter CLI

A simple and efficient Python CLI tool for splitting PDF files based on their outline (bookmarks).

Originally optimized for O'Reilly-style technical books (handling structures like Parts -> Chapters), this tool can be used with any PDF that has a valid outline.

Features

  • Smart Splitting: Automatically detects the PDF outline and splits the document into separate files for each section.
  • Filename Sanitization: Generates safe filenames from chapter titles, removing illegal characters.
  • Flexible Output: Allows specifying a custom output directory. Defaults to a directory named <input_filename>_split, created next to the input file.
  • Cross-Platform: Works on Windows, macOS, and Linux (Python-based).
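
To illustrate the filename sanitization feature, here is a minimal sketch of how a chapter title could be turned into a safe filename. The helper name sanitize_filename, the exact set of removed characters, and the "untitled" fallback are assumptions made for this example; the tool's actual rules may differ.

```python
import re

# Assumption: strip the characters Windows forbids in filenames, plus control
# characters. The real tool may use a different set.
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(title: str, fallback: str = "untitled") -> str:
    """Turn an outline title into a filename-safe string (hypothetical helper)."""
    cleaned = _ILLEGAL_CHARS.sub("", title)
    # Collapse whitespace runs, then trim leading/trailing spaces and dots,
    # since Windows rejects names that end in either.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or fallback


print(sanitize_filename('Chapter 1: What is "PDF"?'))  # Chapter 1 What is PDF
```

Removing characters rather than replacing them keeps names short, at the cost of possible collisions between similar titles; prefixing an index to each filename would be one way to avoid that.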

Requirements

  • Python 3.9+
  • pypdf

Installation

Recommended: uv

  1. Clone this repository:

    git clone https://github.com/katsuki-a/pdf-splitter.git
    cd pdf-splitter
  2. Create a virtual environment with uv:

    uv venv
  3. Install dependencies:

    uv pip install -r requirements.txt
  4. Run commands with uv run:

    uv run python -m src.cli --help

Alternative: venv + pip

If you don't use uv, you can still use the standard Python workflow:

python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Usage

To split a PDF file, run the tool as a module from the project root directory.

Basic Usage

# Recommended
uv run python -m src.cli <input_file_path> [-o <output_directory>] [-d <max_depth>] [--dry-run]

Alternatively, if you are not using uv:

python -m src.cli <input_file_path> [-o <output_directory>] [-d <max_depth>] [--dry-run]

If you encounter module import errors, you can explicitly set the PYTHONPATH:

PYTHONPATH=. python src/cli.py <input_file_path> ...

Arguments

  • input_file: Path to the input PDF file (Required).
  • -o, --output: Directory to save the split PDF files. If omitted, a directory named <input_filename>_split will be created in the same location as the input file.
  • -d, --max-depth: Maximum depth of the outline to process.
    • 1: Top-level chapters only (default).
    • 2: Chapters and sub-sections.
  • --dry-run: Print the planned split without writing PDF files.
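
The arguments above map naturally onto Python's argparse module. The following is a sketch of how such a command line could be declared, not the project's actual src/cli.py; the function names build_parser and resolve_output_dir and the prog string are assumptions for illustration.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented arguments (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="pdf-splitter", description="Split a PDF by its outline."
    )
    parser.add_argument("input_file", type=Path, help="Path to the input PDF file")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="Directory to save the split PDF files")
    parser.add_argument("-d", "--max-depth", type=int, default=1,
                        help="Maximum outline depth to process (default: 1)")
    parser.add_argument("--dry-run", action="store_true",
                        help="Print the planned split without writing files")
    return parser


def resolve_output_dir(input_file: Path, output: Optional[Path]) -> Path:
    """Use the given directory, else <input_filename>_split next to the input."""
    if output is not None:
        return output
    return input_file.with_name(f"{input_file.stem}_split")


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["books/my_book.pdf", "-d", "2", "--dry-run"])
print(args.max_depth, args.dry_run, resolve_output_dir(args.input_file, args.output))
```

Keeping the default-directory logic in its own function makes the documented fallback easy to test without touching the filesystem.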

Examples

1. Split a file using default settings (top-level chapters only):

uv run python -m src.cli my_book.pdf

2. Split a file including nested sections (up to depth 2):

uv run python -m src.cli my_book.pdf --max-depth 2

3. Split a file and save to a specific directory:

uv run python -m src.cli my_book.pdf --output ./chapters/

4. Preview the split without writing files:

uv run python -m src.cli my_book.pdf --dry-run
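
For intuition about what a dry run might report, the sketch below plans page ranges from a flattened outline and a maximum depth. It rests on assumptions that are not confirmed by this README: that each section runs from its own start page up to the start of the next kept entry, and that the last section runs to the end of the document. The names OutlineEntry, Section, and plan_sections are hypothetical. With pypdf, the entries could be gathered from PdfReader.outline together with PdfReader.get_destination_page_number, but that step is left out so the example runs with the standard library alone.

```python
from typing import List, NamedTuple


class OutlineEntry(NamedTuple):
    title: str
    depth: int       # 1 = top-level entry
    start_page: int  # zero-based page index


class Section(NamedTuple):
    title: str
    start_page: int  # inclusive
    end_page: int    # exclusive


def plan_sections(entries: List[OutlineEntry], total_pages: int,
                  max_depth: int = 1) -> List[Section]:
    """Plan one page range per outline entry at or above max_depth."""
    kept = sorted((e for e in entries if e.depth <= max_depth),
                  key=lambda e: e.start_page)
    sections = []
    for i, entry in enumerate(kept):
        # Assumption: a section ends where the next kept entry begins.
        end = kept[i + 1].start_page if i + 1 < len(kept) else total_pages
        # Skip entries that would produce an empty range, such as a heading
        # that starts on the same page as the entry after it.
        if end > entry.start_page:
            sections.append(Section(entry.title, entry.start_page, end))
    return sections


outline = [
    OutlineEntry("Part I", 1, 0),
    OutlineEntry("Chapter 1", 2, 2),
    OutlineEntry("Chapter 2", 2, 10),
    OutlineEntry("Part II", 1, 20),
]
for section in plan_sections(outline, total_pages=30, max_depth=2):
    print(f"{section.title}: pages {section.start_page + 1}-{section.end_page}")
```

With max_depth=1 the same outline yields two sections (Part I and Part II); raising it to 2 splits Part I into its front matter and two chapters.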

Development

Install development dependencies:

uv venv
uv pip install -r requirements-dev.txt

Run the checks locally:

uv run ruff check .
uv run ruff format --check .
uv run python -m pytest

Apply formatting:

uv run ruff format .

License

This project is licensed under the MIT License - see the LICENSE file for details.
