A simple and efficient Python CLI tool for splitting PDF files based on their outline (bookmarks).
Originally optimized for O'Reilly-style technical books (handling structures like Parts -> Chapters), this tool can be used with any PDF that has a valid outline.
- Smart Splitting: Automatically detects the PDF outline and splits the document into separate files for each section.
- Filename Sanitization: Generates safe filenames from chapter titles, removing illegal characters.
- Flexible Output: Allows specifying a custom output directory. Defaults to a folder named after the input file.
- Cross-Platform: Works on Windows, macOS, and Linux (Python based).
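The filename sanitization feature above can be illustrated with a minimal sketch. This is not the project's actual implementation: the helper name `sanitize_filename`, the `fallback` parameter, and the exact set of stripped characters are assumptions chosen for illustration (the set shown is the one Windows rejects, which also covers macOS and Linux).

```python
import re

# Characters that Windows rejects in filenames, plus ASCII control characters.
# Assumption: the real tool may strip a different set.
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(title: str, fallback: str = "untitled") -> str:
    """Turn an outline title into a filename-safe stem (hypothetical helper)."""
    cleaned = _ILLEGAL_CHARS.sub("", title)
    # Collapse whitespace runs, then trim leading/trailing spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or fallback


print(sanitize_filename('Chapter 1: What is "REST"?'))  # Chapter 1 What is REST
```

Removing characters rather than replacing them keeps the names short; substituting an underscore would be an equally reasonable choice.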
- Python 3.9+
- pypdf
- Clone this repository:
git clone https://github.com/katsuki-a/pdf-splitter.git
cd pdf-splitter
- Create a virtual environment with uv:
uv venv
- Install dependencies:
uv pip install -r requirements.txt
- Run commands with uv run:
uv run python -m src.cli --help
If you don't use uv, you can still use the standard Python workflow:
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
To split a PDF file, run the tool as a module from the project root directory.
# Recommended
uv run python -m src.cli <input_file_path> [-o <output_directory>] [-d <max_depth>] [--dry-run]
Alternatively, if you are not using uv:
python -m src.cli <input_file_path> [-o <output_directory>] [-d <max_depth>] [--dry-run]
If you encounter module import errors, you can explicitly set the PYTHONPATH:
PYTHONPATH=. python src/cli.py <input_file_path> ...
- input_file: Path to the input PDF file (Required).
- -o, --output: Directory to save the split PDF files. If omitted, a directory named <input_filename>_split will be created in the same location as the input file.
- -d, --max-depth: Maximum depth of the outline to process.
  - 1: Top-level chapters only (default).
  - 2: Chapters and sub-sections.
- --dry-run: Print the planned split without writing PDF files.
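To make the options above concrete, here is a rough sketch of how such a command line could be wired up with `argparse`, how the default `<input_filename>_split` directory could be derived, and how `--max-depth` could limit an outline walk. None of this is taken from the project's source: `build_parser`, `resolve_output_dir`, and `flatten_outline` are hypothetical names, and the outline is modelled as plain nested lists, assuming the shape pypdf's `PdfReader.outline` uses (a sub-list holds the children of the item before it).

```python
import argparse
from pathlib import Path
from typing import Iterator, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="pdf-splitter")
    parser.add_argument("input_file", type=Path, help="Path to the input PDF file")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="Directory to save the split PDF files")
    parser.add_argument("-d", "--max-depth", type=int, default=1,
                        help="Maximum outline depth to process (default: 1)")
    parser.add_argument("--dry-run", action="store_true",
                        help="Print the planned split without writing files")
    return parser


def resolve_output_dir(input_file: Path, output: Optional[Path]) -> Path:
    """Use --output if given, else <input_filename>_split next to the input."""
    if output is not None:
        return output
    return input_file.with_name(f"{input_file.stem}_split")


def flatten_outline(outline: list, max_depth: int, depth: int = 1) -> Iterator[Tuple[int, object]]:
    """Yield (depth, item) pairs, skipping anything nested deeper than max_depth."""
    for entry in outline:
        if isinstance(entry, list):
            # A nested list holds the children of the preceding item.
            if depth < max_depth:
                yield from flatten_outline(entry, max_depth, depth + 1)
        else:
            yield depth, entry


# Demo with an explicit argument list and a toy outline (titles as strings).
args = build_parser().parse_args(["books/my_book.pdf", "-d", "2", "--dry-run"])
toy_outline = ["Part I", ["Chapter 1", ["Section 1.1"], "Chapter 2"], "Part II"]

print(resolve_output_dir(args.input_file, args.output))
for level, title in flatten_outline(toy_outline, args.max_depth):
    print("  " * (level - 1) + str(title))
```

With `-d 2`, the walk visits parts and chapters but skips `Section 1.1`; with the default depth of 1 it would visit only `Part I` and `Part II`.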
1. Split a file using default settings (top-level chapters only):
uv run python -m src.cli my_book.pdf
2. Split a file including nested sections (up to depth 2):
uv run python -m src.cli my_book.pdf --max-depth 2
3. Split a file and save to a specific directory:
uv run python -m src.cli my_book.pdf --output ./chapters/
4. Preview the split without writing files:
uv run python -m src.cli my_book.pdf --dry-run
Install development dependencies:
uv venv
uv pip install -r requirements-dev.txt
Run the checks locally:
uv run ruff check .
uv run ruff format --check .
uv run python -m pytest
Apply formatting:
uv run ruff format .
This project is licensed under the MIT License - see the LICENSE file for details.