MCP server for PDF processing and analysis using PyPDFium2.
- extract_text: Extract text content from PDF files with page range support
- extract_metadata: Extract PDF metadata including title, author, and page count
- search_text: Search for specific text within PDF files with context
- get_page_count: Get the total number of pages in a PDF file
- extract_pages: Extract specific pages from a PDF and save as a new PDF
- split_pdf: Split a PDF into multiple page-based PDFs with base64 encoding
- merge_pdfs: Merge multiple PDF files into a single PDF
- pdf_to_images: Convert PDF pages to PNG images with configurable DPI
- get_form_fields: Extract all form fields from a PDF including names, types, and values
- fill_form: Fill form fields in a PDF with provided values and save to output path
# Clone the repository
git clone https://github.com/gzigurella/pdf-mcp.git
cd pdf-mcp
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install the package
pip install -e .
# Clone and enter directory
git clone https://github.com/gzigurella/pdf-mcp.git
cd pdf-mcp
# Install with uv
uv pip install -e .
Add to your ~/.config/opencode/opencode.json:
{
  "mcpServers": {
    "pdf-mcp": {
      "type": "local",
      "command": [
        "/path/to/pdf-mcp/venv/bin/python",
        "-m",
        "pdf_mcp"
      ],
      "enabled": true
    }
  }
}
Add to your Claude Desktop config:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "pdf-mcp": {
      "command": "/path/to/pdf-mcp/venv/bin/python",
      "args": ["-m", "pdf_mcp"]
    }
  }
}
For any MCP-compatible client:
# Start the server directly
/path/to/venv/bin/python -m pdf_mcp
The server communicates over stdio using the Model Context Protocol (MCP).
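To make the stdio exchange more concrete, here is a minimal sketch of how a client might frame a single tool call. It assumes the standard MCP stdio transport (one JSON-RPC 2.0 message per line) and the `tools/call` method. The helper name `build_tool_call` is hypothetical, and the `initialize` handshake that a real client performs first is omitted.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited; json.dumps escapes any
    # newlines inside string values, so the payload stays on one line.
    return json.dumps(message) + "\n"


# Hypothetical usage: this line would be written to the server's stdin.
line = build_tool_call(1, "get_page_count", {"file_path": "/path/to/document.pdf"})
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for debugging what travels over the pipe.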
Extract text content from a PDF file. Supports PDFs with searchable text and can extract text from specific pages or ranges.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the PDF file to extract text from |
| pages | string | No | "all" | Page range to extract (e.g., '1-5', '3,7,9', 'all') |
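The page-range syntax in the table can be illustrated with a small parser. This is only a sketch of one plausible interpretation, not the server's actual code: it assumes pages are 1-based, that duplicates collapse, and that out-of-range values are rejected. The function name `parse_pages` is hypothetical.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as 'all', '1-5' or '1,3-5' into sorted 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Assumption: reject anything outside 1..page_count or reversed ranges.
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid page range {part!r} for a {page_count}-page PDF")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_pages("1,3-5", page_count=10))  # [1, 3, 4, 5]
```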
{
  "file_path": "/path/to/document.pdf",
  "pages": "1-5"
}
Extract metadata from a PDF file, including title, author, subject, keywords, creator, producer, creation date, modification date, and page count.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the PDF file to extract metadata from |
{
  "file_path": "/path/to/document.pdf"
}
Search for specific text within a PDF file. Returns page numbers and context around the found text. Useful for finding specific content in large documents.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the PDF file to search within |
| query | string | Yes | - | Text to search for in the PDF |
| case_sensitive | boolean | No | false | Whether to perform case-sensitive search |
| context_words | integer | No | 10 | Number of words to include before and after each match |
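How `case_sensitive` and `context_words` might interact is easiest to see in code. The sketch below is an assumption about the behaviour, not the tool's implementation: it matches on whitespace-delimited words, so a query word followed by punctuation in the text (for example `term.`) would not match. The name `find_with_context` is hypothetical.

```python
def find_with_context(
    text: str,
    query: str,
    case_sensitive: bool = False,
    context_words: int = 10,
) -> list[str]:
    """Return one snippet per match: the matched words plus context_words on each side."""
    words = text.split()
    query_words = query.split()
    if not query_words:
        return []

    # Fold case on both sides unless a case-sensitive search was requested.
    fold = (lambda s: s) if case_sensitive else str.lower
    haystack = [fold(w) for w in words]
    needle = [fold(w) for w in query_words]

    snippets = []
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            start = max(0, i - context_words)
            end = min(len(words), i + len(needle) + context_words)
            snippets.append(" ".join(words[start:end]))
    return snippets


sample = "The Important Term appears here and the important term appears again"
print(find_with_context(sample, "important term", context_words=1))
```

A real implementation would run this per page so that each snippet can be reported together with its page number.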
{
  "file_path": "/path/to/document.pdf",
  "query": "important term",
  "case_sensitive": false,
  "context_words": 5
}
Get the total number of pages in a PDF file. Returns a simple integer count.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the PDF file to count pages for |
{
  "file_path": "/path/to/document.pdf"
}
Extract specific pages from a PDF file and save as a new PDF. Supports page ranges and individual page selection.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the source PDF file |
| pages | string | Yes | - | Pages to extract (e.g., '1-5', '3,7,9', '1,3-5') |
| output_path | string | Yes | - | Path where the extracted pages will be saved as a new PDF |
{
  "file_path": "/path/to/source.pdf",
  "pages": "1,3,5-7",
  "output_path": "/path/to/output.pdf"
}
Split a PDF file into multiple separate PDF files based on page ranges. Returns a JSON object containing a base64-encoded PDF for each selected page. Supports single pages, page ranges, and all pages.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the PDF file to split |
| page_range | string | Yes | - | Page range to split - 'all', single page (e.g., '1'), or range (e.g., '1-3', '2-5') |
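Because the result arrives as base64 text rather than as files on disk, a client usually has to decode it. The sketch below shows one way to do that. The response schema is not documented here, so the keys `pages`, `page`, and `data` are placeholders to be replaced with whatever the tool actually returns, and `save_split_pages` is a hypothetical helper.

```python
import base64
from pathlib import Path


def save_split_pages(response: dict, out_dir: str) -> list[Path]:
    """Decode each base64-encoded PDF in the response and write it to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for entry in response["pages"]:  # placeholder key, adjust to the real schema
        target = out / f"page_{entry['page']}.pdf"  # placeholder key
        target.write_bytes(base64.b64decode(entry["data"]))  # placeholder key
        written.append(target)
    return written
```

Base64 inflates the payload by roughly a third, which is worth keeping in mind when splitting many pages in one call.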
{
  "file_path": "/path/to/document.pdf",
  "page_range": "1-3"
}
Merge multiple PDF files into a single PDF. Files are merged in the order provided.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_paths | array | Yes | - | List of PDF file paths to merge |
| output_path | string | Yes | - | Path where the merged PDF will be saved |
{
  "file_paths": ["/path/to/doc1.pdf", "/path/to/doc2.pdf", "/path/to/doc3.pdf"],
  "output_path": "/path/to/merged.pdf"
}
Convert PDF pages to PNG images. Returns a JSON object containing a base64-encoded PNG image for each page. Supports custom DPI settings for resolution control.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the PDF file to convert to images |
| dpi | integer | No | 150 | Image resolution in dots per inch |
| format | string | No | "png" | Image format (PNG only) |
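To get a feel for what a given `dpi` value means for output size, recall that PDF page dimensions are expressed in points of 1/72 inch, so the scale factor is `dpi / 72`. The helper below, `rendered_size`, is hypothetical and only approximates the result; the actual renderer may round differently.

```python
def rendered_size(width_pt: float, height_pt: float, dpi: int = 150) -> tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at the given DPI (1 pt = 1/72 inch)."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


# A US Letter page measures 612 x 792 points.
print(rendered_size(612, 792, dpi=150))  # (1275, 1650)
print(rendered_size(612, 792, dpi=300))  # (2550, 3300)
```

Doubling the DPI quadruples the pixel count, so high settings on long documents can produce very large responses.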
{
  "file_path": "/path/to/document.pdf",
  "dpi": 300,
  "format": "png"
}
Extract all form fields from a PDF document including field names, types, current values, and available choices for dropdown fields.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the PDF file to extract form fields from |
{
  "file_path": "/path/to/form.pdf"
}
Returns a JSON object with field information:
{
  "fields": [
    {
      "name": "first_name",
      "type": "text",
      "value": "",
      "page": 1,
      "rect": {"x0": 50, "y0": 72, "x1": 150, "y1": 92}
    },
    {
      "name": "country",
      "type": "combobox",
      "value": "",
      "page": 1,
      "rect": {...},
      "choices": ["USA", "Canada", "UK"]
    },
    {
      "name": "accept_terms",
      "type": "checkbox",
      "value": "",
      "page": 1,
      "rect": {...},
      "on_state": "Yes"
    }
  ],
  "total_fields": 3
}
Fill form fields in a PDF document with provided values and save to output path. Supports text fields, checkboxes, radio buttons, and dropdowns.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | string | Yes | - | Path to the source PDF file |
| fields | object | Yes | - | Dictionary of field names and their values to fill |
| output_path | string | Yes | - | Path where the filled PDF will be saved |
{
  "file_path": "/path/to/form.pdf",
  "fields": {
    "first_name": "John",
    "last_name": "Doe",
    "country": "USA",
    "accept_terms": true
  },
  "output_path": "/path/to/filled_form.pdf"
}
Checkbox values accept: true/false, "yes"/"no", "1"/"0".
Radio buttons: use the value of the field's on_state (retrieve it with get_form_fields first).
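The accepted checkbox spellings can be reduced to a single boolean before building the `fields` object. The following is a sketch under assumptions rather than the server's own logic: it treats the string forms case-insensitively and rejects anything else, and `normalize_checkbox` is a hypothetical name.

```python
def normalize_checkbox(value) -> bool:
    """Map true/false, "yes"/"no" and "1"/"0" onto a plain boolean."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()  # assumption: case and padding are ignored
    if text in {"yes", "1"}:
        return True
    if text in {"no", "0"}:
        return False
    raise ValueError(f"unsupported checkbox value: {value!r}")


fields = {"first_name": "John", "accept_terms": normalize_checkbox("yes")}
print(fields)  # {'first_name': 'John', 'accept_terms': True}
```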
| Variable | Default | Description |
|---|---|---|
| PDF_MCP_DEBUG | false | Enable debug logging |
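A boolean environment flag like this is typically read once at startup. The sketch below shows one way it could be done. Which spellings count as "true" is an assumption, and `debug_enabled` is a hypothetical helper rather than something taken from `config.py`.

```python
import logging
import os
from collections.abc import Mapping


def debug_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when PDF_MCP_DEBUG is set to a truthy spelling (default: false)."""
    raw = environ.get("PDF_MCP_DEBUG", "false")
    return raw.strip().lower() in {"1", "true", "yes"}  # assumed truthy spellings


# basicConfig logs to stderr by default, which keeps stdout free for the
# MCP messages exchanged over stdio.
logging.basicConfig(level=logging.DEBUG if debug_enabled() else logging.INFO)
```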
# Example
export PDF_MCP_DEBUG=true
python -m pdf_mcp
source venv/bin/activate
pytest
# With coverage
pytest --cov=src --cov-report=html
pdf-mcp/
├── src/pdf_mcp/
│   ├── __init__.py
│   ├── __main__.py
│   ├── server.py
│   ├── config.py
│   └── tools/
│       ├── __init__.py
│       ├── extract_text.py
│       ├── extract_metadata.py
│       ├── search_text.py
│       ├── get_page_count.py
│       ├── extract_pages.py
│       ├── split_pdf.py
│       ├── merge_pdfs.py
│       ├── pdf_to_images.py
│       ├── get_form_fields.py
│       └── fill_form.py
├── tests/
├── pyproject.toml
└── README.md
If you encounter installation errors, ensure you have Python 3.10 or later:
python --version
Make sure the PDF file paths are correct and the files exist:
ls -l /path/to/your/document.pdf
The tools raise a RuntimeError when asked to process an encrypted PDF, so make sure your PDFs are not password-protected.
For very large PDF files, consider processing them in smaller chunks using the extract_pages or split_pdf tools.
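One way to plan such chunked processing is to generate the page-range strings up front and issue one `extract_pages` or `split_pdf` call per range. The helper below is a hypothetical sketch; the chunk size of 50 is an arbitrary choice, and the page count would come from `get_page_count`.

```python
def chunk_ranges(page_count: int, chunk_size: int = 50) -> list[str]:
    """Split 1..page_count into consecutive range strings such as '1-50', '51-100'."""
    if page_count < 0 or chunk_size < 1:
        raise ValueError("page_count must be >= 0 and chunk_size >= 1")

    ranges = []
    for start in range(1, page_count + 1, chunk_size):
        end = min(start + chunk_size - 1, page_count)
        # A one-page chunk is written as a single page number.
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


print(chunk_ranges(120, 50))  # ['1-50', '51-100', '101-120']
```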
If you encounter permission errors, ensure the PDF files are readable:
chmod +r /path/to/your/document.pdf
- File Access: The server only processes files that exist and are readable by the running process
- Path Validation: All file paths are validated before processing
- No Network Access: The server does not make any network requests
- Temporary Files: Temporary files are properly cleaned up after processing
- Error Handling: Sensitive information is not exposed in error messages
- Encrypted PDFs: Password-protected PDFs are rejected with appropriate error messages
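The file-access and path-validation points above can be mirrored on the client side before a tool is called. This is a sketch of such a preflight check, not the server's own validation. The name `check_pdf_path` is hypothetical, and requiring the `%PDF-` marker at byte 0 is a deliberately strict assumption. The error messages leave out the path, in keeping with the error-handling point above.

```python
import os


def check_pdf_path(path: str) -> None:
    """Raise if the path is missing, unreadable, or does not start with a PDF header."""
    if not os.path.isfile(path):
        raise FileNotFoundError("PDF file not found")
    if not os.access(path, os.R_OK):
        raise PermissionError("PDF file is not readable")
    with open(path, "rb") as handle:
        # Every PDF begins with a header of the form %PDF-x.y.
        if handle.read(5) != b"%PDF-":
            raise ValueError("file does not look like a PDF")
```

Detecting encryption reliably requires parsing the document, so that check is best left to the server's own error.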
{
  "name": "extract_text",
  "arguments": {
    "file_path": "/documents/report.pdf",
    "pages": "1-3,7,9"
  }
}
{
  "name": "search_text",
  "arguments": {
    "file_path": "/documents/contract.pdf",
    "query": "liability clause",
    "case_sensitive": true,
    "context_words": 15
  }
}
{
  "name": "merge_pdfs",
  "arguments": {
    "file_paths": [
      "/reports/q1.pdf",
      "/reports/q2.pdf",
      "/reports/q3.pdf",
      "/reports/q4.pdf"
    ],
    "output_path": "/reports/annual.pdf"
  }
}
{
  "name": "pdf_to_images",
  "arguments": {
    "file_path": "/documents/presentation.pdf",
    "dpi": 300
  }
}
MIT