PDF Sentinel is a lightweight safety inspection library for PDF documents. It detects oversized, vector-heavy, or otherwise resource-intensive pages (like blueprints) that could slow down or crash OCR and pipelines.
-
Detects dangerous or heavy PDF pages:
- Large page dimensions (A0, engineering blueprints, etc.)
- Massive embedded images
- Vector-heavy drawings (architectural plans)
- Pages that exceed safe rasterization thresholds
-
Returns simplified or detailed analysis for pages and files
-
Configurable limits (page size, pixel thresholds, etc.)
-
Optional JSON response for easy API integration
pip install PDFSentinelThis will install the library from PyPI and make it available to import in your project.
from pdfsentinel import PDFSentinel
sentinel = PDFSentinel()
result = sentinel.is_file_safe("samples/test.pdf")
print(result)Output (Python dict):
{
"file_name": "test.pdf",
"pages": 23,
"is_file_safety": true,
"unsafety_pages": []
}You can also get JSON-formatted output directly:
print(sentinel.is_file_safe("samples/test.pdf", json_response=True))print(sentinel.is_page_safe("samples/test.pdf", 2, json_response=True))Example output:
{
"file_name": "test1.pdf",
"page": 24,
"is_page_safety": false,
"errors": [
"page_too_large:2592.0x1728.0",
"too_many_vector_ops:33035",
"raster_estimate_too_big:77760000"
]
}print(sentinel.file_analysis("samples/test.pdf", json_response=True))Output:
{
"file_name": "test1.pdf",
"pages": 2,
"is_file_safety": true,
"results": [
{
"page": 1,
"is_page_safety": true,
"errors": [],
"page_width": 612.0,
"page_height": 792.0,
"max_image_pixels": 0,
"max_vectors_operations": 58,
"max_raster_pixels": 8415000
},
{
"page": 2,
"is_page_safety": false,
"errors": [
"page_too_large:2592.0x1728.0",
"too_many_vector_ops:33035",
"raster_estimate_too_big:77760000"
],
"page_width": 2592.0,
"page_height": 1728.0,
"max_image_pixels": 354652,
"max_vectors_operations": 33035,
"max_raster_pixels": 77760000
}
]
}print(sentinel.page_analysis("samples/test.pdf", 3, json_response=True))You can override safety thresholds per call:
sentinel.is_file_safe(
"samples/test.pdf",
config={
"max_page_size": 1800,
"max_image_pixels": 10_000_000,
"max_vectors_operations": 1000,
"max_raster_pixels": 20_000_000
}
)| Parameter | Default | Description |
|---|---|---|
max_page_size |
2000 | Max page dimension in points |
max_image_pixels |
20,000,000 | Max embedded image total pixels size (w × h) |
max_vectors_operations |
1500 | Max allowed vector drawing operations |
max_raster_pixels |
30,000,000 | Estimated max rasterization size (at 300 dpi) |
MIT License © 2025 — Not Empty Foundation