PDFCLOUD-5351 Initial Python API for pdfRest #1
Merged
datalogics-cgreen merged 51 commits into pdfrest:main on Nov 11, 2025
dd6aa63 to 5e8b5eb
d16f9e0 to 82264fb
- Uses uv
- Some project settings brought over from pdfassistant-chatbot by Codex.
Assisted-by: Codex
Assisted-by: Codex
Assisted-by: Codex
- Update `setup-uv` to v6
- Fix up caching.
- Cache keys for virtualenv and the like are based on lockfile

- Best type checker to use in an IDE
- Use both pyright and mypy
- mypy run on pushes and in CI
82264fb to 0a2a144
- Introduces typed sync and async clients for the pdfRest API with configurable base URL, API key (with env var fallback), custom headers, and timeouts.
- Adds centralized, library-specific exceptions, translating HTTP layer timeout, transport, and request errors, and wrapping non-2xx API responses with status and message details.
- Validates and normalizes requests using internal Pydantic models, including header/param coercion, endpoint shape checks, and timeout handling.
- Implements an “up” endpoint method in both clients that returns a validated health/metadata response model.
- Updates the package exports to expose clients, request option types, response model, and error helpers.
Assisted-by: Codex

- Python 3.9 is EOL after its last security release in October 2025.
- Python 3.14 is newly released.
- Change version ranges to support Python 3.10-3.14
- Use uv-build up to 0.10.0.
- Test Python 3.14 in CI

- Modern Python-configured replacement for tox
- Use nox in the GitHub Workflow as well
- Update msgpack (only); the new version has built wheels for Python 3.14.

- Implemented UUID format validation for API keys.
- Enhanced the `/up` client methods to accept additional headers, query parameters, and timeouts.
- Raised `PdfRestAuthenticationError` for authentication failures.
- Refactored request composition for improved reuse and readability.
- Updated tests to cover new validations and functionalities.
Assisted-by: Codex
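For readers unfamiliar with the idea, UUID format validation for an API key can be sketched with the standard library alone. This is an illustration under assumptions, not the SDK's code: `validate_api_key` and `InvalidApiKeyError` are hypothetical names, and the real client may be stricter about which UUID spellings it accepts.

```python
import uuid


class InvalidApiKeyError(ValueError):
    """Hypothetical error type for this sketch; not the SDK's own exception."""


def validate_api_key(api_key: str) -> str:
    """Return the key in lowercase hyphenated form if it parses as a UUID."""
    try:
        # uuid.UUID raises ValueError on malformed strings and
        # TypeError/AttributeError on non-string input such as None or int.
        parsed = uuid.UUID(api_key)
    except (ValueError, TypeError, AttributeError) as exc:
        raise InvalidApiKeyError("API key must be a UUID-formatted string") from exc
    return str(parsed)
```

Rejecting a malformed key before any request is sent lets the caller see a configuration error immediately instead of an authentication failure from the server.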
- Introduced `extra_query`, `extra_headers`, `extra_body`, and `timeout` parameters across synchronous and asynchronous methods for enhanced flexibility in API integrations.
- Added a validation check to prevent combining JSON payloads with multipart file uploads.
- Updated client methods for file handling, downloads, and conversions to include optional request customizations.
- Extended tests to cover new request customization capabilities and ensure proper handling of custom parameters in both sync and async clients.
Assisted-by: Codex
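A rough sketch of how per-call customizations and the JSON/multipart guard could fit together is shown here. Only the parameter names `extra_query`, `extra_headers`, `extra_body`, and `timeout` come from the commit message; `build_request_kwargs`, `json_body`, `files`, and the shape of the returned dictionary are assumptions made for illustration.

```python
from typing import Any, Mapping, Optional


def build_request_kwargs(
    *,
    json_body: Optional[Mapping[str, Any]] = None,
    files: Optional[Mapping[str, bytes]] = None,
    extra_query: Optional[Mapping[str, Any]] = None,
    extra_headers: Optional[Mapping[str, str]] = None,
    extra_body: Optional[Mapping[str, Any]] = None,
    timeout: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for a hypothetical HTTP call."""
    # A request body is either JSON or multipart, never both.
    if json_body is not None and files is not None:
        raise ValueError("cannot combine a JSON payload with multipart file uploads")

    kwargs: dict[str, Any] = {
        "params": dict(extra_query or {}),
        "headers": dict(extra_headers or {}),
    }
    if files is not None:
        # Multipart: extra body fields travel as form fields next to the files.
        kwargs["files"] = dict(files)
        kwargs["data"] = dict(extra_body or {})
    else:
        # JSON: extra body fields are merged over the validated payload.
        kwargs["json"] = {**(json_body or {}), **(extra_body or {})}
    if timeout is not None:
        kwargs["timeout"] = timeout
    return kwargs
```

In this sketch `extra_body` overrides payload keys of the same name; whether the SDK merges in that order is not stated in the PR.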
c4d6864 to 3eac74b
- Implemented methods to convert PDF files to BMP, GIF, JPEG, PNG, and TIFF formats in both synchronous and asynchronous clients.
- Centralized `convert_to_graphic` method for reusable logic across all conversion methods.
- Added payload models for each graphic format to ensure parameter validation and API compatibility.
- Extended tests to verify conversion logic, including live integration tests for color models, resolution bounds, and smoothing options.
Assisted-by: Codex

- Default timeout remains 10 seconds, except for 120 seconds for read, for when pdfRest takes longer to process a file.
- Timeout is configurable by the customer both at the client and individual call level.
Assisted-by: Codex
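The timeout policy described above (10 seconds in general, 120 seconds for reads, overridable per client and per call) can be pictured with a small stand-in. `TimeoutConfig` and `resolve_timeout` are hypothetical names, and the split into connect, read, write, and pool phases is an assumption; the commit message only distinguishes the read timeout from the rest.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class TimeoutConfig:
    """Per-phase timeouts in seconds (phase names are assumed for this sketch)."""

    connect: float = 10.0
    read: float = 120.0  # longer, since processing a file can take a while
    write: float = 10.0
    pool: float = 10.0


def resolve_timeout(
    client_default: TimeoutConfig,
    per_call: Optional[Union[float, TimeoutConfig]] = None,
) -> TimeoutConfig:
    """Pick the timeout for one request: the per-call value wins if given."""
    if per_call is None:
        return client_default
    if isinstance(per_call, TimeoutConfig):
        return per_call
    seconds = float(per_call)
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    # A bare number applies to every phase of the request.
    return TimeoutConfig(connect=seconds, read=seconds, write=seconds, pool=seconds)
```

For example, `resolve_timeout(TimeoutConfig(), 30)` yields 30 seconds for every phase, while omitting the second argument keeps the client-level defaults.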
- Added details on pytest parallel execution and scheduling.
- Documented context manager usage for sync/async clients.
- Expanded coding style with endpoint-specific validation, payload models, and method naming conventions.
- Provided detailed testing instructions, including parameterized tests, live tests, and validation checks.
- Clarified integration guidelines for new endpoints and shared validation suites.
Assisted-by: Codex
45fc760 to a71b0ce
- Implemented `query_pdf_info` method in both synchronous and asynchronous clients to retrieve metadata about PDF documents.
- Introduced `PdfInfoPayload` and `PdfRestInfoResponse` models for payload validation and response handling.
- Added `PdfInfoQuery` type and extended it with various metadata fields supported by the pdfRest API.
- Expanded test suite with unit, live, and async tests for the new endpoint.
- Enhanced validation utilities to handle query inputs and sequence normalization.

- Introduced `ALL_PDF_INFO_QUERIES` constant for predefined query sets.
- Updated synchronous and asynchronous `query_pdf_info` methods to use `ALL_PDF_INFO_QUERIES` as the default value for `queries`.
- Adjusted type definitions and imports accordingly.
- Updated tests to cover new defaults.
Assisted-by: Codex
- Files must be PDF.
af5954e to 0e96a20
- Introduced `preview_redactions` and `apply_redactions` methods for generating redaction previews and applying redactions in both sync and async clients.
- Added `PdfRedactionPreviewPayload` and `PdfRedactionApplyPayload` models for payload validation.
- Implemented new redaction-related types: `PdfRedactionInstruction`, `PdfRedactionPreset`, `PdfRedactionType`, and `PdfRGBColor`.
- Centralized file operation handling via `_post_file_operation` for shared logic across clients.
Assisted-by: Codex

- Introduced live tests for `preview_redactions` and `apply_redactions` to validate redaction workflows.
- Added unit tests for `PdfRedactionPreviewPayload` and `PdfRedactionApplyPayload` models, ensuring accurate payload validation.
- Verified handling of invalid inputs, including incorrect redaction instructions and color values.
- Updated test fixtures and utilities for live and isolated testing scenarios.
Assisted-by: Codex

- Enhanced the AGENTS.md with redaction-related guidelines and reusable patterns for validation and serialization.

- Pass positional arguments to pytest
- `--no-parallel` makes tests not run in parallel
- manually handle `-n` and add maxschedchunk

- Use only one graphic type to save on testing time; they all have the same limits.

- Use Pydantic models to decompose and validate page ranges rather than longer bespoke parsing code.
- Simplified page range handling by removing `_require_positive_page` and `_validate_page_range_entry` in favor of new `AscendingPageRange` and related validators.
- Replaced `PageRangeEntry` with `AscendingPageRange` in `BasePdfRestGraphicPayload`.
- Enhanced serialization logic with `_serialize_page_ranges`.
- Updated supporting type definitions and removed redundant code.
Assisted-by: Codex
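The general idea of replacing ad hoc parsing with a small validated type can be illustrated as follows. This is a plain-dataclass stand-in rather than the PR's Pydantic `AscendingPageRange`: the names `AscendingRange` and `serialize_page_ranges`, the `"start-end"` string form, and the comma-joined output are all assumptions, and the real model may accept additional forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AscendingRange:
    """A 1-based, inclusive page range whose end is not before its start."""

    start: int
    end: int

    def __post_init__(self) -> None:
        # Validation lives with the type, so every instance is known to be valid.
        if self.start < 1:
            raise ValueError("page numbers start at 1")
        if self.end < self.start:
            raise ValueError("page range must be ascending")

    @classmethod
    def parse(cls, text: str) -> "AscendingRange":
        """Parse "5" or "3-7"; int() rejects empty or non-numeric parts."""
        first, sep, last = text.strip().partition("-")
        start = int(first)
        end = int(last) if sep else start
        return cls(start, end)

    def serialize(self) -> str:
        if self.start == self.end:
            return str(self.start)
        return f"{self.start}-{self.end}"


def serialize_page_ranges(ranges: list[str]) -> str:
    """Validate each entry and join the normalized forms with commas."""
    return ",".join(AscendingRange.parse(item).serialize() for item in ranges)
```

Because the checks run in `__post_init__`, an invalid entry such as `"5-2"` fails at construction time, before any payload is serialized.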
- More thoroughly test page ranges in light of new changes.
- Introduced new live tests to validate PNG conversion with various page range formats, including valid and invalid cases.
- Added coverage for page range parsing and error scenarios.
- Updated fixtures to support new tests with a 20-page PDF resource.
Assisted-by: Codex

- Introduced `split_pdf` and `merge_pdfs` methods in both sync and async clients.
- Added `PdfSplitPayload` and `PdfMergePayload` models for payload validation and serialization.
- Created associated types and serializers for page groupings and merge sources.
- Added comprehensive live and unit tests for splitting and merging functionality.
Assisted-by: Codex

- Enhanced `AGENTS.md` with rich type usage, updated live tests, and reproducible fixtures.
Assisted-by: Codex
0e96a20 to 72b2c10
datalogics-cgreen approved these changes on Nov 11, 2025
PDFCLOUD-5351
Summary
This PR introduces a typed Python SDK for pdfRest with synchronous and asynchronous clients, built around declarative Pydantic validation and rich typed payloads. It follows repository guidelines for structure and testing, uses uv + nox for builds and matrix tests, enforces ruff formatting with pyright/mypy type checking, and treats clients as context managers. File operations and endpoints are exposed through clear, fully named helpers, with shared validators for page ranges, output prefixes, MIME types, and other inputs. Core features include uploads (paths, raw files, URLs), downloads/streaming, graphic conversions (PNG and multi-format), metadata queries, redaction workflows, and split/merge services—each with unit and live tests and consistent request customization support. CI is modernized for speed and reliability across Python 3.10–3.14.
Changes
Project bootstrap and workflows
CI workflow and caching improvements
Tooling setup and initial client interface
Linting, Python versions, and test orchestration
Authentication and early uploads
Authorization header correction and docs
- … `Api-Key` header; incremental AGENTS.md updates.
File upload refactors and URL uploads
- … `UploadFiles`), added `create_from_paths`, normalized paths; expanded tests.
- … `create_from_urls` with validation and live coverage.
Downloads/streaming and request headers
- … `wsn`, `User-Agent`) with tests.
Models organization and URL validation
- … `UploadURLs` Pydantic model.
Model utilities, PNG conversion, and file metadata
- … `convert_to_png` (sync/async) with parameter validation and tests.
- … `get` method for retrieving file metadata.
Request customization across endpoints
- … `extra_query`, `extra_headers`, `extra_body`, and `timeout` to all relevant methods with tests.
Multi-format graphic conversions and testing guidance
Parallel testing and timeouts; authoring guidelines
PDF info API and defaults
- … `query_pdf_info` with payload/response models and tests.
- … `ALL_PDF_INFO_QUERIES` as the default; tests updated.
- … `PdfInfoPayload`.
Redaction workflow
- … `preview_redactions`/`apply_redactions` with typed payloads and new redaction types; live/unit tests; AGENTS.md guidance expanded.
CI test tuning
Page-range model refactor and PNG range tests
Split/Merge services
- … `split_pdf` and `merge_pdfs` (sync/async) with typed payloads, serializers, and comprehensive tests.