
PDFCLOUD-5351 Initial Python API for pdfRest #1

Merged
datalogics-cgreen merged 51 commits into pdfrest:main from datalogics-kam:pdfcloud-1340-initial-api on Nov 11, 2025


@datalogics-kam commented on Oct 2, 2025

PDFCLOUD-5351

Summary

This PR introduces a typed Python SDK for pdfRest with synchronous and asynchronous clients, built on declarative Pydantic validation and richly typed payloads. It follows the repository guidelines for structure and testing, uses uv and nox for builds and matrix tests, enforces ruff formatting with pyright and mypy type checking, and makes both clients usable as context managers. File operations and endpoints are exposed through explicitly named helpers, with shared validators for page ranges, output prefixes, MIME types, and other inputs. Core features include uploads (paths, raw files, URLs), downloads and streaming, graphic conversions (PNG and other formats), metadata queries, redaction workflows, and split/merge services; each has unit and live tests and supports the same request customization options. CI is modernized for speed and reliability and tests Python 3.10 through 3.14.

Changes

  • Project bootstrap and workflows

    • Initial scaffold, repo configuration, and AGENTS.md added; GitHub Workflows adapted.
  • CI workflow and caching improvements

    • Fixed pre-commit workflow caching and uv-related cache keys and options.
  • Tooling setup and initial client interface

    • Updated pre-commit hooks and added uv lock check; added pyright alongside mypy; modernized publish workflow; brought in pydantic/httpx and pytest helpers.
    • First cut of typed sync/async clients with an “up” endpoint, request validation, and unified error handling.
  • Linting, Python versions, and test orchestration

    • Switched to ruff formatting; dropped Python 3.9, added 3.14 (support 3.10–3.14); introduced nox-based multi-version testing; aligned pyright/mypy settings.
  • Authentication and early uploads

    • API key validation and improved /up options with dedicated auth errors; CI variable for API key; initial WIP for file uploads.
  • Authorization header correction and docs

    • Standardized on the Api-Key header; incremental AGENTS.md updates.
  • File upload refactors and URL uploads

    • Unified upload handling (UploadFiles), added create_from_paths, normalized paths; expanded tests.
    • Tests use context-manager clients.
    • Added create_from_urls with validation and live coverage.
  • Downloads/streaming and request headers

    • Added download helpers and streaming (sync/async).
    • Injected SDK version and headers (wsn, User-Agent) with tests.
  • Models organization and URL validation

    • Converted models to a package.
    • Replaced ad-hoc URL normalization with UploadURLs Pydantic model.
  • Model utilities, PNG conversion, and file metadata

    • Added validation utilities and response types.
    • Implemented convert_to_png (sync/async) with parameter validation and tests.
    • Added a get method for retrieving file metadata.
  • Request customization across endpoints

    • Added extra_query, extra_headers, extra_body, and timeout to all relevant methods with tests.
  • Multi-format graphic conversions and testing guidance

    • Implemented BMP/GIF/JPEG/PNG/TIFF conversions via a shared helper; comprehensive tests for options and bounds.
  • Parallel testing and timeouts; authoring guidelines

    • Enabled nox parallelism; increased default read timeout for long pdfRest operations; expanded AGENTS.md usage/testing guidance.
  • PDF info API and defaults

    • Added query_pdf_info with payload/response models and tests.
    • Introduced ALL_PDF_INFO_QUERIES as the default; tests updated.
    • Enforced PDF MIME-type validation in PdfInfoPayload.
  • Redaction workflow

    • Added preview_redactions/apply_redactions with typed payloads and new redaction types; live/unit tests; AGENTS.md guidance expanded.
  • CI test tuning

    • The noxfile now forwards extra arguments to pytest and has a --no-parallel mode; CI runs tests without parallelism.
    • Limited live resolution-bound tests to PNG to reduce runtime.
  • Page-range model refactor and PNG range tests

    • Replaced bespoke parsing with Pydantic page-range models and serializers.
    • Added thorough PNG conversion tests for page-range edge cases.
  • Split/Merge services

    • Implemented split_pdf and merge_pdfs (sync/async) with typed payloads, serializers, and comprehensive tests.
    • Updated AGENTS.md with split/merge guidance and improved live test notes.
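The `UploadURLs` model itself is not shown in this PR. As a rough, stdlib-only sketch of the kind of check that replaces ad-hoc URL normalization, the hypothetical `normalize_upload_urls` below accepts one URL or a sequence, requires an http(s) scheme and a host, and rejects empty input. The allowed schemes and the use of `ValueError` are assumptions, not the SDK's documented behavior.

```python
from __future__ import annotations

from collections.abc import Sequence
from urllib.parse import urlsplit

# Hypothetical stand-in: the SDK uses an UploadURLs Pydantic model instead.
ALLOWED_SCHEMES = {"http", "https"}  # assumption about what pdfRest accepts


def normalize_upload_urls(urls: str | Sequence[str]) -> list[str]:
    """Return a list of validated URLs, accepting one URL or a sequence."""
    items = [urls] if isinstance(urls, str) else list(urls)
    if not items:
        raise ValueError("at least one URL is required")
    normalized: list[str] = []
    for raw in items:
        url = raw.strip()
        parts = urlsplit(url)
        if parts.scheme.lower() not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported URL scheme in {raw!r}")
        if not parts.netloc:
            raise ValueError(f"URL has no host: {raw!r}")
        normalized.append(url)
    return normalized
```

Centralizing this in one model means `create_from_urls` in both the sync and async clients rejects bad input before any request is sent.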

@datalogics-kam marked this pull request as draft on October 2, 2025 17:47
@datalogics-kam force-pushed the pdfcloud-1340-initial-api branch 7 times, most recently from dd6aa63 to 5e8b5eb on October 9, 2025 18:36
@datalogics-kam force-pushed the pdfcloud-1340-initial-api branch 2 times, most recently from d16f9e0 to 82264fb on October 10, 2025 20:34
@datalogics-kam force-pushed the pdfcloud-1340-initial-api branch from 82264fb to 0a2a144 on October 27, 2025 21:06
- Introduces typed sync and async clients for the pdfRest API with configurable
  base URL, API key (with env var fallback), custom headers, and timeouts.
- Adds centralized, library-specific exceptions, translating HTTP layer timeout,
  transport, and request errors, and wrapping non-2xx API responses with status
  and message details.
- Validates and normalizes requests using internal Pydantic models, including
  header/param coercion, endpoint shape checks, and timeout handling.
- Implements an “up” endpoint method in both clients that returns a validated
  health/metadata response model.
- Updates the package exports to expose clients, request option types, response
  model, and error helpers.

Assisted-by: Codex
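The commit above wraps non-2xx API responses in library-specific exceptions that carry the status and message. A stdlib-only sketch of that translation step follows. `PdfRestAuthenticationError` is named elsewhere in this PR, but the base class `PdfRestApiError`, the `raise_for_status` helper, treating 401/403 as authentication failures, and reading the message from an `"error"` JSON key are all assumptions for illustration.

```python
from __future__ import annotations

import json


class PdfRestApiError(Exception):
    """Hypothetical base error for non-2xx pdfRest responses."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


class PdfRestAuthenticationError(PdfRestApiError):
    """Authentication failure (the status codes below are an assumption)."""


AUTH_STATUS_CODES = {401, 403}  # assumption


def raise_for_status(status_code: int, body: str) -> None:
    """Translate a non-2xx response into a typed exception; 2xx returns None."""
    if 200 <= status_code < 300:
        return
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    # Assumed response shape: {"error": "..."}; fall back to the raw body.
    if isinstance(payload, dict) and "error" in payload:
        message = str(payload["error"])
    else:
        message = body
    if status_code in AUTH_STATUS_CODES:
        raise PdfRestAuthenticationError(status_code, message)
    raise PdfRestApiError(status_code, message)
```

Subclassing lets callers catch every API failure with one `except` clause while still handling bad credentials separately.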
- Python 3.9 reaches end of life with its last security release in October 2025.

- Python 3.14 is newly released.

- Change version ranges to support Python 3.10-3.14

- Use uv-build up to 0.10.0.

- Test Python 3.14 in CI
- nox is a modern, Python-configured replacement for tox

- Use nox in the GitHub Workflow as well

- Update msgpack (only); the new version ships prebuilt wheels for
  Python 3.14.
- Implemented UUID format validation for API keys.
- Enhanced the `/up` client methods to accept additional headers,
  query parameters, and timeouts.
- Raised `PdfRestAuthenticationError` for authentication failures.
- Refactored request composition for improved reuse and readability.
- Updated tests to cover new validations and functionalities.

Assisted-by: Codex
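The commit validates that API keys are UUID-formatted. A minimal stdlib sketch of such a check is shown below, assuming the canonical hyphenated 8-4-4-4-12 form is required; the SDK's exact rule and error type are not shown in this PR, and `validate_api_key` is a hypothetical name. It parses with `uuid.UUID` and then compares against the canonical string, because `uuid.UUID` alone also accepts braces, a `urn:uuid:` prefix, and unhyphenated hex.

```python
from __future__ import annotations

import uuid


def validate_api_key(api_key: str) -> str:
    """Return the key unchanged if it is a hyphenated UUID, else raise ValueError.

    Hypothetical helper; the SDK performs this check inside its own models.
    """
    try:
        parsed = uuid.UUID(api_key)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError("API key must be a UUID") from exc
    # str(UUID) is always lowercase 8-4-4-4-12, so this rejects other spellings.
    if str(parsed) != api_key.lower():
        raise ValueError("API key must be in 8-4-4-4-12 hyphenated form")
    return api_key
```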
- Introduced `extra_query`, `extra_headers`, `extra_body`, and `timeout`
  parameters across synchronous and asynchronous methods for enhanced
  flexibility in API integrations.
- Added a validation check to prevent combining JSON payloads with
  multipart file uploads.
- Updated client methods for file handling, downloads, and conversions
  to include optional request customizations.
- Extended tests to cover new request customization capabilities and
  ensure proper handling of custom parameters in both sync and async
  clients.

Assisted-by: Codex
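The `extra_*` parameters layer caller-supplied values over whatever each method builds. The sketch below shows one plausible merge order (caller values win) together with the guard against combining a JSON body with multipart files. Only the parameter names `extra_query`, `extra_headers`, `extra_body`, and `timeout` come from this PR; `RequestParts` and `build_request` are hypothetical, and this sketch treats any `extra_body` on a multipart request as a JSON body, which may be stricter than the SDK.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RequestParts:
    """Hypothetical container for the pieces of one outgoing request."""

    params: dict[str, Any] = field(default_factory=dict)
    headers: dict[str, str] = field(default_factory=dict)
    json: dict[str, Any] | None = None
    files: dict[str, Any] | None = None
    timeout: float | None = None


def build_request(
    base: RequestParts,
    *,
    extra_query: Mapping[str, Any] | None = None,
    extra_headers: Mapping[str, str] | None = None,
    extra_body: Mapping[str, Any] | None = None,
    timeout: float | None = None,
) -> RequestParts:
    """Merge per-call customizations into a new request without mutating base."""
    body = dict(base.json) if base.json is not None else None
    if extra_body:
        body = {**(body or {}), **extra_body}
    if body is not None and base.files:
        # A multipart upload cannot also carry a JSON body.
        raise ValueError("cannot combine a JSON body with multipart file uploads")
    return RequestParts(
        params={**base.params, **(extra_query or {})},
        headers={**base.headers, **(extra_headers or {})},
        json=body,
        files=base.files,
        timeout=timeout if timeout is not None else base.timeout,
    )
```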
@datalogics-kam force-pushed the pdfcloud-1340-initial-api branch 2 times, most recently from c4d6864 to 3eac74b on November 5, 2025 20:44
- Implemented methods to convert PDF files to BMP, GIF, JPEG, PNG, and
  TIFF formats in both synchronous and asynchronous clients.
- Added a centralized `convert_to_graphic` method so that all conversion
  methods share the same logic.
- Added payload models for each graphic format to ensure parameter
  validation and API compatibility.
- Extended tests to verify conversion logic, including live integration
  tests for color models, resolution bounds, and smoothing options.

Assisted-by: Codex
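Routing every format through one `convert_to_graphic` helper keeps the per-format methods thin. The sketch below illustrates that shape with a stand-in `post` callable instead of a real HTTP client. The endpoint paths, the `"id"` payload key, the `resolution` check, and the `GraphicConverter` class are hypothetical and are not taken from the SDK or the pdfRest API reference.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any

# Hypothetical endpoint paths; check the pdfRest API reference for the real ones.
GRAPHIC_ENDPOINTS = {
    "bmp": "/bmp",
    "gif": "/gif",
    "jpeg": "/jpg",
    "png": "/png",
    "tiff": "/tif",
}

PostFn = Callable[[str, dict[str, Any]], dict[str, Any]]


class GraphicConverter:
    """Illustrative sketch: one shared helper, thin per-format wrappers."""

    def __init__(self, post: PostFn) -> None:
        self._post = post  # stand-in for the client's HTTP POST

    def convert_to_graphic(self, fmt: str, file_id: str, **options: Any) -> dict[str, Any]:
        # Shared validation runs once here instead of in every wrapper.
        if fmt not in GRAPHIC_ENDPOINTS:
            raise ValueError(f"unsupported graphic format: {fmt!r}")
        resolution = options.get("resolution")
        if resolution is not None and (not isinstance(resolution, int) or resolution <= 0):
            raise ValueError("resolution must be a positive integer")
        payload = {"id": file_id, **options}
        return self._post(GRAPHIC_ENDPOINTS[fmt], payload)

    def convert_to_png(self, file_id: str, **options: Any) -> dict[str, Any]:
        return self.convert_to_graphic("png", file_id, **options)

    def convert_to_jpeg(self, file_id: str, **options: Any) -> dict[str, Any]:
        return self.convert_to_graphic("jpeg", file_id, **options)
```

Injecting `post` also makes the helper easy to unit-test without network access.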
- The default timeout remains 10 seconds, except for reads, which now
  allow 120 seconds because pdfRest can take longer to process a file.
- Timeout is configurable by the customer both at the client and
  individual call level.

Assisted-by: Codex
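The timeout rules above (10 seconds by default, 120 seconds for reads, overridable per client and per call) can be sketched without any HTTP library. Assuming the clients sit on httpx, as the tooling notes suggest, the same default could be written as `httpx.Timeout(10.0, read=120.0)`. The `TimeoutConfig` dataclass and `resolve_timeout` function below are hypothetical stand-ins, and the precedence order (call, then client, then default) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TimeoutConfig:
    """Per-phase timeouts in seconds (hypothetical stand-in for httpx.Timeout)."""

    connect: float = 10.0
    read: float = 120.0  # longer, because pdfRest may take a while to process a file
    write: float = 10.0
    pool: float = 10.0


DEFAULT_TIMEOUT = TimeoutConfig()


def resolve_timeout(
    client_timeout: TimeoutConfig | float | None,
    call_timeout: TimeoutConfig | float | None,
) -> TimeoutConfig:
    """Per-call setting overrides the client setting, which overrides the default.

    A bare number applies to every phase, like a scalar httpx timeout.
    """
    chosen = call_timeout if call_timeout is not None else client_timeout
    if chosen is None:
        return DEFAULT_TIMEOUT
    if isinstance(chosen, TimeoutConfig):
        return chosen
    seconds = float(chosen)
    return TimeoutConfig(connect=seconds, read=seconds, write=seconds, pool=seconds)
```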
- Added details on pytest parallel execution and scheduling.
- Documented context manager usage for sync/async clients.
- Expanded coding style with endpoint-specific validation, payload
  models, and method naming conventions.
- Provided detailed testing instructions, including parameterized tests,
  live tests, and validation checks.
- Clarified integration guidelines for new endpoints and shared
  validation suites.

Assisted-by: Codex
@datalogics-kam force-pushed the pdfcloud-1340-initial-api branch from 45fc760 to a71b0ce on November 6, 2025 18:48
- Implemented `query_pdf_info` method in both synchronous and
  asynchronous clients to retrieve metadata about PDF documents.
- Introduced `PdfInfoPayload` and `PdfRestInfoResponse` models for
  payload validation and response handling.
- Added `PdfInfoQuery` type and extended it with various metadata fields
  supported by the pdfRest API.
- Expanded test suite with unit, live, and async tests for the new
  endpoint.
- Enhanced validation utilities to handle query inputs and sequence
  normalization.
- Introduced `ALL_PDF_INFO_QUERIES` constant for predefined query sets.
- Updated synchronous and asynchronous `query_pdf_info` methods to use
  `ALL_PDF_INFO_QUERIES` as the default value for `queries`.
- Adjusted type definitions and imports accordingly.
- Updated tests to cover new defaults.

Assisted-by: Codex
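Using `ALL_PDF_INFO_QUERIES` as the default for `queries` means callers get every supported field unless they narrow the request. A stdlib sketch of that default plus the sequence normalization described above follows. The three query names are placeholders rather than the SDK's full list, the comma-joined output is an assumed wire format, and `normalize_queries` is a hypothetical helper.

```python
from __future__ import annotations

from collections.abc import Sequence

# Placeholder subset; the SDK's ALL_PDF_INFO_QUERIES covers every supported query.
ALL_PDF_INFO_QUERIES: tuple[str, ...] = ("page_count", "title", "author")


def normalize_queries(queries: str | Sequence[str] = ALL_PDF_INFO_QUERIES) -> str:
    """Validate queries, drop duplicates while keeping order, and join with commas."""
    items = [queries] if isinstance(queries, str) else list(queries)
    if not items:
        raise ValueError("at least one query is required")
    unknown = [q for q in items if q not in ALL_PDF_INFO_QUERIES]
    if unknown:
        raise ValueError(f"unsupported queries: {unknown}")
    # dict.fromkeys preserves first-seen order while removing repeats.
    return ",".join(dict.fromkeys(items))
```

A tuple is used for the default so the shared constant cannot be mutated by a caller.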
@datalogics-kam force-pushed the pdfcloud-1340-initial-api branch from af5954e to 0e96a20 on November 10, 2025 19:42
@datalogics-kam marked this pull request as ready for review on November 10, 2025 19:42
- Introduced `preview_redactions` and `apply_redactions` methods for
  generating redaction previews and applying redactions in both sync and
  async clients.
- Added `PdfRedactionPreviewPayload` and `PdfRedactionApplyPayload`
  models for payload validation.
- Implemented new redaction-related types: `PdfRedactionInstruction`,
  `PdfRedactionPreset`, `PdfRedactionType`, and `PdfRGBColor`.
- Centralized file operation handling via `_post_file_operation` for
  shared logic across clients.

Assisted-by: Codex
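The redaction commit adds types such as `PdfRedactionInstruction` and `PdfRGBColor`, but their fields are not listed here. The dataclasses below are a hypothetical stdlib approximation: an RGB color limited to 0 to 255 per channel, and an instruction whose `type` is one of literal, regex, or preset with a non-empty `value`. Those field names, the allowed types, and the JSON array produced by `to_json` are assumptions, not the SDK's or pdfRest's documented schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Literal

RedactionType = Literal["literal", "regex", "preset"]  # assumed values
_ALLOWED_TYPES = {"literal", "regex", "preset"}


@dataclass(frozen=True)
class RGBColor:
    """Hypothetical fill color; each channel must be an int in 0..255."""

    red: int
    green: int
    blue: int

    def __post_init__(self) -> None:
        for channel in (self.red, self.green, self.blue):
            if not isinstance(channel, int) or not 0 <= channel <= 255:
                raise ValueError(f"color channel out of range: {channel!r}")


@dataclass(frozen=True)
class RedactionInstruction:
    """Hypothetical single redaction rule."""

    type: RedactionType
    value: str

    def __post_init__(self) -> None:
        if self.type not in _ALLOWED_TYPES:
            raise ValueError(f"unknown redaction type: {self.type!r}")
        if not self.value:
            raise ValueError("redaction value must not be empty")


def to_json(instructions: list[RedactionInstruction]) -> str:
    """Serialize instructions as a JSON array (assumed wire format)."""
    return json.dumps([asdict(item) for item in instructions])
```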
- Introduced live tests for `preview_redactions` and `apply_redactions`
  to validate redaction workflows.
- Added unit tests for `PdfRedactionPreviewPayload` and
  `PdfRedactionApplyPayload` models, ensuring accurate payload
  validation.
- Verified handling of invalid inputs, including incorrect redaction
  instructions and color values.
- Updated test fixtures and utilities for live and isolated testing
  scenarios.

Assisted-by: Codex
- Enhanced AGENTS.md with redaction-related guidelines and reusable
  patterns for validation and serialization.
- Pass positional arguments through to pytest

- `--no-parallel` disables parallel test execution

- Handle `-n` manually and add `--maxschedchunk`
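The noxfile changes above forward positional arguments to pytest, add a `--no-parallel` escape hatch, and handle `-n` manually so a `--maxschedchunk` value can accompany it. The pure function below sketches that argument handling so it can be tested without nox. `-n` and `--maxschedchunk` are pytest-xdist options; the function name `build_pytest_args`, the default worker count `auto`, and the chunk size of 1 are assumptions.

```python
from __future__ import annotations


def build_pytest_args(posargs: list[str]) -> list[str]:
    """Translate nox session posargs into pytest arguments (hypothetical sketch).

    - "--no-parallel" is consumed here and disables pytest-xdist entirely.
    - A caller-supplied "-n"/"--numprocesses" is respected as-is.
    - Otherwise "-n auto" is prepended, and "--maxschedchunk" is added if absent.
    """
    args = [arg for arg in posargs if arg != "--no-parallel"]
    if "--no-parallel" in posargs:
        return args
    has_workers = any(
        arg in ("-n", "--numprocesses")
        or arg.startswith(("-n=", "--numprocesses="))
        or (arg.startswith("-n") and arg[2:].isdigit())
        for arg in args
    )
    if not has_workers:
        args = ["-n", "auto", *args]
    if not any(arg.startswith("--maxschedchunk") for arg in args):
        args.append("--maxschedchunk=1")  # assumed chunk size
    return args
```

In a noxfile, the result would be passed to `session.run("pytest", *build_pytest_args(session.posargs))`.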
- Test only one graphic type (PNG) to save testing time; all graphic
  types share the same limits.
- Use Pydantic models to decompose and validate page ranges rather than
  longer bespoke parsing code.
- Simplified page range handling by removing `_require_positive_page`
  and `_validate_page_range_entry` in favor of new `AscendingPageRange`
  and related validators.
- Replaced `PageRangeEntry` with `AscendingPageRange` in `BasePdfRestGraphicPayload`.
- Enhanced serialization logic with `_serialize_page_ranges`.
- Updated supporting type definitions and removed redundant code.

Assisted-by: Codex
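The page-range refactor replaces bespoke parsing with models such as `AscendingPageRange` and a `_serialize_page_ranges` serializer. The sketch below approximates the idea with a frozen dataclass: each range must be positive and ascending, and a mixed list of ints, strings, and ranges serializes to a comma-separated string. `PageRange`, `serialize_page_ranges`, and the `"start-end"` format are assumptions; pdfRest's page-range syntax may include forms (such as keywords) that this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PageRange:
    """Hypothetical ascending page range; a single page has start == end."""

    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start < 1:
            raise ValueError("page numbers start at 1")
        if self.end < self.start:
            raise ValueError(f"range must be ascending: {self.start}-{self.end}")

    @classmethod
    def parse(cls, text: str) -> PageRange:
        """Parse "7" or "2-5"; int() raises ValueError on anything else."""
        first, sep, last = text.strip().partition("-")
        start = int(first)
        return cls(start, int(last) if sep else start)

    def serialize(self) -> str:
        return str(self.start) if self.start == self.end else f"{self.start}-{self.end}"


def serialize_page_ranges(ranges: list[PageRange | int | str]) -> str:
    """Normalize mixed inputs into validated ranges and join them with commas."""
    parts: list[str] = []
    for item in ranges:
        if isinstance(item, PageRange):
            page_range = item
        elif isinstance(item, int):
            page_range = PageRange(item, item)
        else:
            page_range = PageRange.parse(item)
        parts.append(page_range.serialize())
    return ",".join(parts)
```

Putting the checks in `__post_init__` means every construction path, including parsing, is validated the same way.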
- More thoroughly test page ranges in light of new changes.
- Introduced new live tests to validate PNG conversion with various page
  range formats, including valid and invalid cases.
- Added coverage for page range parsing and error scenarios.
- Updated fixtures to support new tests with a 20-page PDF resource.

Assisted-by: Codex
- Introduced `split_pdf` and `merge_pdfs` methods in both sync and async
  clients.
- Added `PdfSplitPayload` and `PdfMergePayload` models for payload
  validation and serialization.
- Created associated types and serializers for page groupings and merge
  sources.
- Added comprehensive live and unit tests for splitting and merging
  functionality.

Assisted-by: Codex
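For `merge_pdfs`, each merge source pairs an uploaded file with the pages to take from it. The sketch below shows one way a serializer could flatten such sources into parallel lists while preserving merge order. The `MergeSource` type, the `id`/`pages` keys, the `"all"` sentinel, and the two-source minimum are hypothetical and should be checked against the SDK's `PdfMergePayload` and the pdfRest merge documentation.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeSource:
    """Hypothetical merge input: an uploaded file ID plus an optional page selection."""

    file_id: str
    pages: str | None = None  # e.g. "1-3"; None means the whole document


def serialize_merge_sources(sources: Sequence[MergeSource | str]) -> dict[str, list[str]]:
    """Flatten sources into parallel lists, preserving merge order.

    Bare strings are treated as file IDs that contribute every page.
    """
    if len(sources) < 2:
        raise ValueError("merging requires at least two sources")  # assumed rule
    ids: list[str] = []
    pages: list[str] = []
    for source in sources:
        if isinstance(source, str):
            source = MergeSource(source)
        ids.append(source.file_id)
        pages.append(source.pages or "all")  # "all" is an assumed sentinel
    return {"id": ids, "pages": pages}
```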
- Enhanced `AGENTS.md` with rich type usage, updated live tests, and
  reproducible fixtures.

Assisted-by: Codex
@datalogics-cgreen merged commit 16881dc into pdfrest:main on Nov 11, 2025
13 of 14 checks passed
@datalogics-kam deleted the pdfcloud-1340-initial-api branch on February 6, 2026 20:40