diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..ece320d
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,80 @@
+# Contributing
+
+Thanks for contributing to `pdfrest`.
+
+## Development setup
+
+1. Install project tooling:
+
+```bash
+uv sync --group dev
+```
+
+2. (Recommended) install git hooks:
+
+```bash
+uv run pre-commit install
+```
+
+3. Verify package import/version:
+
+```bash
+uv run python -c "import pdfrest; print(pdfrest.__version__)"
+```
+
+## Code quality checks
+
+Run these before opening a PR:
+
+```bash
+uv run ruff format .
+uv run ruff check .
+uv run basedpyright
+```
+
+## Tests
+
+Quick local run:
+
+```bash
+uv run pytest -n auto --maxschedchunk 2
+```
+
+Full interpreter matrix with coverage artifacts (`coverage/py/`):
+
+```bash
+uvx nox -s tests
+```
+
+Class/function coverage gate for client classes:
+
+```bash
+uvx nox -s class-coverage
+```
+
+To reuse existing coverage JSON without rerunning tests:
+
+```bash
+uvx nox -s class-coverage -- --no-tests
+```
+
+## Examples
+
+Run all examples:
+
+```bash
+uvx nox -s examples
+```
+
+Run one example:
+
+```bash
+uv run nox -s run-example -- examples/delete/delete_example.py
+```
+
+## Docs preview (optional)
+
+```bash
+uv run mkdocs serve
+uv run mkdocs build --strict
+```
diff --git a/README.md b/README.md
index e838a4d..d3326e8 100644
--- a/README.md
+++ b/README.md
@@ -1,61 +1,141 @@
-# pdfrest
+# pdfRest Python SDK

-Python client library for the PDFRest service. The project is managed with
-[uv](https://docs.astral.sh/uv/) and targets Python 3.9 and newer.
+[![Tests](https://img.shields.io/github/actions/workflow/status/pdfrest/pdfrest-python/test-and-publish.yml?branch=main&label=tests)](https://github.com/pdfrest/pdfrest-python/actions/workflows/test-and-publish.yml)
+[![PyPI Version](https://img.shields.io/pypi/v/pdfrest)](https://pypi.org/project/pdfrest/)
+[![Python Versions](https://img.shields.io/pypi/pyversions/pdfrest)](https://pypi.org/project/pdfrest/)
+[![llms.txt](https://img.shields.io/badge/llms.txt-available-2ea44f)](https://python.pdfrest.com/llms.txt)

-## Running examples
+Build production-grade PDF automation with the official Python SDK for
+[pdfRest](https://pdfrest.com/): a powerful PDF API platform for conversion,
+OCR, extraction, redaction, security, forms, and AI-ready document workflows.

-```bash
-uvx nox -s examples
-uv run nox -s run-example -- examples/delete/delete_example.py
-```
+- Homepage: [pdfrest.com](https://pdfrest.com/)
+- API docs: [pdfrest.com/apidocs](https://pdfrest.com/apidocs/)
+- Python SDK docs: [python.pdfrest.com](https://python.pdfrest.com/)
+- API Lab: [pdfrest.com/apilab](https://pdfrest.com/apilab/)

-## Getting started
+## Why pdfRest

-```bash
-uv sync
-uv run python -c "import pdfrest; print(pdfrest.__version__)"
-```
+- Enterprise PDF quality powered by Adobe PDF Library technology.
+- Fast onboarding with API Lab, code samples, and straightforward REST patterns.
+- Chainable API workflows that let you pass outputs between calls.
+- Deployment flexibility: Cloud, self-hosted on AWS, or self-hosted container.
+- Security and compliance resources published in the trust center and product
+  documentation.
+
+## Why this SDK
+
+- Official typed Python interface to pdfRest (`PdfRestClient` and
+  `AsyncPdfRestClient`).
+- Pydantic-backed request/response models for safer integrations.
+- High-level helpers for the endpoints teams use most in production.
+- Consistent error handling, request customization, and file management helpers.
+
+## What you can build

-## Development
+Use this PDF API for workflows like:

-To install the tooling used by CI locally, include the `--group dev` flag:
+- Convert and transform: PDF to Word/Excel/PowerPoint/images/Markdown, and
+  convert files to PDF/PDF-A/PDF-X.
+- Extract and analyze: OCR, text extraction, image extraction, PDF metadata.
+- Secure and govern: redaction, encryption, permissions, signing, watermarking.
+- Compose and optimize: merge/split, compress, flatten, rasterize, color
+  conversion.
+- Form operations: import/export form data, flatten forms, XFA to Acroforms.
+
+## Built for AI and LLM pipelines
+
+pdfRest is especially useful for document AI systems:
+
+- Convert PDFs to structured Markdown for downstream retrieval and training data
+  prep.
+- Extract clean text and metadata for indexing and chunking pipelines.
+- Summarize and translate document content with API-native operations.
+- Keep multi-step pipelines efficient by chaining outputs between operations.
+
+## Installation
+
+`pdfrest` supports Python `3.10+`.
+
+Recommended (`uv`):

 ```bash
-uv sync --group dev
+uv add pdfrest
 ```

-It is recommended to enable the pre-commit hooks after installation:
+Fallback (`pip`):

 ```bash
-uv run pre-commit install
+pip install pdfrest
 ```

-Run the test suite with:
+## Quick start
+
+Set your API key in `PDFREST_API_KEY`:

 ```bash
-uv run pytest
+export PDFREST_API_KEY="your-api-key"
 ```

-Check per-function coverage for the client classes:
+Run your script:

 ```bash
-uvx nox -s class-coverage
+uv run python your_script.py
 ```

-To reuse an existing `coverage/py/coverage.json` without rerunning
-tests, add `-- --no-tests` (and optional `--coverage-json path`).
+Example (upload + extract text):

-## Documentation
+```python
+from pathlib import Path

-Run the docs site locally:
+from pdfrest import PdfRestClient

-```bash
-uv run mkdocs serve
+with PdfRestClient() as client:
+    uploaded = client.files.create_from_paths([Path("input.pdf")])[0]
+    result = client.extract_pdf_text(uploaded, full_text="document")
+
+preview = ""
+if result.full_text is not None and result.full_text.document_text is not None:
+    preview = result.full_text.document_text[:500]
+print(preview)
 ```

-Build the static documentation site:
+Async example:

-```bash
-uv run mkdocs build --strict
+```python
+import asyncio
+from pathlib import Path
+
+from pdfrest import AsyncPdfRestClient
+
+
+async def main() -> None:
+    async with AsyncPdfRestClient() as client:
+        uploaded = (await client.files.create_from_paths([Path("input.pdf")]))[0]
+        result = await client.extract_pdf_text(uploaded, full_text="document")
+        preview = ""
+        if result.full_text is not None and result.full_text.document_text is not None:
+            preview = result.full_text.document_text[:500]
+        print(preview)
+
+
+asyncio.run(main())
 ```
+
+## Deployment options
+
+- Cloud (default): use `PdfRestClient()` with `PDFREST_API_KEY`.
+- Self-hosted: set `base_url="https://your-api-host"` and keep the same Python
+  SDK surface.
+
+## Learn more
+
+- API toolkit overview: [pdfrest.com](https://pdfrest.com/)
+- Resources and insights:
+  [pdfrest.com/resources](https://pdfrest.com/resources/)
+- Example scripts: `examples/README.md`
+- Python SDK docs: [python.pdfrest.com](https://python.pdfrest.com/)
+
+## For contributors
+
+Contributor workflows live in `CONTRIBUTING.md`.
diff --git a/docs/llms-full.txt b/docs/llms-full.txt
new file mode 100644
index 0000000..f5dc658
--- /dev/null
+++ b/docs/llms-full.txt
@@ -0,0 +1,246 @@
+# pdfRest Python SDK Documentation (Full LLM Guide)
+
+> Expanded machine-readable guide for the `pdfrest` Python SDK documentation,
+> examples, and repository resources.
+
+This file is intended for LLMs, assistants, and automated documentation tools
+that need a richer overview than `llms.txt`. It summarizes the purpose of the
+SDK, the most important docs pages, common usage patterns, and practical
+integration constraints.
+
+The `pdfrest` Python SDK is the official Python client for the pdfRest PDF API.
+It provides typed request/response models, synchronous and asynchronous clients,
+and high-level endpoint helpers for common PDF workflows such as conversion,
+OCR, extraction, redaction, security, forms, and document optimization.
+
+## Canonical Entry Points
+
+- [Short Guide (`llms.txt`)](https://python.pdfrest.com/llms.txt): Minimal,
+  curated index of the most important docs and platform links.
+- [SDK Docs Home](https://python.pdfrest.com/): Landing page for the Python SDK
+  documentation.
+- [GitHub Repository](https://github.com/pdfrest/pdfrest-python): Source code,
+  tests, examples, CI workflows, and contribution guidance.
+- [PyPI Package](https://pypi.org/project/pdfrest/): Install metadata and
+  release history.
+
+## Product and Platform Context
+
+- [pdfRest Homepage](https://pdfrest.com/): Product overview and capabilities.
+- [pdfRest API Docs](https://pdfrest.com/apidocs/): Endpoint-level REST
+  documentation for the platform itself.
+- [API Lab](https://pdfrest.com/apilab/): Interactive API testing and starter
+  code generation.
+- [Product Documentation](https://docs.pdfrest.com/): Broader deployment and
+  platform guidance (Cloud, AWS, Container, and feature docs).
+
+Use the Python SDK docs for Python integration details. Use the platform API
+docs/product docs when you need endpoint semantics, deployment decisions, or
+service-wide behavior outside the Python wrapper.
+
+## Core SDK Concepts
+
+- Sync and async clients:
+  - `PdfRestClient` for synchronous workflows.
+  - `AsyncPdfRestClient` for asynchronous workflows.
+- File-first workflow:
+  - Upload local files first (for example via `client.files.create_from_paths`).
+  - Pass uploaded `PdfRestFile` objects into endpoint helpers.
+  - Download/read/stream outputs via file helpers.
+- Authentication:
+  - Default environment variable is `PDFREST_API_KEY`.
+  - The SDK uses the `Api-Key` header for pdfRest-hosted endpoints.
+- Deployment targeting:
+  - Cloud uses the default base URL.
+  - Self-hosted deployments can be targeted with `base_url=...` while keeping
+    the same Python method calls.
+- Typed interfaces:
+  - Pydantic-backed models validate inputs and expose structured responses.
+  - Public shared type contracts are available under `pdfrest.types`.
+
+## Documentation Pages (Python SDK)
+
+### Getting Started
+
+- [Getting Started](https://python.pdfrest.com/getting-started/)
+- Purpose: First-time setup and initial API call.
+- Includes:
+  - install instructions (`uv`, `pip`, `poetry`)
+  - exporting `PDFREST_API_KEY`
+  - upload + extract-text quickstart example
+  - links to API reference and API Lab
+
+### Client Configuration
+
+- [Client Configuration](https://python.pdfrest.com/client-configuration/)
+- Purpose: Configure runtime behavior and request customization.
+- Typical topics:
+  - API key and base URL
+  - timeouts
+  - extra headers / query / body overrides
+  - transport behavior and request options
+  - logging/debugging
+
+### Using Files
+
+- [Using Files](https://python.pdfrest.com/using-files/)
+- Purpose: Explain the SDK file-management model used by most endpoint helpers.
+- Key workflows:
+  - upload from paths/URLs
+  - read bytes/text/json from results
+  - write downloaded files locally
+  - stream file content
+  - delete remote files
+
+### API Guide
+
+- [API Guide](https://python.pdfrest.com/api-guide/)
+- Purpose: Practical SDK usage guidance across endpoint families.
+- Use this when you want examples and integration patterns rather than raw API
+  signatures.
+
+### API Reference
+
+- [API Reference](https://python.pdfrest.com/api-reference/)
+- Purpose: Generated reference for:
+  - `pdfrest` package exports
+  - public types (`pdfrest.types`)
+  - public models (`pdfrest.models`)
+- Use this for precise names, signatures, and import surfaces.
+
+## Common Workflow Patterns
+
+### Upload then process
+
+Most document-processing calls follow this pattern:
+
+1. Upload source files.
+2. Pass returned `PdfRestFile` objects into a client method.
+3. Use the response to inspect metadata and/or download resulting files.
+
+This file-based flow is central to the SDK and is preferred over passing raw
+file IDs directly in application code.
+
+### Sync usage pattern
+
+- Use `with PdfRestClient() as client:`
+- This ensures the underlying HTTP transport is closed deterministically.
+
+### Async usage pattern
+
+- Use `async with AsyncPdfRestClient() as client:`
+- This ensures async transport cleanup and avoids leaked connections.
+
+### Request customization
+
+Many endpoint helpers support runtime overrides for advanced integration cases:
+
+- `extra_headers`
+- `extra_query`
+- `extra_body`
+- `timeout`
+
+Use these when you need to test server behavior, pass preview features, or
+adjust request behavior without bypassing the typed SDK entirely.
+
+## Endpoint Families (High-Level)
+
+This SDK includes high-level helpers across major pdfRest capabilities. Refer
+to the API guide and API reference for exact method names and options.
+
+### Conversion and Export
+
+- Convert to Office/image/Markdown formats (for example Word, Excel,
+  PowerPoint, PNG/JPEG/TIFF/GIF/BMP, Markdown)
+- Convert source assets to PDF (Office, HTML, URL, email, images, PostScript)
+- Convert to standards formats (`PDF/A`, `PDF/X`)
+
+### Extraction and Analysis
+
+- Extract PDF text
+- Extract text to file outputs
+- Extract images
+- Query PDF information/metadata
+- OCR PDF
+
+### AI and Language Workflows
+
+- Summarize PDF text
+- Translate PDF text
+- Translate/summarize outputs to files (where supported)
+- Markdown conversion for downstream indexing, retrieval, and LLM pipelines
+
+### Security and Governance
+
+- Redaction preview/apply
+- Passwords and permissions
+- Encryption/decryption/restrictions
+- Watermarking (text/image)
+- Digital signing
+
+### Document Composition and Optimization
+
+- Merge PDFs
+- Split PDFs
+- Compress
+- Linearize
+- Rasterize
+- Flatten annotations/layers/transparencies/forms
+- Color conversion
+
+### Forms and Structured Data
+
+- Import form data
+- Export form data
+- Flatten PDF forms
+- Convert XFA to Acroforms
+
+## Public API Surface (Repository Context)
+
+- Top-level package exports include:
+  - `PdfRestClient`
+  - `AsyncPdfRestClient`
+  - file clients (`PdfRestFilesClient`, `AsyncPdfRestFilesClient`)
+  - exception types (for API/auth/request/timeout/transport errors)
+- Shared public types live under:
+  - `pdfrest.types`
+- Public models live under:
+  - `pdfrest.models`
+
+When generating or suggesting code, prefer imports from public modules and
+avoid private/underscored modules.
+
+## Examples and Repository Resources
+
+- [Examples README](https://github.com/pdfrest/pdfrest-python/blob/main/examples/README.md):
+  How to run included examples, including `uv`-based execution.
+- [Examples Directory](https://github.com/pdfrest/pdfrest-python/tree/main/examples):
+  Runnable scripts demonstrating endpoint usage patterns.
+- [Repository README](https://github.com/pdfrest/pdfrest-python/blob/main/README.md):
+  Product-facing overview, install, quickstart, and links.
+- [Contributing Guide](https://github.com/pdfrest/pdfrest-python/blob/main/CONTRIBUTING.md):
+  Developer workflows (lint, type-checking, tests, examples, docs preview).
+
+## Practical Guidance for LLMs and Code Assistants
+
+- Prefer `uv` commands when suggesting modern setup (`uv add pdfrest`,
+  `uv run ...`), but `pip install pdfrest` is a valid fallback.
+- Prefer context-manager usage for clients (`with` / `async with`).
+- For local files:
+  - upload first
+  - pass `PdfRestFile` objects to endpoint methods
+  - use `client.files.*` helpers for IO
+- Use public imports (`pdfrest`, `pdfrest.types`, `pdfrest.models`) unless a
+  task explicitly requires internal test-only serialization details.
+- For endpoint-specific arguments and literals, consult the Python API reference
+  and public type definitions before inventing values.
+
+## Change Sensitivity / Freshness
+
+The SDK evolves over time (new endpoints, new payload options, and expanded test
+coverage). For latest method availability, supported Python versions, and
+release details, prefer:
+
+- [PyPI Package](https://pypi.org/project/pdfrest/)
+- [GitHub Repository](https://github.com/pdfrest/pdfrest-python)
+- [Python SDK Docs](https://python.pdfrest.com/)
diff --git a/docs/llms.txt b/docs/llms.txt
new file mode 100644
index 0000000..fdc4422
--- /dev/null
+++ b/docs/llms.txt
@@ -0,0 +1,46 @@
+# pdfRest Python SDK Documentation
+
+> Official documentation for the `pdfrest` Python SDK, a typed Python client
+> for the pdfRest PDF processing API.
+
+This site documents how to install, configure, and use the Python SDK to call
+pdfRest endpoints for PDF conversion, OCR, extraction, redaction, security,
+forms, and other document workflows.
+
+The SDK supports both synchronous and asynchronous clients and can target
+pdfRest Cloud or self-hosted deployments by changing the client `base_url`.
+
+- [Expanded guide (`llms-full.txt`)](https://python.pdfrest.com/llms-full.txt):
+  Richer machine-readable overview with docs summaries, workflow patterns, and
+  integration guidance.
+
+## Primary Docs
+
+- [Getting Started](https://python.pdfrest.com/getting-started/): Install the
+  SDK, configure `PDFREST_API_KEY`, and make your first API call.
+- [Client Configuration](https://python.pdfrest.com/client-configuration/):
+  Configure API key, base URL, timeouts, logging, and request customization.
+- [Using Files](https://python.pdfrest.com/using-files/): Upload, download,
+  stream, and manage files with the SDK file helpers.
+- [API Guide](https://python.pdfrest.com/api-guide/): Practical guidance for
+  using endpoint helpers and common SDK workflows.
+- [API Reference](https://python.pdfrest.com/api-reference/): Generated
+  reference for the `pdfrest` package, public models, and public types.
+
+## Product and Platform
+
+- [pdfRest Homepage](https://pdfrest.com/): Product overview, capabilities, and
+  platform positioning.
+- [pdfRest API Docs](https://pdfrest.com/apidocs/): Endpoint-level REST API
+  documentation for the pdfRest platform.
+- [API Lab](https://pdfrest.com/apilab/): Interactive environment for testing
+  pdfRest API endpoints and generating starter code.
+- [Product Docs](https://docs.pdfrest.com/): Broader pdfRest documentation,
+  including deployment options and platform guides.
+
+## Repository
+
+- [GitHub Repository](https://github.com/pdfrest/pdfrest-python): Source code,
+  examples, tests, CI workflows, and contribution guidelines for this SDK.
+- [PyPI Package](https://pypi.org/project/pdfrest/): Package distribution,
+  release history, and install metadata for `pdfrest`.
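Review note on the README quickstart added in this diff: the sync and async examples both repeat the same `None` checks before slicing a 500-character preview. One way to factor that guard into a single helper is sketched below. This is only an illustration under stated assumptions: `FullText`, `ExtractResult`, and `preview_text` are hypothetical stand-ins invented here, not pdfrest classes or functions, and only the `full_text` and `document_text` attribute names are taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FullText:
    """Hypothetical stand-in for the object behind `result.full_text`."""

    document_text: Optional[str] = None


@dataclass
class ExtractResult:
    """Hypothetical stand-in for the extract-text response shown in the README."""

    full_text: Optional[FullText] = None


def preview_text(result: ExtractResult, limit: int = 500) -> str:
    """Return at most `limit` characters of document text, or "" if none exists."""
    # Same guard as the README quickstart: either level may be None.
    if result.full_text is None or result.full_text.document_text is None:
        return ""
    return result.full_text.document_text[:limit]


if __name__ == "__main__":
    sample = ExtractResult(full_text=FullText(document_text="Hello, pdfRest!"))
    print(preview_text(sample, limit=5))
```

Because the helper only reads two attributes, the object returned by `client.extract_pdf_text(...)` in the quickstart could presumably be passed in place of the stand-in, provided it exposes the same attribute names.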
diff --git a/mkdocs.yml b/mkdocs.yml
index f74272e..f562764 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,5 +1,5 @@
 site_name: pdfrest
-site_description: Python client library for interacting with the PDFRest API
+site_description: Python client library for interacting with the pdfRest API
 repo_url: https://github.com/pdfrest/pdfrest-python
 docs_dir: docs
 site_dir: site
diff --git a/pyproject.toml b/pyproject.toml
index b7888e3..2a2edc0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
 [project]
 name = "pdfrest"
-version = "1.0.1"
-description = "Python client library for interacting with the PDFRest API"
+version = "1.0.2"
+description = "Python client library for interacting with the pdfRest API"
 readme = {file = "README.md", content-type = "text/markdown"}
 authors = [
     {name = "Datalogics"},
diff --git a/uv.lock b/uv.lock
index e889725..5ee282c 100644
--- a/uv.lock
+++ b/uv.lock
@@ -961,7 +961,7 @@ wheels = [

 [[package]]
 name = "pdfrest"
-version = "1.0.1"
+version = "1.0.2"
 source = { editable = "." }
 dependencies = [
     { name = "exceptiongroup" },