PDFCLOUD-5464 Add additional pdfRest tools #6

Merged
datalogics-kam merged 84 commits into pdfrest:main from datalogics-cgreen:pdfcloud-5464-add-tools
Feb 5, 2026

@datalogics-cgreen commented Dec 18, 2025

PDFCLOUD-5464

Adds the following tools:

  • Linearize PDF
  • Summarize PDF
  • Translate PDF
  • Extract Images
  • Extract Text
  • Convert to Markdown
  • OCR PDF
  • Convert to Excel
  • Convert to PowerPoint
  • Convert XFA Forms
  • Flatten Transparencies
  • Rasterize PDF
  • Flatten Annotations
  • Convert to PDF/A

Comment thread src/pdfrest/client.py
Comment thread src/pdfrest/models/_internal.py
Comment thread src/pdfrest/client.py Outdated
Comment thread src/pdfrest/models/public.py Outdated
Comment thread src/pdfrest/client.py Outdated
Comment thread src/pdfrest/models/public.py Outdated
Comment thread src/pdfrest/client.py Outdated
Comment thread src/pdfrest/client.py Outdated
Comment thread src/pdfrest/models/public.py Outdated
Comment thread src/pdfrest/client.py
@datalogics-cgreen changed the title from "PDFCLOUD-5464 add tools" to "PDFCLOUD-5464 Add additional pdfRest tools" on Dec 19, 2025

@datalogics-cgreen commented Dec 20, 2025

Tests are currently failing only because an expected patch, which returns inputId when a warning is returned, has not yet reached pdfRest.

EDIT: Or, we could add a test file with forms. I GUESS. 🤤

Comment thread src/pdfrest/models/_internal.py Outdated
Comment thread src/pdfrest/client.py
Comment thread src/pdfrest/client.py Outdated
Comment thread src/pdfrest/models/public.py Outdated
Comment thread src/pdfrest/models/public.py Outdated
Comment thread src/pdfrest/models/public.py Outdated
Comment thread src/pdfrest/models/public.py Outdated
@datalogics-kam commented

I had Codex review the tests against the (slightly) updated version of TESTING_GUIDELINES.md from #7 and #8, and it came up with the following. Please correct these concerns insofar as they refer to new work in this PR (idiosyncrasies like where the PNG live tests are can be handled separately).

  • Unit suites generally follow the hermetic contract: e.g. tests/test_convert_to_png.py:24-241 clears PDFREST_API_KEY, drives httpx.MockTransport, asserts serialized payloads, and mirrors every scenario with @pytest.mark.asyncio counterparts plus request-customization/validation cases, matching the “Core Principles” and “Request Customization” sections of TESTING_GUIDELINES.md.
  • Live PNG coverage still lives inside the unit module (tests/test_convert_to_png.py:582-613) instead of in a dedicated tests/live/ module, which violates the “Live Test Requirement” bullet that mandates one module per endpoint under tests/live/. (This can be fixed separately—kam)
  • Live suites that only exercise the sync client remain for 12 endpoints (tests/live/test_live_convert_to_markdown.py:5-44, test_live_convert_to_pdfx.py:5-54, test_live_convert_to_word.py:5-44, test_live_extract_images.py:5-42, test_live_extract_text.py:5-49, test_live_flatten_pdf_forms.py:5-78, test_live_graphic_conversions.py:111-330, test_live_linearize_pdf.py:5-62, test_live_ocr_pdf.py:5-111, test_live_pdf_redactions.py:5-169, test_live_summarize_pdf_text.py:5-77, test_live_translate_pdf_text.py:5-76). TESTING_GUIDELINES.md explicitly requires “matching sync and async tests for every live module,” so async coverage and fixtures need to be added to each.
  • Many live acceptance tests assert only that response.output_files or response.markdown is truthy (e.g. tests/live/test_live_graphic_conversions.py:124-227, tests/live/test_live_extract_images.py:21-25, tests/live/test_live_extract_text.py:21-33). The guideline “Make assertions for every relevant response attribute” isn’t being met—these suites should validate filenames, MIME types, sizes, warnings, and input_id echoes, similar to the richer PNG page-range checks in the same file. (The tests don't have to go crazy, but the number of files and their mime types would be good, and again, existing endpoints don't have to get fixed in this PR—kam)
  • Across the live suites, pytest.raises is frequently used without a match= argument (e.g. tests/live/test_live_convert_to_powerpoint.py:62-75, tests/live/test_live_extract_images.py:28-41, tests/live/test_live_convert_to_markdown.py:31-44, tests/live/test_live_graphic_conversions.py:166-314). TESTING_GUIDELINES.md requires pairing every pytest.raises with a regex to lock down server error wording, so these need explicit match= expressions.

Next steps: 1) Move the PNG live cases into tests/live/test_live_convert_to_png.py (or similar) and remove the duplicates from the unit module. 2) Add AsyncPdfRestClient coverage to the 12 outstanding live modules. 3) Strengthen live assertions to check names, MIME types, sizes, warnings, and input_id. 4) Add descriptive match= patterns to every pytest.raises in the live suites.
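
To illustrate the `match=` point above, here is a minimal sketch of why an unpinned `pytest.raises` is weaker than one with a pattern. It uses only the standard library so it runs anywhere; `FakeApiError`, `fake_convert`, and the error wording are invented for illustration and are not taken from the pdfRest client or server. pytest applies `match=` with `re.search` against the string form of the exception, which is what the final assertion mimics.

```python
import re


class FakeApiError(Exception):
    """Hypothetical stand-in for the error type a client might raise."""


def fake_convert(resolution: int) -> None:
    # Hypothetical validation; the bounds and wording are invented.
    if not 12 <= resolution <= 2400:
        raise FakeApiError(
            f"resolution must be between 12 and 2400, got {resolution}"
        )


def raised_message(func, *args) -> str:
    """Call func and return the message of the FakeApiError it raises."""
    try:
        func(*args)
    except FakeApiError as exc:
        return str(exc)
    raise AssertionError("expected FakeApiError")


message = raised_message(fake_convert, 5000)

# Without a pattern, any FakeApiError passes and the wording goes unchecked.
# With a pattern, the test fails if the message drifts. In pytest this reads:
#     with pytest.raises(FakeApiError, match=r"resolution must be between \d+ and \d+"):
#         fake_convert(5000)
assert re.search(r"resolution must be between \d+ and \d+", message)
```

A loose pattern like the one above tolerates changes to the numeric bounds while still catching a change in the error's overall wording.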

@datalogics-kam commented

The XFA tests are failing because there's a warning and no output: 'No XFA forms were detected in the input PDF. No output was produced.'

Probably the test should include a file with XFA forms (and since this is a public repo, it has to be something that is rights-cleared, btw).

That said, #7 also relaxes the strictness a little so that an output file is no longer required. The tests in that module happen to cover both cases: returning a file, and returning just a warning.
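
A rough sketch of what "both ways" could look like as a single check, assuming a response object that carries a list of output files and an optional warning. `FakeFileResponse` and `check_xfa_response` are hypothetical names for illustration, not the models in this repo; only the warning text is quoted from the failure above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Warning text quoted from the failing XFA test run.
XFA_WARNING = (
    "No XFA forms were detected in the input PDF. No output was produced."
)


@dataclass
class FakeFileResponse:
    """Hypothetical response shape: output files plus an optional warning."""

    output_files: List[str] = field(default_factory=list)
    warning: Optional[str] = None


def check_xfa_response(response: FakeFileResponse) -> str:
    """Accept either a produced file or a warning-only response."""
    if response.output_files:
        return "converted"
    # No output is acceptable only when the service explains why.
    assert response.warning, "expected a warning when no output file is returned"
    return "warning-only"


print(check_xfa_response(FakeFileResponse(output_files=["out.pdf"])))
print(check_xfa_response(FakeFileResponse(warning=XFA_WARNING)))
```

A response with neither a file nor a warning still fails the check, so relaxing the output requirement does not let an empty response slip through.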

Assisted-by: Codex
@datalogics-cgreen commented

@datalogics-kam I attempted to run the script but ran into trouble. Please see e003088.

The report at the end showed test failures (hence, the latest force push), but I saw nothing in the printout that indicated there were differences between any synchronous and asynchronous tests.
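
For context on what a sync/async parity report might contain, here is a minimal sketch that lists sync tests lacking an async twin. This is not `scripts/check_test_parity.sh` (a shell script whose contents are not shown in this thread); the `_async` suffix convention, the sample module, and the function name are all assumptions made for illustration.

```python
import ast
from typing import List

# Hypothetical test module source; the names are invented.
SAMPLE_MODULE = '''
def test_linearize_pdf(): ...
async def test_linearize_pdf_async(): ...
def test_linearize_pdf_invalid_file(): ...
'''


def missing_async_counterparts(source: str, suffix: str = "_async") -> List[str]:
    """Return top-level sync tests with no async test named <name><suffix>."""
    tree = ast.parse(source)
    sync_names = set()
    async_names = set()
    for node in tree.body:  # top-level functions only; classes are ignored
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            sync_names.add(node.name)
        elif isinstance(node, ast.AsyncFunctionDef) and node.name.startswith("test_"):
            async_names.add(node.name)
    return sorted(name for name in sync_names if name + suffix not in async_names)


print(missing_async_counterparts(SAMPLE_MODULE))
# prints ['test_linearize_pdf_invalid_file']
```

An empty list from a check like this would be consistent with a report that shows test failures but no sync/async differences.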

Comment thread scripts/check_test_parity.sh
@datalogics-kam commented

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6bb55440a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/check_test_parity.sh
@datalogics-kam commented

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6bb55440a

Comment thread scripts/check_test_parity.sh Outdated
@datalogics-kam left a comment

These were the outstanding stragglers I alluded to in standup.

Comment thread tests/live/test_live_graphic_conversions.py Outdated
Comment thread tests/live/test_live_graphic_conversions.py Outdated
Comment thread tests/live/test_live_convert_xfa_to_acroforms.py Outdated
Comment thread scripts/check_test_parity.sh Outdated
Comment thread tests/live/test_live_ocr_pdf.py Outdated
- Added `_EXPECTED_FILE_FORMATS`, `_expected_file_format`, and `_assert_output_files` helper.
- Applied `_assert_output_files` in:
  - PNG success (sync + async)
  - Valid color model tests (all formats, sync + async)
  - Resolution bounds (PNG)
  - Valid smoothing tests (all formats, sync + async)
  - PNG page-range variants (sync + async)

Assisted-by: Codex
@datalogics-kam left a comment

  • Tests are running.
  • I believe all the issues have been addressed.
  • If we find any other tuning of parameters or names, we can address it later.
  • #22 will improve coverage testing and clean up some coverage issues.

@datalogics-kam merged commit 73a3bcf into pdfrest:main Feb 5, 2026
14 checks passed
@datalogics-cgreen deleted the pdfcloud-5464-add-tools branch February 5, 2026 22:45