PDFCLOUD-5464 Add additional pdfRest tools #6
Merged
datalogics-kam merged 84 commits into pdfrest:main from datalogics-cgreen:pdfcloud-5464-add-tools on Feb 5, 2026
Commits
6332cf9
Add Linearize PDF
datalogics-cgreen d7c815e
Add Summarize PDF
datalogics-cgreen a7de425
Add Translate PDF
datalogics-cgreen 935fda2
Add Extract Images
datalogics-cgreen 760c98f
Add Extract Text
datalogics-cgreen 1ccf51f
Add Convert to Markdown
datalogics-cgreen 2ac7e00
Add OCR PDF
datalogics-cgreen f78d6f9
Refactor OCR PDF to utilize PdfRestFileBasedResponse
datalogics-cgreen 52728ac
Remove and replace ExtractImagesResponse
datalogics-cgreen 4a35193
Split Translate PDF methods by output type
datalogics-cgreen 3104a63
client.py: Ruff format imports
datalogics-cgreen 9c3f650
Add Convert to Excel
datalogics-cgreen e615b1c
Add conversion to PowerPoint
datalogics-cgreen e9c5129
Add missing live Excel test
datalogics-cgreen 33b8ee2
Add missing live PowerPoint test
datalogics-cgreen e814bf4
Add XFA to AcroForms
datalogics-cgreen b981456
Add Flatten Transparencies
datalogics-cgreen 1120039
Add Rasterize PDF
datalogics-cgreen 971ec82
Fix problems reported by ruff
datalogics-cgreen bde50f1
Add Flatten Annotations
datalogics-cgreen 8c116dc
Remove erroneous `output_format` parameter from Markdown live test
datalogics-cgreen e6de765
Add missing `page_break_comments` parameter to Markdown conversion
datalogics-cgreen 2d7d556
Test Extract Images live with PDF with images
datalogics-cgreen ec3edca
Extract Text: Add missing body parameters
datalogics-cgreen 3e736da
Translate PDF: Fix name of destination language parameter
datalogics-cgreen 541f2ed
Translate PDF live test: Fix expected field name
datalogics-cgreen 2e8f371
Extract Text live test: Fix incorrect return field
datalogics-cgreen fc55159
Extract Text test: Remove lingering `.text`
datalogics-cgreen 1bed8c0
Convert to Markdown: Excise `output_format` completely
datalogics-cgreen c0d1750
Extract Text test: Fix expected fields
datalogics-cgreen 4316421
Translate PDF test: Fix expected translated text field
datalogics-cgreen ab2e6d8
Translate PDF test: Fix unexpected GET
datalogics-cgreen 97821c4
Split Summarize PDF method by response file type
datalogics-cgreen 6f080ec
Translate PDF: Improve response types
datalogics-cgreen ce5d126
Revise and rename `extract_text` to `extract_pdf_text_to_file`
datalogics-cgreen 629047d
PDF to Markdown: Use `PdfRestFileBasedResponse`
datalogics-cgreen d875637
Add missing async live tests
datalogics-cgreen 57868d2
Add additional assertions to live tests for new tools
datalogics-cgreen 73d94af
Add match= expressions to live tests
datalogics-cgreen 34fc14e
Fix Translate PDF test regex matches
datalogics-cgreen 906a905
Fix expected file format from Extract Text
datalogics-cgreen 4050ab0
Convert to PNG live tests: Fix expected error
datalogics-cgreen 76d5f88
Redact PDF live test: Fix expected error message
datalogics-cgreen 34ed3f1
Convert XFA: Allow `warning` in response
datalogics-cgreen ff00e22
Add Convert to PDF/A (Archive PDF) and tests
datalogics-cgreen d5a8f13
Update src/pdfrest/models/public.py
datalogics-cgreen aa107cc
Remove unused `ConvertToMarkdownResponse` class
datalogics-cgreen 4f719a9
Remove unused fields from Summarize PDF response
datalogics-cgreen 242a36c
Translate PDF: Remove unused response fields
datalogics-cgreen 38e209c
Extract Text response: Remove unused fields
datalogics-cgreen 81345e0
Remove unused `ExtractTextResponse`
datalogics-cgreen 39d9836
Remove "pdf" from Summarize method names
datalogics-cgreen 24b9154
models: Add `_bool_to_on_off` converter for boolean fields
datalogics-kam 9ff3a86
Replace on/off in external interface with `bool`
datalogics-cgreen 9c99f33
Set default values (from pdfRest) on optional client parameters
datalogics-cgreen 47953f5
Create types for remaining Literal arguments in clients
datalogics-cgreen cbc0ff8
OCR PDF: Fix misleading method descriptions
datalogics-cgreen 2ab5abc
OCR PDF: Add missing `languages` body parameter
datalogics-cgreen a090f18
Translate PDF: Verify language code
datalogics-cgreen d57d39d
Update JPEG test expectations regarding default values
datalogics-cgreen ce5e2de
Adjust PDF/A tests to expect `rasterize_if_errors_encountered` default
datalogics-cgreen 4ba0344
scripts: Add `check_test_parity.sh` to verify sync-async test coverage
datalogics-kam a8a1551
docs: Add `check_test_parity.sh` usage across documentation
datalogics-kam 87932e0
Set image `smoothing` to `"none"` by default, per pdfRest
datalogics-cgreen e003088
Modify test parity check to be runnable on Mac
datalogics-cgreen 432ccfa
Add missing unit tests
datalogics-cgreen c3a1c1f
Fill in missing live tests of conversion methods
datalogics-cgreen ff5a496
Live flatten/linearize/rasterize: add async variants, missing sync tests
datalogics-cgreen d4a9999
Resolve disparities in Query PDF live tests
datalogics-cgreen 8b5caaf
Round out async test coverage in live graphic tests
datalogics-cgreen d5f602e
Live split/merge/summarize/translate: add async variants, missing opt…
datalogics-cgreen 1d9ecbb
Markdown/powerpoint: Add missing `async` tests
datalogics-cgreen adc6aec
Extract images/text: Fix `async` test parity
datalogics-cgreen a8ce096
Flatten annots/transparencies/linearize/rasterize: Fix `async` test p…
datalogics-cgreen bd0bb8c
OCR/summarize: Fix `async` test parity
datalogics-cgreen f204c7a
Translate: Fix `async` test parity
datalogics-cgreen e8946dd
Address test failures in Rasterize unit test and live PDF test
datalogics-cgreen 559c82b
Query PDF: Fix naming of `async` tests
datalogics-cgreen f6bb554
Address remaining smattering of synchronous/`async` parity in tests
datalogics-cgreen 6af6db5
Handle pytest progress suffix when parsing node IDs
datalogics-cgreen 3ea9dd7
Filter deleted tests before invoking pytest
datalogics-cgreen 95ba010
Live graphic tests: Add additional assertions on output
datalogics-cgreen 23d3c8f
Convert forms live test: Use XFA input
datalogics-cgreen 3e7d369
OCR live test: Use rasterized input
datalogics-cgreen
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,164 @@ | ||
| #!/usr/bin/env bash | ||
| set -euo pipefail | ||
| IFS=$'\n\t' | ||
| | ||
| base_ref="${1:-upstream/main}" | ||
| head_ref="${2:-HEAD}" | ||
| | ||
| if ! git rev-parse --verify "$base_ref" > /dev/null 2>&1; then | ||
| echo "Base ref '$base_ref' not found." >&2 | ||
| exit 1 | ||
| fi | ||
| | ||
| if ! git rev-parse --verify "$head_ref" > /dev/null 2>&1; then | ||
| echo "Head ref '$head_ref' not found." >&2 | ||
| exit 1 | ||
| fi | ||
| | ||
| test_files=() | ||
| while IFS= read -r file; do | ||
| if [[ -n "$file" ]]; then | ||
| test_files+=("$file") | ||
| fi | ||
| done < <( | ||
| git diff --name-only --diff-filter=d "$base_ref..$head_ref" -- tests | grep -E '\.py$' || true | ||
| ) | ||
| | ||
| if [[ ${#test_files[@]} -eq 0 ]]; then | ||
| echo "No changed test files under tests/ for $base_ref..$head_ref." | ||
| exit 0 | ||
| fi | ||
| | ||
| tmp_output="$(mktemp)" | ||
| tmp_tests="$(mktemp)" | ||
| tmp_counts="$(mktemp)" | ||
| tmp_missing_sync="$(mktemp)" | ||
| tmp_missing_async="$(mktemp)" | ||
| tmp_payload="$(mktemp)" | ||
| trap 'rm -f "$tmp_output" "$tmp_tests" "$tmp_counts" "$tmp_missing_sync" "$tmp_missing_async" "$tmp_payload"' EXIT | ||
| | ||
| echo "Running pytest on changed tests:" | ||
| printf ' - %s\n' "${test_files[@]}" | ||
| | ||
| uv run pytest -vv -rA -n auto "${test_files[@]}" | tee "$tmp_output" | ||
| | ||
| awk ' | ||
| { | ||
| line = $0; | ||
| sub(/^\[[^]]+\][[:space:]]+/, "", line); | ||
| sub(/[[:space:]]+\[[^]]+\]$/, "", line); | ||
| if (line ~ /^(PASSED|FAILED|SKIPPED|XFAIL|XPASS|ERROR)[[:space:]]+tests\/.*::/) { | ||
| sub(/^(PASSED|FAILED|SKIPPED|XFAIL|XPASS|ERROR)[[:space:]]+/, "", line); | ||
| print line; | ||
| } else if (line ~ /^tests\/.*::.*[[:space:]]+(PASSED|FAILED|SKIPPED|XFAIL|XPASS|ERROR)$/) { | ||
| sub(/[[:space:]]+(PASSED|FAILED|SKIPPED|XFAIL|XPASS|ERROR)$/, "", line); | ||
| print line; | ||
| } | ||
| } | ||
| ' "$tmp_output" > "$tmp_tests" | ||
| | ||
| if [[ ! -s "$tmp_tests" ]]; then | ||
| echo "No test node IDs detected in pytest output; try rerunning with -vv." >&2 | ||
| exit 1 | ||
| fi | ||
| | ||
| awk -v sync_file="$tmp_missing_sync" \ | ||
| -v async_file="$tmp_missing_async" \ | ||
| -v payload_file="$tmp_payload" \ | ||
| -v counts_file="$tmp_counts" ' | ||
| function is_async(nodeid) { | ||
| return (nodeid ~ /::test_.*async_/); | ||
| } | ||
| function normalize(nodeid) { | ||
| sub(/::test_live_async_/, "::test_live_", nodeid); | ||
| sub(/::test_async_/, "::test_", nodeid); | ||
| return nodeid; | ||
| } | ||
| { | ||
| total++; | ||
| if ($0 ~ /::test_.*(payload|validation)/) { | ||
| payload_like[$0] = 1; | ||
| } | ||
| if (is_async($0)) { | ||
| async_count++; | ||
| norm = normalize($0); | ||
| async_norm[norm] = 1; | ||
| async_orig[norm] = $0; | ||
| } else { | ||
| sync_count++; | ||
| norm = normalize($0); | ||
| sync_norm[norm] = 1; | ||
| sync_orig[norm] = $0; | ||
| } | ||
| } | ||
| END { | ||
| missing_sync = 0; | ||
| missing_async = 0; | ||
| | ||
| for (n in async_norm) { | ||
| if (!(n in sync_norm)) { | ||
| missing_sync++; | ||
| print async_orig[n] >> sync_file; | ||
| } | ||
| } | ||
| for (n in sync_norm) { | ||
| if (!(n in async_norm)) { | ||
| missing_async++; | ||
| print sync_orig[n] >> async_file; | ||
| } | ||
| } | ||
| payload_count = 0; | ||
| for (t in payload_like) { | ||
| payload_count++; | ||
| print t >> payload_file; | ||
| } | ||
| | ||
| print "total=" total > counts_file; | ||
| print "sync_count=" sync_count >> counts_file; | ||
| print "async_count=" async_count >> counts_file; | ||
| print "missing_sync=" missing_sync >> counts_file; | ||
| print "missing_async=" missing_async >> counts_file; | ||
| print "payload_count=" payload_count >> counts_file; | ||
| } | ||
| ' "$tmp_tests" | ||
| | ||
| total=0 | ||
| sync_count=0 | ||
| async_count=0 | ||
| missing_sync=0 | ||
| missing_async=0 | ||
| payload_count=0 | ||
| while IFS='=' read -r key value; do | ||
| case "$key" in | ||
| total) total="$value" ;; | ||
| sync_count) sync_count="$value" ;; | ||
| async_count) async_count="$value" ;; | ||
| missing_sync) missing_sync="$value" ;; | ||
| missing_async) missing_async="$value" ;; | ||
| payload_count) payload_count="$value" ;; | ||
| esac | ||
| done < "$tmp_counts" | ||
| | ||
| echo "" | ||
| echo "Test parity report" | ||
| echo "Total tests: $total" | ||
| echo "Sync tests: $sync_count" | ||
| echo "Async tests: $async_count" | ||
| echo "Missing sync counterparts: $missing_sync" | ||
| if [[ "$missing_sync" -gt 0 ]]; then | ||
| sort "$tmp_missing_sync" | while read -r line; do | ||
| echo " - $line" | ||
| done | ||
| fi | ||
| echo "Missing async counterparts: $missing_async" | ||
| if [[ "$missing_async" -gt 0 ]]; then | ||
| sort "$tmp_missing_async" | while read -r line; do | ||
| echo " - $line" | ||
| done | ||
| fi | ||
| echo "Payload/validation-style tests (name contains payload/validation): $payload_count" | ||
| if [[ "$payload_count" -gt 0 ]]; then | ||
| sort "$tmp_payload" | while read -r line; do | ||
| echo " - $line" | ||
| done | ||
| fi | ||
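
The sync/async pairing rule in the second awk program can be tried on its own, without running pytest. The sketch below pipes a few made-up node IDs through the same `is_async` and `normalize` logic and prints the tests that lack a counterpart. The helper names `parity_gaps` and `sample_ids`, the file path, and the test names are all assumptions introduced for illustration; none of them come from the repository.

```shell
#!/usr/bin/env bash
set -eu

# parity_gaps (hypothetical helper): reads pytest node IDs on stdin and prints
# one line for every test that has no sync/async counterpart.
parity_gaps() {
  awk '
    # A test counts as async when "async_" appears in its function name.
    function is_async(nodeid) {
      return (nodeid ~ /::test_.*async_/);
    }
    # Map an async name onto the sync name it is expected to mirror.
    # awk passes scalars by value, so the caller still sees the original ID.
    function normalize(nodeid) {
      sub(/::test_live_async_/, "::test_live_", nodeid);
      sub(/::test_async_/, "::test_", nodeid);
      return nodeid;
    }
    {
      norm = normalize($0);
      if (is_async($0)) { async_orig[norm] = $0 } else { sync_orig[norm] = $0 }
    }
    END {
      for (n in async_orig) if (!(n in sync_orig)) print "missing sync for: " async_orig[n];
      for (n in sync_orig) if (!(n in async_orig)) print "missing async for: " sync_orig[n];
    }
  ' | sort
}

# Made-up node IDs, for illustration only.
sample_ids() {
  printf '%s\n' \
    'tests/test_ocr.py::test_ocr_pdf' \
    'tests/test_ocr.py::test_async_ocr_pdf' \
    'tests/test_ocr.py::test_live_ocr_pdf' \
    'tests/test_ocr.py::test_live_async_translate_pdf'
}

sample_ids | parity_gaps
```

Here `test_ocr_pdf` and `test_async_ocr_pdf` normalize to the same key and so pair up, while the two live tests normalize to different keys and are each reported. Pairing depends entirely on the `test_async_` and `test_live_async_` naming prefixes, so an async test named any other way would be reported as missing its sync counterpart.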