Merged
84 commits
6332cf9
Add Linearize PDF
datalogics-cgreen Dec 18, 2025
d7c815e
Add Summarize PDF
datalogics-cgreen Dec 18, 2025
a7de425
Add Translate PDF
datalogics-cgreen Dec 18, 2025
935fda2
Add Extract Images
datalogics-cgreen Dec 18, 2025
760c98f
Add Extract Text
datalogics-cgreen Dec 18, 2025
1ccf51f
Add Convert to Markdown
datalogics-cgreen Dec 18, 2025
2ac7e00
Add OCR PDF
datalogics-cgreen Dec 18, 2025
f78d6f9
Refactor OCR PDF to utilize PdfRestFileBasedResponse
datalogics-cgreen Dec 19, 2025
52728ac
Remove and replace ExtractImagesResponse
datalogics-cgreen Dec 19, 2025
4a35193
Split Translate PDF methods by output type
datalogics-cgreen Dec 19, 2025
3104a63
client.py: Ruff format imports
datalogics-cgreen Dec 19, 2025
9c3f650
Add Convert to Excel
datalogics-cgreen Dec 19, 2025
e615b1c
Add conversion to PowerPoint
datalogics-cgreen Dec 19, 2025
e9c5129
Add missing live Excel test
datalogics-cgreen Dec 19, 2025
33b8ee2
Add missing live PowerPoint test
datalogics-cgreen Dec 19, 2025
e814bf4
Add XFA to AcroForms
datalogics-cgreen Dec 19, 2025
b981456
Add Flatten Transparencies
datalogics-cgreen Dec 19, 2025
1120039
Add Rasterize PDF
datalogics-cgreen Dec 19, 2025
971ec82
Fix problems reported by ruff
datalogics-cgreen Dec 19, 2025
bde50f1
Add Flatten Annotations
datalogics-cgreen Dec 19, 2025
8c116dc
Remove erroneous `output_format` parameter from Markdown live test
datalogics-cgreen Dec 19, 2025
e6de765
Add missing `page_break_comments` parameter to Markdown conversion
datalogics-cgreen Dec 19, 2025
2d7d556
Test Extract Images live with PDF with images
datalogics-cgreen Dec 19, 2025
ec3edca
Extract Text: Add missing body parameters
datalogics-cgreen Dec 19, 2025
3e736da
Translate PDF: Fix name of destination language parameter
datalogics-cgreen Dec 19, 2025
541f2ed
Translate PDF live test: Fix expected field name
datalogics-cgreen Dec 19, 2025
2e8f371
Extract Text live test: Fix incorrect return field
datalogics-cgreen Dec 19, 2025
fc55159
Extract Text test: Remove lingering `.text`
datalogics-cgreen Dec 20, 2025
1bed8c0
Convert to Markdown: Excise `output_format` completely
datalogics-cgreen Dec 20, 2025
c0d1750
Extract Text test: Fix expected fields
datalogics-cgreen Dec 20, 2025
4316421
Translate PDF test: Fix expected translated text field
datalogics-cgreen Dec 20, 2025
ab2e6d8
Translate PDF test: Fix unexpected GET
datalogics-cgreen Dec 20, 2025
97821c4
Split Summarize PDF method by response file type
datalogics-cgreen Jan 6, 2026
6f080ec
Translate PDF: Improve response types
datalogics-cgreen Jan 6, 2026
ce5d126
Revise and rename `extract_text` to `extract_pdf_text_to_file`
datalogics-cgreen Jan 6, 2026
629047d
PDF to Markdown: Use `PdfRestFileBasedResponse`
datalogics-cgreen Jan 7, 2026
d875637
Add missing async live tests
datalogics-cgreen Jan 7, 2026
57868d2
Add additional assertions to live tests for new tools
datalogics-cgreen Jan 7, 2026
73d94af
Add match= expressions to live tests
datalogics-cgreen Jan 7, 2026
34fc14e
Fix Translate PDF test regex matches
datalogics-cgreen Jan 7, 2026
906a905
Fix expected file format from Extract Text
datalogics-cgreen Jan 7, 2026
4050ab0
Convert to PNG live tests: Fix expected error
datalogics-cgreen Jan 7, 2026
76d5f88
Redact PDF live test: Fix expected error message
datalogics-cgreen Jan 7, 2026
34ed3f1
Convert XFA: Allow `warning` in response
datalogics-cgreen Jan 7, 2026
ff00e22
Add Convert to PDF/A (Archive PDF) and tests
datalogics-cgreen Jan 7, 2026
d5a8f13
Update src/pdfrest/models/public.py
datalogics-cgreen Jan 8, 2026
aa107cc
Remove unused `ConvertToMarkdownResponse` class
datalogics-cgreen Jan 8, 2026
4f719a9
Remove unused fields from Summarize PDF response
datalogics-cgreen Jan 8, 2026
242a36c
Translate PDF: Remove unused response fields
datalogics-cgreen Jan 8, 2026
38e209c
Extract Text response: Remove unused fields
datalogics-cgreen Jan 8, 2026
81345e0
Remove unused `ExtractTextResponse`
datalogics-cgreen Jan 9, 2026
39d9836
Remove "pdf" from Summarize method names
datalogics-cgreen Jan 9, 2026
24b9154
models: Add `_bool_to_on_off` converter for boolean fields
datalogics-kam Jan 9, 2026
9ff3a86
Replace on/off in external interface with `bool`
datalogics-cgreen Jan 9, 2026
9c99f33
Set default values (from pdfRest) on optional client parameters
datalogics-cgreen Jan 12, 2026
47953f5
Create types for remaining Literal arguments in clients
datalogics-cgreen Jan 12, 2026
cbc0ff8
OCR PDF: Fix misleading method descriptions
datalogics-cgreen Jan 12, 2026
2ab5abc
OCR PDF: Add missing `languages` body parameter
datalogics-cgreen Jan 12, 2026
a090f18
Translate PDF: Verify language code
datalogics-cgreen Jan 13, 2026
d57d39d
Update JPEG test expectations regarding default values
datalogics-cgreen Jan 13, 2026
ce5e2de
Adjust PDF/A tests to expect `rasterize_if_errors_encountered` default
datalogics-cgreen Jan 13, 2026
4ba0344
scripts: Add `check_test_parity.sh` to verify sync-async test coverage
datalogics-kam Jan 14, 2026
a8a1551
docs: Add `check_test_parity.sh` usage across documentation
datalogics-kam Jan 14, 2026
87932e0
Set image `smoothing` to `"none"` by default, per pdfRest
datalogics-cgreen Jan 14, 2026
e003088
Modify test parity check to be runnable on Mac
datalogics-cgreen Jan 14, 2026
432ccfa
Add missing unit tests
datalogics-cgreen Jan 15, 2026
c3a1c1f
Fill in missing live tests of conversion methods
datalogics-cgreen Jan 15, 2026
ff5a496
Live flatten/linearize/rasterize: add async variants, missing sync tests
datalogics-cgreen Jan 15, 2026
d4a9999
Resolve disparities in Query PDF live tests
datalogics-cgreen Jan 15, 2026
8b5caaf
Round out async test coverage in live graphic tests
datalogics-cgreen Jan 15, 2026
d5f602e
Live split/merge/summarize/translate: add async variants, missing opt…
datalogics-cgreen Jan 15, 2026
1d9ecbb
Markdown/powerpoint: Add missing `async` tests
datalogics-cgreen Jan 15, 2026
adc6aec
Extract images/text: Fix `async` test parity
datalogics-cgreen Jan 15, 2026
a8ce096
Flatten annots/transparencies/linearize/rasterize: Fix `async` test p…
datalogics-cgreen Jan 15, 2026
bd0bb8c
OCR/summarize: Fix `async` test parity
datalogics-cgreen Jan 15, 2026
f204c7a
Translate: Fix `async` test parity
datalogics-cgreen Jan 15, 2026
e8946dd
Address test failures in Rasterize unit test and live PDF test
datalogics-cgreen Jan 15, 2026
559c82b
Query PDF: Fix naming of `async` tests
datalogics-cgreen Jan 15, 2026
f6bb554
Address remaining smattering of synchronous/`async` parity in tests
datalogics-cgreen Jan 15, 2026
6af6db5
Handle pytest progress suffix when parsing node IDs
datalogics-cgreen Feb 5, 2026
3ea9dd7
Filter deleted tests before invoking pytest
datalogics-cgreen Feb 5, 2026
95ba010
Live graphic tests: Add additional assertions on output
datalogics-cgreen Feb 5, 2026
23d3c8f
Convert forms live test: Use XFA input
datalogics-cgreen Feb 5, 2026
3e7d369
OCR live test: Use rasterized input
datalogics-cgreen Feb 5, 2026
3 changes: 3 additions & 0 deletions AGENTS.md
@@ -19,6 +19,9 @@
- `uv run pre-commit run --all-files` — enforce formatting and lint rules before
pushing.
- `uv run pytest` — execute the suite with the active interpreter.
- `scripts/check_test_parity.sh` — run changed tests and report sync/async
parity gaps (accepts optional base/head refs, defaults to
`upstream/main..HEAD`).
- `uv build` — produce wheels and sdists identical to the release workflow.
- `uvx nox -s tests` — create matrix virtualenvs via nox and execute the pytest
session.
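A minimal sketch of how the optional refs resolve, mirroring the `${1:-upstream/main}` and `${2:-HEAD}` defaults in `scripts/check_test_parity.sh`. The helper name `resolve_range` and the ref names `origin/main` and `feature/ocr` are hypothetical and used only for illustration.

```shell
# Hypothetical helper: shows which diff range the script would inspect
# for a given set of positional arguments.
resolve_range() {
  local base_ref="${1:-upstream/main}"   # same default as the script
  local head_ref="${2:-HEAD}"            # same default as the script
  printf '%s..%s\n' "$base_ref" "$head_ref"
}

resolve_range                          # prints: upstream/main..HEAD
resolve_range origin/main              # prints: origin/main..HEAD
resolve_range origin/main feature/ocr  # prints: origin/main..feature/ocr
```

In practice the refs go straight to the script, for example `scripts/check_test_parity.sh origin/main HEAD`.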
6 changes: 6 additions & 0 deletions README.md
@@ -36,3 +36,9 @@ Run the test suite with:
```bash
uv run pytest
```

Check sync/async parity for changed tests (defaults to `upstream/main..HEAD`):

```bash
scripts/check_test_parity.sh
```
3 changes: 3 additions & 0 deletions TESTING_GUIDELINES.md
@@ -13,6 +13,9 @@ iteration required.
request customization, validation failures, file helpers, and live calls. Do
not hide the transport behind a parameter; the test name itself should reveal
which client is under test.
- **Check parity regularly.** Run `scripts/check_test_parity.sh` (defaults to
`upstream/main..HEAD`) to spot missing sync/async counterparts, keeping
parameterized test IDs aligned between transports.
- **Exercise both sides of the contract.** Hermetic tests (via
`httpx.MockTransport`) validate serialization and local validation. Live
suites prove the server behaves the same way, including invalid literal
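To illustrate what "aligned" test IDs means here: the parity script treats an async test as the counterpart of a sync test when dropping the `async_` prefix from the test name yields the same node ID, parameter suffix included. The sketch below reproduces that rule with `sed`. The function name `sync_name` and the test file and parameter names are hypothetical.

```shell
# Hypothetical helper: map an async node ID to the sync node ID it is
# expected to pair with (same substitutions as normalize() in the script).
sync_name() {
  printf '%s\n' "$1" |
    sed -e 's/::test_live_async_/::test_live_/' -e 's/::test_async_/::test_/'
}

sync_name 'tests/test_convert.py::test_async_convert_to_png[smoothing-none]'
# prints: tests/test_convert.py::test_convert_to_png[smoothing-none]

sync_name 'tests/live/test_live_convert.py::test_live_async_convert_to_png[smoothing-none]'
# prints: tests/live/test_live_convert.py::test_live_convert_to_png[smoothing-none]
```

Because the bracketed parameter ID is part of the comparison, an async test parameterized as `[none]` next to a sync test parameterized as `[smoothing-none]` would be reported as missing on both sides.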
1 change: 1 addition & 0 deletions pyproject.toml
@@ -10,6 +10,7 @@ requires-python = ">=3.10"
dependencies = [
    "exceptiongroup>=1.3.0",
    "httpx>=0.28.1",
    "langcodes>=3.4.0",
    "pydantic>=2.12.0",
]

164 changes: 164 additions & 0 deletions scripts/check_test_parity.sh
@@ -0,0 +1,164 @@
#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

base_ref="${1:-upstream/main}"
head_ref="${2:-HEAD}"

if ! git rev-parse --verify "$base_ref" > /dev/null 2>&1; then
  echo "Base ref '$base_ref' not found." >&2
  exit 1
fi

if ! git rev-parse --verify "$head_ref" > /dev/null 2>&1; then
  echo "Head ref '$head_ref' not found." >&2
  exit 1
fi

test_files=()
while IFS= read -r file; do
  if [[ -n "$file" ]]; then
    test_files+=("$file")
  fi
done < <(
  git diff --name-only --diff-filter=d "$base_ref..$head_ref" -- tests | grep -E '\.py$' || true
)

if [[ ${#test_files[@]} -eq 0 ]]; then
  echo "No changed test files under tests/ for $base_ref..$head_ref."
  exit 0
fi

tmp_output="$(mktemp)"
tmp_tests="$(mktemp)"
tmp_counts="$(mktemp)"
tmp_missing_sync="$(mktemp)"
tmp_missing_async="$(mktemp)"
tmp_payload="$(mktemp)"
trap 'rm -f "$tmp_output" "$tmp_tests" "$tmp_counts" "$tmp_missing_sync" "$tmp_missing_async" "$tmp_payload"' EXIT

echo "Running pytest on changed tests:"
printf ' - %s\n' "${test_files[@]}"

uv run pytest -vv -rA -n auto "${test_files[@]}" | tee "$tmp_output"

awk '
  {
    line = $0;
    sub(/^\[[^]]+\][[:space:]]+/, "", line);
    sub(/[[:space:]]+\[[^]]+\]$/, "", line);
    if (line ~ /^(PASSED|FAILED|SKIPPED|XFAIL|XPASS|ERROR)[[:space:]]+tests\/.*::/) {
      sub(/^(PASSED|FAILED|SKIPPED|XFAIL|XPASS|ERROR)[[:space:]]+/, "", line);
      print line;
    } else if (line ~ /^tests\/.*::.*[[:space:]]+(PASSED|FAILED|SKIPPED|XFAIL|XPASS|ERROR)$/) {
      sub(/[[:space:]]+(PASSED|FAILED|SKIPPED|XFAIL|XPASS|ERROR)$/, "", line);
      print line;
    }
  }
' "$tmp_output" > "$tmp_tests"

if [[ ! -s "$tmp_tests" ]]; then
  echo "No test node IDs detected in pytest output; try rerunning with -vv." >&2
  exit 1
fi

awk -v sync_file="$tmp_missing_sync" \
  -v async_file="$tmp_missing_async" \
  -v payload_file="$tmp_payload" \
  -v counts_file="$tmp_counts" '
  function is_async(nodeid) {
    return (nodeid ~ /::test_.*async_/);
  }
  function normalize(nodeid) {
    sub(/::test_live_async_/, "::test_live_", nodeid);
    sub(/::test_async_/, "::test_", nodeid);
    return nodeid;
  }
  {
    total++;
    if ($0 ~ /::test_.*(payload|validation)/) {
      payload_like[$0] = 1;
    }
    if (is_async($0)) {
      async_count++;
      norm = normalize($0);
      async_norm[norm] = 1;
      async_orig[norm] = $0;
    } else {
      sync_count++;
      norm = normalize($0);
      sync_norm[norm] = 1;
      sync_orig[norm] = $0;
    }
  }
  END {
    missing_sync = 0;
    missing_async = 0;

    for (n in async_norm) {
      if (!(n in sync_norm)) {
        missing_sync++;
        print async_orig[n] >> sync_file;
      }
    }
    for (n in sync_norm) {
      if (!(n in async_norm)) {
        missing_async++;
        print sync_orig[n] >> async_file;
      }
    }
    payload_count = 0;
    for (t in payload_like) {
      payload_count++;
      print t >> payload_file;
    }

    print "total=" total > counts_file;
    print "sync_count=" sync_count >> counts_file;
    print "async_count=" async_count >> counts_file;
    print "missing_sync=" missing_sync >> counts_file;
    print "missing_async=" missing_async >> counts_file;
    print "payload_count=" payload_count >> counts_file;
  }
' "$tmp_tests"

total=0
sync_count=0
async_count=0
missing_sync=0
missing_async=0
payload_count=0
while IFS='=' read -r key value; do
  case "$key" in
    total) total="$value" ;;
    sync_count) sync_count="$value" ;;
    async_count) async_count="$value" ;;
    missing_sync) missing_sync="$value" ;;
    missing_async) missing_async="$value" ;;
    payload_count) payload_count="$value" ;;
  esac
done < "$tmp_counts"

echo ""
echo "Test parity report"
echo "Total tests: $total"
echo "Sync tests: $sync_count"
echo "Async tests: $async_count"
echo "Missing sync counterparts: $missing_sync"
if [[ "$missing_sync" -gt 0 ]]; then
  sort "$tmp_missing_sync" | while read -r line; do
    echo " - $line"
  done
fi
echo "Missing async counterparts: $missing_async"
if [[ "$missing_async" -gt 0 ]]; then
  sort "$tmp_missing_async" | while read -r line; do
    echo " - $line"
  done
fi
echo "Payload/validation-style tests (name contains payload/validation): $payload_count"
if [[ "$payload_count" -gt 0 ]]; then
  sort "$tmp_payload" | while read -r line; do
    echo " - $line"
  done
fi
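
The comparison step of the script can be tried in isolation. The sketch below is a simplified restatement under stated assumptions, not the script itself: it reads node IDs on stdin and prints the sync tests that have no async counterpart, using the same `async_` detection and prefix substitutions. The function name `missing_async` and the sample node IDs are hypothetical.

```shell
# Hypothetical helper: list sync node IDs with no matching async node ID.
missing_async() {
  awk '
    # Async tests: strip the async_ marker and remember the normalized ID.
    /::test_.*async_/ {
      id = $0
      sub(/::test_live_async_/, "::test_live_", id)
      sub(/::test_async_/, "::test_", id)
      async[id] = 1
      next
    }
    # Everything else counts as a sync test.
    { sync[$0] = 1 }
    END {
      for (id in sync) {
        if (!(id in async)) {
          print id
        }
      }
    }
  ' | sort
}

printf '%s\n' \
  'tests/test_ocr.py::test_ocr_pdf' \
  'tests/test_ocr.py::test_async_ocr_pdf' \
  'tests/test_ocr.py::test_live_ocr_pdf' |
  missing_async
# prints: tests/test_ocr.py::test_live_ocr_pdf
```

Swapping the roles of the two arrays in the `END` block gives the opposite report (async tests with no sync counterpart), which is how the script produces both lists in a single pass.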