
PDFCLOUD-5594 Redacted values #30

Merged
datalogics-cgreen merged 7 commits into pdfrest:main from
datalogics-kam:pdfcloud-5594-redacted-values
Feb 20, 2026

Conversation

@datalogics-kam
Contributor

@datalogics-kam datalogics-kam commented Feb 20, 2026

PDFCLOUD-5594

Note: PR #29 must be merged first.

Why this change

Demo/free-tier API responses can include redacted values, and in this branch those values broke strict response parsing and made the live tests unstable. The goal here is to keep demo-key responses parseable while keeping the placeholder values non-informative.

What changed (high level)

This PR adds targeted demo-redaction sanitization, demo-restriction warning logging, and documentation.

  • Added reusable BeforeValidator helpers for demo redactions (bool, int, file id).
  • Applied those validators to the response fields implicated by this PR’s failing tests/logs (PdfRestInfoResponse plus unzip raw file IDs).
  • Added response-body demo-restriction message detection in the shared client response path and warning logs when detected.
  • Added focused unit tests for the impacted pdf-info, unzip, and client demo-message logging paths, including warning-log assertions.
  • Documented demo-key behavior, placeholder replacements, and warning-log visibility in getting-started docs.
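To make the warning-log visibility mentioned in the last bullet concrete, here is a minimal sketch of a logging setup that surfaces these warnings. It uses only the standard library. The logger name `pdfrest` is an assumption, not something this PR states, so check the getting-started docs for the actual name.

```python
import logging

# Assumption: the client library logs under a logger named "pdfrest".
PDFREST_LOGGER_NAME = "pdfrest"


def configure_demo_warning_logging() -> logging.Logger:
    """Send WARNING-and-above records from the assumed logger to stderr."""
    logger = logging.getLogger(PDFREST_LOGGER_NAME)
    logger.setLevel(logging.WARNING)
    # Avoid stacking duplicate handlers if this is called more than once.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


logger = configure_demo_warning_logging()
```

Attaching the handler to the library's logger rather than the root logger keeps unrelated application logging unchanged.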

Workflow impact in this PR:

  • Python matrix remains 3.10 to 3.14.
  • PR-only step remains diff-cover (if: github.event_name == 'pull_request').

Behavior changes

For implicated fields only, redacted demo values are replaced with parseable placeholders and logged.

  • Most redacted booleans -> False.
  • all_queries_processed -> True.
  • Redacted integers -> 0.
  • Redacted unzip file IDs -> 00000000-0000-4000-8000-000000000000.
  • Value-replacement warning format: Demo value <val> detected in <field-name>; replaced with <replacement>.
  • Demo restriction body messages are now logged at warning level when returned (for example in message, warning, or keyMessage).
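The replacement rules above can be sketched as plain functions. This is an illustration under stated assumptions, not the code in `_demo_value_sanitizers.py`: the PR does not say what a redacted value looks like, so `is_demo_redacted` assumes a string made only of asterisks, and the logger name and function names are hypothetical. The placeholder UUID, the `all_queries_processed` exception, and the warning format come from the description.

```python
import logging
from typing import Any

logger = logging.getLogger("pdfrest.demo")  # assumed logger name

# Placeholder file ID quoted in the PR description.
PLACEHOLDER_FILE_ID = "00000000-0000-4000-8000-000000000000"

# Fields whose redacted value becomes True instead of False (per the PR).
TRUE_WHEN_REDACTED = frozenset({"all_queries_processed"})


def is_demo_redacted(value: Any) -> bool:
    """Hypothetical detector: a non-empty string consisting only of '*'."""
    return isinstance(value, str) and set(value) == {"*"}


def _replace(value: Any, field_name: str, replacement: Any) -> Any:
    # Mirrors the warning format quoted in the PR description.
    logger.warning(
        "Demo value %s detected in %s; replaced with %s.",
        value, field_name, replacement,
    )
    return replacement


def sanitize_bool(value: Any, field_name: str) -> Any:
    """Redacted booleans become False, except the listed True fields."""
    if is_demo_redacted(value):
        return _replace(value, field_name, field_name in TRUE_WHEN_REDACTED)
    return value


def sanitize_int(value: Any, field_name: str) -> Any:
    """Redacted integers become 0."""
    if is_demo_redacted(value):
        return _replace(value, field_name, 0)
    return value


def sanitize_file_id(value: Any, field_name: str) -> Any:
    """Redacted file IDs become the placeholder UUID."""
    if is_demo_redacted(value):
        return _replace(value, field_name, PLACEHOLDER_FILE_ID)
    return value
```

Non-redacted values pass through unchanged, so normal validation still applies to them. In the PR these helpers are attached to fields through Pydantic's `BeforeValidator`; the wiring is omitted here to keep the sketch free of third-party imports.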

Validation

  • uv run ruff check src/pdfrest/models/_demo_value_sanitizers.py src/pdfrest/models/public.py src/pdfrest/models/_internal.py src/pdfrest/client.py tests/test_query_pdf_info.py tests/test_unzip_file.py tests/test_client.py
  • uv run pytest tests/test_query_pdf_info.py tests/test_unzip_file.py -n auto --maxschedchunk 2
  • uv run pytest tests/test_client.py -k "demo_restriction_message or non_demo_key_message" -n auto --maxschedchunk 2
  • uv run basedpyright

Risks and follow-ups

  • Risk: placeholder coercion may differ from full-fidelity values expected by some live tests under demo keys.
  • Follow-up: make live-test assertions explicitly demo-aware where needed.

Assisted-by: Codex


netlify Bot commented Feb 20, 2026

Deploy Preview for pdfrest-python ready!

Name Link
🔨 Latest commit 3a5b42e
🔍 Latest deploy log https://app.netlify.com/projects/pdfrest-python/deploys/6998ca4d16faa50008c61042
😎 Deploy Preview https://deploy-preview-30--pdfrest-python.netlify.app


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87c843a58f


Comment thread src/pdfrest/models/_demo_value_sanitizers.py
Comment thread src/pdfrest/models/_demo_value_sanitizers.py
- Introduced `BeforeValidator` to process demo redacted values for various
  fields in `_internal.py` and `public.py`.
- Added `demo_value_sanitizers` module to centralize logic for detecting
  and replacing demo redacted values with appropriate defaults:
  - For boolean fields: replace with `True` or `False`.
  - For integer fields: replace with `0`.
  - For file IDs: replace with a placeholder UUID.
- Applied validators to metadata properties, ensuring consistent handling
  of demo data input.

Assisted-by: Codex
- Added `test_query_pdf_info_demo_redacted_booleans_replaced` in
  `test_query_pdf_info.py` to validate detection and replacement of demo
  redacted boolean and integer fields with appropriate values.
- Added `test_unzip_file_demo_redacted_id_replaced_and_logged` in
  `test_unzip_file.py` to verify sanitization of redacted file IDs and
  logging of replacements.
- Ensured proper log messages were emitted for all demo value replacements.

Assisted-by: Codex
- Documented how demo/free-tier keys lead to masked or redacted values in
  API responses and detailed the replacement behavior for such values.
- Added examples of fields with replacement logic and default values:
  - Boolean fields replaced with `True` or specific defaults.
  - Integer fields replaced with `0`.
  - File IDs replaced with placeholder UUIDs for consistency.
- Included a logging configuration example for tracking redacted value
  replacements in Python applications.

Assisted-by: Codex
- Added `_log_demo_restriction_messages` to log when API responses include
  demo mode restriction messages in specific fields (`message`, `warning`,
  `keyMessage`) for both sync and async clients.
- Introduced `_is_demo_restriction_message` utility to identify demo-related
  restriction messages based on known patterns.
- Updated documentation in `getting-started.md` with examples of these
  log messages and how to configure logging for monitoring.
- Created new test cases to validate the detection and logging of demo
  restriction messages:
  - Tests ensure messages are logged once even if duplicated across fields.
  - Verified async and sync clients handle and log these cases consistently.

Assisted-by: Codex
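A rough sketch of the detect-and-log-once behavior described in this commit, assuming a decoded JSON body is available as a mapping. The field names `message`, `warning`, and `keyMessage` come from the commit message. The match patterns, logger name, log wording, and function names are hypothetical, since the actual "known patterns" are not listed here.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("pdfrest.client")  # assumed logger name

# Response fields named in the commit message.
MESSAGE_FIELDS = ("message", "warning", "keyMessage")

# Hypothetical patterns; the real list lives in the client module.
DEMO_PATTERNS = ("demo", "free tier", "free-tier")


def is_demo_restriction_message(text: Any) -> bool:
    """Return True if text looks like a demo restriction notice."""
    if not isinstance(text, str):
        return False
    lowered = text.lower()
    return any(pattern in lowered for pattern in DEMO_PATTERNS)


def log_demo_restriction_messages(payload: Mapping[str, Any]) -> list[str]:
    """Log each distinct demo restriction message once; return what was logged."""
    seen: set[str] = set()
    logged: list[str] = []
    for field in MESSAGE_FIELDS:
        text = payload.get(field)
        if is_demo_restriction_message(text) and text not in seen:
            seen.add(text)
            logger.warning("Demo restriction message from API: %s", text)
            logged.append(text)
    return logged
```

Tracking already-seen text is what keeps a message repeated in both `message` and `keyMessage` from being logged twice.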
- Introduced `_is_demo_fallback_file_id` and `_build_demo_fallback_file`
  to detect and return placeholder metadata for missing demo fallback files.
- Updated `file_info` and `async_file_info` methods to return fallback data
  when a 404 error is encountered with a demo fallback file ID.
- Improved logging to notify users when fallback data is being returned.

tests: Add coverage for demo fallback file handling

- Added tests to validate sync and async handling of demo fallback files.
- Verified proper logging when returning placeholder metadata for 404 cases.

Assisted-by: Codex
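The 404 fallback described in this commit might look roughly like the sketch below. Everything except the overall flow is an assumption: `NotFoundError`, `FileInfo`, the `fetch` callable, and the placeholder field values are hypothetical stand-ins, and the sketch assumes the fallback ID is the placeholder UUID from the PR description.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("pdfrest.client")  # assumed logger name

# Assumption: the fallback ID is the placeholder UUID from the PR description.
DEMO_FALLBACK_FILE_ID = "00000000-0000-4000-8000-000000000000"


class NotFoundError(Exception):
    """Hypothetical stand-in for the client's HTTP 404 error."""


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical, minimal file metadata record."""
    id: str
    name: str
    size: int


def is_demo_fallback_file_id(file_id: str) -> bool:
    return file_id.lower() == DEMO_FALLBACK_FILE_ID


def build_demo_fallback_file(file_id: str) -> FileInfo:
    # Parseable but deliberately non-informative metadata.
    return FileInfo(id=file_id, name="demo-placeholder", size=0)


def file_info(file_id: str, fetch: Callable[[str], FileInfo]) -> FileInfo:
    """Fetch metadata; on 404 for the fallback ID, return placeholder data."""
    try:
        return fetch(file_id)
    except NotFoundError:
        if not is_demo_fallback_file_id(file_id):
            raise  # Real missing files still surface as errors.
        logger.warning(
            "File %s not found; returning demo placeholder metadata", file_id
        )
        return build_demo_fallback_file(file_id)
```

Re-raising for every other ID keeps genuine 404s visible, so the fallback only applies to the placeholder ID.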
- Updated `_demo_value_sanitizers.py` to enforce parseable-but-useless
  replacements for demo-redacted boolean-like strings to maintain operability
  while preserving data obscurity.
- Added comments clarifying the intent of replacements in demo mode.
- Updated AGENTS.md to document the redaction approach for demo/free-tier
  values.

Assisted-by: Codex
Contributor

@datalogics-cgreen datalogics-cgreen left a comment


Approving, with one observation

Comment thread tests/test_client.py
@datalogics-cgreen datalogics-cgreen merged commit b6c3cec into pdfrest:main Feb 20, 2026
20 checks passed