PDFCLOUD-5594 Redacted values #30
datalogics-cgreen merged 7 commits into pdfrest:main from datalogics-kam:pdfcloud-5594-redacted-values (Feb 20, 2026)
Commits (7), all by datalogics-kam:
5dacbca models: Add validators to handle demo redacted values in PDF metadata
ae24ce6 tests: Add tests for sanitizing and replacing demo redacted values
f79108b docs: Add guidance on handling demo redacted values in API responses
c12ddf4 client: Log and test demo restriction messages in API responses
8cb1536 client: Add demo fallback handling for 404 file-info lookups
da22e3e models: Clamp demo redacted values to parseable constants
3a5b42e pyproject: Update to 1.0.1 for demo bug fixes
@@ -0,0 +1,89 @@
+from __future__ import annotations
+
+import logging
+import re
+from typing import Any
+
+from pydantic import ValidationInfo
+
+LOGGER = logging.getLogger("pdfrest.models")
+
+_DEMO_UUID = "00000000-0000-4000-8000-000000000000"
+_REDACTED_X_PATTERN = re.compile(r"^[Xx-]{8,}$")
+
+
+def _field_name(info: ValidationInfo) -> str:
+    return info.field_name or "<unknown>"
+
+
+def _looks_like_demo_redaction(value: Any) -> bool:
+    if not isinstance(value, str):
+        return False
+    if _looks_like_generate_redacted_string(value):
+        return True
+    return bool(_REDACTED_X_PATTERN.fullmatch(value))
+
+
+def _looks_like_generate_redacted_string(value: str) -> bool:
+    """Detect strings redacted by PDFCloud-API generateRedactedString.
+
+    The upstream redactor preserves the first two characters and replaces all
+    non-whitespace characters after that with '*'.
+    """
+    if len(value) < 3:
+        return False
+    tail = value[2:]
+    if "*" not in tail:
+        return False
+    return all(char == "*" or char.isspace() for char in tail)
+
+
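The two detection rules above can be exercised in isolation. The following is a minimal standalone sketch, not the module itself: the function names are shortened, and the sample values are hypothetical rather than taken from real API responses.

```python
import re
from typing import Any

# Copied from the diff: a run of eight or more 'X', 'x', or '-' characters.
REDACTED_X_PATTERN = re.compile(r"^[Xx-]{8,}$")


def looks_star_redacted(value: str) -> bool:
    """First two characters kept; every later non-whitespace character is '*'."""
    tail = value[2:]
    return len(value) >= 3 and "*" in tail and all(c == "*" or c.isspace() for c in tail)


def looks_redacted(value: Any) -> bool:
    """Non-strings are never treated as redacted."""
    if not isinstance(value, str):
        return False
    return looks_star_redacted(value) or bool(REDACTED_X_PATTERN.fullmatch(value))


# Hypothetical sample values mapped to the expected verdict.
SAMPLES = {
    "Jo** *****": True,                            # star-redacted, whitespace preserved
    "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX": True,  # X-masked, UUID-shaped
    "report.pdf": False,                           # ordinary string
    "ab": False,                                   # too short to carry a redacted tail
    "a*": False,                                   # still shorter than three characters
    12345: False,                                  # not a string
}

for sample, expected in SAMPLES.items():
    assert looks_redacted(sample) is expected, sample
```

One trade-off follows directly from the pattern: a genuine value made up only of eight or more `X`, `x`, or `-` characters would also be classified as redacted.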
+def _log_replacement(original: Any, replacement: Any, info: ValidationInfo) -> None:
+    LOGGER.warning(
+        "Demo value %s detected in %s; replaced with %s",
+        original,
+        _field_name(info),
+        replacement,
+    )
+
+
+def _demo_bool_or_passthrough(
+    value: Any, info: ValidationInfo, *, replacement: bool
+) -> Any:
+    if value is None or isinstance(value, bool):
+        return value
+    if _looks_like_demo_redaction(value):
+        # Intentionally clamp demo-redacted bool-like strings to a configured
+        # constant. The goal is parseability without restoring potentially
+        # meaningful signal that demo mode is designed to obscure.
+        _log_replacement(value, replacement, info)
+        return replacement
+    return value
+
+
+def demo_bool_false_or_passthrough(value: Any, info: ValidationInfo) -> Any:
+    return _demo_bool_or_passthrough(value, info, replacement=False)
+
+
+def demo_bool_true_or_passthrough(value: Any, info: ValidationInfo) -> Any:
+    return _demo_bool_or_passthrough(value, info, replacement=True)
+
+
+def demo_file_id_or_passthrough(value: Any, info: ValidationInfo) -> Any:
+    if value is None:
+        return value
+    if _looks_like_demo_redaction(value):
+        replacement = _DEMO_UUID
+        _log_replacement(value, replacement, info)
+        return replacement
+    return value
+
+
+def demo_int_or_passthrough(value: Any, info: ValidationInfo) -> Any:
+    if value is None or isinstance(value, int):
+        return value
+    if _looks_like_demo_redaction(value):
+        replacement = 0
+        _log_replacement(value, replacement, info)
+        return replacement
+    return value
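To see the clamp-and-log flow end to end without installing pydantic, the sketch below re-implements `demo_int_or_passthrough` in condensed form and calls it directly. It rests on several assumptions: `SimpleNamespace` stands in for pydantic's `ValidationInfo` (only `field_name` is read), `page_count` is a hypothetical field name, and `ListHandler` is a helper invented for this example. How the real validators are attached to models is not shown in this diff.

```python
import logging
import re
from types import SimpleNamespace
from typing import Any

LOGGER = logging.getLogger("pdfrest.models")
_REDACTED_X = re.compile(r"^[Xx-]{8,}$")


class ListHandler(logging.Handler):
    """Hypothetical helper: collects formatted log messages for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def looks_redacted(value: Any) -> bool:
    # Condensed form of the detection helpers in the diff.
    if not isinstance(value, str):
        return False
    tail = value[2:]
    starred = len(value) >= 3 and "*" in tail and all(c == "*" or c.isspace() for c in tail)
    return starred or bool(_REDACTED_X.fullmatch(value))


def demo_int_or_passthrough(value: Any, info: Any) -> Any:
    # Same flow as the validator in the diff: None and real ints pass through,
    # redacted placeholders become 0, and anything else is returned unchanged
    # so that ordinary validation can accept or reject it.
    if value is None or isinstance(value, int):
        return value
    if looks_redacted(value):
        LOGGER.warning(
            "Demo value %s detected in %s; replaced with %s",
            value,
            info.field_name or "<unknown>",
            0,
        )
        return 0
    return value


handler = ListHandler()
LOGGER.addHandler(handler)

# Stand-in for pydantic's ValidationInfo; "page_count" is a hypothetical field name.
info = SimpleNamespace(field_name="page_count")

clamped = demo_int_or_passthrough("12**", info)     # redacted placeholder -> 0
untouched = demo_int_or_passthrough(7, info)        # real int passes through
unparsed = demo_int_or_passthrough("seven", info)   # not redacted, returned as-is
```

Only the redacted input produces a warning. The other two calls return their argument unchanged, which leaves the decision about `"seven"` to whatever validation runs afterwards.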