Release v0.31.0 · manykarim/robotframework-doctestlibrary

v0.31.0 Release Notes

New Features

Character Replacements for Text Normalization (#130)

Added support for custom character replacements to normalize special Unicode characters before text comparison. This solves issues where PDFs contain visually identical but technically different characters (e.g., non-breaking spaces \u00A0 vs regular spaces).
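
The idea can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `normalize_text` and its signature are hypothetical, and the sketch assumes replacements are plain substring substitutions applied to both texts before they are compared.

```python
def normalize_text(text, replacements=None):
    """Return `text` with every key of `replacements` substituted by its value.

    Hypothetical helper for illustration only; not part of the DocTest API.
    """
    if not replacements:
        return text
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


reference = "Total:\u00a0100 EUR"  # contains a non-breaking space (U+00A0)
candidate = "Total: 100 EUR"       # contains a regular space (U+0020)

# The strings look identical when rendered, but a raw comparison differs.
raw_equal = reference == candidate

# After mapping U+00A0 to a regular space on both sides, they compare equal.
replacements = {"\u00a0": " "}
normalized_equal = (
    normalize_text(reference, replacements)
    == normalize_text(candidate, replacements)
)
```

Here `raw_equal` is `False` and `normalized_equal` is `True`, which is the kind of false difference the new parameter is meant to suppress.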

PdfTest Library:

- New character_replacements initialization parameter, applied to all keywords
- Keyword-level character_replacements parameter for Compare Pdf Documents, Compare Pdf Structure, PDF Should Contain Strings, and PDF Should Not Contain Strings

*** Settings ***
Library    DocTest.PdfTest    character_replacements={'\u00A0': ' '}

*** Test Cases ***
Compare with normalized whitespace
    Compare Pdf Documents    ref.pdf    cand.pdf    character_replacements={'\u00A0': ' '}

VisualTest Library:

- New character_replacements initialization parameter
- New Set Character Replacements keyword for runtime configuration
- Replacements are applied by the Get Text, Get Text From Document, and Get Text From Area keywords

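*** Settings ***
Library    DocTest.VisualTest

*** Test Cases ***
Get text with normalized characters
    # Map non-breaking spaces to regular spaces for subsequent text extraction
    Set Character Replacements    {'\u00A0': ' '}
    ${text}=    Get Text    document.pdf
    # Reset the replacements after use
    Set Character Replacements    ${NONE}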

Ignore Page Boundaries in PDF Structure Comparison (#129)

Added options to compare PDF text content while ignoring page structure differences. This is useful when font or size changes cause text to reflow across pages differently.

New Parameters:

| Parameter | Default | Description |
|---|---|---|
| `ignore_page_boundaries` | `${False}` | Flatten text across all pages and compare only content and order |
| `check_geometry` | `${True}` | When `${False}`, skip line position/size comparison |
| `check_block_count` | `${True}` | When `${False}`, skip block count validation per page |

*** Settings ***
Library    DocTest.PdfTest

*** Test Cases ***
Compare PDFs ignoring page breaks
    Compare Pdf Structure    reference.pdf    candidate.pdf
    ...    ignore_page_boundaries=${True}

Compare content only (ignore positions)
    Compare Pdf Structure    reference.pdf    candidate.pdf
    ...    check_geometry=${False}    check_block_count=${False}
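
The effect of ignoring page boundaries can be sketched in Python. This is an illustration under stated assumptions, not the library's code: each document is modeled as a list of pages, each page as a list of text lines, and the function names are hypothetical. It also assumes reflow moves whole lines between pages without changing the lines themselves.

```python
def flatten_pages(pages):
    """Concatenate the lines of all pages into one list, keeping their order."""
    return [line for page in pages for line in page]


def compare_per_page(ref_pages, cand_pages):
    """Page-aware comparison: page count and every page's lines must match."""
    return ref_pages == cand_pages


def compare_flattened(ref_pages, cand_pages):
    """Page-agnostic comparison: only line content and order must match."""
    return flatten_pages(ref_pages) == flatten_pages(cand_pages)


# Same three lines; in the candidate a larger font pushes "Line C" to page 2.
reference = [["Line A", "Line B", "Line C"]]
candidate = [["Line A", "Line B"], ["Line C"]]

page_aware_match = compare_per_page(reference, candidate)
flattened_match = compare_flattened(reference, candidate)
```

In this sketch `page_aware_match` is `False` while `flattened_match` is `True`: the content and order are unchanged and only the page break moved.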

Improvements

- Improved LLM prompt quality for more consistent AI-assisted comparisons
- Reduced test flakiness in LLM-related tests
- Added comprehensive unit tests for text normalization and PDF structure comparison
- Added acceptance tests for character replacement functionality

Internal Changes

- Extended StructureExtractionConfig dataclass with character_replacements field
- Added apply_character_replacements() function to TextNormalization.py
- Added compare_document_text_only() function to PdfStructureComparator.py
- Extended StructureComparisonResult with difference counting
