Skip to content

Conversation

@NSoiffer
Copy link
Contributor

@NSoiffer NSoiffer commented Jul 21, 2025

Link to issue number:

Fixes #18511

Summary of the issue:

If the structure tree is used for MathML in PDF, calls to getValue return the interpreted &gt; so that it becomes < and we have an HTML syntax error.

Description of user facing changes:

The bug is fixed.

Description of developer facing changes:

None.

Description of development approach:

Called html.escape() for the result of the getValue (when it was not "none")

Testing strategy:

I used the pdf doc that was part of the issue and tested to make sure I saw the error before the fix and that there was no error after the fix.b

Known issues with pull request:

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • x] API is compatible with existing add-ons.
  • Security precautions taken.

@coderabbitai summary

NSoiffer added 4 commits July 19, 2025 14:32
…ce`. It isn't used in speech (although maybe it should trigger a pause if wide), but it is used in some braille notations as a signal that this is a "fill in the blank" space.

I also added the elementary math attributes used in MathPlayer. Neither MathCAT nor Access8Math currently support the elementary math notations, but it is on the list of things to implement for MathCAT.

Note: potentially this could go into the beta, but I can't get the beta to build on my machine so I can't test the fix there.
@NSoiffer NSoiffer requested a review from a team as a code owner July 21, 2025 04:57
@NSoiffer NSoiffer requested a review from seanbudd July 21, 2025 04:57
@seanbudd
Copy link
Member

This seems to have code changes from #18508

Could you clarify if that PR should be closed in favour of this one? There will be merge conflicts when we squash merge these.

@seanbudd seanbudd changed the title Bug 18511 -- unescaped "<" in MathML in PDF Fix unescaped "<" in MathML in PDF Jul 22, 2025
@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Jul 22, 2025
@NSoiffer
Copy link
Contributor Author

My apologies. I think I forgot to checkout master before creating the new branch. They are separate bugs, but both deal with PDF math and are in proximity to each other. Hence, they are slightly related. Please do whatever is simplest.

@seanbudd
Copy link
Member

@NSoiffer - it would probably be better to remove the duplication to make it easier to track and revert the separate changes

@seanbudd seanbudd marked this pull request as draft July 31, 2025 05:55
@seanbudd seanbudd added this to the 2025.3 milestone Jul 31, 2025
@seanbudd seanbudd removed this from the 2025.3 milestone Aug 13, 2025
@seanbudd seanbudd marked this pull request as ready for review September 17, 2025 00:03
Copilot AI review requested due to automatic review settings September 17, 2025 00:03
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an HTML syntax error in MathML PDF content where unescaped "<" characters were causing parsing issues. The fix ensures proper HTML escaping of values retrieved from PDF structure trees when generating MathML content.

  • Added HTML escaping to node values in MathML generation
  • Updated parameter passing for html.escape() to use explicit keyword argument

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
user_docs/en/changes.md Added changelog entry documenting the MathML escaping fix
source/NVDAObjects/IAccessible/adobeAcrobat.py Applied HTML escaping to PDF node values and updated parameter syntax

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@seanbudd seanbudd enabled auto-merge (squash) September 17, 2025 00:04
@seanbudd seanbudd merged commit db11e28 into nvaccess:master Sep 17, 2025
40 checks passed
@github-actions github-actions bot added this to the 2026.1 milestone Sep 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Text content is not correctly linearised as XML for MathML reading

2 participants