Correctly decode mathml bytes buffer to unicode string from mathType #10803

Merged
merged 6 commits into from Mar 2, 2020

michaelDCurran
Member

@michaelDCurran michaelDCurran commented Feb 19, 2020

Link to issue number:

None.

Summary of the issue:

In NVDA 2019.3 it is no longer possible to read or interact with math in Microsoft Word. We fetch the math from Microsoft Word using MathType. However, MathType returns the MathML as UTF-8 encoded bytes, not a Unicode string, so when we pass it to our XML handling code, an error is raised because the input is not Unicode.

Description of how this pull request fixes the issue:

Decode the byte string from MathType into a Unicode string (assuming UTF-8) before passing it to the XML handling code.
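A minimal sketch of the idea in Python, not the actual change in this pull request: the helper name `decodeMathMl` and the sample MathML are assumptions made up for illustration, and the bytes are assumed to be UTF-8.

```python
def decodeMathMl(rawMathMl, encoding="utf-8"):
    """Return MathML as a str, decoding it first if it arrives as bytes.

    Hypothetical helper for illustration only; the encoding default
    reflects the UTF-8 assumption, not documented MathType behaviour.
    """
    if isinstance(rawMathMl, bytes):
        return rawMathMl.decode(encoding)
    return rawMathMl


# Sample bytes standing in for what MathType might return.
sampleBytes = "<math><mi>π</mi></math>".encode("utf-8")
mathMl = decodeMathMl(sampleBytes)
```

After this step `mathMl` is a `str`, which is what text-based XML handling code expects in Python 3.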

Testing performed:

Tested NVDA reading all math equations in this document: Sample Expressions to Navigate.docx

Known issues with pull request:

I can't find specific documentation for MathType stating that the string is really encoded as UTF-8. But since the default encoding for XML (and therefore MathML) is UTF-8, I think this is a safe assumption. Of course, in 2019.2.1 and below, as we did not specify an encoding, it would have assumed mbcs.
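To illustrate why the choice of encoding matters, here is a small sketch with made-up sample data. Python's `mbcs` codec is only available on Windows and follows the system ANSI code page, so `cp1252` is used here as a stand-in; that substitution is an assumption for the sake of a portable example.

```python
# Sample MathML fragment containing a non-ASCII character, encoded as UTF-8.
raw = "<mi>π</mi>".encode("utf-8")

# Decoding with the matching encoding recovers the original text.
asUtf8 = raw.decode("utf-8")

# Decoding the same bytes with a single-byte ANSI code page
# (cp1252 as a stand-in for mbcs) produces mojibake instead.
asAnsi = raw.decode("cp1252")

# Pure ASCII content decodes identically either way, which is why
# equations with only ASCII characters would not expose a wrong encoding.
asciiOnly = b"<mi>x</mi>"
sameForAscii = asciiOnly.decode("utf-8") == asciiOnly.decode("cp1252")
```

Here `asUtf8` contains `π`, while `asAnsi` contains `Ï€` in its place, so an equation with non-ASCII symbols is a reasonable way to check the UTF-8 assumption.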

Change log entry:

Bug fixes:

  • NVDA can again read and interact with math equations in Microsoft Word.

@michaelDCurran michaelDCurran requested a review from leonardder Feb 19, 2020

@AppVeyorBot AppVeyorBot commented Feb 19, 2020

See test results for failed build of commit 39daeafc7a

@michaelDCurran michaelDCurran requested a review from josephsl Feb 20, 2020
Collaborator

@leonardder leonardder left a comment

Just one style thing.

Could we somehow verify the UTF-8 assumption by creating equations that contain Unicode characters?

@@ -1,8 +1,8 @@
#appModules/winword.py
Collaborator

@leonardder leonardder Feb 21, 2020

Could you revisit the full header and add appropriate spaces after the hashes?

@michaelDCurran michaelDCurran merged commit 42be101 into master Mar 2, 2020
1 check passed
@michaelDCurran michaelDCurran deleted the decodeMathTypeStrings branch Mar 2, 2020
@nvaccessAuto nvaccessAuto added this to the 2020.1 milestone Mar 2, 2020
michaelDCurran added a commit that referenced this issue Mar 2, 2020