Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Shared] Fix Markdown parse issue #4481

Merged
merged 9 commits into from
Jul 31, 2020
Merged

[Shared] Fix Markdown parse issue #4481

merged 9 commits into from
Jul 31, 2020

Conversation

golddove
Copy link
Member

@golddove golddove commented Jul 23, 2020

Related Issue

Fixes #4463

Description

Markdown emphasis parsing for was broken on Android devices.

How Verified

Manual testing

Microsoft Reviewers: Open in CodeFlow

@@ -263,7 +263,7 @@ bool EmphasisParser::IsLeftEmphasisDelimiter(const char ch) const

bool EmphasisParser::IsRightEmphasisDelimiter(const char ch) const
{
if ((ch == EOF || MarkDownBlockParser::IsSpace(ch)) && (m_lookBehind != DelimiterType::WhiteSpace) &&
if ((ch == EOF || ch == '\xff' || MarkDownBlockParser::IsSpace(ch)) && (m_lookBehind != DelimiterType::WhiteSpace) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed for Android? and why is it needed now?

@shalinijoshi19
Copy link
Member

This looks to be a gap in our testing; Can we add a test card for this at the very least with this change? Is it possible to cover this with existing sharedmodel unit tests? @RebeccaAnne

@@ -263,7 +263,7 @@ bool EmphasisParser::IsLeftEmphasisDelimiter(const char ch) const

bool EmphasisParser::IsRightEmphasisDelimiter(const char ch) const
{
if ((ch == EOF || MarkDownBlockParser::IsSpace(ch)) && (m_lookBehind != DelimiterType::WhiteSpace) &&
if ((ch == EOF || ch == '\xff' || MarkDownBlockParser::IsSpace(ch)) && (m_lookBehind != DelimiterType::WhiteSpace) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now a complicated enough condition to warrant a quick explanatory code comment pls!

@shalinijoshi19
Copy link
Member

void MarkDownBlockParser::ParseBlock(std::stringstream& stream)

The Markdown parser looks like it should have unit tests for this? @RebeccaAnne, does it?


Refers to: source/shared/cpp/ObjectModel/MarkDownBlockParser.cpp:10 in f6f5cac. [](commit_id = f6f5cac, deletion_comment = False)

@almedina-ms
Copy link
Contributor

This looks to be a gap in our testing; Can we add a test card for this at the very least with this change? Is it possible to cover this with existing sharedmodel unit tests? @RebeccaAnne

This is not a gap in testing, this is an issue about the device it's been deployed, in emulators everything works as expected while in devices it doesn't. As of this moment there are tests in the c++ side that verify that everything works as expected, we could add tests on Android but the tests would still yield "false positives" on emulators

@golddove
Copy link
Member Author

Yeah, the unit tests didn't catch it because it is non-deterministic (it happens on some devices, based on the c++ implementation).

@jwoo-msft The bug isn't actually new, it seems to have been introduced as part of #3342 last year. But we recently brought that change from main to release with #4309. Hence, Teams is now seeing it.

As for why it works on some platforms:

C++ standard doesn't define whether 'char' is signed or unsigned. It's up to the compiler. The compiler usually makes this decision based on what the target device architecture prefers.

We can verify this by looking at CHAR_MIN (which is 0 if char is unsigned char, and -128 if its signed). This is what I see if I debug in the emulator:
image
And when I debug on my device:
image

The issue occurs if the compiler decided 'char' is an unsigned char. So EOF (which is -1) assigned to unsigned char (so it becomes 255), no longer equals EOF (which is still -1)... more details here.

This was just a draft PR, but I think the proper solution is to revert ch to int (unless there was a specific reason for that change?), or perhaps switching char to explicit signed char would work (but I think we would lose access to special characters if we do that? I'm not sure..). Thoughts? @paulcam206

@jwoo-msft
Copy link
Member

jwoo-msft commented Jul 24, 2020

Yeah, the unit tests didn't catch it because it is non-deterministic (it happens on some devices, based on the c++ implementation).

@jwoo-msft The bug isn't actually new, it seems to have been introduced as part of #3342 last year. But we recently brought that change from main to release with #4309. Hence, Teams is now seeing it.

As for why it works on some platforms:

C++ standard doesn't define whether 'char' is signed or unsigned. It's up to the compiler. The compiler usually makes this decision based on what the target device architecture prefers.

We can verify this by looking at CHAR_MIN (which is 0 if char is unsigned char, and -128 if its signed). This is what I see if I debug in the emulator:
image
And when I debug on my device:
image

The issue occurs if the compiler decided 'char' is an unsigned char. So EOF (which is -1) assigned to unsigned char (so it becomes 255), no longer equals EOF (which is still -1)... more details here.

This was just a draft PR, but I think the proper solution is to revert ch to int (unless there was a specific reason for that change?), or perhaps switching char to explicit signed char would work (but I think we would lose access to special characters if we do that? I'm not sure..). Thoughts? @paulcam206

I see. Yes, #3342 added support for non-Latin language support, and as I added new markdown list marker support '+', and '*', I noticed that #3342 was not picked up, and it's an important fix, so I added it.
@shalinijoshi19, I think we should have a new Github issue tracking a new lexer that supports non-Latin language. I suggest we go with ANTLR for the lexer work as we will likely use ANTLR for the shared model template. as we only want to support a subset of markdown, and ANTLR can't generate a parser for markdown language and almost all of the issues we are seeing with the markdown happens at the tokens and not the grammar, it would be beneficial to spend 2-3 days of dev works on a lexer generated by ANTLR based on the token rule that's already verified works with non-Latin and control characters.

@ghost ghost added the no-recent-activity label Jul 29, 2020
@ghost ghost assigned almedina-ms Jul 29, 2020
@ghost
Copy link

ghost commented Jul 29, 2020

Hi @golddove. This non-spec pull request has had no recent activity for the past 5 days . Please take the necessary actions (review, address feedback or commit if reviewed already) to move this along.

@paulcam206
Copy link
Member

I think switching to int here is likely the correct answer :)

@ghost ghost removed the no-recent-activity label Jul 30, 2020
@ghost
Copy link

ghost commented Jul 30, 2020

Hi @paulcam206; Thanks for commenting on this previously stale pull request. Resetting staleness. @golddove FYI.

@jwoo-msft
Copy link
Member

jwoo-msft commented Jul 31, 2020

fixed compiler warnings and updated IsAlphaNum function to be able to use with int;

@golddove golddove marked this pull request as ready for review July 31, 2020 03:05
@shalinijoshi19
Copy link
Member

shalinijoshi19 commented Jul 31, 2020

Great find guys!
Wrt test coverage, question: If markdown parsing is broken in general would we know about it today or whats a good test suite that I can run if I were to make changes to it revamp the MD parser? What test suite can i run today to at least know it's not a general/widespread issue? (I understand this specidic issue is different etc)..

@shalinijoshi19
Copy link
Member

@golddove /@paulcam206, this looks like it would need to be ported back to 1.2?

@@ -739,19 +739,19 @@ namespace AdaptiveCardsSharedModelUnitTest
MarkDownParser parser("");
Assert::AreEqual<bool>(false, parser.IsEscaped());

parser.TransformToHtml();
(void) parser.TransformToHtml();
Assert::AreEqual<bool>(false, parser.IsEscaped());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool these are the tests I was looking for then! Tx

@golddove golddove merged commit f3d2f03 into main Jul 31, 2020
@golddove golddove deleted the golddove/4463 branch July 31, 2020 18:37
@paulcam206 paulcam206 added this to the 20.07 milestone Jul 31, 2020
golddove added a commit that referenced this pull request Jul 31, 2020
* Fix Markdown parse issue

* Revert char casts

* Fixed compilation errors & updated IsAlphaNumeric function & fixed
existing compilation warnings

* added EOF check in IsAlphaNum()

* removed casting

Co-authored-by: nesalang <nesalang@gmail.com>
golddove added a commit that referenced this pull request Jul 31, 2020
* Cherry-pick of #4481

Co-authored-by: nesalang <nesalang@gmail.com>
rankush pushed a commit to rankush/AdaptiveCards that referenced this pull request May 8, 2024
* Fix Markdown parse issue

* Revert char casts

* Fixed compilation errors & updated IsAlphaNumeric function & fixed
existing compilation warnings

* added EOF check in IsAlphaNum()

* removed casting

Co-authored-by: nesalang <nesalang@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Android] Text with markdown is broken in most of the components
5 participants