[Shared] Fix Markdown parse issue #4481

golddove · 2020-07-23T22:28:47Z

Related Issue

Fixes #4463

Description

Markdown emphasis parsing was broken on Android devices.

How Verified

Manual testing

jwoo-msft · 2020-07-24T01:09:41Z

source/shared/cpp/ObjectModel/MarkDownBlockParser.cpp

@@ -263,7 +263,7 @@ bool EmphasisParser::IsLeftEmphasisDelimiter(const char ch) const

 bool EmphasisParser::IsRightEmphasisDelimiter(const char ch) const
 {
-    if ((ch == EOF || MarkDownBlockParser::IsSpace(ch)) && (m_lookBehind != DelimiterType::WhiteSpace) &&
+    if ((ch == EOF || ch == '\xff' || MarkDownBlockParser::IsSpace(ch)) && (m_lookBehind != DelimiterType::WhiteSpace) &&


Why is this needed for Android, and why is it needed now?

shalinijoshi19 · 2020-07-24T16:46:30Z

This looks to be a gap in our testing. Can we add a test card for this at the very least with this change? Is it possible to cover this with the existing shared model unit tests? @RebeccaAnne

shalinijoshi19 · 2020-07-24T16:47:15Z

source/shared/cpp/ObjectModel/MarkDownBlockParser.cpp

@@ -263,7 +263,7 @@ bool EmphasisParser::IsLeftEmphasisDelimiter(const char ch) const

 bool EmphasisParser::IsRightEmphasisDelimiter(const char ch) const
 {
-    if ((ch == EOF || MarkDownBlockParser::IsSpace(ch)) && (m_lookBehind != DelimiterType::WhiteSpace) &&
+    if ((ch == EOF || ch == '\xff' || MarkDownBlockParser::IsSpace(ch)) && (m_lookBehind != DelimiterType::WhiteSpace) &&


I think this is now a complicated enough condition to warrant a quick explanatory code comment pls!

shalinijoshi19 · 2020-07-24T16:48:14Z

void MarkDownBlockParser::ParseBlock(std::stringstream& stream)

The Markdown parser looks like it should have unit tests for this? @RebeccaAnne, does it?

Refers to: source/shared/cpp/ObjectModel/MarkDownBlockParser.cpp:10 in f6f5cac.

almedina-ms · 2020-07-24T17:19:53Z

> This looks to be a gap in our testing; Can we add a test card for this at the very least with this change? Is it possible to cover this with existing sharedmodel unit tests? @RebeccaAnne

This is not a gap in testing; the issue depends on the device the code is deployed to. On emulators everything works as expected, while on physical devices it doesn't. There are already tests on the C++ side that verify that everything works as expected. We could add tests on Android, but those tests would still yield "false positives" on emulators.

golddove · 2020-07-24T18:29:40Z

Yeah, the unit tests didn't catch it because the behavior is platform-dependent (it happens only on some devices, depending on the C++ implementation).

@jwoo-msft The bug isn't actually new; it seems to have been introduced as part of #3342 last year. But we recently brought that change from main to release with #4309, so Teams is now seeing it.

As for why it works on some platforms:

The C++ standard doesn't specify whether plain 'char' is signed or unsigned; the choice is implementation-defined. The compiler usually makes this decision based on what the target device architecture prefers.

We can verify this by looking at CHAR_MIN (which is 0 if char is unsigned, and -128 if it's signed). This is what I see if I debug in the emulator:

And when I debug on my device:

The issue occurs when the compiler treats 'char' as unsigned. EOF (which is -1) assigned to an unsigned char becomes 255, and when that value is promoted back to int for the comparison it no longer equals EOF (which is still -1)... more details here.
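The mismatch described above can be reproduced on any machine by spelling out the signedness instead of relying on plain 'char'. This is only a minimal sketch, not code from the PR: `StillEqualsEof`, `PlainCharIsSigned` and `CheckEofRoundTrip` are hypothetical names, and it assumes the usual 8-bit char with EOF defined as -1.

```cpp
#include <cassert>
#include <cstdio>   // EOF
#include <limits>

// Hypothetical helper (not from the PR): store `value` in a CharT, as the
// parser did when `ch` was declared `char`, then compare it with EOF again.
template <typename CharT>
bool StillEqualsEof(int value)
{
    const CharT stored = static_cast<CharT>(value); // wraps to 255 for unsigned char
    return static_cast<int>(stored) == EOF;         // comparison happens as int
}

// Reports which behaviour plain `char` has on this target
// (equivalent to checking whether CHAR_MIN is negative).
bool PlainCharIsSigned()
{
    return std::numeric_limits<char>::is_signed;
}

// Checks that hold on any platform where EOF is -1 (assumed here).
void CheckEofRoundTrip()
{
    assert(StillEqualsEof<signed char>(EOF));    // -1 survives the round trip
    assert(!StillEqualsEof<unsigned char>(EOF)); // 255 is never equal to -1
    assert(StillEqualsEof<char>(EOF) == PlainCharIsSigned());
}
```

On a target where plain 'char' is unsigned, the last check shows why a `ch == EOF` test on a `char` parameter can never succeed there.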

This is just a draft PR, but I think the proper solution is to revert ch to int (unless there was a specific reason for that change?). Alternatively, switching char to an explicit signed char might work, but I think we would lose access to special characters if we did that; I'm not sure. Thoughts? @paulcam206
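To illustrate the "keep it an int" option, here is a small sketch under stated assumptions. It is not the parser's code: `IsEndOrSpace` and `LengthOfFirstWord` are hypothetical helpers. The only library facts it relies on are that `std::stringstream::peek()` and `get()` return an `int`, with real bytes mapped to 0..255 and end of input reported as EOF.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>   // EOF
#include <sstream>
#include <string>

// Hypothetical helper: true at end of input or on a whitespace byte.
// `ch` stays an int, so byte 0xFF (255) and EOF (-1) remain distinct.
bool IsEndOrSpace(int ch)
{
    return ch == EOF || (ch >= 0 && ch <= 0xFF && std::isspace(ch) != 0);
}

// Hypothetical helper: number of bytes before the first space or end of input.
std::size_t LengthOfFirstWord(const std::string& text)
{
    std::stringstream stream(text);
    std::size_t length = 0;
    while (!IsEndOrSpace(stream.peek())) // peek() returns int, never narrowed to char
    {
        stream.get();
        ++length;
    }
    return length;
}
```

Because the value is never narrowed to 'char', the check gives the same answer whether the target treats plain 'char' as signed or unsigned, and a literal 0xFF byte in the input is not mistaken for end of input.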

jwoo-msft · 2020-07-24T18:48:50Z

I see. Yes, #3342 added support for non-Latin languages. When I added support for the new Markdown list markers '+' and '*', I noticed that #3342 had not been picked up, and since it's an important fix, I added it.
@shalinijoshi19, I think we should open a new GitHub issue to track a new lexer that supports non-Latin languages. I suggest we go with ANTLR for the lexer work, as we will likely use ANTLR for the shared model template. We only want to support a subset of Markdown, ANTLR can't generate a parser for the Markdown language, and almost all of the Markdown issues we are seeing happen at the token level rather than in the grammar. It would therefore be beneficial to spend 2-3 days of dev work on an ANTLR-generated lexer based on token rules that are already verified to work with non-Latin and control characters.

ghost · 2020-07-29T20:00:41Z

Hi @golddove. This non-spec pull request has had no recent activity for the past 5 days. Please take the necessary actions (review, address feedback or commit if reviewed already) to move this along.

paulcam206 · 2020-07-30T20:40:07Z

I think switching to int here is likely the correct answer :)

ghost · 2020-07-30T20:40:11Z

Hi @paulcam206; Thanks for commenting on this previously stale pull request. Resetting staleness. @golddove FYI.

jwoo-msft · 2020-07-31T01:58:23Z

Fixed compiler warnings and updated the IsAlphaNum function so that it can be used with int.
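For illustration only, an int-based alphanumeric check with an explicit EOF guard could look like the sketch below. The name `IsAsciiAlphaNum` and its body are hypothetical, and the function in the PR may differ. The explicit range checks sidestep `std::isalnum`, whose behaviour is undefined for arguments that are neither EOF nor representable as unsigned char.

```cpp
#include <cassert>
#include <cstdio>   // EOF

// Hypothetical sketch: ASCII-only alphanumeric test that accepts the int
// returned by stream.get()/peek(), including EOF.
bool IsAsciiAlphaNum(int ch)
{
    if (ch == EOF)
    {
        return false; // end of input is never alphanumeric
    }
    return (ch >= '0' && ch <= '9') ||
           (ch >= 'A' && ch <= 'Z') ||
           (ch >= 'a' && ch <= 'z');
}
```

Bytes above 0x7F (for example the UTF-8 bytes of non-Latin text) fall through to `false` here, which is an assumption of this sketch rather than a statement about the parser's behaviour.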

shalinijoshi19 · 2020-07-31T18:23:19Z

Great find, guys!
Regarding test coverage, a question: if Markdown parsing were broken in general, would we know about it today? What's a good test suite to run if I were to make changes to, or revamp, the MD parser? What test suite can I run today to at least know it's not a general/widespread issue? (I understand this specific issue is different, etc.)

shalinijoshi19 · 2020-07-31T18:24:53Z

@golddove / @paulcam206, this looks like it would need to be ported back to 1.2?

shalinijoshi19 · 2020-07-31T18:26:41Z

...ce/shared/cpp/AdaptiveCardsSharedModel/AdaptiveCardsSharedModelUnitTest/MarkDownUnitTest.cpp

@@ -739,19 +739,19 @@ namespace AdaptiveCardsSharedModelUnitTest
            MarkDownParser parser("");
            Assert::AreEqual<bool>(false, parser.IsEscaped());

-            parser.TransformToHtml();
+            (void) parser.TransformToHtml();
            Assert::AreEqual<bool>(false, parser.IsEscaped());


Cool, these are the tests I was looking for then! Thanks.

* Fix Markdown parse issue * Revert char casts * Fixed compilation errors & updated IsAlphaNumeric function & fixed existing compilation warnings * added EOF check in IsAlphaNum() * removed casting Co-authored-by: nesalang <nesalang@gmail.com>

* Cherry-pick of #4481 Co-authored-by: nesalang <nesalang@gmail.com>

Fix Markdown parse issue

f6f5cac

golddove requested review from jwoo-msft, RebeccaAnne, paulcam206 and almedina-ms July 23, 2020 22:28

jwoo-msft reviewed Jul 24, 2020

shalinijoshi19 reviewed Jul 24, 2020

Merge branch 'main' into golddove/4463

b65ff79

ghost added the no-recent-activity label Jul 29, 2020

ghost assigned almedina-ms Jul 29, 2020

ghost removed the no-recent-activity label Jul 30, 2020

golddove and others added 3 commits July 30, 2020 19:00

Revert char casts

e1c1672

Merge branch 'main' into golddove/4463

34f4e6d

Fixed compilation errors & updated IsAlphaNumeric function & fixed existing compilation warnings

e92b3c5

added EOF check in IsAlphaNum()

1914dd9

golddove marked this pull request as ready for review July 31, 2020 03:05

jwoo-msft and others added 2 commits July 30, 2020 20:20

removed casting

127abcc

Merge branch 'main' into golddove/4463

d11bc3e

almedina-ms approved these changes Jul 31, 2020

Merge branch 'main' into golddove/4463

eda5610

shalinijoshi19 approved these changes Jul 31, 2020

golddove merged commit f3d2f03 into main Jul 31, 2020

golddove deleted the golddove/4463 branch July 31, 2020 18:37

paulcam206 added this to the 20.07 milestone Jul 31, 2020

paulcam206 added the AdaptiveCards v1.2.11 label Jul 31, 2020

golddove mentioned this pull request Jul 31, 2020

[Shared] Fix Markdown parse issue #4508

Merged

golddove added a commit that referenced this pull request Jul 31, 2020

[Shared] Fix Markdown parse issue (#4508)

f305404

* Cherry-pick of #4481 Co-authored-by: nesalang <nesalang@gmail.com>

shalinijoshi19 modified the milestones: 20.07, 1.3 Schema Refresh Aug 4, 2020
