When converting to Markdown from HTML not all "content" that may look like MD is correctly escaped #312

Open
Chris-Marassovich opened this issue Aug 9, 2022 · 2 comments

Comments

@Chris-Marassovich

Hello and thanks for a great library.

I may be using this wrong, so please set me straight.
I get HTML content from an editor.
The user may have entered content that looks like Markdown, even though entering Markdown was not their intent.

When I do the following:

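// Default converter, no custom configuration.
var converter = new ReverseMarkdown.Converter();
// html_string_from_editor is the HTML string produced by the editor.
string html = html_string_from_editor;
string result = converter.Convert(html);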

In my testing, much of the content that looks like Markdown is escaped correctly, but not all of it.
Roughly half of the cases I tried were escaped.

Italics and bold are escaped nicely:
*Italic Text* entered into the editor comes out of Convert as \*Italic Text\*.

Some of the formats that are not escaped at all:
Headings such as # Heading 1
List items such as - Point 1

Am I using this wrong?
Is there a config setting?
Or is escaping a heading simply not supported?

This is quite a show stopper for me. I am using Xamarin Forms, and the only editor I can use consumes and produces HTML, while I have chosen to store the text as Markdown, hence the conversion. My users have a lot of text that happens to be valid Markdown, but they are not aware of that and certainly do not expect the change in formatting.
'Why is my text bold and larger, and where did the # go?' is just one of the questions I am getting.

Thanks in advance.
Chris ...

@Chris-Marassovich
Author

Having had a wander through the source, I can see that the constructor in Converters/Text.cs adds entries to the _escapedKeyChars collection, and that these escape only bold and italics.

Is this the right spot?
Why are only bold and italics escaped?
Is the assumption that we would simply add some more logic here to escape the other Markdown formats?
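
To make the question concrete, here is a rough sketch of the kind of extra logic I have in mind. It is not taken from ReverseMarkdown: the class name MarkdownLineEscaper and the method EscapeLineStart are hypothetical, and it only handles markers at the start of a line (headings, - and + bullets, numbered items). It assumes it would run on the plain text of a text node before that text is written out as Markdown. Whether Converters/Text.cs is the right place to call it is something I have not verified.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of ReverseMarkdown. It backslash-escapes
// characters that would turn a plain-text line into a Markdown block.
public static class MarkdownLineEscaper
{
    // 1 to 6 '#' at the start of a line (up to 3 spaces of indent),
    // followed by whitespace or end of input: an ATX heading.
    private static readonly Regex Heading =
        new Regex(@"^( {0,3})(#{1,6})(?=\s|$)", RegexOptions.Multiline);

    // '-' or '+' at the start of a line followed by whitespace: a bullet.
    // '*' is left alone because the library already escapes it.
    private static readonly Regex Bullet =
        new Regex(@"^( {0,3})([-+])(?=\s)", RegexOptions.Multiline);

    // Digits followed by '.' or ')' and whitespace: a numbered list item.
    private static readonly Regex Ordered =
        new Regex(@"^( {0,3})(\d{1,9})([.)])(?=\s)", RegexOptions.Multiline);

    public static string EscapeLineStart(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        // In .NET replacement patterns only '$' is special, so the
        // backslash below is inserted literally before the marker.
        text = Heading.Replace(text, @"$1\$2");
        text = Bullet.Replace(text, @"$1\$2");
        text = Ordered.Replace(text, @"$1$2\$3");
        return text;
    }
}
```

For example, MarkdownLineEscaper.EscapeLineStart("# Heading 1") returns \# Heading 1 and EscapeLineStart("- Point 1") returns \- Point 1, while a # in the middle of a line is left untouched. Running this over the final converted output instead would also escape genuine headings and lists that came from real HTML tags, which is why the sketch assumes it is applied to text nodes only.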

cheers
Chris ...

@mysticmind
Owner

mysticmind commented Aug 31, 2022

@Chris-Marassovich I would appreciate it if you could share a few examples with the source HTML and the converted Markdown text, so that I get a better picture of the issue you are facing.
