mermaid extension erase all unicode in html output #2

Open
retsyo opened this issue Oct 19, 2020 · 5 comments

Comments

@retsyo

retsyo commented Oct 19, 2020

>>> import markdown
>>>
>>> md = markdown.Markdown( extensions=['md_mermaid'] )
>>> strMd = '# hello world'
>>> print(md.convert(strMd))
<h1>hello world</h1>
>>> strMd = 'a你好b'
>>> print(md.convert(strMd))
<p>ab</p>
>>> strMd = u'a你好b'
>>> print(md.convert(strMd))
<p>ab</p>
>>>
@orobardet

orobardet commented Oct 21, 2020

Same here, even with accented characters. It removes all non-ASCII characters from the whole Markdown file, not only from the mermaid code.
This looks like a significant regression introduced in commit 9f279f1 (I don't understand why that code was added). Since that commit is the only difference between versions 0.1.0 and 0.1.1, a workaround for now is to pin the package to version 0.1.0.

@gmat

gmat commented Jun 28, 2021

@oruelle Why is it useful to remove all characters except ASCII with strip_notprintable()? This breaks Unicode pages. Can you give us some cases where it is useful? Maybe strip_notprintable() should only be called under specific conditions, which are still to be defined.

Thanks
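
To illustrate the question, here is a minimal sketch of the behaviour reported in this thread. It assumes strip_notprintable() keeps only the characters listed in Python's `string.printable`, which is ASCII-only; the extension's real code is not shown in this issue, so both function names below are hypothetical stand-ins. The second function is one possible alternative that drops control characters but keeps printable Unicode.

```python
import string


def strip_notprintable_sketch(text):
    # Hypothetical stand-in for the extension's helper (assumption):
    # keep only characters in string.printable, which is ASCII-only.
    return "".join(ch for ch in text if ch in string.printable)


def strip_control_chars(text):
    # Possible alternative (not from the extension): drop non-printable
    # characters such as NUL or BEL, but keep tabs and printable Unicode.
    return "".join(ch for ch in text if ch.isprintable() or ch == "\t")


ascii_only = strip_notprintable_sketch("a你好b")
unicode_safe = strip_control_chars("a你好b\x00")
print(ascii_only)
print(unicode_safe)
```

Under that assumption, the first call reproduces the `<p>ab</p>` result from the original report, while the second leaves the Chinese characters in place.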

@mEDI-S

mEDI-S commented Apr 18, 2022

I have had a lot of problems with the current version too. As a fix, I apply strip_notprintable(line) only inside the re.match calls and leave the raw lines unmodified. This helps a lot in finding the correct start and end of a code block.

I removed all other strip_notprintable() calls
and added only:

m_start = MermaidRegex.match( strip_notprintable(line) )
m_end = re.match(r"^["+mermaid_sign+"]{3}[\ \t]*$", strip_notprintable(line) )

Hope this helps.
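
The idea above can be sketched as a self-contained loop: a cleaned copy of each line is used only to detect the fences, and the raw line is what gets kept. The regular expressions, `MERMAID_SIGN`, and `split_mermaid` are assumptions made for illustration; the extension's actual `MermaidRegex` and `mermaid_sign` may differ.

```python
import re
import string

# Assumed fence characters and patterns (hypothetical, not the extension's own).
MERMAID_SIGN = "~`"
START_RE = re.compile(r"^[" + MERMAID_SIGN + r"]{3}[ \t]*mermaid[ \t]*$", re.IGNORECASE)
END_RE = re.compile(r"^[" + MERMAID_SIGN + r"]{3}[ \t]*$")


def strip_notprintable(text):
    # Assumed behaviour: keep only ASCII characters from string.printable.
    return "".join(ch for ch in text if ch in string.printable)


def split_mermaid(lines):
    """Return (inside_mermaid, raw_line) pairs; fence lines are dropped."""
    out = []
    in_block = False
    for line in lines:
        probe = strip_notprintable(line)  # cleaned copy, used for matching only
        if not in_block and START_RE.match(probe):
            in_block = True
            continue
        if in_block and END_RE.match(probe):
            in_block = False
            continue
        out.append((in_block, line))  # the raw line is preserved untouched
    return out


sample = ["café", "\ufeff```mermaid", "graph TD", "  A[你好] --> B", "```", "done ✓"]
result = split_mermaid(sample)
for inside, raw in result:
    print(inside, repr(raw))
```

In this sketch a stray byte-order mark in front of the opening fence no longer prevents the match, and the non-ASCII text and indentation inside and outside the block survive.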

githubwua added a commit to githubwua/md_mermaid that referenced this issue Jan 18, 2023
Apply oruelle#2 (comment) to unfilter non-ascii characters such as Japanese or Emoji.
@rayalan

rayalan commented Sep 20, 2023

I'm running into a variant of this with the latest mindmap syntax, which relies heavily on the leading whitespace. The current line.strip() call wipes out the whitespace, turning

mindmap
  root
    topic
      subtopic

into

mindmap
root
topic
subtopic

which won't render.

As others before me have said, I'm not sure what problem we're trying to solve by stripping when we are inside of mermaid code -- why can't the mermaid code be sent as-is to the mermaid parser?
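
A small sketch of the difference, assuming the lines are processed one at a time as described: `str.strip()` removes the indentation that mindmap syntax depends on, while `str.rstrip()` removes only trailing whitespace. The variable names are illustrative and not taken from the extension.

```python
mindmap_src = "mindmap\n  root\n    topic\n      subtopic\n"

# strip() removes leading and trailing whitespace, flattening the hierarchy.
stripped = [line.strip() for line in mindmap_src.splitlines()]

# rstrip() removes only trailing whitespace, so the indentation survives.
preserved = [line.rstrip() for line in mindmap_src.splitlines()]

print("\n".join(stripped))
print("\n".join(preserved))
```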

@retsyo
Author

retsyo commented Dec 23, 2023

Same here, even with accented characters. It removes all non-ASCII characters from the whole Markdown file, not only from the mermaid code. This looks like a significant regression introduced in commit 9f279f1 (I don't understand why that code was added). Since that commit is the only difference between versions 0.1.0 and 0.1.1, a workaround for now is to pin the package to version 0.1.0.

This commit should be reverted, since it gives bad results.

retsyo added a commit to retsyo/md_mermaid that referenced this issue Dec 23, 2023
Fix a long-known bug in Unicode handling that has puzzled and affected some projects. Details are in oruelle#2. Why do I leave two `new_lines.append(line)` calls? Because in oruelle#2, @rayalan says his app depends on leading whitespace, so I can't judge exactly what we should do. However, let's fix this so that most projects run without problems again.