
Markdown Lexer doesn't recognize strong/emph at the very beginning of the text #1492

PowerSnail opened this issue Jul 7, 2020 · 3 comments




from pygments.lexers.markup import MarkdownLexer

l = MarkdownLexer()

print(list(l.get_tokens("*bolded text*")))
# OUT: [(Token.Text, '*bolded'), (Token.Text, ' '), (Token.Text, 'text*'), (Token.Text, '\n')]

print(list(l.get_tokens(" *bolded text*")))
# OUT: [(Token.Text, ' '), (Token.Generic.Emph, '*bolded text*'), (Token.Text, '\n')]

print(list(l.get_tokens("**bolded text**")))
# OUT: [(Token.Text, '**bolded'), (Token.Text, ' '), (Token.Text, 'text**'), (Token.Text, '\n')]

print(list(l.get_tokens(" **bolded text**")))
# OUT: [(Token.Text, ' '), (Token.Generic.Strong, '**bolded text**'), (Token.Text, '\n')]

As the code shows, when a space precedes the bold or emphasized text, the lexer emits a single Token.Generic.Strong or Token.Generic.Emph token for the whole phrase.

When nothing precedes the phrase, the lexer treats it as regular text and splits it into separate Token.Text tokens.


Python: Python 3.8.3
Pygments: 2.6.1


On top of that:
It seems the MarkdownLexer only recognizes inline tokens when they are surrounded by whitespace.

For example this inline code between brackets is not recognized:

(`my inline code`)
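
Both symptoms would fit an inline rule that insists on a whitespace character before the opening delimiter, though nothing in this thread confirms that this is what Pygments does. A minimal sketch using only the standard `re` module, where `span_pattern` and `find_span` are hypothetical helpers and the patterns are illustrative rather than the actual MarkdownLexer rules:

```python
import re


def span_pattern(delim, require_space):
    """Build a hypothetical inline-span pattern for a delimiter such as "**" or "`".

    This is an illustration only, not the regex used by Pygments.
    """
    d = re.escape(delim)
    # The body is one or more characters that are not the delimiter character.
    body = "[^" + re.escape(delim[0]) + "]+"
    if require_space:
        # Strict variant: a whitespace character must precede the span.
        prefix = r"\s"
    else:
        # Lenient variant: start of text or any non-word character may precede it.
        prefix = r"(?:^|(?<=\W))"
    return re.compile(prefix + "(" + d + body + d + ")")


def find_span(pattern, text):
    """Return the first delimited span found in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(1) if match else None


strict_strong = span_pattern("**", require_space=True)
lenient_strong = span_pattern("**", require_space=False)
strict_code = span_pattern("`", require_space=True)
lenient_code = span_pattern("`", require_space=False)

print(find_span(strict_strong, "**bolded text**"))    # None
print(find_span(strict_strong, " **bolded text**"))   # **bolded text**
print(find_span(lenient_strong, "**bolded text**"))   # **bolded text**
print(find_span(strict_code, "(`my inline code`)"))   # None
print(find_span(lenient_code, "(`my inline code`)"))  # `my inline code`
```

The strict variant misses a span at the very start of the text and a span that directly follows `(`, which matches the two reports above. The lenient variant is just one way to relax the condition and is not meant to describe how the fix was actually made.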


I created a PR which fixes these issues: #1495.


Anteru commented Jul 21, 2020

I'd say that is comprehensively fixed. Thanks for your contribution!

Anteru closed this as completed Jul 21, 2020