Links with underscores are parsed as emphasized text #198

HuemorDave · 2014-06-09T17:01:53Z

I have noticed that the Markdown parser gives incorrect results with underscores in both link titles and URLs. Example:

[http://example.com/a_link.html](http://example.com/a_link.html)

On most Markdown parsers (including GitHub's), you get:

http://example.com/a_link.html

Instead, evilstreak/markdown-js produces a parse tree roughly equivalent to the following (ignore the auto-linking):

[http://example.com/a link.html](http://example.com/a link.html)

i.e. the range between the underscores is parsed as an em first, before the parser looks for links, so the link never actually becomes a link. I've told the client to escape the underscores as a workaround, but I believe this is a parsing issue.

For the record, I am using the latest version of the master branch, compiled today, and I verified the issue by invoking the parser directly in a JS console (i.e. without any of the filtering our front-end code does). Is anyone else able to reproduce this?

bteplygin · 2015-06-24T15:33:29Z

I have the same problem, within a single link.

todb-r7 · 2016-01-14T22:14:15Z

Sorry if I'm out of line, but is there a 👍 mechanism for bugs like this? Just ran into it. I'll take a run at fixing it myself, but in case I fail, wanted to make sure it got bumped.

In any event, a string_like_this should not be parsed out as a string<em>like</em>this. Ever. You can see the correct behavior in this very issue comment, when you compare a string_like_this vs a string like this.

todb-r7 · 2016-01-14T22:51:02Z

Alright, I have no idea how to fix this proper. If I were to be hacky about it, I'd start around here:

markdown-js/src/dialects/gruber.js

Line 541 in 9b8aa65

__escape__ : /^\\[\\`\*_{}<>\[\]()#\+.!\-]/,

and have a regex test where this_string is treated the same as this \_string

Essentially, it seems like you need a rule similar to the escaping rule, but taking into account any leading character that's non-whitespace. IOW, these should not get formatted: this_example_, _this_example, or_this_example.

In all cases, there's an underscore that's immediately preceded by an alphanum.

This is solved in other implementations, like GitHub flavored markdown (as you can see above).
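
As a rough illustration of the kind of rule proposed above, here is a minimal sketch of a pre-processing step that backslash-escapes underscores before the text reaches the parser. It is deliberately narrower than "any underscore preceded by an alphanum": it only touches underscores with an alphanumeric on both sides, so a closing underscore as in _this_ still works. The function name escapeIntrawordUnderscores is hypothetical and not part of markdown-js, and the sketch does not attempt to skip code spans or link URLs.

```javascript
// Hypothetical helper, not part of markdown-js.
// Escapes an underscore only when it sits between two ASCII alphanumerics,
// so it can no longer open or close emphasis.
function escapeIntrawordUnderscores(text) {
  // Capture the preceding character and look ahead at the following one,
  // so that consecutive cases such as "a_b_c" are all handled.
  return text.replace(/([A-Za-z0-9])_(?=[A-Za-z0-9])/g, "$1\\_");
}

console.log(escapeIntrawordUnderscores("string_like_this")); // string\_like\_this
console.log(escapeIntrawordUnderscores("_emphasis_"));       // _emphasis_
console.log(escapeIntrawordUnderscores("this_example_"));    // this\_example_
```

Note that a trailing underscore, as in this_example_, is left alone by this sketch, so it covers only part of the cases listed above.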

mehaase · 2016-02-12T06:26:33Z

@todb-r7 I think 👍 is the only mechanism for voting on GitHub issues, unfortunately. Anyway, yes I noticed the same issue recently. Finally started debugging it today and that led me here.

This is one of the most insane code bases I've ever looked at, but I surmise that changing the behavior of underscores inside words is going to be very difficult, since the current implementation doesn't distinguish "start underscore" from "end underscore" in the same way that it distinguishes [ from ].

However, I did manage to fix the issue of parsing a link with underscores. Change the following line in DialectHelpers.inline_until_char() from

  var res = this.dialect.inline.__oneElement__.call(this, text.substr( consumed ) );

to

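  // Index of the terminating character (e.g. "]"), searching from `consumed`.
  var maxInline = text.indexOf(want, consumed);
  var inlineEligible;

  if (maxInline === -1) {
    inlineEligible = text.substr(consumed);
  } else {
    // substring() takes an end index; substr() takes a length and would
    // overshoot the terminator whenever consumed > 0.
    inlineEligible = text.substring(consumed, maxInline);
  }

  var res = this.dialect.inline.__oneElement__.call(this, inlineEligible);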

The principle here is that the link parser will see a link like [foo_bar](/baz_bat.html) and call DialectHelpers.inline_until_char("foo_bar](/baz_bat.html)", "]"), meaning that it should try to parse any inline elements in the text before the ]. Unfortunately, the implementation ignores the terminating character ] and returns any inline elements through the end of the string, resulting in a parse tree where bar](/baz is inside an em node, the ] is never matched, and so no a node is ever produced.

The fix is to not let this.dialect.inline.__oneElement__ go past the ] character. This works in my limited testing, but I don't have the know-how or interest to run the full battery of unit tests. This project appears to be dead, so I am posting this in case it helps somebody else.
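
The principle can be seen in isolation with a toy example. This is only a sketch: firstEmphasis is a hypothetical stand-in for the dialect's inline parser, and sliceBeforeTerminator is a hypothetical helper; neither is markdown-js code.

```javascript
// Toy stand-in for the inline parser (hypothetical): returns the text
// between the first pair of underscores, or null if there is no pair.
function firstEmphasis(text) {
  var match = /_(.+?)_/.exec(text);
  return match ? match[1] : null;
}

// Hypothetical helper: the part of `text` from index `consumed` up to,
// but not including, the next occurrence of the terminator `want`.
function sliceBeforeTerminator(text, consumed, want) {
  var end = text.indexOf(want, consumed);
  // substring() takes an end index (substr() would take a length).
  return end === -1 ? text.substring(consumed) : text.substring(consumed, end);
}

var input = "foo_bar](/baz_bat.html)";

// Unbounded: the emphasis swallows the "]" and part of the URL.
console.log(firstEmphasis(input)); // bar](/baz

// Bounded at "]": there is only one underscore left, so no emphasis.
console.log(sliceBeforeTerminator(input, 0, "]"));                // foo_bar
console.log(firstEmphasis(sliceBeforeTerminator(input, 0, "]"))); // null
```

With the slice bounded at the terminator, the ] is still there for the link parser to match afterwards.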

tthoeye mentioned this issue Nov 21, 2015

Problems with introducing URLs that contains underscores "_" in the system. tthoeye/barometer-survey-tool#2

Closed
