Links with underscores are parsed as emphasized text #198

Open
HuemorDave opened this issue Jun 9, 2014 · 4 comments

Comments

@HuemorDave

I have noticed that the Markdown parser gives incorrect results when there are underscores in link text or URLs. Example:

[http://example.com/a_link.html](http://example.com/a_link.html)

On most Markdown parsers (including GitHub's), you get:

http://example.com/a_link.html

Instead, evilstreak/markdown-js produces a parse tree roughly equivalent to the following (ignore the autolinking):

[http://example.com/a<em>link.html](http://example.com/a</em>link.html)

That is, the range between the underscores is parsed as an em first, before the parser looks for links, so the link never becomes a link. I've told the client to escape the underscores as a workaround, but I believe this is a parsing issue.
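To make the failure concrete, here is a deliberately naive emphasis rule as a small JavaScript function. It is a hypothetical illustration (the name `naiveEmSpan` is made up, and this is not markdown-js's actual code), assuming only that an underscore opens emphasis and the next underscore closes it. Applied to the example above, the emphasized range swallows the `](` that separates the link text from the URL, so nothing is left for the link rule to match.

```javascript
// Hypothetical, deliberately naive emphasis rule (NOT markdown-js's actual code):
// an underscore opens emphasis and the next underscore closes it, regardless of
// what lies in between, including the "](" that separates link text from URL.
function naiveEmSpan(text) {
  const open = text.indexOf("_");
  if (open === -1) return null;
  const close = text.indexOf("_", open + 1);
  if (close === -1) return null;
  return text.slice(open + 1, close); // the text that would end up inside <em>
}

const src = "[http://example.com/a_link.html](http://example.com/a_link.html)";
console.log(naiveEmSpan(src)); // link.html](http://example.com/a
```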

For the record, I am using the latest version of the master branch, compiled today, and I verified the issue by invoking the parser directly in a JS console (i.e. without any of the filtering our front-end code does). Is anyone else able to reproduce this?

@bteplygin

I have the same problem, within a single link.

@todb-r7

todb-r7 commented Jan 14, 2016

Sorry if I'm out of line, but is there a 👍 mechanism for bugs like this? Just ran into it. I'll take a run at fixing it myself, but in case I fail, wanted to make sure it got bumped.

In any event, a string_like_this should not be parsed out as a string<em>like</em>this. Ever. You can see the correct behavior in this very issue comment: GitHub leaves string_like_this untouched and only produces emphasis when the underscores are set off by spaces, as in string _like_ this.

@todb-r7

todb-r7 commented Jan 14, 2016

Alright, I have no idea how to fix this properly. If I were to be hacky about it, I'd start around here:

__escape__ : /^\\[\\`\*_{}<>\[\]()#\+.!\-]/,

and add a regex test so that this_string is treated the same as this\_string
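A rough sketch of a variant of that hacky approach, assuming it runs as a pre-pass over the source before markdown-js parses it: backslash-escape intraword underscores so the existing escape rule (quoted above) consumes them before the emphasis rule sees them. The function name `escapeIntrawordUnderscores` is hypothetical, and a blind regex pass like this would also touch underscores inside URLs and code spans, where the inserted backslash may show up in the output.

```javascript
// The escape rule quoted above: a backslash followed by an escapable character.
const ESCAPE = /^\\[\\`\*_{}<>\[\]()#\+.!\-]/;

// Hypothetical pre-pass (not part of markdown-js): backslash-escape any underscore
// that sits between two letters or digits, so the escape rule consumes it before
// the emphasis rule can pair it with another underscore.
function escapeIntrawordUnderscores(src) {
  return src.replace(/([A-Za-z0-9])_(?=[A-Za-z0-9])/g, "$1\\_");
}

const escaped = escapeIntrawordUnderscores("this_string");
console.log(escaped);                       // this\_string
console.log(ESCAPE.test(escaped.slice(4))); // true
```

Only underscores with a letter or digit on both sides are escaped. That is narrower than "preceded by an alphanum" on purpose: escaping every underscore that follows a letter would also break ordinary emphasis such as _word_, whose closing underscore follows a letter.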

Essentially, it seems like you need a rule similar to the escaping rule, but one that takes into account any leading non-whitespace character. In other words, none of these should get formatted: this_example_, _this_example, or_this_example.

In all cases, there's an underscore that's immediately preceded by an alphanumeric character.

This is solved in other implementations, such as GitHub Flavored Markdown (as you can see above).
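For reference, CommonMark and GitHub Flavored Markdown handle this with "left-flanking" and "right-flanking" delimiter rules: an underscore with a letter or digit on both sides can neither open nor close emphasis. The sketch below is a simplified reading of those rules, assuming ASCII-only whitespace and punctuation, single underscores only (no `__` runs), and no nesting. The helper names are hypothetical; this is not code from markdown-js or GitHub.

```javascript
// Simplified sketch of the CommonMark/GFM flanking rules for single underscores.
// Assumptions: ASCII-only whitespace/punctuation, no "__" runs, no nesting.
const isWs = (c) => c === "" || /\s/.test(c);      // text boundary counts as whitespace
const isPunct = (c) => /[!-\/:-@\[-`{-~]/.test(c);   // ASCII punctuation

function flanking(text, i) {
  const prev = i > 0 ? text[i - 1] : "";
  const next = i + 1 < text.length ? text[i + 1] : "";
  const left = !isWs(next) && (!isPunct(next) || isWs(prev) || isPunct(prev));
  const right = !isWs(prev) && (!isPunct(prev) || isWs(next) || isPunct(next));
  return { prev, next, left, right };
}

// "_" may open emphasis only if left-flanking and (not right-flanking or preceded by punctuation).
function canOpen(text, i) {
  const f = flanking(text, i);
  return f.left && (!f.right || isPunct(f.prev));
}

// "_" may close emphasis only if right-flanking and (not left-flanking or followed by punctuation).
function canClose(text, i) {
  const f = flanking(text, i);
  return f.right && (!f.left || isPunct(f.next));
}

// Return the text of the first _emphasis_ span, or null if there is none.
function findUnderscoreEm(text) {
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== "_" || !canOpen(text, i)) continue;
    for (let j = i + 2; j < text.length; j++) {
      if (text[j] === "_" && canClose(text, j)) return text.slice(i + 1, j);
    }
  }
  return null;
}
```

With these rules, findUnderscoreEm returns null for this_example_, _this_example, or_this_example and http://example.com/a_link.html, while "a string _like_ this" still yields "like".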

@mehaase

mehaase commented Feb 12, 2016

@todb-r7 I think 👍 is the only mechanism for voting on GitHub issues, unfortunately. Anyway, yes, I noticed the same issue recently. I finally started debugging it today, and that led me here.

This is one of the most insane code bases I've ever looked at, but I surmise that changing the behavior of underscores inside words is going to be very difficult, since the current implementation doesn't distinguish "start underscore" from "end underscore" in the same way that it distinguishes [ from ].

However, I did manage to fix the issue of parsing a link with underscores. Change the following line in DialectHelpers.inline_until_char() from

  var res = this.dialect.inline.__oneElement__.call(this, text.substr( consumed ) );

to

  var maxInline = text.indexOf(want, consumed);
  var inlineEligible;

  if (maxInline === -1) {
    inlineEligible = text.substr(consumed);
  } else {
    // substring() takes an end index; substr()'s second argument is a length
    inlineEligible = text.substring(consumed, maxInline);
  }

  var res = this.dialect.inline.__oneElement__.call(this, inlineEligible);

The principle here is that the link parser will see a link like [foo_bar](/baz_bat.html) and call DialectHelpers.inline_until_char("foo_bar](/baz_bat.html)", "]"), meaning that it should try to parse any inline elements in the text before the ]. Unfortunately, the implementation ignores the terminating character ] and returns inline elements all the way to the end of the string. The result is a parse tree where bar](/baz is inside an em node, the ] is never matched, and so no a node is ever produced.

The fix is to not let this.dialect.inline.__oneElement__ look past the ] character. This works in my limited testing, but I don't have the know-how or interest to run the full battery of unit tests. This project appears to be dead, so I am posting this in case it helps somebody else.
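The principle behind the fix can be illustrated with a toy model. Everything below is hypothetical: `parseEm`, `inlineUntilChar` and `parseLink` only mimic the shape of the markdown-js helpers described above, and emphasis is deliberately naive. Passing `bounded = false` reproduces the bug (the em swallows bar](/baz, the ] is never seen, and no link results); `bounded = true` applies the fix of truncating each inline element's input at the next ].

```javascript
// Toy model of the bug and the fix; all names are hypothetical and only mirror
// the shape of DialectHelpers.inline_until_char, not its real code.

// Naive emphasis: "_" runs to the next "_" in whatever text it is given.
function parseEm(text) {
  const close = text.indexOf("_", 1);
  return close === -1 ? null : { consumed: close + 1, node: ["em", text.slice(1, close)] };
}

// Collect inline children up to `want`. With bounded=true, each inline element
// only sees the text before the next `want` (the proposed fix).
function inlineUntilChar(text, want, bounded) {
  const children = [];
  let consumed = 0;
  while (consumed < text.length) {
    if (text[consumed] === want) return { consumed: consumed + 1, children };
    let eligible = text.slice(consumed);
    if (bounded) {
      const stop = text.indexOf(want, consumed);
      if (stop !== -1) eligible = text.slice(consumed, stop); // slice takes an end index
    }
    const em = eligible[0] === "_" ? parseEm(eligible) : null;
    if (em) {
      children.push(em.node);
      consumed += em.consumed;
    } else {
      // Plain character: append to the previous text child if there is one.
      const last = children.length - 1;
      if (typeof children[last] === "string") children[last] += text[consumed];
      else children.push(text[consumed]);
      consumed += 1;
    }
  }
  return null; // ran out of text without ever seeing `want`
}

// "[text](url)" -> ["link", { href }, ...children], or null if it is not a link.
function parseLink(src, bounded) {
  if (src[0] !== "[") return null;
  const inner = src.slice(1);
  const res = inlineUntilChar(inner, "]", bounded);
  if (!res) return null;
  const m = /^\(([^)]*)\)/.exec(inner.slice(res.consumed));
  return m ? ["link", { href: m[1] }, ...res.children] : null;
}

console.log(parseLink("[foo_bar](/baz_bat.html)", false)); // null
console.log(JSON.stringify(parseLink("[foo_bar](/baz_bat.html)", true)));
// ["link",{"href":"/baz_bat.html"},"foo_bar"]
```

Emphasis that genuinely sits inside the link text still works in the bounded version: "[foo _bar_ baz](/u)" yields a link whose children are "foo ", an em node for "bar", and " baz".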
