RegEx Suggestion #13

Closed
fractaledmind opened this issue May 28, 2014 · 1 comment

@fractaledmind

The regular expression for finding links seems to be a bit loose.

self.match_links = re.compile(r"""(\[[^^]*?\])\s?             # text
                              (\[.*?\]|\(.*?\r?\n?.*?\)\)?)   # ref/url
                               """, re.MULTILINE | re.X)

For example, markdown text like this:

This morning, a friend noted a discrepancy between two recent headlines at The Mac Observer:

+ March 5: "[Apple CFO Peter Oppenheimer to Retire, Luca Maestri to Take Over][tmo1]"
+ May 7: "[PR Queen Katie Cotton Leaving Apple][tmo2]"

[tmo1]: http://www.macobserver.com/tmo/article/apple-cfo-peter-oppenheimer-to-retire-luca-maestri-to-take-over
[tmo2]: http://www.macobserver.com/tmo/article/pr-queen-katie-cotton-leaving-apple

[I tweeted the two headlines and corresponding URLs][t], with a single word of commentary: "Hmm". I said no more partly because I was near the 140-character limit, and partly to see what the reaction would be. Some got it, but many repliers missed my point, mistakenly thinking it was related to an exodus of executives from the company.<sup id="fnr1-2014-05-08">[1]</sup>

My point was to draw attention to the disparate job descriptions: "Apple CFO" vs. "PR Queen".

[Julia Richert pointed to][j] a similar discrepancy -- two Philip Elmer-DeWitt headlines on his weblog at CNN/Fortune/Money

produces the following matches:

  • [Apple CFO Peter Oppenheimer to Retire, Luca Maestri to Take Over][tmo1]
  • [PR Queen Katie Cotton Leaving Apple][tmo2]
  • [tmo1]: http://www.macobserver.com/tmo/article/apple-cfo-peter-oppenheimer-to-retire-luca-maestri-to-take-over [tmo2]: http://www.macobserver.com/tmo/article/pr-queen-katie-cotton-leaving-apple \n [I tweeted the two headlines and corresponding URLs][t]
  • [1]</sup> \n My point was to draw attention to the disparate job descriptions: "Apple CFO" vs. "PR Queen". \n [Julia Richert pointed to][j]
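
The over-matching can be reproduced with a short script. This is only a sketch: the pattern is the one quoted above, but `LOOSE_LINKS`, `SAMPLE`, and the example.com URL are hypothetical stand-ins for the real text.

```python
import re

# The original (loose) pattern as quoted in this issue.
LOOSE_LINKS = re.compile(r"""(\[[^^]*?\])\s?             # text
                             (\[.*?\]|\(.*?\r?\n?.*?\)\)?)   # ref/url
                          """, re.MULTILINE | re.X)

# Hypothetical, shortened sample: one reference link, its definition,
# and a second reference link further down.
SAMPLE = (
    '+ March 5: "[First headline][a]"\n'
    "\n"
    "[a]: http://example.com/first\n"
    "\n"
    "[I tweeted the headline][t], with a single word of commentary.\n"
)

# [^^] also matches "]" and newlines, so the lazy "text" group keeps
# growing past "[a]:" until it reaches a "]" that is followed by "[...]".
loose_matches = [m.group(0) for m in LOOSE_LINKS.finditer(SAMPLE)]
for match in loose_matches:
    print(repr(match))
```

The second match starts at the reference definition `[a]:` and runs across the blank line into the next link, which is the same behaviour as the third item in the list above.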

To tighten up the regex, I suggest the following change, which should catch all markdown links, and only markdown links:

self.match_links = re.compile(r"""(\[[^^\[\]:]*?\])\s?? # text
                              (\[[^\[\]:]*?\]|          # ref
                              \(.*?\r?\n?.*?\)\)?)      # url
                              """, re.MULTILINE | re.X)

The only real change is to extend the set of characters that cannot appear in the link text or in the reference id: `[`, `]`, and `:` are now excluded from both.
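
A quick check of the suggested pattern, again as a sketch only: `TIGHT_LINKS`, `SAMPLE`, `COLON_IN_TEXT`, and the example.com URLs are hypothetical names and data, not part of ForMd.

```python
import re

# The tightened pattern suggested in this issue.
TIGHT_LINKS = re.compile(r"""(\[[^^\[\]:]*?\])\s?? # text
                             (\[[^\[\]:]*?\]|          # ref
                             \(.*?\r?\n?.*?\)\)?)      # url
                          """, re.MULTILINE | re.X)

# Hypothetical sample: a reference link, its definition, another
# reference link, and an inline link.
SAMPLE = (
    '+ March 5: "[First headline][a]"\n'
    "\n"
    "[a]: http://example.com/first\n"
    "\n"
    "[I tweeted the headline][t], see also [this page](http://example.com/page).\n"
)

# Each tuple is (text, ref-or-url); the "[a]:" definition is skipped
# because the text group can no longer run past a "]" or a ":".
tight_matches = TIGHT_LINKS.findall(SAMPLE)
for text, target in tight_matches:
    print(text, target)

# Trade-off: link text that itself contains ":" is excluded as well.
COLON_IN_TEXT = "[Note: read this](http://example.com/note)"
colon_matches = TIGHT_LINKS.findall(COLON_IN_TEXT)
print(colon_matches)
```

With this sample, the definition line is no longer swallowed into a match. The last two lines show a cost of excluding `:` from the link text: a link whose visible text contains a colon is not picked up at all, so it may be worth deciding whether that case matters for ForMd.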

@seth-brown
Owner

Thanks for taking the time to describe the problem and for suggesting a solution. I've merged a fix for the issue you've described. The latest version of ForMd should fix the problem. Let me know if you have any further issues. Thanks again!
