Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update path regex to not accept closing parens #35

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

skyebook
Copy link

@skyebook skyebook commented Feb 9, 2017

This fixes a few cases, one of which is documented in #34:

  • css url's of the format background-image:url(http://example.com/img.jpg);
  • markdown url's such as look at [my image](http://example.com/img.jpg) blah blah

This fixes a few cases:
* css url's of the format "background-image:url(http://example.com/img.jpg);" 
* markdown url's such as "look at [my image](http://example.com/img.jpg) blah blah"
@kevva
Copy link
Owner

kevva commented May 21, 2017

Parentheses at the end of URLs are valid though and commonly used by for example Wikipedia. We need to check if the URL starts with a parens too and then skip the closing one. Until we have lookbehind assertions, my best advice is just to remove them manually.

@sindresorhus
Copy link
Contributor

@kevva Maybe you can simulate lookbehind: https://stackoverflow.com/questions/7376238/javascript-regex-look-behind-alternative

Also see the JavaScript section in https://stackoverflow.com/a/35271017/64949 Could maybe use XRegExp.matchRecursive.

@kevva
Copy link
Owner

kevva commented Jun 7, 2017

It's almost impossible to take all edge cases into account while still following the spec. It's also somewhat common in PHP to use the following syntax when passing arrays into query strings.

/foo.php?bar[]=1&bar[]=2&bar[]=3

The spec says the following though:

A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax.

@sindresorhus
Copy link
Contributor

@kevva Since we can't easily do anything about it right now, I think we should at least give the user a choice, so we should add an option whether to match parens or not. For my use-case, I'm ok with not supporting a few Wikipedia links with parens in favor of not matching Markdown syntax.

@kevva
Copy link
Owner

kevva commented Jun 13, 2017

Happy to. Do you think it fits better as a standalone option or using it with the strict one we have today?

@sindresorhus
Copy link
Contributor

Standalone option.

@sindresorhus
Copy link
Contributor

Lookbehinds are now in Chrome beta and will be out in Chrome stable in Oktober and probably come to Node.js this year. We could feature detect lookbehinds and use it when possible.

We need parens support for Refined GitHub and we can target latest Chrome.

// @bfred-it

@kevva
Copy link
Owner

kevva commented Sep 25, 2017

Cool! I've keeping an eye on any babel plugins or likewise that adds lookbehinds but seems like there are none yet.

@sindresorhus
Copy link
Contributor

sindresorhus commented Sep 25, 2017

@kevva Yeah, I'm not surprised. It's very difficult to polyfill lookbehind.

@kevva
Copy link
Owner

kevva commented Sep 25, 2017

Yup, DmitrySoshnikov/regexp-tree#66.

@sindresorhus
Copy link
Contributor

Chrome 62 is out with lookbehind support. I suggest implementing lookbehind support behind an option for now.

@sindresorhus
Copy link
Contributor

Lookbehind support is available in Node.js 10, so I suggest resolving this with lookbehinds, target Node.js 10, and do a major version bump.

@gajus
Copy link

gajus commented Apr 11, 2019

@skyebook, @kevva Given that lookbehind is now available is this something either of you could take a look at?

@niftylettuce
Copy link
Collaborator

@skyebook @gajus Please see #72 and also I have discovered that if a link starts with a [] brackets, or a mix, e.g. ][][][][][][ then the links returned contain these starting bracket prefixes.

@niftylettuce
Copy link
Collaborator

For anyone stumbling on that, you can clean up the results from url-regex matches by running them with:

const someLink = "[][][][][[[]])())))]]]google.com][[](()()";
someLink.replace(/^[\[\]\(\)]*|[\[\]\(\)]*$/g, ''); // google.com

niftylettuce added a commit to spamscanner/spamscanner that referenced this pull request Aug 12, 2020
@niftylettuce
Copy link
Collaborator

Another issue for folks to be aware of is that URL's returned from this regex will include stuff like this:

{background-image:url('someurl.com/some/path with {background-image:url(' prefixed, or some other combination

@niftylettuce
Copy link
Collaborator

niftylettuce commented Aug 12, 2020

The fix for my previous comment is this:

let someLink = "{background-image:url('someurl.com/some/path";
const quoteIndex = someLink.lastIndexOf('"');
const apostropheIndex = someLink.lastIndexOf("'");
const index = apostropheIndex > quoteIndex ? apostropheIndex : quoteIndex;
if (index !== -1) someLink = someLink.substring(index + 1); // 'someurl.com/some/path'

@niftylettuce
Copy link
Collaborator

This issue is fixed in my maintained and modern version of this package at https://github.com/niftylettuce/url-regex-safe. You should be able to switch from url-regex to url-regex-safe now. See the updated list of options as I added some new ones, and changed a few defaults to more sensible ones (since not everyone is parsing Markdown for instance).

IITII added a commit to IITII/kutt that referenced this pull request Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants