Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nested parens support, closes #23 #56

Closed
wants to merge 1 commit into from

Conversation

torifat
Copy link

@torifat torifat commented Sep 22, 2017

No description provided.

@torifat
Copy link
Author

torifat commented Oct 2, 2017

@puzrin Hi, can you have a look at this PR please.

@agnesro
Copy link

agnesro commented Oct 16, 2017

Hi, this would be great to have fixed.

@rlidwka
Copy link
Member

rlidwka commented Oct 24, 2017

It only supports 1 level of nesting?

> require('../')().match('test http://example.com/(a(b(c(d)e)f)g) test')[0].text
'http://example.com/(a(b(c(d)e)f'

Also, a word of caution - there could be a possibility of ReDoS here, but I was not able to come up with an exploit yet, so I could be wrong there (screenshot from debuggex.com).

errata:

  • it supports 2 levels of nesting, not 1
  • it's not (.*)*, it's (a*)* where a = (?!ZCc|[()]), ZCc - unicode controls + spaces.

@puzrin
Copy link
Member

puzrin commented Oct 24, 2017

@torifat, at first glance, this looks incomplete and dangerous.

I tend to reject. Objections?

@torifat
Copy link
Author

torifat commented Oct 24, 2017

@rlidwka Yes, it's only for 1 level. I will try to look at the ReDoS issue.

@puzrin Yes, you can reject it. And can you please provide me with some example input for which it could be incomplete and dangerous. It would be easier to come up with a better solution with that.

Thanks.

@puzrin
Copy link
Member

puzrin commented Oct 24, 2017

@torifat

  1. You refered to Better handle nested parens #23. It contains examples. If you say your commit solve it, i expect all examples to be ok. In other case your PR looks incomplete. Adding 1 sublevel only - looks very strange.
  2. I mean ReDoS. If you used pattern ~ (.*)* or similar "well known", that will be rejected without direct proofs of code safety.

@rlidwka
Copy link
Member

rlidwka commented Oct 24, 2017

@rlidwka Yes, it's only for 1 level. I will try to look at the ReDoS issue.

Sorry, I meant 2 levels. As soon as I found the example above - http://example.com/(a(b(c(d)e)f)g) - being truncated in the middle, I just assumed this is a bug and the implementation is meant to support an arbitrary nesting level. Or is it intended to support 2 levels only?

In any case, I think the right output is either http://example.com/(a(b(c(d)e)f)g) (all of it) or http://example.com/ (before first bracket). Anything else just seems like an arbitrary cutoff.

About ReDoS: funnily enough, I think (.*)* is safe actually. The problem with those is hard backtracking, and there's no fail condition to trigger it. I mean (a*)*b is a problem, but (a*)* by itself works. But as soon as we add anything at the end that could fail a match, we'd have a vulnerability without even knowing it. So if you decide to change the implementation, please avoid those if possible.

Regexp in this PR is a bit more complex. It may or may not be vulnerable (needs closer look which I don't have time for at the moment). But there is a regexp pattern that could cause it to be vulnerable after unrelated code changes (which isn't worth the hassle for a partial implementation imho).

@torifat
Copy link
Author

torifat commented Oct 25, 2017

In any case, I think the right output is either http://example.com/(a(b(c(d)e)f)g) (all of it) or http://example.com/ (before first bracket). Anything else just seems like an arbitrary cutoff.

I will try to come up with something which addresses this issue.

And, yes it's 2 levels 🤦‍♂️ I worked on this about a month ago. Sorry for causing the confusion.

Thanks for taking your time to explain the problem. I'm closing this now will reopen with an updated PR.

@torifat torifat closed this Oct 25, 2017
@puzrin
Copy link
Member

puzrin commented Oct 25, 2017

Btw, take a look at #22. Because #23 was created to summarize details after implementation issues.

Also, could you describe details of your use case? For example, in markdown it's not a problem to surround "difficult" links with <...>

@torifat
Copy link
Author

torifat commented Oct 25, 2017

Thanks, @puzrin. I looked at #22, but I will make sure I didn't skip anything from there.

Regarding the use case, we are using linkify: true to detect plaintext links (not written using markdown) inside markdown.

@puzrin
Copy link
Member

puzrin commented Oct 25, 2017

Regarding the use case, we are using linkify: true to detect plaintext links (not written using markdown) inside markdown.

I need a bit different view:

  • what are those 2-level-nested-parens-links for?
  • what % of unrecognized links do they take?

In other words, if you improve very small % of missed cases and pay with significant maintenance difficulties - that's not good. Current parser is already in one step before falling into madness :). That's a price of "simple regexp use".

@torifat
Copy link
Author

torifat commented Oct 25, 2017

Ah, ok. So, it's a very common use case for us. Jira creates links with parens when we apply jql (advance search). For example:

https://xyz.jira.com/issues/?jql=project%20in%20(HNW%2C%20HND)%20AND%20issuetype%20in%20(Bug%2C%20Improvement%2C%20Task)%20AND%20status%20in%20(Done%2C%20New)%20AND%20priority%20%3D%20Major%20AND%20resolution%20%3D%20Unresolved%20AND%20%22Epic%20Link%22%20in%20(HNFP-66%2C%20HNFP-187)%20AND%20assignee%20in%20(currentUser()%2C%20ccordle

In other words, if you improve very small % of missed cases and pay with significant maintenance difficulties - that's not good. Current parser is already in one step before falling into madness :). That's a price of "simple regexp use".

I agree with you. But, for us it's common that's why I was trying to fix it.

@puzrin
Copy link
Member

puzrin commented Oct 25, 2017

Now understands. Use case "jira search results" looks reasonable. If you continue with PR please use such type of link in test/example. I see no problem to accept well-known narrow use-case when good universal implementation is not possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants