Fix regex limitations #81
Resolves jneen#68
Could you include a unit test? |
This does seem to solve the issue, but I'm a little unsure about the overall change provided here. I don't have a good sense of how people are using the Granted, this is still a 0.x library, so I can do that, but I want to feel really sure that it's not something too wild. |
This seems like overkill. You should be able to do
- var anchored = RegExp('^(?:'+re.source+')', (''+re).slice((''+re).lastIndexOf('/')+1));
+ var anchored = RegExp('^(?:'+re.source+')', (''+re).slice((''+re).lastIndexOf('/')+1).replace('g', ''));
instead. |
@michaelficarra That doesn't solve the issue this is trying to solve. What this does is allow the regex to be greedy if the global flag is set, so the use cases I mentioned in #68 work |
I think @michaelficarra makes a good point... RegExps with the
> twoChars = P.regex(/../g)
Parser { _: [Function] }
> twoChars.parse("aa")
{ status: true, value: 'aa' }
> twoChars.parse("aa")
{ status: false,
index: { offset: 0, line: 1, column: 1 },
expected: [ '/../g' ] }
> twoChars.parse("aa")
{ status: true, value: 'aa' }
> twoChars.parse("aa")
{ status: false,
index: { offset: 0, line: 1, column: 1 },
expected: [ '/../g' ] } |
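The alternating success and failure above can be reproduced with a plain `RegExp`, without Parsimmon. This is a minimal sketch, assuming the parser reuses a single regex object across calls: a regex with the `g` flag remembers where its last match ended in `lastIndex`, so every second `exec` call starts searching at the end of the input.

```javascript
// A global regex stores its position in `lastIndex` between calls.
const twoChars = /../g;

// First call: matches "aa" at index 0 and moves lastIndex to 2.
const first = twoChars.exec("aa");
const afterFirst = twoChars.lastIndex;

// Second call: the search starts at index 2, finds nothing,
// returns null, and resets lastIndex to 0.
const second = twoChars.exec("aa");
const afterSecond = twoChars.lastIndex;

// Third call: starts from 0 again, so it matches.
const third = twoChars.exec("aa");

console.log(first[0], afterFirst);  // aa 2
console.log(second, afterSecond);   // null 0
console.log(third[0]);              // aa
```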
Actually, we need to be careful with the
var anchored = RegExp('^(?:'+re.source+')', (''+re).slice((''+re).lastIndexOf('/')+1).replace(/[gy]/g, '')); |
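As a rough sketch of the same idea in standalone form: the helper below anchors a regex at the start of the input and drops the two stateful flags, `g` and `y`. The name `anchorWithoutStatefulFlags` is hypothetical and this is not Parsimmon's actual source. It reads the flags from `re.flags` (ES2015 and later) rather than slicing the string form as the snippet above does.

```javascript
// Hypothetical helper for illustration; not taken from the library.
function anchorWithoutStatefulFlags(re) {
  // Remove the stateful flags so repeated exec() calls behave the same.
  const flags = re.flags.replace(/[gy]/g, "");
  // Wrap the source in a non-capturing group so "^" applies to the
  // whole pattern, including top-level alternations.
  return new RegExp("^(?:" + re.source + ")", flags);
}

const anchored = anchorWithoutStatefulFlags(/../gi);

console.log(anchored.flags);            // i
console.log(anchored.exec("aa")[0]);    // aa
console.log(anchored.exec("aa")[0]);    // aa (no alternation this time)

// The anchor means a match must begin at index 0.
console.log(anchorWithoutStatefulFlags(/b/g).exec("ab")); // null
```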
Yeah, based on the MDN page, I think we should only be passing https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp |
No, it is important to preserve |
Oh, yeah I suppose those are useful too. So just |
Ah yeah, I forgot that Maybe instead there can be a parameter to the regex function that you can pass |
For the parser you mentioned before, you can just manually wrap it in
> P.regex(/([+\-]?([0-9]+|[0-9]*\.[0-9]+)(e[+-]?[0-9]+)?)+/).parse('+1.4')
{ status: true, value: '+1.4' } |
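To see why the wrapped pattern consumes more input, here is a small comparison using plain `RegExp` objects anchored at the start of the string. The helper `matchAtStart` is hypothetical and only stands in for anchored matching; it is not a Parsimmon API. The unwrapped pattern stops at `+1` because the alternation tries `[0-9]+` first and succeeds; the wrapped pattern repeats the group and picks up `.4` on a second pass.

```javascript
// Hypothetical stand-in for anchored matching; returns the matched text or null.
function matchAtStart(re, input) {
  const m = new RegExp("^(?:" + re.source + ")", re.flags).exec(input);
  return m ? m[0] : null;
}

// The number pattern from the comment above, without and with the outer (...)+.
const single = /[+\-]?([0-9]+|[0-9]*\.[0-9]+)(e[+-]?[0-9]+)?/;
const repeated = /([+\-]?([0-9]+|[0-9]*\.[0-9]+)(e[+-]?[0-9]+)?)+/;

// An alternative: put the longer alternative first instead of repeating.
const reordered = /[+\-]?([0-9]*\.[0-9]+|[0-9]+)(e[+-]?[0-9]+)?/;

console.log(matchAtStart(single, "+1.4"));    // +1
console.log(matchAtStart(repeated, "+1.4"));  // +1.4
console.log(matchAtStart(reordered, "+1.4")); // +1.4

// Caveat: repetition also accepts input that is not a single number.
console.log(matchAtStart(repeated, "1.2.3")); // 1.2.3
```

Reordering the alternation is one way to avoid the `1.2.3` side effect; which form fits depends on the grammar being parsed.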
Yeah that's true. I think it would still be good to have some way of specifying whether you would want the regex to be matched greedily to make the library easier to use. It wasn't clear that this was possible (you even said so) considering how long it stayed open, until I came up with this solution. But if you aren't interested in integrating a solution into the library itself then I can just close this pull request. |
Yeah... regexps are hard, so this took a while to think about. I'm gonna say that since this is possible within the library already, without any large effort, I'm gonna keep it as-is so that we can do the least amount of processing possible on regexps. I'm gonna open an issue to make warnings/errors for regexps with unsupported flags after this. I'm going to decline this PR, but thanks for all the discussion on this -- it was a good learning experience! |
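A check for unsupported flags could look something like the sketch below. Both the function name `assertSupportedFlags` and the default allowed set (`i`, `m`, `u`) are assumptions made for illustration; the comments above do not settle the final list, and the follow-up issue may do it differently.

```javascript
// Hypothetical validation helper; the allowed set is an assumption.
function assertSupportedFlags(re, allowed = "imu") {
  for (const flag of re.flags) {
    if (!allowed.includes(flag)) {
      throw new Error('unsupported regexp flag "' + flag + '" in ' + re);
    }
  }
  return re;
}

// A regex with only allowed flags passes through unchanged.
const ok = assertSupportedFlags(/[a-z]+/i);

// A regex with the stateful `g` flag is rejected with a clear message.
let message = null;
try {
  assertSupportedFlags(/../g);
} catch (err) {
  message = err.message;
}

console.log(ok.flags);  // i
console.log(message);   // unsupported regexp flag "g" in /../g
```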
Yeah, regexp can be hard to reason about sometimes. The anchoring part in the regex code is also possible for the user to write too but it makes sense to have it built-in, and so I still think it would be good to include an optional param to turn on greedy evaluation. This issue wasn't so much with regex as it was dealing with the quirks of the library, since the initial example I gave matched the regexp fully when I used |
Yeah, #83 should cover that, I think. |
I would also be happy with, e.g. |
(which is how |
@jneen What would it even mean for the regex to not be anchored at the beginning? |
It would mean nonsense, of course. But there are some regexen that have different behavior when wrapped in |
Can you give an example? I thought it was simply due to the use of stateful flags (g and y). |
Ah, I see, that was in fact because of |
Yay! Okay well it looks like you've got it all under control then :] |
If anyone here wants to take a look at the PR I did to address this, feel free: #85 |
@wavebeem It would be helpful to add a sentence on how to construct greedy regular expressions. A lot of grammars assume the lexer is greedy. |
@Risto-Stevcev What's the shortest example you can give me for that? I'd be happy to add it. |
@wavebeem How about this one: You can make your regex greedy by wrapping it in |
Looks good to me. Thanks.