Positive lookahead doesn't look beyond the current line? #11

Closed
hediyi opened this issue Jun 29, 2016 · 9 comments

@hediyi
Collaborator

hediyi commented Jun 29, 2016

I want to make sure a character is followed by something that is not necessarily on the current line. When that something is on the next line, using (?=) doesn't work, as in

/(\/)(?=[^\/]*\/[ix]*)/

If /ix doesn't appear on the same line as the preceding /, the regexp doesn't match.

If that's really the case, is there a way to make lookahead work across multiple lines?

@kkos
Owner

kkos commented Jun 30, 2016

Oniguruma's syntax and behavior are configurable.
I don't know which syntax mode you are using.
If you are using the default syntax (== Ruby), then [^\/] can match a newline.
The ONIG_SYN_NOT_NEWLINE_IN_NEGATIVE_CC flag is disabled in the default syntax.

if /(\/)(?=[^\/]*\/[ix]*)/ =~ "/\n/ix"
  puts "MATCH"
else
  puts "NOT MATCH"
end

==> MATCH
ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]

For questions about the behavior of Ruby's regexp, you should ask the Ruby community.
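
A small Ruby sketch of the point above, contrasting a negated character class with `.` when the text contains a newline. The constant and method names are illustrative and do not come from the thread; it assumes Ruby's default regexp syntax.

```ruby
# In Ruby's default syntax a negated character class such as [^/] matches "\n".
# The dot does not match "\n" unless the /m option is given.
NEGATED_CLASS = %r{(/)(?=[^/]*/[ix]*)}
DOT           = %r{(/)(?=.*/[ix]*)}
DOT_MULTILINE = %r{(/)(?=.*/[ix]*)}m

# Returns the index where the pattern matches, or nil when it does not match.
def match_start(pattern, text)
  match = pattern.match(text)
  match && match.begin(0)
end

TEXT = "/\n/ix"

p match_start(NEGATED_CLASS, TEXT)  # lookahead crosses the newline
p match_start(DOT, TEXT)            # lookahead stops at the newline
p match_start(DOT_MULTILINE, TEXT)  # /m lets the dot cross the newline
```

So when the whole string is handed to the engine at once, the lookahead itself is not what limits the match to one line.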

@hediyi
Collaborator Author

hediyi commented Jun 30, 2016

Thank you very much for the reply! 😇

I'm trying to fix a bug in the Ruby language package of Atom, which I believe uses oniguruma as its regex engine. The bug is that the package can't recognize a multi-line regex. The problem lies somewhere in

(?<![\\w)])((/))(?![?*+])
(?=
  (\\\\/|[^/])*
  /[eimnosux]*\\s*
)

which is basically a more verbose version of the previous example. It's really tricky to differentiate the division operator / and regex delimiter / without parsing the Ruby code. 😂
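
For reference, a hedged Ruby transcription of the pattern above. It assumes that the doubled backslashes are string escaping in the grammar file (so `\\w` is `\w` and `\\\\/` is a backslash followed by a slash) and that the pattern runs in free-spacing mode. `REGEX_START` and `regex_literal_start` are names made up for this sketch.

```ruby
# Transcription of the grammar pattern under the assumptions stated above.
REGEX_START = %r{
  (?<![\w)])((/))(?![?*+])  # opening slash, not after a word character or closing paren
  (?=
    (\\/|[^/])*             # body: escaped slashes or any non-slash character
    /[eimnosux]*\s*         # closing slash followed by optional flags
  )
}x

# Returns the index of the opening slash, or nil when nothing matches.
def regex_literal_start(source)
  match = REGEX_START.match(source)
  match && match.begin(0)
end

p regex_literal_start("x = /foo\nbar/ix")  # whole text given at once
p regex_literal_start("x = /foo")          # only the first line given
p regex_literal_start("total / count")     # a lone division operator
```

Given the whole text, the lookahead finds the closing slash on the second line. Given only the first line, there is no closing slash to find, so the pattern cannot match.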

@hediyi
Collaborator Author

hediyi commented Jun 30, 2016

I even tried ((/))(?=[^/]*/). It matches a single-line regex, but still not a multi-line one, so I suspected that [^/] in the lookahead doesn't match an LF.

Now that I think about it, maybe I've looked into the wrong spot.

@kkos
Owner

kkos commented Jul 1, 2016

I have never used Atom, so I don't know about it.
But if atom/language-ruby uses oniguruma as its regexp engine, why doesn't the package.json include a dependency on the oniguruma module?
Does Atom run as JavaScript in a browser?

@hediyi
Collaborator Author

hediyi commented Jul 1, 2016

Pretty much, it's written in CoffeeScript, to be exact, and running on Node.js. The UI is basically parsed HTML and CSS, I guess you can call it a desktop editor created with Web technologies 😄

why the package.json doesn't include dependencies to oniguruma module

It has, in the core: https://github.com/atom/atom/blob/9ea68024acccd7dc7494f50d03496c16b193c0c4/package.json#L46
I think oniguruma backs all of its language packages, and also the find-and-replace feature.

@kkos
Owner

kkos commented Jul 1, 2016

I have installed Atom 1.8.0 on Windows.
I tried to run your last regexp in it.
(It was very difficult for me to run CoffeeScript in Atom, so I hacked the character-count package.)
And it matches a multi-line string.
m[0] = '/'.

file: .atom/.apm/character-count/lib/character-count.coffee
module.exports =
  count: ->
    m = "/\n\n\n/".match(/((\/))(?=[^\/]*\/)/)
    alert(m[0])

  activate: (state) ->
    atom.commands.add 'atom-workspace', 'character-count:count', => @count()

@hediyi
Collaborator Author

hediyi commented Jul 4, 2016

👍 @kkos.

I also tried /[^/]*/ in the find-and-replace (to find slashes in ) with regex, it worked. But in the grammar file of a language package, no luck; even /\n*/ doesn't work, no idea why.

To minimize the possibilities why this doesn't work, I wrote a grammar file from scratch, this is the only rule in the grammar:

[image: the single rule in the grammar file]

When the slashes are in the same line, it matches (see the scopes at cursor):

[image: scopes at cursor with both slashes on one line]

But if put newlines in between, it doesn't (see the scopes at cursor and the change of color):

[image: scopes at cursor with newlines between the slashes]

It looks like a pattern in the grammar file doesn't search across lines—if it can't find a match in a line, it just starts fresh in the next line.
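
The observation above can be mimicked in plain Ruby by running one pattern against the whole text and then against each line separately. This is only a sketch of the suspected behavior, not Atom's actual tokenizer; `SLASH_PAIR`, `matches_whole_text?` and `matches_per_line` are hypothetical names.

```ruby
# An opening slash that must be followed, somewhere later, by a closing slash.
SLASH_PAIR = %r{/(?=[^/]*/)}

# The pattern sees the entire text, newlines included.
def matches_whole_text?(text)
  SLASH_PAIR.match?(text)
end

# The pattern sees one line at a time and starts fresh on each line.
def matches_per_line(text)
  text.each_line.map { |line| SLASH_PAIR.match?(line) }
end

MULTI_LINE = "/\n\n\n/"

p matches_whole_text?(MULTI_LINE)
p matches_per_line(MULTI_LINE)
p matches_per_line("/abc/")
```

The same pattern succeeds on the whole text and fails on every individual line, which is consistent with the scopes shown in the screenshots.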

@hediyi
Collaborator Author

hediyi commented Jul 5, 2016

Found the answer in the manual for TextMate, whose grammar format Atom's grammars are based on:

Note that the regular expressions are matched against only a single line of the document at a time. That means it is not possible to use a pattern that matches multiple lines. The reason for this is technical: being able to restart the parser at an arbitrary line and having to re-parse only the minimal number of lines affected by an edit. In most situations, it is possible to use the begin/end model to overcome this limitation.
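
A minimal Ruby sketch of the begin/end idea mentioned in the quote, assuming a rule made of two single-line patterns plus one piece of state carried between lines. `RULE` and `scope_lines` are hypothetical and are not TextMate's or Atom's API. The begin pattern here is simply any slash, so it makes no attempt to tell a regex delimiter from a division operator.

```ruby
# Hypothetical rule: each pattern is only ever matched against a single line.
RULE = { begin: %r{/}, end: %r{/[eimnosux]*} }

# Labels each line :regex if any part of it lies inside a begin/end region,
# otherwise :plain.
def scope_lines(text, rule)
  inside = false # the only state carried from one line to the next
  text.each_line.map do |line|
    touched = inside
    pos = 0
    # Alternate between looking for the begin and the end pattern on this line.
    while (match = (inside ? rule[:end] : rule[:begin]).match(line, pos))
      touched = true
      inside = !inside
      pos = match.end(0)
    end
    touched ? :regex : :plain
  end
end

SOURCE = "x = 1\ny = /foo\nbar/ix\nz = 2\n"
p scope_lines(SOURCE, RULE)
```

Because the region is opened on one line and closed on another, no single pattern ever has to match across a newline.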

Thanks for your time, really really appreciate it 😉

@kkos
Owner

kkos commented Jul 6, 2016

Thank you for your investigation.
