Positive lookahead doesn't look beyond the current line? #11

hediyi · 2016-06-29T22:22:45Z

I want to make sure a character is followed by something that is not necessarily on the current line. When that something is on the next line, using (?=) doesn't work, as in

/(\/)(?=[^\/]*\/[ix]*)/

If /ix doesn't appear on the same line as the previous /, the regexp doesn't match.

If that's really the case, is there a way to make lookahead work across multiple lines?


kkos · 2016-06-30T07:10:20Z

Oniguruma's syntax and behavior are configurable.
I don't know which syntax mode you are using.
If you are using the default syntax (== Ruby), then [^\/] can match a newline,
because the ONIG_SYN_NOT_NEWLINE_IN_NEGATIVE_CC flag is disabled in the default syntax.

if /(\/)(?=[^\/]*\/[ix]*)/ =~ "/\n/ix"
  puts "MATCH"
else
  puts "NOT MATCH"
end

==> MATCH
ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux]
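The difference between a negated character class and `.` may be what causes confusion here. A minimal sketch in plain Ruby, using the same test string as above (the variable names are only illustrative):

```ruby
text = "/\n/ix"

# A negated character class such as [^\/] accepts "\n", so the lookahead
# can reach the closing slash on the next line.
negated_class = /(\/)(?=[^\/]*\/[ix]*)/

# By contrast, "." does not match "\n" unless the /m option is set.
dot_default   = /(\/)(?=.*\/[ix]*)/
dot_multiline = /(\/)(?=.*\/[ix]*)/m

results = {
  negated_class: !(negated_class =~ text).nil?,
  dot_default:   !(dot_default =~ text).nil?,
  dot_multiline: !(dot_multiline =~ text).nil?
}
puts results
```

So a lookahead itself is not limited to the current line; whether it crosses a newline depends on what the sub-pattern inside it is allowed to match.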

If you are asking about the behavior of Ruby's regexp, you should ask the Ruby community.

hediyi · 2016-06-30T08:29:15Z

Thank you very much for the reply! 😇

I'm trying to fix a bug in Atom's Ruby language package, which I believe uses Oniguruma as its regex engine. The bug is that the package can't recognize a multi-line regex. The problem lies somewhere in

(?<![\\w)])((/))(?![?*+])
(?=
  (\\\\/|[^/])*
  /[eimnosux]*\\s*
)

which is basically a more verbose version of the previous example. It's really tricky to differentiate the division operator / and regex delimiter / without parsing the Ruby code. 😂

hediyi · 2016-06-30T08:53:14Z

I even tried ((/))(?=[^/]*/). It matches a single-line regex but still not a multi-line one, so I suspected that [^/] in the lookahead doesn't match an LF.

Now that I think about it, maybe I've been looking in the wrong spot.

kkos · 2016-07-01T01:43:38Z

I have never used Atom, so I don't know about it.
But if atom/language-ruby uses Oniguruma as its regexp engine, why doesn't its package.json include a dependency on the oniguruma module?
Does Atom run on JavaScript in a browser?

hediyi · 2016-07-01T02:14:52Z

Pretty much. It's written in CoffeeScript, to be exact, and runs on Node.js. The UI is basically parsed HTML and CSS; I guess you can call it a desktop editor created with Web technologies 😄

Regarding "why the package.json doesn't include dependencies to oniguruma module":

It does, in the core: https://github.com/atom/atom/blob/9ea68024acccd7dc7494f50d03496c16b193c0c4/package.json#L46
I think Oniguruma backs all of its language packages, and also the find-and-replace feature.

kkos · 2016-07-01T06:55:02Z

I installed Atom 1.8.0 on Windows and tried to run your last regexp in it.
(It was very difficult for me to run CoffeeScript in Atom, so I hacked the character-count package.)
It matches a multi-line string:
m[0] = '/'.

file: .atom/.apm/character-count/lib/character-count.coffee
module.exports =
  count: ->
    m = "/\n\n\n/".match(/((\/))(?=[^\/]*\/)/)
    alert(m[0])

  activate: (state) ->
    atom.commands.add 'atom-workspace', 'character-count:count', => @count()

hediyi · 2016-07-04T13:02:50Z

👍 @kkos.

I also tried /[^/]*/ in the find-and-replace (to find slashes in ) with regex, it worked. But in the grammar file of a language package, no luck; even /\n*/ doesn't work, no idea why.

To narrow down why this doesn't work, I wrote a grammar file from scratch; this is the only rule in the grammar:

When the slashes are in the same line, it matches (see the scopes at cursor):

But if I put newlines in between, it doesn't (see the scopes at cursor and the change of color):

It looks like a pattern in the grammar file doesn't search across lines—if it can't find a match in a line, it just starts fresh in the next line.
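This behavior can be reproduced outside Atom. The sketch below is plain Ruby and assumes, purely for illustration, that a grammar engine hands each line to the regex separately; it is not taken from Atom's source, and the sample text is made up:

```ruby
pattern = /(\/)(?=[^\/]*\/)/
source  = "x = /\nfoo\n/"

# Matching against the whole buffer: the lookahead can cross "\n".
whole_buffer_match = !pattern.match(source).nil?

# Matching each line on its own, as a line-oriented tokenizer would:
# no single line contains both slashes, so nothing matches.
per_line_matches = source.each_line.map { |line| !pattern.match(line).nil? }

puts "whole buffer: #{whole_buffer_match}"
puts "per line:     #{per_line_matches.inspect}"
```

The regex is identical in both cases; only the text it is given changes, which would explain why the same pattern works in find-and-replace but not in a grammar rule.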

hediyi · 2016-07-05T12:26:59Z

Found the answer in the TextMate manual, on which Atom's grammars are based:

Note that the regular expressions are matched against only a single line of the document at a time. That means it is not possible to use a pattern that matches multiple lines. The reason for this is technical: being able to restart the parser at an arbitrary line and having to re-parse only the minimal number of lines affected by an edit. In most situations, it is possible to use the begin/end model to overcome this limitation.
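To illustrate the begin/end idea, here is a rough Ruby sketch of a line-at-a-time scanner that carries only one piece of state (inside the rule or not) from line to line. It is not TextMate's or Atom's implementation; `BEGIN_RE`, `END_RE` and `scopes_per_line` are hypothetical names, and the sketch deliberately ignores escaped slashes and the division-operator ambiguity:

```ruby
# Hypothetical begin/end rule for a regex literal; names are illustrative.
BEGIN_RE = %r{/}
END_RE   = %r{/[eimnosux]*}

# Returns [line_number, scope] pairs. Each line is scanned on its own;
# the only thing remembered between lines is whether a rule is open.
def scopes_per_line(source)
  inside = false
  source.each_line.with_index(1).map do |line, number|
    touched = inside          # a rule left open on an earlier line covers this one
    pos = 0
    # Alternate between looking for the begin and the end pattern.
    while (m = (inside ? END_RE : BEGIN_RE).match(line, pos))
      touched = true
      inside = !inside
      pos = m.end(0)
    end
    [number, touched ? :regexp : :source]
  end
end

sample = "x = /\nfoo\n/ix\ny = 1\n"
scopes = scopes_per_line(sample)
scopes.each { |number, scope| puts "line #{number}: #{scope}" }
```

Because no single regex has to span lines, each line can still be re-scanned independently as long as the open/closed state at its start is known.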

Thanks for your time, really really appreciate it 😉

kkos · 2016-07-06T00:51:09Z

Thank you for your investigation.

kkos closed this as completed Jul 6, 2016

kkos added the question label Aug 23, 2016

bdb mentioned this issue Oct 28, 2016

Multiple memory leaks with 6.1.1 (and 5.9.6) #31

Closed

kkos mentioned this issue Feb 19, 2020

Multi-line regular expressions parsing #181

Closed
