Incorrect regex matching #30

Closed
Zantier opened this issue May 6, 2015 · 7 comments

@Zantier

Zantier commented May 6, 2015

Version 6.7.7.
I've removed all plugins apart from the plugin manager.

Here is the simplest case I've come up with:

  • Open a new tab
  • Type/paste at least 380 "a"s
  • Ctrl+F -> with "Regular expression" search mode selected, search for ([^b])*b

One occurrence is found, matching the whole text, even though the document contains no "b" at all.
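For reference, the expected result of this repro can be sketched in Python. This assumes Python's `re` module as a stand-in, which is a different engine from the one inside Notepad++, so it only illustrates what a correct search should return, not how Notepad++ behaves.

```python
import re

# Zantier's repro: 380 "a" characters and no "b" anywhere.
text = "a" * 380
pattern = re.compile(r"([^b])*b")

# The pattern ends in a literal "b", so a text without "b" cannot match.
match = pattern.search(text)
print(match)  # None
```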

@tommilligan

Interesting bug. Replicated easily (v6.7.7). Confirmed that it only happens once the text reaches exactly 380 characters. The bug is not present if the () is removed: (a)*b triggers the bug, but a*b does not, even after 200,000 characters.

Suspect memory/storage issue of () capturing group?
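The grouped versus ungrouped comparison can be written down as a small check. The sketch below again uses Python's `re` module as an assumed reference engine (not the Notepad++ one), and tests only a match attempt at offset 0, since the reported false match covered the text from its very start. The helper name `matches_at_start` and the longest length tried are arbitrary choices for illustration.

```python
import re

def matches_at_start(pattern: str, length: int) -> bool:
    """Return True if `pattern` matches at offset 0 of a run of `length` a's."""
    return re.match(pattern, "a" * length) is not None

# Lengths around the reported threshold, plus a longer run (arbitrary choice).
for length in (379, 380, 20_000):
    grouped = matches_at_start(r"(a)*b", length)
    ungrouped = matches_at_start(r"a*b", length)
    print(length, grouped, ungrouped)
```

Since the text never contains a "b", both columns should be False at every length, with or without the capturing group.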

@ghost

ghost commented May 6, 2015

Oddly enough I get the bug after 270 characters.

@Zantier
Author

Zantier commented May 7, 2015

It appears to be an issue with Scintilla, as I was able to reproduce a similar behaviour in SciTE by dropping in SciLexer.dll v3.3.4 (which Notepad++ v6.7.7 is using) in place of v3.5.5.

When I place SciLexer.dll v3.5.5 in the same directory as Notepad++, I no longer see the issue, even with 200,000 "a"s in a row. Not that SciLexer.dll v3.5.5 is perfect though, as it seems to fail to match (a)*, or a more useful regex such as (a|b)* in both Notepad++ and SciTE...


Edit: Note I don't know whether SciLexer.dll v3.5.5 was built with boost for PCRE support. I simply took SciLexer.dll from the SciTE binary download. I don't have time to check right now.
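As a point of comparison for the (a)* and (a|b)* cases, here is what a match is expected to look like, sketched with Python's `re` module (an assumption for illustration; it says nothing about how either SciLexer.dll build was compiled).

```python
import re

# (a)* should consume a whole run of a's.
m_a = re.fullmatch(r"(a)*", "aaa")
print(m_a.group(0))  # aaa

# (a|b)* should consume a whole run of a's and b's.
m_ab = re.fullmatch(r"(a|b)*", "abba")
print(m_ab.group(0))  # abba
print(m_ab.group(1))  # a  (a repeated group keeps its last iteration)
```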

@milipili milipili added the bug label May 31, 2015
@guy038

guy038 commented Jun 24, 2015

Hello Zantier, tommilligan, jonandr and All,

I did some tests on version 6.7.9, with its native plugins, and I found a strange rule behind this regex issue! Follow the few steps below to reproduce it:

  • Suppose that you write, in a new tab, a line with several lowercase letters a (say, between 10 and 20)
  • Then add some lines, even empty ones, before and after this line of a's, subject to the following two conditions:
  1. They don't contain any lowercase letter a or b
  2. If you select all the contents of that test file, you get exactly 285 characters

  • Go back to the very beginning of the file (CTRL + Home)
  • Open the Find dialog
  • Check the Match case option and the Regular expression radio button
  • Uncheck the Wrap around option
  • Type Zantier's regex ([^b])*b in the Find what field

=> When you click the Find Next button, the entire contents of the test file are wrongly matched :-(

Now:

  • Close the Find dialog by hitting the ESC key
  • Delete ONE character ONLY, in a line located before, or even after, the line of a's (so that CTRL-A now gives a selection of exactly 284 characters)
  • Go back, again, to the very beginning of the file, with the CTRL + Home shortcut
  • Press the F3 key to repeat the same regex search ([^b])*b

=> This time, as expected, the regex engine doesn't find any match (as the file contains NO lowercase letter b)

So, for some odd reason, this regex stops working as soon as the size of the file exceeds 284 bytes?
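The test file described in these steps can be generated programmatically, which makes the 284/285 boundary easy to probe. The following is a minimal sketch under several assumptions: `build_document` is a hypothetical helper, "x" is an arbitrary filler character, line endings are a single "\n" (Notepad++ on Windows would normally use CR LF, which changes the character count), and Python's `re` module stands in for a reference engine rather than the one used by Notepad++.

```python
import re

PATTERN = re.compile(r"([^b])*b")

def build_document(total_size: int, run_length: int = 15) -> str:
    """Build a text of exactly `total_size` characters: one line of a's
    between two filler lines that contain neither "a" nor "b"."""
    filler = total_size - run_length - 2  # 2 accounts for the "\n" separators
    if filler < 0:
        raise ValueError("total_size is too small for the run of a's")
    head = filler // 2
    tail = filler - head
    return "x" * head + "\n" + "a" * run_length + "\n" + "x" * tail

# Probe both sides of the reported limit.
for size in (284, 285):
    doc = build_document(size)
    assert len(doc) == size
    print(size, PATTERN.search(doc))
```

A correct engine should report no match at either size, because neither document contains a "b".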

Then I successively replaced the SciLexer.dll v3.5.6 file with:

  • The last version of SciLexer.dll v3.3.4 (which came with N++ 6.7.3) => Same limit, between 284 and 285
  • The last version of SciLexer.dll v2.2.7 (which came with N++ 6.3.0) => The limit is between 269 and 270

BTW, the regex ([^b])+b produces the same issue but, luckily, the regexes ([^b]*)b (which, however, creates a different group 1) and [^b]*b do work in all cases, independently of the file's size!
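The remark about a different group 1 can be illustrated on a short text that does contain a "b". This is a sketch using Python's `re` module, assumed here only as a reference for capture semantics; the sample text "aaab" is made up for the example.

```python
import re

text = "aaab"

m1 = re.search(r"([^b])*b", text)   # group repeated: keeps the last iteration
m2 = re.search(r"([^b]*)b", text)   # repetition inside the group: keeps the whole run
m3 = re.search(r"[^b]*b", text)     # no capturing group at all

print(m1.group(0), m1.group(1))  # aaab a
print(m2.group(0), m2.group(1))  # aaab aaa
print(m3.group(0), m3.groups())  # aaab ()
```

All three patterns match the same overall text, so the alternatives are drop-in replacements whenever group 1 is not used afterwards, for example in a replacement string.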

Best regards,

guy038

@robert-andrzejuk

Has anyone reported this on the Scintilla bug system?

@donho
Member

donho commented Nov 25, 2018

It's an issue concerning Scintilla.

@donho donho closed this as completed Nov 25, 2018
@sasumner
Contributor

> It's an issue concerning Scintilla.

The symptoms sound like Notepad++ isn't catching an exception that it should (see #4761 (comment)).

Anyway, if it is really thought to be a Scintilla issue, why not update Scintilla and see if it remains? :-)
