[\s\S] doesn't seem to work #4

Twigpig · 2017-06-16T12:37:03Z

Using Regex "Test:\s*([\s\S]*?)\s;" (without quotes, obviously) with an input of "Test: hello ;" correctly returns "hello" on other Regex tools (e.g. http://www.regexr.com/) but returns no results using TRegExpr.

Using "Test:\s*(.*?)\s;" works for this case in TRegExpr but obviously wouldn't do the same job if you were using a multi-line input string.
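To make the reported difference concrete, here is a minimal sketch using Python's `re` module as a stand-in for engines that accept `[\s\S]`. It is not TRegExpr, and the helper name `capture` is purely illustrative. It shows the lazy `[\s\S]*?` pattern capturing across a line break where `.*?` does not:

```python
import re

ANY_CHAR = r"Test:\s*([\s\S]*?)\s;"  # class that matches every character
DOT = r"Test:\s*(.*?)\s;"            # '.' skips "\n" in Python by default


def capture(pattern, text):
    """Return the first capture group, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


single = "Test: hello ;"
multi = "Test: hel\nlo ;"

print(repr(capture(ANY_CHAR, single)))  # 'hello'
print(repr(capture(DOT, single)))       # 'hello'
print(repr(capture(ANY_CHAR, multi)))   # 'hel\nlo'
print(repr(capture(DOT, multi)))        # None
```

Both patterns agree on single-line input; they only diverge once the captured text contains a line break.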

vs

Unless I'm mistaken, the below should return "hell\nlo":


andgineer · 2018-10-22T10:52:42Z

It looks like the same issue as https://bugs.freepascal.org/view.php?id=34130

In this implementation of regular expressions you cannot use 'not-space' inside a character class.
The compiler expects just simple chars or intervals (like [a-z]).

You can invert a character class, as in [^a-z], but I do not understand what you want to do.

For example, the expression '(?m)Test:\s*(.*?)\s;' will be found in the input text 'Test: hel'#$d#$a'lo ;' and returns 'Test: hel'#$d#$a'lo ;'
3bf2b36

Twigpig · 2018-10-22T13:09:43Z

The problem with using (.*?) instead of ([\s\S]*?) is that it doesn't return results that include line breaks, even with the multiline flag enabled (as demonstrated in my second to last screenshot above).
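One possible explanation, offered as an assumption and not verified against TRegExpr: in many engines the multiline flag only changes where `^` and `$` match, while a separate "dot matches newline" flag controls `.`. A minimal sketch with Python's `re`, standing in for such engines (the helper name `capture` is illustrative):

```python
import re

DOT = r"Test:\s*(.*?)\s;"
multi = "Test: hel\nlo ;"


def capture(pattern, text, flags=0):
    """Return the first capture group, or None when nothing matches."""
    m = re.search(pattern, text, flags)
    return m.group(1) if m else None


# MULTILINE only affects the anchors ^ and $, so '.' still stops at "\n".
print(repr(capture(DOT, multi, re.MULTILINE)))  # None

# DOTALL is the flag that lets '.' match "\n" as well.
print(repr(capture(DOT, multi, re.DOTALL)))     # 'hel\nlo'
```

Whether TRegExpr's own modifiers split the behaviour the same way would need to be checked against its documentation.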

andgineer · 2018-10-22T17:02:29Z

(.) actually works - I even added a test for it ((?m) switches on the multiline flag): 3bf2b36

But of course you need '*' because there is more than one char in your example.

Please help me to reproduce the problem and I will fix it. The best way is to give me, or send in a GitHub pull request, a test that fails but that you think should not.

[\s\S] just won't work because TRegExpr does not work with metachars inside char class (square brackets '[]'). TRegExpr expects inside char class just chars or intervals (like [a-z]).

Twigpig · 2018-10-23T10:31:12Z

> [\s\S] just won't work because TRegExpr does not work with metachars inside char class (square brackets '[]'). TRegExpr expects inside char class just chars or intervals (like [a-z]).

I was classing this as a bug because I wrongly expected this kind of syntax to be consistent across all variations of RegEx and other implementations of RegEx allow it. However it sounds like you're suggesting it's intentionally not supported in TRegExpr. If you don't mind me asking, is there a reason? Was it by design?

My aim was to produce a variation of the above RegEx that could be used in both TRegExpr and http://www.regexr.com/ but it seems that the required syntax for each is incompatible with the other (and I suppose that's okay).

Thank you for taking the time to look into this and getting back to me. Much appreciated.
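For the goal of a single pattern usable in several engines, one candidate worth testing is an alternation of `\s` and `\S` outside square brackets, which avoids metachars inside a character class. The sketch below only checks the idea in Python's `re`. Whether TRegExpr accepts the non-capturing group `(?:...)` or this alternation is an assumption that this thread does not confirm, and the names `PORTABLE`, `CLASS_BASED` and `capture` are illustrative:

```python
import re

CLASS_BASED = r"Test:\s*([\s\S]*?)\s;"   # metachars inside [...]
PORTABLE = r"Test:\s*((?:\s|\S)*?)\s;"   # same idea, no metachars inside [...]


def capture(pattern, text):
    """Return the first capture group, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


samples = ["Test: hello ;", "Test: hel\nlo ;", "Test: hel\r\nlo ;"]
for text in samples:
    # Print both results side by side so they can be compared by eye.
    print(repr(capture(CLASS_BASED, text)), repr(capture(PORTABLE, text)))
```

The non-capturing group keeps the wanted text in group 1, so code that reads the first capture does not need to change.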

andgineer · 2018-10-23T14:24:53Z

This library I wrote 20 years ago, and at the time I implemented just the regex subset that I needed.
You know, at that time there were no other libraries for regular expressions in Delphi.
So I implemented it myself, but just for my tasks.

Now I do not use Pascal in my everyday life, so if we are going to continue development we need somebody with current Pascal skills who wishes to join TRegExpr development.

In fact, that's the main reason why I published it on GitHub.

Meanwhile I am going to fix bugs... if you can find one after 20 years of the library's exposure ;)

Twigpig · 2018-10-23T14:39:33Z

Ah, I see. So it's just something it doesn't currently handle, rather than something it shouldn't allow on principle? If that's the case, perhaps I'll implement the feature myself if I can find the time.

Thanks again.

andgineer · 2018-10-23T15:48:28Z

As far as I can see, there is no such thing in the POSIX basic standard:
https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended

As for extensions, this is [:blank:] in POSIX, \s in Vim and no such thing in Perl.

Or this is [:digit:] for POSIX and \d for Vim and Perl.

Maybe the Perl way is better because POSIX is too verbose.
But with Perl you have to understand that, for example, \r means one character and \w means a character class.
I do not see any problem with that.

totyaxy · 2019-07-22T14:52:47Z

> This library I wrote 20 years ago

It's very interesting, because the latest trunk version works correctly with UTF8 chars in Lazarus 2.0.3 / fpc 3.0.5, and there is no longer any need for the complicated UTF8<->unicodestring conversion.

Thank you for this library!

Alexey-T · 2019-11-15T23:46:42Z

\S, \D and \W are not allowed in []; escapes inside [] are handled here:

                case regparse^ of // r.e.extensions
                  'd': EmitRangeStr ('0123456789');
                  'w': EmitRangeStr (WordChars);
                  's': EmitRangeStr (SpaceChars);
                  else EmitSimpleRangeC (UnQuoteChar (regparse));
                end; { of case }

We cannot add handling of \D, \W and \S here: too many chars would be needed in the parameter (65k minus a few).
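One way around enumerating tens of thousands of characters, sketched here in Python as an illustration only, is to store a negated shorthand as a "not in this small set" test instead of expanding it into a character list. This is not how TRegExpr is implemented, and every name below (`CharClass`, `negated_sets`, `SPACE_CHARS`, `DIGITS`) is hypothetical:

```python
from dataclasses import dataclass, field

SPACE_CHARS = " \t\n\r\f\v"  # assumed definition of \s for this sketch
DIGITS = "0123456789"


@dataclass
class CharClass:
    chars: set = field(default_factory=set)           # literals, or expanded \d \w \s
    ranges: list = field(default_factory=list)        # (lo, hi) pairs such as a-z
    negated_sets: list = field(default_factory=list)  # each entry: "any char NOT in it"
    inverted: bool = False                            # True for a leading ^

    def matches(self, ch):
        hit = (
            ch in self.chars
            or any(lo <= ch <= hi for lo, hi in self.ranges)
            or any(ch not in s for s in self.negated_sets)
        )
        return hit != self.inverted


# [\s\S]: the space chars themselves plus "anything that is not a space char".
any_char = CharClass(chars=set(SPACE_CHARS), negated_sets=[frozenset(SPACE_CHARS)])
# [\D]: only the small digit set is stored, never its 65k-sized complement.
not_digit = CharClass(negated_sets=[frozenset(DIGITS)])

print(all(any_char.matches(c) for c in "a \n\u20ac"))  # True
print(not_digit.matches("x"), not_digit.matches("5"))  # True False
```

The trade-off is that a class is no longer a flat string of characters, so the matcher has to evaluate a small predicate per character instead of doing a plain lookup.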


andgineer added a commit that referenced this issue Oct 22, 2018

https://github.com/masterandrey/TRegExpr/issues/4

3bf2b36

test case for the issue

andgineer closed this as completed Nov 21, 2019

andgineer pushed a commit that referenced this issue Jun 3, 2020

Merge pull request #4 from andgineer/master

5f9d6dc

sync
