Debugger and Core UI disagree on step count #1144

Open
Zarthus opened this Issue Apr 3, 2019 · 11 comments

@Zarthus
Collaborator

commented Apr 3, 2019

Bug Description

Sometimes the debugger and the UI report different step counts.

Reproduction steps

Repro 1: https://regex101.com/r/xlbCgC/1 (0 steps UI, 29 in Debugger)

I ran across another case where the UI showed a non-zero value that still differed from the debugger's count, but I didn't save the regex and can't reproduce it.

@darthglowball


commented Apr 6, 2019

If by "debugger" you mean "a regex engine that follows an expression verbatim, without optimizations", then I agree: such an unoptimized engine should show far more steps than what I am seeing with the following test case:

Regex:
[^\)]*?\d\s*\)

String:
cgfhlgf;kj scale(34.2542 , -222222222.323, )dfsfgsjdnakjsbnjakscbnsjnlascnj asjdnacjbnakjbnahjjhhhhhhhhhhhhhhhhhh pqoaskdnakldnaklnklwaiddddddddddddddddddddddddddddddddddddddnnnnnnnnnnnnnnnnnnnnbbbbbjbbbb )bcvx d svzdczvdghbgjmhbvhjmbvbvhjbgjhjvcghhgchcchjvjhgvhjvhjvbvjjc

I am seeing 16192 steps in PHP and Python, and the substring after the last ")" doesn't increase the steps. I expected the engine to check the part after ")" as well, but apparently it does not. I am not sure whether this is a bug in the steps label, an internal regex engine optimization, or my ignorance. Now I am curious how those steps are obtained. Is it with a method from one of those regex libraries?

@Zarthus

Collaborator Author

commented Apr 6, 2019

@darthglowball The "regex debugger" is a feature on the site for PCRE.

I am referring to the UI versus the debugger.

@darthglowball


commented Apr 7, 2019

@Zarthus Ah, I understand. I am seeing the same problem of unequal debugger vs. PCRE step counts. Looking at the debugger step count confirms the problem I described with the regex [^\)]*?\d\s*\): in PCRE the steps don't increase when adding characters after the final ")", while in the debugger they do. If you follow the steps of [^\)]*?\d\s*\) verbatim (meaning no optimizations), you should get the behaviour seen in the debugger: every character is looked at at least once as the start of a match. I suspect that PCRE somehow optimizes the search by stopping prematurely, unlike the debugger. It could also be a bug in the UI code if these two step counts are supposed to be equal, but I'll have to wait for the author to clarify how the count is calculated and whether the debugger uses the same engine as PCRE.
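
To make the suspected difference concrete, here is a minimal Python sketch. It counts how many start positions are tried, which is not the same thing as the step count regex101 reports. The `required` shortcut is a hypothetical stand-in for whatever PCRE actually does; nothing in this thread confirms the mechanism, and the function name and the shortened test string are made up for illustration.

```python
import re

def count_start_attempts(pattern, text, required=None):
    """Count the start positions tried before a match is found or the scan ends.

    `required` is a hypothetical literal that every match must contain. When it
    is given, scanning stops as soon as it no longer occurs in the rest of
    `text` (an assumed optimization, not a confirmed PCRE behaviour).
    """
    compiled = re.compile(pattern)
    attempts = 0
    for start in range(len(text) + 1):
        if required is not None and text.find(required, start) == -1:
            break  # no match can begin at this position or any later one
        attempts += 1
        if compiled.match(text, start):
            break  # first match found, stop scanning
    return attempts

pattern = r"[^\)]*?\d\s*\)"
base = "scale(34.25 , -2.3, )tail"   # shortened stand-in; the pattern does not match it
longer = base + "xxxx"               # extra characters after the final ")"

# Verbatim scan: every position is tried, so the count grows with the tail.
print(count_start_attempts(pattern, base), count_start_attempts(pattern, longer))

# With the assumed shortcut: the tail after the last ")" no longer matters.
print(count_start_attempts(pattern, base, required=")"),
      count_start_attempts(pattern, longer, required=")"))
```

Under these assumptions the first pair of numbers differs by the length of the added tail, while the second pair is identical, which mirrors the debugger vs. PCRE behaviour described above.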

@darthglowball


commented Apr 7, 2019

@Zarthus in your case of zero steps, that can't be due to an engine optimization. Zero steps in no time: definitely a bug somewhere. Something potentially relevant: our cases have that same ending ")".

@firasdib

Owner

commented Apr 7, 2019

@Zarthus

Collaborator Author

commented Apr 7, 2019

Perhaps instead of a bug, some additional clarification UI-wise would be beneficial then?

Most likely better in the debugger ('this regexp is unoptimized in order to show you the full journey'), but maybe you can also detect this kind of optimization in the UI and title="0 steps due to optimization, see the full journey in the debugger" it?

Edit: it would also be useful to show the optimized step count (in addition to the unoptimized step count) in the debugger, if that is not too complex.

@Zarthus Zarthus added debugger discussion and removed bug labels Apr 7, 2019

@darthglowball


commented Apr 7, 2019

@firasdib what about abc^ with multi-line mode off, which in both Python and PCRE gives a step count that changes (I've seen it reach 24) depending on the string you use it on? This could be optimized by looking at the caret first (unless you are already doing thousands of such optimizations, in which case the fixed overhead for any string would be huge). Another UI issue I have with "regular" matching, irrespective of optimizations, is that the steps aren't representative of the actual steps through the string, but only of the match.
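
For reference, a small Python check of why abc^ is a candidate for that kind of shortcut: with multi-line mode off, ^ only matches at the very start of the string, but it is reached only after three characters have been consumed, so the pattern cannot match anything. The list of subjects is made up for illustration.

```python
import re

# MULTILINE is deliberately not set, so ^ matches only at position 0.
pattern = re.compile(r"abc^")

subjects = ["abc", "abcabc", "xxabcxx", "abc\nabc", ""]
results = {s: pattern.search(s) for s in subjects}

for subject, match in results.items():
    # Every search fails, regardless of the subject.
    print(repr(subject), "->", match)
```

The result is the same for every subject, so any steps spent on abc^ depend only on how much of the string the engine walks through before giving up.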

@firasdib

Owner

commented Apr 8, 2019

@darthglowball The steps show how many steps were taken to achieve said match. If you want to see how the engine arrived at that position, you need to use the debugger.

@Zarthus It seems like it's getting a bit confusing and perhaps verbose. I'm not against some type of clarification, but perhaps a notice in the debugger is better?

@Zarthus

Collaborator Author

commented Apr 8, 2019

I'm all for it.

@darthglowball


commented Apr 8, 2019

@firasdib did you happen to update your regular matcher yesterday? I could've sworn that when I used something like xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx with the regex \d, it showed 1 step, corresponding only to the effort of searching from the start of the match. Now it shows 37 steps, which is the effort of searching from the beginning of the string. If it stays like this, you have solved my issue.

EDIT: searching for the literal 0 gives 2 steps. Why this difference in steps when searching for 0 or \d?
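
One plausible reading of the 0 vs. \d difference, sketched in Python. This is an assumption, not something confirmed in this thread: an engine that knows the match must start with a literal character can jump to the first candidate with a plain substring search, while a class such as \d has to be tested position by position. Both helper functions and their "check" counters are hypothetical and are not meant to reproduce the numbers regex101 shows.

```python
import re

subject = "x" * 35 + "0" + "x" * 35  # stand-in for the string above

def scan_with_class(text):
    """Test each position against the ASCII digit class, one check per position."""
    checks = 0
    for i, ch in enumerate(text):
        checks += 1
        if ch in "0123456789":
            return i, checks
    return -1, checks

def scan_with_literal(text, literal="0"):
    """Jump to the first candidate via substring search, then count one check."""
    i = text.find(literal)
    return i, (0 if i == -1 else 1)

# Both approaches land on the same position; only the counted work differs.
print(scan_with_class(subject))
print(scan_with_literal(subject))

# Python's own engine agrees on where the match is for both patterns.
print(re.search(r"\d", subject).start(), re.search("0", subject).start())
```

The point of the sketch is only that the match position is identical while the amount of counted work depends on how the engine gets there.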

@firasdib

Owner

commented Apr 11, 2019

@darthglowball No, I haven't changed anything.
