Regex search yields zero results when matches exist #50001

Closed
tedhudek opened this issue May 16, 2018 · 14 comments
Labels: bug (Issue identified by VS Code Team member as probable bug), search (Search widget and operation issues), verified (Verification succeeded)

Comments

@tedhudek
Member

tedhudek commented May 16, 2018

  • VSCode Version: 1.23.1
  • OS Version: Windows 10, version 1709

Steps to Reproduce:

  1. Clone https://github.com/MicrosoftDocs/windows-driver-docs.
  2. Open repo in VS Code (staging branch).
  3. Ctrl+Shift+F, turn on Use Regular Expression.
  4. Search for: ^The.*?:$
  5. No results.
  6. Remove $ and search again.
  7. Thousands of results.
  8. Click the second result (3dprint\3d-manufacturing-keywords-overview.md)
  9. Editor opens with file.
  10. Re-add $ at end of regex and search again.
  11. One result (just the open file).
  12. Close file and re-run search.
  13. Zero results.
  14. Expected result: 4625 matches

Does this issue occur when all extensions are disabled?: Yes

@vscodebot

vscodebot bot commented May 16, 2018

(Experimental duplicate detection)
Thanks for submitting this issue. Please also check if it is already covered by an existing one, like:

@vscodebot vscodebot bot added the search label May 16, 2018
@roblourens
Member

Do your files use CRLF line endings? Most likely #36309 (comment)
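
One quick way to answer the CRLF question for a given file is to count line endings in its raw bytes. This is a minimal sketch, not part of VS Code or ripgrep; the function name `line_ending_counts` is hypothetical.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CRLF and bare-LF line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


sample = b"The keywords:\r\nsecond line\r\nthird line\n"
counts = line_ending_counts(sample)
print(counts)
```

When checking a real file, read it with `open(path, "rb")` so that Python does not translate the line endings before they are counted.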

@roblourens roblourens added the info-needed label May 16, 2018
@tedhudek
Member Author

Yes. Agreed. Is there a recommended workaround? This is a terrible experience for a frequent use case on Windows.

@tedhudek
Member Author

tedhudek commented May 16, 2018

I would go as far as to say that this is totally broken. The best case result is the user doesn't trust Code for regex searching and goes off and uses another tool (my case). And then eventually wastes their time and yours opening a dup issue. The worst case is they proceed on the understanding that a pattern doesn't match and never discover that it really does.

@tedhudek
Member Author

And why is the find in files behavior different if a file is open (i.e. we get a match, step 11 above)? Completely non-intuitive.

Cheap workaround is to insert \s+ before $, for example: ^The.*?:\s+$. But still!
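
To see why the `\s+` workaround is only a cheap one, here is a small sketch that compares it with an optional `\r` on both CRLF and LF text. It uses Python's `re` module as a stand-in engine (an assumption: in `re.MULTILINE` mode, `$` matches only at the end of the string or just before `\n`, which is the behavior described for ripgrep in this thread). The sample text and the helper `matches` are made up for illustration.

```python
import re

ORIGINAL = r"^The.*?:$"
WORKAROUND = r"^The.*?:\s+$"     # the \s+ workaround from the comment above
OPTIONAL_CR = r"^The.*?:\r?$"    # optional carriage return before $

CRLF_TEXT = "The following keywords:\r\nnext line\r\n"
LF_TEXT = "The following keywords:\nnext line\n"


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text, re.MULTILINE) is not None


for name, pattern in [("original", ORIGINAL),
                      ("workaround", WORKAROUND),
                      ("optional CR", OPTIONAL_CR)]:
    print(name, "CRLF:", matches(pattern, CRLF_TEXT),
          "LF:", matches(pattern, LF_TEXT))
```

In this stand-in engine the `\s+` form finds the CRLF line but misses the same line in an LF file, because `\s+` needs at least one whitespace character before `$`. The `\r?` form matches in both cases.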

@roblourens
Member

Open files are searched in the editor model using the JS regex engine, and there are a couple of subtle differences between that and the Rust regex engine used by ripgrep, the tool that search uses.

I guess #36309 is "\s matches CR" and this is "$ doesn't match CR".

This one we can actually work around automatically by replacing $ with \r?$. That's a hack but sounds like maybe the right thing to do...
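
A rough sketch of what such an automatic rewrite could look like is below. This is not the code that landed in VS Code; the function name `add_optional_cr` is hypothetical, and the handling of character classes is deliberately simplified (it does not cover a literal `]` placed first in a class, for example).

```python
def add_optional_cr(pattern: str) -> str:
    r"""Insert \r? before each unescaped $ that is outside a character class."""
    out = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy an escape sequence such as \$ or \d through unchanged.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if ch == "[" and not in_class:
            in_class = True
        elif ch == "]" and in_class:
            in_class = False
        elif ch == "$" and not in_class:
            # An end-of-line anchor: allow a carriage return before it.
            out.append("\\r?")
        out.append(ch)
        i += 1
    return "".join(out)


print(add_optional_cr(r"^The.*?:$"))
```

An escaped `\$` and a `$` inside a class such as `[$]` are literal dollar signs, so the sketch leaves them alone and rewrites only the anchor.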

@roblourens roblourens added the bug label and removed the info-needed label May 16, 2018
@roblourens roblourens added this to the May 2018 milestone May 16, 2018
@tedhudek
Member Author

A separate issue could be that when invoking Find in Files, the output should be the same regardless of whether a file is open. I understand there is a different engine used by Ctrl+F within a single file, and when using Ctrl+Shift+F across all files, but when using the latter, the output should not change when files are open or closed.

@roblourens
Member

roblourens commented May 16, 2018

That would be ideal and I wish I could make it work, but if we want to support things like search finding matches in modified files, it's not practical.

@tedhudek
Member Author

It still seems counterintuitive that \r?$ matches more things than $. At first glance, it would seem that the former is a subset (or equal set) of the latter.

@roblourens
Member

roblourens commented May 16, 2018

It's really foo$ vs foo\r?$ where the first matches nothing because ripgrep doesn't consider \r as matching $.

@tedhudek
Member Author

From my limited understanding, Windows uses CR+LF, or \r\n, but UNIX uses just line feed, i.e. \n. Is ripgrep equating $ with \n? If so, then why does it matter what precedes the line feed? Sorry, just want to understand fully :)

@roblourens
Member

No problem at all. You are exactly right. But from ripgrep's perspective, \r is a character sitting between foo and $ (if the line is foo\r\n, and your regex is foo$).

If the regex doesn't have a match for that character, then the regex engine stops matching when it reaches \r. If we insert an optional \r in that place, then the regex will match.
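
The explanation above can be reproduced in a few lines. Python's `re` module is used here only as a stand-in (an assumption, not ripgrep itself): in `re.MULTILINE` mode its `$` matches only at the end of the string or immediately before `\n`, so the `\r` in `foo\r\n` sits between `foo` and the anchor.

```python
import re

line = "foo\r\n"  # the CRLF-terminated line from the example above

# `$` cannot match right after "foo", because the next character is \r.
plain = re.search(r"foo$", line, re.MULTILINE)

# An optional \r consumes the carriage return, so `$` is now right before \n.
patched = re.search(r"foo\r?$", line, re.MULTILINE)

# The same patched pattern still matches an LF-only line.
patched_lf = re.search(r"foo\r?$", "foo\n", re.MULTILINE)

print(plain)                    # None
print(repr(patched.group()))    # 'foo\r'
print(repr(patched_lf.group())) # 'foo'
```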

@tedhudek
Member Author

Of course! Got it :)

Thanks, Rob!

@BurntSushi

(FWIW, adding support for CRLF line endings is on my radar for future enhancements to Rust's regex engine, but it will be quite some time before it happens, unfortunately.)

@roblourens roblourens modified the milestones: May 2018, June 2018 May 30, 2018
roblourens added a commit that referenced this issue Jun 4, 2018
@jrieken jrieken added the verified label Jun 28, 2018
@vscodebot vscodebot bot locked and limited conversation to collaborators Jul 17, 2018