Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regex end-of-line search is broken #36309

Closed
Spongman opened this issue Oct 15, 2017 · 10 comments
Closed

regex end-of-line search is broken #36309

Spongman opened this issue Oct 15, 2017 · 10 comments
Assignees
Labels
feature-request Request for new features or functionality upstream Issue identified as 'upstream' component related (exists outside of VS Code) verification-needed Verification of issue is requested verified Verification succeeded
Milestone

Comments

@Spongman
Copy link

Spongman commented Oct 15, 2017

  • VSCode Version: Code 1.17.1 (1e9d365, 2017-10-10T14:24:50.851Z)
  • OS Version: Windows_NT x64 10.0.15063
  • Extensions:
Extension Author (truncated) Version
gitlens eam 5.6.4
Go luk 0.6.66
csharp ms- 1.12.1

Steps to Reproduce:

  1. Find "\s+$", no lines match (expected, consider this a pre-condition)
  2. Find in Files "\s+$", it matches every line.

this occurs regardless of "search.useRipgrep" setting.

Reproduces without extensions: Yes

seriously, can we please get a version of 'Find in Files' that's not completely broken, uses the same regex matching as the in-file matching. i don't care if it's slow, i just want correct results. as it is, I have to open up another tool to do regex search/replace because VScode's feature is so broken.

@vscodebot vscodebot bot added new release editor-find Editor find operations labels Oct 15, 2017
@roblourens
Copy link
Member

I don't see this -

image

It matches every line in every file? Can you include a screenshot?

@roblourens
Copy link
Member

roblourens commented Oct 16, 2017

Oh, I see this on Windows. It's actually matching the CR line ending. Ripgrep has this behavior as well. Need to check how we actually avoid this in the editor...

Cheap workaround - avoid matching \r with something like [ \t]+$ instead

@roblourens
Copy link
Member

What tool do you use for search with this regex? I see the same behavior in Sublime.

@roblourens roblourens added the info-needed Issue requires more information from poster label Oct 16, 2017
@Spongman
Copy link
Author

grep -P

it doesn't really matter or not this agrees with some other tool. it doesn't agree with itself - the in-file search rules should be the same as the many-file rules.

@roblourens
Copy link
Member

it doesn't agree with itself - the in-file search rules should be the same as the many-file rules.

Ideally, but there are two regex engines in play that will not agree in a few cases.

It's interesting that grep -P doesn't have the same behavior because the perl regex docs say

\s is a whitespace character and represents [\ \t\r\n\f]

@Spongman
Copy link
Author

yeah, can we have an option so it only uses one regex engine?

again, i don't care if it's slow.

@roblourens
Copy link
Member

I forgot that you said this happens regardless of search.useRipgrep so I guess I still need to figure out how what special handling the editor has for this. But the editor may do processing that doesn't make sense for search. It's interesting that JS matches \r in \s OR as the start of $.

@BurntSushi
Copy link

BurntSushi commented Oct 22, 2017

See also: BurntSushi/ripgrep#416 and rust-lang/regex#244

This particular type of match semantics varies wildly among regex engines. Some don't consider \r at all when computing lines (e.g., ripgrep), while some properly treat either \r\n or \n as line endings while others will treat \r all by itself as a line ending (perhaps a holdover from older macs). In some environments (like, say, Python) line endings are normalized, so you tend not to notice this inconsistency even if the regex engine doesn't know about \r.

IIRC, GNU grep does something akin to normalizing line endings before executing a search, but it is quite complex. (That probably explains why grep -P works in this case.) However, that is the most likely path for ripgrep to get this feature.

A shot in the dark here is that if your editor is doing line ending normalization (or something), then that may be why there's an inconsistency?

@rebornix rebornix removed the editor-find Editor find operations label Nov 21, 2017
@roblourens roblourens added bug Issue identified by VS Code Team member as probable bug and removed info-needed Issue requires more information from poster labels Apr 5, 2018
@roblourens roblourens added the upstream Issue identified as 'upstream' component related (exists outside of VS Code) label Sep 13, 2018
@roblourens roblourens added this to the October 2018 milestone Sep 13, 2018
@alexdima alexdima removed the editor label Sep 20, 2018
@roblourens
Copy link
Member

roblourens commented Oct 27, 2018

Yeah the editor does normalize lines and essentially joins them with \n for searching, so you can't match \r at all in a CRLF file. I think that is good for an editor so the user doesn't have to think about their line ending style.

How do we match this behavior by rewriting the regex?

I think that in single line mode, we would just rewrite \s to [ \t\f], but in multiline mode, \s still needs to match across lines and one \s must match \r\n, so we would have to rewrite it to ([ \t\f]|\r?\n). I think that's right.

@roblourens
Copy link
Member

roblourens commented Nov 8, 2018

Working on this... I will remap \s, \W, and \n.

@roblourens roblourens added feature-request Request for new features or functionality and removed bug Issue identified by VS Code Team member as probable bug labels Nov 8, 2018
@roblourens roblourens added the verification-needed Verification of issue is requested label Dec 4, 2018
@alexr00 alexr00 added the verified Verification succeeded label Dec 6, 2018
@vscodebot vscodebot bot locked and limited conversation to collaborators Dec 23, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature-request Request for new features or functionality upstream Issue identified as 'upstream' component related (exists outside of VS Code) verification-needed Verification of issue is requested verified Verification succeeded
Projects
None yet
Development

No branches or pull requests

7 participants