Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vis: fix search wrapping bugs #787

Merged
merged 1 commit into from
Jan 17, 2020
Merged

vis: fix search wrapping bugs #787

merged 1 commit into from
Jan 17, 2020

Conversation

zsugabubus
Copy link
Contributor

See commit message.

1) “$” matches in the middle of the text.
visvis
   ^   - standing here
    \/ - at first we search forward-\
\_/    - wrap, if nothing found <---/

After wrapping, in the second range “$” will treat end of the range
as EOL so “/vis$” will wisely match and moves cursor to the first
column.

2) No match after wrapping.
vissssss
 ^^       - standing here or here
 \\____/  - search this before wrapping ---\
V         - search range after wrapping <--/

“/vis” will *not* match (after wrapping), because it crosses ranges.

---

So the real solution would be that instead of the end position, the
start position of the possible match should be limited because a match
can cross the search ranges. To keep things simple, simply search two
whole text after wrapping.

visvis
\____/
@martanne martanne merged commit c9831b1 into martanne:master Jan 17, 2020
@martanne
Copy link
Owner

Thanks, there is a more general problem though. The regex anchors ^ and $ match the start/end of the search range irrespective of whether they are at a newline or not. I guess this could be fixed by supplying REG_NOTBOL and REG_NOTEOL when needed?

@zsugabubus
Copy link
Contributor Author

zsugabubus commented Jan 18, 2020

My first thought was that, but think about what if:

  • It's end-of-line indeed because the next character is \n. That would require an extra check.
  • The possible match would span through the two parts. Then it's not a good idea to stop before the second half of the string.

For me it appeared to be the simplest solution, everything other would make the handling of this/these edge-case(s) unnecessarily complicated. And it goes through the text (absolutely at most) twice only if no matches found at all.

@zsugabubus zsugabubus deleted the fix-search-wrap branch January 18, 2020 00:05
@martanne
Copy link
Owner

I agree that it is more complex and requires additional checks. However, currently the anchors match immediately even when they shouldn't. For example searching for ^vis in the following is mishandled (#denotes the cursor/selection):

vi#vis
visvis

An analogous problem exists for backward searches with $.

@zsugabubus
Copy link
Contributor Author

Give it a try: vis/fix-search-anchors. It seems it's working with POSIX regexps, but I didn't test it with --enable-tre.

martanne added a commit that referenced this pull request Jan 23, 2020
Previously, searches wrapping around did not report any results if they
started from within the eventual match. Fix this by enlarging the search
area to the whole text after reaching the first boundary.

See also #787.
@martanne
Copy link
Owner

Thanks, it seems more tricky than I initially thought. I think the logic about the REG_* flags belongs in the caller. I pushed some changes, let me know whether it works for you? It was only briefly tested.

@zsugabubus
Copy link
Contributor Author

zsugabubus commented Jan 23, 2020

  1. Not sure if it's related, but I found another anomaly (just before I wanted to tell you that I found everything okay) around zero-width backwards searches:
vis
vis|

(Works with any text.)

Try /$, N.

  1. If I understand right, skipping “junks” in text_search_range_{back,for}ward() is needed because regexec() expects a zero-terminated string and it requires special handling.
    It can cause issues in cases where REG_NOTEOL was not supplied because then it will give a match for $ right before the zero chunk (and ^ can give a match right after the chunk).

I also found that advancing to the next non-zero char could be simplified to:

/* No need for `len` and `junk`. */
while (cur != end) {
  ...
  /* rawmemchr() is non-standard but we already require _GNU_SOURCE because of memrchr() */
  cur = rawmemchr(cur, '\0'); /* Knowing that buf is always zero-terminated. */
  while (*cur == '\0' && cur < end)
    ++cur;
  eflags |= REG_NOTBOL; /* Surely not line beginning if the previous char was NUL. */
}

@martanne
Copy link
Owner

Thanks, point 1. should be fixed by b416a91?

And yes searching in binary files, will still show some oddities.

jeremybobbin pushed a commit to jeremybobbin/vis that referenced this pull request Aug 17, 2024
Previously, searches wrapping around did not report any results if they
started from within the eventual match. Fix this by enlarging the search
area to the whole text after reaching the first boundary.

See also martanne#787.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants