New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strutils.multiReplace() crashes if search string is "" #9557

Closed
genotrance opened this Issue Oct 29, 2018 · 16 comments

Comments

Projects
None yet
5 participants
@genotrance
Copy link
Contributor

genotrance commented Oct 29, 2018

import strutils
var
  search = ""
  replacement = "abc"
echo "string".multiReplace(search, replacement)
# EDIT(@timotheecour ) I think this should be: `echo "string".multiReplace(("", "abc"))`

Crash output

strutils.nim             multiReplace
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

This works on 0.18.0 but fails on 0.19.0 and #head.

@krux02

This comment has been minimized.

Copy link
Contributor

krux02 commented Oct 29, 2018

I am not sure if "" should be supported. If the search engine is non utf-8 multibyte characters aware, it can destroy characters. I would prefer if this would just raise an error for illegal argument.

@Araq

This comment has been minimized.

Copy link
Member

Araq commented Oct 29, 2018

I am not sure if "" should be supported. If the search engine is non utf-8 multibyte characters aware, it can destroy characters. I would prefer if this would just raise an error for illegal argument.

Excellent point that I never considered before. However, IMO an empty string simply doesn't "match" and so the original string is returned unmodified.

@genotrance

This comment has been minimized.

Copy link
Contributor

genotrance commented Oct 29, 2018

Search string could be "" at runtime, I would prefer the proc would handle it rather than having to check myself. It's not too much work but everyone else using the proc will also benefit.

That being said, even the replace() proc is broken from Araq's definition.

import strutils

echo "abc".replace("", "a")

# Prints aaabaca
@timotheecour

This comment has been minimized.

Copy link
Collaborator

timotheecour commented Oct 29, 2018

However, IMO an empty string simply doesn't "match" and so the original string is returned unmodified.

import nre
doAssert replace("foo", re"", "-") == "-f-o-o-"
  • likewise with nim-regex (pending nitely/nim-regex#29 ; currently returns "-f-o-o")

  • "likewise" with re pending #9437 ; currently returns "-foo"

  • see also relevant discussion here:
    #9280 (comment)

This was discussed and we choose this implemementation because Python/JS/whatever also does it this way.

@dom96

This comment has been minimized.

Copy link
Member

dom96 commented Oct 29, 2018

This should be consistent with replace I guess.

@Araq

This comment has been minimized.

Copy link
Member

Araq commented Oct 30, 2018

This was discussed and we choose this implemementation because Python/JS/whatever also does it this way.

That is what I said before knowing it breaks UTF-8 and can only be fixed by even more special logic for empty matches which is bad.

@timotheecour

This comment has been minimized.

Copy link
Collaborator

timotheecour commented Oct 30, 2018

@Araq
knowing that, would it be possible to enforce the same behavior across these for dealing with empty match string:

  • re.replace
  • nre.replace
  • count
  • strutils.replace
  • strutils.multiReplace
  • nim-regex (external)

and could that behavior be to throw an error (either doAssert or ValueError)
seems like a saner behavior, that would catch some bugs

@krux02 krux02 added the Stdlib label Oct 30, 2018

@timotheecour

This comment has been minimized.

Copy link
Collaborator

timotheecour commented Oct 30, 2018

@krux02 there were a few options discussed in this issue; which option did you pick in your commit?

@krux02

This comment has been minimized.

Copy link
Contributor

krux02 commented Oct 31, 2018

@timotheecour well you can look in the commit and see. But sure. In this PR strutils won't match empty strings on anything anymore. I didn't touch re or nre because those are wrappers for PCRE and I can't control what PCRE matches and what it doesn't match.

@krux02 krux02 closed this in e653121 Oct 31, 2018

@timotheecour

This comment has been minimized.

Copy link
Collaborator

timotheecour commented Oct 31, 2018

for clarity, here's new behavior:

echo multiReplace("abcd", ("", "123"))
abcd

also, @genotrance I think your original issue was wrong, since replacement has to be varargs[(string, string)] ; I'll edit it in top-post

@dom96

This comment has been minimized.

Copy link
Member

dom96 commented Oct 31, 2018

Really? We've changed the way replace works again?!

@krux02

This comment has been minimized.

Copy link
Contributor

krux02 commented Oct 31, 2018

Was it changed in the past? The only alternative to "empty string doesn't match on anything" is to make everything utf8, and state clearly that string is only for utf8 data.

@timotheecour

This comment has been minimized.

Copy link
Collaborator

timotheecour commented Nov 1, 2018

the only alternative is throwing, which seems saner behavior to catch bugs (empty string is probably a bug in user code in that case)

narimiran added a commit to narimiran/Nim that referenced this issue Nov 1, 2018

@Araq

This comment has been minimized.

Copy link
Member

Araq commented Nov 1, 2018

We don't throw on API usage bugs, we could have used doAssert but that makes people nervous who value "continue" over "crash and burn".

narimiran added a commit that referenced this issue Nov 1, 2018

narimiran added a commit that referenced this issue Nov 1, 2018

fixes #9557
(cherry picked from commit e653121)
@timotheecour

This comment has been minimized.

Copy link
Collaborator

timotheecour commented Nov 1, 2018

/cc @Araq

We don't throw on API usage bugs, we could have used doAssert

huh? see #9280 (which added doAssert to count) and your comment (#9280 (comment)) in pretty much the same context (for count on empty string):

This should instead use assert or doAssert, it's an API misuse.

As for the rest of your comment:

but that makes people nervous who value "continue" over "crash and burn".

which is why I suggested in #9280 (comment) (as well as here) to throw (by that I mean a recoverable exception) instead of doAssert
My suggestion is to simply throw CatchableError (or perhaps ValueError but I like the simplicity of using the root of catchable errors)
That's not crash and burn, that's normal exception handling mechanism ; it can peacefully coexist inside a production-server-that-can-never-die

That was a main point behind the exception hierarchy rewrite in #8237

@Araq

This comment has been minimized.

Copy link
Member

Araq commented Nov 2, 2018

But programming bugs are not recoverable/catchable errors, see the RFC I wrote.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment