-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace and empty strings #72216
Comments
A few days ago, the following behavior surprised me. >>> "".replace("", "prefix", 1)
''
>>> "".replace("", "prefix")
'prefix' It seems to me this edge case isn't correctly documented/implemented. I tested with python 2.7, 3.4 and 3.5, I guess it applies for other versions as well. Here are a few elements, for reference. « string.replace(s, old, new[, maxreplace]) >>> "" in ""
True
>>> "".count("")
1
>>> "".find("")
0 https://hg.python.org/cpython/file/2.7/Objects/stringobject.c#l2750 Thanks, Stéphane. |
For what it's worth, here is the behavior in PyPy 2.2.1 >>>> "".replace("", "prefix", 1)
'prefix'
>>>> "".replace("", "prefix")
'prefix' |
Just the least intrusive patch. Also, to me PyPy behaviour doesn't seem correct. |
Thanks for your help. However, I'm not sure I would agree, regarding the correct behavior. I guess the main question is « What is an occurrence? ». Are you not convinced that in, count and find indicate occurrences? >>> "blabla".replace("", "ε")
'εbεlεaεbεlεaε' To give a bit more (useless?) context, I was implementing a GCD algorithm based on replacements by Knuth (very beginning of TAOCP) and it looks like the current cpython implementation gives 0 for gcd(1, 1) :( |
str.replace("", whatever) doesn't make sense. Why not raising an exception in this case? |
Agreed with Stéphane. Since count() returns 1, resulting string should contain 1 replacement. Using regular expressions: >>> re.sub('', 'x', '')
'x'
>>> re.sub('', 'x', '', 1)
'x' |
Victor: because it would break backward compatibility to raise an error where one wasn't raised before. There's not enough motivation for that. I'm not sure about the 1 case; it would again be a change in behavior. Probably we should just document it as there isn't enough reason to risk breaking working code. |
I understand it might be a rather rare case. Nevertheless, don't you think the inconsistency Serhiy pointed out makes it look like a bug needing a fix? |
There may be related discussion in bpo-24243, also about searching for empty strings. A while ago I meant to add documetation and tests for that, but I lost momentum after cleaning up the existing tests. Some of the behaviours are undocumented and surprising, but if you look at the implementation it is clear they are not accidental. E.g. I think there is a function called replace_interleave() or something. IMO you can find an unlimited number of instances of an empty string at index zero of any string. So calls like "ABC".strip("") are sensible to raise an exception, and the interleave mode of "ABC".count("") is unexpected. But I don’t see a big need to change this existing behaviour as long as it is documented. |
Interestingly, initially string.replace() was implemented in terms of split/join. string.replace(s, '', s2, n) returned s for any s, s2 and n. After making replace() a method of str, in aca7b5eaf5e8 (1999-10-12), it became raising ValueError for empty pattern string. Since 762dd09edb83 (bpo-599128, 2002-08-23) it supports zero-length pattern string. str.replace('', s1, s2, n) returned '' for any s1, s2 and n. New implementation added in 41809406a35e (2006-05-25) made str.replace('', '', 'x', -1) returning 'x' while str.replace('', '', 'x', 1) still returns ''. As you can see, the behavior of replacing with empty pattern is not stable. It was changed several times. Maybe it is a time for new change. |
The current behavior is really surprising. >>> "".replace("", "|")
'|'
>>> "".replace("", "|", -1)
'|' vs >>> "".replace("", "|", 0)
''
>>> "".replace("", "|", 1)
''
>>> "".replace("", "|", 1000)
'' I always expect "|". --- This behavior makes sense to me: >>> "abc".replace("", "|")
'|a|b|c|'
>>> "abc".replace("", "|", -1)
'|a|b|c|'
>>> "abc".replace("", "|", 0)
'abc'
>>> "abc".replace("", "|", 1)
'|abc'
>>> "abc".replace("", "|", 100)
'|a|b|c|' |
Well, in fact, I would expect that "".replace(...) would always return "": for any argument passed to replace(). |
What result would you expect of |
I don't suggest to change these operator and method: >>> "" in ""
True
>>> "".count("")
1 |
There are expected some relations between str methods. For example, s.replace(a, b, n) == s.replace(a, b) if n >= s.count(a) Inconsistency between "".replace("", s, n) and "".replace("", s) is just a bug, and it should be fixed in the most consistent way. There are precedences, the behavior of replace() already was changed 3 times in the past. I think that chances to break some code are tiny, we just fix inconsistency why can puzzle users. |
Note that no tests were affected by this change. |
Because of possible compatibility issue (very unlikely) this change is done only in master and will not be backported. |
Yeah, I agree. It would be a mistake to change the Python behavior in a minor release. |
Thanks Stèphańe and Serhiy, I just discovered this strange behavior in 3.8, and wondered how my logic was wrong, until I pinpointed the inconsistent behaviour of str.replace with an empty first parameter and replace count of 1. Glad to see it is fixed in 3.9. I guess for x.replace( a, b, c ) the workaround would be x.replace( a, b, c ) if a else x.replace( a, b ) At least for recent versions of Python 3. |
Nope: I guess for x.replace( a, b, c ) the workaround would be x.replace( a, b, c ) if x else x.replace( a, b ) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: