Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

str_[i]replace(): Add support for (string needle, array replace) #980

Closed
wants to merge 3 commits into from

Conversation

flaupretre
Copy link
Contributor

Implements feature request #38685 (https://bugs.php.net/bug.php?id=38685).

For full details, read the RFC (https://wiki.php.net/rfc/cyclic-replace).

When str_replace() receives a string as 'needle' and an array as 'replace', each occurence of 'needle' is replaced with an element of 'replace', in array order, and looping if count(replace array) is lower than count of needles found in subject.

Defines new php_str_to_array_ex() and php_str_to_array() C functions.

@smalyshev
Copy link
Contributor

Looks like str_ireplace.phpt is broken, please take a look.

@flaupretre
Copy link
Contributor Author

Right. I had forgotten to replace the %s/%d in my result.

Fixed.

convert_to_string(entry);
if (Z_STRLEN_P(entry) > max_len) max_len = Z_STRLEN_P(entry);
} ZEND_HASH_FOREACH_END();
zend_hash_internal_pointer_reset_ex(arr, &pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset pointer here means? ZEND_HASH_FOREACH doesn't affects the internal pointer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand what you mean. A first pass is done to convert array elements to string, then internal pointer is reset. What's wrong here ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZEND_HASH_FOREACH doesn't affects the internal pointer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I moved the call to zend_hash_internal_pointer_reset_ex() just before starting the replace phase.

Feature request #38685
Add php_str_to_array_ex() and php_str_to_array() C functions
offset=0;
for (p = checked_haystack, end = p+length; (r = (char*)php_memnstr(p, checked_needle, needle_len, end)); p = r + needle_len) {
if ((entry = zend_hash_get_current_data_ex(arr, &pos)) == NULL) {
if (zend_hash_num_elements(arr) == 0) { /* Empty replace array */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho this should be a precondition error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, maybe, but I just made changes to support 'array search' as a real extension of 'string search'. What this means is that it allows cyclic replace when search is an array too (in this case, elements of the replace array will be strings or arrays).If an element of the replace array is an array, it is used for cyclic replace.

Do you think we should stop everything if we detect an empty array replacement ? It is probably the cleanest way to do it. I am not very familiar with PHP error handling, could the error be trapped, do we need to return a (partially) valid result, even after raising the error ?

Adds support for (search=array of strings, replace=array of array of strings)
Extends the basic cyclic case: (search=string, replace=array of strings)
@rquadling
Copy link
Contributor

I think that looping through the replacement array when you run out of replacements could be a bit confusing and doesn't provide a way to limit the number of replacements.

str_[i]replace('a', [1,2], 'aaa') would become 121, with no way to make it become 12a.

So, would it be appropriate to allow for a limit parameter to be applied like so : str_[i]replace(mixed $search, mixed $replace, mixed $subject, $limit = 0[, &$count]) or str_[i]replace(mixed $search, mixed $replace, mixed $subject, $loop = false[, &$count])?

Hmm. But then, looping COULD be an issue when one of the matches also exists as part of the replacements.

Not sure of the solution, but I thinking automatic looping, with no way to NOT loop is not an ideal way to work. Considering that we currently replace with empty strings if there are too few replacements, then maybe that should be our default/maintained position when using an array of replacements for a single match.

@flaupretre
Copy link
Contributor Author

I agree with you. We should provide a way to control whether we loop or not, but I haven't found the right solution yet.

First, I don't like the idea of adding a $limit argument. If the user provides a replacement array, the array count gives us a limit to consider. The problem arises when we exhaust the replacement array, but a boolean argument is not enough, as we have 4 possible behaviors : looping, empty strings, repeating the last replace argument (provided we eliminate the empty array case), or stopping the replacements.

The ideal way would be to find a way to encode the behavior in the replacemnt array itself, for instance with a special meaning for the last element. a last element equal to null, for instance, could mean something special. and false can be used too. These 3 values won't conflict too much with real replacement strings. This gives 3 special values and a default case, it should be enough.

Now, we need to decide the mapping between values and behavior. I propose :

default: stop replacements
true : loop
false : repeat last string (previous to last actually)
null : empty strings

Actually, I am not sure for null -> empty strings, as it can be done easily with an empty string followed by false. This case exists for search/replace arrays because, when count(replace) < count(search), a default replacement value MUST be provided but, here, the case is different : it is cyclic replacement, not one-to-one.So, I finally think we have 3 cases only.

The good news is that it is relatively easy to implement. Let me know what you think.

@rquadling
Copy link
Contributor

Just talking this through/showing the working ...

str_[i]replace('a', [1, 2], 'aaaa') gives 12aa
str_[i]replace('a', [1, 2, true], 'aaaa') gives 1212
str_[i]replace('a', [1, 2, false], 'aaaa') gives 1222
str_[i]replace('a', [1, 2, null], 'aaaa') gives 12

A concern is if the replacement array is dynamic, then the last value could influence the result in an unexpected way.

I think having an element in the array that behaves differently because it is the last element in the replacements array and the number of elements in the array is different to the number of matches, doesn't seem right. Too many potential issues if a user is expecting normal type juggling.

Having the last element of the replacements being special in some way is so unexpected.

I still feel an option to explicitly indicate the behaviour when there is a shortfall of replacements to matches is the sa[fn]est way to go.

Maybe str_[i]replace(mixed $search, mixed $replace, mixed $subject[, $shortfall = STR_REPLACE_STOP[, &count]) where $shortfall can be one of :

STR_REPLACE_STOP - No more replacements
STR_REPLACE_FIRST - Use the first element of $replace if it is an array
STR_REPLACE_LAST - Use the last element of $replace if it is an array
STR_REPLACE_LOOP - Loop through the $replace array until all matches have been replaced
$string - Replace unmatched matches with this string.

@flaupretre
Copy link
Contributor Author

Well, I agree that the looping behavior should be independant of the replace array. it was just the best solution I had found without adding a function argument...

If we add one, it will have to be in last position. On an intial design, I would put it before 'count' but the BC break is too important.

I don't see the use case for STR_REPLACE_FIRST. It is a sort of half loop, but why not...

I am not in favor of allowing a $string in this argument, as providing a string there is the same as storing the string at the end of the replace array and setting STR_REPLACE_LAST. We also need to reserve flag values for this parameter. If we allow any $string, we can only reserve null, true, and false, as numbers will have to be accepted as $string. It will be quite short, especially if we want the options value to be extensible for future needs. I prefer to use STR_REPLACE_LAST and let each replace array provide the last string it wants.

I would add another value : STR_REPLACE_EMPTY, to replace with empty strings. I was not sure, as this is just a shortcut of the more general STR_REPLACE_LAST case, but the case is probably common enough to deserve a specific flag.

@rquadling
Copy link
Contributor

It would seem that [1,2,3], STR_REPLACE_FIRST === [1,2,3,1], STR_REPLACE_LAST.

But the replacement array may be dynamic, and so manually having to find the first element of the replacements array, attach it to the end of the replacements array and then say use the last one for the shortfall, rather than the first one does seem more complicated than it should be. It would seem to make for a cleaner read with STR_REPLACE_FIRST. Of course, I've no use case for this yet, but you know developers! They'll find the absolute edge in about 30 seconds after a feature is released.

I agree on the $string. I can certainly see that this would provide simply too many combinations to deal with. Flags are much safer and easier to document/understand/read.

And certainly agree on STR_REPLACE_EMPTY.

Add support for muti-level array recursion in search, replace, and subject.
@flaupretre
Copy link
Contributor Author

I hope it's OK now.

RFC, patch and additional tests are in sync and ready for review.

@krakjoe krakjoe added RFC and removed Feature labels Jan 3, 2017
@krakjoe
Copy link
Member

krakjoe commented Jan 3, 2017

@flaupretre there are merge conflicts ...

Can I encourage you to push forward with the RFC ?

Alternatively, if you consider this work abandoned, please close this PR.

@krakjoe
Copy link
Member

krakjoe commented Feb 3, 2017

Having waited a month for feedback, and since this work would appear to be abandoned, I'm closing this PR, and will update the RFC.

Please take this action as encouragement to open a clean PR against a supported branch, and push forward with the RFC in a timely manner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants