Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACP: Add std::string::String::replace_and_count and/or replace_with #344

Open
alpaylan opened this issue Feb 25, 2024 · 6 comments
Open
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@alpaylan
Copy link

Proposal

Add std::string::String::replace_and_count and/or replace_with and/or replace_and.

Problem statement

Today, standart library String replacement method does not (1) creating any type of telemetric information such as the number of replacements, or (2) allow any type of dynamic computed pattern.

For (1), such telemetry information is useful for logging or validation purposes; for (2), dynamic computed patterns allow for more complex transformation now possible str::replace today.

Motivating examples or use cases

Collection Use Case 1: Checking if count > 0 for source string validation. This is currently only achievable by doing a 2nd pass over the string, resulting in 2 full traversals in the worse case.

Collection Use Case 2: Logging the count/position of the first match/positions of all matches.

Dynamic Pattern Use Case 1: Randomizing certain patterns within the string for anonymization purposes.

Dynamic Pattern Use Case 2: Tagging each replacement with a count number.

Solution sketch

I propose 3 candidate API's.

API 1:(replace_and_count) This API changes the return type of the replace function into (String, usize) from the previous String, the second parameter being the number of changes to the original variable.

API 2:(replace_with) This API receives an FnMut that will return a &str instead of the current &str and uses that. This API can be used for dynamic patterns.

API 3:(replace_and) This API receives an additional FnMut that does not return a value for executing side effects, such as counting.

Alternatives

In the Rust Internals Forum discussion where I posted the Pre-RFC, there was also the replace_iter suggestion that I did not fully understand.

Links and related work

Rust Internals Discussion: https://internals.rust-lang.org/t/pre-rfc-std-replace-and-count/20320/7

A Reddit Question: https://www.reddit.com/r/rust/comments/m39ybn/how_to_count_string_replacements_idiomatically/

@alpaylan alpaylan added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Feb 25, 2024
@kornelski
Copy link

To me replace_with is more general than replace_and_count. Users who want a count can increment a counter in the closure.

While we're improving replace, please consider returning Cow to avoid allocations when no replacements are found.

@alpaylan
Copy link
Author

To me replace_with is more general than replace_and_count. Users who want a count can increment a counter in the closure.

Definitely, indeed, the level of generality/flexibility is replace_with > replace_and > replace_and_count. I added all three for having a more complete discussion of the use cases, and deciding if we want a more general version or a specialized(perhaps optimized) one.

While we're improving replace, please consider returning Cow to avoid allocations when no replacements are found.

That makes sense to me, I guess the time not spent allocating would be worth the overhead of the Cow?

@m-ou-se
Copy link
Member

m-ou-se commented Mar 12, 2024

We discussed this in last week's libs-api meeting.

We thought replace_and_count and replace_and are too specific/niche for the standard library, but some of us though that replace_with might be a reasonable addition.

However, the exact signature of replace_with isn't immediately obvious. It should probably take the &str of the to-be-replaced substring as an argument. However, there is no obvoius lifetime of the new &str returned from the closure, and returning String or anything like that is also not great. An alternative could be to have the closure take a FnOnce(&str) that it can call to submit the replacement string, but that would invalidate the original string to which it might still have a reference. So maybe it should take a new type as argument? Separately from that, it might make sense for the closure to return a ControlFlow so it can return Break if it wants to stop replacing more occurences.. But at that point the signature is getting rather complicated.

In other words:

impl String {
    pub fn replace_with(&mut self, p: impl Pattern, f: impl FnMut(&str) -> &'uhh str); // lifetime issues
    pub fn replace_with(&mut self, p: impl Pattern, f: impl FnMut(&str) -> String); // unnecessary allocations
    pub fn replace_with(&mut self, p: impl Pattern, f: impl FnMut(&'uhh str, impl FnOnce(&str))); // lifetime issues again
    pub fn replace_with(&mut self, p: impl Pattern, f: impl FnMut(ReplacableSubstring<'_>)); // maybe??
    pub fn replace_with(&mut self, p: impl Pattern, f: impl FnMut(ReplacableSubstring<'_>) -> ControlFlow<()>); // starting to look complicated ..
}

Considering how underpowered std's Pattern currently is, this might not be worth it.

Anyway, we'd be interested to 1) hear your thoughts, and 2) see some real world use cases (that aren't already better solved by using e.g. the regex crate).

@kennytm
Copy link
Member

kennytm commented Mar 12, 2024

since regex is called out, you could also use their solution to unify -> &'uhh str and -> String cases without needing 🐮

impl String {
    pub fn replace_with(&mut self, p: impl Pattern, f: F) -> Self
        where F: FnMut(&str) -> R, R: AsRef<str>;
}

(or even make it f: impl Replacer)

@alpaylan
Copy link
Author

Hi, first of all, thanks for the detailed reply and consideration. What I had in mind was similar to the signature below, which is I think better solved using the Regex approach.

enum StringOrStr<'a> {
    String(String),
    Str(&'a str),
}
fn replace_with<'a, P: Pattern<'a>>(&'a self, from: P, mut f: impl FnMut(&str) -> StringOrStr<'a>) -> String;

Anyway, we'd be interested to 1) hear your thoughts, and 2) see some real world use cases (that aren't already better solved by using e.g. the regex crate).

To be honest, I did not know that regex allowed for closures in the replacer. The code sample below solves the cases I had in mind.

  let s = re.replace_all(&s, |c: &regex::Captures| {
      println!("replacing {:?}", c.get(0).unwrap().as_str());
      count += 1;
      "hello"
  });

Therefore I don't think my suggestion solves any problems unsolved by the Regex crate. The only remaining question is whether replace_with is common enough such that it makes sense to add it to the standart library although there is a popular create that solves the same problem. The fact that neither I, nor the others in the Rust Internals forum knew about this functionality in Regex crate suggests that the function is currently not sufficiently visible; but I'm not really familiar with the bar for adding functions to standart library. How ubiquitous should the functionality be to deserve a place in std?

@programmerjake
Copy link
Member

enum StringOrStr<'a> {
    String(String),
    Str(&'a str),
}

that's already in the standard library as Cow<'a, str>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

5 participants