Skip to content

Optimize String.replace_* functions #6240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 23, 2017

Conversation

michalmuskala
Copy link
Member

@michalmuskala michalmuskala commented Jun 21, 2017

  • Skip unnecessary matches on the "rest" size
  • Don't prepend/append empty strings

The second one is especially important - doing "" <> str will force a copy of
the string. Checking, if the string is empty first, allows us to skip this copy.
Using replace functions with empty replacements is rather common - one example
is the String.trim* family of functions, but even in user code it's quite
frequent.
Removing excessive copy means the speedup can be made arbitrarily large
depending on input. The slowdown for cases with non-empty strings couldn't be
reliably measured.

* Skip unnecessary matches on the "rest" size
* Don't prepend/append empty strings

The second one is especially important - doing "" <> str, will force a copy of
the string. Checking, if the string is empty first, allows us to skip this copy.
Using replace functions with empty replacements is rather common - one example
is the String.trim* family of functions, but even in user code it's quite
frequent.
Removing excessive copy, means the speedup can be made arbitrarily large
depending on input. The slowdown for cases with non-empty strings couldn't be
reliably measured.
@josevalim
Copy link
Member

josevalim commented Jun 21, 2017 via email

@michalmuskala
Copy link
Member Author

Yes, it does copy.

Given the following module:

defmodule Test do
  def test do
    str = :binary.copy("a", 10000)
    a = binary_part(str, 10, 1000)
    IO.inspect :binary.referenced_byte_size(a)
    IO.inspect :binary.referenced_byte_size(a <> "")
  end
end

Running Test.test() produces the following output:

10000
2000

This means a new binary is created and the data copied.

Optimizing this at VM level might actually be quite hard given how binaries work. I can try reporting it to the OTP team.

@michalmuskala
Copy link
Member Author

Also - the Travis failure seems unrelated.

@michalmuskala
Copy link
Member Author

What about the ++ operator?

I missed this question earlier. In general, it will always copy the thing on the left and never the one on the right, so [] ++ list is fine, I have no idea about list ++ [] - I don't even have an idea how to check this.

@josevalim
Copy link
Member

defmodule Foo do
  def bar do
    list = List.duplicate(0, 10000000)
    :timer.tc(fn -> list ++ [] end) |> elem(0) |> IO.inspect
    :timer.tc(fn -> [] ++ list end) |> elem(0) |> IO.inspect
    :timer.tc(fn -> list ++ [] end) |> elem(0) |> IO.inspect
    :timer.tc(fn -> [] ++ list end) |> elem(0) |> IO.inspect
  end
end

When the empty list is on the right side, we can say that it at least traverses the whole left side. It is unclear if it copies it though. I would say that traversing is unecessary since the operation should fail if it is an improper list.

defp prepend(prefix, suffix), do: prefix <> suffix

defp append(prefix, ""), do: prefix
defp append(prefix, suffix), do: prefix <> suffix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the names should be prepend_unless_empty and append_unless_empty if we go with this. Any particular reason why not going with a concat function that checks for both sides then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whatyouhide I like the specific functions, so I would go with prepend_unless_emptyand append_unless_empty.

@josevalim josevalim modified the milestone: v1.5.0 Jun 22, 2017
@michalmuskala
Copy link
Member Author

Updated

@josevalim josevalim merged commit e0480b6 into elixir-lang:master Jun 23, 2017
@josevalim
Copy link
Member

❤️ 💚 💙 💛 💜

michalmuskala added a commit to michalmuskala/elixir that referenced this pull request Jun 28, 2017
It wasn't actually working, because of the wrong pattern match the function
never returned true.
whatyouhide pushed a commit that referenced this pull request Jun 30, 2017
josevalim pushed a commit that referenced this pull request Jul 1, 2017
Signed-off-by: José Valim <jose.valim@plataformatec.com.br>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants