Optimize String.replace_* functions #6240

michalmuskala · 2017-06-21T08:40:49Z

* Skip unnecessary matches on the "rest" size
* Don't prepend/append empty strings

The second one is especially important: doing "" <> str forces a copy of
the string. Checking whether the string is empty first lets us skip this copy.
Using replace functions with empty replacements is rather common. One example
is the String.trim* family of functions, but it is quite frequent in user code
as well.
Removing the excess copy means the speedup can be made arbitrarily large
depending on the input. The slowdown for cases with non-empty strings couldn't be
reliably measured.
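
The guard described above can be sketched as follows. This is a minimal illustration, not the actual String implementation: the module `ReplaceSketch` and its `replace_prefix/3` function are hypothetical, and the helper name follows the naming suggested later in the review.

```elixir
defmodule ReplaceSketch do
  # Hypothetical, simplified prefix replacement used only to show the guard.
  # It is not the code from lib/elixir/lib/string.ex.
  def replace_prefix(string, prefix, replacement)
      when is_binary(string) and is_binary(prefix) and is_binary(replacement) do
    prefix_size = byte_size(prefix)

    case string do
      # `rest` is a sub-binary that still points into `string`.
      <<head::binary-size(prefix_size), rest::binary>> when head == prefix ->
        prepend_unless_empty(replacement, rest)

      _ ->
        string
    end
  end

  # With an empty replacement, return `rest` as is instead of building
  # a new binary with "" <> rest.
  defp prepend_unless_empty("", suffix), do: suffix
  defp prepend_unless_empty(prefix, suffix), do: prefix <> suffix
end
```

With an empty replacement, `ReplaceSketch.replace_prefix("hello world", "hello ", "")` goes through the first clause of the helper and never reaches `<>`.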


josevalim · 2017-06-21T09:05:40Z

Have you verified that an empty string actually copies the left/right side? What about the ++ operator? It feels like those improvements should rather be done at the VM level.

michalmuskala · 2017-06-21T09:15:06Z

Yes, it does copy.

Given the following module:

defmodule Test do
  def test do
    str = :binary.copy("a", 10000)
    a = binary_part(str, 10, 1000)
    IO.inspect :binary.referenced_byte_size(a)
    IO.inspect :binary.referenced_byte_size(a <> "")
  end
end

Running Test.test() produces the following output:

10000
2000

This means a new binary is created and the data copied.

Optimizing this at VM level might actually be quite hard given how binaries work. I can try reporting it to the OTP team.
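
For comparison, here is a minimal sketch of the same measurement with the guard in place. The module `CopyCheck` and its public `append_unless_empty/2` are hypothetical names used for illustration only; the point is that the guarded path returns the original sub-binary, so both measurements refer to the same underlying binary.

```elixir
defmodule CopyCheck do
  # Hypothetical helper mirroring the guard discussed in this PR.
  def append_unless_empty(prefix, ""), do: prefix
  def append_unless_empty(prefix, suffix), do: prefix <> suffix

  # Returns the referenced byte size before and after the guarded append.
  def run do
    str = :binary.copy("a", 10000)
    a = binary_part(str, 10, 1000)
    guarded = append_unless_empty(a, "")

    {:binary.referenced_byte_size(a), :binary.referenced_byte_size(guarded)}
  end
end
```

Both elements of the tuple returned by `CopyCheck.run()` are equal, because the first clause hands back `a` itself rather than a newly built binary.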

michalmuskala · 2017-06-21T09:16:08Z

Also - the Travis failure seems unrelated.

michalmuskala · 2017-06-21T15:55:38Z

> What about the ++ operator?

I missed this question earlier. In general, it will always copy the list on the left and never the one on the right, so [] ++ list is fine. I have no idea about list ++ [], and I don't even know how to check this.

josevalim · 2017-06-21T16:14:00Z

defmodule Foo do
  def bar do
    list = List.duplicate(0, 10000000)
    :timer.tc(fn -> list ++ [] end) |> elem(0) |> IO.inspect
    :timer.tc(fn -> [] ++ list end) |> elem(0) |> IO.inspect
    :timer.tc(fn -> list ++ [] end) |> elem(0) |> IO.inspect
    :timer.tc(fn -> [] ++ list end) |> elem(0) |> IO.inspect
  end
end

When the empty list is on the right side, we can say that it at least traverses the whole left side. It is unclear if it copies it though. I would say that traversing is unnecessary since the operation should fail if it is an improper list.
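
As an aside, a guard analogous to the binary one could be sketched for lists. The module `ListSketch` and its function are hypothetical and are not part of this PR; the sketch only illustrates the trade-off raised here.

```elixir
defmodule ListSketch do
  # Hypothetical guard for lists, analogous to the binary helpers.
  # The first clause returns the left side without traversing it.
  def append_unless_empty(list, []), do: list
  def append_unless_empty(list, suffix), do: list ++ suffix
end
```

Skipping the traversal also skips the check on the left side: with this guard an improper list on the left is returned unchanged when the right side is empty, whereas at runtime `++` raises an ArgumentError for it.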

whatyouhide · 2017-06-21T20:42:53Z

lib/elixir/lib/string.ex

+  defp prepend(prefix, suffix), do: prefix <> suffix
+
+  defp append(prefix, ""), do: prefix
+  defp append(prefix, suffix), do: prefix <> suffix


I think the names should be prepend_unless_empty and append_unless_empty if we go with this. Is there any particular reason for not going with a concat function that checks both sides?

josevalim · 2017-06-22

@whatyouhide I like the specific functions, so I would go with prepend_unless_empty and append_unless_empty.

michalmuskala · 2017-06-22T16:27:05Z

Updated

josevalim · 2017-06-23T23:07:19Z

❤️ 💚 💙 💛 💜

josevalim approved these changes Jun 21, 2017

whatyouhide reviewed Jun 21, 2017

josevalim modified the milestone: v1.5.0 Jun 22, 2017

fixup! rename append/prepend_unless_empty

7ee1e75

josevalim merged commit e0480b6 into elixir-lang:master Jun 23, 2017

michalmuskala added a commit to michalmuskala/elixir that referenced this pull request Jun 28, 2017

Fix optimisation from elixir-lang#6240

d507321

It wasn't actually working: because of the wrong pattern match, the function never returned true.

whatyouhide pushed a commit that referenced this pull request Jun 30, 2017

Fix binaries optimisation from #6240 (#6272)

ceb29ae

josevalim pushed a commit that referenced this pull request Jul 1, 2017

Fix binaries optimisation from #6240 (#6272)

a8d217e

Signed-off-by: José Valim <jose.valim@plataformatec.com.br>
