-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Optimize String.replace_* functions #6240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Skip unnecessary matches on the "rest" size * Don't prepend/append empty strings The second one is especially important - doing "" <> str, will force a copy of the string. Checking, if the string is empty first, allows us to skip this copy. Using replace functions with empty replacements is rather common - one example is the String.trim* family of functions, but even in user code it's quite frequent. Removing excessive copy, means the speedup can be made arbitrarily large depending on input. The slowdown for cases with non-empty strings couldn't be reliably measured.
Have you verified that an empty string actually copies the left/right side?
What about the ++ operator?
It feels those improvements should be rather done at the VM level?
--
*José Valim*
www.plataformatec.com.br
Skype: jv.ptec
Founder and Director of R&D
|
Yes, it does copy. Given the following module: defmodule Test do
def test do
str = :binary.copy("a", 10000)
a = binary_part(str, 10, 1000)
IO.inspect :binary.referenced_byte_size(a)
IO.inspect :binary.referenced_byte_size(a <> "")
end
end Running 10000
2000 This means a new binary is created and the data copied. Optimizing this at VM level might actually be quite hard given how binaries work. I can try reporting it to the OTP team. |
Also - the Travis failure seems unrelated. |
I missed this question earlier. In general, it will always copy the thing on the left and never the one on the right, so |
defmodule Foo do
def bar do
list = List.duplicate(0, 10000000)
:timer.tc(fn -> list ++ [] end) |> elem(0) |> IO.inspect
:timer.tc(fn -> [] ++ list end) |> elem(0) |> IO.inspect
:timer.tc(fn -> list ++ [] end) |> elem(0) |> IO.inspect
:timer.tc(fn -> [] ++ list end) |> elem(0) |> IO.inspect
end
end When the empty list is on the right side, we can say that it at least traverses the whole left side. It is unclear if it copies it though. I would say that traversing is unecessary since the operation should fail if it is an improper list. |
lib/elixir/lib/string.ex
Outdated
defp prepend(prefix, suffix), do: prefix <> suffix | ||
|
||
defp append(prefix, ""), do: prefix | ||
defp append(prefix, suffix), do: prefix <> suffix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the names should be prepend_unless_empty
and append_unless_empty
if we go with this. Any particular reason why not going with a concat
function that checks for both sides then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide I like the specific functions, so I would go with prepend_unless_empty
and append_unless_empty
.
Updated |
❤️ 💚 💙 💛 💜 |
It wasn't actually working, because of the wrong pattern match the function never returned true.
The second one is especially important - doing
"" <> str
will force a copy ofthe string. Checking, if the string is empty first, allows us to skip this copy.
Using replace functions with empty replacements is rather common - one example
is the String.trim* family of functions, but even in user code it's quite
frequent.
Removing excessive copy means the speedup can be made arbitrarily large
depending on input. The slowdown for cases with non-empty strings couldn't be
reliably measured.