Added performant impl for string upcase/downcase :ascii mode. #7680
The existing implementation with binary comprehensions turned out to be _far_ slower than the other modes. The current implementation is >= 2.5X faster than the earlier implementation.
@tckb Although I said to go with
@josevalim On further benchmarking with larger and different inputs, I have not seen any consistent performance difference between https://gist.github.com/tckb/ee13867e8a50e069afb04b25fc9efd85 Regardless, these new implementations have outperformed the standard library's implementation several times over. cc: @michalmuskala
It is your call, @michalmuskala.
Let's go with the body-recursive version. That's the implementation we use for
To clarify, we are going with
Thanks @michalmuskala @josevalim, I've changed the impl to
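For readers unfamiliar with the term, the sketch below contrasts a body-recursive ASCII upcase with a tail-recursive one. The tail-recursive variant is an assumption for illustration: the thread text naming the alternatives that were benchmarked did not survive, so this is not necessarily what the gist compared. The module and function names (`RecursionSketch`, `upcase_body/1`, `upcase_tail/1`) are hypothetical.

```elixir
defmodule RecursionSketch do
  # Hypothetical module; names are illustrative and not taken from the PR.

  # Body-recursive: each list cell is built after the recursive call returns,
  # and the resulting iodata is flattened into a binary once at the end.
  def upcase_body(string) when is_binary(string) do
    IO.iodata_to_binary(body(string))
  end

  defp body(<<c, rest::bits>>) when c >= ?a and c <= ?z, do: [c - 32 | body(rest)]
  defp body(<<c, rest::bits>>), do: [c | body(rest)]
  defp body(<<>>), do: []

  # Tail-recursive (assumed alternative): bytes are appended to a binary
  # accumulator, so the recursive call is the last operation in each clause.
  def upcase_tail(string) when is_binary(string), do: tail(string, <<>>)

  defp tail(<<c, rest::bits>>, acc) when c >= ?a and c <= ?z,
    do: tail(rest, <<acc::binary, c - 32>>)

  defp tail(<<c, rest::bits>>, acc), do: tail(rest, <<acc::binary, c>>)
  defp tail(<<>>, acc), do: acc
end

# Non-ASCII bytes (here the two UTF-8 bytes of "é") fall outside ?a..?z
# and pass through unchanged in both variants.
"ABC DéF" = RecursionSketch.upcase_body("abc déf")
"ABC DéF" = RecursionSketch.upcase_tail("abc déf")
```

Both variants return the same binary; which one is faster depends on the input and the runtime, which matches the observation above that no consistent difference showed up.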
The
Ah! Good catch, @michalmuskala, I was wondering about that. Is there any specific side effect of this?
Not really, just tidiness.
Could we rename `c` to `char` or to another descriptive name? Thank you.
lib/elixir/lib/string.ex
Outdated
    end

    def upcase(string, mode) when mode in @conditional_mappings do
      String.Casing.upcase(string, [], mode)
    end

    defp upcase_ascii(<<c, rest::bits>>) when c >= ?a and c <= ?z, do: [c - 32 | upcase_ascii(rest)]
I think we should do `defp upcase_ascii(<<c, rest::bits>>) when c in ?a..?z, do: ...`. The `in` seems clearer to me and is compiled to the same thing, so we should be fine performance-wise.
Within the same minute we both commented the same
They do not compile the same (there is an additional `is_integer` check, which may or may not be removed by the compiler), and we can also run into bootstrap issues, unfortunately.
Right, the `Kernel.".."/2` macro has the integer checks for the range. Personally, I like the `<=` and `>=` syntax. It's more pragmatic to me ;)
https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/kernel.ex#L3023
I think the extra check is not supposed to matter in future Erlang versions anyway, but the biggest issue here is bootstrapping (`in` may not be available when we compile the `String` module).
I believe the `is_integer` check on extracting data from a binary will only be eliminated on OTP 21 (I contributed that optimisation recently).
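To make the `is_integer` point concrete, here is a small sketch comparing the two guard styles discussed in this thread. The module `GuardSketch` and its functions are hypothetical and exist only for illustration. It assumes the behaviour described above, namely that `in` with a range in a guard adds an integer check on top of the two comparisons.

```elixir
defmodule GuardSketch do
  # Hypothetical module for illustration only.

  # Plain comparisons, as used in the PR.
  def lower_cmp?(c) when c >= ?a and c <= ?z, do: true
  def lower_cmp?(_), do: false

  # Range membership; in a guard this also checks that `c` is an integer.
  def lower_in?(c) when c in ?a..?z, do: true
  def lower_in?(_), do: false
end

# For every possible byte value the two guards agree.
true = Enum.all?(0..255, fn c -> GuardSketch.lower_cmp?(c) == GuardSketch.lower_in?(c) end)

# The extra integer check only shows up for non-integer terms such as floats,
# which can never come out of a `<<c, rest::bits>>` match in the first place.
true = GuardSketch.lower_cmp?(100.0)
false = GuardSketch.lower_in?(100.0)
```

Because a byte matched out of a binary is always an integer, the two guards are interchangeable for `upcase_ascii/1` in terms of results; the remaining differences are the redundant check and the bootstrapping concern raised above.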
    for <<x <- string>>,
      do: if(x >= ?a and x <= ?z, do: <<x - 32>>, else: <<x>>),
      into: ""
    IO.iodata_to_binary(upcase_ascii(string))
    end

    def upcase(string, mode) when mode in @conditional_mappings do
      String.Casing.upcase(string, [], mode)
    end
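The hunk above swaps the binary comprehension for `IO.iodata_to_binary(upcase_ascii(string))`. A self-contained sketch of both approaches, with a rough timing helper, might look like the following. `UpcaseBench` and `time/3` are hypothetical names, the catch-all and empty-binary clauses of `upcase_ascii/1` are assumed because the hunk shows only the first clause, and the timing loop is a crude stand-in for the benchmarks in the linked gist, not a reproduction of them.

```elixir
defmodule UpcaseBench do
  # Hypothetical module; not part of the Elixir standard library.

  # Earlier approach: a binary comprehension that collects into a binary.
  def comprehension(string) when is_binary(string) do
    for <<x <- string>>,
      into: "",
      do: if(x >= ?a and x <= ?z, do: <<x - 32>>, else: <<x>>)
  end

  # Approach from this PR: build iodata recursively, flatten once at the end.
  def recursive(string) when is_binary(string) do
    IO.iodata_to_binary(upcase_ascii(string))
  end

  defp upcase_ascii(<<c, rest::bits>>) when c >= ?a and c <= ?z, do: [c - 32 | upcase_ascii(rest)]
  # The two clauses below are assumed; they are not visible in the hunk.
  defp upcase_ascii(<<c, rest::bits>>), do: [c | upcase_ascii(rest)]
  defp upcase_ascii(<<>>), do: []

  # Wall-clock time in microseconds for `runs` calls of `fun` on `input`.
  def time(fun, input, runs \\ 100) do
    {micros, _result} = :timer.tc(fn -> Enum.each(1..runs, fn _ -> fun.(input) end) end)
    micros
  end
end

input = String.duplicate("hello, World! ", 1_000)

# Both implementations must agree before timings mean anything.
true = UpcaseBench.comprehension(input) == UpcaseBench.recursive(input)

IO.puts("comprehension: #{UpcaseBench.time(&UpcaseBench.comprehension/1, input)} µs")
IO.puts("recursive:     #{UpcaseBench.time(&UpcaseBench.recursive/1, input)} µs")
```

Absolute numbers depend on the machine, the Elixir and OTP versions, and the input, so treat the output as a sanity check rather than a confirmation of the 2.5X figure.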
I'm not at a computer, but would `when char in ?a..?z, do:` work? If so, that would read better.
@eksperimental normally, yes -- but as per @josevalim, using `in` at this level will cause bootstrapping issues.
I overlooked the additional `is_integer` checks, so I would not recommend using `in`.
Okay, here are the todos
I'll hold off the push for a day just in case there are further changes ;)
Changes committed
❤️ 💚 💙 💛 💜
The existing implementation with binary comprehensions turned out to be slower than the other modes. The current implementation is >= 2.5X faster than the earlier implementation. Signed-off-by: José Valim <jose.valim@plataformatec.com.br>
This PR is a consequence of the discussions in #7673.
PS: this is my first PR in Elixir.