Added performant impl for string upcase/downcase `:ascii` mode. #7680

tckb · 2018-05-12T14:16:41Z

The existing implementation, which uses binary comprehensions, turned out to be
far slower than the other modes.

The implementation in this PR is at least 2.5x faster than the earlier one.
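A minimal sketch contrasting the two approaches, assuming the helper clauses look like the ones in the review diff in this thread. The module and function names (`AsciiUpcaseSketch`, `upcase_comprehension/1`, `upcase_iodata/1`, `time_both/1`) are hypothetical and not part of `String`. The old version allocates a one-byte binary per input byte; the new one builds a list of bytes and converts it once with `IO.iodata_to_binary/1`.

```elixir
defmodule AsciiUpcaseSketch do
  # Hypothetical module comparing the old and new :ascii approaches.

  # Old approach: a binary comprehension that emits a single-byte
  # binary for every input byte and collects them into "".
  def upcase_comprehension(string) when is_binary(string) do
    for <<x <- string>>,
      do: if(x >= ?a and x <= ?z, do: <<x - 32>>, else: <<x>>),
      into: ""
  end

  # New approach: a body-recursive helper builds iodata (a list of
  # bytes) that is flattened into a binary once at the end.
  def upcase_iodata(string) when is_binary(string) do
    IO.iodata_to_binary(upcase_ascii(string))
  end

  defp upcase_ascii(<<char, rest::bits>>) when char >= ?a and char <= ?z,
    do: [char - 32 | upcase_ascii(rest)]

  defp upcase_ascii(<<char, rest::bits>>), do: [char | upcase_ascii(rest)]
  defp upcase_ascii(<<>>), do: []

  # Rough single-shot timing in microseconds via :timer.tc/1; a real
  # comparison needs a benchmarking tool and many iterations.
  def time_both(string) do
    {old_us, _} = :timer.tc(fn -> upcase_comprehension(string) end)
    {new_us, _} = :timer.tc(fn -> upcase_iodata(string) end)
    {old_us, new_us}
  end
end
```

Both versions only touch bytes in `?a..?z`, so the bytes of a non-ASCII string such as `"é"` pass through unchanged. The numbers from `time_both/1` are noisy and should not be read as a benchmark result.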

This PR follows up on the discussion in #7673.

PS: This is my first PR to Elixir.

josevalim · 2018-05-12T16:25:26Z

@tckb Although I said to go with dn4_1, @michalmuskala raised good points about dn4_2 and memory use, so could you please switch to dn4_2 instead? Thank you!

tckb · 2018-05-13T09:14:32Z

@josevalim On further benchmarking with larger and more varied inputs, I have not seen any consistent performance difference between dn4_1 (downcase_ascii_patternMatch) and dn4_2 (downcase_ascii_iodata). In terms of memory, dn4_2 uses slightly less than dn4_1, except in one case (all characters in the same case), where dn4_1 used less.

https://gist.github.com/tckb/ee13867e8a50e069afb04b25fc9efd85
https://gist.github.com/tckb/4cfd89a3c2e10f307f681679b528b099

Regardless, both new implementations outperform the standard library's current implementation several-fold:
https://gist.github.com/tckb/6e0a55e9482e5de785059e3a2de2b4e2

cc: @michalmuskala

josevalim · 2018-05-13T11:05:23Z

It is your call @michalmuskala.

michalmuskala · 2018-05-13T21:02:20Z

Let's go with the body-recursive version. That's the implementation we use for Enum.map (through :lists.map), so let's be consistent.
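A hedged sketch of the body-recursive versus tail-recursive distinction. The body-recursive `downcase_ascii/1` mirrors the dn4_2 shape (each call returns `[byte | rest_of_result]`, like `:lists.map/2`). The tail-recursive binary-accumulator variant is only an assumed alternative for comparison; the benchmark gists are not reproduced here, so it is not necessarily what dn4_1 looked like. `AsciiDowncaseSketch` and its function names are hypothetical.

```elixir
defmodule AsciiDowncaseSketch do
  # Body-recursive: the result list is built as the recursion unwinds,
  # the same shape :lists.map/2 uses.
  def downcase_body(string) when is_binary(string),
    do: IO.iodata_to_binary(downcase_ascii(string))

  defp downcase_ascii(<<char, rest::bits>>) when char >= ?A and char <= ?Z,
    do: [char + 32 | downcase_ascii(rest)]

  defp downcase_ascii(<<char, rest::bits>>), do: [char | downcase_ascii(rest)]
  defp downcase_ascii(<<>>), do: []

  # Tail-recursive alternative (assumption, for comparison only):
  # appends each byte to a binary accumulator.
  def downcase_tail(string) when is_binary(string), do: downcase_acc(string, "")

  defp downcase_acc(<<char, rest::bits>>, acc) when char >= ?A and char <= ?Z,
    do: downcase_acc(rest, <<acc::binary, char + 32>>)

  defp downcase_acc(<<char, rest::bits>>, acc),
    do: downcase_acc(rest, <<acc::binary, char>>)

  defp downcase_acc(<<>>, acc), do: acc
end
```

Both functions return the same binary; the choice between them in the thread was about consistency with `Enum.map` and memory behaviour, not correctness.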

josevalim · 2018-05-13T22:19:18Z

To clarify, we are going with dn4_2 then. Could you please make the change, @tckb? Thanks for the extra benchmarks!

tckb · 2018-05-14T04:30:06Z

Thanks @michalmuskala @josevalim, I've changed the implementation to dn4_2. I'm assuming the test cases for this function are already in place; I haven't added any.

michalmuskala · 2018-05-14T15:07:46Z

The `upcase_ascii` function is defined in between the `upcase` clauses, which causes a compiler warning. Let's move `upcase_ascii` after all the `upcase` clauses, and do the same for `downcase_ascii`.
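A trimmed-down, hypothetical module showing the suggested layout: every `upcase/2` clause is grouped together and the private `upcase_ascii/1` clauses follow them. The `:default` clause delegating to `String.upcase/1` is an illustrative stand-in for the real `String` clauses, and `GroupedClausesSketch` is not a real module.

```elixir
defmodule GroupedClausesSketch do
  # All clauses of upcase/2 sit together. Placing a defp between two
  # upcase/2 clauses still compiles, but the compiler warns that
  # clauses with the same name and arity should be grouped together.
  def upcase(string, :ascii) when is_binary(string),
    do: IO.iodata_to_binary(upcase_ascii(string))

  def upcase(string, :default) when is_binary(string),
    do: String.upcase(string)

  # Private helper clauses come after every upcase/2 clause.
  defp upcase_ascii(<<char, rest::bits>>) when char >= ?a and char <= ?z,
    do: [char - 32 | upcase_ascii(rest)]

  defp upcase_ascii(<<char, rest::bits>>), do: [char | upcase_ascii(rest)]
  defp upcase_ascii(<<>>), do: []
end
```

With this layout, `upcase("école", :ascii)` leaves the non-ASCII `é` alone, while the `:default` clause applies full Unicode casing.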

tckb · 2018-05-14T15:11:53Z

Ah, good catch @michalmuskala, I was wondering about that. Is there any specific side effect of this?

michalmuskala · 2018-05-14T15:13:56Z

Not really, just tidiness.

eksperimental

Could we rename `c` to `char` or another descriptive name? Thank you.

whatyouhide · 2018-05-14T15:41:10Z

lib/elixir/lib/string.ex

```diff
  end

  def upcase(string, mode) when mode in @conditional_mappings do
    String.Casing.upcase(string, [], mode)
  end

+  defp upcase_ascii(<<c, rest::bits>>) when c >= ?a and c <= ?z, do: [c - 32 | upcase_ascii(rest)]
```


I think we should do

```elixir
defp upcase_ascii(<<c, rest::bits>>) when c in ?a..?z, do: ...
```

The `in` form seems clearer to me, and it compiles to the same thing, so we should be fine performance-wise.

We both commented the same thing within the same minute.

They do not compile to the same thing (there is an additional `is_integer` check, which the compiler may or may not remove). We can also run into bootstrap issues, unfortunately.

Right, the `Kernel.".."/2` macro has the integer checks for the range. Personally, I like the `>=`/`<=` syntax; it feels more pragmatic to me ;)

https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/kernel.ex#L3023

I think the extra check is not supposed to matter in future Erlang versions anyway, but the biggest issue here is bootstrapping (`in` may not be available when we compile the `String` module).

I believe the `is_integer` check on data extracted from a binary will only be eliminated in OTP 21 (I contributed that optimisation recently).
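A small sketch of the two guard styles discussed above; the `GuardFormsSketch` module and its functions are hypothetical. For bytes pulled out of a binary with `<<char, ...>>`, both behave the same because `char` is always an integer. The difference only shows up for non-integer input, which is where the extra `is_integer` check from the `in` form comes in. Per the thread, defining this in ordinary user code is fine; the bootstrapping concern applies to Elixir's own `String` module.

```elixir
defmodule GuardFormsSketch do
  # Explicit comparisons, as used in the PR.
  def lower_explicit?(char) when char >= ?a and char <= ?z, do: true
  def lower_explicit?(_char), do: false

  # `in` with a literal range: in a guard this expands to the same
  # comparisons plus an is_integer/1 check on `char`.
  def lower_in?(char) when char in ?a..?z, do: true
  def lower_in?(_char), do: false
end
```

For example, `GuardFormsSketch.lower_explicit?(98.0)` returns `true` (the float compares within range), whereas `GuardFormsSketch.lower_in?(98.0)` returns `false` because of the integer check.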

eksperimental · 2018-05-14T15:40:35Z

lib/elixir/lib/string.ex

```diff
-    for <<x <- string>>,
-      do: if(x >= ?a and x <= ?z, do: <<x - 32>>, else: <<x>>),
-      into: ""
+    IO.iodata_to_binary(upcase_ascii(string))
  end

  def upcase(string, mode) when mode in @conditional_mappings do
    String.Casing.upcase(string, [], mode)
  end
```



I'm not at a computer, but would `when char in ?a..?z, do:` work? If so, it would read better.

@eksperimental Normally, yes, but as @josevalim pointed out, using `in` at this level would cause bootstrapping issues.

I overlooked the additional `is_integer` check, so I would not recommend using `in`.

tckb · 2018-05-14T17:54:11Z

Okay, here are the to-dos:

- Change `c` to a more descriptive name
- ~~Use `in` instead of explicit `>=`/`<=`~~ Not doing this, as it might cause issues during bootstrapping.

I'll hold off on pushing for a day, just in case there are further changes ;)

@josevalim @michalmuskala @whatyouhide @eksperimental :D

tckb · 2018-05-15T18:36:49Z

Changes committed

josevalim · 2018-05-15T19:16:49Z

❤️ 💚 💙 💛 💜

The existing implementation with binary comprehensions turned out to be slower than the other modes. The current implementation is >= 2.5X faster than the earlier implementation. Signed-off-by: José Valim <jose.valim@plataformatec.com.br>

Added performant impl for string upcase/downcase :ascii mode.

ce917da

the existing implementation with binary comprehensions turned out to be _far_ slower than the other modes. The current implementation is >= 2.5X faster than the earlier implementation

Changed to alt impl with the one having better memory utilization.

4eba4b0

Removed unnecessary functions

04737d3

josevalim approved these changes May 14, 2018

Minor house keeping - keeping compiler happy ;)

93a03a4

michalmuskala approved these changes May 14, 2018

eksperimental suggested changes May 14, 2018

whatyouhide reviewed May 14, 2018

eksperimental reviewed May 14, 2018

Changed name from c to char

113f2e7

josevalim merged commit 95ffb4c into elixir-lang:master May 15, 2018
