@josevalim
Member

No description provided.

Member Author

Good call @hauleth.

Contributor

I have found that this improves performance by about 20%. I am looking for other optimisations now.

Member Author

That's great. I wonder if we can implement shift_right, scale_up and scale_down using log2 and if that would be more efficient too.

Contributor

Ideally there would be support for popcnt (population count, i.e. the number of set bits) and lzcnt (leading zero count). That would probably have to be implemented as a NIF, so it is not possible in Elixir itself; maybe they could be added as functions in Erlang.
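
For readers unfamiliar with the two operations named above, here is a minimal pure-Elixir sketch of what they compute. The module `BitSketch` and its function names are hypothetical and are not part of Elixir, Erlang, or this pull request; a native popcnt/lzcnt would be a single CPU instruction, whereas this version loops and is meant only to illustrate the semantics.

```elixir
defmodule BitSketch do
  # Hypothetical helper module for illustration only.
  import Bitwise

  # Population count: the number of set bits in a non-negative integer.
  def popcount(n) when is_integer(n) and n >= 0, do: popcount(n, 0)

  defp popcount(0, acc), do: acc
  # n &&& (n - 1) clears the lowest set bit, so this recurses once per set bit.
  defp popcount(n, acc), do: popcount(n &&& (n - 1), acc + 1)

  # Leading zero count of n viewed as an unsigned integer `width` bits wide.
  # Assumption: values that do not fit in `width` bits are reported as 0.
  def lzcnt(n, width \\ 64)
      when is_integer(n) and n >= 0 and is_integer(width) and width > 0 do
    max(width - bit_length(n, 0), 0)
  end

  # Number of bits needed to represent n (0 for n == 0).
  defp bit_length(0, acc), do: acc
  defp bit_length(n, acc), do: bit_length(n >>> 1, acc + 1)
end
```

For example, `BitSketch.popcount(0b1011)` counts three set bits, and `BitSketch.lzcnt(1)` reports 63 leading zeroes in a 64-bit view.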

Contributor

@josevalim BTW, for a greater speed improvement you need to inline that, as currently the improvement is negligible.

Contributor

Function call:

Compiling 1 file (.ex)
Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
Number of Available Cores: 4
Available memory: 8 GB
Elixir 1.8.0
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: Large | large precision, Medium | large precision, Trivial | large precision
Estimated total run time: 42 s


Benchmarking built in with input Large | large precision...
Benchmarking built in with input Medium | large precision...
Benchmarking built in with input Trivial | large precision...
Benchmarking built in fix with input Large | large precision...
Benchmarking built in fix with input Medium | large precision...
Benchmarking built in fix with input Trivial | large precision...

##### With input Large | large precision #####
Name                   ips        average  deviation         median         99th %
built in            172.81        5.79 ms     ±9.42%        5.61 ms        8.27 ms
built in fix        170.80        5.85 ms    ±19.30%        5.41 ms       10.51 ms

Comparison:
built in            172.81
built in fix        170.80 - 1.01x slower

##### With input Medium | large precision #####
Name                   ips        average  deviation         median         99th %
built in fix        154.15        6.49 ms    ±15.14%        6.13 ms       10.06 ms
built in            151.96        6.58 ms    ±26.71%        6.14 ms       13.16 ms

Comparison:
built in fix        154.15
built in            151.96 - 1.01x slower

##### With input Trivial | large precision #####
Name                   ips        average  deviation         median         99th %
built in fix        160.36        6.24 ms    ±21.60%        5.80 ms       11.78 ms
built in            156.33        6.40 ms    ±18.28%        6.03 ms       10.99 ms

Comparison:
built in fix        160.36
built in            156.33 - 1.03x slower

Inlined:

Compiling 1 file (.ex)
Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
Number of Available Cores: 4
Available memory: 8 GB
Elixir 1.8.0
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: Large | large precision, Medium | large precision, Trivial | large precision
Estimated total run time: 42 s


Benchmarking built in with input Large | large precision...
Benchmarking built in with input Medium | large precision...
Benchmarking built in with input Trivial | large precision...
Benchmarking built in fix with input Large | large precision...
Benchmarking built in fix with input Medium | large precision...
Benchmarking built in fix with input Trivial | large precision...

##### With input Large | large precision #####
Name                   ips        average  deviation         median         99th %
built in fix        186.26        5.37 ms    ±29.93%        4.78 ms       11.67 ms
built in            170.14        5.88 ms    ±11.70%        5.62 ms        9.29 ms

Comparison:
built in fix        186.26
built in            170.14 - 1.09x slower

##### With input Medium | large precision #####
Name                   ips        average  deviation         median         99th %
built in fix        175.94        5.68 ms    ±28.99%        5.18 ms       11.74 ms
built in            159.96        6.25 ms    ±10.12%        6.00 ms        9.41 ms

Comparison:
built in fix        175.94
built in            159.96 - 1.10x slower

##### With input Trivial | large precision #####
Name                   ips        average  deviation         median         99th %
built in fix        201.76        4.96 ms    ±12.30%        4.76 ms        7.44 ms
built in            168.05        5.95 ms    ±13.54%        5.65 ms        9.01 ms

Comparison:
built in fix        201.76
built in            168.05 - 1.20x slower

Member Author

Yeah, I inlined it after your comment about performance. Inlined sign as well.
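
As a rough illustration of what inlining a private helper looks like in Elixir, the sketch below uses the `@compile {:inline, name: arity}` module attribute. The module `InlineSketch` and the body of `sign/1` are hypothetical; the helper actually inlined in this pull request may be written quite differently.

```elixir
defmodule InlineSketch do
  # Hypothetical module for illustration; not the code from this pull request.

  # Ask the compiler to expand sign/1 at its call sites
  # instead of emitting a local function call.
  @compile {:inline, sign: 1}

  # Returns the sign and the absolute value of a float.
  def signed_magnitude(x) when is_float(x), do: {sign(x), abs(x)}

  # Illustrative helper: -1 for negative floats, 1 otherwise.
  defp sign(x) when x < 0, do: -1
  defp sign(_x), do: 1
end
```

Inlining only affects how the call is compiled; the function's behaviour is unchanged, which is why the benchmarks above can compare the two variants directly.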

@josevalim josevalim force-pushed the jv-subnormal-float-round branch from 6397c88 to 2104890 Compare January 24, 2019 09:12
Contributor

hauleth commented Jan 24, 2019

@josevalim regarding performance, I also think we could check whether the number is less than 1.0e-18 (the result will always be 0.0 in that case) or greater than 1.0e18 (rounding is then always the identity function). A plain comparison should be faster than bit mangling.
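
A minimal sketch of that idea, assuming the thresholds quoted above and assuming they are meant to apply to the absolute value of the input. The module `RoundSketch` and the function `fast_round/2` are hypothetical wrappers around `Float.round/2`, not the merged implementation. Note one caveat of this simple version: tiny negative inputs come back as 0.0 rather than -0.0.

```elixir
defmodule RoundSketch do
  # Hypothetical wrapper for illustration; not the merged implementation.
  # Thresholds are taken as-is from the comment above.
  @tiny 1.0e-18
  @huge 1.0e18

  def fast_round(x, precision \\ 0) when is_float(x) and precision in 0..15 do
    magnitude = abs(x)

    cond do
      # Too small to survive rounding at any supported precision.
      magnitude < @tiny -> 0.0
      # Too large to have a fractional part, so rounding changes nothing.
      magnitude > @huge -> x
      # Everything else goes through the regular rounding path.
      true -> Float.round(x, precision)
    end
  end
end
```

Whether the two extra comparisons pay off depends on how often inputs fall outside the thresholds, so a benchmark like the ones earlier in this thread would be needed before adopting it.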

@josevalim josevalim merged commit 0ccf798 into master Jan 25, 2019
@josevalim
Member Author

❤️ 💚 💙 💛 💜

@josevalim josevalim deleted the jv-subnormal-float-round branch January 25, 2019 07:39
josevalim added a commit that referenced this pull request Jan 25, 2019
Closes #8685

Signed-off-by: José Valim <jose.valim@plataformatec.com.br>