improve performance for some orddict funs #91

vinoski · 2013-09-30T07:35:48Z

Improve the performance of some orddict funs by reimplementing them using
the lists module in a way that preserves backward compatibility.

The QuickCheck programs linked below were used to verify backward
compatibility:

Both tests, which were run on R16B02, require the original orddict module
to be renamed to olddict, and that code:unstick_mod/1 be applied to orddict
in order to allow it to be replaced with the revised orddict.

The first QuickCheck test first generates a list of pairs of terms, then
uses the list to create both an original and revised orddict, then randomly
chooses one of the revised orddict functions and applies it to the two
orddict instances, and finally verifies that the results of the operation
are the same for both instances. The second QuickCheck test is similar
except that it first creates an instance of the original and revised
orddicts and then folds over the same randomly-generated list of revised
orddict functions, applying each function to each orddict instance and
verifying that the results match.

The QuickCheck tests originally found one problem in the revised fetch and
find functions related to key comparison between integer and real keys,
which was easily fixed. Many tens of thousands of runs of each test
thereafter revealed no other errors. A number of intentional errors were
also injected into the revised orddict module to make sure the QuickCheck
tests would catch them, and they were caught in all cases. Code coverage
analysis was also applied to verify that the QuickCheck tests were covering
the revised functions; the only revised functions left uncovered were the
trivial clauses dealing with empty orddicts, specifically:

is_key(Key, [])
fetch(Key, [])
find(Key, [])
store(Key, New, [])

The revised orddict functions were also tested to assess performance
against the original orddict implementation. The test program used is
available here:

https://gist.github.com/vinoski/61772a052f3501e1e128

Since an orddict instance is implemented as a list, the test program
creates ordicts of length 1, 10, 100, and 1000 and uses them to assess
performance at each length. Performance was measured using timer:tc/3 to
time a number of iterations of various tests against the original orddict
and against the revised orddict. The following tests were applied:

fetch: Create an orddict of length L, where L is 1, 10, 100, or
1. Create a random integer key, which may or may not be present in the
  orddict. For each L, time 10000 attempts to fetch the random key and also
  fetch a key known to not be present in the orddict.
find: Create an orddict of length L, where L is 1, 10, 100, or
1. Create a random integer key, which may or may not be present in the
  orddict. For each L, time 10000 attempts to find the random key and also
  find a key known to not be present in the orddict.
from_list: Time 1000 attempts to create orddicts of lengths 1, 10, 100,
and 1000 from a list of random pairs with integer keys. For lengths
greater than 1, two different tests are performed: one passing a list of
pairs in sorted key order, and the other passing a list of pairs in
reverse sorted key order. Since orddicts are ordered, these orderings
effect worst-case and best-case behavior of the original
orddict:from_list/2 implementation respectively.
is_key: In orddicts of lengths 1, 10, 100, and 1000, time 10000 attempts
to first look for the largest key known to be in the orddict and also
look for another key known to be larger than any key present.
store: For lists of random pairs of lengths 1, 10, 100, and 1000, create
an empty orddict and time 1000 iterations of storing all pairs into the
orddict.

These tests were performed against R16B02 on a Macbook Pro with an Intel
Core i7 processor running at 2.7GHz and 16GB of RAM running OS X 10.8.5,
and on a Dell system with a 3.4GHz Intel Core i7 and 16GB of RAM running
Ubuntu Linux 12.04.

The tables below show results for OS X and Linux respectively. Each table
lists the name of each test followed by two numbers, each a time in
microseconds of the average of 10 runs of the test. The first number is the
result for the original orddict, the second for the revised orddict.

As the numbers for both platforms show, the revised fetch, find, from_list,
and is_key functions are always faster than the original versions, in some
cases quite a bit faster. Notably, for the from_list cases of length 10,
100, and 1000, the revised version of from_list is 1, 2, and 3 orders of
magnitude faster respectively. For the store function, the revised version
is slightly faster in all cases except for the length 100 case, where the
original store implementation is slightly faster than the revised version.

Results from OS X:

fetch length 1: 10.14 9.16
fetch length 10: 12.64 10.15
fetch length 100: 39.53 11.60
fetch length 1000: 205.85 18.18
find length 1: 1.31 0.73
find length 10: 4.99 0.88
find length 100: 42.08 2.62
find length 1000: 268.10 13.28
from_list length 1: 1.23 0.31
from_list length 10 ordered: 27.38 9.48
from_list length 10 reverse ordered: 12.24 10.38
from_list length 100 ordered: 1772.07 63.45
from_list length 100 reverse ordered: 87.95 55.87
from_list length 1000 ordered: 171497.21 680.12
from_list length 1000 reverse ordered: 931.64 617.90
is_key length 1: 0.91 0.54
is_key length 10: 3.30 1.36
is_key length 100: 28.34 3.11
is_key length 1000: 331.17 14.38
store length 1: 0.70 0.69
store length 10: 15.18 14.17
store length 100: 881.28 888.42
store length 1000: 62943.43 62227.62

Results from Linux:

fetch length 1: 9.23 8.45
fetch length 10: 10.30 8.75
fetch length 100: 18.57 10.72
fetch length 1000: 72.82 15.70
find length 1: 0.40 0.32
find length 10: 1.87 0.54
find length 100: 14.61 2.16
find length 1000: 97.42 11.11
from_list length 1: 0.47 0.12
from_list length 10 ordered: 11.72 3.78
from_list length 10 reverse ordered: 4.87 4.61
from_list length 100 ordered: 718.11 27.69
from_list length 100 reverse ordered: 34.04 27.25
from_list length 1000 ordered: 72121.96 339.10
from_list length 1000 reverse ordered: 466.38 343.22
is_key length 1: 0.27 0.21
is_key length 10: 1.24 0.61
is_key length 100: 10.68 2.09
is_key length 1000: 129.34 12.54
store length 1: 0.36 0.31
store length 10: 5.78 5.43
store length 100: 334.22 346.94
store length 1000: 25687.93 25441.16

nox · 2013-09-30T07:55:12Z

lib/stdlib/src/orddict.erl

+is_key(Key, [{K,_}]) when K == Key -> true;
+is_key(_Key, [{_,_}]) -> false;
+is_key(Key, Dict) ->
+    lists:keyfind(Key, 1, Dict) /= false.


Why not lists:keymember/3?

There should be a guard is_list(Dict) in the last clause to avoid failing with badarg instead of function_clause.

vinoski · 2013-09-30T11:24:19Z

@nox thanks for the thorough and constructive review! I'll create a new commit to address the issues you raised.

OTP-Maintainer · 2013-09-30T18:42:25Z

Patch has passed first testings and has been assigned to be reviewed

richcarl · 2013-10-02T12:53:16Z

lib/stdlib/src/orddict.erl

-fetch_keys([{Key,_}|Dict]) ->
-    [Key|fetch_keys(Dict)];
-fetch_keys([]) -> [].
+fetch_keys(Dict) ->


This change seems strange. lists:unzip/1 is neither implemented as a BIF nor does it get inlined, and it builds two lists instead of one. The main difference from the original fetch_keys is that it creates the lists in reverse and reverses them at the end. I didn't see any measurements for this function in your pull request. Is it really faster? In any case, it should produce quite a lot of unnecessary garbage cons cells: once for the reverse key list, once for the unused lists of values and once again for the reversed value list.

Thanks for the feedback, @richcarl. Looks like I forgot to include the fetch_keys results -- my apologies -- but yes, it's faster. But I also had the same concerns you mentioned, so to verify I put this version in to get the kind of feedback you've provided. I'll keep working on this.

OTP-Maintainer · 2013-10-03T19:21:19Z

Patch has passed first testings and has been assigned to be reviewed

OTP-Maintainer · 2013-10-04T19:23:09Z

Patch has passed first testings and has been assigned to be reviewed

Improve the performance of orddict:from_list/1 by reimplementing it using the lists module in a way that preserves backward compatibility. The QuickCheck programs linked below were used to verify backward compatibility: * https://gist.github.com/vinoski/3bd216efa421c581174a * https://gist.github.com/vinoski/c6db70e8dc725083843d Both tests, which were run on R16B03, require the original orddict module to be renamed to olddict, and that code:unstick_mod/1 be applied to orddict in order to allow it to be replaced with the revised orddict. The first QuickCheck test first generates a list of pairs of terms, then uses the list to create both an original and revised orddict using from_list/1, then verifies that the results of the operation are the same for both instances. The second QuickCheck test is similar except that it first creates an instance of the original and revised orddicts and then folds over a randomly-generated list of orddict functions, applying each function to each orddict instance and verifying that the results match. The revised orddict:from_list/1 function was also tested to assess performance against the original orddict implementation. The test program used is available here: * https://gist.github.com/vinoski/61772a052f3501e1e128 Since an orddict instance is implemented as a list, the test program creates ordicts of length 1, 10, 100, and 1000 and uses them to assess performance at each length. Performance was measured using timer:tc/3 to time a number of iterations of various tests against the original orddict and against the revised orddict. To test from_list/1, orddicts of lengths 1, 10, 100, and 1000 are created from a list of random pairs with integer keys. For lengths greater than 1, two different tests are performed: one passing a list of pairs in sorted key order, and the other passing a list of pairs in reverse sorted key order. Since orddicts are ordered, these orderings effect worst-case and best-case behavior of the original orddict:from_list/2 implementation respectively. These tests were performed against R16B02 on a Macbook Pro with an Intel Core i7 processor running at 2.7GHz and 16GB of RAM running OS X 10.8.5, and on a Dell system with a 3.4GHz Intel Core i7 and 16GB of RAM running Ubuntu Linux 12.04. The tables below show results for OS X and Linux respectively. Each table lists the name of each test followed by two numbers, each a time in microseconds of the average of 10 runs of the test. The first number is the result for the original orddict, the second for the revised orddict. As the numbers for both platforms show, the revised from_list/1 function is always faster than the original version, in some cases quite a bit faster. Results from OS X: ------------------ from_list length 1: 1.789 0.116 from_list length 10 ordered: 10.082 3.040 from_list length 10 reverse ordered: 4.853 3.604 from_list length 100 ordered: 397.213 20.134 from_list length 100 reverse ordered: 25.473 20.745 from_list length 1000 ordered: 37490.26 251.46 from_list length 1000 reverse ordered: 307.94 215.96 Results from Linux: ------------------- from_list length 1: 0.146 0.025 from_list length 10 ordered: 4.729 0.815 from_list length 10 reverse ordered: 1.687 0.956 from_list length 100 ordered: 144.467 5.896 from_list length 100 reverse ordered: 6.694 5.816 from_list length 1000 ordered: 13755.19 79.413 from_list length 1000 reverse ordered: 91.54 64.308

vinoski · 2013-12-19T22:04:36Z

I've amended this branch per the erlang-patch email from @bjorng here: http://erlang.org/pipermail/erlang-patches/2013-December/004507.html. With this change, only the orddict:from_list/1 function is modified, as requested.

proxyles · 2014-01-23T13:27:16Z

merged

Eliminate multiple memory leaks in the loader

nox reviewed Sep 30, 2013
View reviewed changes

richcarl reviewed Oct 2, 2013
View reviewed changes

proxyles closed this Jan 23, 2014

sverker pushed a commit to sverker/otp that referenced this pull request Aug 27, 2020

Merge pull request erlang#91 from erlang/bjorn/fix-loader

a380b0b

Eliminate multiple memory leaks in the loader

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve performance for some orddict funs #91

improve performance for some orddict funs #91

vinoski commented Sep 30, 2013

nox Sep 30, 2013

nox Sep 30, 2013

vinoski commented Sep 30, 2013

OTP-Maintainer commented Sep 30, 2013

richcarl Oct 2, 2013

vinoski Oct 2, 2013

OTP-Maintainer commented Oct 3, 2013

OTP-Maintainer commented Oct 4, 2013

vinoski commented Dec 19, 2013

proxyles commented Jan 23, 2014

improve performance for some orddict funs #91

improve performance for some orddict funs #91

Conversation

vinoski commented Sep 30, 2013

Results from OS X:

Results from Linux:

nox Sep 30, 2013

Choose a reason for hiding this comment

nox Sep 30, 2013

Choose a reason for hiding this comment

vinoski commented Sep 30, 2013

OTP-Maintainer commented Sep 30, 2013

richcarl Oct 2, 2013

Choose a reason for hiding this comment

vinoski Oct 2, 2013

Choose a reason for hiding this comment

OTP-Maintainer commented Oct 3, 2013

OTP-Maintainer commented Oct 4, 2013

vinoski commented Dec 19, 2013

proxyles commented Jan 23, 2014