#9 Fix python benchmarks #12
Conversation
Looks perfect to me! FWIW, your C++ performance figures are comparable to mine; I have a slower CPU than you but probably a much bigger cache. So your performance results for "small things" that easily fit into cache are better than mine: getTV, pointerCast, getOutgoingSet. I get better figures when things don't fit in your cache but do fit in mine: addNode, addLink, removeAtom (viz., when comparing to the numbers at the bottom of the diary file). BTW, if you ever feel the urge to keep a diary, don't edit the existing one; add yours as a separate file.
The Python numbers are pretty much exactly as expected, and are presumably within a factor of 2x of what you're measuring for Guile.
Great, thank you for the analysis! Yes, I should keep my performance results to use as a baseline while changing the benchmarks or the code. I will add my diary as a separate file, per your suggestion. I thought about …
The benchmark is extremely noisy: if you run it five times in a row, you will get numbers that can differ by a lot. I do not really understand why it is so noisy. Once upon a time I used to get better numbers if I also ran some other CPU-intensive process at the same time, rather than benchmarking on an idle system; I never fully understood that, either. Thus, "fully automated" reporting is problematic: first, the noise washes out a lot of small changes, and anyway, one has to understand why things changed. Certain necessary changes usually slow things down; other performance tunings usually speed things up.

I also see a HUGE variation in numbers that reflects cache-boundary, cache-contention, and cache-address-aliasing issues (and ditto for the TLBs): altering sizes or alignments, or forcing aliasing, can result in performance changes of 2x better or worse. This makes it very hard to get consistent performance measurements, and to consistently evaluate the result of code changes. A lot of the mystery changes (numbers that change, but should not have) are almost surely due to these cache and TLB issues.
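For what it's worth, one way to blunt this kind of run-to-run noise is to repeat the timed loop and report the median (or the minimum) rather than any single measurement. A minimal sketch in C++, using a hypothetical `median_runtime_seconds` helper that is not part of the benchmark code:

```cpp
// Hypothetical helper (not part of the benchmark code): time a workload
// several times and report the median, which is less sensitive to one-off
// cache/scheduler hiccups than any single run.
#include <algorithm>
#include <chrono>
#include <vector>

template <typename F>
double median_runtime_seconds(F work, int runs = 5)
{
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();  // the loop being measured
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // median of the sorted samples
}
```

This does not remove the cache and TLB aliasing effects described below, but it does make the reported number less sensitive to a single bad run.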
Cache: for example, I'm playing with your new changes right now. I made a "minor" change that should affect only link insertion: I reserved a size for the std::vector, because reserving works faster than just calling push_back. Atom-remove performance should not change at all, yet it jumped from an average of 350K (over three runs) to 400K (over three runs). The best of the old three runs was 351779; the worst of the new three runs is 393171 -- a huge change for something where the C++ code is unchanged. However, the code alignment in the shared library changes, suggesting that there is either an I-cache aliasing issue or an I-TLB aliasing issue (viz., mine are 4-way set-associative, so if the measurement loop accidentally causes 5 pages to alias to the same address, then there will always be one page eviction per loop!). This kind of stuff has been driving me nuts; I see no practical way of controlling for it.
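To illustrate the kind of "minor" change described above -- pre-reserving the vector's capacity so the loop never has to reallocate and copy as it grows -- here is a minimal sketch; it is not the benchmark's actual code, and the int elements stand in for whatever handle type the real outgoing set stores:

```cpp
#include <cstddef>
#include <vector>

// Sketch only: build an "outgoing set" of n elements. With reserve(), the
// vector makes a single allocation up front; without it, push_back would
// reallocate and copy the contents every time capacity is exceeded.
std::vector<int> build_outgoing(std::size_t n)
{
    std::vector<int> out;
    out.reserve(n);                          // one allocation up front
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(static_cast<int>(i));  // never triggers a reallocation
    return out;
}
```

The behavior is identical either way; only the allocation pattern (and, as noted above, the resulting code and data layout) differs.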
Python benchmarks are implemented using the Guile implementation as a reference. Changes to the Python bindings API have been applied.
Link: https://github.com/opencog/benchmark/pull/12/files?w=1 -- this view ignores whitespace changes, which makes the diff more relevant given the code re-alignment.
Python results:
C++ results: