#9 Fix python benchmarks #12
Conversation
Looks perfect to me! FWIW, your C++ performance figures are comparable to mine; I have a slower CPU than you but probably a much bigger cache. So your performance results for "small things" that easily fit into cache are better than mine: getTV, pointerCast, getOutgoingSet. I get better figures when things don't fit in your cache but do fit in mine: addNode, addLink, removeAtom (viz., when comparing to the numbers at the bottom of the diary file). BTW, if you ever feel the urge to keep a diary, don't edit the existing one; add yours as a separate file.
The Python numbers are pretty much exactly as expected, and are presumably within a factor of 2x of what you're measuring for Guile.
Great, thank you for the analysis! Yes, I should keep my performance results to use as a baseline while changing the benchmarks or the code. I will add my diary as a separate file, per your suggestion. I thought about …
The benchmark is extremely noisy: if you run it five times in a row, you will get numbers that can differ by a lot. I do not really understand why it is so noisy. Once upon a time I used to get better numbers if I also ran some other CPU-intensive process at the same time, rather than benchmarking on an idle system; I never fully understood that, either. Thus, "fully automated" reporting is problematic: first, the noise washes out a lot of small changes, and anyway, one has to understand why things changed. Certain necessary changes usually slow things down; other performance tunings usually speed things up.

I also see a HUGE variation in numbers that reflects cache-boundary, cache-contention, and cache-address-aliasing issues (and ditto for the TLBs): altering sizes or alignments, or forcing aliasing, can result in performance changes of 2x better or worse. This makes it very hard to get consistent performance measurements, and to consistently evaluate the result of code changes. A lot of the mystery changes (numbers that change, but should not have) are almost surely due to these cache and TLB issues.
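For what it's worth, one way to blunt this kind of run-to-run noise is to repeat the timed loop and report the median (or the minimum) rather than any single measurement. A minimal sketch in C++, using a hypothetical `median_runtime_seconds` helper that is not part of the benchmark code:

```cpp
// Hypothetical helper (not part of the benchmark code): time a workload
// several times and report the median, which is less sensitive to one-off
// cache/scheduler hiccups than any single run.
#include <algorithm>
#include <chrono>
#include <vector>

template <typename F>
double median_runtime_seconds(F work, int runs = 5)
{
    std::vector<double> samples;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();  // the loop being measured
        auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // median of the sorted samples
}
```

This does not remove the cache and TLB aliasing effects described below, but it does make the reported number less sensitive to a single bad run.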
Cache: for example, I'm playing with your new changes right now. I made a "minor" change that should affect only link insertion: I reserved a size for the std::vector, because reserving works faster than just calling push_back. Atom-remove performance should not change at all, yet it jumped from an average of 350K (over three runs) to 400K (over three runs). The best of the old three runs was 351779; the worst of the new three runs is 393171 -- a huge change for something where the C++ code is unchanged. However, the code alignment in the shared library changes, suggesting that there is either an I-cache aliasing issue or an I-TLB aliasing issue (viz., mine are 4-way set-associative, so if the measurement loop accidentally causes 5 pages to alias to the same address, then there will always be one page eviction per loop!). This kind of stuff has been driving me nuts; I see no practical way of controlling for it.
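To illustrate the kind of "minor" change described above -- pre-reserving the vector's capacity so the loop never has to reallocate and copy as it grows -- here is a minimal sketch; it is not the benchmark's actual code, and the int elements stand in for whatever handle type the real outgoing set stores:

```cpp
#include <cstddef>
#include <vector>

// Sketch only: build an "outgoing set" of n elements. With reserve(), the
// vector makes a single allocation up front; without it, push_back would
// reallocate and copy the contents every time capacity is exceeded.
std::vector<int> build_outgoing(std::size_t n)
{
    std::vector<int> out;
    out.reserve(n);                          // one allocation up front
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(static_cast<int>(i));  // never triggers a reallocation
    return out;
}
```

The behavior is identical either way; only the allocation pattern (and, as noted above, the resulting code and data layout) differs.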
Python benchmarks are implemented using the Guile implementation as a reference. Changes to the Python bindings API have been applied.
Link: https://github.com/opencog/benchmark/pull/12/files?w=1 -- this view ignores whitespace changes, which makes the diff more relevant given the code re-alignment.
Python results:
C++ results: