New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cppy use update and c++11 compatibility #55
Conversation
Codecov Report
@@ Coverage Diff @@
## master #55 +/- ##
==========================================
- Coverage 97.1% 96.99% -0.11%
==========================================
Files 25 24 -1
Lines 1345 1331 -14
==========================================
- Hits 1306 1291 -15
- Misses 39 40 +1
Continue to review full report at Codecov.
|
I took the liberty of benchmarking the new
Unfortunately, it appears to be much slower than Loki's version. Further, the use of Why are we re-inventing the wheel here? Results (no optimizations): $ g++ -o bench-av benchmark.cc -Ibenchmark/include -I. ./benchmark/src/libbenchmark.a -pthread && ./bench-av
2018-11-29 15:30:59
Running ./bench-av
Run on (4 X 3096 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x4)
Load Average: 0.14, 0.11, 0.09
-------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------
Empty_True<New_AV> 9 ns 9 ns 78118541
Empty_True<Loki_AV> 23 ns 23 ns 30255659
Empty_True<Map> 5 ns 5 ns 123856346
Empty_True<Un_Map> 8 ns 8 ns 91611431
Empty_False<New_AV> 9 ns 9 ns 81322823
Empty_False<Loki_AV> 23 ns 23 ns 29199018
Empty_False<Map> 5 ns 5 ns 148670595
Empty_False<Un_Map> 8 ns 8 ns 88430182
Construct<New_AV> 561 ns 554 ns 1251063
Construct<Loki_AV> 500 ns 502 ns 1390762
Construct<Map> 486 ns 489 ns 1324887
Construct<Un_Map> 501 ns 501 ns 1372624
Destruct<New_AV> 557 ns 571 ns 1232776
Destruct<Loki_AV> 501 ns 504 ns 1372615
Destruct<Map> 474 ns 477 ns 1441903
Destruct<Un_Map> 491 ns 493 ns 1403424
Insert_Fresh<New_AV> 9360 ns 9360 ns 76437
Insert_Fresh<Loki_AV> 3313 ns 3313 ns 210077
Insert_Fresh<Map> 2012 ns 2012 ns 346983
Insert_Fresh<Un_Map> 2754 ns 2754 ns 253397
Insert_Stale<New_AV> 4659 ns 4659 ns 152283
Insert_Stale<Loki_AV> 2136 ns 2136 ns 320373
Insert_Stale<Map> 1349 ns 1349 ns 491941
Insert_Stale<Un_Map> 1727 ns 1725 ns 393902
Read<New_AV> 4417 ns 4417 ns 155646
Read<Loki_AV> 1903 ns 1903 ns 353953
Read<Map> 1110 ns 1110 ns 622229
Read<Un_Map> 1566 ns 1566 ns 429544
Iterate<New_AV> 3875 ns 3875 ns 185505
Iterate<Loki_AV> 59 ns 59 ns 11486152
Iterate<Map> 81 ns 81 ns 8017519
Iterate<Un_Map> 119 ns 119 ns 5955354 Results with $ g++ -O3 -o bench-av benchmark.cc -Ibenchmark/include -I. ./benchmark/src/libbenchmark.a -pthread && ./bench-av
2018-11-29 15:33:50
Running ./bench-av
Run on (4 X 3096 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x4)
Load Average: 0.13, 0.14, 0.10
-------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------
Empty_True<New_AV> 1 ns 1 ns 1000000000
Empty_True<Loki_AV> 1 ns 1 ns 1000000000
Empty_True<Map> 1 ns 1 ns 1000000000
Empty_True<Un_Map> 0 ns 0 ns 1000000000
Empty_False<New_AV> 1 ns 1 ns 1000000000
Empty_False<Loki_AV> 1 ns 1 ns 1000000000
Empty_False<Map> 1 ns 1 ns 1000000000
Empty_False<Un_Map> 0 ns 0 ns 1000000000
Construct<New_AV> 464 ns 466 ns 1523639
Construct<Loki_AV> 461 ns 464 ns 1557102
Construct<Map> 456 ns 457 ns 1541968
Construct<Un_Map> 463 ns 466 ns 1480297
Destruct<New_AV> 475 ns 481 ns 1370097
Destruct<Loki_AV> 462 ns 466 ns 1389995
Destruct<Map> 462 ns 464 ns 1502111
Destruct<Un_Map> 480 ns 481 ns 1302722
Insert_Fresh<New_AV> 4102 ns 4101 ns 171157
Insert_Fresh<Loki_AV> 3771 ns 3771 ns 184118
Insert_Fresh<Map> 2559 ns 2558 ns 269747
Insert_Fresh<Un_Map> 3324 ns 3323 ns 209317
Insert_Stale<New_AV> 3384 ns 3384 ns 204126
Insert_Stale<Loki_AV> 3277 ns 3277 ns 209633
Insert_Stale<Map> 2778 ns 2778 ns 245423
Insert_Stale<Un_Map> 3350 ns 3349 ns 204648
Read<New_AV> 3281 ns 3280 ns 216964
Read<Loki_AV> 3057 ns 3057 ns 222605
Read<Map> 2647 ns 2647 ns 266062
Read<Un_Map> 3281 ns 3281 ns 216900
Iterate<New_AV> 459 ns 459 ns 1319530
Iterate<Loki_AV> 5 ns 5 ns 154900640
Iterate<Map> 15 ns 15 ns 44048397
Iterate<Un_Map> 3 ns 3 ns 217174530 Source: #include <iostream>
#include <string>
#include <map>
#include <unordered_map>
#include "benchmark/benchmark.h"
#include "./kiwi/AssocVector.h"
#include "./kiwi/AssocVector-Loki.h"
using namespace std;
typedef AssocVector<string, string> New_AV;
typedef Loki::AssocVector<string, string> Loki_AV;
typedef map<string, string> Map;
typedef unordered_map<string, string> Un_Map;
#define BENCHMARK_KIWI(name) BENCHMARK_TEMPLATE(name, New_AV); BENCHMARK_TEMPLATE(name, Loki_AV); BENCHMARK_TEMPLATE(name, Map); BENCHMARK_TEMPLATE(name, Un_Map)
#define BENCHMARK_KIWI_PROTO(name) template <typename AV> static void name(benchmark::State &state)
#define BENCH(name) BENCHMARK_KIWI_PROTO(name); BENCHMARK_KIWI(name); BENCHMARK_KIWI_PROTO(name)
static const char *large_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum nisl eros, fermentum at tristique a, convallis in sapien. Mauris ante augue, varius eu nisi eu, mattis cursus sapien. Donec at felis rhoncus, lacinia mi vel, sodales quam. Sed ac risus quis nisi ultricies placerat at a libero. In vel leo leo. Sed in dapibus justo. Sed libero risus, facilisis nec ante quis, euismod luctus ipsum. Etiam tincidunt leo nulla, vitae tincidunt orci tempor et. Pellentesque et diam non diam tincidunt suscipit. Aenean blandit bibendum convallis.\n\nVestibulum nulla urna, dapibus sit amet velit quis, pulvinar porta mi. Interdum et malesuada fames ac ante ipsum primis in faucibus. Proin magna justo, sagittis et facilisis quis, fringilla vitae velit. Nunc fermentum quam ligula, vel semper leo lacinia sodales. Duis quis enim at eros finibus auctor. Maecenas auctor, quam non vehicula ultricies, mauris dui scelerisque justo, nec sollicitudin magna erat sed erat. Nam dignissim nisi est, quis faucibus purus tincidunt consectetur. Nulla purus neque, semper a ligula ac, facilisis iaculis metus. Sed ut velit quis justo laoreet accumsan in et diam.\n\nCurabitur rutrum, mauris ut blandit lobortis, diam purus vehicula ex, in pharetra mi lectus interdum ipsum. Phasellus suscipit arcu in interdum venenatis. Nullam molestie augue et lorem semper euismod. Sed condimentum porta turpis, aliquam varius dolor sollicitudin sed. Cras vel dolor id nisi accumsan maximus eu vestibulum erat. Duis id luctus augue. Cras condimentum eu nibh eu placerat. Duis nec vehicula velit, nec pharetra ante. Donec ac dictum ex, nec vestibulum massa. Duis venenatis ex ut massa faucibus, vitae placerat nibh pretium. Etiam consequat placerat nunc ut imperdiet. Vestibulum vel turpis nec velit sagittis placerat a ac sapien. Nullam viverra risus et tempus viverra. Fusce ligula enim, hendrerit ut tortor vitae, tincidunt lobortis turpis. Proin in facilisis nibh.\n\nMorbi sit amet felis aliquam, vestibulum dui sed, tempor nunc. Integer vehicula odio nec velit scelerisque, et dignissim risus tincidunt. Nullam tincidunt molestie fringilla. Aliquam euismod massa vitae faucibus rhoncus. Mauris id odio pulvinar, rutrum neque sed, porttitor nisi. Maecenas convallis iaculis metus nec iaculis. Curabitur in tortor ut mauris posuere pharetra eget sit amet sapien. In leo quam, maximus eu luctus vitae, elementum et ligula.\n\nOrci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Integer vehicula malesuada eros, ac suscipit arcu tincidunt ac. Donec et augue nisi. Proin sit amet efficitur libero, quis accumsan mauris. Nam scelerisque tristique libero, vitae convallis orci. Ut nisl magna, finibus vel porttitor ut, laoreet nec tortor. Cras nibh velit, iaculis nec molestie ac, pulvinar in nisi. Maecenas luctus dignissim lectus ut sodales. Nullam id posuere tellus, non imperdiet libero. Integer tristique euismod ultrices. Mauris enim sem, varius id varius nec, consequat quis lectus. Etiam ut risus auctor, sollicitudin leo et, finibus dui. Vivamus luctus erat non erat lobortis luctus.";
BENCH(Empty_True) {
AV av;
for (auto _ : state) {
benchmark::DoNotOptimize(av.empty());
}
}
BENCH(Empty_False) {
AV av;
for (auto _ : state) {
benchmark::DoNotOptimize(av.empty());
}
}
BENCH(Construct) {
for (auto _ : state) {
AV *av = new AV();
state.PauseTiming();
delete av;
state.ResumeTiming();
}
}
BENCH(Destruct) {
for (auto _ : state) {
state.PauseTiming();
AV *av = new AV();
state.ResumeTiming();
delete av;
}
}
BENCH(Insert_Fresh) {
for (auto _ : state) {
AV av;
benchmark::DoNotOptimize(av["foo"] = "bar");
benchmark::DoNotOptimize(av["hello"] = "world");
benchmark::DoNotOptimize(av["baz"] = large_text);
benchmark::DoNotOptimize(av[large_text] = "buzz");
}
}
BENCH(Insert_Stale) {
AV av;
av["foo"] = "bar";
av["hello"] = "world";
av["baz"] = large_text;
av[large_text] = "buzz";
for (auto _ : state) {
benchmark::DoNotOptimize(av["foo"] = "bar");
benchmark::DoNotOptimize(av["hello"] = "world");
benchmark::DoNotOptimize(av["baz"] = large_text);
benchmark::DoNotOptimize(av[large_text] = "buzz");
}
}
BENCH(Read) {
AV av;
av["foo"] = "bar";
av["hello"] = "world";
av["baz"] = large_text;
av[large_text] = "buzz";
for (auto _ : state) {
benchmark::DoNotOptimize(av["foo"]);
benchmark::DoNotOptimize(av["hello"]);
benchmark::DoNotOptimize(av["baz"]);
benchmark::DoNotOptimize(av[large_text]);
}
}
BENCH(Iterate) {
AV av;
av["foo"] = "bar";
av["hello"] = "world";
av["baz"] = large_text;
av[large_text] = "buzz";
for (auto _ : state) {
for (const auto &elem : av) {
benchmark::DoNotOptimize(elem);
}
}
}
BENCHMARK_MAIN(); |
Thanks for the input @hal0. Could you run a similar benchmark with the types actually used in kiwi (cf maptype.h and solverimpl.h) ? |
I adapted the benchmarks for I can also test with some other types, too - but the results are similar to I also added With the Kiwi types, In no case is the new Results (optimized with $ g++ -O3 -o bench-av -std=c++11 benchmark.cc -Ibenchmark/include -I. ./benchmark/src/libbenchmark.a -pthread && ./bench-av
2018-11-29 16:23:39
Running ./bench-av
Run on (4 X 3096 MHz CPU s)
CPU Caches:
L1 Data 32K (x4)
L1 Instruction 32K (x4)
L2 Unified 256K (x4)
L3 Unified 8192K (x4)
Load Average: 0.12, 0.19, 0.18
-------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------
Empty_True<New_AV> 0 ns 0 ns 1000000000
Empty_True<Loki_AV> 0 ns 0 ns 1000000000
Empty_True<Map> 0 ns 0 ns 1000000000
Empty_False<New_AV> 0 ns 0 ns 1000000000
Empty_False<Loki_AV> 1 ns 1 ns 1000000000
Empty_False<Map> 0 ns 0 ns 1000000000
Construct<New_AV> 473 ns 475 ns 1483565
Construct<Loki_AV> 465 ns 469 ns 1471332
Construct<Map> 483 ns 484 ns 1443997
Destruct<New_AV> 485 ns 487 ns 1442461
Destruct<Loki_AV> 474 ns 476 ns 1457509
Destruct<Map> 473 ns 475 ns 1453059
Insert_Fresh<New_AV> 300 ns 300 ns 2378047
Insert_Fresh<Loki_AV> 81 ns 81 ns 8085511
Insert_Fresh<Map> 85 ns 85 ns 9065604
Insert_Stale<New_AV> 83 ns 83 ns 8627377
Insert_Stale<Loki_AV> 12 ns 12 ns 52659881
Insert_Stale<Map> 8 ns 8 ns 78199537
Read<New_AV> 82 ns 82 ns 8234315
Read<Loki_AV> 10 ns 10 ns 70679610
Read<Map> 9 ns 9 ns 73422088
Iterate<New_AV> 70 ns 70 ns 9867960
Iterate<Loki_AV> 2 ns 2 ns 349865190
Iterate<Map> 10 ns 10 ns 63069564
Find_Fail<New_AV> 21 ns 21 ns 35106236
Find_Fail<Loki_AV> 1 ns 1 ns 525654627
Find_Fail<Map> 1 ns 1 ns 1000000000
Find_Succeed<New_AV> 29 ns 29 ns 23843718
Find_Succeed<Loki_AV> 2 ns 2 ns 344261036
Find_Succeed<Map> 2 ns 2 ns 279747530 Source: #include <iostream>
#include <string>
#include <map>
#include <memory>
#include <utility>
#include "benchmark/benchmark.h"
#include "./kiwi/AssocVector.h"
#include "./kiwi/AssocVector-Loki.h"
#include "./kiwi/symbol.h"
#include "./kiwi/variable.h"
using namespace std;
using K = kiwi::Variable;
using V = kiwi::impl::Symbol;
typedef AssocVector<K, V> New_AV;
typedef Loki::AssocVector<K, V> Loki_AV;
typedef map<K, V> Map;
#define BENCHMARK_KIWI(name) BENCHMARK_TEMPLATE(name, New_AV); BENCHMARK_TEMPLATE(name, Loki_AV); BENCHMARK_TEMPLATE(name, Map)
#define BENCHMARK_KIWI_PROTO(name) template <typename AV> static void name(benchmark::State &state)
#define BENCH(name) BENCHMARK_KIWI_PROTO(name); BENCHMARK_KIWI(name); BENCHMARK_KIWI_PROTO(name)
BENCH(Empty_True) {
AV av;
for (auto _ : state) {
benchmark::DoNotOptimize(av.empty());
}
}
BENCH(Empty_False) {
AV av;
for (auto _ : state) {
benchmark::DoNotOptimize(av.empty());
}
}
BENCH(Construct) {
for (auto _ : state) {
AV *av = new AV();
state.PauseTiming();
delete av;
state.ResumeTiming();
}
}
BENCH(Destruct) {
for (auto _ : state) {
state.PauseTiming();
AV *av = new AV();
state.ResumeTiming();
delete av;
}
}
BENCH(Insert_Fresh) {
kiwi::Variable foo("foo");
kiwi::Variable hello("hello");
kiwi::Variable baz("baz");
kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);
for (auto _ : state) {
AV av;
benchmark::DoNotOptimize(av[foo] = symbol);
benchmark::DoNotOptimize(av[hello] = symbol);
benchmark::DoNotOptimize(av[baz] = symbol);
}
}
BENCH(Insert_Stale) {
AV av;
kiwi::Variable foo("foo");
kiwi::Variable hello("hello");
kiwi::Variable baz("baz");
kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);
av[foo] = symbol;
av[hello] = symbol;
av[baz] = symbol;
for (auto _ : state) {
benchmark::DoNotOptimize(av[foo] = symbol);
benchmark::DoNotOptimize(av[hello] = symbol);
benchmark::DoNotOptimize(av[baz] = symbol);
}
}
BENCH(Read) {
AV av;
kiwi::Variable foo("foo");
kiwi::Variable hello("hello");
kiwi::Variable baz("baz");
kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);
av[foo] = symbol;
av[hello] = symbol;
av[baz] = symbol;
for (auto _ : state) {
benchmark::DoNotOptimize(av[foo]);
benchmark::DoNotOptimize(av[hello]);
benchmark::DoNotOptimize(av[baz]);
}
}
BENCH(Iterate) {
AV av;
kiwi::Variable foo("foo");
kiwi::Variable hello("hello");
kiwi::Variable baz("baz");
kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);
av[foo] = symbol;
av[hello] = symbol;
av[baz] = symbol;
for (auto _ : state) {
for (const auto &elem : av) {
benchmark::DoNotOptimize(elem);
}
}
}
BENCH(Find_Fail) {
AV av;
kiwi::Variable foo("foo");
for (auto _ : state) {
benchmark::DoNotOptimize(av.find(foo) == av.end());
}
}
BENCH(Find_Succeed) {
AV av;
kiwi::Variable foo("foo");
kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);
av[foo] = symbol;
for (auto _ : state) {
benchmark::DoNotOptimize(av.find(foo) == av.end());
}
}
BENCHMARK_MAIN(); |
Also, it should be mentioned that the new I would imagine iteration and lookup times are of much higher importance to Kiwi, thus making this improvement arguably less critical - especially since by all other metrics, the new |
@sccolbert any insight on what operations matter the most here, and your opinion would be greatly appreciated. |
Benchmarks (on a use case derived from enaml) shows that Loki::AssocVector provides a rough 2x speedup over std::map. Given the few modifications required to get Loki::AssocVector c++11 compliant conserving it is more than worth it. This is basically ready to go once @sccolbert validates the fix for the one known bug fixed in #56. Given @hal0 benchmarks I did not go through benchmarking unordered_map and the new AssocVector. |
@sccolbert this is ready to merge I believe. Do you want a look first ? |
So wait, this is still using the |
Iteration speed is the critical path. Loki makes updating the solver state about 2x faster on a typical application. You can look at the included benchmark if you want. |
Ah okay, that's good then :) |
So we can get an apk via gh-actions Notes: - this way the reviewers can see how the `on device unit test app` behaves when some test generates an image or give us an error - these changes will be reverted in the next commit - we must pin kiwisolver recipe to a commit, or the matplotlib recipe it won't pass the build stage, due to a recent changes in kiwisolver repo (nucleic/kiwi#55)
If @Qix- or @PDeveloper you can benchmark with the new AssocVector that I put in it would be great.