Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cppy use update and c++11 compatibility #55

Merged
merged 16 commits into from Dec 11, 2019
Merged

Cppy use update and c++11 compatibility #55

merged 16 commits into from Dec 11, 2019

Conversation

MatthieuDartiailh
Copy link
Member

If @Qix- or @PDeveloper you can benchmark with the new AssocVector that I put in it would be great.

@codecov-io
Copy link

codecov-io commented Oct 26, 2018

Codecov Report

Merging #55 into master will decrease coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
- Coverage    97.1%   96.99%   -0.11%     
==========================================
  Files          25       24       -1     
  Lines        1345     1331      -14     
==========================================
- Hits         1306     1291      -15     
- Misses         39       40       +1
Impacted Files Coverage Δ
kiwi/AssocVector.h 97.43% <ø> (-0.07%) ⬇️
kiwi/variable.h 100% <ø> (ø) ⬆️
kiwi/row.h 100% <ø> (ø) ⬆️
py/variable.cpp 100% <100%> (ø) ⬆️
kiwi/solverimpl.h 92.56% <100%> (-0.08%) ⬇️
py/symbolics.h 100% <100%> (ø) ⬆️
py/types.h 100% <100%> (ø) ⬆️
py/strength.cpp 100% <100%> (ø) ⬆️
py/kiwisolver.cpp 100% <100%> (ø) ⬆️
py/util.h 100% <100%> (ø) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cda3c7...d475749. Read the comment docs.

@hal0
Copy link

hal0 commented Nov 29, 2018

I took the liberty of benchmarking the new AssocVector implementation a bit.

(#54) . . . which claims to be a faster implementation and is c++11 compatible.

Unfortunately, it appears to be much slower than Loki's version.

Further, the use of std::map and std::unordered_map appear to be much faster than either implementation - std::map being the clear winner here, except for the case of iteration, where Loki seems to be only ~26%-33% faster than std::map and pretty much identical to std::unordered_map when speed optimizations are turned on.

Why are we re-inventing the wheel here?


Results (no optimizations):

$ g++ -o bench-av benchmark.cc -Ibenchmark/include -I. ./benchmark/src/libbenchmark.a -pthread && ./bench-av

2018-11-29 15:30:59
Running ./bench-av
Run on (4 X 3096 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x4)
Load Average: 0.14, 0.11, 0.09
-------------------------------------------------------------
Benchmark                      Time           CPU Iterations
-------------------------------------------------------------
Empty_True<New_AV>             9 ns          9 ns   78118541
Empty_True<Loki_AV>           23 ns         23 ns   30255659
Empty_True<Map>                5 ns          5 ns  123856346
Empty_True<Un_Map>             8 ns          8 ns   91611431
Empty_False<New_AV>            9 ns          9 ns   81322823
Empty_False<Loki_AV>          23 ns         23 ns   29199018
Empty_False<Map>               5 ns          5 ns  148670595
Empty_False<Un_Map>            8 ns          8 ns   88430182
Construct<New_AV>            561 ns        554 ns    1251063
Construct<Loki_AV>           500 ns        502 ns    1390762
Construct<Map>               486 ns        489 ns    1324887
Construct<Un_Map>            501 ns        501 ns    1372624
Destruct<New_AV>             557 ns        571 ns    1232776
Destruct<Loki_AV>            501 ns        504 ns    1372615
Destruct<Map>                474 ns        477 ns    1441903
Destruct<Un_Map>             491 ns        493 ns    1403424
Insert_Fresh<New_AV>        9360 ns       9360 ns      76437
Insert_Fresh<Loki_AV>       3313 ns       3313 ns     210077
Insert_Fresh<Map>           2012 ns       2012 ns     346983
Insert_Fresh<Un_Map>        2754 ns       2754 ns     253397
Insert_Stale<New_AV>        4659 ns       4659 ns     152283
Insert_Stale<Loki_AV>       2136 ns       2136 ns     320373
Insert_Stale<Map>           1349 ns       1349 ns     491941
Insert_Stale<Un_Map>        1727 ns       1725 ns     393902
Read<New_AV>                4417 ns       4417 ns     155646
Read<Loki_AV>               1903 ns       1903 ns     353953
Read<Map>                   1110 ns       1110 ns     622229
Read<Un_Map>                1566 ns       1566 ns     429544
Iterate<New_AV>             3875 ns       3875 ns     185505
Iterate<Loki_AV>              59 ns         59 ns   11486152
Iterate<Map>                  81 ns         81 ns    8017519
Iterate<Un_Map>              119 ns        119 ns    5955354

Results with -O3 (optimized for speed):

$ g++ -O3 -o bench-av benchmark.cc -Ibenchmark/include -I. ./benchmark/src/libbenchmark.a -pthread && ./bench-av

2018-11-29 15:33:50
Running ./bench-av
Run on (4 X 3096 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x4)
Load Average: 0.13, 0.14, 0.10
-------------------------------------------------------------
Benchmark                      Time           CPU Iterations
-------------------------------------------------------------
Empty_True<New_AV>             1 ns          1 ns 1000000000
Empty_True<Loki_AV>            1 ns          1 ns 1000000000
Empty_True<Map>                1 ns          1 ns 1000000000
Empty_True<Un_Map>             0 ns          0 ns 1000000000
Empty_False<New_AV>            1 ns          1 ns 1000000000
Empty_False<Loki_AV>           1 ns          1 ns 1000000000
Empty_False<Map>               1 ns          1 ns 1000000000
Empty_False<Un_Map>            0 ns          0 ns 1000000000
Construct<New_AV>            464 ns        466 ns    1523639
Construct<Loki_AV>           461 ns        464 ns    1557102
Construct<Map>               456 ns        457 ns    1541968
Construct<Un_Map>            463 ns        466 ns    1480297
Destruct<New_AV>             475 ns        481 ns    1370097
Destruct<Loki_AV>            462 ns        466 ns    1389995
Destruct<Map>                462 ns        464 ns    1502111
Destruct<Un_Map>             480 ns        481 ns    1302722
Insert_Fresh<New_AV>        4102 ns       4101 ns     171157
Insert_Fresh<Loki_AV>       3771 ns       3771 ns     184118
Insert_Fresh<Map>           2559 ns       2558 ns     269747
Insert_Fresh<Un_Map>        3324 ns       3323 ns     209317
Insert_Stale<New_AV>        3384 ns       3384 ns     204126
Insert_Stale<Loki_AV>       3277 ns       3277 ns     209633
Insert_Stale<Map>           2778 ns       2778 ns     245423
Insert_Stale<Un_Map>        3350 ns       3349 ns     204648
Read<New_AV>                3281 ns       3280 ns     216964
Read<Loki_AV>               3057 ns       3057 ns     222605
Read<Map>                   2647 ns       2647 ns     266062
Read<Un_Map>                3281 ns       3281 ns     216900
Iterate<New_AV>              459 ns        459 ns    1319530
Iterate<Loki_AV>               5 ns          5 ns  154900640
Iterate<Map>                  15 ns         15 ns   44048397
Iterate<Un_Map>                3 ns          3 ns  217174530

Source:

#include <iostream>
#include <string>
#include <map>
#include <unordered_map>

#include "benchmark/benchmark.h"

#include "./kiwi/AssocVector.h"
#include "./kiwi/AssocVector-Loki.h"

using namespace std;

typedef AssocVector<string, string> New_AV;
typedef Loki::AssocVector<string, string> Loki_AV;
typedef map<string, string> Map;
typedef unordered_map<string, string> Un_Map;

#define BENCHMARK_KIWI(name) BENCHMARK_TEMPLATE(name, New_AV); BENCHMARK_TEMPLATE(name, Loki_AV); BENCHMARK_TEMPLATE(name, Map); BENCHMARK_TEMPLATE(name, Un_Map)
#define BENCHMARK_KIWI_PROTO(name) template <typename AV> static void name(benchmark::State &state)
#define BENCH(name) BENCHMARK_KIWI_PROTO(name); BENCHMARK_KIWI(name); BENCHMARK_KIWI_PROTO(name)

static const char *large_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum nisl eros, fermentum at tristique a, convallis in sapien. Mauris ante augue, varius eu nisi eu, mattis cursus sapien. Donec at felis rhoncus, lacinia mi vel, sodales quam. Sed ac risus quis nisi ultricies placerat at a libero. In vel leo leo. Sed in dapibus justo. Sed libero risus, facilisis nec ante quis, euismod luctus ipsum. Etiam tincidunt leo nulla, vitae tincidunt orci tempor et. Pellentesque et diam non diam tincidunt suscipit. Aenean blandit bibendum convallis.\n\nVestibulum nulla urna, dapibus sit amet velit quis, pulvinar porta mi. Interdum et malesuada fames ac ante ipsum primis in faucibus. Proin magna justo, sagittis et facilisis quis, fringilla vitae velit. Nunc fermentum quam ligula, vel semper leo lacinia sodales. Duis quis enim at eros finibus auctor. Maecenas auctor, quam non vehicula ultricies, mauris dui scelerisque justo, nec sollicitudin magna erat sed erat. Nam dignissim nisi est, quis faucibus purus tincidunt consectetur. Nulla purus neque, semper a ligula ac, facilisis iaculis metus. Sed ut velit quis justo laoreet accumsan in et diam.\n\nCurabitur rutrum, mauris ut blandit lobortis, diam purus vehicula ex, in pharetra mi lectus interdum ipsum. Phasellus suscipit arcu in interdum venenatis. Nullam molestie augue et lorem semper euismod. Sed condimentum porta turpis, aliquam varius dolor sollicitudin sed. Cras vel dolor id nisi accumsan maximus eu vestibulum erat. Duis id luctus augue. Cras condimentum eu nibh eu placerat. Duis nec vehicula velit, nec pharetra ante. Donec ac dictum ex, nec vestibulum massa. Duis venenatis ex ut massa faucibus, vitae placerat nibh pretium. Etiam consequat placerat nunc ut imperdiet. Vestibulum vel turpis nec velit sagittis placerat a ac sapien. Nullam viverra risus et tempus viverra. Fusce ligula enim, hendrerit ut tortor vitae, tincidunt lobortis turpis. Proin in facilisis nibh.\n\nMorbi sit amet felis aliquam, vestibulum dui sed, tempor nunc. Integer vehicula odio nec velit scelerisque, et dignissim risus tincidunt. Nullam tincidunt molestie fringilla. Aliquam euismod massa vitae faucibus rhoncus. Mauris id odio pulvinar, rutrum neque sed, porttitor nisi. Maecenas convallis iaculis metus nec iaculis. Curabitur in tortor ut mauris posuere pharetra eget sit amet sapien. In leo quam, maximus eu luctus vitae, elementum et ligula.\n\nOrci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Integer vehicula malesuada eros, ac suscipit arcu tincidunt ac. Donec et augue nisi. Proin sit amet efficitur libero, quis accumsan mauris. Nam scelerisque tristique libero, vitae convallis orci. Ut nisl magna, finibus vel porttitor ut, laoreet nec tortor. Cras nibh velit, iaculis nec molestie ac, pulvinar in nisi. Maecenas luctus dignissim lectus ut sodales. Nullam id posuere tellus, non imperdiet libero. Integer tristique euismod ultrices. Mauris enim sem, varius id varius nec, consequat quis lectus. Etiam ut risus auctor, sollicitudin leo et, finibus dui. Vivamus luctus erat non erat lobortis luctus.";

BENCH(Empty_True) {
	AV av;

	for (auto _ : state) {
		benchmark::DoNotOptimize(av.empty());
	}
}

BENCH(Empty_False) {
	AV av;

	for (auto _ : state) {
		benchmark::DoNotOptimize(av.empty());
	}
}

BENCH(Construct) {
	for (auto _ : state) {
		AV *av = new AV();
		state.PauseTiming();
		delete av;
		state.ResumeTiming();
	}
}

BENCH(Destruct) {
	for (auto _ : state) {
		state.PauseTiming();
		AV *av = new AV();
		state.ResumeTiming();
		delete av;
	}
}

BENCH(Insert_Fresh) {
	for (auto _ : state) {
		AV av;
		benchmark::DoNotOptimize(av["foo"] = "bar");
		benchmark::DoNotOptimize(av["hello"] = "world");
		benchmark::DoNotOptimize(av["baz"] = large_text);
		benchmark::DoNotOptimize(av[large_text] = "buzz");
	}
}

BENCH(Insert_Stale) {
	AV av;
	av["foo"] = "bar";
	av["hello"] = "world";
	av["baz"] = large_text;
	av[large_text] = "buzz";

	for (auto _ : state) {
		benchmark::DoNotOptimize(av["foo"] = "bar");
		benchmark::DoNotOptimize(av["hello"] = "world");
		benchmark::DoNotOptimize(av["baz"] = large_text);
		benchmark::DoNotOptimize(av[large_text] = "buzz");
	}
}

BENCH(Read) {
	AV av;
	av["foo"] = "bar";
	av["hello"] = "world";
	av["baz"] = large_text;
	av[large_text] = "buzz";

	for (auto _ : state) {
		benchmark::DoNotOptimize(av["foo"]);
		benchmark::DoNotOptimize(av["hello"]);
		benchmark::DoNotOptimize(av["baz"]);
		benchmark::DoNotOptimize(av[large_text]);
	}
}

BENCH(Iterate) {
	AV av;
	av["foo"] = "bar";
	av["hello"] = "world";
	av["baz"] = large_text;
	av[large_text] = "buzz";

	for (auto _ : state) {
		for (const auto &elem : av) {
			benchmark::DoNotOptimize(elem);
		}
	}
}

BENCHMARK_MAIN();

@MatthieuDartiailh
Copy link
Member Author

Thanks for the input @hal0. Could you run a similar benchmark with the types actually used in kiwi (cf maptype.h and solverimpl.h) ?

@hal0
Copy link

hal0 commented Nov 30, 2018

I adapted the benchmarks for <Variable, Symbol>, though I had to drop the std::unordered_map tests as they require some extra work but their results weren't interesting enough to warrant it.

I can also test with some other types, too - but the results are similar to <std::string, std::string>.

I also added ::find() tests, both with the case of when the element exists and when it does not.

With the Kiwi types, std::map is still a strong contender, with the only noticeable difference being in iteration times - of which Loki is the clear winner.

In no case is the new AssocVector type faster - it is very much slower than either Loki's implementation or std::map even with Kiwi types.


Results (optimized with -O3):

$ g++ -O3 -o bench-av -std=c++11 benchmark.cc -Ibenchmark/include -I. ./benchmark/src/libbenchmark.a -pthread && ./bench-av

2018-11-29 16:23:39
Running ./bench-av
Run on (4 X 3096 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x4)
Load Average: 0.12, 0.19, 0.18
-------------------------------------------------------------
Benchmark                      Time           CPU Iterations
-------------------------------------------------------------
Empty_True<New_AV>             0 ns          0 ns 1000000000
Empty_True<Loki_AV>            0 ns          0 ns 1000000000
Empty_True<Map>                0 ns          0 ns 1000000000
Empty_False<New_AV>            0 ns          0 ns 1000000000
Empty_False<Loki_AV>           1 ns          1 ns 1000000000
Empty_False<Map>               0 ns          0 ns 1000000000
Construct<New_AV>            473 ns        475 ns    1483565
Construct<Loki_AV>           465 ns        469 ns    1471332
Construct<Map>               483 ns        484 ns    1443997
Destruct<New_AV>             485 ns        487 ns    1442461
Destruct<Loki_AV>            474 ns        476 ns    1457509
Destruct<Map>                473 ns        475 ns    1453059
Insert_Fresh<New_AV>         300 ns        300 ns    2378047
Insert_Fresh<Loki_AV>         81 ns         81 ns    8085511
Insert_Fresh<Map>             85 ns         85 ns    9065604
Insert_Stale<New_AV>          83 ns         83 ns    8627377
Insert_Stale<Loki_AV>         12 ns         12 ns   52659881
Insert_Stale<Map>              8 ns          8 ns   78199537
Read<New_AV>                  82 ns         82 ns    8234315
Read<Loki_AV>                 10 ns         10 ns   70679610
Read<Map>                      9 ns          9 ns   73422088
Iterate<New_AV>               70 ns         70 ns    9867960
Iterate<Loki_AV>               2 ns          2 ns  349865190
Iterate<Map>                  10 ns         10 ns   63069564
Find_Fail<New_AV>             21 ns         21 ns   35106236
Find_Fail<Loki_AV>             1 ns          1 ns  525654627
Find_Fail<Map>                 1 ns          1 ns 1000000000
Find_Succeed<New_AV>          29 ns         29 ns   23843718
Find_Succeed<Loki_AV>          2 ns          2 ns  344261036
Find_Succeed<Map>              2 ns          2 ns  279747530

Source:

#include <iostream>
#include <string>
#include <map>
#include <memory>
#include <utility>

#include "benchmark/benchmark.h"

#include "./kiwi/AssocVector.h"
#include "./kiwi/AssocVector-Loki.h"
#include "./kiwi/symbol.h"
#include "./kiwi/variable.h"

using namespace std;

using K = kiwi::Variable;
using V = kiwi::impl::Symbol;

typedef AssocVector<K, V> New_AV;
typedef Loki::AssocVector<K, V> Loki_AV;
typedef map<K, V> Map;

#define BENCHMARK_KIWI(name) BENCHMARK_TEMPLATE(name, New_AV); BENCHMARK_TEMPLATE(name, Loki_AV); BENCHMARK_TEMPLATE(name, Map)
#define BENCHMARK_KIWI_PROTO(name) template <typename AV> static void name(benchmark::State &state)
#define BENCH(name) BENCHMARK_KIWI_PROTO(name); BENCHMARK_KIWI(name); BENCHMARK_KIWI_PROTO(name)

BENCH(Empty_True) {
	AV av;

	for (auto _ : state) {
		benchmark::DoNotOptimize(av.empty());
	}
}

BENCH(Empty_False) {
	AV av;

	for (auto _ : state) {
		benchmark::DoNotOptimize(av.empty());
	}
}

BENCH(Construct) {
	for (auto _ : state) {
		AV *av = new AV();
		state.PauseTiming();
		delete av;
		state.ResumeTiming();
	}
}

BENCH(Destruct) {
	for (auto _ : state) {
		state.PauseTiming();
		AV *av = new AV();
		state.ResumeTiming();
		delete av;
	}
}

BENCH(Insert_Fresh) {
	kiwi::Variable foo("foo");
	kiwi::Variable hello("hello");
	kiwi::Variable baz("baz");

	kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);

	for (auto _ : state) {
		AV av;
		benchmark::DoNotOptimize(av[foo] = symbol);
		benchmark::DoNotOptimize(av[hello] = symbol);
		benchmark::DoNotOptimize(av[baz] = symbol);
	}
}

BENCH(Insert_Stale) {
	AV av;

	kiwi::Variable foo("foo");
	kiwi::Variable hello("hello");
	kiwi::Variable baz("baz");

	kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);

	av[foo] = symbol;
	av[hello] = symbol;
	av[baz] = symbol;

	for (auto _ : state) {
		benchmark::DoNotOptimize(av[foo] = symbol);
		benchmark::DoNotOptimize(av[hello] = symbol);
		benchmark::DoNotOptimize(av[baz] = symbol);
	}
}

BENCH(Read) {
	AV av;

	kiwi::Variable foo("foo");
	kiwi::Variable hello("hello");
	kiwi::Variable baz("baz");

	kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);

	av[foo] = symbol;
	av[hello] = symbol;
	av[baz] = symbol;

	for (auto _ : state) {
		benchmark::DoNotOptimize(av[foo]);
		benchmark::DoNotOptimize(av[hello]);
		benchmark::DoNotOptimize(av[baz]);
	}
}

BENCH(Iterate) {
	AV av;

	kiwi::Variable foo("foo");
	kiwi::Variable hello("hello");
	kiwi::Variable baz("baz");

	kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);

	av[foo] = symbol;
	av[hello] = symbol;
	av[baz] = symbol;

	for (auto _ : state) {
		for (const auto &elem : av) {
			benchmark::DoNotOptimize(elem);
		}
	}
}

BENCH(Find_Fail) {
	AV av;

	kiwi::Variable foo("foo");

	for (auto _ : state) {
		benchmark::DoNotOptimize(av.find(foo) == av.end());
	}
}

BENCH(Find_Succeed) {
	AV av;

	kiwi::Variable foo("foo");
	kiwi::impl::Symbol symbol(kiwi::impl::Symbol::External, 0);

	av[foo] = symbol;

	for (auto _ : state) {
		benchmark::DoNotOptimize(av.find(foo) == av.end());
	}
}

BENCHMARK_MAIN();

@hal0
Copy link

hal0 commented Nov 30, 2018

Also, it should be mentioned that the new AssocVector implementation claims to have better ::insert() and ::erase() complexity, not necessarily speed.

I would imagine iteration and lookup times are of much higher importance to Kiwi, thus making this improvement arguably less critical - especially since by all other metrics, the new AssocVector is abysmally slower.

@MatthieuDartiailh
Copy link
Member Author

@sccolbert any insight on what operations matter the most here, and your opinion would be greatly appreciated.

@MatthieuDartiailh
Copy link
Member Author

Benchmarks (on a use case derived from enaml) shows that Loki::AssocVector provides a rough 2x speedup over std::map. Given the few modifications required to get Loki::AssocVector c++11 compliant conserving it is more than worth it. This is basically ready to go once @sccolbert validates the fix for the one known bug fixed in #56. Given @hal0 benchmarks I did not go through benchmarking unordered_map and the new AssocVector.

@MatthieuDartiailh
Copy link
Member Author

@sccolbert this is ready to merge I believe. Do you want a look first ?

@Qix-
Copy link
Contributor

Qix- commented Jul 18, 2019

So wait, this is still using the Loki::AssocVector class? Didn't @hal0 show that it was slower? Or is iteration speed the critical path?

@MatthieuDartiailh
Copy link
Member Author

Iteration speed is the critical path. Loki makes updating the solver state about 2x faster on a typical application. You can look at the included benchmark if you want.

@Qix-
Copy link
Contributor

Qix- commented Jul 18, 2019

Ah okay, that's good then :)

@MatthieuDartiailh MatthieuDartiailh merged commit 4858730 into master Dec 11, 2019
opacam added a commit to opacam/python-for-android that referenced this pull request Dec 29, 2019
So we can get an apk via gh-actions

Notes:
  - this way the reviewers can see how the `on device unit test app`
    behaves when some test generates an image or give us an error
  - these changes will be reverted in the next commit
  - we must pin kiwisolver recipe to a commit, or the matplotlib recipe
    it won't pass the build stage, due to a recent changes in kiwisolver
    repo (nucleic/kiwi#55)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants