Slower than Catch in realistic test cases #14

Closed
wichtounet opened this Issue May 30, 2016 · 19 comments


@wichtounet
wichtounet commented May 30, 2016 edited

Hi,

I did some benchmarking of doctest on my machine (Linux, gcc-4.9.3). I adapted your Python script a bit to work on Linux (with_gcc = 1 and removed the forced CMake generator).

First, I can confirm that in the simple benchmark case, it is indeed faster to compile than Catch by a large margin. However, this is not a very realistic test case.

I have modified the benchmark to generate 50 files, each with 25 test cases and each with 10 CHECK (using a variable and not two constants). Here are the results:

Catch: 1 minute 56 seconds
Doctest: 2 minutes 32 seconds

In a realistic test case, doctest is significantly slower than Catch. I see the same thing when I replace Catch with doctest in one of the large test suites of my ETL library.

Do you have any idea why this happens?

I can only guess that the overhead comes from the templates used to extract lhs and rhs or by the macros generating the test cases.

@wichtounet wichtounet changed the title from Slower than Catch for Multiple Test-Cases with Multiple REQUIREs to Slower than Catch in realistic test cases May 30, 2016
@onqtam
Owner
onqtam commented May 30, 2016 edited

It should be because of the macros and code bloat - currently the CHECK/REQUIRE macros generate more code than they should - this is a planned optimisation - and when I fix this - it should be on-par with Catch for heavy usage as well - but making it any faster is maybe impossible.

see the comparison in code generated by the macros - doctest versus catch

many doctest internals will be reimplemented - I just rushed this release and at least the interface should be stable.

A step further for even faster compile times would be to introduce macros like CHECK_EQUALS(x, y) where there is no expression decomposition and fewer templates - a blog post on this matter regarding Catch can be read here (and actually you are the author :D).
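To make the difference concrete, here is a minimal sketch of the two approaches. All names (Decomposer, Lhs, Result, SKETCH_CHECK, SKETCH_CHECK_EQUALS, check_equals) are made up for illustration and are not the real doctest or Catch internals; the sketch only shows one common way such macros can be built.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical names throughout; this is not doctest's or Catch's real code.
struct Result {
    bool passed;
    std::string text; // stringified "lhs == rhs"
};

// --- Path 1: expression decomposition ---
// `Decomposer() << a == b` parses as `(Decomposer() << a) == b`,
// because operator<< binds tighter than operator==.
template <typename L>
struct Lhs {
    const L& value;

    template <typename R>
    Result operator==(const R& rhs) const {
        std::ostringstream os;
        os << value << " == " << rhs;
        return Result{value == rhs, os.str()};
    }
};

struct Decomposer {
    template <typename L>
    Lhs<L> operator<<(const L& lhs) const {
        return Lhs<L>{lhs};
    }
};

#define SKETCH_CHECK(expr) (Decomposer() << expr)

// --- Path 2: dedicated binary check, no decomposition ---
// Both sides are passed explicitly, so only one function template is needed.
template <typename L, typename R>
Result check_equals(const L& lhs, const R& rhs) {
    std::ostringstream os;
    os << lhs << " == " << rhs;
    return Result{lhs == rhs, os.str()};
}

#define SKETCH_CHECK_EQUALS(a, b) check_equals((a), (b))
```

Both paths produce the same result and the same message; the second one just instantiates fewer templates per assertion (one function instead of an operator<<, a wrapper type and an operator==), which is where the compile time saving would come from.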

@wichtounet

> It should be because of the macros and code bloat - currently the CHECK/REQUIRE macros generate more code than they should - this is a planned optimisation - and when I fix this - it should be on-par with Catch for heavy usage as well - but making it any faster is maybe impossible.

You're probably right, a bit of encapsulation would save some compilation time if this block of code is not compiled for every assertion.

> A step further for even faster compile times would be to introduce macros like CHECK_EQUALS(x, y) where there is no expression decomposition and fewer templates - a blog post on this matter regarding Catch can be read here (and actually you are the author :D).

This is me indeed :) I was very surprised to find such a large improvement.
CHECK_EQUALS and a few variants would probably be the best solution to save the maximum compilation time. That would also mean more variants to maintain on the library side.

Thanks for the time invested so far and congratulations on the very fast include times :)

@wichtounet

If you are interested, I have been able to reduce the compilation time even further: http://baptiste-wicht.com/posts/2016/06/reduce-compilation-time-by-another-16-with-catch.html

I have slimmed down the REQUIRE_EQUALS macro even further for another 16% reduction in compilation time. That is a total reduction of 27%, from the original compilation time of 764 seconds down to 554 seconds.

@onqtam
Owner
onqtam commented Jun 1, 2016

I will probably supply such macros in a separate header as an extension - but before that I will optimize the current macros

@onqtam onqtam added a commit that referenced this issue Jun 7, 2016
@onqtam reworked the expression decomposition macros (and the rest) for faster compile times and less code bloat (not benchmarked yet) - relates #14
1c0562f
@onqtam
Owner
onqtam commented Jun 7, 2016 edited

100 files
25 test cases
10 assertions in each test case

variables and assertions like this:

int a0 = 5;
int b0 = 6;
CHECK(a0 == b0);
int a1 = 5;
int b1 = 6;
CHECK(a1 == b1);
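For anyone who wants to reproduce this layout, here is a rough sketch of a generator for one such source file. It is only a C++ stand-in for the benchmark script, not the script itself; the function name, the include path and the test case names are assumptions.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds the text of one benchmark source file.
// The include path and the test case names are assumptions for illustration.
std::string make_benchmark_source(int file_index, int test_cases, int assertions) {
    assert(test_cases >= 0 && assertions >= 0);
    std::ostringstream out;
    out << "#include \"doctest.h\"\n\n";
    for (int t = 0; t < test_cases; ++t) {
        out << "TEST_CASE(\"file " << file_index << " case " << t << "\") {\n";
        for (int i = 0; i < assertions; ++i) {
            // Variables instead of constants, one pair per assertion.
            out << "    int a" << i << " = 5;\n";
            out << "    int b" << i << " = 6;\n";
            out << "    CHECK(a" << i << " == b" << i << ");\n";
        }
        out << "}\n\n";
    }
    return out.str();
}
```

Calling make_benchmark_source(n, 25, 10) for n from 0 to 99 and writing each string to its own .cpp file gives the 100 files / 25 test cases / 10 assertions setup described above.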

MSVC Debug - 13 sec (old was 19 sec)
MSVC Release - 21.7 sec (old was 38 sec)

MinGW-w64 GCC 5.3.1 Debug - 47 sec (old was 100 sec)
MinGW-w64 GCC 5.3.1 Release - 150 sec (old was 330 sec)

Catch:

MSVC Debug - 62 sec
MSVC Release - 123 sec

So the dev version of doctest should now be quite a bit faster than its previous version (by around 40% for MSVC and 53% for MinGW-w64 gcc 5.3.1) - even though Catch is slower than both the old and the new version of doctest according to my benchmarks - but I suspect they still aren't realistic enough.
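For reference, the percentages can be recomputed from the timings listed above with a small helper (percent_reduction is a made-up name, this is just the arithmetic):

```cpp
#include <cassert>

// Percentage by which the compile time dropped going from old_seconds to new_seconds.
double percent_reduction(double old_seconds, double new_seconds) {
    assert(old_seconds > 0.0);
    return 100.0 * (old_seconds - new_seconds) / old_seconds;
}
```

With the numbers above: MSVC Debug 19 -> 13 sec is about 31.6%, MSVC Release 38 -> 21.7 sec is about 42.9%, MinGW-w64 Debug 100 -> 47 sec is 53% and MinGW-w64 Release 330 -> 150 sec is about 54.5%.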

The single header is now assembled from its 2 parts - doctest/parts/doctest_fwd.h and doctest/parts/doctest_impl.h. To get optimal compile time performance include the fwd version everywhere and the impl version only where the test runner should be instantiated. But for these benchmarks here I used the single header and not the parts (it would have maybe saved 1-2ms per source file so not much difference there...).

I'm going to make a header - doctest/parts/doctest_fast.h with fast versions of the macros just like your fast_catch.hpp header - for equality, less than, greater than, etc.

But I'm not sure if I should circumvent the Approx class like you have done - if I don't use that class I will have to include <cmath> for abs() etc.

Also you don't handle the case where the expression throws - so in your version of the macros you have cut away more than just the expression decomposition - I think I will keep the try/catch blocks though.

All this is in the dev branch - I will make a 1.1 release soon (in the upcoming weeks) and this will go into it.

Thanks for pushing me to optimize this - it was on the roadmap but I only suspected it was slower - didn't have a clue how much.

What do you think? Test it in your codebase and report :)

@onqtam onqtam added a commit that referenced this issue Jun 8, 2016
@onqtam related to #14 - added a header with a specialized macro for equality
removed WARN assertions from the assertion count
c4ff1ca
@onqtam
Owner
onqtam commented Jun 8, 2016

Sooo I made the first set of macros in the doctest/doctest/parts/doctest_fast.h header - WARN_EQ/CHECK_EQ/REQUIRE_EQ

and here are some compile time benchmarks with MSVC 2015 in Debug (no other compilers/modes tested):

500 files
25 test cases per file
10 assertions per case

93 sec (expression decomposition) vs 35 sec (fast macros) - but the majority of the time in the slow case is spent in linking

1 file
250 test cases in that file
100 assertions per case

48 sec (expression decomposition) vs 4.6 sec (fast macros) - majority of time spent NOT in linking - but in compiling!!!

And I even retained all the functionality of the macros - they break into the debugger and the REQUIRE_EQ macro ends the test case on failure! I could have made it even more optimal but I don't want to cut away these features.

Give it a try and report what happens in the real world :)

I will make macros also for less than/greater than/etc. soon.

@onqtam onqtam added a commit that referenced this issue Jun 8, 2016
@onqtam - removed the _fast header and put its contents into the main doctest header

- implemented the rest of the comparison fast macros (and also the unary ones)

fixed #14
0457fca
@onqtam
Owner
onqtam commented Jun 8, 2016

I ended up removing the doctest_fast.h header and now the fast macros are in the main doctest header - see here

@wichtounet

This sounds great! I'm on vacation now, but I'll test this on my codebase next week.

@wichtounet
wichtounet commented Jun 14, 2016 edited

I've been able to test the last version directly on my codebase. I made it so I just have to define a macro to switch between Catch and doctest.

Here are the timings:

  • Catch: ~650s
  • Catch (fast macros): ~460s
  • Doctest: ~440s
  • Doctest (stripped macros): ~410s

(The timings are not the same as in my blog post because I did this experiment with fewer features enabled)

So, it's faster than Catch in all cases! Even compared to my very minimal Catch macros, it's faster and has more features (since you retained many more features than I did).

The stripped macros are stripped so that they contain only the call to fast_assert and fast_assert_unary. Not sure if it's worth doing for me and certainly not worth doing in the library itself.

I had to make one change though. In fast_assert_unary, you are taking the expression by reference and then converting it to bool inside. The problem with that is that the expression is odr-used, and in my case this leads to undefined references at link time for fields that should not be odr-used. The same problem could occur in fast_assert, but I don't compare binary things that cannot be odr-used in my case. I simply made a new version of fast_assert with a bool value as parameter. Is it really important to do the conversion to bool inside fast_assert rather than simply taking a bool parameter?

Thanks for this nice update!

@onqtam
Owner
onqtam commented Jun 14, 2016

well if fast_assert_unary takes a bool - then the try/catch block where it is evaluated will go out of the function and into the macro - leading to more code bloat.

about the odr-used stuff - can you give an example? I'm a bit brain-dead at the moment

@wichtounet
wichtounet commented Jun 14, 2016 edited

> well if fast_assert_unary takes a bool - then the try/catch block where it is evaluated will go out of the function and into the macro - leading to more code bloat.

But only the conversion to bool happens at that point, not the evaluation of the expression itself, right?

If you have a function f(x) that returns a bool, then in REQUIRE_UNARY(f(22)) the call f(22) is not evaluated inside the try/catch block, but only inside DOCTEST_FAST_ASSERTION_UNARY. Only the resulting value is then passed to fast_assert_unary, and therefore the try/catch should not catch anything. The only thing that could be caught by your try/catch is an exception thrown by the conversion of the passed value to bool, and in most cases the value will simply be a bool. Unless I'm missing something big.
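A small sketch of that point, with hypothetical stand-ins (these are not the real fast_assert_unary or the real macros): function arguments are evaluated before the function body runs, so a try/catch inside the function never sees an exception thrown while evaluating the argument.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins; not doctest's real functions or macros.
bool throwing_predicate(int) {
    throw std::runtime_error("thrown while evaluating the expression");
}

// try/catch inside the function: the argument has already been evaluated
// by the caller when the body starts, so there is nothing left to catch.
bool guarded_inside_function(bool value) {
    try {
        return value;
    } catch (...) {
        return false;
    }
}

// try/catch around the evaluation itself, the way macro-expanded code can do it.
#define SKETCH_GUARDED_IN_MACRO(expr) \
    ([&]() -> bool { try { return static_cast<bool>(expr); } catch (...) { return false; } }())

// Returns true if the exception escaped the function-level guard.
bool exception_escapes_function_guard() {
    try {
        guarded_inside_function(throwing_predicate(22));
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Here exception_escapes_function_guard() returns true, while SKETCH_GUARDED_IN_MACRO(throwing_predicate(22)) swallows the exception and yields false. Catching exceptions from the expression therefore requires the try/catch to live in the macro expansion, which is exactly the extra code per assertion being discussed.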

> about the odr-used stuff - can you give an example? I'm a bit brain-dead at the moment

Yes. Imagine you are testing type_traits:

template<typename T>
struct type_traits {
    static constexpr const bool is_bounded = false;
};

somewhere:

    REQUIRE_UNARY(type_traits<double>::is_bounded);
    REQUIRE_UNARY(type_traits<float>::is_bounded);

In this case, there is an undefined reference because is_bounded is "used" (a reference is bound to it), so the compiler cannot simply substitute its value and a definition has to be present. This is not a huge issue, since in most cases the definition will be available. This is mostly a corner case and I can deal with it myself I guess.
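A minimal sketch of the two parameter styles, reusing the type_traits example from above (check_unary_by_value and check_unary_by_reference are made-up names, not doctest functions):

```cpp
#include <cassert>

template <typename T>
struct type_traits {
    // Declared in the class, with no out-of-class definition.
    static constexpr bool is_bounded = false;
};

// Takes the value: reading a constant-expression member like this is not an
// odr-use, so no definition of is_bounded is needed.
inline bool check_unary_by_value(bool value) {
    return value;
}

// Takes a reference: binding it to type_traits<T>::is_bounded odr-uses the
// member. Before C++17 that requires an out-of-class definition, otherwise
// the linker may report an undefined reference. Since C++17, static constexpr
// data members are implicitly inline and the problem does not arise.
template <typename T>
bool check_unary_by_reference(const T& value) {
    return static_cast<bool>(value);
}
```

check_unary_by_value(type_traits<double>::is_bounded) links fine in any standard mode, whereas the by-reference version with the same argument is the one that can fail to link under C++11/14 (typically in unoptimized builds).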

@onqtam
Owner
onqtam commented Jun 14, 2016 edited

damn - I completely missed both of these things...
I'll have to think on this.

I think I'll end up making 2 sets of assertions - those that can catch exceptions (in the macro) and those that can't (the fast ones).

And maybe the user will switch between them with a define, or they will be separate macros - perhaps CHECK_EQ and CHECK_EQ_FAST...

and the fast versions of the unary macros will take just a bool and not a T& - which will avoid the reference creation - but about the _EQ/_GT and other binary macros ... I'll think about it.

Thanks again for pointing out these issues to me - and also that the original macros were so slow! I do care about this.

@wichtounet

> damn - I completely missed both of these things...
> I'll have to think on this.

Sorry to give you so much work ;)

As for the reference problem, it's probably secondary. The first problem is more important for people who care about exceptions.

@onqtam
Owner
onqtam commented Sep 20, 2016 edited

after 3+ months of radio silence - I've fixed all the new assertions!

check out the benchmarks!

roughly said:

doctest 1.1 (which will be released today probably) is around 3 times faster than version 1.0 (released on 2016.05.22) when expression decomposing macros like CHECK(a==b) are used.

Additionally I've added normal binary macros that don't do any expression decomposition - CHECK_EQ(a,b) - and they are around 20% faster than CHECK(a==b). These macros capture exceptions thrown while evaluating the expression (and its left and right sides) - just like CHECK(a==b).

Additionally I've added faster normal binary macros that don't do any expression decomposition - FAST_CHECK_EQ(a,b) - and they are around 30-70% faster than CHECK_EQ(a,b)! The difference is that these macros don't have a try/catch block so no exceptions are caught.

Additionally I've added the DOCTEST_CONFIG_SUPER_FAST_ASSERTS identifier which makes the fast asserts even faster by another 35-80%! The difference is that when such super fast assertions fail - if the debugger is attached it will not break on the source line where the assertion was written but in a function inside doctest - so the user will have to go 1 level up in the callstack to see where the failing assertion actually is.

I've also added the DOCTEST_CONFIG_ASSERTION_PARAMETERS_BY_VALUE config identifier - when used, all expressions are copied instead of bound to const& - so no ODR usage - like you've requested.

All macros evaluate the expressions only once. All macros stringify properly the values when the assertion fails.

There are also unary assertions - like FAST_CHECK_UNARY(expr)

I've also added benchmarks with gcc/clang under linux.

I assume working with FAST_CHECK_EQ is tedious so I will write a section in the FAQ on how to have shorter user defined aliases for the macros (it will be something like the alternative macros example)

Let me know what you think! :) I think I've reached the limits of compile time for assertion macros.

@wichtounet

That sounds so awesome!

As soon as the version 1.1 is released, I'll update my code and do the measurements again and make a post on my blog. I'll keep you up-to-date with the results.

@onqtam onqtam added a commit that referenced this issue Sep 21, 2016
@onqtam reworked the expression decomposition macros (and the rest) for faster compile times and less code bloat (not benchmarked yet) - relates #14
f2ea97b
@onqtam onqtam added a commit that referenced this issue Sep 21, 2016
@onqtam related to #14 - added a header with a specialized macro for equality
removed WARN assertions from the assertion count
a6bcda2
@onqtam onqtam added a commit that closed this issue Sep 21, 2016
@onqtam - removed the _fast header and put its contents into the main doctest header

- implemented the rest of the comparison fast macros (and also the unary ones)

fixed #14
8deb092
@onqtam onqtam closed this in 8deb092 Sep 21, 2016
@onqtam
Owner
onqtam commented Sep 21, 2016

A blog post would be much appreciated - any kind of publicity for this framework is much needed!

damn rebasing... I should really learn git.

@wichtounet

I ran all the tests. The results are available on my blog: http://baptiste-wicht.com/posts/2016/09/blazing-fast-unit-test-compilation-with-doctest-11.html

For reference, here are the final results I got (on my expression templates framework):

Configuration        Time (s)
Catch                724.22
Fast Catch macros    464.52
doctest 1.0          871.54
doctest 1.1          614.67
REQUIRE_EQ           493.97
FAST_REQUIRE_EQ      439.09
SUPER_FAST_ASSERTS   411.11

Those are pretty good results, I think.

@onqtam
Owner
onqtam commented Sep 21, 2016

damn - nothing compared to my synthetic benchmarks :D but still - my fastest macros are just a single function call - just like the ones you've made for Catch. I should make a note in the documentation that for real use cases the results are not that spectacular :D

@wichtounet

I think it's still great. Here, we are still compiling more than 1000 test cases with several thousand assertions. Moreover, this is a template-heavy library so we can expect some lengthy compilation times. If we could compute the real compilation time of the test cases without the test (:P), then we could compare the real difference between Catch and doctest.
