Benchmarks

poletti-marco edited this page May 2, 2020 · 19 revisions

Methodology

All the following benchmarks were executed on an AMD Ryzen Threadripper 1950X @3.4GHz (32 virtual cores), on a machine with 32 GB RAM. All runtime benchmarks below are single-threaded (Fruit is not multithreaded, even though it's reentrant and thread-safe), so the number of cores doesn't matter for the runtime benchmarks; it does matter for the compile-time benchmarks.

The following compiler options were used: -std=c++11 -O2 -DNDEBUG. Benchmarks involving Boost.DI used -std=c++14 instead, since Boost.DI requires at least C++14.

The benchmarks were executed on Kubuntu Linux 19.10 with the following compilers:

  • Clang 10.0.0
  • GCC 9.2.1

All values use 95% confidence intervals and have been rounded to 2 significant digits. Each benchmark is repeated at least 3 times, and then repeated further until one of the following conditions is met:

  • 20 runs of the benchmark
  • 2h runtime (for this benchmark alone)
  • the bounds of the confidence interval round to the same number (with 2 significant digits' precision).

When you see e.g. 5.3 in a result it means the confidence interval [5.3, 5.3] (i.e., both bounds rounded to that number, when keeping only 2 significant digits of precision). Proper confidence intervals are written as e.g. 5.2-5.5, meaning the confidence interval [5.2, 5.5].
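
The stopping rule and the rounding described above can be sketched as follows. This is only an illustration: the page does not say how the confidence intervals were computed, so the normal approximation (z = 1.96) and the names `round_sig`, `confidence_interval_95` and `should_stop` are assumptions, not the actual benchmark scripts.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Rounds x to `digits` significant digits (e.g. 5.27 -> 5.3, 572 -> 570).
double round_sig(double x, int digits) {
  assert(digits >= 1);
  if (x == 0.0) return 0.0;
  // Decimal exponent of the last digit that is kept.
  const int exponent =
      static_cast<int>(std::floor(std::log10(std::fabs(x)))) - (digits - 1);
  double scale = 1.0;
  for (int i = 0; i < std::abs(exponent); ++i) scale *= 10.0;
  return exponent >= 0 ? std::round(x / scale) * scale
                       : std::round(x * scale) / scale;
}

struct Interval {
  double lo;
  double hi;
};

// 95% confidence interval for the mean. ASSUMPTION: normal approximation
// with z = 1.96; the real scripts may use a different formula.
Interval confidence_interval_95(const std::vector<double>& samples) {
  assert(samples.size() >= 2);
  const double n = static_cast<double>(samples.size());
  double mean = 0.0;
  for (double s : samples) mean += s;
  mean /= n;
  double squares = 0.0;
  for (double s : samples) squares += (s - mean) * (s - mean);
  const double half = 1.96 * std::sqrt(squares / (n - 1.0)) / std::sqrt(n);
  return {mean - half, mean + half};
}

// At least 3 runs; then stop at 20 runs, at 2 h of runtime, or once both
// bounds round to the same value with 2 significant digits.
bool should_stop(const std::vector<double>& samples, double elapsed_hours) {
  if (samples.size() < 3) return false;
  if (samples.size() >= 20 || elapsed_hours >= 2.0) return true;
  const Interval ci = confidence_interval_95(samples);
  return round_sig(ci.lo, 2) == round_sig(ci.hi, 2);
}
```

With samples such as 5.30, 5.31 and 5.30, both bounds round to 5.3 and the benchmark would stop after 3 runs; with 5.2, 5.3 and 5.4 the bounds round to different values, so more runs would be needed.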

All benchmarks try to simulate what happens in a dummy codebase using dependency injection, to give an overall idea (as opposed to synthetic benchmarks, where a large % difference might be misleading because that part is only a small % of the total run/compile time).

The dummy codebases are defined as follows: 10% of the classes have no dependencies, and 90% have 10 dependencies each (i.e. each of them needs 10 instances of other objects from the injector to be constructed). Unless otherwise specified, there are also interfaces (i.e. pure virtual base classes) for all injected classes. So e.g. for codebases with 100 classes, the dependency graph has 10 classes with no dependencies and 90 classes with 10 dependencies each, for a total of 900 edges between classes; and there are 100 interfaces with 100 interface-implementation edges in the injection graph.
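
As a rough illustration of that shape, the sketch below builds such a graph and counts its edges. The wiring (each non-leaf class depends on the 10 classes just before it) and the names `make_dummy_graph` and `count_edges` are assumptions made for this example; the real generator may connect the classes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// deps[i] lists the indices of the classes that class i needs injected.
using DependencyGraph = std::vector<std::vector<int>>;

// Builds a graph with num_classes / 10 leaf classes and 10 dependencies for
// every other class. ASSUMPTION: class i depends on classes i-10 .. i-1,
// which keeps the graph acyclic.
DependencyGraph make_dummy_graph(int num_classes) {
  assert(num_classes >= 100 && num_classes % 10 == 0);
  const int num_leaves = num_classes / 10;
  DependencyGraph deps(num_classes);
  for (int i = num_leaves; i < num_classes; ++i) {
    for (int d = 1; d <= 10; ++d) deps[i].push_back(i - d);
  }
  return deps;
}

// Total number of class-to-class dependency edges.
std::size_t count_edges(const DependencyGraph& deps) {
  std::size_t edges = 0;
  for (const auto& d : deps) edges += d.size();
  return edges;
}
```

For 100 classes this gives 10 leaf classes and 900 edges, matching the figures above; for 1000 classes it gives 9000 edges.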

In addition to Fruit, the "contestants" for these benchmarks are:

  • boost-experimental/di. This will be referred to as Boost.DI below for conciseness (this is also how the author of that library refers to it). Note, however, that it is not actually part of Boost: the author has pushed for its inclusion for several years, but it has never been accepted. Using "stars on github" as a rough measure of popularity, Boost.DI is the 2nd most popular DI framework, behind Fruit (graph).
  • "Simple DI": a codebase with dependency injection but without using any DI framework. Unlike the others, this codebase uses all concrete classes (instead of using interfaces). All classes are allocated on the stack (instead of e.g. using new). This is included to show the pros/cons of this bare-bones approach compared to other no-DI-framework approaches and to Fruit.
  • "Simple DI w/ interfaces": similar to the previous, but using interfaces. All classes are still allocated on the stack. This is included to see the pros/cons of introducing interfaces (compare the values with the ones for "Simple DI").
  • "Simple DI w/ interfaces and new/delete": similar to the previous, but using new to allocate classes on the heap. This is included to see the pros/cons of allocating on the heap, and the pros/cons of using Fruit instead of not using a DI framework.

The 3 "Simple DI" codebases are meant as successive steps towards the Fruit model, with Fruit as the last step of the progression.
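
To make the three variants concrete, here is a minimal sketch with two classes. `Engine`, `Car` and the function names are hypothetical and only illustrate the wiring style; they are not taken from the benchmark codebases.

```cpp
#include <cassert>

// 1. "Simple DI": concrete classes, wired by hand, living on the stack.
struct Engine {
  int power() const { return 42; }
};
struct Car {
  explicit Car(Engine& engine) : engine(engine) {}
  int power() const { return engine.power(); }
  Engine& engine;
};
int simple_di() {
  Engine engine;
  Car car(engine);
  return car.power();
}

// 2. "Simple DI w/ interfaces": same wiring, through pure virtual bases.
struct EngineInterface {
  virtual ~EngineInterface() = default;
  virtual int power() const = 0;
};
struct CarInterface {
  virtual ~CarInterface() = default;
  virtual int power() const = 0;
};
struct EngineImpl : EngineInterface {
  int power() const override { return 42; }
};
struct CarImpl : CarInterface {
  explicit CarImpl(EngineInterface& engine) : engine(engine) {}
  int power() const override { return engine.power(); }
  EngineInterface& engine;
};
int simple_di_with_interfaces() {
  EngineImpl engine;
  CarImpl car(engine);
  CarInterface& root = car;
  return root.power();
}

// 3. "Simple DI w/ interfaces and new/delete": same classes, on the heap.
int simple_di_with_interfaces_and_new() {
  EngineInterface* engine = new EngineImpl();
  CarInterface* car = new CarImpl(*engine);
  const int result = car->power();
  delete car;  // destroy in reverse order of construction
  delete engine;
  return result;
}

// All three variants build the same object graph, so they must agree.
void check_variants_agree() {
  assert(simple_di() == simple_di_with_interfaces());
  assert(simple_di() == simple_di_with_interfaces_and_new());
}
```

In all three variants the wiring is written by hand in one place, which is the part that a DI framework such as Fruit takes over.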

Full compile time

These benchmarks show the time to compile the codebase from scratch, using make with N+1 jobs (where N is the number of virtual cores available). Note that these benchmarks compile with optimizations (-O2); the compile time without optimization would of course be lower.

Since Fruit does most injection checks at compile-time using template metaprogramming, in some sense a part of it "runs" at compile time too. The same applies to Boost.DI.

| Compile time (Clang) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 6.6-6.7 s | 16 s | 64 s |
| Boost.DI | 17 s | 58 s | 580 s |
| Simple DI | 0.93-0.94 s | 1.9 s | 8.3 s |
| Simple DI w/ interfaces | 0.97-0.98 s | 1.9 s | 6.9 s |
| Simple DI w/ interfaces, new/delete | 2.5 s | 5.8 s | 25 s |

| Compile time (GCC) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 5.1 s | 12 s | 48 s |
| Boost.DI | 18 s | 110 s | N/A |
| Simple DI | 0.73 s | 1.5 s | 6.4 s |
| Simple DI w/ interfaces | 0.82 s | 1.8 s | 8.8-8.9 s |
| Simple DI w/ interfaces, new/delete | 2.1-2.2 s | 5.2-5.3 s | 33 s |

Key takeaways:

  • Adopting Fruit adds about 4-5s of cold compilation time in a codebase with 100 classes and about 40-60s in a codebase with 1000 classes.
  • The slowdown will be additive, so while the relative time difference in the table above is huge, that won't be the case in a real codebase that already takes a long time without Fruit. E.g. in a codebase with 250 injected classes that currently takes 5min to compile, you should expect a cold compile time slowdown on the order of 10-15s, not of 10x.
  • The compilation time with Boost.DI is much higher than with Fruit (even in a small codebase), and the gap grows with the size of the codebase. The slowdown in medium/large codebases could be a non-trivial fraction of the compile time without DI (if not a multiple).
  • The data for the combination Boost.DI+GCC+1000 classes is not available because GCC crashes (AFAICT due to it running out of memory). As shown in the tables below, Boost.DI doesn't scale in terms of compile time memory either.
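
The "additive, not multiplicative" reasoning above can be written out as a small calculation. The function names are made up for this sketch, and the estimate assumes that the overhead measured on the dummy codebase carries over unchanged to a real one.

```cpp
#include <cassert>

// Estimated cold compile time after adopting a DI framework, assuming the
// overhead measured on the dummy codebase simply adds to the current time.
double estimated_compile_seconds(double current_seconds,
                                 double dummy_with_framework_seconds,
                                 double dummy_without_framework_seconds) {
  assert(current_seconds >= 0.0);
  return current_seconds +
         (dummy_with_framework_seconds - dummy_without_framework_seconds);
}

// Slowdown as a fraction of the current compile time.
double relative_slowdown(double current_seconds, double estimated_seconds) {
  assert(current_seconds > 0.0);
  return (estimated_seconds - current_seconds) / current_seconds;
}
```

Taking the Clang figures for 250 classes (16 s with Fruit, 1.9 s with "Simple DI") and a codebase that currently takes 5 min (300 s), the estimate is 300 + (16 - 1.9) = 314.1 s, i.e. a slowdown of under 5%.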

Incremental compile time

The scenario for these benchmarks is as follows: starting from an already-compiled codebase, we touch 5 random files and then re-run make. This is meant to simulate the compilation cost in an edit-rerun cycle, as part of development. Any high values here slow down engineers working on the project, much more than high cold compile times would (since incremental compilations are much more frequent than cold ones).

| Incremental compile time (Clang) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 3.9 s | 4 s | 6.1 s |
| Boost.DI | 16 s | 57 s | 570-580 s |
| Simple DI | 0.84-0.85 s | 1.8 s | 8 s |
| Simple DI w/ interfaces | 0.65-0.66 s | 0.67-0.69 s | 1.9 s |
| Simple DI w/ interfaces, new/delete | 2.2 s | 4.6 s | 20 s |

| Incremental compile time (GCC) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 2.9 s | 3-3.1 s | 5.2 s |
| Boost.DI | 17 s | 110 s | N/A |
| Simple DI | 0.67-0.68 s | 1.5 s | 6.3 s |
| Simple DI w/ interfaces | 0.58-0.6 s | 0.89-0.91 s | 5.2 s |
| Simple DI w/ interfaces, new/delete | 1.9 s | 4.4 s | 29 s |

Key takeaways:

  • Switching to Fruit doesn't significantly increase the incremental compile time; in fact, in larger codebases the incremental compile time is lower with Fruit even when compared with some "no DI framework" approaches. This is because the increased modularity of the codebase means that fewer things need to be recompiled, and because the sizes of the various compilation units are more balanced, instead of having 1 file (main.cpp) that is very slow to compile compared to the rest.
  • As in the previous benchmarks, Boost.DI incremental compile times are huge and increase significantly with the size of the codebase. An incremental compile time overhead of almost 10min for a codebase with 1000 classes would likely slow down development significantly and waste many engineer-hours.

Compile memory

These benchmarks do a cold build of the codebase, but instead of measuring the compilation time they measure the maximum amount of RAM needed by the various steps of the build (including both compilation and linking). This is an important metric because it determines how much RAM the developers working on the project need to use the full compilation speed allowed by their processor, or how much they need to scale down the parallelism to make the compilation fit in the available RAM.

| Compile memory (Clang) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 133 MB | 133 MB | 209 MB |
| Boost.DI | 343 MB | 772 MB | 4196 MB |
| Simple DI | 89 MB | 94 MB | 114 MB |
| Simple DI w/ interfaces | 93 MB | 104 MB | 162 MB |
| Simple DI w/ interfaces, new/delete | 181 MB | 324 MB | 1049 MB |

| Compile memory (GCC) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 152 MB | 152 MB | 228 MB |
| Boost.DI | 572 MB | 1430 MB | 7534 MB |
| Simple DI | 70 MB | 85 MB | 162 MB |
| Simple DI w/ interfaces | 75-76 MB | 104 MB | 286 MB |
| Simple DI w/ interfaces, new/delete | 305 MB | 572 MB | 2193 MB |

Key takeaways:

  • Adding Fruit increases the RAM requirements per process by at most 95MB in the worst comparison (codebase of 1000 classes compiled with Clang, comparing Fruit vs "Simple DI"). This will likely be dwarfed by the amount of additional memory needed to compile your actual code.
  • "Simple DI w/ interfaces" is in the same ballpark as Fruit: Fruit takes somewhat more memory in most configurations, and less in the 1000-class codebase with GCC (228 MB vs 286 MB).
  • "Simple DI w/ interfaces, new/delete" requires a lot of RAM; this is because there's 1 compilation unit (main.cpp) that's much larger than the rest.
  • Boost.DI takes significantly more memory than Fruit (>2.5x even in the smallest codebase) and, as in the previous compile time benchmarks, it doesn't scale: the gap becomes larger with the codebase size, reaching several GBs in the large codebase.
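
To connect these numbers to build parallelism, the sketch below estimates how many make jobs fit in RAM. It pessimistically assumes that every job can reach the measured peak at the same time and ignores memory used by the OS and other processes; `jobs_that_fit` is a name made up for this example.

```cpp
#include <algorithm>
#include <cassert>

// Number of parallel make jobs that fit in RAM if every job may reach the
// measured peak simultaneously (a deliberately pessimistic assumption).
int jobs_that_fit(int virtual_cores, int ram_mb, int peak_mb_per_process) {
  assert(virtual_cores > 0 && ram_mb > 0 && peak_mb_per_process > 0);
  const int wanted = virtual_cores + 1;  // N+1 jobs, as in the benchmarks
  const int affordable = ram_mb / peak_mb_per_process;
  return std::max(1, std::min(wanted, affordable));
}
```

On the benchmark machine (32 virtual cores, 32 GB RAM, taken here as 32768 MB), a 228 MB peak (Fruit, GCC, 1000 classes) leaves room for all 33 jobs, while a 4196 MB peak (Boost.DI, Clang, 1000 classes) would allow only 7 under this assumption.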

Startup time

This is the first of the runtime benchmarks. The scenario is as follows: the main process of the example codebase starts up, creates an injector and injects all classes, then prints "Hello, world!" and terminates.

This is meant to show the overhead of using a DI framework compared to another or to the "Simple DI" approaches (i.e. with no DI framework).

| Startup time (Clang) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 7.3 ms | 6.6 ms | 9.2 ms |
| Boost.DI | 5.7-5.8 ms | 6.2 ms | 8.4 ms |
| Simple DI | 5.4 ms | 6.8 ms | 7.1 ms |
| Simple DI w/ interfaces | 5.4 ms | 6.2 ms | 6.5 ms |
| Simple DI w/ interfaces, new/delete | 5.4 ms | 7 ms | 6.3 ms |

| Startup time (GCC) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 5.8 ms | 6.5 ms | 9.2 ms |
| Boost.DI | 4.5 ms | 5.3 ms | N/A |
| Simple DI | 5.3 ms | 5.9 ms | 5.8 ms |
| Simple DI w/ interfaces | 5.4 ms | 6.4 ms | 6.9 ms |
| Simple DI w/ interfaces, new/delete | 5.4 ms | 6.1 ms | 6.6 ms |

Key takeaways:

  • The Fruit startup time overhead is at most 3.4ms in the least favorable comparison (GCC, 1000 classes, comparing with "Simple DI"), which should be dwarfed by the actual startup time of virtually any real-world application of that size.
  • The overheads here should be additive, not multiplicative. So if you have a server binary that doesn't currently use Fruit and takes 1s to start, adopting Fruit will add around 2-3ms to that, which would be hardly noticeable (and not e.g. 20%).
  • Unlike in the compile time benchmarks, Boost.DI is actually competitive here, up to about 1.6ms faster than Fruit. As with the previous points, unless your binary is extremely fast to start and startup time is critical, that difference should not be an issue.

Time to create additional injectors

What if you want to create many injectors during the lifetime of a process? For example, you might want to create 1 injector per request in an RPC/HTTP server, so that you can store request-specific state in the injected objects while still guaranteeing that there's no interference between requests (and without needing to guard access to this data with locks to allow threads processing different requests to access/modify it).

This is the scenario for this section: after the process has started (paying the startup costs mentioned in the previous section) then we repeatedly create an injector, inject all classes with it and destroy it (serially, there is no parallelism here).

The Fruit codebase in this case uses fruit::NormalizedComponent to pre-compute data at startup. The cost of doing this is similar to (in fact, slightly lower than) the cost of creating an injector, so you can still refer to the startup times in the previous section to get an idea of the startup time overhead of Fruit.

Boost.DI does not offer comparable functionality (at the time of writing), so the example codebase there creates injectors from scratch each time.

| Per-request time (Clang) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 2.5-2.6 μs | 8-8.1 μs | 85 μs |
| Boost.DI | 47-50 μs | 150 μs | 710-740 μs |
| Simple DI | 0.81-0.83 μs | 2.2 μs | 8.2-8.3 μs |
| Simple DI w/ interfaces | 1-1.1 μs | 2.8 μs | 13 μs |
| Simple DI w/ interfaces, new/delete | 2.2 μs | 5.5 μs | 47 μs |

| Per-request time (GCC) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 2.9-3 μs | 12 μs | 99-100 μs |
| Boost.DI | 53-55 μs | 140-150 μs | N/A |
| Simple DI | 0.44 μs | 1.2 μs | 4.6-4.7 μs |
| Simple DI w/ interfaces | 0.65-0.66 μs | 1.7-1.8 μs | 12 μs |
| Simple DI w/ interfaces, new/delete | 2-2.1 μs | 5-5.1 μs | 43 μs |

Key takeaways:

  • Creating a Fruit injector per request is quite cheap: the worst overhead here is about 0.1ms (when comparing Fruit to "Simple DI" with GCC and 1000 classes).
  • This benchmark assumes that all 1000 classes have to be injected for each request; in practice, each request will only need a subset of the classes. You can reduce the cost of using Fruit by using fruit::Provider<> to lazily inject classes. If you use this and you end up injecting only e.g. 100 classes/request on average (even if your entire codebase consists of 1000 classes) you'll probably see a slowdown comparable to the 2-3μs for the codebase with 100 classes.
  • The time increases super-linearly with the number of classes (e.g. when going from 250 classes to 1000, the number of classes to inject is 4x but the time increases by more than 4x). This is because the code for the constructors/destructors of 1000 classes no longer fits in the same level of cache as the code for 250 classes would, causing additional cache misses.
  • Boost.DI is roughly 8-20x slower than Fruit, depending on the compiler and the codebase size.
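
The idea behind lazy injection mentioned above can be illustrated without Fruit. The `LazyProvider` class below is a hypothetical stand-in that defers construction until `get()` is first called; it is not Fruit's implementation, and `ReportGenerator` and `handle_request` are made-up names for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for lazy injection: the object is only constructed
// the first time get() is called. Not Fruit's implementation.
template <typename T>
class LazyProvider {
 public:
  explicit LazyProvider(std::function<std::unique_ptr<T>()> factory)
      : factory_(std::move(factory)) {
    assert(factory_ != nullptr);
  }

  T& get() {
    if (!instance_) instance_ = factory_();
    return *instance_;
  }

 private:
  std::function<std::unique_ptr<T>()> factory_;
  std::unique_ptr<T> instance_;
};

// Hypothetical class that is only needed by some requests; it counts how
// many times it has been constructed.
struct ReportGenerator {
  explicit ReportGenerator(int& constructions) { ++constructions; }
  int pages() const { return 3; }
};

// Hypothetical request handler: the generator is built only when needed.
int handle_request(bool needs_report, LazyProvider<ReportGenerator>& reports) {
  return needs_report ? reports.get().pages() : 0;
}
```

A request that never calls `get()` pays nothing for constructing `ReportGenerator`, which is why the per-request cost tracks the number of classes actually used rather than the size of the whole codebase.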

Executable size

The following tables show the size of the executable generated in the various codebases.

| Executable size (stripped, Clang) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 390 KB | 947 KB | 3710 KB |
| Boost.DI | 576 KB | 1464 KB | 5664 KB |
| Simple DI | 30 KB | 54 KB | 195 KB |
| Simple DI w/ interfaces | 66 KB | 156 KB | 585 KB |
| Simple DI w/ interfaces, new/delete | 70 KB | 166 KB | 625 KB |

| Executable size (stripped, GCC) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 390 KB | 957 KB | 3808 KB |
| Boost.DI | 761 KB | 2050 KB | N/A |
| Simple DI | 34 KB | 70 KB | 253 KB |
| Simple DI w/ interfaces | 98 KB | 224 KB | 869 KB |
| Simple DI w/ interfaces, new/delete | 107 KB | 244 KB | 927 KB |

| Executable size (stripped, no exceptions/RTTI, Clang) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 322 KB | 791 KB | 3125 KB |
| Boost.DI | 458 KB | 1074 KB | 4492 KB |
| Simple DI | 30 KB | 54 KB | 195 KB |
| Simple DI w/ interfaces | 59 KB | 126 KB | 478 KB |
| Simple DI w/ interfaces, new/delete | 59 KB | 126 KB | 488 KB |

| Executable size (stripped, no exceptions/RTTI, GCC) | 100 classes | 250 classes | 1000 classes |
|---|---|---|---|
| Fruit | 302 KB | 732 KB | 2832 KB |
| Boost.DI | 673 KB | 1855 KB | N/A |
| Simple DI | 34 KB | 70 KB | 253 KB |
| Simple DI w/ interfaces | 70 KB | 166 KB | 654 KB |
| Simple DI w/ interfaces, new/delete | 70 KB | 166 KB | 634 KB |

Key takeaways:

  • The executable size overhead of using Fruit is around 3-4KB per injected class. In a real codebase, this will likely be dwarfed by the size of the actual code.
  • As expected, disabling exceptions and RTTI leads to a decrease in the executable size. If you're concerned about executable size, you should probably do this in your release build, while keeping them enabled in debug builds (at least RTTI, which allows Fruit to report richer error messages containing type names and function signatures).
  • The executable size overhead of Boost.DI is about 35-55% higher than Fruit's with Clang, and about 2-2.7x Fruit's with GCC.
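
The per-class figure above can be reproduced from the tables with a one-line calculation. The choice of "Simple DI" as the baseline is an assumption made for this sketch (the text does not say which baseline was used), and `overhead_kb_per_class` is a made-up name.

```cpp
#include <cassert>

// Executable size overhead per injected class, in KB, relative to a
// baseline codebase with the same number of classes.
double overhead_kb_per_class(double framework_kb, double baseline_kb,
                             int num_classes) {
  assert(num_classes > 0);
  return (framework_kb - baseline_kb) / num_classes;
}
```

With the Clang figures (exceptions and RTTI enabled), this gives (390 - 30) / 100 = 3.6 KB per class for the small codebase and (3710 - 195) / 1000 ≈ 3.5 KB per class for the large one.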