Skip to content


Subversion checkout URL

You can clone with
Download ZIP
100644 242 lines (197 sloc) 10.1 KB
27494a2 @jdelong Pull from FB rev 63ce89e2f2301e6bba44a111cc7d4218022156f6
jdelong authored
1 `folly/FBvector.h`
2 ------------------
4 Simply replacing `std::vector` with `folly::fbvector` (after
5 having included the `folly/FBVector.h` header file) will
6 improve the performance of your C++ code using vectors with
7 common coding patterns. The improvements are always non-negative,
8 almost always measurable, frequently significant, sometimes
9 dramatic, and occasionally spectacular.
40a24cb Add samples to FBVector documentation
Helios Alonso Cabanillas authored
11 ### Sample
12 ***
14 folly::fbvector<int> numbers({0, 1, 2, 3});
15 numbers.reserve(10);
16 for (int i = 4; i < 10; i++) {
17 numbers.push_back(i * 2);
18 }
19 assert(numbers[6] == 12);
27494a2 @jdelong Pull from FB rev 63ce89e2f2301e6bba44a111cc7d4218022156f6
jdelong authored
21 ### Motivation
22 ***
40a24cb Add samples to FBVector documentation
Helios Alonso Cabanillas authored
24 `std::vector` is the stalwart abstraction many use for
27494a2 @jdelong Pull from FB rev 63ce89e2f2301e6bba44a111cc7d4218022156f6
jdelong authored
25 dynamically-allocated arrays in C++. It is also the best known
26 and most used of all containers. It may therefore seem a
27 surprise that `std::vector` leaves important - and sometimes
28 vital - efficiency opportunities on the table. This document
29 explains how our own drop-in abstraction `fbvector` improves key
30 performance aspects of `std::vector`. Refer to
31 folly/test/FBVectorTest.cpp for a few benchmarks.
33 ### Memory Handling
34 ***
36 It is well known that `std::vector` grows exponentially (at a
37 constant factor) in order to avoid quadratic growth performance.
38 The trick is choosing a good factor (any factor greater than 1
39 ensures O(1) amortized append complexity towards infinity). A
40 factor that's too small causes frequent vector reallocation; one
41 that's too large forces the vector to consume much more memory
42 than needed. The initial HP implementation by Stepanov used a
43 growth factor of 2, i.e. whenever you'd `push_back` into a vector
44 without there being room, it would double the current capacity.
46 With time, other compilers reduced the growth factor to 1.5, but
47 gcc has staunchly used a growth factor of 2. In fact it can be
48 mathematically proven that a growth factor of 2 is rigorously the
49 <i>worst</i> possible because it never allows the vector to reuse
50 any of its previously-allocated memory. That makes the vector cache-
51 unfriendly and memory manager unfriendly.
53 To see why that's the case, consider a large vector of capacity C
54 residing somewhere at the beginning of an initially unoccupied
55 chunk. When the request for growth comes about, the vector
56 (assuming no in-place resizing, see the appropriate section in
57 this document) will allocate a chunk next to its current chunk,
58 copy its existing data, and then deallocate the old chunk. So now
59 we have a chunk of size C followed by a chunk of size k * C.
60 Continuing this process we'll then have a chunk of size k * k * C
61 to the right and so on. That leads to a series of the form (using
62 ^^ for power):
64 C, C*k, C*k^^2, C*k^^3, ...
66 If we choose k = 2 we know that every element in the series will
67 be strictly larger than the sum of all previous ones because of
68 the remarkable equality:
70 1 + 2^^1 + 2^^2 + 2^^3... + 2^^n = 2^^(n+1) - 1
72 What that really means is that the new request for a chunk will
73 be never satisfiable by coalescing all previously-used chunks.
74 This is not quite what you'd want.
76 We would of course want the vector to not crawl forward in
77 memory, but instead to move back to its previously-allocated
78 chunks. Any number smaller than 2 guarantees that you'll be able
79 at some point to reuse the previous chunks. Going through the
80 math reveals the equation:
82 k^^n <= 1 + k + k^^2 + ... + k^^(n-2)
84 If some number n satisfies that equation, it means you can reuse
85 memory after n reallocations. The graphical solver below reveals
86 that choosing k = 1.5 (blue line) allows memory reuse after 4
87 reallocations, choosing k = 1.45 (red line) allows memory reuse
88 after 3 reallocations, and choosing k = 1.3 (black line) allows
89 reuse after only 2 reallocations.
91 ![graphical solutions](./Fbvector--graphical_solutions.png)
93 Of course, the above makes a number of simplifying assumptions
94 about how the memory allocator works, but definitely you don't
95 want to choose the theoretically absolute worst growth factor.
96 `fbvector` uses a growth factor of 1.5. That does not impede good
97 performance at small sizes because of the way `fbvector`
98 cooperates with jemalloc (below).
100 ### The jemalloc Connection
101 ***
103 Virtually all modern allocators allocate memory in fixed-size
104 quanta that are chosen to minimize management overhead while at
105 the same time offering good coverage at low slack. For example, an
106 allocator may choose blocks of doubling size (32, 64, 128,
107 <t_co>, ...) up to 4096, and then blocks of size multiples of a
108 page up until 1MB, and then 512KB increments and so on.
110 As discussed above, `std::vector` also needs to (re)allocate in
111 quanta. The next quantum is usually defined in terms of the
112 current size times the infamous growth constant. Because of this
113 setup, `std::vector` has some slack memory at the end much like
114 an allocated block has some slack memory at the end.
116 It doesn't take a rocket surgeon to figure out that an allocator-
117 aware `std::vector` would be a marriage made in heaven: the
118 vector could directly request blocks of "perfect" size from the
119 allocator so there would be virtually no slack in the allocator.
120 Also, the entire growth strategy could be adjusted to work
121 perfectly with allocator's own block growth strategy. That's
122 exactly what `fbvector` does - it automatically detects the use
123 of jemalloc and adjusts its reallocation strategy accordingly.
125 But wait, there's more. Many memory allocators do not support in-
126 place reallocation, although most of them could. This comes from
127 the now notorious design of `realloc()` to opaquely perform
128 either in-place reallocation or an allocate-memcpy-deallocate
129 cycle. Such lack of control subsequently forced all clib-based
130 allocator designs to avoid in-place reallocation, and that
e275511 @edbaunton Update
edbaunton authored
131 includes C++'s `new` and `std::allocator`. This is a major loss of
27494a2 @jdelong Pull from FB rev 63ce89e2f2301e6bba44a111cc7d4218022156f6
jdelong authored
132 efficiency because an in-place reallocation, being very cheap,
133 may mean a much less aggressive growth strategy. In turn that
134 means less slack memory and faster reallocations.
136 ### Object Relocation
137 ***
139 One particularly sensitive topic about handling C++ values is
140 that they are all conservatively considered <i>non-
141 relocatable</i>. In contrast, a relocatable value would preserve
142 its invariant even if its bits were moved arbitrarily in memory.
143 For example, an `int32` is relocatable because moving its 4 bytes
144 would preserve its actual value, so the address of that value
145 does not "matter" to its integrity.
147 C++'s assumption of non-relocatable values hurts everybody for
148 the benefit of a few questionable designs. The issue is that
149 moving a C++ object "by the book" entails (a) creating a new copy
150 from the existing value; (b) destroying the old value. This is
151 quite vexing and violates common sense; consider this
152 hypothetical conversation between Captain Picard and an
153 incredulous alien:
155 Incredulous Alien: "So, this teleporter, how does it work?"<br>
156 Picard: "It beams people and arbitrary matter from one place to
157 another."<br> Incredulous Alien: "Hmmm... is it safe?"<br>
158 Picard: "Yes, but earlier models were a hassle. They'd clone the
159 person to another location. Then the teleporting chief would have
160 to shoot the original. Ask O'Brien, he was an intern during those
161 times. A bloody mess, that's what it was."
163 Only a tiny minority of objects are genuinely non-relocatable:
165 * Objects that use internal pointers, e.g.:
167 class Ew {
168 char buffer[1024];
169 char * pointerInsideBuffer;
170 public:
171 Ew() : pointerInsideBuffer(buffer) {}
172 ...
8244bf0 Added missing curly braces
Helios Alonso Cabanillas authored
173 }
27494a2 @jdelong Pull from FB rev 63ce89e2f2301e6bba44a111cc7d4218022156f6
jdelong authored
175 * Objects that need to update "observers" that store pointers to them.
177 The first class of designs can always be redone at small or no
178 cost in efficiency. The second class of objects should not be
179 values in the first place - they should be allocated with `new`
180 and manipulated using (smart) pointers. It is highly unusual for
181 a value to have observers that alias pointers to it.
183 Relocatable objects are of high interest to `std::vector` because
184 such knowledge makes insertion into the vector and vector
185 reallocation considerably faster: instead of going to Picard's
186 copy-destroy cycle, relocatable objects can be moved around
187 simply by using `memcpy` or `memmove`. This optimization can
188 yield arbitrarily high wins in efficiency; for example, it
189 transforms `vector< vector<double> >` or `vector< hash_map<int,
190 string> >` from risky liabilities into highly workable
191 compositions.
193 In order to allow fast relocation without risk, `fbvector` uses a
194 trait `folly::IsRelocatable` defined in `"folly/Traits.h"`. By default,
195 `folly::IsRelocatable::value` conservatively yields false. If
196 you know that your type `Widget` is in fact relocatable, go right
197 after `Widget`'s definition and write this:
199 // at global namespace level
200 namespace folly {
40a24cb Add samples to FBVector documentation
Helios Alonso Cabanillas authored
201 struct IsRelocatable<Widget> : boost::true_type {};
27494a2 @jdelong Pull from FB rev 63ce89e2f2301e6bba44a111cc7d4218022156f6
jdelong authored
202 }
204 If you don't do this, `fbvector<Widget>` will fail to compile
207 #### Additional Constraints
209 Similar improvements are possible in presence of a "simple" type
210 - more specifically, one that has a trivial assignment (i.e.
211 assignment is the same as bitblitting the bits over) or a nothrow
212 default constructor. These traits are used gainfully by
213 `fbvector` in a variety of places. Fortunately, these traits are
214 already present in the C++ standard (well, currently in Boost).
215 To summarize, in order to work with `fbvector`, a type `Widget`
216 must pass:
219 IsRelocatable<Widget>::value &&
220 (boost::has_trivial_assign<T>::value || boost::has_nothrow_constructor<T>::value));
222 These traits go hand in hand; for example, it would be very
223 difficult to design a class that satisfies one branch of the
224 conjunction above but not the other. `fbvector` uses these simple
225 constraints to minimize the number of copies made on many common
226 operations such as `push_back`, `insert`, or `resize`.
228 To make it easy for you to state assumptions about a given type
229 or family of parameterized types, check Traits.h and in
230 particular handy family of macros FOLLY_ASSUME_FBVECTOR_COMPATIBLE*.
232 ### Miscellaneous
233 ***
235 `fbvector` uses a careful implementation all around to make
236 sure it doesn't lose efficiency through the cracks. Some future
237 directions may be in improving raw memory copying (`memcpy` is
238 not an intrinsic in gcc and does not work terribly well for
239 large chunks) and in furthering the collaboration with
240 jemalloc. Have fun!
40a24cb Add samples to FBVector documentation
Helios Alonso Cabanillas authored
Something went wrong with that request. Please try again.