
Commit 555f956

8484 Implement aggregate sum and use for arc counters
In pursuit of improving performance on multi-core systems, we should implement fanned-out counters and use them to improve the performance of some of the ARC statistics. These stats are updated extremely frequently and can consume a significant amount of CPU time.

Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Paul Dagnelie <pcd@delphix.com>
1 parent 04d53b9 commit 555f956

9 files changed (+543, -65 lines)
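The commit message above uses the term "fanned-out counters" without defining it. The snippet below is only an illustration and is not code from this commit. In its simplest form, a fanned-out counter splits one hot counter into per-CPU buckets, each on its own cache line. Writers update only their own bucket, and readers sum all of them.

The following are assumptions of this sketch:
- The bucket count.
- The 64-byte cache-line size.
- The caller-supplied `cpu_hint`, which stands in for the kernel's `CPU_SEQID`.

All `fc_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define	FC_NBUCKETS	8	/* hypothetical; a kernel would use the CPU count */
#define	FC_CACHE_LINE	64	/* assumed cache line size in bytes */

/* Each bucket occupies its own cache line, so writers do not false-share. */
typedef struct {
	_Alignas(FC_CACHE_LINE) _Atomic int64_t fb_value;
} fc_bucket_t;

typedef struct {
	fc_bucket_t fc_buckets[FC_NBUCKETS];
} fanout_counter_t;

static void
fc_init(fanout_counter_t *fc)
{
	for (int i = 0; i < FC_NBUCKETS; i++)
		atomic_init(&fc->fc_buckets[i].fb_value, 0);
}

/* Writers touch only their own bucket; cpu_hint stands in for CPU_SEQID. */
static void
fc_add(fanout_counter_t *fc, unsigned int cpu_hint, int64_t delta)
{
	atomic_fetch_add_explicit(&fc->fc_buckets[cpu_hint % FC_NBUCKETS].fb_value,
	    delta, memory_order_relaxed);
}

/* Readers pay the cost instead: every bucket must be visited. */
static int64_t
fc_read(fanout_counter_t *fc)
{
	int64_t sum = 0;

	for (int i = 0; i < FC_NBUCKETS; i++)
		sum += atomic_load_explicit(&fc->fc_buckets[i].fb_value,
		    memory_order_relaxed);
	return (sum);
}
```

Under concurrent writers, `fc_read()` is not an atomic snapshot. The aggsum in this commit refines the idea by keeping global lower and upper bounds, so many reads can be answered without visiting the buckets at all.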

uts/common/Makefile.files

Lines changed: 2 additions & 0 deletions
@@ -1339,12 +1339,14 @@ LUA_OBJS += \

 ZFS_COMMON_OBJS += \
 	abd.o \
+	aggsum.o \
 	arc.o \
 	blkptr.o \
 	bplist.o \
 	bpobj.o \
 	bptree.o \
 	bqueue.o \
+	cityhash.o \
 	dbuf.o \
 	ddt.o \
 	ddt_zap.o \
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
Copyright (c) 2011 Google, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
CITYHASH CHECKSUM FUNCTIONALITY IN ZFS

uts/common/fs/zfs/aggsum.c

Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
/*
 * CDDL HEADER START
 *
 * This file and its contents are supplied under the terms of the
 * Common Development and Distribution License ("CDDL"), version 1.0.
 * You may only use this file in accordance with the terms of version
 * 1.0 of the CDDL.
 *
 * A full copy of the text of the CDDL should have accompanied this
 * source. A copy of the CDDL is also available via the Internet at
 * http://www.illumos.org/license/CDDL.
 *
 * CDDL HEADER END
 */
/*
 * Copyright (c) 2017 by Delphix. All rights reserved.
 */

#include <sys/zfs_context.h>
#include <sys/aggsum.h>

/*
 * Aggregate-sum counters are a form of fanned-out counter, used when atomic
 * instructions on a single field cause enough CPU cache line contention to
 * slow system performance. Due to their increased overhead and the expense
 * involved with precisely reading from them, they should only be used in cases
 * where the write rate (increment/decrement) is much higher than the read rate
 * (get value).
 *
 * Aggregate sum counters are composed of two basic parts, the core and the
 * buckets. The core counter contains a lock for the entire counter, as well
 * as the current upper and lower bounds on the value of the counter. The
 * aggsum_bucket structure contains a per-bucket lock to protect the contents of
 * the bucket, the current amount that this bucket has changed from the global
 * counter (called the delta), and the amount of increment and decrement we have
 * "borrowed" from the core counter.
 *
 * The basic operation of an aggsum is simple. Threads that wish to modify the
 * counter will modify one bucket's counter (determined by their current CPU, to
 * help minimize lock and cache contention). If the bucket already has
 * sufficient capacity borrowed from the core structure to handle their request,
 * they simply modify the delta and return. If the bucket does not, we clear
 * the bucket's current state (to prevent the borrowed amounts from getting too
 * large), and borrow more from the core counter. Borrowing is done by adding
 * the borrowed amount to the upper bound and subtracting it from the lower
 * bound of the core counter, and setting the bucket's borrowed value to that
 * amount. Clearing the bucket is the opposite; we add the current delta to
 * both the lower and upper bounds of the core counter, subtract the borrowed
 * amount from the upper bound, and add it to the lower bound. Note that only
 * borrowing and clearing require access to the core counter; since all other
 * operations access CPU-local resources, performance can be much higher than
 * with a traditional counter.
 *
 * Threads that wish to read from the counter have a slightly more challenging
 * task. It is fast to determine the upper and lower bounds of the aggsum; this
 * does not require grabbing any locks. This suffices for cases where an
 * approximation of the aggsum's value is acceptable. However, if one needs to
 * know whether some specific value is above or below the current value in the
 * aggsum, one invokes aggsum_compare(). This function operates by repeatedly
 * comparing the target value to the upper and lower bounds of the aggsum, and
 * then clearing a bucket. This proceeds until the target is outside of the
 * upper and lower bounds and we return a response, or the last bucket has been
 * cleared and we know that the target is equal to the aggsum's value. Finally,
 * the most expensive operation is determining the precise value of the aggsum.
 * To do this, we clear every bucket and then return the upper bound (which must
 * be equal to the lower bound). What makes aggsum_compare() and aggsum_value()
 * expensive is clearing buckets. This involves grabbing the global lock
 * (serializing these readers against each other and against borrow
 * operations), grabbing a bucket's lock (preventing threads on that CPU from
 * modifying their delta), and zeroing out the borrowed value (forcing that
 * thread to borrow on its next request, which will also be expensive). This is
 * what makes aggsums well suited for write-many read-rarely operations.
 */

/*
 * We will borrow aggsum_borrow_multiplier times the current request, so we will
 * have to acquire as_lock approximately once every aggsum_borrow_multiplier
 * calls to aggsum_add().
 */
static uint_t aggsum_borrow_multiplier = 10;

void
aggsum_init(aggsum_t *as, uint64_t value)
{
	bzero(as, sizeof (*as));
	as->as_lower_bound = as->as_upper_bound = value;
	mutex_init(&as->as_lock, NULL, MUTEX_DEFAULT, NULL);
	as->as_numbuckets = boot_ncpus;
	as->as_buckets = kmem_zalloc(boot_ncpus * sizeof (aggsum_bucket_t),
	    KM_SLEEP);
	for (int i = 0; i < as->as_numbuckets; i++) {
		mutex_init(&as->as_buckets[i].asc_lock,
		    NULL, MUTEX_DEFAULT, NULL);
	}
}

void
aggsum_fini(aggsum_t *as)
{
	for (int i = 0; i < as->as_numbuckets; i++)
		mutex_destroy(&as->as_buckets[i].asc_lock);
	/* Release the bucket array allocated by aggsum_init(). */
	kmem_free(as->as_buckets,
	    as->as_numbuckets * sizeof (aggsum_bucket_t));
	mutex_destroy(&as->as_lock);
}

int64_t
aggsum_lower_bound(aggsum_t *as)
{
	return (as->as_lower_bound);
}

int64_t
aggsum_upper_bound(aggsum_t *as)
{
	return (as->as_upper_bound);
}

static void
aggsum_flush_bucket(aggsum_t *as, struct aggsum_bucket *asb)
{
	ASSERT(MUTEX_HELD(&as->as_lock));
	ASSERT(MUTEX_HELD(&asb->asc_lock));

	/*
	 * We use atomic instructions for this because we read the upper and
	 * lower bounds without the lock, so we need stores to be atomic.
	 */
	atomic_add_64((volatile uint64_t *)&as->as_lower_bound, asb->asc_delta);
	atomic_add_64((volatile uint64_t *)&as->as_upper_bound, asb->asc_delta);
	asb->asc_delta = 0;
	atomic_add_64((volatile uint64_t *)&as->as_upper_bound,
	    -asb->asc_borrowed);
	atomic_add_64((volatile uint64_t *)&as->as_lower_bound,
	    asb->asc_borrowed);
	asb->asc_borrowed = 0;
}

uint64_t
aggsum_value(aggsum_t *as)
{
	int64_t rv;

	mutex_enter(&as->as_lock);
	if (as->as_lower_bound == as->as_upper_bound) {
		rv = as->as_lower_bound;
		for (int i = 0; i < as->as_numbuckets; i++) {
			ASSERT0(as->as_buckets[i].asc_delta);
			ASSERT0(as->as_buckets[i].asc_borrowed);
		}
		mutex_exit(&as->as_lock);
		return (rv);
	}
	for (int i = 0; i < as->as_numbuckets; i++) {
		struct aggsum_bucket *asb = &as->as_buckets[i];
		mutex_enter(&asb->asc_lock);
		aggsum_flush_bucket(as, asb);
		mutex_exit(&asb->asc_lock);
	}
	VERIFY3U(as->as_lower_bound, ==, as->as_upper_bound);
	rv = as->as_lower_bound;
	mutex_exit(&as->as_lock);

	return (rv);
}

static void
aggsum_borrow(aggsum_t *as, int64_t delta, struct aggsum_bucket *asb)
{
	int64_t abs_delta = (delta < 0 ? -delta : delta);
	mutex_enter(&as->as_lock);
	mutex_enter(&asb->asc_lock);

	aggsum_flush_bucket(as, asb);

	atomic_add_64((volatile uint64_t *)&as->as_upper_bound, abs_delta);
	atomic_add_64((volatile uint64_t *)&as->as_lower_bound, -abs_delta);
	asb->asc_borrowed = abs_delta;

	mutex_exit(&asb->asc_lock);
	mutex_exit(&as->as_lock);
}

void
aggsum_add(aggsum_t *as, int64_t delta)
{
	struct aggsum_bucket *asb =
	    &as->as_buckets[CPU_SEQID % as->as_numbuckets];

	for (;;) {
		mutex_enter(&asb->asc_lock);
		if (asb->asc_delta + delta <= (int64_t)asb->asc_borrowed &&
		    asb->asc_delta + delta >= -(int64_t)asb->asc_borrowed) {
			asb->asc_delta += delta;
			mutex_exit(&asb->asc_lock);
			return;
		}
		mutex_exit(&asb->asc_lock);
		aggsum_borrow(as, delta * aggsum_borrow_multiplier, asb);
	}
}

/*
 * Compare the aggsum value to target efficiently. Returns -1 if the value
 * represented by the aggsum is less than target, 1 if it's greater, and 0 if
 * they are equal.
 */
int
aggsum_compare(aggsum_t *as, uint64_t target)
{
	if (as->as_upper_bound < target)
		return (-1);
	if (as->as_lower_bound > target)
		return (1);
	mutex_enter(&as->as_lock);
	for (int i = 0; i < as->as_numbuckets; i++) {
		struct aggsum_bucket *asb = &as->as_buckets[i];
		mutex_enter(&asb->asc_lock);
		aggsum_flush_bucket(as, asb);
		mutex_exit(&asb->asc_lock);
		if (as->as_upper_bound < target) {
			mutex_exit(&as->as_lock);
			return (-1);
		}
		if (as->as_lower_bound > target) {
			mutex_exit(&as->as_lock);
			return (1);
		}
	}
	VERIFY3U(as->as_lower_bound, ==, as->as_upper_bound);
	ASSERT3U(as->as_lower_bound, ==, target);
	mutex_exit(&as->as_lock);
	return (0);
}
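The borrow/clear bookkeeping described in the comment at the top of aggsum.c can be tried outside the kernel. The following is a user-space sketch of the same scheme, not part of the commit. It uses POSIX mutexes and C11 atomics in place of `mutex_enter()` and `atomic_add_64()`.

The following are assumptions:
- The `sketch_*` names are hypothetical.
- The caller-supplied `cpu_hint` stands in for `CPU_SEQID`.
- The comparison target is signed throughout.

The sketch keeps the invariant the design relies on. The core bounds equal the flushed value plus or minus the total borrowed slack, and each bucket's delta stays within its own slack, so lower <= true value <= upper at all times.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define	SKETCH_BORROW_MULTIPLIER	10	/* plays aggsum_borrow_multiplier */

typedef struct {
	pthread_mutex_t b_lock;
	int64_t b_delta;	/* change not yet folded into the core bounds */
	int64_t b_borrowed;	/* slack reserved on each side of the core */
} sketch_bucket_t;

typedef struct {
	pthread_mutex_t s_lock;			/* serializes borrows and flushes */
	_Atomic int64_t s_lower, s_upper;	/* read without s_lock */
	int s_nbuckets;
	sketch_bucket_t *s_buckets;
} sketch_aggsum_t;

static void
sketch_init(sketch_aggsum_t *as, int64_t value, int nbuckets)
{
	pthread_mutex_init(&as->s_lock, NULL);
	atomic_init(&as->s_lower, value);
	atomic_init(&as->s_upper, value);
	as->s_nbuckets = nbuckets;
	as->s_buckets = calloc(nbuckets, sizeof (sketch_bucket_t));
	assert(as->s_buckets != NULL);
	for (int i = 0; i < nbuckets; i++)
		pthread_mutex_init(&as->s_buckets[i].b_lock, NULL);
}

/* Fold a bucket back into the core bounds; caller holds both locks. */
static void
sketch_flush(sketch_aggsum_t *as, sketch_bucket_t *b)
{
	atomic_fetch_add(&as->s_lower, b->b_delta + b->b_borrowed);
	atomic_fetch_add(&as->s_upper, b->b_delta - b->b_borrowed);
	b->b_delta = b->b_borrowed = 0;
}

/* cpu_hint (assumed non-negative) stands in for CPU_SEQID. */
static void
sketch_add(sketch_aggsum_t *as, int cpu_hint, int64_t delta)
{
	sketch_bucket_t *b = &as->s_buckets[cpu_hint % as->s_nbuckets];
	int64_t amt = (delta < 0 ? -delta : delta) * SKETCH_BORROW_MULTIPLIER;

	for (;;) {
		pthread_mutex_lock(&b->b_lock);
		int64_t nd = b->b_delta + delta;
		if (nd <= b->b_borrowed && nd >= -b->b_borrowed) {
			b->b_delta = nd;	/* fast path: bucket-local only */
			pthread_mutex_unlock(&b->b_lock);
			return;
		}
		pthread_mutex_unlock(&b->b_lock);
		/* Slow path: flush, then widen both core bounds by amt. */
		pthread_mutex_lock(&as->s_lock);
		pthread_mutex_lock(&b->b_lock);
		sketch_flush(as, b);
		atomic_fetch_add(&as->s_upper, amt);
		atomic_fetch_sub(&as->s_lower, amt);
		b->b_borrowed = amt;
		pthread_mutex_unlock(&b->b_lock);
		pthread_mutex_unlock(&as->s_lock);
	}
}

/* Returns -1, 0 or 1 as the counter is <, == or > target. */
static int
sketch_compare(sketch_aggsum_t *as, int64_t target)
{
	if (atomic_load(&as->s_upper) < target)	/* lockless fast path */
		return (-1);
	if (atomic_load(&as->s_lower) > target)
		return (1);
	int rv = 0;
	pthread_mutex_lock(&as->s_lock);
	for (int i = 0; i < as->s_nbuckets && rv == 0; i++) {
		pthread_mutex_lock(&as->s_buckets[i].b_lock);
		sketch_flush(as, &as->s_buckets[i]);
		pthread_mutex_unlock(&as->s_buckets[i].b_lock);
		if (atomic_load(&as->s_upper) < target)
			rv = -1;
		else if (atomic_load(&as->s_lower) > target)
			rv = 1;
	}
	pthread_mutex_unlock(&as->s_lock);
	return (rv);
}
```

Suppose the sketch starts at 100 and adds +1 a hundred times to one bucket. With the multiplier at 10, the slow path runs on calls 1, 11, 21, ..., 91, so the lock is taken about once every ten calls, as the comment on aggsum_borrow_multiplier describes. Afterwards the bounds are [180, 200], which brackets the true value of 200. An exact read in the style of aggsum_value() is the same flush loop without the early exit.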
