# obatcher

OCaml design framework for building batch-parallel "services". Based
on the ["Concurrent structures made
easy"](https://www.conference-publishing.com/download.php?Event=OOPSLAB24MAIN&Paper=23c3e926862612a7e34e440df3bba1&Version=final)
study. Whilst the paper was written with a focus on batched **data
structures**, we found that the mechanism generalizes to any type of
"service" that stands to benefit from processing requests in batches.
A "service" here refers to any software component that exposes a
request API.

### Contents
* [Description](#description)
* [Results](#results)

# Description
**obatcher** is primarily a design pattern for building efficient
concurrent services. The key observation is that _"processing a batch
of a priori known operations in parallel is easier than optimising
performance for a stream of arbitrary asynchronous concurrent
requests"._ In other words, our approach works by __separating__ (1)
the processing of requests from (2) their reception in a parallel
environment.

(1) Instead of tackling the hard task of optimizing a fully concurrent
service, we think it is much easier to optimize a service that takes a
batch of operations as input. Additionally, an **important and
necessary invariant** is that:

> only 1 batch may run at any time.

(2) However, designing services this way is unergonomic for users. It
requires their programs to intentionally and efficiently collect
requests into batches (e.g. an array or list) before handing them off
to the service. **obatcher** solves this by leveraging the scheduler
to automate the collection of batches in an efficient and low-latency
way. From a usage perspective, interfacing with a batched service
composed with **obatcher** looks like interacting with any ordinary
service that handles atomic requests.
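
As a rough sketch of what an explicitly batched service looks like (the names
and signatures below are illustrative and may not match the library's actual
interface), consider a counter whose `run` function receives a whole batch of
operations at once:

```ocaml
(* A sketch of an explicitly batched counter. *)
module Batched_counter = struct
  type t = int ref

  (* Each operation is indexed by the type of its result. *)
  type 'a op =
    | Incr : unit op
    | Get : int op

  (* An operation paired with a callback used to deliver its result. *)
  type wrapped_op = Mk : 'a op * ('a -> unit) -> wrapped_op

  let init () : t = ref 0

  (* Process a whole batch at once. Only one batch runs at any time, so
     no locking is needed inside [run]. *)
  let run (t : t) (batch : wrapped_op array) : unit =
    (* Count how many increments are in this batch and apply them in one go. *)
    let incrs = ref 0 in
    Array.iter
      (fun (Mk (op, _)) -> match op with Incr -> incr incrs | Get -> ())
      batch;
    t := !t + !incrs;
    (* Deliver results to every waiting request. *)
    Array.iter
      (fun (Mk (op, return)) ->
        match op with
        | Incr -> return ()
        | Get -> return !t)
      batch
end
```

The role of **obatcher** is then to collect the concurrent `Incr`/`Get`
requests issued by many fibers into the `wrapped_op array` handed to `run`,
so that callers never have to build batches themselves.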

## Benefits of batch-parallel service design
* [Batch optimization & Incremental parallelism](#batch-optimization--incremental-parallelism)
* [Picos scheduler swappable](#picos-scheduler-swappable)
* [Thread-safety](#thread-safety)
* [Easy to test and reason about](#easy-to-test-and-reason-about)

### Batch optimization & Incremental parallelism
The benefit of taking a batch of operations as input is that it
potentially enables optimizations based on the spread of
operations. Furthermore, pre-analysis of the operations can better
inform the parallel strategy employed by the service. Another neat
advantage of batched requests is that parallelism can be added
incrementally across operations that are "known" to be independent,
rather than having to guarantee safety across all incoming concurrent
operations to the service.
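
As a purely illustrative sketch of this idea (none of the names below are
part of this library's API), consider a batched, put-only key-value store
that groups a batch by shard and processes each shard in parallel, since
operations on different shards are known to be independent. A real
**obatcher** service would parallelize through the scheduler rather than
raw domains.

```ocaml
(* Illustrative sketch only: a batched put-only store that exploits the
   fact that operations on different shards are independent. *)
module Sharded_store = struct
  type t = (string, int) Hashtbl.t array  (* one hash table per shard *)

  type op = Put of string * int

  let init ~shards () : t = Array.init shards (fun _ -> Hashtbl.create 16)

  let shard_of t key = Hashtbl.hash key mod Array.length t

  (* Pre-analysis: group the batch by shard, then run each shard's
     sub-batch in parallel. No locks are needed: the single in-flight
     batch invariant rules out interference from outside the batch, and
     the grouping rules out interference within it. *)
  let run (t : t) (batch : op array) : unit =
    let groups = Array.map (fun _ -> ref []) t in
    Array.iter
      (fun (Put (key, _) as op) ->
        let i = shard_of t key in
        groups.(i) := op :: !(groups.(i)))
      batch;
    let workers =
      Array.mapi
        (fun i ops ->
          Domain.spawn (fun () ->
            List.iter (fun (Put (k, v)) -> Hashtbl.replace t.(i) k v)
              (List.rev !ops)))
        groups
    in
    Array.iter Domain.join workers
end
```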

### Picos scheduler swappable
**obatcher** depends on scheduler primitives to enable the implicit
batching transformation on services. As a consequence, it would
ordinarily suffer from portability issues across different
schedulers. To account for this, **obatcher** is built on top of
[picos](https://www.github.com/polytipic/picos). Picos provides the
low-level building blocks for writing schedulers. By using the same
picos primitives to implement **obatcher**, any picos scheduler is
also compatible with **obatcher**.

### Thread-safety
A defining invariant of batched services is that only **a single batch
of operations runs at any time**. To guarantee this, **obatcher** adds
an efficient lock-free queue in front of the service to collect
operations into batches before submitting them to the service. This
design takes inspiration from
[**Flat-combining**](https://people.csail.mit.edu/shanir/publications/Flat%20Combining%20SPAA%2010.pdf). The
research shows that this synchronization method has better scaling
properties than naive coarse-grained locking.

### Easy to test and reason about
Because services only handle single batches at any time, this makes it

batch runs at a time. Synchronization of parallel operations are
performed upon entry to the BPDS. Requests that are made when the BPDS
is busy are sent to form a batch that runs next.

# Notes

## Request Latency
A frequent question about **obatcher** is:

> "Does obatcher wait for a certain number of operations to queue up
> before handing them off to the service?"

The underlying question here is "How do batches form?". **obatcher**
does not wait for requests to accumulate before launching a batch. The
design is such that any incoming request tries to launch a batch
immediately. If a batch is already in-flight, the request instead
joins the next batch, because of the single batch in-flight
invariant. In other words, batches only form while another batch is
being processed, which means you can expect your request to be
serviced promptly.
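
For intuition, here is a minimal sketch of that launch logic. The names are
hypothetical and a mutex-protected queue is used for clarity; the actual
implementation is lock-free, built on picos, and returns results to callers
asynchronously instead of blocking the submitter.

```ocaml
(* Minimal sketch of how batches form: launch immediately if nothing is
   in flight, otherwise join the next batch. *)
type 'req batcher = {
  pending : 'req Queue.t;          (* requests waiting for the next batch *)
  mutable in_flight : bool;        (* is a batch currently running? *)
  lock : Mutex.t;
  run_batch : 'req array -> unit;  (* the explicitly batched service *)
}

let submit t req =
  Mutex.lock t.lock;
  Queue.push req t.pending;
  if t.in_flight then
    (* A batch is running: this request simply joins the next batch. *)
    Mutex.unlock t.lock
  else begin
    (* Nothing in flight: launch a batch immediately with whatever is
       queued, and keep draining until the queue is empty. *)
    t.in_flight <- true;
    let rec drain () =
      let batch = Array.of_seq (Queue.to_seq t.pending) in
      Queue.clear t.pending;
      Mutex.unlock t.lock;
      t.run_batch batch;
      Mutex.lock t.lock;
      if Queue.is_empty t.pending then t.in_flight <- false else drain ()
    in
    drain ();
    Mutex.unlock t.lock
  end
```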

## Backpressure
A convenient feature of services following the **obatcher** design pattern is
that there is a quick way to tell how efficient your underlying batch
processing mechanism is relative to the expected rate of incoming concurrent
requests. Ideally, the number of requests in each batch should increase over
time and then plateau. This indicates that the batch processing mechanism
eventually settles at a fixed point where it throttles at the optimal batch
size, which happens when the throughput of request processing matches the rate
of incoming requests. Conceptually this works because the more requests there
are in a batch, the higher the overall throughput. The general rule of thumb
is therefore: if your batch sizes just increase monotonically, the batch
processing implementation needs some work; if they settle at some batch size,
you're golden and can work toward bringing that number down further; and if
you're consistently getting a batch size of 1, your request rate might not be
large enough to have required batch processing in the first place!
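
One simple way to observe this trend (sketched below; not an API provided by
the library) is to record the size of every batch your service processes and
inspect it afterwards:

```ocaml
(* Sketch: wrap your service's batch-processing entry point so that it
   records the size of each batch it handles. *)
let batch_sizes : int list ref = ref []

let run_with_stats run state batch =
  batch_sizes := Array.length batch :: !batch_sizes;
  run state batch

(* Later, e.g. at the end of a benchmark run: *)
let report () =
  List.iter (Printf.printf "batch size: %d\n") (List.rev !batch_sizes)
```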

## Benchmarks
Our approach here is new and unfortunately has few benchmarks. However,
**obatcher** is designed almost identically to the one described in
["Concurrent structures made
easy"](https://www.conference-publishing.com/download.php?Event=OOPSLAB24MAIN&Paper=23c3e926862612a7e34e440df3bba1&Version=final).
You can see from the results in the paper that batched structures scale much
better than those where threads race for mutual exclusion.