
Commit 6a81aba

Author: Koonwen Lee
Commit message: README: touch up
1 parent fbf47cd commit 6a81aba

File tree

1 file changed (+89 -39 lines)


README.md

Lines changed: 89 additions & 39 deletions
@@ -4,11 +4,12 @@
# obatcher
OCaml design framework for building batch-parallel "services". Based
on the ["Concurrent structures made
easy"](https://www.conference-publishing.com/download.php?Event=OOPSLAB24MAIN&Paper=23c3e926862612a7e34e440df3bba1&Version=final)
study. Whilst the paper was written with a focus on batched **data
structures**, the mechanism turns out to generalize to any kind of
"service" that stands to benefit from processing requests in
batches. A "service" here refers to any software component that
exposes a request API.

### Contents
* [Description](#description)
@@ -18,51 +19,66 @@ software that we interact with via API's.
* [Results](#results)

# Description
**obatcher** is primarily a design pattern for building efficient
concurrent services. The key observation is that _"processing a batch
of a priori known operations in parallel is easier than optimising
performance for a stream of arbitrary asynchronous concurrent
requests"._ In other words, our approach works by __separating__ (1)
the processing of requests from (2) their reception in a parallel
environment.

(1) Instead of taking on the hard task of optimizing a fully
concurrent service, we think it is much easier to optimize a service
that takes a batch of operations as input. Additionally, an
**important and necessary invariant** is that:

> only one batch may run at any time.

(2) However, designing services this way is unergonomic for users. It
requires their programs to intentionally and efficiently collect
requests into batches (e.g. an array or list) before handing them off
to the service. **obatcher** solves this by leveraging the scheduler
to automate the collection of batches in an efficient, low-latency
way. From a user's perspective, interfacing with a batched service
after composing over it with **obatcher** looks like interacting with
any ordinary service that handles atomic requests.
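
As a rough illustration of the two styles of interface (the module and
function names below are invented for this sketch and are not
obatcher's actual API), compare an explicitly batched counter with the
implicitly batched one a client ends up using:

```ocaml
(* Sketch only: illustrative signatures, not obatcher's real interface. *)

(* (1) The explicitly batched service: it receives a whole batch of
   a-priori-known operations at once and may process them in parallel,
   relying on the invariant that only one batch runs at any time. *)
module type BATCHED_COUNTER = sig
  type t

  type op =
    | Incr
    | Get of int ref  (* the result is written into the cell *)

  val create : unit -> t
  val run_batch : t -> op array -> unit
end

(* (2) What a client sees after composing the service with
   obatcher-style implicit batching: plain, atomic-looking requests.
   Batches are collected by the scheduler under the hood. *)
module type COUNTER = sig
  type t

  val create : unit -> t
  val incr : t -> unit
  val get : t -> int
end
```

In this sketch, clients simply call `incr`/`get`; concurrent calls are
gathered into an `op array` behind the scenes and handed to
`run_batch` one batch at a time.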

## Benefits of batch-parallel service design
* [Batch optimization & Incremental parallelism](#batch-optimization--incremental-parallelism)
* [Picos scheduler swappable](#picos-scheduler-swappable)
* [Thread-safety](#thread-safety)
* [Easy to test and reason about](#easy-to-test-and-reason-about)

### Batch optimization & Incremental parallelism
The benefit of taking a batch of operations as input is that it
potentially enables optimizations based on the spread of
operations. Furthermore, pre-analysis of the operations can better
inform the parallel strategy employed by the service. Another neat
advantage of batched requests is that parallelism can be added
incrementally, across operations that are known to be independent,
rather than having to guarantee safety across all incoming concurrent
operations to the service.
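
For instance, a hypothetical batched key-value service (illustrative
code, not part of obatcher) could pre-scan the batch to separate the
mutually independent lookups from the updates, and parallelise only
the former:

```ocaml
(* Hypothetical batched handler (not from obatcher): pre-analyse the
   batch, then handle the operations known to be independent first. *)
type 'v op =
  | Lookup of string * 'v option ref  (* the result is written to the cell *)
  | Insert of string * 'v

let run_batch (tbl : (string, 'v) Hashtbl.t) (batch : 'v op array) =
  (* Pre-analysis: split the batch by operation kind. *)
  let lookups, inserts =
    Array.fold_left
      (fun (ls, is) op ->
         match op with
         | Lookup _ -> (op :: ls, is)
         | Insert _ -> (ls, op :: is))
      ([], []) batch
  in
  (* Lookups never conflict with one another, so they are safe to fan
     out across domains or fibers; kept sequential here for brevity. *)
  List.iter
    (function
      | Lookup (k, cell) -> cell := Hashtbl.find_opt tbl k
      | Insert _ -> ())
    lookups;
  (* Updates are applied afterwards, sequentially, for simplicity. *)
  List.iter
    (function
      | Insert (k, v) -> Hashtbl.replace tbl k v
      | Lookup _ -> ())
    inserts
```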

### Picos scheduler swappable
**obatcher** depends on scheduler primitives to enable its implicit
batching transformation on services. As a consequence, it suffers from
portability issues across different schedulers. To account for this,
**obatcher** is built on top of
[picos](https://www.github.com/polytipic/picos). Picos provides the
low-level building blocks for writing schedulers. By using the same
picos primitives to implement **obatcher**, any picos scheduler is
also compatible with **obatcher**.

### Thread-safety
A defining invariant of batched services is that only **a single batch
of operations runs at any time**. To guarantee this, **obatcher** adds
an efficient lock-free queue in front of the service to collect
operations into batches before submitting them to the service. This
design takes inspiration from
[**Flat-combining**](https://people.csail.mit.edu/shanir/publications/Flat%20Combining%20SPAA%2010.pdf). The
research shows that this synchronization method scales better than
employing naive coarse-grained locking.

### Easy to test and reason about
Because services only handle single batches at any time, this makes it
@@ -208,10 +224,44 @@ batch runs at a time. Synchronization of parallel operations are
performed upon entry to the BPDS. Requests that are made when the BPDS
is busy are sent to form a batch that runs next.

# Notes

## Request Latency
A frequent question about **obatcher** is:

> "Does obatcher wait for a certain number of operations to queue up
> before handing them off to the service?"

The underlying question here is "How do batches form?". **obatcher**
does not wait for requests to accumulate before launching a batch. The
design is such that any incoming request tries to launch a batch
immediately. If a batch is already in-flight, the request instead
joins the next batch, because of the single-batch-in-flight
invariant. In other words, batches only form while another batch is
being processed, which means you can expect your request to be
serviced promptly.
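
The sketch below shows one way that launch logic could look. This is
illustrative code, not obatcher's implementation: a mutex-protected
`Queue` and an `Atomic` flag stand in for the real lock-free queue and
picos primitives, and the plumbing that returns results to callers is
omitted.

```ocaml
(* Simplified sketch of batch formation; not obatcher's actual code. *)
type 'op t = {
  queue : 'op Queue.t;           (* pending requests for the next batch *)
  mutex : Mutex.t;
  running : bool Atomic.t;       (* is a batch currently in flight? *)
  run_batch : 'op array -> unit; (* the explicitly batched service *)
}

let create run_batch =
  { queue = Queue.create (); mutex = Mutex.create ();
    running = Atomic.make false; run_batch }

let submit t op =
  Mutex.protect t.mutex (fun () -> Queue.push op t.queue);
  (* Try to launch a batch immediately. If one is already in flight,
     this request simply waits in the queue and joins the next batch. *)
  let rec try_launch () =
    if Atomic.compare_and_set t.running false true then begin
      let batch =
        Mutex.protect t.mutex (fun () ->
            let ops = Array.of_seq (Queue.to_seq t.queue) in
            Queue.clear t.queue;
            ops)
      in
      if Array.length batch > 0 then t.run_batch batch;
      Atomic.set t.running false;
      (* Requests may have queued up while the batch ran: launch again. *)
      if not (Mutex.protect t.mutex (fun () -> Queue.is_empty t.queue))
      then try_launch ()
    end
  in
  try_launch ()
```

Whichever request wins the compare-and-set becomes the one that runs
the batch; every other request's operation simply waits in the queue
for the next launch.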

## Backpressure
A convenient feature of services following the **obatcher** design pattern is
that there is a quick way to tell how well your batch-processing
implementation holds up against your expected rate of incoming concurrent
requests: watch the number of requests in each batch over time. Ideally, the
batch size increases and then plateaus at some point. This indicates that the
batch-processing mechanism eventually settles at a fixed point where it
throttles at the optimal batch size, which happens when the throughput of
request processing matches the rate of incoming requests. Conceptually this
works because the more requests there are in a batch, the higher the overall
throughput. The general rule of thumb is therefore: if your batch sizes just
increase monotonically, the batch-processing implementation needs some work;
if they settle at some batch size, you're golden and can work toward bringing
that number down further; and if you consistently get a batch size of 1, your
request rate might not have been large enough to require batch processing in
the first place!
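
One lightweight way to collect this signal (purely illustrative;
obatcher does not ship this helper) is to wrap your batched entry
point so it logs the size of every batch it receives:

```ocaml
(* Illustrative instrumentation: log each batch size so you can check
   whether the sizes grow and then plateau under load. *)
let with_batch_size_log run_batch batch =
  Printf.eprintf "batch size: %d\n%!" (Array.length batch);
  run_batch batch
```

Plotting those numbers over a load test shows directly whether the
batch size plateaus, grows without bound, or stays stuck at 1.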

## Benchmarks
Our approach here is new, with unfortunately few benchmarks. However,
**obatcher** is designed almost identically to the one described in
["Concurrent structures made
easy"](https://www.conference-publishing.com/download.php?Event=OOPSLAB24MAIN&Paper=23c3e926862612a7e34e440df3bba1&Version=final).
You can see from the results in the paper that batched structures scale much
better than those where threads race for mutual exclusion.
