# Rubin Alert Distribution Design
Spencer Nelson
27 Jul 2020
swnelson@uw.edu
## Short version
1. Our alert packets are large (about 100 KB each) and carry all the data we have in every message.
2. This makes the alert stream hard to scale, which forces us to distribute it to a small audience.
3. We could separate notification from data and have a better system.
## Current design
The alert pipeline sends alert packets to a Kafka cluster.
A select group of users (~5-15) get read access to that Kafka cluster. They are
responsible for rebroadcasting the alerts for broader community access.
Alert packets have metadata, "raw" science data, and calculated values.
## Current design: alert packet size
Alerts are about 100 KB each, dominated by cutout images, which hold raw science data.
For example, here is the field-by-field breakdown of a sample 127 KB alert:
```
field                   | size (bytes)
------------------------+--------------
alertId                 |            9
diaSource               |          546
diaObject               |          677
18 prvDiaSources        |        9,829
28 prvDiaForcedSources  |        1,347
cutoutDifference        |       57,604
cutoutTemplate          |       57,604
```
So, roughly
- 1% metadata and calculated measurements
- 9% history
- 90% cutouts
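The rough split can be checked directly from the table. This is a minimal sketch in Python; the grouping of fields into the three categories is an assumption inferred from the percentages, not something stated in the alert schema.

```python
# Field sizes in bytes, copied from the sample alert table.
FIELD_SIZES = {
    "alertId": 9,
    "diaSource": 546,
    "diaObject": 677,
    "prvDiaSources": 9_829,
    "prvDiaForcedSources": 1_347,
    "cutoutDifference": 57_604,
    "cutoutTemplate": 57_604,
}

# Assumed grouping of fields into the three categories.
GROUPS = {
    "metadata and measurements": ["alertId", "diaSource", "diaObject"],
    "history": ["prvDiaSources", "prvDiaForcedSources"],
    "cutouts": ["cutoutDifference", "cutoutTemplate"],
}


def size_breakdown(sizes, groups):
    """Return (total_bytes, {group_name: fraction_of_total})."""
    total = sum(sizes.values())
    fractions = {
        name: sum(sizes[field] for field in fields) / total
        for name, fields in groups.items()
    }
    return total, fractions


total, fractions = size_breakdown(FIELD_SIZES, GROUPS)
print(f"total: {total:,} bytes")
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

The total comes to 127,616 bytes, and the cutouts alone account for a little over 90% of it.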
## Current design: stream size
We publish the stream of all alerts. Each alert packet is about 100 KB, and there are about 10,000 alerts per exposure.
Most alerts are published in a brief flood when processing completes, so the peak is perhaps 2,000 alerts per second over 5 seconds.
At 100 KB × 2,000 alerts/s = 200 MB/s, we need about 1.6 Gbps of peak outbound bandwidth per recipient.
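The peak bandwidth estimate is simple arithmetic, sketched here in Python so the inputs are easy to vary. Decimal units are assumed (1 KB = 1,000 bytes), and the constant names are only for illustration.

```python
ALERT_BYTES = 100_000          # about 100 KB per alert packet
ALERTS_PER_EXPOSURE = 10_000   # about 10,000 alerts per exposure
BURST_SECONDS = 5              # most alerts arrive in a ~5 second flood


def peak_bandwidth_bps(alert_bytes, alerts, seconds):
    """Bits per second needed to deliver `alerts` packets within `seconds`."""
    return alert_bytes * 8 * alerts / seconds


peak_bps = peak_bandwidth_bps(ALERT_BYTES, ALERTS_PER_EXPOSURE, BURST_SECONDS)
print(f"peak per recipient: {peak_bps / 1e9:.1f} Gbps")
```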
## Problem with the current design
Distribution and notification are conflated.
Letting everyone know that there's something worth checking out is different
from sending everyone data.
Conflating the two makes bandwidth the primary constraint for filter
authors, and bandwidth is expensive.
It raises time-to-notify for rapid alerts, since recipients must read through all
the science data to reach the next alert.
It makes the system hard to scale or optimize: bulk data delivery and rapid delivery
must be served from the same hardware.
## Separating alert distribution from notification
We could send alert packets that are much smaller but include a URL pointing
to a location that holds the full data.
The "alert stream" gets much smaller:
```
{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_1.fits", "h1": 90122.9, "h2": "..."}
{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_2.fits", "h1": 71259.9, "h2": "..."}
{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_3.fits", "h1": 12409.2, "h2": "..."}
```
Eric Bellm has concocted a plausible minimal payload. It's just 100 bytes per
alert!
## Client behavior
Clients consume the alert stream and fetch the full blob
of data separately _if they actually care_.
They can trivially filter out most events that they don't care about.
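As a rough illustration of this filter-then-fetch pattern, here is a sketch in Python. The `wanted` predicate, the threshold on `h1`, and the `fake_fetch` stand-in are hypothetical; the `image` and `h1` keys simply follow the sample records in the stream example, and a real client would fetch over HTTP.

```python
import json


def select_and_fetch(lines, wanted, fetch):
    """Yield (notification, data) only for notifications that pass `wanted`."""
    for line in lines:
        notification = json.loads(line)
        if not wanted(notification):
            continue  # cheap rejection: no bulk data is transferred
        yield notification, fetch(notification["image"])


# The sample notifications from the stream example.
stream = [
    '{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_1.fits", "h1": 90122.9, "h2": "..."}',
    '{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_2.fits", "h1": 71259.9, "h2": "..."}',
    '{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_3.fits", "h1": 12409.2, "h2": "..."}',
]

fetched_urls = []


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP GET; records the URL it was asked for."""
    fetched_urls.append(url)
    return b""


# Hypothetical filter: only care about alerts whose h1 header exceeds 80,000.
results = list(select_and_fetch(stream, lambda n: n["h1"] > 80_000, fake_fetch))
print(f"fetched {len(fetched_urls)} of {len(stream)} alerts")
```

Only the first sample record passes this filter, so only one of the three full alerts is downloaded.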
We can rate-limit clients' HTTP calls to fetch more data to manage cost fairly.
Community brokers (if they exist) are then groups with really high rate limits.
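One common way to implement per-client rate limits is a token bucket. The sketch below is in Python and is only an illustration of the technique, not a chosen design; the class, the tier names, and the rates in `RATE_LIMITS` are all hypothetical.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self):
        """Return True and consume a token if a request may proceed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical tiers, in requests per second.
RATE_LIMITS = {"individual": 10, "community_broker": 270}

# Demonstration with a fake clock so the behavior is deterministic.
now = [0.0]
bucket = TokenBucket(rate=2, burst=2, clock=lambda: now[0])
first_three = [bucket.allow() for _ in range(3)]  # burst of 2, third is refused
now[0] += 0.5                                     # half a second refills one token
after_wait = bucket.allow()
print(first_three, after_wait)
```

Under this scheme a community broker is just a client whose bucket has a much higher `rate`.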
Augmenting the alert stream with new information becomes easier. Just add a new
header, and keep the reference back to the original alert data - no re-upload
required.
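A minimal sketch of such an augmentation in Python, assuming notifications are flat key/value records like the samples in the stream example. The `augment` helper and the `classifier_score` header are hypothetical.

```python
def augment(notification, **new_headers):
    """Return a copy of the notification with extra headers added.

    The reference to the original alert data (the "image" URL) is kept
    as-is, so no bulk data is copied or re-uploaded.
    """
    clash = notification.keys() & new_headers.keys()
    if clash:
        raise ValueError(f"would overwrite existing keys: {sorted(clash)}")
    return {**notification, **new_headers}


original = {
    "image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_1.fits",
    "h1": 90122.9,
}
# Hypothetical header added by a downstream annotator.
augmented = augment(original, classifier_score=0.97)
print(augmented)
```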
## We might be able to make the stream public
If each alert is just 200 bytes (100 bytes of data + 100 bytes of URLs), then
the peak outbound bandwidth of notifications is probably about 3Mbps per
recipient of the stream. That's small! We could handle thousands of recipients
with a single 10Gbps NIC.
We'd need to rate limit access to the other 99,800 bytes of data. This can be
tuned to our actual bandwidth requirements.
Permitting 270 requests/sec lets a user fetch all 10k alerts from an exposure in
about 37s, so they can keep up (albeit with up to ~35s latency). At ~100 KB per
request, this is equivalent to about 220 Mbps.
We could easily support 100 recipients with a 25Gbps bandwidth limit:
- 100 * 3 Mbps = 0.3Gbps for notifications
- 100 * 220 Mbps = 22Gbps for data
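The budget can be reproduced with a few lines of Python. This sketch uses unrounded figures, so the results land slightly below the rounded numbers in the list; decimal units are assumed and the constant names are only for illustration.

```python
NOTIFICATION_BYTES = 200       # 100 bytes of data + 100 bytes of URLs
FULL_ALERT_BYTES = 100_000     # about 100 KB per full alert
PEAK_ALERTS_PER_SEC = 2_000    # peak notification rate
ALERTS_PER_EXPOSURE = 10_000
FETCH_RATE_LIMIT = 270         # permitted data requests per second per user
RECIPIENTS = 100

# Per-recipient bandwidth for the notification stream at peak.
notify_bps = NOTIFICATION_BYTES * 8 * PEAK_ALERTS_PER_SEC

# Per-recipient bandwidth for rate-limited fetches of the remaining data.
data_bytes = FULL_ALERT_BYTES - NOTIFICATION_BYTES
data_bps = data_bytes * 8 * FETCH_RATE_LIMIT

total_bps = RECIPIENTS * (notify_bps + data_bps)

# Time for one user to fetch every alert from a single exposure.
catch_up_seconds = ALERTS_PER_EXPOSURE / FETCH_RATE_LIMIT

print(f"notifications: {notify_bps / 1e6:.1f} Mbps per recipient")
print(f"data: {data_bps / 1e6:.0f} Mbps per recipient")
print(f"total for {RECIPIENTS} recipients: {total_bps / 1e9:.1f} Gbps")
print(f"time to fetch one exposure: {catch_up_seconds:.0f} s")
```

With these inputs the total is about 21.9 Gbps, which fits inside the 25 Gbps limit.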
## Latency impact is negligible if consumers are picky
Fetching the data requires one extra round-trip, which typically adds about
200ms of latency compared with using the same stream to notify and transmit data.
End-to-end latency may be *lower* in practice because this permits very high
parallelism. Consumers can fan out handling of many messages concurrently; a
combined stream requires a bulky copy for that fanout which is harder to build
infrastructure around.
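A sketch of that fan-out in Python, using a thread pool from the standard library. The `slow_fetch` function is a hypothetical stand-in that sleeps for 50 ms instead of making a network call, and the worker count is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=16):
    """Fetch every URL concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def slow_fetch(url):
    """Hypothetical stand-in for an HTTP GET with ~50 ms of latency."""
    time.sleep(0.05)
    return len(url)


urls = [
    f"https://ls.st/alerts/2021/07/27/exposure_1/alert_{i}.fits"
    for i in range(1, 17)
]

start = time.perf_counter()
sizes = fetch_all(urls, slow_fetch)
elapsed = time.perf_counter() - start
print(f"fetched {len(sizes)} alerts in {elapsed:.2f} s")
```

Fetched one at a time, the 16 requests would take at least 0.8 s; fetched concurrently, they overlap, so the total is close to the latency of a single request.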
## On the other hand...
Rate limits might frustrate users.
Identifying the right balance of headers will take some effort.
We'd have to run two systems - data and notification - instead of one
combined system.
Referential integrity could be hard to maintain. Perfect synchronization of the
data and notification systems is probably impossible.
## Incremental approach
We can try these ideas out without disrupting the existing plan: provide a
"skinny form" of the alerts alongside the fat form.
This could be implemented as a consumer of the Kafka topic (or the other way
around, Kafka on top of the skinny stream).
This would approximately double the hardware requirements and take extra
development effort to build and operate the service two ways.
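The skinny form could be derived from each fat alert by a small transformation, sketched here in Python. Everything in it is an assumption for illustration: the `to_skinny` helper, the choice of `alertId` as the only header, and the `image` key (borrowed from the sample records). It also assumes the full alert has already been stored at `data_url`.

```python
import json


def to_skinny(alert, data_url, header_fields=("alertId",)):
    """Build a small notification that points at the full alert data."""
    skinny = {name: alert[name] for name in header_fields if name in alert}
    skinny["image"] = data_url
    return skinny


# Placeholder fat alert: field names follow the sample table, contents are dummy.
fat_alert = {
    "alertId": 1,
    "cutoutDifference": b"\x00" * 57_604,
    "cutoutTemplate": b"\x00" * 57_604,
}
url = "https://ls.st/alerts/2021/07/27/exposure_1/alert_1.fits"

skinny = to_skinny(fat_alert, url)
encoded = json.dumps(skinny).encode("utf-8")
print(f"skinny notification: {len(encoded)} bytes")
```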
## Conclusions
Putting both data and notifications about data on the same channel is clumsy and
difficult. We might be able to provide a more useful data stream to more people
if we separate them.