alert_stream_design.slide

# Rubin Alert Distribution Design

Spencer Nelson
27 Jul 2020
swnelson@uw.edu


## Short version

1. Our alert packets are really large, and contain all the data we have in each message.
2. This makes the alert stream hard to scale, which forces us to distribute it to a small audience.
3. We could separate notification from data and have a better system.

## Current design

The alert pipeline sends alert packets to a Kafka cluster.

A select group of users (~5-15) get read access to that Kafka cluster. They are
responsible for rebroadcasting the alerts for broader community access.

Alert packets have metadata, "raw" science data, and calculated values.

## Current design: alert packet size

Alerts are about 100 KB, dominated by cutout images which hold raw science data.

For example, here's a sample alert which is 127KB:

```
field                   | size (bytes)
------------------------+--------------
alertId                 |       9
diaSource               |     546
diaObject               |     677
18 prvDiaSources        |   9,829
28 prvDiaForcedSources  |   1,347
cutoutDifference        |  57,604
cutoutTemplate          |  57,604
```

So, roughly
 - 1% metadata and calculated measurements
 - 9% history
 - 90% cutouts

## Current design: stream size

We publish the stream of all alerts. Each alert packet is about 100KB, and there are about 10,000 alerts per exposure.

Most alerts are published in a brief flood when processing completes, so perhaps 2,000 alerts per second over 5 seconds.

This means we need about 1.6Gbps outbound bandwidth per recipient.

## Problem with the current design

Distribution and notification are conflated.

Letting everyone know that there's something worth checking out is different
from sending everyone data.

The conflation of the two makes bandwidth the primary constraint for filter
authors, which is really expensive.

Raises time-to-notify for rapid alerts, since recipients must scan through all
science data to get to next alert.

Makes system hard to scale or optimize; bulk data delivery and rapid delivery
must be served from same hardware.


## Separating alert distribution from notification

We could send alert packets which are much smaller, but include a URL referring
to a location with much more data.

The "alert stream" gets much smaller:

```
{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_1.fits", "h1": 90122.9, "h2": "..."}
{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_2.fits", "h1": 71259.9, "h2": "..."}
{"image": "https://ls.st/alerts/2021/07/27/exposure_1/alert_3.fits", "h1": 12409.2, "h2": "..."}
```

Eric Bellm has concocted a plausible minimal payload. It's just 100 bytes per
alert!


## Client behavior

Client behavior is to consume from the alert stream and go fetch the full blob
of data separately _if they actually care_.

They can trivially filter out most events that they don't care about.

We can rate-limit clients' HTTP calls to fetch more data to manage cost fairly.
Community brokers (if they exist) are then groups with really high rate limits.

Augmenting the alert stream with new information becomes easier. Just add a new
header, and keep the reference back to the original alert data - no re-upload
required.


## We might be able to make the stream public

If each alert is just 200 bytes (100 bytes of data + 100 bytes of URLs), then
the peak outbound bandwidth of notifications is probably about 3Mbps per
recipient of the stream. That's small! We could handle thousands of recipients
with a single 10Gbps NIC.

We'd need to rate limit access to the other 99,800 bytes of data. This can be
tuned to our actual bandwidth requirements.

Permitting 270 requests/sec lets a user keep up with all 10k alerts/exposure
(albeit with up to ~35s latency). This is equivalent to about 220 Mbps.

We could easily support 100 recipients with a 25Gbps bandwidth limit:
 - 100 * 3 Mbps = 0.3Gbps for notifications
 - 100 * 220 Mbps = 22Gbps for data


## Latency impact is negligible if consumers are picky

This may require one extra round-trip to fetch data. This will add 200ms of
latency typically versus using the same stream to notify and transmit data.

End-to-end latency may be *lower* in practice because this permits very high
parallelism. Consumers can fan out handling of many messages concurrently; a
combined stream requires a bulky copy for that fanout which is harder to build
infrastructure around.


## On the other hand...

Rate limits might frustrate users.

Identifying the right balance of headers will take some effort.

We'd have to run two systems - data and notification - instead of just one
combo.

Referential integrity could be hard to maintain. Perfect synchronization of the
data and notification systems is probably impossible.


## Incremental approach

We can try these ideas out without disrupting the existing plan: provide a
"skinny form" of the alerts alongside the fat form.

This could be implemented as a consumer of the Kafka topic (or the other way
around, Kafka on top of the skinny stream).

Approximately double the hardware requirements, and extra development effort to
build and operate the service two ways.

## Conclusions

Putting both data and notifications about data on the same channel is clumsy and
difficult. We might be able to provide a more useful data stream to more people
if we separate them.