Databus is a low-latency change data capture system that has become an integral part of LinkedIn's data processing pipeline. It addresses the fundamental requirement to reliably capture, flow, and process primary data changes. Databus provides the following features:
- Isolation between sources and consumers
- Guaranteed in-order, at-least-once delivery with high availability
- Consumption from an arbitrary point in time in the change stream, including full bootstrap of the entire data set
- Partitioned consumption
- Source consistency preservation
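Because delivery is at least once, a consumer may see the same change event more than once (for example, after a reconnect) and must process events idempotently. The sketch below illustrates one common approach, deduplicating by the last-applied sequence number; the class and method names are illustrative assumptions, not the real Databus API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an at-least-once consumer that skips redelivered
// events by tracking the highest sequence number (SCN) applied so far.
public class IdempotentConsumer {
    private long lastAppliedScn = -1;           // highest SCN applied so far
    private final List<String> applied = new ArrayList<>();

    // Apply an event only if its SCN is newer than the last one applied.
    public boolean onEvent(long scn, String payload) {
        if (scn <= lastAppliedScn) {
            return false;                       // duplicate redelivery: skip
        }
        applied.add(payload);
        lastAppliedScn = scn;
        return true;
    }

    public List<String> applied() { return applied; }

    public static void main(String[] args) {
        IdempotentConsumer c = new IdempotentConsumer();
        c.onEvent(100, "row-A");
        c.onEvent(101, "row-B");
        c.onEvent(101, "row-B");                // redelivered after a reconnect
        System.out.println(c.applied());        // prints [row-A, row-B]
    }
}
```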
The main components of the Databus architecture are as follows:
Databus Relays
- Read changed rows from the Databus sources in the source database and serialize them as Databus data change events in an in-memory buffer
- Listen for requests from Databus Clients (including Bootstrap Producers) and transport new Databus data change events
- Please find more documentation at Databus 2.0 Relay
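The relay's role described above can be modeled as a bounded in-memory buffer: changed rows are serialized as change events, and once the buffer fills, the oldest events are evicted, which is exactly why a consumer that falls too far behind must go to a bootstrap server. This is a hypothetical sketch; the names are illustrative, not the real relay implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the relay's in-memory event buffer.
public class RelayBufferSketch {
    static final class ChangeEvent {
        final long scn;                 // sequence number of the change
        final String serializedRow;     // serialized form of the changed row
        ChangeEvent(long scn, String row) { this.scn = scn; this.serializedRow = row; }
    }

    private final Deque<ChangeEvent> buffer = new ArrayDeque<>();
    private final int capacity;

    public RelayBufferSketch(int capacity) { this.capacity = capacity; }

    // Serialize a changed row into the buffer, evicting the oldest event when full.
    public void append(long scn, String row) {
        if (buffer.size() == capacity) buffer.removeFirst();
        buffer.addLast(new ChangeEvent(scn, row));
    }

    // Oldest SCN still served from memory; requests for anything older
    // would have to be answered by a bootstrap server instead.
    public long minScn() { return buffer.isEmpty() ? -1 : buffer.peekFirst().scn; }

    public static void main(String[] args) {
        RelayBufferSketch relay = new RelayBufferSketch(2);
        relay.append(1, "a");
        relay.append(2, "b");
        relay.append(3, "c");           // evicts event 1
        System.out.println(relay.minScn());   // prints 2
    }
}
```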
Databus Clients
- Check for new data change events on the relays and execute business-logic-specific callbacks.
- If they fall too far behind the relays, run a catchup query against a bootstrap server.
- New Databus clients run a bootstrap query to a bootstrap server and then switch to a relay for recent data change events.
- A single client can process an entire Databus stream, or it can be part of a cluster where each consumer processes only a portion of the stream.
- Please find more documentation at Databus 2.0 Client.
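The client-side decision in the steps above can be sketched as: read from the relay while the client's checkpoint is still covered by the relay's buffer, and fall back to a bootstrap/catchup query once the relay has evicted that point. The interface and method names below are illustrative assumptions, not the Databus client API.

```java
// Hypothetical sketch of a client choosing between relay and bootstrap.
public class ClientCatchupSketch {
    // Stand-in for a relay that reports the oldest SCN it still buffers.
    interface Relay { long minAvailableScn(); }

    // If our checkpoint is still within the relay's buffer, keep streaming
    // from the relay; otherwise we must catch up from a bootstrap server.
    public static String pickSource(long checkpointScn, Relay relay) {
        return checkpointScn >= relay.minAvailableScn() ? "relay" : "bootstrap";
    }

    public static void main(String[] args) {
        System.out.println(pickSource(10, () -> 5)); // relay: checkpoint still buffered
        System.out.println(pickSource(3, () -> 5));  // bootstrap: relay evicted our point
    }
}
```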
Databus Bootstrap Producers
- A special kind of Databus client.
- Check for new data change events on relays.
- Store those events in a MySQL database.
- The MySQL database is used for bootstrap and catchup for clients.
Databus Bootstrap Servers
- Listen for requests from Databus Clients and return long look-back data change events for bootstrapping and catchup.
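Putting the two bootstrap components together: the bootstrap producer durably appends every event it sees on the relays (to MySQL in Databus), and the bootstrap server answers long look-back requests by replaying events newer than the client's checkpoint. In this hypothetical sketch an in-memory list stands in for the MySQL log table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the bootstrap producer/server data flow.
public class BootstrapSketch {
    static final class StoredEvent {
        final long scn;
        final String row;
        StoredEvent(long scn, String row) { this.scn = scn; this.row = row; }
    }

    // Stands in for the MySQL table the bootstrap producer writes to.
    private final List<StoredEvent> log = new ArrayList<>();

    // Bootstrap producer side: persist every event seen on the relay.
    public void store(long scn, String row) { log.add(new StoredEvent(scn, row)); }

    // Bootstrap server side: replay everything after the client's checkpoint.
    public List<String> catchup(long fromScn) {
        List<String> out = new ArrayList<>();
        for (StoredEvent e : log) {
            if (e.scn > fromScn) out.add(e.row);
        }
        return out;
    }

    public static void main(String[] args) {
        BootstrapSketch b = new BootstrapSketch();
        b.store(1, "a");
        b.store(2, "b");
        b.store(3, "c");
        System.out.println(b.catchup(1));   // prints [b, c]
    }
}
```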
More detailed documentation can be found at https://github.com/linkedin/databus/wiki/_pages