
Overview

selfint edited this page Dec 23, 2022 · 8 revisions

What

Rillet is more an idea than a technology: a way to write pipelines that are fast, robust, scalable/distributable, and debuggable/testable.

The guiding principle is that, in stream processing:

correctness > reliability > performance > ... > setup duration

The results must always be correct.

Each pipeline must always be able to recover from crashes.

The pipelines need to run as fast as possible.

Each pipeline is started once and runs forever, so extra work up front costs almost nothing. Reducing setup duration is the last priority (but still a priority, just the last one).

Why

Current stream processing solutions feel like they are more complicated than they need to be. Rillet is an attempt at drastically simplifying stream processing.

Problems

  1. Fault tolerance strategies are backwards.

Current fault tolerance solutions attempt to 'snapshot' running pipelines, and restore the snapshot on failure. This adds overhead both to the development of stream processing libraries and to their runtime performance.

Instead, Rillet pipelines are stateless, so if a pipeline crashes there is nothing to "restore".

  2. Declarative libraries are not debuggable/testable.

Most (if not all) stream processing libraries are declarative, meaning they describe what the pipeline does, not how it does it. This is great for writing new pipelines, and is much faster than the imperative alternative. The problem is that when the pipeline doesn't work, debugging it is hard, since the code that actually runs isn't the code that you wrote.

Instead, Rillet pipelines aren't written in a declarative API/DSL; they are just normal programs that read and write using stdio.

  3. Over-engineered.

"Stream processing" isn't a complicated problem (or is it?), and its solution shouldn't be complicated either.

Rillet pipelines use only existing solutions, and add as little overhead/boilerplate as possible.
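
As a rough illustration of what "a normal program that reads/writes using stdio" could look like, here is a minimal sketch of a single stateless stage in Python. It is not taken from Rillet itself: the names `run_stage` and `to_upper`, the one-record-per-line framing, and the choice of Python are all assumptions made for this example.

```python
import sys
from typing import Callable, Iterable, TextIO


def to_upper(record: str) -> str:
    """Hypothetical per-record transformation; any pure function fits here."""
    return record.upper()


def run_stage(
    source: Iterable[str],
    sink: TextIO,
    transform: Callable[[str], str],
) -> int:
    """Apply `transform` to every input line and write the result to `sink`.

    Nothing is remembered between records, so the stage holds no state
    that would have to be snapshotted or restored.
    Returns the number of records processed.
    """
    count = 0
    for line in source:
        # Assumption: one record per line, newline-terminated.
        sink.write(transform(line.rstrip("\n")) + "\n")
        sink.flush()  # emit each record immediately instead of buffering
        count += 1
    return count


if __name__ == "__main__":
    # The stage is an ordinary program: read stdin, write stdout.
    run_stage(sys.stdin, sys.stdout, to_upper)
```

Saved under a hypothetical name such as `stage.py`, this can be tried with `printf 'a\nb\n' | python stage.py`, and stages can be chained with ordinary shell pipes. Because the logic lives in a plain function, it can be stepped through in a debugger or unit-tested with in-memory streams. How input is replayed after a crash is outside the scope of this sketch.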
