shmurthy62 edited this page Mar 5, 2015 · 8 revisions


Jetstream integrates with Esper through the Jetstream Esper project, which provides a custom Jetstream processor implementation called Esper Processor. Jetstream is dual-licensed under the MIT and Apache licenses. Jetstream Esper, however, is licensed under GPLv2, the same license as Esper. Esper Processor is optional: Jetstream application developers can use it in their applications, and it can be injected into a pipeline at run time like other Jetstream components.

Esper Processor is an off-the-shelf Jetstream component, so there is no code to write to use Esper. You wire one or more instances of Esper Processor into your pipeline using Spring XML wiring syntax and implement your business logic as EPL statements. A typical pipeline wiring with EsperProcessor is shown below.

Screenshot: typical pipeline wiring with EsperProcessor

EsperProcessor gives you an experience similar to interacting with a database using SQL. The processor provides an interface for supplying the event structure, EPL (Esper's SQL-like language) statements, aggregate functions, and custom EPL annotations to the Esper engine. All of these are provisioned through Spring XML. EPL statements can be hot deployed into a running application without an application restart, and the new statements are applied without any impact on live streams. The EPL submitted to the engine is compiled to Java bytecode, which provides very good performance at statement evaluation time. It is assumed that you are familiar with Esper's EPL before you start using this processor.

Jetstream offers an annotation plugin framework that lets developers extend EPL by writing their own annotations. Esper Processor provides several default annotations. Several of these make the pipeline visible to EPL, enabling flow control of event streams from within EPL statements.

The processor supports exception processing. You can inject listeners that receive events that failed processing, which can happen when an EPL statement is not written correctly or the data is bad. Unprocessed events can be routed through an advice processor/Kafka setup for later replay.
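The listener idea described above can be sketched in plain Java. This is a minimal illustration only: `FailedEventRouting`, `FailureListener`, `GuardedProcessor`, and `ReplayBuffer` are hypothetical names invented for this example and are not Jetstream or Esper APIs. The real listeners are injected through Spring XML, and the real replay path goes through an advice processor/Kafka setup rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these names are Jetstream or Esper APIs. */
class FailedEventRouting {

    /** Receives an event that could not be processed, with the cause. */
    interface FailureListener {
        void onFailure(Map<String, Object> event, Exception cause);
    }

    /** Wraps a processing step and hands failures to the registered listeners. */
    static class GuardedProcessor {
        private final Consumer<Map<String, Object>> step;
        private final List<FailureListener> listeners = new ArrayList<>();

        GuardedProcessor(Consumer<Map<String, Object>> step) {
            this.step = step;
        }

        void addListener(FailureListener listener) {
            listeners.add(listener);
        }

        void process(Map<String, Object> event) {
            try {
                step.accept(event); // stand-in for statement evaluation
            } catch (RuntimeException e) {
                // Bad data or a faulty statement: notify listeners instead of dropping the event.
                for (FailureListener l : listeners) {
                    l.onFailure(event, e);
                }
            }
        }
    }

    /** Keeps failed events in memory so they can be replayed later. */
    static class ReplayBuffer implements FailureListener {
        final List<Map<String, Object>> failed = new ArrayList<>();

        @Override
        public void onFailure(Map<String, Object> event, Exception cause) {
            failed.add(event);
        }
    }
}
```

In this sketch a failing event never interrupts the stream: processing continues with the next event, and the failed one is retained for replay.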

Performance Considerations

Be aware that Esper Processor runs in a single thread, and all time window output is managed through a separate timer thread. Although Esper supports a multi-threaded model, we have chosen a single-threaded model with a queue, which can limit processing throughput. If the nodes where you deploy your applications have more cores, you can deploy multiple EsperProcessor instances. The event stream can be distributed among them by placing a LoadBalancer in front of the EsperProcessor instances, as shown below.

Screenshot: LoadBalancer in front of multiple EsperProcessor instances
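The threading model above can be approximated in plain Java as a rough mental model. In this sketch each worker owns one thread and one queue, and a dispatcher spreads events across the workers. `RoundRobinBalancer` and `Worker` are hypothetical names, and round-robin is an assumed distribution policy chosen for simplicity. This is not Jetstream's LoadBalancer, whose actual routing is configured in the pipeline wiring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of the threading model; not Jetstream's LoadBalancer. */
class RoundRobinBalancer {

    /** One worker = one thread draining its own FIFO queue. */
    static class Worker {
        // A single-thread executor is backed by an unbounded queue.
        private final ExecutorService thread = Executors.newSingleThreadExecutor();
        final AtomicLong processed = new AtomicLong();

        void submit(String event) {
            // Stand-in for evaluating the event against the deployed statements.
            thread.execute(processed::incrementAndGet);
        }

        void shutdownAndWait() {
            thread.shutdown(); // queued events are still processed
            try {
                thread.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private final List<Worker> workers = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            workers.add(new Worker());
        }
    }

    /** Sends the event to the next worker in turn. */
    void dispatch(String event) {
        int index = Math.floorMod(next.getAndIncrement(), workers.size());
        workers.get(index).submit(event);
    }

    List<Worker> workers() {
        return workers;
    }

    void close() {
        for (Worker w : workers) {
            w.shutdownAndWait();
        }
    }
}
```

One caveat with round-robin: events that share a key may land on different workers, so a per-key aggregate would be split across them. Whether that matters depends on the statements deployed on each instance.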

Esper Processor includes custom aggregate functions, which extend those provided by Esper, for computing topN, distinct count, and percentiles. They are implemented using approximate algorithms, so using them trades a small amount of accuracy for performance. These aggregates consume far less memory than the standard exact computation. We recommend these functions for use cases that need such aggregates.
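To make the accuracy-for-memory trade-off concrete, here is a small sketch of one well-known approximate distinct-count technique, linear counting. The source does not say which algorithms Esper Processor uses, so treat this purely as an illustration of the general idea and not as the processor's implementation. `ApproxDistinctCounter` is a hypothetical class name.

```java
import java.util.BitSet;

/** Linear counting estimator; illustrative only, not Jetstream's implementation. */
class ApproxDistinctCounter {
    private final int bits;      // bitmap size m, a power of two
    private final BitSet bitmap; // one bit per hash bucket

    ApproxDistinctCounter(int bits) {
        if (bits <= 0 || Integer.bitCount(bits) != 1) {
            throw new IllegalArgumentException("bits must be a positive power of two");
        }
        this.bits = bits;
        this.bitmap = new BitSet(bits);
    }

    /** Murmur3 32-bit finalizer, used to spread hashCode() values evenly. */
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Records an item; duplicates hit the same bit and change nothing. */
    void add(Object item) {
        bitmap.set(mix(item.hashCode()) & (bits - 1));
    }

    /** Estimates n as -m * ln(V), where V is the fraction of bits still zero. */
    long estimate() {
        int zeroBits = bits - bitmap.cardinality();
        if (zeroBits == 0) {
            throw new IllegalStateException("bitmap saturated; use a larger bitmap");
        }
        return Math.round(-bits * Math.log((double) zeroBits / bits));
    }
}
```

A bitmap of 16384 bits occupies about 2 KB regardless of how many events are added, whereas an exact distinct count has to retain every distinct value it has seen. The estimate degrades as the bitmap fills, so the bitmap must be sized for the expected cardinality.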

Monitoring

Esper Processor, like every other Jetstream component, exposes its stats through a REST interface. You can watch counters showing the number of events flowing through the processor, along with the average processing time and the queue depth. Raw events and errors are also shown on the monitoring page.

Pulsar Real Time Analytics uses Esper Processor heavily. The best way to see how pipeline applications are built is to look at the PulsarIO Metrics Calculator application wiring.

Contributors

  • Sharad Murthy
  • Xinglang Wang
  • Rajeshwari Muthupandian

Acknowledgements

  • Ken Wang
  • Lisa Li
  • Warren Jin
  • Dyutimoy Sarkar
  • Tim Robison