Skip to content

Kafka to Jetstream

rmuthupandian edited this page Feb 11, 2015 · 1 revision

Overview

Jetstream has implemented full capabilities of consuming Kafka data.

The InboundKafkaChannel is a specialized Jetstream Inbound Channel implementation that enables event streams to be consumed from Kafka Topics and processed in Jetstream Applications. Jetstream Framework provides an extensible serializer which could be used with this Channel. The Channel uses low level Kafka Consumer API to provide full control of kafka stream processing. The event streams from Kafka partitions can be distributed across the cluster such that one node in the cluster can process one or more partitions.

Users can implement Kafka Processors by extending the AbstractBatchEventProcessor. It's a base class which supports batch event processing in Jetstream. Please goto this page for details about Jetstream batch event processing.

Users can link the Kafka Processors to their downstream BatchEventSink or EventSinks. So that the users can leverage Jetstream built-in features to process events from Kafka.

This section first describes the capabilities of Jetstream Kafka processing. Then shows you how to provision an InboundKafkaChannel and write a Kafka processor. Before you read this section further it is recommended that you familiarize yourself with Apache Kafka

Problems of Kafka High Level API

  • Limited control of the offset. The High Level API manages the offset all internally. There is no way for the user to manage the offset for a specific partition.
  • No control of partition assignment. The High Level API dynamically assigns partitions to the nodes in the cluster. No custom ways are provided to the users.
  • Unmanageable Rebalance. The High Level API manages rebalance all internally. No notifications or callbacks are provided for the users when the rebalance happens.
  • Massive fetch threads. The High Level API create 1 fetch thread for each partition in each topic. The thread count would be massive when there are many topics and partitions.

Jetstream Kafka Capabilities

  • Pure pull model batch processing. Jetstream provides pure pull model for Kafka processing. The users have full control of the offset and partition.
  • Static and dynamic partition assignment. User can hard configure the partitions each node can consume. Or let the framework dynamically assign the partitions.
  • Optimized Rebalance. Jetstream does rebalance of the partitions when any of the Jetstream consumer nodes goes down or a new Jetstream node is added into the cluster. It guarantees no event loss or event duplication during the rebalance process. It also ensures minimal partition transfer during the rebalance process.
  • Full control of consumer threads. Jetstream uses a task queue for consuming different partitions in each topic. Users can fully control the thread pool which executes the tasks in the queue.

[Next : Outbound Kafka Channel>>>] (../wiki/OutboundKafkaChannel)

Clone this wiki locally