A Flume Channel in which every transaction that puts events waits for corresponding transactions that take the events, like SynchronousQueue. This channel can be used as a faster alternative to File Channel when the channel doesn't need to act as a reservoir. See Choosing channel implementations and Performance.
Let's say, we have the following Flume configuration: [Taildir Source] --> [File Channel] --> [Avro Sink]
.
The Taildir Source reads lines of a log file as they are written by some application. File Channel temporarily stores the events on the disk.
Avro Sink takes events from the channel and sends to a remote Flume instance. We chose File Channel instead of Memory Channel here because we don't want to lose events.
The problem here is that we are basically writing the same data twice to the disk: one in the log file the Taildir Source is watching, and another in the File Channel.
Considering the File Channel, in this case, doesn't need to act as a reservoir since the log file itself already works like one, we should be able to replace it with something new that behaves like SynchronousQueue. It should be more performant because it doesn't use disks. It doesn't lose events and provides at-least once delivery because events to the channel are synchronously taken from the channel.
-
Download a pre-built jar and put it under
$FLUME_HOME/lib
.Alternatively, you can build a jar manually:
mvn clean package cp target/flume-synchronous-channel-1.0.0-SNAPSHOT.jar $FLUME_HOME/lib
-
Add SynchronousChannel to your Flume configuration.
agent.channels = ch1 agent.channels.ch1.type = net.thisptr.flume.plugins.channel.synchronous.SynchronousChannel
When the source has put() events and called commit() of a transaction, the commit() is blocked until the sink(s) take() all the events of the transaction and calls corresponding commit(). This guarantees at-least once delivery because the source transaction succeeds only after the sink has consumed all the events of the transaction.
If you don't need at-least once delivery, just use Memory Channel. If the source itself works like a reservoir, such as in case of Taildir Source, Kinesis Source, or Pub/Sub Source, you can try Synchronous Channel. Otherwise, use File Channel.
At-least Once Delivery | Acts as a reservoir | Performance | |
---|---|---|---|
File Channel | Yes | Yes | Low (Disks are slow) |
Memory Channel | No (Loses events on crash) | Yes | High |
Synchronous Channel | Yes | No (Source is blocked if the sink is slow) | High |
YMMV, but in our benchmark, we observed 80-90% increase in throughput and 75-95% reduction in CPU usage compared to File Channel. See docs/benchmark/README.md for the details.
Apache License, Version 2.0