
Kafka Sink Connect GCP Bigtable

This is a sink-only Kafka Connect connector that can be used to stream messages from Apache Kafka to Bigtable, the wide-column store on Google Cloud Platform (GCP).

What is Apache Kafka

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. For more details, please refer to the Apache Kafka home page.

What is Google Cloud Bigtable

Bigtable is a compressed, high-performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable, and a few other Google technologies. On May 6, 2015, a public version of Bigtable was made available as a service in the Google Cloud Platform. For more details, please refer to the GCP Bigtable home page.

High Level Architecture

This project leverages the bigtable-client-core library (no HBase client) to stream data to GCP Bigtable. bigtable-client-core internally uses the gRPC framework to talk to GCP Bigtable.
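For context, the following minimal sketch shows what a single-row write to Bigtable over gRPC looks like. It intentionally uses the higher-level google-cloud-bigtable data client for readability; it is not the exact bigtable-client-core API the connector uses internally, and the project, instance, table, and row values are placeholders taken from the demo configuration further below.

```java
import com.google.cloud.bigtable.data.v2.BigtableDataClient;
import com.google.cloud.bigtable.data.v2.models.RowMutation;

public class BigtableWriteSketch {
  public static void main(String[] args) throws Exception {
    // Illustration only: the connector itself uses bigtable-client-core;
    // this sketch uses the google-cloud-bigtable veneer client instead.
    try (BigtableDataClient client = BigtableDataClient.create("demo-project", "demo-instance")) {
      RowMutation mutation = RowMutation.create("demo-table", "NASDAQ-ACV") // row key built as exchange-symbol
          .setCell("data", "symbol", "ACV")
          .setCell("data", "price", "12.34")
          .setCell("metadata", "topic", "demo-topic");
      client.mutateRow(mutation); // single synchronous write over gRPC
    }
  }
}
```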

Kafka Connect GCP Bigtable

Prerequisites

Apache ZooKeeper and Apache Kafka installed and running on your machine. Please refer to the respective sites to download and start ZooKeeper and Kafka. You also need Java version 8 or above.

Tested Software Versions

| Software | Version | Note |
| --- | --- | --- |
| Java | 1.8.0_161 | You may use Java 8 or above. Tested using Java 8. |
| Kafka | >= 2.11-1.1.1 | Tested using kafka_2.11-1.1.1; may work with older versions. |
| Zookeeper | >= 3.4.13 | Tested using zookeeper-3.4.13. |
| bigtable-client-core | 1.12.1 | |
| Kafka connect-api | 2.3.0 | |
| grpc-netty-shaded | 1.23.0 | |

Configurations

bigtable-sink.properties

| Property | Value | Data Type | Description |
| --- | --- | --- | --- |
| name | bigtable-sink | String | Name of the sink connector. |
| connector.class | BigtableSinkConnector | String | Simple name of the connector class. |
| tasks.max | 1 | Number | Maximum number of tasks. |
| topics | demo-topic | String | Comma-separated list of topics. |
| config.files.location | kafka_home/config | String | Directory containing the topic configuration files. There should be one yml file per topic. |
| continue.after.write.error | false | Boolean | Whether the task should continue after a batch results in a write error. |
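For reference, here is a minimal bigtable-sink.properties sketch assembled only from the properties in the table above; adjust the values for your environment.

```properties
# bigtable-sink.properties - minimal sketch based on the table above
name=bigtable-sink
connector.class=BigtableSinkConnector
tasks.max=1
topics=demo-topic
# directory that holds one <topic>.yml file per subscribed topic
config.files.location=kafka_home/config
continue.after.write.error=false
```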

demo-topic.yml (one yml file per topic)

| Property | Value | Data Type | Description |
| --- | --- | --- | --- |
| keyFile | /home/keys/demo-instance-key.json | String | GCP connect key file. This is a topic-level configuration because you may subscribe to multiple topics, and messages from one topic may go to a table in instance A while messages from another topic go to a table in instance B. |
| project | demo-project | String | Name of the GCP project. |
| instance | demo-instance | String | Name of the GCP Bigtable instance. |
| table | demo-table | String | Name of the GCP Bigtable table. |
| transformer | com.sanjuthomas.gcp.transform.JsonEventTransformer | String | Transformer class used to transform a message into a Bigtable writable row. You may provide your own implementation. |
| keyQualifiers | exchange, symbol | Array | Bigtable row key qualifiers. The configured element names are used to construct the row key. |
| keyDelimiter | - | String | Delimiter to use when more than one element is combined to construct the row key. For example, exchange and symbol joined with - produce a row key such as NASDAQ-ACV. |
| families | data, metadata | Array | Column families in the Bigtable table. This configuration is used by the transformer. |
| familyQualifiers | data: client, exchange, symbol, price, quantity; metadata: created_at, processed_at, topic, partition | Array | Column family to columns mapping. |
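Putting the properties above together, a minimal demo-topic.yml sketch might look like the following; the exact YAML shape is inferred from the property listing above, so verify it against the sample configuration shipped with the project.

```yaml
# demo-topic.yml - sketch assembled from the property table above
keyFile: /home/keys/demo-instance-key.json
project: demo-project
instance: demo-instance
table: demo-table
transformer: com.sanjuthomas.gcp.transform.JsonEventTransformer
keyQualifiers:
  - exchange
  - symbol
keyDelimiter: "-"
families:
  - data
  - metadata
familyQualifiers:
  - data:
      - client
      - exchange
      - symbol
      - price
      - quantity
  - metadata:
      - created_at
      - processed_at
      - topic
      - partition
```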

Constraints

The current configuration system supports streaming messages from a given topic to a given table. You can subscribe to any number of topics, but each topic can be mapped to one and only one table. For example, if you subscribe to a topic named demo-topic, you should have a yml file named demo-topic.yml. That yml file contains all the configuration required to transform and write data into Bigtable; a hypothetical layout is shown below.
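Assuming two subscribed topics, demo-topic and a hypothetical orders-topic, the config.files.location directory would look like this:

```
kafka_home/config/
├── bigtable-sink.properties
├── demo-topic.yml
└── orders-topic.yml
```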

As of today, there is transformer support for JSON messages. I'm planning to add an Avro message transformer in the next version.

How to deploy the connector

This is a Maven project. To create an uber jar, execute the following Maven goals.

mvn clean compile package shade:shade install

Copy the artifact kafka-connect-gcp-bigtable-1.0.0.jar to the kafka_home/lib folder.

Copy the bigtable-sink.properties file into the kafka_home/config folder. Update the content of the properties file according to your environment.

Alternatively, you may keep the kafka-connect-gcp-bigtable-1.0.0.jar in another directory and export that directory into the Kafka classpath before starting the connector.
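For example, the Kafka launch scripts honor the CLASSPATH environment variable, so the jar can be picked up like this (the /opt/connectors path is just a placeholder):

```
# hypothetical location of the uber jar
export CLASSPATH=/opt/connectors/kafka-connect-gcp-bigtable-1.0.0.jar:$CLASSPATH
```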

How to start the connector in standalone mode

Open a shell prompt, move to kafka_home and execute the following.

bin/connect-standalone.sh config/connect-bigtable-standalone.properties config/bigtable-sink.properties
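The worker properties file (connect-bigtable-standalone.properties above) is a standard Kafka Connect standalone worker configuration. A minimal sketch, assuming JSON messages and a local broker, might look like this:

```properties
# connect-bigtable-standalone.properties - minimal worker config sketch
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect-bigtable.offsets
```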

How to start the connector in distributed mode

Open a shell prompt, change your working directory to kafka_home and execute the following.

bin/connect-distributed.sh config/connect-bigtable-distributed.properties config/bigtable-sink.properties

Questions

Either create an issue in this project or send your question to bt@sanju.org. Thanks!

License

This project is licensed under the MIT License.
