Apache Kafka Connect Docker Image

This project provides a Docker image for deploying and running Apache Kafka Connect. The goal is to provide a base or example for building your own Apache Kafka Connect Docker image.

Note: This is my pet project and is used to get a better understanding of how to customize an Apache Kafka Connect Docker image. Feel free to fork and modify the image to your needs. The goal was to build a Docker image similar to the one from Confluent.

The Docker image is based on Alpine Linux and contains:

  • Apache Kafka 3.0
  • Java 11 (zulu11-jdk-headless) - A headless Java Development Kit (without GUI support).
  • Confluent Hub Client - A CLI which can be used to install Kafka Connect plugins from Confluent Hub.

The following Apache Kafka Connect plugins are already installed:

CI Build / Release

  • CI Build
  • Release Docker

Build and Run (docker-compose)

To build and run the Apache Kafka Connect image, just run the following command.

docker-compose up --build

This will start the following Docker containers:

  • zookeeper => Apache Zookeeper (confluentinc/cp-zookeeper)
  • broker => Apache Kafka (confluentinc/cp-kafka)
  • schema-registry => Schema Registry (confluentinc/cp-schema-registry)
  • connect => The plain Apache Kafka Connect Docker image (see Dockerfile)
  • kafka-ui => Kafka UI from Provectus (provectuslabs/kafka-ui)
  • connect-ui => Kafka Connect UI from Lenses.io (landoop/kafka-connect-ui)

When all containers are started, you can access the different services, for example the two web UIs and the Kafka Connect REST API.
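
For example, assuming docker-compose publishes the Kafka Connect REST port 8083 on localhost (8083 is the default REST port, see below), you can check that the Connect worker is up:

# Worker and Kafka cluster information
curl http://localhost:8083/

# List of deployed connectors
curl http://localhost:8083/connectors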

Releases & Docker Image

The Docker image is also published on Docker Hub.

docker pull rueedlinger/kafka-connect:4.0.0

Docker Tag                Description
main                      The current release of the main branch.
<major>.<minor>.<patch>   Tag for a specific release, for example 2.0.0.

Release   Kafka Version   Java Version
1.0.x     2.6.x           11
2.0.x     2.7.x           11
3.0.x     2.8.x           11
4.0.x     3.0.x           11

Configuration

All environment variables with the prefix CONNECT_ are used to configure Apache Kafka Connect. For example CONNECT_BOOTSTRAP_SERVERS=foo is mapped to the Connect configuration bootstrap.servers=foo.
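
A rough shell sketch of the naming convention (an illustration only, not the image's actual startup logic): strip the CONNECT_ prefix, lowercase the rest and replace underscores with dots.

# Print the Connect properties derived from CONNECT_* environment variables
env | grep '^CONNECT_' | while IFS='=' read -r name value; do
  prop=$(echo "${name#CONNECT_}" | tr '[:upper:]' '[:lower:]' | tr '_' '.')
  echo "${prop}=${value}"
done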

Note: The setup and configuration are inspired by the Confluent Apache Kafka Connect Docker image.

Environment Variables

The following environment variables are set.

  • CONNECT_HOME => The path to the Apache Kafka Connect configuration files, plugin directory and classpath directory. Default: /usr/local/connect
  • KAFKA_HOME => The location of the Kafka binaries. Default: /usr/local/kafka
  • CONFLUENT_HUB_HOME => The location of the Confluent Hub CLI. Default: /usr/local/confluent-hub
  • CONNECT_WORKER_CONFIG => The path to the Apache Kafka Connect worker configuration file. Default: $CONNECT_HOME/etc/connect-distributed.properties
  • CONNECT_LOG_CONFIG => The path to the Apache Kafka Connect logging configuration file. Default: $CONNECT_HOME/etc/connect-log4j.properties
  • KAFKA_LOG4J_OPTS => The Kafka logging configuration. Default: -Dlog4j.configuration=file:$CONNECT_LOG_CONFIG
  • PATH => The default PATH variable. Default: $KAFKA_HOME/bin:$CONFLUENT_HUB_HOME/bin:$PATH
  • LOG_DIR => The directory to which system execution logs are written. Default: /var/log
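
One way to inspect these defaults from the image itself (overriding the entrypoint with a shell so the Connect worker does not start):

docker run --rm --entrypoint sh rueedlinger/kafka-connect:4.0.0 -c env | grep -E 'CONNECT|KAFKA|CONFLUENT|LOG_DIR'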

Required Configuration

The following configuration settings are required; an example follows the list.

  • CONNECT_BOOTSTRAP_SERVERS => A host:port pair for establishing the initial connection to the Kafka cluster. Multiple bootstrap servers can be used in the form host1:port1,host2:port2,host3:port3....
  • CONNECT_GROUP_ID => A unique string that identifies the Connect cluster group this worker belongs to.
  • CONNECT_CONFIG_STORAGE_TOPIC => The name of the topic in which connector and task configuration data is stored. This must be the same for all workers with the same group.id.
  • CONNECT_OFFSET_STORAGE_TOPIC => The name of the topic in which offset data for connectors is stored. This must be the same for all workers with the same group.id.
  • CONNECT_STATUS_STORAGE_TOPIC => The name of the topic in which connector state is stored. This must be the same for all workers with the same group.id.
  • CONNECT_KEY_CONVERTER => Converter class for keys. This controls the format of the data written to Kafka by source connectors and read from Kafka by sink connectors.
  • CONNECT_VALUE_CONVERTER => Converter class for values. This controls the format of the data written to Kafka by source connectors and read from Kafka by sink connectors.
  • CONNECT_REST_ADVERTISED_HOST_NAME => The hostname that is given out to other workers to connect to. In a Docker environment, your clients must be able to reach Connect and the other services; the advertised hostname is how Connect publishes a hostname that clients can reach.
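
A minimal sketch of starting a single worker with these required settings (the broker address, topic names and group id are placeholders; the converters are Kafka's standard JSON converters):

docker run -d --name connect \
  -p 8083:8083 \
  -e CONNECT_BOOTSTRAP_SERVERS=broker:9092 \
  -e CONNECT_GROUP_ID=my-connect-cluster \
  -e CONNECT_CONFIG_STORAGE_TOPIC=connect-configs \
  -e CONNECT_OFFSET_STORAGE_TOPIC=connect-offsets \
  -e CONNECT_STATUS_STORAGE_TOPIC=connect-status \
  -e CONNECT_KEY_CONVERTER=org.apache.kafka.connect.json.JsonConverter \
  -e CONNECT_VALUE_CONVERTER=org.apache.kafka.connect.json.JsonConverter \
  -e CONNECT_REST_ADVERTISED_HOST_NAME=connect \
  rueedlinger/kafka-connect:4.0.0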

Optional Configuration

When nothing else is set, the following defaults are used; an example of overriding them follows the list.

  • TZ => The TZ environment variable is used to establish the local time zone. Valid values are for example Europe/Zurich, America/New_York or Europe/Dublin. Default: UTC
  • LANG => The LANG environment variable controls the locale of the host. Default: C.UTF-8
  • CLASSPATH => The classpath which is set for Apache Kafka Connect. Default: $CONNECT_HOME/jars/*
  • CONNECT_PLUGIN_PATH => The plugin.path value that indicates the location from which to load Connect plugins in classloading isolation. Default: $CONNECT_HOME/plugins,/usr/local/share/java
  • CONNECT_INTERNAL_KEY_CONVERTER => Converter class for internal keys that implements the Converter interface. Default: org.apache.kafka.connect.json.JsonConverter with value.converter.schemas.enable=true
  • CONNECT_INTERNAL_VALUE_CONVERTER => Converter class for internal values that implements the Converter interface. Default: org.apache.kafka.connect.json.JsonConverter with key.converter.schemas.enable=true
  • CONNECT_REST_PORT => The port for the REST API to listen on. Default: 8083
  • CONNECT_LOG4J_ROOT_LOGLEVEL => The root log level. Default: INFO
  • CONNECT_LOG4J_LOGGERS => Overrides for other log4j logger levels, for example org.apache.zookeeper=ERROR,org.I0Itec.zkclient=ERROR,org.reflections=ERROR. Default: not set
  • CONNECT_LOG4J_APPENDER_STDOUT_LAYOUT_CONVERSIONPATTERN => The logging format which is used. Default: '[%d] %p %X{connector.context}%m (%c:%L)%n'
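
For example, to raise the root log level and silence some noisy loggers, the defaults can be overridden in the same way as the required settings (the values below are just an illustration):

# Add the required CONNECT_ settings from the previous section as well
docker run -d --name connect \
  -e CONNECT_LOG4J_ROOT_LOGLEVEL=DEBUG \
  -e "CONNECT_LOG4J_LOGGERS=org.apache.zookeeper=ERROR,org.reflections=ERROR" \
  -e TZ=Europe/Zurich \
  rueedlinger/kafka-connect:4.0.0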

Confluent Hub Client

The Confluent Hub Client is installed in the Docker image and can be used to install connectors from Confluent Hub.

Just run

confluent-hub install <connector> \
   --component-dir $CONNECT_HOME/plugins \
   --worker-configs $CONNECT_WORKER_CONFIG \
   --no-prompt

or use the convenience script confluent-hub-install, which already has all the required options set.

confluent-hub-install <connector>

How to Install Other Plugins

If you want to install other Kafka Connect plugins (connectors, SMTs, etc.), you have two options:

  1. Create a Dockerfile and install the plugin with the confluent-hub CLI.
FROM rueedlinger/kafka-connect:4.0.0

# Install connectors from Confluent Hub with convenience script.
# This will install the plugin in $CONNECT_HOME/plugins
RUN confluent-hub-install confluentinc/kafka-connect-jdbc:10.0.1

# Or directly with confluent-hub CLI
# confluent-hub install confluentinc/kafka-connect-jdbc:10.0.1 \
#   --component-dir $CONNECT_HOME/plugins \
#   --worker-configs $CONNECT_WORKER_CONFIG \
#   --no-prompt
  2. Create a Dockerfile and place the connector in one of the Kafka Connect plugin directories.
FROM rueedlinger/kafka-connect:4.0.0

# Add the connector plugin to /usr/local/share/java
ADD connector.jar /usr/local/share/java

# Or add the connector to $CONNECT_HOME/plugins
# ADD connector.jar $CONNECT_HOME/plugins

Note: If you want to install Kafka consumer or producer interceptors, place them in $CONNECT_HOME/jars, because $CONNECT_HOME/jars is added to the CLASSPATH when Apache Kafka Connect starts.
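
A minimal sketch (my-interceptor.jar is a placeholder for your interceptor artifact):

FROM rueedlinger/kafka-connect:4.0.0

# Interceptor jars belong on the classpath ($CONNECT_HOME/jars), not on the plugin path
ADD my-interceptor.jar $CONNECT_HOME/jars/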