## Configure HDFS 3 Sink Properties

Let us configure the HDFS 3 Sink properties so that data can be pushed from a Kafka topic to HDFS.

* We need to make sure that the HDFS 3 Sink Connector plugin is already set up.

```shell
ls -ltr /opt/kafka/share/plugins/
```
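A listing only shows what is in the directory. As a rough sketch, the same check can be scripted so that it reports clearly when nothing connector-related is present. The function name `has_hdfs_plugin` and the `*hdfs*` name pattern are assumptions made for illustration; adjust the pattern to match how the plugin was unpacked on your cluster.

```shell
# Hypothetical helper: succeeds if the plugin directory contains at least
# one top-level entry whose name matches *hdfs* (the pattern is an assumption).
has_hdfs_plugin() {
  plugin_dir="$1"
  # -mindepth 1 skips the directory itself; -maxdepth 1 stays at the top level.
  [ -n "$(find "$plugin_dir" -mindepth 1 -maxdepth 1 -name '*hdfs*' 2>/dev/null)" ]
}

# Usage: report whether the plugin directory used in this section looks populated.
if has_hdfs_plugin /opt/kafka/share/plugins; then
  echo "HDFS connector plugin found"
else
  echo "HDFS connector plugin missing"
fi
```

This only confirms that a matching entry exists. It does not verify that the connector jars inside it are complete.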

* Make sure to create a working directory.

```shell
mkdir -p ~/kafka_connect/retail_logs_consume
```

* Define worker-level properties in **retail_logs_standalone.properties**.

```properties
bootstrap.servers=w01.itversity.com:9092,w02.itversity.com:9092

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

key.converter.schemas.enable=true
value.converter.schemas.enable=true

offset.storage.file.filename=/home/itversity/kafka_connect/retail_logs_consume/retail.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

# This should point to the base directory of the plugins
plugin.path=/opt/kafka/share/plugins
```
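One thing worth checking in any `.properties` file: the Java properties format treats `#` as a comment marker only at the start of a line, so text after a value, such as `key=value # note`, becomes part of the value. The sketch below flags such lines. The function name `find_inline_comments` is hypothetical, and the pattern is a simple heuristic that would also flag a value that legitimately contains ` #`.

```shell
# Hypothetical helper: print "key=value" lines that carry a trailing " #" comment.
# Lines that start with # (real comments) or with whitespace are skipped.
find_inline_comments() {
  grep -nE '^[^#[:space:]][^=]*=.*[[:space:]]#' "$1"
}

# Usage: check the worker properties file from this section, if it exists.
props=~/kafka_connect/retail_logs_consume/retail_logs_standalone.properties
if [ -f "$props" ]; then
  find_inline_comments "$props" || echo "no inline comments found"
fi
```

Any line that is printed should have its comment moved to a separate line above the property.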

* Define the properties for the sink. In my case, the file name is **retail_logs_hdfs_sink.properties**.

```properties
name=hdfs-sink
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
tasks.max=3
confluent.topic.bootstrap.servers=w01.itversity.com:9092,w02.itversity.com:9092
topics=dg_retail
# Make sure to change this as per your HDFS folder.
hdfs.url=hdfs://m01.itversity.com:9000/user/itversity/retail_consumer
flush.size=1000
```
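Before starting the connector, it can help to confirm that the sink file defines the keys we rely on. The following is a minimal sketch: the function name `missing_keys` is hypothetical, and the list of keys passed to it is simply taken from the file above rather than from an authoritative list of required connector settings.

```shell
# Hypothetical helper: print every listed key that has no "key=..." line in the file.
missing_keys() {
  file="$1"; shift
  for key in "$@"; do
    # Compare the text before the first "=" exactly, so dots in key names are literal.
    awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$file" || echo "$key"
  done
}

# Usage: check the sink properties file from this section, if it exists.
sink=~/kafka_connect/retail_logs_consume/retail_logs_hdfs_sink.properties
if [ -f "$sink" ]; then
  missing_keys "$sink" name connector.class topics hdfs.url flush.size
fi
```

No output means every listed key is present. The check does not validate the values themselves.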

In [1]:

```shell
!ls -ltr /opt/kafka/share/plugins/
```

```text
total 36
-rw-r--r-- 1 kafka kafka 32329 Apr 26 12:54 apache-curator-2.12.0.pom
drwxr-xr-x 6 kafka kafka  4096 Apr 26 18:59 kafka-connect-hdfs
```


In [2]:

```shell
!mkdir -p ~/kafka_connect/retail_logs_consume
```

In [3]:

```shell
!ls -ltr ~/kafka_connect/retail_logs_consume
```

```text
total 84
-rwxr-xr-x 1 itversity students   499 Apr 26 18:20 retail_logs_standalone.properties
-rw-rw-r-- 1 itversity students 74589 Apr 26 18:28 kc.log
-rwxr-xr-x 1 itversity students   309 Apr 26 18:33 retail_logs_hdfs_sink.properties
```


In [4]:

```shell
!cat /home/itversity/kafka_connect/retail_logs_consume/retail_logs_standalone.properties
```

```text
bootstrap.servers=w01.itversity.com:9092,w02.itversity.com:9092

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

key.converter.schemas.enable=true
value.converter.schemas.enable=true

offset.storage.file.filename=/home/itversity/kafka_connect/retail_logs_consume/retail.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

plugin.path=/opt/kafka/share/plugins
```


In [5]:

```shell
!cat /home/itversity/kafka_connect/retail_logs_consume/retail_logs_hdfs_sink.properties
```

```text
name=hdfs-sink
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
tasks.max=3
confluent.topic.bootstrap.servers=w01.itversity.com:9092,w02.itversity.com:9092
topics=dg_retail
hdfs.url=hdfs://m01.itversity.com:9000/user/itversity/retail_consumer
flush.size=1000
plugin.path=/opt/kafka/share/plugins
```
