This is part of a collection of data movement capabilities. The streaming job copies data from Kafka to Iceberg.
Currently, two deserialization formats are supported:
- JSON: A user-defined reference JSON schema can be provided in the Spark configuration, and the binary data is deserialized accordingly. Otherwise, the job infers the schema from the first row and assumes the remaining rows are compatible (see the sketch after this list).
- AVRO: Binary data is converted according to the schema defined by the user, or the schema is retrieved from the schema registry.
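To make the schema handling concrete, here is a minimal PySpark sketch of parsing JSON values from the Kafka topic with a user-supplied reference schema. The field names (`event_id`, `user_id`, `bytes_used`) are illustrative assumptions, and this is not the job's actual implementation (the job itself ships as `local:///app/driver.py` inside the Docker image).

```python
# Minimal sketch; assumes the spark-sql-kafka connector is available.
# Field names and options are illustrative, not the job's actual code.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-json-sketch").getOrCreate()

# User-defined reference schema; without it, the job falls back to inferring
# the schema from the first row and assumes later rows are compatible.
reference_schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("bytes_used", LongType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "usage.spark.0")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka values arrive as binary; cast to string, then parse with the schema.
parsed = (
    raw.select(from_json(col("value").cast("string"), reference_schema).alias("payload"))
    .select("payload.*")
)
```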
- Go to Spark Jobs.
- Click on Create New.
Specify the following parameters (these are examples; you can change them based on your preference):
- Name: `kafka-streaming-job`
- Docker Image: `iomete/iomete_kafka_streaming_job:0.2.1`
- Main application file: `local:///app/driver.py`
- Environment Variables: `LOG_LEVEL`: `INFO` or `ERROR`
- Config file:

```hocon
{
  kafka: {
    bootstrap_servers: "localhost:9092",
    topic: "usage.spark.0",
    serialization_format: json,
    group_id: group_1,
    starting_offsets: latest,
    trigger: {
      interval: 5
      unit: seconds # or minutes
    },
    schema_registry_url: "http://127.0.0.1:8081"
  },
  database: {
    schema: default,
    table: spark_usage_20
  }
}
```
| Property | Description |
|---|---|
| `kafka` | Required properties to connect and configure. |
| `database` | Destination database properties. |
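As a rough illustration of where those settings end up, the sketch below appends a parsed stream to the destination table named in the `database` block, using the 5-second trigger from the `kafka` block. The `parsed` DataFrame is carried over from the earlier sketch, the checkpoint path is a placeholder, and an Iceberg-enabled catalog is assumed; this is not the job's actual implementation.

```python
# Continues the earlier sketch: append the parsed stream to the Iceberg
# destination table (default.spark_usage_20) with the configured trigger.
# The checkpoint location is a placeholder assumption.
query = (
    parsed.writeStream.format("iceberg")
    .outputMode("append")
    .trigger(processingTime="5 seconds")
    .option("checkpointLocation", "/tmp/checkpoints/spark_usage_20")
    .toTable("default.spark_usage_20")
)
query.awaitTermination()
```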
*Create Spark Job - Instance*
You can use Environment Variables to store sensitive data such as passwords and secrets, and then reference them in your config file using the `${ENV_NAME}` syntax.
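For example, assuming an environment variable named `KAFKA_BOOTSTRAP_SERVERS` has been defined on the job (the name is illustrative), the config could reference it like this:

```hocon
kafka: {
  # KAFKA_BOOTSTRAP_SERVERS is a hypothetical environment variable
  # defined in the job's Environment Variables.
  bootstrap_servers: ${KAFKA_BOOTSTRAP_SERVERS},
  topic: "usage.spark.0"
}
```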
*Create Spark Job - Application Environment*

*Create Spark Job - Application dependencies*
Prepare the dev environment

```bash
virtualenv .env  # or: python3 -m venv .env
source .env/bin/activate
pip install -e ".[dev]"
```

Run tests

```bash
pytest
```