Data ingestion using Hudi DeltaStreamer and Kafka

runalddsouza/hudi-kafka

hudi-kafka

This project has two components:

  • Kafka AvroProducer -> produces cryptocurrency data to a Kafka topic.
  • Hudi DeltaStreamer -> ingests the data from Kafka and writes it to Hudi tables.

Producer

  • Install packages: pip install -r requirements.txt
  • Start Producer: python producer/producer.py --topic <topic-name> --bootstrap-servers <broker-server> --schema-registry <schema-registry-url> --log-file <log-file-path>
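
As a rough illustration of how the command-line flags above could map onto Kafka client settings, here is a minimal sketch. It is not the repository's producer.py: the function names build_parser and client_configs are hypothetical, and marking every flag as required is an assumption. The keys "bootstrap.servers" and "url" are the standard configuration keys for a Kafka client and a Confluent Schema Registry client.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parse the four flags shown in the start command (all assumed required)."""
    parser = argparse.ArgumentParser(description="Avro producer (illustrative sketch)")
    parser.add_argument("--topic", required=True, help="Kafka topic to produce to")
    parser.add_argument("--bootstrap-servers", required=True, help="Kafka broker address")
    parser.add_argument("--schema-registry", required=True, help="Schema Registry URL")
    parser.add_argument("--log-file", required=True, help="Path of the producer log file")
    return parser


def client_configs(args: argparse.Namespace):
    """Translate parsed flags into client configuration dictionaries."""
    producer_conf = {"bootstrap.servers": args.bootstrap_servers}
    registry_conf = {"url": args.schema_registry}
    return producer_conf, registry_conf


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(client_configs(parsed))
```

The two dictionaries would then be handed to whichever Kafka and Schema Registry client library the producer uses.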

Consumer

Refer to the Hudi DeltaStreamer documentation for configuration options.

  • Install Spark
  • Update the Hudi config and Kafka topic settings in kafka-source.properties
  • Download Hudi utilities bundle and set path in hudi-delta-streamer.sh
  • Start: delta-streamer/hudi-delta-streamer.sh <spark-master> <broker-server> <schema-registry-url> delta-streamer/kafka-source.properties <output-path>
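
To make the start step more concrete, the sketch below assembles the kind of spark-submit command such a wrapper script might run. It is an assumption about what hudi-delta-streamer.sh does, not a copy of it: build_command, the bundle jar path, and the table name "crypto" are hypothetical. The class names and options come from Hudi's utilities bundle; newer Hudi releases rename the entry point to HoodieStreamer, and the Schema Registry provider typically expects the full subject version URL rather than the registry base URL.

```python
import shlex

# Entry point shipped in the Hudi utilities bundle (older releases).
DELTASTREAMER_CLASS = "org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer"


def build_command(spark_master, broker, registry_url, props_path, output_path,
                  bundle_jar="hudi-utilities-bundle.jar",  # hypothetical jar path
                  table_name="crypto"):                    # hypothetical table name
    """Return the spark-submit argument list for a Kafka-to-Hudi ingestion job."""
    return [
        "spark-submit",
        "--master", spark_master,
        "--class", DELTASTREAMER_CLASS,
        bundle_jar,
        "--props", props_path,
        "--source-class", "org.apache.hudi.utilities.sources.AvroKafkaSource",
        "--schemaprovider-class", "org.apache.hudi.utilities.schema.SchemaRegistryProvider",
        "--target-base-path", output_path,
        "--target-table", table_name,
        "--table-type", "COPY_ON_WRITE",
        # Override values from the properties file with the script arguments.
        "--hoodie-conf", f"bootstrap.servers={broker}",
        "--hoodie-conf", f"hoodie.deltastreamer.schemaprovider.registry.url={registry_url}",
    ]


if __name__ == "__main__":
    cmd = build_command("local[2]", "localhost:9092", "http://localhost:8081",
                        "delta-streamer/kafka-source.properties", "/tmp/hudi")
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so values containing spaces do not need manual quoting.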

Docker Setup

The Docker setup runs the following services:

  • Kafka
  • Schema Registry
  • Zookeeper
  • Producer
  • Consumer (Hudi DeltaStreamer)

Steps:

  • Clone the repository
  • Run: cd docker
  • Start services: docker-compose up
