Skip to content

edwardoboh/kafka-streams-aggregator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kafka Streams Word Count Application

A Spring Boot application that demonstrates real-time stream processing using Apache Kafka Streams to count word occurrences from incoming sentences.

Overview

This application reads sentences from a Kafka topic, splits them into individual words, counts the occurrences of each word, and outputs the results to another Kafka topic.

Architecture

[sentence topic] → [Kafka Streams App] → [word topic]
     "hello world"      word counting        "hello : 1"
     "hello again"                          "world : 1" 
                                           "hello : 2"
                                           "again : 1"

Features

  • Real-time processing: Processes messages as they arrive
  • Word counting: Splits sentences and maintains running counts of each word
  • Fault tolerance: Uses Kafka Streams' built-in reliability features
  • Scalable: Can be deployed in multiple instances for horizontal scaling

Prerequisites

  • Java 17+
  • Docker and Docker Compose
  • Maven

Quick Start

1. Start Kafka Infrastructure

docker-compose up -d

2. Run the Application

mvn spring-boot:run

3. Create Topics (if not auto-created)

# Create input topic
docker-compose exec kafka kafka-topics --bootstrap-server localhost:9092 --create --topic sentence --partitions 1 --replication-factor 1

# Create output topic  
docker-compose exec kafka kafka-topics --bootstrap-server localhost:9092 --create --topic word --partitions 1 --replication-factor 1

4. Test the Application

Send sentences:

docker-compose exec kafka kafka-console-producer --bootstrap-server localhost:9092 --topic sentence

View word counts:

docker-compose exec kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic word --from-beginning

Configuration

Key application properties in application.yaml:

spring:
  kafka:
    bootstrap-servers: localhost:9092
    streams:
      application-id: word-count-v1
      properties:
        allow.auto.create.topics: true
        processing.guarantee: at_least_once
        replication.factor: 1

How It Works

  1. Input: Receives sentences from the sentence topic
  2. Processing:
    • Splits sentences into individual words using whitespace
    • Groups words by their value
    • Maintains a running count of each word occurrence
  3. Output: Sends word counts as "word : count" format to the word topic

Docker Compose Services

  • Zookeeper: Kafka cluster coordination
  • Kafka: Message broker
  • Topic Initializer: Creates required topics on startup

Development Notes

  • Uses Kafka Streams DSL for stream processing
  • Implements stateful operations (grouping and counting)
  • Handles internal topic creation for state management
  • Configured for local development with single-node Kafka

Troubleshooting

Application won't start:

  • Ensure Kafka is running: docker-compose ps
  • Check topic creation: docker-compose logs init-topics

No messages in output topic:

  • Verify input messages: Check sentence topic has messages
  • Check application logs for processing errors
  • Ensure both topics exist and are accessible

Architecture Decisions

  • CommandLineRunner approach: Manual Kafka Streams configuration for learning purposes
  • String serialization: Simple output format for easy debugging
  • Single partition: Simplified setup for development

Next Steps

  • Add error handling and dead letter queues
  • Implement windowed counting for time-based analysis
  • Add metrics and monitoring
  • Configure for production deployment

About

Dynamic grouping and counting of messages with KStreams and KTable

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published