A Spring Boot application that demonstrates real-time stream processing using Apache Kafka Streams to count word occurrences from incoming sentences.
This application reads sentences from a Kafka topic, splits them into individual words, counts the occurrences of each word, and outputs the results to another Kafka topic.
```
[sentence topic] → [Kafka Streams App] → [word topic]

"hello world"         word counting         "hello : 1"
"hello again"                               "world : 1"
                                            "hello : 2"
                                            "again : 1"
```
- Real-time processing: Processes messages as they arrive
- Word counting: Splits sentences and maintains running counts of each word
- Fault tolerance: Uses Kafka Streams' built-in reliability features
- Scalable: Can be deployed in multiple instances for horizontal scaling
- Java 17+
- Docker and Docker Compose
- Maven
Start Kafka with Docker Compose:

```bash
docker-compose up -d
```

Run the application:

```bash
mvn spring-boot:run
```
```bash
# Create input topic
docker-compose exec kafka kafka-topics --bootstrap-server localhost:9092 --create --topic sentence --partitions 1 --replication-factor 1

# Create output topic
docker-compose exec kafka kafka-topics --bootstrap-server localhost:9092 --create --topic word --partitions 1 --replication-factor 1
```
Send sentences:

```bash
docker-compose exec kafka kafka-console-producer --bootstrap-server localhost:9092 --topic sentence
```

View word counts:

```bash
docker-compose exec kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic word --from-beginning
```
Key application properties in `application.yaml`:

```yaml
spring:
  kafka:
    bootstrap-servers: localhost:9092
    streams:
      application-id: word-count-v1
      properties:
        allow.auto.create.topics: true
        processing.guarantee: at_least_once
        replication.factor: 1
```
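Spring Boot flattens these settings into the Kafka Streams client configuration. A sketch of the equivalent configuration built by hand with raw Kafka Streams property keys (values mirror the YAML above):

```java
import java.util.Properties;

public class StreamsProps {
    // Builds roughly the same configuration the YAML produces,
    // using the raw Kafka Streams property names.
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "word-count-v1");     // spring.kafka.streams.application-id
        props.put("bootstrap.servers", "localhost:9092"); // spring.kafka.bootstrap-servers
        props.put("allow.auto.create.topics", "true");
        props.put("processing.guarantee", "at_least_once");
        props.put("replication.factor", "1");
        return props;
    }
}
```

Building the `Properties` object manually like this is what the README's "CommandLineRunner approach" refers to: the configuration is assembled explicitly rather than relegated entirely to Spring's auto-configuration.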
- Input: Receives sentences from the `sentence` topic
- Processing:
  - Splits sentences into individual words using whitespace
  - Groups words by their value
  - Maintains a running count of each word occurrence
- Output: Sends word counts in `"word : count"` format to the `word` topic
- Zookeeper: Kafka cluster coordination
- Kafka: Message broker
- Topic Initializer: Creates required topics on startup
- Uses Kafka Streams DSL for stream processing
- Implements stateful operations (grouping and counting)
- Handles internal topic creation for state management
- Configured for local development with single-node Kafka
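The DSL pipeline described above might look roughly like the following. This is a sketch assuming the `sentence` and `word` topic names and String serdes throughout, not the project's actual source:

```java
import java.util.Arrays;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountTopology {
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("sentence", Consumed.with(Serdes.String(), Serdes.String()))
               // Split each sentence into words on whitespace
               .flatMapValues(sentence -> Arrays.asList(sentence.split("\\s+")))
               // Re-key the stream by the word itself (triggers a repartition)
               .groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
               // Stateful running count, backed by an internal changelog topic
               .count()
               .toStream()
               // Format each update as "word : count", per the README's output format
               .map((word, count) -> KeyValue.pair(word, word + " : " + count))
               .to("word", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}
```

The `groupBy` and `count` steps are what create the internal repartition and changelog topics mentioned above; Kafka Streams manages those automatically.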
Application won't start:

- Ensure Kafka is running: `docker-compose ps`
- Check topic creation: `docker-compose logs init-topics`
No messages in output topic:

- Verify input messages: check that the `sentence` topic has messages
- Check application logs for processing errors
- Ensure both topics exist and are accessible
- CommandLineRunner approach: Manual Kafka Streams configuration for learning purposes
- String serialization: Simple output format for easy debugging
- Single partition: Simplified setup for development
- Add error handling and dead letter queues
- Implement windowed counting for time-based analysis
- Add metrics and monitoring
- Configure for production deployment
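For the windowed-counting improvement, the grouped stream could be wrapped in a time window before counting. A hedged sketch (`TimeWindows.ofSizeWithNoGrace` requires Kafka Streams 3.0+; topic names and serdes as assumed earlier):

```java
import java.time.Duration;
import java.util.Arrays;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowedWordCount {
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("sentence", Consumed.with(Serdes.String(), Serdes.String()))
               .flatMapValues(s -> Arrays.asList(s.split("\\s+")))
               .groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
               // Count per word within tumbling one-minute windows
               // instead of a single global running count
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
               .count()
               .toStream()
               // Flatten the windowed key back to the plain word,
               // keeping the "word : count" output format
               .map((windowed, count) -> KeyValue.pair(windowed.key(), windowed.key() + " : " + count))
               .to("word", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}
```

With windowing, counts reset at each window boundary, which enables time-based analysis such as "most frequent words per minute".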