This repository has been archived by the owner on Apr 16, 2024. It is now read-only.

scott-mcnulty/simple-pyspark-streaming-example


Simple Pyspark Streaming Example

A simple app for testing Spark Streaming with Kafka.

This PoC assumes that Docker and Docker Compose are already installed on your machine. You will also need Java, Python 3, Spark, and kafkacat (optional but recommended). Missing tools such as kafkacat are usually easiest to install with Homebrew.

Credits

Jake Mason: for creating the model code.
wurstmeister: for the Kafka Docker setup in his kafka-docker repo.


Links

Kafka docker image
Run Kafka using docker
Kafka 0.10.0 example producer
Kafkacat git repo
Kafkacat confluence
Spark streaming + Kafka integration guide
Kafka-python

Playbook

After cloning this repo, clone the repo below inside it to get the Kafka docker-compose files:

cd simple-pyspark-streaming-example
git clone https://github.com/wurstmeister/kafka-docker.git

Single Node Kafka Cluster

In kafka-docker/docker-compose-single-broker.yml, set the KAFKA_ADVERTISED_HOST_NAME environment variable to localhost.
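If you prefer to script that edit instead of making it by hand, here is a minimal Python sketch (not part of this repo). It assumes the variable appears in the compose file as a mapping entry of the form KAFKA_ADVERTISED_HOST_NAME: <value>; the helper name set_advertised_host is hypothetical.

```python
import re
from pathlib import Path

# Assumption: the variable is written as a YAML mapping line such as
# "KAFKA_ADVERTISED_HOST_NAME: 192.168.99.100". Group 1 keeps the indentation
# and key so only the value is replaced.
_HOST_LINE = re.compile(r"^([ \t]*KAFKA_ADVERTISED_HOST_NAME:[ \t]*).*$", re.MULTILINE)


def set_advertised_host(compose_text: str, host: str = "localhost") -> str:
    """Return compose_text with KAFKA_ADVERTISED_HOST_NAME set to host."""
    new_text, count = _HOST_LINE.subn(lambda m: m.group(1) + host, compose_text)
    if count == 0:
        raise ValueError("KAFKA_ADVERTISED_HOST_NAME not found")
    return new_text


if __name__ == "__main__":
    path = Path("kafka-docker/docker-compose-single-broker.yml")
    path.write_text(set_advertised_host(path.read_text()))
    print(f"updated {path}")
```

Run it from the root of this repo so the relative path resolves; other environment entries are left untouched.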

Start a single-node cluster with its broker at localhost:9092:

docker-compose -f kafka-docker/docker-compose-single-broker.yml up -d
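Because the containers start detached (-d), the broker may need a few seconds before it accepts connections. The sketch below uses only the Python standard library to poll localhost:9092 until the port opens. wait_for_port is a hypothetical helper, and an open port only shows that something is listening, not that Kafka is fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves something is listening on the
            # port; it does not confirm the Kafka broker is fully started.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    up = wait_for_port("localhost", 9092)
    print("broker port is open" if up else "nothing listening on localhost:9092")
```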

To verify that the cluster is running, use a tool such as kafkacat to produce to and consume from a topic.

In a new terminal, use kafkacat to connect a consumer to the broker on topic test:

kafkacat -b localhost:9092 -C -t test

Add -d broker for debugging:

kafkacat -d broker -b localhost:9092 -C -t test

In another new terminal, use kafkacat to connect a producer to the broker on topic test:

kafkacat -b localhost:9092 -P -t test

Type a message into the producer terminal and press Enter; the message should appear in the kafkacat consumer's terminal.


Multi Node Kafka Cluster

TODO


Words Producer and Consumer

Link to readme
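The repo's own producer and consumer are described in the linked README. Purely as an illustration, here is a hypothetical words producer built on kafka-python (listed under Links). It assumes the broker at localhost:9092 and the topic test; the function names and the one-message-per-word behaviour are assumptions, not the repo's code.

```python
import sys
from typing import Iterable, Iterator

# Assumptions: the single-node broker and the "test" topic used above.
BOOTSTRAP_SERVERS = "localhost:9092"
TOPIC = "test"


def words_to_messages(lines: Iterable[str]) -> Iterator[bytes]:
    """Split each line on whitespace and yield every word as UTF-8 bytes."""
    for line in lines:
        for word in line.split():
            yield word.encode("utf-8")


def produce(lines: Iterable[str], topic: str = TOPIC) -> int:
    """Send one Kafka message per word and return how many were sent."""
    # Imported lazily so words_to_messages works without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP_SERVERS)
    sent = 0
    for message in words_to_messages(lines):
        producer.send(topic, message)
        sent += 1
    producer.flush()  # block until all buffered messages are delivered
    producer.close()
    return sent


if __name__ == "__main__":
    print(f"sent {produce(sys.stdin)} messages")
```

After pip install kafka-python, something like echo "hello spark streaming" | python3 words_producer.py (a hypothetical file name) should make three messages show up in the kafkacat consumer.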


Spark Streaming Application

Link to readme
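As a hedged sketch of the same idea, the program below reads topic test from localhost:9092 with Spark Structured Streaming and prints running word counts to the console. It is not the repo's application, which may use the older DStream API covered by the Spark Streaming + Kafka integration guide; the function names are hypothetical.

```python
"""Sketch: count words arriving on a Kafka topic with Spark Structured Streaming."""


def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Options for Spark's "kafka" source (needs the spark-sql-kafka package)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",  # only read messages produced after start-up
    }


def main() -> None:
    # Imported here so the options helper can be used without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, split

    spark = SparkSession.builder.appName("kafka-word-count-sketch").getOrCreate()

    lines = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options("localhost:9092", "test"))
        .load()
        # Kafka record values arrive as binary; cast them to strings.
        .select(col("value").cast("string").alias("line"))
    )
    counts = (
        lines.select(explode(split(col("line"), r"\s+")).alias("word"))
        .where(col("word") != "")  # drop empties from leading whitespace
        .groupBy("word")
        .count()
    )

    # "complete" mode re-prints the full count table after every micro-batch.
    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()


if __name__ == "__main__":
    main()
```

Submit it with the Kafka connector on the classpath, for example spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<your Spark version> kafka_word_count.py (a hypothetical file name); the Scala suffix and version must match your Spark installation.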

