DT8034 Project 2019

Image analysis using Apache Spark & Apache Kafka.

Dependencies can be found at the end of this README file.

SPARK VERSION: [2.3.2]

Kafka Config

Currently running on a single node.

Kafka version: [2.1.1]

IP: ~~34.90.222.198~~

PORT: 9092 (ZooKeeper port 2181)

Installation

Running locally:

To run locally, set processed.dir = output/ in video-stream-processor/stream-processor-prop.cfg, then submit the job with spark-submit:

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.2 streamProcessor.py
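As a rough illustration of how the processed.dir setting might be read at startup, here is a minimal sketch using Python's standard configparser. The layout of stream-processor-prop.cfg is not shown in this README, so the function name `read_processed_dir` and the fallback section name `default` are assumptions, not the project's actual code.

```python
import configparser


def read_processed_dir(cfg_text, section="default"):
    """Return the processed.dir value from the text of a .cfg file."""
    parser = configparser.ConfigParser()
    # Assumption: the file may be a flat "key = value" list. configparser
    # needs a section header, so a hypothetical one is added if none exists.
    if not cfg_text.lstrip().startswith("["):
        cfg_text = "[{}]\n{}".format(section, cfg_text)
    parser.read_string(cfg_text)
    # Search every section so the sketch does not depend on a section name.
    for name in parser.sections():
        if parser.has_option(name, "processed.dir"):
            return parser.get(name, "processed.dir")
    raise KeyError("processed.dir is not set")


if __name__ == "__main__":
    print(read_processed_dir("processed.dir = output/\n"))
```

In practice the text would come from opening stream-processor-prop.cfg and passing its contents to the function.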

Running in cloud:

Create cluster

gcloud dataproc clusters create my-cluster \
    --image-version 1.3 \
    --metadata 'MINICONDA_VARIANT=3' \
    --metadata 'MINICONDA_VERSION=latest' \
    --metadata 'CONDA_PACKAGES=opencv=3.4.2' \
    --metadata 'PIP_PACKAGES=pyspark==2.3.2 kafka-python==1.4.6 google-cloud-storage==1.15.0' \
    --initialization-actions \
    gs://dataproc-initialization-actions/conda/bootstrap-conda.sh,gs://dataproc-initialization-actions/conda/install-conda-env.sh

Upload files

The following files need to be uploaded to Google Cloud:

faceDetector.py
stream-processor-prop.cfg
haarcascade_frontalface_default.xml
streamProcessor.py

Create a bucket and set processed.dir=gs://[YOUR-BUCKET-NAME]/output/ in stream-processor-prop.cfg.
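The same processed.dir key holds a local folder when running locally and a gs:// URI in the cloud. One possible way to tell the two apart is sketched below with the standard library only. The helper name `split_output_location` is hypothetical, and how streamProcessor.py actually handles the value is not described in this README.

```python
from urllib.parse import urlparse


def split_output_location(processed_dir):
    """Split processed.dir into (bucket, path).

    For a gs:// URI the bucket name and the object prefix are returned.
    For anything else the bucket is None and the value is kept as a local path.
    """
    parsed = urlparse(processed_dir)
    if parsed.scheme == "gs":
        # netloc is the bucket name; the path keeps its trailing slash.
        return parsed.netloc, parsed.path.lstrip("/")
    return None, processed_dir


if __name__ == "__main__":
    print(split_output_location("gs://my-bucket/output/"))
    print(split_output_location("output/"))
```

A bucket name and object prefix are the two pieces a storage client such as google-cloud-storage needs to address an output object.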

Run in cloud

gcloud dataproc jobs submit pyspark \
    --py-files faceDetector.py,stream-processor-prop.cfg,haarcascade_frontalface_default.xml \
    streamProcessor.py \
    --cluster=my-cluster \
    --properties spark.jars.packages=org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.2

video-stream-collector

This component collects video frames from a file or camera and pushes them to the Kafka endpoint. It serves as the producer.
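A minimal sketch of what the producer side could look like is given below. The message layout (JSON with `cameraId`, `timestamp`, and base64-encoded `data` fields) and the function names are assumptions for illustration; the collector's real message format is not documented here. The `producer` argument is any object with `send(topic, value)` and `flush()` methods, which matches kafka-python's `KafkaProducer`.

```python
import base64
import json
import time


def build_message(camera_id, frame_bytes, timestamp=None):
    """Serialize one encoded frame into a JSON message (hypothetical layout)."""
    payload = {
        "cameraId": camera_id,
        "timestamp": time.time() if timestamp is None else timestamp,
        # Frame bytes are base64-encoded so they survive JSON serialization.
        "data": base64.b64encode(frame_bytes).decode("ascii"),
    }
    return json.dumps(payload).encode("utf-8")


def publish_frames(producer, topic, camera_id, frames):
    """Send every frame to the topic and return the number of messages sent."""
    sent = 0
    for frame_bytes in frames:
        producer.send(topic, build_message(camera_id, frame_bytes))
        sent += 1
    # Block until buffered messages have been handed to the broker.
    producer.flush()
    return sent
```

With kafka-python, the producer would be created as `KafkaProducer(bootstrap_servers="<broker-ip>:9092")`, and each frame could be turned into bytes with `cv2.imencode(".jpg", frame)[1].tobytes()`.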

video-stream-processor

This component processes the data and serves as the consumer. It runs as a Spark application, subscribes to the Kafka topic, processes the data in small batches, and writes the output to the bucket.
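The consumer flow described above might be wired up roughly as follows. This is a sketch under stated assumptions, not the contents of streamProcessor.py: messages are assumed to be JSON with `cameraId`, `timestamp`, and base64 `data` fields, the `detect` callback stands in for whatever faceDetector.py provides, and writing the frames to processed.dir is left out. The Spark calls (`StreamingContext`, `KafkaUtils.createDirectStream`, `foreachRDD`) are those of Spark 2.3 with the spark-streaming-kafka-0-8 package.

```python
import base64
import json
import posixpath


def decode_record(value):
    """Parse one Kafka message value (hypothetical JSON layout)."""
    msg = json.loads(value)
    return msg["cameraId"], msg["timestamp"], base64.b64decode(msg["data"])


def output_path(processed_dir, camera_id, timestamp):
    """Build an output name under processed.dir (local folder or gs:// URI)."""
    name = "{}-{}.jpg".format(camera_id, int(timestamp * 1000))
    return posixpath.join(processed_dir, name)


def process_batch(records, processed_dir, detect=lambda frame: frame):
    """Run the detector on every record and pair each result with its path."""
    results = []
    for value in records:
        camera_id, timestamp, frame = decode_record(value)
        path = output_path(processed_dir, camera_id, timestamp)
        results.append((path, detect(frame)))
    return results


def run_streaming(broker, topic, processed_dir, batch_seconds=5):
    """Subscribe to the topic and process it in micro-batches."""
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="stream-processor-sketch")
    ssc = StreamingContext(sc, batch_seconds)
    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": broker})
    # Each record is a (key, value) pair; only the value is used here.
    stream.map(lambda kv: kv[1]).foreachRDD(
        lambda rdd: rdd.foreachPartition(
            lambda part: process_batch(part, processed_dir)))
    ssc.start()
    ssc.awaitTermination()
```

Keeping the per-record logic in plain functions means it can be exercised without a running cluster, while `run_streaming` only handles the Spark and Kafka wiring.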

video-stream-viewer

This component displays the processed frames, either directly from Spark or via Kafka. OpenCV is currently used to display the frames; this should be replaced with an option that lets the user select which stream to watch.

utils

This component contains only a few simple scripts for consuming and producing messages on the Kafka broker.

Dependencies

On machine:
Java 8.1
Python 3.7
In Python environment:
opencv-python=4.1.0
kafka-python=1.4.6
pyspark=2.3.2
numpy=1.16.3

jacobharsten/DT8034_PROJECT
