Presenting three ways to run Spark over containers, this project is recommended to those who want to explore Big Data outside of a Hadoop cluster.


spark-on-k8s

![Screen-Shot-2020-07-27-at-05-03-34.png](Screen-Shot-2020-07-27-at-05-03-34.png)

In this repo, I'd like to show you a couple of different ways to work with Spark outside of a Hadoop cluster. Kubernetes clusters are becoming more and more common in companies of all sizes, and using their power to run Spark workloads is attractive. With this in mind, I'd like to invite you to join me on a learning journey in search of broader options for doing Big Data.

You're about to face three ways of running Spark over containers:

- spark-submit straight against a Kubernetes cluster
- the GCP spark-on-k8s-operator
- a Jupyter PySpark notebook running on Docker
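As a first taste, submitting a job straight to a Kubernetes API server looks roughly like the sketch below. The API server address, container image, and service account name are placeholders, not values taken from this repo; the conf keys are the standard ones from Spark's Kubernetes scheduler.

```shell
# Submit the bundled SparkPi example in cluster mode to Kubernetes.
# <k8s-apiserver-host>, the image name, and the service account are
# placeholders -- point them at your own cluster and registry.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-registry>/spark:3.0.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
```

The `local://` scheme tells Spark the jar already lives inside the container image rather than on the submitting machine.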

I truly hope you have fun with this experience and come out more confident about stepping outside your traditional Hadoop cluster :)

How to set everything up

Click HERE to follow the step-by-step :)
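For the Docker Jupyter PySpark mode, the quickest way to get a notebook with Spark pre-installed is the public Jupyter Docker Stacks image shown below. This repo may build its own image instead, so treat this as an illustrative shortcut rather than the repo's exact setup.

```shell
# Start a Jupyter notebook server with PySpark pre-installed.
# jupyter/pyspark-notebook is the public Jupyter Docker Stacks image;
# the step-by-step above may use a custom image instead.
docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
```

Once it starts, open the tokenized URL printed in the container logs to reach the notebook UI on port 8888.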

So far...

| Mode | Status |
| --- | --- |
| K8s: Spark-Submit | OK |
| GCP/spark-on-k8s-operator | OK (currently in Beta) |
| Docker: Jupyter PySpark | OK |
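With the GCP spark-on-k8s-operator, jobs are declared as a `SparkApplication` custom resource instead of being submitted by hand. A minimal sketch is below; the image, jar path, and resource sizes are illustrative values, not taken from this repo, while the `sparkoperator.k8s.io/v1beta2` API group and the spec fields are the operator's own.

```shell
# Declare a SparkPi run as a SparkApplication custom resource.
# Image, namespace, and resource sizes are illustrative placeholders.
kubectl apply -f - <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v3.0.0
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
EOF
```

The operator watches for these resources and runs `spark-submit` for you, so the whole job lifecycle becomes declarative and `kubectl`-friendly.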

Architecture

![Screen-Shot-2020-07-27-at-04-43-52.png](Screen-Shot-2020-07-27-at-04-43-52.png)
