Skip to content

Latest commit

 

History

History
25 lines (18 loc) · 1.5 KB

README.md

File metadata and controls

25 lines (18 loc) · 1.5 KB

Spark on Kubernetes - from Zero to Hero

With Hadoop, a lot of companies have succeeded to handle huge amount of data and create added value for their business. With Spark, data analysing becomes easier and faster. Nowadays, more and more enterprises are trying to migrate also on Cloud based services for several reasons which we will not present here.

Based on Cloud, HDFS of Hadoop plays a much less important role in the ecosystem or whole architecture, because data are stored on Cloud provided deep storage services, such as S3 on AWS, ADLS on Azure and cloud storage on GCP.

So people are trying to bring Spark out of the ecosystem of Hadoop by using other resource managers, such as Apache Mesos or Kubernetes.

In this project, we will try to make Spark work on Kubernetes cluster which is highly searched recently.

Doesn't like other blogs or projects on Github who prepare a all-in-one script, I will try to do it manually step by step in order to show you how to realize it and help you to understand how it works !

We will try to work it out by the following chapters:

For each one (in local PC), I will work on both Spark v2.4.5 and Spark v3.0.0 I suppose you understand the basics of Kubernetes, Docker and Spark