Kubeflow Spark

Orchestrate Spark jobs using Kubeflow, a modern machine learning orchestration framework. Read the related blog post.

Requirements

Run make all to start everything and skip to step 6, or run the steps manually:

./scripts/start-minikube.sh

./scripts/install-kubeflow.sh

./scripts/install-spark-operator.sh

./scripts/add-spark-rbac.sh

./scripts/add-kubeflow-ui-ingress.sh

kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8005:80

python kubeflow_pipeline.py

Navigate to the Pipelines UI and upload the newly created pipeline from the file spark_job_pipeline.yaml.
Trigger a pipeline run. Make sure to set spark-sa as the Service Account for the execution.
Enjoy your orchestrated Spark job execution!
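
The manual steps above can also be driven from a small script. The following is only a minimal sketch, not part of the repository: it runs the five setup scripts and then kubeflow_pipeline.py in the order listed, stopping at the first failure. It makes a few assumptions: that each script exits when it is done, that kubeflow_pipeline.py only writes spark_job_pipeline.yaml and does not need the UI to be reachable, and that the commands are run from the repository root. The kubectl port-forward command is deliberately left out because it blocks until interrupted and is better kept in its own terminal. The names SETUP_SCRIPTS, build_commands, and run_all are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Setup scripts from the steps above, in the order they are listed.
SETUP_SCRIPTS: List[str] = [
    "./scripts/start-minikube.sh",
    "./scripts/install-kubeflow.sh",
    "./scripts/install-spark-operator.sh",
    "./scripts/add-spark-rbac.sh",
    "./scripts/add-kubeflow-ui-ingress.sh",
]

# Generates the pipeline definition file (assumed not to need the UI).
COMPILE_PIPELINE: List[str] = ["python", "kubeflow_pipeline.py"]


def build_commands() -> List[List[str]]:
    """Return the non-blocking commands in execution order."""
    return [[script] for script in SETUP_SCRIPTS] + [COMPILE_PIPELINE]


def run_all(runner: Optional[Callable[[Sequence[str]], object]] = None) -> int:
    """Run every command in order and return how many were run.

    The default runner raises subprocess.CalledProcessError on a
    non-zero exit code, which stops the sequence at the failing step.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd, check=True)
    commands = build_commands()
    for cmd in commands:
        print("running:", " ".join(cmd))
        runner(cmd)
    return len(commands)
```

Calling run_all() with no arguments executes the commands. Passing a custom runner, for example run_all(runner=lambda cmd: None), only prints the sequence, which is a cheap way to check the order before touching the cluster.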
