Add Spark Job #1467 (Merged)
Commits (12):
8c47015  Add a Spark operator to Kubeflow along with integration tests. (holdenk)
dedc3dd  re-run autoformat (holdenk)
5be73cd  default doesn't seem to exist anymore (holdenk)
4a0f832  Debug the env list cause it changed (holdenk)
a1cfbd4  re-run autoformat again (holdenk)
49bff26  Specify the env since env list shows default env is the only env pres… (holdenk)
f8db0e8  Remove debug env list since the operator now works (holdenk)
39b503c  autoformat and indent default (holdenk)
6080dd1  Address CR feedback: remove deploymentscope and just use clusterrole, … (holdenk)
8a688d2  Take out the spark job from the workflows in components test we just… (holdenk)
028184d  Remove namespace as a param and just use the env. (holdenk)
0218bd5  Fix end of line on namespace from ; to , (holdenk)
@@ -0,0 +1,3 @@
A very early attempt at allowing Apache Spark to be used with Kubeflow.
Starts a container to run the driver program in, and the rest is up to the Spark on K8s integration.
Based on https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
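
For context, a rough sketch of how the operator prototype might be evaluated follows. This is illustrative only: the file name spark.libsonnet, the image tag, and the "kubeflow" namespace are assumptions, not taken from this PR.

// Minimal sketch, assuming the prototype library below is saved as
// spark.libsonnet and that env carries a namespace the way ksonnet
// environments do. Image tag and namespace are illustrative.
local spark = import "spark.libsonnet";
local params = {
  name: "spark-operator",
  image: "gcr.io/spark-operator/spark-operator:v2.3.1-v1alpha1",
  sparkVersion: "2.3.1",
  // Job-side params (mainClass, jobArguments, ...) are omitted here:
  // jsonnet is lazy and the operator parts never reference them.
};

// Evaluates to the operator's ServiceAccount, ClusterRole,
// ClusterRoleBinding and Deployment as a list of Kubernetes objects.
spark.all(params, params.name, { namespace: "kubeflow" })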
@@ -0,0 +1,346 @@
{
  // Define the various prototypes you want to support.
  // Each prototype should be a list of different parts that together
  // provide a useful function such as serving a TensorFlow or PyTorch model.
  all(params, name, env):: [
    $.parts(params, name, env).operatorServiceAccount,
    $.parts(params, name, env).operatorClusterRole(),
    $.parts(params, name, env).operatorClusterRoleBinding(),
    $.parts(params, name, env).deployment,
  ],

  sparkJob(params, name, env):: [
    $.parts(params, name, env).jobServiceAccount,
    $.parts(params, name, env).jobClusterRole,
    $.parts(params, name, env).jobClusterRoleBinding,
    $.parts(params, name, env).sparkJob,
  ],

  // Parts should be a dictionary containing jsonnet representations of the various
  // K8s resources used to construct the prototypes listed above.
  parts(params, name, env):: {
    // All ksonnet environments are associated with a namespace and we
    // generally want to use that namespace for a component.
    // However, in some cases an application may use multiple namespaces in which
    // case the namespace for a particular component will be a parameter.
    local namespace = env.namespace,
    local mainClass = if params.mainClass == "null" then "" else params.mainClass,
    local jobArguments = if params.jobArguments == "null" then [] else std.split(params.jobArguments, ","),
    local sparkVersion = params.sparkVersion,

    jobServiceAccount:: {
      apiVersion: "v1",
      kind: "ServiceAccount",
      metadata: {
        name: name,
        namespace: namespace,
      },
    },

    jobClusterRole:: {
      apiVersion: "rbac.authorization.k8s.io/v1beta1",
      kind: "Role",
      metadata: {
        namespace: namespace,
        name: name,
      },
      rules: [
        {
          apiGroups: [
            "",
          ],
          resources: [
            "pods",
          ],
          verbs: [
            "*",
          ],
        },
        {
          apiGroups: [
            "",
          ],
          resources: [
            "services",
          ],
          verbs: [
            "*",
          ],
        },
      ],
    },

    jobClusterRoleBinding:: {
      apiVersion: "rbac.authorization.k8s.io/v1beta1",
      kind: "RoleBinding",
      metadata: {
        name: name,
        namespace: namespace,
      },
      subjects: [
        {
          kind: "ServiceAccount",
          name: name,
          namespace: namespace,
        },
      ],
      roleRef: {
        kind: "Role",
        name: name,
        apiGroup: "rbac.authorization.k8s.io",
      },
    },

    operatorServiceAccount:: {
      apiVersion: "v1",
      kind: "ServiceAccount",
      metadata: {
        name: name,
        namespace: namespace,
      },
    },

    operatorClusterRole():: {
      local roleType = "ClusterRole",
      kind: roleType,
      apiVersion: "rbac.authorization.k8s.io/v1beta1",
      metadata: {
        labels: {
          app: "spark-operator",
        },
        name: name,
      },
      rules: [
        {
          apiGroups: [
            "",
          ],
          resources: [
            "pods",
          ],
          verbs: [
            "*",
          ],
        },
        {
          apiGroups: [
            "",
          ],
          resources: [
            "services",
            "configmaps",
          ],
          verbs: [
            "create",
            "get",
            "delete",
          ],
        },
        {
          apiGroups: [
            "",
          ],
          resources: [
            "nodes",
          ],
          verbs: [
            "get",
          ],
        },
        {
          apiGroups: [
            "",
          ],
          resources: [
            "events",
          ],
          verbs: [
            "create",
            "update",
            "patch",
          ],
        },
        {
          apiGroups: [
            "apiextensions.k8s.io",
          ],
          resources: [
            "customresourcedefinitions",
          ],
          verbs: [
            "create",
            "get",
            "update",
            "delete",
          ],
        },
        {
          apiGroups: [
            "admissionregistration.k8s.io",
          ],
          resources: [
            "mutatingwebhookconfigurations",
          ],
          verbs: [
            "create",
            "get",
            "update",
            "delete",
          ],
        },
        {
          apiGroups: [
            "sparkoperator.k8s.io",
          ],
          resources: [
            "sparkapplications",
            "scheduledsparkapplications",
          ],
          verbs: [
            "*",
          ],
        },
      ],
    },

    operatorClusterRoleBinding():: {
      apiVersion: "rbac.authorization.k8s.io/v1beta1",
      local bindingType = "ClusterRoleBinding",
      local roleType = "ClusterRole",
      kind: bindingType,
      metadata: {
        name: name,
      },
      subjects: [
        {
          kind: "ServiceAccount",
          name: name,
          namespace: namespace,
        },
      ],
      roleRef: {
        kind: roleType,
        name: name,
        apiGroup: "rbac.authorization.k8s.io",
      },
    },

    deployment:: {
      apiVersion: "apps/v1beta1",
      kind: "Deployment",
      metadata: {
        name: name,
        namespace: namespace,
        labels: {
          "app.kubernetes.io/name": name,
          "app.kubernetes.io/version": sparkVersion,
        },
      },
      spec: {
        replicas: 1,
        selector: {
          matchLabels: {
            "app.kubernetes.io/name": name,
            "app.kubernetes.io/version": sparkVersion,
          },
        },
        strategy: {
          type: "Recreate",
        },
        template: {
          metadata: {
            annotations: {
              "prometheus.io/scrape": "true",
              "prometheus.io/port": "10254",
              "prometheus.io/path": "/metrics",
            },
            labels: {
              "app.kubernetes.io/name": name,
              "app.kubernetes.io/version": sparkVersion,
              name: name,
            },
            initializers: {
              pending: [],
            },
          },
          spec: {
            serviceAccountName: name,
            containers: [
              {
                name: name,
                image: params.image,
                imagePullPolicy: "Always",
                command: [
                  "/usr/bin/spark-operator",
                ],
                ports: [
                  {
                    containerPort: 10254,
                  },
                ],
                args: [
                  "-logtostderr",
                  "-enable-metrics=true",
                  "-metrics-labels=app_type",
                ],
              },
            ],
          },
        },
      },
    },

    // Job specific configuration
    sparkJob:: {
      apiVersion: "sparkoperator.k8s.io/v1alpha1",
      kind: "SparkApplication",
      metadata: {
        name: params.jobName,
        namespace: namespace,
      },
      spec: {
        type: params.type,
        mode: "cluster",
        image: params.image,
        imagePullPolicy: "Always",
        mainClass: mainClass,
        mainApplicationFile: params.applicationResource,
        arguments: jobArguments,
        volumes: [
          {
            name: "test-volume",
            hostPath: {
              path: "/tmp",
              type: "Directory",
            },
          },
        ],
        driver: {
          cores: params.driverCores,
          memory: params.driverMemory,
          labels: {
            version: sparkVersion,
          },
          serviceAccount: params.name,
          volumeMounts: [
            {
              name: "test-volume",
              mountPath: "/tmp",
            },
          ],
        },
        executor: {
          cores: params.executorCores,
          instances: params.numExecutors,
          memory: params.executorMemory,
          labels: {
            version: params.sparkVersion,
          },
          volumeMounts: [
            {
              name: "test-volume",
              mountPath: "/tmp",
            },
          ],
        },
        restartPolicy: "Never",
      },
    },
  },
}
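
As a rough sketch, a parameter set for the sparkJob(...) entry point above might look like the following; every value here (the SparkPi class, jar path, image tag, resource sizes) is an assumption drawn from typical spark-on-k8s-operator examples, not from this PR.

// Illustrative only: hypothetical params for running SparkPi via the
// sparkJob(...) entry point. None of these values come from this PR.
local spark = import "spark.libsonnet";
local params = {
  name: "spark-pi",
  jobName: "spark-pi",
  type: "Scala",
  image: "gcr.io/spark-operator/spark:v2.3.1",
  mainClass: "org.apache.spark.examples.SparkPi",
  applicationResource: "local:///opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar",
  jobArguments: "null",  // the prototype treats the string "null" as "no arguments"
  sparkVersion: "2.3.1",
  driverCores: 1,
  driverMemory: "512m",
  executorCores: 1,
  numExecutors: 1,
  executorMemory: "512m",
};

// Evaluates to the job's ServiceAccount, Role, RoleBinding and the
// SparkApplication custom resource.
spark.sparkJob(params, params.name, { namespace: "kubeflow" })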
@@ -0,0 +1,27 @@
{
  "name": "spark",
  "apiVersion": "0.0.1",
  "kind": "ksonnet.io/parts",
  "description": "Holden's awesome Spark Job prototype based on https://github.com/GoogleCloudPlatform/spark-on-k8s-operator\n",
  "author": "kubeflow-team <kubeflow-discuss@googlegroups.com>",
  "contributors": [
    {
      "name": "Holden Karau",
      "email": "holden@pigscanfly.ca"
    }
  ],
  "repository": {
    "type": "git",
    "url": "https://github.com/kubeflow/kubeflow"
  },
  "bugs": {
    "url": "https://github.com/kubeflow/kubeflow/issues"
  },
  "keywords": [
    "kubernetes",
    "kubeflow",
    "machine learning",
    "apache spark"
  ],
  "license": "Apache 2.0"
}
Review comment:

Good girl...

While `Node`s are technically read-only, this is still decent practice as the object is so weird in general. Do we need specific `Node` information? If so, what? I am wondering if this is why we are using `ClusterRole` instead of a `Role`. Just a nit/question, non-blocking because IDGAF.
Reply:

So ClusterRole versus Role is now user-configurable: if folks don't need to run jobs outside of the namespace where they created the operator, we'll just do a Role, but if they want the operator and jobs to sit in different namespaces, we use a ClusterRole.
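
To make that concrete, a minimal sketch of how such a toggle could be expressed in jsonnet follows; the parameter name deploymentScope and its "cluster"/"namespace" values are assumptions, not necessarily what the PR merged.

// Illustrative sketch, not the merged code: derive Role vs ClusterRole
// from a scope parameter. `deploymentScope` and its values are assumed names.
local operatorRole(name, namespace, rules, deploymentScope) =
  local clusterScoped = deploymentScope == "cluster";
  {
    apiVersion: "rbac.authorization.k8s.io/v1beta1",
    kind: if clusterScoped then "ClusterRole" else "Role",
    // A Role is namespaced; a ClusterRole is not.
    metadata: {
      labels: { app: "spark-operator" },
      name: name,
    } + (if clusterScoped then {} else { namespace: namespace }),
    rules: rules,
  };

// Example: a namespace-scoped operator that may only touch pods.
operatorRole("spark-operator", "kubeflow", [
  { apiGroups: [""], resources: ["pods"], verbs: ["*"] },
], "namespace")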