Support Spark natively in Kubernetes #34377

Closed
foxish opened this Issue Oct 8, 2016 · 41 comments

Comments

@foxish
Member

foxish commented Oct 8, 2016

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.

Concepts

  • Spark Application: A Spark application is the program a user submits for execution on the Spark framework.
  • Spark Driver: Runs the “main” program associated with a Spark application.
  • Spark Context: A Spark context is created with a particular configuration by a Spark application, and is the entry point to the Spark execution environment.
  • Spark Executor: One or more processes which listen to the Spark driver and perform individual Spark tasks as instructed by it.
  • Spark Standalone mode: A lightweight cluster manager included in the main Spark project itself, which lets one bring up a Spark cluster, i.e. individual master and worker processes.
  • Spark Client mode: In client mode, the driver is launched on the client machine (outside the cluster).
  • Spark Cluster mode: In cluster mode, the driver is launched onto the cluster; the cluster type is specified at the time the Spark application is submitted.
  • Spark Dynamic Allocation: A configuration flag which can be turned on. Dynamic allocation scales the number of executors based on the presented workload, and requires an "external shuffle service" when used in cluster mode (see the sketch after this list).
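
A hedged illustration of the dynamic allocation flags described in the last item above, using standard Spark configuration properties (the application class and jar are placeholders, and an external shuffle service must be running next to the executors for this to work):

```bash
./bin/spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --class com.example.MyJob my-job.jar   # placeholder class and jar
```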

Problem

Kubernetes currently supports Spark in standalone mode only. There are various efforts within the project dealing with spark-standalone.

Limitations of standalone mode

(figure: standalone mode)

  • Spark workers cannot easily scale elastically with job demands, because the Spark components cannot speak directly to Kubernetes.
  • Since the spark-master and spark-worker are not designed to work with a resource manager, standalone mode does not integrate easily with Kubernetes features (namespaces, rolling updates, etc.).
  • If several users share a standalone cluster, it is not strongly multi-tenant; isolation requires that each user run a separate standalone cluster.
  • Each worker wants to use an entire node, which is not resource efficient. Sharing resources between different clusters as they scale dynamically during the execution of a multi-stage Spark application requires an external manager/monitor.

Native support

(figure: native integration)

It is possible for Spark to support Kubernetes natively, which would obviate the need for standalone deployments of masters and workers.

  • Spark knows its own resource requirements best; if it accessed the Kubernetes API directly to create drivers and executors, that would be more efficient.
  • We want generic resource management tools in or alongside Kubernetes that apply to Spark as well as other application types. We don't want application-specific resource management tools (such as a standalone cluster scaler).
  • Having one resource manager (Spark Standalone) nested inside another resource manager (Kubernetes) causes many subtle issues and bugs when run in production.

Approach

  • Spark provides support for writing integrations with external cluster managers.
  • Writing a cluster manager "plugin" for kubernetes would solve all the above problems.

Proposed flow with native spark integration

  • A Kubernetes cluster already exists, and the user has logged in to it. Their .kube/config has credentials and a current context (namespace) already selected.
    • The namespace has sufficient resource quota to create several pods.
  • spark-submit is used to submit a new Spark application to the k8s cluster. Targeting a k8s cluster is indicated by passing the --master=k8s://default and --deploy-mode=cluster arguments to spark-submit (a hedged end-to-end sketch follows this list).
  • spark-submit creates a driver pod, based on a standard Spark image which the user can override if needed.
  • The driver pod exposes the Spark UI, typically on port 4040, which can be exposed outside the cluster or accessed via the apiserver proxy (the user can set an option to control this).
  • The driver pod then creates executor pods based on the options provided to it.
  • The driver pod authenticates to the apiserver as the namespace's default service account (or the user can select a non-default service account). The user must ensure that this service account has permission to create pods in the namespace.
  • The driver pod should create and post a secret that the executor pods can read and use as a token to authenticate themselves to their driver.
  • The executor pods start up and report back to the driver at the necessary RPC endpoint to indicate that they are ready to process the submitted Spark application.
  • The driver pod starts to issue tasks to the registered executor pods.
  • When processing completes, the logs of the driver pod contain the output.
  • The driver pod cleans up by killing executors, deleting the service endpoint, and finally deleting itself.
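
A minimal end-to-end sketch of this flow, assuming the prototype's k8s://default master URL and a SparkPi example jar baked into the driver image; the flag names, image layout, and RBAC setup are assumptions and may differ from the final integration:

```bash
# Optional: give driver pods a service account that may create pods
# (only needed if RBAC is enabled; names are placeholders).
kubectl create serviceaccount spark-driver
kubectl create rolebinding spark-driver-edit \
  --clusterrole=edit --serviceaccount=default:spark-driver

# Submit the application in cluster mode against the Kubernetes master.
./bin/spark-submit \
  --master k8s://default \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar   # path inside the image (assumed)

# Watch the driver and executor pods come up, then read the output
# from the driver pod's logs.
kubectl get pods
kubectl logs -f <driver-pod-name>

# Reach the Spark UI on port 4040 without exposing it outside the cluster.
kubectl port-forward <driver-pod-name> 4040:4040
```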

Proof of concept

  • Kubernetes support is implemented as a sub-project/cluster-manager plugin within Apache Spark.
  • Spark can be compiled with Kubernetes support.
  • Driver and executor pods are created on a kubernetes cluster.
  • It is possible to execute sample programs (e.g. SparkPi) on the cluster by invoking spark-submit.

Fork of the Spark project: https://github.com/apache-spark-on-k8s/spark/
JIRA: https://issues.apache.org/jira/browse/SPARK-18278

Future work

  • Support for dynamic executor scaling.
    • This requires an external shuffle service, and this part of the design is still under discussion.
  • Support for various command-line options.
  • Support for job IDs, with easy submission, killing, and status checking of jobs through spark-submit.
  • A working spark-shell, by attaching stdin and stdout of the driver pod (see the sketch after this list).
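
A sketch of the spark-shell idea in the last item, assuming an interactive driver pod is already running (the pod name is a placeholder):

```bash
# Attach the local terminal to the driver container's stdin/stdout.
kubectl attach -i -t spark-shell-driver-pod
```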

cc @kubernetes/sig-apps

@smarterclayton
Contributor

smarterclayton commented Oct 9, 2016

Edit: Oops, wrong Will Benton :)

@k82cn
Member

k82cn commented Oct 9, 2016

+1 on "generic resource management tools", I have similar ideas on that when working on kube-mesos.

@hongchaodeng
Member

hongchaodeng commented Oct 10, 2016

Nice work!
@foxish Are you going to implement this? If so, let me know if you need any help on the coding side.

@foxish
Member

foxish commented Oct 10, 2016

@hongchaodeng Yes, the plan is to extend the proof of concept (linked above in the issue) to fill in the gaps and get it merged upstream into the Spark project.

@foxish
Member

foxish commented Oct 12, 2016

A first draft of the minimum checklist of items that need to be completed in order to get this merged upstream. (Please add/remove items if you think a step is unnecessary or something is missing.)

  • spark-submit must generate a unique Spark job ID that can be used to kill or check the status of a job (foxish/spark#3); see the hedged sketch below.
  • support for the Spark distribution to be baked into the Docker image.
  • support for any remaining spark-submit command-line options.
  • support for namespaces to provide quota and isolation between users.
  • Spark code style guide compliance.
  • testing against a non-trivial set of Spark jobs and various command-line options.
  • a Spark Improvement Proposal (SIP) proposing the changes.

cc @tnachen
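
A hedged sketch of the job-ID item above: spark-submit already exposes --status and --kill with a submission ID for standalone and Mesos cluster mode, and the checklist proposes the same pattern for Kubernetes (the master URL and ID format shown here are assumptions):

```bash
# Check on, or kill, a previously submitted job by its ID.
./bin/spark-submit --master k8s://default --status spark-job-0001
./bin/spark-submit --master k8s://default --kill spark-job-0001
```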

@erikerlandson

erikerlandson commented Oct 12, 2016

@foxish although it may not be on a critical path from upstream's POV, I think it is important to have a story around multiple applications sharing and/or contending for resources. For example, by respecting namespaces and their limits, or allowing executors to be re-assigned to new jobs to maintain fair sharing of executor resources (i.e. preemption of jobs on executors), or a combination of those two (or some hypothetical third solution).

Full understanding of this topic is TBD, but I'd propose that dynamic executors, with their requirement of shuffle services on each node, should also be part of the SIP and PR.

@foxish
Member

foxish commented Oct 12, 2016

@erikerlandson I've added the namespaces requirement to our checklist, so as to have a good way to manage Spark jobs and isolate logs. However, I think dynamic allocation, preemption, and so on can be taken care of in a follow-up, so as to not expand the scope of this first PR too much.

@tnachen
Contributor

tnachen commented Oct 12, 2016

Dynamic resource allocation in terms of namespaces would be nice, but it is also a bit use-case specific. Also, executors in the Spark world are not sharable and are launched per job, so we are really talking about machine resources that can either be preempted or moved around based on some external policy. We can worry about those later; I believe having a good usability story is more important, and this is also largely an unsolved problem in Spark anyway.

@erikerlandson

erikerlandson commented Oct 13, 2016

My POV on that is that elasticity and fair resource sharing are both part of a good usability story. However, people may have legitimately differing points of view on that. IMO, it's worth investigating to see if it is feasible as part of an initial design.

@erictune
Member

erictune commented Oct 14, 2016

@davidopp who also likes resource fairness and batch and elasticity.

@erictune
Member

erictune commented Oct 15, 2016

I agree with @erikerlandson that elasticity and fair resource sharing are both part of a good usability story for Spark on Kubernetes, or any type of batch computation on Kubernetes.

I think fairness is something that should happen at the Kubernetes level. So, if I start a Storm job and Tim starts a Spark job, and Jane starts a TensorFlow job, and so on, I think we each expect to get a fair share of the total Kubernetes cluster resources.

We may someday want to build more fairness support into Kubernetes. But today, it already gives us some good primitives to use to control fairness at a Kubernetes cluster level. Those primitives are: QoS, ResourceQuota, pending pods, and node autoscaling. Let's make sure Spark works with these existing Kubernetes primitives. Specific things to make sure work (a hedged kubectl sketch follows this list):

  • verify that Spark jobs can use Burstable CPU
    • e.g. the cluster has 100 cores total, and I start 10 pods, each with CPU request 1 and limit 10. I can use close to the full 100 cores by bursting up to my limit. This should work even if my CPU quota is only 10 CPUs.
  • verify that multiple Spark jobs get fair shares of Burstable CPU
    • e.g. when you run two jobs like in the previous example, they each get a fair share (about 50 cores of usage each). This should just work because of Linux CFS, but let's test it.
  • ensure that the Spark driver is aware of quota
    • e.g. when a user runs a Spark driver and tells it to create more pods than the user has quota for, it should not spam the logs with errors about being unable to make pods. It should give the user a reason for not scaling to the desired executor count (no quota), while still running the Spark job to completion (albeit more slowly).
  • ensure the Spark driver works with pending pods
    • e.g. when I start a Spark job that doesn't fit on the current cluster, because nodes are occupied by other pods, the driver should notice that the pods it creates are pending, surface the reason for not having the desired executor count to the user, and not spam the logs with errors.
  • ensure the Spark driver works with cluster scaling
    • e.g. in the situation in the previous item, with pending pods, if a node is then added to the cluster, the pending pods should start, register with the driver, and pick up work.
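
A hedged kubectl sketch of the quota and Burstable-CPU scenario above, using a single example pod with CPU request 1 and limit 10 (namespace, quota, and image names are placeholders):

```bash
kubectl create namespace spark-burst
kubectl create quota spark-quota --hard=requests.cpu=10,pods=20 -n spark-burst

# One Burstable pod: it counts 1 CPU against the quota but may burst to 10.
kubectl create -n spark-burst -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: burstable-executor-example
spec:
  containers:
  - name: executor
    image: example/spark-executor:latest
    resources:
      requests:
        cpu: "1"
      limits:
        cpu: "10"
EOF

# Quota accounting is against requests, so ten such pods fit a 10-CPU quota.
kubectl describe quota spark-quota -n spark-burst
```
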
@erikerlandson

erikerlandson commented Oct 15, 2016

I am doing some experiments with how this prototype interacts with namespaces, to see what kind of issues and best practices arise from using them as a resource-sharing mechanism.

I generally agree that it's desirable to keep as much advanced scheduling logic as possible out of Spark itself. Spark has a concept of fair share that can be configured, although at least in a standalone cluster it doesn't work very well, because executors are not preempted.

Executors can be preempted by killing them at the k8s cluster level. To make this work well, it's desirable for k8s to be able to identify idle executors and preempt those when possible. Better yet would be the ability to tell a Spark driver to do a graceful "draining" of executors and then either shut them down or re-assign them.

@tnachen
Contributor

tnachen commented Oct 15, 2016

Identifying idle executors is already done by the dynamic allocation feature in Spark itself, so generally you don't need to do anything in k8s. The native k8s integration code will be called from Spark and will then need to shut down those executors. The same applies when the job needs more executors.

@tnachen
Contributor

tnachen commented Oct 15, 2016

@erictune @erikerlandson definitely agree that fair sharing and elasticity are important for usability, but I just want to prioritize doing the basics well first (deploying executors and client jars, also including Python and R, how to get logging, error handling, etc.).

@erictune
Member

erictune commented Oct 17, 2016

@tnachen Agree about prioritization. I think we have several avenues we can pursue for fairness/elasticity, so we don't need to block on it.

@erikerlandson

erikerlandson commented Oct 21, 2016

Modification to use images with Spark preinstalled:
foxish/spark#1

@erikerlandson

erikerlandson commented Oct 22, 2016

I am starting to think it's desirable to support REST-based submission as another option, i.e. creating kube subclasses for RestSubmissionServer and friends. Mesos assumes this mode, and standalone submission defaults to it. It requires the REST server to be spun up, but it also allows other REST clients to be programmed against it.
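
For reference, this is roughly what REST-based submission looks like today against the existing standalone RestSubmissionServer (port 6066); a Kubernetes-backed subclass would presumably accept the same shape of request. Host, jar, class, version, and ID values are placeholders, and the field names follow the standalone protocol rather than anything k8s-specific:

```bash
curl -X POST http://spark-master:6066/v1/submissions/create \
  --header "Content-Type: application/json" \
  --data '{
    "action": "CreateSubmissionRequest",
    "appResource": "hdfs:///jobs/my-job.jar",
    "mainClass": "com.example.MyJob",
    "appArgs": [],
    "clientSparkVersion": "2.0.2",
    "environmentVariables": {},
    "sparkProperties": {
      "spark.app.name": "my-job",
      "spark.master": "spark://spark-master:7077",
      "spark.submit.deployMode": "cluster",
      "spark.jars": "hdfs:///jobs/my-job.jar"
    }
  }'

# Check the status of a submission by its ID.
curl http://spark-master:6066/v1/submissions/status/driver-20161022-0001
```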

@erikerlandson

erikerlandson commented Oct 22, 2016

I believe I've been over-thinking the client jar file staging. The client submission needs it, but executors do not. This commit shows how it can be done without mounting any volumes at all:
apache/spark@04d8edd

Previous volume-based idea, for posterity:

The current prototype uses a pod annotation to stage the user's jar file on the containers. I believe it'd be more efficient to mount it on some kind of persistent volume, which can then be mounted on any pods/containers. The volume could be created when the initial driver pod is initialized, then used by executor pods.

Another benefit of this approach is that it may be more portable. For example, the current prototype actually comes very close to working against an OpenShift cluster; the only thing that fails is that the cluster doesn't recognize the "pod.beta.kubernetes.io/init-containers" annotation, so it never stages the jar file. Moving to volume-based staging would solve this as well.

Using a persistent volume might require making the particular flavor of volume some kind of parameter, since there are quite a few flavors a Spark user or cluster admin might prefer to use.

@davidopp
Member

davidopp commented Oct 22, 2016

I'm interested in the fair sharing/elasticity bits, once the initial prototype is finished.

@erikerlandson

erikerlandson commented Oct 24, 2016

It will be important to allow a spark-submit command to specify one or more volumes to mount on the driver pod that contain data to be ingested by the Spark app (and/or written to with output).

@foxish
Member

foxish commented Oct 24, 2016

From the documentation:

Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, Amazon S3, etc.

I'm guessing we would need to specify volumes only in the case where we want to supply input files from the file system.
Are you suggesting that we allow mounting a single PV into the driver and every executor with mode ReadOnlyMany?
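
A sketch of the single-PV idea in the question above: a claim requested with ReadOnlyMany access that the driver and every executor pod could mount. Names and sizes are placeholders, and whether the prototype would wire this up automatically is exactly the open question:

```bash
kubectl create -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-input-data
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi
EOF
# Each pod spec would then reference the claim under .spec.volumes, e.g.
#   persistentVolumeClaim: {claimName: spark-input-data, readOnly: true}
```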

@erikerlandson

erikerlandson commented Oct 24, 2016

(replaced S3 -> EBS, which is the supported storage type)
That is definitely one scenario. Other scenarios I had in mind were related to decoupling from storage endpoint creds. Taking AWS EBS as a representative example, a Spark app that wants to open an RDD against EBS also has to establish secrets. But if the EBS endpoint were mounted as a kube PV, then the Spark app could be coded against a mountpoint, and be decoupled from creds. It would also allow the PV to be re-targeted to changing endpoints (and possibly even differing flavors of endpoint) without altering the application submission or code. Another useful property of leveraging PVs is that it isolates any potential issues with visibility to outside storage endpoints to the PV itself.

@foxish referenced this issue in foxish/spark Nov 3, 2016 (Merged): Use images with spark pre-installed #1

@foxish
Member

foxish commented Nov 10, 2016

Proposing the use of ThirdPartyResources to store state. foxish/spark#3
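
For context, a ThirdPartyResource registration at the time looked roughly like the sketch below (TPRs were later replaced by CustomResourceDefinitions). The resource name and group here are invented for illustration; the actual proposal lives in foxish/spark#3:

```bash
kubectl create -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: ThirdPartyResource
metadata:
  name: spark-job.example.com
description: "Tracks the state of a Spark job submitted to the cluster"
versions:
- name: v1
EOF
# Instances would then be created with kind: SparkJob and
# apiVersion: example.com/v1, and could carry driver/executor state.
```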

@k82cn
Member

k82cn commented Nov 10, 2016

@foxish, what is the current status? I'm also investigating how to run multiple application types in Kubernetes, for example long-running applications and Spark.

I'd like to see whether it is possible to build a resource management layer in k8s, so the resource manager can help with preemption and fair sharing of resources; we proposed a solution based on Mesos (refer to KubeCon 2016: Kubernetes on EGO), but I'd like to explore the possibility of enabling it with k8s.

@erikerlandson

erikerlandson commented Nov 11, 2016

@k82cn I'm also quite interested in controllers and HPA for managing the executor scaling. I think this idea for ThirdPartyResource is promising for getting useful metrics published at the kube layer:
foxish/spark#3

I'd also like to see a hook for externally signaling graceful executor shutdown, so a controller can do executor scale-down without throwing away executor work in progress.

@erikerlandson

erikerlandson commented Nov 11, 2016

I did a briefing on this prototype to the OpenShift Commons recently. It describes the work so far, and also demonstrates that the Kubernetes capability runs against OpenShift out of the box:
https://blog.openshift.com/openshift-commons-big-data-sig-2-running-apache-spark-natively-on-kubernetes-with-openshift/

@k82cn
Member

k82cn commented Nov 13, 2016

@erikerlandson, thanks very much :). As we synced up, I'll try your prototype first and propose a design for the resource management part. #36716 is also filed to track the discussion on a general resource management layer.

@erikerlandson

erikerlandson commented Nov 13, 2016

A PR for dynamic executor support: foxish/spark#4

@erikerlandson

erikerlandson commented Nov 24, 2016

PSA: I encourage anybody in the community who is interested in seeing this adopted by Apache Spark to vote and/or comment on the corresponding Spark JIRA:
https://issues.apache.org/jira/browse/SPARK-18278

@erikerlandson

erikerlandson commented Nov 29, 2016

I am going to present the spark-on-kube topic at the Kubernetes SIG-Apps next Monday (Dec 5). It will be similar to the OpenShift briefing, but the focus will be on kube specifically, and I'll update it to cover all the latest contributions.

@ash211

ash211 commented Jan 13, 2017

For those not already aware, activity around native Spark support for Kubernetes has converged on changes to Apache Spark being prepared in this repository: https://github.com/apache-spark-on-k8s/spark

@vgkowski

vgkowski commented Jan 14, 2017

Discovering this, it's a very interesting initiative. Have you already thought about data locality in an IP-per-container model? Spark uses hostnames to determine the best locality for a task. Is it currently part of the work on this branch?

@foxish
Member

foxish commented Jan 14, 2017

Locality awareness is part of the roadmap and very important. We're still in the early stages at this point, so there is lots of work ahead :) (Contributions to https://github.com/apache-spark-on-k8s/spark are welcome!)

Better HDFS support, persistent local storage, batch scheduling, and resource management are being worked on actively in Kubernetes, so we should see progress on this front in the near future.

@tangzhankun

tangzhankun commented May 16, 2017

@foxish thanks for the initiative. Could you elaborate on the currently known issues with Spark Standalone in K8s mentioned in the sentence below?

Having one resource manager (Spark Standalone) nested inside another resource manager (Kubernetes) causes many subtle issues and bugs when run in production.

@foxish
Member

foxish commented Dec 11, 2017

This is currently being upstreamed into Apache Spark. Discussions have moved to the Spark JIRA (https://issues.apache.org/jira/browse/SPARK-18278).

@kow3ns added this to Backlog in Workloads Feb 27, 2018

@foxish
Member

foxish commented Feb 28, 2018

Just released as part of Apache Spark 2.3.0 - release notes at https://spark.apache.org/releases/spark-release-2-3-0.html#core-pyspark-and-spark-sql

/close

Workloads automation moved this from Backlog to Done Feb 28, 2018
