One can launch Spark on Kubernetes from sparklyr as follows:
```r
sc <- spark_connect(
  master = "k8s://http://127.0.0.1:8001",
  config = list(
    spark.executor.instances = 2,
    spark.kubernetes.container.image = "spark-image"
  )
)
```
However, connectivity to Kubernetes would be blocked unless the cluster's API server is actually reachable at that local address, which typically means running a local proxy.
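One way to unblock connectivity, assuming you have `kubectl` already configured against the cluster, is to run a local API proxy; this is a sketch of that step, not part of the original setup:

```shell
# Start a local proxy to the Kubernetes API server; by default it
# listens on 127.0.0.1:8001, which matches the k8s:// master URL above.
kubectl proxy --port=8001
```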
Below is some initial, ongoing investigation into getting this working; it is not yet ready for consumption...
To use Kubernetes locally (the address below is minikube's default):
```r
sc <- spark_connect(
  master = "k8s://https://192.168.99.100:8443",
  config = list(
    "sparklyr.shell.master" = "k8s://https://192.168.99.100:8443",
    "sparklyr.shell.deploy-mode" = "cluster",
    "sparklyr.gateway.remote" = TRUE,
    "sparklyr.shell.name" = "sparklyr",
    "sparklyr.shell.class" = "sparklyr.Shell",
    "sparklyr.shell.conf" = c(
      "spark.kubernetes.container.image=spark:sparklyr",
      "spark.kubernetes.driver.pod.name=spark-pi-driver",
      "spark.kubernetes.authenticate.driver.serviceAccountName=spark"
    ),
    "sparklyr.app.jar" = "local:///opt/sparklyr/sparklyr-2.3-2.11.jar"
  ),
  spark_home = spark_home_dir()
)
```
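For the config above to work, the cluster typically needs the custom container image and the `spark` service account to exist first. A rough sketch of that preparation, assuming minikube and a Spark distribution under `$SPARK_HOME` (the `sparklyr` image tag and the `default` namespace are carried over from the config above):

```shell
# Build a Spark container image inside minikube's Docker daemon
# (docker-image-tool.sh ships with the Spark distribution).
eval $(minikube docker-env)
$SPARK_HOME/bin/docker-image-tool.sh -t sparklyr build

# Create the service account referenced by
# spark.kubernetes.authenticate.driver.serviceAccountName=spark.
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit --serviceaccount=default:spark --namespace=default
```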
@javierluraschi Thank you for making the effort to build kubernetes support into sparklyr.
I'm trying to understand how to implement this for myself, and I'm hoping you could shed a bit more light, please. I am trying to connect to a remote Spark master on the same Kubernetes cluster as my rstudio-server instance (with sparklyr installed), with the aim of executing sparklyr commands against the Spark cluster from within RStudio. I have added the sparklyr jars to the Spark container, but when I try to run the spark_connect code above (substituting my local URLs etc.), I get the following error:
```
Error in shell_connection(master = master, spark_home = spark_home, app_name = app_name, :
  Failed to connect to Spark (SPARK_HOME is not set).
```
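For what it's worth, that error usually means sparklyr cannot find a local Spark installation: `spark_connect()` needs `SPARK_HOME` to point at a Spark distribution on the machine running R. A sketch of setting it from the shell before launching R (the installation path below is only a hypothetical example of where `spark_install()` might have placed one):

```shell
# SPARK_HOME must point at a local Spark distribution; this path is an
# illustrative example, not a value prescribed by sparklyr.
export SPARK_HOME="$HOME/spark/spark-2.4.0-bin-hadoop2.7"
echo "SPARK_HOME is set to: $SPARK_HOME"
```

Alternatively, `Sys.setenv(SPARK_HOME = "...")` from R, or the `spark_home` argument of `spark_connect()`, achieves the same thing.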
Am I misunderstanding something fundamental here? For example, do the Spark master and the RStudio instance have to co-exist in the same container?
Any insight you can provide would be greatly appreciated.
@kkeenan02 this work was only meant to enable connections to new Kubernetes clusters... if you have an existing Kubernetes cluster, you probably don't even need this work; let me elaborate.
One option to use Kubernetes is for someone, usually your system admin or someone on your team, to start a Spark cluster for the team with Kubernetes, running YARN, Cloudera, or the like. Then you can connect to this cluster from sparklyr as you would to any other existing cluster.
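For that first option, the connection from sparklyr is just a regular one; for instance, against a YARN-backed cluster it might look like the sketch below (the `yarn-client` master value is an assumption about that cluster, not something this PR requires):

```r
library(sparklyr)

# Connect to an existing cluster as usual; Kubernetes is only the
# infrastructure underneath and does not change the sparklyr API here.
sc <- spark_connect(master = "yarn-client")
```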
The other option, which this PR enables in sparklyr 0.9, is for cases where a generic Kubernetes cluster is provided, that is, a bunch of machines without anything installed on them. In these cases, you can create your own container image for the cluster and have sparklyr launch it while connecting with a `k8s://` master URL, as in the first example above.
@javierluraschi, I was trying the second option you mentioned, with sparklyr 1.0.0 and Spark 2.4.0 running on Kubernetes, and I still face the same issue as @kkeenan02.
Any suggestions as to what's wrong here?