
Sparklyr cannot make connection #3262

Open
ncuriale opened this issue May 18, 2022 · 4 comments

@ncuriale

I'm getting an issue similar to previous issues reported here, but the fixes don't seem to solve this one.

port = 7078

config <- sparklyr::spark_config_kubernetes(
  master = "k8s://<ip>:443",
  account = "spark",
  driver = "spark-master-0",
  image = "docker.io/jluraschi/spark:sparklyr",
  version = "3.2.1",
  ports = port,
  timeout = 120
)
config[["spark.home"]] <- "/home/nathanc/spark/spark-3.2.1-bin-hadoop2.7"
config[["sparklyr.gateway.port"]] <- port
config[["sparklyr.gateway.start.timeout"]] <- 120
config[["sparklyr.connect.app.jar"]] <- system.file("java/", package = "sparklyr")

config[["spark.kubernetes.file.upload.path"]] <- "file:///path"

config[["sparklyr.log.console"]] <- TRUE
options(sparklyr.log.console = TRUE)

sparklyr::spark_connect(
  config = config,
  spark_home = config[["spark.home"]]
)

This is the resulting message. It appears to be related to spark.kubernetes.file.upload.path, but I have already set that option (not sure if the value is correct, but it is what I've seen in other threads).

Any suggestions around this?

Handling connection for 7078
E0518 03:25:13.753196   15491 portforward.go:400] an error occurred forwarding 7078 -> 7078: error forwarding port 7078 to pod 221e3723ee6facbe965d9b7ef0f767fb7a7a5ca1dd71206db56372f71956f8b3, uid : failed to execute portforward in network namespace "/var/run/netns/cni-77d6a8bd-543a-3031-9347-9d8033e56cfa": failed to dial 7078: dial tcp4 127.0.0.1:7078: connect: connection refused
Unable to listen on port 7078: Listeners failed to create with the following errors: [unable to create listener: Error listen tcp4 127.0.0.1:7078: bind: address already in use unable to create listener: Error listen tcp6 [::1]:7078: bind: address already in use]
error: unable to listen on any of the requested ports: [{7078 7078}]
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/nathanc/spark/spark-3.2.1-bin-hadoop2.7/jars/spark-unsafe_2.12-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/05/18 03:25:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/05/18 03:25:15 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
22/05/18 03:25:16 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:330)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:276)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:275)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$4(KubernetesDriverBuilder.scala:65)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:91)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:63)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:220)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:214)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2713)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:214)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:186)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/05/18 03:25:16 INFO ShutdownHookManager: Shutdown hook called
22/05/18 03:25:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-ecf6f336-240b-4c21-845b-87f7a028421a
 Error in query_gateway_for_port(gateway, sessionId, config, isStarting) : 
  Sparklyr gateway did not respond while retrieving ports information after 10 seconds.


Try running `options(sparklyr.log.console = TRUE)` followed by `sc <- spark_connect(...)` for more debugging info. 
Handling connection for 7078
E0518 03:27:07.620018   15491 portforward.go:400] an error occurred forwarding 7078 -> 7078: error forwarding port 7078 to pod 221e3723ee6facbe965d9b7ef0f767fb7a7a5ca1dd71206db56372f71956f8b3, uid : failed to execute portforward in network namespace "/var/run/netns/cni-77d6a8bd-543a-3031-9347-9d8033e56cfa": failed to dial 7078: dial tcp4 127.0.0.1:7078: connect: connection refused
E0518 03:33:34.687093    7656 portforward.go:233] lost connection to pod
@edgararuiz
Collaborator

Hi, maybe try a different port? The log shows: Error listen tcp6 [::1]:7078: bind: address already in use
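A minimal sketch of that suggestion, assuming the rest of the config from the original post (the port value 7079 is just an example, not from this thread):

```r
# Hypothetical example: pick a port that is not already bound on the client
# machine, since 7078 was taken according to the port-forward error above.
port <- 7079

config <- sparklyr::spark_config_kubernetes(
  master = "k8s://<ip>:443",
  account = "spark",
  driver = "spark-master-0",
  image = "docker.io/jluraschi/spark:sparklyr",
  version = "3.2.1",
  ports = port,
  timeout = 120
)
# Keep the gateway port in sync with the forwarded port
config[["sparklyr.gateway.port"]] <- port
```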

@ncuriale
Author

ncuriale commented Jun 7, 2022

The issue was that the sparklyr jars were not in the image at /opt/sparklyr. Once I copied them over in my Dockerfile, I was able to launch using this config:

config <- sparklyr::spark_config_kubernetes(
  master = master,
  account = "spark",
  driver = name,
  image = spark_images[spark_images$version == version, ]$image,
  version = version,
  timeout = 120,
  ports = port,
  executors = executors
)
config[["spark.home"]] <- spark_home
config[["sparklyr.gateway.port"]] <- port
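The Dockerfile change described above amounts to copying the sparklyr jar into the image at the path the submit command expects (the entrypoint log further down resolves local:///opt/sparklyr/sparklyr-master-2.12.jar). A minimal sketch, assuming the jar has first been copied from the local sparklyr installation (system.file("java", package = "sparklyr")) into the Docker build context; the base image name is an assumption:

```dockerfile
# Hypothetical Dockerfile fragment: base image and jar filename are
# assumptions; use the jar matching your sparklyr and Scala versions.
FROM docker.io/jluraschi/spark:sparklyr

# Place the sparklyr jar where spark-submit expects it, so that
# local:///opt/sparklyr/sparklyr-master-2.12.jar resolves inside the pod.
COPY sparklyr-master-2.12.jar /opt/sparklyr/
```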

@ncuriale
Author

ncuriale commented Jun 7, 2022

A separate issue I am getting now is that sometimes the launched Spark connection shuts down immediately after launching (see the log below).

Any clues on this one? It waits for the connection to be made from client to cluster first; then, once connected, it creates the backend but shuts it down afterwards...?

++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ grep SPARK_JAVA_OPT_
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=11.244.3.118 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class sparklyr.Shell local:///opt/sparklyr/sparklyr-master-2.12.jar 48625 77168 --remote
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.2.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/06/07 14:41:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/06/07 14:41:00 INFO sparklyr: Session (77168) is starting under 127.0.0.1 port 48625
22/06/07 14:41:00 INFO sparklyr: Session (77168) is configuring for remote connections
22/06/07 14:41:00 INFO sparklyr: Session (77168) found port 48625 is available
22/06/07 14:41:00 WARN sparklyr: Gateway (77168) Failed to get network interface of gateway server socketnull
22/06/07 14:41:00 INFO sparklyr: Gateway (77168) is waiting for sparklyr client to connect to port 48625
22/06/07 14:41:53 INFO sparklyr: Gateway (77168) accepted connection
22/06/07 14:41:53 INFO sparklyr: Gateway (77168) is waiting for sparklyr client to connect to port 48625
22/06/07 14:41:53 WARN sparklyr: Gateway (77168) Failed to get network interface of gateway server socketnull
22/06/07 14:41:53 INFO sparklyr: Gateway (77168) received command 0
22/06/07 14:41:53 INFO sparklyr: Gateway (77168) found requested session matches current session
22/06/07 14:41:53 INFO sparklyr: Gateway (77168) is creating backend and allocating system resources
22/06/07 14:41:53 INFO sparklyr: Gateway (77168) is using port 48626 for backend channel
log4j:WARN No appenders could be found for logger (io.netty.util.internal.logging.InternalLoggerFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
22/06/07 14:41:53 INFO sparklyr: Gateway (77168) created the backend
22/06/07 14:41:53 INFO sparklyr: Gateway (77168) is waiting for R process to end
(22/06/07 14:41:54 INFO sparklyr: Gateway (77168) is shutting down with expected SocketException,java.net.SocketException: Socket closed)

@baslat

baslat commented Nov 27, 2022

I've had a very similar chain of issues. @ncuriale, did you manage to find a fix?
