I am running a standalone Spark cluster (Spark 3.3.0) consisting of 1 master and 4 worker nodes. I start the cluster as the ubuntu user on all 5 involved nodes. All nodes have access to the same storage via NFS, exported by the master.
In my environment, multiple users need to connect to the cluster and submit Spark applications. None of those users is the ubuntu user that starts the cluster master and workers. Here, I use the test user to log into RStudio Server and run the Spark application via sparklyr.
# Load libs
library(sparklyr)
library(tidyverse)
config <- spark_config()
config$spark.executor.memory <- "5G"
config$spark.executor.cores <- "2"
config$spark.shuffle.service.enabled <- TRUE
config$spark.dynamicAllocation.enabled <- TRUE
Sys.setenv(SPARK_HOME="/opt/spark/",
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/")
sc <- spark_connect(master="spark://172.16.44.70:7077",
                    config=config,
                    app_name="app-test")
sdf_data <- spark_read_parquet(sc, name="sdf_data", path="/mnt/storage/example_data", memory=FALSE)
### Everything works until here
### The following fails
spark_write_csv(sdf_data %>% head(100), path="/mnt/storage/test-data/")
The resulting error is as follows.
Error: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:638)
...
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 38.0 failed 4 times, most recent failure: Lost task 0.3 in stage 38.0 (TID 2255) (172.16.44.80 executor 309): java.io.IOException: Mkdirs failed to create file:/mnt/storage/test-data/_temporary/0/_temporary/attempt_202208121311203632496408213281273_0038_m_000000_2255 (exists=false, cwd=file:/home/ubuntu/spark-3.3.0-bin-hadoop3/work/app-20220812114226-0003/309)
...
The underlying issue is that the destination folder /mnt/storage/test-data is created by the RStudio user (test) instead of the cluster user (ubuntu). Hence, the cluster is unable to actually write the data into the folder.
test@compute1:/mnt/storage$ ls -l
total 124
drwxrwxrwx 2 ubuntu spark 73728 Aug 12 09:17 example_data
drwxr-xr-x 3 test test 4096 Aug 12 13:10 test-data
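For comparison, one common way to make a shared destination writable by both users is to pre-create it as group-writable with the setgid bit set, so entries created inside inherit the directory's group. A minimal sketch; the scratch directory and the current user's primary group are stand-ins for /mnt/storage/test-data and the spark group from the listing above:

```shell
# Sketch of the setgid approach. $dest stands in for /mnt/storage/test-data;
# in production the group would be "spark" instead of the current user's group.
dest="$(mktemp -d)/test-data"
mkdir -p "$dest"
chgrp "$(id -gn)" "$dest"        # in production: chgrp spark "$dest"
chmod 2775 "$dest"               # rwxrwsr-x: group-writable + setgid

# Entries created inside now inherit the directory's group (and, for
# subdirectories, the setgid bit), so either user's files stay group-accessible.
mkdir "$dest/_temporary"
stat -c '%A %G' "$dest" "$dest/_temporary"
```

With this layout, Spark's `_temporary` staging directory is created inside a group-writable parent instead of failing with `Mkdirs failed to create`.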
I already tried to set default ownership and file permissions on the folder I am trying to write to using Linux ACL settings, but it seems R (or sparklyr?) does its own thing and ignores those permission settings.
Is there a known workaround for this issue? How can I support multiple users on top of regular file storage without having to spin up dedicated clusters for every user?
Hi @s-geissler, does the data created by spark_write_csv() need to be accessible to everyone in the cluster? If so, can you give me more background on the workflow for this team?
Automatically closed because there has not been a response for 30 days. When you're ready to work on this further, please comment here and the issue will automatically reopen.