ERROR sparklyr: Gateway xxxxx failed calling take on xxx when running spark-apply #1121
Comments
@chrisvwn you are hitting this error. My recommendation would be to run this locally with a subset of the data. You should see entries like:
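A minimal sketch of what "run this locally with a subset of the data" might look like; this is not code from the comment. The names `samplog_df` (a local copy of the data) and `kms` (the function from the original question) are assumptions carried over from the issue.

```r
# Hypothetical sketch: run the same spark_apply() call against a local
# Spark instance with a small sample, so worker-side R errors surface
# in the local log instead of being hidden in the cluster's YARN logs.
library(sparklyr)

sc <- spark_connect(master = "local")

# samplog_df is assumed to be a local data frame with a subset of the data
samplog_small <- sdf_copy_to(sc, head(samplog_df, 100), overwrite = TRUE)

result <- spark_apply(samplog_small, kms)

# Inspect the local logs for the underlying R error
spark_log(sc, filter = "sparklyr")
```

With `master = "local"`, the worker R processes run on the same machine, so their stderr ends up in the log that `spark_log()` prints.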
Thanks @javierluraschi! I have moved a step in the right direction. Long story short, I had left out the config for the cluster connection, so adding it helped. I am not sure about all of this, so let me just put my steps here in the hope of getting some explanation, or at least of helping another newbie out.

After a while, I found my Spark logs in Cloudera CDH, which logs through YARN. My cluster was showing no history of applications. On searching, I found that YARN does not log anything until you disconnect from the cluster, so I had to disconnect first.

Still, no logs showed up and no history of any apps. I am still not sure why, but adding that made them show up. So of course I am wondering what I was seeing in the logs before. I may need to do more reading on this, since I am still fuzzy on the difference between driver and worker logs.

So now the job is running successfully, though I now have another error, which I will open another issue for.
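The config snippet the comment refers to was lost; a hypothetical reconstruction of the steps described (explicit config on connect, then disconnect so YARN flushes the application logs) might look like this. The `spark_home` path and the memory setting are illustrative assumptions, not values from the issue.

```r
# Hypothetical reconstruction: connect with an explicit config, run the
# job, then disconnect so YARN aggregates the application logs.
library(sparklyr)

config <- spark_config()
config$spark.executor.memory <- "2g"   # example setting; adjust for your cluster

sc <- spark_connect(master = "yarn-client",
                    spark_home = "/opt/cloudera/parcels/CDH/lib/spark",  # assumed CDH path
                    config = config)

# ... run spark_apply() jobs here ...

# YARN only aggregates application logs after the application finishes,
# so disconnect before looking for them in the history server:
spark_disconnect(sc)
```

After disconnecting, the aggregated logs can also be fetched from the command line with `yarn logs -applicationId <appId>`.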
Apologies for cross-posting. This is a copy of this issue on Stack Overflow: https://stackoverflow.com/questions/47265209/error-sparklyr-gateway-xxxxx-failed-calling-take-on-xxx-when-running-spark-appl
I am trying to run spark_apply from the sparklyr package to perform k-means clustering on data stored in Hive on a Spark cluster, but I am receiving a Spark error that I am having difficulty understanding. The data is as follows; the features column is an aggregated vector combining the latitude and longitude columns, but it is not used in this case.
The R code is as follows:
and it is called like so:
likelyLocs <- spark_apply(samplog, kms)
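The definition of `kms` was not captured above. As a point of reference only, a minimal sketch of the kind of function `spark_apply()` expects (one that takes a data frame and returns a data frame) might look like this; the column names and number of centers are assumptions, not the original code.

```r
# Hypothetical sketch of a k-means function usable with spark_apply().
# spark_apply() runs this on each partition's data in a worker R process.
kms <- function(df) {
  # Cluster on the raw latitude/longitude columns (assumed names)
  fit <- kmeans(df[, c("latitude", "longitude")], centers = 3)
  df$cluster <- fit$cluster
  df  # must return a data frame
}

likelyLocs <- spark_apply(samplog, kms)
```

Note that any packages the function uses must be available on the worker nodes; `kmeans()` is in base R's stats package, so it needs no extra installation.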
The error I am receiving in RStudio is:
As directed in the error details, I checked the Spark log and found the following.
All I can fathom is that the Spark job is failing somewhere in the last stage, probably while merging the output from the different workers. Can anyone help find what the problem may be?
EDIT: The output from sessionInfo():
EDIT: I have also tried this on a Windows 10 machine with a local Spark instance, with the same results.