Diff Failure on AWS Glue #105
Comments
Minor optimization: Instead of
you could do
because the former makes Spark compute the entire diff in order to return the count.
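As a rough illustration of this point (not necessarily the snippet suggested in the comment), an existence check can replace a full count. The sketch below assumes the diff result has a column named `diff` whose value is `N` for unchanged rows, which are the spark-extension defaults as far as I know; the helper name `has_differences` and its parameters are hypothetical. It relies only on `DataFrame.where` with a SQL string and `DataFrame.head`.

```python
def has_differences(diff_df, diff_column="diff", unchanged_value="N"):
    """Return True if at least one row of the diff result is not marked unchanged.

    diff_df is assumed to be the DataFrame returned by the diff call;
    the column name and the "unchanged" marker are assumptions.
    """
    # Keep only rows that are inserted, deleted, or changed.
    changed = diff_df.where(f"`{diff_column}` <> '{unchanged_value}'")
    # head(1) asks for at most one row, so Spark does not have to
    # count every differing row just to answer a yes/no question.
    return len(changed.head(1)) > 0
```

Usage would look like `has_differences(df1.diff(df2, "id"))`, where `"id"` stands in for whatever key columns the two frames share.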
It looks like your Try adding a
Thanks, @EnricoMi! I'll try out your suggestions.
Hello! I tried
Can you go to the Spark UI -> SQL tab and click on the job that fails? If you could save that HTML page or take a screenshot of it, that would help a lot. The Executors tab and the Stages tab (click on the stage that fails) are also useful. Does
So I stumbled across something... Setting a fetchsize (eg
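For readers unfamiliar with the setting: assuming the DataFrames are read over JDBC (the thread does not say so explicitly), `fetchsize` is a Spark JDBC data source option that controls how many rows the driver fetches per round trip. A minimal sketch of passing it follows; the helper name `jdbc_read_options`, the URL, the table name, and the value `10_000` are all placeholders and not the values used in this issue.

```python
def jdbc_read_options(url, table, fetchsize=10_000):
    """Build an options dict for spark.read.format("jdbc").

    All argument values are illustrative placeholders.
    """
    return {
        "url": url,
        "dbtable": table,
        # Number of rows the JDBC driver fetches per round trip.
        "fetchsize": str(fetchsize),
    }


# With an existing SparkSession named `spark` (not created here):
# opts = jdbc_read_options("jdbc:postgresql://host/db", "my_table")
# df = spark.read.format("jdbc").options(**opts).load()
```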
So it looks like this is unrelated to spark-extension and is solved. Closing.
Hi
Thanks for this awesome lib!
Hey, looking for some guidance on an issue I'm having
I'm trying to compare two dataframes for equality. I don't need to know what's different, just whether they're different.
It works great when both dataframes are small (1m to 10m rows), but fails when both frames are over 10 million rows. The same 59 columns exist in both frames. No crazy data types. Fairly sparse, with a fair amount of NULLs.
Any ideas or things I should try? Any additional details I can provide?
Additional Details
Code
Errors that I've gotten
An error occurred while calling o118.count. Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 14) (10.226.42.94 executor 25): ExecutorLostFailure (executor 25 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
An error occurred while calling o110.count. Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (10.226.42.117 executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 652032 ms