[query] show and checkpoint are very slow in Jupyter Notebooks; they are not slow outside Jupyter #13690
Comments
I can't reproduce locally with
Must be something more complex. |
local notebook works fine for me as well, looks to be just dataproc that's not working as expected. submitting that test command as a script finished in 36.2s. notebook is currently still hanging with this output (it's been 11 minutes):
@iris-garden can you grab that log file and upload here? it should live on the |
yep, here it is |
Nothing suspicious there. Something is going wrong in the executors. I think the only way we're gonna solve this is by running a pipeline and looking at the executor logs. I'm at a complete loss for how Jupyter could affect what happens on the executors. |
okay, so this makes no sense to me, and i don't understand gradle at all really, but i tried reproducing the issue with each recent release until i found the one where it started presenting (0.2.123), then tried it on every commit in between the previous release and that one, and found that the issue started presenting after #13551 merged. i tried reverting that commit on the current |
!!!! |
Wow, talk about a tour de force of debugging, well done!! OK, so this kinda makes sense.

We are importing our own copies of the GCS libraries and renaming them all to

We pin our dataproc image version to

The latest available version of Dataproc's Debian images is 2.1.25-debian11, which depends on GoogleCloudDataproc Hadoop connector version 2.2.15, which relies on Google Cloud Storage client library version 2.22.3. I have a PR to upgrade us to 2.27.1 because the library broke retries in versions [2.25.0, 2.27.0).

AFAICT, Google's image version page only shows the most recent five. There's no way to go back further in time. Luckily, the Wayback Machine has a March 2023 capture which includes our version. 2.1.2-debian11 used Google Cloud Dataproc Hadoop connector version 2.2.9. This version of the Hadoop connector was using some alpha version of a gRPC version of the cloud storage library. I'm not sure what's up with that.

OK, here's my proposal: let's change that IMAGE_VERSION to the latest one and see if that fixes things. If that works, let's just merge and forget this happened. If that doesn't work, we gotta wade into the Lovecraftian horror of JARs. Most likely we're not fully relocating the dependencies pulled in by the Google Cloud Storage client libraries and they conflict with what Dataproc provides.
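For anyone checking which client library versions fall inside the broken-retries range quoted above, here's a rough sketch of the half-open interval check. The helper names are made up for illustration, and `2.26.1` is just a sample value inside the range, not a claim about a specific release. It assumes plain dotted numeric versions with no suffixes.

```python
def parse_version(text):
    """Turn a dotted version such as '2.27.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Half-open range quoted in the comment above: [2.25.0, 2.27.0)
BROKEN_RETRIES_START = parse_version("2.25.0")
BROKEN_RETRIES_END = parse_version("2.27.0")


def has_broken_retries(version):
    """True if the version is >= 2.25.0 and strictly below 2.27.0."""
    v = parse_version(version)
    return BROKEN_RETRIES_START <= v < BROKEN_RETRIES_END


if __name__ == "__main__":
    # 2.22.3 and 2.27.1 are the versions named in the thread;
    # 2.26.1 is a hypothetical value used only to exercise the range.
    for candidate in ["2.22.3", "2.25.0", "2.26.1", "2.27.0", "2.27.1"]:
        print(candidate, has_broken_retries(candidate))
```

Comparing tuples of ints rather than raw strings matters here: as strings, `"2.9.0"` would sort after `"2.25.0"`.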
closes hail-is#13690.

to test that this works, i've been running these commands from the root of my clone of the hail repo:

```bash
make -C hail install-editable
make -C hail install-hailctl
hailctl dataproc start notebook-slowdown-repro --region us-central1
hailctl dataproc connect notebook-slowdown-repro notebook
```

and then running this minimal example in the notebook:

```python
import hail
hail.utils.range_table(10).show()
```

and making sure it outputs a visual of the table, instead of getting stuck displaying `Stage 0:> (0+X)/Y` and not progressing.
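If you'd rather not eyeball the progress line, something like the following could flag the "stuck" case from a few copied samples of the cell output. This is only a sketch: the regex is an assumption based on the `Stage 0:> (0+X)/Y` shape quoted above, it reads `X` as tasks currently running and `Y` as total tasks, and `parse_progress` / `is_stuck` are hypothetical helpers, not part of hail.

```python
import re

# Assumed shape of the progress text, e.g. "Stage 0:> (0+4)/8".
# Optional "=" bar characters and spaces around "+" and "/" are tolerated.
PROGRESS = re.compile(
    r"Stage\s+(\d+):\s*=*>?\s*\((\d+)\s*\+\s*(\d+)\)\s*/\s*(\d+)"
)


def parse_progress(line):
    """Return stage/done/running/total as ints, or None if the line does not match."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    stage, done, running, total = (int(group) for group in match.groups())
    return {"stage": stage, "done": done, "running": running, "total": total}


def is_stuck(samples):
    """True if at least two samples all show the same unfinished stage with no new finished tasks."""
    parsed = [parse_progress(sample) for sample in samples]
    if len(parsed) < 2 or any(p is None for p in parsed):
        return False
    unchanged = len({(p["stage"], p["done"]) for p in parsed}) == 1
    return unchanged and parsed[0]["done"] < parsed[0]["total"]


if __name__ == "__main__":
    print(is_stuck(["Stage 0:> (0+4)/8", "Stage 0:> (0+4)/8"]))
    print(is_stuck(["Stage 0:> (0+4)/8", "Stage 0:=> (2+4)/8"]))
```

Samples would need to be taken some time apart (say, a minute) for "no change" to mean anything.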
What happened?
This started happening in 0.2.123. It does not happen in 0.2.120.
Version
0.2.123
Relevant log output
No response