[qob] permissions error is not propagated back to driver job properly #13697
The root cause is that we only catch HailException, not all exceptions. I suppose we should catch all exceptions?
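To illustrate the difference between catching one exception type and catching everything, here is a minimal Python sketch. It is not Hail's code (the backend in question runs on the JVM); `HailLikeError`, `run_narrow`, `run_broad`, and `write_to_read_only_bucket` are hypothetical names invented for this example.

```python
import traceback


class HailLikeError(Exception):
    """Hypothetical stand-in for a framework-specific exception type."""


def write_to_read_only_bucket():
    # Simulates a storage permissions failure, which is not a HailLikeError.
    raise PermissionError("403: caller lacks write access")


def run_narrow(task):
    # Reports only the framework's own exception type.
    # Any other exception escapes this reporting path entirely.
    try:
        return ("ok", task())
    except HailLikeError as e:
        return ("error", repr(e))


def run_broad(task):
    # Reports every ordinary exception, including permissions errors.
    try:
        return ("ok", task())
    except Exception as e:
        text = "".join(traceback.format_exception(type(e), e, e.__traceback__))
        return ("error", text)
```

With these definitions, `run_broad(write_to_read_only_bucket)` returns an `("error", ...)` pair containing the traceback, while `run_narrow(write_to_read_only_bucket)` lets the `PermissionError` propagate past the handler unreported.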
danking added bug and removed needs-triage labels Sep 22, 2023
danking added a commit to danking/hail that referenced this issue Sep 26, 2023
CHANGELOG: Fixes hail-is#13697, a long standing issue with QoB, in which a failing partition job or driver job is not failed in the Batch UI. I am not sure why we did not do this this way in the first place.
danking added a commit that referenced this issue Oct 16, 2023
…13715)

CHANGELOG: Fixes #13697, a long standing issue with QoB, in which a failing partition job or driver job is not failed in the Batch UI.

I am not sure why we did not do this this way in the first place. If a JVMJob raises an exception, Batch will mark the job as failed. Ergo, we should raise an exception when a driver or a worker fails!

Here's an example: I used a simple pipeline that writes to a bucket to which I have read-only access. You can see an example Batch (where every partition fails): https://batch.hail.is/batches/8046901. [1]

```python3
import hail as hl
hl.utils.range_table(3, n_partitions=3).write('gs://neale-bge/foo.ht')
```

NB: I removed the `log.error` in `handleForPython` because that log is never necessary. That function converts a stack of exceptions into a triplet of the short message, the full exception with stack trace, and a Hail error id (if present). That triplet is always passed along to someone else who logs the exception. (FWIW, the error id indicates a Python source location that is associated with the error. On the Python-side, we can look up that error id and provide a better stack trace.)

[1] You'll notice the logs are missing. I noticed this as well, it's a new bug. I fixed it in #13729.
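The triplet that the commit message attributes to `handleForPython` can be sketched roughly as follows. This is a Python analogue written for illustration, not the actual function; `ErrorWithId`, `to_triplet`, and the `error_id` attribute are hypothetical names, and the "stack of exceptions" is assumed here to be a chain of causes.

```python
import traceback


class ErrorWithId(Exception):
    """Hypothetical exception that may carry an error id."""

    def __init__(self, message, error_id=None):
        super().__init__(message)
        self.error_id = error_id


def to_triplet(exc):
    # Short message: the text of the outermost exception.
    short = str(exc)
    # Full text: the stack trace, including any chained causes.
    full = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    # Error id: the first id found while walking the cause chain, if any.
    error_id = None
    cur = exc
    while cur is not None:
        found = getattr(cur, "error_id", None)
        if found is not None:
            error_id = found
            break
        cur = cur.__cause__
    return short, full, error_id
```

Because the function only converts and returns, logging is left to whoever receives the triplet, which matches the rationale given for dropping the extra `log.error`.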
What happened?
Try writing to a bucket to which your service account has read-only access:
https://batch.hail.is/batches/8042383
The client gets an error like this:
The driver will have log output like this:
but the worker looks like this:
Version
0.2.124
Relevant log output
No response