Display BigQuery error stream when a load fails during dbt seed. #1079

Merged
merged 2 commits into from Oct 24, 2018

3 participants
@joshtemple
Contributor

joshtemple commented Oct 22, 2018

Creates and raises a new exception, augmenting the errors attribute of the exception with the detailed error stream from the job object. This errors attribute is unpacked downstream by the handle_error method.

I tested this out with a toy CSV file, adding a leading comma in the header row to induce a BigQuery load API error.

Before this change, the error is displayed as follows:

Database Error in seed test (data/test.csv)
  Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.

After this change, the full error details are included:

Runtime Error in seed test (data/test.csv)
  Runtime Error
    Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
    Error while reading data, error message: CSV table references column position 2, but line starting at position:11 contains only 2 columns.

Fixes #1076

@drewbanin

Contributor

drewbanin commented Oct 22, 2018

@joshtemple nice! I just tried to kick off tests for this PR, but I think GitHub is still working its way through webhooks. This provisionally looks good to me :)

@drewbanin drewbanin requested a review from beckjake Oct 23, 2018

@beckjake
Contributor

beckjake left a comment

I like the general idea, but I do have concerns about the type(e)(...) pattern.

@@ -278,7 +278,8 @@ def poll_until_job_completes(cls, job, timeout):
             raise dbt.exceptions.RuntimeException("BigQuery Timeout Exceeded")

         elif job.error_result:
-            raise job.exception()
+            e = job.exception()
+            raise type(e)(message=e.message, errors=job.errors)

@beckjake

beckjake Oct 23, 2018

Contributor

I'm not sure about calling type to get a class object and just assuming that it works to call as a constructor. I mean, I know it's ok here, but job and job.exception() come from google, not us.

Is this interface (in particular, the fact that the __init__ of the exception class returned by job.exception() accepts an errors keyword argument) considered stable in any way?

I think I would prefer something like:

msg = '{}\n{}'.format(e.message, '\n'.join(str(e) for e in job.errors)).strip()
raise dbt.exceptions.RuntimeException(msg)

I haven't tested it, and I'm not 100% sure on the type of job.errors, but I assume something like that would work.
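The suggestion above can be sketched without the BigQuery client, as a rough illustration only. It assumes that each entry of job.errors is a dict with a 'message' key and that the list may repeat the top-level message. The helper name build_error_message is hypothetical and is not part of dbt or the Google client library.

```python
def build_error_message(message, errors):
    """Combine a top-level error message with a list of detailed errors.

    Assumption: each entry in `errors` is a dict with a 'message' key.
    Anything else is converted with str() as a fallback.
    """
    lines = [message]
    for error in errors or []:
        if isinstance(error, dict) and 'message' in error:
            text = error['message']
        else:
            text = str(error)
        # Skip entries that only repeat a line we already have.
        if text not in lines:
            lines.append(text)
    return '\n'.join(lines).strip()


# Sample data shaped like the error output quoted in this PR (abbreviated).
summary = 'CSV table encountered too many errors, giving up.'
detail = 'CSV table references column position 2.'
combined = build_error_message(
    summary,
    [{'message': summary}, {'message': detail}],
)
print(combined)
```

In poll_until_job_completes, the combined string would then be passed to dbt.exceptions.RuntimeException, as in the two-line snippet above.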

@joshtemple

joshtemple Oct 23, 2018

Author Contributor

Yeah, totally fair, I went back and forth on that myself. In the end I decided not to hardcode a dbt exception since I wasn't sure about the implications of that downstream for logging. If you're more comfortable with raising a RuntimeException as you outlined, I'll change it.

Google API Errors inherit from a base class (GoogleAPICallError) that accepts errors and message as keyword arguments, so it should be safe to assume we can pass those args. Alternatively, we could hardcode a generic GoogleAPICallError exception (see here) or BadRequest (which is what is actually raised in this case) which would ensure we can pass those args, rather than using type.

What do you think?
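To make the trade-off concrete, here is a small sketch of when the type(e)(...) pattern holds up and when it does not. Every class here is a hypothetical stand-in written for illustration; none of them is the real Google API exception hierarchy. The pattern works as long as every subclass keeps the base __init__ signature, and it fails with a TypeError as soon as one subclass overrides it.

```python
class ApiError(Exception):
    """Hypothetical base error whose __init__ accepts message and errors."""

    def __init__(self, message, errors=()):
        super().__init__(message)
        self.message = message
        self.errors = list(errors)


class BadRequestError(ApiError):
    """Keeps the base signature, so it can be rebuilt by keyword."""


class QuotaError(ApiError):
    """Hypothetical subclass that overrides __init__ with a different signature."""

    def __init__(self, project):
        super().__init__('quota exceeded for {}'.format(project))
        self.project = project


def rebuild_with_errors(exc, errors):
    """Re-create exc using its own class, attaching a detailed error list."""
    return type(exc)(message=exc.message, errors=errors)


# Works: BadRequestError inherits ApiError.__init__ unchanged.
ok = rebuild_with_errors(BadRequestError('load failed'), ['bad column'])

# Fails: QuotaError.__init__ does not accept message or errors.
try:
    rebuild_with_errors(QuotaError('my-project'), ['bad column'])
    rebuild_failed = False
except TypeError:
    rebuild_failed = True
```

Raising a fixed exception class, whether a specific API error or a dbt-native one, avoids relying on every subclass keeping the same constructor.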

@beckjake

beckjake Oct 23, 2018

Contributor

Unless it has a negative downstream impact (triggering the exception handler in the wrong way comes to mind), I would prefer to raise a dbt-native exception. At some point we'll convert it anyway for display, so we might as well get it done early.

@joshtemple

Contributor Author

joshtemple commented Oct 23, 2018

Made the change. The only slight difference now is that the error message displays "Runtime Error" twice (see below) due to the way exception_handler works, but I see this happening in other places in the code anyway.

Runtime Error in seed test (data/test.csv)
  Runtime Error
    Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
    Error while reading data, error message: CSV table references column position 2, but line starting at position:11 contains only 2 columns.
@drewbanin

Contributor

drewbanin commented Oct 23, 2018

woop woop! Nice work @joshtemple :) I'm going to let the tests run, and then will merge this in. This will go out in our 0.12.0 release!

@drewbanin drewbanin added this to the Guion Bluford milestone Oct 24, 2018

@drewbanin drewbanin merged commit 61af974 into fishtown-analytics:dev/guion-bluford Oct 24, 2018

1 check passed

continuous-integration/appveyor/pr AppVeyor build succeeded

@joshtemple joshtemple deleted the joshtemple:hotfix/bq-load-errormsg branch Oct 24, 2018
