Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bigquery: Queries with multiple statements can fail silently #3304

Closed
colmsnowplow opened this issue Dec 1, 2020 · 9 comments
Closed

bigquery: Queries with multiple statements can fail silently #3304

colmsnowplow opened this issue Dec 1, 2020 · 9 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. status: investigating The issue is under investigation, which is determined to be non-trivial.

Comments

@colmsnowplow
Copy link

colmsnowplow commented Dec 1, 2020

Client

bigQuery

Environment

OSX

Go Environment

$ go version
go version go1.15.5 darwin/amd64

$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/cog/Library/Caches/go-build"
GOENV="/Users/cog/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/cog/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/cog/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/cog/GitHub/sql-runner/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0h/f7_xqgc52lj8jvvvs3czgnhr0000gn/T/go-build497457604=/tmp/go-build -gno-record-gcc-switches -fno-common"

Code

package main

import (
	bq "cloud.google.com/go/bigquery"
  "golang.org/x/net/context"
  "google.golang.org/api/iterator"
	"fmt"
	"log"
)


func main() {
  fmt.Println("Starting")

  projectID := "my-project"
	ctx := context.Background()

	client, err := bq.NewClient(ctx, projectID)
	if err != nil {
		log.Fatalf("ERROR: Failed to create client: %v", err)
	}

  query := client.Query("SELECT BREAK; SELECT 1")

  job, err := query.Run(ctx)
		if err != nil {
			log.Fatalf("ERROR: Failed to run job: %s.", err)
		}

  it, err := job.Read(ctx)
		if err != nil {
			log.Fatalf("ERROR: Failed to read job results: %s.", err)
		}

  for {
      var values []bq.Value
      err := it.Next(&values)
      if err == iterator.Done {
          break
      }
      if err != nil {
          log.Fatalf("ERROR: Issue iterating results: %s.", err)
      }
      fmt.Println(values)
  }
}

Expected behavior

I'm not 100% sure if this is a bug or I'm just not quite understanding an instrumentation detail. When using query.Run() and job.Read(), an error will be thrown if a query fails. However if the query script contains multiple statements, the first can fail but no error is thrown.

I can manually check for an error with something like the below, but not sure that this is the intended behaviour.

  status, err := job.Status(ctx)
  if err != nil {
    log.Fatalf("ERROR: Failed to read job results: %s.", err)
  }
  if err := status.Err(); err != nil {
    log.Fatalf("ERROR: Error running job: %s.", err)
  }

Note that this leads to a different format of error - which leads me to suspect that either the API unintentionally doesn't handle errors in these cases, or I misunderstand something and should instrument the code differently.

Any guidance you may have as to which is the case would be very much appreciated. :)

Additional Detail

In case it's not immediately clear, this can be awkward to work around because some legitimate queries necessarily have multiple statements. (eg. BEGIN ... END; transactions, or any query containing a variable declaration - which may fail before the final statement).

@colmsnowplow colmsnowplow added the triage me I really want to be triaged. label Dec 1, 2020
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Dec 1, 2020
@shollyman
Copy link
Contributor

Thanks for reporting this, and thanks for providing the details to make this easy to repro.

This is nuanced, but I think it's an error in the backend response. I've filed an issue against the backend team to look at this (internally issue 174588888). The first call to the iterator Next() should respond with an error, and it is not doing so correctly in this case.

A bit more detail: the multiple statements in the input SQL means the backend treats this query as a script. This yields a parent job (the script), and it will have potentially multiple child jobs for the individual statements.

The backend documents specific behavior:

When jobs.getQueryResults is invoked on a script, it will return the query results for the last SELECT, DML, or DDL statement to execute in the script, with no query results if none of the above statements have executed. To obtain the results of all statements in the script, enumerate the child jobs and call jobs.getQueryResults on each of them.

The issue is that the call to jobs.getQueryResults (part of row iteration) returns an empty response but doesn't return the error signal, so the iterator treats this as an empty result set, rather than the proper "your SELECT has errors".

@shollyman shollyman added status: investigating The issue is under investigation, which is determined to be non-trivial. and removed triage me I really want to be triaged. labels Dec 1, 2020
@colmsnowplow
Copy link
Author

That makes sense - thanks for the quick response.

Quick follow-up question - as you describe it, it seems that the Read() method isn't expected to throw an error in this scenario, which would have been the behaviour I assumed. Rather, it sounds like the Read() method would still succeed, but the error would be encountered on iteration of results.

I ask because my program doesn't actually retrieve results - it's just a simple job runner which either returns success or failure (it mostly runs create table statements). By the sounds of it, I actually should be explicitly checking the job status for errors regardless of this behaviour?

It's mainly academic at this point, just curious about it. 🙂

@shollyman
Copy link
Contributor

shollyman commented Dec 2, 2020

The Job abstraction serves multiple purposes, not just query execution. For things like batch data ingestion, it's possible to return warnings for a job but the job itself be considered a success (ex: load a csv, but allow some amount of malformed rows to proceed). The error in the query is considered more severe, so row iteration should not proceed.

Additionally, the multi-statement execution adds more color to the scenario. If you're not interested in consuming results, I'd suggest you may not even care to bother with the Read() call and simple just wait on completion.

Verbosely, something like this:

        job, err := q.Run(ctx)
	if err != nil {
		return err // would surface API errors inserting the job into the system
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err // would surface API errors (problems polling the job)
	}
	if err := status.Err(); err != nil {
		return err // would surface errors the backend had processing the job
	}

For your error case, this would have surfaced the problem with the first SELECT, as it does attach the error to the parent script.

The next interesting bit is how deeply you want to interrogate the multi-statement execution. Scripting does support statements like BEGIN...EXCEPTION, so you can have logic/recovery in the SQL for handling problematic selects, etc.

Alternately, you can ask for a list of child jobs, and do additional checking for errors per-statement if your SQL is less about dependent statements and simply a collection of work. Job statistics will tell you if there's nonzero child jobs, and the job object does surface a convenience method for iterating through the child jobs:

        it := job.Children(ctx)
	for {
		child, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			t.Fatal(err)
		}
		// do something with the child job metadata
	}

With that, you could more precisely identify which statement(s) had problems, which is harder to determine from just the parent metadata.

Hope this helps.

@colmsnowplow
Copy link
Author

Very helpful indeed. Many thanks!

@shollyman
Copy link
Contributor

Current status: still awaiting updates to backend response to resolve this.

@colmsnowplow
Copy link
Author

Thanks for keeping me updated @shollyman !

I've since run into another quirk of the backend (seems that way at least) which might be of interest - since it's not an OS project I don't have a good way to report it but thought it might be useful to know about.

If I have a script containing multiple DROP TABLE statements, and I don't explicitly set the region, then the script will fail with:

{Location: "query"; Message: "Not found: Dataset my-project:my-dataset was not found in location US at [3:1]"; Reason: "invalidQuery"}

It seems that auto-inference of region works just fine if it's a single DROP TABLE, but more than one and it defers back to the US.

It's not really bothersome - we just set the region explicitly to EU to avoid it. But I figure it might be one of those behaviours that one doesn't discover until it's pointed out. :)

@shollyman
Copy link
Contributor

Location auto-detection is going to depend on how accurately the location can be inferred from the SQL. Scripts are a bit more complex, so if the script starts with statements that don't implicate a location (defining temp tables, setting variables, etc), it's likely just picking the US as the fallback default.

Also, just another resource to be aware of: https://cloud.google.com/bigquery/docs/getting-support#issuetracker has links for reviewing public issues and/or filing a new one against the service itself.

@colmsnowplow
Copy link
Author

Oh Nice, didn't know I could file a report via that site, thanks!

@shollyman
Copy link
Contributor

The backend changes to resolve this are now in production. Running your repro, the error is propagating correctly and triggers failure:

2021/02/16 21:11:40 ERROR: Failed to read job results: googleapi: Error 400: Query error: Unrecognized name: BREAK at [1:8], invalidQuery.

I'm going to close this issue at this time. Thanks again for reporting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. status: investigating The issue is under investigation, which is determined to be non-trivial.
Projects
None yet
Development

No branches or pull requests

2 participants