bigquery: Inserter.Put does not retry on "http2: stream closed" transient error #1793
Comments
Thanks for the report. Relocating this to the proper repository (google-cloud-go), as the one this issue was reported in (google-api-go-client) contains the underlying discovery-generated artifacts.
Same issue here, only with "load" jobs. We do many of them (about 100) in parallel to various tables, and at least one now fails regularly.
I'd love to get a better understanding of how we're getting into this state, as I suspect it's symptomatic of another issue. Both reports are cases where we can potentially have a large number of connections in flight. Are you using a default-constructed bigquery client / transport, or is there more to how the transport layer is set up? Do you only observe this under high connection concurrency?
Yes, we’ve only observed it with the default transport and high concurrency. It seemed to occur in “bursts”, where for several batches of concurrent requests (waiting several minutes before each batch) we’d get consistent failures. And then after waiting even longer, we didn’t experience it anymore.
same here, we've observed this only during processing a large backlog (after an outage). perhaps the workaround is a semaphore for the number of concurrent requests.
Thanks for confirming. I'll keep looking into this.
I'm going to close this one out due to staleness. I've been unsuccessful reproducing this, and I've not gotten further reports. If this resurfaces, please comment with additional details (or open a new issue).
Hi @shollyman, this is still an issue. Same case as @dinvlad, with a large number (up to 600) of parallel insertions into different tables (up to 100) over several minutes. App-level retries do succeed, btw.
@iamolegga Can you provide any additional repro steps? We are really struggling to reproduce.
@meredithslota well, nothing special here, but we just tested with higher parallelism and ran into network limits: adding more app instances that write data to BigQuery gave the same throughput of handled messages, so now I'm thinking the problem could be network limits. We reached 900-1000 parallel insertions with a couple of hundred active bigquery clients (and the same number of tables accordingly).
I'm able to repro this occasionally now, but it's inconsistent. It looks like this with
interesting bit:
Logging seems to imply we're dealing with more than 100 streams. Is the correct thing to bound the stream concurrency on the client side to a more reasonable default? Do we do that directly, given we're silently upgrading to http2 here rather than explicitly using http2, or is it informed by other settings?
Spent more time looking at https://github.com/golang/go/blob/master/src/net/http/h2_bundle.go. If I'm reading it and the debug logging correctly, it will already do retries on our behalf (up to 6 additional retries), which seems borne out by the logging. Our options appear to be:
Next steps: I'll try exploring this with even more verbose logging and see if I can get a better understanding of the triggering behavior (are we exceeding stream concurrency in a racy fashion, is this another backend pushback, etc). With that, we can make a decision on how to do things here.
Managed to get a failure at debug level 2. I present, life of a stream failure:
Observations:
Reviewing the http2 client logic again, it seems my earlier comments about stream retries are not true, as I was looking at the wrong part of the request flow. In this case, we received the error while awaiting flow control to start writing the DATA of the payload, which is before the retry can occur (once we have the data).
I've gone ahead and created #3129 to add this to the BigQuery retry predicate for now. It may be possible to improve this in Go's http2 implementation, at which point we can remove the condition from the predicate.
This adds a special case to the default retry predicate for BigQuery to work around a specific high concurrency issue. Related: #1793
Released as part of https://github.com/googleapis/google-cloud-go/releases/tag/bigquery%2Fv1.13.0
Thanks!
We observe bigquery.Inserter.Put returning errors that end with "http2: stream closed". It appears to be a transient error, but Inserter.Put does not treat it as such:
google-cloud-go/bigquery/bigquery.go
Line 146 in 588a6f7
I think it is a bug.