
bigquery: detecting when the streaming buffer is empty #507

Closed
mikebell-org opened this issue Feb 10, 2017 · 11 comments
Labels
api: bigquery Issues related to the BigQuery API. type: question Request for information or clarification. Not an issue.

Comments

@mikebell-org

mikebell-org commented Feb 10, 2017

With BigQuery, how can I poll to detect when the streaming buffer is empty and it's safe to copy a table, without dropping down to the lower-level library?
I presume it should return a Job you can Wait() on, similar to copies?

@s-mang s-mang changed the title Detecting when the streaming buffer is empty bigquery: detecting when the streaming buffer is empty Mar 1, 2017
@jba jba added api: bigquery Issues related to the BigQuery API. type: question Request for information or clarification. Not an issue. labels Mar 15, 2017
@jba
Contributor

jba commented Jun 7, 2017

We blew it on this one. We didn't mean to ignore you. Do you still have this question? If so, can you explain what you mean by "the streaming buffer"? If you're calling Uploader.Put, it should be synchronous.

@derekperkins
Contributor

BigQuery doesn't stream directly into long-term storage; it first puts rows into a write-optimized store and periodically flushes that into the main storage. Queries can use the streaming buffer immediately, but since other BigQuery operations (like table copies) ignore the buffer, he wants to wait for the buffer to clear.

@derekperkins
Contributor

For anyone interested in the details, this is a great article.
https://cloud.google.com/blog/big-data/2017/06/life-of-a-bigquery-streaming-insert

@jba jba self-assigned this Jun 9, 2017
@jba
Contributor

jba commented Jun 9, 2017

@mikebell-org We didn't write a poll method, but you can now find streaming buffer info in TableMetadata, so you can write your own polling. (Feel free to lift our own Retry function. Sorry it's not public.)
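The polling loop itself is straightforward to sketch. A minimal Go example, where pollUntilBufferEmpty and the check callback are illustrative names (not part of the client); with the real client, check would wrap a table.Metadata(ctx) call and return whether meta.StreamingBuffer is nil:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntilBufferEmpty calls check until it reports that the streaming
// buffer has drained, sleeping between attempts. In real use, check would
// fetch TableMetadata and report whether the StreamingBuffer field is nil.
func pollUntilBufferEmpty(check func() (bool, error), interval time.Duration, maxAttempts int) error {
	for i := 0; i < maxAttempts; i++ {
		empty, err := check()
		if err != nil {
			return err
		}
		if empty {
			return nil
		}
		time.Sleep(interval)
	}
	return errors.New("streaming buffer still not empty after max attempts")
}

func main() {
	// Fake check standing in for the metadata lookup: the buffer
	// "drains" after three calls.
	calls := 0
	check := func() (bool, error) {
		calls++
		return calls >= 3, nil
	}
	err := pollUntilBufferEmpty(check, time.Millisecond, 10)
	fmt.Println(err, calls) // prints "<nil> 3"
}
```

In practice the interval would be minutes rather than milliseconds, since the buffer can take on the order of 90 minutes to drain.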

@belablotski

Ok, thank you. Is it possible to flush it if it isn't empty? What if a process needs a bulk UPDATE after streaming?

@jba
Contributor

jba commented Mar 2, 2018

There's no way to flush it.

What if a process needs a bulk UPDATE after streaming?

Can you explain more what you mean by this?

@belablotski

Hi Jonathan,
Let's say you've streamed data and then want to clean it up by deleting some duplicates or updating some records, i.e. you need to execute a DML query (UPDATE and/or DELETE) while the streaming buffer isn't empty. In that case BigQuery throws "Error 400: UPDATE or DELETE statements are not supported on table XYZ with streaming buffer", and you have to wait ~90 minutes before you can run it. So it would be nice to have an API to flush the streaming buffer and make the table available for DELETE/UPDATE DML.

@jba
Contributor

jba commented Mar 2, 2018

OK, I see what you mean.

This is really a question for the BigQuery service itself, not the Go client. But I'll attempt an answer.

The reason there is unlikely to be a flush is that the streaming buffer is a performance optimization designed to keep queries fast by batching changes to the underlying storage. If there were a flush, everyone would use it, and the performance optimization would be gone.

Have you looked at using a load job as an alternative to a streaming insert?
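A load job writes directly into managed storage, so the loaded table has no streaming buffer to wait on. A minimal sketch with the Go client; the project ID, dataset, table, and GCS URI below are placeholders, and it needs GCP credentials, so it is not runnable as-is:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigquery"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Load newline-delimited JSON from GCS instead of streaming rows.
	gcsRef := bigquery.NewGCSReference("gs://my-bucket/data.json") // placeholder URI
	gcsRef.SourceFormat = bigquery.JSON

	loader := client.Dataset("mydataset").Table("mytable").LoaderFrom(gcsRef)
	job, err := loader.Run(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// Unlike a streaming insert, the load is a Job you can Wait on; once
	// it completes, DML on the table is not blocked by a streaming buffer.
	status, err := job.Wait(ctx)
	if err != nil {
		log.Fatal(err)
	}
	if status.Err() != nil {
		log.Fatal(status.Err())
	}
	fmt.Println("load complete")
}
```

The tradeoff is latency: load jobs are batch operations, so rows are not queryable the instant they are sent, as they are with streaming inserts.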

@belablotski

Thank you, Jonathan. I see the reasoning: of course it would be an expensive operation, though "unlikely" here effectively means impossible. To be honest, my plan B is to move more functionality to the client side to minimize the need for DML.

@davehowell

@jba I am doing some testing, using Dataflow to stream from Pub/Sub into BigQuery, and I'd like a process that publishes some messages, checks what appears in BigQuery, and then resets it (i.e. deletes/truncates the BigQuery table).
The performance value of the streaming buffer makes total sense, and the tradeoffs are worth it, but I'm wondering what to do in this minor use case. I've been using in-line SQL, e.g. DELETE FROM {table_name} WHERE 1=1; the vacuous WHERE clause is there because BigQuery requires a WHERE clause on DELETE statements. The query works once the streaming buffer is empty, about 90 minutes after I stop publishing messages to Pub/Sub. If it's not empty, it throws the error above.

Is there a way to delete records that are already materialized, ignoring the streaming buffer records? That's probably not very useful, just wondering.

Would deleting and recreating the table be an alternative? Will BigQuery let me drop and recreate it within that 90-minute window? I will have a try.

@evmin02

evmin02 commented May 3, 2019

I would also be interested in an option to do iterative development on streaming inserts, i.e. being able to flush the buffer.


6 participants