bigquery: detecting when the streaming buffer is empty #507
We blew it on this one. We didn't mean to ignore you. Do you still have this question? If so, can you explain what you mean by "the streaming buffer"? If you're calling…
BigQuery doesn't stream directly into its long-term storage; it first puts rows into a write-optimized store and periodically flushes that to the main storage. Queries can use the streaming buffer immediately, but since other BigQuery operations ignore the buffer, the original poster wants to wait for the buffer to clear.
For anyone interested in the details, this is a great article.
@mikebell-org We didn't write a poll method, but you can now find streaming buffer info in…
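The Go client's `TableMetadata` struct exposes a `StreamingBuffer` field that is nil once the buffer has drained, so callers can build the poll loop themselves. A minimal sketch, assuming the `cloud.google.com/go/bigquery` package; the project, dataset, and table names are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"cloud.google.com/go/bigquery"
)

// waitForEmptyBuffer polls Table.Metadata until the streaming buffer
// statistics disappear, meaning all streamed rows have been committed
// to managed storage.
func waitForEmptyBuffer(ctx context.Context, t *bigquery.Table) error {
	for {
		md, err := t.Metadata(ctx)
		if err != nil {
			return err
		}
		if md.StreamingBuffer == nil {
			return nil // buffer flushed; safe to copy or export the table
		}
		fmt.Printf("buffer holds ~%d rows; waiting\n", md.StreamingBuffer.EstimatedRows)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(30 * time.Second): // polling interval is a judgment call
		}
	}
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	t := client.Dataset("my_dataset").Table("my_table") // placeholders
	if err := waitForEmptyBuffer(ctx, t); err != nil {
		log.Fatal(err)
	}
}
```

The buffer can take a while to drain (BigQuery documents up to ~90 minutes), so a generous poll interval and a deadline on the context are advisable.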
Ok, thank you. Is it possible to flush it if it isn't empty? What if a process needs to run a bulk UPDATE after streaming?
There's no way to flush it.
Can you explain more what you mean by this?
Hi Jonathan,
OK, I see what you mean. This is really a question for the BigQuery service itself, not the Go client, but I'll attempt an answer. The reason there is unlikely to be a flush is that the streaming buffer is a performance optimization designed to keep queries fast by batching changes to the underlying storage. If there were a flush, everyone would use it, and the performance optimization would be gone. Have you looked at using a load job as an alternative to a streaming insert?
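For anyone weighing that suggestion: rows inserted by a load job are committed as a batch and never pass through the streaming buffer, so DML can follow immediately. A hedged sketch using the Go client's `LoaderFrom` (bucket, object, project, dataset, and table names are placeholders):

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Load newline-delimited JSON from GCS; the rows arrive via a batch
	// job rather than the streaming buffer.
	gcsRef := bigquery.NewGCSReference("gs://my-bucket/data.json") // placeholder URI
	gcsRef.SourceFormat = bigquery.JSON

	loader := client.Dataset("my_dataset").Table("my_table").LoaderFrom(gcsRef)
	job, err := loader.Run(ctx)
	if err != nil {
		log.Fatal(err)
	}
	status, err := job.Wait(ctx)
	if err != nil {
		log.Fatal(err)
	}
	if status.Err() != nil {
		log.Fatal(status.Err())
	}
}
```

The trade-off is latency: load jobs are batch operations, so data is not queryable the instant it is sent, the way streamed rows are.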
Thank you, Jonathan. I see the reason; of course it would be an expensive operation, but "impossible" sounds final. To be honest, my plan B is to put more functionality on the client side to minimize the need for DML.
@jba I am doing some testing, using Dataflow to stream to BigQuery from Pub/Sub, and I'd like to have a process that publishes some messages, tests what appears in BigQuery, and then resets it (i.e. deletes/truncates the BigQuery table). Is there a way to delete records that are already materialized, ignoring the records in the streaming buffer? That's probably not very useful, just wondering. Would deleting and recreating the table be an alternative? Will it let me drop and recreate it within that 90 minutes? I will give it a try.
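For the delete-and-recreate approach, the Go client's `Table.Delete` and `Table.Create` can reset a test table while preserving its schema. A sketch under the same placeholder names as above; note the caveats in the comments:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	t := client.Dataset("my_dataset").Table("my_table") // placeholders

	// Capture the schema before dropping the table.
	md, err := t.Metadata(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// Caveat: rows still in the streaming buffer are discarded with the
	// table, and streaming into a just-recreated table of the same name
	// can behave unpredictably for a short period.
	if err := t.Delete(ctx); err != nil {
		log.Fatal(err)
	}
	if err := t.Create(ctx, &bigquery.TableMetadata{Schema: md.Schema}); err != nil {
		log.Fatal(err)
	}
}
```

For iterative testing it may be simpler to stream into a fresh, uniquely named table per test run and delete it afterwards, sidestepping the recreate-under-the-same-name caveat entirely.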
I would also be interested in an option to do iterative development on streaming inserts, i.e. being able to flush the buffer.
With BigQuery, how can I poll until the streaming buffer is empty and it's safe to copy a table, without dropping down to the lower-level library?
I presume it should return a Job you can Wait() on similar to copies?
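On the Job/Wait part of the question: table copies in the Go client do run as jobs, so `Wait()` already works there; it's only the buffer check that has no job to wait on and must be polled via table metadata. A sketch of the copy half, with placeholder names:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery"
)

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	src := client.Dataset("my_dataset").Table("src_table") // placeholders
	dst := client.Dataset("my_dataset").Table("dst_table")

	// CopierFrom returns a Copier whose Run starts a copy job; Wait blocks
	// until it finishes, exactly as the question presumes.
	job, err := dst.CopierFrom(src).Run(ctx)
	if err != nil {
		log.Fatal(err)
	}
	status, err := job.Wait(ctx)
	if err != nil {
		log.Fatal(err)
	}
	if status.Err() != nil {
		log.Fatal(status.Err())
	}
}
```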