[Question] Does emulator support streaming data with BigQuery Storage Write API? #29
Comments
We don't currently support the streaming API, but we have plans to do so.
Thanks for answering. Yes, we cannot avoid using the streaming API, so it would help a lot if the emulator supported it. Thanks again!
OK, I see. Could you please provide a simple, reproducible code example here?
Yes. Dockerfile (the port is set in docker-compose):
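A minimal sketch of what such a Dockerfile might look like, assuming the ghcr.io/goccy/bigquery-emulator image and hypothetical project/data names:

```dockerfile
# Sketch only: image tag, project ID, and data file are assumptions.
FROM ghcr.io/goccy/bigquery-emulator:latest

# Seed data loaded at startup; see data.yaml below.
COPY data.yaml /data.yaml

# The image's entrypoint is the emulator binary, so these are its flags.
CMD ["--project=test-project", "--data-from-yaml=/data.yaml"]
```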
Python code:
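A minimal sketch of the kind of stream creation that hangs, assuming the emulator exposes its gRPC endpoint on localhost:9060 and using the hypothetical IDs from data.yaml below:

```python
import grpc
from google.cloud.bigquery_storage_v1 import BigQueryWriteClient, types
from google.cloud.bigquery_storage_v1.services.big_query_write.transports import (
    BigQueryWriteGrpcTransport,
)

# Plain-text channel to the emulator's assumed gRPC port.
channel = grpc.insecure_channel("localhost:9060")
client = BigQueryWriteClient(transport=BigQueryWriteGrpcTransport(channel=channel))

# Hypothetical IDs; adjust to match the seeded data.
parent = BigQueryWriteClient.table_path("test-project", "test-dataset", "test-table")

# This call is where the execution reportedly gets stuck.
stream = client.create_write_stream(
    parent=parent,
    write_stream=types.WriteStream(type_=types.WriteStream.Type.COMMITTED),
)
print(stream.name)
```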
data.yaml:
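A sketch of seed data in the emulator's YAML format, with hypothetical IDs matching the snippets above:

```yaml
projects:
- id: test-project
  datasets:
  - id: test-dataset
    tables:
    - id: test-table
      columns:
      - name: id
        type: INTEGER
      - name: name
        type: STRING
```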
Please write here if something else is needed. Thanks
Thank you for providing the example.
I'm using the InsertAll API to stream things into BigQuery (making use of https://github.com/OTA-Insight/bqwriter), and it mostly works against the emulator, though there are at least two things that don't work well.
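For reference, the same insertAll path can also be exercised directly from Python; a minimal sketch, assuming the emulator's HTTP endpoint on localhost:9050 and hypothetical IDs:

```python
from google.api_core.client_options import ClientOptions
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

# Point the standard BigQuery client at the emulator's assumed HTTP endpoint.
client = bigquery.Client(
    project="test-project",
    credentials=AnonymousCredentials(),
    client_options=ClientOptions(api_endpoint="http://localhost:9050"),
)

# insert_rows_json issues a tabledata.insertAll request under the hood.
errors = client.insert_rows_json(
    "test-project.test-dataset.test-table",
    [{"id": 1, "name": "alice"}],
)
print(errors)  # an empty list means all rows were accepted
```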
Really amazing work with the emulator btw!
I've added support for the Read API for the time being. Please wait a little longer for the Write API. I'm also looking for sponsors, so please consider sponsoring me :)
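For anyone wanting to try the Read API against the emulator, a minimal sketch in Python, assuming the gRPC endpoint localhost:9060 and hypothetical IDs:

```python
import grpc
from google.cloud.bigquery_storage_v1 import BigQueryReadClient, types
from google.cloud.bigquery_storage_v1.services.big_query_read.transports import (
    BigQueryReadGrpcTransport,
)

channel = grpc.insecure_channel("localhost:9060")
client = BigQueryReadClient(transport=BigQueryReadGrpcTransport(channel=channel))

# Hypothetical project/dataset/table IDs.
session = client.create_read_session(
    parent="projects/test-project",
    read_session=types.ReadSession(
        table="projects/test-project/datasets/test-dataset/tables/test-table",
        data_format=types.DataFormat.AVRO,
    ),
    max_stream_count=1,
)

# Each response carries a batch of Avro-encoded rows.
for response in client.read_rows(session.streams[0].name):
    print(response.row_count)
```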
@OfirCohen29 @andreas-dentech @adamszadkowski
Thanks a lot
@goccy thank you very much for making this emulator and providing us with such good support :) I have been able to test the streaming API with the Spark integration, using the Java BigQuery libraries. Unfortunately, I have found some other issues which I would like to share with you. I will note them here, but let me know if you would rather have them as separate issues. This time, too, everything that is not working is documented in the repository https://github.com/adamszadkowski/bigquery-emulator-issue

**Handling of multiple read streams**

Unfortunately, the Spark integration with BigQuery requires reading multiple streams:

```scala
val rows = sparkSession.read
  .format("bigquery")
  .load(s"$projectId.$datasetId.$tableId")
  .collectAsList()
```

The code above doesn't work. There is a workaround: setting `parallelism` to 1:

```scala
val rows = sparkSession.read
  .format("bigquery")
  .option("parallelism", 1) // required by bigquery-emulator
  .load(s"$projectId.$datasetId.$tableId")
  .collectAsList()
```

Even if it is technically possible to change this value, in practice it is very hard to make that change everywhere.

**Support for partitioned tables**

It looks like partitioned tables are not handled correctly. A table is created with:

```scala
service.create(TableInfo.of(
  TableId.of(projectId, datasetId, tableId),
  StandardTableDefinition.newBuilder()
    .setSchema(schema)
    .setTimePartitioning(TimePartitioning.of(DAY))
    .build()))
```

Spark then tries to read additional columns. This can be spotted in the logs; in Spark, on the other hand, there is an error passed up from the connector.
**Problems with streaming write**

It looks like there should be a default stream for writing. Currently, an error is returned instead.
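For context, the real Write API never requires creating the default stream; it is addressed by a fixed resource name, so the emulator would presumably need to accept appends to that name without a prior CreateWriteStream call. With hypothetical IDs:

```python
# The default stream exists implicitly on every table and is appended to
# by name rather than being created first.
default_stream = (
    "projects/test-project/datasets/test-dataset/tables/test-table/streams/_default"
)
```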
Creating a stream before writing, with the code below, doesn't work either:

```scala
val createWriteStreamRequest = CreateWriteStreamRequest.newBuilder()
  .setParent(TableName.of(projectId, datasetId, tableId).toString())
  .setWriteStream(WriteStream.newBuilder().setType(WriteStream.Type.COMMITTED).build())
  .build()
val writeStream = client.createWriteStream(createWriteStreamRequest)
```

Executing the create-stream request for the first time fails with an error.
Consecutive executions cause the test code to hang until a timeout, after which there is another error. After that, it is impossible to close the client gracefully; only a forced shutdown works.
@adamszadkowski Thank you for your report. However, since this issue is already closed, please create a new issue as a new topic and paste this problem there.
Hey,
Thanks for creating the bigquery emulator.
Can I use the BigQuery Storage Write API?
https://cloud.google.com/python/docs/reference/bigquerystorage/latest/google.cloud.bigquery_storage_v1.services.big_query_write.BigQueryWriteClient#google_cloud_bigquery_storage_v1_services_big_query_write_BigQueryWriteClient_create_write_stream
I tried to create a write stream in Python against the emulator, and the execution gets stuck when creating it.
The emulator was deployed in a Docker container.
Thanks