From e7acdf79493bef1ef4e8d0857a9bf34bb99b4f2c Mon Sep 17 00:00:00 2001
From: "kevin.bheda"
Date: Thu, 4 Aug 2022 15:39:00 +0530
Subject: [PATCH 1/2] add flow diagram for bigquery json sink

---
 docs/images/bigquery-json-flow-diagram.svg | 1 +
 docs/sinks/bigquery.md                     | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)
 create mode 100644 docs/images/bigquery-json-flow-diagram.svg

diff --git a/docs/images/bigquery-json-flow-diagram.svg b/docs/images/bigquery-json-flow-diagram.svg
new file mode 100644
index 00000000..02401616
--- /dev/null
+++ b/docs/images/bigquery-json-flow-diagram.svg
@@ -0,0 +1 @@
+[SVG diagram; markup lost in extraction, text labels only] Participants: BigquerySinkFactory, BigquerySink, JsonErrorHandler, Google Bigquery. alt branches: [is success], [no such field error]. Messages: Create table with default columns; initiate sink for json messages; write json messages; succesfully written records to bigquery; given new json attributes not present in bq table schema; parse messages with no such fields errors; add new fields in json messages to bigquery table; successfully added new fields to bigquery table; retry writing json messages.
\ No newline at end of file
diff --git a/docs/sinks/bigquery.md b/docs/sinks/bigquery.md
index 7400a307..33d0656f 100644
--- a/docs/sinks/bigquery.md
+++ b/docs/sinks/bigquery.md
@@ -10,7 +10,8 @@ Bigquery utilise Bigquery [Streaming API](https://cloud.google.com/bigquery/stre
 Bigquery Sink has several responsibilities, first creation of bigquery table and dataset when they are not exist,
 Currently we support dynamic schema by inferring from incoming json data; so the bigquery schema is updated by taking a diff of fields in json data and actual table fields.
 Currently we only support string data type for fields, so all incoming json data values are converted to string type, Except for metadata columns and partion key.
-
+## Flow chart for data type json
+![](../images/bigquery-json-flow-diagram.svg)
 
 ## Bigquery Table Schema Update
 

From c3f31fd3bc6dcbb2795fe2a4e707455ab5d3b269 Mon Sep 17 00:00:00 2001
From: "kevin.bheda"
Date: Thu, 4 Aug 2022 21:10:01 +0530
Subject: [PATCH 2/2] move flow chart to schema update section

---
 docs/sinks/bigquery.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/docs/sinks/bigquery.md b/docs/sinks/bigquery.md
index 33d0656f..63f13622 100644
--- a/docs/sinks/bigquery.md
+++ b/docs/sinks/bigquery.md
@@ -10,8 +10,6 @@ Bigquery utilise Bigquery [Streaming API](https://cloud.google.com/bigquery/stre
 Bigquery Sink has several responsibilities, first creation of bigquery table and dataset when they are not exist,
 Currently we support dynamic schema by inferring from incoming json data; so the bigquery schema is updated by taking a diff of fields in json data and actual table fields.
 Currently we only support string data type for fields, so all incoming json data values are converted to string type, Except for metadata columns and partion key.
-## Flow chart for data type json
-![](../images/bigquery-json-flow-diagram.svg)
 
 ## Bigquery Table Schema Update
 
@@ -19,6 +17,11 @@ Currently we only support string data type for fields, so all incoming json data
 
 Bigquery Sink update the bigquery table schema on separate table update operation. Bigquery utilise [Stencil](https://github.com/odpf/stencil) to parse protobuf messages generate schema and update bigquery tables with the latest schema. The stencil client periodically reload the descriptor cache. Table schema update happened after the descriptor caches uploaded.
 
+### JSON
+Bigquery Sink creates the table with initial columns mentioned in the config. When new fields arrive in json data they are added to bigquery table.
+### Flow chart for data type json sink and schema update
+![](../images/bigquery-json-flow-diagram.svg)
+
 ## Protobuf - Bigquery Table Type Mapping
 
 Here are type conversion between protobuf type and bigquery type :
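
The JSON flow these two patches document (create the table with default columns, write the messages, and on a no-such-field error add the missing fields and retry) can be sketched roughly as follows. This is a minimal Python sketch against an in-memory stand-in, not the sink's actual implementation: `FakeTable`, `NoSuchFieldError`, and `write_with_schema_update` are hypothetical names, the error shape is assumed, and the exception the docs make for metadata columns and the partition key is left out.

```python
from typing import Dict, Iterable, List, Set


class NoSuchFieldError(Exception):
    """Raised by the stand-in table when rows carry columns it does not have."""

    def __init__(self, missing: Set[str]) -> None:
        super().__init__(f"no such field: {sorted(missing)}")
        self.missing = missing


class FakeTable:
    """In-memory stand-in for a BigQuery table (hypothetical, for illustration)."""

    def __init__(self, default_columns: Iterable[str]) -> None:
        self.columns: Set[str] = set(default_columns)
        self.rows: List[Dict[str, str]] = []

    def insert(self, rows: List[Dict[str, str]]) -> None:
        # Diff of the incoming fields against the current table schema.
        missing = {key for row in rows for key in row} - self.columns
        if missing:
            raise NoSuchFieldError(missing)
        self.rows.extend(rows)

    def add_string_columns(self, names: Set[str]) -> None:
        # Only the string type is modelled, matching the stated limitation.
        self.columns |= names


def write_with_schema_update(table: FakeTable, messages: List[dict]) -> int:
    """Write messages; on a no-such-field error, add the fields and retry once."""
    # Every incoming JSON value is converted to a string before writing
    # (the metadata-column and partition-key exception is not modelled here).
    rows = [{k: str(v) for k, v in msg.items()} for msg in messages]
    try:
        table.insert(rows)
    except NoSuchFieldError as err:
        table.add_string_columns(err.missing)
        table.insert(rows)  # retry after the schema update
    return len(rows)


# The table starts with one default column; "user_id" is added on first write.
table = FakeTable(default_columns=["event_timestamp"])
written = write_with_schema_update(
    table, [{"event_timestamp": 1659607740, "user_id": 42}]
)
```

The single retry is an assumption; the diagram only says "retry writing json messages" and does not state a retry limit.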