Skip to content

Latest commit

 

History

History
20 lines (12 loc) · 1.12 KB

File metadata and controls

20 lines (12 loc) · 1.12 KB

snowplow-bigquery-consolidate-events

Snowplow models the raw events table in Bigquery as a wide table with all contexts (entities) added as array of structs columns. This is convenient except when you have several versions of a given context over time. Each version (minor as well as major) is represented as a separate column, thus requiring complex self joins to read a certain context attribute over the course of time.

Snowplow consolidated_events script offers an easy way to handle this by merging multiple versions of a context into a single array of struct column, thus preventing the need for complex self-joins.

First build the docker image by running

make build

Then create a folder called secrets, with a json file called service-account.json with the service account json from the respective google cloud project which has access to describe the snowplow events bigquery table

docker run -v secrets:/secrets snowplow-bigquery-consolidate-events \
-p project \
-d rt_pipeline_prod1 \
-t events \
-c contexts_com_organisation_context_1 contexts_com_organisation_context_2

This will print the view DDL on the console.