
jayantak/snowplow-bigquery-consolidate-events

snowplow-bigquery-consolidate-events

Snowplow models the raw events table in BigQuery as a wide table, with each context (entity) added as a column holding an array of structs. This is convenient until a context has had several versions over time. Each version (minor as well as major) is represented as a separate column, so reading a context attribute across its full history requires complex self-joins.

The consolidated_events script offers an easy way to handle this: it merges the multiple versions of a context into a single array-of-structs column, which removes the need for those self-joins.
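The merge described above can be sketched as a small Python helper that builds one BigQuery expression from several versioned columns. This is an illustration of the idea only, not the script's actual implementation or output: the function name `consolidated_context_sql`, the field names `id` and `name`, their `STRING` types, and the alias `organisation_context` are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def consolidated_context_sql(
    versions: Dict[str, Sequence[str]],
    field_types: Dict[str, str],
    alias: str,
) -> str:
    """Build a SQL expression that merges versioned context columns.

    versions    maps each versioned column to the fields it contains.
    field_types maps every field of the merged struct to its BigQuery type.
    Fields missing from a version become typed NULLs, so every struct has
    the same shape and the arrays can be concatenated.
    """
    parts: List[str] = []
    for column, fields in versions.items():
        present = set(fields)
        selected = ", ".join(
            f"c.{name} AS {name}"
            if name in present
            else f"CAST(NULL AS {sql_type}) AS {name}"
            for name, sql_type in field_types.items()
        )
        # Re-project each element of the versioned array into the merged shape.
        parts.append(
            f"ARRAY(SELECT AS STRUCT {selected} FROM UNNEST({column}) AS c)"
        )
    return f"ARRAY_CONCAT({', '.join(parts)}) AS {alias}"


# Hypothetical schema: version 1 only has `id`, version 2 adds `name`.
expression = consolidated_context_sql(
    versions={
        "contexts_com_organisation_context_1": ["id"],
        "contexts_com_organisation_context_2": ["id", "name"],
    },
    field_types={"id": "STRING", "name": "STRING"},
    alias="organisation_context",
)
print(expression)
```

Wrapping each version in `ARRAY(SELECT AS STRUCT ...)` is one way to give all versions an identical struct type before `ARRAY_CONCAT`; the real script may generate its DDL differently.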

First, build the Docker image by running:

make build

Then create a folder called secrets containing a file called service-account.json. This file should hold the JSON key of a service account from the relevant Google Cloud project, with permission to describe the Snowplow events table in BigQuery.
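Before starting the container, it can help to confirm that the key file is in place and looks like a service account key. The following is a minimal sketch using only the Python standard library; the helper name `check_service_account` is hypothetical and not part of this repository, and the check covers only the file's shape, not whether the account actually has access to the table.

```python
import json
from pathlib import Path
from typing import List

# Fields found in a Google Cloud service account key file.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_service_account(path: str) -> List[str]:
    """Return a list of problems with the key file (empty if none found)."""
    key_file = Path(path)
    if not key_file.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(key_file.read_text())
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


for problem in check_service_account("secrets/service-account.json"):
    print(problem)
```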

docker run -v "$(pwd)/secrets":/secrets snowplow-bigquery-consolidate-events \
-p project \
-d rt_pipeline_prod1 \
-t events \
-c contexts_com_organisation_context_1 contexts_com_organisation_context_2

This prints the view DDL to the console.

About

Generate version-consolidated copy of Snowplow BigQuery events table
