-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Amazon Redshift storage
HOME > [SNOWPLOW TECHNICAL DOCUMENTATION](Snowplow technical documentation) > [Storage](storage documentation) > Amazon Redshift Storage
## 1. Current table definitionThe standard S3 / Hive implementation of Snowplow storage uses Apache Hive on EMR to store Snowplow data in a table partitioned by date. The current table definition is given in the [repo] redshift-table-def. We have pasted a copy below:
CREATE TABLE events (
-- App
app_id varchar(255) encode text255,
platform varchar(255) encode text255,
-- Date/time
collector_tstamp timestamp not null,
dvce_tstamp timestamp,
-- Event
event varchar(128) encode text255,
event_vendor varchar(128) encode text32k not null,
event_id varchar(38) not null unique,
txn_id int,
-- Versioning
v_tracker varchar(100) encode text255,
v_collector varchar(100) encode text255 not null,
v_etl varchar(100) encode text255 not null,
-- User and visit
user_id varchar(255) encode runlength,
user_ipaddress varchar(19) encode runlength,
user_fingerprint varchar(50) encode runlength,
domain_userid varchar(16),
domain_sessionidx smallint,
network_userid varchar(38),
-- Page
page_title varchar(2000),
page_referrer varchar(3000),
-- Page URL components
page_urlscheme varchar(16) encode text255,
page_urlhost varchar(255) encode text255,
page_urlport smallint,
page_urlpath varchar(1000) encode text32k,
page_urlquery varchar(3000),
page_urlfragment varchar(255),
-- Marketing
mkt_source varchar(255) encode text255,
mkt_medium varchar(255) encode text255,
mkt_term varchar(255) encode raw,
mkt_content varchar(500) encode raw,
mkt_campaign varchar(255) encode text32k,
-- Custom Event
ev_category varchar(255) encode text255,
ev_action varchar(255) encode text255,
ev_label varchar(255) encode text32k,
ev_property varchar(255) encode text32k,
ev_value float,
-- Ecommerce
tr_orderid varchar(255) encode raw,
tr_affiliation varchar(255) encode text255,
tr_total dec(18,2),
tr_tax dec(18,2),
tr_shipping dec(18,2),
tr_city varchar(255) encode text32k,
tr_state varchar(255) encode text32k,
tr_country varchar(255) encode text32k,
ti_orderid varchar(255) encode raw,
ti_sku varchar(255) encode text32k,
ti_name varchar(255) encode text32k,
ti_category varchar(255) encode text255,
ti_price dec(18,2),
ti_quantity int,
-- Page ping
pp_xoffset_min integer,
pp_xoffset_max integer,
pp_yoffset_min integer,
pp_yoffset_max integer,
-- User Agent
useragent varchar(1000) encode text32k,
-- Browser
br_name varchar(50) encode text255,
br_family varchar(50) encode text255,
br_version varchar(50) encode text255,
br_type varchar(50) encode text255,
br_renderengine varchar(50) encode text255,
br_lang varchar(255) encode text255,
br_features_pdf boolean,
br_features_flash boolean,
br_features_java boolean,
br_features_director boolean,
br_features_quicktime boolean,
br_features_realplayer boolean,
br_features_windowsmedia boolean,
br_features_gears boolean ,
br_features_silverlight boolean,
br_cookies boolean,
br_colordepth varchar(12) encode text255,
br_viewwidth integer,
br_viewheight integer,
-- Operating System
os_name varchar(50) encode text255,
os_family varchar(50) encode text255,
os_manufacturer varchar(50) encode text255,
os_timezone varchar(255) encode text255,
-- Device/Hardware
dvce_type varchar(50) encode text255,
dvce_ismobile boolean,
dvce_screenwidth integer,
dvce_screenheight integer,
-- Document
doc_charset varchar(128) encode text255,
doc_width integer,
doc_height integer,
CONSTRAINT event_id_pk PRIMARY KEY(event_id)
)
DISTSTYLE KEY
DISTKEY (domain_userid)
SORTKEY (collector_tstamp);
There are a number of changes we intend to make the table:
- Increase the number of fields, as we build out support for more events as "first class citizens". In the table definition above, a number of events are recognised as "first class citizens" with specific fields that only apply to those events: a good example is transaction events. (The accounts for all 13 fields beginning
tr_
andti_
.) For details about the events we intend to support, see the Snowplow Canonical Event Model. - We plan to move the format of data from a single events table into storing fields for particular event types in specific tables, which are joined to the primary Snowplow event table. This will be necessary to accommodate the new events we want to support as "first class citizens".
Home | About | Project | Setup Guide | Technical Docs | Copyright © 2012-2021 Snowplow Analytics Ltd. Documentation terms of use.
HOME » TECHNICAL DOCUMENTATION
1A. Trackers
Overview
ActionScript3 Tracker
Android Tracker
Arduino Tracker
CPP Tracker
Golang Tracker
Google AMP Tracker
iOS Tracker
Java Tracker
JavaScript Tracker
Lua Tracker
.NET Tracker
Node.js Tracker
PHP Tracker
Pixel Tracker
Python Tracker
Ruby Tracker
Scala Tracker
Unity Tracker
Building a Tracker
1B. Webhooks
Iglu webhook adapter
CallRail webhook adapter
MailChimp webhook adapter
Mandrill webhook adapter
PagerDuty webhook adapter
Pingdom webhook adapter
SendGrid webhook adapter
Urban Airship Connect webhook adapter
Mailgun webhook adapter
StatusGator webhook adapter
Unbounce webhook adapter
Olark webhook adapter
Marketo webhook adapter
Vero webhook adapter
2. Collectors
Overview
Cloudfront collector
Clojure collector (Elastic Beanstalk)
Scala Stream Collector
3. Enrich
Overview
EmrEtlRunner
Stream Enrich
Beam Enrich
Snowplow Event Recovery
Hadoop Event Recovery
C. Canonical Snowplow event model
4. Storage
Overview
Relational Database Shredder
Relational Database Loader
S3 Loader
Elasticsearch Loader
Storage in Redshift
Storage in PostgreSQL
Storage in Infobright (deprecated)
D. Snowplow storage formats (to write)
5. Analytics
Analytics-documentation
Event-manifest-populator
Common
Shredding
Artifact repositories