New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loader: Create the events table on startup #1034
Comments
CREATE TABLE IF NOT EXISTS snowplow.events (
-- App
app_id VARCHAR(255),
platform VARCHAR(255),
-- Date/time
etl_tstamp TIMESTAMP,
collector_tstamp TIMESTAMP NOT NULL,
dvce_created_tstamp TIMESTAMP,
-- Event
event VARCHAR(128),
event_id CHAR(36) NOT NULL,
txn_id INTEGER,
-- Namespacing and versioning
name_tracker VARCHAR(128),
v_tracker VARCHAR(100),
v_collector VARCHAR(100) NOT NULL,
v_etl VARCHAR(100) NOT NULL,
-- User and visit
user_id VARCHAR(255),
user_ipaddress VARCHAR(128),
user_fingerprint VARCHAR(128),
domain_userid VARCHAR(128),
domain_sessionidx SMALLINT,
network_userid VARCHAR(128),
-- Location
geo_country CHAR(2),
geo_region CHAR(3),
geo_city VARCHAR(75),
geo_zipcode VARCHAR(15),
geo_latitude DOUBLE,
geo_longitude DOUBLE,
geo_region_name VARCHAR(100),
-- IP lookups
ip_isp VARCHAR(100),
ip_organization VARCHAR(128),
ip_domain VARCHAR(128),
ip_netspeed VARCHAR(100),
-- Page
page_url VARCHAR(4096),
page_title VARCHAR(2000),
page_referrer VARCHAR(4096),
-- Page URL components
page_urlscheme VARCHAR(16),
page_urlhost VARCHAR(255),
page_urlport INTEGER,
page_urlpath VARCHAR(3000),
page_urlquery VARCHAR(6000),
page_urlfragment VARCHAR(3000),
-- Referrer URL components
refr_urlscheme VARCHAR(16),
refr_urlhost VARCHAR(255),
refr_urlport INTEGER,
refr_urlpath VARCHAR(6000),
refr_urlquery VARCHAR(6000),
refr_urlfragment VARCHAR(3000),
-- Referrer details
refr_medium VARCHAR(25),
refr_source VARCHAR(50),
refr_term VARCHAR(255),
-- Marketing
mkt_medium VARCHAR(255),
mkt_source VARCHAR(255),
mkt_term VARCHAR(255),
mkt_content VARCHAR(500),
mkt_campaign VARCHAR(255),
-- Custom structured event
se_category VARCHAR(1000),
se_action VARCHAR(1000),
se_label VARCHAR(4096),
se_property VARCHAR(1000),
se_value DOUBLE,
-- Ecommerce
tr_orderid VARCHAR(255),
tr_affiliation VARCHAR(255),
tr_total DECIMAL(18,2),
tr_tax DECIMAL(18,2),
tr_shipping DECIMAL(18,2),
tr_city VARCHAR(255),
tr_state VARCHAR(255),
tr_country VARCHAR(255),
ti_orderid VARCHAR(255),
ti_sku VARCHAR(255),
ti_name VARCHAR(255),
ti_category VARCHAR(255),
ti_price DECIMAL(18,2),
ti_quantity INTEGER,
-- Page ping
pp_xoffset_min INTEGER,
pp_xoffset_max INTEGER,
pp_yoffset_min INTEGER,
pp_yoffset_max INTEGER,
-- User Agent
useragent VARCHAR(1000),
-- Browser
br_name VARCHAR(50),
br_family VARCHAR(50),
br_version VARCHAR(50),
br_type VARCHAR(50),
br_renderengine VARCHAR(50),
br_lang VARCHAR(255),
br_features_pdf BOOLEAN,
br_features_flash BOOLEAN,
br_features_java BOOLEAN,
br_features_director BOOLEAN,
br_features_quicktime BOOLEAN,
br_features_realplayer BOOLEAN,
br_features_windowsmedia BOOLEAN,
br_features_gears BOOLEAN,
br_features_silverlight BOOLEAN,
br_cookies BOOLEAN,
br_colordepth VARCHAR(12),
br_viewwidth INTEGER,
br_viewheight INTEGER,
-- Operating System
os_name VARCHAR(50),
os_family VARCHAR(50),
os_manufacturer VARCHAR(50),
os_timezone VARCHAR(255),
-- Device/Hardware
dvce_type VARCHAR(50),
dvce_ismobile BOOLEAN,
dvce_screenwidth INTEGER,
dvce_screenheight INTEGER,
-- Document
doc_charset VARCHAR(128),
doc_width INTEGER,
doc_height INTEGER,
-- Currency
tr_currency CHAR(3),
tr_total_base DECIMAL(18, 2),
tr_tax_base DECIMAL(18, 2),
tr_shipping_base DECIMAL(18, 2),
ti_currency CHAR(3),
ti_price_base DECIMAL(18, 2),
base_currency CHAR(3),
-- Geolocation
geo_timezone VARCHAR(64),
-- Click ID
mkt_clickid VARCHAR(128),
mkt_network VARCHAR(64),
-- ETL tags
etl_tags VARCHAR(500),
-- Time event was sent
dvce_sent_tstamp TIMESTAMP,
-- Referer
refr_domain_userid VARCHAR(128),
refr_dvce_tstamp TIMESTAMP,
-- Session ID
domain_sessionid CHAR(128),
-- Derived timestamp
derived_tstamp TIMESTAMP,
-- Event schema
event_vendor VARCHAR(1000),
event_name VARCHAR(1000),
event_format VARCHAR(128),
event_version VARCHAR(128),
-- Event fingerprint
event_fingerprint VARCHAR(128),
-- True timestamp
true_tstamp TIMESTAMP,
-- Collector timestamp date for partitioning
collector_tstamp_date DATE GENERATED ALWAYS AS (DATE(collector_tstamp))
)
PARTITIONED BY (collector_tstamp_date, event_name); |
spenes
added a commit
that referenced
this issue
Nov 27, 2022
spenes
added a commit
that referenced
this issue
Dec 15, 2022
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Currently, the loader assumes that the events table already exists. This means that a snowplow user getting started for the first time needs to seek out the sql needed to create the table. For Redshift it's over here but that's not obvious. For Snowflake it's over here and that's even less obvious. And for Databricks I cannot find it anywhere.
The loader already creates many other tables needed for operation, e.g. the manifest table, and for redshift the entities tables. So it is already expected that the loader runs with sufficient permissions that it could create the events table if we let it.
So.... let's just create the events table on startup. It is a fairly easy code change and yet it will have a big impact on Snowplow users getting set up with the loader for the first time.
The text was updated successfully, but these errors were encountered: