more events in customer journey #3

Open: wants to merge 10 commits into master
1 change: 1 addition & 0 deletions dbt_project.yml
@@ -4,6 +4,7 @@
# name or the intended use of these models
name: 'rudder_sessionization'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'default'
12,239 changes: 12,239 additions & 0 deletions logs/dbt.log.legacy

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions models/rudder/dbt_aliases_mapping.sql
@@ -13,14 +13,14 @@ Typically a user would login to an application at least once while using the app
select anonymous_id
, user_id
, timestamp as timestamp
-from {{ source("<schema>","tracks") }}
+from {{ source("rudder_1","tracks") }}

union distinct

select user_id
, null
, timestamp
-from {{ source("<schema>","tracks") }}
+from {{ source("rudder_1","tracks") }}

)

2 changes: 1 addition & 1 deletion models/rudder/dbt_mapped_tracks.sql
@@ -14,7 +14,7 @@ select *
,a2v.dbt_visitor_id
,t.timestamp
,t.event as event
-from {{ source("<schema>","TRACKS") }} as t
+from {{ source("rudder_1","tracks") }} as t
inner join {{ ref('dbt_aliases_mapping') }} as a2v
on a2v.alias = coalesce(t.user_id, t.anonymous_id)
)
21 changes: 21 additions & 0 deletions models/rudder/dbt_tracks_flow.sql
@@ -18,6 +18,16 @@ with derived_table as (
nth_value(event,3 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as third_event,
nth_value(event,4 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as fourth_event,
nth_value(event,5 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as fifth_event,
nth_value(event,6 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as six_event,
nth_value(event,7 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as seven_event,
nth_value(event,8 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as eight_event,
nth_value(event,9 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as ninth_event,
nth_value(event,10 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as ten_event,
nth_value(event,11 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as eleven_event,
nth_value(event,12 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as twelve_event,
nth_value(event,13 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as thirteen_event,
nth_value(event,14 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as fourteen_event,

from {{ ref('dbt_track_facts') }}
)

@@ -31,4 +41,15 @@ with derived_table as (
, third_event as event_3
, fourth_event as event_4
, fifth_event as event_5
, six_event as event_6
, seven_event as event_7
, eight_event as event_8
, ninth_event as event_9
, ten_event as event_10
, eleven_event as event_11
, twelve_event as event_12
, thirteen_event as event_13
, fourteen_event as event_14


from derived_table a
4 changes: 2 additions & 2 deletions models/rudder/tracks.yml
@@ -1,8 +1,8 @@
version: 2
sources:
- name: <schema>
- name: "rudder_1"
tables:
- name: TRACKS
- name: tracks
columns:
- name: anonymous_id

@@ -0,0 +1,34 @@
/*

The first intent is to link all events from the same user via a common identifier. The Rudder data payload contains two fields, anonymous_id and user_id. The first is device-specific and generated by the Rudder SDK; the second is a user identifier that application code can assign when a user registers or logs in.

Typically a user logs in to an application at least once while using it on a particular device. The user_id then becomes available and is included in event messages. The code below creates a unique identifier, dbt_visitor_id, that links the user_id to the anonymous_id, using the analytic function first_value to pick the most recent non-null user_id seen for each anonymous_id.

*/



with
dbt_all_mappings as (
select anonymous_id
, user_id
, timestamp as timestamp
from `rudderstack-367610`.`rudder_1`.`tracks`

union distinct

select user_id
, null
, timestamp
from `rudderstack-367610`.`rudder_1`.`tracks`

)

select
distinct anonymous_id as alias
,coalesce(first_value(user_id IGNORE NULLS)
over(
partition by anonymous_id
order by timestamp desc
rows between unbounded preceding and unbounded following), user_id, anonymous_id) as dbt_visitor_id
from dbt_all_mappings
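To make the first_value logic in dbt_aliases_mapping concrete, here is a minimal plain-Python sketch. It is not the model itself. It assumes a hypothetical in-memory list of dicts standing in for the Rudder `tracks` table, and only anonymous_id, user_id and timestamp are used. For simplicity it skips rows with a NULL alias, whereas the SQL keeps them.

```python
from datetime import datetime


def build_alias_mapping(tracks):
    """Map each alias (anonymous_id or user_id) to a dbt_visitor_id.

    Sketch only: `tracks` is a hypothetical list of dicts with the keys
    anonymous_id, user_id and timestamp.
    """
    # Mirror the UNION DISTINCT: each event yields (anonymous_id, user_id, ts),
    # and each user_id also appears as an alias of its own with no linked user_id.
    rows = set()
    for t in tracks:
        rows.add((t["anonymous_id"], t["user_id"], t["timestamp"]))
        rows.add((t["user_id"], None, t["timestamp"]))

    # Group by alias (the SQL's "partition by anonymous_id").
    by_alias = {}
    for alias, user_id, ts in rows:
        if alias is None:  # simplification: the SQL keeps a NULL alias row
            continue
        by_alias.setdefault(alias, []).append((ts, user_id))

    mapping = {}
    for alias, entries in by_alias.items():
        # first_value(user_id IGNORE NULLS) ordered by timestamp desc:
        # the most recent non-null user_id seen for this alias.
        newest_first = sorted(entries, key=lambda e: e[0], reverse=True)
        user_ids = [u for _, u in newest_first if u is not None]
        # coalesce(..., user_id, anonymous_id): fall back to the alias itself.
        mapping[alias] = user_ids[0] if user_ids else alias
    return mapping


sample_tracks = [
    {"anonymous_id": "a1", "user_id": None, "timestamp": datetime(2022, 1, 1, 10, 0)},
    {"anonymous_id": "a1", "user_id": "u1", "timestamp": datetime(2022, 1, 1, 10, 5)},
    {"anonymous_id": "a2", "user_id": None, "timestamp": datetime(2022, 1, 1, 11, 0)},
]
mapping = build_alias_mapping(sample_tracks)
print(mapping["a1"], mapping["u1"], mapping["a2"])  # u1 u1 a2
```

Device a1 picks up u1 once the user logs in, so a1 and u1 both resolve to the same visitor. Device a2 never logged in and maps to itself.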
@@ -0,0 +1,20 @@
/*

Use the ID generated in dbt_aliases_mapping to link all events for the same user on that device. Also compute the idle time, in minutes, between consecutive events of the same visitor.

*/



select *
,timestamp_diff(cast(timestamp as timestamp), cast(lag(timestamp) over(partition by dbt_visitor_id order by timestamp) as timestamp), minute) as idle_time_minutes
from (
select t.id as event_id
,t.anonymous_id
,a2v.dbt_visitor_id
,t.timestamp
,t.event as event
from `rudderstack-367610`.`rudder_1`.`tracks` as t
inner join `rudderstack-367610`.`dbt_jluiscases`.`dbt_aliases_mapping` as a2v
on a2v.alias = coalesce(t.user_id, t.anonymous_id)
)
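The idle-time column in dbt_mapped_tracks comes from lag(timestamp) over each visitor's events. The sketch below simulates that computation in plain Python on a hypothetical list of dicts. It assumes that timestamp_diff(..., minute) returns whole elapsed minutes (truncating), which the code approximates by flooring the elapsed seconds.

```python
from datetime import datetime


def add_idle_time(events):
    """Return events sorted per visitor with an idle_time_minutes field.

    Sketch of lag(timestamp) over (partition by dbt_visitor_id order by timestamp);
    `events` is a hypothetical list of dicts with dbt_visitor_id and timestamp.
    """
    result = []
    previous = {}  # dbt_visitor_id -> timestamp of that visitor's previous event
    for e in sorted(events, key=lambda e: (e["dbt_visitor_id"], e["timestamp"])):
        prev_ts = previous.get(e["dbt_visitor_id"])
        idle = None  # first event of a visitor: lag() returns NULL
        if prev_ts is not None:
            # Whole minutes elapsed (assumed truncating, like timestamp_diff).
            idle = int((e["timestamp"] - prev_ts).total_seconds() // 60)
        result.append({**e, "idle_time_minutes": idle})
        previous[e["dbt_visitor_id"]] = e["timestamp"]
    return result


events = [
    {"dbt_visitor_id": "v1", "timestamp": datetime(2022, 1, 1, 11, 0)},
    {"dbt_visitor_id": "v1", "timestamp": datetime(2022, 1, 1, 10, 0)},
    {"dbt_visitor_id": "v1", "timestamp": datetime(2022, 1, 1, 10, 10, 30)},
    {"dbt_visitor_id": "v2", "timestamp": datetime(2022, 1, 1, 9, 0)},
]
idle = [e["idle_time_minutes"] for e in add_idle_time(events)]
print(idle)  # [None, 10, 49, None]
```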
@@ -0,0 +1,16 @@
/*

Table containing a useful session metric: session duration, in minutes.

*/



select
s1.dbt_visitor_id
, s1.session_id
, timestamp_diff(cast(s2.ended_at as timestamp), cast(s1.session_start_at as timestamp), minute) as session_duration
from
`rudderstack-367610`.`dbt_jluiscases`.`dbt_session_tracks` as s1
LEFT JOIN `rudderstack-367610`.`dbt_jluiscases`.`dbt_session_track_facts` as s2
ON s1.session_id = s2.session_id
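Session duration is the gap between a session's start and the timestamp of its last event. In this minimal sketch, the helper name is hypothetical and whole-minute truncation is assumed for timestamp_diff. A session with a single event therefore has a duration of 0. A NULL only appears if the LEFT JOIN finds no matching facts row, which is the case that the not_null test on session_duration in this PR would catch.

```python
from datetime import datetime


def session_duration_minutes(session_start_at, ended_at):
    """Whole minutes between a session's first and last event.

    Mirrors timestamp_diff(ended_at, session_start_at, minute), assuming it
    truncates to whole minutes. Returns None when ended_at is missing, the way
    the LEFT JOIN would yield NULL if a session had no matching facts row.
    """
    if ended_at is None:
        return None
    return int((ended_at - session_start_at).total_seconds() // 60)


start = datetime(2022, 1, 1, 10, 0)
print(session_duration_minutes(start, datetime(2022, 1, 1, 10, 15, 40)))  # 15
print(session_duration_minutes(start, start))  # 0: a single-event session
```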
@@ -0,0 +1,17 @@
/*

With the events mapped to sessions (dbt_track_facts), one can now define the session end time as the timestamp of the last event (in other words, the highest timestamp) in that session. The next table encapsulates this data. It also contains a few columns that can help specifically with analytics for e-commerce sites and applications.

*/




SELECT s.session_id
, cast(MAX(map.timestamp) as timestamp) AS ended_at
, count(distinct map.event_id) AS num_pvs
, count(case when map.event = 'viewed_product' then event_id else null end) as cnt_viewed_product
, count(case when map.event = 'signup' then event_id else null end) as cnt_signup
FROM `rudderstack-367610`.`dbt_jluiscases`.`dbt_session_tracks` AS s
LEFT JOIN `rudderstack-367610`.`dbt_jluiscases`.`dbt_track_facts` as map on map.session_id = s.session_id
GROUP BY 1
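The aggregation in dbt_session_track_facts can be sketched in plain Python as a group-by over session_id. This assumes a hypothetical list of dicts shaped like dbt_track_facts rows. The "page" event name in the sample is also hypothetical; only viewed_product and signup come from the SQL. One difference from the SQL: sessions without any events do not appear in this sketch, while the LEFT JOIN would still keep them.

```python
from datetime import datetime


def session_track_facts(track_facts):
    """Aggregate per-session facts, sketching the GROUP BY session_id query.

    `track_facts` is a hypothetical list of dicts with session_id, event_id,
    event and timestamp.
    """
    acc = {}
    for f in track_facts:
        s = acc.setdefault(f["session_id"], {
            "ended_at": f["timestamp"], "event_ids": set(),
            "cnt_viewed_product": 0, "cnt_signup": 0,
        })
        s["ended_at"] = max(s["ended_at"], f["timestamp"])  # MAX(map.timestamp)
        s["event_ids"].add(f["event_id"])                   # count(distinct event_id)
        if f["event"] == "viewed_product":
            s["cnt_viewed_product"] += 1
        elif f["event"] == "signup":
            s["cnt_signup"] += 1
    return {
        sid: {"ended_at": s["ended_at"], "num_pvs": len(s["event_ids"]),
              "cnt_viewed_product": s["cnt_viewed_product"], "cnt_signup": s["cnt_signup"]}
        for sid, s in acc.items()
    }


facts = [
    {"session_id": "1 - v1", "event_id": "e1", "event": "page", "timestamp": datetime(2022, 1, 1, 10, 0)},
    {"session_id": "1 - v1", "event_id": "e2", "event": "viewed_product", "timestamp": datetime(2022, 1, 1, 10, 10)},
    {"session_id": "1 - v1", "event_id": "e3", "event": "viewed_product", "timestamp": datetime(2022, 1, 1, 10, 15)},
    {"session_id": "2 - v1", "event_id": "e4", "event": "signup", "timestamp": datetime(2022, 1, 1, 11, 0)},
]
summary = session_track_facts(facts)
print(summary["1 - v1"]["num_pvs"], summary["1 - v1"]["cnt_viewed_product"])  # 3 2
```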
@@ -0,0 +1,15 @@
/*

A decision is made to treat an event as the start of a new session when more than 30 minutes have passed since the same user's previous event (or when it is the user's first event). The 30-minute threshold is arbitrary and can be modified as per requirements. Each session is assigned a sequence number among the user's sessions, and the timestamp of its first event is taken as the session start time. The start time of the user's next session is also calculated.

*/



select concat(cast(row_number() over(partition by dbt_visitor_id order by timestamp) AS string), ' - ', dbt_visitor_id) as session_id
, dbt_visitor_id
, timestamp as session_start_at
, row_number() over(partition by dbt_visitor_id order by timestamp) as session_sequence_number
, lead(timestamp) over(partition by dbt_visitor_id order by timestamp) as next_session_start_at
from `rudderstack-367610`.`dbt_jluiscases`.`dbt_mapped_tracks`
where (idle_time_minutes > 30 or idle_time_minutes is null)
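The session-start rule in dbt_session_tracks can be simulated in plain Python. This is a minimal sketch assuming a hypothetical list of dicts that already carry idle_time_minutes, as dbt_mapped_tracks does. Note that the filter is strictly greater than 30, so an event exactly 30 idle minutes after the previous one stays in the same session. The sample data includes that boundary case.

```python
from datetime import datetime


def build_sessions(events, idle_threshold_minutes=30):
    """Sketch of dbt_session_tracks: one row per session start.

    `events` is a hypothetical list of dicts with dbt_visitor_id, timestamp
    and idle_time_minutes.
    """
    # where (idle_time_minutes > 30 or idle_time_minutes is null)
    starts = [e for e in events
              if e["idle_time_minutes"] is None or e["idle_time_minutes"] > idle_threshold_minutes]
    per_visitor = {}
    for e in sorted(starts, key=lambda e: (e["dbt_visitor_id"], e["timestamp"])):
        per_visitor.setdefault(e["dbt_visitor_id"], []).append(e["timestamp"])

    sessions = []
    for visitor, times in per_visitor.items():
        for i, ts in enumerate(times):
            seq = i + 1  # row_number() over (partition by dbt_visitor_id order by timestamp)
            sessions.append({
                "session_id": f"{seq} - {visitor}",
                "dbt_visitor_id": visitor,
                "session_start_at": ts,
                "session_sequence_number": seq,
                # lead(timestamp): start of the visitor's next session, if any
                "next_session_start_at": times[i + 1] if i + 1 < len(times) else None,
            })
    return sessions


def at(hour, minute):
    return datetime(2022, 1, 1, hour, minute)


events = [
    {"dbt_visitor_id": "v1", "timestamp": at(10, 0), "idle_time_minutes": None},
    {"dbt_visitor_id": "v1", "timestamp": at(10, 10), "idle_time_minutes": 10},
    {"dbt_visitor_id": "v1", "timestamp": at(11, 0), "idle_time_minutes": 50},
    {"dbt_visitor_id": "v1", "timestamp": at(11, 30), "idle_time_minutes": 30},  # exactly 30: same session
]
sessions = build_sessions(events)
print([s["session_id"] for s in sessions])  # ['1 - v1', '2 - v1']
```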
@@ -0,0 +1,23 @@
/*

The code below creates a table that links each track event to the session it belongs to. The association is established via the user identifier (dbt_visitor_id) and the session start timestamps.

So if a user U1 has session S1 starting at T1 and session S2 starting at T2, then an event E for U1 belongs to S1 if its timestamp is at or after T1 and before T2. If T2 is null, which happens for the user's last recorded session, E belongs to S1 whenever its timestamp is at or after T1.

*/




select t.anonymous_id
, t.timestamp
, t.event_id
, t.event AS event
, s.session_id
, t.dbt_visitor_id
, row_number() over(partition by s.session_id order by t.timestamp) as track_sequence_number
from `rudderstack-367610`.`dbt_jluiscases`.`dbt_mapped_tracks` as t
inner join `rudderstack-367610`.`dbt_jluiscases`.`dbt_session_tracks` as s
on t.dbt_visitor_id = s.dbt_visitor_id
and t.timestamp >= s.session_start_at
and (t.timestamp < s.next_session_start_at or s.next_session_start_at is null)
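The range join in dbt_track_facts and the per-session row_number can be sketched in plain Python. This assumes hypothetical lists of dicts shaped like dbt_mapped_tracks and dbt_session_tracks rows. The nested loop is for clarity only. The sample includes an event exactly at the next session's start to show the half-open interval, which is inclusive of the start and exclusive of the next start.

```python
from datetime import datetime


def assign_events_to_sessions(events, sessions):
    """Sketch of dbt_track_facts: attach each event to its session."""
    facts = []
    for e in events:
        for s in sessions:
            # Same visitor, at or after the session start, and before the next
            # session starts (or the session is the visitor's last one).
            if (e["dbt_visitor_id"] == s["dbt_visitor_id"]
                    and e["timestamp"] >= s["session_start_at"]
                    and (s["next_session_start_at"] is None
                         or e["timestamp"] < s["next_session_start_at"])):
                facts.append({**e, "session_id": s["session_id"]})

    # track_sequence_number: row_number() over (partition by session_id order by timestamp)
    facts.sort(key=lambda f: (f["session_id"], f["timestamp"]))
    counters = {}
    for f in facts:
        counters[f["session_id"]] = counters.get(f["session_id"], 0) + 1
        f["track_sequence_number"] = counters[f["session_id"]]
    return facts


def ts(hour, minute):
    return datetime(2022, 1, 1, hour, minute)


sessions = [
    {"session_id": "1 - v1", "dbt_visitor_id": "v1", "session_start_at": ts(10, 0), "next_session_start_at": ts(11, 0)},
    {"session_id": "2 - v1", "dbt_visitor_id": "v1", "session_start_at": ts(11, 0), "next_session_start_at": None},
]
events = [
    {"event_id": "e3", "dbt_visitor_id": "v1", "timestamp": ts(11, 0)},
    {"event_id": "e1", "dbt_visitor_id": "v1", "timestamp": ts(10, 0)},
    {"event_id": "e2", "dbt_visitor_id": "v1", "timestamp": ts(10, 20)},
]
facts = assign_events_to_sessions(events, sessions)
print([(f["event_id"], f["session_id"], f["track_sequence_number"]) for f in facts])
# [('e1', '1 - v1', 1), ('e2', '1 - v1', 2), ('e3', '2 - v1', 1)]
```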
@@ -0,0 +1,55 @@
/*

We use analytic functions such as first_value and nth_value to create 14-event sequences that capture the flow of events during a session. The sequence length can be increased or decreased as per requirements.

*/



with derived_table as (
select
event_id,
session_id,
track_sequence_number,
first_value(event IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as event,
dbt_visitor_id,
timestamp,
nth_value(event,2 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as second_event,
nth_value(event,3 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as third_event,
nth_value(event,4 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as fourth_event,
nth_value(event,5 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as fifth_event,
nth_value(event,6 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as six_event,
nth_value(event,7 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as seven_event,
nth_value(event,8 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as eight_event,
nth_value(event,9 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as ninth_event,
nth_value(event,10 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as ten_event,
nth_value(event,11 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as eleven_event,
nth_value(event,12 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as twelve_event,
nth_value(event,13 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as thirteen_event,
nth_value(event,14 IGNORE NULLS) over(partition by session_id order by track_sequence_number asc) as fourteen_event,

from `rudderstack-367610`.`dbt_jluiscases`.`dbt_track_facts`
)

select event_id
, session_id
, track_sequence_number
, event
, dbt_visitor_id
, cast(timestamp as timestamp) as timestamp
, second_event as event_2
, third_event as event_3
, fourth_event as event_4
, fifth_event as event_5
, six_event as event_6
, seven_event as event_7
, eight_event as event_8
, ninth_event as event_9
, ten_event as event_10
, eleven_event as event_11
, twelve_event as event_12
, thirteen_event as event_13
, fourteen_event as event_14


from derived_table a
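In dbt_tracks_flow, nth_value is called with an ORDER BY but no frame clause. Under the SQL-standard default frame, which BigQuery applies when ORDER BY is given without a frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), each row only sees events up to itself. As a result, event_N is NULL on rows that have fewer than N non-null events so far. If the intent is to have the full sequence on every row, one option is the explicit frame already used in dbt_aliases_mapping (rows between unbounded preceding and unbounded following). The plain-Python sketch below only simulates the two frames for a hypothetical session. It is not BigQuery itself, and the event names are made up.

```python
def nth_value(frame, n):
    """nth_value(event, n IGNORE NULLS) over the given frame; None if too short."""
    non_null = [e for e in frame if e is not None]
    return non_null[n - 1] if len(non_null) >= n else None


# One session's events in track_sequence_number order (hypothetical names).
session = ["page", "viewed_product", None, "add_to_cart", "signup"]

# Default frame with ORDER BY: rows from the start of the partition to the current row.
running = [nth_value(session[: i + 1], 3) for i in range(len(session))]

# Explicit "rows between unbounded preceding and unbounded following": whole partition.
whole = [nth_value(session, 3) for _ in session]

print(running)  # [None, None, None, 'add_to_cart', 'add_to_cart']
print(whole)    # ['add_to_cart', 'add_to_cart', 'add_to_cart', 'add_to_cart', 'add_to_cart']
```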
@@ -0,0 +1,17 @@
/*

Table containing a useful metric about user sessions: the number of sessions for each user, together with the start times of the user's first and last sessions.

*/



SELECT
dbt_visitor_id
, cast(MIN(s.session_start_at) as timestamp) as first_date
, cast(MAX(s.session_start_at) as timestamp) as last_date
, COUNT(*) as number_of_sessions
FROM `rudderstack-367610`.`dbt_jluiscases`.`dbt_session_tracks` as s
LEFT JOIN `rudderstack-367610`.`dbt_jluiscases`.`dbt_session_track_facts` as sf
ON s.session_id = sf.session_id
GROUP BY 1
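The per-user summary can be sketched in plain Python as a group-by over dbt_visitor_id, assuming a hypothetical list of dicts shaped like dbt_session_tracks rows. The LEFT JOIN to dbt_session_track_facts does not change the count here. That table is grouped by session_id, so it has at most one row per session, and COUNT(*) equals the number of sessions.

```python
from datetime import datetime


def user_session_facts(sessions):
    """Per-visitor first/last session start and session count (sketch)."""
    out = {}
    for s in sessions:
        start = s["session_start_at"]
        u = out.setdefault(s["dbt_visitor_id"],
                           {"first_date": start, "last_date": start, "number_of_sessions": 0})
        u["first_date"] = min(u["first_date"], start)  # MIN(session_start_at)
        u["last_date"] = max(u["last_date"], start)    # MAX(session_start_at)
        u["number_of_sessions"] += 1                   # COUNT(*)
    return out


sessions = [
    {"dbt_visitor_id": "v1", "session_start_at": datetime(2022, 1, 1, 10, 0)},
    {"dbt_visitor_id": "v1", "session_start_at": datetime(2022, 1, 2, 9, 0)},
    {"dbt_visitor_id": "v1", "session_start_at": datetime(2022, 1, 1, 11, 0)},
    {"dbt_visitor_id": "v2", "session_start_at": datetime(2022, 1, 3, 8, 0)},
]
facts_by_user = user_session_facts(sessions)
print(facts_by_user["v1"]["number_of_sessions"])  # 3
```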
@@ -0,0 +1,11 @@






select session_duration
from `rudderstack-367610`.`dbt_jluiscases`.`dbt_session_duration`
where session_duration is null


@@ -0,0 +1,11 @@






select dbt_visitor_id
from `rudderstack-367610`.`dbt_jluiscases`.`dbt_session_tracks`
where dbt_visitor_id is null


@@ -0,0 +1,11 @@






select session_id
from `rudderstack-367610`.`dbt_jluiscases`.`dbt_session_tracks`
where session_id is null


Binary file added target/graph.gpickle
Binary file not shown.
1 change: 1 addition & 0 deletions target/manifest.json

Large diffs are not rendered by default.

Binary file added target/partial_parse.msgpack
Binary file not shown.
@@ -0,0 +1,45 @@




create or replace table `rudderstack-367610`.`dbt_jluiscases`.`dbt_aliases_mapping`


OPTIONS()
as (
/*

The first intent is to link all events from the same user via a common identifier. The Rudder data payload contains two fields, anonymous_id and user_id. The first is device-specific and generated by the Rudder SDK; the second is a user identifier that application code can assign when a user registers or logs in.

Typically a user logs in to an application at least once while using it on a particular device. The user_id then becomes available and is included in event messages. The code below creates a unique identifier, dbt_visitor_id, that links the user_id to the anonymous_id, using the analytic function first_value to pick the most recent non-null user_id seen for each anonymous_id.

*/



with
dbt_all_mappings as (
select anonymous_id
, user_id
, timestamp as timestamp
from `rudderstack-367610`.`rudder_1`.`tracks`

union distinct

select user_id
, null
, timestamp
from `rudderstack-367610`.`rudder_1`.`tracks`

)

select
distinct anonymous_id as alias
,coalesce(first_value(user_id IGNORE NULLS)
over(
partition by anonymous_id
order by timestamp desc
rows between unbounded preceding and unbounded following), user_id, anonymous_id) as dbt_visitor_id
from dbt_all_mappings
);
