Ruby Tracker v0.4

Ihor Tomilenko edited this page Apr 27, 2017 · 4 revisions

This page refers to version 0.4.1 of the Snowplow Ruby Tracker. Documentation for other versions is available:

Version 0.2 Version 0.3 Version 0.5 Version 0.6

Please note that this version of the Ruby Tracker is dependent upon the Snowplow 0.9.14 release. You will need to be running version 0.9.14 or later of Snowplow for events sent by the tracker using POST to be successfully processed. Snowplow 0.9.14 contains updates to the Hadoop Enrich and Scala Hadoop Shred jobs to allow the newer self-describing JSON version which the Ruby Tracker sends for POSTs. For more information, please refer to tickets #1220 and #1231.

1. Overview

The Snowplow Ruby Tracker allows you to track Snowplow events in your Ruby applications and gems and Ruby on Rails web applications.

The tracker should be straightforward to use if you are comfortable with Ruby development; any prior experience with Snowplow's Python Tracker, JavaScript Tracker, Lua Tracker, Google Analytics or Mixpanel (which have similar APIs to Snowplow) is helpful but not necessary.

The Ruby Tracker and Python Tracker have very similar functionality and APIs.

There are three main classes which the Ruby Tracker uses: subjects, emitters, and trackers.

A subject represents a single user whose events are tracked, and holds data specific to that user. If your tracker will only be tracking a single user, you don't have to create a subject - it will be done automatically.

A tracker always has one active subject at a time associated with it. It constructs events with that subject and sends them to one or more emitters, which send them on to a Snowplow collector.

2. Initialization

Assuming you have completed the Ruby Tracker Setup for your Ruby project, you are ready to initialize the Ruby Tracker.

2.1 Requiring the module

Require the Ruby Tracker into your code like this:

require 'snowplow_tracker'

You can now initialize tracker instances.

2.2 Creating a tracker

Initialize a tracker instance like this:

emitter = SnowplowTracker::Emitter.new("d3rkrsqld9gmqf.cloudfront.net")
tracker = SnowplowTracker::Tracker.new(emitter)

If you wish to send events to more than one emitter, you can provide an array of emitters to the tracker constructor.

This tracker will log events to http://d3rkrsqld9gmqf.cloudfront.net/i. There are four other optional parameters:

def initialize(emitters, subject=nil, namespace=nil, app_id=nil, encode_base64=true)

subject is a subject with which the tracker is initialized.

namespace is a name for the tracker which will be added to every event the tracker fires. This is useful if you have initialized more than one tracker.

app_id is the unique ID for the Ruby application.

encode_base64 determines whether JSONs in the querystring for an event will be base64-encoded.
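To make the effect of encode_base64 concrete, the sketch below uses only Ruby's standard library to show a JSON string before and after base64 encoding. It is an illustration rather than the tracker's internals: the sample_context hash is hypothetical, and the choice of Base64.strict_encode64 is an assumption about the encoding variant.

```ruby
require 'json'
require 'base64'

# Hypothetical JSON payload, for illustration only
sample_context = {
  'schema' => 'iglu:com.my_company/customer/jsonschema/1-0-0',
  'data'   => { 'segment' => 'young adult' }
}

plain_json = JSON.generate(sample_context)

# Assumption: strict base64 (no line breaks); the tracker's exact variant may differ
encoded_json = Base64.strict_encode64(plain_json)

puts plain_json
puts encoded_json

# Decoding recovers the original structure
decoded = JSON.parse(Base64.strict_decode64(encoded_json))
```

The encoded form contains no braces or quotes, which is why it is convenient to carry in a querystring.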

So a more complete tracker initialization example might look like this:

initial_subject = SnowplowTracker::Subject.new
emitter = SnowplowTracker::Emitter.new("d3rkrsqld9gmqf.cloudfront.net")
tracker = SnowplowTracker::Tracker.new(emitter, initial_subject, 'cf', 'ID-ap00035', false)

2.3 Creating multiple trackers

Each tracker instance is completely sandboxed, so you can create multiple trackers as you see fit.

Here is an example of instantiating two separate trackers:

t1 = SnowplowTracker::Tracker.new(SnowplowTracker::AsyncEmitter.new("d3rkrsqld9gmqf.cloudfront.net"), nil, "t1")
t1.set_platform("cnsl")
t1.track_page_view("http://www.example.com")

t2 = SnowplowTracker::Tracker.new(SnowplowTracker::AsyncEmitter.new("my-company.c.snplow.com"), nil, "t2")
t2.set_platform("cnsl")
t2.track_screen_view("Game HUD", "23")

t1.track_screen_view("Test", "23") # Back to first tracker

Back to top

3. Adding extra data

You can configure a tracker instance with additional information about your application's environment or current user. This data will be attached to every event the tracker fires regarding the subject. Here are the available methods:

Function Description
set_platform Set the application platform
set_user_id Set the user ID
set_screen_resolution Set the screen resolution
set_viewport Set the viewport dimensions
set_color_depth Set the screen color depth
set_timezone Set the timezone
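set_lang Set the language
set_ip_address Set the user's IP address
set_useragent Set the useragent
set_domain_user_id Set the domain user ID
set_network_user_id Set the network user ID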

There are two ways to call these methods:

  • Call them on a Subject instance. They will update the data associated with that subject and return the subject.
  • Call them on the Tracker instance. They will update the data associated with the currently active subject for that tracker and return the tracker.

For example:

s0 = SnowplowTracker::Subject.new
emitter = SnowplowTracker::Emitter.new("d3rkrsqld9gmqf.cloudfront.net")
my_tracker = SnowplowTracker::Tracker.new(emitter, s0)

# The following two lines are equivalent, except that the first returns s0 and the second returns my_tracker
s0.set_platform('mob')
my_tracker.set_platform('mob')

If you are using multiple subjects, you can use the set_subject tracker method to change which Subject instance is active:

s0 = SnowplowTracker::Subject.new
emitter = SnowplowTracker::Emitter.new("d3rkrsqld9gmqf.cloudfront.net")
my_tracker = SnowplowTracker::Tracker.new(emitter, s0)

# Set the viewport for the active subject, s0
my_tracker.set_viewport(300, 500)

# The data associated with s0 will be sent with this event
my_tracker.track_screen_view('title page')

# Create a new subject
s1 = SnowplowTracker::Subject.new

# Make s1 the active subject and set its viewport
my_tracker.set_subject(s1).set_viewport(600,1000)

# The data associated with s1 will be sent with this event
my_tracker.track_screen_view('another page')

# Change the subject back to s0 and track another event
my_tracker.set_subject(s0).track_screen_view('final page')

3.1 Set the tracker's platform with set_platform

The platform can be any one of 'pc', 'tv', 'mob', 'cnsl', or 'iot'. If you never call set_platform, the platform defaults to 'srv'.

tracker.set_platform('mob')

3.2 Set the user ID with set_user_id

You can make the user ID a string of your choice:

tracker.set_user_id('user-000563456')

3.3 Set the screen resolution with set_screen_resolution

If your Ruby code has access to the device's screen resolution, you can pass it in to Snowplow. Both numbers should be positive integers; note the order is width followed by height. Example:

tracker.set_screen_resolution(1366, 768)

3.4 Set the viewport dimensions with set_viewport

Similarly, you can pass the viewport dimensions in to Snowplow. Again, both numbers should be positive integers and the order is width followed by height. Example:

tracker.set_viewport(300, 200)

3.5 Set the color depth with set_color_depth

If your Ruby code has access to the bit depth of the device's color palette for displaying images, you can pass it in to Snowplow. The number should be a positive integer, in bits per pixel.

tracker.set_color_depth(24)

3.6 Setting the timezone with set_timezone

If your Ruby code has access to the timezone of the device, you can pass it in to Snowplow:

tracker.set_timezone('Europe/London')

3.7 Setting the language with set_lang

You can set the language field like this:

tracker.set_lang('en')

3.8 Setting the IP address with set_ip_address

If you have access to the user's IP address, you can set it like this:

tracker.set_ip_address('34.63.11.139')

3.9 Setting the useragent with set_useragent

If you have access to the user's useragent (sometimes called "browser string"), you can set it like this:

tracker.set_useragent('Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0')

3.10 Setting the domain user ID with set_domain_user_id

The domain_userid field of the Snowplow event model corresponds to the ID stored in the first party cookie set by the Snowplow JavaScript Tracker. If you want to match up server-side events with client-side events, you can set the domain user ID for server-side events like this:

tracker.set_domain_user_id('c7aadf5c60a5dff9')

You can extract the domain user ID from the Ruby on Rails cookies object like this:

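# Returns the value of the Snowplow ID cookie, or nil if it is not set
def snowplow_cookie
  pair = cookies.find { |(key, _value)| key =~ /^_sp_id/ }
  pair && pair.last
end

# Returns the domain user ID (the first dot-separated field), or nil
def domain_user_id
  cookie = snowplow_cookie
  cookie.split('.').first if cookie.present?
end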

Both methods rely on the cookies object available in Rails controllers (see the documentation).

If you used the "cookieName" configuration option of the Snowplow JavaScript Tracker, replace "sp" with the same string you passed as the cookieName.
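Outside a Rails controller, the same extraction can be tried against a plain hash. The following is a minimal sketch: extract_domain_user_id is a hypothetical helper rather than part of the tracker, and the cookie name suffix and cookie value are made-up sample data.

```ruby
# Hypothetical helper: pull the domain user ID out of a cookie hash.
# Assumes the ID is the first dot-separated field of the cookie value,
# as in the controller methods above.
def extract_domain_user_id(cookies, cookie_prefix = '_sp_')
  pattern = /\A#{Regexp.escape(cookie_prefix)}id/
  pair = cookies.find { |key, _value| key =~ pattern }
  return nil if pair.nil?                   # no Snowplow cookie present
  value = pair.last
  return nil if value.nil? || value.empty?
  value.split('.').first
end

# Made-up sample cookies for illustration
sample_cookies = {
  'session'     => 'abc123',
  '_sp_id.1fff' => 'c7aadf5c60a5dff9.1424257779.1.1424257779.1424257779'
}

puts extract_domain_user_id(sample_cookies)                  # prints c7aadf5c60a5dff9
puts extract_domain_user_id({ 'session' => 'x' }).inspect    # prints nil
```

Passing a different cookie_prefix covers the case where a custom cookieName was configured in the JavaScript Tracker.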

3.11 Setting the network user ID with set_network_user_id

The network_userid field of the Snowplow event model corresponds to the ID stored in the third party cookie set by the Snowplow Clojure Collector. You can set the network user ID for server-side events like this:

tracker.set_network_user_id('ecdff4d0-9175-40ac-a8bb-325c49733607')

Back to top

4. Tracking specific events

Snowplow has been built to enable you to track a wide range of events that occur when users interact with your websites and apps. We are constantly growing the range of functions available in order to capture that data more richly.

Tracking methods supported by the Ruby Tracker at a glance:

Function Description
track_page_view Track and record views of web pages.
track_ecommerce_transaction Track an ecommerce transaction
track_screen_view Track the user viewing a screen within the application
track_struct_event Track a Snowplow custom structured event
track_unstruct_event Track a Snowplow custom unstructured event

4.1 Common

All events are tracked with specific methods on the tracker instance, of the form track_XXX(), where XXX is the name of the event to track.

All tracker methods return the tracker instance, and so are chainable.

4.1.1 Argument validation

Each track_XXX method expects arguments of a certain type. The types are validated using the Ruby Contracts library. If a check fails, a runtime error is thrown. The section for each track_XXX method specifies the expected argument types for that method.

4.1.2 Optional context argument

Each track_XXX method has context as its penultimate optional parameter. This is for an optional nonempty array of self-describing custom context JSONs attached to the event. Each element of the context argument should be a hash whose keys are "schema", containing a pointer to the JSON schema against which the context will be validated, and "data", containing the context data itself. The "data" field should contain a flat hash of key-value pairs.

Important:

  • Even if only one custom context is being attached to an event, it still needs to be wrapped in an array.
  • If you do provide the argument it shouldn't be an empty array. Pass in nil instead of an empty array. Otherwise the context will fail validation.
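The two rules above are easy to get wrong when contexts are built dynamically. As a sketch, a small helper (hypothetical, not part of the tracker) can normalize whatever you have into a value that is safe to pass as the context argument:

```ruby
# Hypothetical helper: accept nil, a single context hash, or an array of
# context hashes, and return either a nonempty array or nil.
def normalize_contexts(contexts)
  list = contexts.is_a?(Hash) ? [contexts] : Array(contexts)
  list.each do |ctx|
    keys = ctx.keys.map(&:to_s)   # keys may be strings or symbols
    unless keys.include?('schema') && keys.include?('data')
      raise ArgumentError, "context needs 'schema' and 'data' keys: #{ctx.inspect}"
    end
  end
  list.empty? ? nil : list        # never pass an empty array
end

single = {
  'schema' => 'iglu:com.my_company/customer/jsonschema/1-0-0',
  'data'   => { 'segment' => 'young adult' }
}

wrapped = normalize_contexts(single)   # a one-element array
nothing = normalize_contexts([])       # nil rather than []
```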

For example, an array containing two custom contexts relating to the event of a movie poster being viewed:

# Array of contexts
[{
  # First context
  'schema' => 'iglu:com.my_company/movie_poster/jsonschema/1-0-0',
  'data' => {
    'movie_name' => 'Solaris',
    'poster_country' => 'JP',
    'poster_year$dt' => Date.new(1978, 1, 1) # Ruby Date; needs require 'date'
  }
},
{
  # Second context
  'schema' => 'iglu:com.my_company/customer/jsonschema/1-0-0',
  'data' => {
      'p_buy' => 0.23,
      'segment' => 'young adult'
  }
}]

The keys of a context hash can be either strings or Ruby symbols.

For more on how to use custom contexts, see the blog post which introduced them.

4.1.3 Optional timestamp argument

After the optional context argument, each track_XXX method supports an optional timestamp as its final argument. This allows you to manually override the timestamp attached to this event. If you do not pass this timestamp in as an argument, then the Ruby Tracker will use the current time to be the timestamp for the event. Timestamp is counted in milliseconds since the Unix epoch - the same format generated by Time.now.to_i * 1000 in Ruby.
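As a small illustration of the expected format, the snippet below builds millisecond timestamps from Ruby Time objects. to_millis is a hypothetical helper name; unlike Time.now.to_i * 1000, it keeps sub-second precision.

```ruby
# Hypothetical helper: convert a Time to integer milliseconds since the Unix epoch
def to_millis(time)
  (time.to_r * 1000).floor
end

whole_seconds = Time.now.to_i * 1000   # always a multiple of 1000
with_millis   = to_millis(Time.now)    # keeps millisecond precision

# A fixed moment, e.g. for backfilling an event that happened earlier
fixed = to_millis(Time.at(1368725287))
puts fixed   # prints 1368725287000
```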

4.1.4 Example

Here is an example of a page view event with custom context and timestamp arguments supplied:

tracker.track_page_view('http://www.film_company.com/movie_poster', nil, nil, [{
  # First context
  'schema' => 'iglu:com.my_company/movie_poster/jsonschema/1-0-0',
  'data' => {
    'movie_name' => 'Solaris',
    'poster_country' => 'JP',
    'poster_year$dt' => Date.new(1978, 1, 1) # Ruby Date; needs require 'date'
  }
},
{
  # Second context
  'schema' => 'iglu:com.my_company/customer/jsonschema/1-0-0',
  'data' => {
      'p_buy' => 0.23,
      'segment' => 'young adult'
  }
}], 1368725287000)

4.2 Track screen views with track_screen_view

Use track_screen_view() to track a user viewing a screen (or equivalent) within your app. Arguments are:

Argument Description Required? Validation
name Human-readable name for this screen Yes String
id Unique identifier for this screen No String
context Custom context No Array[Hash]
tstamp When the screen was viewed No Positive integer

Example:

tracker.track_screen_view("HUD > Save Game", "screen23")

4.3 Track page views with track_page_view

Use track_page_view() to track a user viewing a page within your app. Arguments are:

Argument Description Required? Validation
page_url The URL of the page Yes String
page_title The title of the page No String
referrer The address which linked to the page No String
context Custom context No Array[Hash]
tstamp When the pageview occurred No Positive integer

Example:

tracker.track_page_view("www.example.com", "example", "www.referrer.com")

4.4 Track ecommerce transactions with track_ecommerce_transaction

Use track_ecommerce_transaction() to track an ecommerce transaction. Arguments:

Argument Description Required? Validation
transaction Data for the whole transaction Yes Hash
items Data for each item Yes Array of hashes
context Custom context No Array[Hash]
tstamp When the transaction event occurred No Positive integer

The transaction argument is a hash containing information about the transaction. Here are the fields supported in this hash:

Field Description Required? Validation
order_id ID of the eCommerce transaction Yes String
total_value Total transaction value Yes Int or Float
affiliation Transaction affiliation No String
tax_value Transaction tax value No Int or Float
shipping Delivery cost charged No Int or Float
city Delivery address city No String
state Delivery address state No String
country Delivery address country No String
currency Transaction currency No String

The transaction parameter might look like this:

{
  'order_id' => '12345',
  'total_value' => 35,
  'city' => 'London',
  'country' => 'UK',
  'currency' => 'GBP'
}

The items parameter is an array of hashes. Each hash represents one item in the transaction. Here are the fields supported for each item:

Argument Description Required? Validation
sku Item SKU Yes String
price Item price Yes Int or Float
quantity Item quantity Yes Int
name Item name No String
category Item category No String
context Custom context No Array[Hash]

The items parameter might look like this:

[{
  'sku' => 'pbz0026',
  'price' => 20,
  'quantity' => 1,
  'category' => 'film'
},
{
  'sku' => 'pbz0038',
  'price' => 15,
  'quantity' => 1,
  'name' => 'red shoes'
}]

The whole method call would look like this:

tracker.track_ecommerce_transaction({
  'order_id' => '12345',
  'total_value' => 35,
  'city' => 'London',
  'country' => 'UK',
  'currency' => 'GBP'
},
[{
  'sku' => 'pbz0026',
  'price' => 20,
  'quantity' => 1,
  'category' => 'film'
},
{
  'sku' => 'pbz0038',
  'price' => 15,
  'quantity' => 1,
  'name' => 'red shoes'
}])

This will fire three events: one for the transaction as a whole, which will include the fields in the transaction argument, and one for each item. The order_id and currency fields in the transaction argument will also be attached to each of the items' events.

All three events will have the same timestamp and same randomly generated Snowplow transaction ID.

Note that each item in the transaction can have its own custom context.
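To visualize the fan-out described above, here is a plain-Ruby sketch that expands a transaction and its items into one hash per event. It only illustrates which fields end up where: expand_transaction and the 'event' labels are hypothetical, and the tracker's real payloads use different field names.

```ruby
# Hypothetical illustration: one event for the transaction, one per item,
# with order_id, currency and the timestamp copied onto every item event.
def expand_transaction(transaction, items, tstamp)
  shared = {
    'order_id' => transaction['order_id'],
    'currency' => transaction['currency'],
    'tstamp'   => tstamp
  }
  transaction_event = transaction.merge('event' => 'transaction', 'tstamp' => tstamp)
  item_events = items.map { |item| item.merge(shared).merge('event' => 'item') }
  [transaction_event] + item_events
end

transaction = { 'order_id' => '12345', 'total_value' => 35, 'currency' => 'GBP' }
items = [
  { 'sku' => 'pbz0026', 'price' => 20, 'quantity' => 1 },
  { 'sku' => 'pbz0038', 'price' => 15, 'quantity' => 1 }
]

events = expand_transaction(transaction, items, 1368725287000)
puts events.length   # prints 3
```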

4.5 Track structured events with track_struct_event

Use track_struct_event() to track a custom event happening in your app which fits the Google Analytics-style structure of having up to five fields (with only the first two required):

Argument Description Required? Validation
category The grouping of structured events which this action belongs to Yes String
action Defines the type of user interaction which this event involves Yes String
label A string to provide additional dimensions to the event data No String
property A string describing the object or the action performed on it No String
value A value to provide numerical data about the event No Int or Float
context Custom context No Array[Hash]
tstamp When the structured event occurred No Positive integer

Example:

tracker.track_struct_event("shop", "add-to-basket", nil, "pcs", 2)

4.6 Track unstructured events with track_unstruct_event

Use track_unstruct_event() to track a custom event which consists of a name and an unstructured set of properties. This is useful when:

  • You want to track event types which are proprietary/specific to your business (i.e. not already part of Snowplow), or
  • You want to track events which have unpredictable or frequently changing properties

The arguments are as follows:

Argument Description Required? Validation
event_json The properties of the event Yes Hash
context Custom context No Array[Hash]
tstamp When the unstructured event occurred No Positive integer

Example:

tracker.track_unstruct_event({
  "schema" => "iglu:com.example_company/save_game/jsonschema/1-0-2",
  "data" => {
    "saveId" => "4321",
    "level" => 23,
    "difficultyLevel" => "HARD",
    "dlContent" => true
  }
})

The event_json argument is self-describing JSON. It has two fields: "schema", containing a pointer to the JSON schema for the event, and "data", containing the event data itself. The data field must be flat: properties cannot be nested.

The keys of the event_json hash can be either strings or Ruby symbols.
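A quick way to catch malformed events before they reach the tracker is to check the constraints stated above. The following sketch is a hypothetical helper (flat_self_describing? is an invented name), not the validation that the tracker or Snowplow performs:

```ruby
# Hypothetical check: does the hash have 'schema' and 'data' keys (string or
# symbol), and is 'data' a flat hash with no nested hashes or arrays?
def flat_self_describing?(event_json)
  normalized = event_json.each_with_object({}) { |(k, v), h| h[k.to_s] = v }
  schema = normalized['schema']
  data = normalized['data']
  return false unless schema.is_a?(String) && data.is_a?(Hash)
  data.values.none? { |v| v.is_a?(Hash) || v.is_a?(Array) }
end

flat_event = {
  schema: 'iglu:com.example_company/save_game/jsonschema/1-0-2',
  data: { 'saveId' => '4321', 'level' => 23 }
}
nested_event = {
  'schema' => 'iglu:com.example_company/save_game/jsonschema/1-0-2',
  'data' => { 'position' => { 'x' => 1, 'y' => 2 } }
}

puts flat_self_describing?(flat_event)     # prints true
puts flat_self_describing?(nested_event)   # prints false
```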

Back to top

5. Emitters

Tracker instances must be initialized with an emitter. This section will go into more depth about the Emitter and AsyncEmitter classes.

5.1. Overview

Each tracker instance must now be initialized with an Emitter which is responsible for firing events to a Collector. An Emitter instance is initialized with two arguments: an endpoint and an optional configuration hash.

A simple example with just an endpoint:

# Create an emitter
my_emitter = SnowplowTracker::Emitter.new('d3rkrsqld9gmqf.cloudfront.net')

A complicated example using every setting:

# Create an emitter
my_emitter = SnowplowTracker::Emitter.new('d3rkrsqld9gmqf.cloudfront.net', {
  :protocol => 'https',
  :method => 'post',
  :port => 443,
  :buffer_size => 0,
  :on_success => lambda { |success_count|
    puts "#{success_count} events sent successfully" # double quotes so #{} interpolates
  },
  :on_failure => lambda { |success_count, failures|
    puts "#{success_count} events sent successfully, #{failures.size} events sent unsuccessfully"
  }
})

Every setting in the configuration hash is optional. Here is what they do:

  • :protocol determines whether events will be sent using HTTP or HTTPS. It defaults to "http".
  • :method determines whether events will be sent using GET or POST. It defaults to "get".
  • :port determines the port to use. If you wish to send events over HTTPS, you should usually set it to 443.
  • :buffer_size is the number of events which will be buffered before they are all sent simultaneously. The process of sending all buffered events is called "flushing". When using GET, buffer_size defaults to 0 because each request can only contain one event. When using POST, buffer_size defaults to 10, and the buffered events are all sent together in a single request.
  • :on_success is a callback which is called every time the buffer is flushed and every event in it is sent successfully (meaning with status code 200). It should accept one argument: the number of events sent this way.
  • :on_failure is a callback which is called if the buffer is flushed but not every event is sent successfully. It should accept two arguments: the number of successfully sent events and an array containing the unsuccessful events.
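The buffering and callback behaviour can be pictured with a toy class. This is a conceptual sketch written against the description above: ToyBuffer is hypothetical, sends nothing over the network, and is not how the Emitter class is implemented. In particular, the exact flush threshold is an assumption.

```ruby
# Toy model of buffer-then-flush semantics. The `sender` block stands in for
# the HTTP request and returns true when an event was "sent" successfully.
class ToyBuffer
  def initialize(buffer_size:, on_success: nil, on_failure: nil, &sender)
    @buffer_size = buffer_size
    @on_success = on_success
    @on_failure = on_failure
    @sender = sender
    @buffer = []
  end

  def input(event)
    @buffer << event
    # Assumption: flush once the buffer holds buffer_size events;
    # a buffer_size of 0 therefore flushes after every event
    flush if @buffer.size >= @buffer_size
  end

  def flush
    return if @buffer.empty?
    pending = @buffer
    @buffer = []
    sent, failed = pending.partition { |event| @sender.call(event) }
    if failed.empty?
      @on_success.call(sent.size) if @on_success
    else
      @on_failure.call(sent.size, failed) if @on_failure
    end
  end
end

flushed = []
buffer = ToyBuffer.new(buffer_size: 3, on_success: ->(n) { flushed << n }) { |_event| true }
5.times { |i| buffer.input("event-#{i}") }   # flushes once, after the third event
buffer.flush                                  # sends the remaining two
p flushed                                     # prints [3, 2]
```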

5.2. The AsyncEmitter class

AsyncEmitter is a subclass of Emitter. Its API is exactly the same. Its advantage is that it always creates a new thread to flush its buffer, so requests are sent asynchronously.

A note on testing: if you test the AsyncEmitter by using a short script to send an event, you may find that the event fails to send. This is because the process exits before the flushing thread is finished. You can get round this either by adding a sleep(10) to the end of your script, or by using the synchronous flush.
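The effect is easy to reproduce with plain Ruby threads. In this sketch no tracker is involved and the sleep stands in for a slow HTTP request. The work completes reliably only because the main thread waits for it, which is what a synchronous flush does for you.

```ruby
results = []

# Stand-in for a background flush: pretend the request takes a moment
worker = Thread.new do
  sleep 0.1
  results << 'event sent'
end

# Without this join, a short script could reach its end (and the process
# could exit) before the worker thread has finished.
worker.join

puts results.inspect   # prints ["event sent"]
```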

5.3. Multiple emitters

It is possible to initialize a tracker with an array of emitters, in which case events will be sent to all of them:

# Create a tracker with multiple emitters
my_tracker = SnowplowTracker::Tracker.new([my_sync_emitter, my_async_emitter], nil, 'my_tracker_name', 'my_app_id')

You can also add new emitters after creating a tracker with the add_emitter method:

# Add another emitter to an existing tracker
my_tracker.add_emitter(another_emitter)

5.4. Flushing manually

You may want to force an emitter to send all events in its buffer, even if the buffer is not full. The Tracker class has a flush method which flushes all its emitters. It accepts one argument, sync, which defaults to false. If you set sync to true, the flush will be synchronous: it will block until all flushing threads are finished.

# Asynchronous flush
my_tracker.flush

# Synchronous flush
my_tracker.flush(true)

6. Contracts

The Snowplow Ruby Tracker uses the Ruby Contracts gem for typechecking. Contracts are enabled by default but can be turned on or off:

# Turn contracts off
SnowplowTracker::disable_contracts

# Turn contracts back on
SnowplowTracker::enable_contracts

7. Logging

The emitters.rb module has Ruby logging enabled to give you information about requests being sent. The logger prints messages about what emitters are doing. By default, only messages with priority "INFO" or higher will be logged.

To change this:

require 'logger'
SnowplowTracker::LOGGER.level = Logger::DEBUG

The levels are:

Level Description
FATAL Nothing logged
WARN Notification for requests with status code not equal to 200
INFO Notification for all requests
DEBUG Contents of all requests
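If the level semantics are unfamiliar, the following standalone sketch shows how Ruby's Logger filters messages by level. It uses its own logger writing to a StringIO rather than SnowplowTracker::LOGGER, and the messages are invented for illustration.

```ruby
require 'logger'
require 'stringio'

output = StringIO.new
demo_logger = Logger.new(output)
demo_logger.level = Logger::INFO   # the default threshold described above

demo_logger.debug('request body: {...}')   # below INFO, so dropped
demo_logger.info('sent GET request')        # logged
demo_logger.warn('request returned 500')    # logged

logged_lines = output.string.lines
puts logged_lines.length   # prints 2
```

Lowering the level to Logger::DEBUG would let the first message through as well.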

Back to top

8. Advanced usage

This section covers more advanced techniques with the Snowplow Ruby Tracker.

8.1. snowplow_ruby_duid

snowplow_ruby_duid is a Ruby gem that allows you to populate Snowplow's domain_userid cookie server-side from any Rack-based framework. This is useful if you want to fire an event on the user's initial request with the domain_userid already populated.

Back to top
