Ruby Tracker v0.5

This documentation is for an old version of this tracker!

🚧 The documentation for the latest version can be found on the Snowplow documentation site.

This page refers to version 0.5.0 of the Snowplow Ruby Tracker.

Please note that this version of the Ruby Tracker is dependent upon the Snowplow 0.9.14 release. You will need to be running version 0.9.14 or later of Snowplow for events sent by the tracker using POST to be successfully processed. Snowplow 0.9.14 contains updates to the Hadoop Enrich and Scala Hadoop Shred jobs to allow the newer self-describing JSON version which the Ruby Tracker sends for POSTs. For more information, please refer to tickets #1220 and #1231.

1. Overview
2. Initialization
- 2.1 Requiring the module
- 2.2 Creating a tracker
- 2.3 Creating multiple trackers
3. Adding extra data
- 3.1 set_platform
- 3.2 set_user_id
- 3.3 set_screen_resolution
- 3.4 set_viewport
- 3.5 set_color_depth
- 3.6 set_timezone
- 3.7 set_lang
- 3.8 set_ip_address
- 3.9 set_useragent
- 3.10 set_domain_user_id
- 3.11 set_network_user_id
- 3.12 set_fingerprint
4. Tracking specific events
- 4.1 Common
  - 4.1.1 Argument validation
  - 4.1.2 Optional context argument
  - 4.1.3 Optional timestamp argument
  - 4.1.4 Example
- 4.2 track_screen_view
- 4.3 track_page_view
- 4.4 track_ecommerce_transaction
- 4.5 track_struct_event
- 4.6 track_unstruct_event
5. Emitters
- 5.1 Overview
- 5.2 The AsyncEmitter class
- 5.3 Multiple emitters
- 5.4 Manual flushing
- 5.5 Automatically retry sending failed events
6. Contracts
7. Logging
8. Advanced usage
- 8.1 snowplow_ruby_duid

1. Overview

The Snowplow Ruby Tracker allows you to track Snowplow events in your Ruby applications and gems and Ruby on Rails web applications.

The tracker should be straightforward to use if you are comfortable with Ruby development; any prior experience with Snowplow's Python Tracker, JavaScript Tracker, Lua Tracker, Google Analytics or Mixpanel (which have similar APIs to Snowplow) is helpful but not necessary.

The Ruby Tracker and Python Tracker have very similar functionality and APIs.

There are three main classes which the Ruby Tracker uses: subjects, emitters, and trackers.

A subject represents a single user whose events are tracked, and holds data specific to that user. If your tracker will only be tracking a single user, you don't have to create a subject - it will be done automatically.

A tracker always has one active subject at a time associated with it. It constructs events with that subject and sends them to one or more emitters, which send them on to a Snowplow collector.

2. Initialization

Assuming you have completed the Ruby Tracker Setup for your Ruby project, you are ready to initialize the Ruby Tracker.

2.1 Requiring the module

Require the Ruby Tracker into your code like this:

require 'snowplow_tracker'

You can now initialize tracker instances.

2.2 Creating a tracker

Initialize a tracker instance like this:

emitter = SnowplowTracker::Emitter.new("my-collector.cloudfront.net")
tracker = SnowplowTracker::Tracker.new(emitter)

This tracker will send events to http://my-collector.cloudfront.net/i. If you wish to send events to more than one emitter, you can pass an array of emitters as the first argument instead of a single emitter. The constructor also accepts four optional parameters:

def initialize(emitters, subject=nil, namespace=nil, app_id=nil, encode_base64=true)

subject is a subject with which the tracker is initialized.
namespace is a name for the tracker which will be added to every event the tracker fires. This is useful if you have initialized more than one tracker.
app_id is the unique ID for the Ruby application.
encode_base64 determines whether JSONs in the querystring for an event will be base64-encoded.

So a more complete tracker initialization example might look like this:

initial_subject = SnowplowTracker::Subject.new
emitter = SnowplowTracker::Emitter.new("my-collector.cloudfront.net")
tracker = SnowplowTracker::Tracker.new(emitter, initial_subject, 'cf', 'ID-ap00035', false)

2.3 Creating multiple trackers

Each tracker instance is completely sandboxed, so you can create multiple trackers as you see fit.

Here is an example of instantiating two separate trackers:

t1 = SnowplowTracker::Tracker.new(SnowplowTracker::AsyncEmitter.new("my-collector.cloudfront.net"), nil, "t1")
t1.set_platform("cnsl")
t1.track_page_view("http://www.example.com")

t2 = SnowplowTracker::Tracker.new(SnowplowTracker::AsyncEmitter.new("my-company.c.snplow.com"), nil, "t2")
t2.set_platform("cnsl")
t2.track_screen_view("Game HUD", "23")

t1.track_screen_view("Test", "23") # Back to first tracker


3. Adding extra data

You can configure a tracker instance with additional information about your application's environment or current user. This data will be attached to every event the tracker fires regarding the subject. Here are the available methods:

Function	Description
`set_platform`	Set the application platform
`set_user_id`	Set the user ID
`set_screen_resolution`	Set the screen resolution
`set_viewport`	Set the viewport dimensions
`set_color_depth`	Set the screen color depth
`set_timezone`	Set the timezone
`set_lang`	Set the language
`set_ip_address`	Set the IP address
`set_useragent`	Set the useragent
`set_domain_user_id`	Set the domain user ID
`set_network_user_id`	Set the network user ID
`set_fingerprint`	Set the user fingerprint

There are two ways to call these methods:

Call them on a Subject instance. They will update the data associated with that subject and return the subject.
Call them on the Tracker instance. They will update the data associated with the currently active subject for that tracker and return the tracker.

For example:

s0 = SnowplowTracker::Subject.new
emitter = SnowplowTracker::Emitter.new("my-collector.cloudfront.net")
my_tracker = SnowplowTracker::Tracker.new(emitter, s0)

# The following two lines are equivalent, except that the first returns s0 and the second returns my_tracker
s0.set_platform('mob')
my_tracker.set_platform('mob')

If you are using multiple subjects, you can use the set_subject tracker method to change which Subject instance is active:

s0 = SnowplowTracker::Subject.new
emitter = SnowplowTracker::Emitter.new("my-collector.cloudfront.net")
my_tracker = SnowplowTracker::Tracker.new(emitter, s0)

# Set the viewport for the active subject, s0
my_tracker.set_viewport(300, 500)

# The data associated with s0 will be sent with this event
my_tracker.track_screen_view('title page')

# Create a new subject
s1 = SnowplowTracker::Subject.new

# Make s1 the active subject and set its viewport
my_tracker.set_subject(s1).set_viewport(600,1000)

# The data associated with s1 will be sent with this event
my_tracker.track_screen_view('another page')

# Change the subject back to s0 and track another event
my_tracker.set_subject(s0).track_screen_view('final page')

3.1 Set the tracker's platform with `set_platform`

The platform can be any one of 'pc', 'tv', 'mob', 'cnsl', or 'iot'. The default platform is 'srv'.

tracker.set_platform('mob')

3.2 Set the user ID with `set_user_id`

You can make the user ID a string of your choice:

tracker.set_user_id('user-000563456')

3.3 Set the screen resolution with `set_screen_resolution`

If your Ruby code has access to the device's screen resolution, you can pass it in to Snowplow. Both numbers should be positive integers; note the order is width followed by height. Example:

tracker.set_screen_resolution(1366, 768)

3.4 Set the viewport dimensions with `set_viewport`

Similarly, you can pass the viewport dimensions in to Snowplow. Again, both numbers should be positive integers and the order is width followed by height. Example:

tracker.set_viewport(300, 200)

3.5 Set the color depth with `set_color_depth`

If your Ruby code has access to the bit depth of the device's color palette for displaying images, you can pass it in to Snowplow. The number should be a positive integer, in bits per pixel.

tracker.set_color_depth(24)

3.6 Setting the timezone with `set_timezone`

If your Ruby code has access to the timezone of the device, you can pass it in to Snowplow:

tracker.set_timezone('Europe/London')

3.7 Setting the language with `set_lang`

You can set the language field like this:

tracker.set_lang('en')

3.8 Setting the IP address with `set_ip_address`

If you have access to the user's IP address, you can set it like this:

tracker.set_ip_address('34.63.11.139')

3.9 Setting the useragent with `set_useragent`

If you have access to the user's useragent (sometimes called "browser string"), you can set it like this:

tracker.set_useragent('Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0')

3.10 Setting the domain user ID with `set_domain_user_id`

The domain_userid field of the Snowplow event model corresponds to the ID stored in the first party cookie set by the Snowplow JavaScript Tracker. If you want to match up server-side events with client-side events, you can set the domain user ID for server-side events like this:

tracker.set_domain_user_id('c7aadf5c60a5dff9')

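You can extract the domain user ID from the Ruby on Rails cookies object like this:

# Returns the [name, value] pair of the first cookie whose name starts with "_sp_id", or nil
def snowplow_cookie
  cookies.find { |key, _value| key =~ /\A_sp_id/ }
end

# The domain user ID is the first dot-separated field of the cookie's value
def domain_user_id
  cookie = snowplow_cookie
  cookie.last.split('.').first if cookie
end

These methods are meant to be defined in a Rails controller, where cookies is the request's cookies object (see the Rails documentation).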

If you used the "cookieName" configuration option of the Snowplow JavaScript Tracker, replace "sp" with the same string you passed as the cookieName.
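Outside a Rails controller, the same logic can be applied to a plain Hash of cookie names to values. The following is a minimal sketch. `extract_domain_user_id` is a hypothetical helper. It assumes that the JavaScript Tracker names its ID cookie with the cookieName prefix (assumed default `_sp_`) followed by `id.` and a domain hash, as the `_sp_id` pattern above implies. The cookie value in the example is made up.

```ruby
# Hypothetical helper: find the Snowplow first-party ID cookie in a plain Hash
# of cookie names to values and return the domain user ID (the first
# dot-separated field of the cookie value), or nil if no such cookie exists.
# ASSUMPTION: the cookie name starts with the JavaScript Tracker's cookieName
# prefix (assumed default '_sp_') followed by 'id'.
def extract_domain_user_id(cookies, cookie_name_prefix = '_sp_')
  id_cookie_prefix = "#{cookie_name_prefix}id"
  _name, value = cookies.find { |name, _value| name.start_with?(id_cookie_prefix) }
  return nil if value.nil? || value.empty?

  value.split('.').first
end

# Made-up cookie value, for illustration only
cookies = { '_sp_id.4209' => 'c7aadf5c60a5dff9.1402578226.3.1402583450.1402578226' }
puts extract_domain_user_id(cookies)
```

Passing the prefix as an argument covers the cookieName case without editing a regular expression.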

3.11 Setting the network user ID with `set_network_user_id`

The network_user_id field of the Snowplow event model corresponds to the ID stored in the third party cookie set by the Snowplow Clojure Collector. You can set the network user ID for server-side events like this:

tracker.set_network_user_id('ecdff4d0-9175-40ac-a8bb-325c49733607')

3.12 Setting the user fingerprint with `set_fingerprint`

The JavaScript Tracker generates a fingerprint based on browser features and attaches it to all client-side events. You can set the user fingerprint field for server-side events like this:

tracker.set_fingerprint(164502195)


4. Tracking specific events

Snowplow has been built to enable you to track a wide range of events that occur when users interact with your websites and apps. We are constantly growing the range of functions available in order to capture that data more richly.

Tracking methods supported by the Ruby Tracker at a glance:

Function	Description
`track_page_view`	Track and record views of web pages.
`track_ecommerce_transaction`	Track an ecommerce transaction
`track_screen_view`	Track the user viewing a screen within the application
`track_struct_event`	Track a Snowplow custom structured event
`track_unstruct_event`	Track a Snowplow custom unstructured event

4.1 Common

All events are tracked with specific methods on the tracker instance, of the form track_XXX(), where XXX is the name of the event to track.

All tracker methods return the tracker instance, and so are chainable.

4.1.1 Argument validation

Each track_XXX method expects arguments of a certain type. The types are validated using the Ruby Contracts library. If a check fails, an error is raised at runtime. The section for each track_XXX method specifies the expected argument types for that method.

4.1.2 Optional context argument

Each track_XXX method has context as its penultimate optional parameter. This is for an optional nonempty array of self-describing custom context JSONs attached to the event. Each element of the context argument should be a SelfDescribingJson with two fields: the "schema", pointing to the JSON schema against which the context will be validated, and the "data", containing the context data itself. The "data" field should contain a flat hash of key-value pairs.

Important:

Even if only one custom context is being attached to an event, it still needs to be wrapped in an array.

For example, an array containing two custom contexts relating to the event of a movie poster being viewed:

require 'date'

# Array of contexts
[
  # First context
  SnowplowTracker::SelfDescribingJson.new(
    'iglu:com.my_company/movie_poster/jsonschema/1-0-0',
    {
      'movie_name' => 'Solaris',
      'poster_country' => 'JP',
      'poster_year$dt' => Date.new(1978, 1, 1)
    }
  ),

  # Second context
  SnowplowTracker::SelfDescribingJson.new(
    'iglu:com.my_company/customer/jsonschema/1-0-0',
    {
      'p_buy' => 0.23,
      'segment' => 'young adult'
    }
  )
]

The keys of a context hash can be either strings or Ruby symbols.
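Because context data should be a flat hash whose keys may be strings or symbols, you may want to normalise it before building a SelfDescribingJson. The following is a sketch. `flat_context_data` is a hypothetical helper and is not part of the tracker. It converts keys to strings and rejects nested values.

```ruby
# Hypothetical helper: prepare a context's "data" hash before wrapping it in a
# SelfDescribingJson. It converts symbol keys to strings (the tracker accepts
# either) and rejects nested hashes or arrays, since the data should be flat.
def flat_context_data(data)
  data.each_with_object({}) do |(key, value), flat|
    if value.is_a?(Hash) || value.is_a?(Array)
      raise ArgumentError, "context field #{key.inspect} is nested; data must be flat"
    end

    flat[key.to_s] = value
  end
end

p flat_context_data({ p_buy: 0.23, segment: 'young adult' })
```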

For more on how to use custom contexts, see the blog post which introduced them.

4.1.3 Optional timestamp argument

After the optional context argument, each track_XXX method supports an optional timestamp as its final argument. This allows you to manually override the timestamp attached to this event. If you do not pass this timestamp in as an argument, then the Ruby Tracker will use the current time to be the timestamp for the event. Timestamp is counted in milliseconds since the Unix epoch - the same format generated by Time.now.to_i * 1000 in Ruby.
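`Time.now.to_i * 1000` truncates to whole seconds. If you need millisecond precision, integer arithmetic on the seconds and microseconds avoids floating-point rounding. The following is a small sketch. `epoch_millis` is a hypothetical helper name.

```ruby
# Milliseconds since the Unix epoch, the format the optional timestamp argument
# expects. Time#to_i drops the fractional second, so the microsecond part is
# added back (as whole milliseconds) using integer arithmetic.
def epoch_millis(time = Time.now)
  time.to_i * 1000 + time.usec / 1000
end

puts epoch_millis(Time.at(1368725287))
puts epoch_millis
```

For a time with no fractional second, the result equals `time.to_i * 1000`.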

4.1.4 Example

Here is an example of a page view event with custom context and timestamp arguments supplied:

require 'date'

tracker.track_page_view('http://www.film_company.com/movie_poster', nil, nil, [
  # First context
  SnowplowTracker::SelfDescribingJson.new(
    'iglu:com.my_company/movie_poster/jsonschema/1-0-0',
    {
      'movie_name' => 'Solaris',
      'poster_country' => 'JP',
      'poster_year$dt' => Date.new(1978, 1, 1)
    }
  ),

  # Second context
  SnowplowTracker::SelfDescribingJson.new(
    'iglu:com.my_company/customer/jsonschema/1-0-0',
    {
      'p_buy' => 0.23,
      'segment' => 'young adult'
    }
  )
], 1368725287000)

4.2 Track screen views with `track_screen_view`

Use track_screen_view() to track a user viewing a screen (or equivalent) within your app. Arguments are:

Argument	Description	Required?	Validation
`name`	Human-readable name for this screen	Yes	String
`id`	Unique identifier for this screen	No	String
`context`	Custom context	No	Array[SelfDescribingJson]
`tstamp`	When the screen was viewed	No	Positive integer

Example:

tracker.track_screen_view("HUD > Save Game", "screen23")

4.3 Track page views with `track_page_view`

Use track_page_view() to track a user viewing a page within your app. Arguments are:

Argument	Description	Required?	Validation
`page_url`	The URL of the page	Yes	String
`page_title`	The title of the page	No	String
`referrer`	The address which linked to the page	No	String
`context`	Custom context	No	Array[SelfDescribingJson]
`tstamp`	When the pageview occurred	No	Positive integer

Example:

tracker.track_page_view("www.example.com", "example", "www.referrer.com")

4.4 Track ecommerce transactions with `track_ecommerce_transaction`

Use track_ecommerce_transaction() to track an ecommerce transaction. Arguments:

Argument	Description	Required?	Validation
`transaction`	Data for the whole transaction	Yes	Hash
`items`	Data for each item	Yes	Array of hashes
`context`	Custom context	No	Array[SelfDescribingJson]
`tstamp`	When the transaction event occurred	No	Positive integer

The transaction argument is a hash containing information about the transaction. Here are the fields supported in this hash:

Field	Description	Required?	Validation
`order_id`	ID of the eCommerce transaction	Yes	String
`total_value`	Total transaction value	Yes	Int or Float
`affiliation`	Transaction affiliation	No	String
`tax_value`	Transaction tax value	No	Int or Float
`shipping`	Delivery cost charged	No	Int or Float
`city`	Delivery address city	No	String
`state`	Delivery address state	No	String
`country`	Delivery address country	No	String
`currency`	Transaction currency	No	String

The transaction parameter might look like this:

{
  'order_id' => '12345',
  'total_value' => 35,
  'city' => 'London',
  'country' => 'UK',
  'currency' => 'GBP'
}

The items parameter is an array of hashes. Each hash represents one item in the transaction. Here are the fields supported for each item:

Argument	Description	Required?	Validation
`sku`	Item SKU	Yes	String
`price`	Item price	Yes	Int or Float
`quantity`	Item quantity	Yes	Int
`name`	Item name	No	String
`category`	Item category	No	String
`context`	Custom context	No	Array[SelfDescribingJson]

The items parameter might look like this:

[{
  'sku' => 'pbz0026',
  'price' => 20,
  'quantity' => 1,
  'category' => 'film'
},
{
  'sku' => 'pbz0038',
  'price' => 15,
  'quantity' => 1,
  'name' => 'red shoes'
}]
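Before calling the tracker, you can check the required fields from the two tables above. The following is a sketch. `transaction_problems` and its rule constants are hypothetical and are not part of the tracker, which does its own validation with Contracts. It assumes string keys, as in the examples.

```ruby
# Hypothetical pre-flight check mirroring the "Required?" and "Validation"
# columns of the transaction and item tables. Assumes string keys.
TRANSACTION_RULES = {
  'order_id' => [String],
  'total_value' => [Integer, Float]
}.freeze

ITEM_RULES = {
  'sku' => [String],
  'price' => [Integer, Float],
  'quantity' => [Integer]
}.freeze

# Returns human-readable problems for one hash; empty means it looks valid
def missing_or_invalid(hash, rules, label)
  rules.filter_map do |field, types|
    value = hash[field]
    if value.nil?
      "#{label}: missing #{field}"
    elsif types.none? { |type| value.is_a?(type) }
      "#{label}: #{field} should be #{types.join(' or ')}"
    end
  end
end

# Collects problems for the transaction hash and every item hash
def transaction_problems(transaction, items)
  problems = missing_or_invalid(transaction, TRANSACTION_RULES, 'transaction')
  items.each_with_index do |item, index|
    problems.concat(missing_or_invalid(item, ITEM_RULES, "item #{index}"))
  end
  problems
end

p transaction_problems({ 'order_id' => '12345', 'total_value' => 35 },
                       [{ 'sku' => 'pbz0026', 'price' => 20 }])
```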

The whole method call would look like this:

tracker.track_ecommerce_transaction({
  'order_id' => '12345',
  'total_value' => 35,
  'city' => 'London',
  'country' => 'UK',
  'currency' => 'GBP'
},
[{
  'sku' => 'pbz0026',
  'price' => 20,
  'quantity' => 1,
  'category' => 'film'
},
{
  'sku' => 'pbz0038',
  'price' => 15,
  'quantity' => 1,
  'name' => 'red shoes'
}])

This will fire three events: one for the transaction as a whole, which will include the fields in the transaction argument, and one for each item. The order_id and currency fields in the transaction argument will also be attached to each of the items' events.

All three events will have the same timestamp and same randomly generated Snowplow transaction ID.
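The fan-out described above can be modelled in plain Ruby. The following is a toy illustration, not the tracker's code. `fan_out_transaction`, the `'event'` labels and the `rand` call standing in for the random transaction ID are all hypothetical.

```ruby
# Toy model (not the tracker's code) of how one transaction call fans out into
# events: one transaction event plus one event per item. Item events inherit
# order_id and currency, and every event shares one timestamp and one randomly
# generated transaction ID.
def fan_out_transaction(transaction, items, tstamp = Time.now.to_i * 1000)
  tid = rand(1_000_000) # hypothetical stand-in for the random transaction ID
  shared = { 'tid' => tid, 'tstamp' => tstamp }
  inherited = transaction.slice('order_id', 'currency')

  transaction_event = { 'event' => 'transaction' }.merge(transaction, shared)
  item_events = items.map do |item|
    { 'event' => 'transaction_item' }.merge(item, inherited, shared)
  end
  [transaction_event] + item_events
end

fan_out_transaction(
  { 'order_id' => '12345', 'total_value' => 35, 'currency' => 'GBP' },
  [{ 'sku' => 'pbz0026', 'price' => 20, 'quantity' => 1 },
   { 'sku' => 'pbz0038', 'price' => 15, 'quantity' => 1 }]
).each { |event| p event }
```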

Note that each item in the transaction can have its own custom context.

4.5 Track structured events with `track_struct_event`

Use track_struct_event() to track a custom event happening in your app which fits the Google Analytics-style structure of having up to five fields (with only the first two required):

Argument	Description	Required?	Validation
`category`	The grouping of structured events which this `action` belongs to	Yes	String
`action`	Defines the type of user interaction which this event involves	Yes	String
`label`	A string to provide additional dimensions to the event data	No	String
`property`	A string describing the object or the action performed on it	No	String
`value`	A value to provide numerical data about the event	No	Int or Float
`context`	Custom context	No	Array[SelfDescribingJson]
`tstamp`	When the structured event occurred	No	Positive integer

Example:

tracker.track_struct_event("shop", "add-to-basket", nil, "pcs", 2)

4.6 Track unstructured events with `track_unstruct_event`

Use track_unstruct_event() to track a custom event which consists of a name and an unstructured set of properties. This is useful when:

You want to track event types which are proprietary/specific to your business (i.e. not already part of Snowplow), or
You want to track events which have unpredictable or frequently changing properties

The arguments are as follows:

Argument	Description	Required?	Validation
`event_json`	The properties of the event	Yes	SelfDescribingJson
`context`	Custom context	No	Array[SelfDescribingJson]
`tstamp`	When the unstructured event occurred	No	Positive integer

Example:

tracker.track_unstruct_event(SnowplowTracker::SelfDescribingJson.new(
  "iglu:com.example_company/save_game/jsonschema/1-0-2",
  {
    "saveId" => "4321",
    "level" => 23,
    "difficultyLevel" => "HARD",
    "dlContent" => true
  }
))

The event_json argument is a SelfDescribingJson. It has two fields: "schema", containing a pointer to the JSON schema for the event, and "data", containing the event data itself. The data field must be flat: properties cannot be nested.

The keys of the data hash can be either strings or Ruby symbols.
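Both event_json and each context point to their schema with an Iglu URI such as iglu:com.my_company/customer/jsonschema/1-0-0, with a vendor, name, format and MODEL-REVISION-ADDITION version. The following is a sketch of a format check you could run on your own schema strings. `parse_iglu_uri` is hypothetical and is not a tracker method.

```ruby
# Hypothetical helper: split an Iglu schema URI of the form
# iglu:<vendor>/<name>/<format>/<MODEL-REVISION-ADDITION> into its parts,
# returning nil if the string does not match (for example, if the "iglu:"
# prefix is missing).
IGLU_URI = %r{\Aiglu:([^/]+)/([^/]+)/([^/]+)/(\d+)-(\d+)-(\d+)\z}

def parse_iglu_uri(uri)
  match = IGLU_URI.match(uri)
  return nil unless match

  vendor, name, schema_format, *version = match.captures
  { vendor: vendor, name: name, format: schema_format, version: version.map(&:to_i) }
end

p parse_iglu_uri('iglu:com.example_company/save_game/jsonschema/1-0-2')
```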


5. Emitters

Tracker instances must be initialized with an emitter. This section will go into more depth about the Emitter and AsyncEmitter classes.

5.1. Overview

Each tracker instance must now be initialized with an Emitter which is responsible for firing events to a Collector. An Emitter instance is initialized with two arguments: an endpoint and an optional configuration hash.

A simple example with just an endpoint:

# Create an emitter
my_emitter = SnowplowTracker::Emitter.new('my-collector.cloudfront.net')

A complicated example using every setting:

# Create an emitter
my_emitter = SnowplowTracker::AsyncEmitter.new('my-collector.cloudfront.net', {
  :protocol => 'https',
  :method => 'post',
  :port => 443,
  :buffer_size => 0,
  :on_success => lambda { |success_count|
    puts "#{success_count} events sent successfully"
  },
  :on_failure => lambda { |success_count, failures|
    puts "#{success_count} events sent successfully, #{failures.size} events sent unsuccessfully"
  },
  :thread_count => 10
})

Every setting in the configuration hash is optional. Here is what they do:

:protocol determines whether events will be sent using HTTP or HTTPS. It defaults to "http".
:method determines whether events will be sent using GET or POST. It defaults to "get".
:port determines the port to use. If you wish to send events over HTTPS, you should usually set it to 443.
:buffer_size is the number of events which will be buffered before they are all sent simultaneously. The process of sending all buffered events is called "flushing". When using GET, buffer_size defaults to 0 because each request can only contain one event. When using POST, buffer_size defaults to 10, and the buffered events are all sent together in a single request.
:on_success is a callback which is called every time the buffer is flushed and every event in it is sent successfully (meaning with status code 200). It should accept one argument: the number of events sent successfully.
:on_failure is a callback which is called if the buffer is flushed but not every event is sent successfully. It should accept two arguments: the number of successfully sent events and an array containing the unsuccessful events.
:thread_count is only used by the AsyncEmitter. It determines the number of worker threads which will be used to send events.
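The interaction between :buffer_size, :on_success and :on_failure can be illustrated with a toy model. The following is a sketch, not the tracker's Emitter class. `ToyEmitter` and its `sender` stand-in are hypothetical. It assumes that a flush happens once the buffer holds more than buffer_size events, which is consistent with buffer_size 0 meaning every event is sent straight away.

```ruby
# Toy model of the buffering behaviour described above; it is not the
# tracker's Emitter class. ASSUMPTION: a flush is triggered once the buffer
# holds more than buffer_size events, so buffer_size 0 sends every event
# immediately. `sender` stands in for the HTTP request and returns true on
# success (status code 200 in the real emitter).
class ToyEmitter
  def initialize(buffer_size:, sender:, on_success: nil, on_failure: nil)
    @buffer_size = buffer_size
    @sender = sender
    @on_success = on_success
    @on_failure = on_failure
    @buffer = []
  end

  def input(event)
    @buffer << event
    flush if @buffer.size > @buffer_size
  end

  def flush
    return if @buffer.empty?

    events = @buffer
    @buffer = []
    failures = events.reject { |event| @sender.call(event) }
    success_count = events.size - failures.size
    if failures.empty?
      @on_success&.call(success_count)
    else
      @on_failure&.call(success_count, failures)
    end
  end
end

toy = ToyEmitter.new(
  buffer_size: 2,
  sender: ->(event) { event != 'bad' },
  on_success: ->(count) { puts "#{count} events sent successfully" },
  on_failure: ->(count, failures) { puts "#{count} sent, #{failures.size} failed" }
)
%w[a b bad].each { |event| toy.input(event) }
```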

5.2. The AsyncEmitter class

AsyncEmitter is a subclass of Emitter. Whenever the buffer is flushed, the AsyncEmitter places the flushed events in a work queue. The AsyncEmitter asynchronously sends events in this queue using a thread pool of a fixed size. You can choose the size of this thread pool with the thread_count field:

SnowplowTracker::AsyncEmitter.new('my-collector.cloudfront.net', {
  thread_count: 5
})

By default this value is 1.

A note on testing: if you test the AsyncEmitter by using a short script to send an event, you may find that the event fails to send. This is because the process exits before the flushing thread is finished. You can get round this either by adding a sleep(10) to the end of your script, or by using the synchronous flush.
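The work-queue-plus-thread-pool design, and why a script must wait for it, can be modelled with Ruby's core `Queue` and `Thread`. The following is a simplified sketch, not the AsyncEmitter's implementation. `drain_with_pool` is hypothetical. Its workers stop when the queue is empty, whereas a long-lived pool would block waiting for more work.

```ruby
# Toy model (not the AsyncEmitter's code) of a fixed-size thread pool
# draining a work queue of flushed batches. Joining the workers at the end is
# the waiting a short script also needs; if the process exits first,
# queued batches are never sent.
def drain_with_pool(batches, thread_count: 1)
  queue = Queue.new
  batches.each { |batch| queue << batch }
  sent_sizes = Queue.new

  workers = Array.new(thread_count) do
    Thread.new do
      loop do
        begin
          batch = queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break # nothing left to send, so this worker stops
        end
        sent_sizes << batch.size # stand-in for sending the batch over HTTP
      end
    end
  end
  workers.each(&:join)

  Array.new(sent_sizes.size) { sent_sizes.pop }
end

p drain_with_pool([[1, 2], [3], [4, 5, 6]], thread_count: 2).sort
```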

5.3. Multiple emitters

It is possible to initialize a tracker with an array of emitters, in which case events will be sent to all of them:

# Create a tracker with multiple emitters (nil means a subject is created automatically)
my_tracker = SnowplowTracker::Tracker.new([my_sync_emitter, my_async_emitter], nil, 'my_tracker_name', 'my_app_id')

You can also add new emitters after creating a tracker with the add_emitter method:

# Add another emitter to an existing tracker
my_tracker.add_emitter(another_emitter)

5.4. Manual flushing

You may want to force an emitter to send all events in its buffer, even if the buffer is not full. The Tracker class has a flush method which flushes all its emitters. It accepts one argument, async, which defaults to false. Unless you set async to true, the flush will be synchronous: it will block until all queued events have been sent.

# Asynchronous flush
my_tracker.flush(true)

# Synchronous flush
my_tracker.flush

5.5 Automatically retry sending failed events

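You can use a lambda like the following as the on_failure callback to immediately retry failed events. It receives the number of successfully sent events and an array of the failed events, and puts each failed event back into the emitter:

my_emitter = nil # declared first so that the lambda below can refer to it
on_failure_retry = lambda do |_success_count, failed_events|
  # possible backoff-and-retry timeout here
  failed_events.each { |e| my_emitter.input(e) }
end
my_emitter = SnowplowTracker::Emitter.new('my-collector.cloudfront.net', { :on_failure => on_failure_retry })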

You may wish to add backoff logic to delay the resending.
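One common choice is exponential backoff, where the wait doubles after each failed attempt up to a cap. The following is a sketch in plain Ruby. `backoff_delays` and `retry_with_backoff` are hypothetical helpers, and the default delays are arbitrary. The `sleeper` argument is injectable so that the logic can be tested without waiting.

```ruby
# Hypothetical exponential backoff: the delay doubles after each attempt,
# capped at max_delay.
def backoff_delays(attempts, base_delay: 0.5, max_delay: 30.0)
  Array.new(attempts) { |attempt| [base_delay * (2**attempt), max_delay].min }
end

# Calls the block until it returns truthy or the attempts run out, sleeping
# between tries (but not after the last one). Returns true on success.
def retry_with_backoff(attempts: 4, base_delay: 0.5, max_delay: 30.0, sleeper: method(:sleep))
  backoff_delays(attempts, base_delay: base_delay, max_delay: max_delay).each_with_index do |delay, index|
    return true if yield(index)
    sleeper.call(delay) unless index == attempts - 1
  end
  false
end

p backoff_delays(4)
```

If you call something like this from inside on_failure, it blocks whichever thread runs the callback while it sleeps. For that reason, keep the attempts and delays small.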

6. Contracts

The Snowplow Ruby Tracker uses the Ruby Contracts gem for typechecking. Contracts are enabled by default but can be turned on or off:

# Turn contracts off
SnowplowTracker::disable_contracts

# Turn contracts back on
SnowplowTracker::enable_contracts

7. Logging

The emitters.rb module has Ruby logging enabled to give you information about requests being sent. The logger prints messages about what emitters are doing. By default, only messages with priority "INFO" or higher will be logged.

To change this:

require 'logger'
SnowplowTracker::LOGGER.level = Logger::DEBUG

The levels are:

Level	Description
`FATAL`	Nothing logged
`WARN`	Notification for requests with status code not equal to 200
`INFO`	Notification for all requests
`DEBUG`	Contents of all requests
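Setting `SnowplowTracker::LOGGER.level = Logger::DEBUG` suggests that the tracker's logger is a standard Ruby `Logger`. Assuming that, level filtering behaves as in this stdlib-only sketch, where a `StringIO` stands in for the real output stream.

```ruby
require 'logger'
require 'stringio'

# Stdlib-only sketch of level filtering. ASSUMPTION: SnowplowTracker::LOGGER
# is a standard Ruby Logger. A StringIO stands in for the real output stream.
output = StringIO.new
logger = Logger.new(output)

logger.level = Logger::INFO
logger.debug('request body (hidden at INFO)') # filtered out
logger.info('GET request sent')               # logged

logger.level = Logger::DEBUG
logger.debug('request body (shown at DEBUG)') # logged

puts output.string
```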


8. Advanced usage

This section covers more advanced techniques with the Snowplow Ruby Tracker.

8.1. snowplow_ruby_duid

snowplow_ruby_duid is a Ruby gem that allows you to populate Snowplow's domain_userid cookie server-side from any Rack-based framework. This is useful if you want to fire an event on the user's initial request with the domain_userid already populated.
