Scala Stream Collector

Ben Fradet edited this page Mar 19, 2018 · 9 revisions


Introduction

The Scala Stream Collector is a Snowplow event collector written in Scala. It allows near-real-time processing (enrichment, storage, analytics) of a Snowplow raw event stream.

The Scala Stream Collector receives raw Snowplow events over HTTP, serializes them to a Thrift record format, and then writes them to a sink. Currently supported sinks are:

  1. Amazon Kinesis
  2. Google PubSub
  3. Apache Kafka
  4. NSQ
  5. stdout for a custom stream collection process
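
The receive, serialize, write flow described above can be sketched as follows. This is a minimal illustration, not the collector's actual code: the names `Sink`, `StdoutSink`, `InMemorySink`, `serialize` and `collect` are hypothetical, and the Thrift serialization step is replaced by a plain UTF-8 placeholder so that the example needs no external libraries. Printing records as Base64 in the stdout sink is likewise an assumption made so that binary data survives a text pipe.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64
import scala.collection.mutable.ListBuffer

// Hypothetical names throughout; this is not the collector's real API.
object CollectorSketch {

  // A sink accepts records that have already been serialized to bytes.
  trait Sink {
    def store(records: List[Array[Byte]]): Unit
  }

  // Assumption: print each record as one Base64 line on stdout.
  object StdoutSink extends Sink {
    def store(records: List[Array[Byte]]): Unit =
      records.foreach(r => println(Base64.getEncoder.encodeToString(r)))
  }

  // Keeps records in memory, which is convenient for tests.
  final class InMemorySink extends Sink {
    val stored: ListBuffer[Array[Byte]] = ListBuffer.empty
    def store(records: List[Array[Byte]]): Unit = stored ++= records
  }

  // Placeholder for the Thrift serialization step: UTF-8 bytes of the raw event.
  def serialize(rawEvent: String): Array[Byte] = rawEvent.getBytes(UTF_8)

  // Serialize one raw event and hand it to whichever sink is configured.
  def collect(rawEvent: String, sink: Sink): Unit =
    sink.store(List(serialize(rawEvent)))
}
```

Keeping the sink behind a small trait means the HTTP-facing code does not need to know whether records end up in Kinesis, PubSub, Kafka, NSQ or stdout; for example, `CollectorSketch.collect("e=pv", CollectorSketch.StdoutSink)` would print a single Base64 line.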

Like the Clojure Collector, the Scala Stream Collector supports cross-domain Snowplow deployments: it sets a user_id (used to identify unique visitors) server-side, so that the same user can be reliably identified across domains.

How it works

User identification

The Scala Stream Collector allows the use of a third-party cookie, making user tracking across domains possible. By contrast, the CloudFront Collector does not support cross-domain tracking of users because it relies on user IDs set client-side, whereas the Scala Stream Collector sets them server-side.

In a nutshell: the Scala Stream Collector receives events from a tracker, sets/updates a third-party user tracking cookie, and returns the pixel to the client. The ID in this third-party user tracking cookie is stored in the network_userid field in Snowplow events.

In pseudocode terms:

if (request contains an "sp" cookie) {
    Record that cookie's value as the user identifier
    Set that cookie again with a now+1 year cookie expiry
    Add the headers and payload to the output array
} else {
    Generate a new user identifier
    Set the "sp" cookie to it with a now+1 year cookie expiry
    Add the headers and payload to the output array
}
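
The pseudocode above might translate into Scala roughly as follows. This is a sketch under stated assumptions rather than the collector's actual implementation: `CookieSketch`, `SetCookie` and `userCookie` are hypothetical names, a random UUID is assumed as the format of a newly generated ID, and "one year" is approximated as 365 days. Adding the headers and payload to the output is left out.

```scala
import java.time.{Duration, Instant}
import java.util.UUID

// Hypothetical names; illustrates only the cookie decision, not the HTTP layer.
object CookieSketch {
  val CookieName = "sp"
  // Assumption: "now+1 year" is approximated as 365 days.
  val CookieTtl: Duration = Duration.ofDays(365)

  // The cookie to send back with the response.
  final case class SetCookie(name: String, value: String, expires: Instant)

  // Reuse the ID from an incoming "sp" cookie, or generate a new one.
  // Either way the cookie is (re)set so that it expires one year from now.
  def userCookie(
      requestCookies: Map[String, String],
      now: Instant,
      // Assumption: new IDs are random UUIDs.
      newId: () => String = () => UUID.randomUUID().toString
  ): SetCookie = {
    // getOrElse takes its default by name, so newId() only runs when "sp" is absent.
    val networkUserId = requestCookies.getOrElse(CookieName, newId())
    SetCookie(CookieName, networkUserId, now.plus(CookieTtl))
  }
}
```

Both branches of the pseudocode collapse into one expression here because they differ only in where the ID comes from. The returned `value` is what would end up in the network_userid field.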

Technical architecture

The Scala Stream Collector is built on top of akka-http.

See also
