A lightweight, high-performance Golang Snowplow collector and enricher. Besides the language choice, Snowblower differs from the official Snowplow implementations in the following ways:
- Snowblower supports SNS/SQS as the intermediate data store between stages
- Snowblower uses a JSON serialization for CollectorPayloads instead of Thrift.
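
To illustrate the second point, here is a minimal sketch of what a JSON-serialized collector payload could look like in Go. The struct, its field set, and the JSON key names are assumptions loosely modeled on the fields of Snowplow's Thrift CollectorPayload; they are not Snowblower's actual wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CollectorPayload is a hypothetical, trimmed-down payload used only for
// illustration. Snowblower's real field names and JSON keys may differ.
type CollectorPayload struct {
	IPAddress   string   `json:"ipAddress"`
	Timestamp   int64    `json:"timestamp"` // milliseconds since the Unix epoch
	Collector   string   `json:"collector"`
	UserAgent   string   `json:"userAgent,omitempty"`
	Path        string   `json:"path"`
	QueryString string   `json:"querystring,omitempty"`
	Body        string   `json:"body,omitempty"`
	Headers     []string `json:"headers,omitempty"`
}

// encodePayload serializes a payload to JSON for the intermediate queue.
func encodePayload(p CollectorPayload) ([]byte, error) {
	return json.Marshal(p)
}

// decodePayload is the inverse, as an enrich stage might use it.
func decodePayload(b []byte) (CollectorPayload, error) {
	var p CollectorPayload
	err := json.Unmarshal(b, &p)
	return p, err
}

func main() {
	raw, err := encodePayload(CollectorPayload{
		IPAddress:   "203.0.113.7",
		Timestamp:   1420070400000,
		Collector:   "snowblower",
		Path:        "/i",
		QueryString: "e=pv&p=web",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```

Empty optional fields are dropped via `omitempty`, which keeps messages small when a request has no body or headers worth forwarding.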
It would be fairly simple to add a Kinesis stream as a collector destination and to support Thrift, at which point Snowblower would be a complete drop-in replacement for the Snowplow Scala Kinesis Collector. For our needs, however, SQS provides a pretty compelling solution.
In initial testing, the collector service requires between 10 and 20 times fewer front-end compute resources than the Scala-based Snowplow Kinesis collector, based on the observation that we scaled down from 24 c3.xlarge machines to 2 on our initial deployment. There are likely many reasons other than the language choice, including:
- Snowblower only ships collected payloads that have data. It ignores the large number of empty data requests generated by Snowplow trackers.
- The Scala-based Kinesis collector is clearly marked as beta and likely not optimized.
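
The first point above can be sketched roughly as follows. This is not Snowblower's actual code: the rule for what counts as "data" (a non-empty query string or a non-blank body) and the `hasData`, `collect`, and `publish` names are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// hasData reports whether a request carries anything worth shipping.
// Assumption: "data" means a non-empty query string or a non-blank body.
func hasData(rawQuery string, body []byte) bool {
	return rawQuery != "" || len(bytes.TrimSpace(body)) > 0
}

// collect forwards a request to publish only when it has data, and reports
// whether it was shipped. publish stands in for the real SNS/SQS send.
func collect(rawQuery string, body []byte, publish func(string, []byte) error) (bool, error) {
	if !hasData(rawQuery, body) {
		return false, nil // empty tracker request: drop it before the queue
	}
	if err := publish(rawQuery, body); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	shipped := 0
	publish := func(string, []byte) error { shipped++; return nil }

	collect("", nil, publish)             // empty request, dropped
	collect("e=pv&p=web", nil, publish)   // GET with event data, shipped
	collect("", []byte("  \n"), publish)  // whitespace-only body, dropped
	fmt.Println("shipped:", shipped)      // prints "shipped: 1"
}
```

Dropping empty requests before they reach the queue saves both the publish call and the downstream work of dequeuing and discarding them.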
On the other hand, the two c3.xlarge instances that replaced the Scala cluster handle a peak of over 350,000 requests per minute with an average latency at our load balancer of ~15ms and a CPU load of around 20%. We could scale back to one server, but we’ll likely experiment with smaller instances first.
One advantage of using SNS/SQS instead of Kinesis is that SQS scales transparently, without explicit capacity provisioning.
Snowblower has two commands:
- collect: Runs the collector, sending events to SNS or SQS. Acts as the second stage in a Snowplow pipeline.
- enrich: Pulls events from SQS, enriches them, and stores them in Postgres or Redshift. Acts as the third stage in a Snowplow pipeline.
The following environment variables configure the operation of Snowblower when running the collector:
- SNS_TOPIC: Must contain the ARN of the SNS topic to send events to. REQUIRED.
- PORT: Optionally sets the port that the server listens on. Defaults to 8080.
- AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: Amazon Web Services credentials. If not set, Snowblower will attempt to use IAM Roles.
- COOKIE_DOMAIN: Optionally sets the domain of the session cookie. If not set, a domain won't be set on the session cookie.
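
As a rough illustration of how these variables fit together, the sketch below loads them into a struct and applies the documented default and required check. The `Config` type and `loadConfig` function are hypothetical names, not Snowblower's own; the AWS credential variables are deliberately not read here, on the assumption that they are left to the AWS client library.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config holds the collector settings listed above.
// The type and its field names are illustrative, not Snowblower's own.
type Config struct {
	SNSTopic     string // ARN of the SNS topic (required)
	Port         int    // listening port, defaults to 8080
	CookieDomain string // empty means no domain is set on the session cookie
}

// loadConfig reads settings through getenv so it can be exercised without
// touching the real process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{Port: 8080, CookieDomain: getenv("COOKIE_DOMAIN")}

	cfg.SNSTopic = getenv("SNS_TOPIC")
	if cfg.SNSTopic == "" {
		return Config{}, errors.New("SNS_TOPIC is required")
	}

	if p := getenv("PORT"); p != "" {
		port, err := strconv.Atoi(p)
		if err != nil || port < 1 || port > 65535 {
			return Config{}, fmt.Errorf("invalid PORT %q", p)
		}
		cfg.Port = port
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("collector would listen on :%d and publish to %s\n", cfg.Port, cfg.SNSTopic)
}
```

Failing fast on a missing SNS_TOPIC surfaces a misconfigured deployment at startup rather than on the first incoming request.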