Cloud-native web, mobile and event analytics, running on AWS and on-premise with Kafka

Snowplow

Snowplow is an enterprise-strength marketing and product analytics platform. It does three things:

  1. Identifies your users, and tracks the way they engage with your website or application
  2. Stores your users' behavioural data in a scalable "event data warehouse" you control: in Amazon S3 and (optionally) Amazon Redshift or Postgres
  3. Lets you analyze that behavioural data with a wide range of tools, including big data tools (e.g. Spark) via EMR and more traditional tools (e.g. Looker, Mode, Caravel, Re:dash)

To find out more, please check out the Snowplow website and the Snowplow wiki.

Snowplow technology 101

The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely-coupled sub-systems connected by five standardized data protocols/formats:

[Figure: Snowplow architecture diagram]

To briefly explain these six sub-systems:

  • Trackers fire Snowplow events. Currently we have 12 trackers, covering web, mobile, desktop, server and IoT
  • Collectors receive Snowplow events from trackers. Currently we have three different event collectors, sinking events to Amazon S3, Apache Kafka or Amazon Kinesis
  • Enrich cleans up the raw Snowplow events, enriches them and puts them into storage. Currently we have a Hadoop-based enrichment process, and a Kinesis- or Kafka-based process
  • Storage is where the Snowplow events live. Currently we store the Snowplow events in a flat-file structure on S3, and in the Redshift and Postgres databases
  • Data modeling is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We have data models for Redshift and Looker
  • Analytics are performed on the Snowplow events or on the aggregate tables.

For more information on the current Snowplow architecture, please see the Technical architecture.
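To make the flow between the sub-systems concrete, here is a minimal, purely illustrative sketch in Python. It is not Snowplow code and does not use any Snowplow API or data format: the function names (`track`, `collect`, `enrich`, `model`), the JSON payload and the in-memory lists standing in for the collector sink and for storage are all assumptions made for this example.

```python
import json
from collections import Counter


def track(user_id, event_name):
    """Tracker (hypothetical): fire an event as a raw JSON payload."""
    return json.dumps({"user_id": user_id, "event": event_name})


def collect(raw_payloads, sink):
    """Collector (hypothetical): receive raw payloads and append them to a sink."""
    sink.extend(raw_payloads)


def enrich(raw_payload):
    """Enrich (hypothetical): parse and clean a payload; None if it is malformed."""
    try:
        event = json.loads(raw_payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or "user_id" not in event or "event" not in event:
        return None
    # Normalise the event name so later stages see consistent values.
    event["event"] = event["event"].strip().lower()
    return event


def model(stored_events):
    """Data modeling (hypothetical): aggregate event-level rows per user."""
    return Counter(e["user_id"] for e in stored_events)


# Trackers -> collector: raw payloads land in a sink (a list stands in for S3/Kafka/Kinesis).
raw_sink = []
collect(
    [track("u1", " Page_View "), track("u2", "click"), "not json", track("u1", "click")],
    raw_sink,
)

# Enrich -> storage: only well-formed events are kept (a list stands in for the warehouse).
storage = [e for e in map(enrich, raw_sink) if e is not None]

# Data modeling -> analytics: query the aggregate table.
events_per_user = model(storage)
print(events_per_user.most_common(1))
```

Each stage only depends on the shape of the data handed over by the previous one, which is the sense in which the sub-systems are loosely coupled.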

Quickstart

Assuming git, Vagrant and VirtualBox are installed:

 host$ git clone https://github.com/snowplow/snowplow.git
 host$ cd snowplow
 host$ vagrant up && vagrant ssh
guest$ cd /vagrant/3-enrich/scala-common-enrich
guest$ sbt test
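
If you are unsure whether the prerequisites are available, a small check along the following lines may help. This is a sketch, not part of the Snowplow repository: the helper name `missing_tools` is made up for this example, and it assumes the tools are exposed on your PATH under the executable names `git`, `vagrant` and `VBoxManage`, which may not hold on every platform.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Executable names are assumptions; adjust them for your platform.
    missing = missing_tools(["git", "vagrant", "VBoxManage"])
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites found")
```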

Find out more

Technical Docs | Setup Guide | Roadmap | Contributing

Contributing

We're committed to a loosely-coupled architecture for Snowplow and would love to get your contributions within each of the six sub-systems.

If you would like help implementing a new tracker, adding an additional enrichment or loading Snowplow events into an alternative database, check out our Contributing page on the wiki!

Questions or need help?

Check out the Talk to us page on our wiki.

Copyright and license

Snowplow is copyright 2012-2017 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.