Cloud-native web, mobile and event analytics, running on AWS and on-premise with Kafka
1-trackers
2-collectors
3-etl
4-storage
5-analytics
.gitignore
.gitmodules
CHANGELOG
LICENSE-2.0.txt
README.md
architecture.png
roadmap.png
setup.png
techdocs.png

SnowPlow

Introduction

SnowPlow is the world's most powerful web analytics platform. It does three things:

  • Identifies users, and tracks the way they engage with a website or web-app
  • Stores the associated data in a scalable “clickstream” data warehouse
  • Makes it possible to leverage a big data toolset (e.g. Hadoop, Pig, Hive) to analyse that data

To find out more, please check out the SnowPlow website and the SnowPlow wiki.

SnowPlow technology 101

The repository structure follows the conceptual architecture of SnowPlow, which consists of five loosely coupled stages:

[Figure: the five-stage SnowPlow architecture]

To briefly explain these five sub-systems:

  • Trackers fire SnowPlow events. Currently we have a JavaScript tracker; iOS and Android trackers are on the roadmap
  • Collectors receive SnowPlow events from trackers. Currently we have a CloudFront-based collector and a node.js-based collector, called SnowCannon
  • ETL (extract, transform and load) cleans up the raw SnowPlow events, enriches them and puts them into storage. Currently we have a Hive-based ETL process
  • Storage is where the SnowPlow events live. Currently we store the SnowPlow events in a Hive-format flatfile structure on S3, and in the Infobright columnar database
  • Analytics are performed on the SnowPlow events. Currently we have a set of ad hoc analyses that work with Hive and Infobright
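
As a rough illustration of how the five stages hand data to one another, here is a minimal in-memory sketch. It is not SnowPlow code: every name in it (`track`, `Collector`, `etl`, `page_views_per_user`, `run_pipeline`) and the `user_id`/`page` event fields are hypothetical, and plain Python lists stand in for the real collectors, Hive jobs and S3/Infobright storage.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode

# Hypothetical names throughout; nothing here is part of the SnowPlow codebase.


def track(event: Dict[str, str]) -> str:
    """Tracker: encode an event as a query string, much as a page tag might."""
    return urlencode(event)


class Collector:
    """Collector: append every incoming request, unparsed, to a raw log."""

    def __init__(self) -> None:
        self.raw_log: List[str] = []

    def receive(self, query_string: str) -> None:
        self.raw_log.append(query_string)


def etl(raw_log: List[str]) -> List[Dict[str, str]]:
    """ETL: parse raw lines, drop malformed ones and add a derived field."""
    rows = []
    for line in raw_log:
        fields = {key: values[0] for key, values in parse_qs(line).items()}
        if "user_id" not in fields or "page" not in fields:
            continue  # clean-up step: skip lines missing required fields
        # Enrichment step: derive the top-level site section from the page path.
        fields["section"] = fields["page"].strip("/").split("/")[0] or "home"
        rows.append(fields)
    return rows


def page_views_per_user(storage: List[Dict[str, str]]) -> Dict[str, int]:
    """Analytics: a simple ad hoc query over the stored events."""
    counts: Dict[str, int] = {}
    for row in storage:
        counts[row["user_id"]] = counts.get(row["user_id"], 0) + 1
    return counts


def run_pipeline(events: List[Dict[str, str]]) -> Dict[str, int]:
    """Wire the stages together: tracker -> collector -> ETL -> storage -> analytics."""
    collector = Collector()
    for event in events:
        collector.receive(track(event))
    storage = etl(collector.raw_log)  # storage is just a list of rows here
    return page_views_per_user(storage)
```

Because each stage only depends on the output format of the stage before it, any one of them could be swapped out (for example, a different collector writing the same raw log lines) without touching the others, which is the point of the loose coupling described above.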

For more information on the current SnowPlow architecture, please see the Technical architecture documentation.

Documentation

  1. The SnowPlow setup guide details how to choose between the available trackers, collectors, ETL modules and storage solutions, and how to set each module up.
  2. The SnowPlow technical documentation provides technical details, including the SnowPlow tracker protocol, the collector log file formats and the data structure schemas.

Contributing

We're committed to a loosely-coupled architecture for SnowPlow and would love to get your contributions within each of the five sub-systems.

If you would like help implementing a new tracker, trying a different ETL approach or loading SnowPlow events into an alternative database, get in touch!

Questions or need help?

Check out the Talk to us page on our wiki.

Copyright and license

SnowPlow is copyright 2012 SnowPlow Analytics Ltd. Significant portions of snowplow.js are copyright 2010 Anthon Pang.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Tracker