SnowPlow is the world's most powerful web analytics platform. It does three things:
- Identifies users, and tracks the way they engage with one or more websites
- Stores the associated data in a scalable “clickstream” data warehouse
- Makes it possible to leverage a big data toolset (e.g. Hadoop, Pig, Hive) to analyse that data
To find out more, read Keplar's blog post introducing SnowPlow. The rest of the documentation in this repository focuses on the technical aspects of SnowPlow.
Contents of this repository are as follows:
- The root folder contains this README and the Apache License, Version 2.0
- docs contains the technical documentation for this project. See the next section for more details
There is a growing set of technical documentation for SnowPlow:
Additionally there is a technical README for the
Planned items on the roadmap are as follows:
- snowplow.js available over SSL (not currently working)
- Open-sourcing the standard SerDe
- Writing and open-sourcing some standard Hive 'recipes'
Original concept for SnowPlow inspired by Radek Maciaszek.
Copyright and license
SnowPlow is copyright 2012 Orderly Ltd. Significant portions of
are copyright 2010 Anthon Pang.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.