A set of Hadoop, Spark, and Storm based tools for web and customer analytics

Introduction

The original goal of visitante was to calculate the various web analytics metrics defined by Avinash Kaushik (http://www.kaushik.net/avinash/) on the Hadoop, Spark, and Storm platforms. However, it has evolved into a general purpose log analytics and mining solution that goes beyond web server logs.

It also includes a customer and marketing analytics solution. Since customer behavior data is mostly captured in logs, there is a close relationship between customer analytics and log analytics.
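
For concreteness, here is what a raw input record typically looks like and how it can be pulled apart. This is a minimal Java sketch assuming the standard NCSA combined log format; the regex and class name are illustrative only, not visitante's actual parser.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Parse one line of an NCSA combined format web server log, the kind of
    // input web analytics typically starts from. Illustrative sketch only.
    public class LogLineParser {
        // host ident authuser [date] "request" status bytes "referer" "agent"
        private static final Pattern LINE = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)"
            + " \"([^\"]*)\" \"([^\"]*)\"");

        public static void main(String[] args) {
            String line = "203.0.113.7 - - [10/Oct/2023:13:55:36 -0700] "
                + "\"GET /product/123 HTTP/1.1\" 200 2326 "
                + "\"http://example.com/home\" \"Mozilla/5.0\"";
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                System.out.println("host:    " + m.group(1));
                System.out.println("time:    " + m.group(4));
                System.out.println("request: " + m.group(5));
                System.out.println("status:  " + m.group(6));
                System.out.println("referer: " + m.group(8));
            }
        }
    }

Fields like the timestamp, the requested page, and the referer are what the session and flow metrics below are ultimately derived from.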

Philosophy

  • Simple and easy to use batch and real time web analytics
  • Highly configurable

Blogs

The following blogs of mine are a good source of details on visitante

Solutions

  • Hadoop based batch analytics for the following (see the first sketch after this list)

    • Number of pages visited
    • Total time spent
    • Last page visited
    • Flow status (e.g., whether the checkout flow was entered, entered but not completed, or completed)
    • Incident detection
    • Pattern based event detection with context
    • Customer lifetime value
  • Storm based real time analytics for the following (see the bounce rate sketch after this list)

    • Bounce rate
    • Visit depth distribution
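
To make the batch metrics concrete, here is a minimal Hadoop MapReduce sketch that computes three of the session level metrics listed above: number of pages visited, total time spent, and last page visited. The simplified input format (sessionId,epochSeconds,pageUrl) and the class names are assumptions for illustration; visitante's actual jobs read raw logs and are driven by configuration.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Illustrative sketch: per session batch metrics (page count, time spent,
    // last page) over a simplified "sessionId,epochSeconds,pageUrl" input.
    public class SessionMetrics {

        public static class SessionMapper extends Mapper<Object, Text, Text, Text> {
            private final Text outKey = new Text();
            private final Text outVal = new Text();

            @Override
            protected void map(Object key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                outKey.set(fields[0]);                   // session id
                outVal.set(fields[1] + "," + fields[2]); // timestamp,page
                ctx.write(outKey, outVal);
            }
        }

        public static class SessionReducer extends Reducer<Text, Text, Text, Text> {
            private final Text outVal = new Text();

            @Override
            protected void reduce(Text key, Iterable<Text> values, Context ctx)
                    throws IOException, InterruptedException {
                long first = Long.MAX_VALUE;
                long last = Long.MIN_VALUE;
                String lastPage = "";
                int numPages = 0;
                for (Text v : values) {
                    String[] parts = v.toString().split(",");
                    long ts = Long.parseLong(parts[0]);
                    first = Math.min(first, ts);
                    if (ts >= last) {
                        last = ts;
                        lastPage = parts[1];
                    }
                    ++numPages;
                }
                // num pages visited, total time spent (sec), last page visited
                outVal.set(numPages + "," + (last - first) + "," + lastPage);
                ctx.write(key, outVal);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "session metrics");
            job.setJarByClass(SessionMetrics.class);
            job.setMapperClass(SessionMapper.class);
            job.setReducerClass(SessionReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Grouping by session id in the shuffle is what lets a single reduce call see a visit's whole click stream, so all three metrics fall out of one pass over the values.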
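
Similarly, here is a sketch of the bounce rate logic a real time component (e.g., a Storm bolt) could implement: a visit that ends after a single page view counts as a bounce, and bounce rate is bounced visits divided by completed visits. The inactivity based session expiry and all names are illustrative assumptions, not visitante's actual Storm topology.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of streaming bounce rate: sessions expire after an
    // inactivity timeout; an expired session with one page view is a bounce.
    public class BounceRateTracker {
        private static final long SESSION_TIMEOUT_MS = 30 * 60 * 1000;

        private final Map<String, Integer> pageCounts = new HashMap<>();
        private final Map<String, Long> lastSeen = new HashMap<>();
        private long completedVisits = 0;
        private long bouncedVisits = 0;

        // called for every page view event in the stream
        public void onPageView(String sessionId, long timestampMs) {
            expireIdleSessions(timestampMs);
            pageCounts.merge(sessionId, 1, Integer::sum);
            lastSeen.put(sessionId, timestampMs);
        }

        // close sessions idle longer than the timeout and update counters
        private void expireIdleSessions(long nowMs) {
            lastSeen.entrySet().removeIf(e -> {
                if (nowMs - e.getValue() > SESSION_TIMEOUT_MS) {
                    int pages = pageCounts.remove(e.getKey());
                    ++completedVisits;
                    if (pages == 1) {
                        ++bouncedVisits;
                    }
                    return true;
                }
                return false;
            });
        }

        public double bounceRate() {
            return completedVisits == 0 ? 0.0 : (double) bouncedVisits / completedVisits;
        }

        public static void main(String[] args) {
            BounceRateTracker tracker = new BounceRateTracker();
            tracker.onPageView("s1", 0);              // s1 views one page, leaves
            tracker.onPageView("s2", 1000);
            tracker.onPageView("s2", 60_000);         // s2 views two pages
            tracker.onPageView("s3", 45 * 60 * 1000); // 45 min on: s1, s2 expire
            System.out.println("bounce rate: " + tracker.bounceRate()); // 0.5
        }
    }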

Build

For Hadoop 1

  • mvn clean install

For Hadoop 2 (non-YARN)

  • git checkout nuovo
  • mvn clean install

For Hadoop 2 (YARN)

  • git checkout nuovo
  • mvn clean install -P yarn

For Spark

  • Build chombo first in the master branch with
    • mvn clean install
    • sbt publishLocal
  • Build chombo-spark in the chombo/spark directory with
    • sbt clean package

Need help?

Please feel free to email me at pkghosh99@gmail.com.

Contribution

Contributors are welcome. Please email me at pkghosh99@gmail.com.