Set of Hadoop-based tools for web analytics

Introduction
============
The goal of visitante is to calculate web analytics metrics, as defined by
Avinash Kaushik (http://www.kaushik.net/avinash/), on the Hadoop platform.


Blogs
=====
The following blog posts of mine are a good source of details on visitante:

http://pkghosh.wordpress.com/2012/06/05/big-web-analytic/
http://pkghosh.wordpress.com/2012/08/10/big-web-checkout-abandonment/


Hadoop Jobs
===========
A set of Hadoop-based tools for web analytics. It currently includes the following:

- SessionExtractor map reduce job, which processes W3C compliant web log files and outputs
	- sessionID
	- userID
	- session start time
	- page visited
	- time spent on each page visited
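
The session extraction step can be illustrated with a small sketch in plain Java,
without any Hadoop dependency. This is not the actual SessionExtractor code. The
Hit and PageVisit classes, their fields, and the rule that the last page of a
session gets 0 seconds (because no later hit exists to measure against) are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SessionExtractorSketch {

    /** One parsed log line. The field choice is a hypothetical simplification. */
    static final class Hit {
        final String sessionId;
        final String userId;
        final long epochSeconds;
        final String page;

        Hit(String sessionId, String userId, long epochSeconds, String page) {
            this.sessionId = sessionId;
            this.userId = userId;
            this.epochSeconds = epochSeconds;
            this.page = page;
        }
    }

    /** One output row: a page visit plus the time spent on that page. */
    static final class PageVisit {
        final String sessionId;
        final String userId;
        final long sessionStart;
        final String page;
        final long secondsOnPage;

        PageVisit(String sessionId, String userId, long sessionStart,
                  String page, long secondsOnPage) {
            this.sessionId = sessionId;
            this.userId = userId;
            this.sessionStart = sessionStart;
            this.page = page;
            this.secondsOnPage = secondsOnPage;
        }
    }

    static List<PageVisit> extract(List<Hit> hits) {
        // Roughly the "map" side: group hits by session ID.
        Map<String, List<Hit>> bySession = new LinkedHashMap<>();
        for (Hit h : hits) {
            bySession.computeIfAbsent(h.sessionId, k -> new ArrayList<>()).add(h);
        }
        // Roughly the "reduce" side: order each session by time and
        // take the gap to the next hit as the time spent on a page.
        List<PageVisit> out = new ArrayList<>();
        for (List<Hit> session : bySession.values()) {
            session.sort(Comparator.comparingLong((Hit h) -> h.epochSeconds));
            long start = session.get(0).epochSeconds;
            for (int i = 0; i < session.size(); i++) {
                Hit cur = session.get(i);
                // Assumption: time on the last page is unknown, reported as 0.
                long spent = (i + 1 < session.size())
                        ? session.get(i + 1).epochSeconds - cur.epochSeconds
                        : 0L;
                out.add(new PageVisit(cur.sessionId, cur.userId, start, cur.page, spent));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("s1", "u1", 160L, "/cart"),
                new Hit("s1", "u1", 100L, "/home"),
                new Hit("s2", "u2", 500L, "/home"));
        for (PageVisit v : extract(hits)) {
            System.out.println(v.sessionId + "," + v.userId + "," + v.sessionStart
                    + "," + v.page + "," + v.secondsOnPage);
        }
    }
}
```

In a real map reduce job the grouping would be done by the shuffle, with the
session ID as the key. The in-memory map above only stands in for that.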

- SessionSummarizer map reduce job, which processes W3C compliant web log files.
  It provides important metrics such as bounce rate, page depth and abandoned checkout, and outputs
	- sessionID
	- userID
	- number of pages visited
	- total time spent in session
	- last page visited in session
	- flow status (e.g., whether the checkout flow was entered, entered but not completed, or completed)
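
How a per-session summary could be derived is sketched below in plain Java. This
is not the actual SessionSummarizer code. The page paths "/checkout" and
"/checkout/complete", the NOT_ENTERED state, and the definition of a bounce as a
session with exactly one page are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SessionSummarizerSketch {

    enum FlowStatus { NOT_ENTERED, ENTERED_NOT_COMPLETED, COMPLETED }

    // Hypothetical page paths marking the start and end of the checkout flow.
    static final String FLOW_ENTRY_PAGE = "/checkout";
    static final String FLOW_EXIT_PAGE = "/checkout/complete";

    static final class Summary {
        final String sessionId;
        final String userId;
        final int numPages;
        final long totalSeconds;
        final String lastPage;
        final FlowStatus flowStatus;

        Summary(String sessionId, String userId, int numPages, long totalSeconds,
                String lastPage, FlowStatus flowStatus) {
            this.sessionId = sessionId;
            this.userId = userId;
            this.numPages = numPages;
            this.totalSeconds = totalSeconds;
            this.lastPage = lastPage;
            this.flowStatus = flowStatus;
        }

        /** Assumption: a bounce is a session with exactly one page. */
        boolean isBounce() {
            return numPages == 1;
        }
    }

    /** pages must be in visit order; secondsOnPage is aligned with pages. */
    static Summary summarize(String sessionId, String userId,
                             List<String> pages, List<Long> secondsOnPage) {
        if (pages.isEmpty() || pages.size() != secondsOnPage.size()) {
            throw new IllegalArgumentException("pages and times must be non-empty and aligned");
        }
        long total = 0;
        for (long s : secondsOnPage) {
            total += s;
        }
        int entryIdx = pages.indexOf(FLOW_ENTRY_PAGE);
        int exitIdx = pages.lastIndexOf(FLOW_EXIT_PAGE);
        FlowStatus status;
        if (entryIdx < 0) {
            status = FlowStatus.NOT_ENTERED;
        } else if (exitIdx > entryIdx) {
            status = FlowStatus.COMPLETED;
        } else {
            status = FlowStatus.ENTERED_NOT_COMPLETED;
        }
        return new Summary(sessionId, userId, pages.size(), total,
                pages.get(pages.size() - 1), status);
    }

    /** Fraction of sessions that are bounces; 0.0 for an empty list. */
    static double bounceRate(List<Summary> summaries) {
        if (summaries.isEmpty()) {
            return 0.0;
        }
        int bounces = 0;
        for (Summary s : summaries) {
            if (s.isBounce()) {
                bounces++;
            }
        }
        return (double) bounces / summaries.size();
    }

    public static void main(String[] args) {
        Summary a = summarize("s1", "u1",
                Arrays.asList("/home", "/cart", "/checkout"), Arrays.asList(30L, 45L, 0L));
        Summary b = summarize("s2", "u2", Arrays.asList("/home"), Arrays.asList(0L));
        System.out.println(a.flowStatus + " " + a.numPages + " " + a.totalSeconds);
        System.out.println("bounce rate: " + bounceRate(Arrays.asList(a, b)));
    }
}
```

Page depth corresponds to numPages here, and an abandoned checkout corresponds to
the ENTERED_NOT_COMPLETED status.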
	
- Bayesian discriminant analysis for visitor conversion prediction
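
The README does not say which form of Bayesian discriminant analysis is used. As
one possible reading, the sketch below fits a Gaussian to a single input parameter
(for example, number of pages visited) for each of the two classes, converted and
not converted, and predicts the class with the larger log posterior score. The
class and method names, the Gaussian assumption, and the choice of input are all
assumptions made for illustration, not the actual visitante implementation.

```java
class ConversionDiscriminantSketch {

    /** Prior and Gaussian parameters for one class (hypothetical model). */
    static final class ClassModel {
        final double prior;
        final double mean;
        final double variance;

        ClassModel(double prior, double mean, double variance) {
            this.prior = prior;
            this.mean = mean;
            this.variance = variance;
        }
    }

    /** Fits one class from its feature values; totalCount covers both classes. */
    static ClassModel fit(double[] values, int totalCount) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value per class");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double squares = 0.0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        // Maximum likelihood variance estimate (divides by n, not n - 1).
        double variance = squares / values.length;
        if (variance == 0.0) {
            throw new IllegalArgumentException("variance must be positive");
        }
        return new ClassModel((double) values.length / totalCount, mean, variance);
    }

    /** log(prior) + log(Gaussian density), i.e. the unnormalized log posterior. */
    static double logPosteriorScore(ClassModel m, double x) {
        double d = x - m.mean;
        return Math.log(m.prior)
                - 0.5 * Math.log(2.0 * Math.PI * m.variance)
                - (d * d) / (2.0 * m.variance);
    }

    static boolean predictsConversion(ClassModel converted, ClassModel notConverted, double x) {
        return logPosteriorScore(converted, x) > logPosteriorScore(notConverted, x);
    }

    public static void main(String[] args) {
        // Made-up training values: pages visited per session.
        double[] convertedPages = {8, 10, 12};
        double[] notConvertedPages = {1, 2, 3};
        int total = convertedPages.length + notConvertedPages.length;
        ClassModel conv = fit(convertedPages, total);
        ClassModel notConv = fit(notConvertedPages, total);
        System.out.println("9 pages: " + predictsConversion(conv, notConv, 9.0));
        System.out.println("2 pages: " + predictsConversion(conv, notConv, 2.0));
    }
}
```

The training values in main are made up for the example and are not results from
visitante.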

The following will be added:

- Extend Bayesian discriminant analysis to multiple input parameters


