Skip to content

Simple app that gather stats on the first degree path to wiki philosophy.

Notifications You must be signed in to change notification settings

samyvilar/wiki_analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Wiki Stats

Simple application that tracks statistics on the articles
traversed from random wiki article to philosophy,
on exit the current stats are saved, which are reloaded 
on restart.

see: @http://en.wikipedia.org/wiki/Wikipedia:Getting_to_Philosophy

Currently:
- average number of hops to philosophy
- distribution (as percentage) of the different kind of routes.
(philosophy, cycled, dead end [no links] present, exception bad link)

NOTE: it doesn't keep track of seen routes, 
it is possible that the same sequence may be counted twice.

requires, lxml for parsing and
          bottle to serve the actual objects.

The server has a single background thread collecting stats.

usage:
    $ ./wiki_analytics.py --help # print usage
    $ ./wiki_analytics.py # <<<< starts the server and continuously gather stats 

    In another shell
    $ curl localhost:8080 # get current stats as a json object.
    or:
    $ python -c "import pprint, json, urllib2; pprint.pprint(json.load(urllib2.urlopen('http://localhost:8080')))"
    {u'count': 145,
     u'distribution': {u'cycled': 0.0,
                       u'dead_end': 0.15862068965517243,
                       u'errors': 0.0,
                       u'philosophy': 0.8413793103448276},
     u'last_minute_count': 48,
     u'philosophy': {u'average_length': 14.29387755102041}}
    $ [ctrl-c] # save stats, kill server

    Do not use kill, its unsafe, possibly raising segfault if lxml is running and would not save the current stats
    $ kill `pgrep -f wiki_analytics.py`

count: is the total number of routes seen,

distribution:
    percentage:
        cycled: routes that cycled
        dead_end: routes that lead to articles without any links.
        errors: routes that raised some kind of exception.
        philosophy: routes that terminated at philosophy.

last_minute_count: is the number of routes processed in the last minute,
(useful for determining performance, currently avg ~50 routes/min (no caching, single thread))

philosophy: current holds the average number of hops required of all philosophy terminating routes,
    (1 is the smallest route length.)


nosetests --with-doctest wiki_analytics.py # <<<< To run tests (currenly only doctests have being implemented)

About

Simple app that gather stats on the first degree path to wiki philosophy.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages