example code for "Large-scale social media analysis with Hadoop" tutorial presented at ICWSM 2010

This repository contains example code for the tutorial I presented at
ICWSM 2010, "Large-scale social media analysis with Hadoop". More
information, including the slides, is available here:

  http://jakehofman.com/icwsm2010

  wordcount/ contains the wordcount example on a small input text
  network/ contains network examples on a small toy graph
  hstream.py is a simple class for implementing streaming jobs

Examples can be run locally, using the "cat data | map | sort |
reduce" analog of Hadoop streaming, or with Hadoop streaming.
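
As a rough illustration of that local pipeline, here is a minimal word
count sketch in plain Python. It does not use hstream.py or the scripts
in wordcount/; the names mapper, reducer, and run_local are hypothetical
and chosen only for this example. The sort step stands in for Hadoop's
shuffle, so the reducer can assume equal keys arrive next to each other.

```python
import itertools
import re


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def reducer(pairs):
    """Sum counts per word; assumes pairs arrive sorted by key."""
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """In-process stand-in for 'cat data | map | sort | reduce'."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The fox"]
    for word, count in run_local(text):
        # Tab-separated key/value lines, as streaming jobs conventionally use.
        print(f"{word}\t{count}")
```

With Hadoop streaming, the mapper and reducer would instead be separate
scripts that read stdin and write tab-separated lines to stdout.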

To install Hadoop locally, download and untar the source.
Quick start guides are available at:

  http://hadoop.apache.org/common/docs/current/quickstart.html
  or
  http://www.ibm.com/developerworks/linux/library/l-hadoop-1/

If installing on Mac OS X, make sure to set JAVA_HOME to point to Java
1.6:

  http://blog.sethladd.com/2009/04/mac-os-x-hadoop-0191-and-java-16.html

My Hadoop bookmarks are available here:

  http://delicious.com/jhofman/hadoop
  http://delicious.com/jhofman/hadoop+tutorials

Disclaimer: these examples are written with pedagogy, not efficiency,
in mind.