Building a simple Node application with Pig, MongoDB, Node.js and the Enron Emails


A Hortonworks example project for building a simple Node application with Pig, MongoDB, Node.js and the Enron Emails.

Project Setup

Get the Enron Emails

An archive of emails from the Enron investigation is available for download on S3 in Avro format.

Installing Pig 0.10

You'll need the latest version of Apache Pig, version 0.10, to use Avro. You can get it here:

Installing MongoDB

MongoDB is available for download here:

Installing the MongoDB Java Driver

The Java driver for MongoDB (which we'll need to connect to it via Pig) is available here:

Installing Node.js

Node.js is available here:

Connect Node to Mongo

npm install mongodb

Getting Started

Load the Enron Emails from Avro format into MongoDB

pig -l /tmp -x local -v -w -param avros=/me/tmp/enron -param mongourl=mongodb://localhost/enron.emails avro_to_mongo.pig

Start the Node Application

Our Node application is simple: given a message ID, fetch it from MongoDB and display it as JSON. You can take it from here, or use your favorite language.

node email.js
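The request flow described here can be sketched roughly as follows. This is not the project's email.js, only a minimal illustration under stated assumptions: `createHandler`, `messageIdFromUrl`, `findInMemory`, and `start` are hypothetical names, and the MongoDB lookup is replaced by an in-memory map holding one made-up placeholder document so the sketch runs without a database.

```javascript
const http = require('http');

// Pull the messageId query parameter out of a request URL such as
// "/?messageId=%3C...%3E". The base URL is only there so that the
// relative req.url can be parsed; its value does not matter.
function messageIdFromUrl(reqUrl) {
  return new URL(reqUrl, 'http://localhost').searchParams.get('messageId');
}

// Wrap a lookup function in an HTTP request handler.
// `findEmail` is a hypothetical async function: message ID in,
// document (or null when nothing matches) out.
function createHandler(findEmail) {
  return async (req, res) => {
    const headers = { 'Content-Type': 'application/json' };
    const messageId = messageIdFromUrl(req.url);
    if (!messageId) {
      res.writeHead(400, headers);
      res.end(JSON.stringify({ error: 'messageId is required' }));
      return;
    }
    try {
      const email = await findEmail(messageId);
      res.writeHead(email ? 200 : 404, headers);
      res.end(JSON.stringify(email || { error: 'not found' }));
    } catch (err) {
      res.writeHead(500, headers);
      res.end(JSON.stringify({ error: 'lookup failed' }));
    }
  };
}

// In-memory stand-in for the MongoDB collection, holding one made-up
// placeholder document (not a real Enron email).
const placeholderEmails = new Map([
  ['<demo.1@example>', { message_id: '<demo.1@example>', subject: 'placeholder' }],
]);
const findInMemory = async (id) => placeholderEmails.get(id) || null;

// Start the service; 1337 matches the port used in this README.
function start(findEmail, port = 1337) {
  return http.createServer(createHandler(findEmail)).listen(port);
}
```

Calling `start(findInMemory)` would serve the placeholder document. With the mongodb driver installed, `findEmail` might instead wrap a call such as `collection.findOne({ message_id: id })`; the `message_id` field name is an assumption about how the Avro records were stored.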

Display a Message

For instance, click on http://localhost:1337/?messageId=%3C3607504.1075843446517.JavaMail.evans@thyme%3E to see the email as JSON. Imagine how easily your own application could wrap your mined data, fresh off Hadoop!
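Message IDs are wrapped in angle brackets, which is why the URL above contains %3C and %3E. As a small sketch of how a client might build such a URL, the following uses Node's built-in URL class; `emailUrl` is a hypothetical helper name, and the base address is assumed to be the local service from this README.

```javascript
// Message ID taken from the example URL in this README.
const messageId = '<3607504.1075843446517.JavaMail.evans@thyme>';

// Build the request URL for a message ID. Setting the parameter through
// searchParams percent-encodes characters such as < and > automatically.
function emailUrl(id, base = 'http://localhost:1337/') {
  const url = new URL(base);
  url.searchParams.set('messageId', id);
  return url.href;
}

const requestUrl = emailUrl(messageId);
console.log(requestUrl);

// Decoding the query string, as the service would, recovers the original ID.
const decoded = new URL(requestUrl).searchParams.get('messageId');
console.log(decoded === messageId);
```

The generated URL also percent-encodes the @ sign, unlike the hand-written one above; both forms decode to the same message ID.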


We've used Pig to bridge the gap between Hadoop and Node.js via MongoDB. Pig is fine duct tape.
