Scalable Binary Data Extraction in Hadoop

BinaryPig

Malware Processing and Analytics over Pig, Exploration through Django, Twitter Bootstrap, and Elasticsearch

Pig Libraries and Scripts

Building BinaryPig JAR

./build.sh

Installing dependencies for binarypig modules

See the installation docs in the "docs" directory.

Using binarypig

See the Pig scripts in the "examples" directory.

Webapp

Installing the binarypig webapp

cd webapp

# create a local settings file
cp project/local_settings.py.example project/local_settings.py
vi project/local_settings.py

# create your mysql DB
mysql
mysql> create database binarypig;

virtualenv env
. env/bin/activate
pip install -r requirements.txt

# initialize the database
./manage.py syncdb
./manage.py migrate
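
If you re-run the setup, the cp step above will overwrite any edits already made to project/local_settings.py. A minimal sketch of a guard against that follows. The helper name ensure_local_settings is hypothetical and is not part of the repository; it only assumes the project/ layout shown in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with binarypig): copy the example settings
# into place only when no local settings file exists yet, so that re-running
# the setup never overwrites local edits.
# $1: path to the webapp directory (defaults to the current directory).
ensure_local_settings() {
    dir="${1:-.}/project"
    if [ -f "$dir/local_settings.py" ]; then
        echo "keeping existing $dir/local_settings.py"
    elif [ -f "$dir/local_settings.py.example" ]; then
        cp "$dir/local_settings.py.example" "$dir/local_settings.py"
        echo "created $dir/local_settings.py"
    else
        echo "no example settings found in $dir" >&2
        return 1
    fi
}

# Usage, from inside the webapp directory:
#   ensure_local_settings .
```

The function is only defined here, not called, so sourcing the snippet has no side effects.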

Running the binarypig webapp

./manage.py runserver 0.0.0.0:8000

Issues

Some issues we encountered when running Python-based binarypig Pig jobs:

On CentOS, if SELinux is running and you disable it, you must restart abrtd; otherwise many or all of the Python processes will hang and produce no output.
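
A rough sketch of how one might check for this situation follows. It assumes the daemon in question is abrtd (the Automatic Bug Reporting Tool daemon) and that getenforce is available, as it usually is on CentOS. The function names are hypothetical. The script only prints a suggestion and does not restart anything itself.

```shell
#!/bin/sh
# Report the SELinux mode, or "unknown" if getenforce is not installed.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "unknown"
    fi
}

# Assumption: the workaround is relevant only when SELinux is present
# but not enforcing (getenforce prints Permissive or Disabled).
needs_abrtd_restart() {
    case "$1" in
        Permissive|Disabled) return 0 ;;
        *) return 1 ;;
    esac
}

mode=$(selinux_mode)
echo "SELinux mode: $mode"
if needs_abrtd_restart "$mode"; then
    # The restart command varies by release: "service" on older CentOS,
    # "systemctl restart abrtd" on systemd-based releases.
    echo "SELinux is not enforcing; consider: sudo service abrtd restart"
fi
```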

Reference

Getting up and Running with Vagrant

This is a mini how-to for getting BinaryPig up and running on an Ubuntu 14.04 VM managed by Vagrant.

Versions

  • Ubuntu 14.04
  • Hadoop 1.2.1
  • Pig 0.12.1
  • Elasticsearch 1.2.1

Steps:

# From workstation
$ git clone <repo>
$ cd binarypig
$ vagrant up
$ vagrant ssh

# now logged into VM
$ sudo su - 
$ cd /vagrant/
$ ./install-ubuntu-vagrant.sh

# press <ENTER> when asked

# agree to Oracle Java license when asked

# type "yes" when asked if you want to ssh into localhost

# enter password for root user of mysql when asked (MySQL installation)

# enter password for root user of mysql when asked (Django app install)

# walk through the django admin user creation:
#   You just installed Django's auth system, which means you don't have any superusers defined.
#   Would you like to create one now? (yes/no): yes
#   Username (leave blank to use 'root'): 
#   E-mail address: YOU@gmail.com
#   Password: 
#   Password (again): 
#   Superuser created successfully.
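
The workstation steps above need both git and vagrant on the PATH. A small pre-flight check, run on the workstation before cloning, might look like the sketch below. The function name missing_commands is hypothetical and is not part of the repository.

```shell
#!/bin/sh
# Print the name of each given command that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

missing=$(missing_commands git vagrant)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "git and vagrant found; ready for 'vagrant up'"
fi
```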

Now run the examples (note: this is still a work in progress). This launches a series of Pig jobs that execute various BinaryPig scripts from the examples directory.

# still from VM
$ cd /vagrant/
$ ./run_examples.sh

License

Endgame, Inc.

Licensed under the Apache 2.0 license

Copyright 2013 Endgame, Inc.

Contributors