Big Data Environment Setup (vagrant-virtualbox)

This project aims for building scripts for setting up the environments of the big data analytics with the technologies including hadoop, yarn, hive, spark and so on.

Vagrant for Hadoop, Spark and Hive (V1)

=========================================

Introduction

Vagrant project to spin up a single virtual machine running:

Hadoop 2.7.2
Hive 1.2.1
Spark 2.1.0

The virtual machine will be running the following services:

HDFS NameNode + NameNode
YARN ResourceManager + JobHistoryServer + ProxyServer
Hive metastore and server2
Spark history server

Getting Started

Download and install VirtualBox
Download and install Vagrant.
Run vagrant box add centos65 https://github.com/2creatives/vagrant-centos/releases/download/v6.5.1/centos65-x86_64-20131205.box
Go to releases and download and extract the latest source of this project.
In your terminal change your directory into the project directory (i.e. cd vagrant-hadoop-spark-hive-<version>).
Run vagrant up to create the VM.
Execute vagrant ssh to login to the VM.

Web user interfaces

Here are some useful links to navigate to various UI's:

YARN resource manager: (http://10.211.55.101:8088)
Job history: (http://10.211.55.101:19888/jobhistory/)
HDFS: (http://10.211.55.101:50070/dfshealth.html)
Spark history server: (http://10.211.55.101:18080)
Spark context UI (if a Spark context is running): (http://10.211.55.101:4040)

Validating your virtual machine setup

To test out the virtual machine setup, and for examples of how to run MapReduce, Hive and Spark, head on over to VALIDATING.md.

Submit wordcount job on hadoop

To test wordcount plese run scripts/wordcount.sh

Starting services in the event of a system restart

Currently if you restart your VM then the Hadoop/Spark/Hive services won't be up (this is something I'll address soon). In the interim you can run the following commands to bring them up:

$ vagrant ssh
$ sudo -s
$ /vagrant/scripts/start-hadoop.sh
$ nohup hive --service metastore < /dev/null > /usr/local/hive/logs/hive_metastore_`date +"%Y%m%d%H%M%S"`.log 2>&1 </dev/null &
$ nohup hive --service hiveserver2 < /dev/null > /usr/local/hive/logs/hive_server2_`date +"%Y%m%d%H%M%S"`.log 2>&1 </dev/null &
$ /usr/local/spark/sbin/start-history-server.sh

More advanced setup

If you'd like to learn more about working and optimizing Vagrant then take a look at ADVANCED.md.

For developers

The file DEVELOP.md contains some tips for developers.

Credits

This project is based on the great work carried out at (https://github.com/alexholmes/vagrant-hadoop-spark-hive).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Big Data Environment Setup (vagrant-virtualbox)

Vagrant for Hadoop, Spark and Hive (V1)

Introduction

Getting Started

Web user interfaces

Validating your virtual machine setup

Submit wordcount job on hadoop

Starting services in the event of a system restart

More advanced setup

For developers

Credits

Files

README.md

Latest commit

History

README.md

File metadata and controls

Big Data Environment Setup (vagrant-virtualbox)

Vagrant for Hadoop, Spark and Hive (V1)

Introduction

Getting Started

Web user interfaces

Validating your virtual machine setup

Submit wordcount job on hadoop

Starting services in the event of a system restart

More advanced setup

For developers

Credits