This project provides scripts for setting up a big data analytics environment with technologies including Hadoop, YARN, Hive, and Spark.
=========================================
Vagrant project to spin up a single virtual machine running:
- Hadoop 2.7.2
- Hive 1.2.1
- Spark 2.1.0
The virtual machine will be running the following services:
- HDFS NameNode + DataNode
- YARN ResourceManager + JobHistoryServer + ProxyServer
- Hive metastore and server2
- Spark history server
To get started:
- Download and install VirtualBox.
- Download and install Vagrant.
- Run
vagrant box add centos65 https://github.com/2creatives/vagrant-centos/releases/download/v6.5.1/centos65-x86_64-20131205.box
- Go to releases, then download and extract the latest source of this project.
- In your terminal, change into the project directory, i.e.
cd vagrant-hadoop-spark-hive-<version>
- Run
vagrant up
to create the VM.
- Execute
vagrant ssh
to log in to the VM.
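The setup steps above can be collected into a small script. The following is only a sketch: the helper names `have_cmd`, `run`, and `setup_vm` are hypothetical and not part of this project, and it assumes that VirtualBox exposes `VBoxManage` on your PATH (this may not hold on every platform). It defaults to a dry run that prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch of the setup steps as one script. Defaults to a dry run that only
# prints the commands; set DRY_RUN=0 to actually execute them.

BOX_NAME="centos65"
BOX_URL="https://github.com/2creatives/vagrant-centos/releases/download/v6.5.1/centos65-x86_64-20131205.box"

# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Prints the command; also runs it when DRY_RUN is set to 0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: checks prerequisites, then adds the box and boots the VM.
# Call it from the extracted project directory (where the Vagrantfile lives).
setup_vm() {
  for cmd in VBoxManage vagrant; do
    if ! have_cmd "$cmd"; then
      echo "missing prerequisite: $cmd" >&2
      return 1
    fi
  done
  run vagrant box add "$BOX_NAME" "$BOX_URL" || return 1
  run vagrant up
}
```

Calling `setup_vm` prints the two Vagrant commands; `DRY_RUN=0 setup_vm` would execute them. Logging in with `vagrant ssh` is left as a manual step because it opens an interactive session.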
Here are some useful links to the various web UIs:
- YARN resource manager: (http://10.211.55.101:8088)
- Job history: (http://10.211.55.101:19888/jobhistory/)
- HDFS: (http://10.211.55.101:50070/dfshealth.html)
- Spark history server: (http://10.211.55.101:18080)
- Spark context UI (if a Spark context is running): (http://10.211.55.101:4040)
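To check from the host which of these UIs respond, something like the following sketch could be used. The helper names `ui_endpoints` and `check_uis` are hypothetical, and the sketch assumes `curl` is installed on the host. The Spark context UI on port 4040 is left out because it only exists while a Spark context is running.

```shell
#!/usr/bin/env bash
# Sketch: probe each web UI listed above and report its HTTP status.

VM_IP="10.211.55.101"

# Prints "name url" pairs, one per line, for the always-on UIs.
ui_endpoints() {
  cat <<EOF
yarn http://${VM_IP}:8088
jobhistory http://${VM_IP}:19888/jobhistory/
hdfs http://${VM_IP}:50070/dfshealth.html
spark-history http://${VM_IP}:18080
EOF
}

# Hypothetical helper: prints "name url status" for each UI.
# curl reports 000 when it cannot connect at all; a 2xx or 3xx
# status suggests the service is answering.
check_uis() {
  ui_endpoints | while read -r name url; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    echo "$name $url $code"
  done
}
```

Running `check_uis` after `vagrant up` gives a quick view of which services are reachable; it does not verify that the services are healthy beyond answering HTTP.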
To test out the virtual machine setup, and for examples of how to run MapReduce, Hive and Spark, head on over to VALIDATING.md.
To test word count, please run scripts/wordcount.sh.
Currently, if you restart the VM, the Hadoop, Spark, and Hive services will not start automatically (this is something I'll address soon). In the interim, you can run the following commands to bring them up:
$ vagrant ssh
$ sudo -s
$ /vagrant/scripts/start-hadoop.sh
$ nohup hive --service metastore > /usr/local/hive/logs/hive_metastore_$(date +"%Y%m%d%H%M%S").log 2>&1 < /dev/null &
$ nohup hive --service hiveserver2 > /usr/local/hive/logs/hive_server2_$(date +"%Y%m%d%H%M%S").log 2>&1 < /dev/null &
$ /usr/local/spark/sbin/start-history-server.sh
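After running the commands above, one way to confirm that the daemons came up is to inspect the output of the JDK's `jps` tool. The sketch below is an assumption-laden example: `missing_daemons` is a hypothetical helper, and the process names in `EXPECTED_DAEMONS` are what these services typically show as in `jps`, so verify them on your own VM. The Hive services are not included because they are launched through `hive --service` and may not appear under a distinctive name.

```shell
#!/usr/bin/env bash
# Sketch: report which expected daemons are absent from `jps` output.

# Assumed jps process names; verify against `jps` on your own VM.
EXPECTED_DAEMONS="NameNode DataNode ResourceManager JobHistoryServer HistoryServer"

# Reads `jps` output ("<pid> <name>" per line) on stdin and prints
# each expected daemon whose name does not appear.
missing_daemons() {
  seen=$(awk '{print $2}')
  for d in $EXPECTED_DAEMONS; do
    # -x matches whole lines, so HistoryServer is not satisfied
    # by JobHistoryServer.
    if ! printf '%s\n' "$seen" | grep -qx "$d"; then
      echo "$d"
    fi
  done
}
```

Inside the VM, `jps | missing_daemons` prints nothing when every expected daemon is present. Run it as the user that started the services (root in the commands above), since `jps` may only list that user's JVMs.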
If you'd like to learn more about working with and optimizing Vagrant, take a look at ADVANCED.md.
The file DEVELOP.md contains some tips for developers.
This project is based on the great work carried out at (https://github.com/alexholmes/vagrant-hadoop-spark-hive).