- Download VirtualBox from: https://www.virtualbox.org/wiki/Downloads
- Download Ubuntu 16.04.3 (desktop version, amd64) from: https://www.ubuntu.com/download/desktop OR download directly from: http://mirror.pnl.gov/releases/xenial/ubuntu-16.04.3-desktop-amd64.iso
- After installing Ubuntu, log in to the VM and follow the instructions given in https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html . Below I give step-by-step details for each installation step.
- First we will update the system's local package repository and then install Java (the default JDK). Run the commands below in a terminal.
sudo apt-get update
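The default JDK mentioned above can then be installed as follows (the default-jdk package is an assumption; on Ubuntu 16.04 it maps to OpenJDK 8):
sudo apt-get install default-jdk -y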
- Now we will install the ssh and rsync packages by running the following commands.
sudo apt-get install ssh -y
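rsync can be installed the same way:
sudo apt-get install rsync -y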
- Now download Hadoop 2.7.4 from http://www.apache.org/dyn/closer.cgi/hadoop/common/
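As a sketch, the release tarball can also be fetched from the command line; the Apache archive URL below is an assumption, and any mirror from the link above works as well.
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz -P ~/Downloads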
- Change directory to Downloads or wherever you have downloaded the Hadoop tar file. In my case it is in Downloads, and all further instructions assume the Hadoop tar file is in ~/Downloads.
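A minimal sketch of extracting the archive and moving into the Hadoop directory (assuming the file is named hadoop-2.7.4.tar.gz); the remaining bin/ and sbin/ commands in this guide are run from inside this directory:
cd ~/Downloads
tar -xzf hadoop-2.7.4.tar.gz   # unpack the release
cd hadoop-2.7.4                # further commands are run from here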
- Update the JAVA_HOME variable in the etc/hadoop/hadoop-env.sh file (using gedit or any other editor) to the line shown below.
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
- Now you should be able to run Hadoop; check it by running the command below.
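Run from the Hadoop directory; as in the official single-node guide, invoking the launcher with no arguments prints its usage documentation:
bin/hadoop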
- Now we will update some configuration files for pseudo-distributed operation. First we will edit the etc/hadoop/core-site.xml file as below.
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
- Similarly, we will update the etc/hadoop/hdfs-site.xml file as below.
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
- Now we will set up passwordless ssh for Hadoop. First check whether passwordless ssh authentication is already set up; on a fresh Ubuntu installation it most likely is not. If it is not set up, follow the next step, otherwise skip it.
ssh localhost
- Run the commands below:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
- Now we will start the NameNode and DataNode, but before that we will format the HDFS file system.
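The commands for this, taken from the official single-node guide, are run from the Hadoop directory:
bin/hdfs namenode -format   # format the HDFS file system (only needed the first time)
sbin/start-dfs.sh           # start the NameNode and DataNode daemons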
- Now we can access the web interface for the NameNode at http://localhost:50070/
- Let's download one HTML page, http://hadoop.apache.org, and upload it into the HDFS file system.
wget http://hadoop.apache.org -O hadoop_home_page.html
Please note that the HDFS file system is not the same as the root file system.
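A sketch of creating a user directory in HDFS and uploading the downloaded page (the /user/$USER layout follows the single-node guide; adjust it if you prefer another path):
bin/hdfs dfs -mkdir -p /user/$USER        # create your home directory in HDFS
bin/hdfs dfs -put hadoop_home_page.html   # upload the page into /user/$USER
bin/hdfs dfs -ls                          # confirm the file is in HDFS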
- For this example we are using the hadoop-mapreduce-examples-2.7.4.jar file which comes along with Hadoop. In this example we count the total number of occurrences of the word 'https' in the given files. First we run the Hadoop job, then we copy the results from HDFS to the local file system. We can see that there are 2 occurrences of 'https' in the given file, which we can validate against the file downloaded with wget. A sketch of these commands follows below.
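The sketch below assumes the page was uploaded to HDFS as shown earlier; the 'grep' example program and the output directory name follow the single-node guide, and the final local grep check is an assumption about how the count was validated:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar grep hadoop_home_page.html output 'https'
bin/hdfs dfs -get output output                 # copy the results from HDFS to the local file system
cat output/*                                    # shows the match count for 'https'
grep -o 'https' hadoop_home_page.html | wc -l   # validate against the locally downloaded file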