This is a Hadoop 2 Docker image, mostly adapted from https://github.com/sequenceiq/hadoop-docker but built for Ubuntu (trusty).
- Apache Hadoop 2.7.2
This step is required only on Mac OS X, where Docker is not natively supported. To run Docker on Mac OS X we need Boot2Docker.
- Install Boot2Docker from here.
- After installing, run the following from a terminal to initialize boot2docker
boot2docker init
- Run the following to start boot2docker, then export DOCKER_HOST and DOCKER_CERT_PATH as shown at the end of the command's output
boot2docker start
- After exporting DOCKER_HOST and DOCKER_CERT_PATH we can run docker commands.
NOTE: Docker 1.3.0 versions require --tls to be passed to every docker command.
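Before running any docker command, it can help to confirm that both variables are actually exported in the current shell. The following is a minimal sketch; check_docker_env is a hypothetical helper name, not something docker or boot2docker provides.

```shell
# Hypothetical helper: report which Boot2Docker variables are still unset.
# Returns 0 when both are set and 1 otherwise.
check_docker_env() {
  missing=0
  [ -n "${DOCKER_HOST:-}" ] || { echo "DOCKER_HOST is not set" >&2; missing=1; }
  [ -n "${DOCKER_CERT_PATH:-}" ] || { echo "DOCKER_CERT_PATH is not set" >&2; missing=1; }
  return "$missing"
}

# Example use (not run here):
#   check_docker_env && docker --tls ps
```

If the check fails, re-export the values printed by boot2docker start and try again.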
You can either pull the pre-built image from Docker Hub or build the image locally (see the next section).
docker --tls pull prasanthj/docker-hadoop
If you do not want to pull the image from Docker Hub, you can build it locally using the following steps
- To build the Hadoop Docker image locally from the Dockerfile, first check out the source using
git clone https://github.com/prasanthj/docker-hadoop.git
- Change to the docker-hadoop directory and build the image
cd docker-hadoop
docker --tls build -t local-hadoop-2.7.2 .
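To use the Docker image you have just built or pulled, run the following (use prasanthj/docker-hadoop in place of local-hadoop-2.7.2 if you pulled the image from Docker Hub):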
docker --tls run -i -t local-hadoop-2.7.2 /etc/bootstrap.sh -bash
You can run one of the stock examples:
# run the mapreduce
$HADOOP_PREFIX/bin/hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
# check the output
$HADOOP_PREFIX/bin/hdfs dfs -cat output/*
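The stock grep example extracts every string matching the regular expression from the input files and counts how often each one occurs. A rough local analogue, useful for checking what the pattern 'dfs[a-z.]+' picks up, can be sketched with standard tools. The function name count_matches and the sample input are illustrative assumptions, and grep -o is a common extension rather than part of POSIX.

```shell
# Hypothetical helper: read text on stdin, extract every match of the
# extended regular expression in $1, and print each distinct match with
# its count, most frequent first.
count_matches() {
  grep -oE "$1" | sort | uniq -c | sort -rn
}

# Example use with made-up input (not run here):
#   printf '<name>dfs.replication</name>\n<name>dfs.permissions</name>\n' |
#     count_matches 'dfs[a-z.]+'
```

This only mimics the result on a single machine; the Hadoop job runs the same kind of extraction and counting as a MapReduce job over files in HDFS.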
If you are running docker using Boot2Docker then do the following steps
- Set up routing on the host machine (Mac OS X) using the following command
sudo route add -net 172.17.0.0/16 192.168.59.103
NOTE: 172.17.0.X is usually the IP address of the docker container. 192.168.59.103 is the IP address exported in DOCKER_HOST.
- Get the container's IP address
  - To get the container's IP address we need its CONTAINER_ID. The following command lists all running containers and their IDs
  docker --tls ps
  - Use the following command to get the container's IP address (where CONTAINER_ID is the ID of the container running the local-hadoop-2.7.2 image, or prasanthj/docker-hadoop if pulled from Docker Hub)
  docker --tls inspect -f="{{.NetworkSettings.IPAddress}}" CONTAINER_ID
- Launch a web browser and open the following address to view the Hadoop cluster web UI
http://<container-ip-address>:8088
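The routing and lookup steps above can be tied together with two small shell helpers. This is a sketch under stated assumptions: docker_host_ip and hadoop_ui_url are hypothetical names, and DOCKER_HOST is assumed to have the tcp://IP:PORT form (for example tcp://192.168.59.103:2376).

```shell
# Hypothetical helper: strip the scheme and port from a DOCKER_HOST value,
# leaving only the IP address to use as the route gateway.
docker_host_ip() {
  host=${1#*://}                # drop everything up to and including "://"
  printf '%s\n' "${host%%:*}"   # drop the first ":" and everything after it
}

# Hypothetical helper: build the cluster web UI address for a container IP.
hadoop_ui_url() {
  printf 'http://%s:8088\n' "$1"
}

# Example use on the Mac OS X host (not run here):
#   sudo route add -net 172.17.0.0/16 "$(docker_host_ip "$DOCKER_HOST")"
#   ip=$(docker --tls inspect -f="{{.NetworkSettings.IPAddress}}" CONTAINER_ID)
#   hadoop_ui_url "$ip"
```

Deriving the gateway from DOCKER_HOST avoids hard-coding 192.168.59.103, which may differ on other setups.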