Deploying a #Hadoop cluster for Test purposes using #AWS EC2, #Docker and #Cloudera Quickstart

Deploying a Hadoop cluster for Test purposes using AWS EC2, Docker and Cloudera

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

What you need to complete this installation

A. Cloud platform: 1 AWS account

B. Tools used: 1 EC2 instance on AWS (Ubuntu 18.04 LTS)

C. Containerization: 1 Docker image (Cloudera Quickstart)

D. Programming language: Bash (Bourne-Again Shell)


Some of you asked me to make a gist that helps beginners with Hadoop.

Is Hadoop going to die as many claim?

If yes, then let's run a Hadoop cluster before it's too late :)!

This gist will help you launch a Hadoop cluster easily.

We'll use AWS as our compute and storage platform.

We'll also use Docker in order to launch Cloudera QuickStart.

At the end of this gist, you'll have a Hadoop cluster up and running for basic purposes.

I recommend using a regular or enterprise version of Cloudera for dev and prod purposes.


Before you start

Create an account on AWS and log into AWS Management Console.
Check the PREREQUISITES section of this gist.


Complete this section before following the README.md section of this gist.

We'll first set up a security group, create a user and a group, and assign an IAM role before proceeding to the actual installation of Hadoop.

1. PREREQUISITES

Steps to be covered: 3

Setting up a Security Group

Creating a User and a Group

Assigning an IAM role

Go to Services > EC2, in NETWORK AND SECURITY, click on Security Groups > Create Security Group

Security group name: Hadoop

Description: Hadoop-Admins-SG

VPC: select default VPC

Security Group Rules (Inbound and Outbound): allow SSH, HTTP, HTTPS from anywhere.

Click on Create.

🔴 See configuration

isaac-arnault-AWS.png
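
If you prefer the command line, the console steps above can be sketched with the AWS CLI. This is only a sketch, not part of the original walkthrough: it assumes the AWS CLI is installed and configured and that you work in the default VPC. The helper name print_sg_commands is hypothetical, and it only prints the commands so you can review them first. A new security group already allows all outbound traffic, so only inbound rules are listed.

```shell
# Prints (does not run) AWS CLI commands that mirror the console steps above.
# Assumptions: AWS CLI configured, default VPC. print_sg_commands is a
# hypothetical helper name used only for this illustration.
print_sg_commands() {
  local name="$1" description="$2" port
  printf 'aws ec2 create-security-group --group-name %s --description "%s"\n' \
    "$name" "$description"
  # Inbound rules: SSH (22), HTTP (80) and HTTPS (443) from anywhere
  for port in 22 80 443; do
    echo "aws ec2 authorize-security-group-ingress --group-name $name --protocol tcp --port $port --cidr 0.0.0.0/0"
  done
}

print_sg_commands Hadoop Hadoop-Admins-SG
```

To actually create the group, pipe the output to bash: `print_sg_commands Hadoop Hadoop-Admins-SG | bash`.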

Go to Services and, in the Security, Identity and Compliance section, click on IAM.

Click on Users > Add user and configure as follows:

🔴 See configuration

isaac-arnault-aws-19.png

Click on Next: Permissions > Add user to group > Create group > Group Name: hadoop_admins

Search for EC2: select AmazonEC2FullAccess. Search for IAM: select IAMFullAccess.

🔴 See configuration

isaac-arnault-AWS-20.png

🔴 See configuration

isaac-arnault-aws-21.png
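
The user and group steps can also be sketched with the AWS CLI. Treat this as an illustration under assumptions: the AWS CLI is configured with enough rights to manage IAM, the user name my-hadoop-user is a placeholder (the real one is only visible in the screenshot), and print_iam_group_commands is a hypothetical helper that prints the commands instead of running them. The managed policy ARNs end in AmazonEC2FullAccess and IAMFullAccess.

```shell
# Prints (does not run) AWS CLI commands for the user and group steps above.
# print_iam_group_commands and the user name passed to it are hypothetical.
print_iam_group_commands() {
  local group="$1" user="$2" policy
  echo "aws iam create-user --user-name $user"
  echo "aws iam create-group --group-name $group"
  # Attach the two AWS managed policies selected in the console
  for policy in AmazonEC2FullAccess IAMFullAccess; do
    echo "aws iam attach-group-policy --group-name $group --policy-arn arn:aws:iam::aws:policy/$policy"
  done
  echo "aws iam add-user-to-group --group-name $group --user-name $user"
}

print_iam_group_commands hadoop_admins my-hadoop-user
```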

In IAM go to Roles > Create role > click on EC2 > Next: Permissions > select AdministratorAccess

🔴 See configuration

isaac-arnault-aws-22.png

Add a tag with Key: name and Value: hadoop-cluster > Next: Review > Role name: AdminAccess > Create role. Clicking on the role in IAM shows a summary of the role you've created.

🔴 See configuration

isaac-arnault-AWS-23.png
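
Clicking on EC2 when creating the role tells IAM that EC2 instances are allowed to assume it. As a rough sketch of what that choice amounts to, here is the trust policy document such a role carries. The function name ec2_trust_policy is hypothetical and only prints the JSON.

```shell
# ec2_trust_policy is a hypothetical helper: it prints a trust policy that
# lets the EC2 service assume the role.
ec2_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

ec2_trust_policy
```

With the AWS CLI configured, you could save the output with `ec2_trust_policy > trust.json`, then run `aws iam create-role --role-name AdminAccess --assume-role-policy-document file://trust.json` followed by `aws iam attach-role-policy --role-name AdminAccess --policy-arn arn:aws:iam::aws:policy/AdministratorAccess`. The console also creates an instance profile for the role behind the scenes; from the CLI that takes `aws iam create-instance-profile` and `aws iam add-role-to-instance-profile`.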

At this stage you should have a user, a group and a role attached to your AWS account. You can now proceed to step 2.


Please note: having all check marks green on IAM is great, but AWS does not require it.

🔴 See configuration

isaac-arnault-AWS-18.png

2. INSTALLATION

Steps to be covered: 3

Setting up our EC2 instance

Pulling a Cloudera Quickstart Docker Image

Starting the services

Go to Services > EC2, click on Launch Instance.

Select Ubuntu server 18.04 LTS as AMI.

🔴 See configuration

isaac-arnault-AWS-hadoop.png

Choose a t2.xlarge instance type. Choosing a smaller instance type may lead to slow performance.

🔴 See configuration

isaac-arnault-hadoop-2.png

Click on Configure Instance Details and tune as follows:

Number of instances: 1 > IAM role: AdminAccess > Next: Add Storage, set the storage size to 30 GiB.

🔴 See configuration

isaac-arnault-aws-24.png

Next: Add tags > Key: name, Value: hadoop-cluster > Next: Configure Security Group > select an existing security group:

choose the one you created in the PREREQUISITES section (Hadoop). You can also select your default security group.

Review and Launch > Launch.

AWS will prompt you for a Key Pair. Create a new key pair and download the Key Pair file.

Save it in a directory called hadoop:

mkdir hadoop

Go to Services > EC2, wait for your instance to be running and for the health checks to pass.

When your instance is running, select your instance name, and click "Connect".

Copy the ssh command shown in the Connect dialog. It looks like this:

ssh -i "MyKeyPairFile.pem" ubuntu@ec2-*-*-*-*.compute-1.amazonaws.com

Open your Terminal and go to the directory where you've stored the Key Pair file.

Perform as follows:

chmod 400 MyKeyPairFile.pem
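
The chmod step matters because ssh refuses a private key file that group or other users can access. Here is a small sketch to check a key file before connecting. The helper name key_is_private is hypothetical, and the stat fallback is an assumption to cover both Linux and macOS.

```shell
# key_is_private is a hypothetical helper name used for this illustration.
# It succeeds only when group and others have no permissions on the file
# (for example mode 400 or 600).
key_is_private() {
  local mode
  # GNU stat takes -c '%a'; BSD/macOS stat takes -f '%Lp'
  mode="$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null)" || return 2
  case "$mode" in
    *00) return 0 ;;
    *)   return 1 ;;
  esac
}
```

For example: `key_is_private MyKeyPairFile.pem || chmod 400 MyKeyPairFile.pem`.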

Now run the ssh command given for your EC2 instance:

ssh -i "MyKeyPairFile.pem" ubuntu@ec2-3-90-136-245.compute-1.amazonaws.com
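
A side note on the host name: for names of this shape, the first label encodes the instance's public IPv4 address, with dashes in place of dots. The sketch below recovers the address, which can help if you need the IP itself. The function name ec2_dns_to_ip is hypothetical.

```shell
# ec2_dns_to_ip is a hypothetical helper name used for this illustration.
ec2_dns_to_ip() {
  local label="${1%%.*}"                   # first DNS label, e.g. ec2-3-90-136-245
  label="${label#ec2-}"                    # drop the ec2- prefix
  printf '%s\n' "$label" | sed 's/-/./g'   # dashes become dots
}

ec2_dns_to_ip ec2-3-90-136-245.compute-1.amazonaws.com   # prints 3.90.136.245
```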

You are now logged into your EC2 instance's terminal and ready to install Docker and Cloudera Quickstart.

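# Remove any older Docker packages (harmless if none are installed)
sudo apt-get remove docker docker-engine docker.io
# Add Docker's GPG key and check its fingerprint
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
# Register Docker's apt repository and list the docker-ce versions it offers
sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
apt-cache madison docker-ce
# Install Docker from Ubuntu's docker.io package, then start it now and at boot
sudo apt install docker.io
sudo systemctl start docker
sudo systemctl enable docker
docker --version
# Switch to root so the docker commands below can reach the Docker daemon
sudo su
docker images
docker ps
# Pull the Cloudera QuickStart image and start a container from it
docker pull cloudera/quickstart:latest
docker run -m 4G --memory-reservation 2G --memory-swap 8G --hostname=quickstart.cloudera --privileged=true -t -i -v $(pwd):/CDH --publish-all=true -p8888 -p8088 cloudera/quickstart /usr/bin/docker-quickstart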
🔵 See output

isaac-arnault-AWS-24.png
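
The docker run command is long, so here is the same command built up flag by flag. This is a sketch for reading purposes: quickstart_cmd is a hypothetical wrapper and DRY_RUN is a variable I made up for it. By default it only prints the command.

```shell
# quickstart_cmd is a hypothetical wrapper around the docker run command above.
# It prints the command by default; set DRY_RUN=0 to execute it instead.
quickstart_cmd() {
  set -- docker run
  set -- "$@" -m 4G                           # hard memory limit for the container
  set -- "$@" --memory-reservation 2G         # soft limit Docker tries to keep it under
  set -- "$@" --memory-swap 8G                # memory plus swap combined, so 4G of swap
  set -- "$@" --hostname=quickstart.cloudera  # container hostname, as in the original command
  set -- "$@" --privileged=true               # extended privileges inside the container
  set -- "$@" -t -i                           # interactive terminal
  set -- "$@" -v "$PWD":/CDH                  # mount the current directory at /CDH
  set -- "$@" --publish-all=true              # publish every exposed port on a random host port
  set -- "$@" -p8888 -p8088                   # publish 8888 and 8088 on host ports Docker picks
  set -- "$@" cloudera/quickstart /usr/bin/docker-quickstart
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

quickstart_cmd
```

Because no host port is given for 8888 and 8088, Docker chooses them, which is why the URLs in the next steps use ports such as 32768 and 32769.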

Once all services have started in your EC2 Terminal, open your web browser and go to:

my-EC2-instance-DNS:32768

You should land on the login form. Use cloudera / cloudera as login and password.
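
Docker picks the host port, so it may not be 32768 on your instance. From a second ssh session on the EC2 host, `docker port` shows the actual mapping. The sketch below turns that output into a URL. The helper name mapped_url is hypothetical and the sample input stands in for real docker port output. Your security group also has to allow inbound TCP traffic on that port, otherwise the browser cannot reach it.

```shell
# mapped_url is a hypothetical helper: it reads the output of
# `docker port <container> <port>` on stdin and prints a browser URL.
mapped_url() {
  local dns="$1" port
  # The first line looks like 0.0.0.0:32768; keep what follows the last colon
  port="$(head -n 1 | sed 's/.*://')"
  [ -n "$port" ] || return 1
  echo "http://$dns:$port"
}

# Sample input, in the format docker port prints
echo '0.0.0.0:32768' | mapped_url my-EC2-instance-DNS
```

On the instance itself this could look like `docker port "$(docker ps -q | head -n 1)" 8888 | mapped_url my-EC2-instance-DNS`, assuming the QuickStart container is the only one running.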

Here you go! You can now start using Hadoop for testing purposes.

🔵 See output

isaac-arnault-AWS-cloudera.png

Use my-EC2-instance-DNS:32769 for a cluster overview.

🔵 See output

isaac-arnault-hadoop-cloudera.png

You can install other applications directly from the panel and have your cluster ready for action!


isaac-arnault-cloudera-CDH.png


Author

  • Isaac Arnault - Helping devs install Hadoop in a more effective way, cheaply, effortlessly and timelessly.