Improve caching.
Image now has 99% efficiency. ❤
kasipavankumar committed Aug 9, 2021
2 parents eb726b7 + 739610b commit 256cccc
Showing 2 changed files with 23 additions and 18 deletions.
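Dive's efficiency score roughly measures how many bytes in an image are not wasted. Waste here means files that one layer writes and a later layer deletes or overwrites. In the old Dockerfile, the Hadoop tarball is downloaded, extracted, and deleted in three separate `RUN` steps, so the `rm` only hides a file that is still stored in the download layer. The sketch below is illustrative, not the commit's exact file. It installs `wget`, streams the archive into `tar`, and purges `wget` inside a single `RUN`, so none of the temporary files reach any layer. The `SHELL` pipefail line and the explicit `ca-certificates` package are assumptions added for robustness.

```dockerfile
# Sketch only: fetch, unpack and clean up in ONE layer, so no temporary
# file (tarball, wget binary, apt lists) is ever committed to the image.
FROM ubuntu:20.04

WORKDIR /

# Assumption: make a failing wget fail the whole `wget | tar` pipeline.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Assumption: ca-certificates is listed explicitly because
# --no-install-recommends might otherwise leave wget without a CA bundle.
# The archive is streamed straight into tar, so it never lands on disk.
RUN apt-get update \
 && apt-get install --yes --no-install-recommends wget ca-certificates \
 && wget -qO- https://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz | tar xz \
 && apt-get purge --yes wget \
 && apt-get autoremove --yes \
 && rm -rf /var/lib/apt/lists/*

ENV HADOOP_HOME=/hadoop-3.3.1
```

In the commit itself, `wget` is still installed by the earlier `apt-get` layer, so `apt-get remove --yes wget` in the last `RUN` hides the binary without shrinking the image. Because `wget` is small, the score can still come out near 99%.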
31 changes: 14 additions & 17 deletions Dockerfile
@@ -1,35 +1,29 @@
# Ubuntu as the base image
FROM ubuntu:20.04

# Set working directory to /home
LABEL author="D. Kasi Pavan Kumar <devdkpk@gmail.com>"
LABEL version="1.0.2"

# Set working directory to /
WORKDIR /

# Install required dependencies
RUN apt-get update && apt-get install -y \
RUN apt-get update && apt-get install --yes --no-install-recommends \
openjdk-8-jdk \
openssh-server \
openssh-client \
nano \
wget \
&& rm -rf /var/lib/apt/lists/*

# Generate SSH key pair for passwordless login
RUN ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa \
&& cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys \
&& chmod 0600 ~/.ssh/authorized_keys

# Download Hadoop 3.3.1
RUN wget https://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

# Unzip the .tar.gz
RUN tar xzf hadoop-3.3.1.tar.gz

# Remove the .tar.gz file
RUN rm ./hadoop-3.3.1.tar.gz

# Hadoop home
# Set HADOOP_HOME variable
ENV HADOOP_HOME=/hadoop-3.3.1

# Other Hadoop environment variables
# Other Hadoop variables
ENV HADOOP_INSTALL=${HADOOP_HOME} \
HADOOP_MAPRED_HOME=${HADOOP_HOME} \
HADOOP_COMMON_HOME=${HADOOP_HOME} \
@@ -38,11 +32,9 @@ ENV HADOOP_INSTALL=${HADOOP_HOME} \
HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native \
PATH=$PATH:${HADOOP_HOME}/sbin:${HADOOP_HOME}/bin \
HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native" \

# Java home
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ \

# For start-all.sh
# For starting Hadoop services using `start-all.sh`
HDFS_NAMENODE_USER="root" \
HDFS_DATANODE_USER="root" \
HDFS_SECONDARYNAMENODE_USER="root" \
@@ -59,4 +51,9 @@ COPY /etc/* ${HADOOP_HOME}/etc/hadoop/
# Copy bootstrap.sh
COPY ./bootstrap.sh /

# Download Hadoop 3.3.1
RUN wget -qO- https://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz | tar xvz \
&& apt-get remove --yes wget \
&& apt-get autoremove --yes

CMD [ "bash", "./bootstrap.sh" ]
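Docker's build cache reuses layers only up to the first instruction whose inputs changed, and everything after that point is rebuilt. The new download step sits after `COPY ./bootstrap.sh /` and after the `COPY` of the config files into `${HADOOP_HOME}/etc/hadoop/`, so an edit to either would re-download Hadoop. Extracting the tarball after that `COPY` may also replace the copied files with Hadoop's bundled defaults if they share names such as `core-site.xml`, because GNU `tar` replaces existing files by default. The sketch below is a hypothetical reordering with the SSH setup and most `ENV` lines omitted. The most stable steps come first and the project files come last.

```dockerfile
# Sketch only: order steps from least to most frequently changed, so an
# edit to bootstrap.sh or etc/ reuses the cached Hadoop layer.
FROM ubuntu:20.04

WORKDIR /

# Make a failing wget fail the whole `wget | tar` pipeline.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# 1. System packages: change rarely.
RUN apt-get update && apt-get install --yes --no-install-recommends \
    openjdk-8-jdk \
    openssh-server \
    openssh-client \
    nano \
    wget \
 && rm -rf /var/lib/apt/lists/*

# (SSH key setup from the real Dockerfile omitted.)

# 2. Hadoop release: changes only on a version bump.
RUN wget -qO- https://mirrors.estointernet.in/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz | tar xz

ENV HADOOP_HOME=/hadoop-3.3.1
# (Remaining ENV lines from the real Dockerfile omitted.)

# 3. Project files: change most often. Copied after extraction so tar
#    cannot overwrite them with Hadoop's bundled defaults.
COPY ./etc/* ${HADOOP_HOME}/etc/hadoop/
COPY ./bootstrap.sh /

CMD [ "bash", "./bootstrap.sh" ]
```

With this order, rebuilding after editing only `bootstrap.sh` should report the `apt-get` and Hadoop steps as cached.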
10 changes: 9 additions & 1 deletion README.md
@@ -2,7 +2,7 @@

A Docker image to play around with [Apache Hadoop](https://hadoop.apache.org) in [Pseudo Distributed Mode](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html) (a single-node cluster).

### Below are the steps to play around with this image using [Play with Docker](https://labs.play-with-docker.com).
## Below are the steps to play around with this image using [Play with Docker](https://labs.play-with-docker.com).

1. First of all, create an account on [Docker Hub](https://hub.docker.com/signup).
2. Log in to [Play with Docker](https://labs.play-with-docker.com) using the Docker Hub account you just created.
@@ -28,6 +28,14 @@ _At this stage, the image will be booting up by executing all the required steps

<hr />

## A note on the size of the image

The final Docker image weighs around **1.8GB**, most of which is taken up by Hadoop and Java. When analyzed with [Dive](https://github.com/wagoodman/dive), the image efficiency came out to around 99% (_sweet_).

![Docker image analysis](https://lh3.googleusercontent.com/keep-bbsk/AGk0z-NersED_8G-nB4mt1LH18Mqg6Q6Tb_1Wg1YcE5F6LglDrvJsYgaOpzasylVpDgLiGT9ph0GF94rgvvi5Nb0M2ZBxNYCmX31_RPXiUI=s1598)

<hr />

<div align="center">

[![Deploy Docker image](https://github.com/kasipavankumar/hadoop-docker/actions/workflows/deploy.yml/badge.svg)](https://github.com/kasipavankumar/hadoop-docker/actions/workflows/deploy.yml)
