[Update] How to Install and Set Up a 3-Node Hadoop Cluster #2514

hzoppetti · 2019-05-31T20:26:52Z

Updated guide for Hadoop 3.1.2
CT-472

sagesyr · 2019-06-26T22:13:57Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

 ## What is Hadoop?

-Hadoop is an open-source Apache project that allows creation of parallel processing applications on large data sets, distributed across networked nodes. It's composed of the **Hadoop Distributed File System (HDFS™)** that handles scalability and redundancy of data across nodes, and **Hadoop YARN**: a framework for job scheduling that executes data processing tasks on all nodes.
+Hadoop is an open-source Apache project that allows creation of parallel processing applications on large data sets, distributed across networked nodes. It is composed of the **Hadoop Distributed File System (HDFS™)** that handles scalability and redundancy of data across nodes, and **Hadoop YARN**, a framework for job scheduling that executes data processing tasks on all nodes.


The colon is odd here; it functions more like a comma, so I replaced it with one. I also expanded the contraction ("It's" to "It is").

sagesyr · 2019-06-26T22:15:25Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

    Run the steps in this guide from the **node-master** unless otherwise specified.

-2.  Follow the [Securing Your Server](/docs/security/securing-your-server/) guide to harden the three servers. Create a normal user for the install, and a user called `hadoop` for any Hadoop daemons. Do **not** create SSH keys for `hadoop` users. SSH keys will be addressed in a later section.
+1.  [Add a Private IP Address](/docs/platform/manager/remote-access/#adding-private-ip-addresses) to each Linode so that your Cluster can communicate with an additional layer of security.


Hadoop works fine over private IP addresses, and using them reduces the potential attack surface.

sagesyr · 2019-06-26T22:20:29Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md


-4.  The steps below use example IPs for each node. Adjust each example according to your configuration:
+1. Install the JDK using the appropriate guide for your distribution, [Debian](/docs/development/java/install-java-on-debian/), [CentOS](/docs/development/java/install-java-on-centos/) or [Ubuntu](/docs/development/java/install-java-on-ubuntu-16-04/), or grab the latest JDK from Oracle.
+


Moved the step for securing the server so that it comes after setting up the private IP. That way, users can keep the private IP in mind when configuring their security.

sagesyr · 2019-06-26T22:29:26Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

-4.  The steps below use example IPs for each node. Adjust each example according to your configuration:
+1. Install the JDK using the appropriate guide for your distribution, [Debian](/docs/development/java/install-java-on-debian/), [CentOS](/docs/development/java/install-java-on-centos/) or [Ubuntu](/docs/development/java/install-java-on-ubuntu-16-04/), or grab the latest JDK from Oracle.
+
+1.  The steps below use example IPs for each node. Adjust each example according to your configuration:


Since this is more of a disclaimer than a step you need to follow, I moved it to the bottom of the list

sagesyr · 2019-06-26T22:30:07Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

-*   The **NameNode**: manages the distributed file system and knows where stored data blocks inside the cluster are.
-*   The **ResourceManager**: manages the YARN jobs and takes care of scheduling and executing processes on worker nodes.
+*   The **NameNode** manages the distributed file system and knows where stored data blocks inside the cluster are.
+*   The **ResourceManager** manages the YARN jobs and takes care of scheduling and executing processes on worker nodes.


Each bullet is a full sentence that describes the component, not a term followed by its definition, so the colon is not needed.

sagesyr · 2019-06-26T22:32:05Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

 ### Distribute Authentication Key-pairs for the Hadoop User

-The master node will use an ssh-connection to connect to other nodes with key-pair authentication, to manage the cluster.
+The master node will use an ssh connection to connect to other nodes with key-pair authentication. This will allow the master node to actively manage the cluster.


The original was a run-on sentence, so I split it in two.

sagesyr · 2019-06-26T22:33:05Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

        ssh-keygen -b 4096

-1.  View the **node-master** public key so you can copy it to each of the worker nodes.
+     When generating this key, leave the password field blank so your hadoop user can communicate unprompted.


The first time I set this up, I entered a password for the key pair, which prevented me from following the guide any further.
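
For reference, here is a minimal sketch of generating the key non-interactively with an empty passphrase. The `-N ""` and `-f` flags are standard `ssh-keygen` options, but the temporary directory is only a stand-in (an assumption for this sketch) so that nothing overwrites a real key; on **node-master** the guide's default path under `/home/hadoop/.ssh` would apply instead.

```shell
# Stand-in directory (an assumption for this sketch); the guide's real
# location is /home/hadoop/.ssh on node-master.
key_dir="$(mktemp -d)"

# -b 4096 matches the guide; -N "" sets an empty passphrase so the hadoop
# user is never prompted; -q suppresses the progress output.
ssh-keygen -q -t rsa -b 4096 -N "" -f "$key_dir/id_rsa"

# Reading the private key back with an empty passphrase (-P "") succeeds
# only if the key is unencrypted.
ssh-keygen -y -P "" -f "$key_dir/id_rsa" > /dev/null && echo "key has no passphrase"
```

Pressing Enter at the passphrase prompts of the interactive `ssh-keygen -b 4096` command in the guide has the same effect.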

sagesyr · 2019-06-26T22:37:30Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

-1.  View the **node-master** public key so you can copy it to each of the worker nodes.
+     When generating this key, leave the password field blank so your hadoop user can communicate unprompted.
+
+1.  View the **node-master** public key and copy it to your clipboard to use with each of your worker nodes.


I felt this clarification was worthwhile for people who may not fully understand how `less` works.

sagesyr · 2019-06-26T22:38:16Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

        less /home/hadoop/.ssh/id_rsa.pub

-1.  In each node, make a new file `master.pub` in `/home/hadoop/.ssh`, paste in, and save this key.
+1.  In each Linode, make a new file `master.pub` in the `/home/hadoop/.ssh` directory. Paste your public key into this file and save your changes.


Rewrote the sentence with some additional clarifications and better flow
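
As a rough sketch of what happens to the pasted key on a worker: the file only grants access once its contents are appended to `authorized_keys`. The directory and the key string below are hypothetical placeholders so the sketch can run anywhere; on a real worker the directory would be `/home/hadoop/.ssh` and the key would be the one copied from **node-master**.

```shell
# Hypothetical stand-ins: a scratch directory instead of /home/hadoop/.ssh
# and a fake key instead of the one pasted from node-master.
ssh_dir="$(mktemp -d)"
printf '%s\n' "ssh-rsa AAAAEXAMPLEKEY hadoop@node-master" > "$ssh_dir/master.pub"

# Authorize the master's key for logins as this user.
cat "$ssh_dir/master.pub" >> "$ssh_dir/authorized_keys"

# sshd typically ignores key files that other users can write to.
chmod 700 "$ssh_dir"
chmod 600 "$ssh_dir/authorized_keys"
```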

sagesyr · 2019-06-26T22:38:53Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

        update-alternatives --display java

-    Take the value of the current link and remove the trailing `/bin/java`. For example on Debian, the link is `/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java`, so `JAVA_HOME` should be `/usr/lib/jvm/java-8-openjdk-amd64/jre`.
+    Take the value of the *current link* and remove the trailing `/bin/java`. For example on Debian, the link is `/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java`, so `JAVA_HOME` should be `/usr/lib/jvm/java-8-openjdk-amd64/jre`.


Italicized "current link" to differentiate it conceptually.
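
The path manipulation in this step can be sketched with POSIX suffix removal. The `java_link` value is the Debian example from the guide, typed in by hand; the variable names are only illustrative, and on another system the value would come from the `update-alternatives --display java` output.

```shell
# Example link from the guide (Debian with OpenJDK 8); replace it with the
# "current link" value reported on your own system.
java_link="/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java"

# ${var%suffix} removes the trailing /bin/java, leaving the JAVA_HOME path.
java_home="${java_link%/bin/java}"

# Print the line that would go into hadoop-env.sh.
echo "export JAVA_HOME=${java_home}"
```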

sagesyr · 2019-06-26T22:41:38Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

        export JAVA_HOME=${JAVA_HOME}

-    with your actual java installation path. For example on a Debian with open-jdk-8:
+    with your actual java installation path. On a Debian 9 Linode with open-jdk-8 this will be as follows:


Didn't want to use "example" again

sagesyr · 2019-06-26T22:42:00Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

 ### Set NameNode Location

-On each node update `~/hadoop/etc/hadoop/core-site.xml` you want to set the NameNode location to **node-master** on port `9000`:
+Update your `~/hadoop/etc/hadoop/core-site.xml` file to set the NameNode location to **node-master** on port `9000`:


We don't need to do this on each node since it is already performed in a later step.

sagesyr · 2019-06-26T22:43:07Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md



-    The last property disables virtual-memory checking and can prevent containers from being allocated properly on JDK8.
+    The last property disables virtual-memory checking which can prevent containers from being allocated properly with JDK8 if enabled.


Felt this was worth clarifying

sagesyr · 2019-06-26T22:44:05Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

        wget -O alice.txt https://www.gutenberg.org/files/11/11-0.txt
-        wget -O holmes.txt https://www.gutenberg.org/ebooks/1661.txt.utf-8
-        wget -O frankenstein.txt https://www.gutenberg.org/ebooks/84.txt.utf-8
+        wget -O holmes.txt https://www.gutenberg.org/files/1661/1661-0.txt


The `.txt.utf-8` links will not work, so I used the plain-text file links instead.

sagesyr · 2019-06-26T22:46:16Z

docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/index.md

    <property>
            <name>yarn.resourcemanager.hostname</name>
-            <value>node-master</value>
+            <value>203.0.113.0</value>


Since the hostname maps to a private IP, we need to use the public IP here in order for the YARN site to resolve.

andystevensname · 2019-07-12T19:33:01Z

This needs to be rebased once the quick-disclosure-note and table changes are merged into develop.

* Updated to Hadoop 3.1.2
* tech edit
* Copy Edit

hzoppetti and others added 2 commits May 31, 2019 16:25

Updated to Hadoop 3.1.2

15167ee

tech edit

7ef7b4d

sagesyr reviewed Jun 26, 2019


Copy Edit

fe2b9e5

Guaris approved these changes Jul 22, 2019


Guaris merged commit 4f21223 into linode:develop Jul 22, 2019

Guaris pushed a commit that referenced this pull request Aug 19, 2019

[Update] How to Install and Set Up a 3-Node Hadoop Cluster (#2514)

f52543e

* Updated to Hadoop 3.1.2
* tech edit
* Copy Edit


