Deploy Hadoop Cluster

Knowledge

HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN daemons are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is used, the MapReduce Job History Server also runs. In large installations, these daemons generally run on separate hosts.
WebAppProxy: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html
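As a rough sketch of how each daemon above is started, the snippet below maps daemon names to start commands. It assumes Hadoop 3.x "--daemon start" syntax, HADOOP_HOME at /hadoop_env/hadoop, and the hdfs, yarn and mapred service accounts that the Apache cluster-setup guide runs the daemons as. The DAEMONS table and start_command helper are hypothetical.

```python
# Hypothetical lookup table: daemon -> (user to run as, launcher script, subcommand).
# Assumes Hadoop 3.x "--daemon start" syntax and the hdfs/yarn/mapred
# service accounts used in the Apache cluster-setup guide.
HADOOP_HOME = "/hadoop_env/hadoop"

DAEMONS = {
    "NameNode": ("hdfs", "hdfs", "namenode"),
    "SecondaryNameNode": ("hdfs", "hdfs", "secondarynamenode"),
    "DataNode": ("hdfs", "hdfs", "datanode"),
    "ResourceManager": ("yarn", "yarn", "resourcemanager"),
    "NodeManager": ("yarn", "yarn", "nodemanager"),
    "WebAppProxy": ("yarn", "yarn", "proxyserver"),
    "JobHistoryServer": ("mapred", "mapred", "historyserver"),
}


def start_command(daemon: str) -> tuple[str, str]:
    """Return (user to run as, shell command that starts `daemon` in the background)."""
    user, script, subcommand = DAEMONS[daemon]
    return user, f"{HADOOP_HOME}/bin/{script} --daemon start {subcommand}"


if __name__ == "__main__":
    for name in DAEMONS:
        user, cmd = start_command(name)
        print(f"[{user}] {cmd}")
```

Stopping a daemon uses the same form with "--daemon stop".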

JAVA_HOME: /hadoop_env/jdk
HADOOP_HOME: /hadoop_env/hadoop
NameNode store: /hadoop_env/namenode/store
DataNode allow list: /hadoop_env/namenode/conf/datanode-allow.list (one hostname or IP address per line)
DataNode store: /hadoop_env/datanode/store
NodeManager include list (ResourceManager): /hadoop_env/resourcemanager/include_nodemanagers.list (one hostname or IP address per line)
NodeManager local dir: /hadoop_env/nodemanager/local_dir_1
NodeManager log dir: /hadoop_env/nodemanager/log_dir_1
MR history server intermediate dir: /hadoop_env/mr-history/tmp
MR history server done dir: /hadoop_env/mr-history/done
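To connect these directories to Hadoop's configuration, here is a minimal sketch that renders them into hdfs-site.xml, yarn-site.xml and mapred-site.xml. The property names come from the Apache cluster-setup guide, but which path feeds which property is an assumption based on the labels above, and the SITE_PROPERTIES table and render_site helper are hypothetical. As far as I know, the two JobHistory directories are resolved against the default filesystem (HDFS) rather than local disk, so that is worth double-checking before using local paths there.

```python
import xml.etree.ElementTree as ET

# Assumed mapping of the directories above to the Hadoop properties that use them.
SITE_PROPERTIES = {
    "hdfs-site.xml": {
        "dfs.namenode.name.dir": "/hadoop_env/namenode/store",
        "dfs.hosts": "/hadoop_env/namenode/conf/datanode-allow.list",
        "dfs.datanode.data.dir": "/hadoop_env/datanode/store",
    },
    "yarn-site.xml": {
        "yarn.resourcemanager.nodes.include-path":
            "/hadoop_env/resourcemanager/include_nodemanagers.list",
        "yarn.nodemanager.local-dirs": "/hadoop_env/nodemanager/local_dir_1",
        "yarn.nodemanager.log-dirs": "/hadoop_env/nodemanager/log_dir_1",
    },
    "mapred-site.xml": {
        # Probably interpreted as paths in the default filesystem (HDFS).
        "mapreduce.jobhistory.intermediate-done-dir": "/hadoop_env/mr-history/tmp",
        "mapreduce.jobhistory.done-dir": "/hadoop_env/mr-history/done",
    },
}


def render_site(props: dict[str, str]) -> str:
    """Render name/value pairs in Hadoop's <configuration><property> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    for filename, props in SITE_PROPERTIES.items():
        print(f"--- {filename} ---")
        print(render_site(props))
```

JAVA_HOME (/hadoop_env/jdk) is not an XML property; it is normally exported in etc/hadoop/hadoop-env.sh on every node.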

NameNode: 1
DataNode: 3
ResourceManager: 1
NodeManager: 3 (one on each DataNode host)
WebAppProxy: 1
MapReduce Job History Server: 1
Typically one machine in the cluster is designated as the NameNode and another machine as the ResourceManager, exclusively. These are the masters. Other services (such as Web App Proxy Server and MapReduce Job History server) are usually run either on dedicated hardware or on shared infrastructure, depending upon the load. The rest of the machines in the cluster act as both DataNode and NodeManager. These are the workers.
1 machine for NameNode
1 machine for ResourceManager
1 machine for Web App Proxy Server
1 machine for MapReduce Job History Server
Remaining machines for DataNode and NodeManager (each machine runs both roles)
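The layout above can be sketched as a small planning function: given an ordered list of hosts, the first four each get one dedicated master service and every remaining host runs both DataNode and NodeManager. The function names and the 10.0.0.x addresses in the usage block are hypothetical, and SecondaryNameNode is left out because this plan does not assign it a machine. The worker list uses the one-host-per-line format of Hadoop's etc/hadoop/workers file, which could also seed the DataNode and NodeManager include lists.

```python
# Hypothetical planner for the layout above: one dedicated host per master
# service, both worker roles on every remaining host.
MASTER_ROLES = ["NameNode", "ResourceManager", "WebAppProxy", "JobHistoryServer"]
WORKER_ROLES = ["DataNode", "NodeManager"]


def plan_cluster(hosts: list[str]) -> dict[str, list[str]]:
    """Map each host to the roles it runs, in the order the hosts were given."""
    if len(set(hosts)) != len(hosts):
        raise ValueError("host list contains duplicates")
    if len(hosts) <= len(MASTER_ROLES):
        raise ValueError("need at least one worker host besides the masters")
    plan = {host: [role] for host, role in zip(hosts, MASTER_ROLES)}
    for host in hosts[len(MASTER_ROLES):]:
        plan[host] = list(WORKER_ROLES)
    return plan


def workers_file(plan: dict[str, list[str]]) -> str:
    """One worker host per line, the format of etc/hadoop/workers."""
    return "".join(f"{host}\n" for host, roles in plan.items() if "DataNode" in roles)


if __name__ == "__main__":
    hosts = [f"10.0.0.{i}" for i in range(1, 8)]  # 4 masters + 3 workers
    plan = plan_cluster(hosts)
    for host, roles in plan.items():
        print(host, ", ".join(roles))
    print(workers_file(plan), end="")
```

With seven hosts this yields the counts listed earlier: one of each master service and three DataNode/NodeManager workers.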