
Deploy Hadoop Cluster

Jichao edited this page Jun 13, 2017 · 17 revisions

Knowledge

  • HDFS daemons are NameNode, SecondaryNameNode, and DataNode. YARN daemons are ResourceManager, NodeManager, and WebAppProxy. If MapReduce is used, the MapReduce Job History Server also runs. In large installations, these daemons generally run on separate hosts.
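The daemon list above can be sketched as a small lookup table. This is a minimal sketch, assuming Hadoop 3.x, where each daemon is started with `<launcher> --daemon start <name>` (Hadoop 2.x used hadoop-daemon.sh, yarn-daemon.sh and mr-jobhistory-daemon.sh instead). The table names, the `start_command` helper and the default `hadoop_home` value are illustrative, not part of Hadoop.

```python
"""Hadoop daemons grouped by subsystem, with the Hadoop 3.x command that starts each one."""
from __future__ import annotations

# Subsystem each daemon belongs to, per the list above.
DAEMONS = {
    "NameNode": "HDFS",
    "SecondaryNameNode": "HDFS",
    "DataNode": "HDFS",
    "ResourceManager": "YARN",
    "NodeManager": "YARN",
    "WebAppProxy": "YARN",
    "JobHistoryServer": "MapReduce",
}

# (launcher script, daemon name) for `<launcher> --daemon start <name>` in Hadoop 3.x.
START_ARGS = {
    "NameNode": ("hdfs", "namenode"),
    "SecondaryNameNode": ("hdfs", "secondarynamenode"),
    "DataNode": ("hdfs", "datanode"),
    "ResourceManager": ("yarn", "resourcemanager"),
    "NodeManager": ("yarn", "nodemanager"),
    "WebAppProxy": ("yarn", "proxyserver"),
    "JobHistoryServer": ("mapred", "historyserver"),
}


def start_command(daemon: str, hadoop_home: str = "/hadoop_env/hadoop") -> list[str]:
    """Return the argv that starts `daemon` on the local host (hypothetical helper)."""
    launcher, name = START_ARGS[daemon]
    return [f"{hadoop_home}/bin/{launcher}", "--daemon", "start", name]


if __name__ == "__main__":
    # Print one line per daemon: subsystem, daemon, and its start command.
    for daemon, subsystem in DAEMONS.items():
        print(f"{subsystem:<10} {daemon:<18} {' '.join(start_command(daemon))}")
```

Running it on a host only prints the commands. Which of them a given host should actually run depends on the topology below.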

Machine Template

  • JAVA_HOME: /hadoop_env/jdk
  • HADOOP_HOME: /hadoop_env/hadoop
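Each host should match this template before any daemon is started. The following is a minimal sketch that checks for the expected executables and renders matching `export` lines. It assumes the standard JDK and Hadoop layouts (`bin/java`, `bin/hadoop`), and the helper names are hypothetical. The exports could go in a shell profile. At minimum, `JAVA_HOME` also needs to be set in etc/hadoop/hadoop-env.sh so that it is defined on remote nodes.

```python
"""Check that a host matches the machine template and render the matching env exports."""
from __future__ import annotations

import os

# Paths from the machine template above.
TEMPLATE = {
    "JAVA_HOME": "/hadoop_env/jdk",
    "HADOOP_HOME": "/hadoop_env/hadoop",
}

# Executable expected under each home directory (assumes standard JDK/Hadoop layouts).
EXPECTED_BINARY = {
    "JAVA_HOME": "bin/java",
    "HADOOP_HOME": "bin/hadoop",
}


def missing_binaries(template: dict[str, str]) -> list[str]:
    """Return the expected executables that are absent or not executable."""
    missing = []
    for var, home in template.items():
        path = os.path.join(home, EXPECTED_BINARY[var])
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(path)
    return missing


def render_exports(template: dict[str, str]) -> str:
    """Render one `export VAR=value` line per template entry."""
    return "".join(f"export {var}={home}\n" for var, home in template.items())


if __name__ == "__main__":
    print(render_exports(TEMPLATE), end="")
    for path in missing_binaries(TEMPLATE):
        print(f"missing: {path}")
```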

Topology

  • NameNode: 1
  • DataNode: 3
  • ResourceManager: 1
  • NodeManager: 3 (one on each DataNode host)
  • WebAppProxy: 1
  • MapReduce Job History Server: 1
  • Typically one machine in the cluster is designated as the NameNode and another machine as the ResourceManager, exclusively. These are the masters. Other services (such as Web App Proxy Server and MapReduce Job History server) are usually run either on dedicated hardware or on shared infrastructure, depending upon the load. The rest of the machines in the cluster act as both DataNode and NodeManager. These are the workers.
  • 1 machine for the NameNode
  • 1 machine for the ResourceManager
  • 1 machine for the Web App Proxy Server
  • 1 machine for the MapReduce Job History Server
  • All remaining machines for DataNode and NodeManager
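The layout above can be sketched as a role-assignment function. This is a minimal sketch: the role labels, hostnames and helpers are hypothetical. The page does not say where the SecondaryNameNode runs, so the sketch leaves it out. It also renders the list of worker hostnames, one per line. That is the format Hadoop 3.x expects in etc/hadoop/workers (Hadoop 2.x used etc/hadoop/slaves).

```python
"""Assign Hadoop roles to hosts following the layout above."""
from __future__ import annotations

# One dedicated host per master/service role, in this order.
DEDICATED_ROLES = ["NameNode", "ResourceManager", "WebAppProxy", "JobHistoryServer"]
# Roles shared by every remaining (worker) host.
WORKER_ROLES = ["DataNode", "NodeManager"]


def assign_roles(hosts: list[str]) -> dict[str, list[str]]:
    """Map each host to the daemons it runs; the first hosts get the dedicated roles."""
    if len(hosts) <= len(DEDICATED_ROLES):
        raise ValueError(f"need more than {len(DEDICATED_ROLES)} hosts to leave room for workers")
    plan = {host: [role] for host, role in zip(hosts, DEDICATED_ROLES)}
    for host in hosts[len(DEDICATED_ROLES):]:
        plan[host] = list(WORKER_ROLES)
    return plan


def workers_file(plan: dict[str, list[str]]) -> str:
    """Render the workers list: one worker hostname per line."""
    return "".join(f"{host}\n" for host, roles in plan.items() if "DataNode" in roles)


if __name__ == "__main__":
    # Hypothetical hostnames: 4 dedicated hosts plus the 3 DataNode/NodeManager workers.
    hosts = ["master1", "master2", "proxy1", "history1", "worker1", "worker2", "worker3"]
    plan = assign_roles(hosts)
    for host, roles in plan.items():
        print(f"{host}: {', '.join(roles)}")
    print(workers_file(plan), end="")
```

With the three DataNodes from the topology, this layout needs seven hosts in total.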

Reference
