These components are installed:
- JDK 1.8
- Scala 2.12.14
- Hadoop 3.2.1
- Spark 3.1.2 (without bundled Hadoop)
- Maven
- AWS CLI (for EMR execution)
-
Example ~/.bash_aliases: export JAVA_HOME=/usr/local/cellar/openjdk@8/1.8.0+302 export HADOOP_HOME=/usr/local/cellar/hadoop/3.2.1 export SCALA_HOME=/usr/local/opt/scala@2.12/idea export SPARK_HOME=/usr/local/cellar/spark/spark-3.1.2-bin-without-hadoop export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin export SPARK_DIST_CLASSPATH=$(hadoop classpath) export YARN_HOME=$HADOOP_HOME export HADOOP_INSTALL=$HADOOP_HOME export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop export HADOOP_COMMON_HOME=$HADOOP_HOME export HADOOP_HDFS_HOME=$HADOOP_HOME export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
-
Explicitly set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh: export JAVA_HOME=/usr/local/cellar/openjdk@8/1.8.0+302
All of the build & execution commands are organized in the Makefile.
- Unzip project file.
- Open command prompt.
- Navigate to directory where project files unzipped.
- Edit the Makefile to customize the environment at the top. Sufficient for standalone: hadoop.root, jar.name, local.input Other defaults acceptable for running standalone.
- Standalone Hadoop: make switch-standalone -- set standalone Hadoop environment (execute once) make local
- Pseudo-Distributed Hadoop: (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation) make switch-pseudo -- set pseudo-clustered Hadoop environment (execute once) make pseudo -- first execution make pseudoq -- later executions since namenode and datanode already running
- AWS EMR Hadoop: (you must configure the emr.* config parameters at top of Makefile) make upload-input-aws -- only before first execution make aws -- check for successful execution with web interface (aws.amazon.com) make download-output-aws -- after successful execution & termination