## Configure Spark 2

Let us go ahead and configure Spark 2 on our single-node Hadoop and Spark cluster, making sure that Spark can run in YARN mode.

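* Update **/opt/spark2/conf/spark-env.sh** with the environment variables below. They tell Spark where the Hadoop configuration lives (needed to locate YARN and HDFS) and add Hadoop's jars and native libraries to Spark's classpath and library path.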

```shell
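# Hadoop installation and its configuration directory (used to locate YARN and HDFS)
export HADOOP_HOME="/opt/hadoop"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop"
# Put Hadoop's jars on Spark's classpath
export SPARK_DIST_CLASSPATH=$(hadoop --config ${HADOOP_CONF_DIR} classpath)
# Let Spark load Hadoop's native libraries
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native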
```
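If you prefer to script this step, the following is a minimal sketch, assuming bash and the paths used in this guide. `append_spark_env` is a hypothetical helper, not part of Spark. The heredoc delimiter is quoted (`'EOF'`) so that `$(hadoop ...)` and the variable references are written to the file literally and only expanded later, when Spark's launch scripts source **spark-env.sh**. The helper skips the file if `SPARK_DIST_CLASSPATH` is already exported there, so re-running it does not duplicate the settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Spark-on-YARN settings to spark-env.sh.
# The file path defaults to the one used in this guide; pass another path to try it elsewhere.
append_spark_env() {
  local env_file="${1:-/opt/spark2/conf/spark-env.sh}"
  # Skip if the settings are already present, so re-running is harmless.
  if [ -f "$env_file" ] && grep -q '^export SPARK_DIST_CLASSPATH=' "$env_file"; then
    echo "already configured: $env_file"
    return 0
  fi
  # The quoted 'EOF' keeps $(...) and ${...} literal in the written file;
  # they are expanded only when Spark sources spark-env.sh.
  cat >> "$env_file" <<'EOF'
export HADOOP_HOME="/opt/hadoop"
export HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop"
export SPARK_DIST_CLASSPATH=$(hadoop --config ${HADOOP_CONF_DIR} classpath)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
EOF
  echo "updated: $env_file"
}
```

Run it as `sudo bash -c "$(declare -f append_spark_env); append_spark_env"` if the Spark conf folder is owned by root.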

* Update **/opt/spark2/conf/spark-defaults.conf** with the properties below. Each line is a property name followed by its value, separated by whitespace.

```properties
# Driver JVM options and SQL settings
spark.driver.extraJavaOptions     -Dderby.system.home=/tmp/derby/
spark.sql.repl.eagerEval.enabled  true

# Run applications on YARN by default
spark.master                      yarn

# Event logs, read by the Spark History Server
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs:///spark2-logs
spark.history.provider            org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory     hdfs:///spark2-logs
spark.history.fs.update.interval  10s
spark.history.ui.port             18081
spark.yarn.historyServer.address  localhost:18081

# Spark jars that YARN containers load from HDFS
spark.yarn.jars                   hdfs:///spark2-jars/*.jar
```
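To confirm what Spark will read, you can look a property up from the shell. `get_spark_conf` below is a hypothetical helper, not a Spark tool, and is only a sketch: it assumes the whitespace-separated `key value` form used above and does not handle the `=` or `:` separators that Java properties files also allow. As with Java properties files, the last occurrence of a duplicated key wins.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a property from a spark-defaults.conf-style file.
# Assumes "key<whitespace>value" lines; blank lines and lines starting with # are skipped.
get_spark_conf() {
  local key="$1"
  local conf_file="${2:-/opt/spark2/conf/spark-defaults.conf}"
  awk -v k="$key" '
    /^[ \t]*(#|$)/ { next }
    $1 == k {
      line = $0
      sub(/^[ \t]*[^ \t]+[ \t]*/, "", line)   # drop the key and the separator
      sub(/[ \t]+$/, "", line)                # trim trailing whitespace
      v = line                                # a later duplicate overrides
      found = 1
    }
    END { if (found) print v }
  ' "$conf_file"
}
```

For example, `hdfs dfs -ls "$(get_spark_conf spark.eventLog.dir)"` lists the event log directory once it has been created.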

* We also need to create the log and jar directories in HDFS, and copy the Spark jars into the HDFS folder referenced by **spark.yarn.jars**. YARN containers then load the jars from there instead of Spark uploading them with every application.

```shell
hdfs dfs -mkdir /spark2-jars
hdfs dfs -mkdir /spark2-logs

hdfs dfs -put /opt/spark2/jars/* /spark2-jars
```
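The HDFS steps above can also be wrapped so they are safe to re-run. This is a sketch assuming bash; `setup_spark_hdfs`, `run` and the `DRY_RUN` variable are hypothetical names introduced here. `hdfs dfs -mkdir -p` does not fail when a directory already exists, `hdfs dfs -put -f` overwrites jars already in HDFS, and the jar directory is derived from the **spark.yarn.jars** value by stripping the trailing `/*.jar`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create the HDFS directories and upload the Spark jars.
setup_spark_hdfs() {
  local jars_glob="${1:-hdfs:///spark2-jars/*.jar}"   # value of spark.yarn.jars
  local log_dir="${2:-hdfs:///spark2-logs}"           # value of spark.eventLog.dir
  local local_jars="${3:-/opt/spark2/jars}"           # local Spark jars folder
  local jars_dir="${jars_glob%/*}"                    # strip the trailing "/*.jar"
  # -p: no error if the directories already exist
  run hdfs dfs -mkdir -p "$jars_dir" "$log_dir"
  # -f: overwrite jars that are already in HDFS, so the script can be re-run
  run hdfs dfs -put -f "$local_jars"/*.jar "$jars_dir"
}
```

Run `DRY_RUN=1 setup_spark_hdfs` first to review the commands, then `setup_spark_hdfs` to execute them.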

* By default, Spark cannot access the tables and databases registered in the Hive Metastore. Perform the steps below to integrate Spark with the Hive Metastore.
  * Create a soft link to **hive-site.xml** in the Spark conf folder so that Spark picks up the metastore connection settings.
  * Install the **Postgres JDBC** driver jar in the Spark jars folder (the commands below use version 42.2.19).

```shell
sudo ln -s /opt/hive/conf/hive-site.xml /opt/spark2/conf/
sudo wget https://jdbc.postgresql.org/download/postgresql-42.2.19.jar \
    -O /opt/spark2/jars/postgresql-42.2.19.jar
```
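Before starting Spark you may want to verify both steps. The function below is a sketch, assuming bash and the default paths above; `check_hive_integration` is a hypothetical name. It checks that **hive-site.xml** in the Spark conf folder is a symbolic link whose target is readable, and that a non-empty `postgresql-*.jar` is present, since `wget -O` can leave an empty file behind when a download fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the Hive Metastore integration steps.
check_hive_integration() {
  local conf_dir="${1:-/opt/spark2/conf}"
  local jars_dir="${2:-/opt/spark2/jars}"
  local status=0
  # -L: hive-site.xml is a symbolic link; -r: its target exists and is readable.
  if [ -L "$conf_dir/hive-site.xml" ] && [ -r "$conf_dir/hive-site.xml" ]; then
    echo "ok: hive-site.xml -> $(readlink "$conf_dir/hive-site.xml")"
  else
    echo "missing or dangling link: $conf_dir/hive-site.xml" >&2
    status=1
  fi
  # -s: the JDBC driver jar exists and is not empty.
  local jar
  for jar in "$jars_dir"/postgresql-*.jar; do
    if [ -s "$jar" ]; then
      echo "ok: $(basename "$jar")"
      return "$status"
    fi
  done
  echo "no non-empty postgresql-*.jar in $jars_dir" >&2
  return 1
}
```

It returns a non-zero status if either check fails, so it can guard a start-up command, e.g. `check_hive_integration && spark-sql -e 'SHOW DATABASES'`.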