
Spark 1.2.1 with Avro-mapred-1.7.6-hadoop2

Because of SPARK-3039 (wrong Avro Mapred library version: hadoop1 instead of hadoop2), we created our own build of Spark with a fix for that problem. The fix has been submitted as Apache Spark pull request 4315 on GitHub and is included in master (1.3.0-SNAPSHOT) and the 1.3.0 release candidates. Hopefully it will also be backported to 1.2.2 onwards. Until then, here is how to build a fixed release from the v1.2.1 tag.

Obtaining Git repo for Apache Spark

Apache Spark's GitHub repo is at

git clone
cd spark
# Create a branch named "fix" from tag v1.2.1 (the latest release)
git checkout -b fix v1.2.1

Fixing SPARK-3039

The fix consists of explicitly excluding the bad version of avro-mapred-1.7.5.jar:

In sql/hive/pom.xml:

Add explicit exclusion of bad version


We only added the exclusion for avro-mapred; commons-logging and kryo were already excluded.

Later in pom.xml, an explicit version of avro-mapred is already defined
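One way to confirm that the exclusion took effect is to inspect Maven's dependency tree and check that every avro-mapred entry carries the hadoop2 classifier. The helper below is only a sketch: the function name `check_avro_mapred` is hypothetical, and it assumes the tree prints coordinates in Maven's usual `group:artifact:type:classifier:version:scope` form.

```shell
#!/bin/sh
# Hypothetical helper: read `mvn dependency:tree` output on stdin and fail
# if any avro-mapred artifact lacks the hadoop2 classifier.
check_avro_mapred() {
  # Keep only the avro-mapred lines; `|| true` tolerates "no match"
  lines=$(grep ':avro-mapred:jar' || true)
  if [ -z "$lines" ]; then
    echo "avro-mapred not found in dependency tree" >&2
    return 2
  fi

  # Any avro-mapred line without the hadoop2 classifier is suspect
  bad=$(printf '%s\n' "$lines" | grep -v ':avro-mapred:jar:hadoop2:' || true)
  if [ -n "$bad" ]; then
    echo "avro-mapred without hadoop2 classifier:" >&2
    printf '%s\n' "$bad" >&2
    return 1
  fi

  echo "avro-mapred uses the hadoop2 classifier"
}
```

It could be fed from something like `mvn dependency:tree -Dincludes=org.apache.avro:avro-mapred | check_avro_mapred`, run with the same profiles as the build. Whether that command resolves cleanly inside a single Spark module is an assumption and may depend on the sibling modules having been installed first.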


Make Distribution with Hive, Yarn and Hadoop 2.4

See Building Spark

./ -Pyarn -Phive -Phadoop-2.4 -Phive-0.13.1
cp -R dist /usr/local/spark-1.2.1-hadoop2.4
cd /usr/local
ln -s spark-1.2.1-hadoop2.4 spark
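The copy-and-symlink steps above can be wrapped in a small helper so that a later rebuild repoints the symlink instead of failing on an existing one. This is a minimal sketch, not part of the Spark build: the function name `install_dist` and its arguments are hypothetical, and it relies on the `-n` option of `ln` as found in GNU and BSD.

```shell
#!/bin/sh
# Hypothetical helper: copy a built dist directory to a versioned location
# and point a stable symlink at it.
# Usage: install_dist <dist_dir> <prefix> <versioned_name> <link_name>
install_dist() {
  dist_dir=$1
  prefix=$2
  versioned=$3
  link=$4

  [ -d "$dist_dir" ] || { echo "no such dist directory: $dist_dir" >&2; return 1; }

  # Refuse to copy over an existing install
  if [ -e "$prefix/$versioned" ]; then
    echo "already installed: $prefix/$versioned" >&2
    return 1
  fi

  cp -R "$dist_dir" "$prefix/$versioned" || return 1

  # -f replaces an existing link; -n avoids descending into its old target
  (cd "$prefix" && ln -sfn "$versioned" "$link")
}
```

With the paths used in this guide, the call would be `install_dist dist /usr/local spark-1.2.1-hadoop2.4 spark`.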

Add Spark binaries to Path

vi ~/.bashrc
export JAVA_HOME=/usr/local/java
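export SPARK_HOME=/usr/local/spark
# Assumed missing step: put Spark's bin directory on the PATH so the test below resolves
export PATH=$SPARK_HOME/bin:$PATH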

# test
which spark-shell
> /usr/local/spark/bin/spark-shell
which spark-submit
> /usr/local/spark/bin/spark-submit
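The manual `which` check above can also be scripted. The sketch below assumes `SPARK_HOME` is set as shown earlier; the function name `check_spark_tool` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if <tool> resolves to $SPARK_HOME/bin/<tool>.
check_spark_tool() {
  tool=$1
  expected="$SPARK_HOME/bin/$tool"

  # command -v prints the path the shell would execute for this name
  resolved=$(command -v "$tool") || { echo "$tool: not on PATH" >&2; return 1; }

  if [ "$resolved" != "$expected" ]; then
    echo "$tool resolves to $resolved, expected $expected" >&2
    return 1
  fi
  echo "$tool ok: $resolved"
}
```

Calling `check_spark_tool spark-shell && check_spark_tool spark-submit` from a fresh login shell would show whether the `~/.bashrc` changes were picked up.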

Setting Default Configurations

cd /usr/local/spark/conf

# Ensure file is executable
ls -l
> -rwxr-xr-x 1 ...
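If the listing shows that the execute bit is missing, it can be set on demand. A minimal sketch follows; the function name `ensure_executable` is hypothetical, and the file to check is passed as an argument because the guide does not name it here.

```shell
#!/bin/sh
# Hypothetical helper: make sure a configuration script exists and is executable.
ensure_executable() {
  file=$1
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }

  # Only touch permissions when the execute bit is not already set
  [ -x "$file" ] || chmod +x "$file"
}
```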


Testing the New Installation