## What's this all about?

To run Spark applications on your local machine, you must have **Java 8**, **Spark**, and the **PySpark** package installed. If you are unsure whether you have all of these, we recommend following the instructions in this notebook.
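
If you would like to check what is already present before installing anything, the following is a minimal sketch. It assumes that a `java` executable on your `PATH` indicates a Java installation and that `spark-submit` on your `PATH` indicates a Spark installation. The helper name `check_prerequisites` is hypothetical, and the sketch does not verify which Java version is installed.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict mapping each requirement to True if it was found."""
    return {
        # A `java` executable on PATH suggests a Java runtime (version not checked)
        "java": shutil.which("java") is not None,
        # Assumption: `spark-submit` on PATH indicates a Spark installation
        "spark": shutil.which("spark-submit") is not None,
        # find_spec returns None when the package cannot be located
        "pyspark": importlib.util.find_spec("pyspark") is not None,
    }


status = check_prerequisites()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Anything reported as missing is covered by the installation cells in this notebook.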

### Automatic Setup

Executing the following cell will install Java 8 on your machine. If you would rather not run this code, please install Java 8 manually from https://java.com/en/download/manual.jsp.

*Note: If you have Java already installed, we recommend uninstalling it before running the following cell.*

*Linux note: Running this code will delete your `$HOME/java/` directory and replace it with a fresh Java 8 installation.*

In [1]:
import platform as arch
from sys import platform
   
print('Beginning Java 8 (re)installation!')

# Check which OS we are running on (this should also be the code run on Jetstream)
if platform.startswith('linux'):
    print('Now installing Java on Linux...')
    if arch.architecture()[0] == '64bit':
        !wget -O ~/java.tar.gz http://javadl.oracle.com/webapps/download/AutoDL?BundleId=234464_96a7b8442fe848ef90c96a2fad6ed6d1
    elif arch.architecture()[0] == '32bit':
        !wget -O ~/java.tar.gz http://javadl.oracle.com/webapps/download/AutoDL?BundleId=234462_96a7b8442fe848ef90c96a2fad6ed6d1
    !rm -rf ~/java && mkdir ~/java && tar -xzf ~/java.tar.gz -C ~/java --strip-components=1 && rm ~/java.tar.gz

    # Point JAVA_HOME at the installation root (the directory that contains bin/)
    import os
    os.environ['JAVA_HOME'] = os.path.expanduser('~/java')

elif platform == 'darwin':
    print('Now installing Java on Mac...')
    !wget -O ~/java.dmg http://javadl.oracle.com/webapps/download/AutoDL?BundleId=234465_96a7b8442fe848ef90c96a2fad6ed6d1
    !hdiutil attach ~/java.dmg
    !sudo installer -pkg /Volumes/Java\ 8\ Update\ 181/Java\ 8\ Update\ 181.app/Contents/Resources/JavaAppletPlugin.pkg -target /
    !diskutil umount /Volumes/Java\ 8\ Update\ 181 
    !rm ~/java.dmg
    print('If there was an error, please mount java.dmg in your home directory and follow the instructions to install')

elif platform == 'win32':
    print('You are running a Windows OS.  Please download the correct version of Java from here: https://java.com/en/download/manual.jsp and install following the instructions.')

else:
    print('We had trouble determining which OS you are running.  Please ask for help.')
    

Beginning Java 8 (re)installation!
Now installing Java on Linux...
--2018-10-01 04:29:56--  http://javadl.oracle.com/webapps/download/AutoDL?BundleId=234464_96a7b8442fe848ef90c96a2fad6ed6d1
Resolving javadl.oracle.com (javadl.oracle.com)... 137.254.120.23
Connecting to javadl.oracle.com (javadl.oracle.com)|137.254.120.23|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://sdlc-esd.oracle.com/ESD6/JSCDL/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jre-8u181-linux-x64.tar.gz?GroupName=JSC&FilePath=/ESD6/JSCDL/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jre-8u181-linux-x64.tar.gz&BHost=javadl.sun.com&File=jre-8u181-linux-x64.tar.gz&AuthParam=1538383796_130aa5c33dfb84448b372484f11bb1f4&ext=.gz [following]
--2018-10-01 04:29:56--  https://sdlc-esd.oracle.com/ESD6/JSCDL/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jre-8u181-linux-x64.tar.gz?GroupName=JSC&FilePath=/ESD6/JSCDL/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jre-8u181-linux-x6
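
Once the installation has finished, you may want to confirm that the Java found on your `PATH` really is Java 8. The following is a minimal sketch, not part of the original setup. The helper names `parse_java_major` and `installed_java_major` are hypothetical, and the parser assumes the usual `version "..."` line printed by `java -version`.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_181")
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` and return the major version, or None if unavailable."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # `java -version` writes its report to stderr rather than stdout
    return parse_java_major(result.stderr + result.stdout)


print(installed_java_major())
```

A printed value of `8` means Java 8 is the active version; `None` means no usable `java` executable was found.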

Executing the following cell will install the latest version of Apache Spark and PySpark on your machine. It will also build the matplotlib font cache and set the `SPARK_HOME` and `SPARK_LOCAL_IP` environment variables for easier use of Spark on Jetstream!

*Note: If you have Spark already installed, we recommend uninstalling it before running the following cell.*

*Note: This code assumes that you have Anaconda or Miniconda installed on your machine. Download and install [anaconda3](https://www.anaconda.com/download/) if you do not already have it. If you are running this notebook in Jetstream via `ezj`, then Anaconda is already installed!*

In [None]:
from shutil import which

# Check if spark is already installed
if which("pyspark"):
    print('Spark and PySpark are already installed!')
else:
    print('PySpark is not installed, attempting to install now.')
    
    # Install spark and pyspark using conda
    !conda update -n base conda --yes
    !conda install pyspark --yes
    
    # Add SPARK_HOME to environment variables
    %env SPARK_HOME=/opt/anaconda3/bin

    print('Installation complete!')
    
# Bind Spark to the loopback address (no quotes: %env would store them literally)
%env SPARK_LOCAL_IP=127.0.0.1

# Build Matplotlib font library for future use
from matplotlib import pyplot as plt
plt.plot([0],[0])
plt.show()
plt.clf()

print('All done!!')

PySpark is not installed, attempting to install now.
Solving environment: / 

Finally, check that PySpark is working correctly:

In [None]:
from pyspark import SparkContext
sc = SparkContext()
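
Creating a `SparkContext` only shows that Spark can start. To check that it can also run a job, here is a minimal sketch that sums the integers 0 through 9 on a small RDD. The helper name `spark_smoke_test` is hypothetical. It uses `SparkContext.getOrCreate()` so that it reuses a context that is already running instead of failing on a second one.

```python
def spark_smoke_test():
    """Return 0 + 1 + ... + 9 computed by Spark, or None if Spark cannot start."""
    try:
        from pyspark import SparkContext

        # Reuse the running context if one exists, otherwise start a new one
        sc = SparkContext.getOrCreate()
        # Distribute the numbers 0..9 as an RDD and add them up
        return sc.parallelize(range(10)).sum()
    except Exception as exc:  # e.g. PySpark not installed or Java not found
        print(f"Spark is not usable yet: {exc}")
        return None


result = spark_smoke_test()
print(result)
```

A result of `45` indicates that Spark ran the job. `None` means the setup needs another look, and the printed message should point to the cause.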


Now that Java and Spark are installed and your environment is configured, let's begin running some Spark applications!