### Setting Up PySpark in IPython Notebook

1) To set up PySpark, first download and unzip Spark 1.3.0: https://spark.apache.org/downloads.html<br>
2) If you are using Windows, don't; use a Linux VM instead<br>
3) If you are using a Mac or Linux machine, you need to build Spark. Open a terminal and navigate to the directory where you unzipped Spark. Then build Spark with this command: <br>

```
$ sbt/sbt -Pyarn -Phadoop-2.3 assembly
```

4) Update your `~/.bashrc` file to point to Spark and add Spark's Python package to the Python path<br>
```
$ nano ~/.bashrc
```

Add the following lines:
```
export SPARK_HOME=<<location of your spark folder>> #e.g., /home/ai2/Documents/spark-1.3.0
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
```

Press CTRL-X, then Y and Enter to save and exit, then reload the file:
```
$ source ~/.bashrc
```
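
Before launching the notebook, it can help to confirm that Python actually sees the variables from step 4. The sketch below is one way to do that with the standard library only. `check_spark_env` is a hypothetical helper name, and the idea that Spark bundles a py4j zip under `$SPARK_HOME/python/lib` is an assumption to verify against your own download. If `import pyspark` later fails with an ImportError that mentions py4j, adding that zip to `PYTHONPATH` is a common fix.

```python
import glob
import os


def check_spark_env(environ=None):
    """Return a list of problems found; an empty list means the setup looks sane."""
    environ = os.environ if environ is None else environ
    problems = []

    spark_home = environ.get("SPARK_HOME", "")
    if not spark_home:
        return ["SPARK_HOME is not set"]

    python_dir = os.path.join(spark_home, "python")
    if not os.path.isdir(python_dir):
        problems.append("no python/ folder under SPARK_HOME: " + spark_home)

    # Normalize PYTHONPATH entries so that a trailing slash does not matter.
    raw_entries = environ.get("PYTHONPATH", "").split(os.pathsep)
    entries = [os.path.normpath(p) for p in raw_entries if p]
    if os.path.normpath(python_dir) not in entries:
        problems.append("PYTHONPATH does not include " + python_dir)

    # Assumption: the py4j sources ship as a zip in python/lib.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    if py4j_zips and os.path.normpath(py4j_zips[0]) not in entries:
        problems.append("py4j zip not on PYTHONPATH (may be needed): " + py4j_zips[0])

    return problems
```

Calling `check_spark_env()` from a fresh terminal session returns an empty list when both variables are set as described above; each string in a non-empty result names one thing to fix.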

5) Navigate to the folder where you keep your Python work and launch IPython Notebook<br>
```
$ ipython notebook
```

6) Once you're in the notebook, use the setup below as an example for including Spark in a script:

In [1]:
from pyspark import SparkContext, SparkConf

In [2]:
# Configure Spark settings
conf = SparkConf()
# conf.set("spark.executor.memory", "1g")
# conf.set("spark.cores.max", "2")
conf.setAppName("My App")

# Initialize the SparkContext, running locally on all available cores
sc = SparkContext('local[*]', conf=conf)

In [3]:
print sc

<pyspark.context.SparkContext object at 0x7ff81772a3d0>


In [4]:
readme = sc.textFile('/home/ai2/Documents/spark-1.3.0/README.md')  # edit the file location for your machine

In [5]:
readme

/home/ai2/Documents/spark-1.3.0/README.md MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:-2

In [6]:
readme.cache()

/home/ai2/Documents/spark-1.3.0/README.md MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:-2

In [7]:
readme.count()

98

In [8]:
words = readme.flatMap(lambda x: x.split())
words.take(5)

[u'#', u'Apache', u'Spark', u'Spark', u'is']
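
For readers new to `flatMap`, here is a plain-Python sketch of what cell 8 computes, with no Spark required. The list `lines` is a small hypothetical stand-in for the RDD, not the real README, and `itertools.chain.from_iterable` plays the role of flattening the per-line lists that `map` alone would return.

```python
from itertools import chain

# Hypothetical stand-in for the lines of a text file.
lines = ["# Apache Spark", "", "Spark is fast"]

# map: one output element per input line, so the result is a list of lists.
mapped = [line.split() for line in lines]

# flatMap: the same function, but the per-line results are concatenated.
flat_mapped = list(chain.from_iterable(line.split() for line in lines))

print(mapped)
print(flat_mapped)
```

Note that the empty line contributes an empty list under `map` but nothing at all under `flatMap`, which is why `words` contains only actual tokens.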

In [9]:
# Count each word, swap to (count, word), and sort by count in descending order
wordcounts = (words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x + y)
                   .map(lambda x: (x[1], x[0])).sortByKey(False))
wordcounts.take(10)

[(21, u'the'),
 (14, u'Spark'),
 (14, u'to'),
 (11, u'for'),
 (10, u'and'),
 (9, u'a'),
 (8, u'##'),
 (7, u'run'),
 (6, u'is'),
 (6, u'on')]
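
The chained call in cell 9 packs four steps into one expression. As a rough illustration, the same steps can be written out in plain Python on a small hypothetical word list. This is a sketch of the logic only, not of how Spark executes it across partitions.

```python
from collections import defaultdict

# Hypothetical input, standing in for the `words` RDD.
words = ["the", "Spark", "the", "to", "Spark", "the"]

# map: pair every word with a count of 1.
pairs = [(word, 1) for word in words]

# reduceByKey: add up the counts that share a key.
totals = defaultdict(int)
for word, count in pairs:
    totals[word] += count

# map: swap to (count, word) so that the count becomes the sort key.
swapped = [(count, word) for word, count in totals.items()]

# sortByKey(False): order by count, descending.
wordcounts = sorted(swapped, key=lambda pair: pair[0], reverse=True)

print(wordcounts)
```

The swap step exists only because `sortByKey` sorts on the first element of each pair; once the count is in that position, a descending sort puts the most frequent words first.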