# Simple PySpark notebook
by Héctor Ramírez

<hr>

This is a simple, instructive notebook on setting up PySpark for Jupyter Notebook/JupyterLab and running it locally.

<hr>

### First
Install pyspark and findspark:

* <code> $ pip install pyspark </code>

* <code> $ pip install findspark </code>
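As a quick sanity check after installing, the sketch below reports whether both packages are importable from the Python environment that Jupyter uses. The helper name `missing_packages` is only illustrative; it relies on the standard library's `importlib.util.find_spec`, which looks a module up without importing it.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["pyspark", "findspark"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("pyspark and findspark are both importable.")
```

If a package shows up as missing even though `pip install` succeeded, `pip` and the notebook kernel are probably using different Python environments.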

### Second
Spark runs on Java 8 (newer Spark releases also support later Java versions), so to avoid version conflicts it's simplest to have only Java 8 installed:
  
* Uninstall all existing Java versions.
* Install Java 8 from <a href="https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html">here</a>.
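To confirm which Java version is actually picked up, the following sketch runs `java -version` and extracts the major version from its output. The function names are hypothetical helpers, and the parsing assumes the usual `version "1.8.0_…"` (Java 8 and earlier) or `version "11.0.2"` (Java 9 and later) formats.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as "1.8", "1.7", and so on.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version` and return the major version, or None if unavailable."""
    if shutil.which("java") is None:
        return None
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # `java -version` writes to stderr, so inspect both streams.
    return java_major_version(result.stderr + result.stdout)


print("Java major version on PATH:", installed_java_major())
```

A result of `8` matches the setup described here; `None` means no usable `java` executable was found on the `PATH`.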

### Third

* Add the following environment variables to your <code>~/.bash_profile</code> file:


<code># PySpark</code> <br>
<code>export SPARK_HOME=/{YOUR_SPARK_DIRECTORY}/pyspark </code> <br>
<code>export PATH=$SPARK_HOME/bin:$PATH </code> <br>
<code># Java</code> <br>
<code>if which java > /dev/null; then export JAVA_HOME=$(/usr/libexec/java_home); fi </code>


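Remember to replace <code>{YOUR_SPARK_DIRECTORY}</code> with the directory where you unpacked Spark. Note that <code>/usr/libexec/java_home</code> exists only on macOS; on other systems, set <code>JAVA_HOME</code> to your JDK directory directly.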
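The variables only take effect in shells started after the edit (or after running `source ~/.bash_profile`), and Jupyter must be launched from such a shell. A minimal sketch for checking this from inside a notebook follows; `check_spark_env` is an illustrative helper, not part of PySpark or findspark.

```python
import os


def check_spark_env(env=None):
    """Return a list of problems found in the Spark-related environment variables."""
    env = os.environ if env is None else env
    problems = []
    for name in ("SPARK_HOME", "JAVA_HOME"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        # The profile snippet prepends $SPARK_HOME/bin to PATH.
        spark_bin = os.path.join(spark_home, "bin")
        if spark_bin not in env.get("PATH", "").split(os.pathsep):
            problems.append(f"{spark_bin} is not on PATH")
    return problems


issues = check_spark_env()
print("Environment looks fine." if not issues else "\n".join(issues))
```

An empty result means the kernel sees the same settings as the profile file.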

<hr>

Then, we can open a notebook and create a SparkContext:

In [1]:
# Locate the Spark installation and make pyspark importable
import findspark
findspark.init()

# Creating Spark Context
from pyspark import SparkContext
sc = SparkContext("local", "first app")

# Counting words: split lines into words, sum a 1 per word,
# then swap to (count, word) so sortByKey orders by count
text_file = sc.textFile("OneParagraph.txt") # This is a manually-created file
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b) \
             .map(lambda x: (x[1], x[0])) \
             .sortByKey(ascending=False)

# Printing each word with its respective count
output = counts.collect()
for (count, word) in output:
    print("%s: %i" % (word, count))
    
# Stopping Spark Context
sc.stop()

Mr.: 5
to: 4
the: 4
Trump: 2
had: 2
Russians: 2
Only: 1
one: 1
day: 1
before: 1
spoke: 1
Zelensky,: 1
Mueller: 1
testified: 1
Congress: 1
about: 1
how: 1
tried: 1
help: 1
elect: 1
by: 1
organizing: 1
theft: 1
and: 1
release: 1
of: 1
emails: 1
damaging: 1
his: 1
opponent.: 1
In: 1
that: 1
case,: 1
were: 1
pursuers: 1
who: 1
sought: 1
contacts: 1
with: 1
Trump’s: 1
campaign.: 1
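For checking the Spark result on a small file, the same counting logic can be sketched in plain Python with `collections.Counter`. This is only an illustration of what the pipeline above computes, not a replacement for it; `word_counts` and the sample lines are made up for the example, and the order of words with equal counts may differ from Spark's.

```python
from collections import Counter


def word_counts(lines):
    """Count space-separated words across lines, most frequent first."""
    counter = Counter()
    for line in lines:
        # Mirrors flatMap(lambda line: line.split(" ")) followed by reduceByKey.
        counter.update(line.split(" "))
    # most_common() sorts by descending count, like sortByKey(ascending=False).
    return counter.most_common()


sample = ["to be or not", "to be"]  # stand-in for the lines of a text file
for word, count in word_counts(sample):
    print("%s: %i" % (word, count))
```

To compare against the notebook's output, the list passed in could be the lines of `OneParagraph.txt` read with the trailing newlines stripped.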
