# Apache Spark with Scala

More details and magic commands here: https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb


http://apachesparkbook.blogspot.com.br


# RDD: Basics
An RDD (Resilient Distributed Dataset) is the basic unit of data in Spark, upon which all operations are performed.
An RDD is an immutable collection split into partitions, so that it can be processed in parallel on multiple nodes of the cluster. It can be kept in memory between operations.

An RDD operation is either a transformation or an action.
An action returns a result to the driver program or writes it to storage. It normally triggers a computation and returns a value that is not an RDD. A transformation returns a reference to a new RDD and is evaluated lazily, only when an action needs its result.
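The difference between the two kinds of operation can be sketched as follows. This is a minimal example, not part of the original notebook: the name `ctx`, the local master and the app name are assumptions so the snippet also runs outside the notebook (inside Toree, `getOrCreate` simply reuses the existing context).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master and app name are only needed for standalone runs.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("rdd-basics-sketch"))

val scores = ctx.parallelize(Seq(("math", 55), ("math", 56), ("english", 57)))

// Transformation: returns a new RDD right away; nothing is computed yet.
val curved = scores.mapValues(_ + 5)

// Actions: trigger the computation and return plain Scala values to the driver.
val total: Int = curved.values.reduce(_ + _)
val asArray: Array[(String, Int)] = curved.collect()
```

Here `curved` is still an RDD, while `total` is an `Int` and `asArray` is a local `Array`.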

In [2]:
// Pair RDD of (subject, score); `sc` is the SparkContext provided by the notebook
val rdd = sc.parallelize(Seq(
  ("math",    55),
  ("math",    56),
  ("english", 57),
  ("english", 58),
  ("science", 59),
  ("science", 54)
))

In [1]:
rdd.collect()

Array((math,55), (math,56), (english,57), (english,58), (science,59), (science,54))

In [3]:
rdd.count()

6

In [7]:
rdd.first()

(math,55)

In [6]:
rdd.take(2)

Array((math,55), (math,56))

In [11]:
// first 3 elements according to the implicit Ordering in scope
rdd.takeOrdered(3)

Array((science,54), (science,59), (math,55))

In [1]:
rdd.countByKey()

Map(english -> 2, science -> 2, math -> 2)

## RDD Basics: Actions
https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions

## sortByKey - Ascending order

In [5]:
val sorted1 = rdd.sortByKey()
sorted1.collect()

Array((english,57), (english,58), (math,55), (math,56), (science,59), (science,54))

## sortByKey - Descending order

In [7]:
val sorted2 = rdd.sortByKey(false)
sorted2.collect()

Array((science,59), (science,54), (math,55), (math,56), (english,57), (english,58))

## sortByKey - Custom order

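sortByKey() takes its sort order from an implicit Ordering on the key type, so defining one in scope changes how the keys are sorted. The ordering defined here delegates to the natural String comparison, which is why the result matches the ascending case.

The braces ('{' ... '}') limit the scope of the implicit ordering to this cell.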

In [12]:
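{
  // Implicit Ordering[String] picked up by sortByKey();
  // it simply delegates to the natural String comparison.
  implicit val sortIntegersByString: Ordering[String] = new Ordering[String] {
    override def compare(a: String, b: String): Int = a.compare(b)
  }

  val sorted3 = rdd.sortByKey()
  sorted3.collect()
}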



Array((english,57), (english,58), (math,55), (math,56), (science,59), (science,54))
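To see an implicit ordering actually change the result, here is a small sketch with an ordering that differs from the natural one. The ordering `shortestKeyFirst`, the name `ctx`, and the sample data are hypothetical and chosen only for illustration. The local master and app name are assumptions for running outside the notebook.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master and app name are only needed for standalone runs.
val ctx = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("sortbykey-custom-sketch"))

val scores = ctx.parallelize(Seq(("math", 55), ("english", 57), ("science", 59)))

val byKeyLength: Array[(String, Int)] = {
  // Hypothetical ordering: shorter keys first, ties broken alphabetically.
  implicit val shortestKeyFirst: Ordering[String] = new Ordering[String] {
    override def compare(a: String, b: String): Int = {
      val byLength = Integer.compare(a.length, b.length)
      if (byLength != 0) byLength else a.compareTo(b)
    }
  }
  // sortByKey() resolves the implicit defined inside this block.
  scores.sortByKey().collect()
}
```

With this ordering `math` comes before `english`, whereas the natural ordering would put `english` first.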

## Now let's run the object ...

In [5]:
val files = Array("./resources/data/input1.txt", "./resources/data/input2.txt")

val myAnaliser = new Analiser

myAnaliser.main(files)

TEXT1
said=456
alice=377
that=234
with=172
very=139
TEXT2
vibrating=1
young=10
stumbled=8
intimately=1
someone=1
COMMON
little
said
that
they
this
with
Time elapsed: 8 seconds


## Desired Output:  
TEXT1  
said=456  
alice=377  
that=234  
with=172  
very=139  
TEXT2  
that=759  
with=448  
were=365  
from=326  
they=302  
COMMON  
little  
said  
that  
they  
this  
with  
