<img src="../img/scala-logo.png" width="200">

In [None]:
What is Scala?
Expressive
- First-class functions
- Closures
Concise
- Type inference
- Literal syntax for function creation
Java interoperability
- Can reuse Java libraries
- Can reuse Java tools
- No performance penalty

- [First steps to Scala](http://www.artima.com/scalazine/articles/steps.html)
- [Brief intro](https://twitter.github.io/scala_school/basics.html#class)
- [ML in Scala](https://www.mapr.com/blog/apache-spark-machine-learning-tutorial)
- [ScalaNLP](http://www.scalanlp.org/)
- [KeystoneML](http://keystone-ml.org/)
- [Scala overview](http://docs.scala-lang.org/overviews/parallel-collections/overview.html)

#### Table of contents
- [Orientation](#orientation)
    - Functions
    - Methods
    - Variables
    - ...
- [Exploration](#exploration)
- [Testing](#testing)
- [MLlib](#mllib)
- [Q & A](#qna)

# Orientation

In [None]:
//get help (magic must be in first line)
%%help

#### File path

In [None]:
// list content under directory
import  org.apache.hadoop.fs.{FileSystem,Path}

FileSystem.get( sc.hadoopConfiguration ).listStatus( new Path("hdfs:///")).foreach( x => println(x.getPath ))

#### Expressions

In [None]:
1 + 1

#### Variables

In [None]:
var name = "steve"

#### Methods

In [None]:
def add(x: Int, y: Int): Int = {
    x + y // the last expression is the result, so `return` is optional; the return type can be inferred for non-recursive methods
}
println(add(42, 13))

#### Functions

In [None]:
def addOne(m: Int): Int = m + 1

In [None]:
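// partial application: fix the first argument, leave the second one open with _
def adder(m: Int, n: Int): Int = m + n
val add2 = adder(2, _: Int)
add2(3) // >> 5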
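
A minimal sketch of how a method can be turned into a function value, either through a function literal or through partial application. The names `multiply`, `double` and `triple` are illustrative and do not appear elsewhere in this notebook.

```scala
// A method with two parameters (hypothetical example)
def multiply(m: Int, n: Int): Int = m * n

// Function literal stored in a value
val double: Int => Int = (x: Int) => multiply(2, x)

// Partial application: the first argument is fixed, the second stays open
val triple = multiply(3, _: Int)

println(double(21))
println(triple(5))
```

Both `double` and `triple` have type `Int => Int`, so they can be passed to methods such as `map`.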

#### Class

In [None]:
//A simple class that does nothing:
class Person(fname:String, lname:String)
// --> no result on the next statement:
val p1 = new Person("Alice","In Chains")

//A class with a method
class Person2(fname:String, lname:String){
    def greet = s"Hi $fname $lname"
}
val p2 = new Person2("Max","Kohl")
println(p2.greet)

//Create getter and setter:
class Person4(val fname:String, var lname:String)
val p4 = new Person4("Mox","Power"){
    //override the default string repr.
    override def toString = s"$fname $lname"
}
println(p4.fname)
p4.lname="Grohl"
println(p4)

class Calculator {
    val brand: String = "HP" //field
    def add(m: Int, n: Int): Int = m + n //method
}
val calc = new Calculator
calc.add(3,4)
calc.brand

The primary constructor is the code in the class body outside of method definitions; it runs every time an instance is created:

In [None]:
class Calculator(brand: String) {
  /**
   * A constructor.
   */
  val color: String = if (brand == "TI") {
    "blue"
  } else if (brand == "HP") {
    "black"
  } else {
    "white"
  }

  // An instance method.
  def add(m: Int, n: Int): Int = m + n
}
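
Besides the primary constructor, a class can declare auxiliary constructors with `def this(...)`. The sketch below uses a hypothetical `Gadget` class to show one that supplies a default colour; it is an illustration, not part of the `Calculator` example.

```scala
// Primary constructor takes both fields
class Gadget(val brand: String, val color: String) {
    // Auxiliary constructor: its first statement must call another constructor
    def this(brand: String) = this(brand, "white")
}

val g1 = new Gadget("HP", "black")
val g2 = new Gadget("TI") // uses the auxiliary constructor

println(s"${g1.brand} is ${g1.color}")
println(s"${g2.brand} is ${g2.color}")
```

In many cases a default parameter value (`color: String = "white"`) achieves the same with less code.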

Scala properties with additional Java-bean style accessors via `@BeanProperty`:

In [None]:
import scala.beans.BeanProperty
class SPerson(@BeanProperty var name:String)
val sp = new SPerson("Scala Style")
println(sp.name)
sp.name += " rocks!"
println(sp.getName)

#### Array

In [None]:
val array1 = Array(1,2,3)
val array2 = Array("a",2,true)
val array3 = Array("a","b","c")
val itemAtIndex0 = array3(0)
array3(0)

In [None]:
// Append / prepend:
val concatenated = "prepend" +: (array1 ++ array2) :+ "append"
// Difference:
val diffArray = Array(1,2,3,4).diff(Array(2,3)) // >> Array(1,4)
// Find:
val personArray = Array(("Alice",1), ("Bob",2), ("Carol",3))
def findByName(name:String) = personArray.find(_._1 == name).getOrElse(("David",4))
val findBob = findByName("Bob") // >> findBob = (Bob,2)
// Retrieve element:
val bobFound = findBob._2 // >> bobFound = 2

#### List

In [None]:
// Lists are immutable. Any "changes" create a new list; the original is untouched.

val list1 = List("a",2,true)
val list2 = List("x","y") // second list used in the concatenation examples
// Access:
val firstItem = list1(0) // >> firstItem = "a"
// Create a "mutable version" of a list:
import collection.mutable
val mlist = mutable.ArrayBuffer("a","b","c") // >> mlist = ArrayBuffer(a,b,c)
// Modify:
mlist(0) = "k" // >> ArrayBuffer(k,b,c)
// Concatenate:
list1 ++ list2
// Prepend:
0 :: list1 // >> List(0,"a",2,true)
// Append:
list1 :+ 4 // >> List("a",2,true,4)

// Parentheses are required: `::` and `:+` have the same precedence but different associativity
val concatenated = (1 :: (list1 ++ list2 ++ mlist)) :+ 'd'

// Remove:
mlist - "c" // creates a new ArrayBuffer with "c" removed
mlist -- List("e", "f") // creates a new ArrayBuffer without "e" and "f"
mlist // stayed intact
mlist -= "c" // removes "c" in place
mlist // "c" removed
mlist --= List("e","f") // removes "e" and "f" in place (no effect if absent)
mlist += "e"
mlist ++= List("f","g") // >> ArrayBuffer(k,b,e,f,g)

// Diff:
val diffList = List(1,2,3,4) diff List(2,3) // >> List(1,4)

// Find statement is the same as for Arrays

#### Set

In [None]:
// Sets remove duplicates
val set1 = Set(1,2,3,3)
println(set1) // Set(1,2,3)
// Find if value is in Set
val fourExists = set1(4) // println(fourExists) >> false
// Modify Set
import collection.mutable
val mset = mutable.HashSet("a","b","c")
mset("a") = false
println(mset) // >> Set(c,b)

// Concatenate and find statements same as for List

#### Map

In [None]:
val map1 = Map("one" -> 1, "two" -> 2, "three" -> 3)
import collection.mutable
val mmap = mutable.HashMap("a" -> 1, "b" -> 2, "c" -> 3)
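// Duplicate keys are collapsed; the last value wins
println(Map("a" -> 1, "a" -> 2)) // >> Map(a -> 2)

// get returns an Option: Some(value) if the key exists, None otherwise
val fourExistsOption = map1.get("four")
println(fourExistsOption.isDefined) // false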

// Concatenate and find statements same as for List
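
Since `get` returns an `Option`, a missing key has to be handled explicitly. A small sketch of two common ways to do that, using an illustrative `prices` map that is not part of the notebook:

```scala
val prices = Map("tea" -> 2, "coffee" -> 3) // illustrative data

// getOrElse supplies a fallback when the key is missing
val teaPrice = prices.getOrElse("tea", 0)
val cocoaPrice = prices.getOrElse("cocoa", 0)

// Pattern matching on the Option returned by get
val message = prices.get("coffee") match {
    case Some(p) => s"coffee costs $p"
    case None    => "coffee is not on the menu"
}

println(teaPrice)
println(cocoaPrice)
println(message)
```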

#### Mutable collections

In [None]:
import scala.collection.mutable

val arrayBuffer = mutable.ArrayBuffer(1,2,3)
val listBuffer = mutable.ListBuffer("a","b","c")
val hashSet = mutable.Set(0.1,0.2,0.3)
val hashMap = mutable.Map("one" -> 1, "two" -> 2)

[ref](http://scalatutorials.com/tour/interactive_tour_of_scala_mutable_collections_operations.html)

#### Immutable collections with var

[ref](http://scalatutorials.com/tour/interactive_tour_of_scala_immutable_collections_with_var.html)
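
A short sketch of the idea behind this heading: a `var` that holds an immutable collection can be reassigned to a new collection, while the original collection object never changes. The variable names are illustrative.

```scala
// An immutable List held in a reassignable variable
var fruits = List("apple", "banana")
val snapshot = fruits // keeps a reference to the original list

// `:+=` expands to `fruits = fruits :+ "cherry"`: a new list is built and the variable is reassigned
fruits :+= "cherry"

println(fruits)
println(snapshot)
```

The `snapshot` value still sees the two-element list, which is what makes sharing immutable collections safe.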

#### Print

In [None]:
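// swap returns a tuple, which is destructured into a and b
def swap(x: String, y: String) = (y, x)
val (a,b) = swap("hello","world")
println(a,b) // >> (world,hello)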

#### Assign variables

In [None]:
var (x, y, z) = (1, 2, true)

#### Loops

In [None]:
var i = 0
var sum = 0
// while
while (i < 10) {
  sum += i
  i += 1
}
// for
sum = 0 // reset before summing again
for (i <- 0 until 10) {
  sum += i
}
// without loop
(0 until 10).sum

#### if

In [None]:
if (true)
    println("no braces")
if (1 + 1 == 2) {
    println("multiple")
    println("statements")
}

// if is an expression, so its result can be assigned
val likeEggs = false
val breakfast =
    if (likeEggs) "scrambled eggs"
    else "Apple"

#### match

In [None]:
// use match as a switch
val selection = "One"
selection match {
    case "One" => println("You selected option one!")
    case "Two" => println("You selected option two!")
}
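
Beyond literal cases, `match` also supports typed patterns, guards and a default case. The following is a minimal sketch; `describe` is a hypothetical helper introduced only for illustration.

```scala
// Describe a value using literal, typed, guarded and default patterns
def describe(x: Any): String = x match {
    case "One"           => "option one"
    case n: Int if n < 0 => "a negative number"
    case n: Int          => s"the number $n"
    case _               => "something else" // default case, avoids a MatchError
}

println(describe("One"))
println(describe(-3))
println(describe(7))
println(describe(2.5))
```

Without the `case _` branch, an unmatched value such as `2.5` would throw a `scala.MatchError` at runtime.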

# Object

### object -> method -> argument

In [2]:
object HelloWorld {
    def main(args: Array[String]): Unit = {
        println("Hello, world!")
    }
}

defined object HelloWorld

In [3]:
HelloWorld.main(null)

Hello, world!




In [None]:
//Type of object

### main method (explicit entry point) or App (the object body runs as the program)

In [4]:
object HelloWorldApp extends App {
    println("Hi Max!")
}

defined object HelloWorldApp

In [5]:
HelloWorldApp.main(null)

Hi Max!




# Class

In [None]:
class Hello(message: String) {
    println(message) // this is the primary constructor
}
new Hello("Be curious!") //creates instance of the class
res0.toString //creates string representation of instance

In [None]:
// Parameters passed to a class can be used inside the instance.
// Their types must be specified.

In [None]:
// Constructor parameters cannot be accessed from outside the instance unless declared with val:
class Hello(val message: String)
val hello = new Hello("Badabing")
hello.message

#### Mutable and immutable fields (variables)

In [None]:
// immutable: a val cannot be reassigned
val NameOfPope : String = "Luigi"
// mutable: a var can be reassigned
var NameOfDuke : String = "Parcival"

In [None]:
// Fields are accessible to outsiders of the class
class Hello {
    val message: String = "Hello" // immutable field
}
(new Hello).message

class Hello {
    var message: String = "Hello"
}
val hello = new Hello
hello.message = "Primabello!" // can change mutable field

<a id='exploration'></a>
[back to top](#top)

# Exploration

In [None]:
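// summary statistics per column
// `observations` is assumed to be an RDD[Vector] defined earlier
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)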
summary.mean
summary.variance
summary.numNonzeros
summary.normL1
summary.normL2

In [None]:
// correlation


#### Create random data

In [None]:
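import org.apache.spark.mllib.random.RandomRDDs._
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
// 10000 rows of 3 standard-normal columns, spread over 10 partitions
val data = normalVectorRDD(sc, numRows=10000L, numCols=3, numPartitions=10)
val stats: MultivariateStatisticalSummary = Statistics.colStats(data)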
stats.mean
stats.variance


#### Print variable

In [None]:
val Tau = 2*3
println(s"Happy $Tau Day")

#### Useful operations

In [None]:
//Number operations   
//Ranges   
//creates a range between 1 to 10 inclusive  
val range = 1 to 10   
//creates a range between 1 to 10 exclusive   
val range2 = 1 until 10   
//from 2 to 10 with jumps of 3  
val range3 = 2 until 10 by 3   

println(range3.toList) //List(2, 5, 8)  

//Number convenience methods   
val num = -5  
val numAbs = num.abs //absolute value  
val max5or7 = numAbs.max(7)  
val min5or7 = numAbs.min(7)  
println(numAbs) //5  
println(max5or7) //7   
println(min5or7) //5  

//String operations   

val reverse = "Scala".reverse //reverse a string   
println(reverse) //alacS  

val cap = "scala".capitalize //make first char caps  
println(cap) //Scala  

val multi = "Scala!" * 7 //repeat n times   
println(multi) //Scala!Scala!Scala!Scala!Scala!Scala!Scala!  

val int = "123".toInt //parse as Int  
println(int)  

//Useful methods on collections   

//filter - keep only items larger than 4   
val moreThan4 = range.filter(_ > 4)  
println(moreThan4) //Vector(5, 6, 7, 8, 9, 10)  

//map - transform each item in the collection   
val doubleIt = range2.map(_ * 2)  
println(doubleIt) //Vector(2, 4, 6, 8, 10, 12, 14, 16, 18)  


<a id='testing'></a>
[back to top](#top)

# Testing

### Scalatest

In [None]:
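//String test
import org.scalatest.FlatSpec // ScalaTest 3.0.x and earlier; newer versions use org.scalatest.flatspec.AnyFlatSpec
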
class StackSpec extends FlatSpec {
  "A text" should "be the same text" in {
    val msg: String = "Muchas problemas"
    assert(msg === "Muchas problemas")
  }
}

<a id='mllib'></a>
[back to top](#top)

# MLlib

In [None]:
// Vectors:
import org.apache.spark.mllib.linalg.{Vector, Vectors} // Scala has its own Vector, so import MLlib's explicitly
Vectors.dense(33.0, 0.0, 55.0) // dense vector
Vectors.sparse(3, Array(0, 2), Array(44.0, 55.0)) // sparse vector: size, indices of the non-zero entries, their values
Vectors.sparse(3, Seq((0, 44.0), (2, 55.0)))

// Labeled Points:
// labels must be 0 or 1 for binary classification and 0, 1, 2, ... for multiclass
import org.apache.spark.mllib.regression.LabeledPoint
LabeledPoint(1.0, Vectors.dense(44.0, 0.0, 55.0))
LabeledPoint(0.0, Vectors.sparse(3, Array(0,2), Array(44.0, 55.0)))

// Matrices
// Dense Matrix
import org.apache.spark.mllib.linalg.{Matrix, Matrices}
Matrices.dense(3,2, Array(1,3,5,2,4,6)) 
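// Sparse Matrix (CSC format): numRows, numCols, column pointers, row indices, values
// Column j owns the entries colPtrs(j) until colPtrs(j+1), so 34 sits at (row 1, col 1) and 55 at (row 3, col 2)
val m = Matrices.sparse(5,4, Array(0,0,1,2,2), Array(1,3), Array(34,55))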
// Distributed Matrix
// - stored in RDDs
// - three types: RowMatrix, IndexedRowMatrix, CoordinateMatrix
// RowMatrix: rows are stored as an RDD[Vector]
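
The three arrays passed to `Matrices.sparse` follow the compressed sparse column (CSC) layout. The sketch below decodes such arrays into (row, column, value) triples in plain Scala, without Spark. Note that `cscEntries` is a hypothetical helper written for this illustration, not an MLlib function.

```scala
// Decode CSC arrays into (row, col, value) triples.
// colPtrs(j) until colPtrs(j + 1) is the slice of rowIndices/values that belongs to column j.
def cscEntries(colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double]): Seq[(Int, Int, Double)] =
    for {
        col <- 0 until colPtrs.length - 1
        k   <- colPtrs(col) until colPtrs(col + 1)
    } yield (rowIndices(k), col, values(k))

// Same arrays as in the sparse matrix example above
val entries = cscEntries(Array(0, 0, 1, 2, 2), Array(1, 3), Array(34.0, 55.0))
entries.foreach { case (r, c, v) => println(s"row $r, col $c -> $v") }
```

Columns 0 and 3 have equal consecutive pointers, so they contribute no entries.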

# NLP

In [None]:
// https://www.npmjs.com/package/opennlp

# Examples

### Map keys to words

In [None]:
val lines = sc.textFile("")
// missing

### key-value pairs

In [None]:
val pair = ('a','b')
pair._1 //will return 'a'
pair._2 //will return 'b'

### Count words

In [None]:
// textFile is assumed to be an RDD[String], e.g. the result of sc.textFile(...)
val wordCounts = textFile.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey((a,b) => a + b)

In [None]:
wordCounts.collect()

<a id='qna'></a>
[back to top](#top)

# Q & A

Are Scala Arrays mutable?
- Yes, the elements can be updated in place, but the length of an Array is fixed
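
A small sketch that illustrates the answer: elements are updated in place, while appending produces a new array. The variable names are illustrative.

```scala
val numbers = Array(1, 2, 3)
numbers(0) = 10 // update in place; the val still points to the same array
println(numbers.mkString(","))

// Appending returns a new array; the original keeps its length
val longer = numbers :+ 4
println(numbers.length)
println(longer.length)
```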