<a id='top'></a>
<img src="../img/scala-logo.png" width="200">

In [None]:
What is Scala?
Expressive
- First-class functions
- Closures
Concise
- Type inference
- Literal syntax for function creation
Java interoperability
- Can reuse Java libraries
- Can reuse Java tools
- No performance penalty
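
The points above can be made concrete in a few lines. This is only a minimal sketch; the names `twice`, `makeAdder`, and `addFive` are made up for illustration and do not appear elsewhere in this notebook.

```scala
// Type inference: Int and List[Int] are inferred without annotations.
val answer = 42
val numbers = List(1, 2, 3)

// Literal syntax for function creation: an anonymous function stored in a val.
val twice = (x: Int) => x * 2

// First-class functions: a function is passed as an argument like any value.
val doubled = numbers.map(twice)

// Closures: the returned function captures `n` from the enclosing scope.
def makeAdder(n: Int): Int => Int = (x: Int) => x + n
val addFive = makeAdder(5)

// Java interoperability: a Java class is used directly, without wrappers.
val javaList = new java.util.ArrayList[String]()
javaList.add("scala")

println(doubled)       // List(2, 4, 6)
println(addFive(10))   // 15
println(javaList.size) // 1
```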

- [First steps to Scala](http://www.artima.com/scalazine/articles/steps.html)
- [Brief intro](https://twitter.github.io/scala_school/basics.html#class)
- [ML in Scala](https://www.mapr.com/blog/apache-spark-machine-learning-tutorial)
- [ScalaNLP](http://www.scalanlp.org/)
- [KeystoneML](http://keystone-ml.org/)
- [Parallel collections overview](http://docs.scala-lang.org/overviews/parallel-collections/overview.html)

#### Table of contents
- [Orientation](#orientation)
- [Exploration](#exploration)
- [Testing](#testing)
- [MLlib](#mllib)

<a id='orientation'></a>

# Orientation

In [None]:
%%help
// get help (the magic must be on the first line of the cell)

In [None]:
// list the contents of a directory
import org.apache.hadoop.fs.{FileSystem, Path}

// `sc` is the SparkContext provided by the notebook kernel
FileSystem.get(sc.hadoopConfiguration).listStatus(new Path("hdfs:///")).foreach(x => println(x.getPath))

#### Expressions

In [None]:
1 + 1

#### Variables

In [None]:
var name = "steve"

#### Functions

In [None]:
def addOne(m: Int): Int = m + 1

In [None]:
// partial application: fix the first argument, leave the second open
def adder(m: Int, n: Int): Int = m + n
val add2 = adder(2, _: Int)
add2(3)

#### Class

In [None]:
class Calculator {
  val brand: String = "HP" // field
  def add(m: Int, n: Int): Int = m + n // method
}
val calc = new Calculator
calc.add(3, 4)
calc.brand

A constructor is not a special method: it is the code in the class body outside of any method definition, and it runs each time an instance is created:

In [None]:
class Calculator(brand: String) {
  /**
   * A constructor.
   */
  val color: String = if (brand == "TI") {
    "blue"
  } else if (brand == "HP") {
    "black"
  } else {
    "white"
  }

  // An instance method.
  def add(m: Int, n: Int): Int = m + n
}

In [None]:
val calc = new Calculator("HP")
calc.color

# Object

### object -> method -> argument

In [2]:
object HelloWorld {
    def main(args: Array[String]): Unit = {
        println("Hello, world!")
    }
}

defined object HelloWorld

In [3]:
HelloWorld.main(null)

Hello, world!




In [None]:
//Type of object

### main method (explicit entry point) or App (the object body runs as the program)

In [4]:
object HelloWorldApp extends App {
    println("Hi Max!")
}

defined object HelloWorldApp

In [5]:
HelloWorldApp.main(null)

Hi Max!




# Class

In [None]:
class Hello(message: String) {
    println(message) // the class body is the primary constructor
}
val hello = new Hello("Be curious!") // creates an instance of the class
hello.toString // string representation of the instance

In [None]:
// parameters passed to a class can be used inside its instances
// their types must be specified

In [None]:
// constructor parameters cannot be accessed from outside the instance unless declared with val:
class Hello(val message: String)
val hello = new Hello("Badabing")
hello.message

# Mutable and immutable fields (variables)

In [None]:
// Fields are accessible from outside the class

In [None]:
class Hello {
    val message: String = "Hello" // immutable field
}
(new Hello).message

In [None]:
class Hello {
    var message: String = "Hello" // mutable field
}
val hello = new Hello
hello.message = "Primabello!" // can change mutable field

### immutable

In [None]:
val nameOfPope: String = "Luigi" // a val cannot be reassigned: immutable

- thread safe
- use as default

### mutable

In [None]:
var nameOfDuke: String = "Parcival" // a var can be reassigned: mutable

- useful when a value has to change over time
- requires due diligence
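
The difference between the two kinds of variables also shows up in collections. The following is a small sketch of that contrast; the extra names (`"Mario"`, `"Lancelot"`) are made-up sample values.

```scala
import scala.collection.mutable.ListBuffer

// A val cannot be reassigned; "changing" an immutable List yields a new List.
val popes = List("Luigi")
val morePopes = popes :+ "Mario" // popes itself is untouched

// A var can be reassigned to a new value.
var duke = "Parcival"
duke = "Lancelot"

// A mutable collection is changed in place, even when it is held in a val.
val dukes = ListBuffer("Parcival")
dukes += "Lancelot"

println(popes)     // List(Luigi)
println(morePopes) // List(Luigi, Mario)
println(duke)      // Lancelot
println(dukes)     // ListBuffer(Parcival, Lancelot)
```

Because `popes` can never change, it can be shared between threads without locking, which is one reason to prefer `val` and immutable collections by default.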

<a id='exploration'></a>
[back to top](#top)

# Exploration

In [None]:
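// summary
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

// `observations` is assumed to be an RDD[Vector] defined earlier
val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
summary.mean
summary.variance
summary.numNonzeros
summary.normL1
summary.normL2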

In [None]:
// correlation
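
The correlation cell is still empty. On RDDs, MLlib provides `Statistics.corr` for this purpose. As a local, non-distributed sketch of what a Pearson correlation computes, the following plain Scala function may help; `pearson` is a hypothetical helper written for this illustration, not part of MLlib.

```scala
// Pearson correlation of two equally long sequences (local sketch, no Spark).
def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
  require(xs.nonEmpty && xs.length == ys.length, "inputs must be non-empty and equally long")
  val n = xs.length
  val meanX = xs.sum / n
  val meanY = ys.sum / n
  // Covariance and variances, each scaled by the same constant factor n.
  val cov = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
  val varX = xs.map(x => math.pow(x - meanX, 2)).sum
  val varY = ys.map(y => math.pow(y - meanY, 2)).sum
  cov / math.sqrt(varX * varY)
}

val xs = Seq(1.0, 2.0, 3.0, 4.0)
val perfectlyCorrelated = pearson(xs, xs.map(x => 2 * x + 1)) // linear, rising
val antiCorrelated = pearson(xs, xs.map(x => -x))             // linear, falling
```

A rising linear relationship gives a coefficient of 1 and a falling one gives -1 (up to floating-point rounding).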


#### Create random data

In [None]:
import org.apache.spark.mllib.random.RandomRDDs._
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
val data = normalVectorRDD(sc, numRows = 10000L, numCols = 3, numPartitions = 10)
val stats: MultivariateStatisticalSummary = Statistics.colStats(data)
stats.mean
stats.variance


<a id='testing'></a>
[back to top](#top)

# Testing

### Scalatest

In [None]:
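// String test
// FlatSpec is available in ScalaTest 3.0 and earlier; newer releases use org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.FlatSpec
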
class StackSpec extends FlatSpec {
  "A text" should "be the same text" in {
    val msg: String = "Muchas problemas"
    assert(msg === "Muchas problemas")
  }
}

<a id='mllib'></a>
[back to top](#top)

# MLlib

In [None]:
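// Vectors:
import org.apache.spark.mllib.linalg.{Vector, Vectors} // Scala has its own Vector; this import takes precedence
Vectors.dense(33.0, 0.0, 55.0) // dense vector
Vectors.sparse(3, Array(0, 2), Array(44.0, 55.0)) // sparse vector: size, indices, values
Vectors.sparse(3, Seq((0, 44.0), (2, 55.0))) // same vector from (index, value) pairs

// Labeled Points:
// binary classification labels must be 0 or 1; multiclass labels must be 0, 1, 2, ...
import org.apache.spark.mllib.regression.LabeledPoint
LabeledPoint(1.0, Vectors.dense(44.0, 0.0, 55.0))
LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(44.0, 55.0)))

// Matrices
// Dense Matrix: values are given in column-major order
import org.apache.spark.mllib.linalg.{Matrix, Matrices}
Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
// Sparse Matrix in compressed sparse column (CSC) format:
// arguments are numRows, numCols, colPtrs (length numCols + 1), rowIndices, values.
// Column j holds the entries at positions colPtrs(j) until colPtrs(j + 1),
// so here 34.0 sits at (row 1, column 1) and 55.0 at (row 3, column 2).
val m = Matrices.sparse(5, 4, Array(0, 0, 1, 2, 2), Array(1, 3), Array(34.0, 55.0))
// Distributed Matrix
// - stored in RDDs
// - three types: RowMatrix, IndexedRowMatrix, CoordinateMatrix
// RowMatrix: backed by an RDD[Vector], one local vector per row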
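
The three arrays passed to `Matrices.sparse` are easier to follow when expanded by hand. The sketch below decodes the same compressed sparse column (CSC) arguments into a dense array using plain Scala; `cscToDense` is a hypothetical helper written for this illustration, not an MLlib function.

```scala
// Expand a CSC-encoded sparse matrix into dense rows to show how
// colPtrs, rowIndices and values fit together.
def cscToDense(numRows: Int, numCols: Int,
               colPtrs: Array[Int], rowIndices: Array[Int],
               values: Array[Double]): Array[Array[Double]] = {
  val dense = Array.fill(numRows, numCols)(0.0)
  for (col <- 0 until numCols) {
    // Entries of column `col` live at positions colPtrs(col) until colPtrs(col + 1).
    for (k <- colPtrs(col) until colPtrs(col + 1)) {
      dense(rowIndices(k))(col) = values(k)
    }
  }
  dense
}

// Same arguments as the sparse matrix `m` in the cell above.
val dense = cscToDense(5, 4, Array(0, 0, 1, 2, 2), Array(1, 3), Array(34.0, 55.0))
dense.foreach(row => println(row.mkString(" ")))
// 0.0 0.0 0.0 0.0
// 0.0 34.0 0.0 0.0
// 0.0 0.0 0.0 0.0
// 0.0 0.0 55.0 0.0
// 0.0 0.0 0.0 0.0
```

Columns 0 and 3 are empty because their start and end pointers are equal (`0, 0` and `2, 2`).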

# NLP

In [None]:
// OpenNLP (npm package): https://www.npmjs.com/package/opennlp

# Examples

### Map keys to words

In [None]:
val lines = sc.textFile("")
// missing

### key-value pairs

In [None]:
val pair = ('a','b')
pair._1 //will return 'a'
pair._2 //will return 'b'
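
Besides `_1` and `_2`, there are two other common ways of working with pairs. The sketch below uses made-up sample values; `entry`, `word`, `count`, and `counts` are illustrative names only.

```scala
// `->` is alternative syntax for building a two-element tuple.
val entry = "spark" -> 3

// Pattern matching names both elements at once instead of using _1 and _2.
val (word, count) = entry

// A Map is a collection of key-value pairs.
val counts = Map(entry, "scala" -> 5)

println(word)            // spark
println(count)           // 3
println(counts("scala")) // 5
```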

### Count words

In [None]:
// `lines` is the RDD[String] loaded with sc.textFile above
val wordCounts = lines.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey((a, b) => a + b)

In [None]:
wordCounts.collect()
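
The same pipeline can be tried without a cluster on an ordinary Scala collection. This is only a local sketch with made-up sample lines; since plain collections have no `reduceByKey`, grouping by the word and summing the counts stands in for it.

```scala
// Local stand-in for the lines of a text file (made-up sample data).
val sampleLines = Seq("to be or not to be", "to do")

val localCounts: Map[String, Int] = sampleLines
  .flatMap(line => line.split(" "))   // split every line into words
  .map(word => (word, 1))             // pair each word with a count of 1
  .groupBy { case (word, _) => word } // collect the pairs that share a word
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word

println(localCounts("to")) // 3
println(localCounts("be")) // 2
```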