Big Data Machine Learning, Distributed Machine Learning, Parallel Machine Learning with C++, CUDA, Scala, Spark
=============

Ron Wu
-------------

10/28/16

Reference: free courses from the creators of
<br>NVIDIA https://developer.nvidia.com/udacity-cs344-intro-parallel-programming<br>
Scala http://www.scala-lang.org/blog/2016/05/23/scala-moocs-specialization-launched.html<br>
Spark, Databricks https://databricks.com/blog/2016/06/01/databricks-to-launch-first-of-five-free-big-data-courses-on-apache-spark.html <br> 
### Contents


1. <a href =#scala>Functional and Object-Oriented Scala</a>
     - <a href=#scalab>Scala Foundation</a>
     
         - <a href =#fun>First Class Functions</a>
         - <a href=#cl>Classes and the Three Pillars</a>  
         - <a href=#obj>Objects Everywhere</a> 
         - <a href =#collect>Collections</a> 
     - <a href = #funDe>Functional Design in Scala</a> 
     
         - <a href=#for>For Expression</a>
         - <a href=#stream>Stream & Lazy Evaluation</a>
         - <a href=#state>Mutable State</a>
         - <a href=#event>Event Handling</a>
         
     - <a href = #pppro>Parallel Programming in Scala</a> 
     - <a href = #scalaspark>Scala with Spark</a>  
<br><br>
2. <a href=#Spark>Big Data, Distributed Analysis with Spark</a>
     - <a href =#entropy>Spark with Time Series: spark-ts</a> 
     - <a href =#entropy>Spark with TensorFlow: TensorFrames</a> 
     - <a href =#entropy>Other Spark Ecosystem</a> 
<br><br>
3. <a href =#cuda>Parallel with CUDA, C++</a>
     - <a href=#worked>Parallel Computing</a> 
     - <a href=#worked>GPU Programming</a> 
<br><br>
 

<a name = 'scala'></a>
# Functional and Object-Oriented Scala

![](https://www.scientiamobile.com/images/icons/scala.gif)


<a name = 'scalab'></a>
## Scala Foundation


The code in this section is mostly copied from Martin Odersky's course on Coursera https://www.coursera.org/learn/progfun1/

For an easier transition to parallel programming, throughout this notebook I will try to use functional style over imperative style whenever possible. That is because functional code works with immutable values: once a value is set, it cannot be changed, so parallel tasks can share it without locks, which avoids a lot of race conditions and deadlocks. 

A loop and a recursive function are closer than they look. The Scala compiler translates a `for` loop into higher-order function calls (`foreach`, `map`, `flatMap`), and it compiles a tail-recursive function into a loop, with the changing variables passed along as arguments. Thus writing an iteration as a tail recursion costs nothing extra.  
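
As a small illustration of the two styles (a sketch only; `sumToLoop` and `sumToRec` are names made up for this note, not taken from the course), the same sum can be written with a mutable loop or with a tail-recursive function whose arguments carry the loop state. The `@tailrec` annotation asks the compiler to reject the definition if it cannot be run in constant stack space.

```scala
import scala.annotation.tailrec

// Imperative version: a mutable accumulator updated inside a while loop
def sumToLoop(n: Int): Int = {
  var acc = 0
  var i = 1
  while (i <= n) { acc += i; i += 1 }
  acc
}

// Functional version: the loop state (n, acc) travels through the arguments
@tailrec
def sumToRec(n: Int, acc: Int = 0): Int =
  if (n <= 0) acc else sumToRec(n - 1, acc + n)

println(sumToLoop(100))   // 5050
println(sumToRec(100))    // 5050
```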

<a name='fun'></a>
### First Class Functions
 

In [1]:
// this definition is fine: the body of a def is evaluated only when it is called
def loop: Int = loop 

// Do not run: a val evaluates its right-hand side immediately, so this never terminates
//val l = loop
    
def square(x: Int) = x*x

def squareFirstEle(x: Int, y: => Int) = square(x)
// => makes y call-by-name; y is never used in the body, so passing in loop is okay
squareFirstEle(2,loop)

4
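
To make the difference between the two evaluation strategies visible, here is a minimal sketch that counts how often an argument expression is evaluated. The helper names (`tick`, `twiceByValue`, `twiceByName`, `ignoreByName`) are hypothetical, and the mutable counter is used only for the demonstration.

```scala
var evaluations = 0                          // counts how often the argument expression runs
def tick(): Int = { evaluations += 1; 21 }

def twiceByValue(x: Int): Int = x + x        // argument evaluated once, before the call
def twiceByName(x: => Int): Int = x + x      // argument evaluated at every use of x
def ignoreByName(x: => Int): Int = 0         // argument never evaluated

val a = twiceByValue(tick())
println(evaluations)                         // 1
val b = twiceByName(tick())
println(evaluations)                         // 3
val c = ignoreByName(tick())
println(evaluations)                         // 3
```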

In [2]:
// Newton's method (a fixed-point iteration) to find a square root

def sqrt(x: Double) = {
        
    def sqrtIter(guess: Double): Double =
        if (math.abs(guess*guess-x) / x < 0.01) guess
        else sqrtIter((guess + x / guess) / 2) 

    sqrtIter(1.0)
}

sqrt(4.5)

2.1224976448422046

In [3]:
//  Higher-order function  

def sum_v1(f: Int => Int, a: Int, b: Int): Int = {
    if (a > b) 0
    else sum_v1(f, a+1, b) + f(a)
}


// find the sum of the squares from 1 to 5 
sum_v1(square, 1, 5)

55

In [4]:
// rewrite the above as a tail recursion, so that it does not build up the stack
    
def sum_v2(f: Int => Int, a: Int, b: Int): Int = {
    def loop(a: Int, acc: Int): Int = {
        if (a > b) acc
        else loop(a+1, acc + f(a))
    }
    loop(a, 0)
}

sum_v2(square, 1, 5)

55

In [5]:
// sum_v3 is curried: it takes f and returns a function of (a, b)
    
def sum_v3(f: Int => Int): (Int, Int) => Int =  {
        def sumF(a: Int, b: Int): Int = { 
            if (a>b) 0
            else sumF(a+1,b) + f(a)
        }
        sumF
}

// passing in an anonymous function
sum_v3(x => x*x)(1, 5) 

55
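
Scala also has a shorter syntax for functions that return functions: multiple parameter lists. The sketch below is an assumed variant (named `sum_v4` here to follow the numbering above) that behaves like `sum_v3` without the inner helper.

```scala
// f is fixed by the first parameter list, the bounds by the second
def sum_v4(f: Int => Int)(a: Int, b: Int): Int =
  if (a > b) 0 else f(a) + sum_v4(f)(a + 1, b)

println(sum_v4(x => x * x)(1, 5))        // 55

// supplying only the first parameter list gives back a function of (a, b)
val sumCubes: (Int, Int) => Int = sum_v4(x => x * x * x)
println(sumCubes(1, 3))                  // 1 + 8 + 27 = 36
```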

<a name='cl'></a>
### Classes and the Three Pillars 

In [6]:
//
// Encapsulation
//

class Rational(x: Int, y: Int)  {
    require( y != 0 , "denominator not to be zero")

    //second constructor
    def this(x: Int) = this(x, 1)

    private def gcd(a:Int, b:Int) : Int =
    // Euclid's algorithm
        if (b==0) a
        else gcd(b, a%b)
    private val g = math.abs(gcd(x, y))

    val numer = x / g
    val denom = y / g


    // test the reduced denominator, not the raw argument y, so that 2/2 prints as 1
    override def toString =
        if (denom != 1) this.numer + "/" + this.denom
        else (this.numer).toString

    // in many languages + cannot be an identifier, but in Scala it is legal
    def + (that: Rational) =
        new Rational(this.numer*that.denom+this.denom*that.numer, this.denom*that.denom)

    def unary_- = new Rational(-numer, denom)

}

val r = new Rational(1,2)
println("r rational: " + r.toString)


val s = new Rational(3,4)

// a prefix minus (written right before the object) is defined by the method unary_-
println("negative of r rational: "+(-r).toString)

// this is the same as (r.+(s)).toString
println("sum of rationals r & s: " + (r + s).toString)

val t = new Rational(2)
println("integer t: "+t.toString)

r rational: 1/2
negative of r rational: -1/2
sum of rationals r & s: 5/4
integer t: 2


In [7]:
//
//  inheritance
//

abstract class Node{
    def insert(x: Int) : Node
    def search(x: Int) : Boolean
}

// Like Java, Scala supports only single class inheritance. To combine several, mix in traits using `with`

// the primary constructor takes 3 arguments
class NonEmpty(elem: Int, left: Node, right: Node)  extends Node{


    def insert(x: Int) : Node = {
        if ( x < elem ) new NonEmpty(elem, left insert x , right)
        else if (x > elem) new NonEmpty(elem, left, right insert x)
        else this
    }

    def search(x: Int) : Boolean = {
        if ( x < elem ) left search x
        else if ( x > elem ) right search x
        else true
    }
    
    // left and right are Nodes, so their overridden toString (from NonEmpty or Empty) is called
    override def toString = "{" + left + elem + right + "}"
}    

class Empty extends Node{

    def insert(x: Int) : Node = new NonEmpty(x, new Empty, new Empty)
    def search(x: Int) : Boolean = false
    override def toString = "-"
}

val tree1 = new NonEmpty(3, new Empty, new  Empty)

// a copy of tree1 with 4 inserted
val tree2 = tree1 insert 4  

println(tree1.toString)
println(tree2.toString)

{-3-}
{-3{-4-}}
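
The comment above mentions traits as the way around single inheritance. A minimal sketch of a class mixing in two traits, with hypothetical names (`Shape`, `Named`, `Square`) that do not come from the course:

```scala
trait Shape {
  def area: Double                          // abstract: each shape must define it
}

trait Named {
  def name: String                          // abstract
  def describe: String = "I am a " + name   // concrete method shared by every class mixing in Named
}

// at most one superclass, but any number of traits via `with`
class Square(side: Double) extends Shape with Named {
  def area: Double = side * side
  def name: String = "square"
}

val sq = new Square(2.0)
println(sq.describe)   // I am a square
println(sq.area)       // 4.0
```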


In [4]:
//
//  Polymorphism
//

import java.util.NoSuchElementException 

//type parameter
trait Node[T]{
    def isEmpty: Boolean
    def data: T
    def pt: Node[T]
}


// the primary constructor takes 2 arguments
// declaring them with val also defines the 2 corresponding fields of the class,
// which implement data and pt from the trait
class Cons[T](val data: T, val pt: Node[T]) extends Node[T]{
    def isEmpty = false
}

class Nil[T] extends Node[T]{
    def isEmpty = true
    def data : Nothing = throw new NoSuchElementException("Nil.data")
    def pt: Nothing = throw new NoSuchElementException("Nil.pt")
}

def singleton[T](elem:T) = new Cons[T](elem, new Nil[T])

println(singleton[Double](1.1).data)


// this will throw an exception
// println(singleton[Double](1.1).pt.data)


// the type parameter can be omitted here because the compiler infers it
val ll = new Cons(3, new Cons(2, new Cons(1, new Nil)))

println(ll.data)
println(ll.pt.data)
println(ll.pt.pt.data) 

1.1
3
2
1


<a name ='ooscala'></a> 
## Object-Oriented Aspect of Scala


<a name ='obj'></a>
### Objects Everywhere

In [1]:
/*
    this cell can only run once, because it defines a singleton object
*/

// This shows that a natural number (an unsigned Int), normally a primitive type, can be represented by class objects
// This is the same idea as in set theory, where one constructs every natural number
// starting from the empty set

abstract class Nat {
    def isZero: Boolean
    def predecessor: Nat
    def successor: Nat
    def + (that: Nat): Nat
    def - (that: Nat): Nat 
    override def toString: String
}


class Succ(n: Nat) extends Nat{
    def isZero = false
    def predecessor = n
    def successor = new Succ(this)
    def +(that: Nat) = new Succ(n + that)
    def -(that: Nat) = if (that.isZero) this else n - that.predecessor

    override def toString: String = "I" + n
}

//hence zero is unique
object Zero extends Nat{
    def isZero = true
    def predecessor = throw new Error("0 has no pred")
    def successor = new Succ(this)
    def +(that: Nat) = that
    def -(that: Nat) = if (that.isZero) this else throw new Error("negative number")

    override def toString: String = ""
}


def NextNat = new Succ(Zero)
val one = NextNat
val two = one.successor
val three = two.successor
println(three)
println(three-one)

III
II


In [5]:
//
// Functions as objects too
//

// a function of 1 argument, with type parameters A (argument type) and B (result type)
trait Function1[A,B]{
    def apply(x : A) : B
}


def f =  {
    // anonymous function 
    // (x: Int) => x*x
    // the name AnonFun only exists inside of the block
    class AnonFun extends Function1[Int, Int] {
        def apply(x: Int) = x*x
    }
    new AnonFun
}

f(2)

4
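
The same holds for the standard library: a function literal is shorthand for an instance of `scala.Function1`, and calling `f(x)` is sugar for `f.apply(x)`. A short sketch (the value names are made up for this note):

```scala
// a function literal ...
val squareLiteral: Int => Int = x => x * x

// ... and the equivalent explicit object of the standard scala.Function1 trait
val squareObject: Int => Int = new scala.Function1[Int, Int] {
  def apply(x: Int): Int = x * x
}

println(squareLiteral(3))          // 9
println(squareObject.apply(3))     // 9

// because functions are objects, they have methods of their own
val plusOneThenSquare = squareObject.compose((x: Int) => x + 1)
println(plusOneThenSquare(2))      // (2 + 1) * (2 + 1) = 9
```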

#### Covariance, Invariance and Contravariance


In [1]:

/*

Let us write "type A is derived from type B" as

    A < B 
    
Let f be a type transformation, e.g. A -> Array[A], or a generic type such as A -> List<A>.

Covariance 

    A < B => f(A) < f(B)

Contravariance

    A < B => f(A) > f(B)
    
Invariance

    Neither of the above.
    
In Java, arrays are covariant and generics are invariant.

In Scala, arrays and user-defined generic classes are both invariant by default.

In Java one can get co/contravariance at the use site with wildcards; in Scala one
declares it on the class with the + / - annotations shown at the end of this cell.


That is, in Java
 
    B[] array_b = new A[1]; 

is allowed, while ArrayList<B> list_b = new ArrayList<A>(); is not allowed.

The array behind array_b is still an A[], so it can only hold A objects. Storing anything else fails at runtime.


For single references there is no problem:

        
    B b = new A();  // both assignments are fine
    
    b = new B();
        
    

But if you put them in an array

    B[] array_b = new A[1];  
    
    array_b[0] = new B();     // this passes the compiler, but throws ArrayStoreException at runtime
    
    // what one should do is
    
    array_b[0] = new A(); 
    
    // or
    
    array_b[0] = (B)new A(); 


Wildcards make the compiler aware of the problem
    
    ArrayList<? extends B> list_b = new ArrayList<A>();
    
    ArrayList<? super A> list_a = new ArrayList<B>();
    

Now the types are checked at compile time, because the compiler knows that

    elements of list_b are instances of some subclass of B (or of B itself), hence any method 
    of B can be safely invoked on elements read from list_b

    the element type of list_a is some superclass of A (or A itself), hence an A 
    (or an instance of any subclass of A) can be safely added to list_a. 


So unlike before

    list_b.add(new B()); // does not compile. In fact the compiler won't allow adding anything except null
    
    // the legitimate way is to copy
    
    // first create something
    ArrayList<A> someList = new ArrayList<A>();
    someList.add(new A());
        
    // then
    ArrayList<? extends B> list_b = new ArrayList<A>(someList);
    list_b.get(0); // statically typed as B; the object itself is still an A

    // for the super list
    ArrayList<? super A> list_a = new ArrayList<B>();
    list_a.add(new A());
    
    
*/

// In Scala, put + (covariant) or - (contravariant) before the type parameter; without either it is invariant
    

class Covariance_List[+T]{}
class Controvariace_List[-T]{} 

class B {}
class A extends B {} 

object main extends App{
    val list_B : Covariance_List[B] = new Covariance_List[A] 
    val list_A : Controvariace_List[A] = new Controvariace_List[B]  
}
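
To see what the annotations buy in practice, here is a hedged sketch with made-up classes (`Animal`, `Cat`, `Box`, `Printer`): a read-only container is naturally covariant, and a consumer is naturally contravariant.

```scala
class Animal { def name: String = "animal" }
class Cat extends Animal { override def name: String = "cat" }

// Box only produces T values, so it can be covariant
class Box[+T](val value: T)

// Printer only consumes T values, so it can be contravariant
trait Printer[-T] { def show(x: T): String }

val catBox: Box[Cat] = new Box(new Cat)
val animalBox: Box[Animal] = catBox           // allowed because Box is covariant

val animalPrinter: Printer[Animal] = new Printer[Animal] {
  def show(x: Animal): String = "a " + x.name
}
val catPrinter: Printer[Cat] = animalPrinter  // allowed because Printer is contravariant

println(animalBox.value.name)                 // cat
println(catPrinter.show(new Cat))             // a cat
```

The compiler enforces the roles: a covariant type parameter may not appear as a method parameter type, which is what rules out the Java-style `ArrayStoreException` at compile time.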


<a name ='collect'></a>
## Collections

Lists

In [27]:
//
// List, Immutable
//

val list1 = List(1,2,3)
val list2 = List(4,5,6) 
println(list1.length)
println(list1.take(2))
println(list1(0))

//same as list1 ++ list2, 
println(list1 ::: list2)

// :: prepends one element. Operators ending in : are methods of their right operand,
// so list1.::(list2) is list2 :: list1 and puts list2, as a single element, in front of list1
println(list1.::( list2) )

3
List(1, 2)
1
List(1, 2, 3, 4, 5, 6)
List(List(4, 5, 6), 1, 2, 3)


In [28]:
//
// implement concatenation

def concat[T](xs: List[T], ys: List[T]): List[T] = xs match {
    case List() => ys
    case z :: zs => z :: concat(zs, ys)    
}

concat(list1, list2)

List(1, 2, 3, 4, 5, 6)

In [29]:
// implement removeAt

def removeAt[T](n: Int, x: List[T]) = {
    (x take n) ::: (x drop n+1)
}

removeAt(1, list1)

List(1, 3)

In [40]:
// insertion sort


def isort(xs: List[Int]): List[Int] = xs match {
    case List() => List()
    case y :: ys => {
        def insert(x1: Int, xs1: List[Int]): List[Int] = xs1 match {
            case List() => List(x1)
            case y1 :: ys1 => if (x1 < y1) x1 :: xs1 else y1 :: insert(x1, ys1)
        }
        insert(y, isort(ys))
    }
}

isort(List(5, 30, 10, 1, 15))

List(1, 5, 10, 15, 30)

In [46]:
//
// merge sort
//
 
def msort(xs: List[Int]) : List[Int] = {

    val n = xs.length / 2
    if (n==0) xs
    else {
        def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
            case (Nil, ys) => ys
            case (xs, Nil) => xs
            case (x :: xs1, y:: ys1) => { 
                
                if (x < y) x :: merge(xs1, ys)
                else y :: merge(xs, ys1)
            }
             
        } 
        val (fst , snd ) = xs splitAt n
        merge(msort(fst), msort(snd))
    }
}

msort(List(5, 30, 10, 1, 15))



List(1, 5, 10, 15, 30)

In [49]:
// merge sort with type parameter, user-defined comparison


def msort[T](xs: List[T])(lt: (T,T) => Boolean) : List[T] = {

        val n = xs.length / 2
        if (n==0) xs
        else {
            def merge(xs: List[T], ys: List[T]): List[T] = (xs, ys) match {
                case (Nil, _) => ys
                case (_, Nil) => xs
                case (x :: xs1, y:: ys1) => {

                    if (lt(x,y)) x :: merge(xs1, ys)
                    else y :: merge(xs, ys1)
                }

            }

            val (fst , snd ) = xs splitAt n
            merge(msort(fst)(lt), msort(snd)(lt))
        }
    }


msort(List(5, 30, 10, 1, 15))((x, y) => x < y)

List(1, 5, 10, 15, 30)

In [9]:
// merge sort with a type parameter, using the default ordering
// marking the parameter implicit asks the compiler to fill in the missing argument

import math.Ordering 

def msort[T](xs: List[T])(implicit ord: Ordering[T]) : List[T] = {

    val n = xs.length / 2
    if (n==0) xs
    else {
        def merge(xs: List[T], ys: List[T]) : List[T] = (xs, ys) match {
            case (Nil, _) => ys
            case (_, Nil) => xs
            case (x :: xs1, y:: ys1) => {

                if (ord.lt(x,y)) x :: merge(xs1, ys)
                else y :: merge(xs, ys1)
            }

        }

        val (fst , snd ) = xs splitAt n
        merge(msort(fst), msort(snd)) 
        //merge(msort(fst)(ord), msort(snd)(ord)) 
    }
}


//without implicit, we need to
//msort(List(5, 30, 10, 1, 15))(Ordering.Int) 

msort(List(5, 30, 10, 1, 15))

List(1, 5, 10, 15, 30)
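
The same mechanism works for user-defined types: put an implicit `Ordering` in scope and the compiler passes it along. A minimal sketch, assuming a hypothetical `Person` class and helper `largest` that are not part of the course code:

```scala
import scala.math.Ordering

case class Person(name: String, age: Int)

// generic helper that needs an ordering but lets the compiler supply it
def largest[T](xs: List[T])(implicit ord: Ordering[T]): T =
  xs.reduceLeft((a, b) => if (ord.gteq(a, b)) a else b)

// the ordering the compiler will find for Person
implicit val byAge: Ordering[Person] = Ordering.by((p: Person) => p.age)

val people = List(Person("Ann", 31), Person("Bob", 25), Person("Cy", 40))

println(largest(List(5, 30, 10)))              // 30, uses the built-in Ordering.Int
println(largest(people).name)                  // Cy, uses byAge implicitly
println(largest(people)(byAge.reverse).name)   // Bob, ordering passed explicitly
println(people.sorted.map(_.name))             // List(Bob, Ann, Cy)
```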

In [20]:
//  functions on list, using map, filter ...
// using them wisely is important, because we don't want to write any loops.


def rescale(xs: List[Int], factor :Int) = {
    xs map (x => x * factor)   // List.map is implemented with a loop, so it does not grow the stack
}

val l = List(1, 3, 5, 2, 4)
println(rescale(l, 5))

// or

println(l map (x => x * 3)) 

println(l filter (x => x < 3))
println(l filterNot (x => x < 3))
println(l partition (x => x <3))  // partition combines filter and filterNot
  

println(l takeWhile (x => x< 3))  // the longest prefix whose elements all satisfy the predicate 
println(l dropWhile (x => x< 3))  // the rest of the list after that prefix
println(l span (x => x< 3))       // span combines takeWhile and dropWhile

List(5, 15, 25, 10, 20)
List(3, 9, 15, 6, 12)
List(1, 2)
List(3, 5, 4)
(List(1, 2),List(3, 5, 4))
List(1)
List(3, 5, 2, 4)
(List(1),List(3, 5, 2, 4))


In [24]:
// implement run-length encoding, with no loops

def encode [T](xs: List[T]) = {

        def pack(ls: List[T]) : List[List[T]] =  ls match{

            case List() => List()
            case x :: xs1 => {
                val (fst, scn) = ls span (y => y == x)
                fst :: pack(scn)
            }
        }
        pack(xs) map ( x => (x.head, x.length))
    }

 

val data = List("a", "a", "a", "b", "b", "c", "a")

encode(data)

List((a,3), (b,2), (c,1), (a,1))

In [31]:
// reduceLeft combines the elements from left to right, 
// so the call below computes
//      
//     (( 1 + 2 ) + 3 ) + 4
//
// it is implemented iteratively, so it does not grow the stack
List(1,2,3,4).reduceLeft( _ + _ ) //  _ + _  is the same as (x, y) => x + y 

10

In [32]:
// reduceRight combines the elements from right to left, 
// so the call below computes
//      
//    1 + ( 2 + ( 3 + 4 ) )
//
List(1,2,3,4).reduceRight( _ + _ )

10

In [84]:
// foldLeft

val z = 0

println((List(1,2,3,4) foldLeft z )( _ + _ )) 

// foldLeft is iterative and uses z as the initial accumulator; the value z itself does not change

println(z)

10
0
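
With addition the direction of the fold does not matter. A quick sketch with a non-commutative operation shows how `foldLeft` and `foldRight` differ (the value name `xs` is just for this example):

```scala
val xs = List(1, 2, 3, 4)

println(xs.foldLeft(0)(_ - _))    // ((((0 - 1) - 2) - 3) - 4) = -10
println(xs.foldRight(0)(_ - _))   // 1 - (2 - (3 - (4 - 0))) = -2

// building a list: foldLeft reverses, foldRight keeps the order
println(xs.foldLeft(List.empty[Int])((acc, x) => x :: acc))   // List(4, 3, 2, 1)
println(xs.foldRight(List.empty[Int])((x, acc) => x :: acc))  // List(1, 2, 3, 4)
```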


In [65]:

// List has sortWith (custom comparison) and sorted (default ordering)

val fruit = List("apple","pineapple", "pear", "banana")

println(fruit sortWith(_.length < _.length))
println(fruit sorted)

List(pear, apple, banana, pineapple)
List(apple, banana, pear, pineapple)


In [None]:
// Iterable
//   Seq
//     List   -> linked list
//     Vector -> very shallow tree, each node has up to 32 children 
//     Range
//     (Array and String are not subclasses of Seq, but support the same operations)
//   Set 
//   Map

In [26]:
// example dot product

val v1 = Vector(1,2,3,4,5)
val v2 = Vector(1,2,3,4,5)

def dotProd(v1: Vector[Int], v2: Vector[Int]) = {
    (v1 zip v2).map(xy => xy._1 * xy._2).sum 
}

dotProd(v1, v2)

55

In [29]:
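// a Range is a Seq

// n > 1 excludes 0, 1 and negative numbers, for which the forall alone would return true
def isPrime(n: Int)  = n > 1 && ((2 until n) forall (d => n%d != 0))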

isPrime(19)

true

In [33]:

// double loops in scala

((1 until 4) map (i => (1 until 4 ) map (j => (i,j)))).flatten

Vector((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3))

In [34]:
// or 

(1 until 4) flatMap (i => (1 until 4 ) map (j => (i,j)))

Vector((1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3))

In [35]:
// a simpler way is to use a for-expression

// find pairs summing to a prime
 
for {                 // curly braces let us omit the semicolons between generators
    i <- 1 until 4    // generator
    j <- 1 until i    
    if isPrime(i+j)   // filter on i, j 
} yield (i,j)

Vector((2,1), (3,2))

In [42]:
// same as above, but much more convoluted

                                        //  the case is pattern matching, 
                                        //   the full expression is  filter (e=>e match { case ...
(1 until 4) flatMap (i => (1 until i ) map (j => (i,j))) filter ( {case (x,y)=> isPrime(x+y)} )


Vector((2,1), (3,2))
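
For reference, the compiler's own translation of a for-expression with a filter uses `withFilter`, which filters lazily instead of building an intermediate collection. A sketch comparing the two forms (the names `viaFor` and `desugared` are made up here, and `isPrime` is repeated so the snippet stands alone):

```scala
def isPrime(n: Int): Boolean = n > 1 && (2 until n).forall(d => n % d != 0)

// the for-expression
val viaFor = for {
  i <- 1 until 4
  j <- 1 until i
  if isPrime(i + j)
} yield (i, j)

// roughly what it is translated to
val desugared =
  (1 until 4).flatMap(i =>
    (1 until i).withFilter(j => isPrime(i + j)).map(j => (i, j)))

println(viaFor)                 // Vector((2,1), (3,2))
println(viaFor == desugared)    // true
```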

In [43]:
// rewrite the dot product using a for-expression

val v1 = Vector(1,2,3,4,5)
val v2 = Vector(1,2,3,4,5)

def dotProd(v1: Vector[Int], v2: Vector[Int]) = {
    (for ( (i,j) <- (v1 zip v2)) yield i*j).sum
}

dotProd(v1, v2)

55

In [54]:
 
//
// Set: not a Seq, unordered, no duplicates
//

// 8-queens

// the following code is copied from Martin Odersky's course
// Compare it to my code http://nbviewer.jupyter.org/github/ronnnwu/codetheInterviewExercises/blob/master/Ch8.ipynb
// problem 8.12 

// Scala's immutable List makes the recursion very natural, and the higher-order list functions 
// and for-expressions really helped too

// In Python I had to make copies and write loops over loops. Scala is faster too.

def queens(n: Int) : Set[List[Int]] ={ 
    
    //place queen at row k
    def placeQueens(k: Int) : Set[List[Int]] = {
        if (k==0) Set(List())
        else {
            for {
                queens <- placeQueens(k-1)
                col <- 0 until n
                if isSafe(col, queens)

            } yield col :: queens  
            //output e.g. List(1,5,3) means queens at row 0 col 3, row 1 col 5, row 2 col 1 
            //hence as k increases it appends queens to the front
        }
    }
    
    def isSafe(col: Int, queens: List[Int]): Boolean = {
        val row  = queens.length
        // pair each placed queen with its row; the list holds the most recent row first
        val queensWithRow = (row - 1 to 0 by -1) zip queens
        queensWithRow forall {
            case (r,c) => col != c && math.abs(col-c) != row - r // check no col used, no diagonal match
        }
    }
    
    placeQueens(n)
}

 
println(queens(8).size) 

def prettyShow(queens: List[Int]) = {
    val lines = {
        for (col <- queens.reverse)
            yield Vector.fill(queens.length)("* ").updated(col, "X ").mkString

    }
    print("\n\n" + (lines mkString "\n"))
}

queens(8) take 3 map prettyShow

92


* * * * * X * * 
* * * X * * * * 
* X * * * * * * 
* * * * * * * X 
* * * * X * * * 
* * * * * * X * 
X * * * * * * * 
* * X * * * * * 

* * * * X * * * 
* * * * * * X * 
* X * * * * * * 
* * * X * * * * 
* * * * * * * X 
X * * * * * * * 
* * X * * * * * 
* * * * * X * * 

* * * * * X * * 
* * X * * * * * 
* * * * * * X * 
* * * X * * * * 
X * * * * * * * 
* * * * * * * X 
* X * * * * * * 
* * * * X * * * 

Set(())

In [74]:
//
// Map
//

val Capital = Map("US"->"DC", "France"->"Paris")

println(Capital get "US")

println(Capital + ("US"->"NYC")) 

// + replaces the value of an existing key, which is the same as

println(Capital ++ Map("US"->"NYC")) 

Some(DC)
Map(US -> NYC, France -> Paris)
Map(US -> NYC, France -> Paris)
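
Since `get` returns an `Option`, a missing key has to be handled explicitly. A short sketch of the usual ways to do that (the names `showCapital` and `total` are illustrative only):

```scala
val capital = Map("US" -> "DC", "France" -> "Paris")

// pattern match on the Option returned by get
def showCapital(country: String): String = capital.get(country) match {
  case Some(city) => city
  case None       => "missing data"
}

println(showCapital("US"))                      // DC
println(showCapital("Andorra"))                 // missing data

// or supply a fallback at the call site
println(capital.getOrElse("Andorra", "unknown"))   // unknown

// or turn the map into a total function
val total = capital.withDefaultValue("<unknown>")
println(total("Andorra"))                       // <unknown>
```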


In [58]:

// List has groupBy
// returns a Map

val fruit = List("apple","pineapple", "orange", "banana")

fruit groupBy(_.length)
 

Map(5 -> List(apple), 9 -> List(pineapple), 6 -> List(orange, banana))

In [85]:
// example polynomial class

                              // deg, coeff
class Poly (val terms0: Map[Int, Double])  {
    
    val terms = terms0 withDefaultValue 0.0  //this allows terms(deg) in addTerm to return 0
    
    def + (other: Poly) = new Poly((other.terms foldLeft terms)(addTerm))
    
    def addTerm(terms: Map[Int,Double], term: (Int, Double)) = {
        val (deg, coeff) = term
        terms + (deg -> (coeff + terms(deg))) //use terms as initial accumulator and update its value
    }
    override def toString =
        (for ( (deg, ceof)<- terms.toList.sorted.reverse )
            yield ceof + " * x ^ " + deg) mkString(" + ")
}

val p1 = new Poly(Map(1->1.1, 2->2.2))
val p2 = new Poly(Map(1->1.1, 3->3.3))

p1 + p2

3.3 * x ^ 3 + 2.2 * x ^ 2 + 2.2 * x ^ 1

In [90]:
// an improved version

class Poly (val terms0: Map[Int, Double])  {
    
    // an auxiliary constructor takes a sequence of (Int, Double) pairs of unspecified size
    // and passes it to the primary constructor
    def this(bindings: (Int, Double)*) = this(bindings.toMap)
    
    val terms = terms0 withDefaultValue 0.0

    def + (other: Poly) = new Poly((other.terms foldLeft terms)(addTerm))
    def addTerm(terms: Map[Int,Double], term: (Int, Double)) = {
        val (deg, coeff) = term
        terms + (deg -> (coeff + terms(deg)))
    }
    override def toString =
        (for ( (deg, ceof)<- terms.toList.sorted.reverse )
            yield ceof + "*x^" + deg) mkString("+")
}

val p1 = new Poly(1->1.1, 2->2.2)
val p2 = new Poly(1->1.1, 3->3.3)

p1 + p2
 

3.3*x^3+2.2*x^2+2.2*x^1

In [130]:
// example of translating a phone number (a sequence of digits) into phrases

import scala.io.Source

val in = Source.fromURL("http://lamp.epfl.ch/files/content/sites/lamp/files/teaching/progfun/linuxwords.txt")

//get common English words, drop hyphenated words, symbols etc
val words = in.getLines.toList filter (word => word forall (chr => chr.isLetter)) 
println(words.length)   //  45382 words

// telephone keypad
val mnem = Map('2' -> "ABC", '3' -> "DEF", '4' -> "GHI",'5' -> "JKL",
    '6' -> "MNO", '7' -> "PQRS", '8' -> "TUV", '9' -> "WXYZ")

// reverse Map of mnem
val charCode: Map[Char, Char] =  for ((digit, str) <- mnem; ltr <- str) yield (ltr -> digit)

// groupBy returns a map from each digit string to all the words (out of the 45382) that encode to it
val wordsForNum: Map[String, List[String]] = {
    
    def wordCode(word: String) =  word.toUpperCase map charCode
    words groupBy wordCode withDefaultValue List()  //not found default is an empty List()
}

def encode(number: String): Set[List[String]] = {
    if (number.isEmpty) Set(List())
    else {
        for {
            split <- 1 to number.length
            word <- wordsForNum(number take split)  
            // if wordsForNum returns an empty List, this generator yields nothing, 
            // so that split is skipped
            
            rest <- encode(number drop split) // each element of the set is a List of words
        } yield word :: rest // prepend the word to the list of remaining words
    }.toSet
}


def translate(number: String): Set[String] =
    encode(number) map ( _ mkString " " )

translate("7225247386")

45382


Set(sack air fun, pack ah re to, pack bird to, Scala ire to, Scala is fun, rack ah re to, pack air fun, sack bird to, rack bird to, sack ah re to, rack air fun)

<a name = 'funDe'></a>
## Functional Design in Scala

The code in this section is mostly copied from Martin Odersky's course on Coursera https://www.coursera.org/learn/progfun2/


<a name='for'></a>
### For Expression

In [1]:
// Scala with JSON

/*
    run only once because it defines a (case) object
*/

abstract class JSON
case class JSeq(elem: List[JSON]) extends JSON
case class JObj(bindings: Map[String, JSON]) extends JSON
case class JNum(num: Int) extends JSON
case class JStr(str: String) extends JSON
case class JBool(b: Boolean) extends JSON
case object JNull extends JSON

val data = JObj(Map(
    "firstName" -> JStr("John"),
    "lastName"->JStr("Smith"),
    "address"->JObj(Map(
        "streeAddress"->JStr("21 2nd Street"),
        "state"->JStr("NY"),
        "postslCode"->JNum(10021)
    )),
    "phoneNumer"->JSeq(List(
        JObj(Map(
            "type"->JStr("home"),
            "number"->JStr("212 344 1345")
        )),
        JObj(Map(
            "type"->JStr("cell"),
            "number"->JStr("212 542 1345")
        ))
    ))
))

def prettyJSON(json: JSON) : String = json match {
    case JSeq(elem) =>
        "[" + (elem map prettyJSON mkString ", ") + "]"
    case JObj(bindings) =>
        val assoc = bindings map {
            case (key, value) => "\"" + key + "\": " + prettyJSON(value)
        }
        "{" + (assoc mkString ", ") + "}"
    case JNum(num) => num.toString
    case JStr(str) => "\"" + str + "\""
    case JBool(b) => b.toString
    case JNull => "null"   // JSON spells null in lower case
}

prettyJSON(data)


{"firstName": "John", "lastName": "Smith", "address": {"streeAddress": "21 2nd Street", "state": "NY", "postslCode": 10021}, "phoneNumer": [{"type": "home", "number": "212 344 1345"}, {"type": "cell", "number": "212 542 1345"}]}

In [9]:
// use For as select for JSON
// same idea as Linq in C#

val data_collect = List(data) // suppose we have more than one piece of data

    for {
        JObj(bindings) <- data_collect
        JSeq(phones) = bindings("phoneNumer")
        JObj(phone) <- phones
        JStr(digits) = phone("number")
        if digits contains "344"
    } yield (bindings("firstName"), bindings("lastName"), digits)


List((JStr(John),JStr(Smith),212 344 1345))

In [1]:
// for-expressions as database-style queries

case class Book(title: String, author: String)

val books: Set[Book] =  Set(  

    // using a Set avoids duplicated items and also duplicated results
    
    Book( "Harry Potter and the Deathly Hallows", "J. K. Rowling"),
    Book( "Harry Potter and the Philosopher's Stone", "J. K. Rowling"), 
    Book( "Harry Potter And The Cursed Child", "J. K. Rowling"),  
    Book( "Angels & Demons", "Dan Brown"),
    Book( "The Audacity of Hope", "Barack Obama")
)


for {
    b1 <- books
    b2 <- books
    if b1.title < b2.title
    if b1.author == b2.author  // the author has more than 1 book
} yield  b1.author


Set(J. K. Rowling)

In [5]:
// For as random generator

trait Generator[+T] {
    self =>
    def generate: T

    // define map (and flatMap below) so that Generator can be used in for-expressions 
    def map[S](f: T => S): Generator[S] = new Generator[S] {
        def generate = f(self.generate)
    }

    def flatMap[S](f: T => Generator[S]): Generator[S] = new Generator[S] {
        def generate = f(self.generate).generate
    }
}

val integers = new Generator[Int] {
    val ran = new java.util.Random
    def generate = ran.nextInt()
}
println(integers.generate)

// for other types one could call ran.nextBoolean() etc., or derive them from integers:
//val booleans = new Generator[ Boolean] {
//    def generate = integers.generate > 0
//}

// or
val booleans = for( x <- integers) yield (x > 0)

//that is because
//
// for ( x <- e1 ) yield e2
//
// is actually
//
//     e1.map(x => e2)
//
// which translates to
//
//     val booleans = integers.map(_>0)
//
println(booleans.generate)


def pair[U,V](u: Generator[U], v: Generator[V]) = for {
    x <- u
    y <- v
} yield (x,y)

//that is because
//
// for {
//      x <- e1
//      y <- e2
// } yield e3
//
// is actually
//
//     e1.flatMap(x => ( e2.map( y => e3)) )
//
// which translates to
//
//     def pair[U,V](u: Generator[U], v: Generator[V]) =
//             u flatMap (x => v map ( y => (x,y) ))
//
println(pair(booleans, booleans).generate)

-899983625
false
(false,false)


In [26]:
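// get a random number in [lo, hi), e.g. from 1 up to 9 for choose(1, 10)

// floorMod is used instead of math.abs(x) % n, because math.abs(Int.MinValue) is negative
def choose(lo: Int, hi: Int) : Generator[Int] =
        for (x <- integers) yield lo + java.lang.Math.floorMod(x, hi - lo)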

choose(1, 10).generate

5

In [38]:
// return a random element

def oneOf[T](xs: T*): Generator[T] ={
        for (idx <- choose (0, xs.length)) yield xs(idx)
    }

oneOf("a","b","c","d","e","f","g").generate

c

In [105]:
// a random list of random length

def lists: Generator[List[Int]] = {
    
    def single[T] (x: T)  = new Generator[T] {
        def generate = x
    }

    def nonEmptyList = for {
        head <- choose(1, 11)
        tail <- lists
    } yield head :: tail
    
    for {
        isEmpty <- choose(1, 11)
        list <- if (isEmpty >= 10) single(Nil) else nonEmptyList
        // Nil must be wrapped in a Generator, because generate will be called on it
    } yield list

}

lists.generate

List(8, 3, 4, 8, 2, 3, 3, 10, 4, 5, 3)

<a name='stream'></a>
### Stream & Lazy Evaluation

In [108]:
//
// A Stream is like a List, but its tail is only computed when something asks for it

(1 to 100) 

Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)

In [109]:
(1 to 100).toStream 

Stream(1, ?)
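
A small sketch that makes the laziness observable by counting how many elements are actually computed. It uses `Stream` to match this section (in Scala 2.13 and later `Stream` is deprecated in favor of `LazyList`); the counter and value names are made up for the demonstration.

```scala
var evaluated = 0   // counts how many elements were actually computed

// map is applied lazily: only the elements that are demanded get squared
val squares = (1 to 1000).toStream.map { x => evaluated += 1; x * x }

val firstThree = squares.take(3).toList
println(firstThree)          // List(1, 4, 9)
println(evaluated < 1000)    // true: most of the 1000 elements were never computed
```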

In [115]:
// a lazy val is evaluated on first use, and the result is stored for later uses.
// let us compare def (evaluated at every use), val (evaluated at definition), and lazy val

def expr = {
    def x = { print("x"); 1 }  // this is evaluated every time x is used
    
    lazy val y = { print("y"); 1 } // this is not evaluated until y is first used; 
                                   // after that the stored value is reused

    val z = { print("z") ; 1 }  // this is evaluated when the program reaches this line, 
                                // and the value is stored

    x + x + y + y + z + z   // this is evaluated from left to right
}

expr

zxxy

6

In [119]:
//with Stream one can play with an infinite list

// this creates an infinite Stream
def from(n: Int) : Stream[Int] = n #:: from(n+1)

// #:: is cons for streams: it prepends an element without evaluating the tail
def sieve(s: Stream[Int]): Stream[Int] =
    s.head #:: sieve(s.tail filter ( _ % s.head != 0)) // filter out the multiples of each prime found

val primes = sieve(from(2))

//this returns the first 100 primes
primes.take(100).toList  // take is lazy; toList forces only the first 100 primes


List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541)

In [129]:
// for convergence problems, we can generate the infinite sequence and decide separately where to stop

def sqrtStream(x : Double) : Stream[Double] = {
    def improve (guess: Double) = (guess + x / guess) / 2
    lazy val guesses: Stream[Double] = 1 #:: (guesses map improve) 
    guesses
}
def isGoodEnough(x: Double, n: Double)={
    math.abs(x*x-n) < 0.0001
}
sqrtStream(4).filter(x=>isGoodEnough(x,4)).take(1).toList

List(2.0000000929222947)
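
The absolute tolerance `0.0001` above is one possible stopping rule. As a sketch of an alternative, the version below uses a relative tolerance and `find`, which returns the first matching guess as an `Option`. `isCloseEnough` and its default `eps` are assumptions made for this example:

```scala
def sqrtStream(x: Double): Stream[Double] = {
  def improve(guess: Double) = (guess + x / guess) / 2  // one Newton step
  lazy val guesses: Stream[Double] = 1.0 #:: guesses.map(improve)
  guesses
}

// hypothetical stopping rule: error of guess^2 relative to x
def isCloseEnough(guess: Double, x: Double, eps: Double = 1e-9): Boolean =
  math.abs(guess * guess - x) / x < eps

val firstGuesses = sqrtStream(2).take(5).toList    // inspect how the guesses converge
val root = sqrtStream(2).find(isCloseEnough(_, 2)) // first guess that passes the test
```

A relative tolerance behaves more evenly when `x` is very large or very small, where a fixed absolute threshold is either too strict or too loose.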

In [9]:
//
// N water jugs problem
//
//
class Pouring(capacity: Vector[Int]){  //capacity is the max amount of water each glass can take 
    
    val glasses = 0 until capacity.length //label glasses 0 to n-1
 
    type State = Vector[Int] 
    val initialState = capacity map ( x => 0)  //create a new Vector with all zeros
   
    //all possible moves
    val moves =
        (for (g <- glasses) yield Empty(g)) ++
        (for (g <- glasses) yield Fill(g)) ++
        (for (from <- glasses; to <- glasses if from != to) yield Pour(from, to))

 
    trait Move {
        def change(state: State) : State //state: amount of water in each glass
    }
    case class Empty(glass: Int) extends Move{
        def change(state: State)  =  state updated (glass, 0)
    }
    case class Fill(glass: Int) extends Move{
        def change(state: State)  =  state updated (glass, capacity(glass))
    }
    case class Pour(from: Int, to : Int) extends Move{
        def change(state: State)  = {
            val amount = state(from) min (capacity(to) - state(to))
            state updated(from, state(from) - amount) updated(to, state(to) + amount)
        }
    }

      
    class Path(history: List[Move], val endState: State){
        //def endState: State = trackState(history)
        //private def trackState(xs: List[Move]) : State = xs match {
        //    case Nil => initialState
        //    case move :: xs1 => move change trackState(xs1)
        //}

        //accomplishes the same as above
        //def endState: State = (history foldRight initialState) ( _ change _ )

        //or avoid recomputing endState recursively: update it whenever a new move is added
        //(new moves are added at the front of history)
        def extend(move: Move) = new Path(move::history, move change endState)
        
        override def toString = (history.reverse mkString " " ) + " --> " + endState
    }

    val initialPath = new Path(Nil, initialState)

    def from(paths: Set[Path], explored: Set[State]): Stream[Set[Path]] =
        if (paths.isEmpty) Stream.empty
        else {
            val more = for {
                path <- paths
                next <- moves map path.extend  //extend this path with every possible move
                if !(explored contains next.endState) //skip paths whose end state has already been explored
            } yield next
            paths #:: from(more, explored ++ (more map (_.endState))) //lazily generate the stream of path sets, one set per path length
        }                                                             

    val pathSets = from(Set(initialPath), Set(initialState)) 

    def solution (target: Int) : Stream[Path] =
        for {
            pathSet <- pathSets   //because from returns a Stream, path sets are only built when they are needed
            path <- pathSet
            if path.endState contains target
        } yield path
}


// works for any number of glasses
// the target is the desired amount in any one of the glasses
val problem = new Pouring(Vector(4, 5, 9))
problem.solution(7).take(1).toList

List(Fill(0) Pour(0,1) Fill(0) Pour(0,1) Pour(0,2) Pour(1,0) Pour(0,2) --> Vector(0, 1, 7))
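
The printed solution can be replayed by hand. The sketch below is a standalone copy of the three moves, with `capacity` fixed to `Vector(4, 5, 9)` for this example, and uses `scanLeft` to list every intermediate state; `trace` is a name introduced here:

```scala
val capacity = Vector(4, 5, 9)
type State = Vector[Int]

sealed trait Move { def change(state: State): State }
case class Empty(glass: Int) extends Move {
  def change(state: State): State = state.updated(glass, 0)
}
case class Fill(glass: Int) extends Move {
  def change(state: State): State = state.updated(glass, capacity(glass))
}
case class Pour(from: Int, to: Int) extends Move {
  def change(state: State): State = {
    // pour until the source is empty or the target is full
    val amount = state(from) min (capacity(to) - state(to))
    state.updated(from, state(from) - amount).updated(to, state(to) + amount)
  }
}

// the moves from the output above, in the order they are applied
val solutionMoves: List[Move] =
  List(Fill(0), Pour(0, 1), Fill(0), Pour(0, 1), Pour(0, 2), Pour(1, 0), Pour(0, 2))

// every state from the initial one to the final one
val trace = solutionMoves.scanLeft(Vector(0, 0, 0): State)((state, move) => move.change(state))
val endState = trace.last
```

The last element of `trace` is `Vector(0, 1, 7)`, which matches the end state printed by `problem.solution(7)`.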

<a name='state'></a>
### Mutable State

<a name='event'></a>
### Event Handling

<a name = 'pppro'></a>
## Parallel Programming in Scala

The code in this section is mostly copied from Viktor Kuncak and Aleksandar Prokopec's course on Coursera https://www.coursera.org/learn/parprog1/

<a name='Spark'></a>
# Big Data, Distributed Analysis with Spark

![](https://databricks.com/wp-content/uploads/2016/06/spark-logo-trademark.png)

<a name ='cuda'></a>
# Parallel with CUDA, C++

![](http://images.anandtech.com/doci/6839/nvidia-cuda2.png)

https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb