Big Data Machine Learning, Distributed Machine Learning, Parallel Machine Learning with C++, CUDA, Scala, Spark
=============

Ron Wu
-------------

10/26/16

Reference: free courses from the creators of
<br>NVIDIA https://developer.nvidia.com/udacity-cs344-intro-parallel-programming<br>
Scala http://www.scala-lang.org/blog/2016/05/23/scala-moocs-specialization-launched.html<br>
Spark, Databricks https://databricks.com/blog/2016/06/01/databricks-to-launch-first-of-five-free-big-data-courses-on-apache-spark.html <br> 
### Contents


1. <a href =#scala>Functional and Object-Oriented Scala</a>
     - <a href=#scalab>Functional Aspect of Scala</a>
     
         - <a href =#fun>First Class Functions</a>
         - <a href=#cl>Classes and the Three Pillars </a> 
         <br><br>
     - <a href = #ooscala>Object-Oriented Aspect of Scala: Data-Parallelism</a>  
     
         - <a href=#obj>Objects Everywhere</a> 
         - <a href =#collect>Collections</a>
         
     - <a href = #relu>Scala with Spark</a> 
<br><br>
2. <a href=#Spark>Big Data, Distributed Analysis with Spark</a>
     - <a href =#entropy>Spark with Time Series: spark-ts</a> 
     - <a href =#entropy>Spark with TensorFlow: TensorFrames</a> 
     - <a href =#entropy>Other Spark Ecosystem</a> 
<br><br>
3. <a href =#cuda>Parallel with CUDA, C++</a>
     - <a href=#worked>Parallel Computing</a> 
     - <a href=#worked>GPU Programming</a> 
<br><br>
 

<a name = 'scala'></a>
# Functional and Object-Oriented Scala

![](https://www.scientiamobile.com/images/icons/scala.gif)


<a name = 'scalab'></a>
## Functional Aspect of Scala

The code in this section is mostly copied from Martin Odersky's course on Coursera, https://www.coursera.org/learn/progfun1/

To ease the transition to parallel programming, this notebook favors functional style over imperative style wherever possible. Functional code works with immutable values: once a value is set it cannot change, so parallel tasks can share it without locks, which avoids many data races and deadlocks.

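Scala blurs the line between loops and function calls. The compiler translates a for loop into calls to higher-order methods such as foreach and map, so the loop body becomes a function; in the other direction, a tail-recursive function is compiled into a loop. The state that an imperative loop keeps in mutable variables is instead passed along as function arguments, so the two styles express the same computation.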
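
As a small illustration of how loop state maps onto function arguments, the same sum can be written imperatively, as a tail-recursive function, and as a fold. This is only a sketch: the names `sumImperative`, `sumRecursive` and `sumFold` are made up for this example. The `@tailrec` annotation asks the compiler to check that the recursive call is in tail position, in which case it is compiled into a loop.

```scala
import scala.annotation.tailrec

// Imperative style: a mutable accumulator updated inside a loop
def sumImperative(xs: List[Int]): Int = {
  var acc = 0
  for (x <- xs) acc += x   // desugars to xs.foreach(x => acc += x)
  acc
}

// Functional style: the loop state is passed through the function arguments
@tailrec
def sumRecursive(xs: List[Int], acc: Int = 0): Int =
  if (xs.isEmpty) acc
  else sumRecursive(xs.tail, acc + xs.head)

// The same computation expressed with a higher-order function
def sumFold(xs: List[Int]): Int = xs.foldLeft(0)(_ + _)

val data = List(1, 2, 3, 4)
val results = (sumImperative(data), sumRecursive(data), sumFold(data)) // (10, 10, 10)
```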

<a name='fun'></a>
### First Class Functions
 

In [1]:
// This is fine: a def is evaluated on each use (call-by-name), so the definition alone evaluates nothing
def loop: Int = loop 

// Do not run: a val evaluates its right-hand side immediately (call-by-value), so this loops forever
//val l = loop
    
def square(x: Int) = x*x

def squareFirstEle(x: Int, y: => Int) = square(x)
// => makes y a call-by-name parameter; y is never used, so loop is never evaluated
squareFirstEle(2,loop)

4
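
To make the evaluation order visible, here is a minimal sketch that counts how often an argument is evaluated. The counter `evaluations` and the functions `expensive`, `byValue`, `byName` and `unused` are hypothetical names introduced only for this illustration.

```scala
// Hypothetical counter used only to observe when the argument is evaluated
var evaluations = 0
def expensive(): Int = { evaluations += 1; 42 }

def byValue(x: Int): Int = x + x      // argument evaluated once, before the call
def byName(x: => Int): Int = x + x    // argument evaluated each time x is used
def unused(x: => Int): Int = 0        // argument never evaluated

val a = byValue(expensive())   // evaluations is now 1
val b = byName(expensive())    // evaluations is now 3
val c = unused(expensive())    // evaluations is still 3
```

A by-name parameter is therefore not a cached lazy value: it is re-evaluated on every use, which matters when the argument has side effects or is costly.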

In [2]:
// Newton's method (a fixed-point iteration) for finding the square root

def sqrt(x: Double) = {
        
    def sqrtIter(guess: Double): Double =
        if (math.abs(guess*guess-x) / x < 0.01) guess
        else sqrtIter((guess + x / guess) / 2) 

    sqrtIter(1.0)
}

sqrt(4.5)

2.1224976448422046
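
The square-root iteration above is one instance of a general fixed-point search. The following sketch factors that pattern out; `fixedPoint`, `averageDamp`, `sqrtFixedPoint` and the value of `tolerance` are assumptions made for this example, not part of the cell above.

```scala
import scala.annotation.tailrec

val tolerance = 1e-6   // assumed relative tolerance

// Iterate guess -> f(guess) until two successive guesses agree
def fixedPoint(f: Double => Double)(firstGuess: Double): Double = {
  @tailrec
  def iterate(guess: Double): Double = {
    val next = f(guess)
    if (math.abs(next - guess) <= tolerance * math.abs(next)) next
    else iterate(next)
  }
  iterate(firstGuess)
}

// Averaging x with f(x) damps the oscillation of y => x / y
def averageDamp(f: Double => Double)(x: Double): Double = (x + f(x)) / 2

// sqrt(x) is a fixed point of y => x / y; with damping this is Newton's update
def sqrtFixedPoint(x: Double): Double =
  fixedPoint(averageDamp(y => x / y))(1.0)

val root = sqrtFixedPoint(4.5)   // close to math.sqrt(4.5)
```

Without `averageDamp`, the iteration `y => x / y` would bounce between two values forever; the damped version is exactly the update `(guess + x / guess) / 2` used in `sqrtIter`.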

In [3]:
// Higher-order function: sum_v1 takes the function f as an argument

def sum_v1(f: Int => Int, a: Int, b: Int): Int = {
    if (a > b) 0
    else sum_v1(f, a+1, b) + f(a)
}


// find the sum of the squares from 1 to 5 
sum_v1(square, 1, 5)

55

In [4]:
// rewrite the above as a tail recursion, so it does not build up the stack
    
def sum_v2(f: Int => Int, a: Int, b: Int): Int = {
    def loop(a: Int, acc: Int): Int = {
        if (a > b) acc
        else loop(a+1, acc + f(a))
    }
    loop(a, 0)
}

sum_v2(square, 1, 5)

55

In [5]:
// sum_v3 is curried: it takes f and returns a function of (a, b)
    
def sum_v3(f: Int => Int): (Int, Int) => Int =  {
        def sumF(a: Int, b: Int): Int = { 
            if (a>b) 0
            else sumF(a+1,b) + f(a)
        }
        sumF
}

// passing in an anonymous function
sum_v3(x => x*x)(1, 5) 

55
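
Scala also offers a shorter syntax for this pattern: multiple parameter lists. The sketch below is an assumed variant of `sum_v3`; the names `sumCurried`, `sumOfSquares` and `sumOfCubes` are hypothetical.

```scala
// Curried definition: one parameter list for the function, one for the bounds
def sumCurried(f: Int => Int)(a: Int, b: Int): Int =
  if (a > b) 0 else f(a) + sumCurried(f)(a + 1, b)

// Fixing the first parameter list yields a reusable (Int, Int) => Int
val sumOfSquares: (Int, Int) => Int = sumCurried(x => x * x)
val sumOfCubes: (Int, Int) => Int = sumCurried(x => x * x * x)

val squares = sumOfSquares(1, 5)   // 1 + 4 + 9 + 16 + 25 = 55
val cubes = sumOfCubes(1, 3)       // 1 + 8 + 27 = 36
```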

<a name='cl'></a>
### Classes and the Three Pillars 

In [6]:
//
// Encapsulation
//

class Rational(x: Int, y: Int)  {
    require( y != 0 , "denominator must be nonzero")

    // auxiliary constructor
    def this(x: Int) = this(x, 1)

    // Euclid's algorithm for the greatest common divisor
    private def gcd(a:Int, b:Int) : Int =
        if (b==0) a
        else gcd(b, a%b)
    private val g = math.abs(gcd(x, y))

    val numer = x / g
    val denom = y / g


    // test the reduced denominator, not the raw argument y
    override def toString =
        if (denom != 1) this.numer + "/" + this.denom
        else (this.numer).toString

    // + is not a legal identifier in most languages, but it is in Scala
    def + (that: Rational) =
        new Rational(this.numer*that.denom+this.denom*that.numer, this.denom*that.denom)

    def unary_- = new Rational(-numer, denom)

}

val r = new Rational(1,2)
println("r rational: " + r.toString)


val s = new Rational(3,4)

// a prefix minus is written right next to its operand, so it resolves to unary_-
println("negative of r rational: "+(-r).toString)

//this is same as (r.+(s)).toString
println("sum of rationals r & s: " + (r + s).toString)

val t = new Rational(2)
println("integer t: "+t.toString)

r rational: 1/2
negative of r rational: -1/2
sum of rationals r & s: 5/4
integer t: 2
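
Once `+` and `unary_-` exist, further operators can be built on top of them. The following is a sketch only: `Rational2` is a hypothetical, trimmed-down variant of the class above, and the sign handling in `g` (keeping the denominator positive so that `<` is valid) is an assumption added for this example.

```scala
// Hypothetical, trimmed-down variant of the Rational class above
class Rational2(x: Int, y: Int) {
  require(y != 0, "denominator must be nonzero")

  private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
  // Give the divisor the sign of y so the stored denominator stays positive
  private val g = math.abs(gcd(x, y)) * (if (y < 0) -1 else 1)

  val numer: Int = x / g
  val denom: Int = y / g

  def +(that: Rational2): Rational2 =
    new Rational2(numer * that.denom + denom * that.numer, denom * that.denom)
  def unary_- : Rational2 = new Rational2(-numer, denom)
  def -(that: Rational2): Rational2 = this + -that       // reuse + and unary_-
  def <(that: Rational2): Boolean = numer * that.denom < that.numer * denom
  def max(that: Rational2): Rational2 = if (this < that) that else this

  override def toString: String =
    if (denom != 1) s"$numer/$denom" else numer.toString
}

val half = new Rational2(1, 2)
val threeQuarters = new Rational2(3, 4)
val difference = half - threeQuarters   // prints as -1/4
val larger = half max threeQuarters     // prints as 3/4
```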


In [7]:
//
//  inheritance
//

abstract class Node{
    def insert(x: Int) : Node
    def search(x: Int) : Boolean
}

// Like Java, Scala allows only single class inheritance. For multiple inheritance, mix in traits using "with"

// the primary constructor takes 3 arguments
class NonEmpty(elem: Int, left: Node, right: Node)  extends Node{


    def insert(x: Int) : Node = {
        if ( x < elem ) new NonEmpty(elem, left insert x , right)
        else if (x > elem) new NonEmpty(elem, left, right insert x)
        else this
    }

    def search(x: Int) : Boolean = {
        if ( x < elem ) left search x
        else if ( x > elem ) right search x
        else true
    }
    
    // left and right are Nodes, so concatenation calls their own overridden toString (dynamic dispatch)
    override def toString = "{" + left + elem + right + "}"
}    

class Empty extends Node{

    def insert(x: Int) : Node = new NonEmpty(x, new Empty, new Empty)
    def search(x: Int) : Boolean = false
    override def toString = "-"
}

val tree1 = new NonEmpty(3, new Empty, new  Empty)

// a copy of tree1 with 4 inserted
val tree2 = tree1 insert 4  

println(tree1.toString)
println(tree2.toString)

{-3-}
{-3{-4-}}
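
The comment in the cell above mentions traits as the way to combine behavior from several sources. A minimal sketch of that idea follows; the types `Named`, `Counted`, `Shape` and `Triangle` are hypothetical and unrelated to the tree classes.

```scala
// Hypothetical traits, each contributing one piece of behavior
trait Named {
  def name: String
  def describe: String = "I am " + name
}

trait Counted {
  def size: Int
  def isSingle: Boolean = size == 1
}

// Single class inheritance (extends) plus trait mix-ins (with)
abstract class Shape { def sides: Int }

class Triangle extends Shape with Named with Counted {
  def sides = 3
  def name = "triangle"
  def size = sides
}

val tri = new Triangle
val description = tri.describe   // "I am triangle"
val single = tri.isSingle        // false, because size is 3
```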


In [4]:
//
//  Polymorphism
//

import java.util.NoSuchElementException 

//type parameter
trait Node[T]{
    def isEmpty: Boolean
    def data: T
    def pt: Node[T]
}


//the primary constructor takes 2 arguments
//declaring them as val also defines 2 fields of the class,
//which implement the abstract members data and pt of Node
class Cons[T](val data: T, val pt: Node[T]) extends Node[T]{
    def isEmpty = false
}

class Nil[T] extends Node[T]{
    def isEmpty = true
    def data : Nothing = throw new NoSuchElementException("Nil.data")
    def pt: Nothing = throw new NoSuchElementException("Nil.pt")
}

def singleton[T](elem:T) = new Cons[T](elem, new Nil[T])

println(singleton[Double](1.1).data)


// this would throw NoSuchElementException, because pt is Nil
// println(singleton[Double](1.1).pt.data)


//the type parameters can be omitted here, because the compiler infers them
val ll = new Cons(3, new Cons(2, new Cons(1, new Nil)))

println(ll.data)
println(ll.pt.data)
println(ll.pt.pt.data) 

1.1
3
2
1
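
Type parameters work on functions as well as on classes. The sketch below restates the list types under hypothetical names (`MyList`, `MyCons`, `MyNil`) so that it is self-contained, then defines an assumed helper `nth` that works for any element type.

```scala
import java.util.NoSuchElementException

// Hypothetical re-statement of the list types so this block is self-contained
trait MyList[T] {
  def isEmpty: Boolean
  def head: T
  def tail: MyList[T]
}
class MyCons[T](val head: T, val tail: MyList[T]) extends MyList[T] {
  def isEmpty = false
}
class MyNil[T] extends MyList[T] {
  def isEmpty = true
  def head: Nothing = throw new NoSuchElementException("MyNil.head")
  def tail: Nothing = throw new NoSuchElementException("MyNil.tail")
}

// Works for any element type T: return the n-th element, counting from 0
def nth[T](n: Int, xs: MyList[T]): T =
  if (xs.isEmpty) throw new IndexOutOfBoundsException("index out of range")
  else if (n == 0) xs.head
  else nth(n - 1, xs.tail)

val numbers = new MyCons(3, new MyCons(2, new MyCons(1, new MyNil[Int])))
val second = nth(1, numbers)                             // 2
val word = nth(0, new MyCons("a", new MyNil[String]))    // "a"
```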


<a name ='ooscala'></a> 
## Object-Oriented Aspect of Scala: Data-Parallelism


<a name ='obj'></a>
### Objects Everywhere

In [1]:
/*
    this cell can only run once, because it defines a singleton object
*/

// This shows that natural numbers (a non-negative Int, a primitive type) can be represented by class objects
// This is the same idea as in set theory, where every natural number is constructed
// starting from the empty set

abstract class Nat {
    def isZero: Boolean
    def predecessor: Nat
    def successor: Nat
    def + (that: Nat): Nat
    def - (that: Nat): Nat 
    override def toString: String
}


class Succ(n: Nat) extends Nat{
    def isZero = false
    def predecessor = n
    def successor = new Succ(this)
    def +(that: Nat) = new Succ(n + that)
    def -(that: Nat) = if (that.isZero) this else n - that.predecessor

    override def toString: String = "I" + n
}

// Zero is a singleton object, so there is exactly one zero
object Zero extends Nat{
    def isZero = true
    def predecessor = throw new Error("0 has no pred")
    def successor = new Succ(this)
    def +(that: Nat) = that
    def -(that: Nat) = if (that.isZero) this else throw new Error("negative number")

    override def toString: String = ""
}


def NextNat = new Succ(Zero)
val one = NextNat
val two = one.successor
val three = two.successor
println(three)
println(three-one)

III
II


In [5]:
//
// Functions as objects too
//

// a function of one argument, with argument type A and result type B
trait Function1[A,B]{
    def apply(x : A) : B
}


def f =  {
    // the anonymous function
    // (x: Int) => x*x
    // expands to roughly the following; the name AnonFun exists only inside this block
    class AnonFun extends Function1[Int, Int] {
        def apply(x: Int) = x*x
    }
    new AnonFun
}

f(2)

4
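
The same idea holds for the standard library: a function literal is an instance of `scala.Function1`, and calling it is sugar for calling `apply`. A brief sketch, with the hypothetical names `squareFn`, `squareObj`, `cube` and `cubeFn`:

```scala
// A function literal is an object with an apply method
val squareFn: Int => Int = (x: Int) => x * x

// The same function written out as an explicit instance of scala.Function1
val squareObj: scala.Function1[Int, Int] = new scala.Function1[Int, Int] {
  def apply(x: Int): Int = x * x
}

// A method is not an object, but it is converted to one where a function
// type is expected (eta-expansion)
def cube(x: Int): Int = x * x * x
val cubeFn: Int => Int = cube

val viaCall = squareFn(3)                      // sugar for squareFn.apply(3)
val viaApply = squareObj.apply(3)              // 9
val composed = (squareFn andThen cubeFn)(2)    // cube(square(2)) = 64
```

Because functions are objects, they carry methods of their own, such as `andThen` and `compose`.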

### Covariance, Invariance and Contravariance


In [5]:

/*

Write "type A is derived from (is a subtype of) type B" as

    A < B 
    
Let f be a type transformation, e.g. A -> Array[A] in Scala, or A -> List<A> with Java generics.

Covariance 

    A < B => f(A) < f(B)

Contravariance

    A < B => f(A) > f(B)
    
Invariance

    Neither of the above.
    
In Java, arrays are covariant and generics are invariant, 

but generics can be made co/contravariant at the use site with wildcards.

That is,
 
    B[] array_b = new A[3]; (in comparison, ArrayList<B> list_b = new ArrayList<A>(); is not allowed)

is allowed, and array_b actually refers to an array of A. Storing a B that is not an A into it

compiles but fails at runtime with an ArrayStoreException, so in Scala arrays are invariant.


The wildcards 
    
    ArrayList<? super A> list_b = new ArrayList<B>();
    ArrayList<? extends B> list_a = new ArrayList<A>();

let the compiler check the types, so these assignments cannot cause a runtime error.

Now the compiler knows that
    
    the element type of list_b is some supertype of A, hence an A, or an instance of any subclass of A,
can be safely added to list_b (write safe)

    the element type of list_a is some subtype of B, hence any method of B, or of any superclass of B,
can be safely invoked on elements read from list_a (read safe)
    
*/

// The syntax in Scala is as follows: annotate the type parameter with + or -; otherwise it is invariant
    


class Covariant_List[+T]{}
class Contravariant_List[-T]{} 

class B {}
class A extends B {} 

object main extends App{
    // A < B, so Covariant_List[A] < Covariant_List[B]
    val list_B : Covariant_List[B] = new Covariant_List[A] 
    // A < B, so Contravariant_List[B] < Contravariant_List[A]
    val list_A : Contravariant_List[A] = new Contravariant_List[B] 

}
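
Variance also shows up in the standard library: `Function1` is declared as `Function1[-A, +B]` and `List` as `List[+A]`. The sketch below uses the hypothetical classes `Animal` and `Cat` to show both directions.

```scala
class Animal { def sound: String = "..." }
class Cat extends Animal { override def sound: String = "meow" }

// Function1[-A, +B]: contravariant in the argument, covariant in the result
val describeAnimal: Animal => String = a => "sound: " + a.sound

// Accepting any Animal is enough to accept Cats, and a String is an Any,
// so an Animal => String can be used where a Cat => Any is expected
val describeCat: Cat => Any = describeAnimal

// List is declared covariant (List[+A]), so a List[Cat] is a List[Animal]
val cats: List[Cat] = List(new Cat, new Cat)
val animals: List[Animal] = cats

val sounds = animals.map(_.sound)       // List("meow", "meow")
val described = describeCat(new Cat)    // "sound: meow"

// Array is invariant in Scala, so the next line would not compile:
// val arr: Array[Animal] = new Array[Cat](2)
```

Immutable collections can be covariant safely because nothing can be written into them after construction, which is the write that makes covariant Java arrays unsafe.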


<a name ='collect'></a>
### Collections

<a name='Spark'></a>
# Big Data, Distributed Analysis with Spark

![](https://databricks.com/wp-content/uploads/2016/06/spark-logo-trademark.png)

<a name ='cuda'></a>
# Parallel with CUDA, C++

![](http://images.anandtech.com/doci/6839/nvidia-cuda2.png)

https://github.com/apache/incubator-toree/blob/master/etc/examples/notebooks/magic-tutorial.ipynb