Big Data Machine Learning, Distributed Machine Learning, Parallel Machine Learning with C++, CUDA, Scala, Spark
=============

Ron Wu
-------------

10/26/16

Reference: free courses from the creators of
<br>NVIDIA https://developer.nvidia.com/udacity-cs344-intro-parallel-programming<br>
Scala http://www.scala-lang.org/blog/2016/05/23/scala-moocs-specialization-launched.html<br>
Spark, Databricks https://databricks.com/blog/2016/06/01/databricks-to-launch-first-of-five-free-big-data-courses-on-apache-spark.html <br> 
### Contents


1. <a href =#scala>Functional and Object-Oriented Scala</a>
     - <a href=#scalab>Functional Aspect of Scala</a>
         - <a href =#fun>First Class Functions</a>
         - <a href=#cl>Classes</a> 
     - <a href = #relu>Object-Oriented Aspect of Scala: Data-Parallelism</a> 
     - <a href = #relu>Scala with Spark</a> 
<br><br>
2. <a href=#Spark>Big Data, Distributed Analysis with Spark</a>
     - <a href =#entropy>Spark with Time Series: spark-ts</a> 
     - <a href =#entropy>Spark with TensorFlow: TensorFrames</a> 
     - <a href =#entropy>Other Spark Ecosystem</a> 
<br><br>
3. <a href =#convnet>Parallel with CUDA, C++</a>
     - <a href=#worked>Parallel Computing</a> 
     - <a href=#worked>GPU Programming</a> 
<br><br>
 

<a name = 'scala'></a>
# Functional and Object-Oriented Scala

Reference:
Martin Odersky, https://www.coursera.org/learn/progfun1/
http://docs.scala-lang.org/overviews/

<a name = 'scalab'></a>
## Functional Aspect of Scala

For an easier transition to parallel programming, throughout this notebook I will use functional style rather than imperative style whenever possible. Functional code works with immutable values: once a value is set, it cannot be changed, so several threads can read it at the same time without locks. This removes race conditions on shared state and avoids many of the deadlocks that locking introduces.
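A minimal sketch of this idea, assuming Scala 2.12 or later and the standard `scala.concurrent` library. The object and method names (`ImmutableShare`, `partialSum`, `parallelSum`) are hypothetical. Two tasks read the same immutable `Vector` in parallel without any lock, and each returns its partial result as a new value instead of updating a shared variable.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ImmutableShare {
  // an immutable Vector: no task can change it, so it can be shared without locks
  val data: Vector[Int] = (1 to 1000).toVector

  // each task only reads the shared data and returns its result as a new value
  def partialSum(from: Int, until: Int): Future[Long] =
    Future(data.slice(from, until).map(_.toLong).sum)

  def parallelSum(): Long = {
    val parts = Seq(partialSum(0, 500), partialSum(500, 1000))
    // wait for both tasks and combine their (immutable) results
    Await.result(Future.sequence(parts), 10.seconds).sum
  }

  def main(args: Array[String]): Unit =
    println(parallelSum()) // 500500
}
```

If `data` were a mutable buffer that another thread could modify, the two reads would need synchronization; with an immutable value there is nothing to protect.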

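A loop and a recursive function can express the same computation: the loop variables become the parameters of the function, and each iteration becomes a call that passes the updated values. The Scala compiler uses this equivalence in both directions. A `for` loop is translated into a call to `foreach` (or `map`/`flatMap` for `for ... yield`) that takes a closure, and a self-recursive call in tail position, in a method that cannot be overridden such as a local function, is compiled into a jump, so tail recursion runs in constant stack space just like a loop. Whether you write a computation as a loop or as tail recursion is therefore mostly a matter of style.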
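A minimal sketch of the loop/recursion equivalence (object and function names are illustrative). The `@tailrec` annotation from `scala.annotation` asks the compiler to verify that the recursive call is in tail position; compilation fails if the call cannot be turned into a jump.

```scala
import scala.annotation.tailrec

object LoopVsRecursion {
  // imperative version: i and acc are mutable loop variables
  def sumLoop(n: Int): Long = {
    var i = 1
    var acc = 0L
    while (i <= n) { acc += i; i += 1 }
    acc
  }

  // functional version: the loop variables become the parameters of a local function
  def sumRec(n: Int): Long = {
    @tailrec
    def go(i: Int, acc: Long): Long =
      if (i > n) acc
      else go(i + 1, acc + i) // tail call: compiled into a jump back to the start of go
    go(1, 0L)
  }

  def main(args: Array[String]): Unit = {
    println(sumLoop(1000000))
    println(sumRec(1000000))
  }
}
```

Both calls return the same value, and because `go` is compiled into a loop, a large `n` does not overflow the stack.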

<a name='fun'></a>
### First Class Functions
 

In [1]:
%%file main.scala

object main extends App {
     
    //def loop: Int = loop 
    
    //Do not run, infinite loop
    //val x = loop
    
    def square(x: Int) = x*x
    

    def squareFirstEle(x: Int, y: => Int) = square(x)

    println("ignore the second argument: "+squareFirstEle(2,4))
    
    // y is declared call-by-name (y: => Int): the argument is passed
    // unevaluated and only evaluated where y is used. squareFirstEle never
    // uses y, so passing loop would not hang; with call-by-value (y: Int)
    // the argument would be evaluated first, which is an infinite loop
    
    //println(squareFirstEle(2, loop)) 
    
    // Newton's method (Heron's method) to find a square root: improve guess g to (g + x/g)/2

    def sqrt(x: Double) = {
        
        def sqrtIter(guess: Double): Double =
            if (math.abs(guess*guess-x) / x <0.01) guess
            else sqrtIter((guess + x / guess)  / 2) 
        
        sqrtIter(1.0)
    }

    println("square root: "+sqrt(4.5)) 
    
    
    //
    //
    //      Higher-order functions
    //
    //
    
    def sum_v1(f: Int => Int, a: Int, b: Int): Int = 
        if (a > b) 0
        else sum_v1(f, a+1, b) + f(a)
    
    println("sum of squares from 1 to 5: " +sum_v1(square, 1, 5))
 
    // rewrite it with tail recursion: the accumulator acc plays the role
    // of a loop variable; sum_v2 is also curried (two parameter lists)
    
    def sum_v2(f: Int => Int)(a: Int, b: Int): Int = {
        def loop(a: Int, acc: Int): Int = {
            if (a>b) acc
            else loop(a+1, acc + f(a))
        }
        loop(a, 0)
    }
    
    println("the 2nd way, sum of squares from 1 to 5: " +sum_v2(square)(1, 5))  

    // sum_v3 returns a function: given f, it returns sumF of type (Int, Int) => Int
    
    def sum_v3(f: Int => Int): (Int, Int) => Int =  {
        def sumF(a: Int, b: Int): Int = { 
            if (a>b) 0
            else sumF(a+1,b) + f(a)
        }
        sumF
    }
    
    println("the 3rd way, sum of squares from 1 to 5: " + sum_v3(square)(1, 5))  

    // pass in an anonymous function
    
    println("the 4th way, sum of squares from 1 to 5: " + sum_v3(x=>x*x)(1, 5))  
 
}

Overwriting main.scala


In [2]:
! scalac main.scala
! scala main

ignore the second argument: 4
square root: 2.1224976448422046
sum of squares from 1 to 5: 55
the 2nd way, sum of squares from 1 to 5: 55
the 3rd way, sum of squares from 1 to 5: 55
the 4th way, sum of squares from 1 to 5: 55
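The comments about `=> Int` can be checked directly. The sketch below (all names hypothetical) counts how often an argument expression is evaluated under call-by-value and call-by-name; it uses a `var` counter only for the demonstration.

```scala
object EvalStrategy {
  // counts how often the argument expression expensive() is evaluated
  var evaluations = 0
  def expensive(): Int = { evaluations += 1; 21 }

  // call-by-value: the argument is evaluated exactly once, before the call
  def twiceByValue(x: Int): Int = x + x

  // call-by-name: the argument is evaluated every time x is used
  def twiceByName(x: => Int): Int = x + x

  // call-by-name parameter that is never used: the argument is never evaluated
  def firstOnly(a: Int, b: => Int): Int = a

  // resets the counter, evaluates body, and returns (result, number of evaluations)
  def countEvals(body: => Int): (Int, Int) = {
    evaluations = 0
    val result = body
    (result, evaluations)
  }

  def main(args: Array[String]): Unit = {
    println(countEvals(twiceByValue(expensive()))) // (42,1)
    println(countEvals(twiceByName(expensive())))  // (42,2)
    println(countEvals(firstOnly(1, expensive()))) // (1,0)
  }
}
```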

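The Newton update `(guess + x / guess) / 2` in `sqrtIter` can also be written with higher-order functions: `sqrt(x)` is a fixed point of `y => x / y`, and averaging `y` with `f(y)` (average damping) gives exactly that update. A sketch, with `fixedPoint`, `averageDamp` and the tolerance `1e-10` chosen for illustration:

```scala
import scala.annotation.tailrec

object FixedPoint {
  // relative tolerance for two successive guesses (chosen for illustration)
  val tolerance = 1e-10

  def isCloseEnough(x: Double, y: Double): Boolean =
    math.abs((x - y) / x) < tolerance

  // higher-order: apply f repeatedly, starting from firstGuess, until the value stops changing
  def fixedPoint(f: Double => Double)(firstGuess: Double): Double = {
    @tailrec
    def iterate(guess: Double): Double = {
      val next = f(guess)
      if (isCloseEnough(guess, next)) next
      else iterate(next)
    }
    iterate(firstGuess)
  }

  // average damping: turns f into the function y => (y + f(y)) / 2
  def averageDamp(f: Double => Double)(y: Double): Double = (y + f(y)) / 2

  // sqrt(x) is a fixed point of y => x / y; with damping each step is (y + x / y) / 2
  def sqrt(x: Double): Double = fixedPoint(averageDamp(y => x / y))(1.0)

  def main(args: Array[String]): Unit =
    println(sqrt(4.5))
}
```

Passing `averageDamp(y => x / y)` without its second argument list produces a function `Double => Double`, which is the kind of value higher-order functions such as `fixedPoint` take.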

<a name='cl'></a>
### Classes

In [3]:
%%file main.scala
 

object main extends App {

    val r = new Rational(1,2)
    println("r rational: "+r.toString)


    val s = new Rational(3,4)
    
    // unary minus: -r is translated by the compiler to r.unary_-
    println("negative of r rational: "+(-r).toString)
    
    //this is same as (r.+(s)).toString
    println("sum of rationals r & s: "+(r + s).toString)

    val t = new Rational(2)
    println("integer t: "+t.toString)
    
}

class Rational(x: Int, y: Int)  {
    require(y != 0, "denominator must not be zero")

    //second (auxiliary) constructor: an integer n is the rational n/1
    def this(x: Int) = this(x, 1)

    //greatest common divisor by the Euclidean algorithm
    private def gcd(a: Int, b: Int): Int =
        if (b == 0) a
        else gcd(b, a % b)
    //take the sign of y, so that the denominator is always positive
    private val g = math.abs(gcd(x, y)) * math.signum(y)

    val numer = x / g
    val denom = y / g

    //test the reduced denominator, so that e.g. 4/2 prints as 2
    override def toString =
        if (denom != 1) s"$numer/$denom"
        else numer.toString

    //symbolic names such as + are legal method identifiers in Scala
    def + (that: Rational) =
        new Rational(this.numer*that.denom + this.denom*that.numer, this.denom*that.denom)

    def unary_- = new Rational(-numer, denom)

}

Overwriting main.scala


In [4]:
! scalac main.scala
! scala main

r rational: 1/2
negative of r rational: -1/2
sum of rationals r & s: 5/4
integer t: 2
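Because `+` and `unary_-` are ordinary methods, the usual operator rules apply to them. In Scala the precedence of an infix operator is determined by its first character, so `*` binds tighter than `+`. A minimal sketch with a hypothetical `Frac` class (not the `Rational` above, and without gcd reduction to keep it short):

```scala
// hypothetical minimal fraction class, used only to show operators as methods
class Frac(val n: Int, val d: Int) {
  def + (that: Frac): Frac = new Frac(n * that.d + d * that.n, d * that.d)
  def * (that: Frac): Frac = new Frac(n * that.n, d * that.d)
  // the space before ':' matters: "unary_-:" would be parsed as one identifier
  def unary_- : Frac = new Frac(-n, d)
  override def toString: String = s"$n/$d"
}

object OperatorDemo {
  val a = new Frac(1, 2)
  val b = new Frac(1, 3)
  val c = new Frac(1, 4)

  def main(args: Array[String]): Unit = {
    println(a + b * c)   // 14/24: * is applied first, 1/2 + 1/12
    println(a.+(b.*(c))) // 14/24: the same expression as explicit method calls
    println((a + b) * c) // 5/24: parentheses force + first, 5/6 * 1/4
    println(-a)          // -1/2: calls a.unary_-
  }
}
```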


In [5]:
%%file main.scala
 

object main extends App {

    val tree1 = new NonEmpty(3, new Empty, new Empty)
    val tree2 = tree1 insert 4  // a copy of tree1 with 4 inserted
    println(tree1.toString)
    println(tree2.toString)
}

abstract class Node{
    def insert(x: Int) : Node
    def search(x: Int) : Boolean
}

//a Scala class has single inheritance; traits allow mixing in extra behavior
class Empty extends Node{

    def insert(x: Int) : Node = new NonEmpty(x, new Empty, new Empty)
    def search(x: Int) : Boolean = false
    override def toString = "-"
}

//the primary constructor takes 3 arguments
class NonEmpty(elem: Int, left: Node, right: Node)  extends Node{


    def insert(x: Int) : Node =
        if ( x<elem ) new NonEmpty(elem, left insert x , right)
        else if (x>elem) new NonEmpty(elem, left, right insert x)
        else this

    def search(x: Int) : Boolean =
        if ( x<elem ) left search x
        else if ( x>elem ) right search x
        else true
    
    // string concatenation calls toString on the subtrees, so an Empty subtree prints "-"
    override def toString = "{" + left + elem + right + "}"
}    

Overwriting main.scala


In [6]:
! scalac main.scala
! scala main

{-3-}
{-3{-4-}}
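The same persistent tree can also be written with a sealed trait, a case object and a case class, using pattern matching instead of overriding methods in each subclass. This is a swapped-in style for comparison, not the notebook's design; the names `Tree`, `Leaf`, `Branch` and `show` are hypothetical. The demo also checks that inserting into `t1` leaves it unchanged.

```scala
sealed trait Tree {
  // persistent insert: builds new nodes along the search path, never modifies this tree
  def insert(x: Int): Tree = this match {
    case Leaf => Branch(x, Leaf, Leaf)
    case b @ Branch(e, l, r) =>
      if (x < e) Branch(e, l.insert(x), r)
      else if (x > e) Branch(e, l, r.insert(x))
      else b // x is already present: reuse the existing node
  }

  def search(x: Int): Boolean = this match {
    case Leaf => false
    case Branch(e, l, r) =>
      if (x < e) l.search(x)
      else if (x > e) r.search(x)
      else true
  }

  // same format as the notebook's toString: "-" for an empty subtree
  def show: String = this match {
    case Leaf => "-"
    case Branch(e, l, r) => "{" + l.show + e + r.show + "}"
  }
}

case object Leaf extends Tree
case class Branch(elem: Int, left: Tree, right: Tree) extends Tree

object BstDemo {
  val t1: Tree = Leaf.insert(3)
  val t2: Tree = t1.insert(4)

  def main(args: Array[String]): Unit = {
    println(t1.show)      // {-3-}
    println(t2.show)      // {-3{-4-}}
    println(t2.search(4)) // true
    println(t1.search(4)) // false: t1 was not modified
  }
}
```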


In [7]:
%%file main.scala
 
//generic type

import java.util.NoSuchElementException

object main extends App {

    def singleton[T](elem:T) = new Cons[T](elem, new Nil[T])

    println(singleton[Double](1.1).data)
    
    // this would throw a NoSuchElementException: the tail of a singleton is Nil, which has no data
    // println(singleton[Double](1.1).pt.data)
    
    
    //the type parameter can be omitted: the compiler infers it from the arguments
    val list = new Cons(3, new Cons(2, new Cons(1, new Nil)))
    println(list.data)
    println(list.pt.data)
    println(list.pt.pt.data)

}

//type parameter
trait Node[T]{
    def isEmpty: Boolean
    def data: T
    def pt: Node[T]
}


//like before, the primary constructor takes 2 arguments;
//the val keyword also turns them into public fields data and pt,
//which implement the abstract members of Node[T]
class Cons[T](val data: T, val pt: Node[T]) extends Node[T]{
    def isEmpty = false
}

class Nil[T] extends Node[T]{
    def isEmpty = true
    def data : Nothing = throw new NoSuchElementException("Nil.data")
    def pt: Nothing = throw new NoSuchElementException("Nil.pt")
}

Overwriting main.scala


In [8]:
! scalac main.scala
! scala main

1.1
3
2
1
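One limitation of the version above is that every list needs its own `new Nil[T]`. Declaring the trait covariant (`+T`) lets a single `object` of type `MyList[Nothing]` end every list, because `Nothing` is a subtype of every type. A sketch with hypothetical names (`MyList`, `MyCons`, `MyNil`) to avoid clashing with the notebook's classes; the lower bound `U >: T` on `prepend` is what keeps the covariant trait type-correct.

```scala
trait MyList[+T] {
  def isEmpty: Boolean
  def head: T
  def tail: MyList[T]
  // U >: T: prepending an element of a supertype widens the element type instead of failing
  def prepend[U >: T](elem: U): MyList[U] = new MyCons[U](elem, this)
}

class MyCons[T](val head: T, val tail: MyList[T]) extends MyList[T] {
  def isEmpty = false
}

// one shared empty list: MyList[Nothing] is a subtype of MyList[T] for every T
object MyNil extends MyList[Nothing] {
  def isEmpty = true
  def head: Nothing = throw new NoSuchElementException("MyNil.head")
  def tail: Nothing = throw new NoSuchElementException("MyNil.tail")
}

object VarianceDemo {
  val ints: MyList[Int] = MyNil.prepend(1).prepend(2) // 2, 1
  val anys: MyList[Any] = ints.prepend("three")       // "three", 2, 1
  val strs: MyList[String] = new MyCons("a", MyNil)   // the same MyNil ends a String list

  def main(args: Array[String]): Unit = {
    println(ints.head)
    println(ints.tail.head)
    println(anys.head)
    println(strs.head)
  }
}
```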


<a name='Spark'></a>
# Big Data with Spark

- Apache Spark Components 
    - Spark SQL
    - Spark Streaming
    - MLlib
    - GraphX