# Functional Programming for Data Analysis

### Jim Pivarski

Fourth notebook: Scala

Scala is a functional programming language, like Haskell but less strictly functional (it also allows mutable state and object-oriented code), and it runs on the Java Virtual Machine (JVM).

This last point makes it harder to integrate into physics applications, but easier to integrate into business analytics, since most computing infrastructure in industry is based on Java, rather than C++.

Scala is also Spark's native tongue: Spark was written in Scala and provides Java, Python, and R interfaces as a convenience.

Python code in Spark is generally less efficient, because data must be passed between the JVM and Python processes. If you're going to be doing any architectural work in Spark, you should use Scala.

Scala is also an example of type-safe functional programming: many mistakes are caught by the compiler rather than at runtime, which gives better errors and safety when building large applications.
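As a small illustration of what type safety buys you, here is a minimal sketch (the function `mean` and the sample values are made up for this example, not part of the notebook). The return type `Option[Double]` forces the caller to deal with the empty case, and passing the wrong element type is rejected before anything runs.

```scala
// Hypothetical example: `mean` and `energies` are illustrative names only.
def mean(xs: List[Double]): Option[Double] =
    if (xs.isEmpty) None                 // an empty list has no mean
    else Some(xs.sum / xs.size)

val energies = List(10.5, 20.0, 29.5)

// The result is an Option, so the "no data" case has to be handled explicitly.
val report = mean(energies).map(m => s"mean = $m").getOrElse("no data")

// The following line does not compile, because a List[String] is not a List[Double]:
// mean(List("a", "b"))
```

The mistake in the commented-out line would only show up at runtime in an untyped language, possibly deep inside a long job.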

It also has pattern-matching, which my "functional playground" in Python lacks.
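For readers who have not seen pattern matching before, here is a minimal sketch (the functions `describe` and `sign` are hypothetical helpers written for this illustration). A `match` expression can take a value apart by its shape, and a guard (`if ...`) can add a condition to a case. The tree code below uses both forms.

```scala
// Hypothetical helper: matches on the shape of a list.
def describe(xs: List[Int]): String = xs match {
    case Nil => "empty"                                        // no elements
    case one :: Nil => s"just $one"                            // exactly one element
    case first :: rest => s"$first followed by ${rest.size} more"
}

// Hypothetical helper: matches on a value, with a guard on the second case.
def sign(x: Int): String = x match {
    case 0 => "zero"
    case i if i > 0 => "positive"
    case _ => "negative"
}

val descriptions = List(describe(List()), describe(List(7)), describe(List(1, 2, 3)))
```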

<img src="http://2.bp.blogspot.com/_r-NJO1NMiu4/TRA69XdCU8I/AAAAAAAAAnM/Re0VElAeLc4/s1600/ds_2_new.gif" style="margin-left: auto; margin-right: auto;">

In [6]:
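// Global counter used only to label nodes in creation order for this demonstration.
var identifier = 'A'

// Nodes labeled after 'G' were created after the original seven-node tree.
def message(id: Char) =
    if (id > 'G')
        "    <-- new node"
    else
        ""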

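object TreeList {
    // Builds a tree from a flat sequence: the first value becomes the root and
    // the remaining values are split in half to form up to two subtrees.
    def apply[T](values: T*): TreeList[T] = {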
        val (value, children) = values.toList match {
            case Nil => throw new Exception("cannot be empty")
            case one :: Nil => (one, List())
            case first :: rest =>
                val (left, right) = rest.splitAt(rest.size / 2)
                (first, List(left, right).flatMap({
                    case Nil => List()
                    case x => List(TreeList(x: _*))
                }))
        }

        new TreeList(value, children)
    }
}
class TreeList[T](val value: T, val children: List[TreeList[T]]) {
    // Each node takes the next letter as its label, so labels record creation order.
    val id = identifier
    identifier = (identifier.toByte + 1).toChar

    def toString(indent: String): String = {
        val prefix = "\n%s%s: value %s%s".format(indent, id, value, message(id))
        val subtrees = children.map(_.toString(indent + "    "))
        (prefix :: subtrees).mkString
    }
    override def toString() = toString("")

    def size: Int = 1 + children.map(_.size).sum
    
    def toList: List[T] = value +: children.flatMap(_.toList)
    
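    // Returns the element at `index`, counting in the same depth-first order as toList.
    // Assumes 0 <= index < size; the index is not checked explicitly.
    def get(index: Int): T = index match {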
        case 0 => value
        case i if i - 1 < children.head.size => children.head.get(i - 1)
        case i => children.last.get(i - 1 - children.head.size)
    }
    
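    // Returns a new tree with `newval` placed immediately after the element at `index`.
    // Only the nodes on the path to that element are rebuilt; all other subtrees
    // are shared with the original tree rather than copied.
    def inserted(index: Int, newval: T): TreeList[T] = index match {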
        case 0 =>
            new TreeList(value, List(new TreeList(newval, children)))
        case i if i - 1 < children.head.size =>
            new TreeList(value, children.head.inserted(i - 1, newval) :: children.tail)
        case i =>
            new TreeList(value, List(children.head, children.last.inserted(i - 1 - children.head.size, newval)))
    }
}

identifier: Char = 'A'
defined function message
defined object TreeList
defined class TreeList

In [7]:
identifier = 'A'
val xs = TreeList(1, 2, 3, 4, 5, 6, 7)
xs.toList

xs: TreeList[Int] =
G: value 1
    C: value 2
        A: value 3
        B: value 4
    F: value 5
        D: value 6
        E: value 7
res6_2: List[Int] = List(1, 2, 3, 4, 5, 6, 7)

In [8]:
val ys = xs.inserted(5, 999)
ys.toList

ys: TreeList[Int] =
K: value 1    <-- new node
    C: value 2
        A: value 3
        B: value 4
    J: value 5    <-- new node
        I: value 6    <-- new node
            H: value 999    <-- new node
        E: value 7
res7_1: List[Int] = List(1, 2, 3, 4, 5, 6, 999, 7)
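In the output above, only the nodes on the path to the insertion point (H, I, J, K) are new, while the subtree under C and the leaf E are reused from the original tree. The same kind of sharing can be checked directly on Scala's built-in immutable `List`, as in this minimal sketch (the names `base` and `extended` are made up for the illustration). The `eq` method tests whether two references point to the same object in memory.

```scala
// Hypothetical example values, chosen only to illustrate sharing.
val base = List(2, 3, 4)

// Prepending builds one new cell that points at the existing list; nothing is copied.
val extended = 1 :: base

// `eq` compares references, not contents: the tail of `extended` IS `base`.
val sharesTail = extended.tail eq base

// The original list is unchanged by the operation.
val baseUnchanged = base == List(2, 3, 4)
```

Because neither list can be modified in place, sharing memory between them is safe, which is what makes "copying" an immutable structure cheap.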