# Scala: First Dive

Spark applications are typically written in Scala (for performance) or Python (for easier integration with Python's rich ecosystem of data science tools).  
We now take a quick look at Scala in order to get a feel for the language before we start writing Spark applications in it.

### Data Modeling Primitives: The Sum Type

In [ ]:
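// sealed: every alternative must be declared in the same source file (here, the same cell), which closes the set
sealed trait Department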
case class Engineering() extends Department
case class Accounting()  extends Department
case class Marketing()   extends Department
case class PointOfSale() extends Department
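
A sum type is most useful when its set of alternatives is closed. The sketch below shows one common way to express that, assuming a `sealed trait` with `case object` alternatives. The `Shift` type and its members are hypothetical names invented for this illustration and are not part of the notebook's model. Because the trait is sealed, the compiler can warn when a `match` misses an alternative.

```scala
// Hypothetical sum type, used only for this illustration.
sealed trait Shift
case object Morning extends Shift
case object Evening extends Shift
case object Night   extends Shift

// Every alternative is handled; dropping a case would trigger
// a "match may not be exhaustive" warning from the compiler.
def startHour(s: Shift): Int = s match {
  case Morning => 6
  case Evening => 14
  case Night   => 22
}

val nightStart = startHour(Night)
```

Case objects fit alternatives that carry no data; alternatives that need fields would be case classes instead.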

### Data Modeling Primitives: The Product Type

In [ ]:
case class TentativeEmployee(firstName: String,
                             lastName:  String,
                             email:     String,
                             phoneExt:  Option[Int],
                             dept:      Department,
                             seniority: String)

In [ ]:
val brian = TentativeEmployee("Brian", "Jones", "bjones@xyz.com", Some(118), Accounting() , "midlevel")
val craig = TentativeEmployee("Craig", "Smith", "csmith@xyz.com", Some(237), Engineering(), "senior")
val john  = TentativeEmployee("John" , "Clark", "jclark@xyz.com", None     , PointOfSale(), "trainee")

Q: Something is wrong with this picture (from a data modeling perspective). **What's wrong?**

A: Conceptually, seniority is an enumeration (it has a predetermined set of possible values), so it should also be modeled as a sum type, just like the department. With a plain **`String`**, a typo such as **`"senoir"`** would compile without complaint. Let's fix that.

In [ ]:
// sealed: the set of seniority levels is closed, just like the departments
sealed trait Seniority
case class Trainee()          extends Seniority
case class Midlevel()         extends Seniority
case class Senior()           extends Seniority
case class MiddleManagement() extends Seniority
case class Executive()        extends Seniority

In [ ]:
case class Employee (firstName: String,
                     lastName:  String,
                     email:     String,
                     phoneExt:  Option[Int],
                     dept:      Department,
                     seniority: Seniority)

In [ ]:
val brian = Employee("Brian", "Jones", "bjones@xyz.com", Some(118), Accounting() , Midlevel())
val craig = Employee("Craig", "Smith", "csmith@xyz.com", Some(237), Engineering(), Senior()  )
val john  = Employee("John" , "Clark", "jclark@xyz.com", None     , PointOfSale(), Trainee() )

val allEmployees = List(brian, craig, john)
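
Case classes come with a few conveniences that make product types pleasant to work with. The following is a minimal sketch using a hypothetical `Contact` class (not part of the employee model above). It shows field access, structural equality, and `copy`.

```scala
// Hypothetical product type, used only for this illustration.
case class Contact(name: String, email: String, phoneExt: Option[Int])

val ann     = Contact("Ann", "ann@example.com", Some(101))
val sameAnn = Contact("Ann", "ann@example.com", Some(101))

// Constructor parameters become public, immutable fields.
val annEmail = ann.email

// Equality compares field values, not object identity.
val equalByValue = ann == sameAnn

// copy creates a modified instance and leaves the original untouched.
val annNoPhone = ann.copy(phoneExt = None)
```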

### Pattern Matching

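We now answer the question: which employees should be on call during regular working hours? The rule encoded below is that an employee is on call if they work in Engineering, have a phone extension, and are mid-level or senior.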

In [ ]:
def isOnCall(e: Employee): Boolean = e match {
  case Employee( _, _, _, Some(_), Engineering(), Midlevel() ) => true
  case Employee( _, _, _, Some(_), Engineering(), Senior()   ) => true
  case _                                                       => false
}

In [ ]:
isOnCall(brian)

In [ ]:
isOnCall(craig)

In [ ]:
isOnCall(john)

In [ ]:
val whoIsOnCall = allEmployees.filter( isOnCall(_) )

In [ ]:
whoIsOnCall.foreach( e => println(s"${e.firstName} ${e.lastName}") )
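
The two `true` cases in `isOnCall` differ only in seniority. As a sketch of a more compact formulation, pattern alternatives (`|`) can merge such cases into one. The example is self-contained and uses hypothetical `Level` and `Staff` types rather than the notebook's `Employee`.

```scala
// Hypothetical types, used only for this illustration.
sealed trait Level
case object Junior extends Level
case object Mid    extends Level
case object Lead   extends Level

case class Staff(name: String, phoneExt: Option[Int], level: Level)

// `Mid | Lead` matches either alternative; `Some(_)` requires an extension.
def isReachable(s: Staff): Boolean = s match {
  case Staff(_, Some(_), Mid | Lead) => true
  case _                             => false
}

val staff = List(
  Staff("Ada", Some(10), Lead),
  Staff("Bob", None,     Mid),
  Staff("Cy",  Some(12), Junior)
)

// Only Ada has both an extension and a qualifying level.
val reachableNames = staff.filter(isReachable).map(_.name)
```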

### Options

The **`Option`** type is a container that is either empty ( **`None`** ) or full ( **`Some(value)`** ).  
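It is a safe alternative to **`null`** and should be preferred wherever a value may be absent.  
An **`Option`** behaves like a collection of at most one element, so it supports the usual collection operations such as **`map`**, **`filter`** and **`foreach`**.  
Wrapping a possibly-null value with **`Option(...)`** is the idiomatic way to handle nulls in Scala: a null becomes **`None`**, anything else becomes **`Some(value)`**.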

In [ ]:
val a: String = "55"
val oa: Option[String] = Option(a)

In [ ]:
val b: String = null  // explicit type: without it, b is inferred as Null
val ob: Option[String] = Option(b)

In our application we want to convert the string value **`"55"`** to the integer value **`55`**.  
We also want to stay safe when the input value is null.  
By wrapping the input in **`Option`** and using **`map`**, we handle both cases uniformly and never hit a **`NullPointerException`**: the function is applied only when a value is present.

In [ ]:
oa.map( _.toInt )

In [ ]:
ob.map( _.toInt )
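
Since an `Option` supports collection-style operations, a few more of them are worth a quick sketch. The helper `describeExt` and the sample values below are hypothetical. The `Option` methods themselves (`getOrElse`, `filter`, `fold`, `toList`) are part of the standard library.

```scala
// Hypothetical sample values for the illustration.
val ext: Option[Int]   = Some(118)
val noExt: Option[Int] = None

// getOrElse supplies a fallback when the Option is empty.
val extOrZero = noExt.getOrElse(0)

// filter keeps the value only if it satisfies the predicate.
val threeDigit = ext.filter(_ >= 100)
val fourDigit  = ext.filter(_ >= 1000)

// fold handles both cases in one expression: the empty case first, then the full case.
def describeExt(e: Option[Int]): String =
  e.fold("no extension")(n => s"ext. $n")

// An Option converts to a list of zero or one elements.
val asList = ext.toList
```

Note that `map( _.toInt )` guards only against a missing value. A present but non-numeric string would still throw a `NumberFormatException`.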

### Exception Handling, The FP Way

In our application we need a function to compute the character count of some environment variable.  
Let's first create a naive function **`envVarCharCount`** to do that and run with it until we get ourselves into trouble.

In [ ]:
sys.env("PWD")

In [ ]:
sys.env("PWD").size

In [ ]:
def envVarCharCount(envVarName: String): Int = { sys.env(envVarName).size }

In [ ]:
envVarCharCount("PWD")

Trouble starts when the variable is not defined:

In [ ]:
envVarCharCount("ABC")

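We can handle the exception above ( **`java.util.NoSuchElementException: key not found: ABC`** ) the FP way by wrapping the call in **`Try`**, which captures the outcome as a value (either **`Success`** or **`Failure`**) instead of throwing: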

In [ ]:
import scala.util.{Try, Success, Failure}

In [ ]:
Try( sys.env("PWD") )

In [ ]:
Try( sys.env("PWD") ).map( _.size )

In [ ]:
Try( sys.env("PWD") ).map( _.size ).toOption

In [ ]:
Try( sys.env("ABC") )

In [ ]:
Try( sys.env("ABC") ).map( _.size )

In [ ]:
Try( sys.env("ABC") ).map( _.size ).toOption
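
A `Try` can also be inspected with pattern matching on `Success` and `Failure`, which is why both were imported. The sketch below assumes a hypothetical `fakeEnv` map as a stand-in for `sys.env`, so that the results do not depend on the machine it runs on. `charCount` and `describe` are likewise illustrative names.

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical stand-in for sys.env (which is also a Map[String, String]).
val fakeEnv: Map[String, String] = Map("PWD" -> "/home/demo")

// The lookup may throw; Try turns the exception into a Failure value.
def charCount(env: Map[String, String], name: String): Try[Int] =
  Try(env(name)).map(_.size)

// Pattern matching makes both outcomes explicit.
def describe(env: Map[String, String], name: String): String =
  charCount(env, name) match {
    case Success(n) => s"$name has $n characters"
    case Failure(e) => s"$name is unavailable (${e.getMessage})"
  }

val found   = describe(fakeEnv, "PWD")
val missing = describe(fakeEnv, "ABC")

// getOrElse supplies a fallback for the failure case.
val orZero = charCount(fakeEnv, "ABC").getOrElse(0)

// For a plain map lookup, `get` returns an Option and never throws.
val viaGet = fakeEnv.get("ABC").map(_.size)
```

For a simple key lookup, `get` is the lighter tool. `Try` is the better fit when the wrapped code can fail in ways that only surface as exceptions.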

### Destructuring Bindings

Destructuring value bindings are related to pattern matching: they use the same mechanism, but apply when there is exactly one pattern to match rather than several alternatives.  
They are particularly useful for tuples and case classes.

In [ ]:
def fn(i: Int): (Int, Int) = { (5, i) }

In [ ]:
fn(11)

In [ ]:
val (a, b) = fn(11)
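
The same kind of binding works for case classes and inside function literals. This is a minimal sketch; the `Point` class and all value names are hypothetical and exist only for the illustration.

```scala
// Hypothetical case class, used only for this illustration.
case class Point(x: Int, y: Int)

// Destructure a case class into its fields.
val Point(px, py) = Point(3, 4)

// Destructure a tuple produced by an expression.
val (quotient, remainder) = (17 / 5, 17 % 5)

// The same mechanism works in function literals via `case`.
val pairs  = List(("a", 1), ("b", 2))
val labels = pairs.map { case (key, n) => s"$key=$n" }
```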