ScaDaMaLe Course
[site](https://lamastex.github.io/scalable-data-science/sds/3/x/) and
[book](https://lamastex.github.io/ScaDaMaLe/index.html)

Introduction to Scala through Scala Notebook
--------------------------------------------

For convenience, we use Databricks Scala notebooks like this one.

You can also learn Scala locally on your own computer using the Scala
REPL (and Spark using `spark-shell`).

Scala on your own computer
--------------------------

The easiest way to get Scala locally is through sbt, the Scala Build
Tool. You can also use an IDE that integrates sbt.

See <https://docs.scala-lang.org/getting-started/index.html> to set up
Scala on your own computer.

**Software Engineering NOTE:** If you completed TASK 2 for **Cloud-free
Computing Environment** in the notebook prefixed `002_00` using
dockerCompose (optional exercise) then you will have Scala 2.11 with sbt
and Spark 2.4 inside the docker services you can start and stop locally.
Using docker volume binds you can also connect the docker container and
its services (including local zeppelin or jupyter notebook servers as
well as hadoop file system) to IDEs on your machine, etc.

------------------------------------------------------------------------

### Run a **Scala Cell**

-   Run the following Scala cell.
-   Note: No special indicator (such as `%md`) is needed to create a
    Scala cell in a Scala notebook.
-   You know it is a Scala notebook because of the `(Scala)` appended to
    the name of this notebook.
-   Make sure the cell contents update before moving on.
-   Press **Shift+Enter** when in the cell to run it and proceed to the
    next cell.
    -   The cell's contents should update.
    -   Alternatively, press **Ctrl+Enter** when in a cell to **run** it,
        but not proceed to the next cell.
-   Characters following `//` are comments in Scala.

In [None]:
1+1

In [None]:
println(System.currentTimeMillis) // press Ctrl+Enter to evaluate println that prints its argument as a line

  

Scala Resources
---------------

You will not be learning Scala systematically and thoroughly in this
course. You will learn *to use* Scala by doing various Spark jobs.

If you are interested in learning Scala properly, then there are various
resources, including:

-   [scala-lang.org](http://www.scala-lang.org/) is the **core Scala
    resource**. Bookmark the following links:
    -   [tour-of-scala](https://docs.scala-lang.org/tour/tour-of-scala.html) -
        Bite-sized introductions to core language features.
        -   we will go through the tour in a hurry now as some Scala
            familiarity is needed immediately.
    -   [scala-book](https://docs.scala-lang.org/overviews/scala-book/introduction.html) -
        An online book introducing the main language features
        -   you are expected to use this resource to figure out Scala as
            needed.
    -   [scala-cheatsheet](https://docs.scala-lang.org/cheatsheets/index.html) -
        A handy cheatsheet covering the basics of Scala syntax.
    -   [visual-scala-reference](https://superruzafa.github.io/visual-scala-reference/) -
        This guide collects some of the most common functions of the
        Scala programming language and explains them conceptually and
        graphically in a simple way.
-   [Online Resources](https://docs.scala-lang.org/learn.html),
    including:
    -   [Coursera: Functional Programming Principles in
        Scala](https://www.coursera.org/course/progfun)
-   [Books](http://www.scala-lang.org/documentation/books.html)
    -   [Programming in Scala, 1st Edition, Free Online
        Reading](http://www.artima.com/pins1ed/)

The main sources for the following content are (you are encouraged to
read them for more background):

-   [Martin Odersky's Scala by
    example](https://www.scala-lang.org/old/sites/default/files/linuxsoft_archives/docu/files/ScalaByExample.pdf)
-   [Scala crash course by Holden
    Karau](http://lintool.github.io/SparkTutorial/slides/day1_Scala_crash_course.pdf)
-   [Darren's brief introduction to Scala and Breeze for statistical
    computing](https://darrenjw.wordpress.com/2013/12/30/brief-introduction-to-scala-and-breeze-for-statistical-computing/)

Introduction to Scala
=====================

What is Scala?
--------------

"Scala smoothly integrates object-oriented and functional programming.
It is designed to express common programming patterns in a concise,
elegant, and type-safe way." by Martin Odersky.

-   High-level language for the Java Virtual Machine (JVM)
-   Object oriented + functional programming
-   Statically typed
-   Comparable in speed to Java
-   Type inference saves us from having to write explicit types most of
    the time
-   Interoperates with Java
    -   Can use any Java class (inherit from, etc.)
    -   Can be called from Java code

See a quick tour here:

-   <https://docs.scala-lang.org/tour/tour-of-scala.html>
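
The Java interoperability mentioned in the list above can be sketched as follows. This is only a minimal illustration, assuming nothing beyond the Java standard library; the name `javaList` is hypothetical and not part of the course material.

```scala
// Hypothetical illustration: use a class from the Java standard library directly in Scala.
import java.util.ArrayList

val javaList = new ArrayList[String]() // instantiate a Java class with Scala syntax
javaList.add("Scala")
javaList.add("Java")

println(javaList.size()) // number of elements added above
println(javaList.get(0)) // first element that was added
```

No wrapper or binding layer is needed here: the Java class is imported and called like any Scala class.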

Why Scala?
----------

-   Spark was originally written in Scala, which allows concise function
    syntax and interactive use
-   Spark APIs for other languages include:
    -   Java API for standalone use
    -   Python API added to reach a wider user community of programmers
    -   R API added more recently to reach a wider community of data
        analysts
    -   Unfortunately, Python and R APIs are generally behind Spark's
        native Scala API (for example, GraphX is currently only
        available in Scala, and typed Datasets are not available in
        Python or R, as of 2020-09-18).
-   See Darren Wilkinson's 11 reasons for [scala as a platform for
    statistical computing and data
    science](https://darrenjw.wordpress.com/2013/12/23/scala-as-a-platform-for-statistical-computing-and-data-science/).
    It is embedded in-place below for your convenience.

In [None]:
//%run "/scalable-data-science/xtraResources/support/sdsFunctions"
//This allows easy embedding of publicly available information into any other notebook
//when viewing in git-book just ignore this block - you may have to manually chase the URL in frameIt("URL").
//Example usage:
// displayHTML(frameIt("https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation#Topics_in_LDA",250))
def frameIt( u:String, h:Int ) : String = {
      """<iframe 
 src=""""+ u+""""
 width="95%" height="""" + h + """"
 sandbox>
  <p>
    <a href="http://spark.apache.org/docs/latest/index.html">
      Fallback link for browsers that, unlikely, don't support frames
    </a>
  </p>
</iframe>"""
   }

In [None]:
displayHTML(frameIt("https://darrenjw.wordpress.com/2013/12/23/scala-as-a-platform-for-statistical-computing-and-data-science/",500))

  

Let's get our hands dirty in Scala
==================================

We will go through the **following** programming concepts and tasks by
building on <https://docs.scala-lang.org/tour/basics.html>.

-   **Scala Types**
-   **Expressions and Printing**
-   **Naming and Assignments**
-   **Functions and Methods in Scala**
-   **Classes and Case Classes**
-   **Methods and Tab-completion**
-   **Objects and Traits**
-   Collections in Scala and Type Hierarchy
-   Functional Programming and MapReduce
-   Lazy Evaluations and Recursions

**Remark**: You need to take a computer science course (from Coursera,
for example) to properly learn Scala. Here, we will learn to use Scala
by example to accomplish our data science tasks at hand. You can learn
more Scala as needed from various sources pointed out above in **Scala
Resources**.

Scala Types
===========

In Scala, all values have a type, including numerical values and
functions. The diagram below illustrates a subset of the type hierarchy.

![](https://docs.scala-lang.org/resources/images/tour/unified-types-diagram.svg)

For now, notice some common types we will be using, including `Int`,
`String`, `Double`, `Unit`, `Boolean`, `List`, etc. For more details see
<https://docs.scala-lang.org/tour/unified-types.html>. We will return to
this at the end of the notebook after seeing a brief tour of Scala now.
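
As a rough sketch of the common types named above, the following declares one value of each with an explicit type annotation. All value names here are hypothetical and chosen only for illustration.

```scala
// Hypothetical values, one for each of the common types mentioned above.
val anInt: Int = 3
val aDouble: Double = 3.14
val aString: String = "three"
val aBoolean: Boolean = true
val aUnit: Unit = ()                 // the single value of type Unit
val aList: List[Int] = List(1, 2, 3)

// Any sits at the top of the type hierarchy, so values of different
// types can be collected in a single List[Any].
val mixed: List[Any] = List(anInt, aDouble, aString, aBoolean, aList)
mixed.foreach(v => println(v))
```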

Expressions
===========

Expressions are computable statements such as the `1+1` we have seen
before.

In [None]:
1+1

  

We can print the result of an evaluated expression as a line using
`println`:

In [None]:
println(1+1) // printing 2

In [None]:
println("hej hej!") // printing a string

  

Naming and Assignments
----------------------

### value and variable as `val` and `var`

You can name the results of expressions using keywords `val` and `var`.

Let us assign the integer value `5` to `x` as follows:

In [None]:
val x : Int = 5 // <Ctrl+Enter> to declare a value x to be integer 5. 

  

`x` is a named result and it is a value since we used the keyword `val`
when naming it.

Scala is statically typed, but it uses built-in type inference machinery
to automatically figure out that `x` is an integer or `Int` type as
follows. Let's declare a value `x` to be `Int` 5 next without explicitly
using `Int`.

In [None]:
val x = 5    // <Ctrl+Enter> to declare a value x as Int 5 (type automatically inferred)

  

Let's declare `x` as a `Double` or double-precision floating-point type
using decimal such as `5.0` (a digit has to follow the decimal point!)

In [None]:
val x = 5.0   // <Ctrl+Enter> to declare a value x as Double 5

  

Alternatively, we can assign `x` as a `Double` explicitly. Note that the
decimal point is not needed in this case due to explicit typing as
`Double`.

In [None]:
val x :  Double = 5    // <Ctrl+Enter> to declare a value x as Double 5 (type given explicitly, so the literal 5 is converted to Double)

  

Next note that labels need to be declared on first use. We have declared
`x` to be a `val` which is short for *value*. This makes `x` immutable
(cannot be changed).

Thus, `x` cannot be just re-assigned, as the following code illustrates
in the resulting error: `... error: reassignment to val`.

In [None]:
//x = 10    //  uncomment and <Ctrl+Enter> to try to reassign val x to 10

  

Scala allows declaration of mutable variables as well using `var`, as
follows:

In [None]:
var y = 2    // <Shift+Enter> to declare a variable y to be integer 2 and go to next cell

In [None]:
y = 3    // <Shift+Enter> to change the value of y to 3

In [None]:
y = y+1 // adds 1 to y

In [None]:
y += 2 // adds 2 to y

In [None]:
println(y) // the var y is 6 now

  

Blocks
======

You can combine expressions by surrounding them with `{` and `}`. This
is called a block.

In [None]:
println({
  val x = 1+1
  x+2 // expression in last line is returned for the block
})// prints 4

In [None]:
println({ val x=22; x+2})

  

Functions
=========

Functions are expressions that have parameters. A function takes
arguments as input and returns the value of an expression as output.

A function can be nameless or *anonymous* and simply return an output
from a given input. For example, the following anonymous function returns
the square of the input integer.

In [None]:
(x: Int) => x*x

  

On the left of `=>` is a list of parameters with name and type. On the
right is an expression involving the parameters.

You can also name functions:

In [None]:
val multiplyByItself = (x: Int) => x*x

In [None]:
println(multiplyByItself(10))

  

A function can have no parameters:

In [None]:
val howManyAmI = () => 1

In [None]:
println(howManyAmI()) // 1

  

A function can have more than one parameter:

In [None]:
val multiplyTheseTwoIntegers = (a: Int, b: Int) => a*b

In [None]:
println(multiplyTheseTwoIntegers(2,4)) // 8
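
Because functions are values, they can also be passed as arguments to other functions. The following is a minimal sketch of that idea; `squareOf` and `applyTwice` are hypothetical names introduced only for this illustration.

```scala
// Hypothetical function value with the same shape as multiplyByItself above.
val squareOf = (x: Int) => x * x

// Pass the function as an argument to the map method of a List.
val squares = List(1, 2, 3).map(squareOf)
println(squares) // List(1, 4, 9)

// A function that takes another function as its first parameter.
val applyTwice = (f: Int => Int, x: Int) => f(f(x))
println(applyTwice(squareOf, 3)) // (3*3)*(3*3) = 81
```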

  

Methods
=======

Methods are very similar to functions, but a few key differences exist.

Methods use the `def` keyword followed by a name, parameter list(s), a
return type, and a body.

In [None]:
def square(x: Int): Int = x*x    // <Shift+Enter> to define a function named square

  

Note that the return type `Int` is specified after the parameter list
and a `:`.

In [None]:
square(5)    // <Shift+Enter> to call this function on argument 5

In [None]:
val y = 3    // <Shift+Enter> make val y as Int 3

In [None]:
square(y) // <Shift+Enter> to call the function on val y of the right argument type Int

In [None]:
val x = 5.0     // let x be Double 5.0

In [None]:
//square(x) // <Shift+Enter> to call the function on val x of type Double will give type mismatch error

In [None]:
def square(x: Int): Int = { // <Shift+Enter> to declare function in a block
  val answer = x*x
  answer // the last line of the function block is returned
}

In [None]:
square(5000)    // <Shift+Enter> to call the function

In [None]:
// <Shift+Enter> to define function with input and output type as String
def announceAndEmit(text: String): String = 
{
  println(text)
  text // the last line of the function block is returned
}

  

Scala has a `return` keyword but it is rarely used as the expression in
the last line of the multi-line block is the method's return value.

In [None]:
// <Ctrl+Enter> to call function which prints as line and returns as String
announceAndEmit("roger  roger")

  

A method can take multiple parameter lists:

In [None]:
def multiplyAndTranslate(x: Int, y: Int)(translateBy: Int): Int = (x * y) + translateBy

In [None]:
println(multiplyAndTranslate(2, 3)(4))  // (2*3)+4 = 10

  

A method can have no parameter lists at all:

In [None]:
def time: Long = System.currentTimeMillis

In [None]:
println("Current time in milliseconds is " + time)

In [None]:
println("Current time in milliseconds is " + time)
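
One of the differences between methods and functions mentioned at the start of this section can be sketched as follows: a function is itself a value with methods of its own, whereas a method belongs to its enclosing definition and is converted to a function value when one is expected. The names `cubeMethod`, `cubeFunction`, `cubeThenAddOne` and `cubeFromMethod` are hypothetical and used only for illustration.

```scala
def cubeMethod(x: Int): Int = x * x * x               // a method, defined with def
val cubeFunction: Int => Int = (x: Int) => x * x * x  // a function value

// A function value is an object, so it has methods of its own, such as andThen.
val cubeThenAddOne = cubeFunction.andThen(_ + 1)
println(cubeThenAddOne(2)) // (2*2*2)+1 = 9

// A method is converted to a function value when a function type is expected.
val cubeFromMethod: Int => Int = cubeMethod
println(List(1, 2, 3).map(cubeFromMethod)) // List(1, 8, 27)
```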

  

Classes
=======

The `class` keyword followed by the name and constructor parameters is
used to define a class.

In [None]:
class Box(h: Int, w: Int, d: Int) {
  def printVolume(): Unit = println(h*w*d)
}

  

-   The return type of the method `printVolume` is `Unit`.
-   When the return type is `Unit` it indicates that there is nothing
    meaningful to return, similar to `void` in Java and C, but with a
    difference.
-   Because every Scala expression must have some value, there is
    actually a singleton value of type `Unit`, written `()` and carrying
    no information.
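
The point about `Unit` having a single value `()` can be sketched like this. The method `logMessage` is a hypothetical helper introduced only for this illustration.

```scala
// Hypothetical helper whose only purpose is its side effect (printing).
def logMessage(message: String): Unit = println(message)

// The call still yields a value: the single value of type Unit.
val result: Unit = logMessage("side effect only")
println(result) // prints ()
```

This is the difference from `void` hinted at above: the result of a `Unit` method can be bound to a name like any other value, even though it carries no information.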

We can make an instance of the class with the `new` keyword.

In [None]:
val my1Cube = new Box(1,1,1)

  

And call the method on the instance.

In [None]:
my1Cube.printVolume() // 1

  

The name `my1Cube` is bound to an instance of the `Box` class with
`val`, so it cannot be reassigned to another instance.

If you declare the name with `var` instead, you can reassign it to a new
instance of the class.

In [None]:
var myVaryingCuboid = new Box(1,3,2)

In [None]:
myVaryingCuboid.printVolume()

In [None]:
myVaryingCuboid = new Box(1,1,1)

In [None]:
myVaryingCuboid.printVolume()

  

See <https://docs.scala-lang.org/tour/classes.html> for more details as
needed.

Case Classes
============

Scala has a special type of class called a *case class* that can be
defined with the `case class` keyword.

Unlike classes, whose instances are compared by reference, instances of
case classes are immutable by default and compared by value. This makes
them useful for defining rows of typed values in Spark.

In [None]:
case class Point(x: Int, y: Int, z: Int)

  

Case classes can be instantiated without the `new` keyword.

In [None]:
val point = Point(1, 2, 3)
val anotherPoint = Point(1, 2, 3)
val yetAnotherPoint = Point(2, 2, 2)

  

Instances of case classes are compared by value and not by reference.

In [None]:
if (point == anotherPoint) {
  println(point + " and " + anotherPoint + " are the same.")
} else {
  println(point + " and " + anotherPoint + " are different.")
} // Point(1,2,3) and Point(1,2,3) are the same.

if (point == yetAnotherPoint) {
  println(point + " and " + yetAnotherPoint + " are the same.")
} else {
  println(point + " and " + yetAnotherPoint + " are different.")
} // Point(1,2,3) and Point(2,2,2) are different.


  

By contrast, instances of classes are compared by reference.

In [None]:
myVaryingCuboid.printVolume() // should be 1 x 1 x 1

In [None]:
my1Cube.printVolume()  // should be 1 x 1 x 1

In [None]:
if (myVaryingCuboid == my1Cube) {
  println("myVaryingCuboid and my1Cube are the same.")
} else {
  println("myVaryingCuboid and my1Cube are different.")
} // they are compared by reference and are not the same.

  

More about case classes here:
<https://docs.scala-lang.org/tour/case-classes.html>.
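
The statement above that case class instances are immutable by default can be sketched as follows. `Coordinate` is a hypothetical case class introduced only for this illustration; `copy` is the method generated for every case class to create a modified copy.

```scala
// Hypothetical case class used only for this illustration.
case class Coordinate(x: Int, y: Int)

val origin = Coordinate(0, 0)
// origin.x = 5 // would not compile: the fields of a case class are vals

// Instead of mutating, create a new instance with some fields changed.
val shifted = origin.copy(x = 5)
println(origin)  // Coordinate(0,0)
println(shifted) // Coordinate(5,0)
println(shifted == Coordinate(5, 0)) // true, since comparison is by value
```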

Methods and Tab-completion
--------------------------

Many methods of a class can be accessed by `.`.

In [None]:
val s  = "hi"    // <Ctrl+Enter> to declare val s as the String "hi"

  

You can place the cursor after `.` following a declared object and find
out the methods available for it as shown in the image below.

![tabCompletionAfterSDot PNG
image](https://github.com/raazesh-sainudiin/scalable-data-science/raw/master/images/week1/tabCompletionAfterSDot.png)

**You Try** doing this next.

In [None]:
//s.  // place cursor after the '.' and press Tab to see all available methods for s 

  

For example,

-   scroll down to `contains` and double-click on it.  
-   This should lead to `s.contains` in your cell.
-   Now add an argument String to see if `s` contains the argument, for
    example, try:
    -   `s.contains("f")`
    -   `s.contains("")` and
    -   `s.contains("i")`

In [None]:
//s    // <Shift+Enter> recall the value of String s

In [None]:
s.contains("f")     // <Shift+Enter> returns Boolean false since s does not contain the string "f"

In [None]:
s.contains("")    // <Shift+Enter> returns Boolean true since s contains the empty string ""

In [None]:
s.contains("i")    // <Ctrl+Enter> returns Boolean true since s contains the string "i"

  

Objects
=======

Objects are single instances of their own definitions, defined using the
`object` keyword. You can think of them as singletons of their own
classes.

In [None]:
object IdGenerator {
  private var currentId = 0
  def make(): Int = {
    currentId += 1
    currentId
  }
}

  

You can access an object through its name:

In [None]:
val newId: Int = IdGenerator.make()
val newerId: Int = IdGenerator.make()

In [None]:
println(newId) // 1
println(newerId) // 2

  

For details see
<https://docs.scala-lang.org/tour/singleton-objects.html>

Traits
======

Traits are abstract data types containing certain fields and methods.
They can be defined using the `trait` keyword.

In Scala inheritance, a class can only extend one other class, but it
can extend multiple traits.

In [None]:
trait Greeter {
  def greet(name: String): Unit
}

  

Traits can have default implementations also.

In [None]:
trait Greeter {
  def greet(name: String): Unit =
    println("Hello, " + name + "!")
}


  

You can extend traits with the `extends` keyword and override an
implementation with the `override` keyword:

In [None]:
class DefaultGreeter extends Greeter

class SwedishGreeter extends Greeter {
  override def greet(name: String): Unit = {
    println("Hej hej, " + name + "!")
  }
}

class CustomizableGreeter(prefix: String, postfix: String) extends Greeter {
  override def greet(name: String): Unit = {
    println(prefix + name + postfix)
  }
}

  

Instantiate the classes.

In [None]:
val greeter = new DefaultGreeter()
val swedishGreeter = new SwedishGreeter()
val customGreeter = new CustomizableGreeter("How are you, ", "?")

  

Call the `greet` method in each case.

In [None]:
greeter.greet("Scala developer") // Hello, Scala developer!
swedishGreeter.greet("Scala developer") // Hej hej, Scala developer!
customGreeter.greet("Scala developer") // How are you, Scala developer?

  

A class can also be made to extend multiple traits.

For more details see: <https://docs.scala-lang.org/tour/traits.html>.
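
Extending multiple traits, as mentioned above, can be sketched with the `with` keyword. The traits `Walker` and `Swimmer` and the class `Duck` are hypothetical names used only for this illustration.

```scala
// Two hypothetical traits, each with a default implementation.
trait Walker {
  def walk(): String = "walking"
}

trait Swimmer {
  def swim(): String = "swimming"
}

// A class extends the first trait with `extends` and mixes in further traits with `with`.
class Duck extends Walker with Swimmer

val duck = new Duck()
println(duck.walk()) // walking
println(duck.swim()) // swimming
```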

Main Method
===========

The main method is the entry point of a Scala program.

The Java Virtual Machine requires a main method, named `main`, that
takes an array of strings as its only argument.

Using an object, you can define the main method as follows:

In [None]:
object Main {
  def main(args: Array[String]): Unit =
    println("Hello, Scala developer!")
}

  

What I try not to do while learning a new language?
===================================================

1.  I don't immediately try to ask questions like: "how can I do this
    particular variation of some small thing I just learned so I can use
    patterns I am used to from another language I am hooked-on right
    now?"
2.  First go through the detailed Scala Tour on your own and then
    through the 50-odd lessons in the Scala Book.
3.  Then return to 1. and ask detailed cross-language comparison
    questions, diving deep as needed into the source and the Scala docs
    (google search!).