# Scala for data engineers

Welcome to Scala for data engineers!

This course is an introduction to the Scala programming language for data engineers and software engineers. Some basic programming skills are required. Java experience makes things a lot easier, because Scala compiles to bytecode for the Java Virtual Machine (JVM). Scala can also compile to JavaScript (Scala.js) and to native code through LLVM (Scala Native), but that is outside the scope of this workshop.
The main goal of this course is to lay the foundations for using Scala in big data projects such as Apache Spark and Akka Streams, following "professional" software engineering practices and using cloud services. Think of it as "how big companies solve big data problems".

# Finding the right tools for the job

Scala is the work of XXX XXXX, a computer scientist who researches languages that unify multiple paradigms into a uniform experience.
Like many other programmers, I came to Scala looking for a better tool to write UDFs in Apache Spark (thanks to the weird way Scala 2.0 is exposed in Java). After a couple of minutes playing in the spark-shell, it became one of my dearest friends.
The big charm of Scala (IMHO) comes from two things: the Java ecosystem (a library for everything!) and its functional programming capabilities.
Working in a functional way makes it a lot easier to reason about distributed systems and reliability. The natural separation of IO operations from computation makes code much easier to test and debug. I remember needing the debugger only once in six years of using Scala (and it was for OOP code).
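A minimal sketch of that separation, assuming a toy word-count job: the hypothetical `countWords` function is pure (same input, same output, no side effects), so it can be tested with plain values, while printing to the console happens only at the edge.

```scala
// Pure core: no IO, no mutation; the result depends only on the input.
// (countWords is a hypothetical example function, not a library API.)
def countWords(lines: Seq[String]): Map[String, Int] =
  lines
    .flatMap(_.toLowerCase.split("\\s+"))
    .filter(_.nonEmpty)
    .groupBy(identity)
    .map { case (word, occurrences) => word -> occurrences.size }

// Impure shell: the only place that talks to the outside world (here, the console).
val input = Seq("Onka is a dog", "Panda is a dog too")
val counts = countWords(input)
counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
```

Testing `countWords` needs no mocks or files: call it with a `Seq` and compare the resulting `Map`.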


## Learning functional programming pays the bills!

Java code rewritten in Scala often ends up 5 to 7 times shorter.
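One common source of that reduction, shown as an illustrative sketch rather than a measurement: a single `case class` line generates what a classic Java class (before records) needed a constructor, getters, `equals`, `hashCode` and `toString` for, plus `copy` and pattern-matching support. The `Pet` type is a hypothetical example.

```scala
// One line: immutable fields, constructor, equals/hashCode, toString and copy are generated.
case class Pet(name: String, age: Int)

val onka = Pet("onka", 10)          // no `new` needed: a companion `apply` is generated
val olderOnka = onka.copy(age = 11) // copy with one field changed; onka is untouched

println(onka)                       // Pet(onka,10)
println(onka == Pet("onka", 10))    // structural equality: true

// Pattern matching uses the generated extractor
val greeting = onka match {
  case Pet(name, age) if age >= 10 => s"$name is a senior dog"
  case Pet(name, _)                => s"$name is a young dog"
}
println(greeting)                   // onka is a senior dog
```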

# The most basic paradigm: Turing machines and assembly language.

![turing machine](turing-machine.jpg)

[Mike Davey's Turing machine](https://spectrum.ieee.org/032610-diy-turing-machine)
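To make the model concrete, here is a toy sketch of a Turing machine: a tape, a head, a current state and a transition table. The program (adding 1 to a binary number) and the names `rules`, `run` and `Blank` are hypothetical examples chosen for brevity; this is not the program running on Mike Davey's machine.

```scala
// A minimal Turing machine that increments a binary number written on the tape.
type State = String
val Blank = '_'

// (current state, symbol read) -> (symbol to write, head move, next state)
val rules: Map[(State, Char), (Char, Int, State)] = Map(
  ("carry", '1')   -> ('0', -1, "carry"), // 1 + carry = 0, keep carrying to the left
  ("carry", '0')   -> ('1',  0, "halt"),  // 0 + carry = 1, done
  ("carry", Blank) -> ('1',  0, "halt")   // ran past the left edge: new leading 1
)

def run(input: String): String = {
  // The tape is a sparse map from cell index to symbol; missing cells are blank.
  var tape: Map[Int, Char] = input.zipWithIndex.map { case (c, i) => i -> c }.toMap
  var head = input.length - 1 // start on the least significant bit
  var state: State = "carry"
  while (state != "halt") {
    val symbol = tape.getOrElse(head, Blank)
    val (write, move, next) = rules((state, symbol))
    tape = tape.updated(head, write)
    head += move
    state = next
  }
  // Read the used part of the tape back from left to right.
  val cells = tape.keys
  (cells.min to cells.max).map(i => tape.getOrElse(i, Blank)).mkString
}

println(run("1011")) // 1100
println(run("111"))  // 1000
```

Every step is just "read, write, move, change state", which is also roughly the level at which assembly language works.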

# Higher-level programming and the procedural paradigm.
- variables and constants
- control structures
- loops
- arrays
- procedures (functions)

```scala
// Variables and constants

val age = 10 // this is a constant (an immutable Int)

var name = "Onka" // this is a variable (a mutable String reference)
name += " perrito" // we mutate things!

val nameLength = name.length // everything is an object!

val tailSize: Double = 0.3 // we can specify the types explicitly
```

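```scala
// Control structures

// C-like if
var tail2: String = null
if (tailSize >= 0.5) {
    tail2 = "short"
} else {
    tail2 = "long"
}
// if is an expression, not a statement: it returns a value.
// In Scala 2 indentation is not significant, and braces {} are optional around
// single expressions (the parentheses around the condition are still required).
// Check the stricter condition first, otherwise the ">= 0.8" branch is unreachable.
val tail3 =
    if (tailSize >= 0.8) "regular"
    else if (tailSize >= 0.5) "short"
    else "long"
```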

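```scala
// Loops

var i = 1
while (i < 3) {
    println(s"while: $i")
    i += 1
}

i = 1
// do-while runs the body at least once, even though i < 1 is already false.
// Note: Scala 3 dropped the do-while syntax.
do {
    println(s"do while: $i")
    i += 1
} while (i < 1)

// `until` excludes the upper bound: 0, 1, 2
for (i <- 0 until 3) {
    println(s"for-until: $i")
}

// `to` includes the upper bound: 1, 2, 3
for (i <- 1 to 3) {
    println(s"for-to: $i")
}
```

```
while: 1
while: 2
do while: 1
for-until: 0
for-until: 1
for-until: 2
for-to: 1
for-to: 2
for-to: 3
```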


```scala
// Array

val dogs: Array[String] = Array("onka", "panda", "quimera")

for (dog <- dogs) {
    println(dog)
}
```

```
onka
panda
quimera

dogs: Array[String] = Array("onka", "panda", "quimera")
```

```scala
// map builds a new array; dogs itself is left unchanged
dogs.map(_ + " perrito").foreach(println)
```

```
onka perrito
panda perrito
quimera perrito
```


```scala
// Indexing is zero-based and uses parentheses, not brackets
println(dogs.head) // same as dogs(0)
println(dogs(1))
println(dogs(2))
```

```
onka
panda
quimera
```


```scala
// Procedures (functions)

def testFn(v: String): Int = v.length
testFn("sweet")
```

```
defined function testFn
res6_1: Int = 5
```

# Object-oriented paradigm.
- classes and objects
- polymorphism
- generic types

```scala
// Classes and objects
class Dog(name: String, age: Int) {
    def bark(): Unit = {
        println(s"Guau! my name is $name")
    }
}

val onkita = new Dog("onka", 10) // custom object
onkita.bark() // a method declared with () should be called with ()
```

```
Guau! my name is onka

defined class Dog
onkita: Dog = ammonite.$sess.cmd7$Helper$Dog@ffb1f40
```

```scala
// Polymorphism through a trait (similar to a Java interface)
trait Jump {
    def jump(v: Double): Unit
}

class GrassHooper extends Jump {
    def jump(v: Double): Unit = {
        println(s"jumping $v")
    }
}

class Cat extends Jump {
    def jump(v: Double): Unit = v match {
        case 0.0 => println("ok")
        case _ => println("cats cant jump")
    }
}

// The inferred element type is List[Jump]
val jumpers = List(
    new Cat,
    new GrassHooper
)

// Each Int is widened to Double when passed to jump
List(0, 1).foreach { v =>
    jumpers.foreach { j =>
        print(j.getClass.getSimpleName + ": ")
        j.jump(v)
    }
}
```

```
Cat: ok
GrassHooper: jumping 0.0
Cat: cats cant jump
GrassHooper: jumping 1.0

defined trait Jump
defined class GrassHooper
defined class Cat
jumpers: List[Jump] = List(
  ammonite.$sess.cmd8$Helper$Cat@50eeedfc,
  ammonite.$sess.cmd8$Helper$GrassHooper@401f501b
)
```
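Generic types appear in the list above but not in the cells. A minimal sketch, assuming a hypothetical `Box[A]` class and a hypothetical `largest` method: the type parameter is filled in at the use site, so the same code works for `String`, `Int` or any other type.

```scala
// A generic class: A is a type parameter (Box is a hypothetical example type).
class Box[A](val value: A) {
  // map turns a Box[A] into a Box[B] without knowing what A or B are.
  def map[B](f: A => B): Box[B] = new Box(f(value))
}

val nameBox: Box[String] = new Box("onka")
val lengthBox: Box[Int] = nameBox.map(_.length) // Box[String] -> Box[Int]
println(lengthBox.value) // 4

// A generic method with a context bound: T can be any type that has an Ordering.
def largest[T: Ordering](items: List[T]): T = items.max

println(largest(List(3, 7, 5)))                    // 7
println(largest(List("onka", "panda", "quimera"))) // quimera
```

`List[Jump]` in the previous cell is the same idea: `List` is a generic type whose parameter was inferred as `Jump`.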

# Moore's law, clock frequency and multi-core processors.
- Mutable state and concurrent programming.
- Is there a better tool?
![FP vs OOP](fp-vs-oop.jpg)
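A sketch of why immutability helps on multi-core machines, assuming the standard `scala.concurrent` Futures: instead of several threads incrementing one shared `var` (which needs locks to avoid lost updates), each task sums its own immutable chunk and the partial results are combined at the end. `ParallelSum` and `sumInParallel` are hypothetical names; the computation is wrapped in an object so the Future bodies do not depend on the notebook's cell wrapper.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// ParallelSum is a hypothetical example object.
object ParallelSum {
  // Each task reads an immutable chunk and returns its own partial result,
  // so there is no shared mutable counter and no lock.
  def sumInParallel(numbers: Vector[Long], chunkSize: Int): Long = {
    val partialSums: List[Future[Long]] =
      numbers.grouped(chunkSize).toList.map(chunk => Future(chunk.sum))
    // Future.sequence turns List[Future[Long]] into Future[List[Long]]
    Await.result(Future.sequence(partialSums).map(_.sum), 10.seconds)
  }
}

val total = ParallelSum.sumInParallel((1L to 1000000L).toVector, chunkSize = 250000)
println(total) // 500000500000
```

Because no chunk is shared or modified, the result is the same no matter how the tasks are scheduled across cores.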

# Functional programming
- Lambda calculus, Alonzo Church's model of computation.
- Constraints: only constants (immutable values) and pure functions.
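With only constants and pure functions there are no mutable loop counters, so loops become recursion or higher-order functions. A minimal sketch, assuming the earlier `while` loop as the starting point; `sumTo`, `addPerrito` and `shout` are hypothetical example names.

```scala
import scala.annotation.tailrec

// 1. Recursion replaces the mutable counter; @tailrec makes the compiler
//    check that the recursive call can be compiled into a loop.
def sumTo(n: Int): Int = {
  @tailrec
  def loop(i: Int, acc: Int): Int =
    if (i > n) acc else loop(i + 1, acc + i)
  loop(1, 0)
}

// 2. A higher-order function replaces the loop entirely.
val sumFold = (1 to 10).foldLeft(0)(_ + _)

// 3. Functions are values: they can be stored in vals and composed.
val addPerrito: String => String = _ + " perrito"
val shout: String => String = _.toUpperCase
val both = addPerrito andThen shout

println(sumTo(10))    // 55
println(sumFold)      // 55
println(both("onka")) // ONKA PERRITO
```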