# Fundamental concepts of programming for humanists (Scala)

## Program flow
Programming is the act of giving a series of instructions to the computer. Upon running the program, the computer then follows these instructions in sequence. Typically (but not always), each line in a program is a single instruction. 

The box below contains two instructions. Run them by selecting the box (by clicking on it) and pressing ctrl-enter (or selecting the play button from the menu above). 

In [6]:
println("Hello")
println("programming")

Hello
programming


Notice how two lines are printed. You can freely change the text inside the quotation marks to change what is printed. You can even add new print commands on additional lines. Try it!

Naturally, a complete program will often be a lot more complex than this, but can often still be thought of as a sequence. For example, a program could: 
 1. read in the complete texts of an author
 1. create a sheet counting how many times each distinct word appears in the texts (e.g. the: 50, world: 10, is: 30, suffering: 5)
 1. read in the complete texts of another author
 1. make a similar sheet of the words appearing therein
 1. compare the counts between the two authors to produce two tables of:
    1. the words appearing with the most similar frequencies in both authors' works
    1. the words appearing with the most dissimilar frequencies in the two authors' works

## Variables

As the example above suggests, programs usually operate on data, and not just on single items either, but on e.g. the complete texts of a thousand books. You are also going to do the same thing to multiple texts, and the process is going to have multiple steps, with different representations for the data. Thus, you can't just copy and paste the text of those books inside print statements.

Instead, you need a way for the program to refer to that data in a symbolic manner. For this, named variables can be defined. In Scala, the first time a variable appears, it needs to be defined using `var`. After that, you store a new value in the variable with `=`, and retrieve the value just by giving the variable name.

In [1]:
var name = "Eetu"
name = "Bruce Wayne"
println("Hello " + name + ".")
println("Welcome to programming "+name+".")

Hello Bruce Wayne.
Welcome to programming Bruce Wayne.


name: String = "Bruce Wayne"

Here, we're storing just a simple piece of text in the variable. But in practice, variables can store much more interesting things, too (such as the complete works of an author as read from a text file, or the word counts derived from them).

In the above, also note how whitespace works. Inside program code itself, it doesn't matter if you have `a+b` or `a + b` or even `a +b`. On the other hand, inside the `"` marks whitespace is important, because there it is data, not code. See below:

In [2]:
println("Hello     "+name+"     .")
println("Hello"+     name+     ".")
println("Hello"   +  name   +  ".")

Hello     Bruce Wayne     .
HelloBruce Wayne.
HelloBruce Wayne.


## Operators
One way to work with values is by joining them with operators. The `+` in the above is a concatenation operator, joining together multiple strings (or the contents of string variables). A lot of the basic operators come from arithmetic and are mainly defined for numeric values, e.g. `+, -, /, *`. These also follow the precedence rules from basic math. Try them:

In [14]:
println( 1+5 )
println( 1+5/2 )
println( (1+5)/2 )

6
3
3
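
If the `3` printed for `1+5/2` surprised you: when both sides of `/` are whole numbers, Scala drops the fractional part of the result. A small sketch of the difference (the specific numbers are just illustrations):

```scala
println( 5/2 )        // prints 2: both operands are whole numbers, so the fraction is dropped
println( 5/2.0 )      // prints 2.5: one operand has a decimal part, so the result keeps it
println( 1+5/2.0 )    // prints 3.5: division still happens before addition
println( (1+5)/2.0 )  // prints 3.0: parentheses first, then the division
```

So if you need fractions (for example, relative word frequencies), make sure at least one of the operands is a decimal number.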


A second common class of operators are comparison operators, e.g.: `==, !=, >, <, >=, <=`. 

In [15]:
println("Is 1<5? "+( 1<5))
println("Is 1>5? "+( 1>5 ))
println("Is 1==5? "+( 1==5 ))
println("Is 1!=5? "+( 1!=5 ))
println("Is a<b? "+( "a"<"b" ))
println("Is a==a? "+( "a"=="a" ))
println("Is a!=a? "+( "a"!="a" ))


Is 1<5? true
Is 1>5? false
Is 1==5? false
Is 1!=5? true
Is a<b? true
Is a==a? true
Is a!=a? false


Notice how the above comparisons yielded truth values? Those are useful for:

## Control flow

A computer program isn't really just a sequence of commands. It can also contain control flow statements that affect how the computer proceeds through the program. These are where the comparison operators mentioned above are most often used. Try changing the `name` variable by changing the assignment in the cell concerning variables above (and executing that cell), and see what happens when you then execute the cell below.

In [3]:
if (name=="John")
    println("Hello Johnny")
else if (name=="Bruce Wayne")
    println("Hello Batman")
else
    println("Hello "+name)

Hello Batman


That control flow construct was the `if` construct. Some languages, such as Scala (and also R, actually), also have special syntax for if statements with very many options:

In [4]:
name match {
    case "John" => println("Hello Johnny")
    case "Bruce Wayne" => println("Hello Batman")
    case anyname => println("Hello "+anyname)
}

Hello Batman
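
In Scala, `match` (like `if`) does not have to print anything itself: it can also produce a value that you store in a variable and use later. A minimal sketch, assuming a made-up variable `visitor` (the `val` keyword defines a read only variable):

```scala
val visitor = "Bruce Wayne"

// The match as a whole evaluates to the string on the right-hand side of the matching case
val greeting = visitor match {
  case "John"        => "Hello Johnny"
  case "Bruce Wayne" => "Hello Batman"
  case anyname       => "Hello " + anyname
}

println(greeting)  // prints Hello Batman
```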


Other important control flow constructs are `while` and its specialization `for`. They're used for doing stuff repeatedly (for example, to do something to all words in a sentence, etc.)

In [5]:
var count = 1
println("Starting. Count: " + count)
while (count <= 3) {
  println("In while, because count ("+count+") just tested was below or equal to 3.")
  count=count+1
  println("In while, about to test with new count: "+count)
}
println("Done. Count ("+count+") was not below or equal to 3.")

Starting. Count: 1
In while, because count (1) just tested was below or equal to 3.
In while, about to test with new count: 2
In while, because count (2) just tested was below or equal to 3.
In while, about to test with new count: 3
In while, because count (3) just tested was below or equal to 3.
In while, about to test with new count: 4
Done. Count (4) was not below or equal to 3.


count: Int = 4

What the above does is: set the variable `count` to 1. Then, as long as `count` is at most 3, print information on it and set `count` to `count + 1`.

Note that both the adding of 1 to `count` and the prints are included inside the while. That's because of the curly braces (`{}`). Try moving the last print statement currently inside the braces outside of them and re-running the cell. How did the output change? (Note that if you move `count=count+1` outside the braces, the while will run forever.)

Some languages, such as Scala again, also have a `do while` construct, which runs its body at least once, because the test comes only after the body. (Compare the two by starting with `var count = 4` in both cells.)

In [6]:
var count = 1
println("Starting. Count: " + count)
do {
  println("In do while, count: "+count)
  count=count+1
  println("In do while, about to test with new count: "+count)
} while (count <= 3)
println("Done. Count ("+count+") was not below or equal to 3.")

Starting. Count: 1
In do while, count: 1
In do while, about to test with new count: 2
In do while, count: 2
In do while, about to test with new count: 3
In do while, count: 3
In do while, about to test with new count: 4
Done. Count (4) was not below or equal to 3.


count: Int = 4

Which you use will depend on your particular need. 

Below, the same repetition is done with the specialized `for` construct. Repeating something for all values in a collection is such a frequent operation that most languages have a specialized construct for it.


In [28]:
println("Starting.")
for (item <- Seq(1,2,3))
  println("For: "+item)
println("Done.")

Starting.
For: 1
For: 2
For: 3
Done.
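
As a further sketch of what `for` is typically used for, the cell below first loops over a range of numbers without listing them by hand, and then over the words of a sentence. The sentence and the variable names are made up for illustration, and splitting on single spaces is a simplification of real word splitting.

```scala
// `1 to 3` builds the numbers 1, 2, 3
for (number <- 1 to 3)
  println("Number: " + number)

// split(" ") cuts the sentence at each space, giving its words
val sentence = "the world is suffering"
var wordsSeen = 0
for (word <- sentence.split(" ")) {
  println(word + " has " + word.length + " letters")
  wordsSeen = wordsSeen + 1
}
println("Words seen: " + wordsSeen)  // prints Words seen: 4
```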


## Variable types

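Notice that in the above, Scala automatically converted numbers to strings whenever we joined them to a string with `+` (e.g. the number `1` became the string `"1"`). However, this conversion only works one way, from numbers to strings. So when a string and a number are combined with `+`, the result is not a sum: where R reports an error (because R doesn't support `+` for strings), Scala just happily turns the number `10` into the string `"10"` and concatenates the two, yielding `"1010"`.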

In [30]:
println(10+10)
println("10"+10)

20
1010


Fortunately, values of different types can often be converted to each other, as in the following:

In [57]:
var i = 1
println("Type of i: "+i.getClass)
var j = "1"
println("Type of j: "+j.getClass)
println("Is i equal to j?: "+(i==j))
println("Is i.toString equal to j?: "+(i.toString==j))
println("Is i equal to j.toInt?: "+(i==j.toInt))

Type of i: int
Type of j: class java.lang.String
Is i equal to j?: false
Is i.toString equal to j?: true
Is i equal to j.toInt?: true


[36mi[39m: [32mInt[39m = [32m1[39m
[36mj[39m: [32mString[39m = [32m"1"[39m

Just don't try converting anything that's not a number to one:

In [35]:
// The computer understands how to convert the string 11 into a number
"11".toInt
// Yet the string eleven isn't a number to the computer, even though it is to us.
"eleven".toInt
// These lines by the way are comments. They're ways of documenting what a program does to humans reading it. They're not processed by the computer

: 
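
When you can't be sure that a string really contains a number (which is common with real-world data), one way to guard against the error is to wrap the conversion in `scala.util.Try` from the Scala standard library. This is only a sketch of one possible approach; the variable names are made up.

```scala
import scala.util.Try

// Try(...) captures the error instead of stopping the program,
// and toOption turns a failed conversion into None
val parsedDigits = Try("11".toInt).toOption
val parsedWord = Try("eleven".toInt).toOption

println(parsedDigits)             // prints Some(11)
println(parsedWord)               // prints None
println(parsedWord.getOrElse(0))  // prints 0: a fallback value for failed conversions
```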

An important thing about Scala here is that it is actually a strongly typed language (more precisely, a statically typed one). What that means is that the language interpreter makes sure a single variable can only contain values of a single type, and checks this before the code is run (this is useful for detecting programming errors early).

Above, the Scala interpreter identified the variable types based on the values they were set to. You can also set these explicitly:

In [58]:
var i: String = "1"
var j: Int = 1

[36mi[39m: [32mString[39m = [32m"1"[39m
[36mj[39m: [32mInt[39m = [32m1[39m

In a strongly typed language, the interpreter will enforce these types, guarding against type bugs already at code writing time:

In [58]:
var i = "1"
i = 1


cmd57.sc:2: type mismatch;
 found   : Int(1)
 required: String
val res57_1 = i = 1
                  ^

: 

If you want a variable that can hold multiple types of values, you need to specify so explicitly:

In [37]:
var i: Any = "1"
i = 1

[36mi[39m: [32mAny[39m = 1

Besides write-time error detection, strong variable typing also enables lots of nice support in programming environments, such as autocompleting methods that are applicable to the variable type.

Again, the types of your variables in practice will often be much more complex than just strings or numbers (such as different particular ways of storing the texts of 1000 books). Still, converting data from one structure to another, as dictated by the needs of processing, is a big part of programming.

## Functions/methods

Often, one wants to also run a piece of code multiple times, from multiple parts of the program. For this, one *defines* functions, which work for code similarly to how variables work for storing and recalling values (except where data is often read in from outside the code, the logic for a function is specified inside its definition). Functions take in zero or more parameters (given in parentheses), and can optionally return back a single value.

(In this way, they're also very much like mathematical functions, where e.g. `f(x)=x+2` is a function definition, with `x` as input and the body of the function defining the returned output as `x+2`.)


In [41]:
// This is a function definition that takes in a single variable named text. It returns that text after some processing.
def standardize(text: String): String = {
    var modifiableText = text
    modifiableText = modifiableText.replace("."," ")
    modifiableText = modifiableText.replace(","," ")
    modifiableText = modifiableText.replace("?"," ")
    modifiableText = modifiableText.replace("!"," ")
    modifiableText = modifiableText.replace("'","")
    modifiableText = modifiableText.toLowerCase()
    return modifiableText.replaceAll("\\s+"," ")
}

println(standardize("Where are we? I don't know!"))
println(standardize("This, programming... is... terrifying!"))

where are we i dont know 
this programming is terrifying 


defined function standardize

Again, because Scala is a strongly typed language, you need to define types for both the parameters as well as what is returned (here both `String`). In addition, Scala has a notion of read only variables, and method parameters are such. This requires us to copy the unmodifiable `text` variable to the `modifiableText` variable for processing.

(You can define these read only variables yourself using `val variable = something`.)
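
A minimal sketch of the difference between the two kinds of variables (the names and values are made up for illustration):

```scala
val title = "Hamlet"   // read only: title will always refer to "Hamlet"
var pagesRead = 0      // modifiable: can be given a new value later
pagesRead = pagesRead + 10

// title = "Macbeth"   // if uncommented, this line is rejected with a "reassignment to val" error

println(title + ": " + pagesRead + " pages read")  // prints Hamlet: 10 pages read
```

Preferring `val` wherever a value never needs to change makes it easier to see which parts of a program can vary.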

Very often, functions are packaged inside libraries, which you have to import in order to use. In Scala, [`replace`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#replace-java.lang.CharSequence-java.lang.CharSequence-), [`replaceAll`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#replaceAll-java.lang.String-java.lang.String-) and [`toLowerCase`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toLowerCase--) are all methods of `String`-type objects, and `String` is a data type that is part of the [Scala standard library](http://www.scala-lang.org/api/2.12.0/index.html) (actually, it is part of the [Java standard library](http://docs.oracle.com/javase/8/docs/api/index.html), on which Scala builds), so nothing has to be imported here.

However, libraries are still a big thing. In Scala, libraries are usually not imported as a whole. Instead, you import individual pieces as needed. The syntax is `import library.package.classname` (e.g. `import scala.io.Source`). However, you *can* also import everything in a package using the syntax `import library.package._` (e.g. `import scala.collection.JavaConverters._`). Often, libraries contain both new functions as well as new data types. To use the library, one has to read up on both of them.
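
As a small sketch of using an imported piece, the cell below imports `scala.io.Source` and reads lines with it. To keep the example runnable without any files, it reads from a string using `Source.fromString`; the text and variable names are made up.

```scala
import scala.io.Source

// Source.fromString treats a string as if it were the contents of a file
val source = Source.fromString("first line\nsecond line\nthird line")
val lines = source.getLines().toList
source.close()

println(lines.length)  // prints 3
println(lines(1))      // prints second line
```

Without the `import` line, the name `Source` would not be known and the cell would fail with an error.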

Scala is very object oriented, and thus most functions are defined as methods of data types. Besides the ones above, we have also already seen the methods `toString`, `toInt` and `getClass`. The only plain function we've seen is `println`! You can still define stuff as functions if you want:

In [43]:
var text = "Hmm..."
println("method of String object: "+text+"->"+text.replace(".","!"))

// Don't worry about the content of this function, just note its parameters (and the fact that again what is contained in the function is defined by the curly braces)
def replace(string: String,replaceThis: Char,withThis: Char): String = {
    var modifiableString = new StringBuilder(string)
    for (i <- 0 until modifiableString.length)
        if (modifiableString(i)==replaceThis)
            modifiableString(i)=withThis
    return modifiableString.toString
}

println("separately defined replace-function: "+text+"->"+replace(text,'.','!'))

/* 
  See how they're the same thing, but written differently:
  replace(text,'.','!') vs
  text.replace('.','!')
 */

method of String object: Hmm...->Hmm!!!
separately defined replace-function: Hmm...->Hmm!!!


text: String = "Hmm..."
defined function replace

Having functions implicitly tied to particular types doesn't give us any more functionality, but sometimes makes our code look nicer:

In [48]:
// In the original standardize function, we assigned the replaced text back to the modifiableText variable on each line:
def standardize(text: String): String = {
    var modifiableText = text
    modifiableText = modifiableText.replace("."," ")
    modifiableText = modifiableText.replace(","," ")
    modifiableText = modifiableText.replace("?"," ")
    modifiableText = modifiableText.replace("!"," ")
    modifiableText = modifiableText.replace("'","")
    modifiableText = modifiableText.toLowerCase()
    return modifiableText.replaceAll("\\s+"," ")
}

// Often, we want to write this more concisely. Here, we can make use of the fact that replace returns a string, and /chain/ a call to the replace (or toLowerCase) method of that returned string
def standardize2(text: String) =
  text.replace("."," ").replace(","," ").replace("?"," ").replace("!"," ").replace("'","").toLowerCase().replaceAll("\\s+"," ")

// Now, we can do this kind of chaining with functions too, but because of where we need to put the string parameter, the result is not nearly as clear to understand.
def standardize3(text: String) =
  replace(replace(replace(replace(text,'.',' '),',',' '),'?',' '),'!',' ').replace("'","").toLowerCase().replaceAll("\\s+"," ")

println(standardize2("Where are we? I don't know!"))
println(standardize3("This, programming... is... terrifying!"))

where are we i dont know 
this programming is terrifying 


defined function standardize
defined function standardize2
defined function standardize3

By the way, Scala is also very *functional* in the mathematical sense, where functions are chained to produce a desired output. To this end, Scala includes lots of support for doing stuff in this functional manner.

Thus, an idiomatic Scala way for defining the above replace function would be:

In [49]:
def replace(string: String, replaceThis: Char, withThis: Char) =
  string.map(char => if (char == replaceThis) withThis else char)
  
println("separately defined replace-function: "+text+"->"+replace(text,'.','!'))

separately defined replace-function: Hmm...->Hmm!!!


defined function replace

Here, `map` is a functional method associated with all sequences (strings are sequences of characters) that maps each entry in the sequence to something else. Inside the call to `map`, `char => if (char == replaceThis) withThis else char` is actually a *lambda function*, a special type of unnamed function. The cell below does the same thing with a named function instead:

In [52]:
def select(replaceThis: Char, withThis: Char, charToConsider: Char) =
  if (charToConsider == replaceThis) withThis else charToConsider
  
def replace(string: String, replaceThis: Char, withThis: Char) =
  // The _ marks the slot that map fills with each character. It has to be the last argument, because select expects the character to consider last.
  string.map(select(replaceThis,withThis,_))
  
println("separately defined replace-function: "+text+"->"+replace(text,'.','!'))

// How this works is that the map method of string actually takes in a function as a parameter. We can define replace that way too:

def replace2(string: String, select: Char => Char) =
  string.map(select)
  
println("replace-function taking in a select function: "+text+"->"+replace2(text,select('.','!',_)))

separately defined replace-function: Hmm...->Hmm!!!
replace-function taking in a select function: Hmm...->Hmm!!!


defined function select
defined function replace
defined function replace2

(Above, `select('.','!',_)` takes the `select` function, which is a three parameter function, and binds two of those parameters to create a one parameter function)
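
The same binding trick works for any function. As a sketch, with a made-up three parameter function `surround`:

```scala
// Hypothetical helper: puts `left` before and `right` after the text
def surround(left: String, right: String, text: String): String =
  left + text + right

// Binding the first two parameters leaves a one parameter function from String to String
val bracket: String => String = surround("[", "]", _)
val quote: String => String = surround("\"", "\"", _)

println(bracket("Hmm"))              // prints [Hmm]
println(quote("Hmm"))                // prints "Hmm"
println(Seq("a", "b").map(bracket))  // prints List([a], [b])
```

Because `bracket` is a one parameter function, it can be handed directly to `map`, just like `select('.','!',_)` above.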

Often, libraries contain both new functions as well as new data types (classes) with attendant methods. To use the library, one has to read up on all of them.

## Data structures

As said earlier, variables can hold much more complex data than just simple strings or numbers. They can in fact hold any data structure defined by the programming language or a library.

Most programming languages have two very useful core data types you should know. These are lists (or sequences or arrays) for holding multiple items, and [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) (or hashes or maps) for creating associations between items. 

In [54]:
// Here, we're defining a replacement dictionary. The syntax uses Map(...) and ->
var replacements = Map(
    "." -> " ",
    "," -> " ",
    "!" -> " ",
    "?" -> " ",
    "'" -> "",
    "&" -> "and" 
)

def standardize4(text: String): String = {
    var modifiableText = text
    // Here we're going over all the keys in the replacement dictionary and acting on them
    for ((key,replacement) <- replacements) {
      modifiableText = modifiableText.replace(key, replacement)
    }
    return modifiableText.toLowerCase.replaceAll("\\s+", " ")
}    

// This is a sequence. 
val sentences = Seq("Where are we? I don't know!", "This, programming... is... terrifying!")

// Here we're calling the function once for each string in the sentences list
for (sentence <- sentences)
    println(standardize4(sentence))
    
// You can also explicitly refer to a particular slot in a list or a key in a dictionary using parentheses:
println(replacements("&"))
println(sentences(0))
// In the above, note that the first entry in the list is at index 0, not 1. That's a conventional relic that permeates most programming languages, and comes originally from the way computers handle memory.

where are we i dont know 
this programming is terrifying 
and
Where are we? I don't know!


[36mreplacements[39m: [32mMap[39m[[32mString[39m, [32mString[39m] = [33mMap[39m([32m"."[39m -> [32m" "[39m, [32m"&"[39m -> [32m"and"[39m, [32m"!"[39m -> [32m" "[39m, [32m","[39m -> [32m" "[39m, [32m"'"[39m -> [32m""[39m, [32m"?"[39m -> [32m" "[39m)
defined [32mfunction[39m [36mstandardize4[39m
[36msentences[39m: [32mSeq[39m[[32mString[39m] = [33mList[39m([32m"Where are we? I don't know!"[39m, [32m"This, programming... is... terrifying!"[39m)

In [56]:
// Note that a dictionary can only contain one value for each key
val replacements2 = Map(
    "." -> "?",
    "." -> "!"
)
println(replacements2("."))

// Therefore, if you need multiple values, you have to combine dictionaries with lists:
val replacements3 = Map(
    "." -> Seq("?","!")
)

println(replacements3("."))

!
List(?, !)


replacements2: Map[String, String] = Map("." -> "!")
replacements3: Map[String, Seq[String]] = Map("." -> List("?", "!"))
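
Dictionaries are also what you would use for the word count sheets described in the section on program flow. The cell below is only a rough sketch of that idea: `wordCounts` is a made-up helper, the two "texts" are single invented sentences, and splitting on non-word characters is a simplification (it would, for instance, cut `don't` in two).

```scala
// Hypothetical helper (not from the cells above): count how often each word occurs
def wordCounts(text: String): Map[String, Int] =
  text.toLowerCase
    .split("\\W+")        // cut at runs of characters that are not letters, digits or _
    .filter(_.nonEmpty)   // a text starting with punctuation would otherwise yield an empty "word"
    .groupBy(identity)    // dictionary from each distinct word to all of its occurrences
    .map { case (word, occurrences) => (word, occurrences.length) }

val firstCounts = wordCounts("The world is suffering. The world is the world.")
val secondCounts = wordCounts("The sea is calm. The sea is the sea.")

println(firstCounts("world"))                // prints 3
// Asking for a missing key with secondCounts("world") would be an error, so give a default:
println(secondCounts.getOrElse("world", 0))  // prints 0

for (word <- Seq("the", "is", "world"))
  println(word + ": " + firstCounts.getOrElse(word, 0) + " vs " + secondCounts.getOrElse(word, 0))
```

The last loop prints the counts side by side (`the: 3 vs 3`, `is: 2 vs 2`, `world: 3 vs 0`), which is the raw material for comparing the two authors.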

## Conclusion

That's all I think you absolutely *need* to know in order to start reading and learning from examples. 