# Fundamental concepts of programming for humanists

## Program flow
Programming is the act of giving a series of instructions to the computer. When the program is run, the computer follows these instructions in order. Typically (but not always), each line in a program is a single instruction.

The box below contains two instructions. Run them by clicking the box to select it and pressing Ctrl-Enter (or by clicking the play button in the menu above).

```r
print("Hello")
print("programming")
```

Notice how two lines are printed. You can freely change the text inside the quotation marks to change what is printed. You can even add new print commands on additional lines.

## Variables

Often in programs, you need a particular value in multiple places. For this, you can define named variables that act as stores for those values. In R, you store a value in a variable with `<-`, and retrieve it just by writing the variable name.


```r
name <- "Eetu"
# You can also use =, but <- is more idiomatic R
print(paste("Hello ", name, ".", sep = ""))
print(paste("Welcome to programming ", name, ".", sep = ""))
```
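A small sketch of what "storing a value" means in practice: a variable keeps whatever was put into it at that moment, so changing `name` afterwards does not change a string that was already built from it. The sketch also uses `paste0`, a base R shorthand for `paste(..., sep = "")`; the variable `greeting` is just an example name.

```r
name <- "Eetu"
# paste0() joins its arguments with no separator, like paste(..., sep = "")
greeting <- paste0("Hello ", name, ".")

# Putting a new value into name replaces the old one...
name <- "Maria"
print(name)      # prints [1] "Maria"
# ...but greeting was built from the old value and keeps it
print(greeting)  # prints [1] "Hello Eetu."
```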


## Operators
One way to work with values is to combine them with operators. In Python, `+` also joins strings together, but in R it does not: the code above uses the `paste` function for that, and `+` only works on numbers. Many of the basic operators come from arithmetic and are mainly defined for numeric values, e.g. `+, -, /, *`. These also follow the precedence rules of basic math. Try them:

```r
print(1 + 5)
print(1 + 5 / 2)
print((1 + 5) / 2)
```
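A few more worked examples of the precedence rules, as a sketch: multiplication and division happen before addition, powers before multiplication, and parentheses override both. `%%` (the remainder of a division) is included as one more arithmetic operator that often comes in handy.

```r
# Multiplication is done before addition
print(2 + 3 * 4)    # [1] 14
# Parentheses force the addition to happen first
print((2 + 3) * 4)  # [1] 20
# ^ raises to a power and is done before *
print(2 * 3^2)      # [1] 18
# %% gives the remainder of a division, e.g. for checking whether a number is even
print(7 %% 2)       # [1] 1
```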

A second common class of operators is the comparison operators, e.g. `==, !=, >, <, >=, <=`. For strings, `<` and `>` compare in alphabetical order, character by character.

```r
print(1 < 5)
print(1 > 5)
print("a" < "b")
print("ab" < "aa")
```
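A comparison produces a logical value, `TRUE` or `FALSE`, which can be stored in a variable like any other value. One R-specific behaviour worth seeing early: comparing a whole vector against a single value compares each element separately. The variable names and example words below are only illustrations.

```r
# The result of a comparison is a logical value that can be stored
is_long <- nchar("Helsingiin") > 8   # "Helsingiin" has 10 characters
print(is_long)                        # [1] TRUE

# != means "not equal"
print("omena" != "omenata")           # [1] TRUE

# Comparing a vector with one value checks every element: FALSE TRUE TRUE
word_lengths <- c(5, 7, 10)
print(word_lengths >= 7)
```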

## Control flow

A computer program isn't really just a fixed sequence of commands. It can also contain control flow statements that affect how the computer proceeds through the program, and these are where the comparison operators above are most often used. Try changing the value of `name` in the cell of the Variables section (and running that cell again), then run the cell below and see what happens.

```r
if (name == "John") {
    print("Hello Johnny")
} else if (name == "Bruce Wayne") {
    print("Hello Batman")
} else {
    print(paste("Hello ", name, sep = ""))
}
```
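In R, `if ... else` is itself an expression that produces a value, so the same branching can also be written by storing the chosen result in a variable. This is a sketch of that style; `greeting` is just an example variable name, and `name` is set inside the block so it runs on its own.

```r
name <- "Bruce Wayne"
# The value of whichever branch runs becomes the value stored in greeting
greeting <- if (name == "John") {
    "Hello Johnny"
} else if (name == "Bruce Wayne") {
    "Hello Batman"
} else {
    paste0("Hello ", name)
}
print(greeting)  # [1] "Hello Batman"
```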

That control flow construct was the `if` construct. Other important control flow constructs are `while` and its more specialized cousin `for`. They're used for doing things repeatedly (for example, doing something to every word in a sentence).

```r
i <- 1
while (i < 4) {
    print(paste("while: ", i, sep = ""))
    i <- i + 1
}
# Note that in the above, both the print and the adding of 1 to i are inside the while.
# That's because the {} form a block (indentation is also often used to make this clear).
# If i <- i + 1 weren't in the block, the loop would never end, as i would always be 1
# and thus less than 4. You're welcome to try it, but be warned that it will probably
# jam your browser with output.
# (This, by the way, is a comment: everything after a # is ignored by the computer.)

for (i in 1:3) {
    print(paste("for: ", i, sep = ""))
}
```
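As an example of "doing something to every word in a sentence", here is a sketch that splits a sentence into words and loops over them. It relies on the base R function `strsplit`, which returns a list with one vector of pieces per input string; `[[1]]` takes out the vector of words for our single sentence.

```r
sentence <- "waiwaisten hambaat ovat usein huonot"
# Split at every space; [[1]] takes the vector of words for the first (only) sentence
words <- strsplit(sentence, " ")[[1]]

# The loop variable word takes each element of words in turn
for (word in words) {
    print(paste0(word, ": ", nchar(word), " letters"))
}
```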

## Variable types

In the above, we also incidentally ran into the fact that for computers, `"1"` and `1` are two completely separate things: one is a string, the other a number. In the loops, `paste` quietly turned the number `i` into text, but in general the two don't mix. For example, the second statement in the following code just doesn't work. Try it.

```r
print(10 + 10)
# This line fails with "non-numeric argument to binary operator", as "10" is a string
print("10" + 10)
```

```r
i <- 1
print(paste("Type of i: ", typeof(i), sep = ""))
j <- "1"
print(paste("Type of j: ", typeof(j), sep = ""))
print(paste("Is i equal to j?: ", i == j, sep = ""))
print(paste("Is as.character(i) equal to j?: ", as.character(i) == j, sep = ""))
print(paste("Is i equal to as.numeric(j)?: ", i == as.numeric(j), sep = ""))
```
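For reference, a sketch of the most common basic types that `typeof` reports, plus the `is.numeric` / `is.character` functions, which answer yes/no questions about a value's type. Note that a plain number like `1` is stored as a "double" (a number that may have decimals); the `L` suffix asks for a whole-number integer instead.

```r
print(typeof(1))        # [1] "double"
print(typeof(1L))       # [1] "integer"
print(typeof("1"))      # [1] "character"
print(typeof(TRUE))     # [1] "logical", the type of comparison results
# is.* functions check the type and return TRUE or FALSE
print(is.numeric("1"))  # [1] FALSE
print(is.character("1"))  # [1] TRUE
```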

Here one has to note an important difference between R and Python. In general, R does more automatic conversion between types than Python, so in the above, the number 1 *is* actually equal to the string "1" even without explicit conversion! (Before comparing, R quietly converts the number into a string.)

```r
# The computer understands how to convert the string "11" into a number
as.numeric("11")
# Yet the string "eleven" isn't a number to the computer, even though it is to us
as.numeric("eleven")
```

Here's another important difference: where Python raises an error for this, R just issues a warning ("NAs introduced by coercion") and returns `NA` (a special value meaning Not Available).
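Since failed conversions turn into `NA` rather than stopping the program, it helps to know how to find and skip them. A sketch using base R: `is.na` marks which values are missing, and many summary functions such as `sum` take an `na.rm = TRUE` parameter to ignore them. `suppressWarnings` hides the warning for this one call; it is only a good idea when failures are expected.

```r
# Convert several strings at once; "eleven" fails and becomes NA
values <- suppressWarnings(as.numeric(c("11", "eleven", "3")))
print(values)

# is.na() shows which conversions failed: FALSE TRUE FALSE
print(is.na(values))

# Any NA makes the sum NA, unless told to remove NAs first
print(sum(values))                # [1] NA
print(sum(values, na.rm = TRUE))  # [1] 14
```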

## Functions/methods

Often, one also wants to run the same piece of code from multiple parts of the program. For this, one defines functions, which do for code what variables do for values: store it under a name so it can be used again later. Functions take in zero or more parameters (given in parentheses), and can return a single value.

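```r
# This is a function definition that takes in a single parameter named string.
# It returns that string after a series of replacements, applied one after another.
modernize <- function(string) {
    string <- gsub("w", "v", string)
    string <- gsub("b", "p", string)
    string <- gsub("d", "t", string)
    string <- gsub("g", "k", string)
    # \\b matches a word boundary, so these only replace endings at the end of a word
    string <- gsub("ata\\b", "aa", string)
    string <- gsub("ätä\\b", "ää", string)
    return(string)
}

print(modernize("waltawat määrät omenata laivattiin Helsingiin"))
print(modernize("waiwaisten hambaat ovat usein huonot"))
```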
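A function can take more than one parameter, and a parameter can have a default value that is used when the caller leaves it out. The `greet` function below is a hypothetical example, not something from a library. It also shows that arguments can be passed by name, in which case their order doesn't matter.

```r
# Hypothetical example function; greeting defaults to "Hello" if not given
greet <- function(name, greeting = "Hello") {
    return(paste0(greeting, " ", name, "."))
}

print(greet("Eetu"))                           # [1] "Hello Eetu."
print(greet("Eetu", "Welcome"))                # [1] "Welcome Eetu."
# Named arguments can be given in any order
print(greet(greeting = "Hi", name = "Maria"))  # [1] "Hi Maria."
```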

Very often, functions are packaged inside libraries (in R, called packages), which you have to load in order to use them. The `gsub` function is part of R's base package, which is always loaded, so nothing has to be imported here. However, packages are still central to working with R, and the syntax for loading one is `library(packagename)`.
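As a sketch of loading a package: `tools` ships with every R installation but isn't loaded by default, and it contains `toTitleCase`. The `package::function` form shown last calls a single function without loading the whole package. Packages that don't come with R must first be downloaded once with `install.packages("packagename")`.

```r
# Load the tools package, which comes with R but isn't loaded automatically
library(tools)
print(toTitleCase("hello world"))  # [1] "Hello World"

# package::function calls a function directly, without library()
print(tools::toTitleCase("the quick brown fox"))
```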

Other functions we've already seen are `paste`, `as.character`, `as.numeric`, `typeof` and also `print`!

R's basic string handling is built on plain functions rather than methods attached to objects, so there is no equivalent to the `replace` method of Python's string objects. Instead, you pass the values to work on as parameters to a function, as with `gsub` above.
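A few more base R string functions, sketched to show the pattern: the string is always passed in as a parameter. `toupper`, `nchar`, `substr`, `sub` and `gsub` are all part of base R; `city` is just an example variable.

```r
city <- "Helsinki"
# Where Python would write city.upper(), R passes the string to a function
print(toupper(city))         # [1] "HELSINKI"
# nchar() counts characters; substr() cuts out a piece by start and end position
print(nchar(city))           # [1] 8
print(substr(city, 1, 3))    # [1] "Hel"
# sub() is like gsub(), but replaces only the first match
print(sub("i", "I", city))   # [1] "HelsInki"
print(gsub("i", "I", city))  # [1] "HelsInkI"
```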

## Data structures

Most programming languages have two very useful core data types you should know. These are lists (or sequences or arrays) for holding multiple items, and dictionaries (or hashes or maps) for creating associations between items.

```r
# R doesn't have a dedicated dictionary type (though there is a hash package). However,
# plain R vectors can have named indices, so let's use that:
replacements <- c(w = "v", b = "p", d = "t", g = "k", "ätä\\b" = "ää", "ata\\b" = "aa")

modernize2 <- function(string) {
    # Here we're going over all the keys in the replacement dictionary and acting on them
    for (key in names(replacements)) {
        # [[ ]] gives the plain value stored under the key, without its name attached
        string <- gsub(key, replacements[[key]], string)
    }
    return(string)
}

# This is a vector, R's basic list-like type
sentences <- c("waltawat määrät omenata laivattiin Helsingiin", "waiwaisten hambaat ovat usein huonot")

# Here we're calling the function once for each string in the sentences vector
for (sentence in sentences) {
    print(modernize2(sentence))
}

# You can also explicitly refer to a particular slot in a vector or a key in a dictionary using square brackets:
print(replacements["w"])
print(sentences[1])
# In R, indices start at 1!
```
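Vectors and named vectors can also grow after they are created. A sketch using only base R: `c()` appends elements to an existing vector, assigning to a new name adds a dictionary-style entry, and `%in%` checks whether a key exists. The words and keys are just examples.

```r
words <- c("omena", "laiva")
# c() can also add elements to the end of an existing vector
words <- c(words, "hammas")
print(length(words))  # [1] 3
print(words[3])       # [1] "hammas"

# Assigning to a name that doesn't exist yet adds a new entry
replacements <- c(w = "v")
replacements["b"] <- "p"
print(names(replacements))           # [1] "w" "b"
# %in% checks whether a key is among the names
print("d" %in% names(replacements))  # [1] FALSE
```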

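```r
# Note that a dictionary can only usefully hold one value per key. R lets a named
# vector contain the same name twice, but looking up by name only finds the first one:
replacements2 <- c(w = "v", w = "y")
print(replacements2["w"])

# Therefore, if you need multiple values per key, you have to combine dictionaries with lists:
replacements2 <- list(w = c("v", "y"))
# [[ ]] returns the vector stored under the key (single [ ] would return a sub-list)
print(replacements2[["w"]])
```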

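As an interesting albeit mostly useless note, see how in the above R returns "v" in the first instance, while Python returned "y". This is because R's lookup by name returns the first matching name, whereas a Python dictionary keeps only the last value assigned to a key.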

More usefully, note that in the above we couldn't use just simple vectors (created by `c()`), but instead had to use actual lists created by `list()`. The technical explanation is that simple vectors are more efficient for the computer to process, but they have to be flat, with all elements of the same type, while actual lists can hold anything (including other vectors and lists) but are therefore more expensive for the computer to operate upon.
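A sketch of the difference in practice: `c()` flattens nested values and converts everything to a single type, while `list()` keeps both the structure and the original types. The names `flat`, `mixed` and `nested` are just examples.

```r
# c() flattens: the two values get separate numbered names instead of sharing key w
flat <- c(w = c("v", "y"))
print(names(flat))             # [1] "w1" "w2"

# c() also forces one type: the number 1 becomes the string "1"
mixed <- c(1, "a")
print(typeof(mixed))           # [1] "character"

# list() keeps structure and types: one key holding a vector of two values
nested <- list(w = c("v", "y"), n = 1)
print(length(nested))          # [1] 2
print(nested[["w"]])           # [1] "v" "y"
print(typeof(nested[["n"]]))   # [1] "double"
```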

## Conclusion

That's all I think you absolutely *need* to know in order to start reading and learning from examples. 