# Welcome to your first R notebook!

This will be your first R notebook. Enjoy! 

This notebook will contain examples of what you can do in R and what you can do with Jupyter notebooks.

# What is a (Jupyter) notebook?

A notebook is a document that consists of cells. Each cell contains either text (in Markdown format) or code. Each cell can be "run" or "executed" individually.

For code cells, the output of the code is shown once the cell has been run. In each code cell you can use whatever has been generated by code cells that have previously been run. However, it is a good idea to let a code cell depend only on cells above it, so that you do not run into problems when you clear the output and run all cells from the top.

Jupyter is just one kind of notebook. Jupyter notebooks support many different programming languages, such as Julia, Python and R. We will only use them with R. (The documentation for Jupyter can be found here: https://jupyter-notebook.readthedocs.io/en/stable/index.html)

### What are Azure Notebooks then?

A Jupyter notebook needs to be executed somewhere: the code has to run on some machine. You can install Jupyter locally on your machine and execute Jupyter notebooks there. Alternatively, you can use Microsoft's Azure Notebooks to run your Jupyter notebooks in the cloud. This is the recommended option for this course, as you do not have to install any software or be concerned with which version of R you are using.

# Short introduction to Jupyter notebooks

To get started with Azure Notebooks and Jupyter, see this sample notebook: https://notebooks.azure.com/Microsoft/libraries/samples/html/Azure%20Notebooks%20-%20Welcome.ipynb

To create a new Jupyter notebook, go to the "File" menu, choose "New Notebook" and then pick the programming language you want to use. Click the name of the notebook at the top to rename it. When you save the notebook, it is automatically saved in your current library in Azure Notebooks.

Clicking on any cell will take you to it. The color bar to the left tells you which cell you are in. After a single click, a code cell shows a green bar (it is ready for editing), while a text/Markdown cell shows a blue bar (it is selected but not being edited). You can change the type of the cell in the drop-down menu above, which also shows the type of the current cell.

To edit a Markdown cell, double-click on it. To edit a code cell, just click once on the code.

To execute/run a cell, press the run button or press Control+Enter.

To execute a cell and automatically add a new cell below, press Alt+Enter. The plus sign button will also create a new cell below the current cell.

In the "Kernel" menu you can restart the notebook completely and clear the output of the code cells.

# Introduction to R

Now, let us get started with R. Please execute the code cells one at a time as you read through this notebook. 

Here is a first example of some R code:

In [None]:
3+4

Assigning a value to a variable is done in R using either "=" or "<-":

In [None]:
x = 5
y <- 3

As you can see, assignments do not produce any output, so you cannot see what x and y are after executing the above cell. However, you can just type the name of a variable to see its value:

In [None]:
x
y

### Vectors in R

Vectors are a very common data structure in R. You create a vector in R the following way:

In [None]:
c(4,3,5, 17)
c("this is", "yet", "another vector")

Note that numbers are actually also vectors in R (of length 1):

In [None]:
4
c(4)
identical(4, c(4))

Here are some examples of how you can "subset" a vector, that is, getting elements out of the vector:

In [None]:
x <- c(2,5,321, 42, 17)
x
x[4]
x[length(x)]
x[c(1,3,4)]
x[1:3]
x[-2]

Note "x[4]" will give the 4th element of the vector, while "length(x)" gives you the length of the vector (and "x[length(x)]" thereby gives you the last element of the vector x). "x[c(1,3,4)]" gives you the 1st, 3rd, and 4th element of the vector x, while "x[1:3]" gives you the elements from 1 through to 3. Finally, "x[-2]" gives you everything except the second element.

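Vectors can also have named elements, and you can assign new names to a vector. Below we give only four names to a vector with five elements, so the name of the last element becomes "NA". "NA" is a special symbol in R for missing data. Try ?NA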

In [None]:
names(x)
names(x) <- c("the first", "second", "third", "a number")
names(x)

Note that once you have named a vector, it prints out differently. This is a special feature of Jupyter notebooks.

In [None]:
x
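
Names can also be used for subsetting, in the same way as positions. Here is a small sketch of that; the vector "ages" and its names are made up for this illustration and are not used elsewhere in the notebook.

```r
# A small named vector (hypothetical example data)
ages <- c(anna = 31, bo = 45, carl = 27)

# Subset by a single name
ages["bo"]

# Subset by several names at once; the result follows the order you ask for
ages[c("carl", "anna")]

# unname() drops the names and returns the plain values
unname(ages["bo"])
```

Subsetting by name is often easier to read than subsetting by position, because the code says which element you want.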

### Built-in functions in R

R has a lot of built-in functions you can use for all sorts of things. Here are a few examples:

In [None]:
sqrt(4)
abs(-4)
rnorm(10)

"sqrt" is the square root function, "abs" takes the absolute value, but what does "rnorm" do? To get help on functions in R, you can type a question mark followed by the name of the function, as in "?rnorm":

In [None]:
?rnorm

As you probably noticed when executing the above cell, it opens a new window below with the documentation for that function. This documentation might be a bit hard to read in the beginning, but you will get used to it. Also note the examples at the bottom, which are often very useful.

So the "rnorm" function generates random numbers from a normal distribution (we will talk more about the normal distribution later in the course).

A function in R can take multiple arguments. For instance, the "rnorm" function takes an argument called "n", which is the number of random numbers you want to generate (above we generated 10 random numbers). It also has another argument called "mean". You can always see the most important arguments a function takes on its help page. Have a look at the help page for "rnorm" again and find the description of the arguments "n" and "mean". As you can see from the cell below, "rnorm(n = 10, mean = 1)" generates 10 random numbers from a normal distribution with mean 1, while "rnorm(n = 3, mean = 10)" generates 3 random numbers from a normal distribution with mean 10:

In [None]:
rnorm(n = 10, mean = 1)
rnorm(n = 3, mean = 10)

Here we explicitly named the arguments "n" and "mean". A function's arguments also have a fixed order, so "rnorm(10, 1)" is the same as "rnorm(n = 10, mean = 1)", since "n" comes first and "mean" second. Order matters only if you do not name the arguments; if you name them, the order does not matter. Here is an example:

In [None]:
rnorm(10, 1)
rnorm(1, 10)
rnorm(n = 10, mean = 1)
rnorm(mean = 1, n = 10)

Wait! The last two outputs are not the same even though the arguments were named... The reason is that "rnorm" is a function that generates random numbers, so every time you call it, it generates new random numbers. If you want to be sure to get the same numbers again, you can set the seed before calling it. (A seed is a number used to initialize the random number generator.) Try this:

In [None]:
set.seed(17)
rnorm(n = 10, mean = 1)
set.seed(17)
rnorm(mean = 1, n = 10)

Try executing the above code cell again and see that you get the same numbers. Now try to insert a new code cell below with just "rnorm(n = 10, mean = 1)", run it multiple times and watch how the numbers keep changing.
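
If you prefer to let R do the comparison instead of reading the numbers yourself, here is a small sketch using "identical". The variable names "first", "second" and "third" are made up for this illustration.

```r
# Draw the same sample twice, resetting the seed before each draw
set.seed(17)
first <- rnorm(n = 5, mean = 1)
set.seed(17)
second <- rnorm(n = 5, mean = 1)

# Same seed, so the two draws contain exactly the same numbers
identical(first, second)

# Here the seed is not reset, so this draw continues the random sequence
third <- rnorm(n = 5, mean = 1)
identical(first, third)
```

The first comparison should give TRUE and the second FALSE, because only the first two draws start from the same seed.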

### Lists and data frames in R

Lists are another common data type in R. Lists are like vectors, except that they are more general: the elements can be of different types and can themselves be complex objects.

In [None]:
alist <- list(me = 5, you = 12, something = "else")
alist

blist <- list(hej = c(2, 4, 54), someNames = c("Kaj", "Neel"))
blist

Here are some examples of how to subset a list:

In [None]:
# you can use the "$" to get an element of a list
alist$me
# or you can use "[[ ]]" to subset a list by its number or by its name
alist[[1]]
alist[["me"]]
# note that "[ ]" will give you a new list with only the selected elements.
alist[1]
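
The difference between "[[ ]]" and "[ ]" matters as soon as you want to compute with the result. Here is a small sketch of that; the list "pets" is made up for this illustration.

```r
# A small list (hypothetical example data)
pets <- list(name = "Rex", age = 4)

# "[[ ]]" returns the element itself, here a number
class(pets[["age"]])

# "[ ]" returns a new list that contains the selected element
class(pets["age"])

# So arithmetic works on the "[[ ]]" result ...
pets[["age"]] + 1

# ... while pets["age"] + 1 would stop with an error,
# because you cannot add a number to a list
```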

Data frames are a special kind of list. Think of them as a table in a spreadsheet (each element of the list is a column, and all the columns have the same length). Here is an example:

In [None]:
data.frame(hej = c(2, 4), someNames = c("Kaj", "Neel"))

Data frames are the central data type we will be using again and again! Usually, however, we will not "create" them ourselves, but get them using packages for getting data into R. Here is another example of a data frame and how to subset it:

In [None]:
a <- data.frame(date = as.Date(c("2016-09-15", "2016-09-16","2016-09-17", "2016-09-18", "2016-09-19")),
                Sales = c(10, 14, 12, 9, 17),
                revenue = c(3, 6, 5, 3, 8))
a
a[2, 3]
a[2, ]
a[ , 3, drop = FALSE]
a$Sales
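
Columns can also be selected by name instead of by number, which is usually easier to read. Here is a small sketch; the data frame "sales_df" and its columns are made up for this illustration.

```r
# A small data frame (hypothetical example data)
sales_df <- data.frame(day = c("Mon", "Tue", "Wed"),
                       units = c(10, 14, 9))

# A column by name with "[[ ]]": the result is a vector
sales_df[["units"]]

# Rows 2 to 3 of the column "units"
sales_df[2:3, "units"]

# Several columns by name: the result is a data frame
sales_df[ , c("day", "units")]

# Number of rows and columns
dim(sales_df)
```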

Here are some examples of useful operations on data frames:

In [None]:
summary(a)
head(a, n = 2)
tail(a, n = 1)
str(a)

### Logical expressions in R

There are 3 logical truth values in R:

In [None]:
TRUE
FALSE
NA
class(c(TRUE, FALSE, NA))
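
"NA" behaves like an unknown truth value. The following sketch shows what that means in practice; it only uses built-in R operators and the built-in function "is.na".

```r
# A comparison with an unknown value is itself unknown
NA > 3

# Use is.na() to test for missing values
is.na(c(TRUE, FALSE, NA))

# Logical operators give a definite answer only when
# the answer does not depend on the unknown value
NA & FALSE   # FALSE, whatever NA stands for
NA | TRUE    # TRUE, whatever NA stands for
NA & TRUE    # still unknown
```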

Here are some examples of how these values are generated by various expressions:

In [None]:
3 > 4
3 <= 3

In [None]:
x <- c(2,4,5)
x

In [None]:
x > 3
x[x > 3]
x[x < 5 & x > 2]
x[x >= 5 | x <= 2]
x[x == 4]
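
The same idea works for the rows of a data frame. Here is a small sketch; the data frame "orders" and the variable "expensive" are made up for this illustration.

```r
# A small data frame (hypothetical example data)
orders <- data.frame(item = c("pen", "book", "lamp"),
                     price = c(2, 15, 40))

# One TRUE/FALSE value per row
expensive <- orders$price > 10
expensive

# Keep only the rows where the condition is TRUE
orders[expensive, ]

# Conditions can be combined with & and |
orders[orders$price > 10 & orders$price < 30, "item"]
```

Note the comma inside the square brackets: the condition goes before the comma because it selects rows.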

### Creating functions in R

You can also easily create your own functions in R:

In [None]:
aFunction <- function(x, y) {
  z <- x * y
  z * 2
}

In [None]:
aFunction(4,5)
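
The function above returns the value of its last expression. Your own functions can also have default values for their arguments, just like "mean" in "rnorm". Here is a small sketch; the function "scaledProduct" and its argument "factor" are made up for this illustration.

```r
# A function with a default value for its third argument
scaledProduct <- function(x, y, factor = 2) {
  # The value of the last expression is what the function returns
  x * y * factor
}

scaledProduct(4, 5)               # uses the default factor = 2
scaledProduct(4, 5, factor = 10)  # overrides the default
scaledProduct(y = 5, x = 4)       # named arguments can come in any order
```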

### If-statements and for-loops in R

You can of course also do if-statements and for-loops in R. Here are some examples:

In [None]:
x <- 5

In [None]:
if (x > 4) y <- 1

y

In [None]:
if (x > 4) {
  y <- 1
}

y

In [None]:
if (x < 4) {
  y <- 1
} else {
  y <- 8
}

y

In [None]:
x <- c()

x

for (num in 1:10) {
  x <- c(x, num)
}

x
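
In R you can often avoid a loop, because many operations work on a whole vector at once. Here is a small sketch that builds the same vector as the loop above in two ways; the variable names "looped" and "direct" are made up for this illustration.

```r
# Build the vector 1, 2, ..., 10 with a loop
looped <- c()
for (num in 1:10) {
  looped <- c(looped, num)
}

# Build the same vector directly
direct <- 1:10

# Check that the two vectors are the same
identical(looped, direct)

# Operations apply to every element of the vector
direct * 2
sum(direct)
```

The loop is fine for learning, but the direct version is shorter and is usually what you will see in R code.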

# More on R

If you are having trouble with R (or are fairly new to programming) or want to learn more R on your own, you can try out DataCamp's free R course here: https://www.datacamp.com/courses/free-introduction-to-r