# The basic building blocks
In this notebook we will take a closer look at the basic element used in computation in most machine learning systems: the **tensor**.

A tensor is a generalization of vectors and matrices to an arbitrary number of **axes**. What does that mean? A scalar, or a single value, is a 0D tensor. A vector is a 1D tensor. A matrix is a 2D tensor, and so on. Each axis we add increases the dimension of the tensor by one. The word **dimension** can be confusing, as it is used to refer both to the number of axes and to the number of elements along an axis. We can eliminate this ambiguity by using the **rank** of a tensor for the number of axes it has. Similarly, the **shape** of a tensor is a vector that gives the number of elements along each axis. The picture below shows how to think of tensors and their rank visually:

![Tensors](images/tensor.jpeg)
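
In R, both quantities can be read off with `dim`: the shape is the vector it returns, and the rank is the length of that vector. A minimal sketch, using an illustrative array `a` and the hypothetical variable names `tensor_shape` and `tensor_rank`, none of which are part of the exercises:

```r
# A 3D tensor with 2 rows, 3 columns and 4 layers (the values are arbitrary)
a <- array(1:24, dim = c(2, 3, 4))

tensor_shape <- dim(a)         # number of elements along each axis
tensor_rank <- length(dim(a))  # number of axes

tensor_shape  # 2 3 4
tensor_rank   # 3
```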

R uses the internal data type `vector` to manipulate 0D and 1D tensors and the [`matrix`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/matrix) data type to manipulate 2D tensors. For higher dimensional data the [`array`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/array) data type is used. To better understand these concepts and how they relate to R, you will complete the following exercise:

## Exercise 1
In the following cell complete the following tasks:
1. Using R's [`c`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/c) function, define a 0D tensor called 'x' with the value 4:
$$
x =
\left(
\begin{array}{c}
4 \\
\end{array}
\right)
$$
2. Again using [`c`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/c), define a 1D tensor called 'y':
$$
y =
\left(
\begin{array}{c}
1\\
2\\
3\\
4\\
\end{array}
\right)
$$
3. With the [`matrix`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/matrix) function, define a 2D tensor called 'Z':
$$
Z =
\left(
\begin{array}{cccc}
1 & 2 & 3 & 4\\
5 & 6 & 7 & 8\\
\end{array}
\right)
$$
4. Print the dimensions of each tensor using [`dim`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/dim) (use the `as.array()` function to get meaningful results for 1. and 2.)

In [None]:
<FILL IN>

In deep learning you will encounter data as tensors of various ranks:
- Vector data, 2D (samples, features)
- Time-series, 3D (samples, timesteps, features)
- Images, 4D (samples, height, width, channel) or (samples, channel, height, width)
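
As a rough illustration of these layouts, the sketch below creates zero-filled arrays with each shape. The sizes (10 samples, 5 features, 20 timesteps, 28 by 28 pixel images with 3 channels) and the variable names are made up for this example only.

```r
# All sizes below are hypothetical and only serve to illustrate the shapes
vector_data <- array(0, dim = c(10, 5))          # (samples, features)
timeseries  <- array(0, dim = c(10, 20, 5))      # (samples, timesteps, features)
images      <- array(0, dim = c(10, 28, 28, 3))  # (samples, height, width, channel)

dim(vector_data)  # 10 5
dim(timeseries)   # 10 20 5
dim(images)       # 10 28 28 3
```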

## Manipulating tensors
Deep learning usually involves a lot of preprocessing to get the data into the right shape for a model to train on. Often, you will need to manipulate a tensor's dimensions and axes, and it is crucial to understand how this works in R.

During the course you will have plenty of opportunity to explore the different techniques that R provides for this.

### Slicing
The first technique we will discuss is **slicing**. With the bracket operators `[` and `]` we can slice data from a tensor by specifying the index or indices for axes we want to select:

In [None]:
z <- matrix(c(1, 2, 3, 4, 5, 6),  ncol=2)
z # This will show z

slice <- z[1,] # This will slice the first row
slice # This will show the slice

In [None]:
z[,1] # This will slice the first column

In [None]:
z[1:2, 1] # This will extract the first two elements of the first column
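
Note that slicing a single row or column returns a plain vector, because R drops axes of length 1 by default. The sketch below shows how passing `drop = FALSE` keeps the result a matrix. It reuses the values of `z` from the cells above, and the names `row_vec` and `row_mat` are illustrative.

```r
z <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)

row_vec <- z[1, ]                # plain vector, the row axis is dropped
row_mat <- z[1, , drop = FALSE]  # 1 x 2 matrix, both axes are kept

dim(row_vec)  # NULL
dim(row_mat)  # 1 2
```

Keeping both axes can matter when the slice is later used in a matrix multiplication, where the shape of each operand determines the result.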

## Exercise 2
In the next cell we have defined a two-dimensional tensor called `Z`. It is a matrix of size 4 by 4. Extract the 'bottom right' section of the matrix, with size 2 by 2, pictured as the numbers in red in the following diagram:

$$
Z =
\left(
\begin{array}{cccc}
1 & 2 & 3 & 4\\
5 & 6 & 7 & 8\\
9 & 10 & \color{red}{11} & \color{red}{12} \\
13 & 14 & \color{red}{15} & \color{red}{16} \\
\end{array}
\right)
$$

In [None]:
Z <- matrix(c(1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16), ncol = 4)
<FILL IN>

## Exercise 3
In the next cell we have defined a three-dimensional tensor called `z`. It is of size 4 by 4 by 4, and you can think of it as a cube. Extract the middle of the cube, with dimensions 2 by 2 by 2. The result (in R) should be:

```
22 23 26 27 38 39 42 43 
```

Please note that because we cannot easily visualise a three-dimensional tensor, R will give a list of numbers as its output.

In [None]:
z <- array(1:64, dim = c(4, 4, 4))  # 1:64 will generate a vector with elements 1, 2 ... 64
<FILL IN>

## Tensor arithmetic

A number of operations can be performed on tensors. The first is an **element-wise** operation, which applies a function to each element of the tensor independently. By default, `+` and `*` in R are element-wise operations.

In [None]:
z <- matrix(c(1, 2, 3, 4), nrow=2, ncol=2) 
z
a <- z * z
a

In [None]:
z + z
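
Most of R's mathematical functions are element-wise in the same way, and a scalar operand is applied to every element. A small sketch with an illustrative matrix (the names `m`, `roots` and `doubled` are not used elsewhere in this notebook):

```r
m <- matrix(c(1, 4, 9, 16), nrow = 2, ncol = 2)

roots <- sqrt(m)   # square root of every element
doubled <- m * 2   # every element multiplied by the scalar 2

roots
doubled
```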

### Multiplication
The **multiplication** of two tensors returns a new tensor and is denoted by `%*%` in R. Each element of the result is the sum of the element-wise products of a row of the first tensor and a column of the second, as follows:

![Dot product](./images/Matrix_multiplication_diagram_2.svg)
[[source]](https://en.wikipedia.org/wiki/Matrix_multiplication)

In the following example the output tensor is of the same dimensions as the two input tensors.

In [None]:
A <- matrix(c(1, 2, 3, 4), ncol = 2)
A
A %*% A

In [None]:
A <- matrix(1:4, ncol = 2)
A
B <- matrix(5:8, ncol = 2)
B

In [None]:
A %*% B
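
The input tensors do not need to have the same shape: multiplying an n by m matrix with an m by p matrix gives an n by p matrix. The sketch below uses two illustrative matrices, `P` and `Q`, and also recomputes one entry by hand as the sum of a row of `P` times a column of `Q`.

```r
P <- matrix(1:6, nrow = 2)  # 2 x 3
Q <- matrix(1:6, nrow = 3)  # 3 x 2

PQ <- P %*% Q
dim(PQ)  # 2 2

# Entry [1, 1] combines the first row of P with the first column of Q
manual <- sum(P[1, ] * Q[, 1])
manual == PQ[1, 1]  # TRUE
```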

## Exercise 4
Calculate the following matrix multiplication:

$$
\left(
\begin{array}{cc}
1 & 2 \\
3 & 4 \\
\end{array}
\right)
\left(
\begin{array}{cc}
5 & 6 \\
7 & 8 \\
\end{array}
\right)
=
\left(
\begin{array}{cc}
19 & 22 \\
43 & 50 \\
\end{array}
\right)
$$

In [None]:
<FILL IN>

## Exercise 5
Calculate the following multiplication of a matrix with a vector:

$$
\left(
\begin{array}{cc}
1 & 2 \\
3 & 4 \\
\end{array}
\right)
\left(
\begin{array}{c}
1\\
2\\
\end{array}
\right)
=
\left(
\begin{array}{c}
5\\
11\\
\end{array}
\right)
$$

In [None]:
<FILL IN>

### Transposition
Transposition will flip a matrix over its main diagonal, like so:

$$
\left(
\begin{array}{cc}
1 & 2\\
3 & 4\\
\end{array}
\right)^T
=
\left(
\begin{array}{cc}
1 & 3\\
2 & 4\\
\end{array}
\right)
$$

R has a [`t`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/t) function to do so:

In [None]:
a <- matrix(c(1, 3, 2, 4), ncol = 2)
a
t(a)

## Exercise 6
Create a matrix with 1 column and 1 row and transpose it. Inspect the dimensions with the [`dim`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/dim) function. Try the same with a vector. What happens?

In [None]:
<FILL IN>

You will see that any vector that is transposed is automatically converted into a matrix. R has no concept of a row vector or column vector. Instead, it uses a matrix with one dimension set to 1, as you saw above.
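
A brief sketch of this conversion, using an illustrative vector `v`: transposing once gives a matrix with one row, and transposing again gives a matrix with one column rather than the original vector.

```r
v <- c(1, 2, 3)

dim(v)        # NULL, a plain vector has no dim attribute
dim(t(v))     # 1 3
dim(t(t(v)))  # 3 1
```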

# Building a single dense layer
In the remainder of this notebook you will solve the problem introduced in the slides. Let's load the data first:

In [None]:
source("01-helpers.R")  # this will load helper functions for this session

data <- dataset_linear()
X <- data$X
y <- data$y

We inspect the input variables `X` and the labels `y` using the [`head`](https://www.rdocumentation.org/packages/utils/versions/3.5.1/topics/head) function.

In [None]:
head(X)

In [None]:
head(y)

## Exercise 7
Inspect the dimensions of `X` and `y`. How many instances do we have? How many input variables?

In [None]:
<FILL IN>

Let's plot the data set using one of the helper functions you just loaded:

In [None]:
plot_dataset(X, y)

# Solving the problem - building a neuron
During the remainder of this notebook you will build a single neuron that will perfectly classify all instances.

Let's look at a diagram of this neuron you are going to build:

![The neuron we are going to build](images/neuron.png)

You will build this neuron step by step, going from left to right through the diagram. Let's begin by defining the neuron's weights:

## Exercise 8
The neuron will have two weights, one for each input, $w_1$ and $w_2$. Create a weight vector `w` with the values `0.5` and `0`.

In [None]:
<FILL IN>

## Exercise 9
Create the bias variable `b`. Set it to `0` for now. This means that we will ignore its effect in subsequent calculations. You will modify it later in order to solve the problem.

In [None]:
<FILL IN>

## Exercise 10
Create the sigmoid activation function `sigmoid`, which takes a vector or number `x` and returns as output the following:
$$
\sigma(x) = sigmoid(x) = \frac{1}{1 + e^{-x}}
$$

We have provided you with a skeleton below:

**Hint**: you will need R's [`exp`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/log) function.

In [None]:
sigmoid <- function(x) {
    <FILL IN>
}

## Exercise 11
Perform the complete calculation of the neuron in R. You can either use the diagram above as a reference, or use the following equation:

$$
\hat{y} = \sigma(\mathbf{w} \mathbf{X}^\top + b)
$$

The result should be a matrix with **1 row and 50 columns**.

In [None]:
<FILL IN>

## Exercise 12
Put the calculation above in an R function called `neuron` that takes as its input an input matrix `X` and returns the neuron's output. We will use this function for easily recalculating the neuron's output when modifying the weights and bias of the neuron.

We have provided you with a skeleton to fill in below:

In [None]:
neuron <- function(X) {
    <FILL IN>
}

## Exercise 13
Apply the function you defined above to the input matrix `X`.

The result is, for each instance, the probability that it belongs to the positive class (the blue one). For classification problems we usually threshold this probability, with a value larger than $0.5$ meaning the instance is classified as positive (the 'blue' class in our case). Given the probabilities and the labels in `y`, how do you think your classifier is performing?
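
Thresholding and scoring could be sketched as follows. The probabilities and labels here are invented for illustration and are not taken from the data set. The labels are assumed to be coded as 0 and 1, which may differ from how `y` is stored.

```r
# Hypothetical probabilities and 0/1 labels, not taken from the course data
probs  <- c(0.9, 0.2, 0.7, 0.4)
labels <- c(1, 0, 0, 0)

preds <- as.integer(probs > 0.5)   # 1 = positive, 0 = negative
accuracy <- mean(preds == labels)  # fraction of correct predictions

preds     # 1 0 1 0
accuracy  # 0.75
```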

In [None]:
<FILL IN>

Let's plot the results of your prediction with a helper function that we have created for you, called `plot_predictions`:

In [None]:
plot_predictions(X, y, neuron)

As you can see, our accuracy is 50%, simply because all instances are classified as positive (blue). We need to tweak the decision boundary in order to get 100% accuracy.

## Exercise 14
Modify the weights `w` and the bias variable `b` such that you classify the instances with 100% accuracy. Plot the probabilities and decision boundary based on your new weights and bias.

How do the bias and weights affect the decision boundary?

In [None]:
<FILL IN>