# 392028 Introduction to Neural Networks
Copyright 2020 - Riza Velioglu, Bielefeld University

# I. Introduction to Google Colab

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser. See the [video](https://www.youtube.com/watch?v=inN8seMm7UI) to get an overview of the key features of Colaboratory.
The document you are reading is a  [Jupyter notebook](https://jupyter.org/), hosted in Colaboratory. It is not a static page, but an interactive environment that lets you write and execute code in Python and other languages.

For example, here is a **code cell** with a short Python script that computes a value, stores it in a variable, and prints the result:

In [0]:
seconds_in_a_day = 24 * 60 * 60
seconds_in_a_day

86400

To execute the code in the above cell, select it with a click and then either press the play button to the left of the code, or use the keyboard shortcut "Command/Ctrl+Enter".

All cells modify the same global state, so variables that you define by executing a cell can be used in other cells:

In [0]:
seconds_in_a_week = 7 * seconds_in_a_day
seconds_in_a_week

604800

What does Colab offer?
- Zero configuration required
- Easy sharing
- Clone GitHub repositories
- Import/Share notebooks from Drive
- Import external datasets, e.g. from Kaggle
- Integrate PyTorch, TensorFlow, Keras, OpenCV
- Free access to GPUs!!

To learn more about Colaboratory, check the following tutorials:
- [Google Colaboratory](https://colab.research.google.com/notebooks/basic_features_overview.ipynb)
- [Tutorialspoint](https://www.tutorialspoint.com/google_colab/index.htm)

> If you would like to install Keras and its dependencies on your local computer, please follow the step-by-step guide here: https://livebook.manning.com/book/deep-learning-with-python/appendix-a/

---
## A little more on Colab

- Installing packages
- Cloning GitHub repositories
- Mounting Google Drive
- Changing the Hardware Accelerator
- Importing Datasets (Kaggle, Drive, GitHub, etc.)

In [0]:
!pip install Keras

In [0]:
!git clone https://github.com/keras-team/keras

Cloning into 'keras'...
remote: Enumerating objects: 32987, done.
remote: Total 32987 (delta 0), reused 0 (delta 0), pack-reused 32987
Receiving objects: 100% (32987/32987), 13.11 MiB | 23.88 MiB/s, done.
Resolving deltas: 100% (24100/24100), done.


In [0]:
!ls

keras  sample_data


In [0]:
import os
os.chdir("keras")
!pwd

/content/keras


### Specifying the TensorFlow version

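Running `import tensorflow` will import the default version (currently 2.x). At the time of writing, you could use 1.x in Colab by running a cell with the `%tensorflow_version` magic **before** you run `import tensorflow`. This magic is specific to Colab: outside Colab it is not defined, and running it produces a `UsageError` like the one below.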

In [2]:
%tensorflow_version 1.x
import tensorflow
print(tensorflow.__version__)

UsageError: Line magic function `%tensorflow_version` not found.


Further reading: [Introduction to Google Colab](https://colab.research.google.com/notebooks/welcome.ipynb)

# II. Data representations for neural networks

In general, all current machine-learning systems use tensors as their basic data structure. Tensors are fundamental to the field—so fundamental that Google’s TensorFlow was named after them. So what’s a tensor?

At its core, a tensor is a container for data—almost always numerical data. So, it’s a container for numbers. You may be already familiar with matrices, which are 2D tensors: tensors are a generalization of matrices to an arbitrary number of dimensions (note that in the context of tensors, a dimension is often called an axis).

## **Scalars (0D Tensors)**

A tensor that contains only one number is called a *scalar* (or scalar tensor, or 0-dimensional tensor, or 0D tensor). In Numpy, a `float32` or `float64` number is a scalar tensor (or scalar array). You can display the number of axes of a Numpy tensor via the `ndim` attribute; a scalar tensor has 0 axes (`ndim == 0`). The number of axes of a tensor is also called its `rank`. Here’s a Numpy scalar:

In [0]:
import numpy as np
x = np.array(12)
x

array(12)

In [0]:
x.ndim

0
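NumPy scalars created with a dtype constructor such as `np.float32` are also 0D tensors: they report zero axes and an empty shape. A minimal check (the value `3.5` is arbitrary):

```python
import numpy as np

s = np.float32(3.5)     # a NumPy float32 scalar
print(s.ndim)           # 0 axes
print(s.shape)          # () -- an empty shape tuple

a = np.array(3.5)       # a 0D array holding the same value
print(a.ndim, a.shape)  # 0 ()
```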

## **Vectors (1D Tensors)**

An array of numbers is called a *vector*, or 1D tensor. A 1D tensor is said to have exactly one axis. Following is a Numpy vector:

In [0]:
x = np.array([12, 3, 6, 14, 7])
x

array([12,  3,  6, 14,  7])

In [0]:
x.ndim

1

This vector has five entries and so is called a *5-dimensional vector*. Don’t confuse a 5D vector with a 5D tensor! A 5D vector has only one axis and has five dimensions along its axis, whereas a 5D tensor has five axes (and may have any number of dimensions along each axis). *Dimensionality* can denote either the number of entries along a specific axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a 5D tensor), which can be confusing at times. In the latter case, it’s technically more correct to talk about *a tensor of rank 5* (the rank of a tensor being the number of axes), but the ambiguous notation *5D tensor* is common regardless.
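The difference between the length of a vector and the rank of a tensor shows up directly in `ndim` and `shape`. A short sketch with arbitrary example sizes:

```python
import numpy as np

vec = np.array([12, 3, 6, 14, 7])   # a 5-dimensional vector
print(vec.ndim, vec.shape)          # 1 (5,)  -> one axis holding five entries

t5 = np.zeros((2, 3, 1, 4, 2))      # a 5D tensor, i.e. a tensor of rank 5
print(t5.ndim, t5.shape)            # 5 (2, 3, 1, 4, 2)
```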

## **Matrices (2D tensors)**

An array of vectors is a *matrix*, or 2D tensor. A matrix has two axes (often referred to as *rows* and *columns*). You can visually interpret a matrix as a rectangular grid of numbers. This is a Numpy matrix:

In [0]:
x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])

x.ndim

2

The entries from the first axis are called the *rows*, and the entries from the second axis are called the *columns*. In the previous example, $[5, 78, 2, 34, 0]$ is the first row of $x$, and $[5, 6, 7]$ is the first column.
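Rows and columns can be pulled out by indexing: `x[0]` selects the first row and `x[:, 0]` selects the first column. Here is a quick check on the matrix above, redefined so the snippet runs on its own:

```python
import numpy as np

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])

first_row = x[0]        # first entry along axis 0
first_col = x[:, 0]     # first entry along axis 1, for every row
print(first_row)        # [ 5 78  2 34  0]
print(first_col)        # [5 6 7]
```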

## **3D tensors and higher-dimensional tensors**

If you pack such matrices in a new array, you obtain a 3D tensor, which you can visually interpret as a cube of numbers. Following is a Numpy 3D tensor:

In [0]:
x = np.array([[[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]]])

x.ndim

3

By packing 3D tensors in an array, you can create a 4D tensor, and so on. In deep learning, you’ll generally manipulate tensors that are 0D to 4D, although you may go up to 5D if you process video data. 
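One way to do this packing in NumPy is `np.stack`, which joins arrays of equal shape along a new leading axis. A sketch that uses zero-filled arrays as stand-ins for real data:

```python
import numpy as np

matrix = np.zeros((3, 5))
cube = np.stack([matrix, matrix, matrix])   # three matrices -> a 3D tensor
print(cube.shape)                           # (3, 3, 5)

tensor4d = np.stack([cube, cube])           # two 3D tensors -> a 4D tensor
print(tensor4d.ndim, tensor4d.shape)        # 4 (2, 3, 3, 5)
```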



> Here's a figure visualizing tensors:
![tensors](https://res.cloudinary.com/practicaldev/image/fetch/s--VaxrSdrA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://thepracticaldev.s3.amazonaws.com/i/bp6ux6ppf5t5amwkxklq.jpg)



## **Key attributes**

A tensor is defined by three key attributes:

- *Number of axes (rank)*—For instance, a 3D tensor has three axes, and a matrix has two axes. This is also called the tensor’s `ndim` in Python libraries such as Numpy.

- *Shape*—This is a tuple of integers that describes how many dimensions the tensor has along each axis. For instance, the previous matrix example has shape $(3, 5)$, and the 3D tensor example has shape $(3, 3, 5)$. A vector has a shape with a single element, such as $(5,)$, whereas a scalar has an empty shape, $()$.

- *Data type* (usually called `dtype` in Python libraries)—This is the type of the data contained in the tensor; for instance, a tensor’s type could be `float32`, `uint8`, `float64`, and so on. On rare occasions, you may see a `char` tensor. Note that string tensors don’t exist in Numpy (or in most other libraries), because tensors live in preallocated, contiguous memory segments, and strings, being variable length, would preclude the use of this implementation.
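You can read all three attributes directly off a NumPy array, and `astype` returns a copy with a different `dtype`. A minimal sketch (the array contents and the choice of `uint8` are arbitrary):

```python
import numpy as np

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]], dtype=np.uint8)

print(x.ndim)    # rank: 2
print(x.shape)   # (3, 5)
print(x.dtype)   # uint8

y = x.astype(np.float32)   # same shape, new data type
print(y.dtype)             # float32
```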


> To make this more concrete, let’s look back at the data we processed in the MNIST example. First, we load the MNIST dataset:

In [0]:
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [0]:
# Next, we display the number of axes of the tensor train_images, the ndim attribute:
print(train_images.ndim)

3


In [0]:
# Here's its shape
print(train_images.shape)

(60000, 28, 28)


In [0]:
# And this is its data type, the dtype attribute:
print(train_images.dtype)

uint8


So what we have here is a 3D tensor of 8-bit integers. More precisely, it’s an array of 60,000 matrices of 28 × 28 integers. Each such matrix is a grayscale image, with coefficients between 0 and 255. 

Let’s display some digits in this 3D tensor using the library *Matplotlib* (part of the standard scientific Python suite), and let’s also use *ipywidgets* to get an interactive output! Feel free to play with the slider to get different samples from `train_images`.

In [0]:
from ipywidgets import interact
import matplotlib.pyplot as plt


def plot_digit(n):
    """Display training image number n as black digits on a white background."""
    plt.imshow(train_images[n], cmap=plt.cm.binary)
    plt.show()

# Valid indices run from 0 to len(train_images) - 1, so the slider is (min, max, step)
# The trailing semicolon suppresses the printed return value of interact()
interact(plot_digit, n=(0, len(train_images) - 1, 1));

---

***Source Alert***

- Please open [this notebook](https://colab.research.google.com/github/ageron/handson-ml2/blob/master/tools_matplotlib.ipynb#scrollTo=fFxS6UQsuAAc) for a detailed introduction to [`matplotlib`](https://matplotlib.org), _the_ plotting library for Python, by [Aurélien Géron](https://twitter.com/aureliengeron?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor).

- Open [this notebook](https://colab.research.google.com/notebooks/widgets.ipynb#scrollTo=P6xc9QVFSlrw) for Colab `widgets` and [this notebook](https://colab.research.google.com/notebooks/forms.ipynb#scrollTo=eFN7-fUKs-Bu) to learn more about how Jupyter Widgets can be used in Colab.

---


## **Manipulating tensors in Numpy**

In the previous example, we selected a specific digit along the first axis using the syntax `train_images[n]`. Selecting specific elements in a tensor is called ***tensor slicing***. Let’s look at the tensor-slicing operations you can do on Numpy arrays.

The following example selects digits #10 to #100 (#100 isn’t included) and puts them in an array of shape $(90, 28, 28)$:

In [0]:
my_slice = train_images[10:100]
print(my_slice.shape)

(90, 28, 28)


It’s equivalent to this more detailed notation, which specifies a *start index* and *stop index* for the slice along each tensor axis. Note that $:$ is equivalent to selecting the entire axis:

In [0]:
# Equivalent to the previous example
my_slice = train_images[10:100, :, :]
my_slice.shape

(90, 28, 28)

In [0]:
# Also equivalent to the previous example
my_slice = train_images[10:100, 0:28, 0:28]
my_slice.shape

(90, 28, 28)
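You can also slice along the other axes. For example, `[:, 14:, 14:]` keeps the bottom-right 14 × 14 patch of every image. Negative indices count from the end of an axis, so `[:, 7:-7, 7:-7]` crops a 7-pixel border. The sketch below uses a zero-filled stand-in array of shape `(100, 28, 28)` instead of the real MNIST data, so it runs without a download:

```python
import numpy as np

images = np.zeros((100, 28, 28), dtype=np.uint8)   # stand-in for train_images

corner = images[:, 14:, 14:]     # bottom-right 14 x 14 patch of each image
print(corner.shape)              # (100, 14, 14)

center = images[:, 7:-7, 7:-7]   # drop a 7-pixel border on every side
print(center.shape)              # (100, 14, 14)
```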

## **The notion of data batches**

In general, the first axis (axis 0, because indexing starts at 0) in all data tensors you’ll come across in deep learning will be the samples axis (sometimes called the samples dimension). In the MNIST example, samples are images of digits.

In addition, deep-learning models don’t process an entire dataset at once; rather, they break the data into small ***batches***. Concretely, here’s one batch of our MNIST digits, with batch size of 128:

In [0]:
batch = train_images[:128]

# And here's the next batch
batch = train_images[128:256]

# And the nth batch
n = 10
batch = train_images[128 * n:128 * (n + 1)]

When a dataset is split into batches, each batch is itself a tensor, a so-called ***batch tensor***, whose first axis (axis 0) is called the ***batch axis*** or ***batch dimension***. This is a term you’ll frequently encounter when using Keras and other deep-learning libraries.
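In practice, a training loop walks over the dataset one batch at a time. Here is a minimal sketch, assuming a batch size of 128 and a small stand-in array of 1,000 samples. The last batch comes out smaller whenever the number of samples is not a multiple of the batch size:

```python
import numpy as np

data = np.zeros((1000, 28, 28), dtype=np.uint8)   # stand-in for train_images
batch_size = 128

batch_shapes = []
for start in range(0, len(data), batch_size):
    batch = data[start:start + batch_size]   # slicing stops at the end of the array
    batch_shapes.append(batch.shape)

print(len(batch_shapes))   # 8 batches
print(batch_shapes[-1])    # (104, 28, 28), since 1000 - 7 * 128 = 104
```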

## **Real-world examples of data tensors**

Let’s make data tensors more concrete with a few examples similar to what you’ll encounter later. The data you’ll manipulate will almost always fall into one of the following categories:

- *Vector data*—**2D** tensors of shape $(samples, features)$

- *Timeseries data* or *sequence data*—**3D** tensors of shape $(samples, timesteps, features)$

- *Images*—**4D** tensors of shape $(samples, height, width, channels)$ or $(samples, channels, height, width)$

- *Video*—**5D** tensors of shape $(samples, frames, height, width, channels)$ or $(samples, frames, channels, height, width)$

### Vector data

This is the most common case. In such a dataset, each single data point can be encoded as a vector, and thus a batch of data will be encoded as a 2D tensor (that is, an array of vectors), where the first axis is the *samples* axis and the second axis is the *features* axis. 

Let’s take a look at an example:

- A dataset of text documents, where we represent each document by the counts of how many times each word appears in it (out of a dictionary of 20,000 common words). Each document can be encoded as a vector of 20,000 values (one count per word in the dictionary), and thus an entire dataset of 500 documents can be stored in a tensor of shape $(500, 20000)$.
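You can build count vectors like these with a few lines of NumPy. The sketch below uses a hypothetical five-word dictionary and two toy documents in place of 20,000 words and 500 documents. The result is a 2D tensor of shape `(samples, features)`:

```python
import numpy as np

# Hypothetical tiny dictionary mapping each word to its feature index
vocabulary = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
documents = ["the cat sat on the mat", "the mat sat"]

counts = np.zeros((len(documents), len(vocabulary)), dtype=np.int32)
for i, doc in enumerate(documents):
    for word in doc.split():
        if word in vocabulary:            # words outside the dictionary are ignored
            counts[i, vocabulary[word]] += 1

print(counts.shape)   # (2, 5)
print(counts)
# [[2 1 1 1 1]
#  [1 0 1 0 1]]
```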

### Timeseries data or sequence data

Whenever time matters in your data (or the notion of sequence order), it makes sense to store it in a 3D tensor with an explicit time axis. Each sample can be encoded as a sequence of vectors (a 2D tensor), and thus a batch of data will be encoded as a 3D tensor.

- A dataset of tweets, where we encode each tweet as a sequence of $280$ characters out of an alphabet of $128$ unique characters. In this setting, each character can be encoded as a binary vector of size $128$ (an all-zeros vector except for a 1 entry at the index corresponding to the character). Then each tweet can be encoded as a 2D tensor of shape $(280, 128)$, and a dataset of 1 million tweets can be stored in a tensor of shape $(1000000, 280, 128)$.
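The one-hot encoding described above can be sketched as follows. This sketch makes two illustrative assumptions rather than following a fixed convention: the 128-character alphabet is the ASCII range (code points 0 to 127), and tweets shorter than 280 characters are padded with all-zeros vectors. Two toy strings stand in for the million tweets:

```python
import numpy as np

MAX_LEN = 280        # timesteps per tweet
ALPHABET_SIZE = 128  # assumption: ASCII code points 0..127

def encode_tweets(tweets):
    """One-hot encode tweets into a (samples, timesteps, features) tensor."""
    out = np.zeros((len(tweets), MAX_LEN, ALPHABET_SIZE), dtype=np.uint8)
    for i, tweet in enumerate(tweets):
        for t, char in enumerate(tweet[:MAX_LEN]):   # truncate longer texts
            code = ord(char)
            if code < ALPHABET_SIZE:                 # skip non-ASCII characters
                out[i, t, code] = 1
    return out                                       # unused timesteps stay all-zero

batch = encode_tweets(["Hello, world!", "Tensors everywhere"])
print(batch.shape)            # (2, 280, 128)
print(batch[0, 0].argmax())   # 72, the ASCII code of 'H'
```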

### Image data

Images typically have three dimensions: height, width, and color depth. Although grayscale images (like our MNIST digits) have only a single color channel and could thus be stored in 2D tensors, by convention image tensors are always 3D, with a one-dimensional color channel for grayscale images. A batch of $128$ grayscale images of size $256 × 256$ could thus be stored in a tensor of shape $(128, 256, 256, 1)$, and a batch of $128$ color images could be stored in a tensor of shape $(128, 256, 256, 3)$.
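Note that the MNIST arrays loaded earlier have shape `(60000, 28, 28)`, without the channel axis. The sketch below shows how to add the size-1 channel axis and how to move the channel axis to the front for a channels-first layout. Small zero-filled stand-ins replace real images:

```python
import numpy as np

gray = np.zeros((128, 28, 28), dtype=np.uint8)     # (samples, height, width)
channels_last = np.expand_dims(gray, axis=-1)      # add a one-dimensional color channel
print(channels_last.shape)                         # (128, 28, 28, 1)

rgb = np.zeros((128, 256, 256, 3), dtype=np.uint8) # (samples, height, width, channels)
channels_first = np.transpose(rgb, (0, 3, 1, 2))   # -> (samples, channels, height, width)
print(channels_first.shape)                        # (128, 3, 256, 256)
```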

> See the figure: 

![figure](https://miro.medium.com/max/1276/1*WArDf9h6Dtbo-4H5P4lguQ.png)






---
# III. Tensor Operations

Much as any computer program can be ultimately reduced to a small set of binary operations on binary inputs (AND, OR, NOR, and so on), all transformations learned by deep neural networks can be reduced to a handful of tensor operations applied to tensors of numeric data. For instance, it’s possible to add tensors, multiply tensors, and so on.

In our initial example, we were building our network by stacking $Dense$ layers on top of each other. A Keras layer instance looks like this: 

> `keras.layers.Dense(512, activation='relu')`

This layer can be interpreted as a function, which takes as input a 2D tensor and returns another 2D tensor—a new representation for the input tensor. Specifically, the function is as follows (where $W$ is a 2D tensor and $b$ is a vector, both attributes of the layer):

> `output = relu(dot(W, input) + b)`

Let’s unpack this. We have three tensor operations here: a dot product ($dot$) between the input tensor and a tensor named $W$; an addition ($+$) between the resulting 2D tensor and a vector $b$; and, finally, a $relu$ operation. $relu(x)$ is $max(x, 0)$. Here's the graph of $relu$:
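The following NumPy sketch implements this computation. One note on argument order: when a batch of inputs is stored row-wise with shape `(samples, features)`, the product is written `np.dot(inputs, W)`, with `W` of shape `(input_features, units)`. This is also how the Keras `Dense` documentation states it (`dot(input, kernel) + bias`). The sizes (784 inputs, 512 units, a batch of 4) and the random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, input_dim, units = 4, 784, 512
inputs = rng.random((batch_size, input_dim))   # a 2D input tensor
W = rng.normal(size=(input_dim, units))        # weight matrix (a 2D tensor)
b = np.zeros(units)                            # bias vector

def dense_relu(inputs, W, b):
    # dot product, then add b to every row, then element-wise relu
    return np.maximum(np.dot(inputs, W) + b, 0.)

output = dense_relu(inputs, W, b)
print(output.shape)        # (4, 512)
print(output.min() >= 0)   # True: relu clips negative values to 0
```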

![relu](https://qph.fs.quoracdn.net/main-qimg-d23ac99265ab19599e71c9d1a3cb089a)

Here are some other ***activation functions***:
![act.fns](https://cdn-images-1.medium.com/max/1000/1*4ZEDRpFuCIpUjNgjDdT2Lg.png)

## Element-wise operations

The $relu$ operation and addition are element-wise operations: operations that are applied independently to each entry in the tensors being considered. If you want to write a naive Python implementation of an element-wise operation, you use a for loop, as in this naive implementation of an element-wise relu operation:

In [0]:
def naive_relu(x):
  assert len(x.shape) == 2       # x is a 2D numpy tensor
  x = x.copy()                   # Avoid overwriting the input tensor
  
  for i in range(x.shape[0]):
    for j in range(x.shape[1]):
      x[i, j] = max(x[i, j], 0)
  return x

You do the same for addition:

In [0]:
def naive_add(x, y):
  assert len(x.shape) == 2      # x & y are 2D numpy tensors
  assert x.shape == y.shape
  x = x.copy()                  # Avoid overwriting the input tensor
  
  for i in range(x.shape[0]):
    for j in range(x.shape[1]):
      x[i, j] += y[i, j]
  return x

On the same principle, you can do element-wise multiplication, subtraction, and so on.
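Following the same pattern, you can write element-wise multiplication as a double loop and check it against NumPy's `*` operator. The sketch below uses small arbitrary matrices, and `naive_multiply` is a name chosen here for illustration:

```python
import numpy as np

def naive_multiply(x, y):
    assert len(x.shape) == 2      # x and y are 2D numpy tensors
    assert x.shape == y.shape
    x = x.copy()                  # avoid overwriting the input tensor
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] *= y[i, j]
    return x

x = np.array([[1., 2.], [3., 4.]])
y = np.array([[10., 20.], [30., 40.]])
print(naive_multiply(x, y))
print(np.array_equal(naive_multiply(x, y), x * y))   # True
```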

In practice, when dealing with Numpy arrays, these operations are available as well-optimized built-in Numpy functions. These are low-level, highly parallel, efficient tensor-manipulation routines that are typically implemented in Fortran or C.

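So, in Numpy, you can do the following element-wise operations, and they will be blazing fast:
```python
import numpy as np
x = np.random.random((20, 100))   # two random 2D tensors of the same
y = np.random.random((20, 100))   # (arbitrary) shape
z = x + y                   # --> Element-wise addition
z = np.maximum(z, 0.)       # --> Element-wise relu
```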
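To see the speed difference, the sketch below times a double-loop relu against `np.maximum` on a 300 × 300 random matrix. Timings depend on the machine, so no numbers are quoted here. Expect the NumPy version to be much faster:

```python
import time
import numpy as np

def loop_relu(x):
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

x = np.random.randn(300, 300)

start = time.perf_counter()
slow = loop_relu(x)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = np.maximum(x, 0.)
numpy_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f} s, numpy: {numpy_seconds:.6f} s")
print(np.array_equal(slow, fast))   # True: both compute the same relu
```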

## Tensor dot

The dot operation, also called a tensor product (not to be confused with an element-wise product), is the most common, most useful tensor operation. Contrary to element-wise operations, it combines entries in the input tensors. An element-wise product is done with the $*$ operator in Numpy, Keras, Theano, and TensorFlow. $dot$ uses a different syntax in TensorFlow, but in both Numpy and Keras it’s done using the standard $dot$ operator:

```python
import numpy as np
x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
z = np.dot(x, y)            # 1*4 + 2*5 + 3*6 = 32.0
```

We know that the dot product between two vectors is a scalar and that only vectors with the same number of elements are compatible for a dot product.
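You can write the dot product of two vectors as a loop that multiplies matching entries and sums the results. The sketch below checks this against `np.dot`, and `naive_vector_dot` is an illustrative name:

```python
import numpy as np

def naive_vector_dot(x, y):
    assert x.ndim == 1 and y.ndim == 1   # x and y are numpy vectors
    assert x.shape[0] == y.shape[0]      # same number of elements
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
print(naive_vector_dot(x, y))   # 32.0
print(np.dot(x, y))             # 32.0
```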


In [0]:
import numpy as np
x = np.array([1,2,3])   # has shape (3,)
y = np.array([2,1])     # has shape (2,)
np.dot(x,y)             # Will fail due to mismatching shapes

ValueError: shapes (3,) and (2,) not aligned: 3 (dim 0) != 2 (dim 0)

You can also take the dot product between a matrix $X$ and a vector $y$, which returns a vector where the coefficients are the dot products between $y$ and the rows of $X$.

In [0]:
X = np.array([[1,2], 
              [3,4]])    # has shape (2,2)

y = np.array([0, 4])     # has shape (2,)
np.dot(X,y)

array([ 8, 16])
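Written as loops, the matrix-vector product computes one vector dot product per row of `X`. The following sketch reproduces the result above, and `naive_matrix_vector_dot` is an illustrative name:

```python
import numpy as np

def naive_matrix_vector_dot(X, y):
    assert X.ndim == 2 and y.ndim == 1
    assert X.shape[1] == y.shape[0]      # columns of X must match the length of y
    z = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            z[i] += X[i, j] * y[j]       # row i of X dotted with y
    return z

X = np.array([[1, 2],
              [3, 4]])
y = np.array([0, 4])
print(naive_matrix_vector_dot(X, y))   # [ 8. 16.]
```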

Note that as soon as one of the two tensors has an `ndim` greater than 1, $dot$ is no longer symmetric, which is to say that $dot(x, y)$ isn’t the same as $dot(y, x)$:

In [0]:
print(f"Dot product of X and y: {np.dot(X, y)}",
      f"\nDot product of y and X: {np.dot(y, X)} equals dot product of X transposed and y: {np.dot(X.T, y)}")

Dot product of X and y: [ 8 16] 
Dot product of y and X: [12 16] equals dot product of X transposed and y: [12 16]


Of course, a dot product generalizes to tensors with an arbitrary number of axes. The most common application is probably the dot product between two matrices. You can take the dot product of two matrices $X$ and $Y$ ($dot(X, Y)$) if and only if `X.shape[1] == Y.shape[0]`. The result is a matrix with shape $(X.shape[0], Y.shape[1])$, where the coefficients are the dot products between the rows of $X$ and the columns of $Y$.
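Entry `(i, j)` of the result is the dot product of row `i` of `X` with column `j` of `Y`. The sketch below implements this with loops and compares it against `np.dot` on random matrices. `naive_matrix_dot` is an illustrative name:

```python
import numpy as np

def naive_matrix_dot(X, Y):
    assert X.ndim == 2 and Y.ndim == 2
    assert X.shape[1] == Y.shape[0]              # inner dimensions must match
    Z = np.zeros((X.shape[0], Y.shape[1]))       # result shape (X.shape[0], Y.shape[1])
    for i in range(X.shape[0]):
        for j in range(Y.shape[1]):
            Z[i, j] = np.dot(X[i, :], Y[:, j])   # row i of X . column j of Y
    return Z

X = np.random.random((2, 3))
Y = np.random.random((3, 4))
Z = naive_matrix_dot(X, Y)
print(Z.shape)                        # (2, 4)
print(np.allclose(Z, np.dot(X, Y)))   # True
```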

To understand dot-product shape compatibility, it helps to visualize the input and output tensors by aligning them as shown in the following figure: 

![tensor-dot](https://4.bp.blogspot.com/-Gt2dGWco0as/XCboOkqfkVI/AAAAAAAAb1w/jh5PcvX-AFY3Zrk1lB7u307t52m7QyaAwCLcBGAs/s1600/%25E3%2582%25B9%25E3%2582%25AF%25E3%2583%25AA%25E3%2583%25BC%25E3%2583%25B3%25E3%2582%25B7%25E3%2583%25A7%25E3%2583%2583%25E3%2583%2588%2B2018-12-29%2B12.21.10.png)


More generally, you can take the dot product between higher-dimensional tensors, following the same rules for shape compatibility as outlined earlier for the 2D case:

> $(a, b, c, d) . (d,)$ -> $(a, b, c)$

> $(a, b, c, d) . (d, e)$ -> $(a, b, c, e)$

And so on.
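NumPy's `np.dot` follows these shape rules. For an N-D first argument, it sums over that tensor's last axis, paired with the only axis of a vector or with the first axis of a matrix. The shape check below uses arbitrary sizes a=2, b=3, c=4, d=5, e=6:

```python
import numpy as np

a, b, c, d, e = 2, 3, 4, 5, 6
t = np.random.random((a, b, c, d))

print(np.dot(t, np.random.random((d,))).shape)     # (2, 3, 4)    = (a, b, c)
print(np.dot(t, np.random.random((d, e))).shape)   # (2, 3, 4, 6) = (a, b, c, e)
```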

## Tensor reshaping

A third type of tensor operation that’s essential to understand is ***tensor reshaping***. Although it wasn’t used in the Dense layers in our first neural network example, we use it when we preprocess the image data before feeding it into our network:

> `train_images = train_images.reshape((60000, 28 * 28))`

Reshaping a tensor means rearranging its rows and columns to match a target shape. Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor. Reshaping is best understood via simple examples:

In [0]:
x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])

x.shape

(3, 2)

In [0]:
x = x.reshape((6, 1))
x

array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [5.]])

In [0]:
x = x.reshape((2, 3))
x

array([[0., 1., 2.],
       [3., 4., 5.]])
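The MNIST reshape shown earlier flattens each 28 × 28 image into a vector of 784 values. NumPy reshapes in row-major order by default, so pixel `(r, c)` ends up at position `r * 28 + c`. The sketch below uses random stand-in images rather than the real dataset:

```python
import numpy as np

images = np.random.randint(0, 256, size=(10, 28, 28), dtype=np.uint8)  # stand-in for train_images
flat = images.reshape((10, 28 * 28))
print(flat.shape)                                # (10, 784)

r, c = 5, 17
print(flat[3, r * 28 + c] == images[3, r, c])    # True: row-major order
```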

A special case of reshaping that’s commonly encountered is $transposition$. Transposing a matrix means exchanging its rows and its columns, so that `X[i, :]` becomes `X[:, i]`:

In [0]:
x = np.zeros((300, 20))
x.shape

(300, 20)

The transpose of this matrix is easily expressed via `x.T`:

In [0]:
x.T.shape

(20, 300)
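Note that transposing is not the same as calling `reshape` with the swapped dimensions. Both produce the same shape, but `reshape` keeps the row-major order of the entries, whereas transposition moves entry `(i, j)` to `(j, i)`. A small matrix makes the difference visible:

```python
import numpy as np

x = np.arange(6).reshape((2, 3))   # [[0 1 2]
                                   #  [3 4 5]]
print(x.T)                         # [[0 3]
                                   #  [1 4]
                                   #  [2 5]]
print(x.reshape((3, 2)))           # [[0 1]
                                   #  [2 3]
                                   #  [4 5]]
```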