# Neural Computation (Autumn 2019)
# Lab 3: Advanced Numpy, Tensors & Tensor Operations
In this tutorial, we cover:
- Advanced Numpy features, such as broadcasting and sorting
- Tensors (a fundamental data structure used in Neural Computation)
- Tensor operations

# Section 1: Advanced Numpy

As always, we need to import the numpy library via the `import` statement:

In [0]:
import numpy as np

## Broadcasting


Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [0]:
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

print(y)

This works; however, when the matrix `x` is very large, an explicit loop in Python can be slow. Note that adding the vector `v` to each row of the matrix `x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically, then performing elementwise summation of `x` and `vv`. We could implement this approach like this:

In [0]:
vv = np.tile(v, (4, 1))  # Stack 4 copies of v on top of each other
print(vv)                # Prints "[[1 0 1]
                         #          [1 0 1]
                         #          [1 0 1]
                         #          [1 0 1]]"

In [0]:
y = x + vv  # Add x and vv elementwise
print(y)

Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [0]:
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)

The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)`, thanks to broadcasting: it behaves as if `v` actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1, the array with size 1 behaves as if it were copied along that dimension.
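The rules above can be turned into a short function. The sketch below uses a hypothetical helper, `broadcast_shape`, written only for illustration; it computes the shape that results from broadcasting two shapes, and compares it with the shape NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: the shape obtained by broadcasting shape_a with shape_b."""
    # Rule 1: prepend 1s to the shorter shape until both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(a, b):
        # Rules 2 and 3: every dimension must have equal sizes or a size of 1
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
        # Rules 4 and 5: a size-1 dimension is stretched to match the other array
        result.append(size_a if size_b == 1 else size_b)
    return tuple(result)

# Compare against the shape NumPy computes for the same pair of arrays
for sa, sb in [((4, 3), (3,)), ((3, 1), (2,)), ((2, 3), ()), ((2, 1, 5), (4, 1))]:
    numpy_shape = np.broadcast(np.empty(sa), np.empty(sb)).shape
    print(sa, sb, "->", broadcast_shape(sa, sb), numpy_shape)
```

Calling `broadcast_shape((2, 3), (4,))` raises a `ValueError`, just as `np.ones((2, 3)) + np.ones(4)` does in NumPy.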

If this explanation does not make sense, try reading the explanation from the [documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) or this [explanation](http://wiki.scipy.org/EricsBroadcastingDoc).

Functions that operate elementwise on arrays and support broadcasting are known as universal functions (ufuncs). You can find the list of all universal functions in the [documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).

Here are some applications of broadcasting:

In [0]:
# Compute outer product of vectors
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:

print(np.reshape(v, (3, 1)) * w)

In [0]:
# Add a vector to each row of a matrix
x = np.array([[1,2,3], [4,5,6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:

print(x + v)

In [0]:
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:

print((x.T + w).T)

In [0]:
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))

In [0]:
# Multiply a matrix by a constant:
# x has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
print(x * 2)

Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.

This brief overview has touched on many of the important things that you need to know about numpy, but is far from complete. Check out the [numpy reference](http://docs.scipy.org/doc/numpy/reference/) to find out much more about numpy.

## Data Types for ndarrays

The *data type* or `dtype` is a special object containing the information (or *metadata*, data about data) the ndarray needs to interpret a chunk of memory as a particular type of data:

In [0]:
arr1 = np.array([1, 2, 3], dtype=np.float64)

print(arr1)
print(arr1.dtype)

In [0]:
arr2 = np.array([1, 2, 3], dtype=np.int32)

print(arr2)
print(arr2.dtype)

Note the difference between the dtypes of the two arrays, `arr1` and `arr2`.

The numerical dtypes are named the same way: a type name, like `float` or `int`, followed by a number indicating the number of bits per element. A standard double-precision floating-point value (what’s used under the hood in Python’s `float` object) takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as `float64`. Some examples of NumPy data types are as follows:

    int8, uint8 : signed and unsigned 8-bit (1 byte) integer types
    int16, uint16 :  signed and unsigned 16-bit integer types
    int32, uint32 : signed and unsigned 32-bit integer types
    int64, uint64 : signed and unsigned 64-bit integer types

    float16 : Half-precision floating point
    float32 : Standard single-precision floating point
    float64 : Standard double-precision floating point
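A short sketch of how to inspect the sizes listed above and convert between dtypes. `itemsize`, `astype` and `np.iinfo` are standard NumPy attributes and functions; the sample values are arbitrary.

```python
import numpy as np

# itemsize is the number of bytes per element, i.e. the bit count divided by 8
for name in ["int8", "uint8", "int16", "int32", "int64", "float16", "float32", "float64"]:
    print(name, np.dtype(name).itemsize, "bytes")

# astype converts an array to another dtype and returns a new array
floats = np.array([1.7, -2.3, 3.9])
ints = floats.astype(np.int32)   # the fractional part is truncated toward zero
print(ints, ints.dtype)          # [ 1 -2  3] int32

# each integer type has a fixed range; unsigned types cannot hold negative values
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
```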

## Setting Array Values by Broadcasting
The same broadcasting rule governing arithmetic operations also applies to setting values via array indexing. In a simple case, we can do things like:



In [0]:
arr = np.zeros((4,3))
print(arr)

In [0]:
arr[:] = 5
print(arr)

We can also assign an array of values, as long as its shape is compatible with the selected region. Here `[[-1.37], [0.509]]` has shape `(2, 1)`, so each value is broadcast across all three columns of the first two rows:

In [0]:
arr[:2] = [[-1.37], [0.509]]
arr
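Broadcasting also lets us write a one-dimensional array into every column of an array. In this sketch (with arbitrary values), `np.newaxis` gives the length-4 array `col` a trailing axis of size 1, so its shape `(4, 1)` broadcasts against the `(4, 3)` array and every column receives the same four values.

```python
import numpy as np

arr = np.zeros((4, 3))
col = np.array([1.28, -0.42, 0.44, 1.6])   # shape (4,)

# col[:, np.newaxis] has shape (4, 1); it is copied across the 3 columns
arr[:] = col[:, np.newaxis]
print(arr)
```

Without the new axis, `arr[:] = col` raises a `ValueError`: the trailing sizes 4 and 3 are not compatible.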

## Sorting
Like Python’s built-in list, the ndarray `sort` instance method is an in-place sort, meaning that the array contents are rearranged without producing a new array:

In [0]:
arr = np.random.randn(6)
print(arr)

In [0]:
arr.sort()  # in-place sorting in ascending order
print(arr)

When sorting arrays in-place, remember that if the array is a view on a different ndarray, the original array will be modified:

In [0]:
arr = np.random.randn(3, 5)
print('Before sorting: \n', arr)

arr[:, 0].sort()  # sort first column values in-place
print('After sorting: \n', arr)

On the other hand, `numpy.sort` creates a new, sorted copy of an array. Otherwise, it accepts the same arguments (such as `kind`) as `ndarray.sort`:

In [0]:
arr = np.random.randn(5)
arr

In [0]:
np.sort(arr)  # create a new array

In [0]:
arr  # the original array does not change

All of these sort methods take an axis argument for sorting the sections of data along the passed axis independently:

In [0]:
arr = np.random.randn(3, 5)
print('Before sorting \n', arr)

arr.sort(axis=0)  # sort each column independently, in ascending order
print('After sorting \n', arr)

In [0]:
arr = np.random.randn(3, 5)
print('Before sorting \n', arr)

arr.sort(axis=1)  # sort each row independently, in ascending order
print('After sorting \n', arr)
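Because random arrays change on every run, the effect of `axis` is easier to see on a small fixed array. In this sketch, `axis=0` sorts each column independently (values move between rows) and `axis=1` sorts each row independently (values move between columns):

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [1, 5, 0]])

# axis=0: each column is sorted on its own
print(np.sort(arr, axis=0))   # [[1 1 0]
                              #  [3 5 2]]

# axis=1: each row is sorted on its own
print(np.sort(arr, axis=1))   # [[1 2 3]
                              #  [0 1 5]]
```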

You may notice that none of the sort methods have an option to sort in descending order. This is not a problem in practice, because array slicing produces views: reversing the order of a sorted array does not produce a copy or require any computational work. The following exercise asks you to propose a solution along these lines.

### Exercise: 

Given a 2-dimensional array as follows:
  
    [[5, 4, 9],
     [8, 1, 5],
     [3, 0, 3]]
  
Could you sort the array to obtain the following array?

    [[9, 5, 4], 
     [8, 5, 1], 
     [3, 3, 0]]

In [0]:
# write your solution here


Similarly, write code to produce the following array:

    [[8, 4, 9],
     [5, 1, 5],
     [3, 0, 3]]

In [0]:
# write your solution here


# Section 2: Data Representations for Neural Networks
In the previous section, we worked with data stored in multidimensional Numpy arrays, also called **tensors**. In general, all current machine-learning systems use tensors as their basic data structure. Tensors are fundamental to the field -- so fundamental that Google's TensorFlow was named after them. So what's a tensor?

At its core, a tensor is a container for data -- almost always numerical data. So, it's a container for numbers. You may already be familiar with matrices, which are 2D tensors: tensors are a generalisation of matrices to an arbitrary number of dimensions (note that in the context of tensors, a dimension is called an *axis*).

## Scalar (0D tensors)

A tensor that contains only one number is called a scalar (or scalar tensor, or 0-dimensional tensor, or 0D tensor). In Numpy, a `float32` or `float64` number is a scalar tensor. You can display the number of axes of a Numpy tensor via the `ndim` attribute; a scalar tensor has 0 axes (`ndim == 0`). The number of axes of a tensor is also called its *rank*. Here's a Numpy scalar:

In [0]:
x = np.array(12)  
x

In [0]:
x.ndim  # A scalar has 0 axes

## Vectors (1D tensors)
An array of numbers is called a vector, or 1D tensor. A 1D tensor is said to have exactly one axis. Following is a Numpy vector:

In [0]:
x = np.array([12, 3, 6, 14, 7])
x

In [0]:
x.ndim  # a vector has exactly one axis

This vector has five entries and so is called a 5-dimensional vector. Don’t confuse a 5D vector with a 5D tensor! A 5D vector has only one axis and has five dimensions along its axis, whereas a 5D tensor has five axes (and may have any number of dimensions along each axis). Dimensionality can denote either the number of entries along a specific axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a 5D tensor), which can be confusing at times. In the latter case, it’s technically more correct to talk about a tensor of rank 5 (the rank of a tensor being the number of axes), but the ambiguous notation 5D tensor is common regardless.
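The distinction between a 5D vector and a 5D tensor can be checked directly with `ndim` and `shape`; the array values here are arbitrary:

```python
import numpy as np

v = np.array([12, 3, 6, 14, 7])   # a 5D vector: one axis with 5 entries
print(v.ndim, v.shape)             # 1 (5,)

t = np.zeros((2, 2, 2, 2, 2))      # a 5D tensor (rank 5): five axes
print(t.ndim, t.shape)             # 5 (2, 2, 2, 2, 2)
```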

## Matrices (2D tensors)

An array of vectors is a matrix, or 2D tensor. A matrix has two axes (often referred to as rows and columns). You may visually interpret a matrix as a rectangular grid of numbers. This is a Numpy matrix:

In [0]:
x = np.array([[5, 78, 2, 34, 0],
             [6, 79, 3, 35, 1],
             [7, 80, 4, 36, 2]])
x

In [0]:
x.ndim  # a matrix has two axes.

The entries from the first axis are called the rows, and the entries from the second axis are called the columns. In the previous example, `[5, 78, 2, 34, 0]` is the first row of `x`, and `[5, 6, 7]` is the first column.

## 3D tensors and higher-dimensional tensors

If you pack such matrices in a new array, you obtain a 3D tensor, which you can visually interpret as a cube of numbers. Following is a Numpy 3D tensor:

In [0]:
x = np.array([
             [[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]],
             [[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]],
             [[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]]
              ])
x

In [0]:
x.ndim  # a 3D tensor has three axes

By packing 3D tensors in an array, you can create a 4D tensor, and so on. In deep learning, you’ll generally manipulate tensors that are 0D to 4D, although you may go up to 5D if you process video data.

## Key attributes

A tensor is defined by three key attributes:

- **Number of axes (rank)**: For instance, a 3D tensor has three axes, and a matrix has two axes. This is also called the tensor's `ndim` in Python libraries such as Numpy.

- **Shape**: This is a tuple of integers that describes how many dimensions the tensor has along each axis. For example, the previous matrix has shape `(3, 5)`, and the 3D tensor example has shape `(3, 3, 5)`. A vector has a shape with a single element, such as `(5,)`, whereas a scalar has an empty shape, `()`.

- **Datatype** (usually called `dtype` in Python libraries): This is the type of the data contained in the tensor; for instance, a tensor's type could be `float32`, `uint8`, `float64`, and so on. Numpy does provide string dtypes (such as `<U5`), but the tensors used in neural computation almost always hold numbers.
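The three attributes can be inspected on tensors like the ones in this section. In this sketch, the default integer dtype that Numpy picks (e.g. `int64` or `int32`) depends on the platform and Numpy version, so no expected output is shown:

```python
import numpy as np

examples = {
    "scalar": np.array(12),
    "vector": np.array([12, 3, 6, 14, 7]),
    "matrix": np.array([[5, 78, 2, 34, 0],
                        [6, 79, 3, 35, 1],
                        [7, 80, 4, 36, 2]]),
    "3D tensor": np.zeros((3, 3, 5), dtype=np.float32),
}

for name, t in examples.items():
    # ndim = number of axes, shape = size along each axis, dtype = element type
    print(f"{name:10s} ndim={t.ndim} shape={t.shape} dtype={t.dtype}")
```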


## Real-world examples of data tensors

Let’s make data tensors more concrete with a few examples similar to what you’ll
encounter later. The data you’ll manipulate will almost always fall into one of the following categories (see more real-world examples below):

- **Vector data** -- 2D tensor of shape `(samples, features)`, where `samples` denotes the number of samples.

- **Timeseries data or sequence data** -- 3D tensors of shape `(samples, timesteps, features)`.

- **Images** -- 4D tensor of shape `(samples, channels, height, width)` (in Facebook's PyTorch) or `(samples, height, width, channels)` (in Google's TensorFlow).

- **Video** -- 5D tensors of shape `(samples, frames, channels, height, width)`.
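Converting between the two image layouts is a single `transpose` of the axes (an operation covered in Section 3). A minimal sketch, using a hypothetical all-zero batch of 128 RGB images of size 256 × 256:

```python
import numpy as np

# Hypothetical batch stored channels-last: (samples, height, width, channels)
images_hwc = np.zeros((128, 256, 256, 3), dtype=np.uint8)

# Move the channel axis (index 3) to position 1: (samples, channels, height, width)
images_chw = images_hwc.transpose(0, 3, 1, 2)
print(images_chw.shape)  # (128, 3, 256, 256)
```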

## Vector data
This is the most common case. In such a dataset, each single data point can be encoded as a vector, and thus a batch of data will be encoded as a 2D tensor (that is, an array of vectors), where the first axis is the samples axis and the second axis is the features axis.

Let’s take a look at two examples:

- An actuarial dataset of people, where we consider each person’s age, ZIP code,
and income. Each person can be characterized as a vector of 3 values, and thus
an entire dataset of 100,000 people can be stored in a 2D tensor of shape
`(100000, 3)`. In this case, `samples` is 100,000 and `features` is 3 (age, ZIP code and income).

- A dataset of text documents, where we represent each document by the counts
of how many times each word appears in it (out of a dictionary of 20,000 common
words). Each document can be encoded as a vector of 20,000 values (one
count per word in the dictionary), and thus an entire dataset of 500 documents
can be stored in a 2D tensor of shape `(500, 20000)`.
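The word-count encoding from the second example can be sketched on a toy scale. The four-word dictionary and the three documents below are hypothetical stand-ins for the 20,000-word dictionary and 500 documents; each document becomes one row of a `(samples, features)` tensor.

```python
import numpy as np

# Hypothetical tiny dictionary and documents, for illustration only
dictionary = ["neural", "network", "tensor", "data"]
documents = [
    "neural network tensor tensor",
    "data data tensor",
    "network data",
]

# One row per document (samples axis), one column per word (features axis)
counts = np.zeros((len(documents), len(dictionary)), dtype=np.int64)
for i, doc in enumerate(documents):
    words = doc.split()
    for j, word in enumerate(dictionary):
        counts[i, j] = words.count(word)

print(counts.shape)  # (3, 4)
print(counts)
```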


## Timeseries data or sequence data

Whenever time matters in your data (or the notion of sequence order), it makes sense to store it in a 3D tensor with an explicit time axis. Each sample can be encoded as a sequence of vectors (a 2D tensor), and thus a batch of data will be encoded as a 3D tensor.

The time axis is always the second axis (axis of index 1), by convention. Let’s look at an example:
 
- A dataset of stock prices. Every minute, we store the current price of the stock, the highest price in the past minute, and the lowest price in the past minute. Thus every minute is encoded as a 3D vector, an entire day of trading is encoded as a 2D tensor of shape `(390, 3)` (there are 390 minutes in a trading day), and 250 days’ worth of data can be stored in a 3D tensor of shape `(250, 390, 3)`. Here, each sample is one day's worth of data: `samples` is 250, `timesteps` is 390 and `features` is 3.
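A sketch of the stock-price tensor, filled with synthetic random values (not real prices; the three features are not kept consistent with each other). Indexing along the samples, timesteps and features axes gives progressively smaller tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 250 days x 390 minutes x 3 features
# feature 0 = current price, 1 = highest price, 2 = lowest price in the past minute
stock = rng.uniform(90.0, 110.0, size=(250, 390, 3))

one_day = stock[0]                       # one sample: shape (390, 3)
one_minute = stock[0, 0]                 # one timestep of one sample: shape (3,)
daily_high = stock[:, :, 1].max(axis=1)  # highest price of each day: shape (250,)

print(stock.shape, one_day.shape, one_minute.shape, daily_high.shape)
```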

## Image data

Images typically have three dimensions: height, width, and color depth. Although grayscale images have only a single color channel and could thus be stored in 2D tensors, by convention a single image is always stored as a 3D tensor, with a color-channel axis of size 1 for grayscale images.

For example, a batch of 128 grayscale images (one channel) of size 256 × 256 could be stored in a tensor of shape `(128, 1, 256, 256)` (in the form `(samples, channels, height, width)`), and a batch of 128 color images (three channels: red, green, blue) could be stored in a tensor of shape `(128, 3, 256, 256)`.
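Grayscale images often arrive without a channel axis, as shape `(samples, height, width)`. Adding a channel axis of size 1 gives the `(samples, channels, height, width)` layout; a minimal sketch with a hypothetical all-zero batch:

```python
import numpy as np

# Hypothetical batch of 128 grayscale images without a channel axis
gray = np.zeros((128, 256, 256), dtype=np.float32)   # (samples, height, width)

# Insert a channel axis of size 1 at position 1
gray_nchw = gray[:, np.newaxis, :, :]
print(gray_nchw.shape)                                # (128, 1, 256, 256)

# np.expand_dims does the same thing
print(np.expand_dims(gray, axis=1).shape)             # (128, 1, 256, 256)
```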


# Section 3: Tensor Operations

As with matrices, we can perform element-wise arithmetic between tensors. There are four basic arithmetic operations, which we go through in the following cells. We usually use a capital letter to denote a tensor.

First, let's create some tensors as follows:


In [0]:
## build a 3D tensor by stacking three 3x3 matrices
T1 = np.array([[1,2,3], [4,5,6],[7,8,9]])
T2 = np.array([[11,12,13], [14,15,16],[17,18,19]])
T3 = np.array([[21,22,23], [24,25,26],[27,28,29]])

T_m = np.array([T1,T2,T3])

print(T_m.shape)
print(T_m.ndim)

In [0]:
T_d = np.array([
  [[1,2,3],    [4,5,6],    [7,8,9]],
  [[11,12,13], [14,15,16], [17,18,19]],
  [[21,22,23], [24,25,26], [27,28,29]],
  ])

print(T_d.shape)
print(T_d.ndim)

In [0]:
## check: both constructions give the same tensor, so the difference is all zeros
print(T_m - T_d)

## Tensor addition
Given two tensors with the same dimensions, we can create a new tensor with the same dimensions where each scalar value is the element-wise addition of the scalars in the parent tensors.


In [0]:
# define tensor addition
def tensor_add(T1, T2):
    return T1 + T2

In [0]:
# example of tensor addition
print(T1)
print(T2)
print(tensor_add(T1,T2))

In [0]:
T1 + T2  # element-wise addition

## Tensor subtraction
Given two tensors with the same dimensions, we can create a new tensor with the same dimensions where each scalar value is the element-wise subtraction of the scalars in the parent tensors.

In [0]:
def tensor_sub(T1, T2):
    return T1-T2

In [0]:
tensor_sub(T1, T2)

In [0]:
T1 - T2  # element-wise subtraction

## Tensor division
Given two tensors with the same dimensions, we can create a new tensor with the same dimensions where each scalar value is the element-wise division of the scalars in the parent tensors.

In [0]:
# Note: every element of T2 should be non-zero; dividing by zero gives inf or nan with a RuntimeWarning.
def tensor_div(T1, T2):
    return T1/T2

In [0]:
tensor_div(T1, T2)

In [0]:
T1 / T2  # element-wise division

## Tensor multiplication
We consider two types of tensor multiplication here: the Hadamard product and the tensor product.

## Tensor Hadamard product (element-wise multiplication)
Given two tensors with the same dimensions, we can create a new tensor with the same dimensions where each scalar value is the element-wise multiplication of the scalars in the parent tensors.

In [0]:
def tensor_hprod(T1,T2):
    return T1*T2

In [0]:
tensor_hprod(T1, T2)

In [0]:
T1 * T2  # element-wise multiplication

## Tensor product

We will denote the tensor product by $\otimes$. Vectors and matrices are themselves tensors (of rank 1 and rank 2), so the tensor product applies to them as well; we start with these two cases.


### Tensor product for vectors

For two vectors

$$
a = \begin{pmatrix} a_1 & a_2 \end{pmatrix}, \qquad
b = \begin{pmatrix} b_1 & b_2 \end{pmatrix},
$$

the tensor product $c = a \otimes b$ multiplies the whole vector $b$ by each entry of $a$:

$$
c = \begin{pmatrix} a_1 b & a_2 b \end{pmatrix}
  = \begin{pmatrix} a_1 b_1 & a_1 b_2 & a_2 b_1 & a_2 b_2 \end{pmatrix}.
$$

### Tensor product for matrices

For two matrices

$$
a = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad
b = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix},
$$

the tensor product $c = a \otimes b$ is the block matrix

$$
c = \begin{pmatrix} a_{11} b & a_{12} b \\ a_{21} b & a_{22} b \end{pmatrix},
$$

where each block $a_{ij} b$ is a $2 \times 2$ matrix, so $c$ is a $4 \times 4$ matrix.

In NumPy, we can use the `np.tensordot()` function to compute tensor products. Given two tensors `a` and `b` (arrays of dimension greater than or equal to one) and an array_like object containing two array_like objects, `(a_axes, b_axes)`, it sums the products of `a`'s and `b`'s elements (components) over the axes specified by `a_axes` and `b_axes`. The third argument can instead be a single non-negative integer, `N`; in that case, the last `N` dimensions of `a` and the first `N` dimensions of `b` are summed over. To calculate the tensor product itself, set `axes=0`: no axes are summed over, and the result has shape `a.shape + b.shape`.

In [0]:
# tensor product for matrices
A = np.array([[1,2],[1,2]])
B = np.array([[3,4],[3,4]])

C = np.tensordot(A, B, axes=0)

print(C.shape)
print(C)
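With `axes=0`, `np.tensordot` stores the blocks $a_{ij} b$ as a 4D array of shape `(2, 2, 2, 2)` rather than as a single 4 × 4 block matrix. The sketch below checks this against `np.kron`, which returns the block-matrix layout, and shows how reordering the axes of `C` recovers it.

```python
import numpy as np

A = np.array([[1, 2], [1, 2]])
B = np.array([[3, 4], [3, 4]])

# axes=0: nothing is summed, so the result has shape A.shape + B.shape
C = np.tensordot(A, B, axes=0)
print(C.shape)                                   # (2, 2, 2, 2)
# C[i, j] is the block A[i, j] * B
print(np.array_equal(C[0, 1], A[0, 1] * B))      # True

# np.kron lays out the same blocks as one 4 x 4 block matrix
K = np.kron(A, B)
print(K.shape)                                   # (4, 4)
# reordering the axes of C to (i, k, j, l) and merging pairs gives K
print(np.array_equal(C.transpose(0, 2, 1, 3).reshape(4, 4), K))  # True
```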

## Tensordot
To better understand the usage of `tensordot`, we work through some examples. The basic [usage](https://docs.scipy.org/doc/numpy/reference/generated/numpy.tensordot.html) is:

    c = numpy.tensordot(a, b, axes=2)

where `a` and `b` are tensors, and `axes` is an int or array_like. If `axes` is an int `M`, the function sums over the last `M` axes of `a` and the first `M` axes of `b`, in order. If `axes` is a pair of sequences, the first sequence lists the axes of `a` and the second the axes of `b` to be summed over; the two sequences must have the same length.

In [0]:
## axes is int
a = np.random.randint(0,10,(2,2))
b = np.random.randint(0,10,(2,2))

print(a.shape)
print(b.shape)

In [0]:
c = np.tensordot(a,b, axes=0) ### tensor product(the definition)
print(c)
print(c.shape)

In [0]:
c = np.tensordot(a,b, axes=1) ### tensor dot product a.b (the matrix product for 2D arrays)
print(c)
print(c.shape)

In [0]:
c = np.tensordot(a,b, axes=2)  ### tensor double contraction a:b (sum of the Hadamard product)
print(c)
print(c.shape)
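The three calls above can be related to operations we already know. For 2D arrays, `axes=1` is the matrix product and `axes=2` is the sum of the Hadamard product; the sketch below checks both on fresh random matrices.

```python
import numpy as np

a = np.random.randint(0, 10, (2, 2))
b = np.random.randint(0, 10, (2, 2))

# axes=1 sums over the last axis of a and the first axis of b: the matrix product
print(np.array_equal(np.tensordot(a, b, axes=1), a @ b))          # True

# axes=2 sums over both axes: the sum of the Hadamard product, a 0D tensor
print(np.array_equal(np.tensordot(a, b, axes=2), np.sum(a * b)))  # True

# axes=0 sums over nothing: every product a[i, j] * b[k, l]
print(np.tensordot(a, b, axes=0).shape)                           # (2, 2, 2, 2)
```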

In [0]:
## axes is array_like
a = np.random.randint(0,10,(3,4,5)) ## create a tensor with shape [3,4,5]
b = np.random.randint(0,10,(4,5,2)) ## create a tensor with shape [4,5,2]

c = np.tensordot(a,b,[(1,2),(0,1)])

### Axes 1 and 2 of a (sizes 4 and 5) are summed against axes 0 and 1 of b (also sizes 4 and 5),
### so the result has shape [3,2]: c[i,k] = sum over j,l of a[i,j,l]*b[j,l,k].
### For example, c[0,0] is the sum of all elements of the Hadamard product of a[0] and b[:,:,0]:

print(np.sum(a[0]*b[:,:,0]))

print(c)
print(c.shape)
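To check what `np.tensordot(a, b, [(1, 2), (0, 1)])` computes for every entry, not just `c[0, 0]`, this sketch recomputes it with explicit loops: `c[i, k]` is the sum over `j` and `l` of `a[i, j, l] * b[j, l, k]`.

```python
import numpy as np

a = np.random.randint(0, 10, (3, 4, 5))
b = np.random.randint(0, 10, (4, 5, 2))
c = np.tensordot(a, b, [(1, 2), (0, 1)])

# Explicit loops over the two axes that are not summed (i from a, k from b)
c_loop = np.zeros((3, 2), dtype=c.dtype)
for i in range(3):
    for k in range(2):
        c_loop[i, k] = np.sum(a[i] * b[:, :, k])   # Hadamard product, then sum

print(np.array_equal(c, c_loop))  # True
```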


## Other operations

Suppose we have a tensor of shape (M, N, R), but the next step needs a tensor of shape (M, W) with W = N * R, or of shape (R, N, M). We can use the `reshape` function for the first case and `transpose` for the second.

### [Reshape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html)


In [0]:
a = np.random.randint(0,10,(3,4,5))
b = a.reshape((12,5))
print(a.shape)
print(b.shape)
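Reshaping to the (M, W) form with W = N * R works the same way. This sketch uses M=3, N=4, R=5, and also shows that passing `-1` lets NumPy infer one axis size from the total number of elements:

```python
import numpy as np

a = np.random.randint(0, 10, (3, 4, 5))   # shape (M, N, R) with M=3, N=4, R=5

# (M, N, R) -> (M, N * R): merge the last two axes
b = a.reshape((3, 20))
# -1 asks NumPy to infer that axis size
b2 = a.reshape((3, -1))
print(b.shape, b2.shape)  # (3, 20) (3, 20)

# reshape keeps row-major (C) order: row i of b is a[i] flattened
print(np.array_equal(b[1], a[1].ravel()))  # True

# reshaping back recovers the original tensor
print(np.array_equal(b.reshape((3, 4, 5)), a))  # True
```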

### [Transpose](https://docs.scipy.org/doc/numpy/reference/generated/numpy.transpose.html)

In [0]:
a = np.random.randint(0,10,(3,4,5)) ## the shape of a is [3,4,5]; its axes are indexed [0,1,2].
                                    ## transpose(2,1,0) lists the old axes in their new order.
b = a.transpose(2,1,0)
print(a.shape)
print(b.shape)
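A sketch of what `transpose(2, 1, 0)` does to individual elements: the element at index `(i, j, k)` of `a` moves to index `(k, j, i)` of `b`. It also checks that `a.transpose()` with no arguments reverses the axes (giving the same result here), and that the transpose is a view sharing memory with `a`:

```python
import numpy as np

a = np.random.randint(0, 10, (3, 4, 5))
b = a.transpose(2, 1, 0)   # new axis 0 = old axis 2, new axis 1 = old axis 1, new axis 2 = old axis 0

# element (i, j, k) of a appears at position (k, j, i) of b
print(a[1, 2, 3] == b[3, 2, 1])   # True

# with no arguments, transpose reverses the order of the axes
print(np.array_equal(a.transpose(), b))  # True
# transpose returns a view: no data is copied
print(np.shares_memory(a, b))  # True
```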

# Exercises

## Exercise 1: tensordot

Given two tensors as follows:

    a = np.random.randint(0,10,(4,4,5))
    b = np.random.randint(0,10,(3,4,4))

Calculate the tensordot of the two tensors. How can we obtain a result tensor of shape `(5, 3)`?

In [0]:
# write your answer here


## Exercise 2: transpose

Given a tensor 

    a = np.random.randint(0, 10, (3,4,5,6))

How can we get a tensor b with shape `(4,5,6,3)` from tensor a?

In [0]:
# write your answer here


## Exercise 3: random seeds

Given a tensor 

    a = np.random.randint(0, 10, (3,4,5,6))

How can we make sure that the entries of the tensor `a` are the same each time we run the cell?

In [0]:
# write your answer here


## Exercise 4: transpose and reshape

Given the tensors

    a = np.random.randint(0, 10, (3,4,5,6))
    b = a.transpose(3,2,0,1)
    b = b.reshape((30,12))

How can we get back to tensor `a` from tensor `b`?

In [0]:
# write your answer here
