# Introduction to TensorFlow (2.2-2.3, 3.5)

In [None]:
import tensorflow as tf
import numpy as np

## TensorFlow Dojo!

---

### First: NumPy recap

First things first, let's do a superfast NumPy recap.

What is the problem? We need to manipulate a lot of numbers easily and efficiently.

Note: in this course, as well as in the book, we will often switch between NumPy and TensorFlow, which have very similar, but not identical, APIs.

Consider the raw Python example:

In [None]:
arr = [0,1,2,3,4]
print(arr[1])   # select second element
print(arr[-2])  # works from the end as well, -1 is the last element
print(arr[1:3]) # 1 inclusive, 3 exclusive
print(arr[:])   # everything

Things get more complex with matrices:

In [None]:
matrix = [
            [1,2,3],
            [4,5,6],
            [7,8,9]
         ]

print(matrix[0])               # selecting a row is easy
print(matrix[0][0])            # selecting 'deeper' requires more brackets
print([r[0] for r in matrix])  # but a column? You need a loop
print()
print("matrix * 2:")           # and a double loop to multiply all the numbers by a scalar
print([[el * 2 for el in row] for row in matrix]) # Python list comprehension syntax

For dot products or matrix multiplications, it's even more complicated!
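To make "more complicated" concrete, here is a minimal sketch of matrix multiplication on plain Python lists. The helper name `matmul_pure_python` and the variables `rows` and `identity` are made up for this illustration; nothing here comes from a library.

```python
def matmul_pure_python(a, b):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    # every row of `a` needs as many entries as `b` has rows
    assert all(len(row) == n_inner for row in a), "inner dimensions must match"
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in range(n_rows)
    ]

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

product = matmul_pure_python(rows, rows)
print(product)                             # [[30, 36, 42], [66, 81, 96], [102, 126, 150]]
print(matmul_pure_python(rows, identity))  # multiplying by the identity returns the same rows
```

Three nested loops (two in the comprehensions, one inside `sum`) are needed for a single product, which is the bookkeeping an array library takes off your hands.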

NumPy allows you to do that very elegantly.

In [None]:
matrix = np.arange(1, 10).reshape((3,3)) # you could also convert the Python list: np.array(matrix)
print(matrix)
print(matrix[0])
print(matrix[:, 0]) # slicing
print(matrix[1, 2:])

Math.

In [None]:
print(matrix)  
print(matrix * 2) # all those are broadcast!
print(matrix + 3)
print(matrix ** 2)

Transposition.

In [None]:
print(matrix)
print(matrix.T) # also: matrix.transpose()

Adding an axis with `None` or `np.newaxis`.

In [None]:
array = np.array([1,2,3])
print(array)
print(array[None, ...])       # adding a new axis (np.newaxis works the same as None)
print(array[..., np.newaxis]) # the ... mean: "fill in the rest automatically"

Broadcasting! (More of that later)

In [None]:
print(matrix)
print(matrix + array)             # [1,2,3] added horizontally to each row
print(matrix + array[..., None])  # [1,2,3] added vertically to each column

More functions:

In [None]:
print(matrix)
print(matrix.sum()) # overall sum
print(matrix.sum(axis=0)) # sum of each column
print(matrix.sum(axis=1)) # sum of each row
# same logic with .min() .max() .mean() .std()

Linear algebra operations, for example: dot product.

In [None]:
arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])
print(arr1.dot(arr2)) # dot product

In [None]:
second_matrix = np.random.randint(0,10, size=(3,3))
print(second_matrix)
print(matrix @ second_matrix) # same as np.matmul(matrix, second_matrix)

A huge number of functions are available...

Many tutorials and refreshers are available from the [NumPy docs](https://numpy.org/doc/stable/user/index.html):

  - [NumPy quickstart](https://numpy.org/doc/stable/user/quickstart.html)
  - [NumPy absolute basics for beginners](https://numpy.org/doc/stable/user/absolute_beginners.html)
  - [NumPy broadcasting tutorial](https://numpy.org/doc/stable/user/basics.broadcasting.html)

*(For further reference: [NumPy fundamentals](https://numpy.org/doc/stable/user/basics.html).)*

---

## Now to TensorFlow: plan

- Constants/Variables, shape, type
- Reshaping
- Slicing
- Tensor creation (zeros, ones, random)
- Broadcasting
- Maths & other ops

---

### 3.5.1 The basics of TensorFlow objects

`Tensors` are the generalisation of matrices (and vectors) to higher dimensions!

In [None]:
x = tf.constant([1]) # creating a simple array with one element
                     # (this is actually not a constant, by the way, but it's
                     # not *assignable*, see below)
                     # Explanation here for the curious:
                     # https://www.tensorflow.org/api_docs/python/tf/constant

In [None]:
x.shape 
# x.shape.as_list() # will give it to you as a list

In [None]:
x.dtype

In [None]:
x.numpy()

In [None]:
x.numpy().item() # only works for scalars

In [None]:
x = tf.range(12) # same as tf.constant(range(12)) or tf.constant([0,1,2,3,4,5,6,7,8,9,10,11])
print(x)

In [None]:
x.shape

The tuple notation convention: (n_elements,) for 1D tensors.

In [None]:
print("Note the shape of a 1D array:", np.array([1,2,3]).shape)
print()
print("The `(n_elements,)` tuple notation is used when defining tensors!")
print(tf.reshape(tf.range(3), (3,)))  # ← (3,), a 1D vector (one axis only)
print(tf.reshape(tf.range(3), 3))     # the same
print(tf.reshape(tf.range(3), (3,1))) # not the same: a column vector
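The difference between `(3,)`, `(1, 3)` and `(3, 1)` starts to matter as soon as two arrays are combined. The sketch below uses NumPy for brevity; the variable names `flat`, `row` and `col` are chosen for this illustration only.

```python
import numpy as np

flat = np.arange(3)           # shape (3,): a single axis, neither row nor column
row = flat.reshape((1, 3))    # shape (1, 3): one row, three columns
col = flat.reshape((3, 1))    # shape (3, 1): three rows, one column

print(flat.shape, row.shape, col.shape)
print((flat + flat).shape)    # element-wise on equal shapes: stays (3,)
print((flat + col).shape)     # (3,) and (3, 1) are broadcast to (3, 3)
print((row @ col).shape)      # matrix product of (1, 3) and (3, 1): (1, 1)
```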

---

### Creating tensors: all-ones, all-zeros, random tensors

In [None]:
x = tf.ones(shape=(2, 1))
x

In [None]:
y = tf.zeros(shape=(2, 1))
y

In [None]:
z = tf.eye(2) 
# z = tf.eye(2, num_columns=3) # you can specify num_columns if you want another shape
z

In [None]:
x = tf.random.normal(shape=(3, 1), mean=0., stddev=1.)
x

In [None]:
x = tf.random.uniform(shape=(3, 1), minval=0., maxval=1.)
x

For random ints, just use `tf.random.uniform` and specify the dtype.

In [None]:
x = tf.random.uniform(shape=(3, 1), minval=1, maxval=15, dtype=tf.int32)
x

---

### Reshaping

In [None]:
x = tf.range(12) # of course the product of the new dimensions must equal the total number of elements (here 12)!
print(tf.reshape(x, (4,3)))
print()
print(tf.reshape(x, (3, 4)))
print()
print(tf.reshape(x, (2, 2, 3)))
print()
print(tf.reshape(x, (2, 3, 2)))

<!-- ![tf shapes 1](images/tf/tf-shape-1.png)
![tf shapes 2](images/tf/tf-shape-2.png)
![tf shapes 3](images/tf/tf-shape-3.png) -->
![tf shapes 1](https://drive.google.com/uc?id=1bbxNZK7vWj2LhEojUp-AaPdE5UdI3QvX)
![tf shapes 2](https://drive.google.com/uc?id=1RqkEJGwSNONsuo6q0X3Sex-kgvqFw3Ny)
![tf shapes 3](https://drive.google.com/uc?id=1iGRHY2cMsjkXP8VN4ThclOqglIdaqRhh)

<small>Source: [TensorFlow Introduction to Tensors](https://www.tensorflow.org/guide/tensor)</small>

---

### Slicing

In [None]:
x = tf.reshape(tf.range(2*3*4), shape=(2,3,4))
x

In [None]:
print(x)
x[1]     # first dim, second element
         # x[outer dimension, ..., inner dimension]

In [None]:
print(x)
x[1, 0] # first dim, second element
        # second dim, first element

In [None]:
print(x)
x[1, 0, 2] # first dim, second element
           # second dim, first element
           # third dim, third element

In [None]:
print(x)
x[1, 0, 1:3] # first dim, second element
             # second dim, first element
             # third dim, from second (incl) till fourth (excl) element

In [None]:
print(x)
x[1, 0, :] # first dim, second element
           # second dim, first element
           # third dim, all elements

In [None]:
selection = [0,0,0] # a Python list of indices: here it selects the same element as x[0, 0, 0]
x[selection]

More in this [TensorFlow tutorial](https://www.tensorflow.org/guide/tensor_slicing).

---

### Broadcasting

Broadcasting automatically expands the shapes of tensors so that operations between tensors of different shapes become possible (and stay concise and fast).

Knowing this is important and will **help you a lot**.

This goes back to NumPy: [here is the tutorial](https://numpy.org/doc/stable/user/basics.broadcasting.html).

From this, we get the following diagrams:

<!-- ![Numpy broadcasting 1](images/tf/np_broadcasting_1.png) -->
![Numpy broadcasting 1](https://drive.google.com/uc?id=1gPX4REfzmjisvEoCb-XxCEXqj4nPiMLo)

Source: [NumPy Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)

<!-- ![Numpy broadcasting 2](images/tf/np_broadcasting_2.png) -->
![Numpy broadcasting 2](https://drive.google.com/uc?id=1yKYlEqTW2_4EfAwHQNviBDZ5onJpTZes)

Source: [NumPy Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)

<!-- ![Numpy broadcasting 3](images/tf/np_broadcasting_3.png) -->
![Numpy broadcasting 3](https://drive.google.com/uc?id=1oawFy6B5Pe8Yrg1DQnXXFPwQPtN8yZoq)

Source: [NumPy Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)

<!-- ![Numpy broadcasting 4](images/tf/np_broadcasting_4.png) -->
![Numpy broadcasting 4](https://drive.google.com/uc?id=17epzuPOcCvaJ_lqq2jjKpFu5yBUzgpxf)

Source: [NumPy Broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html)

<!-- ![Sasha Rush broadcasting rules](images/tf/srush-broadcasting.png) -->
![Sasha Rush broadcasting rules](https://drive.google.com/uc?id=1_9kY-rpKyTu-mHDQm4LgNRTVZLYKRp4x)

Source: [Sasha Rush, Twitter](https://twitter.com/srush_nlp/status/1516781757596680194?t=RwVp5kUWPvHG-e42wo0ryw&s=19)
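The rule behind the diagrams can be written down in a few lines: compare the shapes from the rightmost dimension, treat a missing dimension as 1, and accept a pair of dimensions if they are equal or if one of them is 1. The helper `broadcast_shape` below is a hypothetical name made up for this sketch; it is checked against what NumPy itself computes via `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: the shape two arrays broadcast to, or ValueError."""
    result = []
    # Compare dims starting from the trailing one; a missing dim counts as 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))

examples = [((3, 3), (3,)), ((2, 1), (1, 2)), ((2, 3, 4), (3, 1))]
for shape_a, shape_b in examples:
    expected = np.broadcast(np.empty(shape_a), np.empty(shape_b)).shape
    print(shape_a, shape_b, "->", broadcast_shape(shape_a, shape_b))
    assert broadcast_shape(shape_a, shape_b) == expected

try:
    broadcast_shape((4, 3), (4,))  # trailing dims 3 and 4 differ, neither is 1
except ValueError as error:
    print(error)
```

Note that `(4, 3)` with `(4,)` fails because alignment starts from the right; reshaping the second array to `(4, 1)` would make it broadcastable.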


In [None]:
x = tf.ones(shape=(2, 1)) # a column vector
x

In [None]:
y = x + tf.constant([2.]) # broadcasting [2.] to [[2.],
                          #                       [2.]]
y

In [None]:
x + 2 # works as well, just saying

In [None]:
z = tf.ones(shape=(2,2)) * 2
print(z)

In [None]:
print(y)
print(z)
z + y # broadcasting y = [[3.],
      #                   [3.]] 
      #              to   [[3., 3.],
      #                   [3., 3.]]

In [None]:
x + tf.transpose(y) # double broadcasting:
                    #
                    # x               = [[1.]       →  [[1., 1.],
                    #                    [1.]]          [1., 1.]]
                    #
                    # tf.transpose(y) = [[3., 3.]]  →  [[3., 3.],
                    #                                   [3., 3.]]

---

### Reminder, NumPy arrays are assignable (like lists in Python)

In [None]:
x = np.ones(shape=(2, 2))
print(x)
x[0, 0] = 0.
print(x)

### But TensorFlow variables resist usual assignment!

In [None]:
v = tf.Variable(initial_value=tf.zeros(shape=(2,2)))
print(v)
try:
    v[0,0] = 1
except Exception as e:
    print("-"*40)
    print(e)
    print("variables are not assignable the regular way...")

### Assigning a value to a TensorFlow variable

In [None]:
print(v)
v.assign(tf.ones((2,2)))
print()
print(v)

### Assigning a value to a subset of a TensorFlow variable

In [None]:
print(v)
v[0, 0].assign(3.)
print()
print(v)

### Using `assign_add` and `assign_sub`

In [None]:
print(v)
v.assign_add(tf.ones((2,2)))
print()
print(v)

In [None]:
print(v)
v.assign_sub(tf.ones(v.shape))
print()
print(v)
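As a compact recap of the cells above, the sketch below chains the three kinds of update on one variable and contrasts them with ordinary arithmetic, which does not modify the variable but returns a new tensor. The names `w` and `t` are chosen for this illustration.

```python
import tensorflow as tf

w = tf.Variable(tf.zeros((2, 2)))

w.assign_add(tf.ones((2, 2)))        # in place: every entry becomes 1.
w[0, 1].assign(5.)                   # in place: only one entry changes
w.assign_sub(0.5 * tf.ones((2, 2)))  # in place: subtract 0.5 everywhere
print(w.numpy())                     # [[0.5 4.5]
                                     #  [0.5 0.5]]

t = w + 1.0                          # ordinary arithmetic leaves w untouched
print(isinstance(t, tf.Variable))    # False: the result is a plain tensor
print(w.numpy())                     # unchanged
```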

---

### 3.5.2. Tensor operations: Doing math in TensorFlow

### Important!

**Element-wise**: apply the operation to each element! 

If you apply an element-wise op to two tensors of different shapes, TensorFlow will try to **broadcast** them to a common shape.

What is **not** element-wise? The dot product / matrix multiplication / tensor multiplication.
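A small numeric contrast may help here. The two tensors `a` and `b` below are made-up values for this sketch; `*` multiplies matching entries, whereas `@` combines rows of the first matrix with columns of the second.

```python
import tensorflow as tf

a = tf.constant([[1., 2.],
                 [3., 4.]])
b = tf.constant([[10., 20.],
                 [30., 40.]])

elementwise = a * b   # each entry multiplied with the entry at the same position
matmul = a @ b        # rows of a combined with columns of b

print(elementwise.numpy())  # [[ 10.  40.]
                            #  [ 90. 160.]]
print(matmul.numpy())       # [[ 70. 100.]
                            #  [150. 220.]]
```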

In [None]:
a = tf.ones((2, 2))
print(a)
a = a * 2
print()
print(a) # these are broadcast, then performed element-wise!

In [None]:
b = tf.square(a) # same as a ** 2, element-wise
b

In [None]:
c = tf.sqrt(a) # same as a ** (1/2), element-wise
c

In [None]:
d = b + c # also element-wise
d

TensorFlow is more restrictive than NumPy here: tensors have no `.dot()` method, so matrix multiplication is written with `@` or `tf.matmul`.

In [None]:
e = a @ b # same as tf.matmul(a, b)
e         

In [None]:
e *= d # and, lastly, element-wise
e

---

### Finally, `reduce` operations

In [None]:
x = tf.random.uniform(shape=(3, 2), minval=1, maxval=16, dtype=tf.int32)
print(x)
print()
print(tf.reduce_min(x))
print()
print(tf.reduce_min(x, axis=0))
print()
print(tf.reduce_min(x, axis=1))

In [None]:
x = tf.random.uniform(shape=(3, 2), minval=1, maxval=16, dtype=tf.int32)
print(x)
print()
print(tf.reduce_max(x))
print()
print(tf.reduce_max(x, axis=0))
print()
print(tf.reduce_max(x, axis=1))

In [None]:
x = tf.random.uniform(shape=(3, 2), minval=1, maxval=6, dtype=tf.int32)
print(x)
print()
print(tf.reduce_mean(x)) # careful: for an int32 tensor the mean is an int32 too (the fractional part is dropped)
print()
print(tf.reduce_mean(x, axis=0))
print()
print(tf.reduce_mean(x, axis=1))

In [None]:
x = tf.random.uniform(shape=(3, 2), minval=1, maxval=6, dtype=tf.int32)
print(x)
print()
print(tf.reduce_sum(x))
print()
print(tf.reduce_sum(x, axis=0))
print()
print(tf.reduce_sum(x, axis=1))

More in the [documentation](https://www.tensorflow.org/api_docs/python/tf/math/).
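One detail of the `reduce_mean` cells above deserves a closer look: the result keeps the dtype of the input, so the mean of an integer tensor is itself an integer. The sketch below uses a small hand-written tensor (values chosen for this illustration) and casts to `float32` to get the exact mean.

```python
import tensorflow as tf

x = tf.constant([[1, 0],
                 [1, 0]])                     # dtype int32

int_mean = tf.reduce_mean(x)                  # 2 / 4 computed in int32
exact_mean = tf.reduce_mean(tf.cast(x, tf.float32))
column_means = tf.reduce_mean(tf.cast(x, tf.float32), axis=0)

print(int_mean.dtype, int_mean.numpy())       # int32, 0
print(exact_mean.dtype, exact_mean.numpy())   # float32, 0.5
print(column_means.numpy())                   # [1. 0.]
```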

## Recap

- Constants/Variables, shape, type
- Reshaping
- Slicing
- Tensor creation (zeros, ones, random)
- Broadcasting
- Maths & other ops


---

## More practice!

- [TensorFlow Introduction to Tensors](https://www.tensorflow.org/guide/tensor)  
- [TensorFlow Introduction to Variables](https://www.tensorflow.org/guide/variable)  
- [TensorFlow Introduction to tensor slicing](https://www.tensorflow.org/guide/tensor_slicing)

And for something which is actually *less* basic: [TensorFlow Basics](https://www.tensorflow.org/guide/basics).  

---

## Note: Matmul with more dimensions

In [None]:
x = tf.random.uniform(shape=(5,2,4), minval=1, maxval=5, dtype=tf.int32)
# would also work with shape=(1,2,4) thanks to broadcasting
print(x)
print()
y = tf.random.uniform(shape=(5,4,2), minval=1, maxval=5, dtype=tf.int32)
print(y)
print()

In [None]:
x @ y

The last two dimensions must follow the matrix multiplication rule, `(a,b) @ (b,c) → (a,c)`; the leading dimensions must either match or one of them must be 1 (→ broadcast).
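A minimal sketch of that rule, assuming deterministic inputs built with `tf.range` instead of the random tensors above: a leading dimension of 1 is broadcast against a leading dimension of 5, and the result matches multiplying slice by slice. The names `batched` and `per_slice` are chosen for this illustration.

```python
import tensorflow as tf

x = tf.reshape(tf.range(1 * 2 * 4, dtype=tf.float32), (1, 2, 4))
y = tf.reshape(tf.range(5 * 4 * 2, dtype=tf.float32), (5, 4, 2))

batched = x @ y                       # (1,2,4) @ (5,4,2): leading 1 is broadcast to 5
print(batched.shape)                  # (5, 2, 2)

# The same computation, one matrix product per slice of y
per_slice = tf.stack([x[0] @ y[i] for i in range(5)])
print(bool(tf.reduce_all(batched == per_slice)))  # True
```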

In [None]:
x[0] @ y[0] # the same result as the first matrix in the tensor above

---

### A note on `np.dot` in higher dimensions

TL;DR: **don't use `.dot` in higher dimensions, just switch to matmul, as this is what you'll be using.**

In [None]:
a = np.ones((4,4))
a

In [None]:
b = np.arange(3*4*2).reshape(3,4,2)
b

In [None]:
print(a.shape)
print(b.shape)

In higher dimensions, `dot` performs the following operation:

> For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b. ([NumPy documentation](https://numpy.org/doc/stable/reference/generated/numpy.dot.html)).

So, in our case:

```
a.shape: (4,  4)   ← the last and
b.shape:   (3,4,2) ← next-to-last dims are multiplied & summed
          ↓ ↓   ↓
result:  (4,3,  2)
          ↑ ↑      ← the leading dims of each array are just stacked in order
```
Multiply/Sum `4` with `4`.
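One way to read the rule is as an index formula: `a.dot(b)[i, j, l]` is the sum over `k` of `a[i, k] * b[j, k, l]`. The sketch below checks this against `np.einsum`, with the subscript string written for this particular pair of shapes; the index values `i, j, l` are arbitrary picks for illustration.

```python
import numpy as np

a = np.ones((4, 4))
b = np.arange(3 * 4 * 2).reshape(3, 4, 2)

adotb = a.dot(b)                             # shape (4, 3, 2)
same = np.einsum("ik,jkl->ijl", a, b)        # sum over the shared index k
print(adotb.shape, np.allclose(adotb, same)) # (4, 3, 2) True

# A single entry, computed by hand
i, j, l = 1, 2, 0
print(adotb[i, j, l] == np.sum(a[i, :] * b[j, :, l]))  # True
```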

In [None]:
adotb = a.dot(b)
print(adotb)
print()
print(f"shape: {adotb.shape}")

It's possible to think about this as "take slices of `a`", dot them with `b`.

In [None]:
a[0].dot(b) # we end up with such matrices

The `matmul` operation (aka `@`, it's the same), will **broadcast** the shapes so that:

```
a.shape: (4,4) → (1,4,4) → (3,4,4)
b.shape:                   (3,  4,2)
                              ↓   ↓  # the matmul thing
result:                    (3,4,  2)
```
    
Then it treats the last two dimensions as matrices (stacks of `(4,4)` and `(4,2)`) and does matmul on each pair.

In [None]:
amatmulb = a @ b
print(amatmulb)
print()
print(f"shape: {amatmulb.shape}")

In this case, it works the other way round: it takes the whole of `a` and does matmul with each slice of `b`.

In [None]:
a @ b[0] # we will get three such matrices
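Both readings can be verified directly. The sketch below uses a non-constant `a` (an assumption made here so that the check is not trivially true with all-ones) and shows that, for a 2D `a` and a 3D `b`, `@` equals a loop over the slices of `b`, and that `dot` holds the same numbers with the first two axes swapped.

```python
import numpy as np

a = np.arange(4 * 4).reshape(4, 4)
b = np.arange(3 * 4 * 2).reshape(3, 4, 2)

via_matmul = a @ b                                 # shape (3, 4, 2)
via_loop = np.stack([a @ b[i] for i in range(3)])  # one (4, 2) matrix per slice of b
via_dot = a.dot(b)                                 # shape (4, 3, 2)

print(via_matmul.shape, via_dot.shape)
print(np.array_equal(via_matmul, via_loop))                     # True
print(np.array_equal(via_matmul, via_dot.transpose(1, 0, 2)))   # True
```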

The logic is extended to larger dimensions.

In [None]:
c = np.arange(3*4*5).reshape((1,3,4,5))
d = np.ones(2*3*4*5).reshape((4,3,5,2))
                                           # (1,3,4,     5)   ← last with
                                           #       (4,3, 5,2) ← next-to last 
print(f"c.dot(d) shape: {c.dot(d).shape}") # (1,3,4,4,3,   2)

                                           # (1,3,4,5) → (4,3,4,5)   ← c: broadcasting!
print()                                    #             (4,3,  5,2) ← last two dims: matmul
print(f"c @ d shape: {(c @ d).shape}")     #             (4,3,4,  2)

Final note: 
- for `dot`: the **last dim** of a and the **next-to-last dim** of b must be the same. The remaining dims are unconstrained and are simply concatenated in the result.  
- for `matmul` in higher dimensions, the **last two dims** must follow matrix multiplication rules: `(a,b) (b,c) → (a,c)`. **The rest must be broadcastable.**
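The two rules can be turned into small shape calculators. `dot_shape` and `matmul_shape` are hypothetical helpers written for this sketch (they assume both inputs have at least two dimensions); they are checked against the shapes NumPy actually produces for the `c` and `d` arrays used above.

```python
import numpy as np

def dot_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of np.dot for inputs with >= 2 dims."""
    if shape_a[-1] != shape_b[-2]:
        raise ValueError("last dim of a must equal next-to-last dim of b")
    # leading dims of a, then leading dims of b, then the last dim of b
    return shape_a[:-1] + shape_b[:-2] + shape_b[-1:]

def matmul_shape(shape_a, shape_b):
    """Hypothetical helper: result shape of a @ b for inputs with >= 2 dims."""
    if shape_a[-1] != shape_b[-2]:
        raise ValueError("inner matrix dims must match")
    # leading ("batch") dims are broadcast; np.broadcast raises if they clash
    batch = np.broadcast(np.empty(shape_a[:-2]), np.empty(shape_b[:-2])).shape
    return batch + (shape_a[-2], shape_b[-1])

c = np.arange(3 * 4 * 5).reshape((1, 3, 4, 5))
d = np.ones((4, 3, 5, 2))

print(dot_shape(c.shape, d.shape), c.dot(d).shape)    # both (1, 3, 4, 4, 3, 2)
print(matmul_shape(c.shape, d.shape), (c @ d).shape)  # both (4, 3, 4, 2)
```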