# Getting started with TensorFlow

## What is TensorFlow and why should we use it?

[TensorFlow](https://www.tensorflow.org/) is an open-source, end-to-end machine learning library developed by Google for preprocessing data, building and training models, and serving them.

Rather than building machine learning and deep learning models from scratch, you'll more likely use a library such as TensorFlow, with Keras as part of the package, because it already contains many of the most common machine learning functions you will need. An increasingly popular alternative to TensorFlow is PyTorch, which we will not cover in this course.

## What we're going to cover

TensorFlow is vast. But the main premise is simple: turn data into numbers (tensors) and build machine learning algorithms to find patterns in them.

In this notebook we cover some of the most fundamental TensorFlow operations, more specifically:
* Creating tensors
* Retrieving tensor attributes
* Common tensor operations
* Tensors and NumPy

Keep the TensorFlow documentation open as a reference while you work: https://www.tensorflow.org/api_docs/python/


## Introduction to Tensors

[Tensors](https://www.tensorflow.org/guide/tensor) are very similar to NumPy arrays: both are n-dimensional arrays of numbers. The main difference is that tensors can be used on [GPUs (graphics processing units)](https://blogs.nvidia.com/blog/2009/12/16/whats-the-difference-between-a-cpu-and-a-gpu/). The benefit of being able to run on GPUs is faster computation, since GPUs are built to perform many numerical operations in parallel (they were originally designed for images and video).

You can think of a tensor as a multi-dimensional (n-dimensional) numerical representation of something, where that something can be almost anything you can imagine:
* It could be numbers themselves (using tensors to represent the price of houses).
* It could be an image (using tensors to represent the pixels of an image).
* It could be text (using tensors to represent words).
* Or it could be some other form of information (or data) you want to represent with numbers.

The first thing we'll do is import TensorFlow under the common alias `tf`.

In [1]:
# Import TensorFlow
import tensorflow as tf
print(tf.__version__) # find the version number

2.15.0


### Creating Tensors with `tf.constant()`

Usually you won't create tensors yourself. This is because TensorFlow has built-in modules (such as [`tf.io`](https://www.tensorflow.org/api_docs/python/tf/io) and [`tf.data`](https://www.tensorflow.org/guide/data)) which are able to read your data sources and automatically convert them to tensors, which neural network models will later process for us. However, it is highly likely that you will run into tensor-related errors, such as dimension mismatches. This is why we are getting familiar with tensors themselves and how to manipulate them.

We'll begin by using [`tf.constant()`](https://www.tensorflow.org/api_docs/python/tf/constant).

In [2]:
# Create a scalar (rank 0 tensor)
scalar = tf.constant(7)
scalar

<tf.Tensor: shape=(), dtype=int32, numpy=7>

A scalar is known as a rank 0 tensor because it has no dimensions (it's just a number).

In [3]:
# Check the number of dimensions of a tensor
scalar.ndim

0

In [4]:
# Check the shape of your tensor
scalar.shape

TensorShape([])

In [5]:
# Create a vector
vector = tf.constant([2, 5])
vector

<tf.Tensor: shape=(2,), dtype=int32, numpy=array([2, 5], dtype=int32)>

In [6]:
# Check the number of dimensions and shape of our vector tensor
vector.ndim, vector.shape

(1, TensorShape([2]))

In [7]:
# Create a matrix
matrix = tf.constant([[2, 5],
                      [3, 6]])
matrix

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[2, 5],
       [3, 6]], dtype=int32)>

In [8]:
matrix.ndim, matrix.shape

(2, TensorShape([2, 2]))

By default, TensorFlow creates tensors with either an `int32` or a `float32` datatype.

This is known as 32-bit precision. The more bits a datatype uses, the more precisely it can represent a number, and the more memory each number takes up.
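
To make the trade-off between precision and memory concrete, here is a minimal sketch that uses only Python's standard `struct` module (an assumption made so that it runs without TensorFlow; the `round_trip` helper is hypothetical). Its format codes `e`, `f` and `d` are 16-, 32- and 64-bit floats, the same widths as `tf.float16`, `tf.float32` and `tf.float64`.

```python
import struct


def round_trip(value, fmt):
    """Store `value` in the given binary float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# struct format codes: "e" = 16-bit, "f" = 32-bit, "d" = 64-bit floats
formats = {"float16": "e", "float32": "f", "float64": "d"}

for name, fmt in formats.items():
    size_bytes = struct.calcsize(fmt)   # memory used by one number
    stored = round_trip(0.1, fmt)       # what 0.1 becomes at this precision
    error = abs(stored - 0.1)
    print(f"{name}: {size_bytes} bytes per number, 0.1 stored as {stored!r} (error {error:.2e})")
```

Fewer bits means fewer bytes per number, but also a larger gap between the value you asked for and the value that is actually stored.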

In [9]:
# Create another matrix and define the datatype
another_matrix = tf.constant([[2., 5.],
                              [3., 6.],
                              [4., 7.]], dtype=tf.float16) # specify the datatype with 'dtype'
another_matrix

<tf.Tensor: shape=(3, 2), dtype=float16, numpy=
array([[2., 5.],
       [3., 6.],
       [4., 7.]], dtype=float16)>

In [10]:
# Even though another_matrix contains more numbers, its number of dimensions (rank) stays the same
another_matrix.ndim, another_matrix.shape

(2, TensorShape([3, 2]))

In [11]:
# How about a tensor with more than 2 dimensions:
tensor = tf.constant([[[1, 2, 3],
                       [4, 5, 6]],
                      [[1, 2, 4],
                       [0, 9, 6]],
                      [[7, 8, 9],
                       [10, 11, 12]]])
tensor

<tf.Tensor: shape=(3, 2, 3), dtype=int32, numpy=
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 1,  2,  4],
        [ 0,  9,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]], dtype=int32)>

In [12]:
# Another rank 3 tensor, this time with 5 elements along the last axis
tensor = tf.constant([[[0, 1, 2, 3, 4],
                       [5, 6, 7, 8, 9]],
                      [[10, 11, 12, 13, 14],
                       [15, 16, 17, 18, 19]],
                      [[20, 21, 22, 23, 24],
                       [25, 26, 27, 28, 29]]])
tensor

<tf.Tensor: shape=(3, 2, 5), dtype=int32, numpy=
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]], dtype=int32)>

In [13]:
tensor.ndim, tensor.shape

(3, TensorShape([3, 2, 5]))

<img src="images/3-axis_front.png" align="top" width=200>

This is known as a rank 3 tensor (3 dimensions). A tensor can, however, have an arbitrary number of dimensions.

For example, you might turn a series of images into a tensor with shape `(224, 224, 3, 32)`, where:
* 224, 224 (the first 2 dimensions) are the height and width of the images in pixels.
* 3 is the number of colour channels of the image (red, green, blue).
* 32 is the batch size (the number of images a neural network sees at any one time).

Note that TensorFlow's image layers conventionally expect the batch dimension first, i.e. `(32, 224, 224, 3)`.

All of the variables we've created above are tensors, but you may also hear them referred to by their more specific names: scalar, vector and matrix.

### Creating Tensors with `tf.Variable()`

You can also create tensors using [`tf.Variable()`](https://www.tensorflow.org/api_docs/python/tf/Variable).

The difference between `tf.Variable()` and `tf.constant()` is that tensors created with `tf.constant()` are immutable (they can't be changed, only used to create a new tensor), whereas tensors created with `tf.Variable()` are mutable (they can be changed).

In [14]:
# Create the same tensor with tf.Variable() and tf.constant()
changeable_tensor = tf.Variable([2, 5])
unchangeable_tensor = tf.constant([2, 5])
changeable_tensor, unchangeable_tensor

(<tf.Variable 'Variable:0' shape=(2,) dtype=int32, numpy=array([2, 5], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([2, 5], dtype=int32)>)

Now let's try to change one of the elements of the changeable tensor.

In [15]:
# To change an element of a tf.Variable() tensor requires the .assign() method
changeable_tensor[0].assign(7)
changeable_tensor

<tf.Variable 'Variable:0' shape=(2,) dtype=int32, numpy=array([7, 5], dtype=int32)>

Now let's try to change a value in a `tf.constant()` tensor.

In [16]:
# Will error (can't change tf.constant())
unchangeable_tensor[0].assign(7)
unchangeable_tensor

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'assign'

Which one should you use? `tf.constant()` or `tf.Variable()`?

It will depend on what your problem requires. However, most of the time, TensorFlow will automatically choose for you when loading or modelling data.

### Creating random tensors

Random tensors are tensors of some arbitrary size which contain random numbers. Why would you want to create random tensors?
Neural networks use them to initialize the weights (patterns) that they're trying to learn from the data.

We can create random tensors by using the [`tf.random.Generator`](https://www.tensorflow.org/guide/random_numbers#the_tfrandomgenerator_class) class.

In [17]:
# Create two random (but the same) tensors
random_1 = tf.random.Generator.from_seed(42) # set the seed for reproducibility
random_1 = random_1.normal(shape=(3, 2)) # create tensor from a normal distribution 
random_2 = tf.random.Generator.from_seed(42)
random_2 = random_2.normal(shape=(3, 2))

# Are they equal?
random_1, random_2, random_1 == random_2

(<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
 array([[-0.7565803 , -0.06854702],
        [ 0.07595026, -1.2573844 ],
        [-0.23193765, -1.8107855 ]], dtype=float32)>,
 <tf.Tensor: shape=(3, 2), dtype=float32, numpy=
 array([[-0.7565803 , -0.06854702],
        [ 0.07595026, -1.2573844 ],
        [-0.23193765, -1.8107855 ]], dtype=float32)>,
 <tf.Tensor: shape=(3, 2), dtype=bool, numpy=
 array([[ True,  True],
        [ True,  True],
        [ True,  True]])>)

The random tensors we've made actually contain pseudorandom numbers. If we set a seed, we'll get the same random numbers every time (if you've ever used NumPy, this is similar to `np.random.seed(42)`).

Let's see what will happen when we change the seed.

In [18]:
# Create two random (and different) tensors
random_3 = tf.random.Generator.from_seed(42)
random_3 = random_3.normal(shape=(3, 2))
random_4 = tf.random.Generator.from_seed(11)
random_4 = random_4.normal(shape=(3, 2))

# Check the tensors and see if they are equal
random_3, random_4, random_1 == random_3, random_3 == random_4

(<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
 array([[-0.7565803 , -0.06854702],
        [ 0.07595026, -1.2573844 ],
        [-0.23193765, -1.8107855 ]], dtype=float32)>,
 <tf.Tensor: shape=(3, 2), dtype=float32, numpy=
 array([[ 0.2730574 , -0.29925638],
        [-0.3652325 ,  0.61883307],
        [-1.0130816 ,  0.2829171 ]], dtype=float32)>,
 <tf.Tensor: shape=(3, 2), dtype=bool, numpy=
 array([[ True,  True],
        [ True,  True],
        [ True,  True]])>,
 <tf.Tensor: shape=(3, 2), dtype=bool, numpy=
 array([[False, False],
        [False, False],
        [False, False]])>)

Can we shuffle the order of a tensor? Why would we want to do that?

Let's say you are working with 10,000 images of cats and dogs, where the first 7,000 images are of cats and the next 3,000 are of dogs. This order could affect how a neural network learns (it may overfit by learning the order of the data), so it is usually a good idea to shuffle your data. In many cases we want this randomness to be part of our training, since it makes our model more likely to generalize.
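
One way to shuffle a dataset while keeping each sample attached to its label is to shuffle a list of indices and apply that order to both. The sketch below uses plain Python with made-up file names and a fixed seed purely for repeatability; it illustrates the idea and is not a TensorFlow API.

```python
import random

# Hypothetical toy dataset: 7 "cat" samples followed by 3 "dog" samples.
images = [f"img_{i}" for i in range(10)]
labels = ["cat"] * 7 + ["dog"] * 3

# Shuffle the indices once, then apply the same order to both lists
# so that every image keeps its own label.
rng = random.Random(42)  # seeded only to make the sketch repeatable
order = list(range(len(images)))
rng.shuffle(order)

shuffled_images = [images[i] for i in order]
shuffled_labels = [labels[i] for i in order]

for image, label in zip(shuffled_images, shuffled_labels):
    print(image, label)
```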

In [19]:
# Shuffle a tensor
not_shuffled = tf.constant([[10, 7],
                            [3, 4],
                            [2, 5]])
# Gets different results each time
tf.random.shuffle(not_shuffled)

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 3,  4],
       [10,  7],
       [ 2,  5]], dtype=int32)>

In [20]:
# Shuffle in the same order every time

# Set the global random seed
tf.random.set_seed(42)

# Shuffle (no operation-level seed is passed here, so only the global seed applies)
tf.random.shuffle(not_shuffled)

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 3,  4],
       [ 2,  5],
       [10,  7]], dtype=int32)>

### Other ways to make tensors

Just like in NumPy, you can use [`tf.ones()`](https://www.tensorflow.org/api_docs/python/tf/ones) to create a tensor of all ones and [`tf.zeros()`](https://www.tensorflow.org/api_docs/python/tf/zeros) to create a tensor of all zeros.

In [21]:
# Make a tensor of all ones
tf.ones(shape=(3, 2))

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[1., 1.],
       [1., 1.],
       [1., 1.]], dtype=float32)>

In [22]:
# Make a tensor of all zeros
tf.zeros(shape=(3, 2))

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[0., 0.],
       [0., 0.],
       [0., 0.]], dtype=float32)>

## Getting information from tensors (shape, rank, size)

There will be times when you'll want to get different pieces of information from your tensors. In particular, you should know the following tensor vocabulary:
* **Shape:** The length (number of elements) of each of the dimensions of a tensor.
* **Rank:** The number of tensor dimensions. A scalar has rank 0, a vector has rank 1, a matrix has rank 2 and an n-dimensional tensor has rank n.
* **Axis** or **Dimension:** A particular dimension of a tensor.
* **Size:** The total number of items in the tensor.

You'll use these especially when you're trying to line up the shapes of your data with the shapes of your model. For example, making sure the shape of your image tensors is the same as the shape of your model's input layer.
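
As a rough illustration of how these terms relate, the sketch below derives shape, rank and size from a nested Python list. The `shape_of` helper is hypothetical and assumes a regular (non-ragged) list; in TensorFlow you would read `tensor.shape`, `tensor.ndim` and `tf.size(tensor)` instead.

```python
from math import prod


def shape_of(nested):
    """Return the shape of a regular (non-ragged) nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))                # length of the current axis
        nested = nested[0] if nested else None   # step one level deeper
    return tuple(shape)


data = [[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],
        [[10, 11, 12, 13, 14], [15, 16, 17, 18, 19]],
        [[20, 21, 22, 23, 24], [25, 26, 27, 28, 29]]]

shape = shape_of(data)  # length of each axis
rank = len(shape)       # number of axes
size = prod(shape)      # total number of elements

print(shape, rank, size)  # (3, 2, 5) 3 30
```

The size is simply the product of the entries of the shape, and the rank is how many entries the shape has.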

We've already seen two of these before, using the `ndim` and `shape` attributes. Let's see the rest.

In [23]:
# Create a rank 4 tensor (4 dimensions)
rank_4_tensor = tf.zeros([2, 3, 4, 5])
rank_4_tensor

<tf.Tensor: shape=(2, 3, 4, 5), dtype=float32, numpy=
array([[[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]],


       [[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]]], dtype=float32)>

In [24]:
rank_4_tensor.ndim, rank_4_tensor.shape

(4, TensorShape([2, 3, 4, 5]))

In [25]:
tf.size(rank_4_tensor)

<tf.Tensor: shape=(), dtype=int32, numpy=120>

In [26]:
# Get various attributes of tensor
print("Datatype of every element:", rank_4_tensor.dtype)
print("Number of dimensions (rank):", rank_4_tensor.ndim)
print("Shape of tensor:", rank_4_tensor.shape)
print("Elements along axis 0 of tensor:", rank_4_tensor.shape[0])
print("Elements along last axis of tensor:", rank_4_tensor.shape[-1])
print("Total number of elements (2*3*4*5):", tf.size(rank_4_tensor).numpy()) # .numpy() returns the value as a NumPy scalar

Datatype of every element: <dtype: 'float32'>
Number of dimensions (rank): 4
Shape of tensor: (2, 3, 4, 5)
Elements along axis 0 of tensor: 2
Elements along last axis of tensor: 5
Total number of elements (2*3*4*5): 120


You can also index and slice tensors much like Python lists or NumPy arrays.

In [27]:
# Get the first 2 items of each dimension
rank_4_tensor[:2, :2, :2, :2]

<tf.Tensor: shape=(2, 2, 2, 2), dtype=float32, numpy=
array([[[[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]]],


       [[[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]]]], dtype=float32)>

You can also add dimensions to your tensor whilst keeping the same information present using `tf.expand_dims()`. 

In [28]:
rank_2_tensor = tf.random.normal(shape=(3,2))
rank_2_tensor

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[ 0.08422458, -0.86090374],
       [ 0.37812304, -0.00519627],
       [-0.49453196,  0.6178192 ]], dtype=float32)>

In [29]:
tf.expand_dims(rank_2_tensor, axis=-1) # axis=-1 means the last axis; this returns a new tensor and leaves rank_2_tensor unchanged

<tf.Tensor: shape=(3, 2, 1), dtype=float32, numpy=
array([[[ 0.08422458],
        [-0.86090374]],

       [[ 0.37812304],
        [-0.00519627]],

       [[-0.49453196],
        [ 0.6178192 ]]], dtype=float32)>

In [30]:
rank_2_tensor

<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[ 0.08422458, -0.86090374],
       [ 0.37812304, -0.00519627],
       [-0.49453196,  0.6178192 ]], dtype=float32)>

In [31]:
tf.expand_dims(rank_2_tensor, axis=0)

<tf.Tensor: shape=(1, 3, 2), dtype=float32, numpy=
array([[[ 0.08422458, -0.86090374],
        [ 0.37812304, -0.00519627],
        [-0.49453196,  0.6178192 ]]], dtype=float32)>

## Manipulating tensors (tensor operations)

Finding patterns in tensors (numerical representation of data) requires manipulating them.

### Basic operations

You can perform many of the basic mathematical operations directly on tensors using Python operators such as `+`, `-` and `*`.

In [32]:
# You can add values to a tensor using the addition operator
tensor = tf.constant([[10, 7], [3, 4]])
tensor + 10

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[20, 17],
       [13, 14]], dtype=int32)>

The `+` operator returns a new tensor, so the original tensor is unchanged (and since we used `tf.constant()`, it couldn't be changed in place anyway).

In [33]:
# Original tensor unchanged
tensor

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[10,  7],
       [ 3,  4]], dtype=int32)>

Other operators also work.

In [34]:
# Multiplication (known as element-wise multiplication)
tensor * 10

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[100,  70],
       [ 30,  40]], dtype=int32)>

You can also use the equivalent TensorFlow function. Using the TensorFlow function (where possible) has the advantage that it can be sped up later on when it runs as part of a [TensorFlow graph](https://www.tensorflow.org/tensorboard/graphs).

In [35]:
# Use the tensorflow function equivalent of the '*' (multiply) operator
tf.multiply(tensor, 10)

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[100,  70],
       [ 30,  40]], dtype=int32)>

In [36]:
# The original tensor is still unchanged
tensor

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[10,  7],
       [ 3,  4]], dtype=int32)>

### Matrix multiplication

One of the most common operations in machine learning algorithms is [matrix multiplication](https://www.mathsisfun.com/algebra/matrix-multiplying.html).

TensorFlow implements matrix multiplication in the [`tf.matmul()`](https://www.tensorflow.org/api_docs/python/tf/linalg/matmul) function.

The two main rules of matrix multiplication to remember are:
1. The inner dimensions must match:
   * `(3, 5) @ (3, 5)` won't work
   * `(5, 3) @ (3, 5)` will work
   * `(3, 5) @ (5, 3)` will work
2. The resulting matrix has the shape of the outer dimensions:
   * `(5, 3) @ (3, 5)` -> `(5, 5)`
   * `(3, 5) @ (5, 3)` -> `(3, 3)`

Note that `@` is Python's operator for matrix multiplication.
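
The two rules above can be sketched in a few lines of plain Python. The helpers `matmul_shape` and `matmul` are hypothetical and only handle 2-D matrices stored as lists of rows; they are meant to illustrate the rules, not to replace `tf.matmul()`.

```python
def matmul_shape(a_shape, b_shape):
    """Return the result shape of a 2-D matrix product, or raise if invalid."""
    a_rows, a_cols = a_shape
    b_rows, b_cols = b_shape
    if a_cols != b_rows:  # rule 1: the inner dimensions must match
        raise ValueError(f"inner dimensions differ: {a_cols} != {b_rows}")
    return (a_rows, b_cols)  # rule 2: the result takes the outer dimensions


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, cols = matmul_shape((len(a), len(a[0])), (len(b), len(b[0])))
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


print(matmul_shape((5, 3), (3, 5)))                # (5, 5)
print(matmul_shape((3, 5), (5, 3)))                # (3, 3)
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]

try:
    matmul_shape((3, 5), (3, 5))
except ValueError as error:
    print("won't work:", error)
```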

In [37]:
# Two rank 3 tensors: tf.matmul() treats each one as a batch of 15 matrices of shape (15, 15)
tensor_1 = tf.random.uniform(shape=(15, 15, 15), minval=0, maxval=10000, dtype="float32")
tensor_2 = tf.random.uniform(shape=(15, 15, 15), minval=0, maxval=10000, dtype="float32")

In [38]:
tensor_1.shape

TensorShape([15, 15, 15])

In [39]:
tensor_2.shape

TensorShape([15, 15, 15])

In [40]:
# Matrix multiplication in TensorFlow
%time tf.matmul(tensor_1, tensor_2)

CPU times: user 11.4 ms, sys: 10.4 ms, total: 21.8 ms
Wall time: 20.9 ms


<tf.Tensor: shape=(15, 15, 15), dtype=float32, numpy=
array([[[2.7194602e+08, 2.7750950e+08, 2.8847434e+08, ...,
         3.5249142e+08, 3.1819619e+08, 3.5039738e+08],
        [3.9513885e+08, 3.3053933e+08, 3.5906589e+08, ...,
         3.9530554e+08, 4.5131386e+08, 4.6001632e+08],
        [3.7283363e+08, 3.4627782e+08, 3.2428986e+08, ...,
         2.8908496e+08, 4.0666998e+08, 3.7997731e+08],
        ...,
        [2.6728986e+08, 2.2160682e+08, 2.4824496e+08, ...,
         2.8139280e+08, 3.5775997e+08, 3.3616813e+08],
        [4.7329594e+08, 3.7599488e+08, 4.1667530e+08, ...,
         4.0835354e+08, 4.7946653e+08, 5.1834989e+08],
        [3.4945949e+08, 3.9131091e+08, 3.1678810e+08, ...,
         3.5079117e+08, 5.0881792e+08, 4.3157043e+08]],

       ... (output truncated)

In [41]:
# Matrix multiplication with Python operator '@'
%time tensor_1 @ tensor_2

CPU times: user 528 µs, sys: 0 ns, total: 528 µs
Wall time: 384 µs


<tf.Tensor: shape=(15, 15, 15), dtype=float32, numpy=
array([[[2.7194602e+08, 2.7750950e+08, 2.8847434e+08, ...,
         3.5249142e+08, 3.1819619e+08, 3.5039738e+08],
        [3.9513885e+08, 3.3053933e+08, 3.5906589e+08, ...,
         3.9530554e+08, 4.5131386e+08, 4.6001632e+08],
        [3.7283363e+08, 3.4627782e+08, 3.2428986e+08, ...,
         2.8908496e+08, 4.0666998e+08, 3.7997731e+08],
        ...,
        [2.6728986e+08, 2.2160682e+08, 2.4824496e+08, ...,
         2.8139280e+08, 3.5775997e+08, 3.3616813e+08],
        [4.7329594e+08, 3.7599488e+08, 4.1667530e+08, ...,
         4.0835354e+08, 4.7946653e+08, 5.1834989e+08],
        [3.4945949e+08, 3.9131091e+08, 3.1678810e+08, ...,
         3.5079117e+08, 5.0881792e+08, 4.3157043e+08]],

       ... (output truncated)

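Both cells run the same TensorFlow operation: when applied to tensors, the `@` operator calls `tf.matmul()` under the hood, so either form can use the GPU. The large gap between the two timings above most likely comes from one-time start-up cost on the first call rather than from a real difference between the two, so rerun both cells before comparing them.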

In [42]:
X = tf.constant([1, 2, 3, 4, 5, 6], shape=(3,2))
Y = tf.constant([5, 6, 7, 8, 9, 10], shape=(3,2))
X, Y

(<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
 array([[1, 2],
        [3, 4],
        [5, 6]], dtype=int32)>,
 <tf.Tensor: shape=(3, 2), dtype=int32, numpy=
 array([[ 5,  6],
        [ 7,  8],
        [ 9, 10]], dtype=int32)>)

In [43]:
# This fails. Why?
tf.matmul(X, Y)

InvalidArgumentError: {{function_node __wrapped__MatMul_device_/job:localhost/replica:0/task:0/device:CPU:0}} Matrix size-incompatible: In[0]: [3,2], In[1]: [3,2] [Op:MatMul] name: 

Trying to matrix multiply two tensors with the shape `(3, 2)` errors because the inner dimensions don't match.

We need to either:
* Reshape or transpose X to `(2, 3)` so it's `(2, 3) @ (3, 2)`.
* Reshape or transpose Y to `(2, 3)` so it's `(3, 2) @ (2, 3)`.

We can do this with either:
* [`tf.reshape()`](https://www.tensorflow.org/api_docs/python/tf/reshape) - allows us to reshape a tensor into a defined shape.
* [`tf.transpose()`](https://www.tensorflow.org/api_docs/python/tf/transpose) - swaps the axes of a given tensor (for a matrix, rows become columns).
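
Although both functions can turn a `(3, 2)` matrix into a `(2, 3)` one, they generally produce different results. The following plain-Python sketch (the helpers `reshape_2d` and `transpose_2d` are hypothetical stand-ins, not TensorFlow functions) mimics the two behaviours for a small matrix: reshaping refills the elements in their existing row-major order, whereas transposing swaps rows and columns.

```python
def reshape_2d(matrix, rows, cols):
    """Refill the elements in row-major order into a rows x cols grid."""
    flat = [value for row in matrix for value in row]
    if rows * cols != len(flat):
        raise ValueError("the new shape must hold the same number of elements")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose_2d(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


Y_rows = [[5, 6],
          [7, 8],
          [9, 10]]

print(reshape_2d(Y_rows, 2, 3))  # [[5, 6, 7], [8, 9, 10]]
print(transpose_2d(Y_rows))      # [[5, 7, 9], [6, 8, 10]]
```

Both results have shape `(2, 3)`, but only the transpose keeps each original column together as a row, which is usually what you want before a matrix multiplication.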

In [44]:
# Example of reshape (3, 2) -> (2, 3)
tf.reshape(Y, shape=(2, 3))

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[ 5,  6,  7],
       [ 8,  9, 10]], dtype=int32)>

In [45]:
# Try matrix multiplication with reshaped Y
tf.matmul(X, tf.reshape(Y, shape=(2, 3)))

<tf.Tensor: shape=(3, 3), dtype=int32, numpy=
array([[21, 24, 27],
       [47, 54, 61],
       [73, 84, 95]], dtype=int32)>

It worked. Let's try the same with `X`, except this time we'll use [`tf.transpose()`](https://www.tensorflow.org/api_docs/python/tf/transpose) and `tf.matmul()`.

In [46]:
# Example of transpose (3, 2) -> (2, 3)
tf.transpose(X)

<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
array([[1, 3, 5],
       [2, 4, 6]], dtype=int32)>

In [47]:
# Try matrix multiplication with the transposed X
tf.matmul(tf.transpose(X), Y)

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[ 71,  80],
       [ 92, 104]], dtype=int32)>

In [48]:
# You can achieve the same result with the transpose_a and transpose_b parameters
tf.matmul(a=X, b=Y, transpose_a=True, transpose_b=False)

<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[ 71,  80],
       [ 92, 104]], dtype=int32)>

### Changing the datatype of a tensor

Sometimes you'll want to change the datatype of your tensor from its default.
This is common when you want to compute using less precision (e.g. 16-bit floating point numbers vs. 32-bit floating point numbers).
Computing with less precision is useful on devices with less computing capacity, such as mobile devices (the fewer bits, the less space the computations require).

You can change the datatype of a tensor using [`tf.cast()`](https://www.tensorflow.org/api_docs/python/tf/cast).

In [49]:
# Create a new tensor with default datatype (float32)
B = tf.constant([1.7, 7.4])

# Create a new tensor with default datatype (int32)
C = tf.constant([1, 7])
B, C

(<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1.7, 7.4], dtype=float32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 7], dtype=int32)>)

In [50]:
# Change from float32 to float16 (reduced precision)
B = tf.cast(B, dtype=tf.float16)
B

<tf.Tensor: shape=(2,), dtype=float16, numpy=array([1.7, 7.4], dtype=float16)>

In [51]:
# Change from int32 to float32
C = tf.cast(C, dtype=tf.float32)
C

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 7.], dtype=float32)>

### Getting the absolute value

Sometimes you'll want the absolute values (the magnitude of each value, with any negative sign removed) of the elements in your tensors.

To do so, you can use [`tf.abs()`](https://www.tensorflow.org/api_docs/python/tf/math/abs).

In [52]:
# Create tensor with negative values
D = tf.constant([-7, -10])
D

<tf.Tensor: shape=(2,), dtype=int32, numpy=array([ -7, -10], dtype=int32)>

In [53]:
# Get the absolute values
tf.abs(D)

<tf.Tensor: shape=(2,), dtype=int32, numpy=array([ 7, 10], dtype=int32)>

### Finding the min, max, mean, sum (aggregation)

You can quickly aggregate tensors (perform a calculation over a whole tensor) to find things like the minimum value, maximum value, mean and sum of all the elements.

To do so, aggregation functions typically follow the naming pattern `reduce_[action]()`, such as:
* [`tf.reduce_min()`](https://www.tensorflow.org/api_docs/python/tf/math/reduce_min) - find the minimum value in a tensor.
* [`tf.reduce_max()`](https://www.tensorflow.org/api_docs/python/tf/math/reduce_max) - find the maximum value in a tensor (helpful for when you want to find the highest prediction probability).
* [`tf.reduce_mean()`](https://www.tensorflow.org/api_docs/python/tf/math/reduce_mean) - find the mean of all elements in a tensor.
* [`tf.reduce_sum()`](https://www.tensorflow.org/api_docs/python/tf/math/reduce_sum) - find the sum of all elements in a tensor.
* **Note:** typically, each of these is under the `math` module, e.g. `tf.math.reduce_min()` but you can use the alias `tf.reduce_min()`.
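
Here is a small sketch of these functions on a 2x2 integer tensor (the values are made up for illustration). It also uses the optional `axis` argument, which is not covered above, to reduce along one dimension only, and shows that the mean of an integer tensor stays an integer unless you cast first.

```python
import tensorflow as tf

scores = tf.constant([[4, 8],
                      [4, 6]])  # int32 by default

# Reducing over the whole tensor returns a scalar tensor.
total = tf.reduce_sum(scores)                              # 22
int_mean = tf.reduce_mean(scores)                          # 5 (22 / 4 = 5.5, but the int32 dtype is kept)
float_mean = tf.reduce_mean(tf.cast(scores, tf.float32))   # 5.5

# Passing `axis` reduces along one dimension only.
column_max = tf.reduce_max(scores, axis=0)                 # [4, 8]
row_sum = tf.reduce_sum(scores, axis=1)                    # [12, 10]

print(total.numpy(), int_mean.numpy(), float_mean.numpy())
print(column_max.numpy(), row_sum.numpy())
```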

Let's see them in action.

In [54]:
import numpy as np

In [55]:
# Create a tensor with 50 random values between 0 and 100
E = tf.constant(np.random.randint(low=0, high=100, size=50))
E

<tf.Tensor: shape=(50,), dtype=int64, numpy=
array([14, 86, 73, 85, 15, 97, 55, 33, 94, 89, 22, 36, 41, 98, 69, 38, 81,
       20, 99,  3, 31, 70, 71, 88,  8, 26, 98, 28, 43, 90, 72, 77, 52, 51,
       53, 68, 67, 16, 24, 95, 41, 38, 15, 55, 61, 35, 13, 31, 32, 35])>

In [56]:
# Find the minimum
tf.reduce_min(E)

<tf.Tensor: shape=(), dtype=int64, numpy=3>

In [57]:
# Find the maximum
tf.reduce_max(E)

<tf.Tensor: shape=(), dtype=int64, numpy=99>

In [58]:
# Find the mean (the result keeps the int64 dtype, so the fractional part is dropped)
tf.reduce_mean(E)

<tf.Tensor: shape=(), dtype=int64, numpy=52>

In [59]:
# Find the sum
tf.reduce_sum(E)

<tf.Tensor: shape=(), dtype=int64, numpy=2632>

You can also find the standard deviation ([`tf.reduce_std()`](https://www.tensorflow.org/api_docs/python/tf/math/reduce_std)) and variance ([`tf.reduce_variance()`](https://www.tensorflow.org/api_docs/python/tf/math/reduce_variance)) of elements in a tensor using similar functions. Both expect a floating-point (or complex) tensor, so cast integer tensors such as `E` first.

### Finding the positional maximum and minimum

How about finding the position in a tensor where the maximum value occurs?
This is helpful when you want to line up your labels (say `['Green', 'Blue', 'Red']`) with your prediction probabilities tensor (e.g. `[0.98, 0.01, 0.01]`).
In this case, the predicted label (the one with the highest prediction probability) would be `'Green'`.

You can find the position of the maximum (and, if required, the minimum) with the following:
* [`tf.argmax()`](https://www.tensorflow.org/api_docs/python/tf/math/argmax) - find the position of the maximum element in a given tensor.
* [`tf.argmin()`](https://www.tensorflow.org/api_docs/python/tf/math/argmin) - find the position of the minimum element in a given tensor.
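
The label lookup described above can be sketched without any framework. This is a minimal illustration in plain Python; the `argmax` and `argmin` helpers are hypothetical stand-ins for what `tf.argmax()` and `tf.argmin()` do on a rank 1 tensor.

```python
labels = ["Green", "Blue", "Red"]
probabilities = [0.98, 0.01, 0.01]


def argmax(values):
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def argmin(values):
    """Return the index of the smallest value (the first one wins on ties)."""
    return min(range(len(values)), key=lambda i: values[i])


best = argmax(probabilities)   # position of the highest probability
predicted_label = labels[best]  # use that position to look up the label

print(best, predicted_label)    # 0 Green
print(argmin(probabilities))    # 1
```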

In [60]:
# Create a tensor with 50 values between 0 and 1
F = tf.constant(np.random.random(50))
F

<tf.Tensor: shape=(50,), dtype=float64, numpy=
array([3.85802450e-01, 5.72019967e-01, 9.15697702e-01, 3.08174081e-01,
       5.07832280e-01, 3.34058818e-01, 9.80649264e-01, 6.70443911e-02,
       8.63145490e-01, 5.96403991e-01, 6.46238595e-02, 3.80004533e-01,
       1.14830702e-01, 4.53402828e-04, 5.70652248e-01, 9.17387741e-02,
       3.42918746e-01, 5.26456507e-01, 4.91525695e-01, 6.75777034e-01,
       8.58464227e-01, 8.65512326e-01, 1.67261050e-03, 6.67010464e-01,
       3.16165321e-01, 9.00021579e-01, 1.52068725e-01, 4.43903063e-01,
       4.05180560e-01, 3.95162920e-01, 4.46708907e-01, 5.92953232e-01,
       3.20772550e-01, 2.59896550e-01, 5.68515623e-01, 5.09625593e-01,
       6.44490117e-01, 5.07926281e-01, 3.12117487e-01, 5.58946636e-01,
       4.06078519e-01, 5.60474059e-02, 5.79395563e-01, 1.31039144e-01,
       3.33600962e-01, 8.18625669e-02, 9.53019060e-01, 7.97164754e-01,
       4.39310086e-01, 6.74488811e-01])>

In [61]:
# Find the maximum element position of F
tf.argmax(F)

<tf.Tensor: shape=(), dtype=int64, numpy=6>

In [62]:
# Find the minimum element position of F
tf.argmin(F)

<tf.Tensor: shape=(), dtype=int64, numpy=13>

In [63]:
# Find the maximum element position of F
print(f"The maximum value of F is at position: {tf.argmax(F).numpy()}") 
print(f"The maximum value of F is: {tf.reduce_max(F).numpy()}") 
print(f"Using tf.argmax() to index F, the maximum value of F is: {F[tf.argmax(F)].numpy()}")
print(f"Are the two max values the same (they should be)? {F[tf.argmax(F)].numpy() == tf.reduce_max(F).numpy()}")

The maximum value of F is at position: 6
The maximum value of F is: 0.9806492641557445
Using tf.argmax() to index F, the maximum value of F is: 0.9806492641557445
Are the two max values the same (they should be)? True


### Squeezing a tensor (removing all single dimensions)

If you need to remove single dimensions (dimensions of size 1) from a tensor, you can use [`tf.squeeze()`](https://www.tensorflow.org/api_docs/python/tf/squeeze), which by default removes all dimensions of size 1.


In [64]:
# Create a rank 5 (5 dimensions) tensor of 50 numbers between 0 and 100
G = tf.constant(np.random.randint(0, 100, 50), shape=(1, 1, 1, 1, 50))
G.shape, G.ndim

(TensorShape([1, 1, 1, 1, 50]), 5)

In [65]:
# Squeeze tensor G (remove all 1 dimensions)
G_squeezed = tf.squeeze(G)
G_squeezed.shape, G_squeezed.ndim

(TensorShape([50]), 1)

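### One-hot encoding

If you have a list or tensor of indices and would like to one-hot encode it, you can use [`tf.one_hot()`](https://www.tensorflow.org/api_docs/python/tf/one_hot). The `depth` parameter sets the length of each encoded row.

In [66]:
# Create a list of indices
some_list = [0, 1, 0, 3]

# One hot encode them (index 3 is out of range for depth=3, so its row is all zeros)
tf.one_hot(some_list, depth=3)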

<tf.Tensor: shape=(4, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 0.]], dtype=float32)>

You can also specify values for `on_value` and `off_value` instead of the default `1` and `0`.

In [67]:
# Specify custom values for on and off encoding
tf.one_hot(some_list, depth=4, on_value="We're live!", off_value="Offline")

<tf.Tensor: shape=(4, 4), dtype=string, numpy=
array([[b"We're live!", b'Offline', b'Offline', b'Offline'],
       [b'Offline', b"We're live!", b'Offline', b'Offline'],
       [b"We're live!", b'Offline', b'Offline', b'Offline'],
       [b'Offline', b'Offline', b'Offline', b"We're live!"]], dtype=object)>
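
To see what the encoding is doing, here is a rough plain-Python sketch of the idea. The `one_hot` helper below is hypothetical and only handles a flat list of indices; it is not how TensorFlow implements `tf.one_hot()`.

```python
def one_hot(indices, depth, on_value=1.0, off_value=0.0):
    """Encode each index as a row of length `depth`.

    An index outside 0..depth-1 produces a row of only `off_value`,
    like the all-zero row for index 3 when depth=3.
    """
    return [[on_value if position == index else off_value
             for position in range(depth)]
            for index in indices]


for row in one_hot([0, 1, 0, 3], depth=3):
    print(row)

# Custom on/off values work the same way
for row in one_hot([0, 1, 0, 3], depth=4, on_value="We're live!", off_value="Offline"):
    print(row)
```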

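### Squaring, square root and log

Many other common mathematical operations are available as TensorFlow functions, such as `tf.square()`, `tf.sqrt()` and `tf.math.log()`.

In [68]:
# Create a new tensor
H = tf.constant(np.arange(1, 10))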
H

<tf.Tensor: shape=(9,), dtype=int64, numpy=array([1, 2, 3, 4, 5, 6, 7, 8, 9])>

In [69]:
# Square it
tf.square(H)

<tf.Tensor: shape=(9,), dtype=int64, numpy=array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])>

In [70]:
# Find the square root (will error: tf.sqrt() needs a floating-point or complex tensor, not an integer one)
tf.sqrt(H)

InvalidArgumentError: Value for attr 'T' of int64 is not in the list of allowed values: bfloat16, half, float, double, complex64, complex128
	; NodeDef: {{node Sqrt}}; Op<name=Sqrt; signature=x:T -> y:T; attr=T:type,allowed=[DT_BFLOAT16, DT_HALF, DT_FLOAT, DT_DOUBLE, DT_COMPLEX64, DT_COMPLEX128]> [Op:Sqrt] name: 

In [71]:
# Change H to float32
H = tf.cast(H, dtype=tf.float32)
H

<tf.Tensor: shape=(9,), dtype=float32, numpy=array([1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float32)>

In [72]:
# Find the square root
tf.sqrt(H)

<tf.Tensor: shape=(9,), dtype=float32, numpy=
array([1.       , 1.4142135, 1.7320508, 2.       , 2.2360678, 2.4494896,
       2.6457512, 2.828427 , 3.       ], dtype=float32)>

In [73]:
# Find the log (input also needs to be float)
tf.math.log(H)

<tf.Tensor: shape=(9,), dtype=float32, numpy=
array([0.       , 0.6931472, 1.0986123, 1.3862944, 1.609438 , 1.7917595,
       1.9459102, 2.0794415, 2.1972246], dtype=float32)>

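### Manipulating `tf.Variable` tensors

Tensors created with `tf.Variable()` can be changed in place using methods such as `.assign()` and `.assign_add()`.

In [74]:
# Create a variable tensor
I = tf.Variable(np.arange(0, 5))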
I

<tf.Variable 'Variable:0' shape=(5,) dtype=int64, numpy=array([0, 1, 2, 3, 4])>

In [75]:
# Assign the final value a new value of 50
I.assign([0, 1, 2, 3, 50])

<tf.Variable 'UnreadVariable' shape=(5,) dtype=int64, numpy=array([ 0,  1,  2,  3, 50])>

In [76]:
# The change happens in place (the last value is now 50, not 4)
I

<tf.Variable 'Variable:0' shape=(5,) dtype=int64, numpy=array([ 0,  1,  2,  3, 50])>

In [77]:
# Add 10 to every element in I
I.assign_add([10, 10, 10, 10, 10])

<tf.Variable 'UnreadVariable' shape=(5,) dtype=int64, numpy=array([10, 11, 12, 13, 60])>

In [78]:
# Again, the change happens in place
I

<tf.Variable 'Variable:0' shape=(5,) dtype=int64, numpy=array([10, 11, 12, 13, 60])>

## Tensors and NumPy

Tensors can be converted to NumPy arrays using:

* `np.array()` - pass a tensor to convert to an ndarray (NumPy's main datatype).
* `tensor.numpy()` - call on a tensor to convert to an ndarray.

Doing this is helpful as it gives you a plain array you can iterate over and lets you use any of NumPy's methods on it.
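
A minimal sketch of the round trip, assuming TensorFlow and NumPy are both installed; the variable names are made up for illustration. It converts a tensor to an ndarray, applies a NumPy function and converts the result back with `tf.convert_to_tensor()`.

```python
import numpy as np
import tensorflow as tf

small_tensor = tf.constant([[1., 2.],
                            [3., 4.]])          # float32 tensor

array = small_tensor.numpy()                    # tf.Tensor -> np.ndarray
column_means = np.mean(array, axis=0)           # NumPy functions now apply

back_to_tensor = tf.convert_to_tensor(column_means)  # np.ndarray -> tf.Tensor

print(type(array), column_means)
print(back_to_tensor)
```

The datatype survives the round trip here: the array and the final tensor are both `float32`, because the data started out as a TensorFlow tensor.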

In [79]:
import numpy as np

In [80]:
# Create a tensor from a NumPy array
J = tf.constant(np.array([3., 7., 10.]))
J

<tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 3.,  7., 10.])>

In [81]:
# Convert tensor J to NumPy with np.array()
np.array(J), type(np.array(J))

(array([ 3.,  7., 10.]), numpy.ndarray)

In [82]:
# Convert tensor J to NumPy with .numpy()
J.numpy(), type(J.numpy())

(array([ 3.,  7., 10.]), numpy.ndarray)

By default, TensorFlow creates floating-point tensors with `dtype=float32`, whereas NumPy creates floating-point arrays with `dtype=float64`.

This is because neural networks (which are usually built with TensorFlow) can generally work very well with less precision (32-bit rather than 64-bit).

In [83]:
# Create a tensor from a NumPy array and from a Python list
numpy_J = tf.constant(np.array([3., 7., 10.])) # will be float64 (due to NumPy)
tensor_J = tf.constant([3., 7., 10.]) # will be float32 (due to being TensorFlow default)
numpy_J.dtype, tensor_J.dtype

(tf.float64, tf.float32)