# Tensor

# Torch

`torch` is a Python library that provides the tools needed to build and train deep neural nets.

# Tensor

A `tensor` is the fundamental data structure that `torch` uses to store numerical data and perform numerical operations. `torch` also lets us run these operations on GPUs for faster computation, which is crucial for deep neural nets. It is therefore worth spending some time getting comfortable with `tensor`.

Before going further, I would like to show the inheritance chain of the `tensor` class. In many deep learning codebases, you will notice multi-level inheritance. Using the same technique shown below, you can find out the inheritance chain of such classes, which can help you better understand what's going on under the hood.

# Inheritance Chain

Let's see the inheritance chain of `tensor` and `ndarray`.

In [2]:
import torch
import numpy as np

# Inheritance
print(f"tensor inheritance: {torch.tensor([]).__class__.mro()}")
print(f"ndarray inheritance: {np.array([]).__class__.mro()}")

tensor inheritance: [<class 'torch.Tensor'>, <class 'torch._C.TensorBase'>, <class 'object'>]
ndarray inheritance: [<class 'numpy.ndarray'>, <class 'object'>]
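The same trick works on any class, not only on instances. As a small sketch, here is the chain for `torch.nn.Linear`, which I picked purely as an illustration; any layer or model class from a codebase you are reading can be substituted.

```python
import torch.nn as nn

# Walk the method resolution order (MRO) of a class directly.
# nn.Linear is only an illustrative choice; swap in any class you want to inspect.
mro_names = [cls.__name__ for cls in nn.Linear.mro()]
print(mro_names)
```

The first entry is always the class itself and the last one is always `object`; everything in between is what the class inherits from.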


The first thing to learn about any Python class is how to create an instance of it.

# Initialization

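Most classes define an `__init__` method for this purpose, and ideally they come with a docstring, which you can view in IPython/Jupyter by appending `?`. Tensors are usually created with the factory function `torch.tensor` rather than by calling the class directly.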

In [25]:
torch.tensor?

Docstring:
tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False) -> Tensor

Constructs a tensor with no autograd history (also known as a "leaf tensor", see :doc:`/notes/autograd`) by copying :attr:`data`.


    When working with tensors prefer using :func:`torch.Tensor.clone`,
    :func:`torch.Tensor.detach`, and :func:`torch.Tensor.requires_grad_` for
    readability. Letting `t` be a tensor, ``torch.tensor(t)`` is equivalent to
    ``t.detach().clone()``, and ``torch.tensor(t, requires_grad=True)``
    is equivalent to ``t.detach().clone().requires_grad_(True)``.

.. seealso::

    :func:`torch.as_tensor` preserves autograd history and avoids copies where possible.
    :func:`torch.from_numpy` creates a tensor that shares storage with a NumPy array.

Args:
    data (array_like): Initial data for the tensor. Can be a list, tuple,
        NumPy ``ndarray``, scalar, and other types.

Keyword args:
    dtype (:class:`torch.dtype`, optional): the desir

For now, let's focus on the `data` parameter, which accepts numerical data in a standard form such as a `list`, `tuple`, `int`, `float`, or `np.ndarray`.

In [41]:
x = torch.tensor(1.0)
print(x, x.dtype, x.device)

x = torch.tensor(1)
print(x, x.dtype, x.device)

tensor(1.) torch.float32 cpu
tensor(1) torch.int64 cpu


By default, `data` is stored on the CPU and the `dtype` is inferred from the elements: a Python `float` becomes `torch.float32` and a Python `int` becomes `torch.int64`. It is often good practice to specify both explicitly to reduce unknowns.

In [45]:
print(torch.tensor(10.0, dtype=torch.float32, device=torch.device('mps:0')))
print(torch.tensor([1, 2, 3]))
print(torch.tensor((1, 2, 3), dtype=torch.float32))
print(torch.tensor(1.0))
print(torch.tensor(np.array(1.0)))

tensor(10., device='mps:0')
tensor([1, 2, 3])
tensor([1., 2., 3.])
tensor(1.)
tensor(1., dtype=torch.float64)
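Two things are worth noting in the output above. The tensor built from `np.array(1.0)` keeps NumPy's default `float64` instead of `float32`, and `'mps:0'` only exists on Apple hardware. The sketch below picks whichever device is available and sets `dtype` explicitly; the fallback order (CUDA, then MPS, then CPU) is my own assumption, not a `torch` convention.

```python
import torch

# Pick the first available accelerator and fall back to the CPU.
# The order cuda -> mps -> cpu is an assumption made for this sketch.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Being explicit about dtype and device removes the guesswork.
x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float32, device=device)
print(x.dtype, x.device)
```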


Besides `torch.tensor`, several other `torch` functions create tensors directly.

In [59]:
print(torch.arange(0, 10, 2))
print(torch.linspace(0, 10, 11))
print(torch.zeros(10))
print(torch.ones((2, 2)))
print(torch.randn(10))
print(torch.randint(0, 10, size=(10,)))

tensor([0, 2, 4, 6, 8])
tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
tensor([[1., 1.],
        [1., 1.]])
tensor([ 0.1218,  0.5997,  0.7335,  0.3472, -0.1635,  0.5211,  1.0149, -0.6709,
        -1.7268,  0.7381])
tensor([4, 9, 3, 0, 3, 8, 3, 5, 2, 4])


Feel free to use `?` to check the docstring of each function.

Once a tensor is created, it exposes several properties that are useful to us.

# Properties

In [81]:
t = torch.randn((4, 2)) # 4x2 tensor of standard normal samples
print(t)
print(t.numel()) # Number of elements
print(t.shape) # Shape of the tensor
print(len(t.shape)) # Order (number of dimensions) of the tensor, same as t.ndim

tensor([[-0.3673, -0.6593],
        [-0.1490,  0.1224],
        [ 1.1192, -0.5757],
        [ 1.4655, -0.1193]])
8
torch.Size([4, 2])
2
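A few more properties are handy in practice. The short sketch below uses a tensor of zeros so that the values are predictable, and it assumes the default dtype has not been changed.

```python
import math
import torch

t = torch.zeros((4, 2))

print(t.ndim)     # number of dimensions, same as len(t.shape)
print(t.dtype)    # element type (float32 unless the default dtype was changed)
print(t.device)   # where the data lives
print(t.size(0))  # size along the first axis, same as t.shape[0]

# numel is simply the product of all the sizes in the shape.
print(t.numel() == math.prod(t.shape))
```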


Now that we know how to create a `tensor` and access its properties, the next thing to look into is how to extract an element or a slice from an existing `tensor`. This is called *indexing* / *slicing*.

# Indexing / Slicing

A `tensor` is indexed the same way as a Python `list`, using bracket `[]` notation. Indices start at 0, not 1.

In [66]:
x = torch.randn(10)
print(x)
print(x[:])
print(x[3]) # 4th element
print(x[0]) # 1st element
print(x[3:]) # 4th element to the end
print(x[:3]) # 1st element to 3rd (index 3 not included)
print(x[0:5])
print(x[:-4]) # -4 is shorthand for len(x) - 4, so this is x[:6]
print(x[-3:])

tensor([ 0.8268, -0.7101,  1.6207, -0.9045, -0.8997, -0.4761,  0.4710,  0.4985,
        -1.3404,  1.1091])
tensor([ 0.8268, -0.7101,  1.6207, -0.9045, -0.8997, -0.4761,  0.4710,  0.4985,
        -1.3404,  1.1091])
tensor(-0.9045)
tensor(0.8268)
tensor([-0.9045, -0.8997, -0.4761,  0.4710,  0.4985, -1.3404,  1.1091])
tensor([ 0.8268, -0.7101,  1.6207])
tensor([ 0.8268, -0.7101,  1.6207, -0.9045, -0.8997])
tensor([ 0.8268, -0.7101,  1.6207, -0.9045, -0.8997, -0.4761])
tensor([ 0.4985, -1.3404,  1.1091])


For a `tensor` of higher order, separate the index for each axis with a comma, e.g. `x[i, j]`.
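Here is a minimal sketch on a $3 \times 4$ tensor. The values are chosen so that each element equals its position in row-major order, which makes the results easy to check by eye.

```python
import torch

m = torch.tensor([[0, 1, 2, 3],
                  [4, 5, 6, 7],
                  [8, 9, 10, 11]])

print(m[1, 2])    # row 1, column 2
print(m[0])       # whole first row
print(m[:, 1])    # whole second column
print(m[1:, :2])  # rows 1 onward, columns 0 and 1
```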

:::{note}
Indexing always returns a `tensor` object, even for a single value. Use the `.item()` method to get a Python `int`/`float` instead.
:::

In [69]:
x[3], x[3].item()

(tensor(-0.9045), -0.9045100808143616)

Now that we can access an element or a slice of a `tensor`, let's see how to update its values.

# Update Tensor Elements

In [72]:
T = torch.rand(10)
print(T)

T[3:] = 10.0

print(T)

tensor([0.4537, 0.4553, 0.9321, 0.3476, 0.2939, 0.2094, 0.2888, 0.6108, 0.8624,
        0.7129])
tensor([ 0.4537,  0.4553,  0.9321, 10.0000, 10.0000, 10.0000, 10.0000, 10.0000,
        10.0000, 10.0000])
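Assignment works with any index expression, not only with a slice and a scalar. The sketch below shows a few common variants; the last one relies on the fact that basic slicing returns a view of the original tensor, so writing to the view also changes the original.

```python
import torch

T = torch.zeros(6)
T[0] = 1.0                         # single element
T[-2:] = torch.tensor([7.0, 8.0])  # slice filled from another tensor
T[T == 0] = -1.0                   # boolean mask: every remaining zero
print(T)

M = torch.zeros((2, 3))
M[:, 1] = 5.0                      # whole column
row = M[0]                         # a view, not a copy
row[0] = 9.0                       # also changes M[0, 0]
print(M)
```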


# Operations

An operation takes one or more `tensor` objects and returns a new (transformed) tensor.

Tensor operations run as compiled C/C++ loops, optionally with GPU acceleration, so they are much faster than an equivalent Python `for` loop.
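To get a feel for the difference, here is a rough sketch that computes the same sine values once with a Python loop and once with a single `torch` call. The tensor size is arbitrary and the timings depend on your machine, so treat the numbers as indicative only.

```python
import math
import time
import torch

t = torch.arange(10_000, dtype=torch.float32)

# Python loop: one math.sin call per element.
start = time.perf_counter()
looped = torch.tensor([math.sin(v) for v in t.tolist()])
loop_time = time.perf_counter() - start

# Single vectorized call over the whole tensor.
start = time.perf_counter()
vectorized = torch.sin(t)
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.5f}s, vectorized: {vec_time:.5f}s")
print(torch.allclose(looped, vectorized, atol=1e-4))  # same values either way
```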

In [74]:
# Unary ops
t = torch.arange(10)
print(t)
print(torch.sin(t))
print(torch.cos(t))
print(torch.tan(t))
print(torch.exp(t))
print(torch.log(t)) # base is `e`
print(torch.log10(t))

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([ 0.0000,  0.8415,  0.9093,  0.1411, -0.7568, -0.9589, -0.2794,  0.6570,
         0.9894,  0.4121])
tensor([ 1.0000,  0.5403, -0.4161, -0.9900, -0.6536,  0.2837,  0.9602,  0.7539,
        -0.1455, -0.9111])
tensor([ 0.0000,  1.5574, -2.1850, -0.1425,  1.1578, -3.3805, -0.2910,  0.8714,
        -6.7997, -0.4523])
tensor([1.0000e+00, 2.7183e+00, 7.3891e+00, 2.0086e+01, 5.4598e+01, 1.4841e+02,
        4.0343e+02, 1.0966e+03, 2.9810e+03, 8.1031e+03])
tensor([  -inf, 0.0000, 0.6931, 1.0986, 1.3863, 1.6094, 1.7918, 1.9459, 2.0794,
        2.1972])
tensor([  -inf, 0.0000, 0.3010, 0.4771, 0.6021, 0.6990, 0.7782, 0.8451, 0.9031,
        0.9542])


In [77]:
# Binary ops
t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([3, 4, 3])

# Arithmetic
print(t1, t2)
print(t1 + t2) 
print(t1 - t2)
print(t1 * t2)
print(t1 / t2)
print(t1 ** t2)
print(t1 // t2)
print(t1 % t2)

tensor([1, 2, 3]) tensor([3, 4, 3])
tensor([4, 6, 6])
tensor([-2, -2,  0])
tensor([3, 8, 9])
tensor([0.3333, 0.5000, 1.0000])
tensor([ 1, 16, 27])
tensor([0, 0, 1])
tensor([1, 2, 0])


In [78]:
# Comparison
print(t1, t2)
print(t1 > t2)
print(t1 >= t2)
print(t1 < t2)
print(t1 <= t2)
print(t1 == t2)
print(t1 != t2)

tensor([1, 2, 3]) tensor([3, 4, 3])
tensor([False, False, False])
tensor([False, False,  True])
tensor([ True,  True, False])
tensor([True, True, True])
tensor([False, False,  True])
tensor([ True,  True, False])


In [79]:
# Concatenation
print(t1, t2)
print(torch.cat((t1, t2)))

tensor([1, 2, 3]) tensor([3, 4, 3])
tensor([1, 2, 3, 3, 4, 3])


So far, we have seen that most tensor operations are performed elementwise. In the case of a unary operation, each element of a tensor is transformed independently. In the case of a binary operation, the transformation is applied to corresponding elements from two tensors of the same shape. However, there are situations where operations can still be applied to tensors of different shapes. The mechanism that makes this possible is called *broadcasting*.

# Broadcasting

*Broadcasting* is an implicit intermediate step in which one or both tensors are expanded to compatible shapes. Because this step is hidden, it is the source of many subtle, hard-to-debug errors when the rules are not properly understood.

Here are the broadcasting rules:
1. Align the shapes from the right; a missing leading dimension is treated as 1.
2. For each dimension, the sizes are compatible if they are equal or one of them is 1.
3. If all dimensions are compatible, each size-1 dimension is broadcast (virtually repeated) to match the other tensor.
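The rules above can be sketched as a few lines of plain Python. `broadcast_shape` is a hypothetical helper written for this note, not a `torch` function; it only works on shapes and is checked against what `torch` actually produces.

```python
import torch

def broadcast_shape(a, b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    result = []
    # Rule 1: walk both shapes from the right; a missing dimension counts as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        # Rule 2: the sizes must be equal, or one of them must be 1.
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {a} with {b}")
        # Rule 3: the size-1 side is stretched to the other size.
        result.append(db if da == 1 else da)
    return tuple(reversed(result))

print(broadcast_shape((2, 4), (1, 4)))
print(broadcast_shape((2, 4), (4,)))

# The helper agrees with the shape torch produces for a real addition.
actual = (torch.zeros(2, 4) + torch.zeros(2, 1)).shape
print(broadcast_shape((2, 4), (2, 1)) == tuple(actual))
```

Calling `broadcast_shape((2, 4), (4, 1))` raises a `ValueError`, because 2 and 4 differ and neither is 1.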

In [68]:
# =======================================
# [Ankit Anand]
# Broadcasting
# =======================================

# Example 1
t1 = torch.randint(0, 5, (2, 4)) # 2X4
t2 = torch.randint(0, 5, (1, 4)) # 1X4
t3 = torch.randint(0, 5, (2, 1)) # 2X1
t4 = torch.randint(0, 5, (4, )) # 4
t5 = torch.randint(0, 5, (4, 1)) # 4X1

print(f"t1: {t1}")
print(f"t2: {t2}")
print(f"t3: {t3}")

""" 
Mechanism

====== t1 + t2 ======
t1: 2 4
t2: 1 4 <- t1.shape[-1] = 4 matches with t2.shape[-1] = 4
t2: 1 4 <- t1.shape[-2] = 2 does not match with t2.shape[-2] = 1 (but it is 1 so possibility of broadcast)
t2: 2 4 <- Virtually repeat along vertically to make it 2 4 => Apply binary operation!!

====== t1 + t3 ======
t1: 2 4
t3: 2 1 => Virtually repeat horizontally to make it 2 4 => Apply binary operation!!

====== t1 + t4 ======
t1: 2 4
t4:   4 <- Matches
t4: 1 4 <- Understood as 1
t4: 2 4 <- Virtually Repeat => Apply binary operation!!

====== t1 + t5 ======
t1: 2 4
t5: 4 1 <- Last dim: 1 vs 4, compatible (t5 is virtually repeated to 4 4)
t5: 4 4 <- First dim: 4 vs 2, neither is 1 => Throw error
"""
print(f"t1 + t2: {t1 + t2}")
print(f"t1 + t3: {t1 + t3}")
print(f"t1 + t4: {t1 + t4}")
print(f"t1 + t5: {t1 + t5}")

t1: tensor([[1, 2, 1, 0],
        [2, 3, 1, 2]])
t2: tensor([[1, 2, 0, 0]])
t3: tensor([[2],
        [1]])
t1 + t2: tensor([[2, 4, 1, 0],
        [3, 5, 1, 2]])
t1 + t3: tensor([[3, 4, 3, 2],
        [3, 4, 2, 3]])
t1 + t4: tensor([[4, 5, 2, 1],
        [5, 6, 2, 3]])


RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 0

Another important operation is `reshape`, which is used heavily in deep learning architectures, so it is worth understanding how it works.

# Reshape

Say you have a $2 \times 3$ tensor $t$ and you want to reshape it to $3 \times 2$. This is only possible if both shapes have the same number of elements (`.numel()`).

The tensor to be reshaped is first interpreted as a one-dimensional sequence in row-major (C-style) order:
$[t[0,0], t[0,1], t[0,2], t[1,0], t[1,1], t[1,2]]$

The values are then placed sequentially (row-major) into the new $3 \times 2$ shape.

In [80]:
t = torch.rand((2, 3))

print(t)
print(t.reshape(3, 2))
print(t.reshape(3, -1)) # -1 lets torch infer that dimension from numel (6 / 3 = 2)

tensor([[0.7100, 0.1680, 0.1622],
        [0.3576, 0.7814, 0.7145]])
tensor([[0.7100, 0.1680],
        [0.1622, 0.3576],
        [0.7814, 0.7145]])
tensor([[0.7100, 0.1680],
        [0.1622, 0.3576],
        [0.7814, 0.7145]])
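A common confusion is to expect `reshape` to behave like a transpose. The sketch below uses integer values so that the row-major order is visible, and it also shows what happens when the number of elements does not match.

```python
import torch

t = torch.tensor([[0, 1, 2],
                  [3, 4, 5]])

flat = t.flatten()          # row-major order: 0, 1, 2, 3, 4, 5
reshaped = t.reshape(3, 2)  # refill row by row: [[0, 1], [2, 3], [4, 5]]
transposed = t.T            # swap the axes:     [[0, 3], [1, 4], [2, 5]]

print(reshaped)
print(transposed)

# 6 elements cannot fill a 4x2 shape, so torch raises an error.
try:
    t.reshape(4, 2)
except RuntimeError as err:
    print(f"reshape failed: {err}")
```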


# Inplace

This point is subtle yet important. To understand it, let's make some quick observations.

In [84]:
# =======================================
# [Ankit Anand]
# Inplace operation
# =======================================

t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([4, 5, 6])
print(id(t1), id(t2))

t1 = t1 + t2
print(id(t1), id(t2))

4649866096 4761591088
4649868880 4761591088


You can observe that `id(t1)` differs before and after the operation: `t1 + t2` created a new tensor object and the name `t1` was rebound to it. This suggests that the operation was not in place and allocated another memory block.

In [85]:
t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([4, 5, 6])
print(id(t1), id(t2))

t1[:] = t1 + t2
print(id(t1), id(t2))

t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([4, 5, 6])
print(id(t1), id(t2))

t1 += t2
print(id(t1), id(t2))

4649866096 4649868880
4649866096 4649868880
4761415472 4761591088
4761415472 4761591088


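Here you can observe that `id(t1)` is the same before and after, so both `t1[:] = ...` and `t1 += ...` update the existing tensor. Note that `t1[:] = t1 + t2` still builds a temporary for the right-hand side before copying it into `t1`, whereas `t1 += t2` writes directly into `t1`. In-place updates like these are recommended when you want to avoid allocating a new tensor.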
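`id()` identifies the Python object rather than the underlying data buffer. If you want to look at the buffer itself, `data_ptr()` returns the address of the tensor's first element. A small sketch of the same experiment using it:

```python
import torch

t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([4, 5, 6])

before = t1.data_ptr()  # address of the first element of t1's data

t1 += t2                # in-place operator
same_after_iadd = t1.data_ptr() == before

t1.add_(t2)             # methods ending in an underscore are in-place
same_after_add_ = t1.data_ptr() == before

out_of_place = t1 + t2  # new tensor, allocated while t1 is still alive
new_buffer = out_of_place.data_ptr() != t1.data_ptr()

print(same_after_iadd, same_after_add_, new_buffer)
```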

# ndarray $\rightarrow$ tensor

In [13]:
ndarray = np.array([1, 2, 3])

# torch.tensor copies the data, so the ndarray is left untouched
t_from_ndarray1 = torch.tensor(ndarray)
t_from_ndarray1[0] = 2
print(ndarray, t_from_ndarray1)

# torch.from_numpy shares the underlying memory with the ndarray,
# so changing a tensor element also changes the ndarray element
t_from_ndarray2 = torch.from_numpy(ndarray)
t_from_ndarray2[0] = 2
print(ndarray, t_from_ndarray2)

[1 2 3] tensor([2, 2, 3])
[2 2 3] tensor([2, 2, 3])
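The first line of output shows that `torch.tensor` copied the data, while the second shows that `torch.from_numpy` did not. The sketch below makes the sharing explicit with `np.shares_memory`, and also goes the other way with `.numpy()`, which for a CPU tensor returns an array backed by the same memory.

```python
import numpy as np
import torch

arr = np.array([1, 2, 3])

copied = torch.tensor(arr)      # independent copy
shared = torch.from_numpy(arr)  # shares memory with arr
back = shared.numpy()           # tensor -> ndarray, still the same memory

shared[0] = 100                 # visible through arr and back, not through copied
print(arr, copied, back)

print(np.shares_memory(arr, back))            # shared buffer
print(np.shares_memory(arr, copied.numpy()))  # separate buffer
```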
