## PyTorch tensors

##### A tensor is an “n-dimensional array” (with 0 to n dimensions). Scalars, vectors, and matrices are examples of tensors.
* scalar: 0 dimensions --> point --> rank 0 tensor
* vector: 1 dimension --> line --> rank 1 tensor
* matrix: 2 dimensions --> plane --> rank 2 tensor
* tensor: n-dimensional array --> rank n tensor
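
As a minimal sketch of the list above, the rank of a tensor can be read from its `ndim` attribute. The variable names are illustrative only.

```python
import torch

scalar = torch.tensor(3.5)                        # rank 0: a single value
vector = torch.tensor([1.0, 2.0, 3.0])            # rank 1
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # rank 2
cube = torch.zeros(2, 3, 4)                       # rank 3

for name, t in [('scalar', scalar), ('vector', vector),
                ('matrix', matrix), ('cube', cube)]:
    # ndim is the number of dimensions; shape lists the extent of each one
    print(f'{name}: ndim={t.ndim}, shape={tuple(t.shape)}')
```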

In [None]:
import torch

In [None]:
print(torch.__version__)

2.0.0+cu118


### PyTorch.tensor data types

Data is stored as numbers, and the default data type is floating point. PyTorch supports 
several numerical data types. A tensor's data type can be set in one of three ways:

1) At creation time by providing the dtype argument to the constructor method,

2) by calling the `torch.Tensor.to(device=None, dtype=None)` method and providing the dtype argument,

3) or through casting methods, e.g. `torch.Tensor.double(memory_format=torch.preserve_format)`, `torch.Tensor.bool(memory_format=torch.preserve_format)`, `torch.Tensor.int(memory_format=torch.preserve_format)`, etc.

Call `torch.get_default_dtype()` to find out the default data type.

Call `torch.can_cast(from_, to)` to check whether a type conversion is allowed under PyTorch casting rules (https://pytorch.org/docs/stable/tensor_attributes.html#type-promotion-doc).

In [None]:
# print default dtype
print(f'default dtype: {torch.get_default_dtype()}')

print('\n')
A = torch.tensor([[[True], [True], [True], [False]]], dtype=torch.bool)
print(A)
print(A.dtype)

print('\n')
B = A.to(torch.int16)
print(B)
print(B.dtype)

print('\n')
C = B.float()
print(C)
print(C.dtype)

default dtype: torch.float32


tensor([[[ True],
         [ True],
         [ True],
         [False]]])
torch.bool


tensor([[[1],
         [1],
         [1],
         [0]]], dtype=torch.int16)
torch.int16


tensor([[[1.],
         [1.],
         [1.],
         [0.]]])
torch.float32


In [None]:
print(torch.can_cast(torch.int, torch.float))
print(torch.can_cast(torch.float, torch.double))
print(torch.can_cast(torch.double, torch.float))

print(torch.can_cast(torch.float, torch.int))
print(torch.can_cast(torch.float, torch.bool))
print(torch.can_cast(torch.int, torch.bool))

print(torch.can_cast(torch.bool, torch.int))
print(torch.can_cast(torch.bool, torch.float))

True
True
True
False
False
False
True
True
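
The casting rules linked above also govern what happens when operands of different dtypes meet in one operation. The following is a small sketch of that type promotion. It assumes the default dtype is still `torch.float32`, and the variable names are illustrative.

```python
import torch

i = torch.tensor([1, 2, 3])          # integer data defaults to torch.int64
f = torch.tensor([0.5, 0.5, 0.5])    # floating-point data defaults to torch.float32

# mixing an integer tensor with a float tensor promotes the result to float
mixed = i + f
print(mixed.dtype)

# promote_types reports the smallest dtype that can hold both inputs
print(torch.promote_types(torch.int32, torch.float32))
print(torch.promote_types(torch.uint8, torch.int8))

# result_type reports the dtype an operation on the two operands would produce
print(torch.result_type(i, 1.5))
```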


### Creation ops (Constructors)
Creation ops are methods used to instantiate a tensor.

To learn more about creation ops: https://pytorch.org/docs/stable/torch.html#tensor-creation-ops

* To create a tensor with pre-existing data

`torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False)` -- Constructs a leaf tensor (tensor with no autograd history)

* To create a tensor with a specific size filled with uninitialized data (whatever values happen to be in the allocated memory)

`torch.empty(size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False, pin_memory=False, memory_format=torch.contiguous_format)`

* To create a tensor with a specific size filled with a scalar or with random values

`torch.zeros(size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)`

`torch.ones(size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)`

`torch.rand(size, generator=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False, pin_memory=False)`

`torch.full(size, fill_value, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)`

* Other possibilities:

`torch.as_tensor()` -- Preserves autograd history and avoids copies where possible.

`torch.from_numpy()` -- Creates a tensor that shares storage with a NumPy array.


In [None]:
a = torch.tensor([1.0, 4.0, 3.0, 2.0, 5.0, 1.0])
b = torch.tensor([[2.0, 4.0], [1.0, 5.0]])
print(a)
print(b)

tensor([1., 4., 3., 2., 5., 1.])
tensor([[2., 4.],
        [1., 5.]])


In [None]:
a = torch.empty(1, 6) # returns a tensor filled with uninitialized data
b = torch.zeros(2, 2) # filled with zeros
c = torch.ones(2, 3) # filled with ones
d = torch.rand(2, 3) # filled with random values
print(a, '\n', b, '\n', c, '\n', d)

tensor([[ 1.1383e-16,  4.5771e-41,  1.0861e-34,  0.0000e+00, -5.4542e+19,
          4.5769e-41]]) 
 tensor([[0., 0.],
        [0., 0.]]) 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 
 tensor([[0.8105, 0.3995, 0.7601],
        [0.7168, 0.2727, 0.1475]])


In [None]:
g = torch.full((3,2), fill_value = 13)
print(g)

tensor([[13, 13],
        [13, 13],
        [13, 13]])


In [None]:
# torch.as_tensor(data, dtype=None, device=None) -- Preserves autograd history and avoids copies where possible
# data argument can be a list, tuple, NumPy ndarray, scalar, and other types

X = torch.as_tensor([[5,7,8],[1,3,4]])
print(X)

import numpy as np
Y_array = np.array([13., 15., 11.])
print(f'\n{type(Y_array)}\n{Y_array}')

# torch.from_numpy(ndarray) -- Creates a tensor that shares storage with a NumPy array
# Limitations: Modifications to the tensor will be reflected in the ndarray and vice versa. The returned tensor is not resizable.

Y_tensor = torch.from_numpy(Y_array)
print(f'\n{type(Y_tensor)}\n{Y_tensor}')

# compare the address of the tensor's first element with the address of the NumPy buffer
print(f'\n\'Y_array\' and \'Y_tensor\' share same storage: {Y_tensor.data_ptr() == Y_array.ctypes.data}')


tensor([[5, 7, 8],
        [1, 3, 4]])

<class 'numpy.ndarray'>
[13. 15. 11.]

<class 'torch.Tensor'>
tensor([13., 15., 11.], dtype=torch.float64)

'Y_array' and 'Y_tensor' share same storage: True
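
To make the "avoids copies where possible" remark concrete, here is a hedged sketch that contrasts `torch.as_tensor()` with `torch.tensor()` on a NumPy array. It assumes a CPU array whose dtype is kept unchanged, which is the case where `as_tensor` can reuse the buffer. The names `arr`, `shared`, and `copied` are illustrative.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

shared = torch.as_tensor(arr)   # reuses the array's memory (same dtype, CPU)
copied = torch.tensor(arr)      # always copies the data

# a change to the array is visible through the shared tensor only
arr[0] = 100.0
print(shared)
print(copied)

# the shared tensor points at the NumPy buffer itself
print(shared.data_ptr() == arr.ctypes.data)
```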


#### `torch.*_like()` methods
Methods with the `*_like` suffix reproduce the shape of the input tensor. They return a new tensor of the same size as the input tensor, filled with the scalar value 0 (`torch.zeros_like(input, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)`) or 1 (`torch.ones_like(input, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format)`).

In [None]:
x = torch.zeros_like(a)
print(x)
# the returned tensor is not a view of the input tensor
print(f'\'x\' and \'a\' share same storage: {x[0].data_ptr() == a[0].data_ptr()}\n')

y = torch.ones_like(b)
print(y)
# the returned tensor is not a view of the input tensor
print(f'\'y\' and \'b\' share same storage: {y[0].data_ptr() == b[0].data_ptr()}')

tensor([[0., 0., 0., 0., 0., 0.]])
'x' and 'a' share same storage: False

tensor([[1., 1.],
        [1., 1.]])
'y' and 'b' share same storage: False
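
Beyond `zeros_like` and `ones_like`, the same pattern exists for other fill strategies. The sketch below uses `torch.full_like` and `torch.rand_like`, and shows that the `dtype` argument can override the dtype inherited from the input. The variable names are illustrative.

```python
import torch

base = torch.zeros(2, 3, dtype=torch.float64)

filled = torch.full_like(base, 7.0)                # same shape and dtype, filled with 7
noise = torch.rand_like(base)                      # same shape and dtype, uniform in [0, 1)
as_int = torch.ones_like(base, dtype=torch.int32)  # same shape, dtype overridden

for t in (filled, noise, as_int):
    print(tuple(t.shape), t.dtype)
```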


### Tensor Attributes, Properties, and Built-in methods

#### attributes
`torch.dtype`

`torch.device`

`torch.layout`

`torch.Tensor.grad`

`torch.Tensor.requires_grad`

#### properties
`torch.Tensor.shape`

#### built-in methods
`torch.Tensor.storage()`

`torch.Tensor.size(dim=None)`

`torch.Tensor.storage_offset()`

`torch.Tensor.stride(dim)`


In [None]:
y = 2
a = torch.as_tensor(y)

In [None]:
print(f'dtype: {a.dtype}; device: {a.device}\n')

print(f'Storage:\n{a.storage()}\n')

print(f'Shape: {a.shape}\n')

print(f'size: {a.size()}; storage_offset: {a.storage_offset()}; stride: {a.stride()}\n')

dtype: torch.int64; device: cpu

Storage:
 2
[torch.storage.TypedStorage(dtype=torch.int64, device=cpu) of size 1]

Shape: torch.Size([])

size: torch.Size([]); storage_offset: 0; stride: ()





In [None]:
x = torch.rand(2, 3)
print(f'dtype: {x.dtype}; device: {x.device}; layout: {x.layout}\n')

print(f'grad: {x.grad}; requires_grad: {x.requires_grad}\n')

print(f'Storage:\n{x.storage()}\n')

print(f'Shape: {x.shape}\n')

print(f'size: {x.size()}; storage_offset: {x.storage_offset()}; stride: {x.stride()}\n')


dtype: torch.float32; device: cpu; layout: torch.strided

grad: None; requires_grad: False

Storage:
 0.34831613302230835
 0.21174710988998413
 0.8232155442237854
 0.8076144456863403
 0.9091317057609558
 0.3293258547782898
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]

Shape: torch.Size([2, 3])

size: torch.Size([2, 3]); storage_offset: 0; stride: (3, 1)





### Storage, size, stride, and storage offset

The `storage` is the memory allocated to store the tensor data. It’s a contiguous block of memory by default. Every tensor has an attribute storage which holds its data. Several tensors can have the same storage.

The tensor `size` is the number of elements along each dimension. For instance, a 2-D tensor with i rows and j columns has `tensor_2d.size() == torch.Size([i, j])`.

The tensor `stride` gives, for each dimension, the number of storage elements to skip to move from one element to the next along that dimension (e.g. to the next row or the next column in a 2-D tensor).

The `storage_offset` of a tensor is the index in the storage of the tensor's first element.
Use the method `torch.as_strided(input, size, stride, storage_offset=None)` to create a view of a tensor with a given size, stride, and storage_offset.
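
The relation between these three quantities can be written as `storage_index = storage_offset + i * stride[0] + j * stride[1]` for a 2-D tensor. The sketch below checks that formula on a sliced view. The tensor `x` and its view `v` are illustrative and not part of the cells that follow.

```python
import torch

x = torch.arange(12).reshape(3, 4)   # contiguous, stride (4, 1)
v = x[1:, 1:]                        # a view of shape (2, 3) on the same storage

offset = v.storage_offset()
s0, s1 = v.stride()
print(f'offset: {offset}; stride: {(s0, s1)}')

# x is contiguous, so flattening it lists the elements in storage order
flat = x.flatten()

# every element of the view sits at offset + i * s0 + j * s1 in the storage
for i in range(v.size(0)):
    for j in range(v.size(1)):
        assert flat[offset + i * s0 + j * s1] == v[i, j]
```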


In [None]:
w = torch.tensor([7.0, 4.0, 5.0, 2.0, 5.0, 1.0])
print(f'Storage:\n{w.storage()}\n')
print(f'Storage Memory address:\n{id(w.storage())}\n')
# note: id() returns the identity of the Python storage object, not the address of the data
#print(f'Storage Memory address:\n{id(w.untyped_storage())}') #use untyped_storage in place of storage to avoid warnings

Storage:
 7.0
 4.0
 5.0
 2.0
 5.0
 1.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]

Storage Memory address:
140435363776736





In [None]:
print(f'Address of first element in Storage:\n{w.untyped_storage().data_ptr()}\n')

Address of first element in Storage:
116384384



In [None]:
# original tensor
print(f'original tensor:\n{w}')
print(f'Size: {w.size()}')
print(f'Size: {w.shape}')
print(f'Storage_offset: {w.storage_offset()}')
print(f'Stride: {w.stride()}\n')

# r view
# change storage_offset with torch.as_strided()
r = torch.as_strided(w, (2,2), (2,2), 1)
print(f'r view:\n{r}')
print(f'Size: {r.size()}')
print(f'Shape: {r.shape}')
print(f'Storage_offset: {r.storage_offset()}')
print(f'Stride: {r.stride()}')
# compare the storage data pointers to verify that the new tensor shares storage with the original tensor
print(f'\'r\' and \'w\' share same storage: {r.untyped_storage().data_ptr() == w.untyped_storage().data_ptr()}\n')

# k view
# change storage_offset with torch.as_strided()
k = torch.as_strided(w, (2,3), (2,1), 1)
print(f'k view:\n{k}')
print(f'Size: {k.size()}')
print(f'Size: {k.shape}')
print(f'Storage_offset: {k.storage_offset()}')
print(f'Stride: {k.stride()}')
# compare the storage data pointers to verify that the new tensor shares storage with the original tensor
print(f'\'k\' and \'w\' share same storage: {k.untyped_storage().data_ptr() == w.untyped_storage().data_ptr()}\n')

# z view
# change storage_offset with torch.as_strided()
z = torch.as_strided(w, (2,2), (2,1), 2)
print(f'z view:\n{z}')
print(f'Size: {z.size()}')
print(f'Shape: {z.shape}')
print(f'Storage_offset: {z.storage_offset()}')
print(f'Stride: {z.stride()}')
# compare the storage data pointers to verify that the new tensor shares storage with the original tensor
print(f'\'z\' and \'w\' share same storage: {z.untyped_storage().data_ptr() == w.untyped_storage().data_ptr()}\n')

original tensor:
tensor([7., 4., 5., 2., 5., 1.])
Size: torch.Size([6])
Size: torch.Size([6])
Storage_offset: 0
Stride: (1,)

r view:
tensor([[4., 2.],
        [2., 1.]])
Size: torch.Size([2, 2])
Shape: torch.Size([2, 2])
Storage_offset: 1
Stride: (2, 2)
'r' and 'w' share same storage: True

k view:
tensor([[4., 5., 2.],
        [2., 5., 1.]])
Size: torch.Size([2, 3])
Size: torch.Size([2, 3])
Storage_offset: 1
Stride: (2, 1)
'k' and 'w' share same storage: True

z view:
tensor([[5., 2.],
        [5., 1.]])
Size: torch.Size([2, 2])
Shape: torch.Size([2, 2])
Storage_offset: 2
Stride: (2, 1)
'z' and 'w' share same storage: True



In [None]:
# Note: row and column vectors differ in size, shape, and stride

print(f'Size row vector: {torch.tensor([1, 2, 3]).size()}')
print(f'Size column vector: {torch.tensor([[1], [2], [3]]).size()}')
print('\n')

print(f'Shape row vector: {torch.tensor([1, 2, 3]).shape}')
print(f'Shape column vector: {torch.tensor([[1], [2], [3]]).shape}')
print('\n')

print(f'Stride row vector: {torch.tensor([1, 2, 3]).stride()}')
print(f'Stride column vector: {torch.tensor([[1], [2], [3]]).stride()}')
print('\n')

Size row vector: torch.Size([3])
Size column vector: torch.Size([3, 1])


Shape row vector: torch.Size([3])
Shape column vector: torch.Size([3, 1])


Stride row vector: (1,)
Stride column vector: (1, 1)




### Indexing
With indexing, the tensor content can be accessed element by element or in chunks, and it can also be modified.

In [None]:
p = torch.tensor([[2., 4.], [1., 5.], [6., 3.], [8., 7.]])

# grabbing an element
print(p[0, 1])
print('\n')

# grabbing a chunk of rows
print(p[0])
print(p[:3]) # from row index 0 to row index 2
print(p[1:]) # from row index 1 to the last row
print('\n')

# grabbing a column and a row
print(p[:,1])
print(p[2,:])
print('\n')

# specifying row and column
print(p[:,1]) # all row values of column index 1
print(p[1:,1]) # values from row index 1 of column index 1
print(p[2,:]) # all values in row index 2
print(p[:2,:]) # values from first row till row index 1 of all columns


tensor(4.)


tensor([2., 4.])
tensor([[2., 4.],
        [1., 5.],
        [6., 3.]])
tensor([[1., 5.],
        [6., 3.],
        [8., 7.]])


tensor([4., 5., 3., 7.])
tensor([6., 3.])


tensor([4., 5., 3., 7.])
tensor([5., 3., 7.])
tensor([6., 3.])
tensor([[2., 4.],
        [1., 5.]])


In [None]:
# modifying an element
p[0, 1] = 5.1
print(p)
print('\n')

# modifying a chunk of elements
p[2] = torch.tensor([2.0, 7.0])
print(p)

tensor([[2.0000, 5.1000],
        [1.0000, 5.0000],
        [6.0000, 3.0000],
        [8.0000, 7.0000]])


tensor([[2.0000, 5.1000],
        [1.0000, 5.0000],
        [2.0000, 7.0000],
        [8.0000, 7.0000]])
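
One detail worth knowing when indexing: in PyTorch, basic slicing returns a view of the original tensor, while indexing with a list of indices or a boolean mask returns a copy. The sketch below illustrates this. The helper `shares_storage` and the tensor `q` are hypothetical names introduced for the example.

```python
import torch

def shares_storage(a, b):
    # hypothetical helper: compare the addresses of the underlying storages
    return a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()

q = torch.arange(8.0).reshape(4, 2)

sliced = q[:2]        # basic slicing: a view
picked = q[[0, 1]]    # indexing with a list: a copy
masked = q[q > 3]     # boolean mask: a 1-D copy

print(shares_storage(sliced, q))
print(shares_storage(picked, q))
print(shares_storage(masked, q))

# writing through the slice changes q; the copy keeps the old value
sliced[0, 0] = -1.0
print(q[0, 0], picked[0, 0])
```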


### View tensor
A view tensor is a tensor that shares the same underlying storage (i.e., memory and data) with its original tensor; it is not a copy. Several PyTorch tensors can share the same storage so that memory is allocated only once. This feature allows for fast operations on tensors such as reshaping, slicing, and element-wise operations.

It’s important to note that modifying the data in a view tensor changes the data in the original tensor as well, and vice-versa.

In [None]:

A = torch.rand(4, 4)
B = A.view(2, 8)

#torch.Tensor.data_ptr() returns the address of the first element of self tensor
print(B.untyped_storage().data_ptr() == A.untyped_storage().data_ptr())  # `B` and `A` share the same underlying data.
A[0][0] = 3.14
print(B[0][0])


True
tensor(3.1400)
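
The sharing works in both directions. As a small sketch with illustrative names, writing through the view changes the base tensor, and writing to the base tensor is visible through the view:

```python
import torch

base = torch.zeros(2, 3)
flat = base.view(6)      # 1-D view on the same storage

# write through the view: flat index 4 corresponds to base[1, 1]
flat[4] = 9.0
print(base)

# write to the base: the change shows up in the view
base[0, 0] = 1.0
print(flat)
```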


### View ops
Some PyTorch tensor operations return a view of the input tensor instead of returning a new tensor. Some view ops are listed below:

- `torch.Tensor.view(shape)`
- `torch.as_strided(input, size, stride, storage_offset=None)`
Creates a view of an existing torch.Tensor input with specified size, stride and storage_offset.
- `torch.Tensor.detach()`
Returns a new Tensor, detached from the current graph.
- `torch.narrow(input, dim, start, length)`
Returns a new tensor that is a narrowed version of input tensor (https://pytorch.org/docs/stable/generated/torch.narrow.html#torch.narrow).
- `torch.select(input, dim, index)`
Returns a view of input tensor as a slice along specified dimension and index.
- `torch.squeeze(input, dim=None)`
Returns a tensor with all specified dimensions of input of size 1 removed.
- `torch.unsqueeze(input, dim)`
Returns a new tensor with a dimension of size one inserted at the specified position.
- `torch.transpose(input, dim0, dim1)`
Returns a tensor that is a transposed version of input. The given dimensions dim0 and dim1 are swapped.
- `torch.t(input)`
Expects input to be a tensor with 2 dimensions or fewer, and transposes dimensions 0 and 1.
- `torch.permute(input, dims)`
Returns a view of the input tensor with its dimensions permuted.
- `torch.reshape(input, shape)`
Returns a tensor with the same data and number of elements as input, but with the specified shape. It returns a view of the input tensor when the requested shape is compatible with its size and stride, and a copy otherwise (the same applies to torch.flatten() and torch.Tensor.reshape_as()).
- `torch.split(tensor, split_size_or_sections, dim=0)`
Splits the tensor into chunks along the specified dimension. Each chunk is a view of the original tensor.
- `torch.chunk(input, chunks, dim=0)`
Attempts to split a tensor into specified number of chunks. Each chunk is a view of the input tensor.
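
A quick way to check the list above is to apply several of these ops to one tensor and compare storage addresses. The sketch below does that for a handful of them. The dictionary `views` and the tensor `x` are illustrative names.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)

views = {
    'narrow': torch.narrow(x, 2, 1, 2),        # columns 1..2 of the last dimension
    'select': torch.select(x, 1, 0),           # index 0 along dimension 1
    'unsqueeze': torch.unsqueeze(x, 0),
    'transpose': torch.transpose(x, 0, 2),
    'permute': torch.permute(x, (2, 0, 1)),
    'chunk[1]': torch.chunk(x, 2, dim=0)[1],
    'split[1]': torch.split(x, 2, dim=2)[1],
}

base_ptr = x.untyped_storage().data_ptr()
for name, v in views.items():
    # every result reuses the storage of x instead of allocating a new one
    print(f'{name}: shape={tuple(v.shape)}, shares storage: '
          f'{v.untyped_storage().data_ptr() == base_ptr}')
```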


### Contiguity issue

By default PyTorch tensors are allocated in dense non-overlapping memory, i.e. `memory_format = torch.contiguous_format`. (With strides represented by values in decreasing order.) However, calling a view method on a contiguous tensor can potentially produce a non-contiguous tensor.

Two important methods here:

1) `torch.Tensor.is_contiguous(memory_format=torch.contiguous_format)`

Returns True if the self tensor is contiguous in memory in the order specified by `memory_format`, the parameter that sets the memory allocation order (default: `torch.contiguous_format`). Use this method to verify data contiguity.


2) `torch.Tensor.contiguous(memory_format=torch.contiguous_format)`

This method returns a tensor that is contiguous in memory and contains the same data as the self tensor, in the specified memory format. Call it to force a copy of the data when the self tensor is not contiguous. Warning: if the self tensor is already in the specified memory format, this function returns the self tensor itself.
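
One practical consequence of non-contiguity, sketched here with illustrative names: `Tensor.view()` raises a `RuntimeError` when the requested shape is not compatible with the tensor's size and stride, whereas `reshape()` falls back to a copy, and `contiguous()` followed by `view()` also works.

```python
import torch

m = torch.arange(6).reshape(2, 3)
mt = m.t()                       # transposed view, not contiguous
print(mt.is_contiguous())

try:
    mt.view(6)                   # cannot be expressed as a view of the storage
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print(f'view failed: {err}')

flat_copy = mt.reshape(6)             # reshape copies when a view is impossible
flat_view = mt.contiguous().view(6)   # explicit copy first, then a view of the copy
print(flat_copy)
print(flat_view)
```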


In [None]:
# Example 1: torch.transpose(input, dim0, dim1)

b = torch.tensor([[0, 1],[2, 3]])
print(f'b is contiguous: {b.is_contiguous()}\n')

t = b.transpose(0, 1)  # `t` is a view of `b`
# View tensors might be non-contiguous
print(f't is contiguous: {t.is_contiguous()}\n')

# Call Tensor.contiguous() to get a contiguous tensor
z = t.contiguous()
print(f'z is contiguous: {z.is_contiguous()}\n')

# verify stride
print(f'b stride: {b.stride()}')
print(f't stride: {t.stride()}')
print(f'z stride: {z.stride()}')

# verify storage
print(f'\n\'t\' and \'b\' share same storage: {t.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()}')
print(f'\'z\' and \'b\' share same storage: {z.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()}')
print(f'\'z\' and \'t\' share same storage: {z.untyped_storage().data_ptr() == t.untyped_storage().data_ptr()}')


b is contiguous: True

t is contiguous: False

z is contiguous: True

b stride: (2, 1)
t stride: (1, 2)
z stride: (2, 1)

't' and 'b' share same storage: True
'z' and 'b' share same storage: False
'z' and 't' share same storage: False


In [None]:
# Example 2: torch.t(input)

print(f'tensor x')
x = torch.tensor([[3.0, 6.0, 2.0], [4.0, 5.0, 9.0]])
print(x)
print(f' stride: {x.stride()}\n is contiguous: {x.is_contiguous()}\n')

print(f'tensor y')
y = torch.t(x)
print(y)
# View tensors might be non-contiguous
print(f' stride: {y.stride()}\n is contiguous: {y.is_contiguous()}\n') # non-contiguous

print(f'tensor k')
k = torch.as_strided(x, (2, 2), (2,1), 1)
print(k)
# View tensors might be non-contiguous
print(f' stride: {k.stride()}\n is contiguous: {k.is_contiguous()}\n') # contiguous

print(f'tensor w')
# calling torch.Tensor.contiguous() on a contiguous tensor returns the self tensor
w = k.contiguous()
print(w)
print(f' stride: {w.stride()}\n is contiguous: {w.is_contiguous()}\n')

# verify storage
print(f'\'y\' and \'x\' share same storage: {y.untyped_storage().data_ptr() == x.untyped_storage().data_ptr()}')
print(f'\'k\' and \'x\' share same storage: {k.untyped_storage().data_ptr() == x.untyped_storage().data_ptr()}')
print(f'\'w\' and \'k\' share same storage: {w.untyped_storage().data_ptr() == k.untyped_storage().data_ptr()}')
# since k shares the same storage with x, so does w
print(f'\'w\' and \'x\' share same storage: {w.untyped_storage().data_ptr() == x.untyped_storage().data_ptr()}\n')

# modifications of values in w will yield changes in x and k
w[1,1] = 0.5
print(x[1,1])
print(k[1,1])

tensor x
tensor([[3., 6., 2.],
        [4., 5., 9.]])
 stride: (3, 1)
 is contiguous: True

tensor y
tensor([[3., 4.],
        [6., 5.],
        [2., 9.]])
 stride: (1, 3)
 is contiguous: False

tensor k
tensor([[6., 2.],
        [4., 5.]])
 stride: (2, 1)
 is contiguous: True

tensor w
tensor([[6., 2.],
        [4., 5.]])
 stride: (2, 1)
 is contiguous: True

'y' and 'x' share same storage: True
'k' and 'x' share same storage: True
'w' and 'k' share same storage: True
'w' and 'x' share same storage: True

tensor(0.5000)
tensor(0.5000)


In [None]:
# Example 3: torch.as_strided(input, size, stride, storage_offset=None) might produce a non-contiguous view

x = torch.rand(2, 3)
print(x)
print(f' stride: {x.stride()}\n is contiguous: {x.is_contiguous()}\n')

h = torch.as_strided(x, (2,2), (1,3))
print(h)
print(f' stride: {h.stride()}\n is contiguous: {h.is_contiguous()}\n')

# create a contiguous copy of h
i = h.contiguous()
print(i)
print(f' stride: {i.stride()}\n is contiguous: {i.is_contiguous()}\n')

# verify storage
print(f'\'h\' and \'x\' share same storage: {h.untyped_storage().data_ptr() == x.untyped_storage().data_ptr()}')
# tensor i is not a view of tensor h
print(f'\'i\' and \'h\' share same storage: {h.untyped_storage().data_ptr() == i.untyped_storage().data_ptr()}')


tensor([[0.3790, 0.3809, 0.6042],
        [0.6416, 0.5678, 0.3592]])
 stride: (3, 1)
 is contiguous: True

tensor([[0.3790, 0.6416],
        [0.3809, 0.5678]])
 stride: (1, 3)
 is contiguous: False

tensor([[0.3790, 0.6416],
        [0.3809, 0.5678]])
 stride: (2, 1)
 is contiguous: True

'h' and 'x' share same storage: True
'i' and 'h' share same storage: False


In [None]:
# Special cases: torch.reshape(input, shape) sometimes returns a copy instead of a view

x2 = torch.tensor([[3.0, 6.0], [2.0, 4.0], [5.0, 9.0]])
print(x2)
print(f' stride: {x2.stride()}\n is contiguous: {x2.is_contiguous()}\n')

y2 = torch.as_strided(x2, (2,2), (1,3), 1)
print(y2)
print(f' stride: {y2.stride()}\n is contiguous: {y2.is_contiguous()}\n')

z2 = torch.reshape(y2, (4,1))
print(z2)
# tensor z2 is a contiguous copy of y2
print(f' stride: {z2.stride()}\n is contiguous: {z2.is_contiguous()}\n')

# verify storage
print(f'\'y2\' and \'x2\' share same storage: {y2.untyped_storage().data_ptr() == x2.untyped_storage().data_ptr()}')
# reshape yields a copy instead of a view
print(f'\'z2\' and \'y2\' share same storage: {z2.untyped_storage().data_ptr() == y2.untyped_storage().data_ptr()}')

tensor([[3., 6.],
        [2., 4.],
        [5., 9.]])
 stride: (2, 1)
 is contiguous: True

tensor([[6., 5.],
        [2., 9.]])
 stride: (1, 3)
 is contiguous: False

tensor([[6.],
        [5.],
        [2.],
        [9.]])
 stride: (1, 1)
 is contiguous: True

'y2' and 'x2' share same storage: True
'z2' and 'y2' share same storage: False


In [None]:
# Summary. Playing around with tensor contiguity

X = torch.tensor([[2.0, 1.0], [4.0, 3.0], [6.0, 5.0]])
print(X.is_contiguous()) #output: True

Y = torch.reshape(X, (2,3))
print(Y.is_contiguous()) #output: True

Z = torch.as_strided(X, (2,2), (3,1), 1)
#print(Z)
print(Z.is_contiguous()) #output: False

Z2 = torch.reshape(Z, (4,1))
#print(Z2)
print(Z2.is_contiguous()) #output: True

# Y is a view of X
print(Y.untyped_storage().data_ptr() == X.untyped_storage().data_ptr()) #output: True

# Z is a view of X
print(Z.untyped_storage().data_ptr() == X.untyped_storage().data_ptr()) #output: True

# Z2 is a copy: reshaping the non-contiguous Z cannot return a view of X
print(Z2.untyped_storage().data_ptr() == X.untyped_storage().data_ptr()) #output: False

D = Y.contiguous()
# Y is already contiguous, so D is Y itself and still shares storage with X
print(D.untyped_storage().data_ptr() == X.untyped_storage().data_ptr()) #output: True

G = Z.contiguous()
# G is a copy of Z
print(G.untyped_storage().data_ptr() == X.untyped_storage().data_ptr()) #output: False


True
True
False
True
True
True
False
True
False


### in-place methods (methods with trailing underscore)
In-place methods modify the self tensor instead of returning a new tensor.
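
As a minimal sketch of the difference, the example below compares `add()` with its in-place counterpart `add_()`. The variable names are illustrative.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])
ptr_before = t.data_ptr()

# out-of-place: returns a new tensor and leaves t untouched
out_of_place = t.add(10)
print(t, out_of_place)

# in-place: modifies t itself; calls can be chained
t.add_(10).mul_(2)
print(t)

# the data still lives at the same address
print(t.data_ptr() == ptr_before)
```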

In [None]:

B = torch.tensor([[2.0, 4.0], [1.0, 5.0]])
print(B)

# zero_() fills B with zeros in place
B.zero_()
print(B)

# requires_grad_() sets the requires_grad flag of B in place
B.requires_grad_(True)
print(B)

tensor([[2., 4.],
        [1., 5.]])
tensor([[0., 0.],
        [0., 0.]])
tensor([[0., 0.],
        [0., 0.]], requires_grad=True)


### The OUT argument
The `out` argument lets you provide an existing tensor that receives the output of the operation.

In [None]:
# to write the result of an operation into the storage of an existing tensor, specify the out argument

x = torch.rand(2,3)
y = torch.ones(2,3)
z = torch.zeros(2,3)

# grab tensor's id ()
z_id = id(z)

# add x and y values and pass z as out argument
d = torch.add(x, y, alpha = 10, out = z)

# new z and old z are the same (same memory address)
print(f'does z id matches original z_id? {id(z)==z_id}\n')

# d is the same Python object as z (torch.add returns the out tensor)
print(f'is z id same as d id? {id(z)==id(d)}\n')

# memory address of the first element of tensor
print(f'address of first element in d storage:\n{d.untyped_storage().data_ptr()}\n')
print(f'address of first element in z storage:\n{z.untyped_storage().data_ptr()}\n')


does z id matches original z_id? True

is z id same as d id? True

address of first element in d storage:
116110208

address of first element in z storage:
116110208



In [None]:
# further modify z values by passing it as out argument to a function
torch.ones(2,3, out=z)
print(z)

print(id(z) == z_id)

tensor([[1., 1., 1.],
        [1., 1., 1.]])
True


In [None]:
# another example:

# define 3 tensors
x = torch.empty(2,3)
y = torch.arange(6).reshape(2,3)
z = torch.full((2,3), fill_value = 2)

# provide tensor x as out argument
w = torch.add(y, z, alpha = 10, out = x)

# compare Size and elements equality
print(torch.equal(x, w))
#output: True

# compare addresses of first element in storage
print(x.untyped_storage().data_ptr() == w.untyped_storage().data_ptr()) 
#output: True

# compare object identities (w is the same Python object as x)
print(id(x) == id(w))
#output: True


True
True
True


### Copying tensors
Use `torch.clone()` method to copy tensor values into a new storage and preserve the memory format of the input tensor.

In [None]:
a = torch.tensor([[0.2,0.7,0.3],[0.1,0.3,0.8]])

#print(a)

# an assignment creates neither a view nor a copy:
# b is a second name bound to the same tensor object as a
b = a

# let's check with Python's id() and PyTorch's data_ptr(), respectively
# a and b are the same Python object
print(id(a) == id(b))
# a and b share the same storage
print(a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr())

# use torch.clone() to create an independent copy (with its own storage) of a
c = a.clone()
# check if c and a share the same storage
print(a.untyped_storage().data_ptr() == c.untyped_storage().data_ptr())

#print(c)


True
True
False


In [None]:
# create a view of a using transpose() function
d = a.transpose(0,1)
print(d)

# verify contiguity of d
print(d.is_contiguous())

# create a separate copy of d
e = d.clone()
print(e)

# check if d (a view of a) and e (a clone of d) share the same storage
print(d.untyped_storage().data_ptr() == e.untyped_storage().data_ptr())
print(e.is_contiguous())

# create a contiguous copy of d
f = d.contiguous()
#print(f)

# check if d and f share the same storage
print(d.untyped_storage().data_ptr() == f.untyped_storage().data_ptr())

tensor([[0.2000, 0.1000],
        [0.7000, 0.3000],
        [0.3000, 0.8000]])
False
tensor([[0.2000, 0.1000],
        [0.7000, 0.3000],
        [0.3000, 0.8000]])
False
False
False
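
`clone()` also interacts with autograd: the clone stays connected to the computation graph of its source. A common idiom, sketched below with illustrative names, is `detach().clone()` when an independent copy without gradient history is wanted.

```python
import torch

src = torch.ones(3, requires_grad=True)

tracked = src.clone()             # copy that remains part of the autograd graph
untracked = src.detach().clone()  # independent copy with no autograd history

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad)

# gradients flow back through the clone to the source tensor
tracked.sum().backward()
print(src.grad)
```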


### Manipulating shapes
It's possible to modify a tensor shape while still preserving its number of elements.

The most used methods are: `torch.squeeze(input, dim=None)` and `torch.unsqueeze(input, dim)`; `torch.reshape(input, shape)`; and `torch.flatten(input, start_dim=0, end_dim=-1)`.

#### 1) `torch.squeeze(input, dim=None)` and `torch.unsqueeze(input, dim)`
Use the methods `torch.squeeze(input, dim=None)` and `torch.unsqueeze(input, dim)` to remove or add, respectively, a dimension of extent 1.

Note: `torch.squeeze` only removes dimensions of extent 1; other dimensions are left unchanged.

`dim`: for `torch.squeeze`, an int or tuple of ints selecting the dimensions to squeeze; for `torch.unsqueeze`, the index at which the new dimension is inserted

In [None]:
a = torch.arange(12).reshape(3,4)
print(a.size())

b = a.unsqueeze(dim = 0)
print(b.size())
# compare Size and elements equality
print(torch.equal(a, b))

c = b.squeeze(dim = 0)
print(c.size())
# compare Size and elements equality
print(torch.equal(b, c))

torch.Size([3, 4])
torch.Size([1, 3, 4])
False
torch.Size([3, 4])
False


In [None]:
x = torch.zeros(2, 1, 2, 1, 2)
print(x.size())

y = torch.squeeze(x, 1)
print(y.size())

torch.Size([2, 1, 2, 1, 2])
torch.Size([2, 2, 1, 2])
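
Two further cases are worth a quick sketch: calling `torch.squeeze` without `dim` removes every dimension of extent 1, and squeezing a dimension whose extent is not 1 leaves the shape unchanged. The variable names are illustrative.

```python
import torch

x = torch.zeros(2, 1, 2, 1, 2)

all_squeezed = torch.squeeze(x)        # removes both dimensions of extent 1
unchanged = torch.squeeze(x, 0)        # dimension 0 has extent 2: nothing happens
appended = torch.unsqueeze(x, -1)      # negative dim counts from the end

print(all_squeezed.shape)
print(unchanged.shape)
print(appended.shape)
```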


In [None]:
# torch.unsqueeze(input, dim) returns a new tensor with a dimension of size one inserted at the specified position.

x = torch.tensor([2, 3, 4])
print(torch.unsqueeze(x, 0))

print(torch.unsqueeze(x, 1))

tensor([[2, 3, 4]])
tensor([[2],
        [3],
        [4]])


#### 2) `torch.reshape(input, shape)`

In [None]:
# Example:

a = torch.rand(2, 3, 5)
print(a)
print(a.size())

print('\n')
b = a.reshape(2*3*5)
print(b)
print(b.size())

tensor([[[0.3385, 0.2512, 0.2215, 0.2505, 0.3464],
         [0.2060, 0.0754, 0.8791, 0.3030, 0.4763],
         [0.8611, 0.3436, 0.1828, 0.0882, 0.6567]],

        [[0.9485, 0.5098, 0.3869, 0.3874, 0.7220],
         [0.6165, 0.2554, 0.8959, 0.7414, 0.1511],
         [0.9348, 0.9783, 0.9442, 0.4174, 0.9970]]])
torch.Size([2, 3, 5])


tensor([0.3385, 0.2512, 0.2215, 0.2505, 0.3464, 0.2060, 0.0754, 0.8791, 0.3030,
        0.4763, 0.8611, 0.3436, 0.1828, 0.0882, 0.6567, 0.9485, 0.5098, 0.3869,
        0.3874, 0.7220, 0.6165, 0.2554, 0.8959, 0.7414, 0.1511, 0.9348, 0.9783,
        0.9442, 0.4174, 0.9970])
torch.Size([30])
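
`reshape` also accepts `-1` for at most one dimension, in which case that extent is inferred from the number of elements. A small sketch with illustrative names:

```python
import torch

a = torch.arange(30).reshape(2, 3, 5)

rows = a.reshape(-1, 5)    # 30 / 5 = 6 rows are inferred
pairs = a.reshape(2, -1)   # 30 / 2 = 15 columns are inferred
flat = a.reshape(-1)       # everything in one dimension

print(rows.shape, pairs.shape, flat.shape)

# a is contiguous, so these reshapes are views on the same storage
print(rows.untyped_storage().data_ptr() == a.untyped_storage().data_ptr())
```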


#### 3) `torch.flatten(input, start_dim=0, end_dim=-1)`
The method `torch.flatten(input, start_dim=0, end_dim=-1)` flattens input by reshaping it into a one-dimensional tensor. If `start_dim` or `end_dim` are passed, only the dimensions from `start_dim` to `end_dim` are flattened. This method may return the original object, a view, or a copy. (Learn more at: https://pytorch.org/docs/stable/generated/torch.flatten.html?highlight=torch+flatten#torch.flatten)

In [None]:
x = torch.arange(100).reshape(2, 5, 10)

x = torch.flatten(x)
print(x.shape)

x = torch.flatten(x, start_dim=-1)
print(x.shape)

torch.Size([100])
torch.Size([100])
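
The effect of `start_dim` and `end_dim` is easier to see when the input keeps its three dimensions. The following sketch uses illustrative names and flattens only part of the shape:

```python
import torch

t = torch.arange(100).reshape(2, 5, 10)

# keep dimension 0, merge dimensions 1 and 2
tail = torch.flatten(t, start_dim=1)
print(tail.shape)

# merge dimensions 0 and 1, keep dimension 2
head = torch.flatten(t, start_dim=0, end_dim=1)
print(head.shape)

# default arguments merge everything
everything = torch.flatten(t)
print(everything.shape)
```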


### Splitting tensors
Split tensors using the method `torch.split(tensor, split_size_or_sections, dim=0)`, where `split_size_or_sections` is an integer or a list of integers representing either the size of a single chunk or a list of sizes for each chunk.

In [None]:
a = torch.arange(10).reshape(5, 2)
print(a)
print('\n')

# split into chunks of 2 rows each
print(torch.split(a, 2))
print('\n')

# split into 2 chunks, one with 1 row and the other with 4 rows
print(torch.split(a, [1, 4]))


tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])


(tensor([[0, 1],
        [2, 3]]), tensor([[4, 5],
        [6, 7]]), tensor([[8, 9]]))


(tensor([[0, 1]]), tensor([[2, 3],
        [4, 5],
        [6, 7],
        [8, 9]]))
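
The `dim` argument selects the dimension to split along. The sketch below splits columns instead of rows and reassembles the pieces with `torch.cat`. The variable names are illustrative.

```python
import torch

m = torch.arange(12).reshape(3, 4)

# equal chunks of 2 columns each
halves = torch.split(m, 2, dim=1)
print([tuple(h.shape) for h in halves])

# explicit section sizes: 1 column, then 3 columns
sections = torch.split(m, [1, 3], dim=1)
print([tuple(s.shape) for s in sections])

# concatenating the pieces along the same dimension restores the original
print(torch.equal(torch.cat(halves, dim=1), m))
```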


### Mathematical operations with tensors

#### 1) Element-wise operations: addition, subtraction, multiplication, division, power of, exponential.

In [None]:
# add a scalar to, or subtract it from, a vector or a matrix with element-wise operations

s = torch.rand(1)
A = torch.tensor([[ 0.12,  0.15, -0.01], [-1.13, -2.04, -1.38]])
v = torch.tensor([0.87, 0.42, 0.85, 0.98, 0.31, 0.14])

print(f' Scalar s:\n{s}')
print(f' Tensor A:\n{A}')
print(f' Vector B:\n{v}')
print('\n')

# add a scalar to a vector or matrix
print(f' s + A:\n{torch.add(s, A)}')
print(f' s + v:\n{torch.add(s, v)}')

# subtract a scalar from a vector or matrix
print(f' A - s:\n{torch.subtract(A, s)}')
print(f' v - s:\n{torch.subtract(v, s)}')


 Scalar s:
tensor([0.4255])
 Tensor A:
tensor([[ 0.1200,  0.1500, -0.0100],
        [-1.1300, -2.0400, -1.3800]])
 Vector B:
tensor([0.8700, 0.4200, 0.8500, 0.9800, 0.3100, 0.1400])


 s + A:
tensor([[ 0.5455,  0.5755,  0.4155],
        [-0.7045, -1.6145, -0.9545]])
 s + v:
tensor([1.2955, 0.8455, 1.2755, 1.4055, 0.7355, 0.5655])
 A - s:
tensor([[-0.3055, -0.2755, -0.4355],
        [-1.5555, -2.4655, -1.8055]])
 v - s:
tensor([ 0.4445, -0.0055,  0.4245,  0.5545, -0.1155, -0.2855])


In [None]:
# Element-wise operations over multidimensional input tensors
# addition, subtraction, multiplication, division, power of

B = torch.tensor([[2.0, 1.0, 4.0], [3.0, 7.0, 5.0]])
C = torch.tensor([[5.0, 10.0, 2.5], [3.3, 1.5, 2.0]])
print(f'tensor B:\n{B}')
print(f'tensor C:\n{C}')

# add arguments element-wise. Equivalent to B + C
print(f'Addition (B+C):\n {torch.add(B, C)}')
# subtract arguments element-wise. Equivalent to B - C
print(f'Subtraction (B-C):\n {torch.subtract(B, C)}')
# multiply each element of the first tensor by the corresponding element of the second. Equivalent to B * C
print(f'Multiplication (BxC):\n {torch.multiply(B, C)}')
# divide each element of the first tensor by the corresponding element of the second. Equivalent to B / C
print(f'Division (B/C):\n {torch.divide(B, C)}')
# raise each element of the first tensor to the power of the corresponding element of the second. Equivalent to B ** C
print(f'Rise to the power of (B^C):\n: {torch.pow(B, C)}')


tensor B:
tensor([[2., 1., 4.],
        [3., 7., 5.]])
tensor C:
tensor([[ 5.0000, 10.0000,  2.5000],
        [ 3.3000,  1.5000,  2.0000]])
Addition (B+C):
 tensor([[ 7.0000, 11.0000,  6.5000],
        [ 6.3000,  8.5000,  7.0000]])
Subtraction (B-C):
 tensor([[-3.0000, -9.0000,  1.5000],
        [-0.3000,  5.5000,  3.0000]])
Multiplication (BxC):
 tensor([[10.0000, 10.0000, 10.0000],
        [ 9.9000, 10.5000, 10.0000]])
Division (B/C):
 tensor([[0.4000, 0.1000, 1.6000],
        [0.9091, 4.6667, 2.5000]])
Rise to the power of (B^C):
: tensor([[32.0000,  1.0000, 32.0000],
        [37.5405, 18.5203, 25.0000]])
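
The heading of this part also lists the exponential, which the cells above do not show. A minimal sketch, assuming a small tensor `t` made up for illustration, of `torch.exp` next to `torch.pow` with a scalar exponent:

```python
import torch

t = torch.tensor([0.0, 1.0, 2.0])

# exponential: e ** t for each element
exp_t = torch.exp(t)
print(exp_t)

# exponentiation with a scalar exponent, applied to every element
squared = torch.pow(t, 2)
print(squared)
```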


In [None]:
# element-wise multiplication

print(f'Multiplication (BxC) with torch.multiply():\n {torch.multiply(B, C)}')
print(f'Multiplication (BxC) with torch.mul():\n {torch.mul(B, C)}')
print(f'Multiplication (BxC) with * operator:\n {B * C}')

Multiplication (BxC) with torch.multiply():
 tensor([[10.0000, 10.0000, 10.0000],
        [ 9.9000, 10.5000, 10.0000]])
Multiplication (BxC) with torch.mul():
 tensor([[10.0000, 10.0000, 10.0000],
        [ 9.9000, 10.5000, 10.0000]])
Multiplication (BxC) with * operator:
 tensor([[10.0000, 10.0000, 10.0000],
        [ 9.9000, 10.5000, 10.0000]])


#### 2) Matrix operations (Linear Algebra): transpose, determinant, inverse, eye, matrix multiplication.

1) `torch.linalg` methods:

* `torch.linalg.det(A, *, out=None)`
* `torch.linalg.inv(A, *, out=None)`
* `torch.linalg.matmul(input, other, *, out=None)` -- equivalent to `@` operator

To learn more about `torch.linalg` see: https://pytorch.org/docs/stable/linalg.html

2) Other methods:

* `torch.t(input)`
* `torch.transpose(input, dim0, dim1)`
* `torch.eye(n, m=n)`
* `torch.Tensor.mm()`


In [None]:
# create matrix A
A = torch.rand(3,3)
print(A)
print('\n')

# compute A transpose
print(A.t())
print(torch.transpose(A, 0, 1))
print('\n')

# compute the determinant of A
print(A.det())
print(torch.linalg.det(A))
print('\n')

# compute inverse of A
print(torch.linalg.inv(A))
print('\n')

# create a 3x3 identity matrix (same size as A)
print(torch.eye(3,3))
print('\n')

# compute matrix multiplication (matrices shapes must be compatible)
print(torch.linalg.matmul(A, torch.eye(3,3)))
print('\n')
print(torch.linalg.matmul(A, torch.linalg.inv(A)))

tensor([[0.2669, 0.6401, 0.0550],
        [0.0902, 0.9186, 0.5040],
        [0.5403, 0.8227, 0.1301]])


tensor([[0.2669, 0.0902, 0.5403],
        [0.6401, 0.9186, 0.8227],
        [0.0550, 0.5040, 0.1301]])
tensor([[0.2669, 0.0902, 0.5403],
        [0.6401, 0.9186, 0.8227],
        [0.0550, 0.5040, 0.1301]])


tensor(0.0648)
tensor(0.0648)


tensor([[-4.5535, -0.5873,  4.1986],
        [ 4.0204,  0.0776, -1.9992],
        [-6.5130,  1.9478,  2.8925]])


tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])


tensor([[0.2669, 0.6401, 0.0550],
        [0.0902, 0.9186, 0.5040],
        [0.5403, 0.8227, 0.1301]])


tensor([[ 1.0000e+00, -7.4506e-09,  8.9407e-08],
        [ 2.3842e-07,  1.0000e+00,  0.0000e+00],
        [-5.9605e-08, -2.9802e-08,  1.0000e+00]])
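
The last output above is the identity matrix only up to floating-point rounding (entries such as `-7.4506e-09` instead of exact zeros). A hedged sketch of how such a result might be checked with a tolerance rather than exact equality; the matrix `M` is a small example chosen here for illustration:

```python
import torch

# a small, well-conditioned matrix chosen for illustration
M = torch.tensor([[2.0, 1.0], [1.0, 3.0]])
M_inv = torch.linalg.inv(M)

# compare with a tolerance instead of exact equality
identity = torch.eye(2)
print(torch.allclose(M @ M_inv, identity, atol=1e-6))

# det(M) = 2*3 - 1*1 = 5
print(torch.linalg.det(M))
```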


In [None]:
# the method torch.Tensor.mm() --> matrix multiplication: linear combination of vectors
# matrices shapes must be compatible

X = torch.rand(2,6)
Y = torch.arange(12, dtype = torch.float32).reshape((6,2))
Z = torch.arange(18, dtype = torch.float32).reshape((6,3))
print(torch.mm(X, Y))
print(torch.mm(X, Z))
#print(torch.mm(Y, Z)) # ERROR: mat1 and mat2 shapes cannot be multiplied (6x2 and 6x3)

tensor([[12.7122, 16.1172],
        [19.9703, 24.5590]])
tensor([[19.0684, 22.4733, 25.8783],
        [29.9555, 34.5442, 39.1329]])


In [None]:
# the @ operator --> matrix multiplication: linear combination of vectors
# matrices shapes must be compatible: A (i x n) @ B (n x j) = X (i x j)

x = torch.tensor([[2.0, 1.0, 4.0], [3.0, 7.0, 5.0]])
y = torch.arange(3, dtype = torch.float32).unsqueeze(dim = 1)

print(f'x size: {x.size()}, y size: {y.size()}')

print(x @ y)

x size: torch.Size([2, 3]), y size: torch.Size([3, 1])
tensor([[ 9.],
        [17.]])


In [None]:
# matrix multiplications with linear combination of vectors -- shapes must be compatible

D = torch.rand(2,3)
E = torch.arange(6, dtype = torch.float32).reshape((3,2))
print(D)
print(E)
print('\n')

# using torch.linalg.matmul()
print(torch.linalg.matmul(D, E))
print('\n')
# using torch.mm()
print(torch.mm(D, E))
print('\n')
# using @
print(D @ E) 

tensor([[0.5113, 0.4985, 0.0753],
        [0.2521, 0.5372, 0.5058]])
tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])


tensor([[1.2983, 2.3833],
        [3.0976, 4.3927]])


tensor([[1.2983, 2.3833],
        [3.0976, 4.3927]])


tensor([[1.2983, 2.3833],
        [3.0976, 4.3927]])


#### 3) Dot product (or inner product): the sum of the products of corresponding elements

Use the method `torch.dot(input, other, out=None)` to perform dot product between two 1D tensors with the same number of elements.

Use the method `torch.inner(input, other, out=None)` to perform dot product between tensors of higher dimensions. This method sums the product of elements from input and other along their last dimension. Note: If both input and other are non-scalars, the size of their last dimension must match.

Use the method `torch.mm(input, mat2, out=None)` to perform matrix multiplication between two 2D tensors, which takes the dot product of each row of `input` with each column of `mat2`. Note: it applies only to matrices with compatible shapes, (n x m) and (m x p), and it does not broadcast.


In [None]:
# torch.dot(input, other, out=None) -- applies to 1D tensors with the same number of elements

# 1D tensors: Dot product
print(torch.dot(torch.tensor([2, 3]), torch.tensor([2, 1])))

# Scalar input
#print(torch.dot(torch.tensor([4, 5, 6]), torch.tensor(2))) # ERROR: 1D tensors expected, but got 1D and 0D tensors


tensor(7)


In [None]:
# torch.inner(input, other, out=None) -- applies to 1D and to higher dimension tensors
# also works for scalar x tensor multiplication

# Scalar input
print(torch.inner(torch.tensor(2), torch.randn(2, 3)))
print('\n')

# 1D input tensors
# Note: The size of the last dimension must match.
print(torch.inner(torch.tensor([1, 2, 3]), torch.tensor([0, 2, 1])))
print('\n')

# Multidimensional input tensors: perform dot product along the tensors last dimension.
# Note: The size of the last dimension must match.
a = torch.randn(2, 3)
print(a)

b = torch.randn(2, 4, 3)
print(b)

print(torch.inner(a, b))

# the result is equivalent to torch.tensordot(input, other, dims=([-1], [-1]))
#print(torch.tensordot(a, b, dims=([-1], [-1])))

tensor([[ 2.5392,  1.2391,  2.6282],
        [ 1.4203, -1.5125,  3.5483]])


tensor(7)


tensor([[ 1.4918,  0.1814, -1.6719],
        [ 2.0150, -0.7804, -0.2495]])
tensor([[[ 0.8480,  2.1880, -0.7319],
         [ 1.0656, -0.4168,  0.0910],
         [-1.1119,  1.2358,  0.8453],
         [-0.6829,  0.9956, -0.1030]],

        [[-0.9585,  0.3884, -0.1473],
         [-0.3188, -2.3169, -1.2417],
         [ 0.4717, -0.7689,  1.4844],
         [ 0.6470, -0.2971, -0.0756]]])
tensor([[[ 2.8857,  1.3619, -2.8477, -0.6658],
         [-1.1130,  1.1801, -1.9177,  1.0377]],

        [[ 0.1837,  2.4498, -3.4158, -2.1273],
         [-2.1977,  1.4756,  1.1802,  1.5545]]])
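
The commented-out line in the cell above points to `torch.tensordot` as an equivalent of `torch.inner`. As a sketch of that equivalence, the block below contracts the last dimension of each tensor explicitly (positive dimension indices are used here for clarity) and repeats the contraction in `torch.einsum` index notation:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 4, 3)

inner = torch.inner(a, b)
print(inner.shape)  # torch.Size([2, 2, 4])

# contract dim 1 of a with dim 2 of b (their last dimensions)
via_tensordot = torch.tensordot(a, b, dims=([1], [2]))

# the same contraction written with index notation
via_einsum = torch.einsum('ik,jlk->ijl', a, b)

print(torch.allclose(inner, via_tensordot, atol=1e-6))
print(torch.allclose(inner, via_einsum, atol=1e-6))
```

The result shape is the shape of `a` without its last dimension followed by the shape of `b` without its last dimension.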


In [None]:
# Dot product for 1D tensors (torch.mm() does not accept 1D tensors)
print(torch.dot(torch.tensor([2, 3]), torch.tensor([2, 1])))
print(torch.inner(torch.tensor([2, 3]), torch.tensor([2, 1])))
print(torch.matmul(torch.tensor([2, 3]), torch.tensor([2, 1])))
print(torch.tensor([2, 3]) @ torch.tensor([2, 1]))

#print(torch.mm(torch.tensor([2, 3]), torch.tensor([2, 1]))) ### ERROR: self must be a matrix

tensor(7)
tensor(7)
tensor(7)
tensor(7)


In [None]:
# Dot product / matrix multiplication for multidimensional inputs with different shapes
# different shapes => torch.inner()

a = torch.randn(2, 3)
print(a)

b = torch.randn(2, 4, 3)
print(b)

print(f'\n inner operator')
print(torch.inner(a, b))

#print(f'\n mm operator')
#print(torch.mm(a, b)) # Error: mat2 must be a matrix

#print(f'\n matmul operator')
#print(torch.matmul(a, b)) # Error: mat1 and mat2 shapes cannot be multiplied (6x4 and 3x2)

#print(f'\n @ operator')
#print(a @ b) # Error: mat1 and mat2 shapes cannot be multiplied (6x4 and 3x2)

tensor([[ 1.2265,  0.9129,  0.7441],
        [ 1.1507, -2.5597,  0.6432]])
tensor([[[ 0.0679,  1.2022, -2.2437],
         [ 0.6421, -0.0036, -0.5857],
         [-0.8509,  0.1924,  0.7389],
         [-0.3911, -0.2389,  1.2887]],

        [[-2.2724,  0.3461, -0.1914],
         [-0.9517,  0.2594,  2.8443],
         [-0.9623, -0.0471,  0.5775],
         [-0.6221,  0.3150, -0.1248]]])

 inner operator
tensor([[[-0.4888,  0.3484, -0.3182,  0.2611],
         [-2.6135,  1.1860, -0.7935, -0.5683]],

        [[-4.4424,  0.3714, -0.9965,  0.9902],
         [-3.6241,  0.0702, -0.6152, -1.6025]]])
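
`torch.matmul` fails above because the last dimension of `a` (3) does not match the second-to-last dimension of `b` (4). One possible workaround, sketched here as an illustration and not as the notebook's own method, is to swap the last two dimensions of `b` first; the result then matches `torch.inner` up to the order of the first two dimensions:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 4, 3)

# swap the last two dims of b: (2, 4, 3) -> (2, 3, 4), so a @ b_t is a
# batched (2, 3) x (3, 4) product; a is broadcast across the batch
b_t = b.transpose(-1, -2)
batched = torch.matmul(a, b_t)
print(batched.shape)  # torch.Size([2, 2, 4])

# batched[k, i, j] == inner[i, k, j], so swap the first two dims to compare
inner = torch.inner(a, b)
print(torch.allclose(batched.permute(1, 0, 2), inner, atol=1e-6))
```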


#### - matrix multiplication summary

In [None]:
# element-wise multiplication: the tensors' shapes must be equal or broadcastable
B = torch.tensor([[2.0, 1.0, 4.0], [3.0, 7.0, 5.0]])
C = torch.tensor([[5.0, 10.0, 2.5], [3.3, 1.5, 2.0]])
print(f'element-wise multiplication:\n {torch.multiply(B, C)}\n')

# matrix multiplication through linear combination of vectors (row by column)
D = torch.rand(2,3)
E = torch.arange(3, dtype = torch.float32).reshape((3,1))
print(f'matrix multiplication:\n {torch.linalg.matmul(D, E)}\n')

# dot product: perform dot product along the tensors last dimension.
# Note: The size of the last dimension must match.
G = torch.randn(2, 3)
J = torch.randn(2, 4, 3)
print(f'dot product:\n {torch.inner(G, J)}\n')

element-wise multiplication:
 tensor([[10.0000, 10.0000, 10.0000],
        [ 9.9000, 10.5000, 10.0000]])

matrix multiplication:
 tensor([[1.5934],
        [1.0471]])

dot product:
 tensor([[[ 0.1091, -1.0031,  0.0366,  2.3827],
         [ 1.7492, -0.6717, -0.6313,  2.3689]],

        [[ 0.0415, -1.2001, -0.3059,  0.8838],
         [ 2.5060, -0.8551, -2.1795,  1.0900]]])



### Broadcasting
Broadcasting is a way of performing element-wise operations on tensors whose shapes differ and would otherwise be considered incompatible.

Rules:
- each tensor must have at least 1 dimension
- comparing the dimension sizes from right to left (from the last dimension to the first), each pair must satisfy one of the following:
  - 1) the sizes are equal, or
  - 2) one of the sizes is 1, or
  - 3) the dimension is absent in one of the tensors

To learn more about broadcasting semantics and backwards compatibility see https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
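
Before running an operation, the rules can be checked on the shapes alone. The sketch below uses `torch.broadcast_shapes`, which returns the broadcast shape or raises a `RuntimeError` when the shapes are incompatible; the shapes are the ones used in the cells of this section, and the flag name `compatible` is illustrative:

```python
import torch

# shapes taken from the cells in this section
print(torch.broadcast_shapes((2, 4), (1, 4)))
print(torch.broadcast_shapes((5, 3, 4, 1), (3, 1, 1)))

# incompatible shapes raise a RuntimeError
try:
    torch.broadcast_shapes((0,), (2, 2))
    compatible = True
except RuntimeError:
    compatible = False
print(compatible)
```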

In [None]:
x=torch.empty((0,))
y=torch.empty(2,2)
# x and y are not broadcastable: x is empty (its only dimension has size 0), which neither matches y's size 2 nor equals 1

#print(x * y) # ERROR: The size of tensor a (0) must match the size of tensor b (2) at non-singleton dimension 1


In [None]:
x=torch.empty(5,7,3)
y=torch.empty(5,7,3)
# same shapes are always broadcastable (i.e. the above rules always hold)
print(x * y)



tensor([[[0., 0., -0.],
         [0., -0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., -0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., nan],
         [-0., 0., 0.],
         [0., 0., 0.],
         [-0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., -0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., -0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]])


In [None]:
x = torch.rand(2,4)
y = torch.ones(1,4)
# x and y are broadcastable.
# 1st trailing dimension: x size == y size
# 2nd trailing dimension: y has size 1
print(x * (y * 2))


tensor([[0.5338, 1.0168, 1.0718, 0.7299],
        [0.0678, 1.7494, 1.1978, 1.7939]])


In [None]:
x=torch.randn(5,3,4,1)
y=torch.randn(  3,1,1)
# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist
print(x * y)


tensor([[[[-2.9360e-01],
          [ 3.9480e-01],
          [-1.9569e-01],
          [-4.7564e-02]],

         [[ 6.2501e-02],
          [-7.5487e-01],
          [ 1.2323e-01],
          [ 7.3563e-01]],

         [[ 6.3212e-03],
          [ 1.2984e-02],
          [ 1.7448e-02],
          [-1.5850e-03]]],


        [[[ 2.7100e-01],
          [ 6.0374e-01],
          [ 4.3250e-01],
          [-7.8030e-01]],

         [[ 1.2481e+00],
          [ 2.1190e-01],
          [-5.4301e-02],
          [ 4.8851e-02]],

         [[ 1.6022e-02],
          [-1.6117e-03],
          [-3.9112e-04],
          [-1.2361e-03]]],


        [[[-1.8876e-01],
          [-5.9112e-01],
          [-7.1652e-02],
          [ 5.3304e-01]],

         [[-3.4825e-01],
          [-1.7068e-01],
          [-1.0657e+00],
          [-7.0671e-01]],

         [[-6.9044e-03],
          [-2.7209e-02],
          [ 1.9179e-03],
          [ 5.6053e-03]]],


        [[[ 8.3304e-01],
          [-4.3798e-01],
          [ 8.8767e-02],
 

### The `torch.autograd` package provides methods for automatic differentiation

Use the method `torch.Tensor.backward(gradient=None, inputs=None)` to compute the gradient of the current tensor with respect to the graph leaves. Gradients are accumulated (summed) into the `.grad` attribute of the leaves.

Use the method `torch.Tensor.detach()` to return a new tensor detached from the current graph.

Use the method `torch.Tensor.item()` to return the value stored in the tensor as a Python number (applies only to tensors with a single element).

To learn more about the `torch.autograd` package refer to: http://pytorch.org/docs/stable/autograd.html

In [2]:
import torch

In [3]:
A = torch.tensor([[11]])
print(f'tensor A:\n{A}')
print(A.item())

B = torch.arange(1, 10)
print(f'tensor B:\n{B}')
print(B.sum().item())

tensor A:
tensor([[11]])
11
tensor B:
tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
45


In [4]:
# create two tensors X (variables) and P (parameters)
X = torch.randn(3,3)
P = torch.arange(1, 4, dtype=torch.float32)

# set P's requires_grad to True to enable automatic computation of gradients during the backward pass
P.requires_grad_(True)
print(f'- P tensor showing requires_grad:\n{P}')
print('\n')

# perform an operation with tensors X and P (i.e. do a forward pass)
Y = torch.multiply(X, P)
print(f'- Y tensor:\n{Y}')
# note that Y requires_grad is also True
print(f'- Y requires_grad:\n{Y.requires_grad}')
# Note: tensor Y has a grad_fn property that stores a reference to the backpropagation function
print(f'- Y grad_fn:\n{Y.grad_fn}')
print('\n')

# compute a second operation, now over previous output, i.e. tensor Y
L = Y.mean()
print(f'- L tensor:\n{L}')
# note that L requires_grad is also True
print(f'- L requires_grad:\n{L.requires_grad}')
print('\n')

# print P's grad property before the backward pass (grad = computed derivatives)
print(f'- gradient of P BEFORE backpropagation:\n{P.grad}')

# compute backpropagation
L.backward()

# print P's grad property after the backward pass (grad = computed derivatives)
print(f'- gradient of P AFTER backpropagation:\n{P.grad}')
print('\n')

### print(f'try to print the grad of a non-leaf tensor:\n{L.grad}')
# Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed.

# To NOT track the next operation in autograd, perform it within torch.no_grad()
with torch.no_grad():
  L = (P+Y).sum()

# note that L requires_grad is False
print(f'L requires_grad after computation within torch.no_grad():\n{L.requires_grad}')
print('\n')

# another way to disable gradient tracking is using the detach() method
# this method returns a new Tensor, detached from the current graph
L2 = Y.mean()
print(f'requires_grad of L2\n{L2.requires_grad}')
L2_detach = L2.detach()
print(f'requires_grad of L2.detach()\n{L2_detach.requires_grad}')

- P tensor showing requires_grad:
tensor([1., 2., 3.], requires_grad=True)


- Y tensor:
tensor([[ 0.6607,  2.7439, -1.7320],
        [ 0.3982, -0.5768, -5.8050],
        [-1.4563,  0.2384,  0.8608]], grad_fn=<MulBackward0>)
- Y requires_grad:
True
- Y grad_fn:
<MulBackward0 object at 0x7f737c77fe20>


- L tensor:
-0.5186777710914612
- L requires_grad:
True


- gradient of P BEFORE backpropagation:
None
- gradient of P AFTER backpropagation:
tensor([-0.0442,  0.1336, -0.2473])


L requires_grad after computation within torch.no_grad():
False


requires_grad of L2
True
requires_grad of L2.detach()
False
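
The gradient printed above can be checked by hand. Since `L` is the mean of `X * P` over all `N` elements, the derivative with respect to `P[j]` is the sum of column `j` of `X` divided by `N`. The sketch below repeats the forward and backward pass on fresh tensors and compares against that formula; the names `expected` and `matches` are illustrative:

```python
import torch

X = torch.randn(3, 3)
P = torch.arange(1, 4, dtype=torch.float32, requires_grad=True)

L = torch.multiply(X, P).mean()
L.backward()

# analytic gradient: dL/dP[j] = sum_i X[i, j] / N, with N = X.numel()
expected = X.sum(dim=0) / X.numel()
matches = torch.allclose(P.grad, expected)
print(matches)

# gradients accumulate across backward calls, so reset before reusing P
P.grad.zero_()
print(P.grad)
```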


### Moving tensors to GPU and CPU memory
A tensor created on the CPU can be moved to the GPU, and vice-versa, using the method `torch.Tensor.to(device=None, dtype=None)`. This method provides parameters to perform tensor dtype and/or device conversion. It returns a tensor on the target device and does not modify the original in place.

Other ways to move tensors are:
1) during construction by providing the device argument

2) using the methods: `torch.Tensor.cpu()` or `torch.Tensor.cuda()`

In [7]:
# to(), cuda() and cpu() return a copy of the tensor in CPU/GPU memory; assign the result to keep it

# at instantiation time
if torch.cuda.is_available():
    D = torch.tensor([2.,3.,4.,5.], device = 'cuda')
    print(f'Tensor in cuda')
else:
    D = torch.tensor([2.,3.,4.,5.], device = 'cpu')
    print(f'Tensor in cpu')

# using torch.Tensor.to(device) method
if torch.cuda.is_available():
    hdm = torch.device('cuda')
else:
    hdm = torch.device('cpu')
print(f'device: {hdm}')
D = D.to(hdm)  # keep the returned tensor; D is not moved in place

# using torch.Tensor.cuda(device) method
if torch.cuda.is_available():
    D = D.cuda()  # keep the returned tensor
    print(f'Tensor moved to cuda')
else:
    print(f'Cuda is not available. Using CPU')


Tensor in cpu
device: cpu
Cuda is not available. Using CPU
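
A common pattern, sketched here as an illustration, is to choose the device once and pass it around, so the same code runs with or without a GPU. The variable names (`device`, `moved`, `same`, `back`) are illustrative:

```python
import torch

# pick the device once and reuse it
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.tensor([2., 3., 4., 5.])
moved = t.to(device)      # returns a tensor on `device`; t itself is unchanged
print(moved.device.type)

# when the tensor is already on the target device, to() returns the same object
same = moved.to(device)
print(same is moved)

# bring the data back to the CPU, e.g. before converting to NumPy
back = moved.cpu()
print(back.device.type)
```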


### Saving and loading tensors
Easily save and load tensors with the methods `torch.save(torch.Tensor, file_name)` and `torch.load(file_name)`.

By convention, PyTorch files use the extensions '.pt' or '.pth'.

Learn more on serialization semantics at: https://pytorch.org/docs/stable/notes/serialization.html#saving-loading-tensors

In [None]:
y = torch.tensor([[1,2],[3,4]])
torch.save(y, 'tensor_file.pt')
torch.load('tensor_file.pt')

tensor([[1, 2],
        [3, 4]])
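
As a hedged sketch of a full round trip, the block below saves into a temporary directory (so no file is left behind), reloads the tensor, and checks that values and dtype are preserved. It also stores several tensors together in a dict; the file names and dict keys are made up for illustration:

```python
import os
import tempfile

import torch

y = torch.tensor([[1, 2], [3, 4]])

# write to a temporary directory so the example leaves no file behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tensor_file.pt')
    torch.save(y, path)
    restored = torch.load(path)

# values and dtype survive the round trip
print(torch.equal(y, restored))
print(restored.dtype)

# several tensors can be stored together in a dict
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tensors.pt')
    torch.save({'y': y, 'z': torch.zeros(2)}, path)
    bundle = torch.load(path)
print(sorted(bundle.keys()))
```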

### A few other PyTorch.torch methods

PyTorch offers a myriad of functions (over 300) to manipulate tensors. Here are a few others:

`torch.manual_seed(seed)` # sets the seed for generating random numbers. Returns a torch.Generator object.

`torch.t(torch.Tensor)` # transposes dimensions 0 and 1 of a tensor with 2 dimensions or fewer

`torch.transpose(torch.Tensor, dim0, dim1)` # returns a tensor that is a transposed version of input. The given dimensions dim0 and dim1 are swapped

`torch.permute(torch.Tensor, dims)` # returns a view of the input tensor with its dimensions permuted

`torch.reshape(input, shape)` # returns a tensor with the same data as input and the specified shape; a view when possible, otherwise a copy

`torch.select(torch.Tensor, dim, index)` # returns a view of input tensor as a slice along specified dimension and index

`torch.split(torch.Tensor, split_size_or_sections, dim=0)` # returns chunks of the input tensor along the specified dimension. Each chunk is a view of the original tensor

`torch.stack(torch.Tensors, dim=0, out=None)` # concatenates a sequence of tensors along a new dimension

`torch.eye(n, m=n)` # returns a 2-D tensor with ones on the diagonal and zeros elsewhere (n is number of rows)

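`torch.linalg.inv(A, *, out=None)` # computes the inverse of a square matrix

`torch.argmax(torch.Tensor)` # returns the index of the element with the largest value in the input tensor

`torch.sum(input, dtype=None)` # returns the sum of all elements in the input tensor

`torch.pow(input, exponent, *, out=None)` # raises each element of input to the power of exponent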

`torch.div(input, other, out=None)` # divides each element of the input tensor by the corresponding element of the other tensor

`torch.add(input, other, alpha=1, out=None)` # adds another tensor, or a scalar, scaled by alpha, to the input tensor

`torch.no_grad()`/`torch.enable_grad()` # disable/enable gradient computation
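
A short sketch of three of the functions listed above. The tensors `u` and `w` are made up for illustration, and `torch.cat` (not in the list) is included only to contrast it with `torch.stack`:

```python
import torch

# the same seed reproduces the same random numbers
torch.manual_seed(0)
first = torch.rand(3)
torch.manual_seed(0)
second = torch.rand(3)
print(torch.equal(first, second))

u = torch.tensor([1, 5, 2])
w = torch.tensor([7, 0, 3])

# stack joins along a NEW dimension, cat along an existing one
stacked = torch.stack((u, w))
joined = torch.cat((u, w))
print(stacked.shape)  # torch.Size([2, 3])
print(joined.shape)   # torch.Size([6])

# argmax without dim returns the index into the flattened tensor
largest = torch.argmax(stacked)
print(largest)  # tensor(3)
```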


### To learn more
An extensive explanation of all the methods and their usage is available in PyTorch documentation: https://pytorch.org/docs/stable/torch.html
