In [2]:
import torch
torch.version.__version__

'1.9.1+cpu'

In [3]:
#Using Tensors
x= [45,89,25,64,92]
torch.is_tensor(x)

False

In [4]:
torch.is_storage(x)

False

In [6]:
y=torch.randn(23,67,34,64,28)
torch.is_tensor(y)

True

In [7]:
torch.is_storage(y)

False

In [9]:
torch.numel(y)#the total number of elements in the input tensor

93890048

In [10]:
torch.zeros(4,4)

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [12]:
torch.numel(torch.zeros(4,4))

16

In [13]:
torch.eye(3,4)  #eye creates a matrix with ones on the main diagonal and zeros elsewhere;
#the arguments set the shape (here 3 rows and 4 columns)

tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.]])

Although PyTorch provides a large collection of libraries and modules
for computation, three modules are especially prominent.
• Autograd. This module provides automatic differentiation of tensors.
It records the operations applied to tensors and, when backward is
called, replays them in reverse to compute the gradients. This is
immensely helpful in the implementation of neural network models.
• Optim. This module provides optimization techniques that can be
used to minimize the error function for a specific model. PyTorch
supports various optimization methods, including Adam and
stochastic gradient descent (SGD).
• NN. NN stands for neural network. Defining the functions, layers,
and further computations by hand using raw tensor operations is
difficult to remember and error-prone. The NN module has a set of
built-in layers, activation functions, and loss functions, and it lets the
user define custom layers, so that manual intervention can be reduced.
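
The three modules described above can be seen working together in a minimal sketch. The toy data (y = 3x + 1), the learning rate, and the number of steps are assumptions chosen for illustration, not values from the text.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical toy data: y = 3x + 1 with no noise
inputs = torch.linspace(-1, 1, steps=20).unsqueeze(1)  # shape (20, 1)
targets = 3 * inputs + 1

model = nn.Linear(1, 1)                            # nn: a ready-made layer
loss_fn = nn.MSELoss()                             # nn: a ready-made loss function
optimizer = optim.SGD(model.parameters(), lr=0.1)  # optim: the update rule

losses = []
for step in range(200):
    optimizer.zero_grad()                   # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)  # forward pass
    loss.backward()                         # autograd: compute the gradients
    optimizer.step()                        # optim: apply the update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss printed last should be far smaller than the first one, since the layer can represent the toy relationship exactly.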

In [14]:
import numpy as np
x1 = np.array(x)
x1

array([45, 89, 25, 64, 92])

In [15]:
torch.from_numpy(x1)

tensor([45, 89, 25, 64, 92], dtype=torch.int32)

Linear space and points between the linear space can be created using
tensor operations. Let’s use an example of creating 25 points in a linear
space starting from value 2 and ending with 10.

In [23]:
torch.linspace(2, 10, steps=25) #linear spacing
#torch.linspace(3, 10, steps=5)

tensor([ 2.0000,  2.3333,  2.6667,  3.0000,  3.3333,  3.6667,  4.0000,  4.3333,
         4.6667,  5.0000,  5.3333,  5.6667,  6.0000,  6.3333,  6.6667,  7.0000,
         7.3333,  7.6667,  8.0000,  8.3333,  8.6667,  9.0000,  9.3333,  9.6667,
        10.0000])

In [25]:
#Like linear spacing, logarithmic spacing can be created.
torch.logspace(start=-10, end=10,steps=15)

tensor([1.0000e-10, 2.6827e-09, 7.1969e-08, 1.9307e-06, 5.1795e-05, 1.3895e-03,
        3.7276e-02, 1.0000e+00, 2.6827e+01, 7.1969e+02, 1.9307e+04, 5.1795e+05,
        1.3895e+07, 3.7276e+08, 1.0000e+10])

In [26]:
#random numbers from a uniform distribution between the values 0 and 1
torch.rand(10)

tensor([0.2285, 0.0642, 0.6654, 0.8246, 0.2579, 0.0115, 0.5754, 0.7466, 0.3520,
        0.0947])

The following script draws random numbers uniformly between 0 and 1
directly into a (4,5) matrix. Random numbers from a normal distribution
with arithmetic mean 0 and standard deviation 1 can be created in the
same way, as follows.

In [27]:
torch.rand(4,5)#random values between 0 and 1 in a matrix with 4 rows and 5 columns


tensor([[0.0993, 0.5152, 0.5540, 0.2413, 0.9036],
        [0.0995, 0.9740, 0.1236, 0.6967, 0.1955],
        [0.6223, 0.7113, 0.4478, 0.2681, 0.4564],
        [0.2592, 0.2132, 0.9900, 0.0598, 0.4361]])

In [28]:
#random numbers from a normal distribution,
#with mean=0 and standard deviation = 1
torch.randn(10)

tensor([-1.3228,  1.1538,  0.3841, -1.7216, -0.9077, -2.4868, -0.8622, -0.1351,
        -2.8012,  0.0199])

In [29]:
torch.randn(4,5)

tensor([[ 0.1646, -0.2210,  1.0523,  0.5416,  0.8627],
        [ 0.9016, -0.0691, -1.1163, -0.1675,  0.1547],
        [-1.1098,  1.0145, -0.4092,  1.3756,  0.1972],
        [ 0.5134,  1.0119, -1.0315, -0.8074,  1.2096]])

In [30]:
#a random permutation of the integers from 0 to 9
torch.randperm(10)  

tensor([3, 0, 5, 1, 2, 6, 7, 9, 4, 8])

When using the arange function, you can define the step size, which
places all the values at an equal distance from each other. By default,
the step size is 1. The end value is not included in the result.

In [31]:
torch.arange(10,40,2) #step size 2

tensor([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38])

In [32]:
torch.arange(10,40) #step size 1

tensor([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
        28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39])

In [34]:
#create a 2D tensor filled with zeros
torch.zeros(4,5)

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [35]:
#create a 1D tensor filled with zeros
torch.zeros(10)

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [36]:
#indexing and performing operation on the tensors
x = torch.randn(4,5)

In [37]:
x

tensor([[-1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750]])

In [38]:
#concatenate two tensors
torch.cat((x,x))

tensor([[-1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750],
        [-1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750]])

In [39]:
#concatenate n times based on array size, over column
torch.cat((x,x,x),1)

tensor([[-1.5857, -1.6895, -0.9806,  0.1532,  0.1346, -1.5857, -1.6895, -0.9806,
          0.1532,  0.1346, -1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223,  1.9240,  0.6930,  2.1854,
          0.8121, -0.4223,  1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070, -1.0471,  0.2860, -0.9779,
         -0.1609,  0.4070, -1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750, -0.5372, -0.9011, -0.1162,
          0.5809, -1.3750, -0.5372, -0.9011, -0.1162,  0.5809, -1.3750]])

In [41]:
#concatenate n times based on array size, over rows (dim=0)
torch.cat((x,x),0)

tensor([[-1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750],
        [-1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750]])

A tensor can be split into multiple chunks. Those chunks can be created
along the rows or along the columns. The following example uses a sample
tensor of size (4,4). The second argument is the number of chunks, and the
third argument selects the dimension: 0 (the default) splits along rows,
and 1 splits along columns.

In [42]:
a = torch.randn(4,4)
print(a)
torch.chunk(a,2)

tensor([[ 0.8270,  0.2627, -0.3464,  2.0845],
        [-0.0587, -1.1112,  2.4332,  0.6597],
        [-0.4120,  1.7132, -1.2971,  1.3479],
        [-1.6110,  0.9675, -1.8007, -1.1301]])


(tensor([[ 0.8270,  0.2627, -0.3464,  2.0845],
         [-0.0587, -1.1112,  2.4332,  0.6597]]),
 tensor([[-0.4120,  1.7132, -1.2971,  1.3479],
         [-1.6110,  0.9675, -1.8007, -1.1301]]))

In [43]:
torch.chunk(a,2,0)

(tensor([[ 0.8270,  0.2627, -0.3464,  2.0845],
         [-0.0587, -1.1112,  2.4332,  0.6597]]),
 tensor([[-0.4120,  1.7132, -1.2971,  1.3479],
         [-1.6110,  0.9675, -1.8007, -1.1301]]))

In [44]:
torch.chunk(a,2,1)

(tensor([[ 0.8270,  0.2627],
         [-0.0587, -1.1112],
         [-0.4120,  1.7132],
         [-1.6110,  0.9675]]),
 tensor([[-0.3464,  2.0845],
         [ 2.4332,  0.6597],
         [-1.2971,  1.3479],
         [-1.8007, -1.1301]]))

The gather function collects elements from a tensor along a given
dimension and places them in a new tensor. The positions to collect are
given by an index tensor of integer type, created here with LongTensor.


In [45]:
torch.Tensor([[11,12],[23,24]])

tensor([[11., 12.],
        [23., 24.]])

In [46]:
torch.gather(torch.Tensor([[11,12],[23,24]]),1,
             torch.LongTensor([[0,0],[1,0]]))

tensor([[11., 11.],
        [24., 23.]])

In [47]:
torch.LongTensor([[0,0],[1,0]])
#the index tensor holding the positions to gather

tensor([[0, 0],
        [1, 0]])
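
To make the indexing rule concrete, the sketch below rebuilds the gather result with explicit loops. For dim=1 the rule is out[i][j] = src[i][index[i][j]]. The variable names are illustrative only.

```python
import torch

src = torch.tensor([[11., 12.], [23., 24.]])
idx = torch.tensor([[0, 0], [1, 0]])  # integer literals give dtype torch.int64

gathered = torch.gather(src, 1, idx)

# Rebuild the same result by hand: out[i][j] = src[i][idx[i][j]]
manual = torch.empty_like(src)
for i in range(idx.size(0)):
    for j in range(idx.size(1)):
        manual[i, j] = src[i, idx[i, j]]

print(gathered)
print(torch.equal(gathered, manual))
```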

The index_select function fetches rows or columns from a tensor, given
their positions in a LongTensor. The following sample code shows two
options: selection along rows and selection along columns. If the second
argument is 0, rows are selected. If it is 1, columns are selected.

In [48]:
a=torch.randn(4,4)
print(a)

tensor([[ 0.6277, -2.0711,  0.7167,  0.9953],
        [ 0.4511,  0.4283, -1.2487,  1.8526],
        [-0.2400, -0.2030, -0.0682, -0.9124],
        [ 0.7833,  2.2397, -1.2834, -0.2132]])


In [49]:
indices = torch.LongTensor([0,2])

In [50]:
torch.index_select(a,0,indices)

tensor([[ 0.6277, -2.0711,  0.7167,  0.9953],
        [-0.2400, -0.2030, -0.0682, -0.9124]])

In [51]:
torch.index_select(a,1,indices)

tensor([[ 0.6277,  0.7167],
        [ 0.4511, -1.2487],
        [-0.2400, -0.0682],
        [ 0.7833, -1.2834]])
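
As a cross-check, the same selections can be written with ordinary indexing. This is a small sketch with a seeded random matrix; the names m and positions are illustrative.

```python
import torch

torch.manual_seed(0)
m = torch.randn(4, 4)
positions = torch.tensor([0, 2])

by_row = torch.index_select(m, 0, positions)  # rows 0 and 2
by_col = torch.index_select(m, 1, positions)  # columns 0 and 2

# Equivalent indexing expressions
print(torch.equal(by_row, m[positions]))
print(torch.equal(by_col, m[:, positions]))
```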

A common task is to locate the non-zero elements in a large tensor. The
nonzero function returns the indices of those elements.

In [54]:
torch.nonzero(torch.tensor([10,00,23,0,0,0]))

tensor([[0],
        [2]])

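Restructuring the input tensors into smaller tensors not only speeds up
the calculation process, but also helps in distributed computing. The split
function splits a long tensor into smaller tensors. The second argument is
the size of each piece; the last piece is smaller when the length does not
divide evenly.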

In [55]:
#splitting the tensor into small chunks
torch.split(torch.tensor([12,21,34,32,45,54,56,65]),2)

(tensor([12, 21]), tensor([34, 32]), tensor([45, 54]), tensor([56, 65]))

In [56]:
#splitting the tensor into small chunks
torch.split(torch.tensor([12,21,34,32,45,54,56,65]),3)

(tensor([12, 21, 34]), tensor([32, 45, 54]), tensor([56, 65]))
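
The difference between split and chunk is easy to mix up: split takes the size of each piece (or a list of sizes), while chunk takes the number of pieces. A short sketch on the same eight values:

```python
import torch

values = torch.tensor([12, 21, 34, 32, 45, 54, 56, 65])

by_size = torch.split(values, 3)           # pieces of length 3, the last one shorter
by_sizes = torch.split(values, [1, 2, 5])  # explicit lengths, which must sum to 8
by_count = torch.chunk(values, 4)          # four pieces of equal length

print([len(p) for p in by_size])
print([len(p) for p in by_sizes])
print([len(p) for p in by_count])
```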

The transpose function swaps two dimensions of a tensor, which changes
its shape. For a 2D tensor there are two ways of writing the transpose
function: .t() and .transpose().


In [57]:
#how to reshape the tensors along a new dimension

In [58]:
x

tensor([[-1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750]])

In [59]:
x.t() #transpose is one option to change the shape of the tensor

tensor([[-1.5857,  1.9240, -1.0471, -0.5372],
        [-1.6895,  0.6930,  0.2860, -0.9011],
        [-0.9806,  2.1854, -0.9779, -0.1162],
        [ 0.1532,  0.8121, -0.1609,  0.5809],
        [ 0.1346, -0.4223,  0.4070, -1.3750]])

In [60]:
x.transpose(1,0)

tensor([[-1.5857,  1.9240, -1.0471, -0.5372],
        [-1.6895,  0.6930,  0.2860, -0.9011],
        [-0.9806,  2.1854, -0.9779, -0.1162],
        [ 0.1532,  0.8121, -0.1609,  0.5809],
        [ 0.1346, -0.4223,  0.4070, -1.3750]])
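
Transposing and reshaping both change the shape, but they are not interchangeable. The sketch below uses a small integer tensor (an illustrative choice) so that the difference in element order is visible.

```python
import torch

m = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

transposed = m.t()           # swaps rows and columns
reshaped = m.reshape(3, 2)   # keeps the element order, only regroups it

print(transposed)  # [[0, 3], [1, 4], [2, 5]]
print(reshaped)    # [[0, 1], [2, 3], [4, 5]]
```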

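The unbind function removes a dimension from a tensor and returns a
tuple of the slices along that dimension. Passing 0 removes the row
dimension and returns the rows. Passing 1 removes the column dimension
and returns the columns. The original tensor is left unchanged.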

In [61]:
x

tensor([[-1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750]])

In [62]:
torch.unbind(x,1) #dim=1 returns the columns as a tuple

(tensor([-1.5857,  1.9240, -1.0471, -0.5372]),
 tensor([-1.6895,  0.6930,  0.2860, -0.9011]),
 tensor([-0.9806,  2.1854, -0.9779, -0.1162]),
 tensor([ 0.1532,  0.8121, -0.1609,  0.5809]),
 tensor([ 0.1346, -0.4223,  0.4070, -1.3750]))

In [63]:
torch.unbind(x) #dim=0 (the default) returns the rows as a tuple

(tensor([-1.5857, -1.6895, -0.9806,  0.1532,  0.1346]),
 tensor([ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223]),
 tensor([-1.0471,  0.2860, -0.9779, -0.1609,  0.4070]),
 tensor([-0.5372, -0.9011, -0.1162,  0.5809, -1.3750]))
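
A quick way to see what unbind returns is to count the pieces and then put them back together with torch.stack. The 3 x 4 tensor here is an illustrative choice.

```python
import torch

m = torch.arange(12.).reshape(3, 4)

rows = torch.unbind(m)         # dim=0: three tensors of shape (4,)
cols = torch.unbind(m, dim=1)  # dim=1: four tensors of shape (3,)

# Stacking the columns along dim=1 restores the original tensor
rebuilt = torch.stack(cols, dim=1)

print(len(rows), len(cols))
print(torch.equal(rebuilt, m))
```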

In [64]:
x

tensor([[-1.5857, -1.6895, -0.9806,  0.1532,  0.1346],
        [ 1.9240,  0.6930,  2.1854,  0.8121, -0.4223],
        [-1.0471,  0.2860, -0.9779, -0.1609,  0.4070],
        [-0.5372, -0.9011, -0.1162,  0.5809, -1.3750]])

In [65]:
#addition value to the existing tensor,scalar addition
torch.add(x,20)

tensor([[18.4143, 18.3105, 19.0194, 20.1532, 20.1346],
        [21.9240, 20.6930, 22.1854, 20.8121, 19.5777],
        [18.9529, 20.2860, 19.0221, 19.8391, 20.4070],
        [19.4628, 19.0989, 19.8838, 20.5809, 18.6250]])

In [66]:
#scalar multiplication
torch.mul(x,2)

tensor([[-3.1714, -3.3790, -1.9612,  0.3065,  0.2693],
        [ 3.8481,  1.3860,  4.3708,  1.6242, -0.8446],
        [-2.0941,  0.5721, -1.9557, -0.3219,  0.8139],
        [-1.0745, -1.8023, -0.2325,  1.1619, -2.7500]])

torch.argmin(input, dim=None, keepdim=False)
Returns the indices of the minimum values of a tensor across a dimension.

This is the second value returned by torch.min(). See its documentation for the exact semantics of this method.

Parameters:	
input (Tensor) – the input tensor
dim (int) – the dimension to reduce. If None, the argmin of the flattened input is returned.
keepdim (bool) – whether the output tensors have dim retained or not. Ignored if dim=None.

In [2]:
import torch
d= torch.randn(4,5)
d

tensor([[ 0.5031, -0.3597,  0.5115, -0.8496, -0.5085],
        [ 1.3130,  0.0022, -0.1272, -0.5685, -1.5096],
        [ 1.5656, -0.4109, -0.1015,  0.8786,  0.9383],
        [-0.0375, -0.1174,  0.5063, -0.8111,  0.1513]])

In [3]:
torch.argmin(d,dim=1)

tensor([3, 4, 1, 3])

torch.argmax(input, dim=None, keepdim=False)
Returns the indices of the maximum values of a tensor across a dimension.

This is the second value returned by torch.max(). See its documentation for the exact semantics of this method.

Parameters:	
input (Tensor) – the input tensor
dim (int) – the dimension to reduce. If None, the argmax of the flattened input is returned.
keepdim (bool) – whether the output tensors have dim retained or not. Ignored if dim=None.

In [4]:
torch.argmax(d,dim=1)

tensor([2, 0, 0, 2])
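
The relationship to torch.min() mentioned in the documentation text can be checked directly. The small scores tensor below is made up for illustration.

```python
import torch

scores = torch.tensor([[0.5, -0.3, 0.5],
                       [1.3, 0.0, -1.5]])

# torch.min with a dim returns (values, indices); the indices are the argmin
values, indices = torch.min(scores, dim=1)

flat_index = torch.argmin(scores)                  # dim=None: index into the flattened tensor
kept = torch.argmin(scores, dim=1, keepdim=True)   # keeps the reduced dimension with size 1

print(indices, flat_index, kept.shape)
```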

As in NumPy, tensor values can be rounded up with the ceiling function
or rounded down with the floor function, which is done using the
following syntax.

In [5]:
torch.manual_seed(1234)
torch.randn(5,5)

tensor([[-0.1117, -0.4966,  0.1631, -0.8817,  0.0539],
        [ 0.6684, -0.0597, -0.4675, -0.2153, -0.7141],
        [-1.0831, -0.5547,  0.9717, -0.5150,  1.4255],
        [ 0.7987, -1.4949,  1.4778, -0.1696, -0.9919],
        [-1.4569,  0.2563, -0.4030,  0.4195,  0.9380]])

In [6]:
torch.manual_seed(1234)
torch.ceil(torch.randn(5,5))

tensor([[-0., -0.,  1., -0.,  1.],
        [ 1., -0., -0., -0., -0.],
        [-1., -0.,  1., -0.,  2.],
        [ 1., -1.,  2., -0., -0.],
        [-1.,  1., -0.,  1.,  1.]])

In [7]:
torch.manual_seed(1234)
torch.floor(torch.randn(5,5))

tensor([[-1., -1.,  0., -1.,  0.],
        [ 0., -1., -1., -1., -1.],
        [-2., -1.,  0., -1.,  1.],
        [ 0., -2.,  1., -1., -1.],
        [-2.,  0., -1.,  0.,  0.]])

The values of any tensor can be limited to a certain range by using the
clamp function with its min and max arguments. Either bound can be
given on its own, or both can be given together, for a tensor of any
shape, be it 1D or 2D; 1D is the far simpler version. The following
example shows the implementation in a 2D scenario.

In [8]:
#truncate the values to a range, here 0.3 to 0.4
torch.manual_seed(1234)
torch.clamp(torch.floor(torch.randn(5,5)), min=0.3,max=0.4)

tensor([[0.3000, 0.3000, 0.3000, 0.3000, 0.3000],
        [0.3000, 0.3000, 0.3000, 0.3000, 0.3000],
        [0.3000, 0.3000, 0.3000, 0.3000, 0.4000],
        [0.3000, 0.3000, 0.4000, 0.3000, 0.3000],
        [0.3000, 0.3000, 0.3000, 0.3000, 0.3000]])

In [10]:
#truncate with only a lower limit
torch.manual_seed(1234)
torch.clamp(torch.floor(torch.randn(5,5)), min=-0.3)

tensor([[-0.3000, -0.3000,  0.0000, -0.3000,  0.0000],
        [ 0.0000, -0.3000, -0.3000, -0.3000, -0.3000],
        [-0.3000, -0.3000,  0.0000, -0.3000,  1.0000],
        [ 0.0000, -0.3000,  1.0000, -0.3000, -0.3000],
        [-0.3000,  0.0000, -0.3000,  0.0000,  0.0000]])

In [11]:
#truncate with only an upper limit
torch.manual_seed(1234)
torch.clamp(torch.floor(torch.randn(5,5)), max=0.3)

tensor([[-1.0000, -1.0000,  0.0000, -1.0000,  0.0000],
        [ 0.0000, -1.0000, -1.0000, -1.0000, -1.0000],
        [-2.0000, -1.0000,  0.0000, -1.0000,  0.3000],
        [ 0.0000, -2.0000,  0.3000, -1.0000, -1.0000],
        [-2.0000,  0.0000, -1.0000,  0.0000,  0.0000]])
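
Clamping with both bounds amounts to taking an elementwise maximum with the lower bound and then an elementwise minimum with the upper bound. The sketch below checks this on a hand-written tensor; the values are illustrative.

```python
import torch

t = torch.tensor([[-2.0, -0.5, 0.35],
                  [0.39, 0.8, 3.0]])
lo, hi = 0.3, 0.4

clamped = torch.clamp(t, min=lo, max=hi)

# Same result built from elementwise maximum and minimum
manual = torch.minimum(torch.maximum(t, torch.tensor(lo)), torch.tensor(hi))

print(clamped)
print(torch.equal(clamped, manual))
```

Values already inside the range (0.35 and 0.39 here) pass through unchanged.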

In [15]:
x=torch.randn(2,2)
x

tensor([[ 1.2550,  0.2626],
        [-0.0773,  0.2841]])

In [16]:
#compute the exponential of a tensor
torch.exp(x)

tensor([[3.5079, 1.3003],
        [0.9256, 1.3286]])

In [18]:
import numpy as np
np.exp(x)

tensor([[3.5079, 1.3003],
        [0.9256, 1.3286]])

In [19]:
#how to get the fractional part of each value in a tensor

In [20]:
torch.add(x,10)

tensor([[11.2550, 10.2626],
        [ 9.9227, 10.2841]])

In [21]:
torch.frac(torch.add(x,10))

tensor([[0.2550, 0.2626],
        [0.9227, 0.2841]])

The following syntax computes the logarithm of the values in a tensor.
The logarithm of a negative value is undefined, so those entries become
nan. The power function raises each value in a tensor to a given exponent.

In [None]:
#compute the log of the values in a tensor

In [22]:
x

tensor([[ 1.2550,  0.2626],
        [-0.0773,  0.2841]])

In [23]:
torch.log(x)

tensor([[ 0.2271, -1.3370],
        [    nan, -1.2583]])

In [24]:
#squaring with pow makes every value non-negative
torch.pow(x,2)

tensor([[1.5751, 0.0690],
        [0.0060, 0.0807]])

To compute activation functions (e.g., sigmoid, hyperbolic tangent,
radial basis function, and so forth, which are among the most commonly
used transfer functions in deep learning), you first construct a tensor
and then apply the function to it. The following sample script shows how
to apply the sigmoid function to a tensor.

In [25]:
#how to compute the sigmoid of the input tensor


In [26]:
x

tensor([[ 1.2550,  0.2626],
        [-0.0773,  0.2841]])

In [27]:
torch.sigmoid(x)

tensor([[0.7782, 0.5653],
        [0.4807, 0.5706]])
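
The sigmoid is defined as 1 / (1 + exp(-x)). A short sketch comparing that formula with the built-in function, using the rounded values of x printed above as input:

```python
import torch

t = torch.tensor([[1.2550, 0.2626],
                  [-0.0773, 0.2841]])

by_formula = 1 / (1 + torch.exp(-t))  # the definition written out
built_in = torch.sigmoid(t)

print(by_formula)
print(torch.allclose(by_formula, built_in))
```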

In [28]:
#finding the squareroot of the values
x

tensor([[ 1.2550,  0.2626],
        [-0.0773,  0.2841]])

In [29]:
torch.sqrt(x)

tensor([[1.1203, 0.5125],
        [   nan, 0.5331]])

In probability and statistics, a random variable is also known as a 
stochastic variable, whose outcome is dependent on a purely stochastic 
phenomenon, or random phenomenon. There are different types of 
probability distributions, including normal distribution, binomial 
distribution, multinomial distribution, and Bernoulli distribution. Each 
statistical distribution has its own properties.
The torch.distributions module contains probability distributions 
and sampling functions. Each distribution type has its own importance 
in a computational graph. The distributions module contains binomial, 
Bernoulli, beta, categorical, exponential, normal, and Poisson 
distributions.
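
As a minimal sketch of the torch.distributions module, the example below creates a standard normal and a Bernoulli distribution, draws samples, and evaluates a log density. The parameters (mean 0, standard deviation 1, probability 0.3) and the sample count are assumptions chosen for illustration.

```python
import torch
from torch.distributions import Normal, Bernoulli

torch.manual_seed(0)

normal = Normal(loc=0.0, scale=1.0)
draws = normal.sample((1000,))  # 1000 independent draws
log_density_at_zero = normal.log_prob(torch.tensor(0.0))  # log(1 / sqrt(2 * pi))

coin = Bernoulli(probs=0.3)
flips = coin.sample((1000,))    # each entry is 0.0 or 1.0

print(draws.mean(), log_density_at_zero, flips.mean())
```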

Problem
Weight initialization is an important task in training a neural network and 
any kind of deep learning model, such as a convolutional neural network 
(CNN), a deep neural network (DNN), and a recurrent neural network 
(RNN). The question always remains on how to initialize the weights.
Solution
Weight initialization can be done by using various methods, including 
random weight initialization. Weight initialization based on a distribution 
is done using uniform distribution, Bernoulli distribution, multinomial 
distribution, and normal distribution. How to do it using PyTorch is 
explained next

How It Works
To train a neural network, a set of initial weights is needed so that the 
forward pass can compute the loss function (and hence, the accuracy can 
be calculated) before backpropagation updates the weights. The selection 
of a method depends on the data type, the task, and the optimization 
required for the model. Here we are going to look at several approaches 
to initialize weights.
If the use case requires reproducing the same set of results to maintain 
consistency, then a manual seed needs to be set.

In [30]:
import torch

In [31]:
#how to perform random sampling of the tensors

In [32]:
torch.manual_seed(1234)

<torch._C.Generator at 0x2207ce458b0>

In [33]:
torch.manual_seed(1234)
torch.randn(4,4)

tensor([[-0.1117, -0.4966,  0.1631, -0.8817],
        [ 0.0539,  0.6684, -0.0597, -0.4675],
        [-0.2153,  0.8840, -0.7584, -0.3689],
        [-0.3424, -1.4020,  0.3206, -1.0219]])

The seed value can be customized. Without a fixed seed, the random 
numbers are generated purely by chance and differ from run to run. Random 
numbers can also be generated from a statistical distribution. The 
probability density function of the continuous uniform distribution is 
defined by the following formula.
f(x) = 1/(b-a)  for a <= x <= b,
f(x) = 0        for x < a or x > b.
The function of x has two points, a and b, in which a is the starting 
point and b is the end. In a continuous uniform distribution, each value 
in the interval has an equal chance of being selected. In the following 
example, the start is 0 and the end is 1; all 16 elements are selected 
randomly between those two values.

In [35]:
torch.Tensor(4,4).uniform_(0,1) #random number from uniform distribution

tensor([[0.2837, 0.6567, 0.2388, 0.7313],
        [0.6012, 0.3043, 0.2548, 0.6294],
        [0.9665, 0.7399, 0.4517, 0.4757],
        [0.7842, 0.1525, 0.6662, 0.3343]])
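
The same idea works for any interval from a to b. The sketch below draws from a hypothetical interval (a = 2, b = 5) in two ways: with the in-place uniform_ method, and by rescaling torch.rand, which samples between 0 and 1.

```python
import torch

torch.manual_seed(0)
a, b = 2.0, 5.0  # hypothetical interval endpoints

in_place = torch.empty(4, 4).uniform_(a, b)  # fill an empty tensor in place
scaled = a + (b - a) * torch.rand(4, 4)      # rescale draws from the unit interval

density = 1 / (b - a)  # value of f(x) inside the interval

print(in_place.min(), in_place.max(), density)
```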

In statistics, the Bernoulli distribution is a discrete probability 
distribution with two possible outcomes. If the event happens, then the 
value is 1, and if the event does not happen, then the value is 0.
For a discrete probability distribution, we calculate the probability mass 
function instead of the probability density function. The probability mass 
function is given by the following formula.
P(X = k) = q = 1 - p  for k = 0,
P(X = k) = p          for k = 1.
From the Bernoulli distribution, we create a sample tensor by first 
drawing a 4 x 4 matrix of probabilities from the uniform distribution and 
then sampling a 0 or a 1 for each entry, as follows.

    

In [36]:
torch.bernoulli(torch.Tensor(4,4).uniform_(0,1))

tensor([[0., 0., 0., 0.],
        [1., 0., 1., 0.],
        [1., 0., 1., 1.],
        [0., 0., 0., 0.]])

The following script draws sample values from a multinomial 
distribution. The input tensor holds the weight of each position, and the 
function returns the index positions that were drawn. In a multinomial 
distribution, we can sample with replacement or without replacement. 
By default, the multinomial function samples without replacement. If we 
need to sample with replacement, then we need to specify that while 
sampling.

In [38]:
torch.Tensor([10,10,13,10,34,45,65,67,87,89,87,34])

tensor([10., 10., 13., 10., 34., 45., 65., 67., 87., 89., 87., 34.])

In [42]:
torch.multinomial(torch.Tensor([10,10,13,10,34,45,65,67,87,89,87,34]), num_samples=4)

tensor([ 8, 11,  5,  6])

In [43]:
#Sampling from multinomial distribution with a replacement returns 
#the tensors’ index values
torch.multinomial(torch.Tensor([10,10,13,10,34,45,65,67,87,89,87,34]), num_samples=4,replacement=True)

tensor([11, 10,  9, 11])

Weight initialization from the normal distribution is a method that is 
used in fitting neural networks, including deep neural networks, CNNs, 
and RNNs. Let’s have a look at the process of creating a set of random 
weights generated from a normal distribution.

In [44]:
torch.normal(mean=torch.arange(1.,11.),
                            std=torch.arange(1,0,-0.1))

tensor([1.6486, 3.5888, 3.8649, 4.8705, 5.5382, 5.9608, 7.2219, 7.9747, 9.1181,
        9.8997])

In [45]:
torch.normal(mean=0.5,
                            std=torch.arange(1.,6.))

tensor([-1.2873,  0.6076,  2.9737, -1.7891, -1.9382])

In [46]:
torch.normal(mean=0.5,
                            std=torch.arange(0.2,0.6))

tensor([0.5389])
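
To connect these sampling functions to weight initialization, here is a minimal sketch that fills the parameters of a small layer from a normal and a uniform distribution using torch.nn.init. The layer size, the standard deviation 0.02, and the bias range are assumptions for illustration, not recommendations from the text.

```python
import torch
from torch import nn

torch.manual_seed(1234)  # fixed seed so the initial weights are reproducible

layer = nn.Linear(in_features=4, out_features=3)

# Overwrite the default initialization in place
nn.init.normal_(layer.weight, mean=0.0, std=0.02)  # weights ~ N(0, 0.02^2)
nn.init.uniform_(layer.bias, a=-0.1, b=0.1)        # biases ~ U(-0.1, 0.1)

print(layer.weight)
print(layer.bias)
```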

Problem
What is a variable in PyTorch and how is it defined? What is a random 
variable in PyTorch?
Solution
In PyTorch, the algorithms are represented as a computational graph. 
A variable is a wrapper around the tensor object that also holds the 
corresponding gradients and a reference to the function that created it. 
For simplicity, gradients can be thought of as the slope of the function. 
The slope of the function can be computed by the derivative of the 
function with respect to the parameters that are present in the function. 
For example, in linear regression (Y = W*X + alpha), the derivative of Y 
with respect to the weight W is X. 
Basically, a PyTorch variable is a node in a computational graph, which 
stores data and gradients. When training a neural network model, after 
each iteration, we need to compute the gradient of the loss function with 
respect to the parameters of the model, such as weights and biases. After 
that, we usually update the weights using the gradient descent algorithm. 


How It Works
An example of how a variable is used to create a computational graph is 
displayed in the following script. There are three variable objects around 
tensors (x1, x2, and x3), each of size a = 12 by b = 23 and filled with 
random values. The graph computation involves only multiplication and 
addition, and the final result with the gradient function is shown.
The partial derivative of the loss function with respect to the weights 
and biases in a neural network model is achieved in PyTorch using the 
Autograd module. Variables are specifically designed to hold the changed 
values while running a backpropagation in a neural network model when 
the parameters of the model change. The variable type is just a wrapper 
around the tensor. It has three properties: data, grad, and grad_fn (the 
function that created it).

In [47]:
from torch.autograd  import Variable
Variable(torch.ones(2,2),requires_grad=True)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

In [48]:
a,b = 12,23
x1 = Variable(torch.randn(a,b),
                        requires_grad=True)
x2 = Variable(torch.randn(a,b),
                        requires_grad=True)
x3 = Variable(torch.randn(a,b),
                        requires_grad=True)


In [49]:
c = x1 * x2
d = a+x3   #note: this adds the scalar a (12), so c does not feed into e
e = torch.sum(d)

e.backward()
print(e)

tensor(3299.6853, grad_fn=<SumBackward0>)
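
Since PyTorch 0.4, Variable and Tensor are the same type, so the graph can also be built from plain tensors created with requires_grad=True. The sketch below is a variation on the cell above, not a copy of it: as an assumption, it adds the product c to x3 so that all three tensors take part in the graph.

```python
import torch

torch.manual_seed(0)
rows, cols = 12, 23

x1 = torch.randn(rows, cols, requires_grad=True)
x2 = torch.randn(rows, cols, requires_grad=True)
x3 = torch.randn(rows, cols, requires_grad=True)

c = x1 * x2       # multiplication node
d = c + x3        # addition node (assumption: uses c rather than the scalar 12)
e = torch.sum(d)  # scalar result

e.backward()      # fills x1.grad, x2.grad, and x3.grad

# d(e)/d(x1) = x2, d(e)/d(x2) = x1, d(e)/d(x3) = 1 for every element
print(torch.equal(x1.grad, x2.detach()))
print(torch.equal(x3.grad, torch.ones(rows, cols)))
```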


Problem

How do we compute basic statistics, such as mean, median, mode, and so 
forth, from a Torch tensor?


Solution

Computation of basic statistics using PyTorch enables the user to apply 
probability distributions and statistical tests to make inferences from data. 
Though the Torch functionality is like that of NumPy, Torch functions can 
also run with GPU acceleration. Let’s have a look at the functions that 
compute basic statistics.

How It Works

The mean computation is simple to write for a 1D tensor; however, for a 2D 
tensor, an extra argument needs to be passed to the mean, median, or mode 
computation to specify the dimension across which it is computed.

In [51]:
#computing the descriptive statistics: mean
torch.mean(torch.tensor([10.,10.,13.,10.,34.,45.,65.,67.,87.,89.,87.,34.]))

tensor(45.9167)

In [52]:
#mean across rows and across columns
d =torch.randn(4,5)
d

tensor([[ 0.0981, -0.1275,  0.1092, -0.2081, -0.6718],
        [-0.7763,  1.1963, -0.4281,  0.7897, -1.5123],
        [ 0.4537, -0.5508,  0.5083, -0.0619, -0.7811],
        [ 1.3835, -0.2480,  0.2184,  0.7165, -1.0857]])

In [53]:
torch.mean(d,dim=0)

tensor([ 0.2898,  0.0675,  0.1020,  0.3091, -1.0127])

In [54]:
torch.mean(d,dim=1)

tensor([-0.1600, -0.1461, -0.0864,  0.1970])

In [56]:
#Median, mode, and standard deviation computation can be written in 
#the same way

In [57]:
#compute median
torch.median(d,dim=0)

torch.return_types.median(
values=tensor([ 0.0981, -0.2480,  0.1092, -0.0619, -1.0857]),
indices=tensor([0, 3, 0, 2, 3]))

In [58]:
torch.median(d,dim=1)

torch.return_types.median(
values=tensor([-0.1275, -0.4281, -0.0619,  0.2184]),
indices=tensor([1, 2, 3, 2]))

In [59]:
#compute the mode
torch.mode(d)

torch.return_types.mode(
values=tensor([-0.6718, -1.5123, -0.7811, -1.0857]),
indices=tensor([4, 4, 4, 4]))

In [60]:
torch.mode(d,dim=0)


torch.return_types.mode(
values=tensor([-0.7763, -0.5508, -0.4281, -0.2081, -1.5123]),
indices=tensor([1, 2, 1, 0, 1]))

In [61]:
torch.mode(d,dim=1)

torch.return_types.mode(
values=tensor([-0.6718, -1.5123, -0.7811, -1.0857]),
indices=tensor([4, 4, 4, 4]))

Standard deviation measures the spread of the values around the mean, 
which indicates the consistency of the data/variable. It shows how much 
the data fluctuates.

In [62]:
#compute the standard deviation
torch.std(d)

tensor(0.7508)

In [63]:
torch.std(d,dim=0)

tensor([0.8938, 0.7733, 0.3914, 0.5170, 0.3763])

In [64]:
torch.std(d,dim=1)

tensor([0.3180, 1.1203, 0.5797, 0.9383])

In [65]:
#compute variance
torch.var(d)

tensor(0.5636)

In [66]:
torch.var(d,dim=0)

tensor([0.7988, 0.5980, 0.1532, 0.2673, 0.1416])

In [67]:
torch.var(d,dim=1)

tensor([0.1011, 1.2552, 0.3360, 0.8804])
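
A few properties of these functions are worth checking on a small example: the variance is the square of the standard deviation, both divide by n - 1 by default, and for an even number of elements torch.median returns the lower of the two middle values. The sketch reuses the 1D tensor from the mean example.

```python
import torch

sample = torch.tensor([10., 10., 13., 10., 34., 45., 65., 67., 87., 89., 87., 34.])
n = sample.numel()

std_sample = torch.std(sample)                      # divides by n - 1 by default
var_sample = torch.var(sample)                      # also divides by n - 1
var_population = torch.var(sample, unbiased=False)  # divides by n

median_value = torch.median(sample)      # lower of the two middle values (34 and 45)
mode_value = torch.mode(sample).values   # most frequent value

print(std_sample, var_sample, var_population)
print(median_value, mode_value)
```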

Gradient Computation
Problem
How do we compute basic gradients from the sample tensors using 
PyTorch?
Solution
We are going to consider a sample dataset, where two variables (x and y) 
are present. With the initial weight given, can we computationally get the 
gradients after each iteration? Let’s take a look at the example.

How It Works
x_data and y_data are both lists. Computing the gradient for the two data 
lists requires a forward pass, a loss function, and a loop that runs them 
for every data point.
The forward function multiplies the input by the weight tensor.
In [68]:
def forward(x):
        return x * w

In [69]:
import torch
from torch.autograd import Variable

x_data = [11.0,22.0,33.0]
y_data = [21.0,14.0,64.0]

w = Variable(torch.Tensor([1.0]),  requires_grad=True) #Any random value

#Before training
print("predict (before training)",4,forward(4).data[0])

predict (before training) 4 tensor(4.)


In [70]:
#Using forward pass
def forward(x):
        return x * w


In [74]:
#define the Loss function
def  loss(x,y):
      y_pred = forward(x)
      return (y_pred - y) * (y_pred - y)

In [85]:
#Run the Training Loop
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        l = loss(x_val, y_val)
        l.backward()
        print("\tgrad:  ",x_val, y_val,w.grad.data[0])
        w.data = w.data - 0.01 * w.grad.data

        #Manually set the gradients to zero after updating weights
        w.grad.data.zero_()
    print("progress:",epoch,l.data[0])

	grad:   11.0 21.0 tensor(-220.)
	grad:   22.0 14.0 tensor(2481.6001)
	grad:   33.0 64.0 tensor(-51303.6484)
progress: 0 tensor(604238.8125)
	grad:   11.0 21.0 tensor(118461.7578)
	grad:   22.0 14.0 tensor(-671630.6875)
	grad:   33.0 64.0 tensor(13114108.)
progress: 1 tensor(3.9481e+10)
	grad:   11.0 21.0 tensor(-30279010.)
	grad:   22.0 14.0 tensor(1.7199e+08)
	grad:   33.0 64.0 tensor(-3.3589e+09)
progress: 2 tensor(2.5900e+15)
	grad:   11.0 21.0 tensor(7.7553e+09)
	grad:   22.0 14.0 tensor(-4.4050e+10)
	grad:   33.0 64.0 tensor(8.6030e+11)
progress: 3 tensor(1.6991e+20)
	grad:   11.0 21.0 tensor(-1.9863e+12)
	grad:   22.0 14.0 tensor(1.1282e+13)
	grad:   33.0 64.0 tensor(-2.2034e+14)
progress: 4 tensor(1.1146e+25)
	grad:   11.0 21.0 tensor(5.0875e+14)
	grad:   22.0 14.0 tensor(-2.8897e+15)
	grad:   33.0 64.0 tensor(5.6436e+16)
progress: 5 tensor(7.3118e+29)
	grad:   11.0 21.0 tensor(-1.3030e+17)
	grad:   22.0 14.0 tensor(7.4013e+17)
	grad:   33.0 64.0 tensor(-1.4455e+19)
progress: 6

In [86]:
#After training
print("predict ( after training)",4, forward(4).data[0])

predict ( after training) 4 tensor(-9.2687e+24)
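
In the output above, the gradients grow at every step, so the weight diverges instead of settling. For this squared-error loss, each update multiplies the distance to the per-sample target by (1 - 2 * lr * x^2), which exceeds 1 in magnitude when lr is larger than 1 / x^2. With x up to 33, that bound is about 0.0009, well below the step size of 0.01. The sketch below repeats the loop with a smaller step size; the value 1e-4 and the helper name squared_error are assumptions for illustration.

```python
import torch

x_data = [11.0, 22.0, 33.0]
y_data = [21.0, 14.0, 64.0]

w = torch.tensor([1.0], requires_grad=True)
learning_rate = 1e-4  # assumption: chosen below 1 / max(x)**2, about 9.2e-4

def squared_error(x, y):
    # same loss as above: (x * w - y)^2
    return (x * w - y) ** 2

epoch_losses = []
for epoch in range(10):
    total = 0.0
    for x_val, y_val in zip(x_data, y_data):
        loss_value = squared_error(x_val, y_val)
        loss_value.backward()
        with torch.no_grad():
            w -= learning_rate * w.grad  # update outside the graph
        w.grad.zero_()                   # reset the gradient for the next sample
        total += loss_value.item()
    epoch_losses.append(total)

print(w.item(), epoch_losses[0], epoch_losses[-1])
```

With this step size the weight stays between the smallest and largest per-sample ratio y/x, and the summed loss per epoch goes down rather than up.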


The following program shows how to compute the gradients of a loss 
function with respect to several weights, using variables that wrap tensors.

In [87]:
from torch import FloatTensor
from torch.autograd import Variable

a = Variable(FloatTensor([5]))

weights = [Variable(FloatTensor([i]), requires_grad=True) for i in (12,53,91,73)]

w1,w2,w3,w4 = weights

b = w1 * a
c = w2 * a
d = w3 * b + w4 * c
Loss = (10 - d)

Loss.backward()

for index, weight in enumerate(weights, start=1):
    # Each gradient is a one-element tensor; unpack its single value
    gradient, *_ = weight.grad.data
    print(f"Gradient of Loss w.r.t. w{index}:  {gradient}")

Gradient of Loss w.r.t. w1:  -455.0
Gradient of Loss w.r.t. w2:  -365.0
Gradient of Loss w.r.t. w3:  -60.0
Gradient of Loss w.r.t. w4:  -265.0
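
These values can be checked by hand. Substituting b and c gives Loss = 10 - (w3 * w1 * a + w4 * w2 * a), so the partial derivatives are -w3 * a, -w4 * a, -w1 * a, and -w2 * a for w1 to w4 respectively. The following sketch compares autograd against those hand-derived expressions. It uses the plain tensor API rather than Variable, and the names expected and computed are illustrative.

```python
import torch

a = torch.tensor([5.0])
w1, w2, w3, w4 = (
    torch.tensor([v], requires_grad=True) for v in (12.0, 53.0, 91.0, 73.0)
)

b = w1 * a
c = w2 * a
d = w3 * b + w4 * c
loss = 10 - d
loss.backward()

# Hand-derived partial derivatives of loss = 10 - (w3*w1*a + w4*w2*a)
expected = [
    -(91.0 * 5.0),  # d(loss)/d(w1) = -w3 * a
    -(73.0 * 5.0),  # d(loss)/d(w2) = -w4 * a
    -(12.0 * 5.0),  # d(loss)/d(w3) = -w1 * a
    -(53.0 * 5.0),  # d(loss)/d(w4) = -w2 * a
]
computed = [w.grad.item() for w in (w1, w2, w3, w4)]

for index, (auto, manual) in enumerate(zip(computed, expected), start=1):
    print(f"w{index}: autograd={auto}, by hand={manual}")
```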


Problem
How do we compute or perform operations based on variables such as 
matrix multiplication?
Solution
Tensors are wrapped within a Variable, which exposes three properties: 
requires_grad, volatile, and grad.
How It Works
Let’s create a variable and extract its properties. This is needed because 
the weight update process requires gradient computation. By using the 
torch.mm function, we can perform matrix multiplication.

In [89]:
x = Variable(torch.Tensor(4,4).uniform_(-4,5))
y = Variable(torch.Tensor(4,4).uniform_(-3,2))
#matrix multiplication
z = torch.mm(x,y)
print(z.size())

torch.Size([4, 4])
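
Both operands above are 4 x 4, so the shape rule is not visible in the result. As a small sketch with shapes picked only for illustration, torch.mm multiplies an (n x m) matrix by an (m x p) matrix and returns an (n x p) matrix, and the @ operator gives the same result for 2D tensors.

```python
import torch

x = torch.rand(4, 3)  # 4 rows, 3 columns
y = torch.rand(3, 2)  # 3 rows, 2 columns

# The inner dimensions (3 and 3) match, so the product is defined
z = torch.mm(x, y)
print(z.size())  # torch.Size([4, 2])

# For 2D tensors the @ operator computes the same matrix product
same = torch.allclose(z, x @ y)
print(same)
```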


The following program shows the properties of the variable, which is a 
wrapper around the tensor.

In [90]:
z = Variable(torch.Tensor(4,4).uniform_(-5,5))
print(z)

tensor([[ 4.5875, -2.3846,  2.2561,  2.6735],
        [ 0.6668, -3.1370, -0.3582, -0.9842],
        [ 4.9814,  3.4517,  0.5688,  2.6663],
        [-3.9159,  2.8737,  2.3478,  4.9391]])


In [91]:
print('Requires Gradient : %s ' %(z.requires_grad))
print('Volatile : %s ' %(z.volatile))
print('Gradient : %s ' %(z.grad))
print(z.data)

Requires Gradient : False 
Volatile : False 
Gradient : None 
tensor([[ 4.5875, -2.3846,  2.2561,  2.6735],
        [ 0.6668, -3.1370, -0.3582, -0.9842],
        [ 4.9814,  3.4517,  0.5688,  2.6663],
        [-3.9159,  2.8737,  2.3478,  4.9391]])
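
In PyTorch 0.4 and later, Variable and Tensor are merged, and the volatile flag no longer has any effect (it always reads as False). The sketch below shows what is assumed to be the closest equivalent with the plain tensor API: requires_grad turns gradient tracking on, grad holds the result after backward(), and torch.no_grad() takes over the role volatile once played. The names w, out, and frozen are illustrative.

```python
import torch

z = torch.empty(4, 4).uniform_(-5, 5)
print("Requires Gradient :", z.requires_grad)  # False by default
print("Gradient :", z.grad)                    # None until backward() runs

# Ask autograd to track operations on w
w = torch.ones(4, 4, requires_grad=True)
out = (w * z).sum()
out.backward()

# d(out)/d(w) is z, element by element
grad_matches = torch.equal(w.grad, z)
print("Gradient equals z :", grad_matches)

# Inside no_grad() nothing is recorded, which replaces volatile=True
with torch.no_grad():
    frozen = w * z
print("Tracked inside no_grad :", frozen.requires_grad)
```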




Problem
How do we compute or perform operations based on variables such 
as matrix-vector computation, and matrix-matrix and vector-vector 
calculation?
Solution
One of the necessary conditions for the success of matrix-based operations 
is that the shapes of the tensors match or are compatible for the 
algebraic expression being evaluated.
How It Works
A scalar is a tensor that holds just one number. A 1D tensor is a 
vector, and a 2D tensor is a matrix. Anything with n dimensions is 
simply called a tensor. When performing algebraic 
computations in PyTorch, the dimensions of a matrix and a vector or scalar 
should be compatible.
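
As a quick illustration of these terms, the sketch below builds one tensor of each kind and prints its number of dimensions and its shape. The values and variable names are arbitrary.

```python
import torch

scalar = torch.tensor(3.5)              # 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])  # 1 dimension
matrix = torch.zeros(2, 3)              # 2 dimensions
tensor3d = torch.zeros(2, 3, 4)         # 3 dimensions

examples = [
    ("scalar", scalar),
    ("vector", vector),
    ("matrix", matrix),
    ("tensor3d", tensor3d),
]
for name, t in examples:
    print(name, t.dim(), tuple(t.shape))

dims = [t.dim() for _, t in examples]
```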

In [92]:
#tensor operations

In [93]:
mat1 = torch.FloatTensor(4,4).uniform_(0,1)
mat1

tensor([[0.3580, 0.1682, 0.9159, 0.9878],
        [0.5393, 0.6097, 0.9861, 0.0763],
        [0.3469, 0.1317, 0.7387, 0.8339],
        [0.0693, 0.4194, 0.0466, 0.8690]])

In [94]:
mat2 = torch.FloatTensor(5,4).uniform_(0,1)
mat2

tensor([[0.8990, 0.6630, 0.4962, 0.4947],
        [0.8344, 0.6721, 0.1182, 0.5997],
        [0.8990, 0.8252, 0.1466, 0.1093],
        [0.8135, 0.9047, 0.2486, 0.1873],
        [0.6159, 0.2471, 0.7582, 0.6879]])

In [95]:
vec1 = torch.FloatTensor(4).uniform_(0,1)
vec1

tensor([0.8949, 0.3995, 0.3528, 0.1089])

In [96]:
mat1 + 10.5

tensor([[10.8580, 10.6682, 11.4159, 11.4878],
        [11.0393, 11.1097, 11.4861, 10.5763],
        [10.8469, 10.6317, 11.2387, 11.3339],
        [10.5693, 10.9194, 10.5466, 11.3690]])

In [97]:
#scalar subtraction


In [98]:
mat2 - 0.20

tensor([[ 0.6990,  0.4630,  0.2962,  0.2947],
        [ 0.6344,  0.4721, -0.0818,  0.3997],
        [ 0.6990,  0.6252, -0.0534, -0.0907],
        [ 0.6135,  0.7047,  0.0486, -0.0127],
        [ 0.4159,  0.0471,  0.5582,  0.4879]])

In [99]:
#vector and matrix addition

In [100]:
mat1 + vec1


tensor([[1.2530, 0.5676, 1.2687, 1.0967],
        [1.4342, 1.0092, 1.3389, 0.1852],
        [1.2418, 0.5312, 1.0915, 0.9428],
        [0.9643, 0.8189, 0.3995, 0.9779]])

In [101]:
mat2 +vec1

tensor([[1.7939, 1.0625, 0.8490, 0.6036],
        [1.7293, 1.0716, 0.4710, 0.7086],
        [1.7939, 1.2247, 0.4995, 0.2182],
        [1.7084, 1.3042, 0.6014, 0.2961],
        [1.5108, 0.6466, 1.1111, 0.7968]])

Because mat1 is 4 x 4 and mat2 is 5 x 4, their shapes differ, so they 
cannot be added or multiplied element by element. When the shapes are 
the same, or can be broadcast as with vec1 above, the operation works. 
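
The sketch below reproduces these shape rules with freshly generated random tensors of the same shapes as above. It shows that adding the length-4 vector broadcasts across the rows of either matrix, that adding the two matrices raises a RuntimeError, and that a matrix product is still possible in the order whose inner dimensions agree. The flag names are illustrative.

```python
import torch

mat1 = torch.rand(4, 4)
mat2 = torch.rand(5, 4)
vec1 = torch.rand(4)

# A length-4 vector is broadcast across each row of a matrix with 4 columns
print((mat1 + vec1).shape)  # torch.Size([4, 4])
print((mat2 + vec1).shape)  # torch.Size([5, 4])

# Element-wise addition fails: 4 rows cannot be matched with 5 rows
try:
    mat1 + mat2
    add_failed = False
except RuntimeError:
    add_failed = True
print("mat1 + mat2 failed:", add_failed)

# torch.mm(mat1, mat2) fails too: inner dimensions are 4 and 5
try:
    torch.mm(mat1, mat2)
    mm_failed = False
except RuntimeError:
    mm_failed = True
print("mm(mat1, mat2) failed:", mm_failed)

# Swapping the order makes the inner dimensions agree: (5 x 4) times (4 x 4)
prod = torch.mm(mat2, mat1)
print(prod.shape)  # torch.Size([5, 4])
```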

Problem
Knowledge of statistical distributions is essential for weight normalization, 
weight initialization, and computation of gradients in neural network–
based operations using PyTorch. How do we know which distributions to 
use and when to use them?
Solution
Each statistical distribution follows a pre-established mathematical 
formula. We are going to look at the most commonly used statistical 
distributions and their arguments in typical problem scenarios.
How It Works
The Bernoulli distribution is a special case of the binomial distribution. 
In a binomial distribution the number of trials can be more than one, but 
in a Bernoulli distribution the number of trials is exactly one. It is a 
discrete probability distribution of a random variable that takes the 
value 1 with probability p (the event is a success) and the value 0 with 
probability 1 - p (the event is a failure). A classic example is tossing a 
coin, where 1 is heads and 0 is tails. Let’s look at the program.