# TOC

__Chapter 2 - Probability distributions using PyTorch__

1. [Import](#Import)
1. [Recipe 2-1 : Sampling tensors](#Recipe-2-1-:-Sampling-tensors)
1. [Recipe 2-2 : Variable tensors](#Recipe-2-2-:-Variable-tensors)
1. [Recipe 2-3 : Basic statistics](#Recipe-2-3-:-Basic-statistics)
1. [Recipe 2-4 : Gradient computation](#Recipe-2-4-:-Gradient-computation)
1. [Recipe 2-5 : Tensor operations](#Recipe-2-5-:-Tensor-operations)
1. [Recipe 2-6 : Tensor operations 2](#Recipe-2-6-:-Tensor-operations-2)
1. [Recipe 2-7 : Distributions](#Recipe-2-7-:-Distributions)

# Import

<a id = 'Import'></a>

In [3]:
# standard library and settings
import os
import sys
import importlib
import itertools
import warnings

warnings.simplefilter("ignore")
from IPython.display import display, HTML

display(HTML("<style>.container { width:95% !important; }</style>"))

# data extensions and settings
import numpy as np

np.set_printoptions(threshold=np.inf, suppress=True)
import pandas as pd

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)
pd.options.display.float_format = "{:,.6f}".format

import torch

# visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt

# magic functions
%matplotlib inline

# Recipe 2-1 : Sampling tensors

__Problem__: There are many ways to initialize a tensor of weights in a neural network. How do we sample initial values from different probability distributions?

__Solution__: Explore the various methods for initializing weights using several different distributions, including the uniform, Bernoulli, multinomial, and normal distributions.

<a id = 'Recipe-2-1-:-Sampling-tensors'></a>

In [31]:
# create tensor of values drawn from a uniform distribution between 0 and 1
torch.Tensor(4, 4).uniform_(0, 1)

tensor([[0.7893, 0.3216, 0.5247, 0.6688],
        [0.8436, 0.4265, 0.9561, 0.0770],
        [0.4108, 0.0014, 0.5414, 0.6419],
        [0.2976, 0.7077, 0.4189, 0.0655]])
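When the goal is to initialize a network's weights, the same uniform draw is usually applied in place to a layer's parameter tensor. A minimal sketch using `torch.nn.init.uniform_`; the layer size (3 inputs, 4 outputs) and the bounds of ±0.1 are illustrative assumptions, not values taken from this recipe.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# hypothetical layer: 3 inputs, 4 outputs, so the weight has shape (4, 3)
layer = nn.Linear(3, 4)

# overwrite PyTorch's default initialization in place with U(-0.1, 0.1)
nn.init.uniform_(layer.weight, a=-0.1, b=0.1)
nn.init.zeros_(layer.bias)

print(layer.weight.shape)  # torch.Size([4, 3])
print(layer.weight)        # every entry lies in [-0.1, 0.1]
```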

In [32]:
# treat uniform values in [0, 1) as per-element probabilities and draw 0 or 1 from a Bernoulli distribution
torch.bernoulli(torch.Tensor(4, 4).uniform_(0, 1))

tensor([[1., 1., 0., 1.],
        [1., 1., 1., 1.],
        [1., 0., 0., 1.],
        [1., 1., 0., 0.]])
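Because `torch.bernoulli` treats each input element as the probability of drawing a 1, the average of many draws should approach that probability. A quick empirical check; the probability 0.3 and the sample size are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)

# 100,000 independent Bernoulli trials, each with p = 0.3 (illustrative value)
probs = torch.full((100_000,), 0.3)
draws = torch.bernoulli(probs)

# the sample mean of 0/1 outcomes estimates p
print(draws.mean())  # close to 0.3
```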

In [46]:
# draw indices from a multinomial distribution; x holds non-negative weights that need not sum to 1.
# by default, samples without replacement, so each index appears at most once
x = torch.Tensor([10, 10, 13, 10, 34, 45, 65, 67, 87, 89, 87, 34])
torch.multinomial(x, 3)

tensor([3, 5, 8])

In [49]:
# create tensor of values drawn from a multinomial distribution
# returns index values. with replacement=True, the same index can be drawn more than once
x = torch.Tensor([10, 10, 13, 10, 34, 45, 65, 67, 87, 89, 87, 34])
torch.multinomial(x, 5, replacement=True)

tensor([10,  9, 11,  5, 10])
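`torch.multinomial` treats its input as unnormalized weights: index i is drawn with probability x[i] / x.sum(). The sketch below checks this empirically with a hypothetical two-element weight vector, and confirms that sampling without replacement returns distinct indices.

```python
import torch

torch.manual_seed(0)

# hypothetical weights: index 1 should be drawn about 3/4 of the time
two_weights = torch.tensor([1.0, 3.0])
samples = torch.multinomial(two_weights, 100_000, replacement=True)
freq_of_1 = (samples == 1).float().mean()
print(freq_of_1)  # close to 0.75

# without replacement, each index appears at most once
counts = torch.tensor([10., 10., 13., 10., 34., 45., 65., 67., 87., 89., 87., 34.])
idx = torch.multinomial(counts, 5)
print(len(set(idx.tolist())) == 5)  # True
```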

In [52]:
# initialize tensor of values drawn from normal distribution
# draw ten values from a normal distribution with increasing means
# and decreasing standard deviations
torch.normal(mean=torch.arange(1.0, 11.0), std=torch.arange(1, 0, -0.1))

tensor([1.8246, 1.4849, 2.6099, 4.1361, 4.3711, 5.9630, 6.8620, 7.5366, 9.0137,
        9.8490])

In [55]:
# initialize tensor of values drawn from normal distribution
# draw five values from a normal distribution with a static mean
# and increasing standard deviations
torch.normal(mean=0.5, std=torch.arange(1.0, 6.0))

tensor([-0.3209,  2.5122, -2.6993, -1.3289,  0.9503])

In [59]:
# initialize tensor of values drawn from normal distribution
# draw one value from a normal distribution with a static mean
# and a static standard deviation (size=(1,) requests a single draw)
torch.normal(0.5, 0.1, size=(1,))

tensor([0.3846])
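`torch.normal` also accepts a scalar mean and standard deviation together with a size, and `torch.nn.init.normal_` fills an existing weight tensor in place with the same kind of draws. A sketch that checks the sample statistics against the requested parameters; the shapes, the N(2, 3²) parameters and the std of 0.02 are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 100,000 draws from N(2, 3^2) using a scalar mean, a scalar std and an explicit size
samples = torch.normal(2.0, 3.0, size=(100_000,))
print(samples.mean(), samples.std())  # close to 2.0 and 3.0

# in-place initialization of a hypothetical 500 x 400 weight matrix with N(0, 0.02^2)
weight = torch.empty(500, 400)
nn.init.normal_(weight, mean=0.0, std=0.02)
print(weight.mean(), weight.std())    # close to 0.0 and 0.02
```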

# Recipe 2-2 : Variable tensors

__Problem__: What is a variable in PyTorch and how is it defined? What is a random variable in PyTorch?

__Solution__: PyTorch represents algorithms as a computational graph. A variable wraps a tensor together with its gradient and a reference to the function that created it, so a PyTorch variable is a node in the computational graph that stores both data and gradients. For example, in simple linear regression the graph includes the one-dimensional tensors X, Y, W and alpha: X is multiplied by W to create B, and B is added to alpha to create Y. Since PyTorch 0.4, Variable has been merged into Tensor: Variable(t, requires_grad=True) still works, but it returns an ordinary tensor, equivalent to creating the tensor with requires_grad=True.

In the following example, there are three variables, x1, x2 and x3, each holding random values in a tensor of shape a × b = 12 × 23. The graph is simple and involves only multiplication and addition. In a neural network, PyTorch's Autograd module uses the same mechanism to compute the partial derivatives of the loss function with respect to the weights and biases. After backpropagation, every variable that requires gradients holds its gradient in .grad, which is what is used to update the model's parameters.

<a id = 'Recipe-2-2-:-Variable-tensors'></a>

In [61]:
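# PyTorch operations with variables
from torch.autograd import Variable

Variable(torch.ones(2, 2), requires_grad=True)

# a and b set the shape (12 x 23) of each random tensor
a, b = 12, 23

x1 = Variable(torch.randn(a, b), requires_grad=True)
x2 = Variable(torch.randn(a, b), requires_grad=True)
x3 = Variable(torch.randn(a, b), requires_grad=True)

# c is computed but not used by e, so x1.grad and x2.grad stay None after backward()
c = x1 * x2
d = a + x3
e = torch.sum(d)

# e is a scalar, so backward() needs no arguments; afterwards x3.grad is a 12 x 23 tensor of ones
e.backward()

print(e)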

tensor(3317.7900, grad_fn=<SumBackward0>)
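The linear-regression graph described in the solution (B = X · W, then Y = B + alpha) can be built the same way. A sketch using the modern `torch.tensor(..., requires_grad=True)` form; the data values are made up for illustration. Because backward() needs a scalar, Y is summed first, so the gradients are ∂sum(Y)/∂W = sum(X) and ∂sum(Y)/∂alpha = the number of elements in Y.

```python
import torch

# hypothetical inputs; only W and alpha are trainable
X = torch.tensor([1.0, 2.0, 3.0])
W = torch.tensor([0.5], requires_grad=True)
alpha = torch.tensor([0.1], requires_grad=True)

B = X * W        # multiplication node
Y = B + alpha    # addition node

# reduce Y to a scalar before calling backward()
Y.sum().backward()

print(Y.grad_fn)   # an AddBackward0 node: Y remembers the operation that created it
print(W.grad)      # tensor([6.]), equal to sum(X)
print(alpha.grad)  # tensor([3.]), one contribution per element of Y
```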


# Recipe 2-3 : Basic statistics

__Problem__: How do we calculate basic statistics such as mean, median, mode and so on?

__Solution__: PyTorch has built-in functions for these calculations, such as torch.mean, torch.median, torch.mode, torch.std and torch.var.

<a id = 'Recipe-2-3-:-Basic-statistics'></a>

In [63]:
# compute the mean of a tensor
torch.mean(
    torch.tensor(
        [10.0, 10.0, 13.0, 10.0, 34.0, 45.0, 65.0, 67.0, 87.0, 89.0, 87.0, 34.0]
    )
)

tensor(45.9167)

In [65]:
# compute the mean of each column (dim=0 reduces over the rows)
d = torch.randn(4, 5)
torch.mean(d, dim=0)

tensor([-0.2273, -0.7638, -0.5213, -0.6272, -0.0245])

In [67]:
# compute the mean of each row (dim=1 reduces over the columns)
d = torch.randn(4, 5)
torch.mean(d, dim=1)

tensor([-0.5879, -0.3620,  0.3714, -0.3910])
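With a small deterministic tensor it is easier to see which dimension dim reduces: dim=0 collapses the rows and leaves one mean per column, while dim=1 collapses the columns and leaves one mean per row. The name grid is hypothetical and chosen so the cell does not overwrite d; keepdim=True retains the reduced axis with size 1, which is convenient for broadcasting the result back.

```python
import torch

# 4 x 5 tensor holding 0, 1, ..., 19 row by row
grid = torch.arange(20.0).reshape(4, 5)

print(torch.mean(grid, dim=0))  # per-column means: 7.5, 8.5, 9.5, 10.5, 11.5
print(torch.mean(grid, dim=1))  # per-row means: 2, 7, 12, 17

# keepdim=True keeps a size-1 axis so the result broadcasts against grid
col_means = torch.mean(grid, dim=0, keepdim=True)
print(col_means.shape)          # torch.Size([1, 5])
centered = grid - col_means     # every column now has mean 0
```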

In [75]:
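# median, mode, standard deviation and variance follow the same pattern.
# torch.mode defaults to dim=-1 (the last dimension), so torch.mode(d) equals torch.mode(d, dim=1);
# with continuous random values nothing repeats, so the "mode" here carries little meaning.
# torch.std and torch.var divide by n - 1 (Bessel's correction) by default.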
print("\nMedian \n")
print(torch.median(d))
print(torch.median(d, dim=0))
print(torch.median(d, dim=1))

print("\nMode \n")
print(torch.mode(d))
print(torch.mode(d, dim=0))
print(torch.mode(d, dim=1))

print("\nStandard deviation \n")
print(torch.std(d))
print(torch.std(d, dim=0))
print(torch.std(d, dim=1))

print("\nVariance \n")
print(torch.var(d))
print(torch.var(d, dim=0))
print(torch.var(d, dim=1))


Median 

tensor(-0.5423)
(tensor([ 0.5568, -0.7525, -0.1093, -1.6368, -0.9087]), tensor([1, 3, 3, 0, 0]))
(tensor([-0.9087, -0.5524,  0.4179, -0.5423]), tensor([4, 1, 2, 4]))

Mode 

(tensor([-1.6368, -1.3341,  0.0235, -2.2679]), tensor([3, 4, 0, 3]))
(tensor([ 0.0235, -0.8017, -1.0633, -2.2679, -1.3341]), tensor([2, 0, 0, 3, 1]))
(tensor([-1.6368, -1.3341,  0.0235, -2.2679]), tensor([3, 4, 0, 3]))

Standard deviation 

tensor(1.0127)
tensor([0.7901, 0.4708, 0.6921, 1.2584, 0.8070])
tensor([1.1953, 0.8142, 0.2539, 1.4317])

Variance 

tensor(1.0255)
tensor([0.6242, 0.2217, 0.4790, 1.5835, 0.6513])
tensor([1.4287, 0.6629, 0.0645, 2.0497])
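A deterministic check of a few details that random data hides, reusing the twelve values from the mean example above: torch.var divides by n - 1, torch.median returns the lower of the two middle values when the count is even (torch.quantile at 0.5 averages them instead), and torch.mode is meaningful once values repeat. The variable names are illustrative.

```python
import torch

vals = torch.tensor([10.0, 10.0, 13.0, 10.0, 34.0, 45.0, 65.0, 67.0, 87.0, 89.0, 87.0, 34.0])
n = vals.numel()

# torch.var uses Bessel's correction (divide by n - 1) by default
manual_var = ((vals - vals.mean()) ** 2).sum() / (n - 1)
print(torch.allclose(torch.var(vals), manual_var))  # True

# sorted middle values are 34 and 45: median takes the lower one, quantile interpolates
print(torch.median(vals))          # tensor(34.)
print(torch.quantile(vals, 0.5))   # 39.5

# 10 occurs three times, more than any other value
values, index = torch.mode(vals)
print(values)                      # tensor(10.)
```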


# Recipe 2-4 : Gradient computation

__Problem__: How do we compute basic gradients from sample tensors using PyTorch?

__Solution__: We will use a small sample dataset, datase0074, with two variables, x and y, stored in two lists, x_data and y_data. Starting from an initial weight, we can inspect the gradient after each update. Computing the gradients requires a forward pass, a loss function, and a loop that runs backpropagation and updates the weight.

<a id = 'Recipe-2-4-:-Gradient-computation'></a>

In [85]:
# forward pass: a linear model without bias, y_pred = x * w (w is the global defined below)
def forward(x):
    return x * w


x_data = [11.0, 22.0, 33.0]
y_data = [21.0, 14.0, 64.0]

w = Variable(torch.Tensor([1.0]), requires_grad=True)

# before training
print("predict (before training): {} {}".format(4, forward(4).data[0]))

predict (before training): 4 4.0


In [86]:
# squared-error loss for a single (x, y) pair
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

In [87]:
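# run training loop (stochastic gradient descent, one sample at a time)
# note: for these inputs a learning rate of 0.01 is too large, so the weight oscillates
# with growing amplitude and the loss diverges, as the output below shows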
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        l = loss(x_val, y_val)
        l.backward()
        print("\tgrad: {} {} {}".format(x_val, y_val, w.grad.data[0]))
        w.data = w.data - 0.01 * w.grad.data

        # manually set the gradients to zero after updating weights
        w.grad.data.zero_()

    print("progress: {} {}".format(epoch, l.data[0]))

	grad: 11.0 21.0 -220.0
	grad: 22.0 14.0 2481.60009765625
	grad: 33.0 64.0 -51303.6484375
progress: 0 604238.8125
	grad: 11.0 21.0 118461.7578125
	grad: 22.0 14.0 -671630.6875
	grad: 33.0 64.0 13114108.0
progress: 1 39481139200.0
	grad: 11.0 21.0 -30279010.0
	grad: 22.0 14.0 171986000.0
	grad: 33.0 64.0 -3358889472.0
progress: 2 2590022582665216.0
	grad: 11.0 21.0 7755301376.0
	grad: 22.0 14.0 -44050112512.0
	grad: 33.0 64.0 860298674176.0
progress: 3 1.6990675778403933e+20
	grad: 11.0 21.0 -1986333900800.0
	grad: 22.0 14.0 11282376818688.0
	grad: 33.0 64.0 -220344807849984.0
progress: 4 1.114596779703609e+25
	grad: 11.0 21.0 508751660449792.0
	grad: 22.0 14.0 -2889709562888192.0
	grad: 33.0 64.0 5.643602918322995e+16
progress: 5 7.311812297356367e+29
	grad: 11.0 21.0 -1.3030450598720307e+17
	grad: 22.0 14.0 7.40129586448171e+17
	grad: 33.0 64.0 -1.445473181358044e+19
progress: 6 4.79658536356913e+34
	grad: 11.0 21.0 3.3374367042303427e+19
	grad: 22.0 14.0 -1.8956639512458114e+20
	...

In [88]:
# after training
print("predict (after training) {} {}".format(4, forward(4).data[0]))

predict (after training) 4 -9.268691075357862e+24
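The loss explodes because the learning rate is too large for these inputs. For a single sample the update is w ← w − lr · 2x(xw − y) = w(1 − 2·lr·x²) + 2·lr·x·y, which only contracts when lr < 1/x²; for x = 33 that bound is about 9.2e-4, so lr = 0.01 overshoots further on every step. A minimal sketch of a stable alternative, assuming we switch to full-batch gradient descent (summing the loss over all three points) with an illustrative learning rate of 1e-4, below the full-batch bound 1/Σx² = 1/1694 ≈ 5.9e-4. The weight then converges to the least-squares solution Σxy / Σx² = 2651/1694 ≈ 1.565.

```python
import torch

# same data as x_data / y_data above, stored as tensors
xs = torch.tensor([11.0, 22.0, 33.0])
ys = torch.tensor([21.0, 14.0, 64.0])

w_gd = torch.tensor([1.0], requires_grad=True)
lr = 1e-4  # assumed value, chosen below the stability bound 1 / sum(xs**2)

for epoch in range(100):
    loss_val = ((xs * w_gd - ys) ** 2).sum()  # full-batch squared error
    loss_val.backward()
    with torch.no_grad():
        w_gd -= lr * w_gd.grad                # gradient descent step, not tracked by autograd
    w_gd.grad.zero_()                         # reset before the next backward()

# closed-form least-squares weight for y ≈ w * x
w_star = (xs * ys).sum() / (xs ** 2).sum()
print(w_gd.item(), w_star.item())  # both about 1.565
```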


In [95]:
# compute the gradients of a loss with respect to several weights using Variable

from torch import FloatTensor
from torch.autograd import Variable

a = Variable(FloatTensor([5]))

weights = [Variable(FloatTensor([i]), requires_grad=True) for i in (12, 53, 91, 73)]

w1, w2, w3, w4 = weights

b = w1 * a
c = w2 * a
d = w3 * b + w4 * c
Loss = 10 - d
Loss.backward()

for index, weight in enumerate(weights, start=1):
    gradient, *_ = weight.grad.data
    print("gradient of Loss w.r.t. w{}: {}".format(index, gradient))

gradient of Loss w.r.t. w1: -455.0
gradient of Loss w.r.t. w2: -365.0
gradient of Loss w.r.t. w3: -60.0
gradient of Loss w.r.t. w4: -265.0
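These values follow from the chain rule. Substituting b = w1·a and c = w2·a into d gives Loss = 10 − a(w1·w3 + w2·w4), and differentiating with respect to each weight with a = 5:

```latex
\begin{aligned}
\text{Loss} &= 10 - d = 10 - (w_3 b + w_4 c) = 10 - a\,(w_1 w_3 + w_2 w_4) \\
\frac{\partial \text{Loss}}{\partial w_1} &= -a\,w_3 = -5 \cdot 91 = -455 \\
\frac{\partial \text{Loss}}{\partial w_2} &= -a\,w_4 = -5 \cdot 73 = -365 \\
\frac{\partial \text{Loss}}{\partial w_3} &= -b = -a\,w_1 = -5 \cdot 12 = -60 \\
\frac{\partial \text{Loss}}{\partial w_4} &= -c = -a\,w_2 = -5 \cdot 53 = -265
\end{aligned}
```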


# Recipe 2-5 : Tensor operations
__Problem__: How do we compute or perform operations on variables, such as matrix multiplication?

__Solution__: A tensor wrapped in Variable exposes properties such as data, requires_grad and grad, as well as the legacy volatile flag, which has had no effect since PyTorch 0.4 (gradient tracking is now disabled with torch.no_grad()). We can create variables, inspect their properties, and use the torch.mm function to perform matrix multiplication.

<a id = 'Recipe-2-5-:-Tensor-operations'></a>

In [97]:
# create two variables and perform matrix multiplication
x = Variable(torch.Tensor(4, 4).uniform_(-4, 5))
y = Variable(torch.Tensor(4, 4).uniform_(-3, 2))
z = torch.mm(x, y)
print(z.size())
print(z)

torch.Size([4, 4])
tensor([[ 2.8845, -2.5487, -3.4192,  0.1215],
        [ 0.5551, -1.8283, -6.6794, -1.5815],
        [ 5.1654, -3.0978, -0.3022,  2.0202],
        [-1.3079,  4.3098, -1.1989,  0.6112]])


In [100]:
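# print properties of variable
print("requires gradient: {}".format(z.requires_grad))
# volatile has had no effect since PyTorch 0.4 and always reads False; use torch.no_grad() instead
print("volatile: {}".format(z.volatile))
# z does not require gradients, so no gradient is ever stored and grad is None
print("gradient: {}".format(z.grad))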
print(z.data)

requires gradient: False
volatile: False
gradient: None
tensor([[ 2.8845, -2.5487, -3.4192,  0.1215],
        [ 0.5551, -1.8283, -6.6794, -1.5815],
        [ 5.1654, -3.0978, -0.3022,  2.0202],
        [-1.3079,  4.3098, -1.1989,  0.6112]])
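In the cell above requires_grad is False because neither input asked for gradients. A sketch of the modern equivalent, with hypothetical tensors p and q: when an input requires gradients the product is tracked, torch.no_grad() takes over the role of the old volatile flag, and backward() fills in the gradient.

```python
import torch

torch.manual_seed(0)

# leaf tensor that requires gradients (modern form of Variable(..., requires_grad=True))
p = torch.rand(4, 4, requires_grad=True)
q = torch.rand(4, 4)

r = p @ q                     # for 2-D tensors, @ computes the same product as torch.mm
print(r.requires_grad)        # True: r depends on p, so it is tracked
print(r.grad_fn is not None)  # True: r records the operation that created it

# torch.no_grad() replaces volatile=True: nothing computed inside is tracked
with torch.no_grad():
    r_eval = p @ q
print(r_eval.requires_grad)   # False

# d(sum(p @ q)) / dp[i, k] = sum_j q[k, j], the same for every row i
r.sum().backward()
print(torch.allclose(p.grad, q.sum(dim=1).expand(4, 4)))  # True
```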


# Recipe 2-6 : Tensor operations 2
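__Problem__: How do we perform operations on variables, such as matrix-vector, matrix-matrix and vector-vector operations?

__Solution__: Arithmetic operators such as `+` and `*` act element-wise and follow PyTorch's broadcasting rules, while matrix products follow the rules of linear algebra and use torch.mm (matrix-matrix), torch.mv (matrix-vector), torch.dot (vector-vector) or the `@` operator.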

<a id = 'Recipe-2-6-:-Tensor-operations-2'></a>

In [109]:
# define tensors
mat1 = torch.FloatTensor(4, 4).uniform_(0, 1)
mat1

tensor([[0.7786, 0.4222, 0.8917, 0.9283],
        [0.3969, 0.6694, 0.4934, 0.5396],
        [0.9355, 0.6115, 0.9139, 0.6781],
        [0.4118, 0.9431, 0.1685, 0.1206]])

In [110]:
# define tensors
mat2 = torch.FloatTensor(5, 4).uniform_(0, 1)
mat2

tensor([[0.7083, 0.8576, 0.1535, 0.5538],
        [0.2005, 0.7325, 0.1566, 0.0832],
        [0.4326, 0.9626, 0.3203, 0.0698],
        [0.3634, 0.6819, 0.7408, 0.9494],
        [0.6214, 0.7851, 0.4899, 0.3554]])

In [111]:
# define tensors
vec1 = torch.FloatTensor(4).uniform_(0, 1)
vec1

tensor([0.4807, 0.9386, 0.6107, 0.6612])

In [112]:
# scalar addition
mat1 + 10.5

tensor([[11.2786, 10.9222, 11.3917, 11.4283],
        [10.8969, 11.1694, 10.9934, 11.0396],
        [11.4355, 11.1115, 11.4139, 11.1781],
        [10.9118, 11.4431, 10.6685, 10.6206]])

In [113]:
# scalar subtraction
mat2 - 0.20

tensor([[ 5.0835e-01,  6.5757e-01, -4.6478e-02,  3.5376e-01],
        [ 5.3887e-04,  5.3246e-01, -4.3382e-02, -1.1684e-01],
        [ 2.3256e-01,  7.6260e-01,  1.2025e-01, -1.3018e-01],
        [ 1.6341e-01,  4.8187e-01,  5.4082e-01,  7.4939e-01],
        [ 4.2141e-01,  5.8513e-01,  2.8990e-01,  1.5540e-01]])

In [114]:
# vector and matrix addition: vec1 is broadcast and added to every row of mat1
mat1 + vec1

tensor([[1.2593, 1.3609, 1.5024, 1.5895],
        [0.8777, 1.6080, 1.1041, 1.2008],
        [1.4162, 1.5501, 1.5246, 1.3394],
        [0.8926, 1.8817, 0.7792, 0.7818]])

In [115]:
# vector and matrix addition: vec1 is broadcast and added to every row of mat2
mat2 + vec1

tensor([[1.1891, 1.7962, 0.7642, 1.2150],
        [0.6812, 1.6711, 0.7673, 0.7444],
        [0.9133, 1.9012, 0.9309, 0.7310],
        [0.8441, 1.6205, 1.3515, 1.6106],
        [1.1021, 1.7238, 1.1006, 1.0166]])
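Adding the length-4 vec1 to a 4 × 4 or 5 × 4 matrix works because of broadcasting: the vector is treated as a 1 × 4 row and repeated down every row. A small sketch with hypothetical values makes the equivalence explicit, and shows that a vector whose length does not match the last dimension raises a RuntimeError.

```python
import torch

m = torch.arange(12.0).reshape(3, 4)
v = torch.tensor([100.0, 200.0, 300.0, 400.0])

broadcast_sum = m + v
explicit_sum = m + v.expand(3, 4)  # v repeated (as a view) for each of the 3 rows

print(torch.equal(broadcast_sum, explicit_sum))  # True
print(broadcast_sum[2])                          # tensor([108., 209., 310., 411.])

# a length-5 vector cannot be broadcast against 4 columns
mismatch_raised = False
try:
    m + torch.ones(5)
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)  # True
```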

In [116]:
# element-wise (Hadamard) product, not matrix multiplication; use torch.mm(mat1, mat1) or mat1 @ mat1 for that
mat1 * mat1

tensor([[0.6062, 0.1783, 0.7951, 0.8617],
        [0.1576, 0.4481, 0.2434, 0.2911],
        [0.8751, 0.3739, 0.8352, 0.4599],
        [0.1696, 0.8894, 0.0284, 0.0145]])
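The problem also asks for matrix-vector, matrix-matrix and vector-vector products, which the cells above do not show. A sketch with random tensors shaped like mat1 (4 × 4), mat2 (5 × 4) and vec1 (length 4); the names A, B and v are hypothetical. Note that torch.mm(B, A) works because the inner dimensions (4 and 4) match, whereas torch.mm(A, B) would fail.

```python
import torch

torch.manual_seed(0)

A = torch.rand(4, 4)  # shaped like mat1
B = torch.rand(5, 4)  # shaped like mat2
v = torch.rand(4)     # shaped like vec1

mat_vec = torch.mv(A, v)    # matrix-vector: (4 x 4) by (4,) -> (4,)
mat_mat = torch.mm(B, A)    # matrix-matrix: (5 x 4) by (4 x 4) -> (5 x 4)
vec_vec = torch.dot(v, v)   # vector-vector dot product -> 0-dim tensor
hadamard = A * A            # element-wise product, for contrast -> (4 x 4)

print(mat_vec.shape, mat_mat.shape, vec_vec.shape, hadamard.shape)
print(torch.allclose(mat_vec, A @ v), torch.allclose(mat_mat, B @ A))  # True True
```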

# Recipe 2-7 : Distributions
__Problem__: Understanding statistical distributions is essential for weight normalization / initialization and computation of gradients in neural network-based operations. How do we know which distribution to use?

__Solution__: Each distribution follows an established mathematical formula. We can utilize PyTorch's implementation of these distributions to explore how to apply them to different scenarios.

<a id = 'Recipe-2-7-:-Distributions'></a>

In [137]:
# Bernoulli distribution - a binary outcome: 1 with probability p and 0 with probability 1 - p
from torch.distributions import Bernoulli

# create a Bernoulli distribution parameterized by a tensor of probabilities (one p per element)
dist = Bernoulli(torch.tensor([0.3, 0.6, 0.9]))
dist.sample()

tensor([1., 0., 1.])
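Distribution objects do more than sample: they expose analytic moments and log-probabilities. A sketch reusing the probabilities above; the observed outcomes (1, 0, 1) are an arbitrary example.

```python
import torch
from torch.distributions import Bernoulli

dist_b = Bernoulli(torch.tensor([0.3, 0.6, 0.9]))

print(dist_b.mean)  # equal to the probabilities: 0.3, 0.6, 0.9

# log-probability of observing 1, 0, 1 under each element's p
lp = dist_b.log_prob(torch.tensor([1.0, 0.0, 1.0]))
print(lp.exp())     # probabilities p, 1 - p, p: 0.3, 0.4, 0.9
```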

In [146]:
# beta distribution - a family of continuous distributions defined on the interval [0, 1],
# parameterized by two concentrations (alpha, beta); often used in Bayesian inference as a prior for probabilities
from torch.distributions.beta import Beta

dist = Beta(torch.tensor([0.5]), torch.tensor([0.5]))
dist.sample()

tensor([0.5445])
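For Beta(α, β) the mean is α / (α + β), and every draw lies between 0 and 1. A sketch comparing the analytic mean with an empirical one; α = 2 and β = 5 are arbitrary illustrative values, not the (0.5, 0.5) used above.

```python
import torch
from torch.distributions.beta import Beta

torch.manual_seed(0)

# concentration1 (alpha) = 2, concentration0 (beta) = 5, illustrative values
dist_beta = Beta(torch.tensor([2.0]), torch.tensor([5.0]))
draws = dist_beta.sample((100_000,))  # shape (100000, 1)

print(dist_beta.mean)  # alpha / (alpha + beta) = 2/7, about 0.2857
print(draws.mean())    # close to 0.2857
```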

In [150]:
# binomial distribution - applicable when each trial has two outcomes and the experiment is repeated.
# it models the number of successes over a fixed number of independent trials. it is a
# discrete probability distribution in which success is coded as 1 and failure as 0,
# with probability p of success on each trial.
from torch.distributions.binomial import Binomial

dist = Binomial(100, torch.tensor([0, 0.2, 0.8, 1]))

# 100 trials, returns count of successes given a tensor of probabilities
dist.sample()

tensor([  0.,  19.,  81., 100.])
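The sampled counts (19 and 81 for p = 0.2 and 0.8) sit near the expected values n·p, with spread given by the variance n·p·(1 − p). A sketch using the same parameters; the number of repeated draws is an arbitrary choice.

```python
import torch
from torch.distributions.binomial import Binomial

torch.manual_seed(0)

dist_bin = Binomial(100, torch.tensor([0.0, 0.2, 0.8, 1.0]))

print(dist_bin.mean)      # n * p: 0, 20, 80, 100
print(dist_bin.variance)  # n * p * (1 - p): 0, 16, 16, 0

# averaging many draws approaches the mean
draws = dist_bin.sample((10_000,))  # shape (10000, 4)
print(draws.mean(dim=0))  # roughly 0, 20, 80, 100
```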

In [164]:
# categorical distribution - can be thought of as a generalized Bernoulli distribution, which can be
# expanded to > 2 outcomes; sampling returns the index of the chosen outcome
from torch.distributions.categorical import Categorical

# pass in a tensor of probabilities for n outcomes; they should sum to 1.0
dist = Categorical(torch.tensor([0.20, 0.20, 0.20, 0.20, 0.20]))
dist.sample()

tensor(2)
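With equal probabilities every index is equally likely; unequal probabilities shift the sampling frequencies accordingly. A sketch with a hypothetical three-outcome distribution, counting how often each index is drawn.

```python
import torch
from torch.distributions.categorical import Categorical

torch.manual_seed(0)

# hypothetical unequal probabilities over 3 outcomes
dist_cat = Categorical(torch.tensor([0.5, 0.3, 0.2]))
draws = dist_cat.sample((100_000,))  # integer category indices 0, 1, 2

# empirical frequency of each category
freq = torch.bincount(draws, minlength=3).float() / draws.numel()
print(freq)  # close to [0.5, 0.3, 0.2]
```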

In [176]:
# Laplace distribution - also known as the double exponential distribution. often used in speech
# recognition systems to model prior probabilities, and in Bayesian regression as a prior.
# parameterized by a location (here 10.0) and a scale (here 0.990)
from torch.distributions.laplace import Laplace

dist = Laplace(torch.tensor([10.0]), torch.tensor([0.990]))
dist.sample()

tensor([9.6696])
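The Laplace location is also its mean, and its variance is 2·scale². A sketch checking both against samples, using the same parameters as above; the sample size is arbitrary.

```python
import torch
from torch.distributions.laplace import Laplace

torch.manual_seed(0)

dist_lap = Laplace(torch.tensor([10.0]), torch.tensor([0.990]))

print(dist_lap.mean)      # loc = 10
print(dist_lap.variance)  # 2 * scale**2 = 1.9602

draws = dist_lap.sample((100_000,))
print(draws.mean(), draws.var())  # close to 10 and 1.96
```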

In [183]:
# normal distribution - defined by a mean and a standard deviation. if we know both parameters,
# we can compute the probability of any range of outcomes.
from torch.distributions.normal import Normal

dist = Normal(torch.tensor([100.0]), torch.tensor([10.0]))
dist.sample()

tensor([94.8375])
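Knowing the mean and standard deviation lets us turn the distribution into event probabilities through its cumulative distribution function. A sketch using the same N(100, 10²); the interval (90, 110), one standard deviation either side of the mean, is an illustrative choice.

```python
import torch
from torch.distributions.normal import Normal

dist_norm = Normal(torch.tensor([100.0]), torch.tensor([10.0]))

# P(90 < X < 110): probability of landing within one standard deviation of the mean
p_within_1sd = dist_norm.cdf(torch.tensor([110.0])) - dist_norm.cdf(torch.tensor([90.0]))
print(p_within_1sd)  # about 0.6827

# log-density at the mean equals -log(sigma * sqrt(2 * pi)), about -3.2215
print(dist_norm.log_prob(torch.tensor([100.0])))
```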