# Data Manipulation

Tensor:
- GPUs are well supported for accelerating tensor computations, whereas NumPy only supports CPU computation
- The Tensor class supports automatic differentiation

## Getting started

In [1]:
import torch

A new tensor is stored in the main memory by default and designated for CPU-based computation.
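As a minimal sketch of what "CPU by default" means in practice, the snippet below checks where a tensor lives and moves it to a GPU only when PyTorch reports one. The variable names are illustrative, and the code falls back to the CPU on machines without CUDA.

```python
import torch

# Pick a GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.ones(2, 3)     # created in main memory (CPU) by default
t_dev = t.to(device)     # moved to the chosen device; unchanged if it is already there

print(t.device, t_dev.device)
```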

In [2]:
x = torch.arange(10).reshape(2,-1)
x

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])

In [3]:
x.shape

torch.Size([2, 5])

Total number of elements: the product of the sizes along each axis

In [4]:
x.numel()

10

In [5]:
torch.zeros(2,3,4)

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

In [6]:
torch.ones(1,2,3)

tensor([[[1., 1., 1.],
         [1., 1., 1.]]])

Sample values from distributions, for example, neural network initialization

In [7]:
# standard Gaussian
torch.randn(3,4)

tensor([[ 0.6714, -0.3224,  0.5101, -0.1838],
        [-0.4115, -1.6775, -0.3420,  0.5941],
        [ 0.4668, -0.7929,  0.4826, -0.2185]])

From Python lists, for example, training data labels

In [8]:
torch.tensor([[2,1,3,4],[1,2,3,4],[4,2,3,1]])

tensor([[2, 1, 3, 4],
        [1, 2, 3, 4],
        [4, 2, 3, 1]])

## Operations

Elementwise operations

In [9]:
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])

In [10]:
x + y, x - y, x / y, x ** y

(tensor([ 3.,  4.,  6., 10.]),
 tensor([-1.,  0.,  2.,  6.]),
 tensor([0.5000, 1.0000, 2.0000, 4.0000]),
 tensor([ 1.,  4., 16., 64.]))

In [11]:
torch.exp(x)

tensor([2.7183e+00, 7.3891e+00, 5.4598e+01, 2.9810e+03])

Linear algebra operations, for example, dot product & matrix multiplication

In [12]:
# build two 3x4 matrices to concatenate
# all dimensions except the one being concatenated along must match
X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
X, Y

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]]), tensor([[2., 1., 4., 3.],
         [1., 2., 3., 4.],
         [4., 3., 2., 1.]]))

In [13]:
torch.cat((X, Y), dim=1)

tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]])

In [14]:
X == Y

tensor([[False,  True, False,  True],
        [False, False, False, False],
        [False, False, False, False]])

In [15]:
# torch.sum adds up all elements of the tensor unless an axis is given
# the input must be a tensor, not a Python list
torch.sum(X == Y)

tensor(2)

In [16]:
X.sum()

tensor(66.)

In [17]:
(X == Y).sum()

tensor(2)

## Broadcasting mechanism

Elementwise operations can still be performed when the shapes are not exactly the same:
- Expand one or both tensors by copying elements along axes of size 1 so that, after this transformation, the two tensors have the same shape
- Carry out the elementwise operation on the resulting tensors

In [18]:
a = torch.arange(3).reshape((3, 1)) # 3x1
b = torch.arange(2).reshape((1, 2)) # 1x2
a, b

(tensor([[0],
         [1],
         [2]]), tensor([[0, 1]]))

In [19]:
'''
a -> [[0,0], b -> [[0,1],
      [1,1],       [0,1],
      [2,2]]       [0,1]]
'''
a + b

tensor([[0, 1],
        [1, 2],
        [2, 3]])
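The shape rule behind broadcasting can be sketched in plain Python. The helper `broadcast_shape` below is a hypothetical illustration, not a PyTorch function: it right-aligns the two shapes, pads the shorter one with leading 1s, and then requires each pair of sizes to be equal or to contain a 1.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the shape two tensors broadcast to, or raise ValueError."""
    # Right-align the shapes by padding the shorter one with leading 1s.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)      # sizes agree, or b is stretched to match a
        elif da == 1:
            result.append(db)      # a is stretched to match b
        else:
            raise ValueError(f"incompatible sizes {da} and {db}")
    return tuple(result)


print(broadcast_shape((3, 1), (1, 2)))  # (3, 2), matching a + b above
```

Shapes such as `(3,)` and `(4,)` raise an error here, which mirrors the fact that neither axis can be stretched to fit the other.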

## Indexing and slicing

In [20]:
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])

In [21]:
X[-1], X[1:3]

(tensor([ 8.,  9., 10., 11.]), tensor([[ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]]))

In [22]:
X[1,2]

tensor(6.)

In [23]:
X[:2, 1:]

tensor([[1., 2., 3.],
        [5., 6., 7.]])

Altering values in place

In [24]:
X[0:2, :] = 12
X

tensor([[12., 12., 12., 12.],
        [12., 12., 12., 12.],
        [ 8.,  9., 10., 11.]])

## Saving memory


In [25]:
before = id(Y)
Y = Y + X
print(id(Y) == before)
before

False


140417578052112

In-place operations, where we assign the result of an operation to a previously allocated array with slice notation.

In [26]:
Z = torch.zeros_like(Y) # zeros with the same shape as Y
id(Z)

140417578051728

In [27]:
Z[:] = X + Y
id(Z)

140417578051728

If X is not used in subsequent computations, we can also use X[:] = X + Y or X += Y to reduce the memory overhead.

In [28]:
print(id(X))
X += Y
id(X)

140417578053360


140417578053360
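`id()` only identifies the Python object. As a complementary check, and purely as a sketch, `Tensor.data_ptr()` returns the address of the first element of the underlying storage, which shows directly whether a new buffer was allocated.

```python
import torch

X = torch.ones(3, 4)
Y = torch.ones(3, 4)

ptr_before = X.data_ptr()     # address of the first element of X's storage

X += Y                        # in-place: writes into the existing buffer
same_buffer = X.data_ptr() == ptr_before

X = X + Y                     # out-of-place: allocates a new tensor and rebinds the name X
new_buffer = X.data_ptr() != ptr_before

print(same_buffer, new_buffer)
```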

## Conversion to other python objects

A torch Tensor and a NumPy array converted with the functions below share the same underlying memory, so changing one through an in-place operation also changes the other.

- tensor.numpy()
- torch.from_numpy(ndarray)

In [29]:
X = torch.arange(12).reshape(3,4)
A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)

(numpy.ndarray, torch.Tensor)
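The shared memory can be observed with a small experiment. This is a sketch for CPU tensors; the names `arr` and `independent` are illustrative, and `clone()` is used where a separate copy is wanted.

```python
import torch

t = torch.zeros(3)
arr = t.numpy()                    # NumPy view of the same CPU buffer, no copy

t.add_(1)                          # in-place update on the tensor is visible in arr
arr[0] = 5.0                       # in-place update on the array is visible in t

independent = t.clone().numpy()    # clone() first to get memory that is not shared
t.add_(1)                          # changes t and arr, but not independent

print(t, arr, independent)
```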

Convert a size-1 tensor to a Python scalar
- item()

In [30]:
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)

(tensor([3.5000]), 3.5, 3.5, 3)

In [31]:
a = torch.tensor(3.5)
a, a.item(), float(a), int(a)

(tensor(3.5000), 3.5, 3.5, 3)

# Data Preprocessing

## Reading data

In [31]:
import pandas as pd

data = pd.read_csv(data_file)

## Handling missing data

In [None]:
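# numeric_only=True skips non-numeric columns, which recent pandas versions refuse to average
inputs = inputs.fillna(inputs.mean(numeric_only=True))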

[pd.get_dummies](https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html): convert categorical columns into dummy (one-hot) indicator columns

In [None]:
inputs = pd.get_dummies(inputs, dummy_na=True)

## Conversion to the tensor format

In [None]:
import torch
# to_numpy(dtype=float) also converts boolean dummy columns to floats
X, y = torch.tensor(inputs.to_numpy(dtype=float)), torch.tensor(outputs.to_numpy(dtype=float))
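The three preprocessing steps can be put together in one self-contained sketch. The CSV content and the column names (`NumRooms`, `Alley`, `Price`) are made up for illustration, and the data is read from an in-memory string rather than a file. This sketch encodes the categorical column before filling, so the mean is only taken over numeric columns.

```python
import io

import pandas as pd
import torch

# Hypothetical toy data; "NA" is parsed as a missing value by read_csv.
csv_text = """NumRooms,Alley,Price
NA,Pave,127500
2,NA,106000
4,NA,178100
NA,NA,140000
"""
data = pd.read_csv(io.StringIO(csv_text))

# Assumption for this sketch: the last column is the target, the rest are inputs.
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]

# One-hot encode the categorical column, treating NaN as its own category.
inputs = pd.get_dummies(inputs, dummy_na=True)
# Fill the remaining numeric gaps with the column mean.
inputs = inputs.fillna(inputs.mean())

X = torch.tensor(inputs.to_numpy(dtype=float))
y = torch.tensor(outputs.to_numpy(dtype=float))
print(X.shape, y.shape)
```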

## Exercise

Delete the column with the most missing values

In [42]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.choice([2,np.nan], (20, 5), p=[0.2, 0.8]), columns=list('abcde'))
df

Unnamed: 0,a,b,c,d,e
0,,,,2.0,2.0
1,,,,2.0,
2,2.0,2.0,,,
3,,,,,
4,,,,2.0,
5,,,,,
6,,,,2.0,
7,,2.0,,,
8,2.0,,,2.0,
9,,,2.0,2.0,


In [53]:
# isna(), isnull()
df.drop(df.isna().sum().idxmax(), axis=1)

Unnamed: 0,a,b,d,e
0,,,2.0,2.0
1,,,2.0,
2,2.0,2.0,,
3,,,,
4,,,2.0,
5,,,,
6,,,2.0,
7,,2.0,,
8,2.0,,2.0,
9,,,2.0,


# Linear Algebra

## Matrices

In [60]:
A = torch.arange(12).reshape(3,4)
A.T # 4x3

tensor([[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]])

## Properties of Tensor Arithmetic

In [62]:
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = A.clone() # allocating new memory
A, A + B

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [12., 13., 14., 15.],
         [16., 17., 18., 19.]]), tensor([[ 0.,  2.,  4.,  6.],
         [ 8., 10., 12., 14.],
         [16., 18., 20., 22.],
         [24., 26., 28., 30.],
         [32., 34., 36., 38.]]))

Hadamard product: elementwise multiplication of two matrices

In [63]:
A * B

tensor([[  0.,   1.,   4.,   9.],
        [ 16.,  25.,  36.,  49.],
        [ 64.,  81., 100., 121.],
        [144., 169., 196., 225.],
        [256., 289., 324., 361.]])

## Reduction

In [66]:
A

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]])

In [67]:
A.sum(axis=0), A.sum(axis=1)

(tensor([40., 45., 50., 55.]), tensor([ 6., 22., 38., 54., 70.]))

In [68]:
A.sum(axis=[0,1]), A.sum()

(tensor(190.), tensor(190.))

In [69]:
A.mean(), A.sum() / A.numel()

(tensor(9.5000), tensor(9.5000))

Reduction along a specific axis

In [70]:
A.mean(axis=0), A.sum(axis=0) / A.shape[0]

(tensor([ 8.,  9., 10., 11.]), tensor([ 8.,  9., 10., 11.]))

## Non-reduction sum

In [71]:
A.sum(axis=1, keepdims=True)

tensor([[ 6.],
        [22.],
        [38.],
        [54.],
        [70.]])
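Keeping the reduced axis is mainly useful because the result can then be broadcast against the original tensor. As a small sketch, dividing each row of A by its row sum:

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)

row_sums = A.sum(axis=1, keepdims=True)   # shape (5, 1) instead of (5,)
A_normalized = A / row_sums               # (5, 4) / (5, 1) broadcasts across the columns

print(A_normalized.sum(axis=1))           # every row now sums to 1, up to rounding
```

Without `keepdims=True` the sum has shape `(5,)`, which does not broadcast against a `(5, 4)` tensor along the rows.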

Cumulative sum of elements of A along some axis

In [72]:
A

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]])

In [73]:
A.cumsum(axis=0)

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  6.,  8., 10.],
        [12., 15., 18., 21.],
        [24., 28., 32., 36.],
        [40., 45., 50., 55.]])

## Dot product

In [78]:
x = torch.tensor([13,4,5,2], dtype=torch.float32)
y = torch.ones(4)
x, y, torch.dot(x,y)

(tensor([13.,  4.,  5.,  2.]), tensor([1., 1., 1., 1.]), tensor(24.))

which is equivalent to

In [79]:
torch.sum(x * y)

tensor(24.)

## Matrix-Vector products
- torch.mv(A, x)

In [80]:
A, x # shapes: (5, 4) and (4,)

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [12., 13., 14., 15.],
         [16., 17., 18., 19.]]), tensor([13.,  4.,  5.,  2.]))

In [81]:
torch.mv(A, x)

tensor([ 20., 116., 212., 308., 404.])
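One way to read this result: each entry of the matrix-vector product is the dot product of one row of A with x. The following sketch rebuilds `torch.mv(A, x)` row by row, which is slow but makes the definition explicit.

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
x = torch.tensor([13, 4, 5, 2], dtype=torch.float32)

# Entry i of the product is the dot product of row i of A with x.
by_rows = torch.stack([torch.dot(row, x) for row in A])

print(by_rows)
print(torch.allclose(by_rows, torch.mv(A, x)))
```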

## Matrix-Matrix multiplication

In [82]:
A, B

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [12., 13., 14., 15.],
         [16., 17., 18., 19.]]), tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [12., 13., 14., 15.],
         [16., 17., 18., 19.]]))

In [84]:
torch.mm(A, B.T)

tensor([[  14.,   38.,   62.,   86.,  110.],
        [  38.,  126.,  214.,  302.,  390.],
        [  62.,  214.,  366.,  518.,  670.],
        [  86.,  302.,  518.,  734.,  950.],
        [ 110.,  390.,  670.,  950., 1230.]])

## Norm

In [95]:
# L2 norm
x = torch.tensor([3.3,5,56,0.3], dtype=torch.float16)
torch.norm(x)

tensor(56.3125, dtype=torch.float16)

In [96]:
# L1 norm
torch.abs(x).sum()

tensor(64.6250, dtype=torch.float16)

In [98]:
u = torch.ones((4, 9))
torch.norm(u)

tensor(6.)
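The norms above can be checked against their definitions. This sketch uses a small vector with easy numbers: the L2 norm is the square root of the sum of squares, the L1 norm is the sum of absolute values, and for a matrix `torch.norm` computes the Frobenius norm, which is the L2 norm of all its entries.

```python
import torch

x = torch.tensor([3.0, -4.0])
l2 = torch.sqrt(torch.sum(x ** 2))           # sqrt(9 + 16)
l1 = torch.abs(x).sum()                      # 3 + 4

u = torch.ones((4, 9))
frobenius = torch.sqrt(torch.sum(u ** 2))    # sqrt of 36 ones

print(l2, l1, frobenius)
```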

## Exercises

In [100]:
A = torch.arange(24).reshape(2,3,4)
len(A)

2

In [103]:
A

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])

In [110]:
A.sum(axis=0)

tensor([[12, 14, 16, 18],
        [20, 22, 24, 26],
        [28, 30, 32, 34]])

In [111]:
A.sum(axis=1)

tensor([[12, 15, 18, 21],
        [48, 51, 54, 57]])

In [109]:
A.sum(axis=2)

tensor([[ 6, 22, 38],
        [54, 70, 86]])

In [108]:
A.sum(axis=[2,0,1])

tensor(276)

In [124]:
# convert with .to() instead of wrapping an existing tensor in torch.tensor(), which triggers a warning
U = (torch.arange(60) / 100).to(torch.float16).reshape(2,2,3,5)
U


tensor([[[[0.0000, 0.0100, 0.0200, 0.0300, 0.0400],
          [0.0500, 0.0600, 0.0700, 0.0800, 0.0900],
          [0.1000, 0.1100, 0.1200, 0.1300, 0.1400]],

         [[0.1500, 0.1600, 0.1700, 0.1801, 0.1899],
          [0.2000, 0.2100, 0.2200, 0.2300, 0.2400],
          [0.2500, 0.2600, 0.2700, 0.2800, 0.2900]]],


        [[[0.3000, 0.3101, 0.3201, 0.3301, 0.3401],
          [0.3501, 0.3601, 0.3701, 0.3799, 0.3899],
          [0.3999, 0.4099, 0.4199, 0.4299, 0.4399]],

         [[0.4500, 0.4600, 0.4700, 0.4800, 0.4900],
          [0.5000, 0.5098, 0.5200, 0.5298, 0.5400],
          [0.5498, 0.5601, 0.5698, 0.5801, 0.5898]]]], dtype=torch.float16)

[torch.linalg.norm](https://pytorch.org/docs/stable/generated/torch.linalg.norm.html)

In [125]:
torch.linalg.norm(U)

tensor(2.6484, dtype=torch.float16)

# Calculus

In [None]:
!pip install d2l

In [129]:
%matplotlib inline

In [130]:
import numpy as np
from IPython import display
from d2l import torch as d2l

In [132]:
def f(x):
    return 3 * x ** 2 - 4 * x
def numerical_lim(f, x, h):
    return (f(x + h) - f(x)) / h

In [133]:
h = 0.1
for i in range(5):
    print(f'h={h:.5f}, numerical limit={numerical_lim(f, 1, h):.5f}')
    h *= 0.1

h=0.10000, numerical limit=2.30000
h=0.01000, numerical limit=2.03000
h=0.00100, numerical limit=2.00300
h=0.00010, numerical limit=2.00030
h=0.00001, numerical limit=2.00003
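The values converge to 2 because the analytic derivative of 3x² − 4x is 6x − 4, which equals 2 at x = 1. As a sketch, the forward difference used above can be compared with a central difference; the helper names `f_prime`, `forward_diff`, and `central_diff` are introduced here for illustration.

```python
def f(x):
    return 3 * x ** 2 - 4 * x


def f_prime(x):
    # Analytic derivative of 3x^2 - 4x
    return 6 * x - 4


def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h


def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)


h = 1e-3
forward_err = abs(forward_diff(f, 1, h) - f_prime(1))
central_err = abs(central_diff(f, 1, h) - f_prime(1))
print(forward_err, central_err)
```

For this quadratic the forward difference is off by 3h, which matches the trailing 3 in the printed limits, while the central difference is exact up to floating-point rounding.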


# Automatic Differentiation

In [149]:
import torch
x = torch.arange(4.0)
x.requires_grad_(True)
# or x = torch.arange(4.0, requires_grad=True) 
x.grad # default to None

In [150]:
y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)

Automatically calculate the gradient of y with respect to each component of x
- function for backpropagation: backward()

In [151]:
y.backward() # loss.backward()
x.grad

tensor([ 0.,  4.,  8., 12.])

The gradient of the function y = 2xᵀx with respect to x should be 4x

In [152]:
x.grad == 4 * x

tensor([True, True, True, True])

PyTorch accumulates gradients by default, so the previous values need to be cleared before computing a new gradient

In [153]:
x.grad.zero_()

tensor([0., 0., 0., 0.])

In [154]:
y = x.sum()
y.backward()
x.grad

tensor([1., 1., 1., 1.])
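To see why the reset matters, this sketch calls `backward()` twice without clearing in between. The second call adds to the stored gradient rather than replacing it.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

x.sum().backward()
first = x.grad.clone()          # gradient of sum(x) is all ones

x.sum().backward()              # no zeroing in between
accumulated = x.grad.clone()    # the new gradient was added to the old one

x.grad.zero_()
x.sum().backward()
after_reset = x.grad.clone()    # back to all ones

print(first, accumulated, after_reset)
```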

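When y is a vector, the derivative of y with respect to x is a matrix (the Jacobian). backward() expects a scalar, so y is summed first, which gives the sum of the partial derivatives for each component of x.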



In [162]:
x.grad.zero_()
y = x * x
y.sum().backward() # or y.backward(torch.ones(len(x))), want to sum the partial derivatives
x.grad

tensor([0., 2., 4., 6.])
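The full matrix can be inspected with `torch.autograd.functional.jacobian`. As a sketch for y = x * x, the Jacobian is diagonal with entries 2x, and passing a vector of ones to `backward` returns the sum over its rows.

```python
import torch
from torch.autograd.functional import jacobian

x = torch.arange(4.0)
J = jacobian(lambda t: t * t, x)     # shape (4, 4), entry [i, j] is dy_i / dx_j

x.requires_grad_(True)
y = x * x
y.backward(torch.ones_like(y))       # vector-Jacobian product with a vector of ones

print(J)
print(x.grad)
```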

Sometimes we wish to move some calculations outside of the recorded computational graph. We can detach y to return a new variable u that has the same value as y but discards any information about how y was computed in the computational graph. In other words, the gradient will not flow backwards through u to x, i.e. u is treated as a constant.

In [157]:
x.grad.zero_()
y = x * x
u = y.detach()
z = u * x
z.sum().backward()
x.grad == u

tensor([True, True, True, True])

In [158]:
x.grad.zero_()
y.sum().backward()
x.grad == 2 * x

tensor([True, True, True, True])

In [159]:
a = torch.randn(size=(), requires_grad=True)
a

tensor(-1.2113, requires_grad=True)

# Probability

In [163]:
import torch
from torch.distributions import multinomial
from d2l import torch as d2l

In [168]:
fair_probs = torch.ones([6]) / 6
multinomial.Multinomial(1, fair_probs).sample() # toss a fair die

tensor([0., 0., 0., 0., 0., 1.])

In [171]:
counts = multinomial.Multinomial(1000, fair_probs).sample()
counts / 1000, 1/6

(tensor([0.1520, 0.1600, 0.1730, 0.1740, 0.1680, 0.1730]), 0.16666666666666666)
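The estimates move closer to 1/6 as the number of rolls grows. The following sketch shows this with Python's standard `random` module instead of `torch.distributions`. The seed, the sample sizes, and the helper `estimate` are arbitrary choices for illustration.

```python
import random
from collections import Counter

rng = random.Random(0)   # fixed seed so repeated runs give the same numbers


def estimate(n):
    """Roll a fair die n times and return the relative frequency of each face."""
    counts = Counter(rng.randint(1, 6) for _ in range(n))
    return [counts[face] / n for face in range(1, 7)]


results = {n: estimate(n) for n in (100, 10_000, 100_000)}
for n, freqs in results.items():
    worst = max(abs(p - 1 / 6) for p in freqs)
    print(f"n={n:>7}: largest deviation from 1/6 = {worst:.4f}")
```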

# Documentation

Find all functions and classes in a module

In [173]:
import torch

print(dir(torch.distributions))

['AbsTransform', 'AffineTransform', 'Bernoulli', 'Beta', 'Binomial', 'CatTransform', 'Categorical', 'Cauchy', 'Chi2', 'ComposeTransform', 'ContinuousBernoulli', 'CorrCholeskyTransform', 'Dirichlet', 'Distribution', 'ExpTransform', 'Exponential', 'ExponentialFamily', 'FisherSnedecor', 'Gamma', 'Geometric', 'Gumbel', 'HalfCauchy', 'HalfNormal', 'Independent', 'IndependentTransform', 'Kumaraswamy', 'LKJCholesky', 'Laplace', 'LogNormal', 'LogisticNormal', 'LowRankMultivariateNormal', 'LowerCholeskyTransform', 'MixtureSameFamily', 'Multinomial', 'MultivariateNormal', 'NegativeBinomial', 'Normal', 'OneHotCategorical', 'OneHotCategoricalStraightThrough', 'Pareto', 'Poisson', 'PowerTransform', 'RelaxedBernoulli', 'RelaxedOneHotCategorical', 'ReshapeTransform', 'SigmoidTransform', 'SoftmaxTransform', 'StackTransform', 'StickBreakingTransform', 'StudentT', 'TanhTransform', 'Transform', 'TransformedDistribution', 'Uniform', 'VonMises', 'Weibull', '__all__', '__builtins__', '__cached__', '__doc__'

Find the usage of specific functions and classes

In [174]:
help(torch.ones)

Help on built-in function ones:

ones(...)
    ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) -> Tensor
    
    Returns a tensor filled with the scalar value `1`, with the shape defined
    by the variable argument :attr:`size`.
    
    Args:
        size (int...): a sequence of integers defining the shape of the output tensor.
            Can be a variable number of arguments or a collection like a list or tuple.
    
    Keyword arguments:
        out (Tensor, optional): the output tensor.
        dtype (:class:`torch.dtype`, optional): the desired data type of returned tensor.
            Default: if ``None``, uses a global default (see :func:`torch.set_default_tensor_type`).
        layout (:class:`torch.layout`, optional): the desired layout of returned Tensor.
            Default: ``torch.strided``.
        device (:class:`torch.device`, optional): the desired device of returned tensor.
            Default: if ``None``, uses the cur

In [175]:
list?