
# PyTorch Advanced Tutorial

In [3]:
import torch

## 1.Broadcasting

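**Key idea:**
* Insert dimensions of size 1 at the front until both tensors have the same number of dimensions
* Expand the dimensions of size 1 to the size of the other tensor
* feature maps: [4, 32, 14, 14]
* bias: [32, 1, 1] => [1, 32, 1, 1] => [4, 32, 14, 14]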
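
The bias example from the list can be sketched as follows. This is a minimal illustration, assuming a zero-filled feature map and a made-up per-channel bias; the variable names are not from the lecture.

```python
import torch

feature_maps = torch.zeros(4, 32, 14, 14)  # [batch, channel, height, width]
bias = torch.arange(32, dtype=torch.float32).view(32, 1, 1)  # one value per channel

# Implicit broadcasting: bias is treated as [1, 32, 1, 1] and expanded to [4, 32, 14, 14].
out = feature_maps + bias

# The same two steps done by hand: insert a leading dim, then expand the size-1 dims.
manual = bias.unsqueeze(0).expand(4, 32, 14, 14)

print(out.shape)                                 # torch.Size([4, 32, 14, 14])
print(torch.equal(out, feature_maps + manual))   # True
```

`expand` does not copy data, which is why broadcasting is cheaper than repeating the bias explicitly.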

The instructor (Long) explains only the principle and gives no code; the code below comes from: https://pytorch.org/docs/stable/notes/broadcasting.html

Two tensors are “broadcastable” if the following rules hold:


*   Each tensor has at least one dimension.

*   When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.
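
The two rules can be checked on plain shape tuples without creating any tensors. The helper below is a sketch that follows the rules exactly as quoted; the name `is_broadcastable` is hypothetical and not part of PyTorch.

```python
def is_broadcastable(shape_a, shape_b):
    """Check the two quoted rules on plain shape tuples (hypothetical helper)."""
    # Rule 1: each tensor has at least one dimension.
    if len(shape_a) == 0 or len(shape_b) == 0:
        return False
    # Rule 2: walk the sizes from the trailing dimension. zip stops at the
    # shorter shape, which covers the "one of them does not exist" case.
    for size_a, size_b in zip(reversed(shape_a), reversed(shape_b)):
        if size_a != size_b and size_a != 1 and size_b != 1:
            return False
    return True


print(is_broadcastable((5, 3, 4, 1), (3, 1, 1)))  # True
print(is_broadcastable((5, 2, 4, 1), (3, 1, 1)))  # False
print(is_broadcastable((0,), (2, 2)))             # False
```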





**For Example:**

1. same shapes are always broadcastable (i.e. the above rules always hold)


In [11]:
x = torch.empty(5, 7, 3)
y = torch.empty(5, 7, 3)
x.shape == y.shape

True

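2. x and y are not broadcastable. The PyTorch notes say this is because x does not have at least 1 dimension; concretely, x has shape [0], and its size 0 neither equals y's trailing size 2 nor is 1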


In [17]:
x = torch.empty((0,))
x, x.shape

(tensor([]), torch.Size([0]))

In [19]:
y = torch.empty(2, 2)
y, y.shape

(tensor([[4.1494e-35, 0.0000e+00],
         [1.8578e-01, 3.9177e-02]]), torch.Size([2, 2]))

3. The trailing dimensions can be lined up, so x and y are broadcastable:

  1st trailing dimension: both have size 1

  2nd trailing dimension: y has size 1

  3rd trailing dimension: x size == y size

  4th trailing dimension: y dimension doesn't exist


In [29]:
x = torch.empty(5, 3, 4, 1)
y = torch.empty(  3, 1, 1)
x.shape, y.shape

(torch.Size([5, 3, 4, 1]), torch.Size([3, 1, 1]))

In [31]:
(x + y).shape

torch.Size([5, 3, 4, 1])

4. In contrast, here x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3

In [23]:
x = torch.empty(5, 2, 4, 1)
y = torch.empty(  3, 1, 1)

In [27]:
# x + y  # raises RuntimeError: sizes 2 and 3 differ in the 3rd trailing dimension

If two tensors x, y are “broadcastable”, the resulting tensor size is calculated as follows:

* If the number of dimensions of x and y are not equal, prepend 1 to the dimensions of the tensor with fewer dimensions to make them equal length

* Then, for each dimension size, the resulting dimension size is the max of the sizes of x and y along that dimension.
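
The two steps for computing the result size can be written out on shape tuples. This is a sketch with a hypothetical helper name, `broadcast_shape`; it takes the non-1 size in each dimension, which equals the max of the two sizes for any positive sizes.

```python
def broadcast_shape(shape_a, shape_b):
    """Result shape of broadcasting two shapes (hypothetical helper)."""
    ndim = max(len(shape_a), len(shape_b))
    # Step 1: prepend 1s to the shorter shape so both have the same length.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(a, b):
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise ValueError(f"sizes {size_a} and {size_b} are not broadcastable")
        # Step 2: keep the size that is not 1 (the max for positive sizes).
        result.append(size_b if size_a == 1 else size_a)
    return tuple(result)


print(broadcast_shape((5, 1, 4, 1), (3, 1, 1)))  # (5, 3, 4, 1)
```

PyTorch also provides `torch.broadcast_shapes` for the same purpose in recent versions.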

**For example**


In [33]:
x = torch.empty(5, 1, 4, 1)
y = torch.empty(  3, 1, 1)
(x + y).shape

torch.Size([5, 3, 4, 1])

## 2.Merging and Splitting

* cat
* stack
* split (split by length)
* chunk (split by number of chunks)

### cat

In [34]:
a = torch.rand(4, 32, 8) # grade sheets: 4 classes, 32 students each, 8 courses
a.shape

torch.Size([4, 32, 8])

In [35]:
b = torch.rand(5, 32, 8)
b.shape

torch.Size([5, 32, 8])

In [36]:
torch.cat([a, b], dim=0).shape

torch.Size([9, 32, 8])

In [41]:
a1 = torch.rand(4, 3, 32, 32)
a2 = torch.rand(5, 3, 32, 32)

In [42]:
torch.cat([a1, a2], dim=0).shape

torch.Size([9, 3, 32, 32])

In [44]:
a2 = torch.rand(4, 1, 32, 32)
a1.shape, a2.shape

(torch.Size([4, 3, 32, 32]), torch.Size([4, 1, 32, 32]))

In [45]:
torch.cat([a1, a2], dim=1).shape

torch.Size([4, 4, 32, 32])

In [48]:
a1 = torch.rand(4, 3, 16, 32)
a2 = torch.rand(4, 3, 16, 32)
torch.cat([a1, a2], dim=2).shape

torch.Size([4, 3, 32, 32])

### stack

In [49]:
torch.cat([a1, a2], dim=2).shape

torch.Size([4, 3, 32, 32])

In [51]:
torch.stack([a1, a2], dim=2).shape # creates a new dimension

torch.Size([4, 3, 2, 16, 32])

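Consider this scenario: two teachers have each recorded the grades of 32 students in 8 courses for their own class and want to merge them into one table. With cat, the two classes could no longer be told apart, so stack should be used instead. (The example is not ideal, because stack requires both classes to have the same number of students.)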

In [53]:
a = torch.rand(32, 8)
b = torch.rand(32, 8)
torch.stack([a, b], dim=0).shape

torch.Size([2, 32, 8])
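
To make the difference concrete, the sketch below compares the two results for the two-class scenario. It also shows that stack behaves like cat applied to inputs that first receive a new leading dimension. The variable names are illustrative only.

```python
import torch

a = torch.rand(32, 8)  # class A: 32 students, 8 courses
b = torch.rand(32, 8)  # class B: same layout

stacked = torch.stack([a, b], dim=0)   # [2, 32, 8]: the class index is kept
merged = torch.cat([a, b], dim=0)      # [64, 8]: the class boundary is lost

# stack is equivalent to adding a new dim to every input and concatenating along it.
via_cat = torch.cat([a.unsqueeze(0), b.unsqueeze(0)], dim=0)

print(stacked.shape, merged.shape)
print(torch.equal(stacked, via_cat))   # True
print(torch.equal(stacked[1], b))      # True: class B can be recovered by index
```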

### split

In [54]:
b = torch.rand(32, 8)
a.shape

torch.Size([32, 8])

In [55]:
c = torch.stack([a, b], dim=0)
c.shape

torch.Size([2, 32, 8])

In [56]:
aa, bb = c.split([1, 1], dim=0)
aa.shape, bb.shape

(torch.Size([1, 32, 8]), torch.Size([1, 32, 8]))

In [57]:
aa, bb = c.split(1, dim=0) # split into N pieces, each of length 1
aa.shape, bb.shape

(torch.Size([1, 32, 8]), torch.Size([1, 32, 8]))

In [59]:
# aa, bb = c.split(2, dim=0) # ValueError: not enough values to unpack(expected 2, got 1)

### chunk

In [60]:
b = torch.rand(32, 8)
a.shape

torch.Size([32, 8])

In [61]:
c = torch.stack([a, b], dim=0)
c.shape

torch.Size([2, 32, 8])

In [65]:
aa, bb = c.chunk(2, dim=0)
aa.shape, bb.shape

(torch.Size([1, 32, 8]), torch.Size([1, 32, 8]))
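
The difference between split (by length) and chunk (by count) is easier to see when the dimension does not divide evenly. A small sketch on a made-up tensor with 5 rows:

```python
import torch

t = torch.arange(10).view(5, 2)  # 5 rows along dim 0

by_length = t.split(2, dim=0)      # pieces of length 2; the last piece is shorter
by_sizes = t.split([1, 4], dim=0)  # explicit lengths, which must sum to 5
by_count = t.chunk(2, dim=0)       # 2 pieces of length ceil(5 / 2) = 3, last one shorter

print([piece.shape[0] for piece in by_length])  # [2, 2, 1]
print([piece.shape[0] for piece in by_sizes])   # [1, 4]
print([piece.shape[0] for piece in by_count])   # [3, 2]
```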

## 3.Math Operations

### Addition, subtraction, multiplication, division

In [67]:
a = torch.rand(3, 4)
b = torch.rand(4)
a, b

(tensor([[0.0959, 0.6001, 0.4030, 0.7851],
         [0.0390, 0.4698, 0.0845, 0.8910],
         [0.8941, 0.9692, 0.9845, 0.0893]]),
 tensor([0.6872, 0.2317, 0.7131, 0.6929]))

In [68]:
a + b

tensor([[0.7832, 0.8318, 1.1161, 1.4780],
        [0.7263, 0.7015, 0.7976, 1.5840],
        [1.5813, 1.2009, 1.6976, 0.7822]])

In [69]:
torch.add(a, b)

tensor([[0.7832, 0.8318, 1.1161, 1.4780],
        [0.7263, 0.7015, 0.7976, 1.5840],
        [1.5813, 1.2009, 1.6976, 0.7822]])

In [70]:
torch.all(torch.eq(a-b, torch.sub(a, b)))

tensor(True)

In [72]:
torch.all(torch.eq(a*b, torch.mul(a, b)))

tensor(True)

In [73]:
torch.all(torch.eq(a/b, torch.div(a, b)))

tensor(True)

### Matrix multiplication: matmul

In [79]:
b = torch.ones(2, 2)
b

tensor([[1., 1.],
        [1., 1.]])

In [80]:
a = b*3
a

tensor([[3., 3.],
        [3., 3.]])

In [81]:
torch.mm(a, b)

tensor([[6., 6.],
        [6., 6.]])

In [82]:
torch.matmul(a, b)

tensor([[6., 6.],
        [6., 6.]])

In [83]:
a@b

tensor([[6., 6.],
        [6., 6.]])

In [84]:
a = torch.rand(4, 784)
x = torch.rand(4, 784)
w = torch.rand(512, 784) # (out, in)

In [90]:
(x@w.t()).shape # for tensors with more than 2 dims, use transpose instead of t()

torch.Size([4, 512])

Matrix multiplication on higher-dimensional tensors

In [91]:
a = torch.rand(4, 3, 28, 64)
b = torch.rand(4, 3, 64, 32)

In [94]:
# torch.mm(a, b).shape  # fails: torch.mm only accepts 2-D matrices
torch.matmul(a, b).shape

torch.Size([4, 3, 28, 32])

In [96]:
b = torch.rand(4, 1, 64, 32)

In [97]:
torch.matmul(a, b).shape

torch.Size([4, 3, 28, 32])

In [100]:
# b = torch.rand(4, 64, 32)
# torch.matmul(a, b).shape  # fails: batch dims [4, 3] and [4] are not broadcastable
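
For tensors with more than two dimensions, matmul multiplies the last two dimensions and treats the leading ones as batch dimensions. The sketch below assumes a weight stored as (out, in) in its last two dimensions, so it needs `transpose` rather than `t()`; the shapes are made up for illustration.

```python
import torch

a = torch.rand(4, 3, 28, 64)
w = torch.rand(4, 3, 32, 64)  # last two dims stored as (out, in)

# t() is meant for 2-D tensors, so swap the last two dims with transpose.
out = a @ w.transpose(-1, -2)
print(out.shape)  # torch.Size([4, 3, 28, 32])

# Each [28, 64] x [64, 32] slice is an ordinary 2-D matrix product.
single = torch.mm(a[1, 2], w[1, 2].t())
print(torch.allclose(out[1, 2], single, atol=1e-4))
```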

### power

In [101]:
a = torch.full([2, 3], 3)
a

tensor([[3, 3, 3],
        [3, 3, 3]])

In [102]:
a.pow(2)

tensor([[9, 9, 9],
        [9, 9, 9]])

In [103]:
a**2

tensor([[9, 9, 9],
        [9, 9, 9]])

In [107]:
aa = a**2
aa

tensor([[9, 9, 9],
        [9, 9, 9]])

In [114]:
aa.type()

'torch.LongTensor'

In [119]:
# aa.sqrt()

In [121]:
# aa.rsqrt()

In [123]:
aa ** 0.5

tensor([[3., 3., 3.],
        [3., 3., 3.]])
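
The `sqrt()` and `rsqrt()` calls above are commented out, presumably because `aa` is an integer tensor (that reason is an assumption, not stated in the notebook). A sketch that converts to float first, which works regardless of the PyTorch version:

```python
import torch

aa = torch.full([2, 3], 9, dtype=torch.int64)  # integer tensor, like a**2 above
aa_float = aa.float()                          # convert before taking roots

root = aa_float.sqrt()        # square root: 3.0 everywhere
inv_root = aa_float.rsqrt()   # reciprocal of the square root: 1/3 everywhere

print(root)
print(inv_root)
print(torch.allclose(root, aa_float ** 0.5))  # True
```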

### exp and log

In [124]:
a = torch.exp(torch.ones(2, 2))
a

tensor([[2.7183, 2.7183],
        [2.7183, 2.7183]])

In [125]:
torch.log(a)

tensor([[1., 1.],
        [1., 1.]])

### Approximation
* floor(), ceil()
* round()
* trunc(), frac()

In [126]:
a = torch.tensor(3.14)
a

tensor(3.1400)

In [127]:
a.floor(), a.ceil(), a.trunc(), a.frac()

(tensor(3.), tensor(4.), tensor(3.), tensor(0.1400))

In [128]:
a = torch.tensor(3.499)
a.round()

tensor(3.)

In [130]:
a = torch.tensor(3.5)
a.round()

tensor(4.)
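
One detail the 3.5 example does not show: `round()` rounds exact halves to the nearest even integer, so 2.5 goes to 2 rather than 3. A short sketch, including one common way to round halves up instead:

```python
import torch

halves = torch.tensor([0.5, 1.5, 2.5, 3.5])

half_to_even = halves.round()        # ties go to the nearest even integer
half_up = (halves + 0.5).floor()     # ties always go up

print(half_to_even)  # tensor([0., 2., 2., 4.])
print(half_up)       # tensor([1., 2., 3., 4.])
```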

### clamp
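Clipping values to a range, which comes up when dealing with vanishing and exploding gradients.

When training is unstable, print the norm of the gradient: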

w.grad.norm(2)

In [138]:
grad = torch.rand(2, 3)*15
grad.max()

tensor(14.3433)

In [139]:
grad.median()

tensor(8.4997)

In [140]:
grad.clamp(10) # values below 10 are set to 10

tensor([[10.0000, 13.0912, 14.3433],
        [10.0000, 10.0000, 12.2009]])

In [136]:
grad

tensor([[ 7.2440,  6.6741, 11.2324],
        [ 9.2214,  0.7991,  4.2448]])

In [137]:
grad.clamp(0, 10) # limit values to the range 0 to 10

tensor([[ 7.2440,  6.6741, 10.0000],
        [ 9.2214,  0.7991,  4.2448]])
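
In a training loop the same idea is usually applied to parameter gradients. The sketch below uses a made-up parameter `w` and loss, and contrasts two options: `torch.nn.utils.clip_grad_norm_`, which rescales the whole gradient by its norm, and `clamp_`, which limits each element independently.

```python
import torch

w = torch.nn.Parameter(torch.ones(3))  # toy parameter, for illustration only
loss = (10 * w).sum()
loss.backward()                        # w.grad is [10., 10., 10.]

# Rescale the gradient so that its L2 norm is at most 1.0; the direction is kept.
# The return value is the total norm measured before clipping.
norm_before = torch.nn.utils.clip_grad_norm_([w], max_norm=1.0)
norm_after = w.grad.norm(2).item()
print(norm_before.item(), norm_after)

# clamp_ limits every element on its own, which can change the direction.
w.grad.clamp_(-0.5, 0.5)
print(w.grad)
```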

## 4.Statistics
* norm
* mean, sum
* prod
* max, min, argmin, argmax (positions of the max and min values)
* kthvalue, topk

### norm

In [157]:
torch.set_default_dtype(torch.float64)  # replaces torch.set_default_tensor_type(torch.DoubleTensor), deprecated in recent PyTorch

In [158]:
a = torch.full([8], 1).double()
a

tensor([1., 1., 1., 1., 1., 1., 1., 1.])

In [159]:
b = a.view(2, 4).double()
b

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [160]:
c = a.view(2, 2, 2).double()
c

tensor([[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]])

In [161]:
a.norm(1), b.norm(1), c.norm(1)

(tensor(8.), tensor(8.), tensor(8.))

In [162]:
a.norm(2), b.norm(2), c.norm(2)

(tensor(2.8284), tensor(2.8284), tensor(2.8284))

In [164]:
b.norm(1, dim=1)

tensor([4., 4.])

In [165]:
b.norm(2, dim=1)

tensor([2., 2.])

In [166]:
c.norm(2,dim=0)

tensor([[1.4142, 1.4142],
        [1.4142, 1.4142]])

In [167]:
c.norm(2,dim=0)

tensor([[1.4142, 1.4142],
        [1.4142, 1.4142]])
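
Because every element above is 1, it is hard to see what the norm computes. The sketch below uses a small made-up tensor and reproduces the 1-norm and 2-norm along `dim=1` by hand.

```python
import torch

x = torch.tensor([[3.0, 4.0], [6.0, 8.0]])

l1 = x.norm(1, dim=1)  # sum of absolute values per row
l2 = x.norm(2, dim=1)  # square root of the sum of squares per row

l1_manual = x.abs().sum(dim=1)
l2_manual = x.pow(2).sum(dim=1).sqrt()

print(l1, l1_manual)  # both hold 7 and 14
print(l2, l2_manual)  # both hold 5 and 10
```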

### mean, sum, min, max, prod

In [169]:
a = torch.arange(8).view(2, 4).float()
a

tensor([[0., 1., 2., 3.],
        [4., 5., 6., 7.]], dtype=torch.float32)

In [171]:
a.min(), a.max(), a.mean(), a.prod() # prod() is the product of all elements

(tensor(0., dtype=torch.float32),
 tensor(7., dtype=torch.float32),
 tensor(3.5000, dtype=torch.float32),
 tensor(0., dtype=torch.float32))

In [172]:
a.sum()

tensor(28., dtype=torch.float32)

In [173]:
a.argmax(), a.argmin() # indices into the flattened tensor

(tensor(7), tensor(0))

In [177]:
a[1,1]

tensor(5., dtype=torch.float32)

In [179]:
a = torch.randn(4, 10)
a[0]

tensor([-0.8184,  0.0473,  0.1040,  1.5820, -0.4505,  0.1631,  1.2550,  0.1290,
        -1.5484, -0.7438])

In [180]:
a.argmax()

tensor(3)

In [182]:
a.argmax(dim=1) # index of the maximum in each row

tensor([3, 0, 0, 7])

### dim, keepdim

In [183]:
a

tensor([[-0.8184,  0.0473,  0.1040,  1.5820, -0.4505,  0.1631,  1.2550,  0.1290,
         -1.5484, -0.7438],
        [ 1.2569,  0.5309, -0.3137, -0.0231, -0.9398,  0.4437, -2.1725,  0.3998,
          0.2672,  0.3625],
        [ 0.8056, -0.4072, -0.3879,  0.4357,  0.3430, -0.0866, -0.5303, -0.7454,
          0.0940, -0.6318],
        [-0.5242,  0.4509, -0.4646,  0.7723, -0.3870, -1.1821, -0.0316,  0.8623,
          0.2936, -0.0057]])

In [188]:
a.max(dim=1)

torch.return_types.max(values=tensor([1.5820, 1.2569, 0.8056, 0.8623]), indices=tensor([3, 0, 0, 7]))

In [189]:
a.max(dim=1)[1]

tensor([3, 0, 0, 7])

In [185]:
a.argmax(dim=1)

tensor([3, 0, 0, 7])

In [190]:
a.max(dim=1, keepdim=True)

torch.return_types.max(values=tensor([[1.5820],
        [1.2569],
        [0.8056],
        [0.8623]]), indices=tensor([[3],
        [0],
        [0],
        [7]]))

In [191]:
a.argmax(dim=1, keepdim=True)

tensor([[3],
        [0],
        [0],
        [7]])
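
One practical reason to pass `keepdim=True` is that the reduced result can then be broadcast back against the original tensor. A sketch that centers each row of a made-up tensor:

```python
import torch

a = torch.arange(8.0).view(2, 4)

row_mean = a.mean(dim=1, keepdim=True)  # shape [2, 1], broadcastable against [2, 4]
centered = a - row_mean

print(row_mean.shape)        # torch.Size([2, 1])
print(a.mean(dim=1).shape)   # torch.Size([2]); subtracting this from a would fail (2 vs 4)
print(centered)
```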

### top-k or k-th

In [199]:
a

tensor([[-0.8184,  0.0473,  0.1040,  1.5820, -0.4505,  0.1631,  1.2550,  0.1290,
         -1.5484, -0.7438],
        [ 1.2569,  0.5309, -0.3137, -0.0231, -0.9398,  0.4437, -2.1725,  0.3998,
          0.2672,  0.3625],
        [ 0.8056, -0.4072, -0.3879,  0.4357,  0.3430, -0.0866, -0.5303, -0.7454,
          0.0940, -0.6318],
        [-0.5242,  0.4509, -0.4646,  0.7723, -0.3870, -1.1821, -0.0316,  0.8623,
          0.2936, -0.0057]])

In [196]:
a.topk(3, dim=1)

torch.return_types.topk(values=tensor([[1.5820, 1.2550, 0.1631],
        [1.2569, 0.5309, 0.4437],
        [0.8056, 0.4357, 0.3430],
        [0.8623, 0.7723, 0.4509]]), indices=tensor([[3, 6, 5],
        [0, 1, 5],
        [0, 3, 4],
        [7, 3, 1]]))

In [197]:
a.topk(3, dim=1, largest=False)

torch.return_types.topk(values=tensor([[-1.5484, -0.8184, -0.7438],
        [-2.1725, -0.9398, -0.3137],
        [-0.7454, -0.6318, -0.5303],
        [-1.1821, -0.5242, -0.4646]]), indices=tensor([[8, 0, 9],
        [6, 4, 2],
        [7, 9, 6],
        [5, 0, 2]]))

In [198]:
a.kthvalue(8, dim=1) # k-th smallest value

torch.return_types.kthvalue(values=tensor([0.1631, 0.4437, 0.3430, 0.4509]), indices=tensor([5, 5, 4, 1]))

In [200]:
a.kthvalue(3)

torch.return_types.kthvalue(values=tensor([-0.7438, -0.3137, -0.5303, -0.4646]), indices=tensor([9, 2, 6, 2]))

In [201]:
a.kthvalue(3, dim=1)

torch.return_types.kthvalue(values=tensor([-0.7438, -0.3137, -0.5303, -0.4646]), indices=tensor([9, 2, 6, 2]))
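
A common use of topk is top-k accuracy in classification. The sketch below uses made-up scores and targets; a sample counts as a hit when its target is among the k highest-scoring classes.

```python
import torch

logits = torch.tensor([[0.1, 0.7, 0.2],
                       [0.5, 0.3, 0.2],
                       [0.2, 0.3, 0.5]])
targets = torch.tensor([1, 1, 0])

top2 = logits.topk(2, dim=1).indices               # [[1, 2], [0, 1], [2, 1]]
hits = (top2 == targets.unsqueeze(1)).any(dim=1)   # target among the top 2?
top2_accuracy = hits.float().mean()

print(hits)           # tensor([ True,  True, False])
print(top2_accuracy)  # 2 of 3 samples are hits
```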

### compare

In [202]:
a>0

tensor([[False,  True,  True,  True, False,  True,  True,  True, False, False],
        [ True,  True, False, False, False,  True, False,  True,  True,  True],
        [ True, False, False,  True,  True, False, False, False,  True, False],
        [False,  True, False,  True, False, False, False,  True,  True, False]])

In [203]:
torch.gt(a, 0)

tensor([[False,  True,  True,  True, False,  True,  True,  True, False, False],
        [ True,  True, False, False, False,  True, False,  True,  True,  True],
        [ True, False, False,  True,  True, False, False, False,  True, False],
        [False,  True, False,  True, False, False, False,  True,  True, False]])

In [204]:
a != 0

tensor([[True, True, True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True]])

In [205]:
a = torch.ones(2, 3)
b = torch.randn(2, 3)
torch.eq(a, b)

tensor([[False, False, False],
        [False, False, False]])

In [206]:
torch.eq(a, a)

tensor([[True, True, True],
        [True, True, True]])

In [207]:
torch.equal(a, a)

True
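
`torch.eq` compares element by element and returns a tensor, while `torch.equal` returns a single Python bool. For floating-point results, exact comparison can fail because of rounding, so `torch.allclose` is often the safer check. A sketch using the classic 0.1 + 0.2 case in float64:

```python
import torch

x = torch.tensor([0.1], dtype=torch.float64) + 0.2
y = torch.tensor([0.3], dtype=torch.float64)

elementwise = torch.eq(x, y)       # tensor of bools, one per element
exact = torch.equal(x, y)          # single bool: same shape and all elements equal
approximate = torch.allclose(x, y) # single bool: equal within a tolerance

print(elementwise)   # tensor([False])
print(exact)         # False
print(approximate)   # True
```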

## 5.Advanced Operations

* where
* gather

### where

In [222]:
cond = torch.tensor([[0.6, 0.7], 
        [0.8, 0.4]]).double()
cond

tensor([[0.6000, 0.7000],
        [0.8000, 0.4000]])

In [223]:
import numpy as np

a = torch.tensor([[0, 0],
     [0, 0]]).double()
b = torch.tensor([[1, 1],
     [1, 1]]).double()
a.type()


'torch.DoubleTensor'

In [225]:
torch.where(cond>0.5, a, b)

tensor([[0., 0.],
        [0., 1.]])
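
`torch.where(cond, a, b)` picks from `a` where the condition holds and from `b` elsewhere. As an illustration that is not part of the lecture, the sketch below uses it to write a ReLU-like function and compares it with an explicit mask formula.

```python
import torch

x = torch.tensor([[-1.0, 2.0], [3.0, -4.0]])
zeros = torch.zeros_like(x)

relu_like = torch.where(x > 0, x, zeros)   # keep positives, replace the rest with 0

# The same selection written with a 0/1 mask.
mask = (x > 0).to(x.dtype)
by_mask = mask * x + (1 - mask) * zeros

print(relu_like)                        # [[0., 2.], [3., 0.]]
print(torch.equal(relu_like, by_mask))  # True
```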

### gather
Table lookup

torch.gather(input, dim, index, out=None) -> Tensor

In [241]:
prob = torch.randn(4, 10)
prob

tensor([[ 0.8064,  1.7472,  2.3267,  0.2556,  0.7068, -1.3559, -0.7067, -0.9383,
          0.6252,  1.0672],
        [-0.9990,  2.9914, -0.3745, -0.7212, -0.9276, -2.1059, -0.5704, -0.7625,
         -0.4164, -1.2965],
        [-0.3657,  2.7413, -0.0811,  0.8845, -0.3860,  0.4786, -1.9337, -0.9188,
         -0.5971, -1.0415],
        [-0.6080,  1.0508, -1.1711, -0.6335, -0.4743, -0.5340,  1.1351,  1.3671,
         -1.0961, -0.7392]])

In [242]:
idx = prob.topk(dim=1, k=3)
idx

torch.return_types.topk(values=tensor([[ 2.3267,  1.7472,  1.0672],
        [ 2.9914, -0.3745, -0.4164],
        [ 2.7413,  0.8845,  0.4786],
        [ 1.3671,  1.1351,  1.0508]]), indices=tensor([[2, 1, 9],
        [1, 2, 8],
        [1, 3, 5],
        [7, 6, 1]]))

In [243]:
idx = idx[1]
idx

tensor([[2, 1, 9],
        [1, 2, 8],
        [1, 3, 5],
        [7, 6, 1]])

In [244]:
label = torch.arange(10) + 100
label

tensor([100, 101, 102, 103, 104, 105, 106, 107, 108, 109])

In [245]:
torch.gather(label.expand(4, 10), dim=1, index=idx.long())

tensor([[102, 101, 109],
        [101, 102, 108],
        [101, 103, 105],
        [107, 106, 101]])
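
For `dim=1`, gather follows the rule `out[i][j] = input[i][index[i][j]]`. The sketch below checks that rule with explicit loops on made-up tensors. It also shows that for a 1-D lookup table like `label`, plain integer indexing gives the same result as expanding the table and gathering.

```python
import torch

table = torch.tensor([[10, 11, 12],
                      [20, 21, 22]])
index = torch.tensor([[2, 0],
                      [1, 1]])

picked = torch.gather(table, dim=1, index=index)

# The same lookup written out element by element.
manual = torch.empty_like(index)
for i in range(index.shape[0]):
    for j in range(index.shape[1]):
        manual[i, j] = table[i, index[i, j]]

print(picked)                        # [[12, 10], [21, 21]]
print(torch.equal(picked, manual))   # True

# With a 1-D table, indexing with an integer tensor is an equivalent shortcut.
label = torch.arange(10) + 100
idx = torch.tensor([[2, 1, 9], [1, 2, 8]])
print(torch.equal(label[idx], torch.gather(label.expand(2, 10), dim=1, index=idx)))  # True
```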