In [1]:
import torch
import numpy as np

In [2]:
%config Completer.use_jedi = False

In [6]:
!ls

Pytorch-multiplications.ipynb  assets
README.md


## Using PyTorch's Multiplication Operations

https://zhuanlan.zhihu.com/p/100069938
*****


### 1. 2-D matrix multiplication: torch.mm()

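`torch.mm(mat1, mat2, out=None)`, where mat1 is $(n \times m)$ and mat2 is $(m \times d)$; the output has shape $(n \times d)$. This function only supports multiplying 2-D matrices and does not broadcast.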

In [40]:
a  = torch.randn(3, 4)
b = torch.randn(4, 5)

In [41]:
torch.mm(a, b).shape

torch.Size([3, 5])

In [42]:
out = torch.mm(a, b)

In [43]:
ein_out = torch.einsum('ij,jk->ik',[a, b])

In [45]:
out - ein_out

tensor([[-1.1921e-07,  0.0000e+00,  0.0000e+00,  0.0000e+00,  4.4703e-08],
        [-2.3842e-07,  0.0000e+00, -2.3842e-07,  1.1921e-07,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  1.1921e-07,  1.1921e-07]])
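The nonzero entries above are on the order of 1e-7, which is consistent with float32 rounding rather than a real disagreement between the two results. A minimal sketch of a more direct check, assuming a tolerance-based comparison with `torch.allclose` is acceptable (the seed and the `atol` value are illustrative choices, not from the original notebook):

```python
import torch

torch.manual_seed(0)  # illustrative seed, only for repeatability
a = torch.randn(3, 4)
b = torch.randn(4, 5)

out = torch.mm(a, b)
ein_out = torch.einsum('ij,jk->ik', a, b)

# Compare within a tolerance instead of inspecting the raw difference.
same = torch.allclose(out, ein_out, atol=1e-6)
print(out.shape, same)
```

The same pattern can replace any of the `out - ein_out` cells below when only a yes/no answer is needed.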

### 2. 3-D batched matrix multiplication: torch.bmm()

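The inputs are 3-D batched matrices; compared with torch.mm, this operation adds a batch dimension. `torch.bmm(bmat1, bmat2, out=None)`, where bmat1 is $(B \times n \times m)$ and bmat2 is $(B \times m \times d)$; the output is $(B \times n \times d)$. The first (batch) dimension of the two inputs must be the same; broadcasting is not supported.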

In [46]:
a = torch.randn(3,4,5)
b = torch.randn(3,5,6)

In [47]:
out = torch.bmm(a, b)

In [48]:
out.shape

torch.Size([3, 4, 6])

In [49]:
ein_out = torch.einsum('bmn,bno->bmo',[a,b])

In [50]:
out - ein_out

tensor([[[0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0.]]])
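To illustrate the "no broadcasting" restriction, here is a small sketch with deliberately mismatched batch sizes (the shapes are made up for illustration). `torch.bmm` is expected to reject the inputs, while `torch.matmul`, covered in the next section, broadcasts the batch dimension of size 1:

```python
import torch

a = torch.randn(1, 4, 5)   # batch size 1
b = torch.randn(3, 5, 6)   # batch size 3

# bmm requires identical batch sizes, so this call should raise.
try:
    torch.bmm(a, b)
    bmm_failed = False
except RuntimeError:
    bmm_failed = True

# matmul broadcasts the batch dimension 1 -> 3.
out = torch.matmul(a, b)
print(bmm_failed, out.shape)
```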

### 3. General matrix multiplication: torch.matmul()
https://pytorch.org/docs/stable/generated/torch.matmul.html#torch.matmul

`torch.matmul(input, other, *, out=None) → Tensor`. It supports broadcasting, which makes its behavior more involved.

  The behavior depends on the dimensionality of the tensors as follows:

 - If both tensors are 1-dimensional, the dot product (scalar) is returned.
     - The two vectors must therefore have the same length.

 - If both arguments are 2-dimensional, the matrix-matrix product is returned.
     - This case is the same as `torch.mm`.

 - If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.
     - This amounts to taking the inner product of the vector with each column of the matrix: if the vector has length m, the matrix must be $m \times n$, and the result is a vector of length `n`.

 - If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned.
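     - This differs from the previous case. There, the vector is dotted with each column of the matrix; here, it is dotted with each row. If the matrix is $m \times n$, the vector must have length n, and the result is a vector of length `m`.
     - This case is equivalent to **torch.mv**. https://pytorch.org/docs/stable/generated/torch.mv.html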


 - If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. 
     - If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiply and removed after. The non-matrix (i.e. batch) dimensions are broadcasted (and thus must be broadcastable). For example, if input is a $(j \times 1 \times n \times n)$ tensor and other is a $(k \times n \times n)$ tensor, out will be a $(j \times k \times n \times n)$ tensor.
         - Explanation: the last two dimensions are treated as the matrix dimensions and everything before them as batch dimensions, so `other` can be broadcast to $(1 \times k \times n \times n)$. The product of two $n \times n$ matrices is again $n \times n$, and the batch dimensions $(j \times 1)$ and $(1 \times k)$ broadcast to $(j \times k)$. The result therefore has shape $(j \times k \times n \times n)$.
         
  Note that the broadcasting logic only looks at the batch dimensions when determining if the inputs are broadcastable, and not the matrix dimensions. For example, if input is a $(j \times 1 \times n \times m)$ tensor and other is a $(k \times m \times p)$ tensor, these inputs are valid for broadcasting even though the final two dimensions (i.e. the matrix dimensions) are different. out will be a $(j \times k \times n \times p)$ tensor.


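In particular, for multi-dimensional inputs, `torch.matmul()` can be thought of as using the last two dimensions of each argument for the matrix product and treating all other dimensions as batch dimensions. Suppose the two inputs have shapes $1000\times 500 \times 99 \times 11$ and $500 \times 11 \times 99$. The matrix product over the last two dimensions is $(99\times 11)\times (11\times 99)$, which gives $99 \times 99$. The batch dimensions are $1000\times 500$ and $500$, which broadcast to $1000\times 500$, so the final output shape is $(1000\times 500\times 99\times 99)$.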
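The shape reasoning above can be checked on a scaled-down version of the same layout. This is a sketch with made-up sizes (6, 5, 9, 2 instead of 1000, 500, 99, 11) so that it runs quickly and uses little memory:

```python
import torch

x = torch.randn(6, 5, 9, 2)   # batch dims (6, 5), matrix dims (9, 2)
y = torch.randn(5, 2, 9)      # batch dims (5,),   matrix dims (2, 9)

out = torch.matmul(x, y)
print(out.shape)

# Each output slice is an ordinary 2-D product. y is indexed only by the
# second batch index because it was broadcast along the leading dimension.
slice_ok = torch.allclose(out[3, 2], torch.mm(x[3, 2], y[2]), atol=1e-5)
print(slice_ok)
```

The batch dimensions `(6, 5)` and `(5,)` broadcast to `(6, 5)`, and the matrix dimensions `(9, 2)` and `(2, 9)` multiply to `(9, 9)`.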


In [51]:
# Case 1

x = torch.randn(3)
y = torch.randn(3)

out = torch.matmul(x,y)
out.shape

torch.Size([])

In [53]:
ein_out = torch.einsum('i,i->',[x,y])

In [54]:
ein_out - out

tensor(0.)

In [21]:
# Case 2
x = torch.randn(3,4)
y = torch.randn(4,5)
torch.matmul(x, y)

tensor([[-0.0080, -3.1086, -2.6243, -0.9612,  2.7958],
        [ 2.6126, -1.8866, -2.6951, -3.7642,  3.1455],
        [ 1.4079,  1.1407, -0.5208, -1.3218,  0.1834]])

In [22]:
# Equivalent to torch.mm
torch.mm(x, y)

tensor([[-0.0080, -3.1086, -2.6243, -0.9612,  2.7958],
        [ 2.6126, -1.8866, -2.6951, -3.7642,  3.1455],
        [ 1.4079,  1.1407, -0.5208, -1.3218,  0.1834]])

In [58]:
# Case 3: 1-D vector times 2-D matrix
x = torch.arange(3)
y = torch.arange(12).reshape(3,4)

out = torch.matmul(x,y)
out.shape

torch.Size([4])

In [59]:
x

tensor([0, 1, 2])

In [57]:
y

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [60]:
ein_out = torch.einsum('i,ij->j',[x,y])

In [61]:
ein_out - out

tensor([0, 0, 0, 0])

In [63]:
# Case 4: 2-D matrix times 1-D vector
x = torch.arange(4)
y = torch.arange(12).reshape(3,4)


out = torch.matmul(y,x)
out

tensor([14, 38, 62])

In [64]:
x

tensor([0, 1, 2, 3])

In [65]:
y

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

In [67]:
ein_out = torch.einsum('j,ij->i', [x,y])
ein_out

tensor([14, 38, 62])

In [32]:

# Case 5 covers several sub-cases

In [68]:
# batched matrix x broadcasted vector
x = torch.randn(10,3,4)
y = torch.randn(4)
out = torch.matmul(x,y)
out.shape

torch.Size([10, 3])

In [69]:
ein_out = torch.einsum('bmn,n->bm',[x,y])
ein_out - out

tensor([[ 0.0000e+00,  4.7684e-07,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00, -4.7684e-07],
        [ 0.0000e+00,  4.7684e-07,  0.0000e+00],
        [-1.1921e-07,  1.1921e-07,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00]])

In [70]:
# batched matrix x batched matrix
x = torch.randn(10,3,4)
y = torch.randn(10,4,5)
out = torch.matmul(x,y)
out.shape

torch.Size([10, 3, 5])

In [71]:
ein_out = torch.einsum('bmn,bno->bmo',[x,y])
ein_out - out

tensor([[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]])

In [72]:
# batched matrix x broadcasted matrix
x = torch.randn(10,3,4)
y= torch.randn(4,5)
out = torch.matmul(x,y)
out.shape

torch.Size([10, 3, 5])

In [73]:
ein_out = torch.einsum('bmn,no->bmo',[x,y])

In [74]:
ein_out - out

tensor([[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]])

### 4. Element-wise multiplication: torch.mul

`torch.mul(mat1, other)`

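- If other is a number, every element of mat1 is multiplied by that number.
- If other is a tensor, it is multiplied element-wise with mat1, provided the two shapes are broadcastable.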
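A short sketch of both cases, using small hand-picked values (not from the original notebook) so the results are easy to verify by eye:

```python
import torch

x = torch.tensor([[1., 2.],
                  [3., 4.]])

# other is a number: every element is multiplied by 10.
scaled = torch.mul(x, 10)

# other is a tensor of shape (2,): it is broadcast across the rows of x.
row = torch.tensor([1., -1.])
signed = torch.mul(x, row)

print(scaled)   # [[10, 20], [30, 40]]
print(signed)   # [[1, -2], [3, -4]]
```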


In [75]:
x = torch.randn(2,3,4)
y = torch.randn(3,4)
out = torch.mul(x,y)
out.shape

torch.Size([2, 3, 4])

In [76]:
ein_out = torch.einsum('bmn,mn->bmn', [x,y])
ein_out - out

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

### 5. The two operators `@` and `*`

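 - `@` is the matrix multiplication operator. It behaves like torch.matmul, and therefore also covers the torch.mm and torch.bmm cases.
 - `*` is element-wise multiplication, like torch.mul.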
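A minimal sketch comparing the operators with their function counterparts; the tensors are small illustrative examples:

```python
import torch

x = torch.arange(6.).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
y = torch.arange(6.).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

matmul_same = torch.equal(x @ y, torch.matmul(x, y))
mul_same = torch.equal(x * x, torch.mul(x, x))

print(x @ y)                  # matrix product, shape (2, 2)
print(matmul_same, mul_same)
```

Note that `x * y` would fail here, because shapes `(2, 3)` and `(3, 2)` are not broadcastable, whereas `x @ y` is a valid matrix product.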

### 6. Vector dot product / inner product (returns a scalar)

`torch.dot` computes the inner product of two 1-D tensors and returns a single number (a 0-dimensional tensor).

In [77]:
# torch.tensor with float literals is the idiomatic constructor (float32 here)
a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])

out = torch.dot(a,b)
out.shape

torch.Size([])

In [78]:
ein_out = torch.einsum('i,i->',[a,b])
ein_out - out

tensor(0.)

### 7. torch.tensordot
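The original notebook leaves this section empty. PyTorch exposes this operation as `torch.tensordot(a, b, dims)`, which contracts the trailing dimensions of `a` with the leading dimensions of `b`. The following is only a sketch with made-up shapes, cross-checked against `torch.einsum` in the same way as the earlier sections:

```python
import torch

a = torch.arange(24.).reshape(2, 3, 4)
b = torch.arange(12.).reshape(3, 4)

# dims=2: contract a's last two dims (sizes 3, 4) with b's first two dims.
out = torch.tensordot(a, b, dims=2)

# The same contraction with the dimension pairs listed explicitly.
out_explicit = torch.tensordot(a, b, dims=([1, 2], [0, 1]))

ein_out = torch.einsum('bmn,mn->b', a, b)
print(out.shape, torch.allclose(out, ein_out))
```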

### About tensor dimensions

x = torch.randn(3,4)

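- The dimensionality of a tensor has two parts: the number of dimensions and the size of each dimension.
- Tensor x has 2 dimensions (axes), and their sizes are (3, 4).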
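Both quantities can be read directly off the tensor. A brief sketch using the same `x` as above:

```python
import torch

x = torch.randn(3, 4)

num_dims = x.dim()     # number of dimensions (x.ndim gives the same value)
sizes = x.shape        # size of each dimension, as a torch.Size

print(num_dims)        # 2
print(sizes)           # torch.Size([3, 4])
print(x.size(0), x.size(1))
```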


### The broadcasting mechanism

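How broadcasting proceeds:

1. If the two tensors have different numbers of dimensions, 1s are prepended to the shape of the tensor with fewer dimensions until the counts match.

2. Then, for each dimension, the two sizes must either be equal or one of them must be 1. A dimension of size 1 is stretched to match the size of the other tensor.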
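The two steps can be sketched in plain Python, operating on shape tuples only. The helper name `broadcast_shape` is hypothetical and written just for this illustration; it is not a PyTorch function:

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    ndim = max(len(shape_a), len(shape_b))

    # Step 1: left-pad the shorter shape with 1s.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        # Step 2: sizes must match, or one of them is 1 and gets stretched.
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"sizes {da} and {db} are not broadcastable")
    return tuple(result)


print(broadcast_shape((2, 3, 4), (3, 4)))   # (2, 3, 4)
print(broadcast_shape((5, 1, 4), (3, 1)))   # (5, 3, 4)
```

The first call matches the `torch.mul` example in section 4, where a `(3, 4)` tensor is broadcast against a `(2, 3, 4)` tensor.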