(ch_bcast_add)=
# Broadcast Add (`te`)

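A broadcast operator processes two tensors of different shapes. Typically, one operand has size 1 along some dimension, and that dimension is broadcast along the corresponding dimension of the other operand to carry out the computation. Any ordinary scalar computation can be broadcast, such as basic arithmetic and logical operations. {ref}`fig_bcast_add` illustrates a broadcast add between two 2-dimensional tensors. Broadcast operators are common in deep learning workloads, for example in [batch normalization](http://d2l.ai/chapter_convolutional-modern/batch-norm.html).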

(fig_bcast_add)=
```{figure} ../img/bcast_add.svg
One case of broadcast add between two 2-dimensional tensors
```
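
The shape rule described above can be written down in a few lines of plain Python. This is a minimal sketch under the assumption that both shapes have the same rank, as in this section; `broadcast_shape` is a hypothetical helper for illustration and is not part of TVM or the d2ltvm package.

```python
def broadcast_shape(shape1, shape2):
    """Return the output shape of broadcasting two same-rank shapes.

    Hypothetical helper for illustration; not part of TVM or d2ltvm.
    """
    if len(shape1) != len(shape2):
        raise ValueError("this sketch only handles shapes of equal rank")
    out = []
    for d1, d2 in zip(shape1, shape2):
        if d1 == d2 or d2 == 1:
            # Equal extents, or the second operand is broadcast.
            out.append(d1)
        elif d1 == 1:
            # The first operand is broadcast along this dimension.
            out.append(d2)
        else:
            raise ValueError(f"dimensions {d1} and {d2} cannot be broadcast")
    return tuple(out)


print(broadcast_shape((3, 1), (3, 4)))  # (3, 4)
print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)
```

Along each dimension the output takes the extent that is not 1, and two different extents that are both larger than 1 are rejected.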

This section demonstrates a broadcast add between two 2-dimensional tensors. The following code defines the computation.

In [1]:
import numpy as np
import tvm
from tvm import te

# Save to the d2ltvm package.
def broadcast_add(shape1, shape2):
    """Broadcast add between two 2-dimensional tensors

    shape1, shape2 : the shapes of the input tensors
    """
    assert len(shape1) == 2 and len(shape2) == 2, \
        "broadcast tensors should both be 2-dimensional"
    for i in range(len(shape1)):
        assert shape1[i] == shape2[i] or shape1[i] == 1 or shape2[i] == 1, \
            "tensor shapes do not fit for broadcasting"
    A = te.placeholder(shape1, name='A')
    B = te.placeholder(shape2, name='B')
    # Output shape: along each dimension, take the extent that is not 1.
    m = shape1[0] if shape2[0] == 1 else shape2[0]
    n = shape1[1] if shape2[1] == 1 else shape2[1]
    # A dimension of extent 1 is always read at index 0, which broadcasts it.
    f = lambda x, y: A[0 if shape1[0]==1 else x, 0 if shape1[1]==1 else y] + \
        B[0 if shape2[0]==1 else x, 0 if shape2[1]==1 else y]
    C = te.compute((m, n), f, name='C')
    return A, B, C
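
To make the index selection inside the lambda `f` concrete, here is a plain-Python sketch of the same logic on nested lists. It is only an illustrative reference and does not use TVM; the name `bcast_add_ref` and the sample data are assumptions made for this example.

```python
def bcast_add_ref(a, b):
    """Broadcast add of two 2-D nested lists; illustrative reference only."""
    shape1 = (len(a), len(a[0]))
    shape2 = (len(b), len(b[0]))
    m = shape1[0] if shape2[0] == 1 else shape2[0]
    n = shape1[1] if shape2[1] == 1 else shape2[1]

    def f(x, y):
        # A dimension of size 1 is always read at index 0, which is what
        # repeats that row or column across the output.
        return (a[0 if shape1[0] == 1 else x][0 if shape1[1] == 1 else y] +
                b[0 if shape2[0] == 1 else x][0 if shape2[1] == 1 else y])

    return [[f(x, y) for y in range(n)] for x in range(m)]


a = [[10], [20], [30]]                              # shape (3, 1)
b = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]    # shape (3, 4)
print(bcast_add_ref(a, b))
# [[10, 11, 12, 13], [24, 25, 26, 27], [38, 39, 40, 41]]
```

Each row of `b` receives the single value stored in the matching row of `a`, which is the case shown in {ref}`fig_bcast_add`.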

We then use it to perform a broadcast add:

In [5]:
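m = 3
n = 4
shape1 = (m, 1)
shape2 = (m, n)
A, B, C = broadcast_add(shape1, shape2)
s = te.create_schedule(C.op)
# Lower with only A and B as arguments to inspect the loop nest;
# C then shows up as an internal allocation in the printed function.
ir_mod = tvm.lower(s, [A, B], simple_mode=True)
# Build with C as an explicit argument so that mod(a, b, c) can be called.
# A separate name for the lowered module keeps m bound to 3 for later cells.
mod = tvm.build(s, [A, B, C])
ir_mod["main"]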

PrimFunc([A, B]) attrs={"from_legacy_te_schedule": (bool)1, "global_symbol": "main", "tir.noalias": (bool)1} {
  allocate C[float32 * 12], storage_scope = global
  for (x, 0, 3) {
    for (y, 0, 4) {
      let cse_var_1 = ((x*4) + y)
      C[cse_var_1] = (A[x] + B[cse_var_1])
    }
  }
}
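
The loop nest printed above can be mirrored in plain Python over flat, row-major buffers. This is only a sketch of what the lowered code does for shapes `(3, 1)` and `(3, 4)`; the sample values are made up for illustration.

```python
m, n = 3, 4
A = [10.0, 20.0, 30.0]                # shape (3, 1) stored flat: one value per row
B = [float(i) for i in range(m * n)]  # shape (3, 4) stored flat in row-major order
C = [0.0] * (m * n)

for x in range(m):
    for y in range(n):
        idx = x * n + y         # plays the role of cse_var_1
        C[idx] = A[x] + B[idx]  # A ignores y, so A[x] is reused across the row

print(C[:4])  # [10.0, 11.0, 12.0, 13.0]
```

Because `A` has extent 1 along its second dimension, its flat index depends only on `x`, which is why the lowered code reads `A[x]` rather than `A[x*4 + y]`.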

The printed pseudocode clearly shows how the broadcast add is carried out. We verify the result as follows.

In [3]:
# Save to the d2ltvm package.
def get_bcast_data(shape1, shape2, constructor=None):
    """Return random tensors a, b 
    and empty tensor c to store broadcast results between a and b

    shape1, shape2: shapes of input tensors
    constructor : user-defined tensor constructor
    """
    np.random.seed(0)
    a = np.random.normal(size=shape1).astype("float32")
    b = np.random.normal(size=shape2).astype("float32")
    out_shape = (shape1[0] if shape2[0] == 1 else shape2[0], 
                 shape1[1] if shape2[1] == 1 else shape2[1])
    c = np.empty(out_shape, dtype='float32')
    if constructor:
        a, b, c = [constructor(x) for x in (a, b, c)]
    return a, b, c
a, b, c = get_bcast_data(shape1, shape2, tvm.nd.array)
mod(a, b, c)
np.testing.assert_allclose(np.add(a.asnumpy(), b.asnumpy()), c.asnumpy(), atol=1e-5)

Note that broadcasting is allowed along multiple dimensions.

In [4]:
shape1 = (m, 1)
shape2 = (1, n)
A, B, C = broadcast_add(shape1, shape2)
s = te.create_schedule(C.op)
mod = tvm.build(s, [A, B, C])
a, b, c = get_bcast_data(shape1, shape2, tvm.nd.array)
mod(a, b, c)
np.testing.assert_allclose(np.add(a.asnumpy(), b.asnumpy()), c.asnumpy(), atol=1e-5)
print(a.shape, b.shape, c.shape)

(3, 1) (1, 4) (3, 4)
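
When both operands are broadcast, as with shapes `(3, 1)` and `(1, 4)`, the result is effectively an outer sum of a column and a row. The following plain-Python sketch uses made-up values to show this; it does not involve TVM.

```python
col = [[0], [10], [20]]   # shape (3, 1)
row = [[0, 1, 2, 3]]      # shape (1, 4)

# col is read at column 0 for every y; row is read at row 0 for every x.
out = [[col[x][0] + row[0][y] for y in range(4)] for x in range(3)]
for r in out:
    print(r)
# [0, 1, 2, 3]
# [10, 11, 12, 13]
# [20, 21, 22, 23]
```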


Finally, it is easy to see that when the two input tensors have the same shape, broadcast add reduces to element-wise add.

## Summary

- Broadcast operators can be defined in TVM.
- Broadcasting can be performed along multiple dimensions.