```python
import numpy as np
```

## theory

Let's derive the gradient in the simplest case: a single one-hot input.

Suppose $x \in \mathbb{R}^{1 \times V}$ is a one-hot row vector with $x_s = 1$ and $W \in \mathbb{R}^{V \times D}$. Then $z = xW = [W_{s1}, \dots, W_{sD}]$ is simply row $s$ of $W$. By the chain rule, $\frac{\partial J}{\partial W_{ij}} = \sum_k \frac{\partial J}{\partial z_k} \frac{\partial z_k}{\partial W_{ij}}$.

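Because $z$ depends only on row $s$ of $W$, every term vanishes when $i \neq s$. For $i = s$ we have $z_k = W_{sk}$, so $\frac{\partial z_k}{\partial W_{sj}}$ is $1$ if $k = j$ and $0$ otherwise. Only the $k = j$ term survives, which gives $\frac{\partial J}{\partial W_{sj}} = \frac{\partial J}{\partial z_j}$.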

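In other words, every row of $\frac{\partial J}{\partial W}$ is zero except row $s$, which holds the upstream gradient: $\frac{\partial J}{\partial W} = \begin{bmatrix}0 & \dots & 0\\ \vdots & & \vdots \\ \frac{\partial J}{\partial z_1} & \dots & \frac{\partial J}{\partial z_D} \\ \vdots & & \vdots \\ 0 & \dots & 0 \end{bmatrix}$.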
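This matrix is the outer product $x^\top \frac{\partial J}{\partial z}$, the standard matrix-calculus gradient for $z = xW$ with a row vector $x$. A minimal NumPy sketch (the sizes, seed, and index `s` are arbitrary choices for illustration) checks that this outer product coincides with writing the upstream gradient into row $s$ alone:

```python
import numpy as np

rng = np.random.default_rng(1)
V, D, s = 6, 4, 3
dz = rng.standard_normal((1, D))     # upstream gradient dJ/dz, shape (1, D)

x = np.zeros((1, V))
x[0, s] = 1.0                        # one-hot row vector with x_s = 1

# General matrix-calculus form for z = xW: dJ/dW = x^T (dJ/dz), shape (V, D).
dW_outer = x.T @ dz

# Shortcut from the derivation: copy dJ/dz into row s, leave the rest zero.
dW_row = np.zeros((V, D))
dW_row[s] = dz[0]

print(np.allclose(dW_outer, dW_row))
```

The outer product does $V \times D$ multiplications, most of them by zero; copying a single row is the cheaper way to get the same matrix.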

So the backward pass only needs to copy the upstream gradient $\frac{\partial J}{\partial z}$ into row $s$ of an otherwise zero gradient matrix.
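To sanity-check the derivation numerically, the sketch below compares it with central finite differences. It assumes a toy loss $J = \sum_k z_k g_k$ for a fixed vector $g$ (the name `g` and this loss are illustrative, not from the original), chosen so that $\frac{\partial J}{\partial z} = g$ exactly. The sizes and seed are arbitrary.

```python
import numpy as np

# Assumption: toy scalar loss J = sum(z * g), so that dJ/dz = g exactly.
rng = np.random.default_rng(0)
V, D, s = 5, 3, 2
W = rng.standard_normal((V, D))
g = rng.standard_normal(D)          # plays the role of dJ/dz

x = np.zeros(V)
x[s] = 1.0                          # one-hot input with x_s = 1

def loss(W):
    z = x @ W                       # z = xW, i.e. row s of W
    return float(np.sum(z * g))

# Central finite differences for every entry W_ij.
eps = 1e-6
num_grad = np.zeros_like(W)
for i in range(V):
    for j in range(D):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num_grad[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.round(num_grad, 6))
```

Row $s$ of `num_grad` should match `g` up to finite-difference error, and every other row should be zero, as the derivation predicts.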

## implementation

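```python
np.random.seed(42)
V, D = 10, 2                    # W has V rows and D columns
x = [3]                         # index of the nonzero entry of the one-hot input, i.e. s = 3
W = np.random.randn(V, D)
dW = np.zeros_like(W)           # gradient buffer with the same shape as W
dout = np.random.randn(1, D)    # upstream gradient dJ/dz
```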

```python
dout
```

```
array([[ 1.46564877, -0.2257763 ]])
```

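```python
# Add dout into row x of dW in place; np.add.at is unbuffered, so repeated indices accumulate.
np.add.at(dW, x, dout)
```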

```python
dW
```

```
array([[ 0.        ,  0.        ],
       [ 0.        ,  0.        ],
       [ 0.        ,  0.        ],
       [ 1.46564877, -0.2257763 ],
       [ 0.        ,  0.        ],
       [ 0.        ,  0.        ],
       [ 0.        ,  0.        ],
       [ 0.        ,  0.        ],
       [ 0.        ,  0.        ],
       [ 0.        ,  0.        ]])
```

As expected, only row $s = 3$ of `dW` is nonzero, and it equals `dout`.
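With a single index, as here, `np.add.at(dW, x, dout)` gives the same result as `dW[x] += dout`. The two differ once the same row is looked up more than once. That can happen, for example, if the lookup is extended to a batch of indices with one upstream row per index, which is an assumption beyond the one-hot case derived above. By the chain rule, the gradient for a repeated row is the sum of its upstream rows. Fancy-index `+=` is buffered, so a repeated index receives only one update, whereas `np.add.at` accumulates every occurrence. The indices and gradients below are made up for illustration:

```python
import numpy as np

V, D = 5, 2
x = [1, 3, 1]                        # index 1 is looked up twice
dout = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])        # one upstream gradient row per looked-up index

# Fancy-index += is buffered: for a repeated index only one write survives.
dW_buffered = np.zeros((V, D))
dW_buffered[x] += dout

# np.add.at is unbuffered: every occurrence of an index accumulates.
dW_acc = np.zeros((V, D))
np.add.at(dW_acc, x, dout)

print(dW_buffered[1])   # only one of the two upstream rows for index 1
print(dW_acc[1])        # [1+5, 2+6] = [6., 8.]
```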

This concludes our short analysis of the single one-hot case.