Vector, Matrix, and Tensor Derivatives

# 1 Simplify
Keep things simple, and avoid doing too many things at once.

## 1.1 Expanding notation into explicit sums and equations for each component
It is useful to write out the explicit formula for *a single scalar element* of the output in terms of nothing but *scalar variables*.

**Example**  
Suppose we have a column vector $\vec{y}$ of length $C$ that is calculated by forming the product of a matrix $W$ that is $C$ rows by $D$ columns with a column vector $\vec{x}$ of length $D$:
$$\vec{y}=W\vec{x}$$ 
Suppose we want the derivative of the 3rd component of $\vec{y}$ with respect to the 7th component of $\vec{x}$:
$$\frac{\partial \vec{y_{3}}}{\partial \vec{x_{7}}}$$
The first thing to do is to write down the formula for computing $\vec{y_3}$:
$$\vec{y_3}=\sum_{j=1}^{D}W_{3,j}\vec{x_j}$$

## 1.2 Removing the summation notation
It is easy to make errors when differentiating an expression that contains summation notation ($\sum$) or product notation ($\prod$), so it helps to write the sum out term by term:

$$\vec{y_3}=W_{3,1}\vec{x_1}+W_{3,2}\vec{x_2}+\ldots+W_{3,7}\vec{x_7}+\ldots+W_{3,D}\vec{x_{D}}$$
Only the term $W_{3,7}\vec{x_7}$ depends on $\vec{x_7}$, so:
$$\frac{\partial \vec{y_3}}{\partial \vec{x_7}}=\frac{\partial}{\partial \vec{x_7}}[W_{3,1}\vec{x_1}+W_{3,2}\vec{x_2}+\ldots+W_{3,7}\vec{x_7}+\ldots+W_{3,D}\vec{x_{D}}] =\frac{\partial}{\partial \vec{x_7}}[W_{3,7}\vec{x_7}]=W_{3,7}$$
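The result above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original notes: the sizes, the values of `W` and `x`, and the helper names `y_component` and `partial` are all made up for illustration. It estimates the derivative with a central difference and compares it with $W_{3,7}$.

```python
# Hypothetical sizes and values, chosen only for illustration.
C, D = 4, 8
W = [[0.1 * (i + 1) + 0.01 * (j + 1) for j in range(D)] for i in range(C)]
x = [0.5 * (j + 1) for j in range(D)]


def y_component(i, x_vec):
    """Scalar formula y_i = sum_j W[i][j] * x[j], with 0-based indices."""
    return sum(W[i][j] * x_vec[j] for j in range(D))


def partial(i, j, eps=1e-6):
    """Central-difference estimate of d y_i / d x_j."""
    x_plus = list(x)
    x_plus[j] += eps
    x_minus = list(x)
    x_minus[j] -= eps
    return (y_component(i, x_plus) - y_component(i, x_minus)) / (2 * eps)


# 3rd component of y and 7th component of x are 0-based indices 2 and 6.
estimate = partial(2, 6)
print(estimate, W[2][6])  # the two numbers should agree up to rounding
```

Because $\vec{y_3}$ is linear in $\vec{x_7}$, the finite-difference estimate should match $W_{3,7}$ up to floating-point rounding.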

## 1.3 The Jacobian matrix
We can compute the derivative of each component of $\vec{y}$ with respect to each component of $\vec{x}$; there are $C \times D$ of these. They can be arranged in a matrix:
$$
\begin{bmatrix}
\frac{\partial \vec{y_1}}{\partial \vec{x_1}} & \frac{\partial \vec{y_1}}{\partial \vec{x_2}} & \ldots & \frac{\partial \vec{y_1}}{\partial \vec{x_D}}  \\
\frac{\partial \vec{y_2}}{\partial \vec{x_1}} & \frac{\partial \vec{y_2}}{\partial \vec{x_2}} & \ldots & \frac{\partial \vec{y_2}}{\partial \vec{x_D}}  \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial \vec{y_C}}{\partial \vec{x_1}} & \frac{\partial \vec{y_C}}{\partial \vec{x_2}} & \ldots & \frac{\partial \vec{y_C}}{\partial \vec{x_D}}  \\
\end{bmatrix}
$$
This is called the *Jacobian matrix*. Since $\frac{\partial \vec{y_i}}{\partial \vec{x_j}}=W_{i,j}$ for every $i$ and $j$, we conclude that for
$$\vec{y}=W\vec{x}$$
we have
$$\frac{\partial \vec{y}}{\partial \vec{x}} = W$$
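As a rough illustration, the whole Jacobian can be assembled entry by entry and compared with $W$. This is a sketch in plain Python under made-up sizes and values; `matvec` and `jacobian` are hypothetical helper names, and the central-difference approach is only one way to approximate the derivatives.

```python
# Hypothetical sizes and values, chosen only for illustration.
C, D = 3, 4
W = [[float(i * D + j + 1) for j in range(D)] for i in range(C)]
x = [1.0, -2.0, 0.5, 3.0]


def matvec(matrix, vector):
    """Matrix-vector product y = W x for a column vector x."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def jacobian(f, point, eps=1e-6):
    """Central-difference Jacobian: entry [i][j] estimates d f(x)_i / d x_j."""
    columns = []
    for j in range(len(point)):
        plus = list(point)
        plus[j] += eps
        minus = list(point)
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        columns.append([(p - m) / (2 * eps) for p, m in zip(f_plus, f_minus)])
    # columns[j][i] holds d y_i / d x_j; transpose so that rows index y.
    return [list(row) for row in zip(*columns)]


J = jacobian(lambda v: matvec(W, v), x)
max_error = max(abs(J[i][j] - W[i][j]) for i in range(C) for j in range(D))
print(len(J), len(J[0]), max_error)  # C rows, D columns, tiny error
```

The estimated Jacobian has $C$ rows and $D$ columns, and each entry should agree with the corresponding entry of $W$ up to rounding.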

# 2 Row vectors instead of column vectors
When working with different neural network packages, it is important to pay close attention to the arrangement of weight matrices, data matrices, and so on. For example, if a data matrix $X$ contains many different vectors, each of which represents an input, is each data vector a row or a column of $X$?

## 2.1 Example 2
Let $\vec{y}$ be a row vector with $C$ components computed by taking the product of another row vector $\vec{x}$ with $D$ components and a matrix $W$ that is $D$ rows by $C$ columns:
$$\vec{y}=\vec{x}W$$
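In this case, you will see, by writing
$$\vec{y_3}=\sum_{j=1}^{D}\vec{x_j}W_{j,3}$$
that
$$\frac{\partial \vec{y_3}}{\partial \vec{x_7}}=W_{7,3}$$
Note that the indices of $W$ appear in the opposite order from the first example. Assembling all of the entries, with rows indexed by the component of $\vec{x}$ and columns by the component of $\vec{y}$, still gives
$$\frac{\partial \vec{y}}{\partial \vec{x}}=W$$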
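A small numerical check of the row-vector case, in plain Python. The sizes and values are made up, and `table` is a hypothetical name for the array of partial derivatives, laid out with rows indexed by the component of $\vec{x}$ and columns by the component of $\vec{y}$. That layout is an assumption of this sketch: with rows indexed by the component of $\vec{y}$ instead, the same numbers would form the transpose of $W$.

```python
# Hypothetical sizes and values, chosen only for illustration.
D, C = 8, 4
W = [[0.1 * (j + 1) + 0.01 * (k + 1) for k in range(C)] for j in range(D)]
x = [0.5 * (j + 1) for j in range(D)]


def y_component(k, x_vec):
    """Scalar formula y_k = sum_j x[j] * W[j][k] for the row-vector product x W."""
    return sum(x_vec[j] * W[j][k] for j in range(D))


def partial(k, j, eps=1e-6):
    """Central-difference estimate of d y_k / d x_j."""
    x_plus = list(x)
    x_plus[j] += eps
    x_minus = list(x)
    x_minus[j] -= eps
    return (y_component(k, x_plus) - y_component(k, x_minus)) / (2 * eps)


# d y_3 / d x_7 in 1-based notation is partial(2, 6) with 0-based indices.
estimate = partial(2, 6)

# table[j][k] = d y_k / d x_j: rows follow x, columns follow y (assumed layout).
table = [[partial(k, j) for k in range(C)] for j in range(D)]
max_error = max(abs(table[j][k] - W[j][k]) for j in range(D) for k in range(C))
print(estimate, W[6][2], max_error)
```

The single entry should match $W_{7,3}$, and under the stated layout the full table should match $W$ entry by entry.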

# 3 Dealing with more than two dimensions
Consider another closely related problem, that of computing
$$\frac{\partial \vec{y}}{\partial W}$$
In this case, $\vec{y}$ varies along one coordinate while $W$ varies along two coordinates. Thus, the entire derivative is most naturally contained in a *three-dimensional array*.

## 3.1 Use scalar notation
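Writing out the 3rd component of $\vec{y}=\vec{x}W$ in scalar form:
$$\vec{y_3}=\vec{x_1}W_{1,3}+\vec{x_2}W_{2,3}+\ldots+\vec{x_D}W_{D,3}$$
$\vec{y_3}$ depends only on the 3rd column of $W$. In other words:
$$\frac{\partial \vec{y_3}}{\partial W_{7,8}}=0$$
whereas, for example, $\frac{\partial \vec{y_3}}{\partial W_{2,3}}=\vec{x_2}$.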

## 3.2 Generality
In general, when the index of the $\vec{y}$ component is equal to the second index of $W$, the derivative will be non-zero, but it will be zero otherwise. We can write:
$$\frac{\partial \vec{y_j}}{\partial W_{i,j}}=\vec{x_i}$$
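The rule can be checked by perturbing one entry of $W$ at a time. The sketch below is plain Python with made-up sizes and values; `partial_wrt_W` is a hypothetical helper name. It compares each estimated derivative with $\vec{x_i}$ when the $\vec{y}$ index equals the column index of $W$, and with zero otherwise.

```python
# Hypothetical sizes and values, chosen only for illustration.
D, C = 3, 2
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
x = [2.0, -1.0, 0.5]


def y_component(k, weights):
    """Scalar formula y_k = sum_i x[i] * W[i][k] for the row-vector product x W."""
    return sum(x[i] * weights[i][k] for i in range(D))


def partial_wrt_W(k, i, j, eps=1e-6):
    """Central-difference estimate of d y_k / d W[i][j]."""
    w_plus = [row[:] for row in W]
    w_plus[i][j] += eps
    w_minus = [row[:] for row in W]
    w_minus[i][j] -= eps
    return (y_component(k, w_plus) - y_component(k, w_minus)) / (2 * eps)


max_error = 0.0
for k in range(C):
    for i in range(D):
        for j in range(C):
            # Non-zero only when the y index matches the column index of W.
            expected = x[i] if k == j else 0.0
            max_error = max(max_error, abs(partial_wrt_W(k, i, j) - expected))
print(max_error)  # should be tiny
```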

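If we let $F$ represent the 3-dimensional array representing the derivative of $\vec{y}$ with respect to $W$, where
$$F_{i,j,k}=\frac{\partial \vec{y_i}}{\partial W_{j,k}}$$
then
$$F_{i,j,i}=\vec{x_j}$$
but all other entries of $F$ are zero.
Finally, if we define a new **two-dimensional** array $G$ as
$$G_{i,j}=F_{i,j,i}$$
then all of the data we need to store in $F$ can be stored in $G$: the non-trivial portion of $F$ is two-dimensional, not three-dimensional.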
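To make the storage argument concrete, here is a small sketch in plain Python that builds $F$ directly from the formula above and then extracts $G$. The sizes and the values of `x` are made up, and `x` is chosen with no zero entries so that counting non-zero entries of `F` is meaningful.

```python
# Hypothetical sizes and values, chosen only for illustration.
D, C = 3, 2
x = [2.0, -1.0, 0.5]  # no zero entries, so non-zero counts are meaningful

# F[i][j][k] = d y_i / d W[j][k]: equal to x[j] when k == i, zero otherwise.
F = [[[x[j] if k == i else 0.0 for k in range(C)] for j in range(D)]
     for i in range(C)]

# G[i][j] = F[i][j][i] keeps only the entries that can be non-zero.
G = [[F[i][j][i] for j in range(D)] for i in range(C)]

total_in_F = C * D * C
nonzero_in_F = sum(
    1
    for i in range(C)
    for j in range(D)
    for k in range(C)
    if F[i][j][k] != 0.0
)
print(total_in_F, nonzero_in_F, G)
```

With these sizes, `F` has 12 entries of which 6 are non-zero, and `G` holds exactly those 6. Every row of `G` is a copy of `x`, which matches $G_{i,j}=\vec{x_j}$.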

# 4 Multiple data points
Now consider using multiple examples of $\vec{x}$, stacked together to form a matrix $X$. Let us assume that each individual $\vec{x}$ is a row vector of length $D$, so that $X$ is a two-dimensional array with $N$ rows and $D$ columns. $W$, as in our last example, will be a matrix with $D$ rows and $C$ columns. $Y$, given by $Y=XW$, will then have $N$ rows and $C$ columns. Each row of $Y$ is the row vector associated with the corresponding row of the input $X$:
$$Y_{i,j}=\sum_{k=1}^{D}X_{i,k}W_{k,j}$$

If we let $Y_{i,:}$ be the $i$th row of $Y$ and let $X_{i,:}$ be the $i$th row of $X$, then we will see that
$$\frac{\partial Y_{i,:}}{\partial X_{i,:}}=W$$
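A quick numerical sketch of the batched case, in plain Python with made-up sizes and values; `matmul` and `partial` are hypothetical helper names. It checks two things that follow from the scalar formula for $Y_{i,j}$: within a row, $\frac{\partial Y_{i,j}}{\partial X_{i,k}}=W_{k,j}$, and an entry of $Y$ in one row does not depend on entries of $X$ in a different row.

```python
# Hypothetical sizes and values, chosen only for illustration.
N, D, C = 2, 3, 2
X = [[1.0, 2.0, 3.0], [-1.0, 0.5, 4.0]]
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]


def matmul(data, weights):
    """Y[i][j] = sum_k X[i][k] * W[k][j]."""
    return [[sum(data[i][k] * weights[k][j] for k in range(D)) for j in range(C)]
            for i in range(N)]


def partial(a, b, c, d, eps=1e-6):
    """Central-difference estimate of d Y[a][b] / d X[c][d]."""
    x_plus = [row[:] for row in X]
    x_plus[c][d] += eps
    x_minus = [row[:] for row in X]
    x_minus[c][d] -= eps
    return (matmul(x_plus, W)[a][b] - matmul(x_minus, W)[a][b]) / (2 * eps)


# Same row: d Y[i][j] / d X[i][k] should equal W[k][j].
same_row_error = max(
    abs(partial(i, j, i, k) - W[k][j])
    for i in range(N) for j in range(C) for k in range(D)
)

# Different rows: Y[a][b] does not depend on X[c][d] when a != c.
cross_row_max = max(
    abs(partial(a, b, c, d))
    for a in range(N) for c in range(N) if a != c
    for b in range(C) for d in range(D)
)
print(same_row_error, cross_row_max)
```

Under these assumptions the same-row error should be at rounding level and the cross-row derivatives should be exactly zero, so each row of $X$ can be treated independently when differentiating.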