<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Note about the `out_grad` parameter

The `out_grad` parameter refers to the gradient of the loss function with respect to the output of the node. Multiplying this with the local gradient gives the gradient of the loss with respect to the input to the node, according to the chain rule of calculus, which is the basis for backpropagation in neural networks.

The chain rule is a fundamental concept in calculus that provides a method to compute the derivative of composite functions. In simple terms, it states that the derivative of a composite function is the derivative of the outer function, evaluated at the inner function, multiplied by the derivative of the inner function.

Given a composite function that is the composition of two functions, say, $f(g(x))$, the chain rule can be stated as follows:

$$\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}$$

Where:

- $\frac{df}{dx}$ is the derivative of the composite function $f(g(x))$ with respect to $x$,
- $\frac{df}{dg}$ is the derivative of the outer function $f$ with respect to its argument $g(x)$, and
- $\frac{dg}{dx}$ is the derivative of the inner function $g(x)$ with respect to $x$.

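The chain rule extends to compositions of more than two functions: the derivative of the whole chain is the product of the derivatives of each function in it, each evaluated at the output of the function it wraps.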
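As a quick sanity check of the chain rule, the sketch below compares the analytic derivative of a composite function with a central finite difference. The functions $g(x) = x^2$ and $f(u) = \sin(u)$ are arbitrary choices made for illustration and do not come from the library.

```python
import math

def g(x):
    return x ** 2          # inner function (illustrative choice)

def f(u):
    return math.sin(u)     # outer function (illustrative choice)

def chain_rule_derivative(x):
    # df/dx = f'(g(x)) * g'(x) = cos(x^2) * 2x
    return math.cos(g(x)) * 2 * x

def numerical_derivative(x, eps=1e-6):
    # Central finite difference of the composite f(g(x)).
    return (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)

x0 = 1.3
analytic = chain_rule_derivative(x0)
numeric = numerical_derivative(x0)
print(abs(analytic - numeric) < 1e-6)
```

The two values should agree up to the finite-difference error, which is what backpropagation relies on when it multiplies `out_grad` by a local gradient.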

## Element Wise Addition

Let's walk through the step-by-step derivative calculation for the [`EWiseAdd`](https://m0saan.github.io/minima/operators.html#ewiseadd) operation:

We have the function `f(a, b) = a + b`, where `a` and `b` are tensors. Our goal is to compute the partial derivatives with respect to `a` and `b`.

Let's start with the partial derivative of `f` with respect to `a`, denoted as `df/da`:

Step 1: Compute the derivative of `f` with respect to `a`.

$$\frac{{\partial f}}{{\partial a}} = \frac{{\partial}}{{\partial a}} (a + b)$$

When differentiating with respect to `a`, `b` is treated as a constant, so it contributes 0, while the derivative of `a` with respect to itself is 1:

$$\frac{{\partial f}}{{\partial a}} = 1 + 0 = 1$$

Step 2: Compute the derivative of `f` with respect to `b`.

$$\frac{{\partial f}}{{\partial b}} = \frac{{\partial}}{{\partial b}} (a + b)$$

Again, `a` is treated as a constant and contributes 0, while the derivative of `b` with respect to itself is 1:

$$\frac{{\partial f}}{{\partial b}} = 0 + 1 = 1$$

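Hence, the partial derivatives of `f(a, b) = a + b` with respect to `a` and `b` are both equal to 1. In the backward pass, each input therefore receives `out_grad` multiplied by 1, that is, `out_grad` itself.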
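The result above can be sketched in a few lines. This is a minimal illustration that uses NumPy arrays as stand-ins for minima tensors and assumes both inputs have the same shape; the function name `ewise_add_backward` is hypothetical and is not the library's `gradient` method.

```python
import numpy as np

def ewise_add_backward(out_grad):
    # Both local gradients are 1, so the upstream gradient is passed
    # through unchanged to each input.
    grad_a = out_grad * 1.0
    grad_b = out_grad * 1.0
    return grad_a, grad_b

out_grad = np.array([0.1, 0.2, 0.3])   # hypothetical upstream gradient
grad_a, grad_b = ewise_add_backward(out_grad)
```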

In [1]:
#| echo: false
#| output: asis
show_doc(add)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L63){target="_blank" style="float:right; font-size:smaller"}

### add

>      add (a:minima.autograd.Tensor, b:minima.autograd.Tensor)

Adds two tensors element-wise.

Args:
- a: The first tensor.
- b: The second tensor.

Returns:
The element-wise sum of a and b.

In [2]:
#| echo: false
#| output: asis
show_doc(EWiseAdd)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L24){target="_blank" style="float:right; font-size:smaller"}

### EWiseAdd

>      EWiseAdd ()

Performs element-wise addition of two tensors.

Example:
>>> a = Tensor([1, 2, 3])
>>> b = Tensor([4, 5, 6])
>>> op = EWiseAdd()
>>> result = op.compute(a, b)
>>> print(result)
Tensor([5, 7, 9])

Create two 1-D tensors

In [4]:
a = Tensor([1, 2, 3])
b = Tensor([4, 5, 6])

Create an EWiseAdd operation instance

In [5]:
op = EWiseAdd()

Compute the element-wise sum of a and b

In [6]:
result = op.compute(a, b)
result

minima.Tensor([5 7 9])

Alternatively, you can use the add function directly

In [7]:
result = add(a, b)
result

minima.Tensor([5 7 9])

Or call the operator instance directly

In [8]:
op(a,b)

minima.Tensor([5 7 9])

For 2-D tensors, we can compute the element-wise sum of a and b in the same way

In [9]:
a = Tensor([[1, 2, 3], [4, 5, 6]])
b = Tensor([[7, 8, 9], [10, 11, 12]])

result = op.compute(a, b)
result

minima.Tensor([[ 8 10 12]
 [14 16 18]])

## Scalar Addition

Explanation for the derivative of the [`AddScalar`](https://m0saan.github.io/minima/operators.html#addscalar) operator:

Let's denote the scalar as `c` and the tensor it is added to as `a`. The operation can be described as `f(a) = a + c`.

The function for the backward pass (i.e., the gradient) is `df/da = 1`, which means the derivative of `f(a)` with respect to `a` is simply `1`.

We are given a function $f(a) = a + c$, where $a$ is a tensor and $c$ is a scalar. Our task is to find the derivative of this function with respect to $a$.

By differentiating the function $f(a)$ with respect to $a$, we find:

\begin{align*}
\frac{df}{da} &= \frac{d}{da} (a + c) \\
&= 1
\end{align*}

Therefore, the gradient of $f(a)$ with respect to $a$ is $1$.


To summarize: we start from the function `f(a) = a + c`. Differentiating `f(a)` with respect to `a` gives `1`, because the constant `c` contributes nothing. The gradient of `f(a)` with respect to `a` is therefore `1`, which matches the behavior of the `gradient` method of the [`AddScalar`](https://m0saan.github.io/minima/operators.html#addscalar) operator.

In [3]:
#| echo: false
#| output: asis
show_doc(add_scalar)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L122){target="_blank" style="float:right; font-size:smaller"}

### add_scalar

>      add_scalar (a:minima.autograd.Tensor, scalar:Union[int,float])

Adds a scalar to a tensor.

Args:
- a: The tensor.
- scalar: The scalar to add.

Returns:
The sum of a and the scalar.

In [4]:
#| echo: false
#| output: asis
show_doc(AddScalar)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L77){target="_blank" style="float:right; font-size:smaller"}

### AddScalar

>      AddScalar (scalar:Union[int,float])

Performs addition of a tensor and a scalar.

Example:
>>> a = Tensor([1, 2, 3])
>>> op = AddScalar(5)
>>> result = op.compute(a)
>>> print(result)
Tensor([6, 7, 8])

## Element Wise Multiplication

Explanation for the derivative of the [`EWiseMul`](https://m0saan.github.io/minima/operators.html#ewisemul) (element-wise multiplication) operator:

Let's denote the two input tensors as `a` and `b`. The operation can be described as `f(a, b) = a * b`, where `*` represents element-wise multiplication.

The function for the backward pass (i.e., the gradient) is `df/da = b` and `df/db = a`. This means that the derivative of `f(a, b)` with respect to `a` is `b`, and the derivative with respect to `b` is `a`.


We are given a function $f(a, b) = a \odot b$, where $a$ and $b$ are tensors, and $\odot$ represents element-wise multiplication. Our task is to find the derivatives of this function with respect to $a$ and $b$.

By differentiating the function $f(a, b)$ with respect to $a$, we find:

\begin{align*}
\frac{df}{da} &= \frac{d}{da} (a \odot b) \\
&= b
\end{align*}

Therefore, the gradient of $f(a, b)$ with respect to $a$ is $b$.

Similarly, by differentiating the function $f(a, b)$ with respect to $b$, we find:

\begin{align*}
\frac{df}{db} &= \frac{d}{db} (a \odot b) \\
&= a
\end{align*}

Therefore, the gradient of $f(a, b)$ with respect to $b$ is $a$.
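The two gradients can be sketched as follows. This is a minimal illustration with NumPy arrays standing in for minima tensors; `ewise_mul_backward` is a hypothetical name, not the library's API.

```python
import numpy as np

def ewise_mul_backward(out_grad, a, b):
    # d(a*b)/da = b and d(a*b)/db = a, each scaled by the upstream gradient.
    return out_grad * b, out_grad * a

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
out_grad = np.ones_like(a)             # upstream gradient of all ones
grad_a, grad_b = ewise_mul_backward(out_grad, a, b)
print(grad_a)  # [4. 5. 6.]
print(grad_b)  # [1. 2. 3.]
```

With an upstream gradient of ones, each input's gradient is simply the other input.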

In [5]:
#| echo: false
#| output: asis
show_doc(multiply)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L175){target="_blank" style="float:right; font-size:smaller"}

### multiply

>      multiply (a:minima.autograd.Tensor, b:minima.autograd.Tensor)

Multiplies two tensors element-wise.

Args:
- a: The first tensor.
- b: The second tensor.

Returns:
The element-wise product of a and b.

In [6]:
#| echo: false
#| output: asis
show_doc(EWiseMul)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L136){target="_blank" style="float:right; font-size:smaller"}

### EWiseMul

>      EWiseMul ()

Performs element-wise multiplication of two tensors.

Example:
>>> a = Tensor([1, 2, 3])
>>> b = Tensor([4, 5, 6])
>>> op = EWiseMul()
>>> result = op.compute(a, b)
>>> print(result)
Tensor([4, 10, 18])

## Scalar Multiplication

Let's denote the scalar as `c` and `a` as the tensor being multiplied by the scalar. The operation can be described as `f(a) = a * c`.

The function for the backward pass (i.e., the gradient) is `df/da = c`, which means the derivative of `f(a)` with respect to `a` is `c`.

We are given a function $f(a) = a \cdot c$, where $a$ is a tensor and $c$ is a scalar. Our task is to find the derivative of this function with respect to $a$.

By differentiating the function $f(a)$ with respect to $a$, we find:

\begin{align*}
\frac{df}{da} &= \frac{d}{da} (a \cdot c) \\
&= c
\end{align*}

Therefore, the gradient of $f(a)$ with respect to $a$ is $c$.

To summarize: we start from the function `f(a) = a * c`. Differentiating `f(a)` with respect to `a` gives `c`, so the gradient of `f(a)` with respect to `a` is `c`, which matches the behavior of the `gradient` method of the [`MulScalar`](https://m0saan.github.io/minima/operators.html#mulscalar) operator.
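One way to gain confidence in this result is to compare it with a numerical gradient. The sketch below assumes NumPy arrays in place of minima tensors and uses the sum of the output as a stand-in loss; `mul_scalar_backward` and `numerical_grad` are hypothetical helper names.

```python
import numpy as np

def mul_scalar_backward(out_grad, c):
    # d(a*c)/da = c for every element of a.
    return out_grad * c

def numerical_grad(fn, a, eps=1e-6):
    # Central differences of sum(fn(a)) with respect to each element of a.
    grad = np.zeros_like(a)
    for i in range(a.size):
        step = np.zeros_like(a)
        step.flat[i] = eps
        grad.flat[i] = (fn(a + step).sum() - fn(a - step).sum()) / (2 * eps)
    return grad

a = np.array([1.0, 2.0, 3.0])
c = 5.0
analytic = mul_scalar_backward(np.ones_like(a), c)
numeric = numerical_grad(lambda x: x * c, a)
print(np.allclose(analytic, numeric, atol=1e-6))
```

The same `numerical_grad` helper can be reused to check the other element-wise operators on this page.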

In [7]:
#| echo: false
#| output: asis
show_doc(mul_scalar)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L234){target="_blank" style="float:right; font-size:smaller"}

### mul_scalar

>      mul_scalar (a:minima.autograd.Tensor, scalar:Union[int,float])

Multiplies a tensor by a scalar.

Args:
- a: The tensor.
- scalar: The scalar to multiply.

Returns:
The product of a and the scalar.

In [8]:
#| echo: false
#| output: asis
show_doc(MulScalar)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L189){target="_blank" style="float:right; font-size:smaller"}

### MulScalar

>      MulScalar (scalar:Union[int,float])

Performs multiplication of a tensor and a scalar.

Example:
>>> a = Tensor([1, 2, 3])
>>> op = MulScalar(5)
>>> result = op.compute(a)
>>> print(result)
Tensor([5, 10, 15])

## Element Wise Divide

The operation described here is an element-wise division of two tensors, `a` and `b`, where the operation can be described as `f(a, b) = a / b`. 

We'll compute the partial derivatives with respect to `a` and `b`:

1. The partial derivative of `f(a, b)` with respect to `a` (`df/da`) is `1/b`.

2. The partial derivative of `f(a, b)` with respect to `b` (`df/db`) is `-a / b^2`.

We are given a function $f(a, b) = \frac{a}{b}$, where $a$ and $b$ are tensors. Our task is to find the partial derivatives of this function with respect to $a$ and $b$.

Let's start with $\frac{\partial f}{\partial a}$:

\begin{align*}
\frac{\partial f}{\partial a} &= \frac{\partial}{\partial a} \left(\frac{a}{b}\right) \\
&= \frac{1}{b}
\end{align*}

Now, let's compute $\frac{\partial f}{\partial b}$:

\begin{align*}
\frac{\partial f}{\partial b} &= \frac{\partial}{\partial b} \left(\frac{a}{b}\right) \\
&= - \frac{a}{b^{2}}
\end{align*}

Here is a detailed derivation of $\frac{\partial f}{\partial b}$ using the quotient rule:

Given a function of the form $y = \frac{u}{v}$, where both $u$ and $v$ are functions of $x$, the quotient rule of differentiation states:

$$\frac{dy}{dx} = \frac{v \cdot \frac{du}{dx} - u \cdot \frac{dv}{dx}}{v^2}$$

In our case, we're looking at the function $y = \frac{a}{b}$, where $a$ and $b$ are tensors. We want to find the derivative with respect to $b$ (instead of $x$ in our general formula). So we have:

$$\frac{dy}{db} = \frac{b \cdot \frac{da}{db} - a \cdot \frac{db}{db}}{b^2}$$

Since $a$ does not depend on $b$, $\frac{da}{db} = 0$, and since the derivative of any variable with respect to itself is $1$, $\frac{db}{db} = 1$.

So the derivative $\frac{dy}{db}$ simplifies to:

$$\frac{dy}{db} = \frac{b \cdot 0 - a \cdot 1}{b^2} = -\frac{a}{b^2}$$

In summary, the gradient of $f(a, b)$ with respect to $a$ is $\frac{1}{b}$, and the gradient of $f(a, b)$ with respect to $b$ is $- \frac{a}{b^{2}}$.
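Both partial derivatives can be sketched in code. This is a minimal illustration that assumes NumPy arrays in place of minima tensors and a divisor with no zero entries; `ewise_div_backward` is a hypothetical name.

```python
import numpy as np

def ewise_div_backward(out_grad, a, b):
    grad_a = out_grad / b              # d(a/b)/da = 1/b
    grad_b = -out_grad * a / b ** 2    # d(a/b)/db = -a/b^2
    return grad_a, grad_b

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
grad_a, grad_b = ewise_div_backward(np.ones_like(a), a, b)
```

Note the sign of `grad_b`: increasing the divisor decreases the quotient whenever `a` is positive.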

In [9]:
#| echo: false
#| output: asis
show_doc(divide)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L291){target="_blank" style="float:right; font-size:smaller"}

### divide

>      divide (a:minima.autograd.Tensor, b:minima.autograd.Tensor)

Divides two tensors element-wise.

Args:
    a (Tensor): The dividend tensor.
    b (Tensor): The divisor tensor.

Returns:
    Tensor: The resulting tensor after element-wise division.

Example:
    >>> import numpy as np
    >>> a = Tensor(np.array([1, 2, 3]))
    >>> b = Tensor(np.array([4, 5, 6]))
    >>> result = divide(a, b)
    >>> print(result)
    Tensor([0.25, 0.4, 0.5])

In [10]:
#| echo: false
#| output: asis
show_doc(EWiseDiv)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L248){target="_blank" style="float:right; font-size:smaller"}

### EWiseDiv

>      EWiseDiv ()

The EWiseDiv operation divides two tensors element-wise.

Example:
    >>> import numpy as np
    >>> a = Tensor(np.array([1, 2, 3]))
    >>> b = Tensor(np.array([4, 5, 6]))
    >>> div = EWiseDiv()
    >>> result = div.compute(a.data, b.data)
    >>> print(result)
    array([0.25, 0.4, 0.5])

## Scalar Division

Let's denote the scalar as `c`, and `a` as the tensor being divided by the scalar. The operation can be described as `f(a) = a / c`.

The function for the backward pass (i.e., the gradient) is `df/da = 1/c`.

This is the derivative of `f(a)` with respect to `a`.

We are given a function $f(a) = \frac{a}{c}$, where $a$ is a tensor and $c$ is a scalar. Our task is to find the derivative of this function with respect to $a$.

Since $c$ is a constant, we can rewrite $f(a)$ as $f(a) = c^{-1}a$ and apply the constant-multiple rule, together with the fact that the derivative of $a$ with respect to itself is $1$.

Now, we can differentiate this with respect to $a$:

\begin{align*}
\frac{df}{da} &= \frac{d}{da} (c^{-1}a) \\
&= c^{-1} \frac{d}{da} (a) \\
&= c^{-1} \\
&= \frac{1}{c}
\end{align*}

Therefore, the gradient of $f(a)$ with respect to $a$ is $\frac{1}{c}$.

In [11]:
#| echo: false
#| output: asis
show_doc(divide_scalar)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L363){target="_blank" style="float:right; font-size:smaller"}

### divide_scalar

>      divide_scalar (a:minima.autograd.Tensor, scalar:Union[int,float])

Divides a tensor by a scalar.

Args:
    a (Tensor): The tensor to divide.
    scalar (int, float): The scalar to divide the tensor by.

Returns:
    Tensor: The resulting tensor after division.

Example:
    >>> import numpy as np
    >>> a = Tensor(np.array([1, 2, 3]))
    >>> scalar = 2
    >>> result = divide_scalar(a, scalar)
    >>> print(result)
    Tensor([0.5, 1.0, 1.5])

In [12]:
#| echo: false
#| output: asis
show_doc(DivScalar)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L314){target="_blank" style="float:right; font-size:smaller"}

### DivScalar

>      DivScalar (scalar:Union[int,float])

The DivScalar operation divides a tensor by a scalar.

Example:
    >>> import numpy as np
    >>> a = Tensor(np.array([1, 2, 3]))
    >>> scalar = 2
    >>> div_scalar = DivScalar(scalar)
    >>> result = div_scalar.compute(a.data)
    >>> print(result)
    array([0.5, 1.0, 1.5])

## Negation

Let's denote `a` as the tensor being negated. The operation can be described as `f(a) = -a`.

The function for the backward pass (i.e., the gradient) is `df/da = -1`.

We are given a function $f(a) = -a$, where $a$ is a tensor. Our task is to find the derivative of this function with respect to $a$.

By differentiating the function $f(a)$ with respect to $a$, we find:

\begin{align*}
\frac{df}{da} &= \frac{d}{da} (-a) \\
&= -1
\end{align*}

Therefore, the gradient of $f(a)$ with respect to $a$ is $-1$.

In [13]:
#| echo: false
#| output: asis
show_doc(negate)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L423){target="_blank" style="float:right; font-size:smaller"}

### negate

>      negate (a:minima.autograd.Tensor)

Negates the given tensor.

Args:
- a: The tensor to negate.

Returns:
The negation of a.

Example:
>>> a = Tensor([1, -2, 3])
>>> result = negate(a)
>>> print(result)
Tensor([-1, 2, -3])

In [14]:
#| echo: false
#| output: asis
show_doc(Negate)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L385){target="_blank" style="float:right; font-size:smaller"}

### Negate

>      Negate ()

Negates the given tensor.

Example:
>>> a = Tensor([1, -2, 3])
>>> op = Negate()
>>> result = op.compute(a)
>>> print(result)
Tensor([-1, 2, -3])

## Exp

Explanation for the derivative of the [`Exp`](https://m0saan.github.io/minima/operators.html#exp) operator:

Let's denote `a` as the tensor on which the exponential function is applied. The operation can be described as `f(a) = exp(a)`, where [`exp`](https://m0saan.github.io/minima/operators.html#exp) represents the exponential function.

The function for the backward pass (i.e., the gradient) is `df/da = exp(a)`.

We are given a function $f(a) = \exp(a)$, where $a$ is a tensor. Our task is to find the derivative of this function with respect to $a$.

By differentiating the function $f(a)$ with respect to $a$, we find:

\begin{align*}
\frac{df}{da} &= \frac{d}{da} (\exp(a)) \\
&= \exp(a)
\end{align*}

Therefore, the gradient of $f(a)$ with respect to $a$ is $\exp(a)$.
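A minimal sketch of this backward rule, assuming NumPy arrays in place of minima tensors; `exp_backward` is a hypothetical name and not the library's `gradient` method.

```python
import numpy as np

def exp_backward(out_grad, a):
    # d exp(a)/da = exp(a), scaled by the upstream gradient.
    return out_grad * np.exp(a)

a = np.array([0.0, 1.0, 2.0])
grad_a = exp_backward(np.ones_like(a), a)
```

Because the derivative equals the function value, an implementation could also reuse the output of the forward pass instead of recomputing `np.exp(a)`.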

In [15]:
#| echo: false
#| output: asis
show_doc(exp)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L480){target="_blank" style="float:right; font-size:smaller"}

### exp

>      exp (a:minima.autograd.Tensor)

Calculates the exponential of the given tensor.

Args:
- a: The tensor.

Returns:
The exponential of a.

Example:
>>> a = Tensor([1, 2, 3])
>>> result = exp(a)
>>> print(result)
Tensor([2.71828183, 7.3890561, 20.08553692])

In [16]:
#| echo: false
#| output: asis
show_doc(Exp)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L442){target="_blank" style="float:right; font-size:smaller"}

### Exp

>      Exp ()

Calculates the exponential of the given tensor.

Example:
>>> a = Tensor([1, 2, 3])
>>> op = Exp()
>>> result = op.compute(a)
>>> print(result)
Tensor([2.71828183, 7.3890561, 20.08553692])

## ReLU

The derivative of the [`ReLU`](https://m0saan.github.io/minima/operators.html#relu) (Rectified Linear Unit) operator:

Let's denote `a` as the tensor on which the ReLU function is applied. The ReLU function is defined as follows: 

$$
f(a) = 
\begin{cases}
a, & \text{if } a \geq 0 \\
0, & \text{if } a < 0
\end{cases}
$$

The function for the backward pass (i.e., the gradient) is `df/da = 1` if `a >= 0`, and `df/da = 0` if `a < 0`.

We are given a function $f(a) = \max(0, a)$, where $a$ is a tensor. Our task is to find the derivative of this function with respect to $a$.

By considering the definition of the ReLU function, we can write $f(a)$ as:

$$
f(a) = 
\begin{cases}
a, & \text{if } a \geq 0 \\
0, & \text{if } a < 0
\end{cases}
$$

Now, let's differentiate $f(a)$ with respect to $a$:

$$
\frac{df}{da} = 
\begin{cases}
1, & \text{if } a \geq 0 \\
0, & \text{if } a < 0
\end{cases}
$$

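Therefore, the gradient of $f(a)$ with respect to $a$ is $1$ if $a \geq 0$, and $0$ if $a < 0$. Strictly speaking, $f$ is not differentiable at $a = 0$; using the value $1$ there is a convention, and any value in $[0, 1]$ is a valid subgradient.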
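The piecewise gradient amounts to a mask applied to the upstream gradient. The sketch below assumes NumPy arrays in place of minima tensors and follows the `a >= 0` convention used in the text; `relu_backward` is a hypothetical name, and other implementations may choose `a > 0` instead.

```python
import numpy as np

def relu_backward(out_grad, a):
    # Mask is 1 where a >= 0 and 0 elsewhere (the convention used in the text).
    mask = (a >= 0).astype(out_grad.dtype)
    return out_grad * mask

a = np.array([1.0, -2.0, 0.0, 3.0])
out_grad = np.array([0.5, 0.5, 0.5, 0.5])   # hypothetical upstream gradient
grad_a = relu_backward(out_grad, a)
```

Only the entry where `a` is negative has its gradient zeroed out.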

In [17]:
#| echo: false
#| output: asis
show_doc(relu)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L537){target="_blank" style="float:right; font-size:smaller"}

### relu

>      relu (a:minima.autograd.Tensor)

Applies the ReLU (Rectified Linear Unit) activation function to the given tensor.

Args:
- a: The tensor.

Returns:
The result of applying ReLU to a.

Example:
>>> a = Tensor([1, -2, 3])
>>> result = relu(a)
>>> print(result)
Tensor([1, 0, 3])

In [18]:
#| echo: false
#| output: asis
show_doc(ReLU)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L499){target="_blank" style="float:right; font-size:smaller"}

### ReLU

>      ReLU ()

Applies the ReLU (Rectified Linear Unit) activation function to the given tensor.

Example:
>>> a = Tensor([1, -2, 3])
>>> op = ReLU()
>>> result = op.compute(a)
>>> print(result)
Tensor([1, 0, 3])

## Power Scalar

The derivative of the [`PowerScalar`](https://m0saan.github.io/minima/operators.html#powerscalar) operator:

Let's denote the scalar as `n` and `a` as the tensor being raised to the power of the scalar. The operation can be described as `f(a) = a^n`.

The function for the backward pass (i.e., the gradient) is `df/da = n * a^(n-1)`.

We are given a function $f(a) = a^n$, where $a$ is a tensor and $n$ is a scalar. Our task is to find the derivative of this function with respect to $a$.

By differentiating the function $f(a)$ with respect to $a$, we find:

\begin{align*}
\frac{df}{da} &= \frac{d}{da} (a^n) \\
&= n \cdot a^{n-1}
\end{align*}

Therefore, the gradient of $f(a)$ with respect to $a$ is $n \cdot a^{n-1}$.
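The power rule translates directly into code. This is a minimal sketch with NumPy arrays standing in for minima tensors; `power_scalar_backward` is a hypothetical name.

```python
import numpy as np

def power_scalar_backward(out_grad, a, n):
    # d(a^n)/da = n * a^(n-1), scaled by the upstream gradient.
    return out_grad * n * a ** (n - 1)

a = np.array([1.0, 2.0, 3.0])
grad_a = power_scalar_backward(np.ones_like(a), a, 2)
print(grad_a)  # [2. 4. 6.]
```

For `n = 2` the gradient is `2 * a`, as expected for squaring.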

In [19]:
#| echo: false
#| output: asis
show_doc(power_scalar)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L610){target="_blank" style="float:right; font-size:smaller"}

### power_scalar

>      power_scalar (a:minima.autograd.Tensor, scalar:int)

Raises a tensor to a power.

Args:
    a (Tensor): The input tensor.
    scalar (int): The power to raise the tensor to.

Returns:
    Tensor: The resulting tensor after the power operation.

Example:
    >>> import numpy as np
    >>> tensor = Tensor(np.array([1, 2, 3]))
    >>> result = power_scalar(tensor, 2)
    >>> print(result)
    Tensor([1, 4, 9])

In [20]:
#| echo: false
#| output: asis
show_doc(PowerScalar)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L557){target="_blank" style="float:right; font-size:smaller"}

### PowerScalar

>      PowerScalar (scalar:int)

The PowerScalar operation raises a tensor to an (integer) power.

Attributes:
    scalar (int): The power to raise the tensor to.

Example:
    >>> import numpy as np
    >>> tensor = Tensor(np.array([1, 2, 3]))
    >>> pow_scalar = PowerScalar(2)
    >>> result = pow_scalar.compute(tensor.data)
    >>> print(result)
    array([1, 4, 9])

## Log

Explanation for the derivative of the `Log` operator:

Let's denote `a` as the tensor on which the logarithm is applied. The operation can be described as `f(a) = log(a)`, where `log` represents the natural logarithm.

The function for the backward pass (i.e., the gradient) is `df/da = 1/a`.

We are given a function $f(a) = \log(a)$, where $a$ is a tensor. Our task is to find the derivative of this function with respect to $a$.

By differentiating the function $f(a)$ with respect to $a$, we find:

\begin{align*}
\frac{df}{da} &= \frac{d}{da} (\log(a)) \\
&= \frac{1}{a}
\end{align*}

To summarize: we start from the function `f(a) = log(a)`, where `log` is the natural logarithm. Differentiating `f(a)` with respect to `a` gives `1/a`, so the gradient of `f(a)` with respect to `a` is `1/a`. The `gradient` method of the `Log` operator implements this by dividing `out_grad` by `a`.

In [19]:
class Log(TensorOp):
    """
    The Log operation applies the natural logarithm element-wise on the tensor.

    Example:
        >>> import numpy as np
        >>> a = Tensor(np.array([1.0, 2.0, 3.0]))
        >>> log_op = Log()
        >>> result = log_op.compute(a.data)
        >>> print(result)
        array([0., 0.69314718, 1.09861229])
    """

    def compute(self, a: NDArray) -> NDArray:
        """
        Applies the natural logarithm to the tensor.

        Args:
            a (NDArray): The input tensor.

        Returns:
            NDArray: The resulting tensor after applying the natural logarithm.
        """
        return array_api.log(a)

    def gradient(self, out_grad: Tensor, node: Tensor) -> Tuple[Tensor, ...]:
        """
        Computes the gradient of the log operation.

        Args:
            out_grad (Tensor): The gradient of the output tensor.
            node (Tensor): The node in the computational graph where the operation was performed.

        Returns:
            Tuple[Tensor, ...]: The gradient with respect to the input tensor.
        """
        a = node.children[0]
        return (out_grad / a, )

def log(a: Tensor) -> Tensor:
    """
    Applies the natural logarithm to the tensor.

    Args:
        a (Tensor): The input tensor.

    Returns:
        Tensor: The resulting tensor after applying the natural logarithm.

    Example:
        >>> import numpy as np
        >>> a = Tensor(np.array([1.0, 2.0, 3.0]))
        >>> result = log(a)
        >>> print(result)
        Tensor([0., 0.69314718, 1.09861229])
    """
    return Log()(a)

## Transpose

The operation described here is a transposition of a tensor `a`, where the operation can be described as `f(a) = a^T`.

We'll compute the derivative of this operation.

First, we note that the transpose operation doesn't change the values of the tensor elements, only their positions. This means that the gradient with respect to the input is just the gradient with respect to the output with its elements moved back to their original positions, that is, transposed.

Let's denote the gradient of the loss $L$ with respect to the output as `g = dL/df` (this is `out_grad`), where `f(a) = a^T`.

Given this, we can derive the following:

1. The gradient of the loss with respect to `a` is `dL/da = g^T`.

More formally:

We are given a function $f(a) = a^T$, where $a$ is a tensor and $a^T$ is the transpose of the tensor. Our task is to find the gradient of the loss with respect to $a$.

Let's compute $\frac{\partial L}{\partial a}$:

\begin{align*}
\frac{\partial L}{\partial a} &= \left(\frac{\partial L}{\partial f}\right)^T \\
&= g^T
\end{align*}

Here, $g$ is the gradient of the loss with respect to the transposed tensor. Each element of $a$ appears exactly once in $a^T$, at the swapped position, so its gradient is read from that position in $g$.

Now, let's apply this to the [`Transpose`](https://m0saan.github.io/minima/operators.html#transpose) class.

The `gradient` method in the [`Transpose`](https://m0saan.github.io/minima/operators.html#transpose) class computes the gradient of the transpose operation. The gradient with respect to the input tensor is just the transpose of the gradient with respect to the output tensor. This is implemented by applying the [`transpose`](https://m0saan.github.io/minima/operators.html#transpose) function to `out_grad`, which is the gradient of the output tensor, and then returning this transposed gradient. The axes used are the same as in the forward pass, because swapping the same pair of axes a second time restores the original layout.

Therefore, the gradient of the transposition operation with respect to the input tensor `a` is the transpose of the output gradient `out_grad`.

Concretely, the call `transpose(out_grad, axes=self.axes)` transposes `out_grad` along the specified axes.
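The behavior described above can be sketched with NumPy. This is an illustration only, not the library's code: `transpose_forward` and `transpose_backward` are hypothetical names, arrays stand in for tensors, and the input is assumed to have at least two dimensions.

```python
import numpy as np

def transpose_forward(a, axes=None):
    # Swap the given pair of axes, or the last two if none are given.
    ax1, ax2 = axes if axes is not None else (a.ndim - 2, a.ndim - 1)
    return np.swapaxes(a, ax1, ax2)

def transpose_backward(out_grad, axes=None):
    # Swapping the same pair of axes again undoes the forward swap.
    return transpose_forward(out_grad, axes)

a = np.arange(6.0).reshape(2, 3)
out = transpose_forward(a)                  # shape (3, 2)
out_grad = np.arange(6.0).reshape(3, 2)     # upstream gradient, same shape as out
grad_a = transpose_backward(out_grad)       # shape (2, 3), same as a
```

The returned gradient has the shape of the input, which is what the rest of the backward pass expects.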

In [21]:
#| echo: false
#| output: asis
show_doc(transpose)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L684){target="_blank" style="float:right; font-size:smaller"}

### transpose

>      transpose (a:minima.autograd.Tensor, axes:Optional[tuple]=None)

Perform the transpose operation on the input tensor along the specified axes.
If no axes are specified, it swaps the last two dimensions of the input tensor.

Args:
    a (Tensor): The input tensor.
    axes (Optional[tuple]): The pair of axes that should be swapped. If not provided, the last two axes are swapped.

Returns:
    Tensor: The transposed tensor.

Example:
    >>> a = Tensor(np.arange(1, 7).reshape(2, 3))
    >>> result = transpose(a)
    >>> print(result)
    Tensor([[1, 4],
            [2, 5],
            [3, 6]])

In [22]:
#| echo: false
#| output: asis
show_doc(Transpose)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L631){target="_blank" style="float:right; font-size:smaller"}

### Transpose

>      Transpose (axes:Optional[tuple]=None)

Tensor operation class that performs transposition of a tensor along specified axes.

If no axes are specified, it swaps the last two dimensions of the input tensor.

Example:
    >>> a = Tensor(np.arange(1, 7).reshape(2, 3))
    >>> op = Transpose()
    >>> result = op.compute(a.data)
    >>> print(result)
    array([[1, 4],
           [2, 5],
           [3, 6]])

## Reshape

The operation described here is a reshaping of a tensor `a`, where the operation can be described as `f(a) = reshape(a, new_shape)`.

We'll compute the derivative of this operation.

The reshaping operation doesn't change the values of the tensor elements but only rearranges them. This means that the gradient with respect to the input is just the gradient with respect to the output, reshaped back to the input's shape.

Let's denote the gradient of the loss $L$ with respect to the output as `g = dL/df` (this is `out_grad`), where `f(a) = reshape(a, new_shape)`.

Given this, we can derive the following:

1. The gradient of the loss with respect to `a` is `dL/da = reshape(g, original_shape)`.

More formally:

We are given a function $f(a) = \mathrm{reshape}(a, \mathrm{new\_shape})$, where $a$ is a tensor and `reshape(a, new_shape)` is the reshaped tensor. Our task is to find the gradient of the loss with respect to $a$.

Let's compute $\frac{\partial L}{\partial a}$:

\begin{align*}
\frac{\partial L}{\partial a} &= \mathrm{reshape}\left(\frac{\partial L}{\partial f}, \mathrm{original\_shape}\right) \\
&= \mathrm{reshape}(g, \mathrm{original\_shape})
\end{align*}

Here, $g$ is the gradient of the loss with respect to the reshaped tensor. Reshaping $g$ back gives a gradient with the same shape as the original tensor $a$.

Now, let's apply this to the [`Reshape`](https://m0saan.github.io/minima/operators.html#reshape) class.

The `gradient` method in the [`Reshape`](https://m0saan.github.io/minima/operators.html#reshape) class computes the gradient of the reshape operation. The gradient with respect to the input tensor is just the gradient with respect to the output tensor, reshaped to the input's shape. This is implemented by applying the [`reshape`](https://m0saan.github.io/minima/operators.html#reshape) function to `out_grad`, which is the gradient of the output tensor, and then returning this reshaped gradient. The shape used for the reshaping is the shape of the original tensor, which is obtained from `node.children[0].shape`.

Therefore, the gradient of the reshape operation with respect to the input tensor `a` is the reshaping of the output gradient `out_grad` to the shape of the original tensor.

Here is the corresponding Python code:

```python
def gradient(self, out_grad: Tensor, node: Tensor) -> Tuple[Tensor, ...]:
    """
    Compute the gradient of the reshape operation.

    Args:
        out_grad (Tensor): The gradient of the output tensor.
        node (Tensor): The node in the computational graph where the operation was performed.

    Returns:
        Tuple[Tensor, ...]: The gradient with respect to the input tensor.
    """
    input_shape = node.children[0].shape
    return (reshape(out_grad, input_shape),)
```

In this code, `reshape(out_grad, input_shape)` performs the reshaping of `out_grad` to the shape of the original tensor.
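The same idea can be tried out without the library. The sketch below uses NumPy arrays in place of minima tensors; `reshape_backward` is a hypothetical name chosen for illustration.

```python
import numpy as np

def reshape_backward(out_grad, input_shape):
    # The gradient values are unchanged; only the shape is restored.
    return out_grad.reshape(input_shape)

a = np.arange(6.0)                     # original shape (6,)
out = a.reshape(2, 3)                  # forward pass
out_grad = np.array([[0.1, 0.2, 0.3],
                     [0.4, 0.5, 0.6]]) # upstream gradient, same shape as out
grad_a = reshape_backward(out_grad, a.shape)
```

Each element of `grad_a` lines up with the element of `a` it belongs to, because reshaping preserves element order.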

In [23]:
#| echo: false
#| output: asis
show_doc(reshape)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L755){target="_blank" style="float:right; font-size:smaller"}

### reshape

>      reshape (a:minima.autograd.Tensor, shape:Tuple[int,...])

Reshape the input tensor to the specified shape.

Args:
    a (Tensor): The input tensor.
    shape (Tuple[int, ...]): The desired shape of the output tensor.

Returns:
    Tensor: The reshaped tensor.

Example:
    >>> a = Tensor([1, 2, 3, 4, 5, 6])
    >>> result = reshape(a, (2, 3))
    >>> print(result)
    Tensor([[1, 2, 3],
             [4, 5, 6]])

In [24]:
#| echo: false
#| output: asis
show_doc(Reshape)

---

[source](https://github.com/m0saan/minima/blob/main/minima/operators.py#L708){target="_blank" style="float:right; font-size:smaller"}

### Reshape

>      Reshape (shape:Tuple[int,...])

Tensor operation class that reshapes a tensor.

Example:
    >>> a = Tensor([1, 2, 3, 4, 5, 6])
    >>> op = Reshape((2, 3))
    >>> result = op.compute(a)
    >>> print(result)
    Tensor([[1, 2, 3],
             [4, 5, 6]])

In [22]:
import nbdev; nbdev.nbdev_export()