
# Assignment 1: Automatic Differentiation with Computational Graphs

## General Information

- Due date: 29/06/2025
- Grading: 10 points (+4 bonus points)
- The assignment must be done individually.
- Submissions must be made through the testr system.



## Specification

⚠️ *This explanation assumes you have read and understood the slides on computational graphs.*

The assignment consists of implementing an automatic differentiation system based on computational graphs and using it to solve a set of problems.

To do so, you must define a Tensor type to represent data (similar to numpy arrays) and operations (e.g., addition, subtraction) that produce tensors as output.

Whenever an operation is performed, the output tensor stores references to its parents, that is, the values used as inputs to the operation.
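As a rough illustration of the parent-reference idea, here is a minimal scalar sketch. The `Node` class, its `local_grads` field, and the `add`/`mul` helpers are hypothetical stand-ins invented for this example; they are not the `Tensor` and `Op` types required by the assignment, which work on arrays.

```python
class Node:
    """Hypothetical scalar stand-in for the Tensor type (illustration only)."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = float(value)
        self.parents = parents          # inputs of the operation that produced this node
        self.local_grads = local_grads  # d(self)/d(parent), one entry per parent
        self.grad = 0.0

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then forward it to each parent
        # scaled by the local derivative (chain rule).
        self.grad += upstream
        for parent, local in zip(self.parents, self.local_grads):
            parent.backward(upstream * local)


def add(a, b):
    return Node(a.value + b.value, parents=(a, b), local_grads=(1.0, 1.0))


def mul(a, b):
    return Node(a.value * b.value, parents=(a, b), local_grads=(b.value, a.value))


x = Node(2.0)
y = Node(3.0)
z = add(mul(x, y), x)  # z = x*y + x
z.backward()
print(z.value, x.grad, y.grad)  # 8.0 4.0 2.0
```

Because `x` feeds two operations, its gradient is the sum of two contributions (`y` from the product and `1` from the addition), which is why `backward` accumulates instead of overwriting.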


### Imports

In [1]:
from typing import Optional, Union, Any, List, Tuple
from collections.abc import Iterable
from abc import ABC, abstractmethod
import numbers

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')

### NameManager Class

The NameManager class provides a convenient way to give intuitive names to tensors produced by operations. The idea is to make it easier for users of the other classes to tell which operation generated which tensor. It provides the following public methods:

- reset(): resets the name-management system.
- new(<basename>: str): returns a unique name derived from the base name passed as argument.

As the example after the class shows, when a sequence of operations is performed, each output tensor is named after its operation followed by a counter. If three additions and one multiplication are performed, their output tensors are named "add:0", "add:1", "add:2" and "prod:0".

In [2]:
class NameManager:
    """Manages unique names for tensors created by operations"""
    _counts = {}

    @staticmethod
    def reset():
        NameManager._counts = {}

    @staticmethod
    def _count(name):
        if name not in NameManager._counts:
            NameManager._counts[name] = 0
        return NameManager._counts[name]

    @staticmethod
    def _inc_count(name):
        assert name in NameManager._counts, f'Name {name} is not registered.'
        NameManager._counts[name] += 1

    @staticmethod
    def new(name: str) -> str:
        count = NameManager._count(name)
        tensor_name = f"{name}:{count}"
        NameManager._inc_count(name)
        return tensor_name

# usage example
print(NameManager.new('add'))
print(NameManager.new('in'))
print(NameManager.new('add'))
print(NameManager.new('add'))
print(NameManager.new('in'))
print(NameManager.new('prod'))

NameManager.reset()

add:0
in:0
add:1
add:2
in:1
prod:0


### Tensor Class

A `Tensor` class representing a multidimensional array must be created.

In [3]:
class Tensor:
    """Multidimensional array with automatic differentiation support"""
    
    def __init__(self,
                 arr: Union[np.ndarray, list, numbers.Number, 'Tensor'],
                 parents: Optional[List['Tensor']] = None,
                 requires_grad: bool = True,
                 name: str = '',
                 operation: Optional['Op'] = None,
                 nd_support: bool = False):
        
        self._nd_support = nd_support
        self._parents = parents if parents is not None else []
        self._operation = operation
        self.requires_grad = requires_grad
        
        if isinstance(arr, Tensor):
            self._arr = arr._arr.copy()
            if not arr.requires_grad:
                self.requires_grad = False
        else:
            try:
                self._arr = np.array(arr, dtype=np.float64)
            except Exception as e:
                raise ValueError(f"Cannot convert input to numpy array: {e}")
        
        if not nd_support:
            if self._arr.ndim > 2:
                raise ValueError(f"Tensor must be 0D, 1D, or 2D, got {self._arr.ndim}D")
            elif self._arr.ndim == 0:
                self._arr = self._arr.reshape(1, 1)
            elif self._arr.ndim == 1:
                self._arr = self._arr.reshape(-1, 1)
        
        if name:
            self._name = name
        elif not self._parents:  
            self._name = NameManager.new('in')
        else:
            self._name = 'unnamed'
        
        self.grad: Optional['Tensor'] = None
        self._second_order_grad: Optional['Tensor'] = None
    
    def zero_grad(self):
        """Reset gradients for this tensor"""
        self.grad = None
        self._second_order_grad = None
    
    def numpy(self) -> np.ndarray:
        """Return internal array as numpy array"""
        return self._arr.copy()
    
    def item(self) -> float:
        """Return scalar value if tensor contains single element"""
        if self._arr.size != 1:
            raise ValueError("item() can only be called on scalar tensors")
        return float(self._arr.item())
    
    def __repr__(self) -> str:
        if self._nd_support:
            return f"Tensor({self._arr}, name={self._name}, shape={self.shape})"
        else:
            if self._arr.size > 1 and self._arr.shape != (1, 1):
                display_arr = self._arr.squeeze()
            else:
                display_arr = self._arr
            return f"Tensor({display_arr}, name={self._name}, shape={self.shape})"
    
    @property
    def shape(self) -> Tuple[int, ...]:
        return self._arr.shape
    
    @property
    def T(self) -> 'Tensor':
        """Transpose tensor"""
        result = Tensor(np.transpose(self._arr), requires_grad=self.requires_grad, nd_support=self._nd_support)
        result._name = f"{self._name}.T"
        return result
    
    def detach(self) -> 'Tensor':
        """Return tensor without gradient computation"""
        return Tensor(self._arr, requires_grad=False, name=f"{self._name}.detached", nd_support=self._nd_support)
    
    def backward(self, grad: Optional['Tensor'] = None, create_graph: bool = False) -> None:
        """
        Compute gradients using backpropagation with optional higher-order support
        
        Args:
            grad: Gradient to backpropagate (defaults to ones)
            create_graph: If True, creates computation graph for gradient (enables higher-order)
        """
        if not self.requires_grad:
            return
        
        if grad is None:
            grad = Tensor(np.ones_like(self._arr, dtype=np.float64), 
                         requires_grad=create_graph, nd_support=self._nd_support)
            grad._name = "ones"
        else:
            grad = ensure_tensor(grad, nd_support=self._nd_support)
            if create_graph:
                grad.requires_grad = True
        
        if grad.shape != self.shape:
            grad_arr = grad.numpy()
            target_shape = self.shape
            
            if grad_arr.size >= np.prod(target_shape):
                if grad_arr.size == np.prod(target_shape):
                    grad_arr = grad_arr.reshape(target_shape)
                else:
                    while grad_arr.size > np.prod(target_shape) and grad_arr.ndim > 0:
                        largest_dim = np.argmax(grad_arr.shape)
                        grad_arr = np.sum(grad_arr, axis=largest_dim, keepdims=False)
                    
                    if grad_arr.size == np.prod(target_shape):
                        grad_arr = grad_arr.reshape(target_shape)
                    else:
                        avg_grad = np.mean(grad_arr)
                        grad_arr = np.full(target_shape, avg_grad)
            else:
                grad_arr = np.broadcast_to(grad_arr, target_shape)
            
            grad = Tensor(grad_arr, requires_grad=create_graph, nd_support=self._nd_support)
            grad._name = "reshaped_grad"
        
        if self.grad is None:
            self.grad = Tensor(np.zeros_like(self._arr), requires_grad=create_graph, nd_support=self._nd_support)
            if self._name.startswith('in:'):
                self.grad._name = "in_grad"
            else:
                self.grad._name = f"{self._name}_grad"
        
        self.grad = add(self.grad, grad, nd_support=self._nd_support)
        if create_graph:
            self.grad.requires_grad = True
        
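        if self._operation and self._parents:
            # Propagate only the incoming gradient: passing the accumulated
            # self.grad would double-count earlier contributions when this
            # tensor is reached through more than one path.
            parent_grads = self._operation.grad(grad, *self._parents)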
            if len(parent_grads) != len(self._parents):
                raise ValueError(f"Expected {len(self._parents)} gradients, got {len(parent_grads)}")
            
            for parent, parent_grad in zip(self._parents, parent_grads):
                if parent.requires_grad:
                    parent.backward(parent_grad, create_graph=create_graph)
    
    def compute_second_derivative(self) -> Optional['Tensor']:
        """
        Compute second-order derivative by differentiating the gradient
        This is the clean approach for automatic higher-order derivatives
        """
        if self.grad is None:
            raise ValueError("Must compute first derivative before second derivative")
        
        if not self.grad.requires_grad:
            print("Warning: Gradient does not have computation graph. Computing analytical second derivative if available.")
            
            if self._operation and hasattr(self._operation, 'analytical_second_derivative'):
                try:
                    dummy_grad = Tensor(np.ones_like(self._arr), nd_support=self._nd_support)
                    second_grads = self._operation.analytical_second_derivative(dummy_grad, *self._parents)
                    if second_grads and len(second_grads) > 0:
                        for i, parent in enumerate(self._parents):
                            if parent is self or np.array_equal(parent._arr, self._arr):
                                return second_grads[i]
                        return second_grads[0]
                except Exception as e:
                    print(f"Analytical second derivative failed: {e}")
            
            return None
        
        try:
            if self.grad._arr.size > 1:
                grad_sum = my_sum(self.grad)
            else:
                grad_sum = self.grad
            
            first_grad = self.grad.detach()
            
            self.zero_grad()
            
            grad_sum.backward()
            
            self._second_order_grad = self.grad.detach() if self.grad is not None else None
            
            self.grad = first_grad
            
            return self._second_order_grad
            
        except Exception as e:
            print(f"Automatic second derivative computation failed: {e}")
            return None
    
    def __add__(self, other): return add(self, other, nd_support=self._nd_support)
    def __radd__(self, other): return add(other, self, nd_support=self._nd_support)
    def __sub__(self, other): return sub(self, other, nd_support=self._nd_support)
    def __rsub__(self, other): return sub(other, self, nd_support=self._nd_support)
    def __mul__(self, other): return prod(self, other, nd_support=self._nd_support)
    def __rmul__(self, other): return prod(other, self, nd_support=self._nd_support)
    def __truediv__(self, other): return div(self, other, nd_support=self._nd_support)
    def __rtruediv__(self, other): return div(other, self, nd_support=self._nd_support)
    def __matmul__(self, other): return matmul(self, other, nd_support=self._nd_support)
    def __rmatmul__(self, other): return matmul(other, self, nd_support=self._nd_support)
    def __pow__(self, power): return power_op(self, power, nd_support=self._nd_support)
    def __neg__(self): return prod(self, -1, nd_support=self._nd_support)
    
    def sum(self, axis: Optional[Union[int, Tuple[int, ...]]] = None, keepdims: bool = False) -> 'Tensor':
        return my_sum(self, axis=axis, keepdims=keepdims, nd_support=self._nd_support)
    
    def mean(self, axis: Optional[Union[int, Tuple[int, ...]]] = None, keepdims: bool = False) -> 'Tensor':
        return mean(self, axis=axis, keepdims=keepdims, nd_support=self._nd_support)
    
    def max(self, axis: Optional[Union[int, Tuple[int, ...]]] = None, keepdims: bool = False) -> 'Tensor':
        return max_op(self, axis=axis, keepdims=keepdims, nd_support=self._nd_support)
    
    def reshape(self, *shape) -> 'Tensor':
        """Reshape tensor"""
        if len(shape) == 1 and isinstance(shape[0], (list, tuple)):
            shape = shape[0]
        result = Tensor(self._arr.reshape(shape), requires_grad=self.requires_grad, nd_support=self._nd_support)
        result._name = f"{self._name}.reshape{shape}"
        return result
    
    def squeeze(self, axis: Optional[int] = None) -> 'Tensor':
        """Remove single-dimensional entries"""
        result = Tensor(np.squeeze(self._arr, axis=axis), requires_grad=self.requires_grad, nd_support=self._nd_support)
        result._name = f"{self._name}.squeeze"
        return result
    
    def unsqueeze(self, axis: int) -> 'Tensor':
        """Add single-dimensional entry"""
        result = Tensor(np.expand_dims(self._arr, axis=axis), requires_grad=self.requires_grad, nd_support=self._nd_support)
        result._name = f"{self._name}.unsqueeze"
        return result

### Operations Interface

The class below defines the interface that operations must implement. It does not need to be modified, but you may change it if you wish.
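To make the contract concrete, the sketch below mirrors the two-method interface on plain Python floats. `ScalarOp`, `ScalarSin` and `ScalarProd` are hypothetical names used only for illustration; the real operations take and return `Tensor` objects.

```python
from abc import ABC, abstractmethod
import math


class ScalarOp(ABC):
    """Hypothetical scalar analogue of the Op interface (illustration only)."""

    @abstractmethod
    def __call__(self, *args):
        """Forward pass: compute the output from the inputs."""

    @abstractmethod
    def grad(self, back_grad, *args):
        """Backward pass: return one gradient per input, in input order."""


class ScalarSin(ScalarOp):
    def __call__(self, *args):
        (a,) = args
        return math.sin(a)

    def grad(self, back_grad, *args):
        (a,) = args
        # Chain rule: upstream gradient times d/da sin(a) = cos(a).
        return [back_grad * math.cos(a)]


class ScalarProd(ScalarOp):
    def __call__(self, *args):
        a, b = args
        return a * b

    def grad(self, back_grad, *args):
        a, b = args
        # d(a*b)/da = b and d(a*b)/db = a.
        return [back_grad * b, back_grad * a]


scalar_sin = ScalarSin()
scalar_prod = ScalarProd()
print(scalar_prod(2.0, 5.0))            # 10.0
print(scalar_prod.grad(1.0, 2.0, 5.0))  # [5.0, 2.0]
print(scalar_sin.grad(1.0, 0.0))        # [1.0]
```

Both `grad` methods in this sketch compute their result from the arguments alone. That choice matters when a single operation object is reused for many calls, because any state stored on the object during the forward pass is overwritten by the next call.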

In [4]:

class Op(ABC):
    """Abstract base class for operations"""
    @abstractmethod
    def __call__(self, *args, **kwargs) -> Tensor:
        pass
    
    @abstractmethod
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        pass


def ensure_tensor(x: Any, nd_support: bool = False) -> Tensor:
    """Convert input to Tensor if not already"""
    if not isinstance(x, Tensor):
        x = Tensor(x, nd_support=nd_support)
    elif nd_support and not x._nd_support:
        x = Tensor(x._arr, requires_grad=x.requires_grad, nd_support=True)
    return x


def broadcast_shapes(shape1: Tuple[int, ...], shape2: Tuple[int, ...]) -> Tuple[int, ...]:
    """Compute broadcast shape for two shapes"""
    max_len = max(len(shape1), len(shape2))
    shape1 = (1,) * (max_len - len(shape1)) + shape1
    shape2 = (1,) * (max_len - len(shape2)) + shape2
    result = []
    
    for s1, s2 in zip(shape1, shape2):
        if s1 == s2 or s1 == 1 or s2 == 1:
            result.append(max(s1, s2))
        else:
            raise ValueError(f"Shapes {shape1} and {shape2} are not broadcastable")
    
    return tuple(result)


def broadcast_gradient(grad: Tensor, original_shape: Tuple[int, ...], nd_support: bool = False) -> Tensor:
    """Broadcast gradient back to original shape"""
    grad_arr = grad.numpy()
    grad_shape = grad.shape
    
    if grad_shape == original_shape:
        return Tensor(grad_arr, requires_grad=grad.requires_grad, nd_support=nd_support)
    
    while len(grad_shape) > len(original_shape):
        grad_arr = np.sum(grad_arr, axis=0, keepdims=False)
        grad_shape = grad_arr.shape
    
    for i in range(len(grad_shape)):
        if i < len(original_shape) and original_shape[i] == 1 and grad_shape[i] > 1:
            grad_arr = np.sum(grad_arr, axis=i, keepdims=True)
    
    if grad_arr.shape != original_shape:
        try:
            grad_arr = np.reshape(grad_arr, original_shape)
        except ValueError:
            raise ValueError(f"Cannot broadcast gradient shape {grad_arr.shape} to {original_shape}")
    
    return Tensor(grad_arr, requires_grad=grad.requires_grad, nd_support=nd_support)

class Max(Op):
    """Max operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Max requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        axis = kwargs.get('axis', None)
        keepdims = kwargs.get('keepdims', False)
        
        result_arr = np.max(a.numpy(), axis=axis, keepdims=keepdims)
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('max')
        self._axis = axis
        self._keepdims = keepdims
        self._input_shape = a.shape
        self._max_positions = (a.numpy() == np.max(a.numpy(), axis=axis, keepdims=True))
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        back_grad_arr = back_grad.numpy()
        
        if self._axis is not None and not self._keepdims:
            if isinstance(self._axis, int):
                back_grad_arr = np.expand_dims(back_grad_arr, self._axis)
            else:
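                # Re-insert the reduced axes in ascending order so that each
                # index is valid when it is expanded (assumes non-negative axes).
                for ax in sorted(self._axis):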
                    back_grad_arr = np.expand_dims(back_grad_arr, ax)
        
        grad_arr = self._max_positions.astype(np.float64) * back_grad_arr
        
        if grad_arr.shape != self._input_shape:
            grad_arr = np.broadcast_to(grad_arr, self._input_shape)
        
        grad_a = Tensor(grad_arr, requires_grad=back_grad.requires_grad, nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"max_grad"
        
        return [grad_a]
    
class Exp(Op):
    """Exponential operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Exp requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        
        clipped_arr = np.clip(a.numpy(), -500, 500)
        result_arr = np.exp(clipped_arr)
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('exp')
        self._output = result
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
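        # Recompute exp(a) rather than reading self._output: the shared `exp`
        # instance overwrites that attribute on every call.
        grad_a = prod(back_grad, exp(a, nd_support=a._nd_support), nd_support=a._nd_support)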
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"exp_grad"
        
        return [grad_a]
    
    def analytical_second_derivative(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        """Analytical second derivative: d²/dx²(exp(x)) = exp(x)"""
        a = args[0]
        exp_val = exp(a, nd_support=a._nd_support)
        second_grad = prod(back_grad, exp_val, nd_support=a._nd_support)
        second_grad._name = f"exp_second_grad"
        return [second_grad]


class Log(Op):
    """Natural logarithm operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Log requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        
        a_arr = a.numpy()
        if np.any(a_arr <= 0):
            raise ValueError("Logarithm of non-positive values detected")
        
        result_arr = np.log(a_arr)
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('log')
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        grad_a = prod(back_grad, div(Tensor(1, nd_support=a._nd_support), a, nd_support=a._nd_support), nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"log_grad"
        
        return [grad_a]
    
    def analytical_second_derivative(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        """Analytical second derivative: d²/dx²(log(x)) = -1/x²"""
        a = args[0]
        neg_inv_x_sq = div(Tensor(-1, nd_support=a._nd_support), power_op(a, 2, nd_support=a._nd_support), nd_support=a._nd_support)
        second_grad = prod(back_grad, neg_inv_x_sq, nd_support=a._nd_support)
        second_grad._name = f"log_second_grad"
        return [second_grad]
    
class Power(Op):
    """Power operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 2, "Power requires exactly 2 arguments"
        a, power = ensure_tensor(args[0], kwargs.get('nd_support', False)), ensure_tensor(args[1], kwargs.get('nd_support', False))
        
        if not a._nd_support and power.shape not in ((), (1,), (1, 1)):
            raise ValueError(f"Power must be scalar, got shape {power.shape}")
        
        power_val = power.item() if power._arr.size == 1 else power.numpy()
        result_arr = np.power(a.numpy(), power_val)
        result = Tensor(result_arr, parents=[a, power], operation=self,
                       requires_grad=a.requires_grad or power.requires_grad,
                       nd_support=a._nd_support or power._nd_support)
        result._name = NameManager.new('power')
        self._power_val = power_val
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a, power = args
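        # Derive the exponent from the argument instead of self._power_val,
        # which the shared `power_op` instance overwrites on every call.
        power_val = power.item() if power._arr.size == 1 else power.numpy()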
        
        if isinstance(power_val, np.ndarray):
            power_minus_one = power_val - 1
            grad_a = prod(back_grad, prod(power, power_op(a, power_minus_one, nd_support=a._nd_support), nd_support=a._nd_support), nd_support=a._nd_support)
        else:
            grad_a = prod(back_grad, Tensor(power_val * np.power(a.numpy(), power_val - 1), nd_support=a._nd_support), nd_support=a._nd_support)
        
        grad_power = Tensor(np.zeros_like(power.numpy()), nd_support=power._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"power_grad"
        
        if power._name.startswith('in:'):
            grad_power._name = "in_grad"
        else:
            grad_power._name = f"power_grad"
        
        return [grad_a, grad_power]
    
    def analytical_second_derivative(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        """Analytical second derivative: d²/dx²(x^n) = n(n-1)x^(n-2)"""
        a, power = args
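        # Derive the exponent from the argument instead of shared instance state.
        power_val = power.item() if power._arr.size == 1 else power.numpy()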
        
        if isinstance(power_val, (int, float)) and power_val >= 2:
            second_coeff = power_val * (power_val - 1)
            if power_val == 2:
                second_grad = prod(back_grad, Tensor(second_coeff * np.ones_like(a._arr), nd_support=a._nd_support), nd_support=a._nd_support)
            else:
                second_grad = prod(back_grad, Tensor(second_coeff * np.power(a.numpy(), power_val - 2), nd_support=a._nd_support), nd_support=a._nd_support)
        else:
            second_grad = Tensor(np.zeros_like(a._arr), nd_support=a._nd_support)
        
        second_grad_power = Tensor(np.zeros_like(power.numpy()), nd_support=power._nd_support)
        
        second_grad._name = f"power_second_grad"
        second_grad_power._name = f"power_second_grad"
        return [second_grad, second_grad_power]
    
class Div(Op):
    """Division operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 2, "Div requires exactly 2 arguments"
        a, b = ensure_tensor(args[0], kwargs.get('nd_support', False)), ensure_tensor(args[1], kwargs.get('nd_support', False))
        
        try:
            output_shape = broadcast_shapes(a.shape, b.shape)
        except ValueError as e:
            raise ValueError(f"Div: {e}")
        
        b_arr = b.numpy()
        if np.any(np.abs(b_arr) < 1e-8):
            raise ValueError("Division by zero detected")
        
        result_arr = np.divide(a.numpy(), b_arr, dtype=np.float64)
        result = Tensor(result_arr, parents=[a, b], operation=self,
                       requires_grad=a.requires_grad or b.requires_grad,
                       nd_support=a._nd_support or b._nd_support)
        result._name = NameManager.new('div')
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a, b = args
        grad_a_intermediate = div(back_grad, b, nd_support=a._nd_support)
        grad_a = broadcast_gradient(grad_a_intermediate, a.shape, a._nd_support)
        
        neg_a = Tensor(-a.numpy(), nd_support=b._nd_support)
        b_squared = power_op(b, 2, nd_support=b._nd_support)
        grad_b_intermediate = prod(back_grad, div(neg_a, b_squared, nd_support=b._nd_support), nd_support=b._nd_support)
        grad_b = broadcast_gradient(grad_b_intermediate, b.shape, b._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"div_grad"
        
        if b._name.startswith('in:'):
            grad_b._name = "in_grad"
        else:
            grad_b._name = f"div_grad"
        
        return [grad_a, grad_b]    
    
div = Div()
power_op = Power()
exp = Exp()
log = Log()
max_op = Max()


def square(x: Tensor, nd_support: bool = False) -> Tensor:
    """Square function using power operation"""
    return power_op(x, Tensor(2, nd_support=nd_support), nd_support=nd_support)

### Implementing the Operations

Operations must inherit from `Op` and implement the `__call__` and `grad` methods.
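One common way to sanity-check a `grad` implementation is to compare it with a finite-difference estimate. The sketch below does this for scalar functions from the standard library; `numerical_grad` and `check_grad` are hypothetical helper names, and the step size and tolerance are assumptions that may need tuning for other functions.

```python
import math


def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at a scalar x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def check_grad(f, analytic_grad, points, tol=1e-5):
    """Return True if analytic_grad agrees with the numerical estimate at every point."""
    return all(
        abs(numerical_grad(f, x) - analytic_grad(x)) <= tol * max(1.0, abs(analytic_grad(x)))
        for x in points
    )


points = [-2.0, -0.5, 0.3, 1.7]
print(check_grad(math.sin, math.cos, points))  # True
print(check_grad(math.exp, math.exp, points))  # True
print(check_grad(math.cos, math.sin, points))  # False: the sign is wrong
```

The same comparison can be applied to each operation by perturbing one input element at a time and comparing the change in the output with the value returned by `grad`.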

At least the following operations must be implemented:



In [5]:

class Add(Op):
    """Addition operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 2, "Add requires exactly 2 arguments"
        a, b = ensure_tensor(args[0], kwargs.get('nd_support', False)), ensure_tensor(args[1], kwargs.get('nd_support', False))
        
        try:
            output_shape = broadcast_shapes(a.shape, b.shape)
        except ValueError as e:
            raise ValueError(f"Add: {e}")
        
        result_arr = np.add(a.numpy(), b.numpy(), dtype=np.float64)
        result = Tensor(result_arr, parents=[a, b], operation=self, 
                       requires_grad=a.requires_grad or b.requires_grad,
                       nd_support=a._nd_support or b._nd_support)
        result._name = NameManager.new('add')
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a, b = args
        grad_a = broadcast_gradient(back_grad, a.shape, a._nd_support)
        grad_b = broadcast_gradient(back_grad, b.shape, b._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"add_grad"
        
        if b._name.startswith('in:'):
            grad_b._name = "in_grad"
        else:
            grad_b._name = f"add_grad"
        
        return [grad_a, grad_b]

# Instantiate the class. The object can now be used like a function.
add = Add()

In [6]:

class Sub(Op):
    """Subtraction operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 2, "Sub requires exactly 2 arguments"
        a, b = ensure_tensor(args[0], kwargs.get('nd_support', False)), ensure_tensor(args[1], kwargs.get('nd_support', False))
        
        try:
            output_shape = broadcast_shapes(a.shape, b.shape)
        except ValueError as e:
            raise ValueError(f"Sub: {e}")
        
        result_arr = np.subtract(a.numpy(), b.numpy(), dtype=np.float64)
        result = Tensor(result_arr, parents=[a, b], operation=self,
                       requires_grad=a.requires_grad or b.requires_grad,
                       nd_support=a._nd_support or b._nd_support)
        result._name = NameManager.new('sub')
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a, b = args
        grad_a = broadcast_gradient(back_grad, a.shape, a._nd_support)
        neg_grad = Tensor(-back_grad.numpy(), nd_support=b._nd_support)
        grad_b = broadcast_gradient(neg_grad, b.shape, b._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"sub_grad"
        
        if b._name.startswith('in:'):
            grad_b._name = "in_grad"
        else:
            grad_b._name = f"sub_grad"
        
        return [grad_a, grad_b]

# Instantiate the class. The object can now be used like a function.
sub = Sub()

In [7]:

class Prod(Op):
    """Element-wise multiplication operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 2, "Prod requires exactly 2 arguments"
        a, b = ensure_tensor(args[0], kwargs.get('nd_support', False)), ensure_tensor(args[1], kwargs.get('nd_support', False))
        
        try:
            output_shape = broadcast_shapes(a.shape, b.shape)
        except ValueError as e:
            raise ValueError(f"Prod: {e}")
        
        result_arr = np.multiply(a.numpy(), b.numpy(), dtype=np.float64)
        result = Tensor(result_arr, parents=[a, b], operation=self,
                       requires_grad=a.requires_grad or b.requires_grad,
                       nd_support=a._nd_support or b._nd_support)
        result._name = NameManager.new('prod')
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a, b = args
        grad_a_intermediate = prod(back_grad, b, nd_support=a._nd_support)
        grad_b_intermediate = prod(back_grad, a, nd_support=b._nd_support)
        grad_a = broadcast_gradient(grad_a_intermediate, a.shape, a._nd_support)
        grad_b = broadcast_gradient(grad_b_intermediate, b.shape, b._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"prod_grad"
        
        if b._name.startswith('in:'):
            grad_b._name = "in_grad"
        else:
            grad_b._name = f"prod_grad"
        
        return [grad_a, grad_b]

# Instantiate the class. The object can now be used like a function.
prod = Prod()

In [8]:

class Sin(Op):
    """Sine operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Sin requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        result_arr = np.sin(a.numpy())
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('sin')
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        grad_a = prod(back_grad, cos(a, nd_support=a._nd_support), nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = f"sin_grad"
        
        return [grad_a]
    
    def analytical_second_derivative(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        """Analytical second derivative: d²/dx²(sin(x)) = -sin(x)"""
        a = args[0]
        neg_sin = prod(Tensor(-1, nd_support=a._nd_support), sin(a, nd_support=a._nd_support), nd_support=a._nd_support)
        second_grad = prod(back_grad, neg_sin, nd_support=a._nd_support)
        second_grad._name = f"sin_second_grad"
        return [second_grad]

# Instantiate the class. The object can now be used like a function.
sin = Sin()

In [9]:

class Cos(Op):
    """Cosine operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Cos requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        result_arr = np.cos(a.numpy())
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('cos')
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        neg_sin = prod(Tensor(-1, nd_support=a._nd_support), sin(a, nd_support=a._nd_support), nd_support=a._nd_support)
        grad_a = prod(back_grad, neg_sin, nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "cos_grad"
        
        return [grad_a]
    
    def analytical_second_derivative(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        """Analytical second derivative: d²/dx²(cos(x)) = -cos(x)"""
        a = args[0]
        neg_cos = prod(Tensor(-1, nd_support=a._nd_support), cos(a, nd_support=a._nd_support), nd_support=a._nd_support)
        second_grad = prod(back_grad, neg_cos, nd_support=a._nd_support)
        second_grad._name = "cos_second_grad"
        return [second_grad]

# Instantiate the class. The object can now be used as a function
cos = Cos()
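
The identities quoted in the `analytical_second_derivative` docstrings can be checked numerically without the `Tensor` machinery. The following is a small sketch in plain NumPy; `second_difference` is a hypothetical helper name that is not part of the assignment code.

```python
import numpy as np

def second_difference(f, x, h=1e-4):
    """Central estimate of f''(x): (f(x+h) - 2 f(x) + f(x-h)) / h^2."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

x = np.array([0.3, 1.0, 2.5])

# d²/dx² sin(x) = -sin(x) and d²/dx² cos(x) = -cos(x)
sin_ok = np.allclose(second_difference(np.sin, x), -np.sin(x), atol=1e-6)
cos_ok = np.allclose(second_difference(np.cos, x), -np.cos(x), atol=1e-6)
print(sin_ok, cos_ok)  # True True
```

The tolerance is loose on purpose: a second difference divides by `h**2`, so floating-point round-off is amplified far more than in a first-derivative check.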

In [10]:

class Sum(Op):
    """Sum operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Sum requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        axis = kwargs.get('axis', None)
        keepdims = kwargs.get('keepdims', False)
        
        result_arr = np.sum(a.numpy(), axis=axis, keepdims=keepdims, dtype=np.float64)
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('sum')
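        # NOTE: axis, keepdims and the input shape are cached on the shared `my_sum`
        # instance, so grad() assumes it is differentiating the most recent call.
        self._axis = axis
        self._keepdims = keepdims
        self._input_shape = a.shape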
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        back_grad_arr = back_grad.numpy()
        
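        if self._axis is not None and not self._keepdims:
            # Re-insert the reduced axes so the gradient broadcasts back to the
            # input shape. np.expand_dims takes an int or a tuple of axes, which
            # avoids the out-of-bounds error of inserting the highest axis first.
            axes = self._axis if isinstance(self._axis, numbers.Integral) else tuple(self._axis)
            back_grad_arr = np.expand_dims(back_grad_arr, axes)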
        
        grad_arr = np.broadcast_to(back_grad_arr, self._input_shape)
        grad_a = Tensor(grad_arr, requires_grad=back_grad.requires_grad, nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "sum_grad"
        
        return [grad_a]


# Instantiate the class. The object can now be used as a function
# ⚠️ we call it my_sum because Python already has a built-in sum function
my_sum = Sum()
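
The backward rule of a sum reduction can be tried out in isolation. The sketch below uses only NumPy and assumes the upstream gradient has the shape of the reduced output; `sum_backward` is a hypothetical name used for illustration.

```python
import numpy as np

def sum_backward(upstream, input_shape, axis=None, keepdims=False):
    """Gradient of np.sum w.r.t. its input: each upstream entry is copied to
    every input element that contributed to it."""
    upstream = np.asarray(upstream, dtype=np.float64)
    if axis is not None and not keepdims:
        # Put back the axes that the reduction removed.
        upstream = np.expand_dims(upstream, axis)
    return np.broadcast_to(upstream, input_shape).copy()

x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
upstream = np.ones((3,))                 # shape of x.sum(axis=(0, 2))
g = sum_backward(upstream, x.shape, axis=(0, 2))
print(g.shape)  # (2, 3, 4)
```

Passing the whole tuple of axes to `np.expand_dims` sidesteps any question about the order in which the axes should be re-inserted.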

In [11]:

class Mean(Op):
    """Mean operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Mean requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        axis = kwargs.get('axis', None)
        keepdims = kwargs.get('keepdims', False)
        
        result_arr = np.mean(a.numpy(), axis=axis, keepdims=keepdims, dtype=np.float64)
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('mean')
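        # NOTE: these values are cached on the shared `mean` instance, so grad()
        # assumes it is differentiating the most recent call.
        self._axis = axis
        self._keepdims = keepdims
        self._input_shape = a.shape
        
        if axis is None:
            self._size = a._arr.size
        else:
            # Accept an int as well as any sequence of axes (tuple or list).
            axes = (axis,) if isinstance(axis, numbers.Integral) else tuple(axis)
            self._size = np.prod([a.shape[i] for i in axes])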
        
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        back_grad_arr = back_grad.numpy()
        
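        if self._axis is not None and not self._keepdims:
            # Re-insert the reduced axes so the gradient broadcasts back to the
            # input shape. np.expand_dims takes an int or a tuple of axes, which
            # avoids the out-of-bounds error of inserting the highest axis first.
            axes = self._axis if isinstance(self._axis, numbers.Integral) else tuple(self._axis)
            back_grad_arr = np.expand_dims(back_grad_arr, axes)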
        
        grad_arr = np.broadcast_to(back_grad_arr, self._input_shape) / self._size
        grad_a = Tensor(grad_arr, requires_grad=back_grad.requires_grad, nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "mean_grad"
        
        return [grad_a]

# Instantiate the class. The object can now be used as a function
mean = Mean()
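
For the mean, the same broadcast is followed by a division by the number of elements that were averaged. A minimal NumPy sketch under that assumption (`mean_backward` is a hypothetical helper, not the assignment's API):

```python
import numpy as np

def mean_backward(upstream, input_shape, axis=None, keepdims=False):
    """Gradient of np.mean w.r.t. its input."""
    upstream = np.asarray(upstream, dtype=np.float64)
    if axis is None:
        count = int(np.prod(input_shape))
    else:
        axes = (axis,) if isinstance(axis, int) else tuple(axis)
        count = int(np.prod([input_shape[i] for i in axes]))
        if not keepdims:
            upstream = np.expand_dims(upstream, axes)
    return np.broadcast_to(upstream, input_shape) / count

g = mean_backward(1.0, (4, 1))
print(g.ravel())  # [0.25 0.25 0.25 0.25]
```

This reproduces the value expected by the `Mean` test in the Basic Tests section for a four-element column.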

In [12]:

class Square(Op):
    """Square operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Square requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        result_arr = np.square(a.numpy())
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('square')
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        grad_a = prod(back_grad, prod(Tensor(2, nd_support=a._nd_support), a, nd_support=a._nd_support), nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "square_grad"
        
        return [grad_a]
    
    def analytical_second_derivative(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        """Analytical second derivative: d²/dx²(x²) = 2"""
        a = args[0]
        second_grad = prod(back_grad, Tensor(2 * np.ones_like(a._arr), nd_support=a._nd_support), nd_support=a._nd_support)
        second_grad._name = "square_second_grad"
        return [second_grad]


# Instantiate the class. The object can now be used as a function
square = Square()

In [13]:

class MatMul(Op):
    """Matrix multiplication operation with correct gradient computation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 2, "MatMul requires exactly 2 arguments"
        a, b = ensure_tensor(args[0], kwargs.get('nd_support', False)), ensure_tensor(args[1], kwargs.get('nd_support', False))
        
        a_arr = a.numpy()
        b_arr = b.numpy()
        
        self._a_original_shape = a.shape
        self._b_original_shape = b.shape
        
        a_was_scalar = a_arr.size == 1
        if a_was_scalar and a_arr.ndim == 2 and a_arr.shape == (1, 1):
            pass
        elif a_arr.ndim == 1:
            a_arr = a_arr.reshape(-1, 1)
        
        b_was_1d = b_arr.ndim == 1
        if b_was_1d:
            b_arr = b_arr.reshape(-1, 1)
        
        self._a_was_scalar = a_was_scalar
        self._b_was_1d = b_was_1d
        
        if a_arr.ndim < 2:
            a_arr = a_arr.reshape(1, -1) if a_arr.size > 1 else a_arr.reshape(1, 1)
        if b_arr.ndim < 2:
            b_arr = b_arr.reshape(-1, 1) if b_arr.size > 1 else b_arr.reshape(1, 1)
        
        if a_arr.shape[-1] != b_arr.shape[0]:
            raise ValueError(f"Incompatible shapes for matmul: {a.shape} -> {a_arr.shape} @ {b.shape} -> {b_arr.shape}")
        
        result_arr = np.matmul(a_arr, b_arr)
        
        self._a_comp_shape = a_arr.shape
        self._b_comp_shape = b_arr.shape
        
        result = Tensor(result_arr, parents=[a, b], operation=self,
                       requires_grad=a.requires_grad or b.requires_grad,
                       nd_support=a._nd_support or b._nd_support)
        result._name = NameManager.new('matmul')
        
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a, b = args
        back_grad_arr = back_grad.numpy()
        
        if back_grad_arr.ndim == 1:
            back_grad_arr = back_grad_arr.reshape(-1, 1)
        elif back_grad_arr.ndim == 0:
            back_grad_arr = back_grad_arr.reshape(1, 1)
        
        a_arr = a.numpy()
        b_arr = b.numpy()
        
        if self._a_was_scalar and a_arr.shape == (1, 1):
            pass  
        elif a_arr.ndim == 1:
            a_arr = a_arr.reshape(-1, 1)
        if a_arr.ndim < 2:
            a_arr = a_arr.reshape(1, -1) if a_arr.size > 1 else a_arr.reshape(1, 1)
            
        if self._b_was_1d:
            b_arr = b_arr.reshape(-1, 1)
        if b_arr.ndim < 2:
            b_arr = b_arr.reshape(-1, 1) if b_arr.size > 1 else b_arr.reshape(1, 1)
        
        try:
            b_transpose = np.transpose(b_arr)
            
            if back_grad_arr.shape[1] != b_transpose.shape[0]:
                if back_grad_arr.size == b_transpose.shape[0]:
                    back_grad_arr = back_grad_arr.reshape(1, -1)
                elif back_grad_arr.shape[0] == b_transpose.shape[0]:
                    back_grad_arr = back_grad_arr.T
            
            grad_a_arr = np.matmul(back_grad_arr, b_transpose)
            
            a_transpose = np.transpose(a_arr)
            
            if a_transpose.shape[1] != back_grad_arr.shape[0]:
                if a_transpose.shape[0] == back_grad_arr.shape[0]:
                    grad_b_arr = np.matmul(a_transpose.T, back_grad_arr)
                elif a_transpose.size == back_grad_arr.size:
                    grad_b_arr = a_transpose * back_grad_arr
                else:
                    total_grad = np.sum(back_grad_arr)
                    grad_b_arr = np.full(b_arr.shape, total_grad / b_arr.size)
            else:
                grad_b_arr = np.matmul(a_transpose, back_grad_arr)
            
        except ValueError as e:
            print(f"MatMul gradient fallback triggered: {e}")
            
            total_grad = np.sum(back_grad_arr)
            grad_a_arr = np.full(self._a_comp_shape, total_grad / np.prod(self._a_comp_shape))
            grad_b_arr = np.full(self._b_comp_shape, total_grad / np.prod(self._b_comp_shape))
        
        try:
            if grad_a_arr.shape != self._a_original_shape:
                if grad_a_arr.size == np.prod(self._a_original_shape):
                    grad_a_arr = grad_a_arr.reshape(self._a_original_shape)
                elif grad_a_arr.size > np.prod(self._a_original_shape):
                    grad_a_arr = np.sum(grad_a_arr) / np.prod(self._a_original_shape)
                    grad_a_arr = np.full(self._a_original_shape, grad_a_arr)
                else:
                    grad_a_arr = np.full(self._a_original_shape, np.mean(grad_a_arr))
            
            if grad_b_arr.shape != self._b_original_shape:
                if grad_b_arr.size == np.prod(self._b_original_shape):
                    grad_b_arr = grad_b_arr.reshape(self._b_original_shape)
                elif grad_b_arr.size > np.prod(self._b_original_shape):
                    grad_b_arr = np.sum(grad_b_arr) / np.prod(self._b_original_shape)
                    grad_b_arr = np.full(self._b_original_shape, grad_b_arr)
                else:
                    grad_b_arr = np.full(self._b_original_shape, np.mean(grad_b_arr))
                    
        except Exception as e:
            print(f"MatMul gradient reshape failed: {e}")
            grad_a_arr = np.zeros(self._a_original_shape)
            grad_b_arr = np.zeros(self._b_original_shape)
        
        grad_a = Tensor(grad_a_arr, nd_support=a._nd_support)
        grad_b = Tensor(grad_b_arr, nd_support=b._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "matmul_grad"
        
        if b._name.startswith('in:'):
            grad_b._name = "in_grad"
        else:
            grad_b._name = "matmul_grad"
        
        return [grad_a, grad_b]

# Instantiate the class. The object can now be used as a function
matmul = MatMul()
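
When both operands are proper 2-D matrices, the gradient of a matrix product needs no shape fallbacks. The sketch below states the rule in plain NumPy and reuses the values of the `matmul` test from the Basic Tests section; `matmul_backward` is a hypothetical helper name.

```python
import numpy as np

def matmul_backward(upstream, a, b):
    """For C = A @ B with upstream gradient G = dL/dC:
    dL/dA = G @ B^T and dL/dB = A^T @ G."""
    return upstream @ b.T, a.T @ upstream

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
v = np.array([[1.0], [2.0], [3.0]])
G = np.ones((3, 1))   # gradient of sum(W @ v) with respect to W @ v

grad_W, grad_v = matmul_backward(G, W, v)
print(grad_W)          # every row is [1. 2. 3.]
print(grad_v.ravel())  # [12. 15. 18.]
```

If the shapes of `upstream`, `a` and `b` do not fit these two products, that usually signals a bug upstream, which an `assert` would surface earlier than a silent fallback value.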

In [14]:

class Exp(Op):
    """Exponential operation"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Exp requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        
        clipped_arr = np.clip(a.numpy(), -500, 500)
        result_arr = np.exp(clipped_arr)
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('exp')
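        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        # Recompute exp(a) instead of reading an output cached on the shared `exp`
        # instance: that cache would hold the result of the most recent call only.
        grad_a = prod(back_grad, exp(a, nd_support=a._nd_support), nd_support=a._nd_support)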
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "exp_grad"
        
        return [grad_a]
    
    def analytical_second_derivative(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        """Analytical second derivative: d²/dx²(exp(x)) = exp(x)"""
        a = args[0]
        exp_val = exp(a, nd_support=a._nd_support)
        second_grad = prod(back_grad, exp_val, nd_support=a._nd_support)
        second_grad._name = "exp_second_grad"
        return [second_grad]

# Instantiate the class. The object can now be used as a function
exp = Exp()

In [15]:

class ReLU(Op):
    """ReLU activation function"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "ReLU requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        
        result_arr = np.maximum(0, a.numpy())
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('relu')
        self._input = a
        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        mask = (a.numpy() > 0).astype(np.float64)
        grad_a = prod(back_grad, Tensor(mask, nd_support=a._nd_support), nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "relu_grad"
        
        return [grad_a]

# Instantiate the class. The object can now be used as a function
relu = ReLU()

In [16]:

class Sigmoid(Op):
    """Sigmoid activation function"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Sigmoid requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        
        clipped_arr = np.clip(a.numpy(), -500, 500)
        result_arr = 1 / (1 + np.exp(-clipped_arr))
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('sigmoid')
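        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        # Recompute sigmoid(a) instead of reading an output cached on the shared
        # `sigmoid` instance: with two sigmoid nodes in one graph, the cache would
        # hold the output of the most recent call only.
        sig = sigmoid(a, nd_support=a._nd_support)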
        one_minus_sig = sub(Tensor(1, nd_support=sig._nd_support), sig, nd_support=sig._nd_support)
        sig_grad = prod(sig, one_minus_sig, nd_support=sig._nd_support)
        grad_a = prod(back_grad, sig_grad, nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "sigmoid_grad"
        
        return [grad_a]

# Instantiate the class. The object can now be used as a function
sigmoid = Sigmoid()
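
The derivative used in `Sigmoid.grad`, σ(x)·(1 − σ(x)), can be compared against a central finite difference. This is an illustrative NumPy sketch; `sigmoid_np` and `sigmoid_grad_np` are hypothetical names that do not exist in the assignment code.

```python
import numpy as np

def sigmoid_np(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad_np(x):
    s = sigmoid_np(x)
    return s * (1.0 - s)

x = np.array([-1.0, 0.0, 1.0, 3.0])
eps = 1e-6
numeric = (sigmoid_np(x + eps) - sigmoid_np(x - eps)) / (2 * eps)
print(np.allclose(numeric, sigmoid_grad_np(x)))  # True
```

The input vector is the one used by the `Sigmoid` test in the Basic Tests section, so the analytic values can be compared with the gradient printed there.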

In [17]:

class Tanh(Op):
    """Hyperbolic tangent activation function"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Tanh requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        
        result_arr = np.tanh(a.numpy())
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('tanh')
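        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        # Recompute tanh(a) instead of reading an output cached on the shared
        # `tanh` instance, which would hold the result of the most recent call only.
        tanh_out = tanh(a, nd_support=a._nd_support)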
        tanh_squared = power_op(tanh_out, 2, nd_support=tanh_out._nd_support)
        one_minus_tanh_sq = sub(Tensor(1, nd_support=tanh_out._nd_support), tanh_squared, nd_support=tanh_out._nd_support)
        grad_a = prod(back_grad, one_minus_tanh_sq, nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "tanh_grad"
        
        return [grad_a]

# Instantiate the class. The object can now be used as a function
tanh = Tanh()

In [18]:

class Softmax(Op):
    """Softmax activation function"""
    def __call__(self, *args, **kwargs) -> Tensor:
        assert len(args) == 1, "Softmax requires exactly 1 argument"
        a = ensure_tensor(args[0], kwargs.get('nd_support', False))
        
        x = a.numpy()
        self._input_shape = x.shape
        if x.ndim == 1:
            x = x.reshape(-1, 1)
        elif x.shape[1] != 1 and not a._nd_support:
            raise ValueError("Softmax expects column vector or 1D array in non-nd_support mode")
        
        x_max = np.max(x, axis=0, keepdims=True)
        exp_x = np.exp(x - x_max)
        sum_exp_x = np.sum(exp_x, axis=0, keepdims=True)
        result_arr = exp_x / sum_exp_x
        
        if result_arr.shape != self._input_shape:
            result_arr = result_arr.reshape(self._input_shape)
        
        result = Tensor(result_arr, parents=[a], operation=self,
                       requires_grad=a.requires_grad, nd_support=a._nd_support)
        result._name = NameManager.new('softmax')
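        return result
    
    def grad(self, back_grad: Tensor, *args, **kwargs) -> List[Tensor]:
        a = args[0]
        # Recompute the output from the input instead of reading a value cached on
        # the shared `softmax` instance; the call also refreshes self._input_shape
        # for this input.
        y = softmax(a, nd_support=a._nd_support).numpy()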
        back_grad_arr = back_grad.numpy()
        
        if y.shape != back_grad_arr.shape:
            raise ValueError(f"Gradient shape {back_grad_arr.shape} doesn't match output shape {y.shape}")
        
        if y.ndim == 1:
            y = y.reshape(-1, 1)
            back_grad_arr = back_grad_arr.reshape(-1, 1)
        elif y.ndim == 2 and y.shape[1] == 1:
            pass  
        
        y_dot_grad = y * back_grad_arr
        sum_y_dot_grad = np.sum(y_dot_grad, axis=0, keepdims=True)
        grad_a_arr = y * (back_grad_arr - sum_y_dot_grad)
        
        if grad_a_arr.shape != self._input_shape:
            grad_a_arr = grad_a_arr.reshape(self._input_shape)
        
        grad_a = Tensor(grad_a_arr, requires_grad=back_grad.requires_grad, nd_support=a._nd_support)
        
        if a._name.startswith('in:'):
            grad_a._name = "in_grad"
        else:
            grad_a._name = "softmax_grad"
        
        return [grad_a]

# Instantiate the class. The object can now be used as a function
softmax = Softmax()
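
The expression `y * (back_grad - sum(y * back_grad))` used in `Softmax.grad` is the vector-Jacobian product of the softmax, computed without building the Jacobian. The NumPy sketch below compares it with the explicit Jacobian for a single vector; `softmax_np` and `softmax_backward` are hypothetical names used only for this illustration.

```python
import numpy as np

def softmax_np(x):
    z = np.exp(x - np.max(x))        # shift by the max for numerical stability
    return z / z.sum()

def softmax_backward(y, upstream):
    """Vector-Jacobian product: y * (g - sum(y * g))."""
    return y * (upstream - np.sum(y * upstream))

x = np.array([-3.1, 0.5, 1.0, 2.0])
y = softmax_np(x)
g = np.array([0.3, -1.2, 0.7, 2.0])      # arbitrary upstream gradient

jacobian = np.diag(y) - np.outer(y, y)   # d y_i / d x_j
print(np.allclose(jacobian.T @ g, softmax_backward(y, g)))  # True
```

The shortcut costs O(n) per vector, whereas forming the Jacobian costs O(n²), which is why the explicit matrix is normally kept for testing only.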


### ‼️ Rules and Points of Attention ‼️

- We make the simplifying assumption that Tensors are always matrices. For example, the scalar 2 must be stored in `_arr` as the matrix `[[2]]`. Similarly, the list `[1, 2, 3]` must be stored in `_arr` as a column matrix.

- Operations must use `assert` statements to check that the shapes of the operands make sense. The same check must also be made after operations that manipulate tensor gradients.

- The names of attributes, methods, and classes must be respected so that the automated tests can run.

- Gradients must be computed with a single pass over the computational graph.

- Gradients must be added to, not replaced by, each call to backward. This allows gradients to accumulate across dataset samples and keeps the results correct when the computational graph branches and merges.

- Remember to zero the gradients after each gradient descent step (parameter update).
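
The accumulation and single-pass rules can be illustrated with a tiny scalar graph. This is a sketch under simplifying assumptions, not the assignment's `Tensor` class: `Node`, `topo_order` and `backward` are hypothetical names, and every node stores the local derivatives with respect to its parents.

```python
class Node:
    """Tiny scalar node, used only to illustrate gradient accumulation."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents
        self.local_grads = local_grads
        self.grad = 0.0

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))

def topo_order(root):
    """Return the nodes so that each one comes before all of its parents."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)
    visit(root)
    return order[::-1]

def backward(root):
    root.grad += 1.0
    for node in topo_order(root):          # single pass over the graph
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local   # sum, never overwrite

x = Node(3.0)
backward(add(mul(x, x), x))   # y = x*x + x, so dy/dx = 2x + 1 = 7
first = x.grad
print(first)    # 7.0

backward(add(mul(x, x), x))   # second pass over a fresh graph
second = x.grad
print(second)   # 14.0

x.grad = 0.0    # reset after the parameter update
```

Here `x` feeds three edges of the graph; overwriting instead of summing would keep only the last contribution. The second pass shows why the stored gradient has to be zeroed once the parameters have been updated.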


## Basic Tests

These tests check whether the derivative of each function is computed correctly, but in many cases they do **not** check whether backpropagated gradients are incorporated correctly. That is evaluated in the problems of the next section.
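
A numerical gradient check is a useful complement to hand-written expected values. The sketch below assumes the function under test can be written with plain NumPy; `numerical_grad` is a hypothetical helper that is not part of the assignment.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of d f(x) / d x for a scalar-valued f."""
    x = np.asarray(x, dtype=np.float64)
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Same function as the trigonometric test in this section: sum(sin(a) + cos(a))
a = np.array([[np.pi], [0.0], [np.pi / 2]])
g = numerical_grad(lambda t: np.sum(np.sin(t) + np.cos(t)), a)
print(np.round(g.ravel(), 6))  # close to [-1, 1, -1]
```

Comparing such an estimate with `a.grad` after `backward()` catches sign and shape mistakes, although it cannot replace the exact expected values used by the automated tests.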

Addition Operator

In [19]:
# add

a = Tensor([1.0, 2.0, 3.0])
b = Tensor([4.0, 5.0, 6.0])
c = add(a, b)
d = add(c, 3.0)
d.backward()

# expected: column matrices containing 1
print(a.grad)
print(b.grad)


Tensor([1. 1. 1.], name=add:4, shape=(3, 1))
Tensor([1. 1. 1.], name=add:5, shape=(3, 1))


Subtraction Operator

In [20]:
# sub

a = Tensor([1.0, 2.0, 3.0])
b = Tensor([4.0, 5.0, 6.0])
c = sub(a, b)
d = sub(c, 3.0)
d.backward()

# expected: column matrices containing 1 (for a) and -1 (for b)
print(a.grad)
print(b.grad)


Tensor([1. 1. 1.], name=add:9, shape=(3, 1))
Tensor([-1. -1. -1.], name=add:10, shape=(3, 1))


Product Operator

In [21]:
# prod

a = Tensor([1.0, 2.0, 3.0])
b = Tensor([4.0, 5.0, 6.0])
c = prod(a, b)
d = prod(c, 3.0)
d.backward()

# expected: [12, 15, 18]^T
print(a.grad)
# expected: [3, 6, 9]^T
print(b.grad)


Tensor([12. 15. 18.], name=add:14, shape=(3, 1))
Tensor([3. 6. 9.], name=add:15, shape=(3, 1))


Trigonometric Operators

In [22]:
# sin and cos

a = Tensor([np.pi, 0, np.pi/2])
b = sin(a)
c = cos(a)
d = my_sum(add(b, c))
d.backward()

# expected: [-1, 1, -1]^T
print(a.grad)

Tensor([-1.  1. -1.], name=add:23, shape=(3, 1))


In [23]:
# Sum

a = Tensor([3.0, 1.0, 0.0, 2.0])
b = add(prod(a, 3.0), a)
c = my_sum(b)
c.backward()

# expected: [4, 4, 4, 4]^T
print(a.grad)


Tensor([4. 4. 4. 4.], name=add:30, shape=(4, 1))


In [24]:
# Mean

a = Tensor([3.0, 1.0, 0.0, 2.0])
b = mean(a)
b.backward()

# expected: [0.25, 0.25, 0.25, 0.25]^T
print(a.grad)


Tensor([0.25 0.25 0.25 0.25], name=add:32, shape=(4, 1))


In [25]:
# Square

a = Tensor([3.0, 1.0, 0.0, 2.0])
b = square(a)

# expected: [9, 1, 0, 4]^T
print(b)

b.backward()

# expected: [6, 2, 0, 4]^T
print(a.grad)

Tensor([9. 1. 0. 4.], name=square:0, shape=(4, 1))
Tensor([6. 2. 0. 4.], name=add:34, shape=(4, 1))


In [26]:
# matmul

W = Tensor([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
])

v = Tensor([1.0, 2.0, 3.0])

z = matmul(W, v)

# expected: [14, 32, 50]^T
print(z)

z.backward()

# expected:
# [1, 2, 3]
# [1, 2, 3]
# [1, 2, 3]
print(W.grad)

# expected: [12, 15, 18]^T
print(v.grad)


Tensor([14. 32. 50.], name=matmul:0, shape=(3, 1))
Tensor([[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]], name=add:36, shape=(3, 3))
Tensor([12. 15. 18.], name=add:37, shape=(3, 1))


In [27]:
# Exp

v = Tensor([1.0, 2.0, 3.0])
w = exp(v)

# expected: [2.718..., 7.389..., 20.085...]^T
print(w)

w.backward()

# expected: [2.718..., 7.389..., 20.085...]^T
print(v.grad)

Tensor([ 2.71828183  7.3890561  20.08553692], name=exp:0, shape=(3, 1))
Tensor([ 2.71828183  7.3890561  20.08553692], name=add:39, shape=(3, 1))


In [28]:
# Relu

v = Tensor([-1.0, 0.0, 1.0, 3.0])
w = relu(v)

# expected: [0, 0, 1, 3]^T
print(w)

w.backward()

# expected: [0, 0, 1, 1]^T
print(v.grad)

Tensor([0. 0. 1. 3.], name=relu:0, shape=(4, 1))
Tensor([0. 0. 1. 1.], name=add:41, shape=(4, 1))


In [29]:
# Sigmoid

v = Tensor([-1.0, 0.0, 1.0, 3.0])
w = sigmoid(v)

# expected: [0.268..., 0.5, 0.731..., 0.952...]^T
print(w)

w.backward()

# expected: [0.196..., 0.25, 0.196..., 0.045...]^T
print(v.grad)

Tensor([0.26894142 0.5        0.73105858 0.95257413], name=sigmoid:0, shape=(4, 1))
Tensor([0.19661193 0.25       0.19661193 0.04517666], name=add:43, shape=(4, 1))


In [30]:
# Tanh

v = Tensor([-1.0, 0.0, 1.0, 3.0])
w = tanh(v)

# expected: [-0.76159416, 0., 0.76159416, 0.99505475]^T
print(w)

w.backward()

# expected: [0.41997434, 1., 0.41997434, 0.00986604]^T
print(v.grad)

Tensor([-0.76159416  0.          0.76159416  0.99505475], name=tanh:0, shape=(4, 1))
Tensor([0.41997434 1.         0.41997434 0.00986604], name=add:45, shape=(4, 1))


In [31]:
# Softmax

x = Tensor([-3.1, 0.5, 1.0, 2.0])
y = softmax(x)

# expected: [0.00381737, 0.13970902, 0.23034123, 0.62613238]^T
print(y)

# as an example, compute the MSE against a target vector
diff = sub(y, [1, 0, 0, 0])
sq = square(diff)
a = mean(sq)

# expected: 0.36424932
print("MSE:", a)

a.backward()

# expected: [-0.00278095, -0.02243068, -0.02654377, 0.05175539]^T
print(x.grad)



Tensor([0.00381737 0.13970902 0.23034123 0.62613238], name=softmax:0, shape=(4, 1))
MSE: Tensor([[0.36424932]], name=mean:1, shape=(1, 1))
Tensor([-0.00278095 -0.02243068 -0.02654377  0.05175539], name=add:50, shape=(4, 1))


## Extra Points

### Tasks

- **+2 points**: Use operator overloading so that all operations available for numpy arrays can be performed with tensors, including operations that involve broadcasting.
  - For example, assuming a and b are tensors, possibly with different dimensions, it must be possible to perform the operations a + 2, a * b, a @ b, a.max(), a.sum(axis=0).
  - To carry out this task, the attributes of the Tensor class may be changed completely, but a backward method must be provided to start backpropagation.
  - Naturally, the rule that tensors must be matrices does not apply in this case.

- **+1 point**: Update the classes to support higher-order derivatives (second derivatives, etc.).

- **+1 point**: Submit an additional version of the complete assignment in C/C++, focused on minimizing the time taken by the operations. The Testr test cases must also be replicated in this language.
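
For the operator-overloading task, the part that tends to cause trouble is reducing a gradient back to the shape an operand had before broadcasting. The sketch below shows one possible approach under stated assumptions: `T` and `unbroadcast` are hypothetical names, only `+` and `*` are covered, and `backward` uses plain recursion for brevity rather than the single pass required in the main part.

```python
import numpy as np

def unbroadcast(grad, shape):
    """Sum `grad` over the axes that broadcasting added or stretched,
    so the result has the operand's original `shape`."""
    grad = np.asarray(grad, dtype=np.float64)
    while grad.ndim > len(shape):            # leading axes added by broadcasting
        grad = grad.sum(axis=0)
    for axis, size in enumerate(shape):      # axes stretched from size 1
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

class T:
    """Hypothetical minimal tensor supporting + and * with broadcasting."""
    def __init__(self, value, parents=(), backward_fn=None):
        self.value = np.asarray(value, dtype=np.float64)
        self.parents, self.backward_fn = parents, backward_fn
        self.grad = np.zeros_like(self.value)

    def __add__(self, other):
        other = other if isinstance(other, T) else T(other)
        return T(self.value + other.value, (self, other), lambda g: (g, g))

    def __mul__(self, other):
        other = other if isinstance(other, T) else T(other)
        return T(self.value * other.value, (self, other),
                 lambda g: (g * other.value, g * self.value))

    # + and * are commutative, so the reflected versions can reuse them.
    __radd__, __rmul__ = __add__, __mul__

    def backward(self, upstream=None):
        if upstream is None:
            g = np.ones_like(self.value)
        else:
            g = unbroadcast(upstream, self.value.shape)
        self.grad = self.grad + g            # accumulate
        if self.backward_fn is not None:
            for parent, parent_grad in zip(self.parents, self.backward_fn(g)):
                parent.backward(parent_grad)

a = T(np.ones((2, 3)))
b = T(np.array([10.0, 20.0, 30.0]))
out = a * b + 2
out.backward()
print(b.grad)  # [2. 2. 2.]
```

Because `b` has shape `(3,)` and is broadcast across the two rows of `a`, its gradient is the sum over those rows, which is what `unbroadcast` computes.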

### Rules

- Only students who meet 100% of the requirements of the main part of the assignment are eligible for extra points.

- To receive the extra points, an individual interview must be scheduled. It will cover both the source code for the extra points and the main part of the assignment (the score for the main part may be reduced).

- Extra points are awarded to those who answer the interview questions correctly. No partial credit is given for extra points.

## References

### Main

- [Build your own pytorch](https://www.peterholderrieth.com/blog/2023/Build-Your-Own-Pytorch-1-Computation-Graphs/)
- [Build your own Pytorch - 2: Backpropagation](https://www.peterholderrieth.com/blog/2023/Build-Your-Own-Pytorch-2-Autograd/)
- [Build your own PyTorch - 3: Training a Neural Network with self-made AD software](https://www.peterholderrieth.com/blog/2023/Build-Your-Own-Pytorch-3-Build-Classifier/)
- [Pytorch: A Gentle Introduction to torch.autograd](https://docs.pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)
- [Automatic Differentiation with torch.autograd](https://docs.pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html)

### Secondary

- [Tom Roth: Building a computational graph: part 1](https://tomroth.dev/compgraph1/)
- [Tom Roth: Building a computational graph: part 2](https://tomroth.dev/compgraph2/)
- [Tom Roth: Building a computational graph: part 3](https://tomroth.dev/compgraph3/)
- [Roger Grosse (Toronto) class on Automatic Differentiation](https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf)
- [Computational graphs and gradient flows](https://simple-english-machine-learning.readthedocs.io/en/latest/neural-networks/computational-graphs.html)
- [Colah Visual Blog: Backprop](https://colah.github.io/posts/2015-08-Backprop/)
- [Towards Data Science: Automatic Differentiation (AutoDiff): A Brief Intro with Examples](https://towardsdatascience.com/automatic-differentiation-autodiff-a-brief-intro-with-examples-3f3d257ffe3b/)
- [A Hands-on Introduction to Automatic Differentiation - Part 1](https://mostafa-samir.github.io/auto-diff-pt1/)
- [Build Your own Deep Learning Framework - A Hands-on Introduction to Automatic Differentiation - Part 2](https://mostafa-samir.github.io/auto-diff-pt1/)
