Implementing a Custom Dense Layer in Python
You are provided with a base Layer class that defines the structure of a neural network layer. Your task is to implement a subclass called Dense, which represents a fully connected neural network layer. The Dense class should extend the Layer class and implement the following methods:

Initialization (__init__):

Define the layer with a specified number of neurons (n_units) and an optional input shape (input_shape).
Set up placeholders for the layer's weights (W), biases (w0), and optimizers.

Weight Initialization (initialize):

Initialize the weights W from a uniform distribution over [-limit, limit], where limit = 1 / sqrt(input_shape[0]), and set the bias w0 to zeros.
Initialize optimizers for W and w0.

Parameter Count (parameters):

Return the total number of trainable parameters in the layer, which is the number of entries in W plus the number of entries in w0.

Forward Pass (forward_pass):

Compute the output of the layer by taking the dot product of the input X with the weight matrix W, and then adding the bias w0.

Backward Pass (backward_pass):

Calculate and return the gradient with respect to the input.
If the layer is trainable, update the weights and biases using the optimizer's update rule.

Output Shape (output_shape):

Return the shape of the output produced by the forward pass, which should be (self.n_units,).

Objective:
Extend the Layer class by implementing the Dense class to ensure it functions correctly within a neural network framework.
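
The initialization, parameter count, and gradient rules listed above can be checked numerically. The sketch below is not part of the task's reference material: the seed, batch size, and helper names (loss, numeric_grad) are assumptions made for illustration. It uses the unnormalized gradients dL/dW = X^T G and dL/dw0 = sum of G over the batch, where G stands for the incoming gradient dL/dY; a solution may additionally divide these by the batch size.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, used only for repeatability

in_dim, n_units = 2, 3
limit = 1.0 / math.sqrt(in_dim)                      # 1 / sqrt(input_shape[0])
W = rng.uniform(-limit, limit, size=(in_dim, n_units))
w0 = np.zeros((1, n_units))
n_params = W.size + w0.size                          # 2 * 3 + 3

X = rng.normal(size=(4, in_dim))                     # hypothetical batch of 4 samples
G = rng.normal(size=(4, n_units))                    # stand-in for dL/dY


def loss(X_, W_, w0_):
    # Scalar loss whose gradient w.r.t. the layer output is exactly G
    return float(np.sum((X_.dot(W_) + w0_) * G))


# Analytic gradients of the dense layer
grad_X = G.dot(W.T)                                  # (batch, in_dim)
grad_W = X.T.dot(G)                                  # (in_dim, n_units)
grad_w0 = G.sum(axis=0, keepdims=True)               # (1, n_units)


def numeric_grad(f, A, eps=1e-6):
    # Central finite differences, one entry at a time
    out = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        A_plus = A.copy()
        A_plus[idx] += eps
        A_minus = A.copy()
        A_minus[idx] -= eps
        out[idx] = (f(A_plus) - f(A_minus)) / (2 * eps)
    return out


num_grad_X = numeric_grad(lambda A: loss(A, W, w0), X)
num_grad_W = numeric_grad(lambda A: loss(X, A, w0), W)
num_grad_w0 = numeric_grad(lambda A: loss(X, W, A), w0)

print("parameters:", n_params)
print("dX matches:", np.allclose(grad_X, num_grad_X, atol=1e-6))
print("dW matches:", np.allclose(grad_W, num_grad_W, atol=1e-6))
print("dw0 matches:", np.allclose(grad_w0, num_grad_w0, atol=1e-6))
```

Agreement between the analytic and finite-difference gradients is a quick sanity check before wiring the layer into an optimizer.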

Example:
Input:
import numpy as np

# Initialize a Dense layer with 3 neurons and input shape (2,)
dense_layer = Dense(n_units=3, input_shape=(2,))

# Define a mock optimizer with a simple update rule
class MockOptimizer:
    def update(self, weights, grad):
        return weights - 0.01 * grad

optimizer = MockOptimizer()

# Initialize the Dense layer with the mock optimizer
dense_layer.initialize(optimizer)

# Perform a forward pass with sample input data
X = np.array([[1, 2]])
output = dense_layer.forward_pass(X)
print("Forward pass output:", output)

# Perform a backward pass with sample gradient
accum_grad = np.array([[0.1, 0.2, 0.3]])
back_output = dense_layer.backward_pass(accum_grad)
print("Backward pass output:", back_output)
Output:
Forward pass output: [[-0.00655782  0.01429615  0.00905812]]
Backward pass output: [[ 0.00129588  0.00953634]]
Reasoning:
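The code initializes a Dense layer with 3 neurons and input shape (2,), so W has shape (2, 3). The forward pass maps the (1, 2) input to a (1, 3) output, and the backward pass maps the (1, 3) incoming gradient back to a (1, 2) gradient with respect to the input. Because W is drawn at random and the snippet shown sets no seed, the printed values will generally differ from run to run.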
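
Since the example relies on random weights, its numbers cannot be verified by hand. The following sketch repeats the same forward and backward arithmetic with fixed weights; the values of W and the learning rate name lr are chosen here purely for illustration and are not taken from the task.

```python
import numpy as np

# Hypothetical fixed weights (the real layer draws these at random)
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])
w0 = np.zeros((1, 3))

X = np.array([[1.0, 2.0]])
accum_grad = np.array([[0.1, 0.2, 0.3]])

# Forward pass: X . W + w0, approximately [[0.9, 1.2, 1.5]]
output = X.dot(W) + w0

# Gradient w.r.t. the input: accum_grad . W^T, approximately [[0.14, 0.32]]
grad_input = accum_grad.dot(W.T)

# Parameter gradients (batch of one, so batch averaging changes nothing)
grad_W = X.T.dot(accum_grad)
grad_w0 = accum_grad.sum(axis=0, keepdims=True)

# Same update rule as the mock optimizer: weights - 0.01 * grad
lr = 0.01
W_new = W - lr * grad_W
w0_new = w0 - lr * grad_w0

print("Forward pass output:", output)
print("Backward pass output:", grad_input)
print("Updated W:", W_new)
print("Updated w0:", w0_new)
```

Note that the input gradient is computed with the weights as they were before the update, which is why a solution needs to keep a reference to the old W.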

In [None]:
import copy
import math
import numpy as np

class Dense(Layer):
    def __init__(self, n_units, input_shape=None):
        self.n_units = int(n_units)
        self.input_shape = input_shape  # tuple like (in_dim,) or None (lazy)
        self.trainable = True
        self.layer_input = None
        self.W = None
        self.w0 = None
        self.W_opt = None
        self.w0_opt = None

    def _build_if_needed(self, in_dim, optimizer):
        if self.W is not None:
            return
        limit = 1.0 / math.sqrt(in_dim)
        self.W  = np.random.uniform(-limit, limit, size=(in_dim, self.n_units))
        self.w0 = np.zeros((1, self.n_units), dtype=self.W.dtype)
        # independent optimizer states per parameter tensor
        self.W_opt  = copy.deepcopy(optimizer)
        self.w0_opt = copy.deepcopy(optimizer)

    def initialize(self, optimizer):
        if self.input_shape is None:
            # will lazily build on first forward_pass
            self.W_opt  = copy.deepcopy(optimizer)
            self.w0_opt = copy.deepcopy(optimizer)
            return
        in_dim = int(self.input_shape[0])
        self._build_if_needed(in_dim, optimizer)

    def parameters(self):
        if self.W is None or self.w0 is None:
            return 0
        return int(np.prod(self.W.shape) + np.prod(self.w0.shape))

    def forward_pass(self, X, training=True):
        # lazy init if needed
        if self.W is None:
            # need an optimizer already set via initialize(...)
            if self.W_opt is None or self.w0_opt is None:
                raise RuntimeError("Call initialize(optimizer) before the first forward_pass.")
            # input_shape was not given, so infer the input dimension from the data
            self._build_if_needed(X.shape[-1], self.W_opt)
        self.layer_input = X  # (batch, in_dim)
        return X.dot(self.W) + self.w0  # (batch, n_units)

    def backward_pass(self, accum_grad):
        # accum_grad: dL/dY, shape (batch, n_units)
        if self.W is None:
            raise RuntimeError("Layer not initialized.")
        W_prev = self.W  # preserve for upstream gradient
        if self.trainable:
            batch_size = accum_grad.shape[0]
            # gradients w.r.t. params
            grad_W  = self.layer_input.T.dot(accum_grad) / batch_size           # (in_dim, n_units)
            grad_w0 = np.sum(accum_grad, axis=0, keepdims=True) / batch_size    # (1, n_units)
            # parameter updates via optimizer
            self.W  = self.W_opt.update(self.W, grad_W)
            self.w0 = self.w0_opt.update(self.w0, grad_w0)
        # gradient w.r.t. input: dL/dX = dL/dY · W^T
        return accum_grad.dot(W_prev.T)  # (batch, in_dim)

    def output_shape(self):
        return (self.n_units,)

NameError: name 'Layer' is not defined
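
The NameError above indicates that the cell ran before any base Layer class had been defined. The task states that this class is provided, but it does not appear in this document, so the following is only a hypothetical stand-in: the method names follow the ones listed in the task, while set_input_shape and layer_name are assumed helpers. Defining something like it before the Dense cell should let the class statement execute.

```python
class Layer(object):
    """Hypothetical minimal base class; the real one supplied with the task may differ."""

    def set_input_shape(self, shape):
        # Assumed helper: lets a network tell the layer its input shape
        self.input_shape = shape

    def layer_name(self):
        # Assumed helper: name of the concrete layer class
        return self.__class__.__name__

    def parameters(self):
        # Layers without trainable parameters report zero
        return 0

    def forward_pass(self, X, training=True):
        raise NotImplementedError()

    def backward_pass(self, accum_grad):
        raise NotImplementedError()

    def output_shape(self):
        raise NotImplementedError()


# Quick check that the stub behaves as described
base = Layer()
base.set_input_shape((2,))
print(base.layer_name(), base.parameters(), base.input_shape)
```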