Sure, I'll provide a full example of using gradient descent to train a simple artificial neural network (ANN) with one ReLU-activated neuron. In this example, we'll implement a single-variable regression task. We'll create a network with one input neuron, one hidden layer with one ReLU-activated neuron, and one output neuron.

**Step 1: Initialization**

Let's initialize the network's parameters:

- Weight of the hidden neuron (\(w\)): Initialize randomly, e.g., \(w = 0.5\).
- Bias of the hidden neuron (\(b\)): Initialize randomly, e.g., \(b = 0.2\).
- Weight of the output neuron (\(v\)): Initialize randomly, e.g., \(v = 0.3\).
- Bias of the output neuron (\(c\)): Initialize randomly, e.g., \(c = 0.1\).

Hyperparameters:

- Learning rate (\(\alpha\)): Choose a suitable learning rate, e.g., \(\alpha = 0.01\).
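The bullets above say "initialize randomly" but then use fixed example values. A minimal sketch of both options follows. The helper name `init_params`, the uniform range, and the seed are assumptions made for illustration, not part of the original example.

```python
import random


def init_params(seed=None):
    """Return (w, b, v, c).

    With seed=None, return the fixed example values from the text.
    Otherwise draw each parameter from a seeded random generator.
    """
    if seed is None:
        return 0.5, 0.2, 0.3, 0.1
    rng = random.Random(seed)
    # Assumption: uniform draws in [-1, 1]; the text does not fix a distribution.
    return tuple(rng.uniform(-1.0, 1.0) for _ in range(4))


w, b, v, c = init_params()   # fixed example values
alpha = 0.01                 # learning rate from the text
random_start = init_params(seed=0)
```

One caveat with random starts: if \(w \cdot x + b\) begins at or below zero for the training input, the ReLU outputs 0, its gradient is 0, and \(w\) and \(b\) never move.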

**Step 2: Forward Pass**

For each training example, perform the forward pass through the network:

- Input (\(x\)): The input data.
- Hidden layer output (\(h\)): Apply the ReLU activation function.

   \[h = \text{ReLU}(w \cdot x + b)\]

- Output (\(y\)): Compute the network's output.

   \[y = v \cdot h + c\]
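The forward pass can be sketched as a small function. The input \(x = 1\) is the training input used later in this example; the function names `relu` and `forward` are illustrative choices.

```python
def relu(z):
    """ReLU activation: pass positive values through, clamp the rest to 0."""
    return max(0.0, z)


def forward(x, w, b, v, c):
    """Return the pre-activation z, hidden output h, and network output y."""
    z = w * x + b
    h = relu(z)
    y = v * h + c
    return z, h, y


z, h, y = forward(1.0, 0.5, 0.2, 0.3, 0.1)
print(f"z = {z:.2f}, h = {h:.2f}, y = {y:.2f}")
# prints: z = 0.70, h = 0.70, y = 0.31
```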

**Step 3: Compute the Loss**

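Calculate the loss with a squared-error loss function. For a single training example, mean squared error (MSE) reduces to the squared error of that example, and the factor \(\frac{1}{2}\) simplifies the derivative:

- Target output (\(y_{\text{target}}\)): The desired output for the given input.

   \[J(\theta) = \frac{1}{2}(y - y_{\text{target}})^2\]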
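As a quick check of the formula, here is a sketch of the loss for one example and, as an assumed extension, its mean over several examples. The initial prediction \(y = 0.31\) and the target \(y_{\text{target}} = 3\) come from the example values used in this walkthrough; `mean_loss` is a hypothetical helper.

```python
def loss(y, y_target):
    """Half squared error for a single example."""
    return 0.5 * (y - y_target) ** 2


def mean_loss(ys, targets):
    """Assumed batch version: the average of the per-example losses."""
    return sum(loss(y, t) for y, t in zip(ys, targets)) / len(ys)


initial_loss = loss(0.31, 3.0)
print(f"{initial_loss:.3f}")  # about 3.618
```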

**Step 4: Backpropagation**

Calculate the gradients of the loss with respect to the parameters (\(w\), \(b\), \(v\), \(c\)) using backpropagation:

- Compute the gradient of the loss with respect to the output, \(\frac{\partial J}{\partial y} = y - y_{\text{target}}\), and use the chain rule to compute the gradients for the parameters.

   \(\frac{\partial J}{\partial v} = \frac{\partial J}{\partial y} \cdot h\)

   \(\frac{\partial J}{\partial c} = \frac{\partial J}{\partial y}\)

   \(\frac{\partial J}{\partial h} = \frac{\partial J}{\partial y} \cdot v\)

   \(\frac{\partial J}{\partial w} = \frac{\partial J}{\partial h} \cdot \frac{\partial h}{\partial(w \cdot x + b)} \cdot x\)

   \(\frac{\partial J}{\partial b} = \frac{\partial J}{\partial h} \cdot \frac{\partial h}{\partial(w \cdot x + b)}\)

   Here \(\frac{\partial h}{\partial(w \cdot x + b)}\) is the ReLU derivative: 1 when \(w \cdot x + b > 0\) and 0 otherwise.
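One way to sanity-check these formulas is to compare them against finite differences. The sketch below does that for the example values; the function names and the step size `eps` are assumptions chosen for illustration.

```python
def forward(x, w, b, v, c):
    z = w * x + b
    h = max(0.0, z)              # ReLU
    return z, h, v * h + c


def loss(params, x, y_target):
    _, _, y = forward(x, *params)
    return 0.5 * (y - y_target) ** 2


def gradients(params, x, y_target):
    """Analytic gradients in the order (dw, db, dv, dc)."""
    w, b, v, c = params
    z, h, y = forward(x, w, b, v, c)
    dy = y - y_target                    # dJ/dy
    dv = dy * h                          # dJ/dv
    dc = dy                              # dJ/dc
    dh = dy * v                          # dJ/dh
    dz = dh * (1.0 if z > 0 else 0.0)    # ReLU derivative
    return dz * x, dz, dv, dc


def numeric_gradients(params, x, y_target, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, y_target) - loss(down, x, y_target)) / (2 * eps))
    return tuple(grads)


params = (0.5, 0.2, 0.3, 0.1)
analytic = gradients(params, 1.0, 3.0)
numeric = numeric_gradients(params, 1.0, 3.0)
for name, a, n in zip(("dw", "db", "dv", "dc"), analytic, numeric):
    print(f"{name}: analytic = {a:.4f}, numeric = {n:.4f}")
```

If the two columns disagree, the usual suspect is a missing \(\frac{\partial J}{\partial y}\) factor in one of the gradients.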

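- Update the parameters using gradient descent, stepping each one against its gradient, scaled by the learning rate (\(\alpha\)):

   - Update \(v\) and \(c\): \(v = v - \alpha \cdot \frac{\partial J}{\partial v}\) and \(c = c - \alpha \cdot \frac{\partial J}{\partial c}\).

   - Update \(w\) and \(b\): \(w = w - \alpha \cdot \frac{\partial J}{\partial w}\) and \(b = b - \alpha \cdot \frac{\partial J}{\partial b}\).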

Repeat steps 2-4 for a fixed number of iterations or until the loss stops decreasing. With a suitable learning rate, the loss shrinks and the prediction approaches the target for the training data. Both the learning rate and the number of iterations strongly affect training: too large a learning rate can make the loss diverge, and too small a rate makes convergence slow, so tuning these hyperparameters is an important part of training neural networks effectively.
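The "until the loss converges" stopping rule can be sketched as follows. The tolerance `tol`, the iteration cap `max_iters`, and the function name are assumptions; the starting values, training example, and learning rate are the ones from Step 1.

```python
def train_until_converged(alpha=0.01, tol=1e-10, max_iters=10_000):
    """Run gradient descent until the loss changes by less than tol."""
    w, b, v, c = 0.5, 0.2, 0.3, 0.1
    x, y_target = 1.0, 3.0
    prev_cost = float("inf")
    for step in range(max_iters):
        # Forward pass
        z = w * x + b
        h = max(0.0, z)
        y = v * h + c
        cost = 0.5 * (y - y_target) ** 2
        # Stop once the loss has effectively stopped changing
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
        # Backpropagation
        dy = y - y_target
        dz = dy * v * (1.0 if z > 0 else 0.0)
        dv, dc = dy * h, dy
        # Gradient descent update
        w -= alpha * dz * x
        b -= alpha * dz
        v -= alpha * dv
        c -= alpha * dc
    return step, cost, (w, b, v, c)


steps, final_cost, params = train_until_converged()
print(f"Stopped after {steps} iterations with cost {final_cost:.2e}")
```

A tolerance on the change in loss is only one possible criterion. A threshold on the gradient norm or on the loss itself would serve the same purpose.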

In [17]:
# Initialize parameters
w = 0.5
b = 0.2
v = 0.3
c = 0.1
learning_rate = 0.01

# Training data
x = 1  # Input feature
y_target = 3  # Target output

# Number of training iterations
num_iterations = 10

# Gradient Descent
for i in range(num_iterations):
    # Forward pass
    z = w * x + b
    h = max(0, z)  # ReLU activation
    y_pred = v * h + c

    # Compute the cost (half squared error)
    cost = 0.5 * (y_pred - y_target) ** 2

    # Backpropagation: every gradient carries the error term dJ/dy
    dy = y_pred - y_target
    dv = dy * h
    dc = dy
    dh = dy * v

    # The ReLU passes the gradient only when its input is positive
    if z > 0:
        dw = dh * x
        db = dh
    else:
        dw = 0
        db = 0

    # Update parameters using gradient descent
    w = w - learning_rate * dw
    b = b - learning_rate * db
    v = v - learning_rate * dv
    c = c - learning_rate * dc

    # Print progress
    print(f"Iteration {i}: Cost = {cost:.4f}, Predicted Output = {y_pred:.2f}")

# Final trained parameters (printed once, after the loop)
print("\nTrained Parameters:")
print(f"w = {w:.4f}, b = {b:.4f}, v = {v:.4f}, c = {c:.4f}")

Certainly! Let's walk through a complete example of training a neural network with a single ReLU (Rectified Linear Unit) neuron using gradient descent. In this example, we'll create a simple artificial neural network (ANN) with one input feature, one hidden layer containing one ReLU neuron, and one output neuron.

### Neural Network Architecture:
- **Input Layer**: One neuron (\(x\)).
- **Hidden Layer**: One ReLU neuron (\(h\)) with weight (\(w\)) and bias (\(b\)).
- **Output Layer**: One neuron (\(y\)) with weight (\(v\)) and bias (\(c\)).

### Forward Pass:
The output \(y\) of the neural network is calculated as follows:

\[h = \text{ReLU}(w \cdot x + b)\]
\[y = v \cdot h + c\]

### Cost Function:
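Let's use Mean Squared Error (MSE) as the cost function. For a single training example it reduces to the squared error, scaled by \(\frac{1}{2}\) to simplify the gradient: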

\[J(w, b, v, c) = \frac{1}{2}(y - y_{\text{target}})^2\]

### Gradient Descent Updates:
The gradients of the cost function with respect to the parameters (\(w\), \(b\), \(v\), \(c\)) are computed using backpropagation, and the parameters are updated using the gradient descent algorithm:

\[w = w - \alpha \cdot \frac{\partial J}{\partial w}\]
\[b = b - \alpha \cdot \frac{\partial J}{\partial b}\]
\[v = v - \alpha \cdot \frac{\partial J}{\partial v}\]
\[c = c - \alpha \cdot \frac{\partial J}{\partial c}\]
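To make the update rules concrete, here is a sketch of a single gradient descent step worked out numerically. It assumes the example values used in this walkthrough (\(x = 1\), \(y_{\text{target}} = 3\), \(\alpha = 0.1\)) and the starting parameters \(w = 0.5\), \(b = 0.2\), \(v = 0.3\), \(c = 0.1\).

```python
alpha = 0.1
x, y_target = 1.0, 3.0
w, b, v, c = 0.5, 0.2, 0.3, 0.1

# Forward pass
z = w * x + b            # 0.7
h = max(0.0, z)          # 0.7
y = v * h + c            # 0.31

# Gradients via the chain rule
dy = y - y_target                        # dJ/dy = -2.69
dv = dy * h                              # dJ/dv
dc = dy                                  # dJ/dc
dz = dy * v * (1.0 if z > 0 else 0.0)    # gradient at the ReLU input
dw = dz * x                              # dJ/dw
db = dz                                  # dJ/db

# One gradient descent step
w, b = w - alpha * dw, b - alpha * db
v, c = v - alpha * dv, c - alpha * dc

print(f"w = {w:.4f}, b = {b:.4f}, v = {v:.4f}, c = {c:.4f}")
# prints: w = 0.5807, b = 0.2807, v = 0.4883, c = 0.3690
```

All four parameters increase because the prediction (0.31) is below the target (3), so every gradient is negative.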

Here's how you can implement and train this simple ReLU neural network using gradient descent in Python:

```python
# Initialize parameters
w = 0.5
b = 0.2
v = 0.3
c = 0.1
learning_rate = 0.1

# Training data
x = 1.0  # Input feature
y_target = 3.0  # Target output

# Number of training iterations
num_iterations = 100

# Gradient Descent
for i in range(num_iterations):
    # Forward pass
    h = max(0, w * x + b)  # ReLU activation
    y_pred = v * h + c
    
    # Compute the cost (MSE)
    cost = 0.5 * (y_pred - y_target)**2
    
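    # Backpropagation
    # Compute gradients; each one carries the error term dJ/dy
    dy = y_pred - y_target
    dv = dy * h
    dc = dy
    dh = dy * v
    
    # The ReLU passes the gradient only when its input is positive
    if w * x + b > 0:
        dw = dh * x
        db = dh
    else:
        dw = 0
        db = 0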
    
    # Update parameters using gradient descent
    w -= learning_rate * dw
    b -= learning_rate * db
    v -= learning_rate * dv
    c -= learning_rate * dc
    
    # Print progress
    if i % 10 == 0:
        print(f"Iteration {i}: Cost = {cost:.4f}, Predicted Output = {y_pred:.2f}")

# Final trained parameters
print("\nTrained Parameters:")
print(f"w = {w:.2f}")
print(f"b = {b:.2f}")
print(f"v = {v:.2f}")
print(f"c = {c:.2f}")
```

In this example, the neural network is trained for 100 iterations using gradient descent. You can adjust the number of iterations, the learning rate, and the initial parameters to see how the network's performance changes during training. The ReLU activation function is applied to the hidden neuron's pre-activation \(w \cdot x + b\) during the forward pass, and its derivative gates the gradients for \(w\) and \(b\) during backpropagation.
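As one way to run that experiment, the sketch below wraps the training loop in a function and compares a few learning rates. The function name `train` and the particular rates tried are assumptions; the initial parameters and the training example are the ones used above.

```python
def train(learning_rate, num_iterations, x=1.0, y_target=3.0):
    """Train the one-neuron ReLU network and return the final cost."""
    w, b, v, c = 0.5, 0.2, 0.3, 0.1
    for _ in range(num_iterations):
        # Forward pass
        z = w * x + b
        h = max(0.0, z)
        y_pred = v * h + c
        # Backpropagation (each gradient includes the error term)
        dy = y_pred - y_target
        dz = dy * v * (1.0 if z > 0 else 0.0)
        dv, dc = dy * h, dy
        # Gradient descent update
        w -= learning_rate * dz * x
        b -= learning_rate * dz
        v -= learning_rate * dv
        c -= learning_rate * dc
    # Evaluate the cost with the final parameters
    y_final = v * max(0.0, w * x + b) + c
    return 0.5 * (y_final - y_target) ** 2


final_costs = {lr: train(lr, 100) for lr in (0.001, 0.01, 0.1)}
for lr, cost in final_costs.items():
    print(f"learning_rate = {lr}: final cost = {cost:.6f}")
```

For this particular setup, the larger of these rates reaches a lower cost within 100 iterations. Much larger rates are not covered here and may overshoot or diverge.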