# **Neural Networks and Backpropagation: An In-Depth Exploration of Multivariate Functions**

Author:

Jorge Iván Padilla Buriticá, PhD.

Contact:

Email: jorgeivanpb@gmail.com

**Contents:**

This notebook explores detailed exercises involving forward and backward propagation using different multivariate functions. Each exercise covers mathematical analysis, gradient computation, and practical gradient descent applications for neural network training.

## Multivariate Functions and Neural Network Computation (Forward and Backward Propagation)

In this notebook, we illustrate two examples of multivariate functions that demonstrate the concepts of forward propagation and backpropagation.

### Forward Propagation
Forward propagation is the computational process in which input values pass through a network or computational graph to produce output predictions. The function output is calculated by applying operations layer by layer.

### Backpropagation
Backpropagation is a method used primarily in training neural networks, allowing the network to adjust its internal parameters (weights) by calculating gradients of the loss function with respect to these parameters. It involves propagating error gradients backward through the network.

### Examples

#### Example 1: Sum-and-Product Function

- **Function:**


   $$
   f(x, y, z) = (x + y) \times z
   $$

- **Forward propagation:** Compute the intermediate and final outputs using provided inputs.
- **Backpropagation:** Compute gradients of the function with respect to each input variable $x$, $y$, $z$.

#### Example 2: Sigmoid Activation Function

- **Function:**

$$
f(w,x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}
$$

- **Forward propagation:** Calculate the output using the sigmoid activation based on given inputs and weights.
- **Backpropagation:** Determine the gradients with respect to each weight $w_0, w_1, w_2$ to measure sensitivity.

### Interpretation of Outputs and Gradients
- **Output Interpretation (Forward Propagation):** The outputs represent predictions or calculated values from the given inputs based on the computational graph.
- **Gradient Interpretation (Backward Propagation):** Gradients indicate the sensitivity of the function’s output to changes in each input or parameter. A larger absolute gradient means greater sensitivity, and the sign shows whether the output rises or falls as that input increases; together they determine the direction and size of parameter updates during optimization.



In [6]:
# Implementation of the multivariate function f(x, y, z) = (x + y) * z
# with forward and backward propagation

# Forward propagation
def forward(x, y, z):
    q = x + y
    f = q * z
    return f, q

# Backward propagation
def backward(q, z):
    # partial derivatives
    df_dz = q          # ∂f/∂z = q
    df_dq = z          # ∂f/∂q = z

    # ∂q/∂x = 1, ∂q/∂y = 1
    df_dx = df_dq * 1  # ∂f/∂x = z
    df_dy = df_dq * 1  # ∂f/∂y = z

    return df_dx, df_dy, df_dz

# Example with provided values:
x, y, z = -2, 5, -4

# Forward propagation
f, q = forward(x, y, z)
print(f"Forward propagation: f = {f}")
print(f"Forward propagation: q = {q}")

# Backward propagation
df_dx, df_dy, df_dz = backward(q, z)
print("Backward propagation (gradients):")
print(f"df/dx = {df_dx}")
print(f"df/dy = {df_dy}")
print(f"df/dz = {df_dz}")




Forward propagation: f = -12
Forward propagation: q = 3
Backward propagation (gradients):
df/dx = -4
df/dy = -4
df/dz = 3


### Interpretation of Forward and Backward Propagation Results (Exercise 01):

#### Function:
$$
f(x, y, z) = (x + y) \times z   = q \times z
$$

#### Forward Propagation Results:
- **Intermediate Calculation:**
$$
q = 3
$$
This is the intermediate sum $q = x + y$. With the inputs $x = -2$ and $y = 5$, it equals **3**.
  
- **Final Output:**  
$$
f = -12
$$
The final output is the product of $q$ and $z$:
$$
f = q \times z = 3 \times (-4) = -12
$$

#### Backward Propagation (Gradients) Interpretation:
The gradients illustrate the sensitivity of the output to small changes in each input variable:

- **Gradient with respect to $x$:**
$$
 \frac{\partial f}{\partial x} = -4
 $$

Increasing $x$ slightly results in decreasing the output $f$ by approximately 4 times the increment.

- **Gradient with respect to $y$:**
$$
\frac{\partial f}{\partial y} = -4
$$

Similarly, increasing $ y $ slightly will also decrease the output $ f $ by roughly 4 times that increment. Thus, $x$ and $ y$ equally impact the output.

- **Gradient with respect to $z$:**
$$
  \frac{\partial f}{\partial z} = 3
 $$
An incremental increase in $z$ increases the output $f$ by approximately 3 times that increment, because $\partial f / \partial z = q = 3$ is positive.

These gradients help identify how sensitive the function output is to each input, aiding in optimization and parameter adjustments during model training.
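
The sensitivity reading above can be illustrated by nudging one input at a time and watching how the output moves. The snippet below is a minimal sketch: the step size `delta = 0.01` and the helper name `f` are choices made here for illustration, not part of the original cell.

```python
# Sensitivity illustration for f(x, y, z) = (x + y) * z at the example inputs.
def f(x, y, z):
    return (x + y) * z

x, y, z = -2.0, 5.0, -4.0
delta = 0.01  # small increment, chosen arbitrarily for this sketch

base = f(x, y, z)

# Change in the output when exactly one input is increased by delta.
change_x = f(x + delta, y, z) - base
change_y = f(x, y + delta, z) - base
change_z = f(x, y, z + delta) - base

# Each change should be close to (gradient * delta): -4, -4 and 3 times delta.
print(f"x: change in f = {change_x:+.4f}, gradient * delta = {-4 * delta:+.4f}")
print(f"y: change in f = {change_y:+.4f}, gradient * delta = {-4 * delta:+.4f}")
print(f"z: change in f = {change_z:+.4f}, gradient * delta = {3 * delta:+.4f}")
```

Because this function is linear in each input taken separately, the observed change and the gradient prediction agree up to floating-point rounding; for non-linear functions the match holds only approximately and only for small increments.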


## How to Use Gradient Values for Updating Weights in Neural Networks (Backpropagation)

In neural network training, gradients computed through **backpropagation** are essential for updating the network's weights. These gradients represent the sensitivity of the output with respect to each weight and guide the adjustments needed to minimize the error or loss function.

### General Weight Update Rule (Gradient Descent)
The general formula for updating weights using gradients is given by:

$$
w_{\text{new}} = w_{\text{old}} + \Delta w \quad\text{where}\quad \Delta w = -\alpha \frac{\partial E}{\partial w}
$$

- $ w_{\text{old}} $: Current value of the weight.
- $ \alpha$: Learning rate, a small positive constant (typically between 0.001 and 0.1).
- $\frac{\partial E}{\partial w}$: Gradient, the partial derivative of the loss function $E$ with respect to the weight.

### Example Using Gradients in our function:
Consider the function:
$$
f(x, y, z) = (x + y) \times z
$$
Suppose we obtained the following gradients:

- $\frac{\partial f}{\partial x} = -4$
- $\frac{\partial f}{\partial y} = -4$
- $\frac{\partial f}{\partial z} = 3$

### Practical Example of Weight Updates:

Let's take the input values at which these gradients were computed as the initial weights (the inputs are treated here as analogous to network weights), with $f$ itself as the quantity to minimize:
- $w_x = -2$, $w_y = 5$, $w_z = -4$

Using a learning rate ($\alpha$) of 0.1, we update the weights as follows:

| Weight | Current Value | Gradient | Update Calculation                    | Updated Value |
|--------|---------------|----------|---------------------------------------|---------------|
| $w_x$  | -2            | -4       | $-2 - (0.1 \times -4) = -2 + 0.4$     | **-1.6**      |
| $w_y$  | 5             | -4       | $5 - (0.1 \times -4) = 5 + 0.4$       | **5.4**       |
| $w_z$  | -4            | 3        | $-4 - (0.1 \times 3) = -4 - 0.3$      | **-4.3**      |

### Interpretation:
- **Negative gradient:** Suggests increasing the weight will decrease the error or loss.
- **Positive gradient:** Suggests decreasing the weight will reduce the error or loss.

This iterative adjustment using gradients systematically guides the neural network towards optimal weight values, minimizing the loss function and enhancing overall model performance.
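
A single gradient descent step on the example function can be sketched as follows. It starts from the inputs used in the first code cell ($x, y, z = -2, 5, -4$), where the gradients $-4, -4, 3$ were computed. Treating $f$ itself as the quantity to minimize is an assumption made for illustration, and the names `gradients`, `params`, and `new_params` are hypothetical.

```python
# One gradient descent step on f(x, y, z) = (x + y) * z.
def forward(x, y, z):
    return (x + y) * z

def gradients(x, y, z):
    # Analytic partial derivatives: df/dx = z, df/dy = z, df/dz = x + y
    return z, z, x + y

alpha = 0.1                   # learning rate
params = (-2.0, 5.0, -4.0)    # starting point (x, y, z)

f_before = forward(*params)
grads = gradients(*params)

# Update rule: p_new = p_old - alpha * gradient
new_params = tuple(p - alpha * g for p, g in zip(params, grads))
f_after = forward(*new_params)

print(f"gradients     = {grads}")
print(f"new values    = {new_params}")
print(f"f before step = {f_before:.2f}")
print(f"f after step  = {f_after:.2f}")
```

The value of $f$ after the step is lower than before it, which is what moving against the gradient is meant to achieve for a sufficiently small learning rate.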


## Activation Functions in Neural Networks

### Overview
Activation functions play a crucial role in neural network modeling by introducing non-linear transformations. These functions determine the output of a neuron, influencing the network's capability to learn and generalize complex data patterns.

### Importance of Activation Functions
Activation functions are important because they:
- Introduce non-linearity, enabling neural networks to approximate complex, non-linear relationships.
- Determine neuronal firing based on inputs, guiding decision-making within the network.
- Facilitate effective gradient flow during backpropagation, crucial for accurate training and convergence.

### Activation Function in the Provided Example
Consider the following example function:

$$
f(x, y, z) = (x + y) \times z
$$

In this example, the activation function implicitly used is a **linear activation** function:

$$
g(q) = q \quad \text{with} \quad q = (x + y)
$$

### Characteristics of the Linear Activation Function
- **Linear behavior:** Passes its input through unchanged ($g(q) = q$), so it adds no non-linear transformation.
- **Limitations:** Unable to capture complex, non-linear relationships between variables effectively.
- **Applications:** Suitable primarily for regression tasks or as the final output layer in neural networks when continuous outputs are needed.

### Modifying the Activation Function
To enhance the capability of this model, a non-linear activation function could be employed. Common choices include:

- **Sigmoid:**
  $$
  g(q) = \frac{1}{1 + e^{-q}}
  $$

- **Rectified Linear Unit (ReLU):**
  $$
  g(q) = \max(0, q)
  $$

- **Hyperbolic Tangent (Tanh):**
  $$
  g(q) = \frac{e^{q} - e^{-q}}{e^{q} + e^{-q}}
  $$
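
The three activation functions listed above, together with the derivatives that backpropagation needs, can be sketched in NumPy as follows. The helper names (`sigmoid_grad`, `relu_grad`, `tanh_grad`) and the sample values in `q` are assumptions for illustration. The ReLU derivative at exactly $q = 0$ is a convention, taken as 0 here.

```python
import numpy as np

def sigmoid(q):
    return 1.0 / (1.0 + np.exp(-q))

def relu(q):
    return np.maximum(0.0, q)

def tanh(q):
    return np.tanh(q)

# Derivatives with respect to q, as used in the backward pass.
def sigmoid_grad(q):
    s = sigmoid(q)
    return s * (1.0 - s)

def relu_grad(q):
    # 1 where q > 0, otherwise 0 (the value at q == 0 is a convention)
    return (np.asarray(q) > 0).astype(float)

def tanh_grad(q):
    return 1.0 - np.tanh(q) ** 2

q = np.array([-2.0, 0.0, 3.0])  # sample pre-activation values
for name, fn, grad_fn in [
    ("sigmoid", sigmoid, sigmoid_grad),
    ("relu", relu, relu_grad),
    ("tanh", tanh, tanh_grad),
]:
    print(f"{name:8s} value = {fn(q)}  derivative = {grad_fn(q)}")
```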

### Example of Activation Function Modification
You can easily integrate a non-linear activation function as follows:

- Original computation:
  $ q = x + y $

- Modified with sigmoid activation:
  $ q = \frac{1}{1 + e^{-(x+y)}} $
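
As a rough sketch of what this modification implies for both passes, the code below applies a sigmoid to the sum before multiplying by $z$ and adds the extra chain-rule factor in the backward pass. The function names `forward_sigmoid` and `backward_sigmoid` are hypothetical, chosen so they do not overwrite the notebook's own `forward` and `backward`; the input values are reused from Example 1.

```python
import math

def sigmoid(q):
    return 1.0 / (1.0 + math.exp(-q))

def forward_sigmoid(x, y, z):
    q = x + y          # linear part
    a = sigmoid(q)     # non-linear activation applied to q
    f = a * z          # final output
    return f, a

def backward_sigmoid(a, z):
    df_dz = a                  # df/dz = a
    df_da = z                  # df/da = z
    da_dq = a * (1.0 - a)      # derivative of the sigmoid
    df_dq = df_da * da_dq      # chain rule through the activation
    df_dx = df_dq * 1.0        # dq/dx = 1
    df_dy = df_dq * 1.0        # dq/dy = 1
    return df_dx, df_dy, df_dz

x, y, z = -2.0, 5.0, -4.0
f, a = forward_sigmoid(x, y, z)
df_dx, df_dy, df_dz = backward_sigmoid(a, z)
print(f"f = {f:.4f}")
print(f"df/dx = {df_dx:.4f}, df/dy = {df_dy:.4f}, df/dz = {df_dz:.4f}")
```

Compared with the linear version, the gradients with respect to $x$ and $y$ are scaled by the factor $a(1 - a)$, which is at most 0.25, so they shrink when the sigmoid saturates.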

### Conclusion
Initially, the provided example implicitly uses a linear activation function. Introducing a non-linear activation can significantly enhance the neural network’s ability to model complex and realistic data scenarios, facilitating improved learning outcomes.




In [9]:
import numpy as np

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Forward propagation
def forward(w, x):
    z = np.dot(w, x)
    f = sigmoid(z)
    return f, z

# Backward propagation (gradient calculation)
def backward(f, z, x):
    # Derivative of sigmoid
    df_dz = f * (1 - f)

    # Gradient with respect to weights
    df_dw = df_dz * x
    return df_dw

# Example inputs
w = np.array([2, -3, -3])
x = np.array([-1, -2, 1])  # x2 is assumed to be 1 for bias

# Forward propagation
f, z = forward(w, x)
print(f"Forward propagation result: f = {f}")

# Backward propagation
grad_w = backward(f, z, x)
print(f"Backward propagation gradients: df/dw = {grad_w}")


Forward propagation result: f = 0.7310585786300049
Backward propagation gradients: df/dw = [-0.19661193 -0.39322387  0.19661193]


## Activation Functions and Gradient Computation (Exercise 2)

### Overview of the Exercise
This exercise demonstrates forward and backward propagation through a single neuron with a sigmoid activation function. The purpose is to understand how to compute predictions and gradients and how to use them to update the weights.

### Sigmoid Activation Function
The sigmoid function is defined as:

$$
g(z) = \frac{1}{1 + e^{-z}}
$$

### Importance of the Sigmoid Function:
- Introduces non-linearity, enabling the network to capture complex patterns.
- Constrains outputs between 0 and 1, making it suitable for binary classification tasks.

### Forward Propagation Explanation
Given weights $w$ and input values $x$ (with $x_2 = 1$, so that $w_2$ acts as the bias), forward propagation computes:

$$
z = w_0 x_0 + w_1 x_1 + w_2 x_2
$$

Then, applies the sigmoid activation to obtain the output $ f $:

$$
f = \frac{1}{1 + e^{-z}}
$$

### Interpretation of Forward Propagation
- **$f$**: The predicted probability (between 0 and 1) indicating the network's output based on the inputs and current weights.
- **$z$**: The linear combination of inputs and weights before activation.

### Backward Propagation Explanation (Gradient Calculation)
Backward propagation calculates gradients to measure how the output changes with respect to weights, enabling the network to adjust weights effectively.

- Gradient of sigmoid:
  $$
  \frac{\partial f}{\partial z} = f(1 - f)
  $$

- Gradient with respect to weights:
  $$
  \frac{\partial f}{\partial w} = \frac{\partial f}{\partial z} \cdot x
  $$
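
For reference, the sigmoid derivative can be obtained in a few steps. This short derivation is added here using the same symbols $f$, $z$, $w$, $x$ as this section:

$$
\frac{\partial f}{\partial z} = \frac{d}{dz}\left(1 + e^{-z}\right)^{-1} = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = f(1 - f)
$$

since $1 - f = \frac{e^{-z}}{1 + e^{-z}}$. Because $z = w_0 x_0 + w_1 x_1 + w_2 x_2$, each $\frac{\partial z}{\partial w_i} = x_i$, and the chain rule gives:

$$
\frac{\partial f}{\partial w_i} = \frac{\partial f}{\partial z} \cdot \frac{\partial z}{\partial w_i} = f(1 - f)\, x_i
$$

With the example values, $z = 1$, $f \approx 0.731$ and $f(1 - f) \approx 0.197$, which multiplied by $x = [-1, -2, 1]$ reproduces the gradients printed by the code cell.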

### Interpretation of Gradients
The computed gradients indicate how sensitive the network’s output is to each weight. Larger gradients imply greater influence and thus a stronger adjustment during updates.

### Example Results (Your Exercise)
- Forward propagation result (predicted output): $ f \approx 0.731 $ indicates a high activation given the current inputs.
- Gradients: Indicate the sensitivity of output concerning each weight, determining adjustment direction and magnitude:

$$
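\frac{\partial f}{\partial w} = \left[\frac{\partial f}{\partial w_0}, \frac{\partial f}{\partial w_1}, \frac{\partial f}{\partial w_2}\right] \approx [-0.197, -0.393, 0.197]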
$$

### Updating Weights Using Gradients
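The gradients are used to update the weights through gradient descent. In training, the derivative is taken of a loss function $E$; in this exercise the output $f$ itself plays the role of the quantity being minimized: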

$$
w_{\text{new}} = w_{\text{old}} - \alpha \times \frac{\partial f}{\partial w}
$$

- $ \alpha $ (Learning rate): A small positive value that controls the step size.
- Negative gradient: Increasing the weight lowers the function value, so the update increases the weight.
- Positive gradient: Decreasing the weight lowers the function value, so the update decreases the weight.

### Practical Example (Weight Update)
Given your computed gradients and an example learning rate (e.g., 0.1), weights are updated:

$$
w_{\text{new}} = [2, -3, -3] - 0.1 \times [-0.197, -0.393, 0.197] \approx [2.020, -2.961, -3.020]
$$
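
The update can be carried out numerically with the weights and inputs from the code cell above. This is a minimal sketch: it follows the exercise in using $\partial f / \partial w$ directly, so the step lowers the output $f$. In actual training the gradient of a loss function would take its place. The variable names `f_old`, `f_new`, and `w_new` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -3.0, -3.0])   # weights from the example
x = np.array([-1.0, -2.0, 1.0])   # inputs, with x[2] = 1 acting as the bias input
alpha = 0.1                       # example learning rate

# Forward pass and gradient at the current weights
f_old = sigmoid(np.dot(w, x))
grad_w = f_old * (1.0 - f_old) * x

# Gradient descent step: w_new = w_old - alpha * df/dw
w_new = w - alpha * grad_w
f_new = sigmoid(np.dot(w_new, x))

print(f"gradient        = {grad_w}")
print(f"updated weights = {w_new}")
print(f"f before = {f_old:.4f}, f after = {f_new:.4f}")
```

After one step the output is slightly smaller than before, consistent with stepping against the gradient of $f$.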

### Conclusion
This exercise shows how forward and backward propagation work together: the forward pass produces the output, and the backward pass supplies the gradients used to adjust the weights iteratively.



In [None]:
# Proposed exercises on forward and backward propagation

## Exercises for Neural Networks Using Forward and Backward Propagation

### **Exercise 1**

**Function:**
$$
f(x, y, z) = \frac{x + y}{z}
$$

#### Instructions:
- Implement forward propagation to compute $ f$.
- Derive and implement backward propagation to compute gradients with respect to each input $ x, y, z $.
- Use these gradients to update weights using gradient descent.

#### Example Inputs:
- $x = 3, \quad y = 2, \quad z = 1 $

#### Mathematical Analysis Questions:
- How do gradients behave as $ z $ approaches zero?
- What implications do these gradient behaviors have on network stability during training?

### **Exercise 2**
**Function:**

$$
f(x, y, z) = x e^{y} + \sin(z)
$$

#### Instructions:
- Compute forward propagation to determine the output.
- Calculate backward propagation gradients explicitly for each variable $ x, y, z $.
- Demonstrate weight updates using gradient descent.

#### Example Inputs:
- $x = 1.5, \quad y = -1, \quad z = \frac{\pi}{2}$

#### Mathematical Analysis Questions:
- Which input has the greatest effect on the function's output based on gradient magnitudes?
- How do exponential and trigonometric functions impact gradient stability?

### **Exercise 3**
**Function:**

$$
f(x, y) = \log(x^{2} + y^{2} + 1)
$$

#### Instructions:
- Perform forward propagation for output calculation.
- Compute backward propagation gradients explicitly for $x$ and $y$.
- Apply computed gradients in gradient descent to update the inputs or weights.

#### Example Inputs:
- $x = 2, \quad y = 1$

#### Mathematical Analysis Questions:
- How do gradients behave as inputs become very large or approach zero?
- Discuss techniques for maintaining numerical stability during backpropagation with logarithmic functions.

---

### General Gradient Descent Update Rule (Common to All Exercises):

$$
w_{\text{new}} = w_{\text{old}} - \alpha \frac{\partial f}{\partial w}
$$

- $ \alpha $: Learning rate (typically a small positive value).
- Interpretation: Gradients guide weight updates, improving the neural network's performance by minimizing errors.
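
When working through the exercises, hand-derived gradients can be checked against a numerical approximation. The helper below is one possible sketch using central differences; the name `numerical_gradient`, its signature, and the step size `h` are assumptions, not part of the original notebook. It is demonstrated on the function from Example 1 rather than on the exercise functions.

```python
import numpy as np

def numerical_gradient(func, point, h=1e-6):
    """Approximate the gradient of func at point with central differences."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h
        grad[i] = (func(point + step) - func(point - step)) / (2 * h)
    return grad

# Demonstration on Example 1: f(x, y, z) = (x + y) * z
def example_1(p):
    x, y, z = p
    return (x + y) * z

approx = numerical_gradient(example_1, [-2.0, 5.0, -4.0])
analytic = np.array([-4.0, -4.0, 3.0])  # gradients derived earlier in the notebook

print("numerical:", approx)
print("analytic: ", analytic)
print("match:", np.allclose(approx, analytic, atol=1e-5))
```

To use it on an exercise, write the exercise function so that it accepts a single array of inputs, then compare the result with the gradients returned by your own backward pass.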

