In [7]:
import numpy as np

# <span style="color:blue">Define the sigmoid activation function</span>

**What is the Sigmoid Function?**

The sigmoid function maps any real-valued number to a value between 0 and 1. Its formula is:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$
Where:

* $x$ is the input (a scalar, vector, or matrix; with NumPy the function is applied element-wise)
* $e$ is Euler’s number (approximately 2.718)
* $\sigma(x)$ is the output, always between 0 and 1

### [Sigmoid Function Graph - Math-Deep-Dives](https://github.com/progressivepull/Math-Deep-Dives/blob/main/sigmoid_function-graph.ipynb)

In [24]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
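
A quick numerical check of the properties described above, as a minimal sketch (the test points in `xs` are arbitrary): the curve passes through 0.5 at $x = 0$, its outputs stay between 0 and 1, and it is symmetric, $\sigma(-x) = 1 - \sigma(x)$.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Arbitrary test points, including fairly large magnitudes
xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
ys = sigmoid(xs)
print(ys)

# The midpoint of the S-curve sits at x = 0
print(sigmoid(0.0))  # 0.5

# Symmetry: sigma(-x) equals 1 - sigma(x), up to floating-point rounding
print(np.allclose(sigmoid(-xs), 1 - ys))  # True
```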

# <span style="color:blue">Define the structure of the neural network</span> 
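* Input layer: X1, X2, X3 (three yes/no features)
* Hidden layers: two fully connected layers, the first with 4 neurons and the second with 3, each with its own weights and biases
* Output layer: a single output node with a sigmoid activation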

| Layer          | Function                         |
|----------------|----------------------------------|
| Input Layer    | Receives raw data                |
| Hidden Layers  | Extract and combine features     |
| Output Layer   | Produces the final prediction/result |
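
The layer sizes above determine the shape of every weight matrix and bias vector. The sketch below (`layer_sizes` is an illustrative name, not something defined in the notebook) derives those shapes and counts the trainable parameters. Note that the cells below store the output-layer weights as a 1-D array of shape (3,) and the output bias as a scalar, which is equivalent to the (3, 1) and (1,) shapes listed here.

```python
# Illustrative layer sizes: 3 inputs, hidden layers of 4 and 3 neurons, 1 output
layer_sizes = [3, 4, 3, 1]

shapes = []
total_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # A fully connected layer has an (n_in, n_out) weight matrix and n_out biases
    shapes.append(((n_in, n_out), (n_out,)))
    total_params += n_in * n_out + n_out

for w_shape, b_shape in shapes:
    print(f"weights {w_shape}, biases {b_shape}")
print("trainable parameters:", total_params)  # 35
```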

# <span style="color:blue">Input Layer</span> 

### **Breakdown:**
* inputs: This is a NumPy array containing three values: [1, 0, 1].
* Each position corresponds to a variable:
    - X1 = Good waves: The first value (1) means yes, the waves are good.
    - X2 = Crowded beaches: The second value (0) means no, the beach is not crowded.
    - X3 = Shark-free zone: The third value (1) means yes, the area is free of sharks.
* The values:
    - 1 = Yes (the condition is true)
    - 0 = No (the condition is false)
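
One way to build this vector from named yes/no conditions, as a sketch; `encode_conditions` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def encode_conditions(good_waves, crowded_beach, shark_free):
    """Hypothetical helper: turn three yes/no conditions into the 0/1 vector [X1, X2, X3]."""
    # True becomes 1 and False becomes 0 when cast to int
    return np.array([good_waves, crowded_beach, shark_free], dtype=int)

inputs = encode_conditions(good_waves=True, crowded_beach=False, shark_free=True)
print(inputs)  # [1 0 1]
```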

In [15]:
inputs = np.array([1, 0, 1]) 

# <span style="color:blue">Weights and Biases</span>   

# Weights for the first hidden layer 
* Shape (3, 4): 3 inputs feed 4 neurons in the first hidden layer.
* Each of the 4 neurons receives input from all three input variables, so there is one weight per (input, neuron) pair.

In [37]:
weights_hidden1 = np.array([
    [0.5, 0.2, 0.1, 0.4],  # Weights for X1
    [0.3, 0.8, 0.5, 0.7],  # Weights for X2
    [0.6, 0.1, 0.3, 0.9]   # Weights for X3
])

* This is a 3x4 matrix (3 rows for inputs, 4 columns for neurons).

### How to Read the Matrix:
* Rows: Each row corresponds to an input variable (X1, X2, X3).
* Columns: Each column corresponds to a neuron in the first hidden layer (Neuron 1, Neuron 2, Neuron 3, Neuron 4).

|        | Neuron 1 | Neuron 2 | Neuron 3 | Neuron 4 |
|--------|----------|----------|----------|----------|
| **X1** |   0.5    |   0.2    |   0.1    |   0.4    |
| **X2** |   0.3    |   0.8    |   0.5    |   0.7    |
| **X3** |   0.6    |   0.1    |   0.3    |   0.9    |

### What does this mean?
* For Neuron 1:      
    - Weight from X1: 0.5
    - Weight from X2: 0.3
    - Weight from X3: 0.6
* For Neuron 2:     
    - Weight from X1: 0.2
    - Weight from X2: 0.8
    - Weight from X3: 0.1
* ... and so on for each neuron.

### In Calculation:
Multiplying the input vector by this weight matrix (and then adding the bias, if present) gives the raw input, before activation, for each neuron in the hidden layer. For example, with the input [1, 0, 1], the calculation for Neuron 1 is:

```
(1 * 0.5) + (0 * 0.3) + (1 * 0.6) = 0.5 + 0 + 0.6 = 1.1
```
Repeat for each neuron using the appropriate column.
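
The column-by-column procedure can be written out explicitly and compared with `np.dot`, which performs all four weighted sums in one call. This is a minimal sketch that repeats the notebook's values; no bias is added yet.

```python
import numpy as np

inputs = np.array([1, 0, 1])
weights_hidden1 = np.array([
    [0.5, 0.2, 0.1, 0.4],  # Weights for X1
    [0.3, 0.8, 0.5, 0.7],  # Weights for X2
    [0.6, 0.1, 0.3, 0.9],  # Weights for X3
])

# One weighted sum per neuron: multiply each input by that neuron's column entry and add up
by_hand = np.array([
    sum(inputs[i] * weights_hidden1[i, j] for i in range(3)) for j in range(4)
])

# np.dot computes all four column sums at once
vectorized = np.dot(inputs, weights_hidden1)

print(by_hand)
print(vectorized)  # approximately [1.1 0.3 0.4 1.3]
```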

### Summary:
***weights_hidden1*** defines how strongly each input affects each neuron in the first hidden layer. Each element in the matrix is a “weight” that will be multiplied with its corresponding input value during the forward pass of the neural network.

# Biases for the first hidden layer

Biases are additional trainable parameters in neural networks, just like weights. Each neuron in the hidden and output layers typically has its own bias; the input layer has none, because it only passes the raw data into the network.

### Why are biases needed?
* **Shift the Activation:** The bias allows the activation function of a neuron to be shifted to the left or right, which makes the network more flexible and able to fit the data better.
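* **Without Bias:** If there were no bias, a neuron's weighted sum would always be 0 whenever all of its inputs are 0, so its decision boundary would be forced through the origin. For a sigmoid neuron, the output for an all-zero input would always be exactly 0.5, limiting the network’s ability to model real-world data.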

### Mathematical Representation
For a single neuron:

```
output = activation(weighted_sum + bias)
```

Where:

- **weighted_sum** = w₁x₁ + w₂x₂ + ... + wₙxₙ
- **bias** is a constant added to the weighted sum
- **activation** is a function (like sigmoid, ReLU, etc.)

### Example
Suppose you have a neuron with 3 inputs and a bias:

* Inputs: [x1, x2, x3]
* Weights: [w1, w2, w3]
* Bias: b

The neuron computes:

```
output = activation(w1*x1 + w2*x2 + w3*x3 + b)
```
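
Plugging in concrete numbers, as a sketch: Neuron 1 of the first hidden layer uses the first column of `weights_hidden1` and the first entry of `bias_hidden1` (0.1, defined in a later cell). The variable names here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([1, 0, 1])        # inputs X1, X2, X3
w = np.array([0.5, 0.3, 0.6])  # Neuron 1's weights (first column of weights_hidden1)
b = 0.1                        # Neuron 1's bias (first entry of bias_hidden1)

weighted_sum = np.dot(w, x)    # 0.5 + 0 + 0.6 = 1.1
neuron1_output = sigmoid(weighted_sum + b)
print(neuron1_output)  # about 0.7685
```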
### Visualization
Think of bias as the intercept in a linear equation (y = mx + b), where b is the intercept (bias). It lets the line move up or down independently of the input. In a sigmoid neuron, the same term shifts the S-curve left or right: the output crosses 0.5 where the weighted sum plus b equals 0.
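
To see the shift, note that a one-input sigmoid neuron outputs exactly 0.5 where $wx + b = 0$, i.e. at $x = -b/w$. The sketch below uses an arbitrary weight and a few arbitrary biases to show the crossing point sliding as $b$ changes.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w = 2.0                      # arbitrary example weight
crossings = []
for b in (-2.0, 2.0, 4.0):   # arbitrary example biases
    midpoint = -b / w        # the x where w * x + b = 0
    value = sigmoid(w * midpoint + b)
    crossings.append(value)
    print(f"b = {b:+.1f}: output is {value} at x = {midpoint:+.1f}")
```

A larger bias moves the midpoint further left, so the neuron "switches on" for smaller inputs.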

### Summary
* Biases are trainable parameters in neural networks.
* They allow neurons to fit the data better by shifting the activation function.
* Every neuron typically has its own bias.


In [6]:
bias_hidden1 = np.array([0.1, 0.2, 0.3, 0.1])  

# Weights for the second hidden layer 
(4 inputs, 3 neurons in the second hidden layer)

In [7]:
weights_hidden2 = np.array([
    [0.5, 0.3, 0.6],
    [0.8, 0.2, 0.9],
    [0.4, 0.7, 0.5],
    [0.6, 0.1, 0.3]
])
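
Consecutive layers only fit together if each layer's output length matches the next weight matrix's row count. A minimal shape check, using zero-filled placeholder arrays (the names `x0`, `w1`, `b1`, and so on are illustrative) with the same shapes as the notebook's parameters:

```python
import numpy as np

# Placeholder arrays with the same shapes as the notebook's parameters
x0 = np.zeros(3)
w1, b1 = np.zeros((3, 4)), np.zeros(4)
w2, b2 = np.zeros((4, 3)), np.zeros(3)
w_out, b_out = np.zeros(3), 0.0

# Each layer's output length must match the next weight matrix's row count
h1 = np.dot(x0, w1) + b1      # (3,) . (3, 4) -> (4,)
h2 = np.dot(h1, w2) + b2      # (4,) . (4, 3) -> (3,)
out = np.dot(h2, w_out) + b_out  # (3,) . (3,) -> scalar
print(h1.shape, h2.shape, np.shape(out))  # (4,) (3,) ()
```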

# Biases for the second hidden layer

In [8]:
bias_hidden2 = np.array([0.2, 0.1, 0.3])  

# Weights for the output layer 
(3 inputs, 1 neuron in the output layer)

The **output layer** is the final layer in a neural network. Its main purpose is to produce the network’s prediction or result, based on the information processed by the previous layers (input and hidden layers).

### Key Points:
* **Final Stage:** It takes the outputs from the last hidden layer, processes them (using weights, biases, and an activation function), and produces the final result.
* **Size:** The number of neurons in the output layer depends on the kind of problem:
    - **Regression:** 1 neuron (for a single predicted value, e.g., house price).
    - **Binary Classification:** 1 neuron (outputting a value between 0 and 1, e.g., probability of “yes” or “no”).
    - **Multi-class Classification:** 1 neuron per class (e.g., for 3 classes, 3 neurons).
* **Activation Function:** The output layer usually uses an activation function suited to the task:
    - **Regression:** Linear (no activation or just return the value).
    - **Binary Classification:** Sigmoid (to return a probability between 0 and 1).
    - **Multi-class Classification:** Softmax (to return probabilities for each class).
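
Softmax is not used in this notebook, which has a single sigmoid output, but for comparison here is a minimal NumPy sketch; the three raw scores are made-up values. Subtracting the maximum score before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import numpy as np

def softmax(z):
    """Convert a vector of raw scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # shift by the max for numerical safety
    return e / e.sum()

# Made-up raw scores for a hypothetical 3-class output layer
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # roughly [0.659 0.242 0.099]
print(probs.sum())  # 1.0, up to floating-point rounding
```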

### Example
Suppose you have a neural network for classifying whether to "Go Surfing" or "Stay Home":

* The output layer might have 1 neuron (for binary decision).
* It takes the values from the last hidden layer, applies weights and bias, then passes the result through a sigmoid function.
* The neuron outputs a value between 0 and 1:        
    - Closer to 1 = “Go Surfing!”
    - Closer to 0 = “Stay Home.”

### Mathematically:

```
output = activation(weighted_sum_from_hidden + bias)
```

### In short:
The output layer is where your neural network “speaks”: it gives you the answer to whatever question you set it up to solve!

In [9]:
weights_output = np.array([0.7, 0.5, 0.8])

# Bias for the output layer

In [10]:
bias_output = -0.5  

# Forward pass through the first hidden layer

In [11]:
hidden1_input = np.dot(inputs, weights_hidden1) + bias_hidden1
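
For 1-D arrays, `inputs @ weights_hidden1` computes the same vector-matrix product as `np.dot`. Redoing the cell above as a sketch makes the pre-activation values visible: the weighted sums [1.1, 0.3, 0.4, 1.3] plus the biases [0.1, 0.2, 0.3, 0.1].

```python
import numpy as np

inputs = np.array([1, 0, 1])
weights_hidden1 = np.array([
    [0.5, 0.2, 0.1, 0.4],
    [0.3, 0.8, 0.5, 0.7],
    [0.6, 0.1, 0.3, 0.9],
])
bias_hidden1 = np.array([0.1, 0.2, 0.3, 0.1])

# Same computation as the cell above, written with the @ (matrix multiplication) operator
hidden1_input = inputs @ weights_hidden1 + bias_hidden1
print(hidden1_input)  # approximately [1.2 0.5 0.7 1.4]
```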

# Apply activation function

### Reasons to Use the Sigmoid Function:

1. Non-linearity

    - The sigmoid introduces non-linearity into the model, allowing the neural network to learn patterns more complex than straight lines.

2. Output Range (0 to 1)

    - The sigmoid function squashes any input value into the range between **0 and 1**.
    - This is especially useful when you want the output to represent a probability **(e.g., the probability of "yes" or "no")**.

3. Probability Interpretation

    - Because the output is between 0 and 1, it can be directly interpreted as the probability of belonging to a certain class (e.g., "Go Surfing" = 1, "Don't Go Surfing" = 0).

4. Smooth Gradient

    - The function is smooth and differentiable, which helps with the optimization process during training (using gradient descent).
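
The smooth gradient has a convenient closed form, $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$. The sketch below (`sigmoid_derivative` is an illustrative name) checks that formula against a central finite-difference approximation and shows that the derivative peaks at 0.25 at $x = 0$.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Closed-form derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1 - s)

xs = np.linspace(-5, 5, 11)

# Central finite-difference approximation of the derivative
h = 1e-6
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)

print(np.allclose(sigmoid_derivative(xs), numeric))  # True
print(sigmoid_derivative(0.0))  # 0.25, the largest value the derivative takes
```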

### Example Usage
Suppose the output of your neuron (before activation) is 2.0:

$$
\text{sigmoid}(2.0) = \frac{1}{1 + e^{-2.0}} \approx 0.88
$$

This means the model predicts an **88% probability** for the positive class. 

### Common Usage Scenarios
* Output layer of binary classification networks
* Logistic regression models
* Sometimes in hidden layers (though ReLU is more popular there now)

### Summary:
The sigmoid activation function is applied to constrain outputs between 0 and 1, making them interpretable as probabilities and enabling neural networks to learn non-linear patterns.


In [12]:
hidden1_output = sigmoid(hidden1_input) 

# Forward pass through the second hidden layer

In [13]:
hidden2_input = np.dot(hidden1_output, weights_hidden2) + bias_hidden2

# Apply activation function

In [14]:
hidden2_output = sigmoid(hidden2_input) 

# Forward pass through the output layer

In [15]:
output_input = np.dot(hidden2_output, weights_output) + bias_output

# Apply activation function

In [16]:
output = sigmoid(output_input)  

# Decision

In [17]:
if output > 0.5:  # Using 0.5 as the threshold
    print("Output: 1 (Decision: We’re going surfing!)")
else:
    print("Output: 0 (Decision: No surfing today.)")

Output: 1 (Decision: We’re going surfing!)


# Display the computed value

In [18]:
print(f"Output value (Y_hat): {output}")

Output value (Y_hat): 0.762182530316983
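
The cells above can be collected into a single function, as a sketch; `forward` is a hypothetical wrapper, and the parameters are copied from the notebook. It reproduces the printed value and applies the same 0.5 threshold.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Parameters copied from the cells above
weights_hidden1 = np.array([[0.5, 0.2, 0.1, 0.4],
                            [0.3, 0.8, 0.5, 0.7],
                            [0.6, 0.1, 0.3, 0.9]])
bias_hidden1 = np.array([0.1, 0.2, 0.3, 0.1])
weights_hidden2 = np.array([[0.5, 0.3, 0.6],
                            [0.8, 0.2, 0.9],
                            [0.4, 0.7, 0.5],
                            [0.6, 0.1, 0.3]])
bias_hidden2 = np.array([0.2, 0.1, 0.3])
weights_output = np.array([0.7, 0.5, 0.8])
bias_output = -0.5

def forward(inputs):
    """Hypothetical wrapper: run one input vector through all three layers."""
    h1 = sigmoid(np.dot(inputs, weights_hidden1) + bias_hidden1)
    h2 = sigmoid(np.dot(h1, weights_hidden2) + bias_hidden2)
    return sigmoid(np.dot(h2, weights_output) + bias_output)

y_hat = forward(np.array([1, 0, 1]))
print(y_hat)  # about 0.7622, matching the value printed above
print("Go surfing!" if y_hat > 0.5 else "Stay home.")
```

Wrapping the pass in a function makes it easy to try other beach conditions, e.g. `forward(np.array([0, 1, 0]))`.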


# [Go Surfing Context](./README.md)