### Neural Networks: What Are They?

Neural networks are computational models inspired by the human brain, consisting of layers of interconnected nodes, or neurons. Each neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function to produce an output.

**Key Components**:  
- Neurons: process inputs to produce outputs, organized in layers.
- Weights: determine the strength of connections between neurons, adjusted during training.
- Biases: allow neurons to shift the activation function, increasing flexibility in modeling data.
- Activation Functions: introduce non-linearity, enabling the network to learn complex relationships.
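
As a minimal sketch of how these components fit together, the snippet below computes the output of a single neuron. The input, weight, and bias values are illustrative assumptions, not values taken from the XOR example that follows.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only (assumed for this sketch)
inputs = np.array([1.0, 0.0])      # two input features
weights = np.array([0.5, -0.5])    # one weight per input
bias = 0.0                         # shifts the activation function

z = inputs @ weights + bias        # weighted sum plus bias
output = sigmoid(z)                # non-linear activation
print(f"weighted sum z = {z:.4f}, neuron output = {output:.4f}")
```

A layer is this same computation repeated for several neurons at once, which is why the cells below use matrix products such as `X @ W1 + b1`.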

In [6]:
import numpy as np

# Define sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# XOR dataset: inputs and outputs
X = np.array([[0, 0],[0, 1],[1, 0],[1, 1]])
y = np.array([[0], [1], [1], [0]])

# Initialize weights and biases
np.random.seed(42)  
W1 = np.random.rand(2, 2)  # Weights: input (2) to hidden (2)
b1 = np.random.rand(2)     # Biases: hidden layer
W2 = np.random.rand(2, 1)  # Weights: hidden (2) to output (1)
b2 = np.random.rand(1)     # Bias: output layer

print("Initial weights and biases:")
print("W1:", W1)
print("b1:", b1)
print("W2:", W2)
print("b2:", b2)

# Forward propagation
Z1 = X @ W1 + b1    # Linear combination for hidden layer
A1 = sigmoid(Z1)    # Activation for hidden layer
Z2 = A1 @ W2 + b2   # Linear combination for output layer
A2 = sigmoid(Z2)    # Activation for output layer
print("Predictions before training:")
print(A2)

Initial weights and biases:
W1: [[0.37454012 0.95071431]
 [0.73199394 0.59865848]]
b1: [0.15601864 0.15599452]
W2: [[0.05808361]
 [0.86617615]]
b2: [0.60111501]
Predictions before training:
[[0.7501134 ]
 [0.7740691 ]
 [0.78391515]
 [0.79889097]]


**Key Note**: Initial predictions are poor because the network hasn't learned the XOR pattern yet: all four outputs sit between 0.75 and 0.80, regardless of the input. Training would adjust the weights and biases to minimize the error between the predictions (`A2`) and the true outputs (`y`).
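
To put a number on "poor", here is a small sketch that measures the error of the untrained network with mean squared error. The predictions are copied by hand from the output printed above; the variable name `A2_initial` is introduced here only for illustration.

```python
import numpy as np

# Predictions copied from the printed output above; targets are the XOR labels
A2_initial = np.array([[0.7501134], [0.7740691], [0.78391515], [0.79889097]])
y = np.array([[0], [1], [1], [0]])

# Mean squared error between predictions and targets
mse = np.mean((A2_initial - y) ** 2)
print(f"Mean squared error before training: {mse:.4f}")

# Thresholding at 0.5 turns each prediction into a class label
labels = (A2_initial > 0.5).astype(int)
accuracy = np.mean(labels == y)
print("Predicted labels:", labels.ravel(), "accuracy:", accuracy)
```

Every prediction is above 0.5, so the untrained network labels all four inputs as 1 and gets only the two true-1 cases right.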

In [7]:
def tanh(x):
    return np.tanh(x)

# Forward propagation
Z1 = X @ W1 + b1            # Weighted sum for hidden layer
A1 = tanh(Z1)               # Activation for hidden layer
Z2 = A1 @ W2 + b2           # Weighted sum for output layer
A2 = sigmoid(Z2)            # Output prediction
print("Predictions with tanh hidden layer activation, sigmoid for the output layer:")
print(A2)

Predictions with tanh hidden layer activation, sigmoid for the output layer:
[[0.67789997]
 [0.767621  ]
 [0.78997617]
 [0.81174609]]


The **XOR problem** is a fundamental example in machine learning. Its two classes are not linearly separable, so no purely linear model can fit all four points, whereas a network with a hidden layer and a non-linear activation can. Its resolution through hidden layers and non-linear activations was a key milestone in the development of deep learning, showing that some problems require models that can learn non-linear representations.
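
The limitation of linear models can be checked directly. The sketch below fits the best possible linear model (with a bias term) to the XOR data using ordinary least squares; this is an illustration chosen for this note, not a step in the notebook's own pipeline.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the linear model can learn a bias term
X_aug = np.hstack([X, np.ones((4, 1))])

# Best linear fit in the least-squares sense
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
linear_pred = X_aug @ coef

print("Coefficients (w1, w2, b):", np.round(coef, 4))
print("Linear predictions:", np.round(linear_pred, 4))
```

The fitted weights come out at essentially zero and the bias at 0.5, so the model predicts 0.5 for every input. A linear model can do no better on XOR than ignore its inputs, which is why a hidden layer with a non-linear activation is needed.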

In [8]:
# Define activation functions
def relu(x):
    return np.maximum(0, x)

def tanh(x):
    return np.tanh(x)

# Forward pass with different activations
Z1 = X @ W1 + b1
A1_sigmoid = sigmoid(Z1)
A1_relu = relu(Z1)
A1_tanh = tanh(Z1)
print("Sigmoid activations:", A1_sigmoid)
print("ReLU activations:", A1_relu)
print("Tanh activations:", A1_tanh)

Sigmoid activations: [[0.53892573 0.53891974]
 [0.70847987 0.68019172]
 [0.62961342 0.75151503]
 [0.77946523 0.84623444]]
ReLU activations: [[0.15601864 0.15599452]
 [0.88801258 0.754653  ]
 [0.53055876 1.10670883]
 [1.2625527  1.70536731]]
Tanh activations: [[0.15476492 0.15474138]
 [0.71041072 0.63791667]
 [0.48580809 0.80289593]
 [0.85176634 0.93607668]]


In [10]:
# Define sigmoid derivative
def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)

# Training loop
learning_rate = 0.1
epochs = 10000

for epoch in range(epochs):
    # Forward pass
    Z1 = X @ W1 + b1
    A1 = sigmoid(Z1)
    Z2 = A1 @ W2 + b2
    A2 = sigmoid(Z2)
    
    # Compute loss (mean squared error)
    loss = np.mean((A2 - y)**2)
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")
    
    # Backpropagation
    dA2 = 2 * (A2 - y) / len(y)  # Derivative of loss
    dZ2 = dA2 * sigmoid_deriv(Z2)
    dW2 = A1.T @ dZ2
    db2 = np.sum(dZ2, axis=0)
    
    dA1 = dZ2 @ W2.T
    dZ1 = dA1 * sigmoid_deriv(Z1)
    dW1 = X.T @ dZ1
    db1 = np.sum(dZ1, axis=0)
    
    # Update weights and biases
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1

# Test after training
predictions = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print("Predictions after training:")
print(predictions)

Epoch 0, Loss: 0.3247
Epoch 1000, Loss: 0.2473
Epoch 2000, Loss: 0.2406
Epoch 3000, Loss: 0.2233
Epoch 4000, Loss: 0.1960
Epoch 5000, Loss: 0.1676
Epoch 6000, Loss: 0.1206
Epoch 7000, Loss: 0.0605
Epoch 8000, Loss: 0.0304
Epoch 9000, Loss: 0.0183
Predictions after training:
[[0.10801367]
 [0.8918913 ]
 [0.89154907]
 [0.12260958]]
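
A common way to gain confidence in hand-written backpropagation is a numerical gradient check. The sketch below re-implements the same forward pass and gradient formulas as the training loop, then compares them against central finite differences. The helper names (`forward_loss`, `analytic_grads`, `numeric_grads`) and the random starting parameters are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward_loss(params, X, y):
    """Mean squared error of the 2-2-1 sigmoid network."""
    W1, b1, W2, b2 = params
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    return np.mean((A2 - y) ** 2)

def analytic_grads(params, X, y):
    """Gradients from the same backpropagation formulas as the training loop."""
    W1, b1, W2, b2 = params
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    dZ2 = 2 * (A2 - y) / len(y) * A2 * (1 - A2)   # sigmoid'(z) = s * (1 - s)
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    return [dW1, db1, dW2, db2]

def numeric_grads(params, X, y, eps=1e-6):
    """Central finite differences, perturbing one parameter at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            loss_plus = forward_loss(params, X, y)
            p[idx] = old - eps
            loss_minus = forward_loss(params, X, y)
            p[idx] = old                          # restore the original value
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)                    # arbitrary seed for the check
params = [rng.random((2, 2)), rng.random(2), rng.random((2, 1)), rng.random(1)]

max_diff = max(
    np.max(np.abs(a - n))
    for a, n in zip(analytic_grads(params, X, y), numeric_grads(params, X, y))
)
print(f"Largest gap between analytic and numeric gradients: {max_diff:.2e}")
```

If the backpropagation formulas are correct, the reported gap should be tiny compared with the gradients themselves. A large gap usually points to a sign error or a missing term in one of the derivative lines.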


In [11]:
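# Xavier-style initialization: scale standard-normal weights by sqrt(1 / fan_in),
# where fan_in = 2 (number of inputs to each layer)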
W1 = np.random.randn(2, 2) * np.sqrt(1/2)
W2 = np.random.randn(2, 1) * np.sqrt(1/2)
print("Xavier-initialized W1:", W1)

Xavier-initialized W1: [[-0.41074287 -0.37135113]
 [-0.40402679 -0.65342524]]
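
For comparison, the form of Xavier (Glorot) normal initialization most often quoted uses a standard deviation of `sqrt(2 / (fan_in + fan_out))`. The sketch below wraps that formula in a hypothetical helper, `xavier_normal`, which is not part of NumPy or of the notebook above. The seed and the zero-initialized biases are likewise assumptions for illustration.

```python
import numpy as np

def xavier_normal(fan_in, fan_out, rng):
    """Hypothetical helper: Glorot/Xavier normal init with
    std = sqrt(2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.standard_normal((fan_in, fan_out)) * std

rng = np.random.default_rng(42)          # arbitrary seed for reproducibility

W1_xavier = xavier_normal(2, 2, rng)     # std = sqrt(2/4), same scale as np.sqrt(1/2)
W2_xavier = xavier_normal(2, 1, rng)     # std = sqrt(2/3)
b1_zero = np.zeros(2)                    # biases are commonly started at zero
b2_zero = np.zeros(1)

print("W1 shape:", W1_xavier.shape, "W2 shape:", W2_xavier.shape)
```

For the 2-by-2 hidden layer, both formulas give the same scale. They differ only for the 2-by-1 output layer, where the Glorot form gives a slightly larger standard deviation.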
