# DAY 1 - Generative Artificial Intelligence
**A simple neural network that shows how the XOR problem can be solved when its parameters are set by hand.**

This imports the NumPy library and gives it the alias np. NumPy provides efficient array and matrix operations, which are essential for neural network computations.

In [1]:
import numpy as np

This defines a function phi(x) that acts as a simple threshold (step) activation function:
* np.greater_equal(x, 1) compares every element of x to 1 and returns:
  * True if the element is greater than or equal to 1
  * False otherwise.
* .astype(int) converts True → 1 and False → 0.

So overall: phi(x) returns 1 if x ≥ 1, and 0 otherwise.

In [2]:
def phi(x):
  return np.greater_equal(x,1).astype(int)
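
As a quick check of the threshold behavior described above, the following minimal sketch applies phi to a few values below, at, and above the threshold. The sample values and the variable names samples and step_out are chosen for illustration and are not part of the original notebook.

```python
import numpy as np

def phi(x):
    # Step activation: 1 where x >= 1, otherwise 0.
    return np.greater_equal(x, 1).astype(int)

# Illustrative inputs (assumed values): below, at, and above the threshold.
samples = np.array([-1.0, 0.5, 1.0, 2.0])
step_out = phi(samples)
print(step_out)  # [0 0 1 1]
```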

This defines a two-layer neural network and computes the forward propagation:
* nn(x, w1, w2) where
  * x is the input vector.
  * w1 is the weight matrix connecting the input layer to the hidden layer.
  * w2 is the weight matrix connecting the hidden layer to the output layer.

* phi(np.dot(x, w1)) takes the dot product of the input x and the weight matrix w1, giving the hidden layer's pre-activation values, and then applies the threshold function phi() to get the hidden layer's activations h1 (0 or 1).

* phi(np.dot(h1, w2)) takes the dot product of the hidden activations h1 and the second weight matrix w2, and applies phi() again to produce the output (0 or 1).

Returns the final output of the network.

In [3]:
def nn(x, w1, w2):
  h1 = phi(np.dot(x, w1))
  y = phi(np.dot(h1, w2))
  return y

These define:
* w1: a 2x3 matrix (2 input neurons → 3 hidden neurons).
* w2: a 3x1 matrix (3 hidden neurons → 1 output neuron).

In [4]:
w1 = np.array([ [1, 0.5, 0], [0, 0.5, 1] ])
w2 = np.array([[1], [-2], [1]])

Test the neural network with different inputs and see that it indeed solves the XOR problem.

In [5]:
print(nn([1,0], w1, w2))
print(nn([0,1], w1, w2))
print(nn([0,0], w1, w2))
print(nn([1,1], w1, w2))

[1]
[1]
[0]
[0]
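
To see why the hand-set weights solve XOR, it can help to look at the hidden activations for each input. The sketch below repeats the computation of nn() but also returns h1; the helper name hidden_and_output and the trace dictionary are hypothetical and only serve this illustration.

```python
import numpy as np

def phi(x):
    # Step activation: 1 where x >= 1, otherwise 0.
    return np.greater_equal(x, 1).astype(int)

w1 = np.array([[1, 0.5, 0], [0, 0.5, 1]])
w2 = np.array([[1], [-2], [1]])

def hidden_and_output(x):
    # Hypothetical helper: same computation as nn(), but h1 is returned too.
    h1 = phi(np.dot(x, w1))
    y = phi(np.dot(h1, w2))
    return h1, y

trace = {}
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    h1, y = hidden_and_output(x)
    trace[tuple(x)] = (h1.tolist(), y.tolist())
    print(x, "hidden:", h1, "output:", y)
```

The middle hidden unit fires only when both inputs are 1, so it behaves like an AND gate. Its output weight of -2 then cancels the two outer units, which is what turns the output off for the input [1, 1].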


**Training a neural network for the XOR problem so that its parameters can be estimated.**

Define an activation function that is differentiable (this is essential for computing gradients during backpropagation). The sigmoid function maps any real number to the range (0, 1).

In [6]:
def phi(x, deriv = False):
    if deriv:
        return phi(x)*(1-phi(x))
    return 1/(1+np.exp(-x))
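
One way to gain confidence in the derivative branch is to compare it with a finite-difference estimate. The sketch below does this for a few pre-activation values; the values in z and the step size eps are assumptions made for this check. Note that phi(x, deriv=True) as defined here expects the pre-activation x, not an already activated value.

```python
import numpy as np

def phi(x, deriv=False):
    if deriv:
        return phi(x) * (1 - phi(x))
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])  # illustrative pre-activation values (assumed)
eps = 1e-6                      # assumed step size for the finite difference

# Central difference approximation of the slope of the sigmoid at z.
numeric = (phi(z + eps) - phi(z - eps)) / (2 * eps)
analytic = phi(z, deriv=True)

max_gap = np.max(np.abs(numeric - analytic))
print(max_gap)
```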

This computes the forward propagation — i.e., how the input flows through the network to produce an output.

In [7]:
def forward_pass(x, w1, w2):
    h1 = phi(x.dot(w1))
    y = phi(h1.dot(w2))
    return h1, y
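
Because forward_pass uses matrix products, it can process a whole batch of inputs at once. The sketch below checks the resulting shapes for four inputs; the seeded generator rng is an assumption added only to make the run repeatable.

```python
import numpy as np

def phi(x):
    return 1 / (1 + np.exp(-x))

def forward_pass(x, w1, w2):
    h1 = phi(x.dot(w1))
    y = phi(h1.dot(w2))
    return h1, y

rng = np.random.default_rng(0)  # assumed seed, for repeatability only
x = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])
w1 = rng.random((2, 3))
w2 = rng.random((3, 1))

h1, y = forward_pass(x, w1, w2)
print(h1.shape, y.shape)  # (4, 3) (4, 1)
print(h1[3])              # hidden activations for the input [0, 0]
```

Since the network has no bias terms, the input [0, 0] always yields hidden activations of exactly 0.5, whatever the weights are.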

This function performs backpropagation — computing gradients of the loss with respect to the weights, and updating the weights to minimize that loss.
* gold is the target.
* lrate is the learning rate (controls the size of each step).

In [8]:
def backward_pass(x, gold, w1, w2, h1, y, lrate):
    # Error at the output layer (difference between prediction and target).
    l2_error = y - gold
    # Scale the output error by the activation derivative to approximate the
    # gradient of the loss w.r.t. the output layer input (pre-activation).
    # Note: phi(y, deriv=True) applies the sigmoid to y a second time; the exact
    # derivative expressed in terms of the activation would be y*(1-y).
    l2_deriv = l2_error*phi(y,deriv=True)
    # Propagate the error backward from the output layer to the hidden layer.
    # This tells us how much each hidden neuron contributed to the final error.
    l1_error = l2_deriv.dot(w2.T)
    # Scale the hidden layer error by the activation derivative (same caveat as
    # above; the exact factor would be h1*(1-h1)).
    l1_deriv = l1_error*phi(h1,deriv=True)
    # Multiply the transposed layer input by the error gradient to get the weight
    # gradient, then subtract the scaled gradient in place (gradient descent step).
    w2 -= lrate*h1.T.dot(l2_deriv)
    w1 -= lrate*x.T.dot(l1_deriv)
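
Backpropagation formulas are easy to get subtly wrong, so a finite-difference gradient check is a common sanity test. The sketch below is not the notebook's backward_pass. It assumes the loss is half the sum of squared errors and uses the sigmoid derivative written in terms of the activation, a*(1-a). The names sigmoid, loss, analytic_grads, and numeric_grad are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(x, gold, w1, w2):
    # Assumed loss: half the sum of squared errors.
    y = sigmoid(sigmoid(x.dot(w1)).dot(w2))
    return 0.5 * np.sum((y - gold) ** 2)

def analytic_grads(x, gold, w1, w2):
    h1 = sigmoid(x.dot(w1))
    y = sigmoid(h1.dot(w2))
    d2 = (y - gold) * y * (1 - y)      # dL/d(output pre-activation)
    d1 = d2.dot(w2.T) * h1 * (1 - h1)  # dL/d(hidden pre-activation)
    return x.T.dot(d1), h1.T.dot(d2)   # gradients for w1 and w2

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences, perturbing one weight at a time.
    g = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        up = f()
        w[idx] = old - eps
        down = f()
        w[idx] = old
        g[idx] = (up - down) / (2 * eps)
    return g

rng = np.random.default_rng(0)  # assumed seed, for repeatability only
x = np.array([[0, 1], [1, 1], [1, 0], [0, 0]], dtype=float)
gold = np.array([[1], [0], [1], [0]], dtype=float)
w1, w2 = rng.random((2, 3)), rng.random((3, 1))

g1, g2 = analytic_grads(x, gold, w1, w2)
n1 = numeric_grad(lambda: loss(x, gold, w1, w2), w1)
n2 = numeric_grad(lambda: loss(x, gold, w1, w2), w2)
grad_gap = max(np.abs(g1 - n1).max(), np.abs(g2 - n2).max())
print(grad_gap)
```

A very small gap indicates that the analytic gradients agree with the numerical slope of the assumed loss.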

Define the training data.

In [9]:
x = np.array([ [0, 1], [1, 1], [1, 0], [0, 0] ])
y_true = np.array([ [1], [0], [1], [0] ])

Randomly initialize the model parameters of the network defined above and train it for 10,000 iterations.

In [10]:
w1 = np.random.random((2,3))
w2 = np.random.random((3,1))

for j in range(10000):
    h1, y = forward_pass(x, w1, w2)
    backward_pass(x, y_true, w1, w2, h1 , y , lrate = 1)

print(w1, '\n')
print(w2, '\n')
print(forward_pass(x, w1, w2)[1], '\n')

[[ 24.47181639   1.58626583 -22.39277136]
 [-22.9019011    1.6651065   23.97162847]] 

[[-18.70623044]
 [ 27.76155203]
 [-19.01096321]] 

[[0.9870592 ]
 [0.01078543]
 [0.98711037]
 [0.00684192]] 
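
The trained network outputs values between 0 and 1 rather than hard labels. As a minimal sketch, the predictions printed above can be turned into class labels with a 0.5 cut-off (an assumed convention, not something the notebook specifies), and their mean squared error against the targets can be computed directly.

```python
import numpy as np

# Predictions copied from the printed output above.
preds = np.array([[0.9870592], [0.01078543], [0.98711037], [0.00684192]])
y_true = np.array([[1], [0], [1], [0]])

labels = (preds >= 0.5).astype(int)    # 0.5 cut-off is an assumption
mse = np.mean((preds - y_true) ** 2)   # mean squared error of the raw outputs
print(labels.ravel(), mse)
```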



**High-level code with a neural network library**

Neural network libraries like [PyTorch](https://pytorch.org/) make it much easier to define and train large-scale networks:

- libraries provide implementations of common building blocks (neural layers, activation functions, loss functions, ...). This saves development effort and increases readability.
- you only need to define the forward pass; the backward pass required to determine gradients is built automatically (autodiff).
- under the hood, building blocks have optimized implementations for different hardware (CPU/GPU) -- GPU support is crucial for large-scale neural networks.

Bring in PyTorch (torch), the neural-network layers/utilities (torch.nn), and optimization algorithms (torch.optim).

In [11]:
import torch
import torch.nn as nn
import torch.optim as optim

Define the model.
- Creates a subclass of nn.Module called Net.
- super(Net, self).\_\_init\_\_() initializes the base nn.Module state.
- self.ff1 = nn.Linear(2, 3, bias=False): first fully connected layer, input size 2 → hidden size 3, no bias. Its weight matrix has shape (3, 2).
- self.ff2 = nn.Linear(3, 1, bias=False): second fully connected layer, hidden size 3 → output size 1, no bias. Its weight matrix has shape (1, 3).

Define the forward pass.
- Passes x through ff1, then applies the sigmoid activation element-wise (outputs in (0,1)).
- Passes the hidden activations through ff2, then another sigmoid to produce a scalar probability-like output per sample.

In [12]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.ff1 = nn.Linear(2, 3, bias=False)
        self.ff2 = nn.Linear(3, 1, bias=False)
    def forward(self, x):
        x = torch.sigmoid(self.ff1(x))
        x = torch.sigmoid(self.ff2(x))
        return x
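
The weight shapes mentioned above can look reversed compared with the NumPy version, because nn.Linear stores its weight as (out_features, in_features) and multiplies by the transpose. The sketch below checks this for a single layer; the seed and the two-row input are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumed seed, for repeatability only
ff1 = nn.Linear(2, 3, bias=False)
x = torch.tensor([[0., 1.], [1., 1.]])

print(ff1.weight.shape)    # torch.Size([3, 2])
manual = x @ ff1.weight.T  # what the layer computes when there is no bias
same = torch.allclose(ff1(x), manual)
print(same)
```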

Define the input and output data used for training.
- x: a batch of 4 input vectors with 2 features each.
- y_true: the corresponding targets (here, this matches the XOR truth table).

In [13]:
x = torch.tensor([ [0, 1], [1, 1], [1, 0], [0, 0] ],
            dtype=torch.float)
y_true = torch.tensor([ [1], [0], [1], [0] ],
            dtype=torch.float)

Instantiate the network and inspect initial output.
- Creates the model with randomly initialized weights.
- Runs a forward pass on the whole batch to see the initial predictions before training.
- Prints those initial predictions (values in (0,1) due to sigmoid).

In [14]:
net = Net()
output = net(x)
print(output)

tensor([[0.4254],
        [0.4325],
        [0.4158],
        [0.4076]], grad_fn=<SigmoidBackward0>)


Loss and optimizer.
- criterion: mean squared error loss between predictions and targets.
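- optimizer: stochastic gradient descent (SGD) over all model parameters with learning rate 1 (quite large, but workable in this tiny example). Because every step below uses the full batch of four samples, each update is an ordinary full-batch gradient descent step.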

In [15]:
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=1)
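
With its default settings, nn.MSELoss averages the squared differences over all elements. The sketch below recomputes that average by hand for the initial predictions printed earlier; the variable names pred, target, by_hand, and loss_val are illustrative.

```python
import torch
import torch.nn as nn

# Initial predictions copied from the printed output earlier in the notebook.
pred = torch.tensor([[0.4254], [0.4325], [0.4158], [0.4076]])
target = torch.tensor([[1.], [0.], [1.], [0.]])

criterion = nn.MSELoss()                  # default reduction is the mean
loss_val = criterion(pred, target)
by_hand = ((pred - target) ** 2).mean()   # same quantity computed directly
print(loss_val.item(), by_hand.item())
```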

Training loop. For each of 5000 iterations:
- optimizer.zero_grad(): clears any accumulated gradients from the previous step (PyTorch accumulates by default).
- output = net(x): runs a forward pass to compute current predictions with the latest weights.
- loss = criterion(output, y_true): computes the MSE loss over the batch.
- loss.backward(): uses autograd to compute gradients of loss w.r.t. all parameters in net.
- optimizer.step(): applies SGD to update the weights using those gradients (one step of gradient descent).

In [16]:
for j in range(5000):
    optimizer.zero_grad()   # zero the gradient buffers
    output = net(x) # forward pass
    loss = criterion(output, y_true) # compute loss
    loss.backward()     # backward pass to get gradients
    optimizer.step()    # update parameters
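
For plain SGD without momentum or weight decay, optimizer.step() amounts to subtracting the learning rate times the gradient from each parameter. The sketch below checks this on a hypothetical single-layer model; the layer, data, and seed are assumptions and differ from the Net class used in the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)                  # assumed seed, for repeatability only
layer = nn.Linear(2, 1, bias=False)   # hypothetical tiny model
x = torch.tensor([[0., 1.], [1., 0.]])
target = torch.tensor([[1.], [1.]])
lr = 1.0

optimizer = optim.SGD(layer.parameters(), lr=lr)
optimizer.zero_grad()
loss = nn.MSELoss()(torch.sigmoid(layer(x)), target)
loss.backward()

# Expected result of one plain SGD step: w - lr * grad
expected = layer.weight.detach().clone() - lr * layer.weight.grad
optimizer.step()
matches = torch.allclose(layer.weight.detach(), expected)
print(matches)
```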

Evaluate after training.
- Runs one more forward pass with the trained weights.
- Prints the final predictions, which (for XOR) should be close to [[1],[0],[1],[0]]; in the run shown below they are about 0.92-0.93 for the positive cases and 0.05-0.07 for the negative ones.

In [17]:
output = net(x)
print(output)

tensor([[0.9313],
        [0.0682],
        [0.9155],
        [0.0503]], grad_fn=<SigmoidBackward0>)
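
The grad_fn in the printed tensor shows that autograd is still tracking the forward pass. For pure evaluation, a common pattern is to wrap the call in torch.no_grad(). The sketch below uses a freshly initialized, untrained copy of the network, so its outputs are not meaningful predictions; the 0.5 cut-off for labels is an assumed convention.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.ff1 = nn.Linear(2, 3, bias=False)
        self.ff2 = nn.Linear(3, 1, bias=False)
    def forward(self, x):
        x = torch.sigmoid(self.ff1(x))
        x = torch.sigmoid(self.ff2(x))
        return x

net = Net()  # untrained here; in the notebook this would be the trained net
x = torch.tensor([[0, 1], [1, 1], [1, 0], [0, 0]], dtype=torch.float)

with torch.no_grad():               # disable gradient tracking for evaluation
    output = net(x)
labels = (output >= 0.5).int()      # assumed 0.5 cut-off
print(output.requires_grad, labels.shape)
```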
