# Neural Net Implementation

- 📺 **Video:** [https://youtu.be/IRZCQO18QAI](https://youtu.be/IRZCQO18QAI)

## Overview
- Translate network diagrams into code by defining layers, activations, and loss functions.
- Reinforce how modular implementations ease experimentation with architecture changes.

## Key ideas
- **Layer abstraction:** encapsulate weight initialization, forward, and backward operations.
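- **Activation functions:** ReLU, tanh, and sigmoid differ in their derivatives, which controls gradient flow: sigmoid and tanh saturate for large |x| and shrink the gradient, while ReLU passes it through unchanged for positive inputs.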
- **Loss coupling:** classification uses cross-entropy, regression uses mean squared error.
- **Testing:** unit tests on tiny inputs confirm gradients and shapes.
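
The layer abstraction and the gradient test can be sketched together. The `Dense` class and `numerical_grad` helper below are hypothetical names chosen for illustration, not code from the lecture; the sketch assumes a squared-error loss so that the analytic gradient from `backward` can be compared against central finite differences on a tiny batch.

```python
import numpy as np


class Dense:
    """Hypothetical fully connected layer: out = x @ W + b."""

    def __init__(self, n_in, n_out, rng):
        # Small random weights break symmetry; biases start at zero.
        self.W = rng.normal(scale=0.5, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        # Store parameter gradients, return the gradient w.r.t. the input.
        self.grad_W = self.x.T @ grad_out
        self.grad_b = grad_out.sum(axis=0)
        return grad_out @ self.W.T


def numerical_grad(f, param, eps=1e-6):
    """Central-difference estimate of d f() / d param, one entry at a time."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        f_plus = f()
        param[idx] = old - eps
        f_minus = f()
        param[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
layer = Dense(3, 2, rng)
x = rng.normal(size=(4, 3))       # tiny batch: 4 examples, 3 features
target = rng.normal(size=(4, 2))


def sq_loss():
    # Assumed loss for the test: 0.5 * sum of squared errors.
    return 0.5 * np.sum((layer.forward(x) - target) ** 2)


# Analytic gradients: d(loss)/d(out) = out - target for the loss above.
out = layer.forward(x)
grad_x = layer.backward(out - target)

# Shape checks first, then compare against finite differences.
assert out.shape == (4, 2) and grad_x.shape == x.shape
max_err_W = np.abs(layer.grad_W - numerical_grad(sq_loss, layer.W)).max()
max_err_b = np.abs(layer.grad_b - numerical_grad(sq_loss, layer.b)).max()
print(f"max |analytic - numeric|: W {max_err_W:.2e}, b {max_err_b:.2e}")
```

Keeping the cached input inside the layer is one possible design; it lets `backward` take only the upstream gradient, which is what makes layers easy to chain and swap.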

## Demo
Build a minimal two-layer neural network in NumPy with explicit forward and backward passes and train it on XOR, mirroring the coding details in the lecture (https://youtu.be/wYD0WVrNa1I).

```python
import numpy as np
from sklearn.metrics import accuracy_score

# XOR inputs and labels; one-hot targets for the softmax cross-entropy loss.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
y_onehot = np.eye(2)[y]

# Parameters: 2 inputs -> 4 hidden units -> 2 output classes.
rng = np.random.default_rng(2)
W1 = rng.normal(scale=0.5, size=(2, 4))
B1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 2))
B2 = np.zeros(2)

lr = 0.5

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

for epoch in range(1, 801):
    # Forward pass: sigmoid hidden layer, then a numerically stable softmax.
    z1 = X @ W1 + B1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + B2
    logits = z2
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(y_onehot * np.log(probs + 1e-8), axis=1))

    # Backward pass: for softmax + mean cross-entropy, dL/dlogits = (probs - y) / N.
    grad_logits = (probs - y_onehot) / len(X)
    grad_W2 = a1.T @ grad_logits
    grad_B2 = grad_logits.sum(axis=0)

    grad_a1 = grad_logits @ W2.T
    grad_z1 = grad_a1 * sigmoid_grad(z1)
    grad_W1 = X.T @ grad_z1
    grad_B1 = grad_z1.sum(axis=0)

    # Plain gradient-descent update.
    W2 -= lr * grad_W2
    B2 -= lr * grad_B2
    W1 -= lr * grad_W1
    B1 -= lr * grad_B1

    if epoch % 200 == 0:
        preds = np.argmax(probs, axis=1)
        acc = accuracy_score(y, preds)
        print(f"epoch {epoch:3d} | loss {loss:.4f} | acc {acc:.3f}")

# Softmax is monotonic, so the argmax of the logits gives the predicted class.
final_logits = sigmoid(X @ W1 + B1) @ W2 + B2
preds = np.argmax(final_logits, axis=1)
print()
print('Final predictions:', preds)
```

```
epoch 200 | loss 0.6914 | acc 0.500
epoch 400 | loss 0.6659 | acc 0.500
epoch 600 | loss 0.2218 | acc 1.000
epoch 800 | loss 0.0530 | acc 1.000

Final predictions: [0 1 1 0]
```
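
The demo above is written as a flat script. The same computation can be packaged as a class with `forward` and `backward` methods, which makes it easier to change the architecture without touching the training loop. The sketch below is one possible arrangement; the class name `TwoLayerNet` and the methods `loss` and `step` are hypothetical, introduced only for illustration, and the initialization and learning rate are copied from the demo.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


class TwoLayerNet:
    """Hypothetical wrapper: sigmoid hidden layer, softmax output."""

    def __init__(self, n_in, n_hidden, n_out, rng, scale=0.5):
        self.W1 = rng.normal(scale=scale, size=(n_in, n_hidden))
        self.B1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=scale, size=(n_hidden, n_out))
        self.B2 = np.zeros(n_out)

    def forward(self, X):
        # Cache the intermediates that backward() needs.
        self.X = X
        self.a1 = sigmoid(X @ self.W1 + self.B1)
        logits = self.a1 @ self.W2 + self.B2
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        self.probs = exp / exp.sum(axis=1, keepdims=True)
        return self.probs

    def loss(self, y_onehot):
        # Mean cross-entropy of the probabilities from the last forward().
        return -np.mean(np.sum(y_onehot * np.log(self.probs + 1e-8), axis=1))

    def backward(self, y_onehot):
        # Gradient of mean cross-entropy w.r.t. the logits is (probs - y) / N.
        grad_logits = (self.probs - y_onehot) / len(self.X)
        grad_z1 = (grad_logits @ self.W2.T) * self.a1 * (1 - self.a1)
        return {
            "W2": self.a1.T @ grad_logits,
            "B2": grad_logits.sum(axis=0),
            "W1": self.X.T @ grad_z1,
            "B1": grad_z1.sum(axis=0),
        }

    def step(self, grads, lr):
        # Plain gradient descent on every parameter.
        for name, g in grads.items():
            setattr(self, name, getattr(self, name) - lr * g)


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
y_onehot = np.eye(2)[y]

net = TwoLayerNet(2, 4, 2, np.random.default_rng(2))
losses = []
for epoch in range(800):
    net.forward(X)
    losses.append(net.loss(y_onehot))
    net.step(net.backward(y_onehot), lr=0.5)

preds = net.forward(X).argmax(axis=1)
print(f"first loss {losses[0]:.4f} | last loss {losses[-1]:.4f} | preds {preds}")
```

Returning the gradients as a dictionary keyed by parameter name is a design choice of this sketch; it keeps `step` generic, so adding a layer only requires new entries rather than new update lines.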


## Try it
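- Modify the demo: swap the sigmoid hidden activation for tanh or ReLU, or change the hidden width from 4, and compare how many epochs the loss needs to drop.
- Add a tiny dataset or counter-example, for instance remove the hidden layer and check whether the model can still fit XOR.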
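
One possible counter-example, sketched under the assumption that the hidden layer is removed entirely: a softmax classifier applied directly to the inputs is linear, and no linear decision boundary separates XOR. Its accuracy cannot exceed 0.75 and its mean cross-entropy cannot fall below ln 2 ≈ 0.693, however long it trains. The seed, step count, and learning rate below are arbitrary choices for illustration.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
y_onehot = np.eye(2)[y]

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(2, 2))  # no hidden layer: inputs map straight to logits
b = np.zeros(2)


def softmax(logits):
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


for _ in range(2000):
    probs = softmax(X @ W + b)
    grad_logits = (probs - y_onehot) / len(X)
    W -= 0.5 * (X.T @ grad_logits)
    b -= 0.5 * grad_logits.sum(axis=0)

# Evaluate the trained linear model.
probs = softmax(X @ W + b)
loss = -np.mean(np.log(probs[np.arange(len(y)), y]))
acc = np.mean(probs.argmax(axis=1) == y)
print(f"linear model | loss {loss:.4f} | acc {acc:.2f}")
```

Comparing this with the demo isolates what the hidden layer contributes: the nonlinearity is what lets the network represent XOR at all.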


## References
- [Eisenstein 4.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Multiclass lecture note](https://www.cs.utexas.edu/~gdurrett/courses/online-course/multiclass.pdf)
- [A large annotated corpus for learning natural language inference](https://www.aclweb.org/anthology/D15-1075/)
- [Authorship Attribution of Micro-Messages](https://www.aclweb.org/anthology/D13-1193/)
- [50 Years of Test (Un)fairness: Lessons for Machine Learning](https://arxiv.org/pdf/1811.10104.pdf)
- [Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G) (article)
- [Neural Networks, Manifolds, and Topology](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/) (blog)
- [Eisenstein Chapter 3.1-3.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Dropout: a simple way to prevent neural networks from overfitting](https://dl.acm.org/doi/10.5555/2627435.2670313)
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
- [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)
- [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://papers.nips.cc/paper/2017/hash/81b3833e2504647f9d794f7d7b9bf341-Abstract.html)


*Links only; we do not redistribute slides or papers.*