# Feedforward Neural Networks, Backpropagation

- 📺 **Video:** [https://youtu.be/8WhPYIWyR5g](https://youtu.be/8WhPYIWyR5g)

## Overview
This lecture covers backpropagation, the algorithm used to train neural networks. It starts by describing the architecture of a feedforward neural network (also known as a multilayer perceptron) in more detail: input layer → one or more hidden layers (with nonlinear activations) → output layer.
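
To make the architecture concrete, here is a minimal sketch of a forward pass through a network with one hidden layer. The layer sizes, the `tanh` activation, and the function name `forward` are illustrative assumptions, not details taken from the video.

```python
import math
import random

def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> tanh hidden layer -> linear output layer."""
    # Hidden layer: affine transform followed by a nonlinear activation.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Output layer: affine transform of the hidden activations.
    y = [sum(w * hj for w, hj in zip(row, h)) + b
         for row, b in zip(W2, b2)]
    return h, y

rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2  # hypothetical layer sizes

# Small random weights and zero biases (an assumed initialization).
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

h, y = forward([1.0, 0.5, -0.5], W1, b1, W2, b2)
print("hidden units:", len(h), "outputs:", len(y))
```

Without the nonlinearity in the hidden layer, the two affine transforms would collapse into a single linear map, which is why the hidden activations matter.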

In [None]:
import os, random
random.seed(0)
CI = os.environ.get('CI') == 'true'

## Key ideas
- The key new challenge is how to adjust the weights in every layer, potentially several layers deep, given only an error measured at the output.
- The video explains backpropagation as the method to compute gradients efficiently for each weight by recursively applying the chain rule of calculus from the output layer back to the input.
- In practical terms, for a simple 2-layer network: first compute the forward pass (calculate the outputs), then measure the error (loss), and then propagate that error backward. Compute the gradient at the output layer first, then the gradient for the hidden layer by working out how much each hidden neuron contributed to the output error.
- While the math can seem heavy, conceptually backpropagation solves the "credit assignment" problem: figuring out how much of the error each weight was responsible for, so the weights can be changed accordingly.
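
The forward pass, loss, and backward pass described above can be sketched for a tiny 2-layer network as follows. This is a minimal illustration under stated assumptions (a `tanh` hidden layer, a single scalar output, squared-error loss, one training example, and hypothetical names such as `backward`); it is not the exact formulation used in the video. A finite-difference check compares one backpropagated gradient against a numerical estimate.

```python
import math
import random

def forward(x, W1, b1, w2, b2):
    # Hidden layer: h_j = tanh(W1[j] . x + b1[j])
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Scalar output: y_hat = w2 . h + b2
    y_hat = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, y_hat

def loss(y_hat, y):
    # Squared-error loss (an assumed choice for this sketch).
    return 0.5 * (y_hat - y) ** 2

def backward(x, y, W1, b1, w2, b2):
    h, y_hat = forward(x, W1, b1, w2, b2)
    d_out = y_hat - y                       # dL/dy_hat at the output layer
    g_w2 = [d_out * hj for hj in h]         # dL/dw2[j]
    g_b2 = d_out                            # dL/db2
    # Chain rule: each hidden unit's share of the error is
    # dL/dy_hat * w2[j] * tanh'(z_j), where tanh'(z_j) = 1 - h_j^2.
    d_hidden = [d_out * w * (1 - hj ** 2) for w, hj in zip(w2, h)]
    g_W1 = [[dj * xi for xi in x] for dj in d_hidden]  # dL/dW1[j][i]
    g_b1 = d_hidden                                    # dL/db1[j]
    return g_W1, g_b1, g_w2, g_b2

rng = random.Random(0)
x, y = [1.0, -2.0], 0.5                     # one made-up training example
W1 = [[rng.uniform(-0.5, 0.5) for _ in x] for _ in range(3)]
b1 = [0.0] * 3
w2 = [rng.uniform(-0.5, 0.5) for _ in range(3)]
b2 = 0.0

# Gradient check: compare backprop with a central finite difference.
g_W1, g_b1, g_w2, g_b2 = backward(x, y, W1, b1, w2, b2)
eps = 1e-6
W1[0][0] += eps
loss_plus = loss(forward(x, W1, b1, w2, b2)[1], y)
W1[0][0] -= 2 * eps
loss_minus = loss(forward(x, W1, b1, w2, b2)[1], y)
W1[0][0] += eps                             # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)
grad_error = abs(numeric - g_W1[0][0])
print("gradient check error:", grad_error)

# A few steps of gradient descent using the backpropagated gradients.
lr = 0.05
loss_before = loss(forward(x, W1, b1, w2, b2)[1], y)
for _ in range(50):
    g_W1, g_b1, g_w2, g_b2 = backward(x, y, W1, b1, w2, b2)
    W1 = [[w - lr * g for w, g in zip(row, g_row)]
          for row, g_row in zip(W1, g_W1)]
    b1 = [b - lr * g for b, g in zip(b1, g_b1)]
    w2 = [w - lr * g for w, g in zip(w2, g_w2)]
    b2 -= lr * g_b2
loss_after = loss(forward(x, W1, b1, w2, b2)[1], y)
print("loss before:", loss_before, "loss after:", loss_after)
```

Note that `d_hidden` reuses `d_out` rather than recomputing anything from scratch. That reuse of already-computed upstream gradients is what makes backpropagation efficient compared with differentiating each weight independently.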

## Demo

In [None]:
print('Try the exercises below and follow the linked materials.')

## Try it
- Modify the demo
- Add a tiny dataset or counter-example


## References
- [Eisenstein 4.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Multiclass lecture note](https://www.cs.utexas.edu/~gdurrett/courses/online-course/multiclass.pdf)
- [A large annotated corpus for learning natural language inference](https://www.aclweb.org/anthology/D15-1075/)
- [Authorship Attribution of Micro-Messages](https://www.aclweb.org/anthology/D13-1193/)
- [50 Years of Test (Un)fairness: Lessons for Machine Learning](https://arxiv.org/pdf/1811.10104.pdf)
- [[Article] Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G)
- [[Blog] Neural Networks, Manifolds, and Topology](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
- [Eisenstein Chapter 3.1-3.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Dropout: a simple way to prevent neural networks from overfitting](https://dl.acm.org/doi/10.5555/2627435.2670313)
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
- [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)
- [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://papers.nips.cc/paper/2017/hash/81b3833e2504647f9d794f7d7b9bf341-Abstract.html)


*Links only; we do not redistribute slides or papers.*