# Neural Network Visualization

- 📺 **Video:** [https://youtu.be/rdohzaGa8aE](https://youtu.be/rdohzaGa8aE)

## Overview
This segment gives an intuitive visualization of how neural networks learn, likely drawing on Chris Olah's blog post on neural networks, manifolds, and topology. The instructor probably uses low-dimensional examples, such as points in a 2D plane colored by class, to show what a neural network is doing internally. For a very simple network (one hidden layer with a few neurons), one can plot directly how the network transforms the input space.
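
The transformation applied by one hidden layer can be sketched in a few lines. The following is a minimal illustration, assuming a hypothetical layer with two tanh units and hand-picked weights `W` and `b` (these values are not taken from the video). It maps a small grid of 2D inputs to hidden-space coordinates, which are the values one would plot to see the warped grid.

```python
import math

# Hypothetical hand-picked parameters for a 2-input, 2-unit hidden layer.
W = [[1.5, -0.5],
     [0.5, 1.0]]
b = [0.0, -0.5]

def hidden_layer(x):
    """Map a 2D input point to hidden space: h = tanh(W x + b)."""
    return [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

# A 5x5 grid over [-2, 2] x [-2, 2] stands in for the input plane.
ticks = [-2.0, -1.0, 0.0, 1.0, 2.0]
grid = [(x1, x2) for x1 in ticks for x2 in ticks]
warped = [hidden_layer(p) for p in grid]

# Print a few input -> hidden pairs; plotting `warped` would show the bent grid.
for p, h in list(zip(grid, warped))[:3]:
    print(p, "->", [round(v, 3) for v in h])
```

Because tanh squashes every coordinate into (-1, 1), straight grid lines in the input plane become curves in hidden space. That bending is the "stretching and squishing" the visualization is meant to convey.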

```python
import os
import random

random.seed(0)
CI = os.environ.get('CI') == 'true'
```

## Key ideas
- The video might show a sequence of images: initially, the data isn't linearly separable; after the first layer's transformation, the data points in the new space are warped closer to being linearly separable; after the second layer, a hyperplane separates them.
- This aligns with the idea from Olah's blog: each layer of a neural network learns a representation of the data in which the classes are easier to separate linearly than in the previous one. For instance, the video might visualize two intertwined spirals (a classic toy problem): a neural network can gradually untangle these spirals layer by layer, as seen by plotting the intermediate activations.
- Such visuals build the intuition that each hidden layer applies a nonlinear transformation, stretching and squishing the space, until the classes reach a form that a simple separator (a line) can split. The video may also note that this can be drawn in 2D, whereas in NLP the feature space is huge (vocabularies of thousands of words) and hidden layers are often high-dimensional, so we cannot view it directly; the principle still holds.
- By the end, students should develop the intuition that neural networks are not black magic: they systematically remold the feature space to make the job easier for the final linear classifier.
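
The claim that a hidden layer can turn a non-separable problem into a separable one can be checked on the smallest possible case. The sketch below uses XOR rather than the spirals mentioned above, and its hidden-layer weights are set by hand for illustration, not learned. It first searches a coarse grid of lines in input space for one that classifies all four points, then checks a single line in the ReLU hidden space.

```python
# XOR: the classic dataset that no single line can separate in input space.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def relu(z):
    return max(0.0, z)

def hidden(x):
    """Hand-set hidden layer: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)."""
    s = x[0] + x[1]
    return (relu(s), relu(s - 1))

def linear_predict(w, bias, x):
    """A plain linear classifier: class 1 if w . x + bias > 0."""
    return int(w[0] * x[0] + w[1] * x[1] + bias > 0)

def separates(w, bias, xs):
    """True if the line (w, bias) classifies every point in xs correctly."""
    return all(linear_predict(w, bias, x) == y for x, y in zip(xs, labels))

# Brute-force a coarse grid of lines in input space: none classify XOR.
candidates = [i / 2 for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
found_in_input_space = any(
    separates((w1, w2), c, points)
    for w1 in candidates for w2 in candidates for c in candidates
)

# In hidden space the points become (0,0), (1,0), (1,0), (2,1),
# and the line h1 - 2*h2 - 0.5 = 0 splits the two classes.
hidden_points = [hidden(x) for x in points]
found_in_hidden_space = separates((1.0, -2.0), -0.5, hidden_points)

print("separable in input space: ", found_in_input_space)   # False
print("separable in hidden space:", found_in_hidden_space)  # True
```

The grid search is only a finite check, but the result holds for any line: the two positive points require `w1 + w2 + 2*bias > 0`, while the two negative points require `bias <= 0` and `w1 + w2 + bias <= 0`, and these cannot all be true at once. After the hidden layer, the two positive points collapse onto the same location and a single line suffices.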

## Demo

```python
print('Try the exercises below and follow the linked materials.')
```
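
As a starting point for experiments, here is a minimal sketch of a network that learns its own hidden representation. Everything in it is an assumption made for illustration: the toy dataset (an inner disc surrounded by a ring), the 2-3-1 architecture, the learning rate, and the step count are arbitrary choices, not the setup from the video. It uses only the standard library and full-batch gradient descent with hand-written backpropagation.

```python
import math
import random

rng = random.Random(0)  # local generator so the sketch is reproducible

def sample(radius_lo, radius_hi, label, n):
    """Draw n points with radius in [radius_lo, radius_hi], tagged with label."""
    out = []
    for _ in range(n):
        r = rng.uniform(radius_lo, radius_hi)
        a = rng.uniform(0.0, 2.0 * math.pi)
        out.append(((r * math.cos(a), r * math.sin(a)), label))
    return out

# Inner disc (class 0) surrounded by a ring (class 1): not linearly separable.
data = sample(0.0, 1.0, 0, 20) + sample(2.0, 3.0, 1, 20)

H = 3  # number of hidden tanh units (arbitrary small choice)
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [rng.uniform(-1, 1) for _ in range(H)]
b2 = [0.0]  # one-element list so train_step can update it in place

def forward(x):
    """Return the hidden activations and the predicted P(class 1)."""
    h = [math.tanh(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(H)]
    z = sum(w2[j] * h[j] for j in range(H)) + b2[0]
    return h, 1.0 / (1.0 + math.exp(-z))

def mean_loss():
    """Mean binary cross-entropy over the dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        _, p = forward(x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train_step(lr):
    """One full-batch gradient-descent step using manual backpropagation."""
    gW1 = [[0.0, 0.0] for _ in range(H)]
    gb1 = [0.0] * H
    gw2 = [0.0] * H
    gb2 = 0.0
    for x, y in data:
        h, p = forward(x)
        dz = (p - y) / len(data)  # gradient of the mean loss w.r.t. the logit
        for j in range(H):
            gw2[j] += dz * h[j]
            dh = dz * w2[j] * (1.0 - h[j] ** 2)  # back through tanh
            gW1[j][0] += dh * x[0]
            gW1[j][1] += dh * x[1]
            gb1[j] += dh
        gb2 += dz
    for j in range(H):
        w2[j] -= lr * gw2[j]
        W1[j][0] -= lr * gW1[j][0]
        W1[j][1] -= lr * gW1[j][1]
        b1[j] -= lr * gb1[j]
    b2[0] -= lr * gb2

loss_before = mean_loss()
for _ in range(500):
    train_step(lr=0.1)
loss_after = mean_loss()

accuracy = sum(int(forward(x)[1] > 0.5) == y for x, y in data) / len(data)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
print(f"training accuracy: {accuracy:.2f}")
```

The output layer is a linear classifier on the hidden activations `h`, so the training accuracy also measures how linearly separable the learned hidden representation is. Collecting `forward(x)[0]` for every point and plotting it before and after training gives a picture in the spirit of the visualizations described above.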

## Try it
- Modify the demo
- Add a tiny dataset or counter-example


## References
- [Eisenstein 4.2](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Multiclass lecture note](https://www.cs.utexas.edu/~gdurrett/courses/online-course/multiclass.pdf)
- [A large annotated corpus for learning natural language inference](https://www.aclweb.org/anthology/D15-1075/)
- [Authorship Attribution of Micro-Messages](https://www.aclweb.org/anthology/D13-1193/)
- [50 Years of Test (Un)fairness: Lessons for Machine Learning](https://arxiv.org/pdf/1811.10104.pdf)
- [Amazon scraps secret AI recruiting tool that showed bias against women (article)](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G)
- [Neural Networks, Manifolds, and Topology (blog)](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
- [Eisenstein Chapter 3.1-3.3](https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
- [Dropout: a simple way to prevent neural networks from overfitting](https://dl.acm.org/doi/10.5555/2627435.2670313)
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
- [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980)
- [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://papers.nips.cc/paper/2017/hash/81b3833e2504647f9d794f7d7b9bf341-Abstract.html)


*Links only; we do not redistribute slides or papers.*