# References | 6. Text and Sequences

---

## RNNs, LSTMs, GRUs

[MIT 6.S191 (2021): Recurrent Neural Networks](https://www.youtube.com/watch?v=qjrad0V0uJE)  
More details in [`2nd-ed.chapter10_dl-for-timeseries_rnn_lstm.3.ipynb`](https://github.com/jchwenger/AI/blob/main/lectures/06.more/2nd-ed.chapter10_dl-for-timeseries_rnn_lstm.3.ipynb)

### Blog posts

[Olah, "Understanding LSTM Networks"](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)  
[Madsen, "Visualizing memorization in RNNs", Distill](https://distill.pub/2019/memorization-in-rnns/)  
[Wikipedia, "Long short-term memory"](https://en.wikipedia.org/wiki/Long_short-term_memory)

### Tutorials

[Text generation with an RNN](https://www.tensorflow.org/text/tutorials/text_generation)  
[TensorFlow Addons Networks: Sequence-to-Sequence NMT with Attention Mechanism](https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt)

### Papers / Courses

An [in-depth survey from CS230, Stanford University](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks)

#### RNNs

[Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (Sept. 1985), "Learning internal representations by error propagation"](https://apps.dtic.mil/dtic/tr/fulltext/u2/a164453.pdf)  
[Jordan, Michael I. (May 1986), "Serial order: a parallel distributed processing approach"](https://www.osti.gov/biblio/6910294)

#### LSTMs

[Sepp Hochreiter, Jürgen Schmidhuber, "Long Short-Term Memory"](https://arxiv.org/abs/2105.06756)

#### GRUs

[Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio, "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches"](https://arxiv.org/abs/1409.1259)  
[Ralf C. Staudemeyer, Eric Rothstein Morris, "Understanding LSTM -- a tutorial into Long Short-Term Memory Recurrent Neural Networks", arXiv](https://arxiv.org/abs/1909.09586)  

---

## Text encoding, Unicode

A good reference: [John Sturtz, "Strings and Character Data in Python", Real Python](https://realpython.com/python-strings/)  
[Characters, Symbols and the Unicode Miracle - Computerphile](https://www.youtube.com/watch?v=MijmeoH9LT4)
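The key distinction in this topic is between text (Unicode code points, held in a Python `str`) and its encoded bytes (e.g. UTF-8). A minimal sketch, using only the built-in `str.encode` and `bytes.decode`, of the classic "mojibake" failure that happens when UTF-8 bytes are decoded with the wrong codec (here Latin-1); the example name is arbitrary:

```python
# A str is a sequence of Unicode code points; bytes are what gets stored or sent.
name = "Jürgen"
print(ord("ü"), hex(ord("ü")))  # 252 0xfc  (code point U+00FC)

# Encoding to UTF-8: 'ü' becomes the two bytes 0xC3 0xBC.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)  # b'J\xc3\xbcrgen'

# Decoding those bytes with the wrong codec maps each byte to its own character.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # JÃ¼rgen

# Undo the damage: re-encode with the wrong codec, then decode with the right one.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # Jürgen
```

The repair only works because Latin-1 maps every byte value to exactly one character, so no information is lost in the wrong decoding step.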

---

## NLP, Word embeddings

See [Huggingface's NLP course](https://huggingface.co/learn/nlp-course/chapter1/1)  
Tutorial on [Word embeddings](https://www.tensorflow.org/text/guide/word_embeddings) (3D embeddings with TensorBoard!)  
[Stanford CS224N: NLP with Deep Learning | Winter 2021 | Lecture 1 - Intro & Word Vectors](https://www.youtube.com/watch?v=rmVRLeJRkl4)

### NLP libraries & tools in Python

#### NLTK: the Natural Language Toolkit

NLTK is a leading platform for building Python programs to work with human language data ([website](https://www.nltk.org/)).

#### Gensim

Perhaps the best dedicated library for word vectors & similar text processing tools ([website](https://radimrehurek.com/gensim/index.html)).

#### spaCy

A free, open-source library for Natural Language Processing in Python ([website](https://spacy.io/)).

---

## Transformers & Attention

[3Blue1Brown, Large Language Models explained briefly | DL5](https://www.youtube.com/watch?v=LPZh9BOjkQs&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&index=5) (and the rest of the playlist)  
[MIT 6.S191, Recurrent Neural Networks, Transformers, and Attention (2025)](https://www.youtube.com/watch?v=GvezxUdLrEk)  
[MIT 6.S191, Recurrent Neural Networks, Transformers, and Attention (2024)](https://www.youtube.com/watch?v=dqoEU9Ac3ek)  
[MIT 6.S191, Recurrent Neural Networks, Transformers, and Attention (2023)](https://www.youtube.com/watch?v=ySEx_Bqxvvo)  
[MIT 6.S191, Recurrent Neural Networks, Transformers, and Attention (2022)](https://www.youtube.com/watch?v=QvkQ1B3FBqA) 

### Tutorials

[Classify text with BERT](https://www.tensorflow.org/text/tutorials/classify_text_with_bert) (fine-tuning a pretrained BERT rather than building the net from scratch)  
[Neural machine translation with a Transformer and Keras](https://www.tensorflow.org/text/tutorials/transformer) (building the net from scratch, on another task)  
More in the notebook [`2nd-ed.chapter11_part04_sequence_to_sequence_learning.ipynb`](https://github.com/jchwenger/AI/blob/main/lectures/06.more/2nd-ed.chapter11_part04_sequence_to_sequence_learning.ipynb)  
[Huggingface's NLP course](https://huggingface.co/learn/nlp-course/chapter1/1)

### References

[Lucas Beyer, "Transformers"](https://docs.google.com/presentation/d/1ZXFIhYczos679r70Yu8vV9uO6B1J0ztzeDxbnBxD1S0/edit)  
[Jay Alammar, "The Illustrated Transformer"](https://jalammar.github.io/illustrated-transformer/)  
[Vaswani et al, "Attention Is All You Need"](https://arxiv.org/abs/1706.03762)  
[Tensor2Tensor Colab](https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb)  
[Peter Bloem, "Transformers From Scratch"](https://peterbloem.nl/blog/transformers) (in PyTorch!)  
[The Annotated Transformer](http://nlp.seas.harvard.edu/annotated-transformer/) (also in PyTorch)  
[BertViz, Visualize Attention in NLP Models](https://github.com/jessevig/bertviz)

[Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 7 - Translation, Seq2Seq, Attention](https://www.youtube.com/watch?v=wzfWHP6SXxY)  
[Stanford CS224N NLP with Deep Learning | 2023 | Lecture 8 - Self-Attention and Transformers](https://www.youtube.com/watch?v=LWMzyfvuehA)  
[Vision Transformers (ViT) Explained + Fine-tuning in Python](https://www.youtube.com/watch?v=qU7wO02urYU)

---

### A bit of history: Sequence to sequence modelling as a precursor to Transformers

Sequence-to-sequence (seq2seq) learning is a generic, powerful framework for many NLP problems, including machine translation: an encoder first reads the source sequence and summarises it, then a decoder uses that information to generate the target sequence.

<!-- <img style="height:700px" src="images/nlp/stanford.seq2seq.png"> -->
<img style="height:700px" src="https://github.com/jchwenger/AI/blob/main/lectures/06/images/nlp/stanford.seq2seq.png?raw=true">

<small>[Chris Manning, CS224n, Stanford](https://web.stanford.edu/class/cs224n/index.html), [lecture 7](https://web.stanford.edu/class/cs224n/slides/cs224n-2022-lecture07-nmt.pdf)  
The original paper: ["Sequence to Sequence Learning with Neural Networks"](https://arxiv.org/abs/1409.3215)</small>
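The encode-then-decode idea can be sketched in plain Python. This is a toy illustration under loud assumptions: a vanilla `tanh` RNN cell stands in for the LSTMs of the original paper, all weights are random and untrained (so the generated ids are meaningless), and the vocabulary size, hidden size and `BOS`/`EOS` ids are hypothetical. What it does show is the bottleneck: the decoder only ever sees the encoder's final hidden state.

```python
import math
import random

random.seed(0)  # reproducible toy weights

VOCAB, HIDDEN = 5, 8  # hypothetical toy sizes
BOS, EOS = 0, 1       # hypothetical begin/end-of-sequence token ids

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

def rnn_step(W_x, W_h, x, h):
    # Vanilla RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1})
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]

# Untrained, hypothetical parameters (a real model learns these).
enc_Wx, enc_Wh = rand_matrix(HIDDEN, VOCAB), rand_matrix(HIDDEN, HIDDEN)
dec_Wx, dec_Wh = rand_matrix(HIDDEN, VOCAB), rand_matrix(HIDDEN, HIDDEN)
W_out = rand_matrix(VOCAB, HIDDEN)  # hidden state -> vocabulary logits

def encode(src_ids):
    """Run the encoder over the source; return only its final hidden state."""
    h = [0.0] * HIDDEN
    for tok in src_ids:
        h = rnn_step(enc_Wx, enc_Wh, one_hot(tok, VOCAB), h)
    return h  # the whole source squeezed into one fixed-size vector

def decode(context, max_len=6):
    """Greedy decoding, starting from the encoder's final state."""
    h, tok, out = context, BOS, []
    for _ in range(max_len):
        h = rnn_step(dec_Wx, dec_Wh, one_hot(tok, VOCAB), h)
        logits = matvec(W_out, h)
        tok = max(range(VOCAB), key=logits.__getitem__)  # most likely next token
        if tok == EOS:
            break
        out.append(tok)
    return out

context = encode([2, 3, 4])
print(len(context), decode(context))
```

Greedy decoding is the simplest choice here; the original paper used beam search.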

#### Sequence to sequence with attention

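However, this setup has to squeeze **everything** about the source sentence into the encoder's last hidden state. A human translator would instead keep looking back at the source sentence while translating; attention gives the decoder the same ability, letting it weigh all encoder states at every decoding step.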

<!-- <img src="images/nlp/seq2seq-nmt-model-fast.gif"> -->
<img src="https://github.com/jchwenger/AI/blob/main/lectures/06/images/nlp/seq2seq-nmt-model-fast.gif?raw=true">

<small>[Google seq2seq documentation](https://google.github.io/seq2seq/)  
And the paper: ["Neural Machine Translation by Jointly Learning to Align and Translate"](https://arxiv.org/abs/1409.0473)</small>
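A minimal sketch of a single attention step: the current decoder state is scored against every encoder state, the scores are turned into weights with a softmax, and the context vector is the weighted average of the encoder states. For brevity it uses a plain dot-product score (Luong-style), a swapped-in simplification of the additive score used by Bahdanau et al.; the vectors are made-up numbers.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(decoder_state, encoder_states):
    """Return (context vector, attention weights) for one decoder step."""
    scores = [dot(decoder_state, h) for h in encoder_states]  # one score per source position
    weights = softmax(scores)                                  # non-negative, sum to 1
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return context, weights

# Toy example: three source positions with 2-d encoder states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [5.0, 0.0]
context, weights = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])  # [0.498, 0.003, 0.498]
```

The decoder state "looks like" source positions 0 and 2, so almost all the weight goes there; the context vector is recomputed at every decoding step instead of being fixed once.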

#### The Transformer: who *needs* RNNs anyway??

<!-- <img src="images/transformer/apply_the_transformer_to_machine_translation.gif"> -->
<img src="https://github.com/jchwenger/AI/blob/main/lectures/06/images/transformer/apply_the_transformer_to_machine_translation.gif?raw=true">

<small>["Neural machine translation with a Transformer and Keras", TensorFlow](https://www.tensorflow.org/text/tutorials/transformer)  
The paper: [Vaswani et al, "Attention Is All You Need"](https://arxiv.org/abs/1706.03762)</small>
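The core operation of the Transformer is scaled dot-product attention, `softmax(Q Kᵀ / sqrt(d_k)) V`, applied to the sequence itself (self-attention). A minimal single-head sketch in plain Python, assuming identity projection matrices for readability (in a real model `W_q`, `W_k`, `W_v` are learned) and including the optional causal mask used in the decoder. There is no recurrence: each output row depends only on `Q`, `K` and `V`, so all rows can be computed in parallel; the loop below is only for clarity.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0 for masked positions
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def scaled_dot_product_attention(Q, K, V, causal=False):
    """softmax(Q K^T / sqrt(d_k)) V, computed one query row at a time."""
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # position i may only attend to positions 0..i
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, V)) for d in range(len(V[0]))])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # 3 tokens, model dimension 2 (toy numbers)
W_q = W_k = W_v = [[1.0, 0.0], [0.0, 1.0]]  # identity projections, hypothetical
Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
print(scaled_dot_product_attention(Q, K, V, causal=True)[0])  # [1.0, 0.0]
```

With the causal mask, the first token can only attend to itself, so its output is its own value vector; without the mask, every token mixes information from the whole sequence.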