# References

In [2]:
from IPython.display import YouTubeVideo

---

## RNNs, LSTMs, GRUs

In [3]:
YouTubeVideo('qjrad0V0uJE', width=853, height=480) #  MIT 6.S191 (2021): Recurrent Neural Networks

More details in [`chapter10_dl-for-timeseries_rnn_lstm.ipynb`](https://github.com/jchwenger/AI/blob/main/6-additional-material/chapter10_dl-for-timeseries_rnn_lstm.ipynb)
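
As a quick companion to the lecture above, here is a minimal sketch of one step of a vanilla RNN, `h_t = tanh(W_x x_t + W_h h_{t-1} + b)`, in plain Python. The function name `rnn_step` and the toy weights are assumptions made for illustration; LSTMs and GRUs add gating on top of this basic recurrence.

```python
import math

def rnn_step(x, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h_new = []
    for i in range(len(b)):
        pre = b[i]
        pre += sum(W_x[i][j] * x[j] for j in range(len(x)))            # input contribution
        pre += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))  # recurrent contribution
        h_new.append(math.tanh(pre))
    return h_new

# Hypothetical toy weights: 2 input features, 2 hidden units.
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.0, 0.2], [-0.4, 0.1]]
b = [0.0, 0.0]

# The same weights are reused at every time step; only the hidden state changes.
h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(x, h, W_x, W_h, b)
print(h)
```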

### Blog posts

[Olah, "Understanding LSTM Networks"](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)  
[Madsen, "Visualizing memorization in RNNs", Distill](https://distill.pub/2019/memorization-in-rnns/)  
[Wikipedia](https://en.wikipedia.org/wiki/Long_short-term_memory)


### Tutorials

[Text generation with an RNN](https://www.tensorflow.org/text/tutorials/text_generation)  
[TensorFlow Addons Networks : Sequence-to-Sequence NMT with Attention Mechanism](https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt)

### Papers / Courses

An [in-depth survey from CS230, Stanford University](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks)

#### RNNs
[Rumelhart, David E., Hinton, Geoffrey E., and Williams, Ronald J. (Sept. 1985), "Learning internal representations by error propagation"](https://apps.dtic.mil/dtic/tr/fulltext/u2/a164453.pdf)  
[Jordan, Michael I. (May 1986), "Serial order: a parallel distributed processing approach"](https://www.osti.gov/biblio/6910294)

#### LSTMs
[Sepp Hochreiter, Jürgen Schmidhuber, "Long Short-Term Memory"](https://arxiv.org/abs/2105.06756)  
[Ralf C. Staudemeyer, Eric Rothstein Morris, "Understanding LSTM -- a tutorial into Long Short-Term Memory Recurrent Neural Networks", arXiv](https://arxiv.org/abs/1909.09586)

#### GRUs
[Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio, "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches"](https://arxiv.org/abs/1409.1259)

---

## Text encoding, Unicode

A good reference: [John Sturtz, "Strings and Character Data in Python", Real Python](https://realpython.com/python-strings/)
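
To make the distinction between characters and bytes concrete, here is a minimal sketch using only built-in Python string methods. The sample string is an arbitrary choice for illustration.

```python
# A str is a sequence of Unicode code points; bytes are what gets stored or transmitted.
text = "h\u00e9llo"  # "héllo", written with the precomposed é (U+00E9)

code_points = [ord(ch) for ch in text]  # one integer per character
utf8_bytes = text.encode("utf-8")       # é needs two bytes in UTF-8, ASCII letters one

print(code_points)
print(len(text), "code points,", len(utf8_bytes), "bytes")

# Decoding with the same codec recovers the original string.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)
```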

In [4]:
YouTubeVideo("MijmeoH9LT4", width=853, height=480) # Characters, Symbols and the Unicode Miracle - Computerphile

---

## NLP, Word embeddings

See [Hugging Face's NLP course](https://huggingface.co/learn/nlp-course/chapter1/1).

Tutorial on [Word embeddings](https://www.tensorflow.org/text/guide/word_embeddings) (3D embeddings with TensorBoard!)
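
Word embeddings are usually compared with cosine similarity. The sketch below shows the computation in plain Python; the three-dimensional vectors are made-up values chosen for illustration, not taken from any trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (real ones typically have hundreds of dimensions).
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(f"cat/dog: {sim_cat_dog:.3f}, cat/car: {sim_cat_car:.3f}")
```

With these toy values, "cat" ends up closer to "dog" than to "car", which is the kind of structure trained embeddings are expected to capture.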

In [5]:
YouTubeVideo('rmVRLeJRkl4', width=853, height=480) #  Stanford CS224N: NLP with Deep Learning | Winter 2021 |
                                                   #  Lecture 1 - Intro & Word Vectors

### NLP libraries & tools in Python

#### NLTK: the Natural Language Toolkit

NLTK is a leading platform for building Python programs to work with human language data ([website](https://www.nltk.org/)).

#### Gensim

A library dedicated to word vectors, topic modelling, and similar text-processing tools ([website](https://radimrehurek.com/gensim/index.html)).

#### spaCy

A free, open-source library for Natural Language Processing in Python ([website](https://spacy.io/)).

---

## Transformers & Attention

In [9]:
YouTubeVideo('dqoEU9Ac3ek', width=853, height=480) #  MIT 6.S191: Recurrent Neural Networks, Transformers, and Attention 


### Tutorials

[Classify text with BERT](https://www.tensorflow.org/text/tutorials/classify_text_with_bert) (what is done below, without building the net)  
[Neural machine translation with a Transformer and Keras](https://www.tensorflow.org/text/tutorials/transformer) (building the net from scratch, on another task)  
More in the notebook [`chapter11_part04_sequence_to_sequence_learning.ipynb`](https://github.com/jchwenger/AI/blob/main/6-additional-material/chapter11_part04_sequence_to_sequence_learning.ipynb)  
[Hugging Face's NLP course](https://huggingface.co/learn/nlp-course/chapter1/1)

### References


[Lucas Beyer, "Transformers"](https://docs.google.com/presentation/d/1ZXFIhYczos679r70Yu8vV9uO6B1J0ztzeDxbnBxD1S0/edit)  
[Jay Alammar, "The Illustrated Transformer"](https://jalammar.github.io/illustrated-transformer/)  
[Vaswani et al, "Attention Is All You Need"](https://arxiv.org/abs/1706.03762)  
[Tensor2Tensor Colab](https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb)  
[Peter Bloem, "Transformers From Scratch"](https://peterbloem.nl/blog/transformers) (in PyTorch!)  
[The Annotated Transformer](http://nlp.seas.harvard.edu/annotated-transformer/) (also in PyTorch)  
[BertViz, Visualize Attention in NLP Models](https://github.com/jessevig/bertviz)
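
The core operation described in "Attention Is All You Need" is scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`. Below is a minimal single-head sketch in plain Python, without masking, batching, or learned projections; the function names and toy matrices are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for lists of row vectors."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy example: one query that matches the first key.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]

out, weights = attention(Q, K, V)
print(weights[0])  # attention weights over the two keys, summing to 1
print(out[0])      # mixture of the two value vectors
```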

In [8]:
YouTubeVideo('wzfWHP6SXxY', width=853, height=480) # Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 7 - Translation, Seq2Seq, Attention

In [6]:
YouTubeVideo('LWMzyfvuehA', width=853, height=480) # Stanford CS224n NLP with Deep Learning | 2023 | Lecture 8 - Self-Attention and Transformers

In [7]:
YouTubeVideo('qU7wO02urYU', width=853, height=480) # Vision Transformers (ViT) Explained + Fine-tuning in Python