GitHub - mbosley/foundations-of-llms

title	author	date	bibliography
Foundations of LLMs Reading list	Mitchell Bosley	2023-04-19	bibliography.bib

Part 1: Foundations of NLP and Neural Networks

Week 1: Introduction to Word Embeddings

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP).
Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems.
Rong, X. (2016). word2vec Parameter Learning Explained. arXiv preprint arXiv:1411.2738.

Week 2: Introduction to Neural Networks

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. (Chapters 1, 6, and 7)
Goldberg, Y. (2015). A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57, 345-420.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.

Week 3: Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Karpathy, A. (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. Andrej Karpathy blog. http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Week 4: Sequence-to-Sequence Models and Attention Mechanism

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems.
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Part 2: Large Language Models

Week 5: Introduction to Transformers

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems.
Al-Rfou, R., Choe, D., Constant, N., Guo, M., & Jones, L. (2018). Character-level language modeling with deeper self-attention. In Proceedings of the AAAI Conference on Artificial Intelligence.

Week 6: Self-Attention and Multi-Head Attention

Lin, Z., Feng, M., Santos, C. N. D., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A structured self-attentive sentence embedding. In Proceedings of the 5th International Conference on Learning Representations (ICLR).
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
Tay, Y., Tuan, L. A., & Hui, S. C. (2018). Multi-head attention with disagreement regularization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Clark, K., Khandelwal, U., Levy, O., & Manning, C. D. (2019). What does BERT look at? An analysis of BERT's attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP.

Week 7: Transformer-based Architectures: BERT, GPT, and T5

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI Blog.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2019). Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.

Week 8: Scaling up Language Models

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Agarwal, S. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Wang, A., Pruksachatkun, Y., Nangia, N., Ross, D., Jiang, J., Htut, P. M., ... & Bowman, S. (2020). Can you tell me how to get past Sesame Street? Sentence-level pretraining beyond language modeling. arXiv preprint arXiv:1812.10860.
Bubeck et al. (2023). Sparks of Artificial General Intelligence: Early experiments with GPT-4

Week 9: Pretraining, Transfer Learning, and Fine-tuning

Howard, J., & Ruder, S. (2018). Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
Ruder, S. (2019). Neural transfer learning for natural language processing. Ph.D. Thesis, National University of Ireland, Galway.
Gururangan, S., Swayamdipta, S., Levy, O., Schwartz, R., Bowman, S., & Smith, N. A. (2018). Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., ... & Stoyanov, V. (2020). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
library.bib		library.bib

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

library.bib

library.bib

Repository files navigation

Part 1: Foundations of NLP and Neural Networks

Week 1: Introduction to Word Embeddings

Week 2: Introduction to Neural Networks

Week 3: Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

Week 4: Sequence-to-Sequence Models and Attention Mechanism

Part 2: Large Language Models

Week 5: Introduction to Transformers

Week 6: Self-Attention and Multi-Head Attention

Week 7: Transformer-based Architectures: BERT, GPT, and T5

Week 8: Scaling up Language Models

Week 9: Pretraining, Transfer Learning, and Fine-tuning

Part 3: Applications

Week 10: Large Language Models in Political Science

Week 11: Ethical Considerations and Bias in Large Language Models

Week 12: Limitations and Future Directions of Large Language Models

About

Releases

Packages

Languages

mbosley/foundations-of-llms

Folders and files

Latest commit

History

README.md

README.md

library.bib

library.bib

Repository files navigation

Part 1: Foundations of NLP and Neural Networks

Week 1: Introduction to Word Embeddings

Week 2: Introduction to Neural Networks

Week 3: Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

Week 4: Sequence-to-Sequence Models and Attention Mechanism

Part 2: Large Language Models

Week 5: Introduction to Transformers

Week 6: Self-Attention and Multi-Head Attention

Week 7: Transformer-based Architectures: BERT, GPT, and T5

Week 8: Scaling up Language Models

Week 9: Pretraining, Transfer Learning, and Fine-tuning

Part 3: Applications

Week 10: Large Language Models in Political Science

Week 11: Ethical Considerations and Bias in Large Language Models

Week 12: Limitations and Future Directions of Large Language Models

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages