Time: Mondays 6:30-8:30pm
Room: 5417
Instructor: Michelle McSweeney
Email: michelleamcsweeney@gmail.com
Course Site: https://github.com/michellejm/LLMs-fall-23
Slack: ask for invite
This syllabus is subject to change based on the goals of the individuals in the class and where the conversation takes us.
Large language models (LLMs) such as ChatGPT and Bard have demonstrated an uncanny ability to interpret and generate text, and with that, the potential to revolutionize industries and reshape society. However, their complexity makes them difficult to understand, often hiding their implicit assumptions. This course introduces students to the development and use of LLMs in natural language processing (NLP), covering fundamental topics in probability, machine learning, and NLP that make LLMs possible. With this technical foundation in view, students will explore the social and ethical implications of LLMs, including privacy, bias, accountability, and their impact on creative production, education, and labor. By the end of the course, students will have a solid understanding of the basic technical foundations and will be able to contribute to conversations on the social and ethical implications of LLMs.
Note: An introductory level familiarity with Python is required.
By the end of this course, you will be able to:
- Build an n-gram model and explain its limitations
- Explain the role of tokenization in NLP
- Explain what a word vector is and understand how it is calculated
- Fine-tune an LLM for a specific task
- Explain the basics of how a neural network works
- Explain what a transformer model is and when to use one
- Contextualize the role of reinforcement learning in modern LLMs
- Hold and justify your thoughts and opinions about:
- Labor, production, creativity, and ownership of products created with LLMs
- The risk to privacy posed by LLMs
- The relative importance and implications of the biased and toxic responses that LLMs produce
- The dangers and limitations related to hallucinations by LLMs
- The role of LLMs in human relationships
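As a preview of the first objective, a bigram model (an n-gram model with n = 2) fits in a few lines of Python. This is a deliberately minimal sketch: a toy corpus, maximum-likelihood counts, and no smoothing.

```python
from collections import Counter

# Toy corpus; a real model would train on far more text.
corpus = "the cat sat on the mat the cat ate".split()

# Count bigrams, and unigrams for every position that starts a bigram.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def bigram_prob(w1, w2):
    """P(w2 | w1) estimated by maximum likelihood."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # "the" is followed by "cat" 2 of 3 times
```

The limitations show up immediately: any bigram absent from the training text gets probability zero, which is one reason real n-gram models need smoothing and far more data.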
There is no required textbook for this course, but there are a few helpful resources, depending on what direction you want to take your final project.
- Technical
- Jurafsky and Martin, 2023. Speech and Language Processing
- Nielsen, 2019. Neural Networks and Deep Learning
- Hugging Face NLP Course
- Social
- Future of Life Institute, 2023. Pause Giant AI Experiments: An Open Letter
- Saravia. An Overview of Troubling Trends in Machine Learning Scholarship
- Tegmark, 2018. Life 3.0: Being Human in the Age of Artificial Intelligence. (lecture on YouTube)
This course takes two approaches to understanding the role of large language models (LLMs): technical and social. The primary goal of this course is to equip you with the technical foundation necessary to have an informed yet critical opinion about the role of large language models in society, and to develop that opinion in conversation with your peers. These two parts are equally important.
Every other week (S), we will discuss one area of social importance. The areas I’ve selected are significant, but not comprehensive (these selected topics may change before the start of the semester, depending on the survey).
Everyone will do the readings and participate in the discussion. The discussion will be led by three students (everyone will lead one time). If it is your group’s week, you will work with your group to do a deep dive into a topic. Please use the Slack channel for your topic to discuss and post any additional readings/watchings/listenings you've found particularly insightful. There are assigned readings for everyone, but when it is “your” week, the expectation is that you will go beyond those readings and consume a variety of opinions about the topic at hand. The group work is to have a conversation before the class session to discuss what you have found, what you have been thinking about, and the biggest questions and issues that are still unanswered. Pay particular attention to the assumptions you make in order to reach any opinions you hold.
After the class discussion, the “lab” is to write a 1-3 paragraph opinion about the topic. This opinion should be posted either on the CUNY Commons, a personal website, or your Github. The purpose of the post is to record your thoughts on the topic, both for your future self and as part of a portfolio of work on LLMs. While we often think of portfolios as showing technical skill, being able to demonstrate that you have an informed opinion on the social implications of LLMs is just as important.
Every other week will cover a technical aspect of LLMs. This is an extremely rapid pace, but it should give you a good, high-level understanding of how these models work. We will not go into the math beyond probability. After each technical lesson, there is a lab. Except for the first one (on prompt engineering), all of the labs are designed to be completed in Python. The Jupyter Notebooks for them will be posted on Github.
It is possible to simply copy-paste the labs, though I don’t recommend this. At each step, be sure you understand why we are completing that step – even if you do not understand what the code itself does. Read the instructions and then manually copy the code into your own notebook. Test it to see what it does. Recommendations for testing code will be posted on Github.
The goal of the lab is to understand how LLMs work and how to use them, not to make you a better programmer, so do not worry about being able to write the code yourself. Likewise, do not worry about understanding the syntax (though if that is your interest, by all means, go for it).
Your code and short written summary of what you did should be posted either on the CUNY Commons, a personal website, or your Github.
Your final project should incorporate both the technical and social aspects of LLMs, though the balance is up to you. The shape of the project will depend on your goals for the course and your own portfolio.
All projects will have a proposal, a proof of concept, and a final presentation.
The proposal should contain:
- A description of the product you will create. If this is a written paper, that’s easy enough. If it is an LLM-powered project, this will have to be more detailed.
- A statement of the question or issue that this project addresses. What are the social implications/questions/challenges that you will explore? Even the most technical project will have social implications.
- Identify what kind of language model you will be working with (e.g., a base model, a fine-tuned model, etc.)
- State how it will be made public (i.e., a website, conference, journal, etc.)
- A timeline identifying each of the steps you have to complete and by when.
We will discuss your proposal at the 1:1 meeting on November 13. These will be conducted via Zoom. In this meeting, we will agree on what is reasonable to create in one month and flag any outstanding issues.
The proof of concept is due November 27. This is simply the most bare bones version of your project. It might not work, it might have missing pieces, but you should have something beginning to resemble the project we agreed on. The purpose of this intermediate step is to be sure you are on the right path and to get feedback on this very rough draft.
The week of November 27 - December 4, you will work in groups of 3 to give peer feedback on your projects. This feedback can be delivered via video, or you can coordinate synchronous feedback sessions.
Final presentations will be on December 11; the format will be decided by majority vote.
Primarily technical projects that leverage LLMs to solve a problem, answer a question, create a product, etc. will likely take the form of a website. Such projects should also have a written component that addresses the potential social or political implications related to the project. The final artifact must be public facing in some way.
Some examples adopted from this Medium post:
- A cover letter generator that takes user input and feeds it into a prompt template, which is then passed to the ChatGPT API, with the response returned to the user. This could be a Github repo with the necessary code, a few examples of what it can produce, and a ReadMe that explores the implications of having ChatGPT write cover letters. If you want to go above and beyond, consider adding a GUI wrapper to make it easy to use.
- A personalized chatbot could be a Github repo with the code you used to fine tune the LLM, the training data, examples of what it can produce, and a ReadMe that explores the implications of creating customized Chatbots.
- A podcast summarizer could be a Github repo with the code you used to prepare the transcripts and to analyze them, a few examples of what it can produce, and a ReadMe that explores the implications of such summaries – including the benefits and what can go wrong.
- Etc.
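To make the first example concrete, here is a minimal sketch of its prompt-template step. The template wording, function name, and input fields are all illustrative, not prescribed; the actual chat-completion API call is omitted because it requires an account and API key.

```python
# Hypothetical prompt template for a cover letter generator.
TEMPLATE = (
    "Write a one-page cover letter for the role of {role} at {company}. "
    "Highlight these experiences: {experience}."
)

def build_prompt(role, company, experience):
    """Fill the template with user input before sending it to an LLM API."""
    return TEMPLATE.format(role=role, company=company, experience=experience)

prompt = build_prompt("data analyst", "Acme Corp", "SQL, Python, dashboarding")
# The resulting string would then be sent to a chat-completion API
# (e.g. OpenAI's); that call is left out of this sketch.
print(prompt)
```

Keeping the template separate from the API call also makes the project easier to document: the ReadMe can show exactly what the model is being asked to do.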
Primarily social projects that explore a topic or question will likely take the form of a research paper (though alternative proposals are welcome!). As part of the exploration, they should engage in a systematic way with an LLM either to demonstrate behavior, validate claims, or otherwise support the arguments being made.
Some examples include:
- Explore how toxicity does or does not have damaging effects on society. Write a paper bringing together current research and thinking about the effects of toxicity and use prompt engineering to systematically validate those claims or work through a series of toxicity prompts in different models. Make this paper public by publishing on a website or submitting to a conference.
- Trace the history of bias in machine learning in a written paper that will either be published to a website or submitted to a conference/journal. Explore various dimensions of bias in one or more base models, or fine tune a model to be more or less biased.
- Explore what authorship and/or copyright mean in a world where LLMs are trained on large swaths of data from the internet. Generate writing in the style of a specific author as an example.
- Etc.
| Component | Weight |
| --- | --- |
| Technical Labs | 20% |
| Social Labs | 20% |
| Leading discussion | 10% |
| Final project, feedback, and presentation | 50% |
All grades are based on completion.
| Week | Date | Topic | Lab (due following Sunday) |
| --- | --- | --- | --- |
| 1 | Aug 28 | Introduction to course, N-grams | Prompt Engineering ChatGPT |
| | Sept 4 | LABOR DAY - no class | |
| 2 (T) | Sept 11 | Tokenization & word vectors | N-grams & Tokenization |
| 3 (S) | Sept 18 | Bias & Toxicity | Bias or Toxicity |
| | Sept 25 | YOM KIPPUR - no class | |
| 4 (T) | Oct 2 | Word Vectors & maybe Neural Networks | Word vectors |
| | Oct 9 | INDIGENOUS PEOPLES DAY - no class | |
| 5 (T) | Oct 10 | Classes follow a Monday schedule, but we will not meet | None |
| 6 (S) | Oct 16 | Privacy, copyright and intellectual property | Privacy or Copyright |
| 7 (T) | Oct 23 | Transformers & Attention | BERT |
| 8 (S) | Oct 30 | Labor & Creative production | Labor or Creative Production |
| 9 (T) | Nov 6 | Fine Tuning | Fine Tuning; Project Idea due 11/12 |
| 10 | Nov 13 | 1:1 Meetings via Zoom | Project Proposal due 11/19 |
| 11 (S) | Nov 20 | Hallucinations & Misinformation | Hallucinations or Misinformation |
| 12 (T) | Nov 27 | Performance Evaluation | Project |
| 13 (S) | Dec 4 | Emergent Behaviors and Performance | Project |
| 14 | Dec 11 | Presentations | Revisions |
| 15 | Dec 18 | Final versions due | |
Please complete the readings listed for each date before the class session. Some topics specify how many to read. For technical topics, the readings all cover the same material at various levels of math and detail. For social topics, there are a lot of voices raising a lot of good points; I’ve only scratched the surface here.
Coding System:
_Technical: Most math; typically the most in-depth and detailed explanation_
_Mid-level: Getting this right is hard; some of these will be more technical than others_
_Intuitive: Where possible, I’ve tried to find an intuitive explanation that has minimal math._
_Canonical: Some articles are transformational, famous, or becoming part of the canon; even if you don’t finish these, you should be aware of them._
August 28
- Fill out the survey
Referenced readings
- Anthropic, 2023. Claude's Constitution
- Carlsmith, 2021. Is Power-Seeking AI an Existential Risk?
- Chomsky, Roberts, and Watumull, 2023. Noam Chomsky: The False Promise of ChatGPT. NYTimes.
- Kocijan, et al., 2023. The Defeat of the Winograd Schema Challenge
- Merchant, 2023. Column: Afraid of AI? The startups selling it want you to be. LATimes
- OpenAI, 2016. Faulty Reward Functions
- OpenAI, 2017. Learning from Human Preferences
- Roose, 2023. Inside the White-Hot Center of A.I. Doomerism. NYTimes.
- Tegmark, 2023. The 'Don't Look Up' Thinking That Could Doom Us With AI. Time.
- Vaswani, et al., 2017. Attention is all you need
September 11 (T) Tokenization and word vectors (and rest of n-grams)
- N-grams _Read one_
- (Technical) Jurafsky & Martin, 2023. N-gram Language Models
- (Mid-level) Nguyen, 2020. N-gram language models Part 1 and Part 2 (Medium)
- Machine Learning TV, 2018. NLP: Understanding the N-gram language models (YouTube)
- (Intuitive) Srinidhi, 2019. Understanding Word N-grams and N-gram Probability in Natural Language Processing (Medium)
- Tokenization _Read one_
- (Technical) Jurafsky & Martin, 2023 (most detailed explanation)
- (Mid-level) Hugging Face, 2023. Tokenizers
- Byte Pair Encoding (Hugging Face)
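If you want to see the core idea behind the byte pair encoding reading, here is a highly simplified sketch of a single BPE merge step. Real tokenizers also track word frequencies, learn many merges, and handle special tokens and byte-level fallback; none of that appears here.

```python
from collections import Counter

# Words start as sequences of single characters; BPE repeatedly merges the
# most frequent adjacent pair into one symbol. One merge step is shown.
words = [list("lower"), list("lowest"), list("low")]

def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))
    # max() breaks ties by first-encountered pair; real BPE uses frequencies.
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])  # fuse the pair into one symbol
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

pair = most_frequent_pair(words)   # ('l', 'o') appears in all three words
words = merge_pair(words, pair)
print(words[0])  # ['lo', 'w', 'e', 'r']
```

Repeating this loop builds up a vocabulary of subword units, which is why LLM tokenizers can handle words they have never seen whole.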
September 18 (S) Bias and Toxicity Read three
- (Canonical) Bender, et al., 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
- (Canonical) Gehman, et al., 2020. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
- Hugging Face, 2022. Evaluating Bias.
- Firth, 2023. Language models might be able to self-correct biases—if you ask them. MIT Tech Review.
- OpenAI, 2019. AI Safety needs social scientists. (full paper is here)
- There’s a lot written on this topic - please explore
October 2 (T) Word Vectors & Neural Networks
- Word Vectors _Read one_
- (Technical) Jurafsky & Martin, 2023. Vector Semantics and Embeddings.
- (Mid-level) Espejel, 2022. Getting Started with Embeddings. (Hugging Face)
- Neural Networks _Watch both of these BEFORE doing the reading_
- (Intuitive & Mid) 3Blue1Brown, 2017. But what is a neural network? | Chapter 1, Deep learning
- (Intuitive & Mid) 3Blue1Brown, 2017. Gradient descent, how neural networks learn | Chapter 2, Deep learning
- _Optional_ (Intuitive & Mid) 3Blue1Brown, 2017. What is backpropagation really doing? | Chapter 3, Deep learning
- Neural Network introduction _Read one_
- (Technical) Jurafsky & Martin, 2023. Neural Networks and Neural Language Models
- (Mid-level) Zhou, 2022. Machine Learning for Beginners: An Introduction to Neural Networks (ignore the coding)
- (Intuitive) Shipyard. A Basic Introduction To Neural Networks (U Wisconsin-Madison)
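Before the readings, it may help to see the standard way similarity between word vectors is measured: cosine similarity. The three-dimensional vectors below are made up for illustration; real embeddings are learned from data and have hundreds of dimensions.

```python
import math

# Toy "word vectors" with invented values; similar words get similar vectors.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: dot product over magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# "cat" should be closer to "dog" than to "car".
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))  # True
```

The same one-line comparison, applied to learned embeddings, is what powers semantic search and the word-analogy demos in the readings.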
October 10 - No class meeting - only lab
October 16 (S) Privacy, copyright and intellectual property Read three
- Rahman & Santacana, 2023. Beyond Fair Use: Legal Risk Evaluation for Training LLMs on Copyrighted Text.
- Potter, 2023. If ChatGPT wrote it, who owns the copyright? It depends on where you live, but in Australia it’s complicated.
- Kim, et al., 2023. ProPILE: Probing Privacy Leakage in Large Language Models
- White House, 2023. Blueprint for an AI Bill of Rights.
- (Canonical) Carlini, 2020. Privacy Considerations in Large Language Models. (Google)
October 23 (T) Transformers and attention
- Transformers Read one
- (Technical) Jurafsky & Martin, 2023. Transformers and Pretrained Language Models.
- (Mid-level) Muller, 2022. BERT 101 🤗 State Of The Art NLP Model Explained (Hugging Face)
- (Intuitive) Google, 2022. Transformers, explained: Understand the model behind GPT, BERT, and T5
- Attention
- (Mid-level) Cristina, 2022. The Attention Mechanism from Scratch (Sections 1 & 2; ignore the coding)
- (Technical, Canonical) Vaswani et al., 2017. Attention is all you need.
- (Summary) Fierro, 2020. Attention is all you need - summary
- (Intuitive) Google, 2023. An overview of the Attention Mechanism.
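The attention readings all center on one formula from Vaswani et al. (2017): softmax(QKᵀ/√d)V. Here it is for a single query over two hand-made key/value vectors; real transformers use learned projection matrices, whole matrices of queries, and many attention heads, none of which appear in this sketch.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is a weighted average of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attention([1.0, 0.0], keys, values)
print(out)  # weighted toward the first value, since the query matches the first key
```

The intuition to carry into the readings: attention is a soft lookup. The query picks out which values matter, by how well it matches each key.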
October 30 (S) Labor & Creative production
- Labor Read two
- Eloundou, et al., 2023 (OpenAI). GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
- The Economist, May 7, 2023. Your job is (probably) safe from artificial intelligence. (Note that you don't need a subscription, just an account, to read this.)
- Lohr, April 10, 2023. A.I. Is Coming for Lawyers, Again. NYTimes.
- Klinova & Korinek, Aug 17, 2023. Brookings. Unleashing possibilities, ignoring risks: Why we need tools to manage AI’s impact on jobs
- O'Reilly & Zahidi, Sept 2023. World Economic Forum. Jobs of Tomorrow: Large Language Models and Jobs
November 6 (T) Fine Tuning (and RAG)
- Retrieval Augmented Generation (RAG) _Read both_
- (All) Martineau, IBM, 2023. Retrieval Augmented Generation (RAG)
- (All) Hotz, 2023. RAG vs Finetuning — Which Is the Best Tool to Boost Your LLM Application?
- Fine Tuning _Read one_
- (Technical) Jurafsky & Martin, 2023. Fine Tuning and Masked Language Models
- (Mid-level) Hugging Face, 2023. NLP Course: Fine Tuning a Masked Language Model
- (Intuitive) Talebi, Sept 2023. Fine Tuning Large Language Models
- Fine Tuning in research _Further reading_
- Gira, et al., 2022. Debiasing Pre-Trained Language Models via Efficient Fine-Tuning
- Merchant, et al., 2020. What Happens To BERT Embeddings During Fine-tuning?
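One piece of masked-language-model fine tuning that is easy to show without a GPU is the data-preparation step: randomly masking tokens and keeping the originals as labels. This is only a sketch; libraries such as Hugging Face's handle this via data collators, and the 15% rate follows the convention popularized by BERT.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Replace ~mask_prob of tokens with [MASK], keeping originals as labels."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)      # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)     # no loss is computed at this position
    return masked, labels

masked, labels = mask_tokens("the cat sat on the mat".split())
print(masked)  # ['[MASK]', 'cat', 'sat', 'on', 'the', 'mat']
```

During fine tuning, the model sees the masked sequence and is penalized only where a label exists, which is what pushes it toward the vocabulary and style of the new training data.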
November 20 (S) Hallucinations and Misinformation
- Overviews
- Tam, 2023. A Gentle Introduction to Hallucinations in Large Language Models
- GDELT Project, 2023. Understanding Hallucination In LLMs: A Brief Introduction
- Chen LLM Misinformation Group
- Goldstein, et al, 2023. Forecasting potential misuses of language models for disinformation campaigns—and how to reduce risk
- Commentary Read at least one
- Hsu & Thompson, Feb 2023, NYTimes. [Disinformation Researchers Raise Alarms About A.I. Chatbots](https://www.nytimes.com/2023/02/08/technology/ai-chatbots-disinformation.html)
- Metz, Apr 2023, NYTimes. What Makes A.I. Chatbots Go Wrong?
- Research Read at least one
- Chen & Shu, 2023. Can LLM-Generated Misinformation Be Detected?
- Pan, et al., 2023. On the Risk of Misinformation Pollution with Large Language Models
- Zhou, et al., 2023. Synthetic Lies: Understanding AI-Generated Misinformation and Evaluating Algorithmic and Human Solutions
November 27 (T) Performance Evaluation
- (Intuitive) Microsoft, 2023. How to Evaluate LLMs: A Complete Metric Framework.
- (Intuitive) Riccio, 2023. Everything You Should Know About Evaluating Large Language Models, Medium. A very thorough overview of the different ways LLMs are evaluated on performance and benchmarking.
- (Intuitive) Dhinakaran, 2023. The Guide To LLM Evals: How To Build and Benchmark Your Evals, Medium. An extremely practical approach to LLM evaluation.
- (Intuitive) HELM: Liang, et al., 2023. Language Models are Changing AI: The Need for Holistic Evaluation. This is the blog post version; there is also a 162-page technical paper, Holistic Evaluation of Language Models. The paper is too much to read this semester; it is listed here so you know it exists.
- (Technical) BLEU: Papineni, et al., 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. Here is the associated Hugging Face model card.
- (Other) 🤗 Open LLM Leaderboard. This is a popular ranking of LLM performance across multiple tasks.
- Research
- Guo, et al., 2023. Evaluating Large Language Models: A Comprehensive Survey. This is a 111-page document; do not try to read the whole thing this semester. It is listed here so you know it exists.
- Luccioni, et al., 2022. Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model
- Chang, et al., 2023. A Survey on Evaluation of Large Language Models
- Google, 2020. The Beyond the Imitation Game Benchmark (BIG-bench)
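As a small taste of the BLEU reading, here is modified unigram precision, its basic building block. Full BLEU also combines higher-order n-gram precisions and a brevity penalty; this sketch covers only the clipped-counts idea from Papineni et al. (2002).

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Fraction of candidate words found in the reference, with counts clipped
    so that repeating a word cannot inflate the score."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / sum(cand.values())

print(unigram_precision("the cat sat", "the cat sat on the mat"))  # 1.0
print(unigram_precision("the the the", "the cat sat on the mat"))  # ~0.67 after clipping
```

The clipping is the key trick: without it, a degenerate output like "the the the" would score a perfect 1.0 against any reference containing "the".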
December 4 (S) Emergent behavior and performance
- Roose, 2023. Bing’s A.I. Chat: ‘I Want to Be Alive. 😈’
- Raieli, 2023. Emergent Abilities in AI: Are We Chasing a Myth?, Medium. This is an opinion piece, but also a very sober introduction to what "emergent abilities" implies
- (Intuitive) Bashir, 2023. In-Context Learning, In Context. The Gradient.
- (Technical) Xie and Min, 2022. How does in-context learning work? A framework for understanding the differences from traditional supervised learning. Stanford AI Lab blog.
- Research
- Hahn and Goyal, 2023. A Theory of Emergent In-Context Learning as Implicit Structure Induction
- Schaeffer, et al., 2023. Are Emergent Abilities of Large Language Models a Mirage?
- Ganguli, et al., 2022. Predictability and Surprise in Large Generative Models