# CS 195: Natural Language Processing
## Syllabus, Spring 2026




## Instructor

Eric Manley

Chat: [Microsoft Teams](https://teams.microsoft.com/l/chat/0/0?users=eric.manley@drake.edu)

Email: eric.manley@drake.edu

Office: Collier-Scripps 327

### Office Hours ###

Schedule a time in [Calendly](https://calendly.com/eric-manley/office-hours) at least a day in advance, or drop in during:
* MW 9:30am-12:00pm

## Course Description

In this course, you will investigate modern algorithms and tools used in natural language applications like autocorrect, sentiment analysis, translation, and chat bots, including the technology behind large language models like ChatGPT, Claude, Gemini, and Grok. Topics include building applications that use small and large language models, Retrieval-Augmented Generation, classical NLP (like Markov models), tokenization, embeddings, neural network and transformer architectures for language, fine-tuning pretrained models, and reinforcement learning from human feedback.

Prereq: CS 167 (Machine Learning)

## Learning Objectives

At the end of this course, students should be able to
* Implement classic foundational algorithms for NLP tasks
* Create applications which incorporate pre-trained models for NLP tasks like text classification, translation, summarization, question answering, and conversation
* Create applications that use Retrieval-Augmented Generation
* Use modern libraries to build, train, and fine-tune custom neural networks or transformers for a given task
* Describe neural network architectures, including transformers, that perform well on NLP tasks and explain how various recent neural network innovations have impacted the field

## Course Structure

The course will be broken into seven parts, each representing about two weeks' (a fortnight's) worth of class time. Class time will run in a workshop-style format that intersperses **mini presentations** and individual/pair/group **work on projects**. Each fortnight will end with **demo day** (~30 minutes at the beginning or end of class) where students show off their work to their groups, and possibly to the whole class. The tentative demo schedule is as follows:
* Fortnight 1 Demos: Monday, February 9th
* Fortnight 2 Demos: Monday, February 23rd
* Fortnight 3 Demos: Monday, March 9th
* Fortnight 4 Demos: Monday, March 30th
* Fortnight 5 Demos: Monday, April 13th
* Fortnight 6 Demos: Monday, April 27th
* Fortnight 7 Demos: TBA - either
    * Wednesday, May 6th (last day of class) 
    * or Tuesday, May 12th, 9:30 - 11:20 am (final exam time)

### Attendance and Class Delivery Mode

This class will meet in person, and your attendance is extremely important for your learning and for the benefit of the whole class. You are expected to attend unless you have a good reason not to, and any accommodations for remote participation should likewise be used only with good reason.



### Changes to Delivery Mode

If the university or instructor determines it is necessary to switch to an online-only delivery mode, either temporarily (e.g., due to illness of the instructor) or permanently (e.g., a prolonged COVID outbreak on campus), arrangements will be communicated via an announcement on the Blackboard page and/or email.

#### Jury Duty

I have been called for jury duty during the *month of February*, which means I may have to go in for a day sometime during the month. This is very difficult to plan for, so for any day(s) that I am gone, I will probably ask you to come to class and work in groups on an assigned activity.

### Recording

I will attempt to record workshop presentations (though not work time) and post them to the Blackboard Panopto tool. These are provided as a courtesy for your reference, but they will not be a substitute for the in-class experience.

### What to do if you need to work remotely

If you need to miss an in-person class because of illness or other reasons, you are expected to use the resources on the course website and Blackboard to stay caught up and continue your work.

If you need to miss a *demo day*, you will need to **record your demo** and make it available to the instructor and members of your group.

If, for some good reason, you require a **prolonged absence**, please communicate with the instructor to make arrangements.

## Tentative Topics Road Map

The following is an example road map for our adventure. However, I hope to remain flexible enough to adjust topics and timelines to the interests of the class, and I do not expect the topics to line up neatly with each Fortnight.

**Fortnight 1: Exploring pre-trained NLP models**
* HuggingFace models and libraries
* Text classification
* Models for translation, summarization, and question answering

**Fortnight 2: Working with LLMs**
* Conversational models
* Evaluating text-to-text models
* Building applications with the OpenAI API
* Retrieval-Augmented Generation (RAG)

**Fortnight 3:**
* Markov Models and text generation
* Hidden Markov Models and parts-of-speech tagging
* Tokenization algorithms (BPE and/or WordPiece) and libraries

**Fortnight 4:**
* Machine Learning with Text Data
* Neural Networks and Neural Language Models
* Embeddings (Bag of Words, TF-IDF, Word2Vec, GloVe)

**Fortnight 5:**
* Recurrent Neural Networks (RNN)
* LSTM and GRU neural network architectures
* Encoders and Decoders

**Fortnight 6:**
* Transfer Learning
* Attention
* Transformers

**Fortnight 7:**
* Reinforcement Learning from Human Feedback
* Popular transformer architectures

    


## Learning Materials

Code, slides, and data will be posted here: https://github.com/ericmanley/S25-CS195NLP

Blackboard will be used for recordings, turning in portfolios, and other purposes as needed.

### Textbooks

We will use several different open-access books and learning resources. Our most common sources include:

* [Hugging Face LLM Course](https://huggingface.co/learn/llm-course/chapter1/1)
* [Daniel Jurafsky and James H. Martin. 2026. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd edition.](https://web.stanford.edu/~jurafsky/slp3)

## Accountability, Portfolios, and Grading

Each student is responsible for creating a portfolio of their work in this course, which should be shared with the instructor and updated throughout the semester. The portfolio should document all work that the student wants to claim credit for.

### Sharing Options

* GitHub: Make a repo and put all of your notebooks there
* Google Drive: Create a shared folder with all of your notebooks

### Documentation

* Create a README file (or Jupyter Notebook with Markdown in it) where you link/refer to all the other items in the repo/folders
* For each item you want credit for, include the following in the README file:
    * short description of the item
    * the names of the group members you demo'd it for
    * the number of xp you earned
    * links to any additional notebooks or other artifacts involved



### Portfolio Check-in Dates

Cumulative portfolios should be turned in for feedback by the following dates:
* Friday, February 27th
* Friday, April 10th

Final portfolios are due Tuesday, May 12th.

### Experience Points

You will accumulate Experience Points (xp) as you complete this course. On each demo day, you can earn xp for three different categories of items. Remember to also document them in your portfolio:

1. **Core Practice:** 5xp per Fortnight. This is earned for participating in **all** class/group activities for a fortnight. This includes engaging in discussions, running all sample code given in workshop presentations, and writing appropriate notes in the provided Notebooks. You do *not* need to demo this to earn credit unless it is the only thing you are getting xp for in that Fortnight. You *do* need to include all files in your portfolio and mention that you completed them for the core practice credit.


2. **Applied Exploration:** 5xp per Fortnight. For each Fortnight, you can get this xp for completing additional exploratory activities based on, but going beyond, what is completed in class. Usually, you will have a choice about which applied exploration exercise(s) to complete for any given Fortnight, but you only get credit for *one* per Fortnight.
    - Instructions for how to earn this credit will be given with most workshop presentations
    - Typically represent at least a few hours of work outside of class for each Fortnight
    - You may start many Applied Explorations in class during a Fortnight, though you only need to complete one for the credit


3. **Creative Synthesis:** Unlimited xp available. This is earned for creative projects which involve substantially original code or writing. Defining the project itself is part of the work, but here are some guidelines. In addition to demoing these for your group, you should get the instructor's thumbs up to be sure you earned the xp (I may ask for revisions/additions before signing off on it). You may also negotiate with the instructor for other ideas you have.
    * **Small project prototype:** 5xp. Create a prototype for a software application which uses ideas from the course (no set standard for "size", but I'm thinking about 100 lines of original personally-written code or 5-10 hours of work)
    * **Large project prototype:** 20xp. This is an application with more code and more polish than a small project. I would expect this to take at least 20 hours of work, and it can span multiple fortnights.
    * **Extended implementation:** 5xp. Take the ideas from a workshop presentation, *use additional resources*, and take them farther: implement new variations of an algorithm, use additional options/parameters, try things with different libraries, etc.
    * **Present project to the class:** 1xp. After your group demo, groups may nominate someone to present to the entire class. If you are the one who presents, you get an extra point of xp.
    * **Write a tutorial:** 5xp. Write up a tutorial with original writing (citing any sources) explaining concepts or showing how to use tools/modules/libraries related to the course. Here are some examples of articles in this genre (but for non-NLP topics) - you can find many more on sites like these:
        - https://www.kaggle.com/code/prashant111/svm-classifier-tutorial
        - https://medium.com/analytics-vidhya/image-classification-with-tf-keras-introductory-tutorial-7e0ebb73d044
        - https://pyimagesearch.com/2021/11/08/u-net-training-image-segmentation-models-in-pytorch/
    * **Run a workshop:** 10xp - *subject to instructor approval and must be arranged in advance*. Pick a new topic that we haven't scheduled or haven't covered yet, and run a workshop for that day. It should include a mini-presentation and some work for students to do. 
    * **Team work:** 2xp. Earn these *additional* points on one of the above creative synthesis items that you worked on in collaboration with one or more other students. All team members should do enough individual work to meet the requirements on their own, and you must document individual work in your portfolio, but you will be rewarded for coordinating it in a teamwork setting.
    * **Topic integration:** 2xp. Earn these *additional* points on one of the above creative synthesis items that includes substantial combination of two or more different course topics (or a course topic and another topic outside the scope of the class - an NLP topic we didn't cover, something from another course, etc.), integrated into one cohesive deliverable.

### Grade Assignment

The number of xp needed (and documented in your portfolio) to reach a given course grade is as follows:

| Minimum xp | Grade |
| --- | :-- |
| 30 | D |
| 40 | C- |
| 50 | C |
| 60 | C+ |
| 70 | B- |
| 80 | B |
| 85 | B+ |
| 90 | A- |
| 95 | A |


A total of less than 30xp will result in an F.

Note that this means there are a variety of ways to achieve any given grade. Here are some examples:
* If you only do the *Core Practice* for every fortnight and nothing else, you'll pass the class with a D.
* If you do only the *Core Practice* and *Applied Exploration* for every fortnight, you'll get a B-.
* If you do the *Core Practice* and *Applied Exploration* for every fortnight *and* one extended implementation and one large project, you'll get an A. 
* If you do all the *Core Practice*, no *Applied Exploration*, write three tutorials, do three individual small projects, and two small group projects with topic integration, you will get a B.
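To make the arithmetic behind these example plans explicit, here is a minimal Python sketch. It is a hypothetical helper for illustration, not an official course tool, and every name in it (`grade_for`, `plans`, the constants) is made up for this example. It encodes the thresholds from the table and the xp values listed under Experience Points, and it assumes seven fortnights and that each small group project with topic integration earns both the 2xp team work bonus and the 2xp topic integration bonus.

```python
# Hypothetical helper (not an official course tool): minimum xp per grade,
# copied from the Grade Assignment table and checked from highest to lowest.
GRADE_THRESHOLDS = [
    (95, "A"),
    (90, "A-"),
    (85, "B+"),
    (80, "B"),
    (70, "B-"),
    (60, "C+"),
    (50, "C"),
    (40, "C-"),
    (30, "D"),
]

FORTNIGHTS = 7  # assumption: seven fortnights, per Course Structure

# xp values from the Experience Points section.
CORE_PRACTICE = 5
APPLIED_EXPLORATION = 5
SMALL_PROJECT = 5
LARGE_PROJECT = 20
EXTENDED_IMPLEMENTATION = 5
TUTORIAL = 5
TEAM_WORK_BONUS = 2
TOPIC_INTEGRATION_BONUS = 2


def grade_for(xp: int) -> str:
    """Return the letter grade for a documented xp total."""
    for minimum, grade in GRADE_THRESHOLDS:
        if xp >= minimum:
            return grade
    return "F"  # fewer than 30xp


# The four example plans, expressed as xp totals.
plans = {
    "core practice only": FORTNIGHTS * CORE_PRACTICE,
    "core + applied": FORTNIGHTS * (CORE_PRACTICE + APPLIED_EXPLORATION),
    "core + applied + extended + large project": (
        FORTNIGHTS * (CORE_PRACTICE + APPLIED_EXPLORATION)
        + EXTENDED_IMPLEMENTATION
        + LARGE_PROJECT
    ),
    "core + tutorials + small projects": (
        FORTNIGHTS * CORE_PRACTICE
        + 3 * TUTORIAL
        + 3 * SMALL_PROJECT
        # assumption: each group project earns both bonuses on top of its 5xp
        + 2 * (SMALL_PROJECT + TEAM_WORK_BONUS + TOPIC_INTEGRATION_BONUS)
    ),
}

for name, xp in plans.items():
    print(f"{name}: {xp}xp -> {grade_for(xp)}")
```

Running it prints 35xp (D), 70xp (B-), 95xp (A), and 83xp (B) for the four plans, matching the outcomes described in the list.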

**If you are unsure of what to do:** Do the *core practice*, an *applied exploration*, and a *small project prototype* for each fortnight.



### Accommodations for Students with Disabilities ###

I will be happy to discuss any academic accommodations needed by students with disabilities. However, any student seeking accommodations must coordinate them with the [Access and Success office](https://www.drake.edu/disabilityservices/) before the accommodations are needed. No retroactive accommodations will be made.

### Academic Integrity ###
Drake University has high standards for academic integrity, and you are expected to read the [Academic Dishonesty Policy](https://www.drake.edu/artsci/studentresources/policiesandregulations/#dishonesty) from the College of Arts and Sciences. 

Below is a particularly relevant excerpt from the policy:

> Academic dishonesty is an all encompassing term involving any activity that 
> seeks to gain credit for work one has not done or to deliberately damage or 
> destroy the work of others. Academic dishonesty includes, but is not limited 
> to, plagiarism, cheating, fabrication, and knowingly helping another to 
> commit an act of academic dishonesty.

*Bottom line:* don't pass off the work of others (including AI-generated content) as your own, and be open and transparent about all use of sources.
