# CS 195: Natural Language Processing
## Syllabus, Fall 2023

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ericmanley/f23-CS195NLP/blob/main/F0_0_Syllabus.ipynb)


## Instructor

Eric Manley

Chat: [Microsoft Teams](https://teams.microsoft.com/l/chat/0/0?users=eric.manley@drake.edu)

Email: eric.manley@drake.edu

Office: Collier-Scripps 327

### Office Hours ###

Schedule in [Starfish](https://drake.starfishsolutions.com/starfish-ops/dl/instructor/serviceCatalog.html?bookmark=connection/8352/schedule) the day before, or drop in during these times:
* M 10:00am-1:00pm
* T 10:00am-12:00pm

## Course Description

In this course, you will investigate modern algorithms and tools used in natural language applications like autocorrect, sentiment analysis, translation, and chatbots – including the technology behind large language models like ChatGPT and Google Bard. Topics include variations of neural networks for language tasks, hidden Markov models, and transformers.

Prereq: CS 167 (Machine Learning) or instructor permission

## Learning Objectives

At the end of this course, students should be able to
* Implement classic foundational algorithms for NLP tasks, like Hidden Markov Models for part-of-speech tagging and the Lesk Algorithm for word sense disambiguation
* Create applications which incorporate pre-trained models for NLP tasks like text classification, translation, summarization, and question answering
* Use modern libraries to build and train custom neural networks for NLP tasks
* Describe neural network architectures, including transformers, that perform well on NLP tasks and explain how various recent neural network innovations have impacted the field
* Fine-tune pre-trained NLP models with new data for a specific task

## Course Structure

The course will be broken into seven parts, each representing about two weeks (a fortnight) of class time. Class time will run in a workshop-style format which intersperses **mini presentations** and individual/pair/group **work on projects**. Each fortnight will end with a **demo day** (~30 minutes at the beginning or end of class) where students show off their work to their groups, and possibly to the whole class. The tentative demo schedule is as follows:
* Fortnight 1 Demos: Tuesday, September 12th
* Fortnight 2 Demos: Tuesday, September 26th
* Fortnight 3 Demos: Tuesday, October 10th
* Fortnight 4 Demos: Tuesday, October 24th
* Fortnight 5 Demos: Tuesday, November 7th
* Fortnight 6 Demos: Tuesday, November 21st
* Fortnight 7 Demos: Thursday, December 7th

### Attendance and Class Delivery Mode

This class will meet in person, and your attendance is extremely important for your learning and for the benefit of the whole class. You are expected to attend unless you have a good reason. Any accommodations for remote participation should only be used with good reason.



### Changes to Delivery Mode

If the university or instructor determines it is necessary to switch to an online-only delivery mode, either temporarily (e.g., due to illness of the instructor) or permanently (e.g., a prolonged COVID outbreak on campus), arrangements will be communicated via an announcement on the Blackboard page and/or email.

### Recording

I will attempt to record workshop presentations (though not work time) and post them to the Blackboard Panopto tool. These are provided as a courtesy for your reference, but they will not be a substitute for the in-class experience.

### What to do if you need to work remotely

If you need to miss an in-person class because of illness or other reasons, you are expected to use the resources on the course website and Blackboard to stay caught up and continue your work.

If you need to miss a *demo day*, you will need to **record your demo** and make it available to the instructor and members of your group.

If, for some good reason, you require a **prolonged absence**, please communicate with the instructor to make arrangements.

## Tentative Topics Road Map

The following is an example road map for our adventure. However, I hope to remain agile enough to follow the students in terms of topics and timelines, and I do not actually expect that the topics will line up neatly with each Fortnight.

**Fortnight 1: The good stuff. Exploring pre-trained NLP models.**
* HuggingFace models and libraries
* Text classification
* Translation
* Summarization
* Question answering and chatbots

**Fortnight 2: Classic algorithms**
* Working with NLTK, working with corpora, and text processing
* Markov Model text generation
* Hidden Markov Models
* Lesk Algorithm

**Fortnight 3: Syntax, Parsing, and Linguistic Structures**
* Syntax and Grammatical Structures
* Constituency Parsing vs. Dependency Parsing
* Context-Free Grammars (CFG) and Parsing Algorithms (like CYK)
* Role of syntax in NLP (e.g., improving translation models, named entity recognition, etc.)
    
**Fortnight 4: Neural Networks for NLP**
* Building and training neural nets in Keras (maybe TensorFlow or PyTorch too)
* Recurrent neural networks and LSTM

**Fortnight 5: Word Embeddings**
* Bag of Words
* TF-IDF
* Word2Vec
* GloVe

**Fortnight 6: Transformers**
* Attention and transformer overview
* Popular transformer-based architectures

**Fortnight 7: Fine-Tuning Pretrained Models**
    


## Learning Materials

Code, slides, and data will be posted here: https://github.com/ericmanley/F23-CS195NLP

Blackboard will be used for recordings, turning in portfolios, and other things as needed.

### Textbook

We will not be following any one book very closely. I will share references for various topics as we go. However, the library has made the following textbook available to us for free as a reference:

[Natural Language Understanding with Python : Combine Natural Language Technology, Deep Learning, and Large Language Models to Create Human-Like Comprehension](https://ebookcentral.proquest.com/lib/drake/reader.action?docID=30609769) by Deborah A. Dahl

## Accountability, Portfolios, and Grading

Each student is responsible for creating a portfolio of their work in this course, which should be shared with the instructor and updated throughout the semester. The portfolio should document all work that the student wants to claim credit for. It should be in the form of a Jupyter Notebook which includes a short description of each item, the names of the group members you demoed it for, the number of xp earned, and a link to any additional notebooks or other artifacts involved. Example notebooks will be provided to show the expected format, but feel free to be creative in the presentation.

### Experience Points

You will accumulate Experience Points (xp) as you complete this course. On each demo day, you can earn xp for three different categories of items:

1. **Core Practice:** 5xp per Fortnight. This is earned for running all sample code given in workshop presentations and keeping your own notes on it (in Markdown cells in the Jupyter Notebooks). You do not need to demo this to earn credit unless it is the only thing you are getting xp for in that Fortnight.
2. **Applied Exploration:** 5xp per Fortnight. For each Fortnight, you can get this xp for completing exercises that apply the sample code in new ways. Usually, this will mean doing the same kinds of things in the sample code but with different models or data and writing a short evaluation of what you found in a Markdown cell. Instructions for how to earn this credit will be given with most workshop presentations, but it should typically represent at least a few hours of work outside of class for each Fortnight. You will typically start each of the Applied Explorations in class, but for the credit, you will need to complete at least one of them in detail with all of the requested notes, answers to questions, etc.
3. **Creative Synthesis:** Unlimited xp available. This is earned for creative projects which involve substantially original code or writing. Defining the project itself is part of the work, but here are some guidelines. In addition to demoing these for your group, you should get the instructor's thumbs up to be sure you earned the xp (I may ask for revisions/additions before signing off on it). You may also negotiate with the instructor for other ideas you have.
    * **Small project prototype:** 5xp. Create a prototype for a software application which uses ideas from the course (no set standard for "size", but I'm thinking about 100 lines of code or 5-10 hours of work)
    * **Large project prototype:** 20xp. This is an application with more code and more polish than a small project. I would expect this to take at least 20 hours of work, and it can span multiple fortnights.
    * **Extended implementation:** 5xp. Take the ideas from a workshop presentation, *use additional resources*, and take them further: implement new variations of an algorithm, use additional options/parameters, try things with different libraries, etc.
    * **Write a tutorial:** 5xp. Write up a tutorial with original writing (citing any sources) explaining concepts or showing how to use tools/modules/libraries related to the course. Here are some examples of articles in this genre (but for non-NLP topics) - you can find many more on sites like these:
        - https://www.kaggle.com/code/prashant111/svm-classifier-tutorial
        - https://medium.com/analytics-vidhya/image-classification-with-tf-keras-introductory-tutorial-7e0ebb73d044
        - https://pyimagesearch.com/2021/11/08/u-net-training-image-segmentation-models-in-pytorch/
    * **Run a workshop:** 10xp - *subject to instructor approval and must be arranged in advance*. Pick a new topic that we haven't scheduled or haven't covered yet, and run a workshop for that day. It should include a mini-presentation and some work for students to do. 
    * **Team work:** 2xp. Earn these *additional* points on one of the above creative synthesis items that you worked on in collaboration with one or more other students. All team members should do enough individual work to meet the requirements on their own, and you must document individual work in your portfolio, but you will be rewarded for coordinating it in a teamwork setting.
    * **Topic integration:** 2xp. Earn these *additional* points on one of the above creative synthesis items that includes substantial combination of two or more different course topics (or a course topic and another topic outside the scope of the class - an NLP topic we didn't cover, something from another course, etc.), integrated into one cohesive deliverable.

### Grade Assignment

The number of xp needed (and documented in your portfolio) to reach a given course grade is as follows:

| xp | Grade |
| --- | :-- |
| 30 | D |
| 40 | C- |
| 50 | C |
| 60 | C+ |
| 70 | B- |
| 80 | B |
| 85 | B+ |
| 90 | A- |
| 95 | A |


A total of less than 30xp will result in an F.

Note that this means there are a variety of ways to achieve any given grade. Here are some examples:
* If you only do the *Core Practice* for every fortnight and nothing else, you'll pass the class. 
* If you do only the *Core Practice* and *Applied Exploration* for every fortnight, you'll get a B-.
* If you do the *Core Practice* and *Applied Exploration* for every fortnight *and* one extended implementation and one large project, you'll get an A. 
* If you do all the *Core Practice*, write three tutorials, do three individual small projects, and two small group projects with topic integration, you will get a B.
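
The example totals above can be checked with a short script. This is only an illustrative sketch, not an official grade calculator: the names `GRADE_THRESHOLDS`, `FORTNIGHTS`, and `letter_grade` are made up for this example, and the thresholds are copied from the table above.

```python
# Illustrative sketch only; names here are hypothetical, not part of any course tool.
# Thresholds copied from the grade table: (minimum xp, letter grade), highest first.
GRADE_THRESHOLDS = [
    (95, "A"), (90, "A-"), (85, "B+"), (80, "B"), (70, "B-"),
    (60, "C+"), (50, "C"), (40, "C-"), (30, "D"),
]

FORTNIGHTS = 7  # the course is divided into seven fortnights


def letter_grade(xp):
    """Return the letter grade for a documented xp total."""
    for minimum, grade in GRADE_THRESHOLDS:
        if xp >= minimum:
            return grade
    return "F"  # fewer than 30xp


# The four example scenarios from the list above.
core_only = 5 * FORTNIGHTS                     # Core Practice every fortnight
core_and_applied = (5 + 5) * FORTNIGHTS        # plus Applied Exploration
with_projects = core_and_applied + 5 + 20      # plus extended implementation and large project
mixed = (
    5 * FORTNIGHTS      # all Core Practice
    + 3 * 5             # three tutorials
    + 3 * 5             # three individual small projects
    + 2 * (5 + 2 + 2)   # two small group projects: team work + topic integration bonuses
)

for label, xp in [
    ("Core only", core_only),
    ("Core + Applied", core_and_applied),
    ("Core + Applied + projects", with_projects),
    ("Mixed", mixed),
]:
    print(f"{label}: {xp}xp -> {letter_grade(xp)}")
```

The four scenarios total 35, 70, 95, and 83xp, which map to D, B-, A, and B respectively, matching the examples listed above.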

**If you are unsure of what to do:** Do the *core practice*, an *applied exploration*, and a *small project prototype* for each fortnight.



### Accommodations for Students with Disabilities ###

I will be happy to discuss any academic accommodations needed for students with disabilities. However, any student seeking accommodations must coordinate them with the [Access and Success office](https://www.drake.edu/disabilityservices/) before the accommodations are needed. No retroactive accommodations will be made.

### Academic Integrity ###
Drake University has high standards for academic integrity, and you are expected to read the [Academic Dishonesty Policy](https://www.drake.edu/artsci/studentresources/policiesandregulations/#dishonesty) from the College of Arts and Sciences. 

Below is a particularly relevant excerpt from the statement:

> Academic dishonesty is an all encompassing term involving any activity that 
> seeks to gain credit for work one has not done or to deliberately damage or 
> destroy the work of others. Academic dishonesty includes, but is not limited 
> to, plagiarism, cheating, fabrication, and knowingly helping another to 
> commit an act of academic dishonesty.

*Bottom line:* don't pass off the work of others (including AI-generated content) as your own, and be open and transparent about all use of sources.
