In [5]:
%%html
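<script>
  // Toggle visibility of all code input cells in the rendered notebook
  var code_shown = false;

  function code_toggle() {
    if (code_shown) {
      // jQuery expects the animation duration as a number of milliseconds
      $('div.input').hide(500);
      $('#toggleButton').val('Show Code');
    } else {
      $('div.input').show(500);
      $('#toggleButton').val('Hide Code');
    }
    code_shown = !code_shown;
  }

  // Hide the code cells once the page has loaded
  $(document).ready(function () {
    code_shown = false;
    $('div.input').hide();
  });
</script>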
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>

# NDAK18000U Overview
Course content is available at [github.com/coastalcph/nlp-course](https://github.com/coastalcph/nlp-course).
Click on slides for *Course Logistics*.

### NDAK18000U Details

- **Course Organizers**: [Daniel Hershcovich](https://danielhers.github.io/) and [Anders Søgaard](https://anderssoegaard.github.io/)
- **Teachers**: Daniel Hershcovich, [Desmond Elliott](https://elliottd.github.io/), and Anders Søgaard
- **Teaching Assistants**:
    - Constanza Catalina Fierro Mella
    - Emma Cathrine Liisborg Leschly
    - Rasmus Pallisgaard
    - Thomas Brun Lau Christensen
    - Zixuan Xu

### NDAK18000U Schedule

- Lectures:  
    -  Tuesdays, 13-15 in Aud 05, Universitetsparken 5 (HCØ), Weeks 36-41 + 43-44
    
- Lab Sessions:
    - Group 1: Mondays, 10-12 in the old library (4-0-17), Universitetsparken 1, Weeks 37-41 + 43-44
    - Group 2: Fridays, 10-12 in the old library (4-0-17), Universitetsparken 1, Weeks 36-41 + 43-44
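The week numbers above are calendar weeks. Assuming they follow ISO 8601 week numbering (which agrees with the dates in the weekly table further down), the session dates can be worked out with Python's standard library. The helper `dates_for_weekday` is purely illustrative and not part of the course material:

```python
from datetime import date

# Teaching weeks from the schedule above: 36-41, then 43-44 (week 42 is not listed)
TEACHING_WEEKS = [*range(36, 42), 43, 44]

def dates_for_weekday(year: int, weeks: list[int], iso_weekday: int) -> list[date]:
    """Date of the given ISO weekday (1 = Monday, ..., 7 = Sunday) in each ISO week."""
    return [date.fromisocalendar(year, week, iso_weekday) for week in weeks]

lectures = dates_for_weekday(2023, TEACHING_WEEKS, 2)  # Tuesdays
group_1 = dates_for_weekday(2023, [w for w in TEACHING_WEEKS if w >= 37], 1)  # Mondays, from week 37
group_2 = dates_for_weekday(2023, TEACHING_WEEKS, 5)  # Fridays

print(lectures[0], lectures[-1])  # 2023-09-05 2023-10-31
print(group_1[0], group_2[0])     # 2023-09-11 2023-09-08
```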

We will assign you to one of two lab session groups based on your answers to the [Getting to Know You survey](https://absalon.ku.dk/courses/68562/quizzes/87334).
If you have not filled it in yet, do it as soon as possible.
You will receive an announcement about your assignment before the first lab session.

In [21]:
import re
from IPython.display import Markdown, display

with open('README.md', 'r', encoding='utf-8') as f:
    content = f.read()

# Regular expression matching <a href='...'> or <a href="..."> tags whose target doesn't start with 'http'
pattern = re.compile(r'<a href=([\'"])(?!http)([^\'"]+)\1>')

# Function to prepend '../' to each link
def replacer(match):
    quote = match.group(1)
    link = match.group(2)
    new_link = f"../{link}"
    return f'<a href={quote}{new_link}{quote}>'

# Replace links in content to be relative to the top directory of the repo
display(Markdown(pattern.sub(replacer, content)))

# Natural Language Processing (NDAK18000U)
## Course at the University of Copenhagen

Materials from this interactive book are used throughout the Natural Language Processing course at the Department of Computer Science, University of Copenhagen. The official course description can be found [here](https://kurser.ku.dk/course/ndak18000u/2023-2024). Materials covered each week are listed below. The course schedule and materials are tentative and subject to minor changes. Most reading material is from [Speech and Language Processing by Jurafsky & Martin](https://web.stanford.edu/~jurafsky/slp3).

<table><tr><th>Week</th><th>Reading (before lecture)</th><th>Lecture (Tuesday)</th><th>Lab (Friday &amp; Monday)</th><th>Lab notebook</th></tr>
     <tr><td>36</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/2.pdf'>Chapter 2 up to end of 2.4</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/4.pdf'>Chapter 4 up to end of 4.4</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/5.pdf'>Chapter 5 up to end of 5.2</a><br>
      </td><td>5. Sep. 2023:<br>
      Course Logistics (<a href='../chapters/course_logistics.ipynb'>slides</a>)<br>
      Introduction to NLP (<a href='../chapters/intro_short.ipynb'>slides</a>)<br>
      Tokenisation &amp; Sentence Splitting (<a href='../chapters/tokenization.ipynb'>notes</a>, <a href='../chapters/tokenization_slides.ipynb'>slides</a>, <a href='../exercises/tokenization.ipynb'>exercises</a>)<br>
      Text Classification (<a href='../chapters/doc_classify_slides_short.ipynb'>slides</a>)<br>
      </td><td>8. &amp; 11. Sep. 2023:<br>
      Jupyter notebook setup, introduction to <a href='https://colab.research.google.com/'>Colab</a><br>
      Introduction to <a href='https://pytorch.org/tutorials/'>PyTorch</a><br>
      Project group arrangements<br>
      Questions about the course project<br>
      </td><td><a href='../labs/notebooks_2023/lab_1.ipynb'>lab 1</a></td></tr>
     <tr><td>37</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/3.pdf'>Chapter 3 up to end of 3.5</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/6.pdf'>Chapter 6 up to end of 6.4</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/7.pdf'>Chapter 7 up to end of 7.4</a><br>
      </td><td>12. Sep. 2023:<br>
      Language Modelling (<a href='../chapters/language_models_slides.ipynb'>slides</a>)<br>
      Word Embeddings (<a href='../chapters/dl-representations_simple.ipynb'>slides</a>)<br>
      </td><td>15. &amp; 18. Sep. 2023:<br>
      Word representations and sentiment classification<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2023/lab_2.ipynb'>lab 2</a></td></tr>
     <tr><td>38</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/7.pdf'>Chapter 7 up to end of 7.5</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/9.pdf'>Chapter 9 up to end of 9.2</a><br>
      </td><td>19. Sep. 2023:<br>
      Recurrent Neural Networks (<a href='../chapters/rnn_slides_ucph.ipynb'>slides</a>)<br>
      Neural Language Models (<a href='../chapters/dl-representations_contextual.ipynb'>slides</a>)<br>
      </td><td>22. &amp; 25. Sep. 2023:<br>
      Error analysis and explainability<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2023/lab_3.ipynb'>lab 3</a></td></tr>
    <tr><td>39</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/8.pdf'>Chapter 8 up to end of 8.3</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/18.pdf'>Chapter 18 up to end of 18.2</a><br>
      </td><td>26. Sep. 2023:<br>
      Sequence Labelling (<a href='../chapters/sequence_labeling_slides.ipynb'>slides</a>, <a href='../chapters/sequence_labeling.ipynb'>notes</a>)<br>
      Parsing (<a href='../chapters/dependency_parsing_slides_active.ipynb'>slides</a>)<br>
      </td><td>29. Sep. &amp; 2. Oct. 2023:<br>
      Sequence labelling and beam search<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2023/lab_4.ipynb'>lab 4</a></td></tr>
     <tr><td>40</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/9.pdf'>Chapter 9, Section 9.8</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/10.pdf'>Chapter 10</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/11.pdf'>Chapter 11</a><br>
      </td><td>3. Oct. 2023:<br>
      Attention (<a href='../chapters/attention_slides2.ipynb'>slides</a>)<br>
      Transformers (<a href='../chapters/dl-representations_contextual_transformers.ipynb'>slides</a>)<br>
      </td><td>6. &amp; 9. Oct. 2023:<br>
      Language Models with <a href='https://huggingface.co/course/chapter1'>Transformers</a> and RNNs<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2023/lab_5.ipynb'>lab 5</a></td></tr>
     <tr><td>41</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/14.pdf'>Chapter 14</a><br>
      </td><td>10. Oct. 2023:<br>
      Information Extraction (<a href='../chapters/information_extraction_slides.ipynb'>slides</a>)<br>
      Question Answering (<a href='../chapters/question_answering_slides.ipynb'>slides</a>)<br>
      </td><td>13. &amp; 23. Oct. 2023:<br>
      In-depth look at Transformers and Multilingual QA<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2023/lab_6.ipynb'>lab 6</a></td></tr>
    <tr><td>43</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/13.pdf'>Chapter 13</a><br>
      <a href='https://shanzhenren.github.io/csci-699-replnlp-2019fall/lectures/W6-L3-Cross_Lingual_Transfer.pdf'>Wang, 2019</a><br>
      </td><td>24. Oct. 2023:<br>
      Machine Translation (<a href='../chapters/nmt_slides_active.ipynb'>slides</a>)<br>
      Transfer Learning (<a href='../chapters/xling_transfer_learning_slides.ipynb'>slides</a>)<br>
      </td><td>27. &amp; 30. Oct. 2023: Project help.</td><td></td></tr>
    <tr><td>44</td><td>
      <a href='https://aclanthology.org/Q19-1004.pdf'>Belinkov and Glass, 2019</a>
      </td><td>31. Oct. 2023:<br>
      Interpretability (<a href='../chapters/interpretability_slides.ipynb'>slides</a>)<br>
      </td><td>3. Nov. 2023: Project help.</td><td></td></tr></table>

The easiest way to view the course content is via the static [nbviewer](https://nbviewer.jupyter.org/github/coastalcph/nlp-course/blob/master/overview.ipynb). 
To be able to make changes to the book and render it dynamically, see the [installation instructions](INSTALL.md).
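To see what the link-rewriting cell at the top of this section does in isolation, here is a minimal, self-contained re-run of its pattern and replacement on a made-up HTML snippet (the `sample` string is hypothetical):

```python
import re

# Same pattern as in the cell above: <a href='...'> or <a href="..."> whose target does not start with 'http'
pattern = re.compile(r'<a href=([\'"])(?!http)([^\'"]+)\1>')

def replacer(match):
    # Keep the original quote character and prepend '../' to the relative link
    quote, link = match.group(1), match.group(2)
    return f'<a href={quote}../{link}{quote}>'

sample = ("<a href='chapters/tokenization.ipynb'>notes</a> "
          "<a href='https://web.stanford.edu/~jurafsky/slp3/2.pdf'>Chapter 2</a>")
print(pattern.sub(replacer, sample))
# <a href='../chapters/tokenization.ipynb'>notes</a> <a href='https://web.stanford.edu/~jurafsky/slp3/2.pdf'>Chapter 2</a>
```

Only tags where `href` is the sole attribute are matched, and every target not starting with `http` gets `../` prepended, so `#anchor` or `mailto:` links would be rewritten as well.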


### Course Requirements
* Familiarity with machine learning (probability theory, linear algebra, classification)
* Knowledge of programming (Python)
* No prior knowledge of natural language processing or linguistics is required

Relevant machine learning competencies can be obtained through one of the following courses: 
* [NDAK22000U Machine Learning A (MLA)](https://kurser.ku.dk/course/ndak22000u) and/or [NDAK22001U Machine Learning B (MLB)](https://kurser.ku.dk/course/ndak22001u)
* [NDAK16003U Introduction to Data Science (IDS)](https://kurser.ku.dk/course/ndak16003u)
* [NDAB23000U Grundlæggende Data Science (GDS)](https://kurser.ku.dk/course/ndak23000u)
* [Machine Learning, Coursera](https://www.coursera.org/learn/machine-learning)

See also the [course description](https://kurser.ku.dk/course/ndak18000u).

### About You: previously taken courses related to NLP?

![survey_q1](../img/survey_q1.png)

### About You: previously taken courses in Machine Learning?

![survey_q2](../img/survey_q2.png)

### About You: experience with using neural network software libraries?

![survey_q3](../img/survey_q3.png)

### About You: degree you are enrolled in

![survey_q4](../img/survey_q4.png)

### About You: what you want to get out of this course

![survey_q5](../img/survey_q5.png)

### About You: what you want to get out of the lab sessions

![survey_q6](../img/survey_q6.png)

### Course Materials
* We will be using the [nlp-course](../overview.ipynb) book 
* Contains **interactive** [Jupyter](http://jupyter.org/) notebooks and slides
    * View statically [here](https://nbviewer.jupyter.org/github/coastalcph/nlp-course/blob/master/overview.ipynb)
    * Use interactively by installing the book, following the instructions in the [GitHub repo](https://github.com/coastalcph/nlp-course)
* Recordings of 2020 lectures are available on [Absalon](https://absalon.ku.dk/courses/68562/external_tools/14563)
* References to other material are given in context
* This is work in progress.
    * Course materials are adapted from [previous iterations of the course at DIKU](https://github.com/copenlu/stat-nlp-book), which are in turn adapted from a [course that Isabelle Augenstein co-taught at UCL](https://github.com/uclmr/stat-nlp-book) (course organiser: [Sebastian Riedel](http://www.riedelcastro.org/))
    * Use `git pull` regularly for updates
    * *Watch* the GitHub repo to be notified of updates
    * Please contribute by opening issues on GitHub when you spot errors
* For assignment hand-in, announcements and the discussion forum, check [Absalon](https://absalon.instructure.com/courses/68562)

### Teaching Methods
* Course combines
    * Traditional lectures
    * Hands-on exercises
    * Group work
* Occasional small exercises during lectures, so bring your laptop
* You are expected to read some background material for each lecture
    * This ensures that everyone is on the same page
    * It also leaves more time for exercises and discussions in lectures
* The background material will be made available a week before each lecture at the latest

### Lecture Preparation

* Read Background Material (required)
* Go through lecture notes, play with code (optional)
* Watch recordings from 2020 (optional)
* Do exercises (optional)

### Assessment Methods

* **[Group project (50%)](https://absalon.ku.dk/courses/68562/assignments/186503)**, can be completed in a group of up to 3 students
    * Released 1 September, **hand-in 3 November 17:00**
    * Joint report, contribution of each student should be stated clearly
    * Code to be uploaded as attachment
    * Individual grade for each group member, based on the quality and quantity of their contributions
    * Submission via Digital Exam
    * Consists of several parts tied to weekly lecture topics
    * AI assistance is allowed **with restrictions**
    * We cannot guarantee responses to queries about the project after 2 November 15:00

### Assessment Methods

* **Group project (50%)**, can be completed in a group of up to 3 students
    * AI assistance is allowed **with restrictions**:
        * As coding tools (e.g., GitHub Copilot): no restrictions.
        * As writing tools: no restrictions.
        * As search tools: no restrictions. Usual citation requirements apply.
        * As generation tools for *new* ideas: generated content must be clearly highlighted. Prompts/transcripts must be included.

See project description for more details.

### Assessment Methods

* **Group project (50%)**, can be completed in a group of up to 3 students
    * Finding a group: 
       * Deadline for group forming: **11 September 17:00**
       * We offer to help you find a group -- fill in the [Getting to Know You survey](https://absalon.ku.dk/courses/68562/quizzes/87334) by the end of the *first lecture day*, **5 September 17:00**
       * If you choose this option, you will be informed of your assigned group on **6 September**
       * You can still change groups afterwards by asking other students to swap groups (it's your responsibility to arrange this)
       * Otherwise, we assume you will find a group by yourself in the first course week, e.g. by coordinating with other students in the lab session

### Assessment Methods

* **In-person written exam (50%)**, to be completed individually
    * Date: 10 November
    * Duration: 1.5 hours
    * Theoretical exam, covering the whole course curriculum
    * All aids allowed, except that, per [UCPH policy](https://kunet.ku.dk/work-areas/teaching/digital-learning/chatgpt-and-ai/guidelines-and-rules-for-chatgpt/Pages/default.aspx), ChatGPT/GPT-4 and similar LLMs/generative AI are **not** permitted for the exam

### Late Hand-In

* Late hand-ins **cannot be accepted**
* Exceptions can be made in rare cases, e.g. due to illness with doctor's notice
    * Get in touch with the course organizers at least one working day before the deadline

### Plagiarism

* Don't do it
* Don't enable it
* Check [rules and consequences](https://student-ambassador.ku.dk/rights/avoid-plagiarism/) if unclear

### Docker

* The book and tutorials run in a [docker](https://www.docker.com/) container
* Container comes with all dependencies pre-installed
* You can install it on your machine or on Google Colab/Azure/AWS machines
* We provide no support for non-docker installations
* We recommend you use this container for your project
   * Contains all core software packages for solving the project
   * You may use additional packages if needed

In [7]:
display(Markdown(filename="../INSTALL.md"))

# Installation Instructions

There are several ways to set up and run the project:
1. [Render Book Statically](#render-book-statically)
2. [Docker installation](#install-docker)
3. [Set-up a Local Virtual Environment](#set-up-a-local-virtual-environment)

Important notes:
1. [ Access Content ](#access-content)
2. [ Pull new content regularly ](#pull-new-content-regularly)

## Render Book Statically
The easiest way to view the course content is via the static [nbviewer](https://nbviewer.jupyter.org/github/coastalcph/nlp-course/blob/master/overview.ipynb). 
While this does not allow you to change and execute code, it also doesn't require you to install software locally and only needs a browser.

## Docker installation 

To be able to make changes to the book and render it dynamically, we recommend you use Docker.
We assume you have a command line interface (CLI) in your OS
(bash, zsh, Cygwin, Git Bash, PowerShell, etc.) and that this CLI expands
`$(pwd)` to the current directory. If it doesn't, replace
all occurrences of `$(pwd)` with the path of the directory you are in.

### Install Docker

For Mac and Windows, go to the [docker webpage](https://www.docker.com/get-started) and follow the instructions for your platform. Instructions for Ubuntu can be found [here](https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-docker-ce-1).

### Download Image

Next, you can download the `stat-nlp-book` docker image like so:

    docker pull bjerva/stat-nlp-book:ndak18000u

If you get a permission error here and at any later point, try prepending `sudo ` to the command:

    sudo docker pull bjerva/stat-nlp-book:ndak18000u
    
This process may take a while, so use the time to start familiarising yourself with [the structure of the course](https://nbviewer.jupyter.org/github/coastalcph/nlp-course/blob/master/overview.ipynb).

### Get Repository

You can use the git installation in the docker container to get the repository:

    docker run -v "$(pwd)":/home/jovyan/work bjerva/stat-nlp-book:ndak18000u git clone https://github.com/coastalcph/nlp-course 

Note: this will create a new `nlp-course` directory in your current directory.

### Change into directory

We assume from here on that you are in the top level `nlp-course` directory:

    cd nlp-course

Note: you need to be in the `nlp-course` directory every time you want to run/update the book.

### Run Notebook

    docker run -it --rm -p 8888:8888 -v "$(pwd)":/home/jovyan/work bjerva/stat-nlp-book:ndak18000u

You are now ready to visit the [overview page](http://localhost:8888/notebooks/overview.ipynb) *locally* through the installed book.

### Usage

Once installed you can always run your notebook server by first changing
into your local `nlp-course` directory, and then executing:

    docker run -it --rm -p 8888:8888 -v "$(pwd)":/home/jovyan/work bjerva/stat-nlp-book:ndak18000u
    
This is **assuming that your docker daemon is running** and that you are
**in the `nlp-course` directory**. How to run the docker daemon
depends on your system.
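If you have Python available on your host machine, one quick way to check the daemon requirement is to call `docker info`, which exits with an error when the CLI cannot reach the daemon. The function below is a hypothetical helper, not part of the course tooling; on Linux it also reports `False` when your user lacks permission to talk to the daemon (the case where `sudo` is needed):

```python
import shutil
import subprocess

def docker_daemon_running(timeout: float = 30.0) -> bool:
    """Return True if the docker CLI is installed and `docker info` can reach the daemon."""
    if shutil.which("docker") is None:
        return False  # docker CLI not found on PATH
    try:
        # `docker info` exits with a non-zero status when it cannot talk to the daemon
        result = subprocess.run(["docker", "info"], capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

print("Docker daemon reachable:", docker_daemon_running())
```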

### Update the notebook

We frequently make changes to the book. To get these changes, you
should first discard your *local changes* to avoid merge conflicts:
you may have modified files that we also changed (by editing the code
or simply running it), in which case `git` will complain when you
update. To avoid this, you can undo all your changes by executing:
 
    docker run -v "$(pwd)":/home/jovyan/work bjerva/stat-nlp-book:ndak18000u git checkout -- .
    
If you want to keep your changes, **create copies of the changed files** first.
Jupyter has a "Make a copy" option in the "File" menu for this. You can also create a clone of this repository
to keep your own changes and merge our changes in a more controlled manner.

Then, to get the actual updates, run:

    docker run -v "$(pwd)":/home/jovyan/work bjerva/stat-nlp-book:ndak18000u git pull
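Before discarding local changes with `git checkout -- .`, it can help to list which files would be affected. The sketch below uses hypothetical helper names and assumes `git` is on the `PATH` (for example when run from a notebook cell inside the container). It parses `git status --porcelain` output and keeps the files whose working-tree version differs from the index, which are the ones `git checkout -- .` restores; untracked files are left alone by that command and are skipped:

```python
import subprocess

def parse_worktree_changes(porcelain: str) -> list[str]:
    """Return paths whose working-tree copy differs from the index in `git status --porcelain` output."""
    paths = []
    for line in porcelain.splitlines():
        # Each line is 'XY PATH': X = index status, Y = working-tree status
        if len(line) < 4 or line[1] not in "MD":
            continue  # skips untracked ('??') and staged-only entries
        path = line[3:]
        if " -> " in path:  # renamed entries are shown as 'ORIG -> NEW'
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths

def worktree_changes(repo_dir: str = ".") -> list[str]:
    """Run `git status --porcelain` in repo_dir and parse its output."""
    result = subprocess.run(["git", "status", "--porcelain"], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return parse_worktree_changes(result.stdout)
```

Calling `worktree_changes()` from inside the `nlp-course` directory returns the list of files to copy before running the checkout. Paths containing unusual characters are quoted by git and are not unquoted by this simplified parser.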

## Set-up a Local Virtual Environment

If you cannot use Docker, you can alternatively set up the book directly.

### git clone the repository

    git clone https://github.com/coastalcph/nlp-course

### Create virtual environment
Enter the cloned directory:

    cd nlp-course

and create the virtual environment:

    python -m venv nlp_venv

### Enter the virtual environment

    source nlp_venv/bin/activate
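To confirm that the activation worked, you can check from Python whether the interpreter is running inside a virtual environment: in a `venv`, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation. A minimal sketch (the function name is illustrative):

```python
import sys

def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment created with `python -m venv`."""
    # Inside a venv, sys.prefix is the environment directory; sys.base_prefix is the base Python installation
    return sys.prefix != sys.base_prefix

print("Inside a virtual environment:", in_virtualenv())
print("Interpreter:", sys.executable)
```

After `source nlp_venv/bin/activate`, running this with `python` should print `True`, with `sys.executable` pointing into `nlp_venv`.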

### Install dependencies

    pip install --upgrade pip
    
**MacOS**: Install rust

    curl https://sh.rustup.rs -sSf | sh
    
**MacOS**: Install xcode

    xcode-select --install

**All platforms**: install the requirements and enable the RISE slideshow extension:

    pip install -r requirements.txt
    jupyter-nbextension install rise --py --sys-prefix
    jupyter-nbextension enable rise --py --sys-prefix
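To double-check that `pip install -r requirements.txt` succeeded, you could compare the names listed in `requirements.txt` against the distributions installed in the active environment. This is a simplified sketch with hypothetical helper names: it only handles plain `name[extras]<specifier> ; marker` lines and skips pip options and direct URLs, so treat its result as a quick hint rather than a full dependency check:

```python
import re
from importlib.metadata import PackageNotFoundError, version

def parse_requirement_names(lines: list[str]) -> list[str]:
    """Extract distribution names from requirements.txt-style lines (simplified parser)."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r, -e or --index-url
        # The name ends at the first extras bracket, version specifier, marker, '@' or space
        names.append(re.split(r"[\[<>=!~;@ ]", line, maxsplit=1)[0])
    return names

def missing_packages(names: list[str]) -> list[str]:
    """Return the names that have no installed distribution in the current environment."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing

def check_requirements(path: str = "requirements.txt") -> list[str]:
    with open(path, encoding="utf-8") as f:
        return missing_packages(parse_requirement_names(f.readlines()))
```

Run `check_requirements()` from the `nlp-course` directory with the virtual environment active; an empty list means every listed distribution was found.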

### Run the notebook server 
(the UI of the server will be opened automatically)

    jupyter notebook
   

## Access Content

The repository contains a lot of material, some of which may not be ready
for consumption yet. This is why you should always access content through
the top-level [overview page (local-link)](http://localhost:8888/notebooks/overview.ipynb).

## Pull new content regularly
Receive notifications about new updates by "Watch"-ing the repo on GitHub, and pull them as described in [Update the notebook](#update-the-notebook).


### Python

* Lectures, lab exercises and assignments focus on **Python**
* Python is a leading language for data science, machine learning etc., with many relevant libraries
* We expect you to know Python, or be willing to learn it **on your own**
* Labs and assignments focus on development within [jupyter notebooks](http://jupyter.org/)

### Lab Sessions

* Some lab sessions are tutorial-style (to introduce you to practical aspects of the course)
* Other lab sessions are open-topic. You can use them as an opportunity to:
   * ask the TAs clarifying questions about the lectures and/or project
   * ask the TAs for informal feedback on your project so far
   * work on your project with your group

### Discussion Forum

* Our Absalon page has a [**discussion forum**](https://absalon.ku.dk/courses/68562/discussion_topics).
* Please post questions there (instead of sending private emails) 
* We give low priority to **questions already answered** in previous lectures, tutorials and posts, 
    * and to **pure programming related issues**
* We expect you to **search online** for answers to your questions before you contact us.
* You are highly encouraged to participate and **help each other** on the forum. 
* The teaching team will check the discussion forum regularly **within normal working hours**
    * do not expect answers late in the evenings and on weekends
    * **start working on your project early**
    * come to the lab sessions and ask questions there

### DIKU NLP

* Research Section, UCPH Computer Science Department
* Faculty members: Isabelle Augenstein (head of section), Daniel Hershcovich, Desmond Elliott, Anders Søgaard
* Official webpage: https://di.ku.dk/english/research/nlp/
* List of group members: http://copenlu.github.io; http://coastalcph.github.io/; https://elliottd.github.io/people.html
* Twitter: 
    * @copenlu https://twitter.com/CopeNLU
    * @coastalcph https://twitter.com/coastalcph
* Always looking for strong MSc students
* PhD positions available depending on funding