# NNIA Assignment 10

**(Tentative) DEADLINE: 25. 01. 2023 08:00 CET**

Submissions more than 10 minutes past the deadline will **not** be graded!

- Name & ID 1 (Teams username e.g. s8xxxxx):
- Name & ID 2 (Teams username e.g. s8xxxxx):
- Name & ID 3 (Teams username e.g. s8xxxxx):
- Hours of work per person:

# Submission Instructions

**IMPORTANT** Please make sure you read the following instructions carefully. If you are unclear about any part of the assignment, ask questions **before** the assignment deadline. All course-related questions can be addressed on the course **[Piazza Platform](https://piazza.com/class/kvc3vzhsvh55rt)**.

* Assignments are to be submitted in a **team of 2 or 3**.
* Please include your **names**, **IDs**, **Teams usernames**, and **approximate total time spent per person** at the beginning of the Notebook in the space provided.
* Make sure you appropriately comment your code wherever required.
* Your final submission should contain this completed Jupyter Notebook, including the bonus question (if you attempt it), and any necessary Python files.
* Do **not** submit any **data or cache files** (e.g. `__pycache__`, the dataset PyTorch downloads, etc.). 
* Upload the **zipped** folder (*.zip* is the only accepted extension) in **Teams**.
* Only **one member** of the group should make the submission.
* **Important:** please name the submitted zip folder `Name1_id1_Name2_id2.zip`. The Jupyter Notebook should also be named `Name1_id1_Name2_id2.ipynb`. This is **very important** for our internal organization; students repeatedly fail to do this.

## 1 Theory Review (1 point)

Please consult the Deep Learning book (namely [chapter 10](https://www.deeplearningbook.org/contents/rnn.html)), the lecture slides, or other resources to answer the following questions.

Please note that some of these may not be covered in the lectures or may not be reached in this week's lecture. However, the deadline for this assignment is later than usual so you will have heard two lectures before the assignment is due.

1. Explain how the teacher forcing training approach works. (0.5 pts)
1. Explain what problem the LSTM type of recurrent network addresses and what role each of its gates plays. (0.5 pts)

## 2 RNN (5 pts)

The English alphabet has 26 letters. Hungarian (like many other European languages) has additional letters, namely those with an acute accent (A): `áéíóú`, those with an umlaut (U): `öü`, and those with a double acute accent (D): `őű`.
Imagine you've been tasked by a telecommunication company to come up with a system to automatically add all the accents correctly given a word without accents.
Each line of the input contains the same word twice: with and without accents.
We provide the train/dev split for you (in separate files).  
  
Build a character-level recurrent neural network (many (n) to many (n)) that classifies the correct accent (or no accent) for each letter of an input word.  E.g. for the word `ölében` with the accents removed: `oleben -> UNANNN`, where N refers to no accent.  
  
You can use non-vanilla RNN cells (such as LSTM or GRU) as well as any other tricks you know. A classification FFNN should follow the recurrent part. **Report train and dev results.**  
You do not necessarily need a GPU for this task, but feel free to use Google Colab. Either way, we recommend that you work with a subset of the data first for faster development.
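
As a starting point, the architecture described above (recurrent part followed by a classification FFNN, one prediction per character) could be sketched roughly as follows. This is only one possible layout, not a reference solution: the class name `AccentTagger`, the layer sizes, the vocabulary size of 40, and the padding index 0 are all assumptions made for illustration.

```python
import torch
from torch import nn


class AccentTagger(nn.Module):
    """Hypothetical many-to-many tagger: one accent class per input character."""

    def __init__(self, vocab_size, num_classes=4, emb_dim=32, hidden_dim=64, pad_idx=0):
        super().__init__()
        # Character embeddings; index pad_idx is reserved for padding (assumption).
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)
        # Bidirectional LSTM so every position sees left and right context.
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Classification FFNN applied independently at every time step.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, char_ids):
        embedded = self.embedding(char_ids)   # (batch, length, emb_dim)
        states, _ = self.rnn(embedded)        # (batch, length, 2 * hidden_dim)
        return self.classifier(states)        # (batch, length, num_classes)


# Smoke test on random data: 8 "words" of 12 characters each.
model = AccentTagger(vocab_size=40)
batch = torch.randint(1, 40, (8, 12))
targets = torch.randint(0, 4, (8, 12))
logits = model(batch)

# Cross-entropy expects (N, classes); positions labelled -100 would be ignored,
# which is one way to mask padded characters in real batches.
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
loss = loss_fn(logits.reshape(-1, 4), targets.reshape(-1))
print(logits.shape, loss.item())
```

Swapping `nn.LSTM` for `nn.GRU` or `nn.RNN` only requires changing that one layer, since all three return the per-step states as their first output.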

**NOTE**: You should be able to generate the "accented" string using the output of your network. We recommend you use each type of accent (or lack thereof) as a possible output class. Once your model can predict these classes, use the predictions to apply "edits" to the unaccented string.
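
To make the label scheme concrete, here is a minimal sketch of how the labels could be derived from an accented word and how predicted labels could be turned back into an accented string. The helper names `encode` and `apply_labels` are hypothetical, and the sketch assumes that only the nine accented letters listed above need handling.

```python
# Label scheme from the task description:
# A = acute, U = umlaut, D = double acute, N = no accent.
ACCENTED = {
    "á": ("a", "A"), "é": ("e", "A"), "í": ("i", "A"), "ó": ("o", "A"), "ú": ("u", "A"),
    "ö": ("o", "U"), "ü": ("u", "U"),
    "ő": ("o", "D"), "ű": ("u", "D"),
}
# Inverse lookup: (base letter, label) -> accented letter.
RESTORE = {pair: ch for ch, pair in ACCENTED.items()}


def encode(word):
    """Split an accented word into (unaccented word, label string)."""
    bases, labels = [], []
    for ch in word:
        base, label = ACCENTED.get(ch.lower(), (ch.lower(), "N"))
        bases.append(base.upper() if ch.isupper() else base)
        labels.append(label)
    return "".join(bases), "".join(labels)


def apply_labels(plain, labels):
    """Apply one predicted label per character to an unaccented word."""
    out = []
    for ch, label in zip(plain, labels):
        # Impossible combinations (e.g. an accent on a consonant) leave the letter unchanged.
        restored = RESTORE.get((ch.lower(), label), ch.lower())
        out.append(restored.upper() if ch.isupper() else restored)
    return "".join(out)


print(encode("ölében"))                   # ('oleben', 'UNANNN')
print(apply_labels("oleben", "UNANNN"))   # ölében
```

Falling back to the plain letter for invalid combinations keeps the output well-formed even when the model predicts, say, an accent on a consonant.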

In [None]:
# Read the data

In [None]:
# Prepare the data

In [None]:
# Define and train your model

In [None]:
# Report your results

## 3 Introduction to HuggingFace and Pre-trained models (4 points)

Now that you've worked with RNNs in their raw form, we will move on to using a *hub* of models. Well, what does that even mean?

[HuggingFace](https://huggingface.co/) (HF) has taken it upon themselves to build a hub for models, datasets, metrics, etc. This basically means that datasets, metrics, and models developed and trained by big labs are readily accessible through just a few lines of code.

In this exercise your task will be to:

1. Download a dataset from the HuggingFace repositories
1. Download a model from the hub
1. Train/fine-tune the aforementioned model
1. Test your performance on the dataset

We encourage you to look at the documentation and brief introductions to HuggingFace [here](https://huggingface.co/docs/transformers/index) and a quick tour of the library and hub [here](https://huggingface.co/docs/transformers/quicktour).

In [None]:
# Before we begin, we must install our datasets and transformers libraries
!pip install -q datasets==2.8.0 transformers==4.25.1

### 3.1 Downloading the relevant dataset (0.5 points)

For this task, we will be addressing yet another NLP task. A quite common one at that: sentiment analysis/classification.

There are various datasets already pre-built for a task as popular as this. We will be using the Rotten Tomatoes dataset, in particular [the one hosted here](https://huggingface.co/datasets/rotten_tomatoes).

The link above leads you to the dataset page on HF. This page contains a lot of valuable information:

1. A brief summary of the dataset. From this we learn how many datapoints we have for this classification task. Furthermore we learn where/how the data was collected as we are pointed to a publication.
1. Information on the data structure. Here we learn how the data looks in its raw form, how much space it takes, etc. Importantly we also learn what default splits are included in the dataset.
1. Additional information on curation processes, social impact, etc.

Next, your task is to download the dataset and show 5 random examples (along with their labels).

In [None]:
# TODO: Download the dataset. Remember to import the necessary functions.
# HINT: Check the HF quick tour and/or documentation pages


In [None]:
# TODO: Show 5 random examples for each split of the dataset. Use a seed of `42`

Hopefully, after seeing a sample of the reviews we will be trying to classify, you already get a feeling for how hard this task could be.

### 3.2 Downloading our relevant model (1 point)

Here, we task you with downloading a model from the HF hub. The relevant model we will be working with is the RoBERTa model.

[[model card]](https://huggingface.co/roberta-base) [[paper]](https://arxiv.org/abs/1907.11692) [[blog post]](https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/)

You will learn more about this type of model in the next lecture. What you need to know right now is that RoBERTa is a powerful model built on top of the transformer architecture. It is the result of considerable progress in NLP, including new architecture proposals and pre-training strategies.

#### A brief note on transfer learning

But **what is pre-training**? In Deep Learning for NLP, a paradigm shift happened around 2017/2018, where researchers went from training models from scratch for each downstream task to pre-training models on a general, usually self-supervised, task. After this pre-training was done (on massive amounts of data), the resulting model was then simply *fine-tuned* on the final downstream task.

Pre-training is then simply this training phase on a very broad and general task that is usually not immediately applicable to solving a problem.

**Why does this work?** Not everything about transfer learning is well understood. The intuition behind it is that if you learn a general task, adapting your knowledge to more specific tasks should not take as much effort as learning from scratch. Furthermore, you can leverage the more general knowledge when adapting to a more specific task and end up performing better.

For this task you are asked to download the RoBERTa model from the HF hub twice: once in its raw form (i.e. the architecture with randomly initialized weights, no pre-training) and once with its pre-trained weights (i.e. the weights learned during pre-training are automatically loaded for you).

#### The task

* Load the model architecture without any pre-trained weights loaded
* Load the model with the pre-trained weights

Note that our task, as we saw in the previous section, is sequence classification, i.e. we assign a class to a whole sequence and not e.g. a label to each token.
For this reason, you should load the model in its [SequenceClassification](https://huggingface.co/docs/transformers/model_doc/roberta#transformers.RobertaForSequenceClassification) configuration.
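
The difference between the two loading modes can be sketched as follows. Building a model from a configuration object gives the architecture with random weights, while `from_pretrained` additionally loads the pre-trained weights. The helper names and the deliberately tiny configuration are assumptions chosen so the sketch runs quickly without a download; they are not the settings you need for the assignment.

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification


def build_untrained(config):
    """Architecture only: all weights are randomly initialised."""
    return RobertaForSequenceClassification(config)


def build_pretrained(name="roberta-base", num_labels=2):
    """Same architecture, with pre-trained weights downloaded from the hub."""
    return RobertaForSequenceClassification.from_pretrained(name, num_labels=num_labels)


# Tiny hypothetical configuration, for illustration only.
tiny_config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=66,
    num_labels=2,
)
tiny_model = build_untrained(tiny_config)

# One forward pass on random token ids: one logit per class and sequence.
input_ids = torch.randint(3, 100, (2, 10))
with torch.no_grad():
    logits = tiny_model(input_ids=input_ids).logits
print(logits.shape)
```

For a fair comparison, both models should share the same architecture, so the untrained one would normally be built from the configuration of `roberta-base` rather than from a tiny one like this.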

In [None]:
# We will load the tokenizer for you
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

In [None]:
# TODO: Download the RoBERTa model from the HF hub *without* pretrained weights
# HINT: Check the documentation here https://huggingface.co/docs/transformers/model_doc/roberta

# TODO: model without pretrained weights

# TODO: Download RoBERTa model from HF hub *with* the pretrained weights loaded

### 3.3 Training your models! (2.5 points)

Now you will train your models.

For this you have three alternatives, two of which are viable in the context of this course.

HF offers a trainer class ([here is a quick guide on how to use it](https://huggingface.co/docs/transformers/quicktour#trainer-a-pytorch-optimized-training-loop)). Feel free to use this, as this will probably be the fastest way to get working code running.

Alternatively, the models you have loaded both inherit from PyTorch's `nn.Module`, meaning you can train them the same way you have trained models so far.

1. Set up your training code
1. Pick whatever hyper-parameters you deem necessary
1. Use the validation split for the evaluation during training.
1. Evaluate your final model with the test split.
1. Report your findings for both models.
1. Discuss what you expected to happen for each and how the end result compares to your expectations.
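
If you take the `Trainer` route, evaluation on the validation split can be hooked in through its `compute_metrics` argument, which takes a callable that receives the model's predictions and the gold labels. A minimal accuracy function might look like this; reporting only accuracy is an assumption, and you are free to add other metrics.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy from raw logits; eval_pred unpacks into (logits, labels)."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}


# Stand-alone check with made-up logits for three reviews.
example_logits = np.array([[2.0, 0.1], [0.3, 0.9], [1.5, 1.0]])
example_labels = np.array([0, 1, 1])
print(compute_metrics((example_logits, example_labels)))
```

The same function works in a hand-written PyTorch loop if you collect the logits and labels of the whole split as NumPy arrays first.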