<a href="https://colab.research.google.com/github/jb-diplom/jb-thesis/blob/main/NLM-Trainer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The effect of humour in political messaging: An investigation combining fine-tuned neural language models and social network analysis
by Janice Butler, University of Amsterdam, Master Thesis 2021

### Introduction
This notebook fine-tunes various neural language models (NLMs) on a new corpus of annotated humorous texts.
Two classifications are made:

1.   Degree of humour
2.   Comic styles



### Install dependencies
Install the Hugging Face and Weights & Biases libraries, and the dataset and training script for humour fine-tuning.

In [None]:
!pip install transformers -qq           # Hugging Face framework for loading and training models and preprocessing data
!pip install transformers datasets -qq  # Hugging Face datasets library, used to add our own data
!pip install wandb -qq                  # visualization of results on the project dashboard https://wandb.ai/jb-diplom/janice-demo
!pip install sentencepiece              # required for DeBERTa
# This script was the basis for the initial implementation:
# !wget https://github.com/huggingface/transformers/blob/master/examples/text-classification/run_glue.py -qq

## API Key
The following call registers this run with Weights & Biases, prompting for an API key unless a session is already active.
Optionally, we can set environment variables to customize W&B logging. See [documentation](https://docs.wandb.com/library/integrations/huggingface).
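
The optional environment variables mentioned above can also be set from Python rather than with `%env`. The following is a minimal sketch: `configure_wandb` is a hypothetical helper, not part of the wandb or transformers libraries, and the accepted `WANDB_WATCH` values (`gradients`, `all`, `false`) are taken from the Hugging Face integration documentation as understood here. The variables need to be set before training starts.

```python
import os

# Values of WANDB_WATCH understood by the Hugging Face W&B integration
# (assumption based on its documentation).
VALID_WATCH = {"gradients", "all", "false"}


def configure_wandb(project, watch="gradients"):
    """Hypothetical helper: set W&B environment variables for later training runs."""
    if watch not in VALID_WATCH:
        raise ValueError(f"watch must be one of {sorted(VALID_WATCH)}, got {watch!r}")
    os.environ["WANDB_PROJECT"] = project  # project the runs are logged to
    os.environ["WANDB_WATCH"] = watch      # what to log: gradients, everything, or nothing
    return {"WANDB_PROJECT": project, "WANDB_WATCH": watch}


# Same settings as the %env lines used in this notebook
settings = configure_wandb("janice-demo", watch="all")
print(settings)
```

Because a script started with `!python` inherits the notebook's environment, values set this way are visible to the training script as well.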

In [None]:
import wandb
wandb.login()
# Optional: log both gradients and parameters
%env WANDB_WATCH=all

In [None]:
# Uncomment to show which GPU is in use
#gpu = !nvidia-smi
#gpu = '\n'.join(gpu)
#print(gpu)
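
The GPU check above relies on IPython's `!` syntax. A plain-Python variant might look like the sketch below. `describe_gpu` is a hypothetical helper name, and the fallback messages are illustrative. It returns a message instead of failing when no `nvidia-smi` binary is present, for example on a CPU-only runtime.

```python
import shutil
import subprocess


def describe_gpu():
    """Return the output of nvidia-smi, or a short message if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found: no NVIDIA GPU runtime appears to be attached."
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if result.returncode != 0:
        return "nvidia-smi failed: " + result.stderr.strip()
    return result.stdout


print(describe_gpu())
```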

## Train the model
Next, call the training script [run_glue.py](https://huggingface.co/transformers/examples.html#glue), here loaded from Google Drive, and see training automatically get tracked on the Weights & Biases dashboard. The original example script fine-tunes BERT on the Microsoft Research Paraphrase Corpus (pairs of sentences with human annotations indicating whether they are semantically equivalent); the cell below instead runs it with `microsoft/deberta-base` and `TASK_NAME=sst2`.

In [None]:
%env WANDB_PROJECT=janice-demo
%env TASK_NAME=sst2
!pwd
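# Mount Google Drive, where the training script is stored (forcing a remount if it is already mounted)
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)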
!python '/content/gdrive/My Drive/Colab Notebooks/Visualization/run_glue.py' \
  --model_name_or_path microsoft/deberta-base \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --output_dir /tmp/sst2-DeBERTa/ \
  --overwrite_output_dir \
  --logging_steps 50
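
Since the notebook is meant to fine-tune several models, the command above could also be assembled programmatically, so that only the model name changes between runs. The sketch below is one possible way to do this, not the method used for the thesis runs. `build_glue_command` is a hypothetical helper, the defaults mirror the flags in the cell above, and the derived output directory name is an assumption of this sketch.

```python
import shlex

# Path of the training script as used in the cell above
SCRIPT = "/content/gdrive/My Drive/Colab Notebooks/Visualization/run_glue.py"


def build_glue_command(model_name, task_name="sst2", output_root="/tmp", **overrides):
    """Hypothetical helper: assemble the run_glue.py call as a list of arguments."""
    # Defaults mirror the flags used in the cell above
    options = {
        "max_seq_length": 256,
        "per_device_train_batch_size": 32,
        "learning_rate": "2e-5",
        "num_train_epochs": 4,
        "logging_steps": 50,
    }
    options.update(overrides)
    # Output directory derived from task and model name (assumption of this sketch)
    output_dir = f"{output_root}/{task_name}-{model_name.split('/')[-1]}/"
    args = [
        "python", SCRIPT,
        "--model_name_or_path", model_name,
        "--task_name", task_name,
        "--do_train", "--do_eval",
        "--output_dir", output_dir,
        "--overwrite_output_dir",
    ]
    for key, value in options.items():
        args += [f"--{key}", str(value)]
    return args


cmd = build_glue_command("microsoft/deberta-base")
print(shlex.join(cmd))  # shell-quoted form, handles the space in "My Drive"
```

The returned list can be passed to `subprocess.run(cmd, check=True)` inside a loop over model names. Keyword arguments such as `learning_rate="3e-5"` override individual defaults.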

## Visualization of results in dashboard
Analyze results (as they happen) on the project dashboard https://wandb.ai/jb-diplom/janice-demo
