A demo of this code was presented in a talk (subject: Transformer applications in NLP) as part of the requirements for the Master's program (2023) in Computer Science at the Frankfurt University of Applied Sciences in Frankfurt. It serves demonstration purposes only.
The program shows how to download a pretrained language model from Hugging Face and finetune it for your own topic domain and needs.
Here, I use a pretrained Google BERT (bert-base-uncased) model and finetune it with the emotion (train) dataset, which contains Twitter text messages, each labelled with one of six emotion classes (sadness, joy, fear, anger, love, surprise).
The emotion (validation) dataset is then used to validate the freshly finetuned model, and the emotion (test) dataset is used to make predictions on unseen data.
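The workflow described above can be sketched roughly as follows. This is a minimal illustration built on the Hugging Face `datasets` and `transformers` libraries, not the code of the notebooks in this repository. The function names `finetune` and `model_output_dir`, the folder name, and the hyperparameters (one epoch, batch size 16, maximum length 128) are assumptions chosen for the example.

```python
import os


def model_output_dir(folder, model_name, dataset_name):
    """Build a save path for the finetuned model (hypothetical naming scheme)."""
    safe_model = model_name.replace("/", "_")
    safe_dataset = dataset_name.replace("/", "_")
    return os.path.join(folder, f"{safe_model}_finetuned_{safe_dataset}")


def finetune(model_name="bert-base-uncased", dataset_name="emotion",
             folder="finetuned_models"):
    """Download a pretrained model, finetune it, validate it, and save it."""
    # Imported here so the heavy dependencies are only needed when training runs.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    # Assumes the dataset has "train" and "validation" splits with
    # "text" and "label" columns, as the emotion dataset does.
    dataset = load_dataset(dataset_name)
    label_names = dataset["train"].features["label"].names

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def tokenize(batch):
        # Pad to a fixed length so the default data collator can batch the rows.
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    encoded = dataset.map(tokenize, batched=True)

    # A fresh classification head with one output per emotion class.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(label_names))

    output_dir = model_output_dir(folder, model_name, dataset_name)
    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=1,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=encoded["train"],
                      eval_dataset=encoded["validation"])

    trainer.train()               # finetune on the train split
    metrics = trainer.evaluate()  # validate on the validation split
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return metrics
```

Calling `finetune()` downloads the model and dataset on first use, so it needs an internet connection and, ideally, a GPU.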
Follow the steps below and you can do exactly that yourself.
- Go to settings.py and set "model_name" (default: bert-base-uncased) and "dataset_name" (default: emotion) to the pretrained model and dataset you want to use. Any other model or dataset available on the Hugging Face Hub can be chosen instead.
- To download and finetune the pretrained model, open the Jupyter Notebook "1_finetune_pretrained_model.ipynb" and run it. CAUTION: Finetuning can take some time (~ 3 to 30 min) depending on your setup. When finetuning has finished, the model is saved into the "finetuned_models_folder" defined in "settings.py".
- To make predictions with the model you have just finetuned, open the Jupyter Notebook "2_make_predictions_with_finetuned_model.ipynb" and run it. Predictions are made for the test subset of your dataset. Results for 100 random samples of the test subset are shown in a pandas DataFrame.
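The last step, turning model outputs into a readable table of 100 random samples, might look roughly like the sketch below. It is not the notebook's code: it uses only the standard library and plain dictionaries instead of pandas, and the helper names (`build_results`, `sample_rows`, `accuracy`) are hypothetical. The label order in `LABEL_NAMES` is an assumption about the emotion dataset and should be checked against the dataset's own label names.

```python
import random

# Assumed mapping from label id to label name for the emotion dataset.
# Verify it against the label names shipped with the dataset you actually load.
LABEL_NAMES = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def argmax(scores):
    """Return the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def build_results(texts, logits, true_label_ids, label_names=LABEL_NAMES):
    """Combine texts, model scores, and true labels into one row per sample."""
    rows = []
    for text, scores, true_id in zip(texts, logits, true_label_ids):
        pred_id = argmax(scores)
        rows.append({
            "text": text,
            "true_label": label_names[true_id],
            "predicted_label": label_names[pred_id],
            "correct": pred_id == true_id,
        })
    return rows


def sample_rows(rows, n=100, seed=None):
    """Pick up to n random rows without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


def accuracy(rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(row["correct"] for row in rows) / len(rows) if rows else 0.0


# Toy example with made-up scores for two messages.
rows = build_results(
    texts=["i feel great today", "i am scared of the dark"],
    logits=[[0.1, 2.0, 0.3, 0.0, 0.2, 0.1],
            [0.0, 0.1, 0.2, 0.3, 1.5, 0.1]],
    true_label_ids=[1, 3],
)
for row in sample_rows(rows, n=100, seed=0):
    print(row)
```

Passing the sampled list of dictionaries to `pandas.DataFrame(...)` would give a table similar to the one the notebook displays.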