Trains BERT (specifically 'bert-base-uncased'), a transformer language model developed by Google, several times on the IMDB dataset with different optimisation techniques to maximise performance while minimising training time. The objective of this experiment is to compare how efficient certain optimisers are during the early stages of model training when running on a local machine (i.e. no cloud resources).
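The model setup described above can be sketched with the Hugging Face Transformers API (a minimal illustration, not the exact training script; variable names are assumptions):

```python
# Minimal sketch: load the 'bert-base-uncased' checkpoint with a
# two-label classification head for binary sentiment on IMDB.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Roughly 110M parameters, matching the size quoted in this README
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
```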
Model Parameter Size: 110M
The IMDB dataset is a collection of movie reviews paired with sentiment labels, commonly used in NLP, especially for sentiment analysis. It consists of two columns, 'review' and 'sentiment', where 'review' is a long string of free text and 'sentiment' is a binary label, either 'positive' or 'negative'.
IMDB Train Dataset Full Size: 25k (scaled down to 3.2k for faster training)
IMDB Test Dataset Full Size: 25k (scaled down to 800 to maintain an 80:20 train:test split)
Both datasets are completely independent of one another.
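The down-sampling described above can be sketched with the standard library (a hedged illustration, not the exact preprocessing code; the placeholder data and seed are assumptions, but the 3.2k:800 sizes preserve the 80:20 ratio):

```python
import random

random.seed(42)  # assumed seed for reproducibility; the original is not stated

# Stand-ins for the full 25k-row train and test splits
full_train = [f"train_review_{i}" for i in range(25_000)]
full_test = [f"test_review_{i}" for i in range(25_000)]

TRAIN_SIZE = 3_200           # scaled-down training set
TEST_SIZE = TRAIN_SIZE // 4  # 800, preserving the 80:20 train:test ratio

small_train = random.sample(full_train, TRAIN_SIZE)
small_test = random.sample(full_test, TEST_SIZE)

print(len(small_train), len(small_test))  # 3200 800
```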
Create Virtual Environment:
python -m venv bert-env
Activate Virtual Environment:
source bert-env/bin/activate (MacOS/Linux)
bert-env\Scripts\Activate.ps1 (Windows)
Libraries/Dependencies used:
- PyTorch
- Hugging Face Transformers
- Matplotlib
- scikit-learn (sklearn)
- tqdm; gc and warnings (Python standard library)
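The dependencies above can be installed into the activated environment with pip (package names are assumptions based on the list; gc and warnings ship with Python and need no install):

```shell
pip install torch transformers matplotlib scikit-learn tqdm
```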
Python Version: 3.12.2