picoBERT

Like picoGPT, but for BERT.

Dependencies

pip install -r requirements.txt

Tested on Python 3.9.10.

Usage

  • bert.py contains the actual BERT model code.
  • utils.py includes utility code to download, load, and tokenize stuff.
  • tokenization.py includes the BERT WordPiece tokenizer code (a sketch of how a sentence pair is tokenized and formatted follows this list).
  • pretrain_demo.py contains code to demo BERT on its pre-training tasks: masked language modeling (MLM) and next sentence prediction (NSP).
  • classify_demo.py contains code to demo training an SKLearn classifier using the BERT output embeddings as input. This is not the same as actually fine-tuning the BERT model.
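
For reference, here's a minimal sketch of how a sentence pair like the one in pretrain_demo.py gets tokenized and packed into BERT's [CLS] sentence A [SEP] sentence B [SEP] layout. It assumes tokenization.py exposes a FullTokenizer class like Google's reference BERT implementation and that the vocab file lives at models/bert-base-uncased/vocab.txt; check the repo files for the exact names and paths.

from tokenization import FullTokenizer

tokenizer = FullTokenizer("models/bert-base-uncased/vocab.txt", do_lower_case=True)

tokens_a = tokenizer.tokenize("The apple doesn't fall far from the tree.")
tokens_b = tokenizer.tokenize("Instead, it falls on Newton's head.")

# BERT's sentence-pair format: [CLS] A [SEP] B [SEP], with segment ids
# marking which sentence each token belongs to.
tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
input_ids = tokenizer.convert_tokens_to_ids(tokens)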

To demo BERT on pre-training tasks:

python pretrain_demo.py \
    --text_a "The apple doesn't fall far from the tree." \
    --text_b "Instead, it falls on Newton's head." \
    --model_name "bert-base-uncased" \
    --mask_prob 0.20

Which outputs:

mlm_accuracy = 0.75
is_next_sentence = True

If we add the --verbose flag, we can also see where the model went wrong with masked language modeling:

input = ['[CLS]', 'the', 'apple', 'doesn', "'", '[MASK]', 'fall', 'far', 'from', 'the', 'tree', '.', '[SEP]', 'instead', ',', 'it', 'falls', 'on', '[MASK]', "'", '[MASK]', '[MASK]', '.', '[SEP]']

actual: t
pred: t

actual: newton
pred: one

actual: s
pred: s

actual: head
pred: head

Instead of predicting the word "newton", the model predicted "one", which still yields a valid sentence: "Instead, it falls on one's head."
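
To make the masking step concrete, here's a small, self-contained sketch of what a mask_prob-style masking function might look like (illustrative only, not the repo's actual code; the function name and details are made up):

import random

def mask_tokens(tokens, mask_prob=0.15, special=("[CLS]", "[SEP]")):
    # Randomly replace non-special tokens with [MASK]; return the masked
    # sequence plus the (position, original token) pairs the model must recover.
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        if tok not in special and random.random() < mask_prob:
            masked.append("[MASK]")
            targets.append((i, tok))
        else:
            masked.append(tok)
    return masked, targets

MLM accuracy is then just the fraction of masked positions where the model's top prediction matches the original token; in the example above, 3 of the 4 masked tokens were recovered, giving the reported 0.75.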

To demo training an SKLearn classifier on the IMDB dataset, using BERT output embeddings as the classifier's input:

python classify_demo.py \
    --dataset_name "imdb" \
    --N 1000 \
    --test_ratio 0.2 \
    --model_name "bert-base-uncased" \
    --models_dir "models"

Which outputs (note: running the BERT model and extracting all the embeddings takes a while):

              precision    recall  f1-score   support

           0       0.78      0.85      0.81       104
           1       0.82      0.74      0.78        96

    accuracy                           0.80       200
   macro avg       0.80      0.79      0.79       200
weighted avg       0.80      0.80      0.79       200

Not bad: 80% accuracy using only 800 training examples and a simple SKLearn model. Of course, fine-tuning the entire model on all the training examples would yield much better results.
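
For reference, the recipe here is: pool BERT's output vectors into one fixed-size feature vector per review, then fit an ordinary SKLearn classifier on those features. Below is a minimal, self-contained sketch of that second step. The random features stand in for the real BERT embeddings, and LogisticRegression is just an assumption for illustration; classify_demo.py may use a different SKLearn model.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Stand-in features: in the real demo, each row would be a pooled 768-dim
# BERT output vector for one IMDB review, with y the positive/negative label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))
y = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))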
