BERT_QA

Bumjin Joo

November, 2023

This was a learning experience; model saving seems to work much more smoothly in PyTorch than in TensorFlow.

For the model itself, I used a pretrained HuggingFace DistilBertForQuestionAnswering model.
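As a rough sketch of how such a pretrained QA model can be loaded and queried (the distilbert-base-uncased-distilled-squad checkpoint and the toy question/context are my assumptions for illustration, not necessarily what this repo uses):

```python
# Minimal sketch: load a pretrained DistilBERT QA model and extract an answer span.
# The checkpoint name below is an assumption; any DistilBertForQuestionAnswering
# checkpoint follows the same pattern.
import torch
from transformers import DistilBertForQuestionAnswering, DistilBertTokenizerFast

checkpoint = "distilbert-base-uncased-distilled-squad"
tokenizer = DistilBertTokenizerFast.from_pretrained(checkpoint)
model = DistilBertForQuestionAnswering.from_pretrained(checkpoint)

question = "Who wrote the novel?"
context = "The novel was written by Jane Austen in 1813."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Pick the most likely start/end token positions and decode the answer span.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```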

This model performs very well on the dataset and task. The model is likely overfitting, as suggested by the very erratic training loss alongside still extremely good validation metrics; even so, its performance is in line with expectations, given both the figures provided in the handout and the fact that I am using a pretrained base.

In terms of technical implementation details, I ran out of GPU memory every time I needed to iterate through the batches of a second data loader after having traversed a first one (e.g., when attempting to compute validation loss in train after already having trained in the same Runtime instance). Therefore, I downloaded the weights after each training step and used that checkpoint for evaluation. I tried hard to free memory aggressively throughout, but to no avail. I also tried additional Google Cloud GPUs, but I couldn't get more than one, so my GPU memory remained at 15 GB.
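A minimal sketch of that checkpoint-then-evaluate workaround, assuming a standard PyTorch training loop; the function names, file path, and checkpoint name are illustrative rather than my exact code:

```python
# Sketch of the workaround: save weights after training, then rebuild the model
# from that checkpoint for evaluation so the training loader and its cached
# batches can be released first. Names and paths here are illustrative.
import gc
import torch
from transformers import DistilBertForQuestionAnswering

CKPT_PATH = "qa_checkpoint.pt"

def train_and_checkpoint(model, train_loader, optimizer, device):
    model.train()
    for batch in train_loader:
        # Batches are assumed to include input_ids, attention_mask, and
        # start/end positions so the model returns a loss directly.
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    torch.save(model.state_dict(), CKPT_PATH)

def evaluate_from_checkpoint(val_loader, device):
    # Try to free whatever the training pass left on the GPU before loading
    # the saved weights into a fresh model instance.
    gc.collect()
    torch.cuda.empty_cache()
    model = DistilBertForQuestionAnswering.from_pretrained(
        "distilbert-base-uncased-distilled-squad"
    )
    model.load_state_dict(torch.load(CKPT_PATH, map_location=device))
    model.to(device).eval()
    total_loss = 0.0
    with torch.no_grad():
        for batch in val_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            total_loss += model(**batch).loss.item()
    return total_loss / len(val_loader)
```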

I thought it would be interesting to use the general DistilBert model and apply my own classifier heads on top of that model's outputs. However, due to time lost debugging the issues above, I ran out of time to do so.
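For reference, a hedged sketch of what that custom-head idea might look like; the class name and head design are my assumptions, since this was not implemented in the repo:

```python
# Sketch of a custom extractive-QA head over the bare DistilBertModel encoder,
# i.e. the extension described above but not implemented here.
import torch.nn as nn
from transformers import DistilBertModel

class DistilBertWithQAHead(nn.Module):
    def __init__(self, pretrained_name="distilbert-base-uncased"):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained(pretrained_name)
        hidden = self.encoder.config.dim  # 768 for distilbert-base
        # A single linear layer producing start/end logits per token,
        # mirroring the standard extractive-QA head.
        self.qa_head = nn.Linear(hidden, 2)

    def forward(self, input_ids, attention_mask=None):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                   # (batch, seq_len, hidden)
        logits = self.qa_head(hidden_states)  # (batch, seq_len, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```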
