Bumjin Joo
November, 2023
Learning experience: it looks like model saving works much better in PyTorch than in TensorFlow.
In terms of results, I used a pretrained HuggingFace DistilBertForQuestionAnswering model.
This model performs very well on the dataset and task. Although it is likely overfitting, as suggested by the erratic training loss alongside the still very strong validation metrics, its performance is in line with expectations, given both the figures provided in the handout and the fact that I am starting from a pretrained base.
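For reference, loading and querying the pretrained model looks roughly like the sketch below. The specific checkpoint name and the example question/context are illustrative assumptions, not my exact setup:

```python
import torch
from transformers import DistilBertTokenizerFast, DistilBertForQuestionAnswering

# Minimal sketch of loading a pretrained DistilBERT QA model from HuggingFace.
# The checkpoint name and inputs below are placeholders for illustration.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased-distilled-squad")
model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased-distilled-squad")

question = "Who wrote the report?"
context = "The report was written by Bumjin Joo in November 2023."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Take the most likely start/end token positions and decode the answer span.
start = outputs.start_logits.argmax(dim=-1).item()
end = outputs.end_logits.argmax(dim=-1).item()
answer = tokenizer.decode(inputs["input_ids"][0, start:end + 1])
print(answer)
```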
In terms of technical implementation details, I ran out of memory every time I iterated through the batches of a second data loader after having traversed a first one (e.g. when computing the validation loss in train after already having trained in the same Runtime instance). As a workaround, I saved the weights after each training step and used that checkpoint for evaluation; a sketch of this checkpoint-and-reload workaround is below. I tried spamming memory-freeing calls everywhere, but to no avail. I also tried requesting extra Google Cloud GPUs, but I couldn't get more than one, so my GPU memory remained at 15 GB.
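A minimal sketch of the workaround, under the assumption that each batch already contains input IDs, attention masks, and start/end positions; the helper names, the checkpoint path, and the exact loop structure are illustrative rather than my actual code:

```python
import gc
import torch
from transformers import DistilBertForQuestionAnswering

CKPT_PATH = "checkpoints/distilbert_qa.pt"  # illustrative path

def train_one_pass(model, train_loader, optimizer, device):
    """One pass over the training data, then checkpoint the weights to disk."""
    model.train()
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)   # returns a loss when start/end positions are provided
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    torch.save(model.state_dict(), CKPT_PATH)

def evaluate_from_checkpoint(val_loader, device):
    """Rebuild the model from the saved checkpoint so evaluation starts fresh."""
    model = DistilBertForQuestionAnswering.from_pretrained(
        "distilbert-base-uncased-distilled-squad"
    )
    model.load_state_dict(torch.load(CKPT_PATH, map_location=device))
    model.to(device).eval()
    total_loss, n_batches = 0.0, 0
    with torch.no_grad():
        for batch in val_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            total_loss += model(**batch).loss.item()
            n_batches += 1
    return total_loss / max(n_batches, 1)

def free_gpu_memory():
    """The 'spam memory freeing' part: collect garbage and clear cached CUDA blocks."""
    gc.collect()
    torch.cuda.empty_cache()
```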
I thought it would be interesting to use the general DistilBert model and apply my own classifier heads on top of that model's outputs. However, due to time constraints from debugging the issues above, I ran out of time to do so; a rough sketch of what I had in mind is below.
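The idea was roughly the following: take the bare DistilBertModel encoder, keep its token-level hidden states, and attach my own span-prediction head. This is only a sketch of the plan, not code I actually ran; the class name, head sizes, and dropout value are hypothetical choices:

```python
import torch.nn as nn
from transformers import DistilBertModel

class DistilBertWithCustomQAHead(nn.Module):
    """Sketch: bare DistilBERT encoder with a custom span-prediction head on top."""

    def __init__(self, pretrained_name: str = "distilbert-base-uncased"):
        super().__init__()
        self.encoder = DistilBertModel.from_pretrained(pretrained_name)
        hidden = self.encoder.config.dim  # 768 for the base model
        # Two logits per token: one for the answer start, one for the answer end.
        self.qa_head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, 2),
        )

    def forward(self, input_ids, attention_mask=None):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                   # (batch, seq_len, hidden)
        logits = self.qa_head(hidden_states)  # (batch, seq_len, 2)
        start_logits, end_logits = logits.split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```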