- What is BERT? https://blog.naver.com/qbxlvnf11/222140142456
- DistilBERT applies the knowledge distillation method to the existing BERT model
- It achieves performance similar to BERT while being much smaller and faster
- Knowledge distillation method: a small student model is trained to reproduce the softened output distribution of a large teacher model (a minimal loss sketch follows the list below)
- BERT implementation with the pytorch-pretrained-BERT library (usage sketch below)
- BERT implementation with the keras ktrain library (usage sketch below)
- DistilBERT implementation with the huggingface transformers library (usage sketch below)
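On the knowledge distillation bullet above: the small student network is trained against the teacher's temperature-softened probabilities in addition to the true labels. Below is a minimal, generic distillation-loss sketch in PyTorch; the temperature `T` and mixing weight `alpha` are illustrative assumptions, and this is not DistilBERT's exact training objective (the paper combines a distillation loss on the masked-language-modeling outputs with the MLM loss and a cosine embedding loss).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic knowledge-distillation loss: soft-target KL term + hard-label CE.
    T (temperature) and alpha (mixing weight) are illustrative values."""
    # Soft targets: student matches the teacher's temperature-softened distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    # Hard targets: ordinary cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```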
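For the pytorch-pretrained-BERT item, a minimal single-step fine-tuning sketch is given below, assuming the `bert-base-uncased` checkpoint and a 2-label classification head; the actual notebooks may use different datasets, batching, and training loops.

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification, BertAdam

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Encode one example as [CLS] tokens [SEP], then convert to ids (no padding needed for a single sentence).
tokens = ["[CLS]"] + tokenizer.tokenize("This headline is about technology.") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
labels = torch.tensor([1])  # placeholder label for illustration

optimizer = BertAdam(model.parameters(), lr=2e-5)

model.train()
optimizer.zero_grad()
loss = model(input_ids, labels=labels)  # returns the classification loss when labels are given
loss.backward()
optimizer.step()
```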
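For the ktrain item, a minimal sketch following the pattern in the linked ktrain BERT example; the toy texts, integer labels, class names, `maxlen`, and `batch_size` below are placeholder assumptions.

```python
import ktrain
from ktrain import text

# Toy data standing in for the real dataset (placeholders).
x_train = ["stocks rally on strong earnings", "new phone model released", "team wins the championship"]
y_train = [0, 1, 2]
x_test = ["markets fall sharply"]
y_test = [0]

# Preprocess the texts for BERT and build a BERT text classifier.
(trn_x, trn_y), (val_x, val_y), preproc = text.texts_from_array(
    x_train=x_train, y_train=y_train,
    x_test=x_test, y_test=y_test,
    class_names=["business", "tech", "sports"],
    preprocess_mode="bert", maxlen=128,
)
model = text.text_classifier("bert", train_data=(trn_x, trn_y), preproc=preproc)
learner = ktrain.get_learner(model, train_data=(trn_x, trn_y), val_data=(val_x, val_y), batch_size=6)
learner.fit_onecycle(2e-5, 1)  # one epoch at the learning rate used in the ktrain BERT example
```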
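For the huggingface transformers item, a minimal DistilBERT fine-tuning step; the `distilbert-base-uncased` checkpoint, the 2-label head, and the example texts/labels are assumptions for illustration (the label meaning follows the disaster-tweets task linked below).

```python
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Tokenize a small batch; padding/truncation keep the tensors rectangular.
batch = tokenizer(
    ["Forest fire near La Ronge Sask. Canada", "I love this sunny weather"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
labels = torch.tensor([1, 0])  # e.g. 1 = real disaster, 0 = not (placeholder labels)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
optimizer.zero_grad()
outputs = model(**batch, labels=labels)  # outputs.loss and outputs.logits
outputs.loss.backward()
optimizer.step()
```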
https://www.kaggle.com/uciml/news-aggregator-dataset
https://www.kaggle.com/c/detecting-insults-in-social-commentary
https://www.kaggle.com/competitions/nlp-getting-started/data?select=train.csv
@article{BERT,
  title = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author = {Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
  journal = {arXiv preprint arXiv:1810.04805},
  year = {2018}
}
@article{DistilBERT,
  title = {DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author = {Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal = {arXiv preprint arXiv:1910.01108},
  year = {2019}
}
https://towardsdatascience.com/bert-text-classification-in-3-lines-of-code-using-keras-264db7e7a358
https://github.com/amaiya/ktrain/blob/master/examples/text/IMDb-BERT.ipynb
https://pypi.org/project/pytorch-pretrained-bert/
https://github.com/shudima/notebooks
https://github.com/huggingface/transformers
https://www.kaggle.com/code/donkeys/distilbert-xlnet-with-tf-and-huggingface/notebook
- [66] block: rename train_loss -> loss
- [4] block: rename the X-axis and Y-axis labels of the plot
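For the [4] block note, a minimal matplotlib sketch of relabeling a training-curve plot; the variable names and values below are placeholders, not the notebook's actual code.

```python
import matplotlib.pyplot as plt

loss = [0.62, 0.41, 0.33, 0.29]  # placeholder per-epoch loss values
plt.plot(range(1, len(loss) + 1), loss, label="loss")
plt.xlabel("epoch")  # renamed X-axis label
plt.ylabel("loss")   # renamed Y-axis label
plt.legend()
plt.show()
```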