nlp-paper-summary

This repo contains summaries of the following NLP papers.

  • BERT-vs-LSTM
  • TinyBERT
  • ALBERT
  • Poor Man's BERT
  • SpanBERT

BERT-vs-LSTM

This paper talks about:

Given a small dataset, can we use a large pre-trained model like BERT and get better results than with simpler models?

Check out the summary here

TinyBERT: Distilling BERT for Natural Language Understanding

This paper talks about:

BERT-based models are usually computationally expensive and memory intensive, so it is difficult to run them effectively on resource-constrained devices. How can the model size be reduced while keeping the performance drop to a minimum?

TinyBERT is empirically effective and achieves more than 96% of the performance of its teacher, BERT-base, on the GLUE benchmark, while being 7.5x smaller and 9.4x faster at inference.
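
As a rough illustration of the distillation idea (not TinyBERT's actual training code, which also distils embeddings, hidden states, and attention matrices), here is a minimal PyTorch sketch of matching a student's softened predictions to a teacher's:

```python
# Minimal sketch of prediction-layer distillation: the student is trained to
# match the teacher's softened output distribution via a KL-divergence loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence;
    # the t*t factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage with random logits for a batch of 4 examples and 3 classes.
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
loss = distillation_loss(student_logits, teacher_logits, temperature=2.0)
loss.backward()
```

In the paper, this prediction-layer loss is combined with layer-wise losses that match the student's attention matrices and hidden states to the teacher's.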

Check out the summary here

ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

This paper talks about:

Increasing model size when pretraining natural language representations often improves performance on downstream tasks. However, at some point further increases become harder due to GPU/TPU memory limitations and longer training times. How can these issues be handled?

ALBERT proposes two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. It also proposes a self-supervised loss that focuses on modeling inter-sentence coherence and shows that it consistently helps downstream tasks with multi-sentence inputs.
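
One of the two techniques is factorized embedding parameterization (the other is cross-layer parameter sharing). The back-of-the-envelope sketch below, using commonly cited BERT-base style sizes as an assumed example, shows how the factorization shrinks the embedding table:

```python
# Rough illustration (not the paper's code) of ALBERT's factorized embedding
# parameterization: instead of one V x H embedding table, use a smaller
# V x E table followed by an E x H projection. Sizes below are illustrative.
V, H, E = 30000, 768, 128           # vocab size, hidden size, embedding size

bert_style = V * H                  # one big embedding table
albert_style = V * E + E * H        # factorized: embed to E, then project to H

print(f"V x H embedding:        {bert_style:,} parameters")
print(f"V x E + E x H factored: {albert_style:,} parameters")
# ~23.0M vs ~3.9M embedding parameters in this example
```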

Check out the summary here

Poor Man's BERT: Smaller and Faster Transformer Models

This paper talks about:

NLP has recently been dominated by large-scale pre-trained Transformer models, where size does matter. Models such as BERT, XLNet, and RoBERTa are now out of reach for researchers and practitioners without large GPUs/TPUs. How can model size be reduced without pretraining from scratch?

There are many ways to reduce the size of pre-trained models. Some notable approaches are:

  • Prune parts of the network after training
  • Reduction through weight factorization and sharing (ALBERT)
  • Compression through knowledge distillation (DistilBERT, TinyBERT)
  • Quantization (Q-BERT)

This work falls under the class of pruning methods. The paper questions whether it is necessary to use all layers of a pre-trained model in downstream tasks and proposes straightforward strategies to drop some layers from the network.
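
As a minimal sketch (not the authors' code) of the simplest such strategy, top-layer dropping, here is how one might remove the top k encoder layers of a pre-trained BERT with the Hugging Face transformers library before fine-tuning; the attribute paths assume a standard BertForSequenceClassification:

```python
# Sketch of top-layer dropping: keep only the bottom layers of a pre-trained
# BERT, then fine-tune the truncated model on the downstream task as usual.
import torch.nn as nn
from transformers import BertForSequenceClassification

def drop_top_layers(model, k):
    """Remove the top k Transformer layers from a BERT classifier."""
    layers = model.bert.encoder.layer                      # ModuleList of encoder layers
    model.bert.encoder.layer = nn.ModuleList(layers[:-k])  # keep the bottom layers only
    model.config.num_hidden_layers = len(model.bert.encoder.layer)
    return model

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model = drop_top_layers(model, k=4)          # keep the bottom 8 of 12 layers
print(model.config.num_hidden_layers)        # 8; then fine-tune as usual
```

The paper also compares alternative strategies, such as dropping alternate, bottom, or middle layers, instead of the top ones.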

Check out the summary here

SpanBERT: Improving Pre-training by Representing and Predicting Spans

This paper talks about:

The pre-training objective plays an important role in learning language representations. BERT's pre-training objective has two parts: Masked Language Modelling (MLM) and Next Sentence Prediction (NSP). This paper proposes a new pre-training objective that can better encode sentences.
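
Below is a simplified sketch of the contiguous span masking that SpanBERT uses in place of BERT's token-level masking; the paper samples span lengths from a geometric distribution clipped at 10 and also adds a span-boundary objective, which is not shown here. The function is illustrative, not the paper's implementation:

```python
# Sketch of contiguous span masking: instead of masking individual tokens,
# mask whole spans whose lengths are drawn from a geometric distribution.
import random
import numpy as np

def mask_spans(tokens, mask_ratio=0.15, p=0.2, max_span=10, mask_token="[MASK]"):
    """Mask contiguous spans until roughly mask_ratio of the tokens are masked."""
    tokens = list(tokens)
    budget = max(1, int(mask_ratio * len(tokens)))
    masked = set()
    while len(masked) < budget:
        length = min(int(np.random.geometric(p)), max_span)  # span length ~ Geometric(p)
        start = random.randrange(len(tokens))
        for i in range(start, min(start + length, len(tokens))):
            masked.add(i)
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]

sentence = "an american football game was played in santa clara".split()
print(mask_spans(sentence))
```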

Check out the summary here
