Efficient Pretrained Language Models

After the advent of BERT, many pretrained language models were introduced, and most of them scaled up model size to achieve better performance. While large-scale pretrained models do deliver strong results, they are difficult to use in typical computing environments. To address this, a line of work has emerged on efficient pretrained models that retain competitive performance at a fraction of the cost.

There are broadly two ways to improve model efficiency: reducing the number of model parameters, or making the attention mechanism cheaper. This project compares representative models of each approach against the baseline, BERT, and assesses the efficiency gains on real tasks. The lightweight-focused models are ALBERT, DistilBERT, and MobileBERT; the attention-focused models are Reformer, Longformer, and BigBird.


Model Descriptions


Lightweight-Focused Models

  • ALBERT (A Lite BERT)
    Reduces parameters through factorized embedding parameterization and cross-layer parameter sharing.

  • DistilBERT
    A smaller BERT trained with knowledge distillation, retaining most of BERT's performance with roughly 40% fewer parameters.

  • MobileBERT
    A thin-and-deep BERT variant that uses bottleneck structures and knowledge transfer from a larger teacher model.



Attention-Focused Models

  • Reformer
    Replaces full attention with locality-sensitive hashing (LSH) attention and uses reversible residual layers to cut memory usage.

  • Longformer
    Combines sliding-window local attention with a small number of global attention tokens, scaling linearly with sequence length.

  • BigBird
    Uses block-sparse attention mixing random, windowed, and global attention while keeping linear complexity.
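
As a quick orientation, the sketch below loads each compared model from a public Hugging Face checkpoint and prints its parameter count. The checkpoint names are assumptions for illustration; this repository may use different ones.

```python
# Illustrative sketch, not part of this repo: load each compared model from a
# public Hugging Face checkpoint (assumed names) and count its parameters.
from transformers import AutoModel

CHECKPOINTS = {
    "BERT": "bert-base-uncased",
    "ALBERT": "albert-base-v2",
    "DistilBERT": "distilbert-base-uncased",
    "MobileBERT": "google/mobilebert-uncased",
    "Reformer": "google/reformer-enwik8",
    "Longformer": "allenai/longformer-base-4096",
    "BigBird": "google/bigbird-roberta-base",
}

for name, ckpt in CHECKPOINTS.items():
    model = AutoModel.from_pretrained(ckpt)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name:>10}: {n_params:,} parameters")
```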



Model Specs

Lightweight-Focused Models

Model        Params         Size         Size Ratio (vs. BERT)
BERT         109,482,240    417.649 MB   100%
ALBERT       11,683,584     44.577 MB    10.67%
DistilBERT   66,362,880     253.158 MB   60.62%
MobileBERT   24,581,888     93.776 MB    22.45%
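
The Size column follows almost directly from the parameter count under an fp32 assumption (4 bytes per parameter); a minimal check:

```python
# Minimal sketch (assumption: fp32 weights, 4 bytes each) reproducing the
# Size column from the parameter counts: size_MB = params * 4 / 2**20.
# Tiny remaining differences for some models likely come from non-parameter
# buffers counted in the table's sizes.
def size_mb(n_params: int) -> float:
    return n_params * 4 / 2**20

for name, n in [("BERT", 109_482_240), ("ALBERT", 11_683_584),
                ("DistilBERT", 66_362_880), ("MobileBERT", 24_581_888)]:
    print(f"{name}: {size_mb(n):.3f} MB")  # BERT -> 417.649 MB
```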



Attention-Focused Models

Model        Params         Size         Attention Type
BERT         109,482,240    417.649 MB   Full (quadratic) attention
Reformer     148,654,080    567.070 MB   LSH (sparse) attention
Longformer   148,659,456    567.091 MB   Sliding-window + global attention
BigBird      127,468,800    486.317 MB   Block-sparse attention
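
To make the difference concrete, the toy sketch below (not from this repo) builds the kind of sliding-window mask that local attention patterns such as Longformer's restrict themselves to; the window size and sequence length are arbitrary.

```python
# Toy sketch, not from this repo: a sliding-window attention mask of the kind
# used by local attention (e.g., Longformer). True = attention allowed.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)   # query positions (rows)
    j = torch.arange(seq_len).unsqueeze(0)   # key positions (columns)
    return (i - j).abs() <= window // 2

print(sliding_window_mask(8, 4).int())
# Each token attends to itself and 2 neighbors on each side, so attention cost
# grows linearly with sequence length instead of quadratically.
```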



Results

Lightweight-Focused Models

Metric                     BERT   ALBERT   DistilBERT   MobileBERT
CoLA Accuracy              -      -        -            -
Training Speed per Batch   -      -        -            -



Attention-Focused Models

Metric                     BERT   Reformer   Longformer   BigBird
IMDB Accuracy              -      -          -            -
Training Speed per Batch   -      -          -            -
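
The "Training Speed per Batch" rows can be measured as mean wall-clock seconds per optimizer step. The sketch below shows one way to do this; the dummy data, the DistilBERT checkpoint, and the hyperparameters are illustrative assumptions, not this repo's actual measurement code.

```python
# Illustrative sketch (dummy data; not this repo's measurement code) for
# timing "Training Speed per Batch" as mean seconds per optimizer step.
import time
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
batch = {
    "input_ids": torch.randint(0, model.config.vocab_size, (8, 128)),
    "attention_mask": torch.ones(8, 128, dtype=torch.long),
    "labels": torch.randint(0, 2, (8,)),
}

model.train()
times = []
for _ in range(5):
    start = time.perf_counter()
    loss = model(**batch).loss   # forward pass with classification loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    times.append(time.perf_counter() - start)

print(f"{sum(times[1:]) / len(times[1:]):.3f} s/batch")  # drop warm-up step
```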

How to Use

python3 run.py -mode [lightweight | attention]

The -mode argument selects which model group to train and evaluate: lightweight covers ALBERT, DistilBERT, and MobileBERT; attention covers Reformer, Longformer, and BigBird.
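
A hypothetical sketch of how run.py might parse this flag with argparse; the actual script's options may differ.

```python
# Hypothetical sketch of run.py's argument parsing; the real script may differ.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-mode", choices=["lightweight", "attention"], required=True,
                    help="which model group to train and evaluate")
args = parser.parse_args()
print(f"Running the {args.mode} model group")
```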



References

  • Devlin et al., 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  • Lan et al., 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
  • Sanh et al., 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.
  • Sun et al., 2020. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices.
  • Kitaev et al., 2020. Reformer: The Efficient Transformer.
  • Beltagy et al., 2020. Longformer: The Long-Document Transformer.
  • Zaheer et al., 2020. Big Bird: Transformers for Longer Sequences.

