## Neural Network Compression and Acceleration Experiments on BERT and TinyLlama-1.1B

TODO:
1) If you have time, create better graphs or tables for the outputs.

This notebook contains the aggregated experimental results for our network compression and acceleration methods: fine-tuning and distilling BERT, and quantizing TinyLlama-1.1B with GPTQ.

In [8]:
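import importlib
import sys
import warnings

sys.path.append("..")  # make the CompressionMethods package importable

# Reload the distillation module only if it is already loaded (i.e. on a re-run),
# so that a fresh kernel does not raise KeyError on sys.modules[...].
if 'CompressionMethods.distillation' in sys.modules:
    importlib.reload(sys.modules['CompressionMethods.distillation'])
#importlib.reload(sys.modules['CompressionMethods.utils'])

from datasets import load_dataset

from CompressionMethods.BERTFineTuning import BERTFineTuning
from CompressionMethods.distillation import DistillationModule
from CompressionMethods.GPTQQuantizer import GPTQQuantizer
from CompressionMethods.utils import utils

warnings.filterwarnings('ignore')  # silence library warnings in cell output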

### Fine-tuning BERT for Multilabel Classification

In [32]:
bft = BERTFineTuning("bert-base-uncased")
bft.get_device()
finetuned_bert = bft.finetune()

Using device:  cuda


Map: 100%|██████████| 886/886 [00:00<00:00, 10877.59 examples/s]
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initializ

| Epoch | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
|---|---|---|---|---|---|
| 1 | 0.4035 | 0.31914 | 0.675318 | 0.77259 | 0.288939 |
| 2 | 0.2817 | 0.309701 | 0.704274 | 0.798457 | 0.28781 |
| 3 | 0.24 | 0.310101 | 0.706287 | 0.801536 | 0.276524 |
| 4 | 0.2163 | 0.310814 | 0.708271 | 0.801399 | 0.274266 |
| 5 | 0.1901 | 0.314582 | 0.709167 | 0.802292 | 0.276524 |


Training time: 403.6719915866852
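
The accuracy column stays near 0.28 while F1 reaches about 0.70. One plausible reading, which the notebook does not confirm, is that accuracy here is exact-match (subset) accuracy and F1 is micro-averaged over labels. The sketch below is a minimal pure-Python illustration of how the two metrics can diverge under that assumption; the function names and toy labels are made up for illustration.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (example, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Exact-match accuracy: an example counts only if all labels match."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


# Toy multilabel data (hypothetical): 4 examples, 3 labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

f1 = micro_f1(y_true, y_pred)
acc = subset_accuracy(y_true, y_pred)
print(f"micro F1 = {f1:.3f}, subset accuracy = {acc:.3f}")
```

Most individual label decisions are right in the toy data, so micro F1 is high, yet only one of the four examples has every label correct, so exact-match accuracy is low.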


In [4]:
dataset = load_dataset("sem_eval_2018_task_1", "subtask5.english")
utils_bert_finetuned = utils('./bert-finetuned', dataset = dataset)
utils_bert_finetuned.get_model_size()
utils_bert_finetuned.evaluate_bert_model()

Model size: 417.682MB
count    886.000000
mean       0.013172
std        0.018991
min        0.012010
25%        0.012294
50%        0.012448
75%        0.012603
max        0.577584
dtype: float64
{'f1': 0.6886717718510694, 'roc_auc': 0.7804860661385984, 'accuracy': 0.2742663656884876}
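
The statistics block above has the shape of a pandas `describe()` summary over 886 per-example timings, presumably in seconds. The maximum (0.577584) is far above the 75th percentile (0.012603), which would be consistent with a one-off warm-up cost on the first call, although the notebook does not say so. The sketch below shows one way to time calls with a warm-up phase and report the median alongside the mean. `fake_model`, `time_calls`, and `summarize` are hypothetical names, not part of `CompressionMethods.utils`.

```python
import statistics
import time


def time_calls(fn, inputs, warmup=3):
    """Return per-call wall-clock latencies, after a few untimed warm-up calls."""
    for x in inputs[:warmup]:
        fn(x)  # warm-up: results and timings are discarded
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Summary that is less sensitive to a single slow call than the mean alone."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def fake_model(x):
    """Hypothetical stand-in for a model forward pass."""
    return sum(i * i for i in range(1000))


latencies = time_calls(fake_model, list(range(50)))
summary = summarize(latencies)
print(summary)
```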


### BERT - Distillation

In [9]:
dm = DistillationModule()
distilled_model = dm.perform_distillation(
    teacher_model_id = './bert-finetuned',  # fine-tuned BERT saved by the cell above
    student_model_id = 'distilbert/distilbert-base-uncased',
    dataset = dataset,
    num_labels = 11,
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | F1 | ROC AUC | Accuracy |
|---|---|---|---|---|---|
| 1 | 1.238 | 0.427604 | 0.688204 | 0.7911 | 0.264108 |
| 2 | 0.4795 | 0.332423 | 0.694124 | 0.790821 | 0.282167 |
| 3 | 0.3298 | 0.305789 | 0.69799 | 0.793034 | 0.278781 |
| 4 | 0.2629 | 0.29443 | 0.697372 | 0.793234 | 0.275395 |
| 5 | 0.2191 | 0.292132 | 0.697326 | 0.793433 | 0.27991 |


Training time:  445.8245093822479
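
The internals of `DistillationModule` are not shown in this notebook. For orientation only, the sketch below illustrates the generic soft-target formulation of knowledge distillation: a temperature-softened KL divergence between teacher and student outputs, blended with the ordinary task loss. It is not the module's actual implementation. `alpha` and `temperature` are hypothetical hyperparameters, and a multilabel setup like this one may well use a sigmoid-based variant instead of softmax.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return kl * temperature ** 2


def distillation_loss(hard_loss, student_logits, teacher_logits,
                      alpha=0.5, temperature=2.0):
    """Blend the task loss on true labels with the soft-target loss."""
    soft = soft_target_loss(student_logits, teacher_logits, temperature)
    return alpha * hard_loss + (1 - alpha) * soft


# Hypothetical logits for one example.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.5]
print(distillation_loss(hard_loss=0.4, student_logits=student, teacher_logits=teacher))
```

The soft-target term is zero when the student reproduces the teacher's logits exactly and grows as the two distributions drift apart.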


In [10]:
dataset = load_dataset("sem_eval_2018_task_1", "subtask5.english")
utils_bert_dist = utils('./bert-distilled', dataset = dataset)
utils_bert_dist.get_model_size()
utils_bert_dist.evaluate_bert_model()

Model size: 255.443MB
count    886.000000
mean       0.006462
std        0.000528
min        0.006187
25%        0.006307
50%        0.006387
75%        0.006469
max        0.014305
dtype: float64
{'f1': 0.6690159934941718, 'roc_auc': 0.766135608865375, 'accuracy': 0.2618510158013544}


### TinyLlama-1.1B - GPTQ

In [3]:
llm = "heegyu/TinyLlama-augesc-context"
gptq_quantizer = GPTQQuantizer(llm)
dataset = load_dataset("heegyu/augesc")
x, y, label_map = utils(llm).process_tinyllama_dataset(dataset)
llm_model_gptq = gptq_quantizer.quantize(x, llm)  # reuse the model id defined above

Repo card metadata block was not found. Setting CardData to empty.
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at heegyu/TinyLlama-augesc-context and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Quantizing model.layers blocks : 100%|██████████| 22/22 [09:30<00:00, 25.91s/it]
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
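
GPTQ quantizes the model block by block (the progress bar above covers 22 blocks) and uses calibration data, here `x`, to compensate for the rounding error it introduces. The sketch below is deliberately simpler: plain round-to-nearest quantization of a weight list, not GPTQ. It only illustrates the idea of mapping floats onto a small signed integer grid with a shared scale. The 4-bit width is an assumption, since the notebook does not state the bit-width used, and the function names are hypothetical.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization with one scale per weight group."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed integers
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point and clamp to the representable range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical weights from a single group.
weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88]
q, scale = quantize_rtn(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

With round-to-nearest, the per-weight error is bounded by half the scale. GPTQ aims to do better than this baseline by adjusting the not-yet-quantized weights as it goes.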


In [20]:
print("base model size:")
utils(llm).get_model_size()
gptq_quantized_model = gptq_quantizer.load_model()  # required for every GPTQ model: load the quantized checkpoint before using it

base model size:
Model size: 3968.417MB


Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at ./gptq-quantized-model and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model size: 618.512MB


In [6]:
dataset = load_dataset("heegyu/augesc")
utils_llm_gptq = utils('./gptq-quantized-model', dataset = dataset)
utils_llm_gptq.get_model_size()
utils_llm_gptq.evaluate_llm_model()

Repo card metadata block was not found. Setting CardData to empty.
`low_cpu_mem_usage` was None, now set to True since model is quantized.
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at ./gptq-quantized-model and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model size: 618.512MB
0.032
count    1000.000000
mean        0.058827
std         0.020093
min         0.054847
25%         0.056873
50%         0.057546
75%         0.058401
max         0.680813
dtype: float64
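
Pulling together the numbers reported in the cell outputs of this notebook, the sketch below computes size ratios, the BERT latency speedup, and the BERT F1 drop, and prints them as a small table. All inputs are copied from the outputs above. The latency unit is assumed to be seconds, and no speedup is computed for TinyLlama because no latency is reported for the unquantized model.

```python
# Values copied from the cell outputs in this notebook.
results = {
    "bert-finetuned": {"size_mb": 417.682, "mean_latency": 0.013172, "f1": 0.6886717718510694},
    "bert-distilled": {"size_mb": 255.443, "mean_latency": 0.006462, "f1": 0.6690159934941718},
    "tinyllama-base": {"size_mb": 3968.417},
    "tinyllama-gptq": {"size_mb": 618.512, "mean_latency": 0.058827},
}


def ratio(baseline, compressed):
    """How many times larger (or slower) the baseline is than the compressed model."""
    return baseline / compressed


bert, distil = results["bert-finetuned"], results["bert-distilled"]
base, gptq = results["tinyllama-base"], results["tinyllama-gptq"]

bert_size_ratio = ratio(bert["size_mb"], distil["size_mb"])
bert_speedup = ratio(bert["mean_latency"], distil["mean_latency"])
bert_f1_drop = bert["f1"] - distil["f1"]
llama_size_ratio = ratio(base["size_mb"], gptq["size_mb"])

print(f"{'Comparison':<32}{'Size ratio':>12}{'Speedup':>10}{'F1 drop':>10}")
print(f"{'BERT -> DistilBERT':<32}{bert_size_ratio:>12.2f}{bert_speedup:>10.2f}{bert_f1_drop:>10.4f}")
print(f"{'TinyLlama -> GPTQ':<32}{llama_size_ratio:>12.2f}{'n/a':>10}{'n/a':>10}")
```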
