# [Project] BiBERT

## Content

Introduction
<br/>Development Environment
<br/>BiBERT
<br/>Conclusion
<br/>Reference

## Introduction

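<br>Dataset: SST-2 (Stanford Sentiment Treebank, the binary sentiment task from GLUE)
<br><br>Task: Natural Language Processing (sentence-level sentiment classification)
<br><br>Method: Fully binarized (1-bit) quantization of weights, embeddings, and activations, trained with the Straight-Through Estimator (STE)
<br><br>Compression: bit packing with `numpy.packbits`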
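
As a rough illustration of the STE mentioned above, the sketch below binarizes a tensor with `sign` in the forward pass and passes the gradient through in the backward pass wherever `|x| <= 1`. The class name `SignSTE` and the clipping threshold are assumptions made for this example; this is a generic clipped STE, not necessarily the exact formulation used in the BiBERT repository.

```python
import torch


class SignSTE(torch.autograd.Function):
    """Hypothetical helper: sign() forward, clipped identity backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map every element to -1 or +1 (zero is mapped to +1).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # sign() has zero gradient almost everywhere, so the STE substitutes
        # the identity gradient inside [-1, 1] and zero outside (assumed clip).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


x = torch.tensor([-1.5, -0.2, 0.0, 0.7, 2.0], requires_grad=True)
y = SignSTE.apply(x)
y.sum().backward()

print(y.detach().tolist())  # [-1.0, -1.0, 1.0, 1.0, 1.0]
print(x.grad.tolist())      # [0.0, 1.0, 1.0, 1.0, 0.0]
```

The gradient is zero for `-1.5` and `2.0` because they fall outside the assumed clipping range; everything else receives the upstream gradient unchanged.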

<br>
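
To make the `numpy.packbits` compression step concrete, here is a minimal sketch under the assumption that binarized weights take values in {-1, +1} and are stored as one bit each (1 for +1, 0 for -1). The array shape and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)  # stand-in for a weight matrix

# Binarize: 1 encodes +1, 0 encodes -1.
bits = (w >= 0).astype(np.uint8)

# Pack 8 binary values into each uint8 along the last axis.
packed = np.packbits(bits, axis=-1)

# Unpack and map {0, 1} back to {-1, +1}; count trims any padding bits.
unpacked = np.unpackbits(packed, axis=-1, count=bits.shape[-1])
restored = unpacked.astype(np.float32) * 2 - 1

print(w.nbytes, packed.nbytes)  # 256 8
```

In this example the packed array is 32 times smaller than the float32 original, since each 32-bit value is reduced to a single bit.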

## Development Environment

In [None]:
%pip install torch==1.13.1  # this wheel reports CUDA 11.7 (see torch.version.cuda below)
%pip install scipy
%pip install seaborn
%pip install openpyxl
%pip install matplotlib
%pip install tensorboard
%pip install scikit-learn
%pip install setuptools==59.5.0
%pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com pytorch-quantization

In [1]:
import os
import glob
import torch
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

## BiBERT

In [3]:
pwd

'/workspace/deep_learning_bibert'

In [2]:
import torch
print(torch.__version__)
print(torch.version.cuda)

1.13.1+cu117
11.7


In [None]:
!/usr/local/bin/python3.6 -m pip install ipywidgets
!jupyter nbextension enable --py --sys-prefix widgetsnbextension

[Quantize Transformers models](https://huggingface.co/docs/transformers/main_classes/quantization)
<br/>[NF4 (4-bit NormalFloat) and bitsandbytes](https://www.tensorops.ai/post/what-are-quantized-llms)

In [None]:
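import torch
from pytorch_quantization.tensor_quant import QuantDescriptor
from pytorch_quantization.nn.modules.tensor_quantizer import TensorQuantizer

# 1-bit, unsigned, per-channel along axis 0; fake_quant=False requests real (not simulated) quantized values
quant_desc = QuantDescriptor(num_bits=1, fake_quant=False, axis=0, unsigned=True)
quantizer = TensorQuantizer(quant_desc)

torch.manual_seed(12345)
x = torch.rand(10, 9, 8, 7)  # uniform in [0, 1), so non-negative as unsigned quantization expects

# The quantizer is unsigned; int8 is only the (signed) storage dtype chosen here
quant_x = quantizer(x).to(dtype=torch.int8)
print(quant_x.dtype)  # the dtype requested in the cast above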

In [16]:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # order GPUs by PCI bus ID so indices match nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only GPU 1 to the training process

!python quant_task_glue.py \
    --data_dir 'data' \
    --model_dir 'models/bert-base-uncased' \
    --task_name 'sst-2' \
    --output_dir 'output' \
    --log_dir 'log/bibert/sst2' \
    --learning_rate 2e-4 \
    --num_train_epochs 2 \
    --batch_size 128 \
    --seed 42 \
    --weight_bits 1 \
    --embedding_bits 1 \
    --input_bits 1 \
    --do_evaluation False

10/10 08:05:26 AM Model config {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.8.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

10/10 08:05:27 AM Loading model models/bert-base-uncased/sst-2/pytorch_model.bin
10/10 08:05:27 AM loading model...
10/10 08:05:27 AM done!
10/10 08:05:27 AM Weights of BertForSequenceClassification not initialized from pretrained model: ['bert.embeddings.word_embeddings.weight', 'bert.embeddings.position_embeddings.weight', 'bert.embeddings.token_type_embeddings.wei
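
The dimensions in the config log above are enough to estimate how much storage full binarization could save. The sketch below counts the parameters of the BERT encoder (embeddings, 12 transformer layers, pooler; the classification head is left out) and compares float32 storage with an idealized 1-bit-per-parameter layout. The function name is hypothetical, and the 1-bit figure is an upper bound on the savings: in practice some parameters, such as LayerNorm weights and biases, would likely stay in higher precision.

```python
# Dimensions copied from the "Model config" log above.
cfg = {
    "vocab_size": 30522,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "hidden_size": 768,
    "intermediate_size": 3072,
    "num_hidden_layers": 12,
}


def bert_encoder_params(c):
    """Count weights and biases of a standard BERT encoder (hypothetical helper)."""
    h, i = c["hidden_size"], c["intermediate_size"]
    layer_norm = 2 * h  # scale and shift vectors
    embeddings = (
        c["vocab_size"] + c["max_position_embeddings"] + c["type_vocab_size"]
    ) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V, output projections
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    pooler = h * h + h
    return embeddings + c["num_hidden_layers"] * (attention + feed_forward) + pooler


n_params = bert_encoder_params(cfg)
fp32_mib = n_params * 4 / 2**20    # 4 bytes per parameter
one_bit_mib = n_params / 8 / 2**20  # 1 bit per parameter (idealized)

print(f"{n_params:,} parameters")  # 109,482,240 parameters
print(f"fp32: {fp32_mib:.1f} MiB, 1-bit (idealized): {one_bit_mib:.1f} MiB")
```

The ratio between the two sizes is exactly 32, which is the most a 1-bit representation can save relative to float32.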

## Reference

**Paper**
<br/>[Qin et al. BiBERT: Accurate Fully Binarized BERT, ICLR, 2022](https://arxiv.org/abs/2203.06390)

<br/>**GitHub**
<br/>[htqin/BiBERT](https://github.com/htqin/BiBERT)
<br/>[Zhen-Dong/BitPack](https://github.com/Zhen-Dong/BitPack)