# `transformers` meets `bitsandbytes` for democratizing Large Language Models (LLMs) through 4-bit quantization

<center>
<img src="https://github.com/huggingface/blog/blob/main/assets/96_hf_bitsandbytes_integration/Thumbnail_blue.png?raw=true" alt="drawing" width="700" class="center"/>
</center>

Welcome to this notebook, which walks through the recent `bitsandbytes` integration. The integration includes the work from XXX, which introduces 4-bit quantization techniques with no performance degradation, with the goal of democratizing LLM inference and training.

In this notebook, we will learn together how to load a large model (`gpt-neo-x-20b`) in 4-bit and train it using Google Colab and the PEFT library from Hugging Face 🤗.
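
To make the motivation for 4-bit loading concrete, here is a rough back-of-the-envelope sketch of weight storage at different precisions. It assumes about 20 billion parameters (inferred from the model name) and counts weights only; quantization constants, activations, and the KV cache are ignored, so real usage will be higher. The helper `weight_memory_gb` is a hypothetical name introduced for this illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: roughly 20 billion parameters, as the model name suggests.
NUM_PARAMS = 20e9

for label, bits in [("float32", 32), ("float16 / bfloat16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>20}: {weight_memory_gb(NUM_PARAMS, bits):6.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the space of the 16-bit weights, which is what brings a model of this size within reach of a single Colab GPU.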

[In the general usage notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf?usp=sharing), you can learn how to properly load a model in 4-bit with all its variants.

If you liked the previous work on integrating [*LLM.int8*](https://arxiv.org/abs/2208.07339), you can have a look at the [introduction blogpost](https://huggingface.co/blog/hf-bitsandbytes-integration) to learn more about that quantization method.


In [1]:

!git clone https://github.com/tloen/alpaca-lora.git

Cloning into 'alpaca-lora'...
remote: Enumerating objects: 607, done.
remote: Total 607 (delta 0), reused 0 (delta 0), pack-reused 607
Receiving objects: 100% (607/607), 27.84 MiB | 14.17 MiB/s, done.
Resolving deltas: 100% (357/357), done.


In [2]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
# `ast` is part of the Python standard library, so it needs no pip install.

In [3]:
# import locale
# locale.getpreferredencoding = lambda: "UTF-8"
!pip install -q jsonlines


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) Y
Token is valid (permission: read).
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 's

In [None]:
### Uncomment the following line and log in with your token if a later cell raises an error when loading the model.
# !huggingface-cli login
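
As an alternative to pasting a token into the notebook, the token can be read from an environment variable. The sketch below is only an illustration: `resolve_hf_token` is a hypothetical helper, and it assumes the token was exported as `HF_TOKEN` before the notebook started.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN"):
    """Return the token stored in `env_var`, or None if it is unset or empty."""
    value = os.environ.get(env_var, "").strip()
    return value or None


token = resolve_hf_token()
print("token found" if token else "no token set; gated models may fail to download")
```

The returned value can then be passed wherever the notebook uses the `"YOUR_TOKEN_HERE"` placeholder, which keeps the secret out of the saved notebook.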

In [25]:
################################
### MULTIPLE-CHOICE QUESTIONS
################################
import time

import pandas as pd
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
)

df = pd.read_csv("raw_data.csv")

token = "YOUR_TOKEN_HERE"

### remove quantization_config=bnb_config when downloading the model if the 4-bit quantisation is not needed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Greedy decoding: temperature has no effect when sampling is disabled.
generation_config = GenerationConfig(do_sample=False)

for model_id in ["meta-llama/Llama-2-7b-chat-hf", "meta-llama/Llama-2-13b-chat-hf", "meta-llama/Llama-2-7b-hf", "meta-llama/Llama-2-13b-hf"]:
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=token)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map={"": 0}, use_auth_token=token
    )

    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

    ### Running the Subject System
    output_lis = []
    confidence_lis = []
    for index, row in df.iterrows():
        PROMPT = row["input"]

        inputs = tokenizer(PROMPT, return_tensors="pt")
        input_ids = inputs["input_ids"].cuda()

        print("Generating...")
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=1,
        )

        # Decode only the newly generated token (drop the prompt tokens first).
        string = ""
        for s in generation_output.sequences:
            string += tokenizer.decode(s[len(input_ids[0]):])
        print("\n\n\n\n", row["input"])
        print(row["correct_option"], "  ", string)
        output_lis.append(string)

        # Softmax over the vocabulary scores of the single generated position.
        prob_arr = torch.softmax(generation_output.scores[0][0].float(), dim=0)

        # Look up the probability assigned to each option letter.
        num_options = len(row["options"].split(":")) - 1
        options = list("ABCDEFGHIJKLM")[:num_options]
        option_ids = [tokenizer.encode(option, add_special_tokens=False)[0] for option in options]
        option_probs = prob_arr[option_ids]

        option_prob_dict = {option: float(prob) for option, prob in zip(options, option_probs)}
        print(f"Option probabilities: {option_prob_dict}")
        confidence_lis.append(str(option_prob_dict))

    df["output"] = output_lis
    df["confidence"] = confidence_lis
    # model_id contains "/", which would otherwise be treated as a directory separator.
    df.to_csv(f"{model_id.replace('/', '_')}_SS_data.csv")

# Keep the Colab session alive until the cell is interrupted manually.
while True:
    print("This program is continuously running")
    time.sleep(5)  # sleep for 5 seconds



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



Generating...




 Question: Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q.
A. 0
B. 4
C. 2
D. 6
Answer:
B    B
Option probabilities: {'A': 0.007902931421995163, 'B': 0.618074357509613, 'C': 0.28297504782676697, 'D': 0.014997129328548908}
Generating...




 Question: Let p = (1, 2, 5, 4)(2, 3) in S_5 . Find the index of <p> in S_5.
A. 8
B. 2
C. 24
D. 120
Answer:
C    B




Option probabilities: {'A': 0.02784772962331772, 'B': 0.8528864979743958, 'C': 0.04520132765173912, 'D': 0.002433307468891144}
Generating...




 Question: Find all zeros in the indicated finite field of the given polynomial with coefficients in that field. x^5 + 3x^3 + x^2 + 2x in Z_5
A. 0
B. 1
C. 0,1
D. 0,4
Answer:
D    B




Option probabilities: {'A': 0.034796882420778275, 'B': 0.3982292413711548, 'C': 0.3683018386363983, 'D': 0.05560526251792908}
Generating...




 Question: Statement 1 | A factor group of a non-Abelian group is non-Abelian. Statement 2 | If K is a normal subgroup of H and H is a normal subgroup of G, then K is a normal subgroup of G.
A. True, True
B. False, False
C. True, False
D. False, True
Answer:
B    B




Option probabilities: {'A': 0.10681180655956268, 'B': 0.6965004205703735, 'C': 0.10681180655956268, 'D': 0.05717223510146141}
Generating...




 Question: Find the product of the given polynomials in the given polynomial ring. f(x) = 4x - 5, g(x) = 2x^2 - 4x + 2 in Z_8[x].
A. 2x^2 + 5
B. 6x^2 + 4x + 6
C. 0
D. x^2 + 1
Answer:
B    





Option probabilities: {'A': 0.10683365166187286, 'B': 0.14376039803028107, 'C': 0.0431646965444088, 'D': 0.01337179820984602}
Generating...




 Question: Statement 1 | If a group has an element of order 15 it must have at least 8 elements of order 15. Statement 2 | If a group has more than 8 elements of order 15, it must have at least 16 elements of order 15.
A. True, True
B. False, False
C. True, False
D. False, True
Answer:
A    B




Option probabilities: {'A': 0.0860075131058693, 'B': 0.5186917185783386, 'C': 0.27763569355010986, 'D': 0.06803754717111588}
Generating...




 Question: Statement 1 | Every homomorphic image of a group G is isomorphic to a factor group of G. Statement 2 | The homomorphic images of a group G are the same (up to isomorphism) as the factor groups of G.
A. True, True
B. False, False
C. True, False
D. False, True
Answer:
A    B




Option probabilities: {'A': 0.15488488972187042, 'B': 0.6320270895957947, 'C': 0.09692449867725372, 'D': 0.08964050561189651}
Generating...




 Question: Statement 1 | A ring homomorphism is one to one if and only if the kernel is {0}. Statement 2 | Q is an ideal in R.
A. True, True
B. False, False
C. True, False
D. False, True
Answer:
D    B




Option probabilities: {'A': 0.2778986096382141, 'B': 0.39189982414245605, 'C': 0.1608353555202484, 'D': 0.03941440209746361}
Generating...




 Question: Find the degree for the given field extension Q(sqrt(2) + sqrt(3)) over Q.
A. 0
B. 4
C. 2
D. 6
Answer:
B    B




Option probabilities: {'A': 0.009834702126681805, 'B': 0.7225540280342102, 'C': 0.1944727748632431, 'D': 0.008544538170099258}
Generating...




 Question: Find all zeros in the indicated finite field of the given polynomial with coefficients in that field. x^3 + 2x + 2 in Z_7
A. 1
B. 2
C. 2,3
D. 6
Answer:
C    B




Option probabilities: {'A': 0.030413147062063217, 'B': 0.4331676959991455, 'C': 0.33735135197639465, 'D': 0.018160520121455193}
Generating...




 Question: Statement 1 | If H is a subgroup of G and a belongs to G then |aH| = |Ha|. Statement 2 | If H is a subgroup of G and a and b belong to G, then aH and Hb are identical or disjoint.
A. True, True
B. False, False
C. True, False
D. False, True
Answer:
C    B




Option probabilities: {'A': 0.1264345496892929, 'B': 0.6521990299224854, 'C': 0.10814519226551056, 'D': 0.04874486103653908}
Generating...




 Question: If A = {1, 2, 3} then relation S = {(1, 1), (2, 2)} is
A. symmetric only
B. anti-symmetric only
C. both symmetric and anti-symmetric
D. an equivalence relation
Answer:
C    C




Option probabilities: {'A': 0.034421131014823914, 'B': 0.10116933286190033, 'C': 0.8210185170173645, 'D': 0.01900915428996086}
Generating...




 Question: Find the order of the factor group (Z_11 x Z_15)/(<1, 1>)
A. 1
B. 2
C. 5
D. 11
Answer:
A    B




Option probabilities: {'A': 0.0637996569275856, 'B': 0.361449658870697, 'C': 0.33428627252578735, 'D': 0.04184083640575409}
Generating...




 Question: The polynomial x^3 + 2x^2 + 2x + 1 can be factored into linear factors in Z_7[x]. Find this factorization.
A. (x − 2)(x + 2)(x − 1)
B. (x + 1)(x + 4)(x − 2)
C. (x + 1)(x − 4)(x − 2)
D. (x - 1)(x − 4)(x − 2)
Answer:
C    A




Option probabilities: {'A': 0.4839468002319336, 'B': 0.2510682940483093, 'C': 0.14759543538093567, 'D': 0.007822271436452866}
Generating...




 Question: Find the maximum possible order for an element of S_n for n = 10.
A. 6
B. 12
C. 30
D. 105
Answer:
C    B




Option probabilities: {'A': 0.032576750963926315, 'B': 0.5865300297737122, 'C': 0.17068995535373688, 'D': 0.12294337153434753}
Generating...




 Question: Statement 1 | R is a splitting field of some polynomial over Q. Statement 2 | There is a field with 60 elements.
A. True, True
B. False, False
C. True, False
D. False, True
Answer:
B    B




Option probabilities: {'A': 0.15513959527015686, 'B': 0.5764135122299194, 'C': 0.15513959527015686, 'D': 0.0411079004406929}
Generating...




 Question: The inverse of -i in the multiplicative group, {1, -1, i , -i} is
A. 1
B. -1
C. i
D. -i
Answer:
C    B




Option probabilities: {'A': 0.05086496099829674, 'B': 0.27930358052253723, 'C': 0.22094731032848358, 'D': 0.14950044453144073}
Generating...




 Question: Compute the product in the given ring. (2,3)(3,5) in Z_5 x Z_9
A. (1,1)
B. (3,1)
C. (1,6)
D. (3,6)
Answer:
C    B




Option probabilities: {'A': 0.02865668199956417, 'B': 0.3774777352809906, 'C': 0.03737535700201988, 'D': 0.016328083351254463}
Generating...




 Question: The set of all real numbers under the usual multiplication operation is not a group since
A. multiplication is not a binary operation
B. multiplication is not associative
C. identity element does not exist
D. zero has no inverse
Answer:
D    C




Option probabilities: {'A': 0.003291604109108448, 'B': 0.038866203278303146, 'C': 0.927043080329895, 'D': 0.007534556556493044}
Generating...




 Question: Statement 1| Every group of order p^2 where p is prime is Abelian. Statement 2 | For a fixed prime p a Sylow p-subgroup of a group G is a normal subgroup of G if and only if it is the only Sylow p-subgroup of G.
A. True, True
B. False, False
C. True, False
D. False, True
Answer:
A    B




Option probabilities: {'A': 0.159196138381958, 'B': 0.6598496437072754, 'C': 0.07403308153152466, 'D': 0.057657018303871155}
Generating...




KeyboardInterrupt: ignored
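
The option probabilities printed above do not sum to 1, because the remaining probability mass falls on tokens other than the option letters. The sketch below shows one way the saved `confidence` strings could be post-processed: parse them with the standard-library `ast.literal_eval`, renormalize over the options, and score predictions against the correct answers. The helpers `renormalize` and `accuracy` are hypothetical names for this illustration, and the sample values are copied from the first few questions in the output above.

```python
import ast


def renormalize(option_probs: dict) -> dict:
    """Rescale option probabilities so that they sum to 1."""
    total = sum(option_probs.values())
    if total == 0:
        raise ValueError("all option probabilities are zero")
    return {option: prob / total for option, prob in option_probs.items()}


def accuracy(pairs) -> float:
    """Fraction of (correct, predicted) pairs that match after stripping whitespace."""
    pairs = list(pairs)
    return sum(c.strip() == p.strip() for c, p in pairs) / len(pairs)


# Confidence string as stored in the CSV (first question in the output above).
confidence_str = (
    "{'A': 0.007902931421995163, 'B': 0.618074357509613, "
    "'C': 0.28297504782676697, 'D': 0.014997129328548908}"
)
normalized = renormalize(ast.literal_eval(confidence_str))
prediction = max(normalized, key=normalized.get)
print(prediction, normalized[prediction])

# (correct_option, model output) for the first four questions above.
print(accuracy([("B", "B"), ("C", "B"), ("D", "B"), ("B", "B")]))
```

Renormalizing makes confidences comparable across questions, although whether to do so depends on how the probability mass outside the option letters should be treated.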

In [None]:
import time

while True:
    print("This program is continuously running")
    time.sleep(5)  # sleep for 5 seconds

This program is continuously running

KeyboardInterrupt: ignored