**Installing the Dependencies**

In [None]:
%pip install -r requirements.txt



In [None]:
import json
import torch
from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          pipeline)

**Hugging Face Account Configuration**

In [None]:
# Read the Hugging Face access token from a local config file.
with open("config.json") as f:
    config_data = json.load(f)
HF_TOKEN = config_data["HF_TOKEN"]
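
A slightly more defensive way to obtain the token can be sketched as follows: check an environment variable first, then fall back to `config.json`. The helper name `load_hf_token` and its parameters are hypothetical and only illustrate the idea; they are not part of any library.

```python
import json
import os


def load_hf_token(config_path="config.json", env_var="HF_TOKEN"):
    """Return the token from the environment, else from a JSON config file.

    Hypothetical helper for illustration; the notebook itself reads
    config.json directly.
    """
    # Prefer an environment variable so the token need not live on disk.
    token = os.environ.get(env_var)
    if token:
        return token

    # Fall back to the JSON file used elsewhere in this notebook.
    with open(config_path) as f:
        config_data = json.load(f)
    if "HF_TOKEN" not in config_data:
        raise KeyError(f'"HF_TOKEN" not found in {config_path}')
    return config_data["HF_TOKEN"]
```

With this in place, `HF_TOKEN = load_hf_token()` would behave like the cell above whenever the environment variable is unset.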


In [None]:
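# Base (pre-trained, not instruction-tuned) Llama 3 checkpoint: it continues text rather than following instructions.
model_name = "meta-llama/Meta-Llama-3-8B"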

**Quantisation Configuration**

In [None]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store the model weights in 4-bit precision
    bnb_4bit_use_double_quant=True,        # also quantise the quantisation constants to save extra memory
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat data type
    bnb_4bit_compute_dtype=torch.bfloat16  # run the matrix multiplications in bfloat16
)
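
To get a feel for what these settings buy, here is a back-of-the-envelope estimate of the weight memory. It is a rough sketch under stated assumptions: roughly 8 billion parameters, every weight quantised (in practice some layers, such as the embeddings, usually stay in higher precision), and the block sizes described in the QLoRA paper (64 weights per block, 256 blocks per second-level group). Activations and the KV cache are not counted.

```python
def estimate_weight_memory_gb(n_params, bits_per_weight, overhead_bits=0.0):
    """Rough size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * (bits_per_weight + overhead_bits) / 8 / 1e9


N_PARAMS = 8e9  # assumption: roughly 8 billion parameters

# Per-weight overhead of the quantisation constants (assumed block sizes):
# without double quantisation, one 32-bit constant per block of 64 weights.
overhead_plain = 32 / 64
# With double quantisation, one 8-bit constant per 64 weights plus one
# 32-bit constant per 256 of those blocks.
overhead_double = 8 / 64 + 32 / (64 * 256)

bf16_gb = estimate_weight_memory_gb(N_PARAMS, 16)
nf4_gb = estimate_weight_memory_gb(N_PARAMS, 4, overhead_plain)
nf4_dq_gb = estimate_weight_memory_gb(N_PARAMS, 4, overhead_double)

print(f"bfloat16 weights:          {bf16_gb:.2f} GB")
print(f"nf4 weights:               {nf4_gb:.2f} GB")
print(f"nf4 + double quantisation: {nf4_dq_gb:.2f} GB")
```

Under these assumptions, double quantisation saves a little under 0.4 bits per parameter, which is small per weight but adds up across billions of parameters.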

**Loading the Tokenizer and the LLM**

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name,
                                          token=HF_TOKEN)
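# The Llama 3 tokenizer does not define a padding token by default, so reuse the end-of-sequence token.
tokenizer.pad_token = tokenizer.eos_token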


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": "cuda"},
    quantization_config=bnb_config,
    token=HF_TOKEN
)

In [None]:
text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128
)

In [None]:
def get_response(prompt):
    """Generate a completion for `prompt`; the returned text includes the prompt itself."""
    sequences = text_generator(prompt)
    gen_text = sequences[0]["generated_text"]
    return gen_text

In [None]:
prompt = "what is machine learning"

In [None]:
llama3_response = get_response(prompt)

Both `max_new_tokens` (=128) and `max_length`(=4096) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


In [None]:
llama3_response

'what is machine learning pdf\nMachine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually … Machine learning is a subfield of artificial intelligence (AI) that gives computer systems the ability to learn without explicitly being programmed. Machine learning is a branch of artificial intelligence (AI) that provides computers with the ability to learn without explicitly being programmed. 2. Machine learning is a field of computer science and statistics concerned with the outcome of computer algorithms improving automatically through experience. Machine Learning is a subset of Artificial Intelligence. It is a subset of artificial intelligence. Machine'

In [None]:
print(llama3_response[len(prompt):])

 pdf
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually … Machine learning is a subfield of artificial intelligence (AI) that gives computer systems the ability to learn without explicitly being programmed. Machine learning is a branch of artificial intelligence (AI) that provides computers with the ability to learn without explicitly being programmed. 2. Machine learning is a field of computer science and statistics concerned with the outcome of computer algorithms improving automatically through experience. Machine Learning is a subset of Artificial Intelligence. It is a subset of artificial intelligence. Machine
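
Slicing with `len(prompt)` works here because the generated text begins with the prompt verbatim. A slightly safer variant, sketched below, checks that assumption before slicing. The helper name `strip_prompt` is hypothetical, and the sample string is a shortened copy of the output shown above.

```python
def strip_prompt(full_text, prompt):
    """Return only the continuation when full_text starts with prompt.

    Hypothetical helper: if the prefix does not match, the text is
    returned unchanged instead of being cut at an arbitrary position.
    """
    if full_text.startswith(prompt):
        return full_text[len(prompt):]
    return full_text


# Shortened copy of the response shown above, for illustration only.
sample_response = "what is machine learning pdf\nMachine learning is a branch of artificial intelligence (AI)"
continuation = strip_prompt(sample_response, "what is machine learning")
print(continuation)
```

As far as I know, the text-generation pipeline also accepts `return_full_text=False`, which returns only the newly generated text and would make the slicing unnecessary.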
