# Gemma 2B Model (2 Billion Parameters, Instruction-Tuned) - Hugging Face Transformers

# Import Hugging Face 
## What does the `transformers` library do?

The `transformers` library, developed by Hugging Face, is one of the most popular and widely used libraries for natural language processing (NLP) and deep learning tasks related to text data. Here's an overview of what it does:

1. **Pre-trained Models**: Transformers provides a wide range of pre-trained models for various NLP tasks such as text classification, named entity recognition, question answering, language translation, and more. These models are trained on large datasets and can be fine-tuned on specific tasks with relatively little labeled data, which is known as transfer learning.

2. **Model Architecture**: The library implements state-of-the-art transformer-based architectures such as BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), RoBERTa, DistilBERT, T5, and more. These architectures have significantly advanced the field of NLP by achieving impressive results on benchmark datasets.

3. **Tokenization**: Transformers offers tokenization utilities to convert raw text into numerical inputs that can be fed into the models. This includes handling tasks like splitting text into words or subwords, mapping tokens to their corresponding indices, and adding special tokens required for tasks like classification or generation.

4. **Model Training and Inference**: The library provides tools for training and fine-tuning models on custom datasets. It includes utilities for data loading, model configuration, training loop management, evaluation metrics, and model serialization. Additionally, it offers inference capabilities for deploying trained models to make predictions on new data.

5. **Integration with Deep Learning Frameworks**: Transformers is compatible with popular deep learning frameworks such as TensorFlow and PyTorch, allowing users to seamlessly integrate transformer-based models into their existing workflows.

6. **Community and Resources**: The Transformers library has a large and active community of users and contributors who provide support, share resources, and contribute to the development of new features and models. The library is well-documented with tutorials, examples, and API references to help users get started with NLP tasks.

Overall, the Transformers library is a powerful tool for researchers, developers, and data scientists working on NLP tasks, offering a wide range of pre-trained models, easy-to-use APIs, and extensive documentation.
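
To make the tokenization step (item 3) concrete, here is a minimal, library-free sketch. The vocabulary, the greedy longest-match rule, and the helpers `toy_split` and `toy_encode` are all hypothetical and far simpler than the real tokenizers shipped with `transformers`. The sketch only illustrates the idea of splitting text into subword pieces, prepending a special token, and mapping every piece to an integer id.

```python
# Toy illustration only: the vocabulary and helper names are made up.
TOY_VOCAB = {"<bos>": 0, "<unk>": 1, "token": 2, "ization": 3, "is": 4, "fun": 5}

def toy_split(word, vocab):
    """Greedy longest-match split of one word into known subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: emit <unk> for the rest of the word.
            pieces.append("<unk>")
            break
    return pieces

def toy_encode(text, vocab=TOY_VOCAB):
    """Lowercase, split on whitespace, split into subwords, map to ids."""
    tokens = ["<bos>"]  # special token marking the start of the sequence
    for word in text.lower().split():
        tokens.extend(toy_split(word, vocab))
    return tokens, [vocab[t] for t in tokens]

tokens, ids = toy_encode("Tokenization is fun")
print(tokens)
print(ids)
```

A real tokenizer learns its vocabulary from data and handles whitespace, casing, and unknown text much more carefully, but the text-to-pieces-to-ids pipeline has the same shape.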


Enable GPU (GPU P100)

Here I use the GPU P100 because this notebook does not need much parallel computing.

In [65]:
!pip install -U git+https://github.com/huggingface/transformers accelerate bitsandbytes  # alternative: pip install transformers

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
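
The warning above suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch of doing so from Python follows. Choosing `"false"` and placing the assignment at the top of the notebook, before the tokenizer is first used, are assumptions about what suits this notebook rather than something the original run did.

```python
import os

# Set this before `transformers`/`tokenizers` is first used so the setting
# is already in place when a subprocess (such as `!pip`) forks the kernel.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```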


Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-zxevwpxt
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-zxevwpxt
  Resolved https://github.com/huggingface/transformers to commit 3b8c053631a2088d74fbb6ef4db47dbed8fa1470
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [66]:
# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch


A. Create Quantization Configuration

In [67]:

quantization_config = BitsAndBytesConfig(load_in_8bit=False)  # 8-bit loading not requested; load_in_8bit=True or load_in_4bit=True are the quantized options
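
To see why quantization is optional for a model of this size, the following back-of-the-envelope sketch estimates how much memory the weights alone need at different precisions. The nominal count of 2e9 parameters is an assumption taken from the "2B" name (the real checkpoint differs somewhat), and the estimate ignores activations, the generation cache, and any quantization overhead.

```python
# Back-of-the-envelope weight memory for a model of a given size.
# Assumption: a nominal 2e9 parameters; the real checkpoint differs somewhat.
NOMINAL_PARAMS = 2_000_000_000

# Bits used to store one weight under each setting.
BITS_PER_WEIGHT = {"float32": 32, "float16": 16, "int8": 8, "4-bit": 4}

def weight_memory_gib(n_params, bits):
    """Approximate weight storage in GiB (weights only, no overhead)."""
    return n_params * bits / 8 / 1024**3

estimates = {name: weight_memory_gib(NOMINAL_PARAMS, bits)
             for name, bits in BITS_PER_WEIGHT.items()}

for name, gib in estimates.items():
    print(f"{name:>8}: {gib:.2f} GiB")
```

Under these assumptions the `float16` weights need under 4 GiB, which fits comfortably on a 16 GB card, so 8-bit or 4-bit loading mainly becomes interesting for the larger 7B model.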

B. Download Model (Using Version 2 of Gemma 2B)

* Add the directory path
* For the 7B model, use the path given in the comment after `#`


In [68]:

model_id = "/kaggle/input/gemma/transformers/2b-it/2"  # Hub ID: "gg-hf/gemma-2b-it"; 7B path: /kaggle/input/gemma/transformers/7b-it/1
dtype = torch.float16

In [69]:


tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # let accelerate place the model weights across the GPU and CPU
    torch_dtype=dtype,
    quantization_config=quantization_config,
)



# Option 1 (without Chat Template)
1. Simply input the text 
2. Tokenization
3. Generating Output
4. Decoding

In [70]:
# Use the model
input_text = "Who is the most popular CEO?"
# Tokenize the text and move the tensors to the same device as the model
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=250)
print(tokenizer.decode(outputs[0]))

<bos>Who is the most popular CEO?

According to Forbes, the most popular CEO is Elon Musk, with an estimated net worth of $292 billion.<eos>


# Option 2 (with chat template)

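Applying the correct chat template improves the performance of an instruction-tuned LLM, because the prompt then matches the turn format the model saw during fine-tuning.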

In [71]:

chat = [
    { "role": "user", "content": "Write a joke about Elon Musk" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)


In [72]:
prompt

'<start_of_turn>user\nWrite a joke about Elon Musk<end_of_turn>\n<start_of_turn>model\n'
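
The prompt string above shows the turn layout the template produces. As a rough illustration of that layout, the sketch below rebuilds the same string in plain Python. The helper `format_gemma_chat` is hypothetical and not part of `transformers`, and mapping the `"assistant"` role to `"model"` is an assumption based on the `model` turn label visible in the printed prompt. In real code, keep using `tokenizer.apply_chat_template`.

```python
def format_gemma_chat(messages, add_generation_prompt=True):
    """Hypothetical helper mimicking the turn layout shown in the prompt above."""
    parts = []
    for message in messages:
        # Assumption: replies are labelled "model", as in the printed prompt.
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a new model turn so generation continues as the reply.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)

chat = [{"role": "user", "content": "Write a joke about Elon Musk"}]
prompt = format_gemma_chat(chat)
print(repr(prompt))
```

The generation prompt at the end is what tells the model that its own turn has started. Without it, the model might continue the user's message instead of answering it.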

In [73]:
inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)


In [74]:
print(tokenizer.decode(outputs[0]))

<bos><start_of_turn>user
Write a joke about Elon Musk<end_of_turn>
<start_of_turn>model
What do you call Elon Musk's new car?

A Tesla Autopilot!<eos>
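
The decoded text contains the whole conversation plus control tokens. If only the model's reply is wanted, one simple option is to cut the string at the last model-turn marker. The sketch below does this with plain string operations. The helper `extract_reply` is hypothetical, and the `decoded` string is copied from the output above rather than produced by a live model call.

```python
def extract_reply(decoded, marker="<start_of_turn>model\n",
                  end_tokens=("<end_of_turn>", "<eos>")):
    """Return the text after the last model-turn marker, without end tokens."""
    # Everything after the last marker is the generated reply.
    reply = decoded.rsplit(marker, 1)[-1]
    # Cut at the first end-of-turn or end-of-sequence token, if present.
    for token in end_tokens:
        reply = reply.split(token, 1)[0]
    return reply.strip()

# Decoded text copied from the notebook output above.
decoded = (
    "<bos><start_of_turn>user\n"
    "Write a joke about Elon Musk<end_of_turn>\n"
    "<start_of_turn>model\n"
    "What do you call Elon Musk's new car?\n\n"
    "A Tesla Autopilot!<eos>"
)
print(extract_reply(decoded))
```

An alternative that avoids string parsing is to pass `skip_special_tokens=True` to `tokenizer.decode`, which drops tokens such as `<bos>` and `<eos>` from the decoded text.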
