<a href="https://colab.research.google.com/github/inuwamobarak/Granite-3-0/blob/main/Granite_3_0_2B_Instruct_A_Guide_to_Model_Setup_and_Usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Granite-3.0-2B-Instruct: A Guide to Model Setup and Usage

---

#### **Overview of Granite-3.0-2B-Instruct**

Granite-3.0-2B-Instruct, developed by the Granite Team at IBM, is a large language model fine-tuned from Granite-3.0-2B-Base. It was fine-tuned on a mix of instruction datasets and synthetic data and is optimized for structured chat responses. Its key capabilities include text generation, summarization, question answering, code-related tasks, and multilingual dialogue.

### 1. **Environment Setup**



First, make sure the environment has the libraries needed to work with PyTorch and Hugging Face Transformers.


In [1]:
# Install required libraries
!pip install torch torchvision torchaudio
!pip install accelerate
!pip install git+https://github.com/huggingface/transformers.git # Install from source, since the required version was not yet available via pip at the time of writing

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-9g56zzy3
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-9g56zzy3
  Resolved https://github.com/huggingface/transformers.git to commit 5779bac4c45b2c881603cafd20663892869d5860
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tokenizers<0.21,>=0.20 (from transformers==4.47.0.dev0)
  Downloading tokenizers-0.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading tokenizers-0.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
   3.0/3.0 MB 34.4 MB/s eta 0:00:00
Building wheels for collected packages:

### 2. **Model and Tokenizer Initialization**

Now load the Granite-3.0-2B-Instruct model and tokenizer. The model is hosted on the Hugging Face Hub, and the `AutoModelForCausalLM` class loads it for text generation. Passing `device_map="auto"` lets Accelerate place the weights on the available device.


In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define device as 'cuda' if a GPU is available for faster computation
device = "cuda" if torch.cuda.is_available() else "cpu"

# Model and tokenizer paths
model_path = "ibm-granite/granite-3.0-2b-instruct"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Load the model; set device_map based on your setup
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()


GraniteForCausalLM(
  (model): GraniteModel(
    (embed_tokens): Embedding(49155, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-39): 40 x GraniteDecoderLayer(
        (self_attn): GraniteSdpaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (mlp): GraniteMLP(
          (gate_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (up_proj): Linear(in_features=2048, out_features=8192, bias=False)
          (down_proj): Linear(in_features=8192, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): GraniteRMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): GraniteRMSNorm((2048,), eps=1e-05)
      )
    )
    (norm): GraniteRMSNorm((20
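
The shapes in the printed module tree are enough for a rough parameter count. The sketch below is plain arithmetic on the numbers visible in the printout; it counts only the layers shown (the printout is cut off before the final norm and output head), and the constant names are chosen here for illustration.

```python
# Shapes copied from the printed module tree above.
VOCAB_SIZE = 49155   # rows of embed_tokens
HIDDEN = 2048        # model width
KV_DIM = 512         # out_features of k_proj and v_proj
MLP_DIM = 8192       # out_features of gate_proj and up_proj
NUM_LAYERS = 40      # "(0-39): 40 x GraniteDecoderLayer"


def linear_params(in_features: int, out_features: int) -> int:
    """Weight count of a Linear layer with bias=False."""
    return in_features * out_features


attention = (
    linear_params(HIDDEN, HIDDEN)          # q_proj
    + 2 * linear_params(HIDDEN, KV_DIM)    # k_proj and v_proj
    + linear_params(HIDDEN, HIDDEN)        # o_proj
)
mlp = (
    2 * linear_params(HIDDEN, MLP_DIM)     # gate_proj and up_proj
    + linear_params(MLP_DIM, HIDDEN)       # down_proj
)
norms = 2 * HIDDEN                         # two RMSNorm weight vectors per layer

per_layer = attention + mlp + norms
embedding = VOCAB_SIZE * HIDDEN
visible_total = embedding + NUM_LAYERS * per_layer

print(f"per layer: {per_layer:,}")
print(f"visible total: {visible_total:,}")  # about 2.5 billion
```

Almost all of the weights sit in the 40 decoder layers, and within each layer the MLP is roughly five times larger than the attention block.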


### 3. **Input Format for Instruction-based Queries**

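The model takes input in a structured chat format. To build a correctly formatted prompt, create a `chat` list of dictionaries, one per message, each with a `"role"` (such as `"user"` or `"assistant"`) and a `"content"` string. `apply_chat_template` then renders that list into a single prompt string, and `add_generation_prompt=True` appends the marker that cues the model to answer as the assistant.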


In [11]:
# Define a user query in a structured format
chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]

# Prepare the chat data with the required prompts
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
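
To make the templating step less opaque, here is a minimal sketch of what a chat template does conceptually. The marker strings and the `render_chat` and `strip_markers` helpers are hypothetical and exist only for illustration; the real markers are defined by the tokenizer's own template and are not reproduced here.

```python
from typing import Dict, List

# Hypothetical marker strings, for illustration only.
ROLE_START = "<role>"
ROLE_END = "</role>"
TURN_END = "<end>"


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render a list of {"role", "content"} messages into one prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"{ROLE_START}{message['role']}{ROLE_END}{message['content']}{TURN_END}\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{ROLE_START}assistant{ROLE_END}")
    return "".join(parts)


def strip_markers(text: str) -> str:
    """Mimic decoding with special tokens skipped: drop the markers, keep the rest."""
    for marker in (ROLE_START, ROLE_END, TURN_END):
        text = text.replace(marker, "")
    return text


prompt = render_chat([{"role": "user", "content": "Hi"}])
print(repr(prompt))
print(repr(strip_markers(prompt)))
```

Once the markers are dropped, the role names run straight into the message text, which is the same pattern seen in the decoded outputs in this notebook (`userPlease list ...`).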



### 4. **Tokenize the Input**

Tokenize the structured `chat` data for the model. This tokenization step converts the text input into a format the model understands.


In [12]:
# Tokenize the input chat
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
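
The tokenizer call returns a batch containing `input_ids` and an `attention_mask`. The toy below imitates that shape with a whitespace splitter and a made-up four-word vocabulary; the real tokenizer uses a learned subword vocabulary and returns tensors rather than lists, so treat this only as a picture of the data layout.

```python
from typing import Dict, List


def toy_tokenize(text: str, vocab: Dict[str, int]) -> Dict[str, List[List[int]]]:
    """Map whitespace-separated words to ids; unknown words map to <unk>."""
    ids = [vocab.get(word, vocab["<unk>"]) for word in text.split()]
    return {
        "input_ids": [ids],                    # batch of one sequence
        "attention_mask": [[1] * len(ids)],    # 1 = real token, nothing is padding
    }


# Made-up vocabulary for illustration.
toy_vocab = {"<unk>": 0, "list": 1, "one": 2, "lab": 3}

encoded = toy_tokenize("list one lab please", toy_vocab)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```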

### 5. **Generate a Response**

With the input tokenized, use the model to generate a response based on the instruction.


In [13]:
# Generate output tokens with a maximum of 100 new tokens in the response
output = model.generate(**input_tokens, max_new_tokens=100)
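
As a rough mental model, generation appends one token at a time until an end-of-sequence token appears or the `max_new_tokens` budget runs out. The loop below sketches greedy decoding with a stand-in `next_token` function in place of the model. It is a simplification: the real `generate` also supports sampling and other strategies, and which one it uses depends on the generation configuration.

```python
from typing import Callable, List


def greedy_generate(
    next_token: Callable[[List[int]], int],
    prompt_ids: List[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Append tokens one by one; stop at eos_id or after max_new_tokens."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        if token == eos_id:
            break
    return ids


def count_up(ids: List[int]) -> int:
    """Stand-in for the model: always predicts last token + 1."""
    return ids[-1] + 1


# Large budget: generation ends when the end-of-sequence id (5) is produced.
print(greedy_generate(count_up, [1, 2], max_new_tokens=100, eos_id=5))
# Small budget: generation is cut off before the end-of-sequence id.
print(greedy_generate(count_up, [1, 2], max_new_tokens=2, eos_id=5))
```

The second call shows why a response can stop mid-sentence: the budget was exhausted before the model produced its end-of-sequence token. Note that the returned list starts with the prompt ids, as it does for the model used here.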


### 6. **Decode and Print the Output**

Finally, decode the generated tokens back into readable text and print the output to see the model’s response.


In [21]:
# Decode the generated tokens and print the first (and only) sequence
response = tokenizer.batch_decode(output, skip_special_tokens=True)
print(response[0])

userPlease list one IBM Research laboratory located in the United States. You should only output its name and location.
assistant1. IBM Research - Austin, Texas
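
The decoded text above contains the prompt as well as the answer, because the generated sequence begins with the prompt tokens. One common way to keep only the answer is to slice off the first `prompt_length` ids before decoding. The sketch below shows the idea on plain lists; `strip_prompt` is a hypothetical helper and the ids are made up.

```python
from typing import List


def strip_prompt(output_ids: List[int], prompt_length: int) -> List[int]:
    """Return only the ids generated after the prompt."""
    return output_ids[prompt_length:]


prompt_ids = [11, 12, 13]             # made-up prompt ids
output_ids = prompt_ids + [21, 22]    # generation returns prompt + new tokens
new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)

# With the tensors in this notebook the equivalent slice would be, as a sketch:
#   prompt_length = input_tokens["input_ids"].shape[1]
#   new_tokens = output[0, prompt_length:]
#   print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```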


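Steps 3 to 6 can also be combined into a single cell. Printing `response` rather than `response[0]` shows the full list returned by `batch_decode`:

In [22]: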
# Define the input prompt
chat = [
    {
        "role": "user",
        "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."
    },
]

# Tokenize input
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

# Generate model output
output = model.generate(**input_tokens, max_new_tokens=100)

# Decode and print the response
response = tokenizer.batch_decode(output, skip_special_tokens=True)
print(response)


['userPlease list one IBM Research laboratory located in the United States. You should only output its name and location.\nassistant1. IBM Research - Austin, Texas']



### **Example Use Cases**

Here are a few additional examples to explore Granite-3.0-2B-Instruct's versatility:

#### **1. Text Summarization**

In [7]:
chat = [
    { "role": "user", "content": " Summarize the following paragraph: Granite-3.0-2B-Instruct is developed by IBM for handling multilingual and domain-specific tasks with general instruction following capabilities." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=1000)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])

user Summarize the following paragraph: Granite-3.0-2B-Instruct is developed by IBM for handling multilingual and domain-specific tasks with general instruction following capabilities.
assistantGranite-3.0-2B-Instruct is an AI model by IBM, designed to manage multilingual and domain-specific tasks while adhering to general instructions.



#### **2. Question Answering**


In [8]:
chat = [
    { "role": "user", "content": "What are the capabilities of Granite-3.0-2B-Instruct?" },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])

userWhat are the capabilities of Granite-3.0-2B-Instruct?
assistant1. Text Generation: Granite-3.0-2B-Instruct can generate human-like text based on the input it receives.
2. Question Answering: It can provide accurate and relevant answers to a wide range of questions.
3. Translation: It can translate text from one language to another.
4. Summarization: It can summarize long pieces of text into shorter, more digestible versions.
5. Sentiment Analysis: It can analyze text



#### **3. Code-related Tasks**


In [24]:
chat = [
    { "role": "user", "content": "Write a Python function to compute the factorial of a number." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=1000)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])

userWrite a Python function to compute the factorial of a number.
assistantHere is the code to compute the factorial of a number:

```python
def factorial(n: int) -> int:
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers")
    elif n == 0:
        return 1
    else:
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result
```

```python
import unittest

class TestFactorial(unittest.TestCase):
    def test_factorial(self):
        self.assertEqual(factorial(0), 1)
        self.assertEqual(factorial(1), 1)
        self.assertEqual(factorial(5), 120)
        self.assertEqual(factorial(10), 3628800)
        with self.assertRaises(ValueError):
            factorial(-5)

if __name__ == '__main__':
    unittest.main(argv=[''], verbosity=2, exit=False)
```

This code defines a function `factorial` that takes an integer `n` as input and returns the factorial of `n`. The function first checks if `n` is less than 0, and

### **Model Architecture Summary**

Granite-3.0-2B-Instruct is built on a dense, decoder-only transformer architecture. The core components include:
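- **GQA (Grouped Query Attention)**, in which groups of query heads share key/value projections (in the printed model, `k_proj` and `v_proj` are 512 wide versus 2048 for `q_proj`), and **RoPE (Rotary Position Embedding)**, which encodes token positions inside the attention computation
- **MLP with SwiGLU**, a SiLU-gated feed-forward block (`gate_proj`, `up_proj`, `down_proj`)
- **RMSNorm** for layer normalization
- **Shared Input/Output Embeddings** to reduce model size while retaining quality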
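
Two of the components listed above are small enough to write out by hand. The following pure-Python sketch shows the standard RMSNorm formula and a SwiGLU-style gated MLP on tiny lists. It follows the common textbook definitions rather than the exact Granite implementation, which may apply additional scaling, and the function names are chosen here for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-5) -> Vector:
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * scale for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def swiglu_mlp(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """down_proj(SiLU(gate_proj(x)) * up_proj(x)), with element-wise gating."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


print(rms_norm([3.0, 4.0], [1.0, 1.0]))
print(swiglu_mlp([1.0], w_gate=[[1.0]], w_up=[[2.0]], w_down=[[1.0]]))
```

Unlike standard layer normalization, RMSNorm does not subtract the mean, so it needs only one pass over the squared values. In the gated MLP, the `gate` branch decides how much of each `up` feature passes through to `down_proj`.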

### **References**
- **GitHub Repository:** [ibm-granite/granite-3.0-language-models](https://github.com/ibm-granite/granite-3.0-language-models)
- **Website:** [Granite Docs](https://www.ibm.com)
- **License:** Apache 2.0