<a href="https://colab.research.google.com/github/ruslanmv/IBM-Granite-3.1-AI-Reasoning-and-Vision/blob/main/Granite.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# IBM Granite 3.1 8b Reasoning & Vision Preview
Code by [ruslanmv.com](https://ruslanmv.com)

In [None]:
import os
import shutil
from IPython.display import clear_output

# Repository URL and name
repo_url = "https://huggingface.co/spaces/ruslanmv/Granite-Vision-Chatbot"
repo_name = "Granite-Vision-Chatbot"

# Check if the repository directory exists (if so, we'll assume it's already cloned)
if not os.path.exists(repo_name):
    print(f"Cloning {repo_name} repository...")
    !git clone {repo_url}
    clear_output()
    print(f"{repo_name} repository cloned successfully!")
    # Move contents of the repository to the current directory
    print(f"Moving contents of {repo_name} to current directory...")
    for item in os.listdir(repo_name):
        # shutil.move handles both files and directories
        shutil.move(os.path.join(repo_name, item), os.path.join(".", item))
    shutil.rmtree(repo_name)  # Remove the now-empty repo directory
    print(f"Contents of {repo_name} moved successfully!")
else:
    print(f"{repo_name} repository already exists. Skipping cloning.")
print("Finished.")

Granite-Vision-Chatbot repository cloned successfully!
Moving contents of Granite-Vision-Chatbot to current directory...
Contents of Granite-Vision-Chatbot moved successfully!
Finished.
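
The move step in the cell above fails if a file or folder with the same name already exists in the working directory. A minimal sketch of a more defensive variant follows. The helper name `move_contents`, its skip-on-conflict policy, and the file names in the demo are assumptions made for illustration; they are not part of the original notebook.

```python
import os
import shutil
import tempfile

def move_contents(src_dir, dst_dir="."):
    """Move every entry of src_dir into dst_dir, then remove src_dir.

    Entries whose names already exist in dst_dir are skipped, not overwritten.
    Returns the sorted list of names that were moved.
    """
    moved = []
    for item in os.listdir(src_dir):
        target = os.path.join(dst_dir, item)
        if os.path.exists(target):
            continue  # keep the existing file or directory
        shutil.move(os.path.join(src_dir, item), target)
        moved.append(item)
    shutil.rmtree(src_dir)  # also discards any skipped leftovers
    return sorted(moved)

# Demo in a throwaway directory so nothing in the real workspace is touched
root = tempfile.mkdtemp()
demo_src = os.path.join(root, "repo")
demo_dst = os.path.join(root, "work")
os.makedirs(os.path.join(demo_src, "src"))
os.makedirs(demo_dst)
for path in (os.path.join(demo_src, "app.py"),
             os.path.join(demo_src, "README.md"),
             os.path.join(demo_dst, "README.md")):
    open(path, "w").close()

moved_names = move_contents(demo_src, demo_dst)
print(moved_names)
```

In the notebook, the equivalent call would be `move_contents(repo_name)` right after the clone. Skipping conflicts is one possible policy; overwriting or raising an error are equally reasonable choices.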


In [7]:
import os
import subprocess
import sys
from IPython.display import clear_output

def install_requirements(requirements_file="requirements.txt"):
    """Installs requirements from a given file and clears the output."""

    if os.path.exists(requirements_file):
        print(f"Installing requirements from {requirements_file}...")
        try:
            # Call pip through subprocess so a failed install raises an
            # exception; a `!pip` shell command would never reach the except.
            pip = [sys.executable, "-m", "pip", "install"]
            subprocess.check_call(pip + ["-r", requirements_file])
            subprocess.check_call(pip + ["spaces"])
            clear_output()  # Clear pip install output
            print(f"Requirements from {requirements_file} installed successfully!")
        except subprocess.CalledProcessError as e:
            print(f"Error installing requirements: {e}")
    else:
        print(f"Requirements file {requirements_file} not found.")

# 2. Install the requirements

install_requirements()  # Installs from requirements.txt in current directory
print("Finished installing requirements (or skipped if not found).")

Requirements file requirements.txt not found.
Finished installing requirements (or skipped if not found).
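
The output above shows the install step being skipped because `requirements.txt` was not found. In that situation it can help to check which packages are actually importable before launching the app. The sketch below is one way to do that with the standard library. The helper names `requirement_names` and `missing_packages` and the sample requirement lines are hypothetical, and the parser only handles plain `name` or `name==version` style lines, not URLs or VCS references.

```python
import re
from importlib import metadata

def requirement_names(lines):
    """Extract bare package names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r / -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names

def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

# Hypothetical requirement lines, for illustration only
sample_lines = ["gradio>=4.0  # UI", "", "# a comment", "-r other.txt",
                "torch==2.1.0", "transformers[torch]"]
print(requirement_names(sample_lines))
```

With a real file, `missing_packages(requirement_names(open("requirements.txt")))` would list what still needs installing.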


In [1]:
import os
os.chdir("src")  # Change to the 'src' directory
print(f"Current directory: {os.getcwd()}") # Verify the change
import gradio as gr
import threading
import time
import subprocess
from app import demo

In [4]:
demo.queue().launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://3b3aa15b688bbe9879.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
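
As the messages above indicate, Gradio turns on `share=True` automatically in Colab and suggests `debug=True` to surface errors in the notebook. A small sketch of making both choices explicit is shown here. The helper `launch_kwargs` is hypothetical, and detecting Colab by looking for `google.colab` in `sys.modules` is an assumption that holds only inside a running Colab kernel.

```python
import sys

def launch_kwargs(in_colab=None):
    """Build keyword arguments for `launch()` based on the environment."""
    if in_colab is None:
        # Assumption: the google.colab module is loaded only in Colab kernels
        in_colab = "google.colab" in sys.modules
    # Share a public link and show errors in Colab; stay local elsewhere
    return {"share": in_colab, "debug": in_colab}

print(launch_kwargs(in_colab=True))
print(launch_kwargs(in_colab=False))

# In the notebook this would be used as:
# demo.queue().launch(**launch_kwargs())
```

Note that `debug=True` keeps the cell running until it is interrupted, so later cells will not execute while the app is live.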




# IBM Granite 3.1 8b Reasoning

In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, BitsAndBytesConfig
import torch

# Model and tokenizer
model_name = "ruslanmv/granite-3.1-8b-Reasoning"  # Or "ruslanmv/granite-3.1-2b-Reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Configure 4-bit quantization properly
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Enable 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16  # Match dtype to avoid slow inference
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",  # Place weights on the GPU when one is available
    torch_dtype=torch.float16,  # Load non-quantized weights in float16
    quantization_config=quantization_config  # 4-bit settings defined above
)

# Build the chat prompt: a system message that fixes the output format, plus the user question
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
text = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Calculate pi."}  # Example prompt - change this!
], tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)  # Move inputs to the model's device

# Sampling parameters
generation_config = GenerationConfig(
    do_sample=True,  # Sample instead of greedy decoding
    temperature=0.8,  # Active only when `do_sample=True`
    top_p=0.95,  # Active only when `do_sample=True`
    max_new_tokens=1024,  # Upper bound on the number of generated tokens
)

# Inference
with torch.inference_mode():  # Use inference mode for faster generation
    outputs = model.generate(**inputs, generation_config=generation_config)

# Decode only the newly generated tokens, skipping the prompt.
# This avoids searching the text for the word "assistant", which
# breaks if that word also appears in the prompt or the answer.
prompt_length = inputs["input_ids"].shape[1]
output = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print(output)


<reasoning>
Pi (π) is an irrational number, which means it cannot be precisely calculated to a finite number of decimal places. However, it can be approximated using various mathematical formulas. One common method to approximate pi is using the Leibniz formula for π: π = 4 * (1 - 1/3 + 1/5 - 1/7 + 1/9 -...). This formula converges very slowly, so it requires many terms to get an accurate approximation.
</reasoning>

<answer>
The value of pi using the Leibniz formula with the first five terms is approximately 3.439692653589793.
</answer>
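
Because the system prompt asks for `<reasoning>` and `<answer>` tags, the two parts of a response can be separated with a regular expression. The sketch below assumes the model followed the format; the helper name `split_response` and the sample string are hypothetical, and a missing tag simply yields `None`.

```python
import re

def split_response(text):
    """Return (reasoning, answer) from a tagged response; missing parts are None."""
    def grab(tag):
        # DOTALL lets "." match newlines, since each section spans several lines
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None
    return grab("reasoning"), grab("answer")

# Hypothetical response text, for illustration only
sample = "<reasoning>\nUse a series.\n</reasoning>\n\n<answer>\nAbout 3.14.\n</answer>"
reasoning, answer = split_response(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

In the notebook, `split_response(output)` would be applied to the decoded generation.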

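
The Leibniz series quoted in the model's reasoning is easy to evaluate directly, which makes it a convenient check on generated numbers. The sketch below computes partial sums of the series; the function name `leibniz_pi` is an assumption for illustration. The five-term sum is 1052/315, roughly 3.3397, which differs from the figure in the sample answer above, so numeric claims in model output are worth verifying.

```python
def leibniz_pi(n_terms):
    """Partial sum of the Leibniz series: 4 * (1 - 1/3 + 1/5 - 1/7 + ...)."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total

# The series converges slowly: the error shrinks roughly like 1 / n_terms
for n in (5, 1000, 100000):
    print(n, leibniz_pi(n))
```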