# Microsoft Semantic Kernel With IBM Granite 3.3 LLM

> **⚠️ Important Note**: This notebook is **not compatible with Google Colab** as it requires a local Ollama installation running a Granite model. Please run this notebook in a local environment with Ollama properly configured and running on port 11434.

## What is Microsoft Semantic Kernel?

**Microsoft Semantic Kernel (SK)** is an open-source SDK that allows developers to integrate AI capabilities into their applications through a unified framework. It acts as a middleware layer that orchestrates AI models, plugins, and traditional programming logic.

## Key Features & Differentiators

### 🔌 **Plugin Architecture**
- **Semantic Kernel**: Uses a sophisticated plugin system where AI functions can be chained together with traditional code
- **Other Frameworks**: Often focus on single model interactions or require custom integration work

### 🧠 **AI Orchestration** 
- **Semantic Kernel**: Built-in planning capabilities that can automatically sequence multiple AI operations
- **LangChain/LlamaIndex**: Typically compose operations through explicitly defined chains or pipelines
- **OpenAI API**: Direct model access without orchestration features

### 🔄 **Multi-Model Support**
- **Semantic Kernel**: Vendor-agnostic - works with OpenAI, Azure OpenAI, Hugging Face, Ollama, and more
- **Other Frameworks**: Often tied to specific providers or require extensive configuration

### 🎯 **Enterprise-Ready**
- **Semantic Kernel**: Built with enterprise needs in mind - security, scalability, and integration with Microsoft ecosystem
- **Other Frameworks**: May require additional tooling for enterprise deployment

## About Granite 3.3

**IBM Granite 3.3** models feature enhanced reasoning capabilities and support for Fill-in-the-Middle (FIM) code completion. Key highlights include:

- **Enhanced Architecture**: New dense architecture trained on 12 trillion tokens spanning 12 natural languages and 116 programming languages
- **128K Context Length**: Extended context for complex tasks
- **Strong RAG Performance**: Excellent retrieval-augmented generation capabilities
- **Function Calling**: Native support for tool/function calling
- **Response Controls**: Built-in length and originality controls
- **Apache 2.0 License**: Fully open-source

## Prerequisites

Before running this notebook, ensure you have:

1. **Ollama installed locally**: Download and install from [https://ollama.ai](https://ollama.ai)
2. **Granite 3.3 8B model downloaded**: Run `ollama pull granite3.3:8b` in your terminal
3. **Ollama running**: Start the Ollama service with `ollama serve` (the desktop app usually starts it automatically)
4. **Python 3.10, 3.11, or 3.12**: This notebook requires a recent Python version
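
The checklist above can be verified from Python before running any cells. The following is a minimal sketch using only the standard library. The helper names `is_supported_python` and `port_is_open` are hypothetical, and the host and port simply mirror the Ollama default (`localhost:11434`) mentioned in this notebook. A reachable port only shows that something is listening; it does not confirm that the Granite model has been pulled.

```python
import socket
import sys


def is_supported_python(version=None):
    """Return True for Python 3.10, 3.11, or 3.12 (the versions listed above)."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 10) <= major_minor < (3, 13)


def port_is_open(host="localhost", port=11434, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Python version supported:", is_supported_python())
    print("Ollama port reachable:", port_is_open())
```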

## This Demo

This notebook demonstrates a simple chatbot using Semantic Kernel with:
- **Ollama** as the local AI provider
- **Granite 3.3 8B** model for responses  
- **Granite 3.3 prompt template** following official guidelines
- **Async execution** for responsive interactions

In [None]:
# 0. Install granite community utils and check Python version compatibility
%pip install git+https://github.com/ibm-granite-community/utils

import sys
assert sys.version_info >= (3, 10) and sys.version_info < (3, 13), \
    f"Python 3.10, 3.11, or 3.12 is required. Current version: {sys.version_info.major}.{sys.version_info.minor}"

print(f"✅ Python version {sys.version_info.major}.{sys.version_info.minor} is compatible.")

# Install required packages
%pip install semantic-kernel requests ipywidgets ollama

In [None]:
# 1. Setup Kernel and Granite 3.3 8B Model for Chatbot
import requests
from semantic_kernel import Kernel
from semantic_kernel.prompt_template import PromptTemplateConfig
from semantic_kernel.connectors.ai.ollama import OllamaChatCompletion
from semantic_kernel.connectors.ai.ollama.ollama_prompt_execution_settings import OllamaChatPromptExecutionSettings

# Initialize kernel
kernel = Kernel()

# Configure execution settings
execution_settings = OllamaChatPromptExecutionSettings()

# Validate Ollama is running and has the Granite model
ollama_host = "http://localhost:11434"
granite_model = "granite3.3:8b"

try:
    # Check if Ollama is running
    response = requests.get(f"{ollama_host}/api/tags", timeout=5)
    response.raise_for_status()
    
    # Check if Granite model is available
    models = response.json()
    available_models = [model['name'] for model in models['models']]
    
    if granite_model not in available_models:
        print(f"❌ Granite model '{granite_model}' not found in Ollama.")
        print(f"Available models: {available_models}")
        print(f"Please run: ollama pull {granite_model}")
        raise ValueError(f"Required model {granite_model} not available")
    
    print(f"✅ Ollama is running and {granite_model} model is available")
    
except requests.exceptions.RequestException as e:
    print(f"❌ Cannot connect to Ollama at {ollama_host}")
    print("Please ensure Ollama is installed and running:")
    print("1. Install Ollama from https://ollama.ai")
    print("2. Run 'ollama serve' in terminal")
    print(f"3. Run 'ollama pull {granite_model}' to download the model")
    raise ConnectionError(f"Ollama connection failed: {e}") from e

# Initialize Semantic Kernel with Ollama service
service_id = "ollama"
kernel.add_service(
    OllamaChatCompletion(
        service_id=service_id,
        host=ollama_host,
        ai_model_id=granite_model,
    )
)
print("✅ Semantic Kernel initialized successfully with Granite")
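
The availability check in the cell above compares the model name against the `name` field of each entry returned by `/api/tags`. As a sketch of that lookup in isolation, the helpers below work on an already-decoded payload, so they can be tried without a running server. The function names and the sample payload are hypothetical. The one added assumption is that Ollama treats a model name without a tag as `:latest`, a case the exact-match check in the cell does not cover.

```python
def normalize_model_name(name):
    """Append ':latest' when no tag is given (assumed Ollama default)."""
    return name if ":" in name else f"{name}:latest"


def has_model(tags_payload, wanted):
    """Return True if `wanted` appears in a decoded /api/tags payload."""
    installed = {
        normalize_model_name(entry["name"])
        for entry in tags_payload.get("models", [])
    }
    return normalize_model_name(wanted) in installed


# Hypothetical payload shaped like the response parsed in the cell above
sample_payload = {
    "models": [{"name": "granite3.3:8b"}, {"name": "example-model:latest"}]
}

print(has_model(sample_payload, "granite3.3:8b"))   # True
print(has_model(sample_payload, "example-model"))   # True (tag defaults to latest)
print(has_model(sample_payload, "granite3.3:2b"))   # False
```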

In [None]:
# 2. Define Chat Template using Jinja2 format with official Granite 3.3 template
# Install transformers first so the tokenizer import below succeeds
%pip install transformers

from datetime import datetime
from transformers import AutoTokenizer

# Get current date for the system prompt
current_date = datetime.now().strftime("%B %d, %Y")

# Load the Granite 3.3 tokenizer to get its official chat template
print("📥 Loading Granite 3.3 tokenizer...")
tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-8b-instruct")

# Get the chat template from the tokenizer
chat_template = tokenizer.chat_template
print("✅ Retrieved official chat template from Granite 3.3 tokenizer")

# Create the function configuration using jinja2 format
prompt_template_config = PromptTemplateConfig(
    template=chat_template, 
    name="chat", 
    template_format="jinja2"
)

# Create the chat function
try:
    function = kernel.add_function(
        function_name="chat_function",
        plugin_name="chat_plugin",
        prompt_template_config=prompt_template_config,
        prompt_execution_settings=execution_settings,
    )
    print("✅ Chat function configured with jinja2 template format")
    print("✅ Using official Granite 3.3 chat template from tokenizer")
    print(f"✅ Date to be used in the system prompt: {current_date}")
    
except Exception as e:
    print(f"❌ Function creation failed: {e}")
    raise
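
To make the role of the chat template concrete, the sketch below hand-renders a message list into a single prompt string in plain Python. It is a simplified approximation, not the official template. The special tokens (`<|start_of_role|>`, `<|end_of_role|>`, `<|end_of_text|>`) are assumed from the Granite 3.x prompt format and should be checked against `tokenizer.chat_template`, which also handles tools, documents, and default system prompts that are ignored here. The function name `render_granite_prompt` is hypothetical.

```python
def render_granite_prompt(messages, add_generation_prompt=True):
    """Render chat messages into one prompt string (simplified approximation)."""
    parts = []
    for message in messages:
        # Each turn: role header, content, end-of-text marker (assumed tokens)
        parts.append(
            f"<|start_of_role|>{message['role']}<|end_of_role|>"
            f"{message['content']}<|end_of_text|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|start_of_role|>assistant<|end_of_role|>")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]
print(render_granite_prompt(example_messages))
```

Printing `tokenizer.chat_template` next to this output is a quick way to see where the real template differs.
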

In [None]:
# 3. Define chat function to interact with Granite using the template
import textwrap
import time

async def chat_with_granite(user_message):
    """
    Chat function that processes a single user input and returns the response.
    Uses jinja2 template format with official Granite 3.3 chat template.
    """
    try:
        if function is None:
            raise RuntimeError("Function not initialized - check previous cells for errors")
        
        print(f"🤔 Thinking... (User: {user_message[:50]}{'...' if len(user_message) > 50 else ''})")
        
        start_time = time.time()
        
        # The Granite chat template expects a messages format
        # Create proper messages array for the jinja2 template
        messages = [
            {"role": "system", "content": f"Knowledge Cutoff Date: April 2024.\nToday's Date: {current_date}. You are Granite, developed by IBM. You are a helpful AI assistant."},
            {"role": "user", "content": user_message}
        ]
        
        # Pass messages and required template variables
        result = await kernel.invoke(function, 
                                   messages=messages,
                                   add_generation_prompt=True)
        
        end_time = time.time()
        response_time = end_time - start_time
        
        # Get the response text
        response_text = str(result)
        
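        # Wrap each line separately so paragraph breaks and code blocks survive
        # (textwrap.fill on the whole text would collapse every newline)
        wrapped_text = "\n".join(
            textwrap.fill(line, width=80, break_long_words=False, break_on_hyphens=False)
            for line in response_text.splitlines()
        )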
        
        print(f"\n🤖 Assistant (responded in {response_time:.2f}s):")
        print("=" * 50)
        print(wrapped_text)
        print("=" * 50)
        
        return response_text
        
    except Exception as e:
        print(f"❌ Error occurred: {e}")
        print("\n🔧 Troubleshooting:")
        print("1. Ensure Ollama is running: ollama serve")
        print("2. Ensure Granite model is available: ollama pull granite3.3:8b")
        print("3. Check if Ollama is accessible at http://localhost:11434")
        print("4. Try restarting the notebook kernel")
        raise

def interactive_chat():
    """
    Interactive chat mode: prompts the user for a message and returns it,
    or returns None if the input is empty.
    """
    print("💬 Chat with Granite 3.3 8B via Semantic Kernel (Jinja2 Template)!")
    print("Example prompts:")
    print("- 'What are the benefits of using Semantic Kernel?'")
    print("- 'Explain quantum computing in simple terms'")
    print("- 'Write a Python function to calculate fibonacci numbers'")
    print()
    
    user_input = input("Enter your message: ")
    
    if user_input and user_input.strip():
        return user_input
    else:
        print("❌ Please enter a message to continue")
        return None

print("✅ Chat functions initialized!")
print("To start chatting, run the next cell!")

In [None]:
# Send your message to Granite

message = interactive_chat()

if message:
    response = await chat_with_granite(message)
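
Top-level `await`, as used in the cell above, works because Jupyter already runs an event loop. In a plain Python script the coroutine has to be driven explicitly, for example with `asyncio.run`. The sketch below shows the pattern with a stand-in coroutine, `fake_chat` (hypothetical), in place of `chat_with_granite` so that it runs without Ollama.

```python
import asyncio


async def fake_chat(user_message):
    """Stand-in for chat_with_granite: echoes the message after a short pause."""
    await asyncio.sleep(0.01)  # simulates waiting on the model
    return f"(echo) {user_message}"


async def main():
    reply = await fake_chat("Hello, Granite!")
    print(reply)
    return reply


if __name__ == "__main__":
    asyncio.run(main())
```

`asyncio.run` raises a `RuntimeError` when called from inside an already running event loop, so this pattern is meant for scripts rather than for the notebook itself.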