# Microsoft Semantic Kernel With IBM Granite 4 Micro

> **⚠️ Important Note**: This notebook is **not compatible with Google Colab** as it requires a local Ollama installation running a Granite model. Please run this notebook in a local environment with Ollama properly configured and running on port 11434.

## What is Microsoft Semantic Kernel?

**Microsoft Semantic Kernel (SK)** is an open-source SDK that allows developers to integrate AI capabilities into their applications through a unified framework. It acts as a middleware layer that orchestrates AI models, plugins, and traditional programming logic.

## Key Features & Differentiators

### 🔌 **Plugin Architecture**
- **Semantic Kernel**: Uses a plugin system in which prompt-based AI functions and traditional code are registered as functions and can be chained together
- **Other Frameworks**: Often focus on single model interactions or require custom integration work

### 🧠 **AI Orchestration**
- **Semantic Kernel**: Built-in planning capabilities that can automatically sequence multiple AI operations
- **LangChain/LlamaIndex**: Primarily focused on chaining operations manually
- **OpenAI API**: Direct model access without orchestration features

### 🔄 **Multi-Model Support**
- **Semantic Kernel**: Vendor-agnostic - works with OpenAI, Azure OpenAI, Hugging Face, Ollama, and more
- **Other Frameworks**: Often tied to specific providers or require extensive configuration

### 🎯 **Enterprise-Ready**
- **Semantic Kernel**: Built with enterprise needs in mind: security, scalability, and integration with the Microsoft ecosystem
- **Other Frameworks**: May require additional tooling for enterprise deployment

## About Granite 4 Micro

**IBM Granite 4 Micro** model highlights include:

- **Enhanced Architecture**: New dense architecture trained with 12 trillion tokens across 12 languages and 116 programming languages
- **128K Context Length**: Extended context for complex tasks
- **Strong RAG Performance**: Excellent retrieval-augmented generation capabilities
- **Function Calling**: Native support for tool/function calling
- **Response Controls**: Built-in length and originality controls
- **Apache 2.0 License**: Fully open-source

## Prerequisites

Before running this notebook, ensure you have:

1. **Ollama installed locally**: Download and install from [https://ollama.ai](https://ollama.ai)
2. **Granite 4 Micro model downloaded**: Run `ollama pull ibm/granite4:micro` in your terminal
3. **Ollama running**: Start the Ollama service with `ollama serve` if it does not start automatically
4. **Python 3.11 or 3.12**: This notebook requires a recent Python version
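
The prerequisites above can be checked from a plain Python session before opening the notebook. The following is a minimal sketch using only the standard library. The helper names `python_version_supported`, `model_available`, and `fetch_tags` are hypothetical, and the payload shape (a `models` list whose entries carry a `name` field) is assumed from the `/api/tags` response that the setup cell of this notebook parses.

```python
import json
import sys
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # default Ollama port named in the note above
GRANITE_MODEL = "ibm/granite4:micro"    # model tag from the `ollama pull` step


def python_version_supported(version_info=sys.version_info):
    """Return True for Python 3.11 or 3.12, the versions listed above."""
    return tuple(version_info[:2]) in {(3, 11), (3, 12)}


def model_available(tags_payload, model_name=GRANITE_MODEL):
    """Return True if model_name appears in a parsed /api/tags payload."""
    names = {entry.get("name") for entry in tags_payload.get("models", [])}
    return model_name in names


def fetch_tags(host=OLLAMA_HOST, timeout=5):
    """Fetch the locally pulled models; return None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers
        # a response body that is not valid JSON.
        return None


if __name__ == "__main__":
    print("Python version OK:", python_version_supported())
    payload = fetch_tags()
    if payload is None:
        print("Ollama is not reachable at", OLLAMA_HOST)
    else:
        print("Granite model pulled:", model_available(payload))
```

Keeping the payload check separate from the network call means the matching logic can be exercised without a running Ollama instance.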

## This Demo

This notebook demonstrates a simple chatbot using Semantic Kernel with:
- **Ollama** as the local AI provider
- **Granite 4 Micro** model for responses  
- **A minimal prompt template** that passes the user input through, leaving chat formatting to Ollama
- **Async execution** for responsive interactions
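
Async execution, the last item above, can be illustrated without a model. The sketch below times an awaited call much as the notebook's chat function does. `fake_model_call` is a hypothetical stand-in for the real model call and is not part of Semantic Kernel. In a notebook, top-level `await` works directly; in a plain script the coroutine has to be started with `asyncio.run`.

```python
import asyncio
import time


async def fake_model_call(prompt, delay=0.01):
    """Hypothetical stand-in for a model call: waits, then echoes the prompt."""
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return f"echo: {prompt}"


async def timed_call(prompt):
    """Await the stand-in call and return (reply, elapsed_seconds)."""
    start = time.perf_counter()
    reply = await fake_model_call(prompt)
    return reply, time.perf_counter() - start


if __name__ == "__main__":
    # Outside a notebook there is no running event loop, so start one explicitly.
    reply, elapsed = asyncio.run(timed_call("hello"))
    print(f"{reply} ({elapsed:.2f}s)")
```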

In [None]:
# 0. Install granite utils
! echo "::group::Install Dependencies"
%pip install uv
! uv pip install "git+https://github.com/ibm-granite-community/utils.git"

# Install required packages
! uv pip install semantic-kernel requests ipywidgets ollama

! echo "::endgroup::"

In [None]:
# 1. Setup Kernel and Granite Model for Chatbot
import requests
from semantic_kernel import Kernel
from semantic_kernel.prompt_template import PromptTemplateConfig
from semantic_kernel.connectors.ai.ollama import OllamaChatCompletion
from semantic_kernel.connectors.ai.ollama.ollama_prompt_execution_settings import OllamaChatPromptExecutionSettings

# Initialize kernel
kernel = Kernel()

# Configure execution settings
execution_settings = OllamaChatPromptExecutionSettings()

# Validate Ollama is running and has the Granite model
ollama_host = "http://localhost:11434"
granite_model = "ibm/granite4:micro"

try:
    # Check if Ollama is running
    response = requests.get(f"{ollama_host}/api/tags", timeout=5)
    response.raise_for_status()

    # Check if Granite model is available
    models = response.json()
    available_models = [model['name'] for model in models['models']]

    if granite_model not in available_models:
        print(f"❌ Granite model '{granite_model}' not found in Ollama.")
        print(f"Available models: {available_models}")
        print(f"Please run: ollama pull {granite_model}")
        raise ValueError(f"Required model {granite_model} not available")

    print(f"✅ Ollama is running and {granite_model} model is available")

except requests.exceptions.RequestException as e:
    print(f"❌ Cannot connect to Ollama at {ollama_host}")
    print("Please ensure Ollama is installed and running:")
    print("1. Install Ollama from https://ollama.ai")
    print("2. Run 'ollama serve' in terminal")
    print(f"3. Run 'ollama pull {granite_model}' to download the model")
    raise ConnectionError(f"Ollama connection failed: {e}") from e

# Initialize Semantic Kernel with Ollama service
service_id = "ollama"
kernel.add_service(
    OllamaChatCompletion(
        service_id=service_id,
        host=ollama_host,
        ai_model_id=granite_model,
    )
)
print("✅ Semantic Kernel initialized successfully with Granite")

In [None]:
# 2. Define Chat Template using a simple approach that works with Semantic Kernel

# Create a simple jinja2 template string that works with Semantic Kernel
jinja2_template_string = """{{ input }}"""

# Create the prompt template configuration using jinja2 format
prompt_config = PromptTemplateConfig(
    template=jinja2_template_string,
    name="chat",
    template_format="jinja2"
)

# Create the chat function using the prompt configuration
try:
    chat_function = kernel.add_function(
        function_name="chat_function",
        plugin_name="chat_plugin",
        prompt_template_config=prompt_config,
        prompt_execution_settings=execution_settings,
    )
    print("✅ Chat function configured successfully")
    print("✅ Using simple input passthrough - Ollama will handle chat formatting")

except Exception as e:
    print(f"❌ Function creation failed: {e}")
    raise

In [None]:
# 3. Define chat function to interact with Granite
import textwrap
import time

async def chat_with_granite(user_message):
    """
    Chat function that processes a single user input and returns the response.
    Ollama will handle the chat formatting automatically.
    """
    try:
        if chat_function is None:
            raise RuntimeError("Function not initialized - check previous cells for errors")

        print(f"🤔 Thinking... (User: {user_message[:50]}{'...' if len(user_message) > 50 else ''})")

        start_time = time.perf_counter()  # monotonic clock, suited to measuring elapsed time

        # Pass the user message directly - Ollama will handle chat formatting
        result = await kernel.invoke(chat_function, input=user_message)

        response_time = time.perf_counter() - start_time

        # Get the response text
        response_text = str(result)

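        # Wrap each line separately so paragraph breaks and code in the response
        # survive (textwrap.fill on the whole text would collapse newlines)
        wrapped_text = "\n".join(
            textwrap.fill(line, width=80, break_long_words=False, break_on_hyphens=False)
            for line in response_text.splitlines()
        )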

        print(f"\n🤖 Assistant (responded in {response_time:.2f}s):")
        print("=" * 50)
        print(wrapped_text)
        print("=" * 50)

        return response_text

    except Exception as e:
        print(f"❌ Error occurred: {e}")
        print("\n🔧 Troubleshooting:")
        print("1. Ensure Ollama is running: ollama serve")
        print(f"2. Ensure Granite model is available: ollama pull {granite_model}")
        print(f"3. Check if Ollama is accessible at {ollama_host}")
        print("4. Try restarting the notebook kernel")
        raise

def interactive_chat():
    """
    Prompt the user for a message and return it, or None if the input is empty.
    """
    print("💬 Chat with Granite 4 Micro via Semantic Kernel!")
    print("Example prompts:")
    print("- 'What are the benefits of using Semantic Kernel?'")
    print("- 'Explain quantum computing in simple terms'")
    print("- 'Write a Python function to calculate fibonacci numbers'")
    print()

    user_input = input("Enter your message: ")

    if user_input and user_input.strip():
        return user_input
    else:
        print("❌ Please enter a message to continue")
        return None

print("✅ Chat functions initialized!")
print("To start chatting, run the next cell!")

In [None]:
# 4. Send your message to Granite
message = interactive_chat()

if message:
    response = await chat_with_granite(message)