# Local LLM with Intel GPU v2

1. Run the Gemma model with Ollama
2. Create a RESTful API wrapper with Flask
3. Create an API key manager for the API wrapper
4. Create a wrapper for the Oxygen Positron add-on by following:
    * https://www.oxygenxml.com/doc/ug-addons/topics/ai-positron-custom-connector.html
    * https://www.oxygenxml.com/doc/ug-addons/topics/ai_positron_enterprise.html
    * https://github.com/oxygenxml/oxygen-ai-positron-custom-connector-addon

I'll help you create a complete solution for running Gemma model with Ollama and creating a RESTful API wrapper with Flask, including an API key manager and Oxygen Positron connector wrapper.

## Step-by-step Solution:

1. **Set up Ollama with Gemma model** - Install dependencies and pull the Gemma model
2. **Create Flask API wrapper** - Build a RESTful API that interfaces with Ollama
3. **Implement API key manager** - Add authentication and API key validation
4. **Create Oxygen Positron connector** - Follow the specifications to create a compatible wrapper

### [localLlmWIntelGPUV2.ipynb](file:///c%3A/Project/OCR/localLlmWIntelGPUV2.ipynb)

Add implementation for running Gemma model with Ollama and creating Flask API wrapper with API key management and Oxygen Positron connector.



This implementation provides:

1. **Ollama Integration**: Automatic installation and Gemma model management
2. **Flask API Wrapper**: RESTful API with proper error handling
3. **API Key Management**: SQLite-based key storage with usage tracking
4. **Oxygen Positron Compatibility**: OpenAI-compatible endpoints that work with the Oxygen XML Author add-on
5. **Security**: API key authentication for all endpoints
6. **Monitoring**: Usage tracking and health checks

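The solution exposes the OpenAI-compatible endpoints that the Oxygen Positron custom connector expects and provides a complete local LLM setup. Nothing in the code selects a GPU, though: whether Gemma runs on the Intel GPU depends on how Ollama itself is installed and configured.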
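The `api_keys` table already has an `is_active` column, and `validate_api_key` rejects rows where it is 0, but neither version of the notebook provides a way to revoke a key. Here is a minimal sketch against an in-memory SQLite database that uses the same SHA-256 hashing scheme. `create_key`, `is_valid` and `revoke_key` are hypothetical helpers, not part of the notebook:

```python
import hashlib
import sqlite3
import uuid

# Same schema as APIKeyManager.init_database (rate_limit column omitted for brevity)
SCHEMA = """
CREATE TABLE IF NOT EXISTS api_keys (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    key_hash TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_used TIMESTAMP,
    is_active BOOLEAN DEFAULT 1,
    usage_count INTEGER DEFAULT 0
)
"""


def hash_key(api_key: str) -> str:
    """Only the hash is stored, as in APIKeyManager."""
    return hashlib.sha256(api_key.encode()).hexdigest()


def create_key(conn: sqlite3.Connection, name: str) -> str:
    api_key = f"sk-{uuid.uuid4().hex}"
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO api_keys (key_hash, name) VALUES (?, ?)",
                     (hash_key(api_key), name))
    return api_key


def is_valid(conn: sqlite3.Connection, api_key: str) -> bool:
    row = conn.execute("SELECT is_active FROM api_keys WHERE key_hash = ?",
                       (hash_key(api_key),)).fetchone()
    return bool(row and row[0])


def revoke_key(conn: sqlite3.Connection, api_key: str) -> bool:
    """Deactivate a key; returns True if a matching row was found."""
    with conn:
        cur = conn.execute("UPDATE api_keys SET is_active = 0 WHERE key_hash = ?",
                           (hash_key(api_key),))
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
key = create_key(conn, "oxygen")
print(is_valid(conn, key))  # True
revoke_key(conn, key)
print(is_valid(conn, key))  # False
```

Deactivating a key instead of deleting the row keeps its `usage_count` and `last_used` history available.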


## Usage Instructions

### 1. Setup and Installation
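- Run the notebook cell below to install the Python dependencies and start the server
- The initial API key will be displayed; save it securely
- The cell checks whether Ollama is installed (on Windows you must install it manually). The Gemma model is not pulled automatically because the `pull_gemma_model()` call is commented out, so run `ollama pull gemma3:4b` yourself first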

### 2. API Endpoints

#### Standard Endpoints:
- `GET /health` - Health check
- `POST /api/generate` - Generate text with Ollama
- `GET /api/models` - List available models
- `POST /api/keys/generate` - Generate new API key

#### Oxygen Positron Compatible Endpoints:
- `POST /ai/chat/completions` - OpenAI-compatible chat completions
- `GET /ai/models` - OpenAI-compatible models list

### 3. Authentication
All endpoints except `/health` and `/api/keys/generate` require an API key in the `Authorization` header. This means key generation is open to anyone who can reach port 5000:
```
Authorization: Bearer your-api-key-here
```
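`require_api_key` strips a leading `Bearer ` but also accepts a bare key, and it compares the scheme case-sensitively. If you want stricter parsing, a sketch might look like this. `extract_bearer_token` is a hypothetical helper, not part of the notebook:

```python
from typing import Optional


def extract_bearer_token(header_value: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, else None."""
    if not header_value:
        return None
    scheme, _, token = header_value.strip().partition(" ")
    # The auth scheme name is case-insensitive (RFC 7235), so 'bearer' is accepted too
    if scheme.lower() != "bearer":
        return None
    token = token.strip()
    return token or None


print(extract_bearer_token("Bearer sk-abc123"))   # sk-abc123
print(extract_bearer_token("bearer  sk-abc123"))  # sk-abc123
print(extract_bearer_token("sk-abc123"))          # None
```

In the decorator, a `None` result would map to the existing `401` response.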

### 4. Oxygen Positron Configuration
In Oxygen XML Author, configure the custom connector with:
- **Base URL**: `http://localhost:5000/ai`
- **API Key**: Your generated API key
- **Model**: `gemma3:4b`

### 5. Example Usage
```python
import requests

headers = {'Authorization': 'Bearer your-api-key'}
data = {
    "messages": [
        {"role": "user", "content": "Explain quantum computing"}
    ],
    "model": "gemma3:4b"
}

response = requests.post('http://localhost:5000/ai/chat/completions', 
                        json=data, headers=headers)
print(response.json())
```
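The wrapper flattens `messages` into a single `User:`/`Assistant:` prompt for Ollama's `/api/generate` and ignores `max_tokens` and `temperature`. Ollama also offers `POST /api/chat`, which takes the message list directly, plus an `options` object in which `num_predict` limits the number of generated tokens. Below is a sketch of translating the request body above into such a payload, assuming that API shape. `to_ollama_chat_payload` is a hypothetical name and is not wired into the Flask routes:

```python
def to_ollama_chat_payload(body: dict, default_model: str = "gemma3:4b") -> dict:
    """Translate an OpenAI-style chat request body into an Ollama /api/chat payload."""
    allowed_roles = {"system", "user", "assistant"}  # same roles the wrapper handles
    messages = [
        {"role": m.get("role", "user"), "content": m.get("content", "")}
        for m in body.get("messages", [])
        if m.get("role", "user") in allowed_roles
    ]
    options = {}
    if body.get("max_tokens") is not None:
        options["num_predict"] = int(body["max_tokens"])  # cap on generated tokens
    if body.get("temperature") is not None:
        options["temperature"] = float(body["temperature"])
    payload = {
        "model": body.get("model", default_model),
        "messages": messages,
        "stream": False,  # one JSON object instead of NDJSON chunks
    }
    if options:
        payload["options"] = options
    return payload


request_body = {
    "model": "gemma3:4b",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
    "max_tokens": 200,
    "temperature": 0.2,
}
print(to_ollama_chat_payload(request_body))
```

Posting the result to `http://localhost:11434/api/chat` with `requests.post(..., json=payload, timeout=60)` should return the reply under `message.content` instead of `response`.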

In [8]:
# Install required dependencies before importing them
import subprocess
import sys

# Install required packages (sqlite3 ships with Python, so it is not pip-installed)
def install_packages():
    packages = ['flask', 'requests']
    for package in packages:
        try:
            __import__(package)
        except ImportError:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package])

install_packages()

import os
import uuid
import hashlib
import time
from datetime import datetime
from functools import wraps
from flask import Flask, request, jsonify
import requests
import sqlite3

# 1. Ollama Setup and Gemma Model Management
class OllamaManager:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url
        
    def install_ollama(self):
        """Install Ollama if not already installed"""
        try:
            subprocess.run(["ollama", "--version"], check=True, capture_output=True)
            print("Ollama is already installed")
        except (subprocess.CalledProcessError, FileNotFoundError):
            print("Installing Ollama...")
            # For Windows
            if os.name == 'nt':
                print("Please download and install Ollama from: https://ollama.ai/download")
            else:
                # The installer is piped into sh, so pass a single command string to the shell
                subprocess.run("curl -fsSL https://ollama.ai/install.sh | sh", shell=True)
    
    def pull_gemma_model(self, model_name="gemma3:4b"):
        """Pull Gemma model"""
        try:
            result = subprocess.run(["ollama", "pull", model_name], 
                                  capture_output=True, text=True, check=True)
            print(f"Successfully pulled {model_name}")
            return True
        except subprocess.CalledProcessError as e:
            print(f"Error pulling model: {e}")
            return False
    
    def list_models(self):
        """List available models"""
        try:
            result = subprocess.run(["ollama", "list"], 
                                  capture_output=True, text=True, check=True)
            return result.stdout
        except subprocess.CalledProcessError as e:
            print(f"Error listing models: {e}")
            return None
    
    def generate_response(self, prompt, model="gemma3:4b", stream=False):
        """Generate response using Ollama API"""
        url = f"{self.base_url}/api/generate"
        payload = {
            "model": model,
            "prompt": prompt,
            "stream": stream
        }
        
        try:
            response = requests.post(url, json=payload, timeout=60)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"Error generating response: {e}")
            return None

# 2. API Key Manager
class APIKeyManager:
    def __init__(self, db_path="api_keys.db"):
        self.db_path = db_path
        self.init_database()
    
    def init_database(self):
        """Initialize SQLite database for API keys"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS api_keys (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                key_hash TEXT UNIQUE NOT NULL,
                name TEXT NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                last_used TIMESTAMP,
                is_active BOOLEAN DEFAULT 1,
                usage_count INTEGER DEFAULT 0,
                rate_limit INTEGER DEFAULT 100
            )
        ''')
        conn.commit()
        conn.close()
    
    def generate_api_key(self, name):
        """Generate a new API key"""
        api_key = f"sk-{uuid.uuid4().hex}"
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        try:
            cursor.execute('''
                INSERT INTO api_keys (key_hash, name) VALUES (?, ?)
            ''', (key_hash, name))
            conn.commit()
            return api_key
        except sqlite3.IntegrityError:
            return None
        finally:
            conn.close()
    
    def validate_api_key(self, api_key):
        """Validate API key and update usage"""
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('''
            SELECT id, is_active, usage_count, rate_limit FROM api_keys 
            WHERE key_hash = ?
        ''', (key_hash,))
        
        result = cursor.fetchone()
        if result and result[1]:  # is_active
            # Update last_used and usage_count
            cursor.execute('''
                UPDATE api_keys 
                SET last_used = CURRENT_TIMESTAMP, usage_count = usage_count + 1
                WHERE key_hash = ?
            ''', (key_hash,))
            conn.commit()
            conn.close()
            return True
        
        conn.close()
        return False

# 3. Flask API Wrapper
app = Flask(__name__)
ollama_manager = OllamaManager()
api_key_manager = APIKeyManager()

def require_api_key(f):
    """Decorator to require API key authentication"""
    @wraps(f)
    def decorated_function(*args, **kwargs):
        api_key = request.headers.get('Authorization')
        if not api_key:
            return jsonify({'error': 'API key required'}), 401
        
        if api_key.startswith('Bearer '):
            api_key = api_key[7:]
        
        if not api_key_manager.validate_api_key(api_key):
            return jsonify({'error': 'Invalid API key'}), 401
        
        return f(*args, **kwargs)
    return decorated_function

@app.route('/health', methods=['GET'])
def health_check():
    """Health check endpoint"""
    return jsonify({'status': 'healthy', 'timestamp': datetime.utcnow().isoformat()})

@app.route('/api/generate', methods=['POST'])
@require_api_key
def api_generate():
    """Generate text using Ollama"""
    data = request.get_json()
    
    if not data or 'prompt' not in data:
        return jsonify({'error': 'Prompt is required'}), 400
    
    prompt = data['prompt']
    model = data.get('model', 'gemma3:4b')
    stream = False  # Streaming is not proxied: Ollama streams NDJSON, which response.json() cannot decode
    
    response = ollama_manager.generate_response(prompt, model, stream)
    
    if response:
        return jsonify(response)
    else:
        return jsonify({'error': 'Failed to generate response'}), 500

@app.route('/api/models', methods=['GET'])
@require_api_key
def api_list_models():
    """List available models"""
    models = ollama_manager.list_models()
    if models:
        return jsonify({'models': models})
    else:
        return jsonify({'error': 'Failed to list models'}), 500

@app.route('/api/keys/generate', methods=['POST'])
def generate_key():
    """Generate new API key (admin endpoint)"""
    data = request.get_json()
    if not data or 'name' not in data:
        return jsonify({'error': 'Name is required'}), 400
    
    api_key = api_key_manager.generate_api_key(data['name'])
    if api_key:
        return jsonify({'api_key': api_key})
    else:
        return jsonify({'error': 'Failed to generate API key'}), 500

# 4. Oxygen Positron Custom Connector
@app.route('/ai/chat/completions', methods=['POST'])
@require_api_key
def oxygen_chat_completions():
    """
    Oxygen Positron compatible endpoint
    Follows OpenAI Chat Completions API format
    """
    data = request.get_json()
    
    if not data or 'messages' not in data:
        return jsonify({'error': 'Messages are required'}), 400
    
    messages = data['messages']
    model = data.get('model', 'gemma3:4b')
    # Read for OpenAI compatibility but not currently forwarded to Ollama
    max_tokens = data.get('max_tokens', 150)
    temperature = data.get('temperature', 0.7)
    
    # Convert messages to a single prompt
    prompt = ""
    for message in messages:
        role = message.get('role', 'user')
        content = message.get('content', '')
        if role == 'system':
            prompt += f"System: {content}\n"
        elif role == 'user':
            prompt += f"User: {content}\n"
        elif role == 'assistant':
            prompt += f"Assistant: {content}\n"
    
    prompt += "Assistant: "
    
    # Generate response using Ollama
    response = ollama_manager.generate_response(prompt, model)
    
    if response and 'response' in response:
        # Format response in OpenAI Chat Completions format
        completion_response = {
            "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
            "object": "chat.completion",
            "created": int(time.time()),
            "model": model,
            "choices": [{
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": response['response']
                },
                "finish_reason": "stop"
            }],
            # Approximate: whitespace-split word counts, not model tokens
            "usage": {
                "prompt_tokens": len(prompt.split()),
                "completion_tokens": len(response['response'].split()),
                "total_tokens": len(prompt.split()) + len(response['response'].split())
            }
        }
        return jsonify(completion_response)
    else:
        return jsonify({'error': 'Failed to generate response'}), 500

@app.route('/ai/models', methods=['GET'])
@require_api_key
def oxygen_list_models():
    """
    Oxygen Positron compatible models endpoint
    """
    models_output = ollama_manager.list_models()
    if models_output:
        # Parse and format models for Oxygen Positron
        model_list = {
            "object": "list",
            "data": [
                {
                    "id": "gemma3:4b",
                    "object": "model",
                    "created": int(time.time()),
                    "owned_by": "local"
                }
            ]
        }
        return jsonify(model_list)
    else:
        return jsonify({'error': 'Failed to list models'}), 500

# Initialize and run
if __name__ == '__main__':
    # Setup Ollama and Gemma
    print("Setting up Ollama and Gemma model...")
    ollama_manager.install_ollama()
    # ollama_manager.pull_gemma_model()
    ollama_manager.list_models()
    
    # Generate initial API key
    initial_key = api_key_manager.generate_api_key("default")
    print(f"Initial API Key: {initial_key}")
    print("Save this key - it won't be shown again!")
    
    # Start Flask app
    print("Starting Flask API server...")
    app.run(host='0.0.0.0', port=5000, debug=True)

Setting up Ollama and Gemma model...
Ollama is already installed
Initial API Key: sk-738f57669d4d40448f3ae16563ad2d6a
Save this key - it won't be shown again!
Starting Flask API server...
 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.48.93:5000
Press CTRL+C to quit
 * Restarting with stat


SystemExit: 1

In [2]:
%tb

SystemExit: 1

You can use the `/health` endpoint in several ways to monitor your Flask API:

## 1. Test Health Check via HTTP Request



In [5]:
import requests

# Test the health endpoint
try:
    response = requests.get('http://localhost:5000/health')
    if response.status_code == 200:
        health_data = response.json()
        print(f"Service Status: {health_data['status']}")
        print(f"Timestamp: {health_data['timestamp']}")
    else:
        print(f"Health check failed with status: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Service is unreachable: {e}")

Service Status: healthy
Timestamp: 2025-05-31T00:46:48.500741




## 2. Browser-based Health Check

Simply open your browser and navigate to:


```
http://localhost:5000/health
```



You should see a JSON response like:


```json
{
  "status": "healthy",
  "timestamp": "2025-05-31T14:30:45.123456"
}
```



## 3. Enhance the Health Check Endpoint

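You can improve the health check to include more diagnostic information. Use the function below in place of the original `health_check` definition rather than running it on its own, because registering a second view for `/health` on the same `app` raises an error: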



In [None]:
@app.route('/health', methods=['GET'])
def health_check():
    """Enhanced health check endpoint"""
    try:
        # Check Ollama connectivity
        ollama_status = "unknown"
        try:
            models = ollama_manager.list_models()
            ollama_status = "healthy" if models else "unhealthy"
        except Exception:
            ollama_status = "unhealthy"
        
        # Check database connectivity
        db_status = "unknown"
        try:
            conn = sqlite3.connect(api_key_manager.db_path)
            conn.close()
            db_status = "healthy"
        except Exception:
            db_status = "unhealthy"
        
        overall_status = "healthy" if ollama_status == "healthy" and db_status == "healthy" else "degraded"
        
        return jsonify({
            'status': overall_status,
            'timestamp': datetime.utcnow().isoformat(),
            'services': {
                'ollama': ollama_status,
                'database': db_status
            }
        })
    except Exception as e:
        return jsonify({
            'status': 'unhealthy',
            'timestamp': datetime.utcnow().isoformat(),
            'error': str(e)
        }), 500



## 4. Automated Health Monitoring Script

Create a monitoring script that regularly checks the health:



In [None]:
import requests
import time
import logging

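def monitor_health(url="http://localhost:5000/health", interval=30):
    """Poll the health endpoint every `interval` seconds (default 30)"""
    logging.basicConfig(level=logging.INFO)
    
    while True:
        try:
            response = requests.get(url, timeout=5)
            if response.status_code == 200:
                data = response.json()
                status = data.get('status', 'unknown')
                if status == 'healthy':
                    logging.info(f"✅ Service healthy - {data.get('timestamp')}")
                else:
                    # The enhanced endpoint reports 'degraded' with HTTP 200
                    logging.warning(f"⚠️ Service {status} - {data.get('services')}")
            else:
                logging.warning(f"❌ Health check returned {response.status_code}")
        except Exception as e:
            logging.error(f"🔥 Health check failed: {e}")
        
        time.sleep(interval)

# Run the monitor (this loop never returns; interrupt the kernel to stop it)
monitor_health()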
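`monitor_health()` never returns, so it blocks the notebook. Here is a sketch of a non-blocking variant that polls from a daemon thread and stops when a `threading.Event` is set. It uses only the standard library, and the `check` parameter is a hypothetical hook that lets you swap in a fake for offline testing:

```python
import json
import logging
import threading
import urllib.request


def http_check(url, timeout=5):
    """Return the decoded JSON body of the health endpoint (raises on HTTP/network errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def start_health_monitor(url="http://localhost:5000/health", interval=30, check=http_check):
    """Poll `url` every `interval` seconds in a daemon thread.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            try:
                data = check(url)
                status = data.get("status", "unknown")
                if status == "healthy":
                    logging.info("Service healthy - %s", data.get("timestamp"))
                else:
                    logging.warning("Service %s - %s", status, data.get("services"))
            except Exception as exc:  # network errors, HTTP 4xx/5xx, bad JSON
                logging.error("Health check failed: %s", exc)
            # wait() returns early as soon as stop_event is set
            stop_event.wait(interval)

    threading.Thread(target=loop, daemon=True).start()
    return stop_event
```

Usage: `stop = start_health_monitor()` keeps the notebook responsive, and `stop.set()` ends the polling.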



## 5. Command Line Health Check

Use curl or PowerShell to check health from the command line:

**Using curl:**


```bash
curl http://localhost:5000/health
```



**Using PowerShell:**


```powershell
Invoke-RestMethod -Uri "http://localhost:5000/health" -Method GET
```



## 6. Integration with Load Balancers

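Load balancers and container orchestrators can also poll the health endpoint. Configure your load balancer with:
- **Health Check URL**: `http://<host>:5000/health` (`localhost` only works when the checker runs on the same machine)
- **Expected Status Code**: `200`
- **Check Interval**: `30 seconds`
- **Timeout**: `5 seconds`

This health check helps confirm that your local LLM API service is running and can respond to requests from the Oxygen Positron connector. Keep in mind that the enhanced version returns HTTP 200 even when `status` is `degraded`, so a load balancer that checks only the status code will not notice an Ollama outage.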
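Here is a sketch of a health payload builder that maps `degraded` to HTTP 503, so checks that look only at the status code will notice an Ollama outage. The 503 choice and the `build_health_response` name are assumptions. It also uses `datetime.now(timezone.utc)` instead of `datetime.utcnow()`, which is deprecated since Python 3.12:

```python
from datetime import datetime, timezone


def build_health_response(ollama_ok: bool, db_ok: bool):
    """Return (payload, http_status) for a health endpoint; 'degraded' maps to 503."""
    services = {
        "ollama": "healthy" if ollama_ok else "unhealthy",
        "database": "healthy" if db_ok else "unhealthy",
    }
    overall = "healthy" if ollama_ok and db_ok else "degraded"
    payload = {
        "status": overall,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "services": services,
    }
    return payload, (200 if overall == "healthy" else 503)
```

Inside the Flask view, this would be used as `payload, code = build_health_response(ollama_ok, db_ok)` followed by `return jsonify(payload), code`.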

## Plan

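The first version ended with `SystemExit: 1`, most likely because `debug=True` turns on Werkzeug's reloader. The reloader restarts the process (see the `Restarting with stat` line in the output) and then exits, which a Jupyter kernel reports as `SystemExit`. Flask's `app.run()` is designed for standalone scripts, not interactive environments. I need to: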

1. Modify the initialization section to work in Jupyter
2. Add proper error handling for Ollama installation
3. Make the Flask app run in a way that's compatible with Jupyter notebooks
4. Add better model list parsing for the oxygen_list_models endpoint
5. Ensure proper cleanup and threading
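Parsing the text table printed by `ollama list` depends on its column layout. Ollama also serves `GET /api/tags` on port 11434, which returns the installed models as JSON. Here is a sketch of converting that reply into the OpenAI-style list, assuming the shape `{"models": [{"name": ...}, ...]}`. `tags_to_openai_models` is a hypothetical helper:

```python
import time


def tags_to_openai_models(tags: dict) -> dict:
    """Convert an Ollama GET /api/tags response into an OpenAI-style model list."""
    now = int(time.time())  # same placeholder timestamp that parse_models uses
    data = [
        {"id": m["name"], "object": "model", "created": now, "owned_by": "local"}
        for m in tags.get("models", [])
        if "name" in m
    ]
    return {"object": "list", "data": data}


# Example with a trimmed-down /api/tags reply (only the field used here)
sample = {"models": [{"name": "gemma3:1b"}, {"name": "gemma3:4b"}]}
print([m["id"] for m in tags_to_openai_models(sample)["data"]])
```

To use live data, fetch it with `requests.get("http://localhost:11434/api/tags", timeout=5).json()`.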



In [1]:
# Install required dependencies before importing them
import subprocess
import sys

# Install required packages (sqlite3 ships with Python, so it is not pip-installed)
def install_packages():
    packages = ['flask', 'requests']
    for package in packages:
        try:
            __import__(package)
        except ImportError:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package])

install_packages()

import os
import uuid
import hashlib
import time
import threading
from datetime import datetime
from functools import wraps
from flask import Flask, request, jsonify
import requests
import sqlite3

# ollama_model = "gemma3:4b"
ollama_model = "gemma3:1b"

# 1. Ollama Setup and Gemma Model Management
class OllamaManager:
    def __init__(self, base_url="http://localhost:11434"):
        self.base_url = base_url
        
    def install_ollama(self):
        """Install Ollama if not already installed"""
        try:
            result = subprocess.run(["ollama", "--version"], check=True, capture_output=True, text=True)
            print("Ollama is already installed")
            return True
        except (subprocess.CalledProcessError, FileNotFoundError):
            print("Installing Ollama...")
            # For Windows
            if os.name == 'nt':
                print("Please download and install Ollama from: https://ollama.ai/download")
                print("After installation, restart this notebook and run 'ollama serve' in a terminal")
                return False
            else:
                try:
                    # Download installer script
                    import urllib.request
                    urllib.request.urlretrieve("https://ollama.ai/install.sh", "ollama_install.sh")
                    subprocess.run(["chmod", "+x", "ollama_install.sh"], check=True)
                    subprocess.run(["./ollama_install.sh"], check=True)
                    print("Ollama installed successfully")
                    return True
                except Exception as e:
                    print(f"Failed to install Ollama: {e}")
                    return False
    
    def pull_gemma_model(self, model_name=ollama_model):
        """Pull Gemma model"""
        try:
            print(f"Pulling {model_name} model...")
            result = subprocess.run(["ollama", "pull", model_name], 
                                  capture_output=True, text=True, check=True)
            print(f"Successfully pulled {model_name}")
            return True
        except subprocess.CalledProcessError as e:
            print(f"Error pulling model: {e}")
            print(f"Make sure Ollama is running with 'ollama serve'")
            return False
    
    def list_models(self):
        """List available models"""
        try:
            result = subprocess.run(["ollama", "list"], 
                                  capture_output=True, text=True, check=True)
            return result.stdout
        except subprocess.CalledProcessError as e:
            print(f"Error listing models: {e}")
            return None
    
    def parse_models(self, models_output):
        """Parse ollama list output into structured format"""
        if not models_output:
            return []
        
        models = []
        lines = models_output.strip().split('\n')[1:]  # Skip header
        for line in lines:
            if line.strip():
                parts = line.split()
                if len(parts) >= 3:
                    model_name = parts[0]
                    models.append({
                        "id": model_name,
                        "object": "model",
                        "created": int(time.time()),
                        "owned_by": "local"
                    })
        return models
    
    def generate_response(self, prompt, model=ollama_model, stream=False):
        """Generate response using Ollama API"""
        url = f"{self.base_url}/api/generate"
        payload = {
            "model": model,
            "prompt": prompt,
            "stream": stream
        }
        
        try:
            response = requests.post(url, json=payload, timeout=60)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"Error generating response: {e}")
            return None

# 2. API Key Manager
class APIKeyManager:
    def __init__(self, db_path="api_keys.db"):
        self.db_path = db_path
        self.init_database()
    
    def init_database(self):
        """Initialize SQLite database for API keys"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS api_keys (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                key_hash TEXT UNIQUE NOT NULL,
                name TEXT NOT NULL,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                last_used TIMESTAMP,
                is_active BOOLEAN DEFAULT 1,
                usage_count INTEGER DEFAULT 0,
                rate_limit INTEGER DEFAULT 100
            )
        ''')
        conn.commit()
        conn.close()
    
    def generate_api_key(self, name):
        """Generate a new API key"""
        api_key = f"sk-{uuid.uuid4().hex}"
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        try:
            cursor.execute('''
                INSERT INTO api_keys (key_hash, name) VALUES (?, ?)
            ''', (key_hash, name))
            conn.commit()
            return api_key
        except sqlite3.IntegrityError:
            return None
        finally:
            conn.close()
    
    def validate_api_key(self, api_key):
        """Validate API key and update usage"""
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute('''
            SELECT id, is_active, usage_count, rate_limit FROM api_keys 
            WHERE key_hash = ?
        ''', (key_hash,))
        
        result = cursor.fetchone()
        if result and result[1]:  # is_active
            # Update last_used and usage_count
            cursor.execute('''
                UPDATE api_keys 
                SET last_used = CURRENT_TIMESTAMP, usage_count = usage_count + 1
                WHERE key_hash = ?
            ''', (key_hash,))
            conn.commit()
            conn.close()
            return True
        
        conn.close()
        return False

# 3. Flask API Wrapper
app = Flask(__name__)
ollama_manager = OllamaManager()
api_key_manager = APIKeyManager()

def require_api_key(f):
    """Decorator to require API key authentication"""
    @wraps(f)
    def decorated_function(*args, **kwargs):
        api_key = request.headers.get('Authorization')
        if not api_key:
            return jsonify({'error': 'API key required'}), 401
        
        if api_key.startswith('Bearer '):
            api_key = api_key[7:]
        
        if not api_key_manager.validate_api_key(api_key):
            return jsonify({'error': 'Invalid API key'}), 401
        
        return f(*args, **kwargs)
    return decorated_function

@app.route('/health', methods=['GET'])
def health_check():
    """Enhanced health check endpoint"""
    try:
        # Check Ollama connectivity
        ollama_status = "unknown"
        try:
            models = ollama_manager.list_models()
            ollama_status = "healthy" if models else "unhealthy"
        except Exception:
            ollama_status = "unhealthy"
        
        # Check database connectivity
        db_status = "unknown"
        try:
            conn = sqlite3.connect(api_key_manager.db_path)
            conn.close()
            db_status = "healthy"
        except Exception:
            db_status = "unhealthy"
        
        overall_status = "healthy" if ollama_status == "healthy" and db_status == "healthy" else "degraded"
        
        return jsonify({
            'status': overall_status,
            'timestamp': datetime.utcnow().isoformat(),
            'services': {
                'ollama': ollama_status,
                'database': db_status
            }
        })
    except Exception as e:
        return jsonify({
            'status': 'unhealthy',
            'timestamp': datetime.utcnow().isoformat(),
            'error': str(e)
        }), 500

@app.route('/api/generate', methods=['POST'])
@require_api_key
def api_generate():
    """Generate text using Ollama"""
    data = request.get_json()
    
    if not data or 'prompt' not in data:
        return jsonify({'error': 'Prompt is required'}), 400
    
    prompt = data['prompt']
    model = data.get('model', ollama_model)
    stream = False  # Streaming is not proxied: Ollama streams NDJSON, which response.json() cannot decode
    
    response = ollama_manager.generate_response(prompt, model, stream)
    
    if response:
        return jsonify(response)
    else:
        return jsonify({'error': 'Failed to generate response'}), 500

@app.route('/api/models', methods=['GET'])
@require_api_key
def api_list_models():
    """List available models"""
    models_output = ollama_manager.list_models()
    if models_output:
        parsed_models = ollama_manager.parse_models(models_output)
        return jsonify({'models': parsed_models})
    else:
        return jsonify({'error': 'Failed to list models'}), 500

@app.route('/api/keys/generate', methods=['POST'])
def generate_key():
    """Generate new API key (admin endpoint)"""
    data = request.get_json()
    if not data or 'name' not in data:
        return jsonify({'error': 'Name is required'}), 400
    
    api_key = api_key_manager.generate_api_key(data['name'])
    if api_key:
        return jsonify({'api_key': api_key})
    else:
        return jsonify({'error': 'Failed to generate API key'}), 500

# 4. Oxygen Positron Custom Connector
@app.route('/ai/chat/completions', methods=['POST'])
@require_api_key
def oxygen_chat_completions():
    """
    Oxygen Positron compatible endpoint
    Follows OpenAI Chat Completions API format
    """
    data = request.get_json()
    
    if not data or 'messages' not in data:
        return jsonify({'error': 'Messages are required'}), 400
    
    messages = data['messages']
    model = data.get('model', ollama_model)
    # Read for OpenAI compatibility but not currently forwarded to Ollama
    max_tokens = data.get('max_tokens', 150)
    temperature = data.get('temperature', 0.7)
    
    # Convert messages to a single prompt
    prompt = ""
    for message in messages:
        role = message.get('role', 'user')
        content = message.get('content', '')
        if role == 'system':
            prompt += f"System: {content}\n"
        elif role == 'user':
            prompt += f"User: {content}\n"
        elif role == 'assistant':
            prompt += f"Assistant: {content}\n"
    
    prompt += "Assistant: "
    
    # Generate response using Ollama
    response = ollama_manager.generate_response(prompt, model)
    
    if response and 'response' in response:
        # Format response in OpenAI Chat Completions format
        completion_response = {
            "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
            "object": "chat.completion",
            "created": int(time.time()),
            "model": model,
            "choices": [{
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": response['response']
                },
                "finish_reason": "stop"
            }],
            # Approximate: whitespace-split word counts, not model tokens
            "usage": {
                "prompt_tokens": len(prompt.split()),
                "completion_tokens": len(response['response'].split()),
                "total_tokens": len(prompt.split()) + len(response['response'].split())
            }
        }
        return jsonify(completion_response)
    else:
        return jsonify({'error': 'Failed to generate response'}), 500

@app.route('/ai/models', methods=['GET'])
@require_api_key
def oxygen_list_models():
    """
    Oxygen Positron compatible models endpoint
    """
    models_output = ollama_manager.list_models()
    if models_output:
        # Parse and format models for Oxygen Positron
        parsed_models = ollama_manager.parse_models(models_output)
        if not parsed_models:
            # Fallback if no models found
            parsed_models = [{
                "id": ollama_model,
                "object": "model",
                "created": int(time.time()),
                "owned_by": "local"
            }]
        
        model_list = {
            "object": "list",
            "data": parsed_models
        }
        return jsonify(model_list)
    else:
        return jsonify({'error': 'Failed to list models'}), 500

# Global variable to track server thread
server_thread = None

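def run_flask_app():
    """Run Flask app in a separate thread"""
    # use_reloader=False prevents the reloader restart that raised SystemExit earlier.
    # debug=True also enables the Werkzeug debugger; with host='0.0.0.0' it is reachable
    # from the network, so consider debug=False outside local testing.
    app.run(host='0.0.0.0', port=5000, debug=True, use_reloader=False)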

def start_server():
    """Start the Flask server"""
    global server_thread
    if server_thread is None or not server_thread.is_alive():
        server_thread = threading.Thread(target=run_flask_app, daemon=True)
        server_thread.start()
        print("Flask server started on http://localhost:5000")
    else:
        print("Server is already running")

def setup_and_start():
    """Setup Ollama and start the Flask server"""
    print("Setting up Ollama and Gemma model...")
    
    # Install Ollama
    if not ollama_manager.install_ollama():
        print("⚠️  Please install Ollama manually and run 'ollama serve' before continuing")
        return None
    
    # Try to list models (this will also test if Ollama is running)
    models = ollama_manager.list_models()
    if models:
        print("Available models:")
        print(models)
    else:
        print("⚠️  No models found. You may need to pull a model first.")
        print(f"Run: ollama pull {ollama_model}")
    
    # Generate initial API key
    initial_key = api_key_manager.generate_api_key("default")
    if initial_key:
        print(f"\n🔑 Initial API Key: {initial_key}")
        print("Save this key - it won't be shown again!")
    
    # Start Flask app in background thread
    start_server()
    
    return initial_key

# Initialize setup
if __name__ == '__main__':
    setup_and_start()
else:
    # Only reached when this file is imported as a module; in Jupyter,
    # __name__ == '__main__', so the branch above is the one that runs
    initial_key = setup_and_start()
    if initial_key:
        print(f"\n✅ Server setup complete!")
        print(f"🌐 Health check: http://localhost:5000/health")
        print(f"📚 API documentation available in the notebook cells above")

Setting up Ollama and Gemma model...
Ollama is already installed
Available models:
NAME         ID              SIZE      MODIFIED     
gemma3:1b    8648f39daa8f    815 MB    9 hours ago     
gemma3:4b    a2af6cc3eb7f    3.3 GB    13 hours ago    


🔑 Initial API Key: sk-2c76be0b113d4e54aa129a3289eea8b5
Save this key - it won't be shown again!
Flask server started on http://localhost:5000
 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.48.93:5000
Press CTRL+C to quit
127.0.0.1 - - [31/May/2025 11:11:14] "POST /ai/chat/completions HTTP/1.1" 200 -
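The completion endpoint estimates `usage` by counting whitespace-separated words. Non-streaming Ollama replies also carry `prompt_eval_count` and `eval_count` (token counts) and, in recent versions, a `done_reason` field. Here is a sketch that builds the OpenAI-style response from those fields, falling back to 0 when a count is absent. `to_openai_completion` is a hypothetical helper:

```python
import time
import uuid


def to_openai_completion(ollama_json: dict, model: str) -> dict:
    """Wrap an Ollama /api/generate (or /api/chat) reply in OpenAI chat.completion format."""
    # /api/generate puts the text under "response"; /api/chat under "message" -> "content"
    text = ollama_json.get("response")
    if text is None:
        text = ollama_json.get("message", {}).get("content", "")
    prompt_tokens = ollama_json.get("prompt_eval_count", 0)
    completion_tokens = ollama_json.get("eval_count", 0)
    # "length" means the num_predict limit cut the output short
    finish = "length" if ollama_json.get("done_reason") == "length" else "stop"
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:8]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": finish,
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

Because it accepts both reply shapes, the same helper works whether the route keeps using `/api/generate` or switches to `/api/chat`.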


In [10]:
# Add this to your notebook cell
import threading
import time

def stop_server():
    """Stop the Flask server gracefully"""
    global server_thread
    if server_thread and server_thread.is_alive():
        print("🛑 Stopping Flask server...")
        # NOTE: nothing is actually stopped here. Daemon threads only end when the
        # kernel process exits, and app.run() has no shutdown hook; restart the
        # kernel to stop the server and free port 5000.
        print("✅ Server stopped")
    else:
        print("ℹ️  No server running")

def get_server_status():
    """Check if server is running"""
    global server_thread
    if server_thread and server_thread.is_alive():
        print("🟢 Server is running")
        print(f"🌐 Health check: http://localhost:5000/health")
    else:
        print("🔴 Server is not running")

# Call these functions to manage your server
stop_server()
# get_server_status()

🛑 Stopping Flask server...
✅ Server stopped
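`stop_server()` cannot stop a server started with `app.run()`. One way to get a real shutdown is to serve the Flask `app` (which is a WSGI app) through a server object whose `shutdown()` can be called from another thread. Here is a sketch using the standard library's `wsgiref.simple_server`, which handles one request at a time. `werkzeug.serving.make_server` provides a similar object and is what Flask's development server builds on. `StoppableServer` is a hypothetical class, and it binds to `127.0.0.1` by default (pass `host='0.0.0.0'` to match the notebook):

```python
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses wsgiref's per-request log lines."""

    def log_message(self, format, *args):
        pass


class StoppableServer:
    """Serve a WSGI app (e.g. the Flask `app`) in a daemon thread with a working stop()."""

    def __init__(self, wsgi_app, host="127.0.0.1", port=5000):
        # port=0 asks the OS for any free port; read the result from `port`
        self.httpd = make_server(host, port, wsgi_app, handler_class=QuietHandler)
        self.thread = None

    @property
    def port(self):
        return self.httpd.server_port

    def start(self):
        self.thread = threading.Thread(target=self.httpd.serve_forever, daemon=True)
        self.thread.start()

    def stop(self):
        if self.thread is not None:
            self.httpd.shutdown()  # makes serve_forever() return; blocks until it has
            self.thread.join(timeout=5)
        self.httpd.server_close()  # release the listening socket
```

Restart the kernel first so port 5000 is free. Then call `server = StoppableServer(app)` and `server.start()`, and later `server.stop()`.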


## Setting




## Results

| Model / prompt (XML output) | `p.67c4` | `t.No.366,p.348b9` |
|----------|----------|----------|
| 4o-mini<br>ref markup+ | (not recorded) | (not recorded) |
| gemma3:4b<br>ref markup+ | `<ref><canon>p</canon>.<v>67</v>.<c>4</c>` | `<ref><canon>T</canon>.<v>366</v>,<p>348</p><c>b</c>.<l>9</l></ref>` |
| gemma3:4b<br>ref markup++ | (not recorded) | (not recorded) |
| gemma3:1b<br>ref markup+ | `<ref><canon>p</canon>.<v>67</v>,<p>4</p><c>4</c>.<l>4</l></ref>` | `<ref><canon>No.<w>366</w> p.<p>348</p>.<c>b</c><l>9</l></ref>` |
| gemma3:1b<br>ref markup++ | (not recorded) | (not recorded) |
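The table records raw model output, so a quick well-formedness check is a cheap first filter. Here is a sketch that runs `xml.etree.ElementTree` on the four recorded outputs. It only tests whether each string parses as XML, not whether the markup is semantically correct. `is_well_formed` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """True if the string parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Outputs copied from the results table
outputs = {
    ("gemma3:4b", "p.67c4"): "<ref><canon>p</canon>.<v>67</v>.<c>4</c>",
    ("gemma3:4b", "t.No.366,p.348b9"): "<ref><canon>T</canon>.<v>366</v>,<p>348</p><c>b</c>.<l>9</l></ref>",
    ("gemma3:1b", "p.67c4"): "<ref><canon>p</canon>.<v>67</v>,<p>4</p><c>4</c>.<l>4</l></ref>",
    ("gemma3:1b", "t.No.366,p.348b9"): "<ref><canon>No.<w>366</w> p.<p>348</p>.<c>b</c><l>9</l></ref>",
}

for (model, ref), xml in outputs.items():
    print(f"{model:10} {ref:18} well-formed={is_well_formed(xml)}")
```

Two of the four outputs fail this check (an unclosed `<ref>` and an unclosed `<canon>`). The other two parse, but they still need to be compared against the expected markup by hand.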

