# AIOS Model Integration Setup

## Setting Up Your Model Integration Environment

Welcome to this tutorial on AIOS Model Onboarding! In this notebook, we'll walk through the essential steps to set up your development environment for integrating models with the AIOS platform.

> **Important Note:** This notebook is designed to be run **inside a Docker container**. The tutorial assumes that you will execute the cells within a container where the AIOS environment is already set up. The next section provides the `docker run` command to start this container.

### What You'll Learn:
- 🐳 How to set up and run a Docker container for AIOS development.
- 🏗️ Building a complete AIOS model integration within the container.
- 🧪 Testing your model integration using the AIOS testing utilities.
- 📦 Creating a production-ready Dockerfile for your custom model.

### Prerequisites:
Before starting this tutorial, ensure you've completed:
- ✅ **Tutorial 1**: Prerequisites & Setup (This includes setting up your local directory and downloading the necessary models).
- ✅ Docker installed and running on your machine.

---

## 2.1. Docker Container Setup for Testing

Since this notebook requires AIOS components to run properly, we need to execute it inside a Docker container with the necessary dependencies. We'll use the llama_cpp base image and mount our workspace.

### Prerequisites
- Docker installed and running
- llama_cpp Docker image available (from previous tutorials)
- Workspace directory accessible

### Container Creation Command

Run the following command to create and start a container with the necessary setup:

```bash
# Create and run container with workspace mounted
docker run -it --rm \
  --name aios-model-integration \
  --gpus all \
  -v /home/user/local_files:/workspace \
  -v /home/user/local_files/models:/models \
  -w /workspace/documentation/video_tutorial_series/model_integration \
  -p 8888:8888 \
  aios_llama_cpp:v1-gpu \
  bash
```

### Alternative: Run with Jupyter Support

If you want to run this notebook interactively:

```bash
# Run container with Jupyter notebook support
docker run -it --rm \
  --name aios-model-integration-jupyter \
  --gpus all \
  -v /home/user/local_files:/workspace \
  -v /home/user/local_files/models:/models \
  -w /workspace \
  -p 8888:8888 \
  aios_llama_cpp:v1-gpu \
  jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root
```

### Container Environment Setup

Once inside the container, set up the environment:

```bash
# Navigate to the tutorial directory
cd /workspace/documentation/video_tutorial_series/model_integration

# Install additional dependencies if needed
pip install notebook ipykernel

# Verify AIOS components are available
python -c "from aios_instance import TestContext, BlockTester; print('✅ AIOS components ready')"
```

### Container Options Explained

- `-v /home/user/local_files:/workspace` - Mounts the main workspace containing tutorial files and code
- `-v /home/user/local_files/models:/models` - Mounts a dedicated directory for downloaded models
- `-w` - Sets the working directory to the tutorial location
- `-p 8888:8888` - Publishes port 8888 for Jupyter notebook access
- `--gpus all` - Enables GPU access inside the container

---


## 2.2. Building AIOS Integration Inside Container

Now we'll build a comprehensive AIOS model integration that works inside our Docker container environment. This implementation follows the production-ready patterns and can be tested immediately.

### Key Components:
- **SimpleLlamaBlock**: Main class for model integration
- **Container-Compatible Paths**: Proper path handling for Docker environment
- **Volume-Mounted Models**: Access to models through container volumes
- **AIOS Testing Integration**: Full testing setup within container
- **Production Patterns**: Based on actual AIOS implementations

In [None]:
!pip3 install websockets huggingface_hub grpcio protobuf==3.20.0 redis boto3 kubernetes flask pynvml

In [16]:
# Essential imports for basic AIOS integration

import json
import logging
import os
from typing import Dict, Any

# AIOS core components
from aios_instance import PreProcessResult, OnDataResult, Block
from aios_llama_cpp import LLAMAUtils

# Hugging Face for model downloading
from huggingface_hub import hf_hub_download

print("📦 Successfully imported AIOS integration components")
print("🔧 Ready to build SimpleLlamaBlock class")

📦 Successfully imported AIOS integration components
🔧 Ready to build SimpleLlamaBlock class


### The `SimpleLlamaBlock` Class

This is the main class for our AIOS block. It is a plain Python class that implements the core methods a functional AIOS block needs; the AIOS `Block` wrapper imported above takes this class and runs it (as in the `main.py` entry point in section 2.4). The class handles model loading, data processing, and inference.

#### `__init__(self, context)`

The constructor for our `SimpleLlamaBlock`. This method is called when the block is first initialized. It's responsible for loading the model, setting up any necessary configurations, and preparing the block for inference. The `context` object provides access to block-specific information, such as initialization data and paths.

In [17]:
# Simple AIOS Block Class - Basic Structure
class SimpleLlamaBlock:
    """
    Simple AIOS Block for LLaMA model integration
    Following the basic AIOS patterns from prerequisites tutorial
    """
    
    def __init__(self, context):
        """Initialize the block with context and load model"""
        self.context = context
        
        # Get model name from initialization data
        init_data = context.block_init_data or {}
        self.model_name = init_data.get("model_name")
        if not self.model_name:
            raise ValueError("Missing 'model_name' in block_init_data")
            
        # Set up basic configuration
        self.model_path = context.common_path
        
        # Initialize LLaMA utilities
        self.llama = LLAMAUtils(
            model_path=f"{self.model_path}/{self.model_name}",
            use_gpu=True
        )
        
        # Load the model
        self.llama.load_model()
        print(f"✅ SimpleLlamaBlock initialized with model: {self.model_name}")

print("🏗️ SimpleLlamaBlock class created with __init__ method")

🏗️ SimpleLlamaBlock class created with __init__ method


#### `on_preprocess(self, packet)`

This method is called for each incoming packet of data. Its primary role is to prepare the data for the `on_data` method. This can include tasks like deserializing JSON, decoding base64 data, or any other preprocessing steps required by the model. It returns a tuple: `(True, [PreProcessResult, ...])` on success, where each `PreProcessResult` carries the processed data, or `(False, error_message)` on failure.

In [18]:
# Basic Input Preprocessing - on_preprocess method

# Add on_preprocess method to SimpleLlamaBlock
def on_preprocess(self, packet):
    """
    Process incoming packets and prepare them for inference
    """
    try:
        # Get data from packet
        data = packet.data
        
        # Parse JSON if it's a string
        if isinstance(data, str):
            try:
                data = json.loads(data)
            except json.JSONDecodeError:
                # Not valid JSON: keep as plain string
                pass
        
        # Create preprocessed result
        result = PreProcessResult(
            packet=packet,
            extra_data={"input": data},
            session_id=packet.session_id
        )
        
        return True, [result]
        
    except Exception as e:
        return False, str(e)

# Add the method to SimpleLlamaBlock class
SimpleLlamaBlock.on_preprocess = on_preprocess

print("🔄 Added on_preprocess method to SimpleLlamaBlock")
print("🎯 Features: JSON parsing, plain text support, error handling, session management")

🔄 Added on_preprocess method to SimpleLlamaBlock
🎯 Features: JSON parsing, plain text support, error handling, session management
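
To see what `on_preprocess` hands to `on_data`, the parsing fallback can be tried on its own. The sketch below is a stand-alone illustration, not AIOS code: `parse_packet_data` is a hypothetical helper that mirrors the JSON-or-plain-text logic above, and `SimpleNamespace` stands in for a real packet (assumed to expose `.data` and `.session_id`).

```python
import json
from types import SimpleNamespace


def parse_packet_data(data):
    """Return parsed JSON when `data` is a JSON string, else return it unchanged.

    Hypothetical helper mirroring the fallback used in on_preprocess.
    """
    if isinstance(data, str):
        try:
            return json.loads(data)
        except json.JSONDecodeError:
            return data  # not JSON: keep the plain string
    return data


# Stand-in packets (assumption: real packets expose .data and .session_id)
json_packet = SimpleNamespace(data='{"inputs": [{"message": "Hi"}]}', session_id="s1")
text_packet = SimpleNamespace(data="just some text", session_id="s2")

parsed_json = parse_packet_data(json_packet.data)
parsed_text = parse_packet_data(text_packet.data)

print(type(parsed_json).__name__, parsed_json)
print(type(parsed_text).__name__, parsed_text)
```

One consequence of this fallback is that a bare string such as `"42"` is itself valid JSON and comes back as an integer, so `on_data` should not assume that string input always stays a string.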


#### `on_data(self, preprocessed_entry, is_ws)`

This is the core method where the actual model inference happens. It takes the preprocessed data from `on_preprocess` and feeds it to the model. The `is_ws` parameter indicates whether the request came from a WebSocket connection, which is useful for streaming responses. The method returns `(True, OnDataResult)` with the model's output on success, or `(False, error_message)` on failure.

In [23]:
# Basic Inference Logic - on_data method
def on_data(self, preprocessed_entry, is_ws):
    """
    Run model inference on preprocessed data
    """
    try:
        # Get input data
        
        input_data = preprocessed_entry.extra_data["input"]["inputs"][0]
        print(input_data)
        # Simple text generation
        if isinstance(input_data, str):
            # Direct string input
            response = self.llama.generate_text(input_data)
            
        elif isinstance(input_data, dict):
            # Handle different input formats
            if "prompt" in input_data:
                response = self.llama.generate_text(input_data["prompt"])
            elif "message" in input_data:
                response = self.llama.generate_text(input_data["message"])
            else:
                response = "No valid input found"
        else:
            response = "Invalid input format"
        print('response is ',response) 
        return True, OnDataResult(output={"reply": response})
        
    except Exception as e:
        print(f"❌ Block on_data failed: {e}")
        import traceback
        traceback.print_exc()
        return False, str(e)

# Add the method to SimpleLlamaBlock class
SimpleLlamaBlock.on_data = on_data

print("🧠 Added on_data method to SimpleLlamaBlock")
print("🚀 Features: Text generation, multiple input formats, error handling, clean response format")

🧠 Added on_data method to SimpleLlamaBlock
🚀 Features: Text generation, multiple input formats, error handling, clean response format
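
The input-format branching in `on_data` can be exercised without loading a model. The following is a minimal sketch under stated assumptions: `FakeLlama` is a hypothetical stand-in for `LLAMAUtils` that only echoes its prompt, and `route_input` is an illustrative helper that repeats the same branching as the cell above.

```python
class FakeLlama:
    """Stand-in for LLAMAUtils (assumption: only generate_text is needed here)."""

    def generate_text(self, prompt):
        return f"echo: {prompt}"


def route_input(llama, input_data):
    """Mirror the input-format branching used in on_data."""
    if isinstance(input_data, str):
        return llama.generate_text(input_data)
    if isinstance(input_data, dict):
        # "prompt" is checked before "message", as in on_data
        for key in ("prompt", "message"):
            if key in input_data:
                return llama.generate_text(input_data[key])
        return "No valid input found"
    return "Invalid input format"


llama = FakeLlama()
for sample in ["Hello", {"prompt": "Hi"}, {"message": "Hey"}, {"text": "?"}, 42]:
    print(repr(sample), "->", route_input(llama, sample))
```

Checking the branches this way is quick because nothing touches the GPU; the real model is only needed once the routing behaves as intended.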


#### `management(self, action, data)`

This method provides a way to send custom commands to the block. This can be used for a variety of tasks, such as reloading the model, getting status information, or any other custom management actions.

In [19]:
# Basic Health and Management Methods

def management(self, action, data):
    """
    Basic management operations
    """
    try:
        if action == "info":
            return {"model": self.model_name, "status": "running"}
        elif action == "reset":
            return {"message": "Reset completed"}
        else:
            return {"error": f"Unknown action: {action}"}
    except Exception as e:
        return {"error": str(e)}

# Add the method to SimpleLlamaBlock class
SimpleLlamaBlock.management = management

print("⚙️ Added management methods to SimpleLlamaBlock")
print("🔧 Features: info and reset actions, unknown-action handling, error handling")

⚙️ Added management methods to SimpleLlamaBlock
🔧 Features: info and reset actions, unknown-action handling, error handling


## 2.3. Basic Testing

Now we'll create a simple test to verify our basic model integration works correctly.

### Simple Testing Approach

Instead of a large test suite, we'll take a basic approach to testing our AIOS block with the `BlockTester` utility:

- **Direct instantiation**: Let `BlockTester` create the block from a `TestContext`
- **Mock data**: Use simple test data
- **Basic validation**: Check that methods work as expected

In [None]:
# AIOS Testing Setup - Container Environment
# Following test_llama_cpp.py patterns with container-appropriate paths

from aios_instance import TestContext, BlockTester
import time
import pprint
import os
import sys

# Create proper AIOS test context for container environment
context = TestContext()

# Container-appropriate paths
context.common_path = "/models"  # Models mounted at /models in container
context.instance_path = "/workspace/"

# Verify container environment
print("🐳 Container Environment Check:")
print(f"- Workspace path: {os.path.exists('/workspace')}")
print(f"- Models path: {os.path.exists('/models')}")
print(f"- Current working directory: {os.getcwd()}")
print(f"- AIOS components: {'Available' if 'aios_instance' in sys.modules else 'Not loaded'}")

# Configuration for LLaMA model
llama_config = {
    "n_gpu_layers": -1,         # Use all GPU layers
    "n_threads": -1,            # Auto-detect threads
    "n_ctx": 4096,             # Context size
    "seed": 3407,               # Random seed
    "verbose": True             # Enable logging
}

# Block initialization data - using container paths
context.block_init_data = {
    "model_name": "gemma-3-4b-it-qat-q4/gemma-3-4b-it-q4_0.gguf",  # Resolved relative to context.common_path (/models)
}

# Block settings optimized for container
context.block_init_settings = {
    "use_gpu": True,
    "gpu_id": [0],
    "enable_metrics": False,     # Keep metrics disabled for simple testing
    "model_config": llama_config,
    "cleanup_enabled": True,
    "cleanup_check_interval": 10,
    "cleanup_session_timeout": 30
}

# Generation parameters
context.block_init_parameters = {
    "temperature": 0.1,
    "max_tokens": 512,
    "top_p": 0.95,
}

generation_config = {
    "temperature": 0.1,
    "min_p": 0.01,
    "top_k": 64,
    "top_p": 0.95,
    "max_tokens": 512
}

print("\n✅ AIOS test context configured for container")
print(f"📁 Model path: {context.common_path}")
print(f"🤖 Model name: {context.block_init_data['model_name']}")
print(f"⚙️  Generation config: {generation_config}")

# Container-specific model check
model_path = f"{context.common_path}/{context.block_init_data['model_name']}"
model_exists = os.path.exists(model_path)
print(f"📋 Model availability: {'Available' if model_exists else 'Not found'}")

if model_exists:
    print("✅ Ready for full testing with downloaded models!")
    print("Next step: tester = BlockTester.init_with_context(SimpleLlamaBlock, context)")
else:
    print("📎 Model file not found. Download it as described in Tutorial 1 and place it at:")
    print(f"   {model_path}")

In [24]:
# Container-Based Test Examples - Following AIOS Testing Patterns

# Example test payloads optimized for container environment
def create_container_test_payloads():
    """
    Create test payloads optimized for container environment
    """
    
    # Simple text message test
    simple_payload = {
        "inputs": [{
            "message": "Hello! Tell me about Deepseek LLM Model?",
            "session_id": "container_session_1",
            "gen_params": generation_config
        }]
    }
    
    
    return simple_payload

def run_actual_block_tests():
    """
    Actually execute the SimpleLlamaBlock with test payloads
    """
    print("🐳 Running Actual AIOS Block Tests...")
    
    try:
        # Create test payloads
        simple_payload = create_container_test_payloads()
        
        print("✅ Test payloads created successfully")
        print(f"- Simple payload: {len(simple_payload['inputs'])} input(s)")

        # Check if models are available
        model_path = f"{context.common_path}/{context.block_init_data['model_name']}"
        if not os.path.exists(model_path):
            print("\n⚠️  Model not found. Download it first (see Tutorial 1) and place it at:")
            print(f"   {model_path}")
            return False
            
        # Initialize the block tester
        print("\n🔧 Initializing AIOS Block Tester...")
        tester = BlockTester.init_with_context(SimpleLlamaBlock, context)
        print("✅ Block tester initialized successfully!")
        
        # Test 1: Simple message inference
        print("\n🧪 Test 1: Simple Message Inference")
        print(f"Input: {simple_payload['inputs'][0]['message']}")
        
        start_time = time.time()
        result_1 = tester.run(simple_payload)
        elapsed_1 = time.time() - start_time
        print('result_1',result_1)
        print(f"⏱️  Inference time: {elapsed_1:.2f}s")
        if result_1 and len(result_1) > 0:
            print(f"✅ Response: {result_1[0].get('reply', 'No reply found')[:100]}...")
        else:
            print("❌ No response received")

        
        # Test 2: Management commands
        print("\n🧪 Test 2: Management Operations")
        info_result = tester.block_instance.management("info", {})
        print(f"Block Info: {info_result}")
        
        # Performance summary (single run, so this is one sample rather than an average)
        print(f"\n📊 Performance Summary:")
        print(f"- Inference time (1 run): {elapsed_1:.2f}s")
        print(f"- Model status: {info_result.get('status', 'unknown')}")
        
        return True
        
    except Exception as e:
        print(f"❌ Block testing failed: {e}")
        import traceback
        traceback.print_exc()
        return False

# Execute actual block tests
test_result = run_actual_block_tests()
print(f"\n🎯 Block testing result: {'SUCCESS' if test_result else 'FAILED'}")
print("\n🚀 Real AIOS block testing completed!")

INFO:aios_llama_cpp.library:Cleanup configuration: {'enabled': True, 'check_interval': 300, 'session_timeout': 3600}
INFO:aios_llama_cpp.library:Chat session cleanup thread started
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA A100-SXM4-80GB) - 39824 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA A100-SXM4-80GB) - 40638 MiB free
llama_model_loader: loaded meta data with 39 key-value pairs and 444 tensors from /models/gemma-3-4b-it-qat-q4/gemma-3-4b-it-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma3
llama_model_loader: - kv   1:                      gemma3.context_length u32              = 131072
llama_model_loader: - kv   2:                         gemma3.block_count u32              = 34
llama_model_loader: - kv   3:                    gemma3.embedding_lengt

🐳 Running Actual AIOS Block Tests...
✅ Test payloads created successfully
- Simple payload: 1 input(s)

🔧 Initializing AIOS Block Tester...
Loading model from /models/gemma-3-4b-it-qat-q4/gemma-3-4b-it-q4_0.gguf with config: {}


llama_model_loader: - kv  20:                      tokenizer.ggml.scores arr[f32,262144]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  21:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  22:               general.quantization_version u32              = 2
llama_model_loader: - kv  23:                          general.file_type u32              = 2
llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {{ bos_token }} {%- if messages[0]['r...
llama_model_loader: - kv  25:                 gemma3.mm.tokens_per_image u32              = 256
llama_model_loader: - kv  26:         gemma3.vision.attention.head_count u32              = 16
llama_model_loader: - kv  27: gemma3.vision.attention.layer_norm_epsilon f32              = 0.000001
llama_model_loader: - kv  28:                  gemma3.vision.block_count u32              = 27
llama_model_loader: - kv  

✅ SimpleLlamaBlock initialized with model: gemma-3-4b-it-qat-q4/gemma-3-4b-it-q4_0.gguf
✅ Block tester initialized successfully!

🧪 Test 1: Simple Message Inference
Input: Hello! Tell me about Deepseek LLM Model?
{'message': 'Hello! Tell me about Deepseek LLM Model?', 'session_id': 'container_session_1', 'gen_params': {'temperature': 0.1, 'min_p': 0.01, 'top_k': 64, 'top_p': 0.95, 'max_tokens': 512}}


llama_perf_context_print:        load time =     251.54 ms
llama_perf_context_print: prompt eval time =     251.23 ms /    12 tokens (   20.94 ms per token,    47.76 tokens per second)
llama_perf_context_print:        eval time =    3274.44 ms /    49 runs   (   66.83 ms per token,    14.96 tokens per second)
llama_perf_context_print:       total time =    3653.49 ms /    61 tokens


response is  [{'id': 'cmpl-50991541-c927-452a-9148-e1d5465b28b8', 'object': 'text_completion', 'created': 1754462127, 'model': '/models/gemma-3-4b-it-qat-q4/gemma-3-4b-it-q4_0.gguf', 'choices': [{'text': "</h1>\n<p>Deepseek LLM is a powerful and versatile large language model developed by Deepseek. Here's a breakdown of what makes it noteworthy:</p>\n\n<h2>Key Features and Capabilities</h2>\n\n*   **Mixture-of", 'index': 0, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 12, 'completion_tokens': 50, 'total_tokens': 62}}]
result_1 [{'reply': [{'id': 'cmpl-50991541-c927-452a-9148-e1d5465b28b8', 'object': 'text_completion', 'created': 1754462127, 'model': '/models/gemma-3-4b-it-qat-q4/gemma-3-4b-it-q4_0.gguf', 'choices': [{'text': "</h1>\n<p>Deepseek LLM is a powerful and versatile large language model developed by Deepseek. Here's a breakdown of what makes it noteworthy:</p>\n\n<h2>Key Features and Capabilities</h2>\n\n*   **Mixture-of", 'index': 0, 'logprobs': 
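
The output above shows that `generate_text` returned a list of completion dictionaries rather than a plain string, so `reply` carries the whole structure. If only the generated text is wanted, a small helper can pull it out. The sketch below assumes the response always has the shape seen in the log (a list of dicts, each with a `choices` list); `extract_reply_text` is a hypothetical helper, not an AIOS function.

```python
def extract_reply_text(response):
    """Pull the generated text out of a completion-style response.

    Assumes the shape seen in the log above: a list of dicts (or a single
    dict), each with a "choices" list whose items carry a "text" field.
    Anything that does not match is converted with str().
    """
    if isinstance(response, str):
        return response
    completions = response if isinstance(response, list) else [response]
    texts = []
    for completion in completions:
        if not isinstance(completion, dict):
            continue
        for choice in completion.get("choices", []):
            texts.append(choice.get("text", ""))
    return "".join(texts) if texts else str(response)


# Sample shaped like the logged response (values shortened for illustration)
sample = [{
    "object": "text_completion",
    "choices": [{"text": "Deepseek LLM is ...", "index": 0, "finish_reason": "length"}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 50, "total_tokens": 62},
}]

print(extract_reply_text(sample))
```

Calling such a helper before building `OnDataResult` would make `reply` a plain string. Note also that `on_data` in this tutorial never reads the `gen_params` entry of the payload, which may explain why the logged run stopped with `finish_reason: 'length'` after 50 completion tokens.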


### Why Test Your Integration?

- 🛠️ **Catch Issues Early**: Identify and fix problems before deployment
- ✅ **Ensure Functionality**: Verify that your integration behaves as expected
- 📈 **Performance Checks**: Ensure your integration meets performance benchmarks
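
For the performance check, a single timed call is a thin basis for a benchmark. A minimal sketch of timing several calls follows; `fake_inference` is a hypothetical stand-in for `tester.run(payload)` so the example runs without a model, and `time_calls` is an illustrative helper rather than part of the AIOS testing utilities.

```python
import statistics
import time


def time_calls(fn, payloads):
    """Call fn(payload) for each payload and return per-call durations in seconds."""
    durations = []
    for payload in payloads:
        start = time.perf_counter()
        fn(payload)
        durations.append(time.perf_counter() - start)
    return durations


def fake_inference(payload):
    """Hypothetical stand-in for tester.run(payload)."""
    time.sleep(0.01)
    return [{"reply": f"echo: {payload}"}]


durations = time_calls(fake_inference, ["a", "b", "c"])
print(f"runs: {len(durations)}")
print(f"mean: {statistics.mean(durations):.3f}s, max: {max(durations):.3f}s")
```

With a real block, the first call often includes warm-up cost, so it can be worth reporting it separately from the later runs.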



## 2.4. Summary of Integration

The complete block implementation is shown below. Save it as `main.py`.

In [None]:
#!/usr/bin/env python3
"""
Simple AIOS Block Implementation
Based on AIOS tutorial patterns and test_llama_cpp.py
"""

import json
import os
from typing import Dict, Any

# AIOS core components
from aios_instance import PreProcessResult, OnDataResult, Block
from aios_llama_cpp import LLAMAUtils
from huggingface_hub import hf_hub_download

class SimpleLlamaBlock:
    """
    Simple AIOS Block for LLaMA model integration
    Following AIOS patterns with all essential methods
    """
    
    def __init__(self, context):
        """Initialize the block with context and load model"""
        self.context = context
        init_data = context.block_init_data or {}
        
        self.model_name = init_data.get("model_name")
        if not self.model_name:
            raise ValueError("Missing model_name in block_init_data")
            
        # Set up model path and configuration
        self.model_path = context.common_path
        
        # Get model configuration from settings
        # Fall back to empty dicts when settings are missing or None
        settings = getattr(context, 'block_init_settings', None) or {}
        model_config = settings.get('model_config') or {}
        
        # Initialize LLaMA utilities
        self.llama = LLAMAUtils(
            model_path=f"{self.model_path}/{self.model_name}",
            use_gpu=settings.get('use_gpu', True),
            **model_config
        )
        
        # Load the model
        self.llama.load_model()
        print(f"✅ SimpleLlamaBlock initialized with model: {self.model_name}")
    
    def on_preprocess(self, packet):
        """Process incoming packets and prepare them for inference"""
        try:
            data = packet.data
            if isinstance(data, str):
                try:
                    data = json.loads(data)
                except json.JSONDecodeError:
                    # Not valid JSON: keep as plain string
                    pass
            
            return True, [PreProcessResult(
                packet=packet,
                extra_data={"input": data},
                session_id=packet.session_id
            )]
        except Exception as e:
            return False, str(e)
    
    def on_data(self, preprocessed_entry, is_ws=False):
        """Run model inference on preprocessed data"""
        try:
            input_data = preprocessed_entry.extra_data["input"]
            
            # Handle different input formats
            if isinstance(input_data, str):
                response = self.llama.generate_text(input_data)
            elif isinstance(input_data, dict):
                if "inputs" in input_data:
                    # Handle batch inputs
                    inputs = input_data["inputs"][0] if input_data["inputs"] else {}
                    if "message" in inputs:
                        response = self.llama.generate_text(inputs["message"])
                    elif "messages" in inputs:
                        # Handle chat format
                        messages = inputs["messages"]
                        if messages and "content" in messages[-1]:
                            content = messages[-1]["content"]
                            if isinstance(content, list) and content:
                                text_content = next((c["text"] for c in content if c.get("type") == "text"), "")
                                response = self.llama.generate_text(text_content)
                            else:
                                response = self.llama.generate_text(str(content))
                        else:
                            response = "No valid message content found"
                    else:
                        response = "No valid input format found"
                elif "prompt" in input_data:
                    response = self.llama.generate_text(input_data["prompt"])
                else:
                    response = "No valid input format found"
            else:
                response = "Invalid input format"
                
            return True, OnDataResult(output={"reply": response})
        except Exception as e:
            return False, str(e)
    
    
    def management(self, action, data):
        """Handle management operations"""
        try:
            if action == "info":
                # No health() method is defined on this class, so report status only
                return {"model": self.model_name, "status": "running"}
            elif action == "reset":
                return {"message": "Reset completed"}
            elif action == "reload_model":
                # Reload model if path provided
                if "model_path" in data:
                    return {"message": f"Model reload requested: {data['model_path']}"}
                return {"message": "Model reload completed"}
            else:
                return {"error": f"Unknown action: {action}"}
        except Exception as e:
            return {"error": str(e)}

# Entry point
if __name__ == "__main__":
    block = Block(SimpleLlamaBlock)
    block.run()

Add a `requirements.txt` listing your dependencies next to `main.py`, and keep the test files handy.
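
As a starting point, the dependency file can be written from Python in the same way this notebook writes the Dockerfile. The sketch below reuses the package list from the `pip3 install` cell in section 2.2; whether the base image already provides some of these packages is an assumption to check, and `write_requirements` is a hypothetical helper.

```python
from pathlib import Path

# Packages taken from the pip install cell earlier in this notebook.
# Assumption: none of these are already pinned differently in the base image.
PACKAGES = [
    "websockets",
    "huggingface_hub",
    "grpcio",
    "protobuf==3.20.0",
    "redis",
    "boto3",
    "kubernetes",
    "flask",
    "pynvml",
]


def write_requirements(path, packages):
    """Write one requirement per line and return the written text."""
    text = "\n".join(packages) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


print(write_requirements("requirements.txt", PACKAGES))
```

Trim the list to what `main.py` actually imports once the integration is stable, since every extra package lengthens the image build.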

### Dockerfile Creation
The next cell writes a production-ready Dockerfile for your AIOS model integration, along with a matching `.dockerignore`.

In [25]:
# Create Production Dockerfile
# Simple Dockerfile for AIOS model integration deployment

dockerfile_content = '''# Simple AIOS Model Integration Dockerfile
# Based on existing AIOS patterns with llama_cpp_python_base:v1 base image

# Use llama_cpp_python_base as the base image (contains AIOS components)
FROM llama_cpp_python_base:v1

# Clean up any existing app directory
RUN rm -rf /app

# Set up working directory using FOLDER_NAME argument
ARG FOLDER_NAME=aios_simple_llama
WORKDIR /${FOLDER_NAME}

# Copy all project files to the container
COPY . /${FOLDER_NAME}/

# Install Python dependencies
RUN pip3 install -r requirements.txt

# Set the entrypoint to run our AIOS block
ENTRYPOINT ["python3", "-u", "main.py"]
'''

# Create the Dockerfile
with open('Dockerfile', 'w') as f:
    f.write(dockerfile_content)

print("✅ Dockerfile created successfully!")
print("\n📋 Dockerfile Contents:")
print(dockerfile_content)

# Also create a simple .dockerignore file
dockerignore_content = '''# Docker ignore file for AIOS integration
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
*.so
.git/
.gitignore
README.md
*.md
.vscode/
.idea/
*.log
tests/
.pytest_cache/
models/
*.ipynb
.ipynb_checkpoints/
'''

with open('.dockerignore', 'w') as f:
    f.write(dockerignore_content)

print("\n✅ .dockerignore created successfully!")
print("\n📋 .dockerignore Contents:")
print(dockerignore_content)

✅ Dockerfile created successfully!

📋 Dockerfile Contents:
# Simple AIOS Model Integration Dockerfile
# Based on existing AIOS patterns with llama_cpp_python_base:v1 base image

# Use llama_cpp_python_base as the base image (contains AIOS components)
FROM llama_cpp_python_base:v1

# Clean up any existing app directory
RUN rm -rf /app

# Set up working directory using FOLDER_NAME argument
ARG FOLDER_NAME=aios_simple_llama
WORKDIR /${FOLDER_NAME}

# Copy all project files to the container
COPY . /${FOLDER_NAME}/

# Install Python dependencies
RUN pip3 install -r requirements.txt

# Set the entrypoint to run our AIOS block
ENTRYPOINT ["python3", "-u", "main.py"]


✅ .dockerignore created successfully!

📋 .dockerignore Contents:
# Docker ignore file for AIOS integration
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
*.so
.git/
.gitignore
README.md
*.md
.vscode/
.idea/
*.log
tests/
.pytest_cache/
models/
*.ipynb
.ipynb_checkpoints/



### Testing Steps

1. **Build the Docker Image**: Create a Docker image from your Dockerfile.
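2. **Run the Docker Container**: Start a container from your image with `--entrypoint=/bin/bash` so you get a shell instead of launching the block.
3. **Execute Test Cases**: Rerun your test suite inside the running container (mount the test files, since the `.dockerignore` above excludes `tests/` and notebooks from the image).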
4. **Check Logs and Metrics**: Monitor logs and performance metrics for anomalies.
5. **Iterate**: Fix any issues and retest until you achieve the desired stability and performance.
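
Steps 1 and 2 above can be sketched as a pair of Docker commands. The snippet below only assembles and prints them; the image tag `aios-simple-llama:dev` is a hypothetical name, the models path is the host directory used earlier in this tutorial, and `build_commands` is an illustrative helper. `FOLDER_NAME` is the build argument declared in the generated Dockerfile.

```python
import shlex


def build_commands(image_tag, models_dir, folder_name="aios_simple_llama"):
    """Return the docker build and debug-run commands as argv lists."""
    build = [
        "docker", "build",
        "--build-arg", f"FOLDER_NAME={folder_name}",
        "-t", image_tag,
        ".",
    ]
    run = [
        "docker", "run", "-it", "--rm",
        "--gpus", "all",
        "--entrypoint", "/bin/bash",  # open a shell instead of starting main.py
        "-v", f"{models_dir}:/models",
        image_tag,
    ]
    return build, run


# Hypothetical tag; the models path matches the mount used earlier in this tutorial
build_cmd, run_cmd = build_commands("aios-simple-llama:dev", "/home/user/local_files/models")
print(shlex.join(build_cmd))
print(shlex.join(run_cmd))
```

The printed lines can be pasted into a terminal on the host, or the argv lists can be passed to `subprocess.run(..., check=True)` from a script outside the container.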


Basic testing is a crucial step to ensure your AIOS model integration is robust and reliable. By following the steps outlined in this section, you can identify and fix issues early, ensuring a smoother deployment process.

## 2.5. Conclusion and Next Steps

### What We've Built:

- ✅ **Container-Ready AIOS Integration**: Complete model integration optimized for Docker containers
- ✅ **Simple Testing Approach**: Basic inference testing without complex frameworks
- ✅ **GPU-Accelerated Environment**: Full GPU access within containerized environment
- ✅ **Volume-Mounted Storage**: Proper data persistence and model storage
- ✅ **Production Patterns**: AIOS-standard implementation with essential methods

### Container Architecture:

| Component | Host Path | Container Path | Purpose |
|-----------|-----------|----------------|----------|
| Workspace | `/home/user/local_files` | `/workspace` | Code and notebooks |
| Models | `/home/user/local_files/models` | `/models` | Model storage |
| GPU | Host GPU | Container GPU | Accelerated inference |

### Core AIOS Methods:

| Method | Purpose | Implementation |
|--------|---------|----------------|
| `__init__` | Initialize block and load model | Context setup, model loading |
| `on_preprocess` | Process incoming packets | JSON parsing, data extraction |
| `on_data` | Run model inference | Text generation with multiple formats |
| `management` | Handle management commands | Info and reset operations |

### Basic Testing Approach:

- ✅ **Simple Payloads**: Basic message and prompt testing
- ✅ **Direct Inference**: Straightforward model testing
- ✅ **Health Checks**: Basic status monitoring
- ✅ **Container Environment**: Docker-based testing setup

### Generated Files:

```
basic-aios-project/
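├── main.py              # Complete AIOS block implementation
├── requirements.txt     # Dependencies
├── Dockerfile           # Container build definition
├── .dockerignore        # Files excluded from the build context
├── setup_guide.md       # Setup instructions
├── test_basic.py        # Basic environment testing
└── models/              # Model directory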
```

### Dockerfile Creation:

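To containerize the AIOS integration, a Dockerfile is created in the project root. This file defines the container environment, including the base image, working directory, and commands to install dependencies and copy project files. The sample below is a variant of the Dockerfile generated in section 2.4: it copies `requirements.txt` before the rest of the project so that Docker can reuse the cached dependency layer when only the code changes.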

**Sample Dockerfile:**

```
# Use llama_cpp_python_base as the base image (contains AIOS components)
FROM llama_cpp_python_base:v1

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container
COPY requirements.txt .

# Install the Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the project files into the container
COPY . .

# Command to run the application
CMD ["python", "main.py"]
```

---

**🎉 Success!** You now have a simple, working AIOS model integration ready for container deployment! 🚀🐳

To register, allocate, and run inference with this model in the AIOS ecosystem, see the [Model Onboarding](https://github.com/OpenCyberspace/AIOS_AI_Blueprints/blob/main/video_tutorial_series/02_Part1_onboard_gemma3_llama_cpp/Model-Onboarding.ipynb) tutorial.