# A Developer's Guide to the AIOS Router

Welcome to this guide on the **AIOS Router**, a powerful service that dynamically selects the best block to handle a request based on a set of filters and policies. The Router is essential for building intelligent, resource-aware applications that can adapt to changing conditions and requirements.

Below is a visual overview of the routing process, where an incoming request is matched against a set of available blocks based on defined criteria.

## High Level Block Diagram
<img src="adhoc_inference_serving.png" alt="Powerful Routers" width="800" height="800"> 

## High Level Sequence Diagram
<img src="router_2.png" alt="Powerful Routers" width="800" height="800"> 

For more details, refer to the official documentation:
- [Router/Parser Concepts](https://docs.aigr.id/parser/search-server)

## The Router in the AIOS Ecosystem

The Router is a core component of the AIOS ecosystem, acting as an intelligent gateway for inference requests. Instead of sending a request to a specific block, you can send it to the Router with a set of requirements. The Router will then:

1.  **Discover** all available blocks that could potentially handle the request.
2.  **Filter** this list based on both dynamic, real-time metrics (e.g., current load, token speed) and static metadata (e.g., model capabilities, benchmark scores).
3.  **Build the candidate pool** from the blocks that pass the filters.
4.  **Rank** the candidate pool using a policy-driven scoring mechanism.
5.  **Select** the best-ranked block and route the request to it.

This allows you to build applications that are not tied to a specific model instance but can instead leverage the entire network of available resources, always choosing the best option for the job.
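
The five steps above can be sketched in a few lines of Python. This is an illustrative outline only, not the Router's actual implementation: the `Block` class, the `route` function, and the example field values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Block:
    """Hypothetical stand-in for a registered block."""
    block_id: str
    metadata: Dict[str, object] = field(default_factory=dict)  # static attributes
    metrics: Dict[str, float] = field(default_factory=dict)    # real-time metrics


def route(
    blocks: List[Block],
    static_filter: Callable[[Block], bool],
    metrics_filter: Callable[[Block], bool],
    score: Callable[[Block], float],
) -> Optional[Block]:
    # Steps 1-3: start from all known blocks, apply both filters,
    # and keep the survivors as the candidate pool.
    pool = [b for b in blocks if static_filter(b) and metrics_filter(b)]
    if not pool:
        return None  # nothing satisfies the requirements
    # Steps 4-5: rank the pool by score and pick the top candidate.
    ranked = sorted(pool, key=score, reverse=True)
    return ranked[0]


blocks = [
    Block("a", {"supportsStreaming": True}, {"llm_tokens_per_second": 40.0}),
    Block("b", {"supportsStreaming": True}, {"llm_tokens_per_second": 25.0}),
    Block("c", {"supportsStreaming": False}, {"llm_tokens_per_second": 90.0}),
]
best = route(
    blocks,
    static_filter=lambda b: b.metadata.get("supportsStreaming") is True,
    metrics_filter=lambda b: b.metrics.get("llm_tokens_per_second", 0.0) > 10,
    score=lambda b: b.metrics["llm_tokens_per_second"],
)
print(best.block_id if best else None)  # a
```

Block `c` is the fastest but is removed by the static filter, so the ranking only ever sees blocks that already meet the hard requirements.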

## 1. Deploying the Policy
With the policy tested, we can now deploy it to AIOS. This involves three steps: assembling the policy, uploading the policy package (`attributebasedrouter.zip`), and registering it with the AIOS Policy System.

### a. Assembling the Policy for Deployment

Now that we have defined all the methods of our `AIOSv1PolicyRule` class, we need to assemble them into a single script and package it correctly for deployment. The policy must be contained within a `code` directory, which includes the `function.py` file and a `requirements.txt` file.

In [None]:
# Create a requirements.txt file
with open("code/requirements.txt", "w") as f:
    f.write("requests\n")  # External dependency used by the policy

print("code/function.py and code/requirements.txt created successfully.")
!cat code/function.py

Now, let's package our policy into an `attributebasedrouter.zip` file for deployment. The zip file must contain the `code` directory.

In [None]:
!zip -r attributebasedrouter.zip code
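
If the `zip` command-line tool is not available, a similar archive can be built with Python's standard `zipfile` module. This is a minimal sketch, assuming the layout described above (a top-level `code` directory inside the archive); the helper name `build_policy_zip` is hypothetical.

```python
import os
import zipfile
from typing import List


def build_policy_zip(src_dir: str, zip_path: str) -> List[str]:
    """Zip src_dir so that archive entries keep the directory name as a prefix."""
    parent = os.path.dirname(os.path.abspath(src_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to the parent, e.g. "code/function.py".
                arcname = os.path.relpath(os.path.abspath(full_path), parent)
                zf.write(full_path, arcname)
    # Re-open the archive and report what it contains.
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(zf.namelist())


# Only runs when a local "code" directory is present.
if os.path.isdir("code"):
    print(build_policy_zip("code", "attributebasedrouter.zip"))
```

Listing the archive contents before uploading is a cheap way to confirm that `function.py` and `requirements.txt` sit under `code/` rather than at the archive root.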

### b. Upload the Policy Package

First, we upload the `.zip` file containing our `function.py` to a location accessible by AIOS. The following `curl` command sends the file to an upload server, which makes it available via a URL.

In [None]:
!curl -X POST http://POLICYSTORESERVER:30186/upload -F "file=@./attributebasedrouter.zip" -F "path=."

### c. Register the Policy

Next, we register the policy with AIOS. The `requests` call below sends a JSON payload to the policy registry endpoint. The payload contains metadata about our policy, including its name, version, and the URL where the code can be found (`"code": "http://MANAGEMENTMASTER:32555/attributebasedrouter.zip"`). This tells AIOS how to find and execute our policy.

In [None]:
import requests

policy_registration_payload = {
    "name": "attributebasedrouter",
    "version": "2.0",
    "release_tag": "stable",
    "metadata": {"author": "admin", "category": "analytics"},
    "tags": "analytics,ai",
    "code": "http://MANAGEMENTMASTER:32555/attributebasedrouter.zip",
    "code_type": "tar.xz",
    "type": "policy",
    "policy_input_schema": {
        "type": "object",
        "properties": {
            "input": {"type": "string"}
        }
    },
    "policy_output_schema": {
        "type": "object",
        "properties": {
            "output": {"type": "string"}
        }
    },
    "policy_settings_schema": {
        "METRICS_BASE_URL": {
            "type": "string",
            "description": "Base URL for fetching block metrics",
            "pattern": "^https?://.*"
        },
        "APP_MAP": {
            "type": "object",
            "description": "Mapping of application types to use cases",
            "additionalProperties": {"type": "string"}
        }
    },
    "policy_parameters_schema": {
        "METRICS_BASE_URL": {
            "type": "string",
            "description": "Override for metrics base URL",
            "pattern": "^https?://.*"
        },
        "selection_query": {
            "type": "object",
            "description": "Query for selecting and optimizing block candidates",
            "properties": {
                "application_type": {"type": "string"},
                "optimization_goals": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "weight": {"type": "number"}
                        },
                        "required": ["name"]
                    }
                }
            }
        }
    },
    "policy_settings": {
        "METRICS_BASE_URL": "http://metrics-service:30201",
        "APP_MAP": {
            "RAG": "chat-completion",
            "Code": "code-generation",
            "Vision": "multi-modal",
            "Chat": "chat-completion"
        }
    },
    "policy_parameters": {
        "METRICS_BASE_URL": "http://metrics-service:30201"
    },
    "description": "A policy for dynamic block selection based on metrics and application type.",
    "functionality_data": {"strategy": "ML-based"},
    "resource_estimates": {}
}



In [None]:
response = requests.post(
    "http://MANAGEMENTMASTER:30102/policy",
    json=policy_registration_payload,
    headers={"Content-Type": "application/json"}
)

print("Status Code:", response.status_code)
print("Response:", response.text)

## 2. The Router Request Structure

A search request to the Router is a JSON payload that specifies the criteria for selecting a block. The cell below prints the payload used in this guide. Its key parts are the `blockQuery`, `blockMetricsQuery`, and `selection_query`, which we will explore in detail.

In [None]:
!cat payload.json

## 3. Understanding the Filters and Selection

The power of the Router lies in its filtering and selection capabilities. The request uses two types of filters to narrow down the candidates, and then an optimization query to select the best one.

### a. `blockQuery` (Static Metadata Filter)
This filter looks at the static metadata of the registered components, as defined in their `component.json` files. It's used to find blocks with specific, unchanging characteristics.

In this example, it's looking for a block where:
- The `usecase` is like `*chat-completion*`.
- `supportsStreaming` is `true`.
- The MMLU benchmark score is greater than or equal to `80`.
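
Taken together, the three conditions above amount to a simple predicate over a block's static metadata. The sketch below is illustrative only: the metadata layout (a `usecase` string, a `supportsStreaming` flag, and a nested `benchmarks` dictionary) is an assumption and may not match the real `component.json` schema.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def matches_block_query(metadata: Dict[str, Any]) -> bool:
    """Return True if the (hypothetical) metadata satisfies all three conditions."""
    # "like *chat-completion*": wildcard match anywhere in the use case string.
    usecase_ok = fnmatchcase(str(metadata.get("usecase", "")), "*chat-completion*")
    streaming_ok = metadata.get("supportsStreaming") is True
    # Missing benchmark scores are treated as 0, so they fail the threshold.
    mmlu_ok = metadata.get("benchmarks", {}).get("MMLU", 0) >= 80
    return usecase_ok and streaming_ok and mmlu_ok


print(matches_block_query(
    {"usecase": "llm-chat-completion", "supportsStreaming": True, "benchmarks": {"MMLU": 82.5}}
))  # True
print(matches_block_query(
    {"usecase": "chat-completion", "supportsStreaming": True, "benchmarks": {"MMLU": 79}}
))  # False
```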

### b. `blockMetricsQuery` (Dynamic Metrics Filter)
This filter queries the real-time metrics of the block instances. This is how the Router makes decisions based on the current performance and state of the system.

In this example, it's looking for a block where the average:
- `uptime_hours` is greater than `10`.
- `llm_tokens_per_second` is greater than `10`.
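
The same idea applies to the dynamic side: average each metric, then compare it with its threshold. The sketch below is a rough illustration; how the Router actually aggregates samples (over time, over instances, or both) is not specified here, so the list-of-samples input is an assumption.

```python
from statistics import mean
from typing import Dict, List


def matches_metrics_query(samples: List[Dict[str, float]]) -> bool:
    """Average each metric over the samples and apply both thresholds."""
    if not samples:
        return False  # no data, so the block cannot be shown to qualify
    avg_uptime = mean(s["uptime_hours"] for s in samples)
    avg_tps = mean(s["llm_tokens_per_second"] for s in samples)
    return avg_uptime > 10 and avg_tps > 10


# Averages: uptime 13, tokens/s 12, so both thresholds are met.
print(matches_metrics_query([
    {"uptime_hours": 12, "llm_tokens_per_second": 8},
    {"uptime_hours": 14, "llm_tokens_per_second": 16},
]))  # True
```

Because the comparison is on the average, a single slow sample (8 tokens per second above) does not disqualify a block on its own.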

### c. `selection_query` (Optimization Goals)
After the initial filtering, the `selection_query` is used to rank the remaining candidates. This is where the `attributebasedrouter` policy applies a more sophisticated scoring logic based on the defined `optimization_goals`. In the example, it prioritizes speed (`Fast_Generator`) over cost (`Free_Block`) with a 70/30 weighting.
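
For reference, a `selection_query` with this 70/30 split might look like the dictionary below. Its shape follows the `policy_parameters_schema` registered earlier (`application_type` plus a list of `optimization_goals` with `name` and `weight`). Expressing the weights as `0.7` and `0.3` rather than `70` and `30`, and expecting them to sum to 1, are assumptions; where the object sits inside `payload.json` is not shown here.

```python
import json

selection_query = {
    "application_type": "Chat",  # one of the APP_MAP keys registered with the policy
    "optimization_goals": [
        {"name": "Fast_Generator", "weight": 0.7},
        {"name": "Free_Block", "weight": 0.3},
    ],
}

# Sanity checks mirroring the registered schema: every goal needs a name.
# The sum-to-1 check is an extra convention assumed for this example.
goals = selection_query["optimization_goals"]
assert all("name" in g for g in goals)
total_weight = sum(g.get("weight", 0.0) for g in goals)
assert abs(total_weight - 1.0) < 1e-9

print(json.dumps(selection_query, indent=2))
```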

## 4. The `attributebasedrouter` Policy Explained
The logic for processing these filters and goals is contained within the `attributebasedrouter` policy. The flow is as follows:

1.  **Filtering**: The Router first applies the `blockQuery` and `blockMetricsQuery` to get a list of candidate blocks that meet the basic requirements.
2.  **Scoring**: This list of candidates is then passed to the `attributebasedrouter` policy. The policy fetches detailed real-time and static metrics for each candidate.
3.  **Ranking**: It then scores each candidate based on the `optimization_goals` provided in the `selection_query`. Each goal is a combination of one or more metrics. The policy normalizes these metric values and calculates a weighted score.
4.  **Selection**: The block with the highest final score is selected and returned.
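
The normalize-and-weight step can be illustrated with a small example. This is a sketch under stated assumptions, not the policy's actual code: min-max normalization is one plausible choice of normalization, and the `min_max` and `rank` helpers, as well as the sample metric values, are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; if all values are equal, return 0.5 for each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    lower_is_better: frozenset = frozenset(),
) -> List[Tuple[str, float]]:
    """Return (block_id, score) pairs sorted from best to worst."""
    ids = list(candidates)
    scores = {bid: 0.0 for bid in ids}
    for metric, weight in weights.items():
        normalized = min_max([candidates[bid][metric] for bid in ids])
        for bid, value in zip(ids, normalized):
            # Invert metrics where a smaller raw value is preferable.
            contribution = (1.0 - value) if metric in lower_is_better else value
            scores[bid] += weight * contribution
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


candidates = {
    "block-a": {"llm_tokens_per_second": 50.0, "queue_length": 4.0},
    "block-b": {"llm_tokens_per_second": 30.0, "queue_length": 0.0},
}
ranking = rank(
    candidates,
    weights={"llm_tokens_per_second": 0.7, "queue_length": 0.3},
    lower_is_better=frozenset({"queue_length"}),
)
print(ranking)  # [('block-a', 0.7), ('block-b', 0.3)]
```

Here `block-a` wins on speed and `block-b` wins on queue length; because speed carries the larger weight, `block-a` ends up with the higher score.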

#### Available Optimization Goals

The `attributebasedrouter` policy supports several pre-defined optimization goals, each focusing on different aspects of performance and cost:

-   **`Fast_Generator`**: Prioritizes blocks with the highest token generation speed (`llm_tokens_per_second`).
-   **`Free_Block`**: Looks for blocks that are the least busy, preferring those with fewer active sessions (`llm_active_sessions`) and shorter processing queues (`queue_length`).
-   **`Cost_Saver`**: Aims for efficiency by selecting blocks with a lower parameter count (`parameterCountB`) and less total GPU memory (`gpu_totalMem`).
-   **`High_Quality_Lane`**: Focuses on overall performance, balancing low latency (`end_to_end_latency`), high throughput (`end_to_end_fps`), and strong benchmark scores (`MMLU`).
-   **`Active_Block`**: The opposite of `Free_Block`. It selects the busiest blocks, useful for scenarios where you want to consolidate workloads on already active instances.
-   **`Balanced_Performer`**: Provides a middle ground, balancing low latency and a smaller model size.

You can combine these goals with different weights to create a custom ranking strategy tailored to your specific needs.
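
To see what a weighted combination of goals means at the metric level, the goal descriptions above can be written down as data and expanded. The sketch below is hypothetical: the metric pairs are transcribed from the list, the entry for `Balanced_Performer` is inferred from its description (latency and parameter count), and splitting a goal's weight evenly across its metrics is an assumed convention rather than documented policy behaviour.

```python
from typing import Any, Dict, List, Tuple

# (metric, higher_is_better) pairs transcribed from the goal descriptions.
GOAL_METRICS: Dict[str, List[Tuple[str, bool]]] = {
    "Fast_Generator": [("llm_tokens_per_second", True)],
    "Free_Block": [("llm_active_sessions", False), ("queue_length", False)],
    "Cost_Saver": [("parameterCountB", False), ("gpu_totalMem", False)],
    "High_Quality_Lane": [
        ("end_to_end_latency", False),
        ("end_to_end_fps", True),
        ("MMLU", True),
    ],
    "Active_Block": [("llm_active_sessions", True), ("queue_length", True)],
    # Inferred from "low latency and a smaller model size".
    "Balanced_Performer": [("end_to_end_latency", False), ("parameterCountB", False)],
}


def expand_goals(goals: List[Dict[str, Any]]) -> Dict[Tuple[str, bool], float]:
    """Spread each goal's weight evenly over its metrics (an assumed convention)."""
    expanded: Dict[Tuple[str, bool], float] = {}
    for goal in goals:
        metrics = GOAL_METRICS[goal["name"]]
        share = float(goal.get("weight", 1.0)) / len(metrics)
        for key in metrics:
            expanded[key] = expanded.get(key, 0.0) + share
    return expanded


per_metric = expand_goals([
    {"name": "Fast_Generator", "weight": 0.7},
    {"name": "Free_Block", "weight": 0.3},
])
for (metric, higher_is_better), weight in per_metric.items():
    direction = "higher is better" if higher_is_better else "lower is better"
    print(f"{metric}: {weight:.2f} ({direction})")
```

With these assumptions, a 70/30 mix of `Fast_Generator` and `Free_Block` puts 0.70 on token speed and 0.15 each on active sessions and queue length.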

## 5. Running the Inference Request

Executing the `curl` command below will send the request to the Router, which will then perform the filtering and selection process based on our rules and return the most suitable block.

In [None]:
!curl -X POST http://MANAGEMENTMASTER:30501/api/search -H "Content-Type: application/json" -d @payload.json
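
The same request can be issued from Python. The sketch below uses only the standard library and builds the request without sending it, so it can be inspected first; the helper name `build_search_request` and the placeholder payload are hypothetical, and `MANAGEMENTMASTER` must be replaced with a reachable host before sending.

```python
import json
import urllib.request


def build_search_request(payload: dict, base_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder payload; in practice, load the contents of payload.json instead.
request = build_search_request(
    {"example": "replace with the contents of payload.json"},
    "http://MANAGEMENTMASTER:30501",
)
print(request.get_method(), request.full_url)

# To actually send it against a reachable Router:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status, response.read().decode("utf-8"))
```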

## 6. Observing in Logs

After running the request, you can check the logs of the **Parser** (Router) service. You will see:
- The final ranked list of blocks and the one that was selected. These logs can be viewed in the `Management Server` at http://MANAGEMENTMASTER:32199/explore, under the label `container` with the value `executor-executor-001`.

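## 7. LLM Based Matchers

Instead of a weighted score, a policy can delegate the choice to an LLM. The `eval` method below builds a prompt from the `query_string` parameter and a summary of each candidate, sends it to the LLM block identified by `llm_block_id` and `llm_server_address`, and matches the `containerImage` in the reply against the candidate list.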

In [None]:
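import json
import logging
import re
from typing import Dict

logger = logging.getLogger(__name__)

# Excerpt from the policy class: `eval` is a method, and the `self._...` helpers
# it calls are defined elsewhere in that class.
def eval(self, parameters: Dict, input_data: Dict, context: Dict) -> Dict: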
        """
        Main evaluation function that processes input data and returns the selected block.
        """
        try:
            merged_params = {**self.parameters, **parameters}
            
            candidates = input_data.get('candidates', [])
            if not candidates:
                return self._create_error_response("No candidates provided")
            
            query_string = merged_params.get('query_string', '').strip()
            if not query_string:
                return self._create_error_response("Query string is required")

            llm_block_id = merged_params.get('llm_block_id')
            llm_server_address = merged_params.get('llm_server_address')

            if not llm_block_id or not llm_server_address:
                return self._create_error_response("LLM block ID and server address are required")

            # Summarize each candidate so the prompt carries only the relevant fields
            summarized_candidates = [self._summarize_candidate(c) for c in candidates]

            prompt = f"""
            Based on the following user query, select the best block from the candidates provided.
            User Query: "{query_string}"
            
            Candidates:
            {json.dumps(summarized_candidates, indent=2)}
            
            Respond with only the containerImage of the best candidate in a JSON object with the key "containerImage".
            """
        
            llm_response = self._call_llm_inference(prompt, llm_block_id, llm_server_address)
            
            if not llm_response or 'reply' not in llm_response:
                return self._create_error_response("Invalid or empty response from LLM")
            
            reply_text = llm_response['reply'].strip()
            
            selected_container_image = None
            
            # 1. Look for a JSON code block
            json_match = re.search(r"```json\n(.*?)\n```", reply_text, re.DOTALL)
            if json_match:
                try:
                    json_data = json.loads(json_match.group(1))
                    selected_container_image = json_data.get("containerImage")
                except json.JSONDecodeError:
                    logger.warning(f"Found JSON block but failed to decode: {json_match.group(1)}")

            # 2. If no JSON block found, try to parse the whole reply
            if not selected_container_image:
                try:
                    json_reply = json.loads(reply_text)
                    selected_container_image = json_reply.get("containerImage")
                except json.JSONDecodeError:
                    pass # Not a JSON object

            # 3. If still nothing, assume the last line is the container image
            if not selected_container_image:
                selected_container_image = reply_text.splitlines()[-1].strip()


            if not selected_container_image:
                return self._create_error_response(f"Could not extract containerImage from LLM response: {reply_text}")

            # Find the selected block in the candidates
            selected_block = next((c for c in candidates if c.get("containerRegistryInfo", {}).get("containerImage") == selected_container_image), None)

            if not selected_block:
                return self._create_error_response(f"LLM selected a containerImage not in the candidate list: {selected_container_image}")

            return {
                "allowed": True,
                "input_data": {
                    "selected_block": selected_block,
                    "timestamp": self.current_time
                }
            }
            
        except Exception as e:
            logger.error(f"Error during evaluation: {e}", exc_info=True)
            return self._create_error_response(f"Error during evaluation: {str(e)}")
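
The three-step reply parsing in `eval` can be isolated into a small function, which makes it easy to try against sample replies. The version below is a sketch rather than the policy's own code: the names `extract_container_image` and `_image_from_json` are hypothetical, and unlike the method above it returns `None` for an empty reply and ignores JSON replies that are not objects.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built programmatically so the example embeds cleanly
JSON_BLOCK = re.compile(FENCE + r"json\n(.*?)\n" + FENCE, re.DOTALL)


def _image_from_json(text: str) -> Optional[str]:
    """Return the containerImage value if text is a JSON object that has one."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return data.get("containerImage") if isinstance(data, dict) else None


def extract_container_image(reply_text: str) -> Optional[str]:
    reply_text = reply_text.strip()
    if not reply_text:
        return None
    # 1. Look for a fenced JSON block inside the reply.
    match = JSON_BLOCK.search(reply_text)
    if match:
        image = _image_from_json(match.group(1))
        if image:
            return image
    # 2. Otherwise, try to parse the whole reply as JSON.
    image = _image_from_json(reply_text)
    if image:
        return image
    # 3. Fall back to the last line of the reply.
    return reply_text.splitlines()[-1].strip() or None


fenced_reply = "Sure:\n" + FENCE + 'json\n{"containerImage": "repo/model:v1"}\n' + FENCE
print(extract_container_image(fenced_reply))                            # repo/model:v1
print(extract_container_image('{"containerImage": "repo/model:v2"}'))   # repo/model:v2
print(extract_container_image("I pick this one:\nrepo/model:v3"))       # repo/model:v3
```

The last-line fallback is permissive by design, which is why `eval` still checks the extracted value against the candidate list before returning a block.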
