# Tutorial: Building a Workflow From Author's Policies

In this tutorial we explore a workflow built on a block's **pre_processing** and **post_processing** policies, as provided by the authors of the models, with a **generic Large Language Model (LLM)** playing the role of the author's model. We then show how to create custom generic policies that act as tools, invoked according to the LLM's decisions, using the AIOSv1 policies ecosystem.

## Tutorial Overview
1. **Task Explanation**: Automate the screening of candidate resumes against a job description to identify and process qualified applicants.
2. **The Architecture**: Block Policies: Utilize a single, generic LLM block customized at runtime with author's pre-processing and post-processing policies.
3. **Block Allocation JSON**: Define the entire pipeline, including the model and its attached policies, within a single JSON configuration file.
4. **Policies and Inference Code Overview**: Review the Python code for the resume-parsing policy, the tools-orchestrating policy, and the model inference script.
5. **Running the Pipeline**: Making an Inference Request: Trigger the entire automated recruitment workflow with a single API request to the deployed block's endpoint.
6. **Conclusion**: Understand how policies enable the creation of modular, scalable, and complex real-world workflows on the AIOS platform.

## 1. Task

Our goal is to automate the initial screening of resumes and send follow-up email communication to the candidates. The pipeline should:
1.  Accept a `.zip` file containing multiple resumes in PDF format.
2.  Extract the text from each resume.
3.  Use an LLM to analyze the resumes against a job description and decide on next steps (e.g., background check, send communication email).
4.  Execute the resulting tool calls automatically.


## 2. The Architecture: Block Policies

### What are Policies in AIOSv1?
A policy is dynamically loadable, executable Python code that is used in various places and use cases across the AIOS system. Because policies are loaded dynamically, they let developers implement custom functionality throughout the system.

> 📖 **Further Reading**: [AIOSv1 Policies System Overview](https://github.com/OpenCyberspace/OpenOS.AI-Documentation/blob/main/policies-system/policies-system.md)

The pipeline is designed to be highly modular. We use a central, generic LLM block, with the author's runtime policies attached as `pre_processing` and `post_processing` policies.

A key concept in this architecture is that tool calling is implemented as policies that are **stateless**, or "fire and die." As detailed in the `policies-system.md` documentation, policies of this type (Type 3) are loaded for a single execution to perform a specific task and are then terminated. They do not maintain any memory or state between requests, which makes the system robust, scalable, and easy to debug.

**Blocks Flow Diagram:** 
```
[Input: zip file of resumes] -> [Block Level Preprocessing Policy] -> [Generic LLM Model] -> [Block Level Postprocessing Policy] -> [Automated Actions]
```

*   **`Preprocessing Policy`**: Its only job is to handle the input data. In this case, it unzips the file, finds all PDFs, and extracts their text content. It then passes this data as `supplemental_data` to the next block.
*   **`Generic LLM Model`**: This is standard llama-cpp-python model code, designed to be unaware of the specific task. It receives a prompt and, optionally, `supplemental_data`, and generates a response based on those inputs.
*   **`Postprocessing Policy`**: This policy takes the raw output from the LLM, parses it, and takes action. For our use case, it's responsible for parsing the JSON and executing the requested `tool_calls`.
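
The flow above can be pictured as plain function composition. The sketch below is only an illustration under stated assumptions: `run_block`, `toy_pre`, `fake_model`, and `toy_post` are hypothetical names, and the shape of the tool call is made up for the example. In AIOS the platform itself loads and chains the policies referenced in the allocation JSON.

```python
import json
from typing import Any, Callable, Dict

Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_block(request: Dict[str, Any], pre: Stage, model: Stage, post: Stage) -> Dict[str, Any]:
    """Hypothetical stand-in for how a block chains its three stages."""
    # 1. The pre-processing policy turns the raw input into supplemental data.
    supplemental = pre(request)
    # 2. The generic model sees the original message plus that data.
    model_output = model({
        "message": request.get("message", ""),
        "supplemental_data": supplemental,
    })
    # 3. The post-processing policy interprets the raw model output.
    return post(model_output)


# Toy stages so the sketch runs without a model or real resumes.
def toy_pre(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"candidates": [{"file_name": "a.pdf", "content": "10 years of OpenCV"}]}


def fake_model(data: Dict[str, Any]) -> Dict[str, Any]:
    count = len(data["supplemental_data"]["candidates"])
    plan = {"tool_calls": [{"name": "background-check", "count": count}]}
    return {"response": json.dumps(plan)}


def toy_post(data: Dict[str, Any]) -> Dict[str, Any]:
    return {"executed_tool_calls": json.loads(data["response"])["tool_calls"]}


result = run_block({"message": "Screen these resumes", "file": b""},
                   toy_pre, fake_model, toy_post)
print(result)
```

Because each stage only receives and returns a dictionary, any one of them can be swapped without touching the other two, which is the property the rest of this tutorial relies on.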

If you are interested in custom policies for author's models, refer to the tutorial below:
- [Tutorial: Simple vDAG for workflow Automation with Swappable policies](http://CLUSTER2NODE1:9999/notebooks/recruitment_automation_vdag.ipynb)

## 3. Block Allocation JSON `allocation-llama4scout_recruiter.json`

This JSON file is the master configuration for our `llama4-scout` block. It defines everything from the model to be used, to the initial system prompt, and the policies that govern its behavior. It's the central piece that orchestrates the entire pipeline. Let's look at the key parts.

In [None]:

{
    "body": {
        "spec": {
            "values": {
                "blockComponentURI": "model.llama4-scout-17b:1.0.0-stable",
                "blockInitData": {
                    "model_name": "Llama-4-Scout-17B-16E-Instruct-UD-Q8_K_XL/...",
                    "system_message": "You are an expert recruitment assistant. Your task is to analyze the provided resume context and create a JSON-based execution plan...Your output MUST be a valid JSON object with a single key, \"tool_calls\"..."
                },
                "policyRulesSpec": [
                    {
                        "values": {
                            "name": "pre_processing",
                            "policyRuleURI": "recruitment_preprocessing:1.0.0-stable"
                        }
                    },                    
                    {
                        "values": {
                            "name": "post_processing",
                            "policyRuleURI": "recruitment_postprocessing:1.0.0-stable"
                        }
                    }
                ]
            }
        }
    }
}
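
Before submitting an allocation, it can help to confirm that both policy slots are actually wired up. The following is a minimal sketch, not part of the AIOS tooling: `attached_policies` is a hypothetical helper, and the inline JSON reproduces only the `policyRulesSpec` portion of the excerpt above.

```python
import json

# Trimmed copy of the excerpt above; only the fields the check needs.
allocation = json.loads("""
{
  "body": {"spec": {"values": {
    "blockComponentURI": "model.llama4-scout-17b:1.0.0-stable",
    "policyRulesSpec": [
      {"values": {"name": "pre_processing",
                  "policyRuleURI": "recruitment_preprocessing:1.0.0-stable"}},
      {"values": {"name": "post_processing",
                  "policyRuleURI": "recruitment_postprocessing:1.0.0-stable"}}
    ]
  }}}
}
""")


def attached_policies(spec: dict) -> dict:
    """Map each policy slot name to its policy rule URI (hypothetical helper)."""
    values = spec["body"]["spec"]["values"]
    return {
        rule["values"]["name"]: rule["values"]["policyRuleURI"]
        for rule in values.get("policyRulesSpec", [])
    }


policies = attached_policies(allocation)
# Slots this tutorial expects; anything left in `missing` is not configured.
missing = {"pre_processing", "post_processing"} - policies.keys()
print(policies)
print("missing slots:", missing)
```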


In [5]:
!cat allocation-llama4scout_recruiter_block.json

{
    "head": {
        "templateUri": "Parser/V1",
        "parameters": {}
    },
    "body": {
        "spec": {
            "values": {
                "mode": "allocate",
                "blockId": "llama4-scout-17b-block",
                "blockComponentURI": "model.llama4-scout-17b:1.0.0-stable",
                "minInstances": 1,
                "maxInstances": 3,
                "blockInitData": {
                    "model_name": "Llama-4-Scout-17B-16E-Instruct-UD-Q8_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q8_K_XL-00001-of-00003.gguf",
                    "system_message": "You are an expert recruitment assistant. Your task is to analyze the provided resume context and create a JSON-based execution plan for a multi-stage recruitment pipeline. Your output MUST be a valid JSON object with a single key, \"tool_calls\". This key must contain a list of jobs to be executed in sequence.\n\nIMPORTANT: Group ALL candidates into a SINGLE background check job. Do not create separate back

## 4. Policies and Inference Code Overview

### Preprocessing Policy: Getting the Data Ready

This policy handles the initial data extraction. It's a simple Python function that uses standard libraries to process the file.

**File:** `policies/recruitment_automation/preprocessing_policy/code/function.py`

In [None]:
import io
import zipfile

import fitz  # PyMuPDF

def eval(data):
    # Assumes 'file' is a key in the input data, containing the zip file bytes
    zip_file_bytes = data.get('file')
    if not zip_file_bytes:
        return {"error": "No file provided"}

    extracted_texts = []
    with io.BytesIO(zip_file_bytes) as zip_stream:
        with zipfile.ZipFile(zip_stream, 'r') as zip_ref:
            for file_name in zip_ref.namelist():
                if file_name.lower().endswith('.pdf'):
                    with zip_ref.open(file_name) as pdf_file:
                        pdf_bytes = pdf_file.read()
                        with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
                            text = ""
                            for page in doc:
                                text += page.get_text()
                            extracted_texts.append({"file_name": file_name, "content": text})

    # The output of this policy becomes the 'supplemental_data' for the LLM block
    return {"candidates": extracted_texts}
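
One practical caveat: archives created on macOS often contain `__MACOSX/._name.pdf` metadata entries whose names end in `.pdf` but which are not PDF documents, and passing them to the PDF parser would likely fail. The sketch below shows one way to filter member names with the standard library only. `list_pdf_members` is a hypothetical helper, not part of the policy above.

```python
import io
import zipfile


def list_pdf_members(zip_bytes: bytes) -> list:
    """Return PDF member names, skipping directories and macOS metadata entries."""
    names = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            base = info.filename.rsplit("/", 1)[-1]
            # Resource-fork entries look like PDFs by name but are not.
            if info.filename.startswith("__MACOSX/") or base.startswith("._"):
                continue
            if base.lower().endswith(".pdf"):
                names.append(info.filename)
    return names


# Build a small archive in memory to exercise the filter.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("resumes/alice.PDF", b"%PDF-1.4 placeholder")
    zf.writestr("__MACOSX/resumes/._alice.PDF", b"metadata")
    zf.writestr("notes.txt", b"not a resume")

pdf_names = list_pdf_members(buf.getvalue())
print(pdf_names)
```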

### Postprocessing Policy: Taking Action

This policy is responsible for interpreting the LLM's response. A key lesson learned was that LLMs don't always produce perfect JSON. They might wrap it in markdown or add extra text. Therefore, we need a robust way to extract the JSON.

**File:** `policies/recruitment_automation/postprocessing_policy/code/function.py`

In [1]:
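import json
import logging
import re
import time
from typing import Any, Dict

import requests

logger = logging.getLogger(__name__)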

def extract_json_from_response(response_text):
    # Use regex to find JSON wrapped in markdown-style code blocks
    match = re.search(r"```json\n(.*?)\n```", response_text, re.DOTALL)
    if match:
        json_str = match.group(1)
    else:
        # Fallback for cases where there's no markdown wrapping
        json_str = response_text
    
    try:
        return json.loads(json_str)
    except json.JSONDecodeError:
        # Handle cases where the extracted string is still not valid JSON
        return {"error": "Failed to decode JSON from LLM response", "response": response_text}
        
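# Excerpt: this function takes `self`, so it is a method of a class in the full source.
# `self.aios_url`, `self.job_timeout`, `self.job_poll_interval`, and the `_snippet`
# logging helper are not shown here and are assumed to be defined there.
def submit_and_monitor_job(self, job_name: str, policy_rule_uri: str, params: Dict[str, Any]) -> Dict[str, Any]: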
    """
    Submits a job to an AIOS executor and monitors it until completion.
    """
    # 1. Submit the job
    
    submit_endpoint = f"{self.aios_url}/jobs/submit/executor-001"
    
    submit_payload = {
        "name": job_name,
        "policy_rule_uri": policy_rule_uri,
        "inputs": params,
        "policy_rule_parameters": {},
        "node_selector": {}
    }
    
    logger.info(f"Submitting job '{job_name}' for policy '{policy_rule_uri}' with params: {_snippet(params)}")
    
    try:
        headers = {'Content-Type': 'application/json'}
        response = requests.post(submit_endpoint, json=submit_payload, headers=headers, timeout=30)
        response.raise_for_status()
        job_info = response.json()
        
        job_id = job_info.get("job_id")
        if not job_id:
            raise ValueError(f"Job submission did not return a job_id. Response: {_snippet(job_info)}")
        logger.info(f"Job '{job_id}' submitted for policy '{policy_rule_uri}'.")
    except requests.exceptions.HTTPError as e:
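        # A requests.Response is falsy for 4xx/5xx statuses, so compare against None explicitly
        if e.response is not None and 500 <= e.response.status_code < 600: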
            logger.error(f"Server error when submitting job. Status: {e.response.status_code}. Response: {e.response.text}")
        raise
    except requests.exceptions.RequestException as e:
        logger.error(f"Failed to submit job for policy '{policy_rule_uri}': {e}")
        raise

    # 2. Poll for job completion
    status_endpoint = f"{self.aios_url}/jobs/{job_id}"
    start_time = time.monotonic()
    
    while True:
        if time.monotonic() - start_time > self.job_timeout:
            raise TimeoutError(f"Job '{job_id}' timed out after {self.job_timeout} seconds.")

        try:
            response = requests.get(status_endpoint, timeout=10)
            if response.status_code == 200:
                resp_json = response.json()
                
                # Check for success at the top level
                if resp_json.get("success"):
                    data = resp_json.get("data", {})
                    status = data.get("job_status")
                    
                    if status == "completed":
                        logger.info(f"Job '{job_id}' completed successfully. Retrieving result from final poll.")
                        result_data = data.get("job_output_data")
                        if result_data is None:
                            logger.warning(f"Job result for '{job_id}' did not contain 'job_output_data'. Full response: {_snippet(resp_json)}")
                            return resp_json # Fallback to the full body
                        
                        logger.info(f"Result for job '{job_id}': {_snippet(result_data)}")
                        return result_data
                    elif status == "failed":
                        # The detailed result might be in 'job_output_data'
                        details = data.get("job_output_data", "No details provided.")
                        raise RuntimeError(f"Job '{job_id}' failed. Details: {details}")
                    elif status in ["queued", "running"]:
                        logger.debug(f"Job '{job_id}' is '{status}'. Polling again.")
                        time.sleep(self.job_poll_interval)
                    else:
                        logger.warning(f"Job '{job_id}' has unknown status: '{status}'. Retrying...")
                        time.sleep(self.job_poll_interval)
                else:
                    # Handle cases where 'success' is false or missing
                    error_message = resp_json.get("message", "Unknown error during polling.")
                    logger.warning(f"Polling for job '{job_id}' was not successful: {error_message}. Retrying...")
                    time.sleep(self.job_poll_interval)
            else:
                logger.warning(f"Polling for job '{job_id}' returned status {response.status_code}. Retrying...")
                time.sleep(self.job_poll_interval)
                        
        except requests.exceptions.RequestException as e:
            logger.warning(f"Error polling job status for '{job_id}': {e}. Retrying...")
            time.sleep(self.job_poll_interval)
                
def eval(data):
    # 'data' is the raw output from the LLM block
    llm_response_text = data.get('response', '')
    
    # Attempt to parse the response directly
    try:
        parsed_json = json.loads(llm_response_text)
    except json.JSONDecodeError:
        # If direct parsing fails, use our robust extraction function
        parsed_json = extract_json_from_response(llm_response_text)

    if 'error' in parsed_json:
        return parsed_json

    # The core logic: check for the 'tool_calls' key
    if 'tool_calls' not in parsed_json:
        return {"error": "'tool_calls' key not found in the LLM response.", "response_data": parsed_json}

    # In a real system, you would execute the tool calls here by calling submit_and_monitor_job
    return {"executed_tool_calls": parsed_json['tool_calls']}
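
The regex in `extract_json_from_response` only matches a fence written exactly as ```` ```json ```` followed by a newline. As an alternative, here is a minimal sketch that scans for the first decodable JSON object wherever it appears in the reply, using `json.JSONDecoder.raw_decode` from the standard library. `first_json_object` is a hypothetical helper, and the sample reply is invented for illustration.

```python
import json
from typing import Any, Optional


def first_json_object(text: str) -> Optional[Any]:
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever trailing text follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


reply = (
    "Sure, here is the plan:\n"
    "```json\n"
    '{"tool_calls": [{"name": "send-email"}]}\n'
    "```\n"
    "Let me know if you need changes."
)
plan = first_json_object(reply)
print(plan)
```

This approach tolerates leading chatter, trailing text, and missing or differently labelled fences, at the cost of accepting the first object it finds even if the model emitted several.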

### Generic Tool Policies: The "Fire and Die" Policies

It's important to note that the `postprocessing_policy` itself doesn't contain the logic for every possible action. Instead, it acts as a dispatcher. When the LLM requests a `background-check` or a `send-email`, the postprocessing policy calls *other* generic, stateless policies to do the actual work.

These tool policies, such as `background_check_policy` and `send_email_policy`, are loaded, executed with the inputs provided by the `postprocessing_policy`, and then terminated. This keeps the entire system modular, as new tools can be added without changing the core pipeline, and each tool is a self-contained, stateless unit.
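
The dispatcher role described above might look like the sketch below. Everything in it is an assumption made for illustration: the `name`/`params` shape of a tool call, the `TOOL_POLICIES` table and its placeholder URIs, and the `fake_submit` stand-in that replaces `submit_and_monitor_job` so the example runs offline.

```python
from typing import Any, Callable, Dict, List

# Hypothetical mapping from tool name to policy rule URI (placeholder URIs).
TOOL_POLICIES = {
    "background-check": "background_check_policy:1.0.0-stable",
    "send-email": "send_email_policy:1.0.0-stable",
}

Submit = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def dispatch_tool_calls(tool_calls: List[Dict[str, Any]], submit: Submit) -> List[Dict[str, Any]]:
    """Run each requested tool through its policy, in the order given."""
    results = []
    for call in tool_calls:
        name = call.get("name")
        uri = TOOL_POLICIES.get(name)
        if uri is None:
            # Unknown tools are reported rather than raising, so one bad
            # entry does not abort the rest of the plan.
            results.append({"tool": name, "status": "skipped", "reason": "unknown tool"})
            continue
        output = submit(f"{name}-job", uri, call.get("params", {}))
        results.append({"tool": name, "status": "done", "output": output})
    return results


def fake_submit(job_name: str, policy_rule_uri: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Offline stand-in for submit_and_monitor_job."""
    return {"job": job_name, "policy": policy_rule_uri, "received": params}


results = dispatch_tool_calls(
    [
        {"name": "background-check", "params": {"candidates": ["a.pdf"]}},
        {"name": "fax-offer", "params": {}},
    ],
    fake_submit,
)
print(results)
```

Adding a new tool then means registering one more entry in the table and deploying its policy; the dispatcher itself does not change.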

## Making the LLM Context-Aware

We modify the `on_data` method in the LLM block's code to explicitly format the supplemental data and prepend it to the user's prompt. This ensures the LLM has the necessary context to make an informed decision.

**File:** Refer to `main.py` from [02_Part2_onboard_custom_llama_cpp](https://github.com/OpenCyberspace/AIOS_AI_Blueprints/blob/main/video_tutorial_series/02_Part2_onboard_custom_llama_cpp/02-Model-Integration-Setup.ipynb)

In [None]:
# This is a simplified representation of the key logic in main_more_context.py

class LlamaCppBlock:
    # ... (other class methods)

    def on_data(self, input_data):
        message = input_data.get("message", "")
        supplemental_context = ""

        # Check for our specific supplemental data from the preprocessing policy
        if "supplemental_data" in input_data and "candidates" in input_data["supplemental_data"]:
            candidates = input_data["supplemental_data"]["candidates"]
            
            # **THE CRITICAL FIX**: Format the resume data into a clear context string
            context_parts = ["\n\n--- START OF SUPPLEMENTAL DATA ---"]
            for i, candidate in enumerate(candidates):
                context_parts.append(f"\n--- Resume {i+1}: {candidate.get('file_name', 'Unknown')} ---")
                context_parts.append(candidate.get('content', 'No content'))
            context_parts.append("\n--- END OF SUPPLEMENTAL DATA ---")
            supplemental_context = "\n".join(context_parts)
            
            # Clean up the input data so it's not processed further
            del input_data["supplemental_data"]

        # Prepend the context to the user's original message
        final_message = f"{supplemental_context}\n\nUSER REQUEST:\n{message}"

        # ... (rest of the logic to send 'final_message' to the LLM)
        # self.llm.create_chat_completion(...)
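
The formatting step can also be pulled out into a pure function, which makes it testable without loading a model. This is a sketch under assumptions: `build_prompt` is a hypothetical name, and unlike the method above it returns the message unchanged when there are no candidates.

```python
from typing import Dict, List


def build_prompt(message: str, candidates: List[Dict[str, str]]) -> str:
    """Prepend formatted resume text to the user's message."""
    if not candidates:
        return message
    parts = ["--- START OF SUPPLEMENTAL DATA ---"]
    for i, candidate in enumerate(candidates, start=1):
        parts.append(f"--- Resume {i}: {candidate.get('file_name', 'Unknown')} ---")
        parts.append(candidate.get("content", "No content"))
    parts.append("--- END OF SUPPLEMENTAL DATA ---")
    context = "\n".join(parts)
    return f"{context}\n\nUSER REQUEST:\n{message}"


prompt = build_prompt(
    "Find computer vision engineers",
    [
        {"file_name": "a.pdf", "content": "PyTorch, OpenCV"},
        {"file_name": "b.pdf", "content": "TensorFlow"},
    ],
)
print(prompt)
```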

## 5. Running the Pipeline: Making an Inference Request

Now that all the components are in place, we can run the entire pipeline by sending an inference request to the `llama4-scout` block. This request includes the job description in the `message` field and the zipped resumes as an attached file.

The block will automatically trigger the `pre_processing` policy to handle the file, pass the extracted text to the LLM, and then use the `post_processing` policy to execute the generated tool calls.

Rather than assembling the request by hand, we use a small Python gRPC client. The following cells will:
1.  Locate the `resume_compressed.zip` file.
2.  Read its content in binary format.
3.  Attach the raw bytes to the request as a `FileInfo` message.
4.  Construct the complete `BlockInferencePacket` request payload for inference.
5.  Perform the request.

In [6]:
!pip install grpcio grpcio-tools protobuf

import sys
sys.path.append('../utils/inference_client')

import grpc
import json
import time

import service_pb2
import service_pb2_grpc



In [7]:
def run_inference(block_id, session_id, seq_no, message, generation_config):
    SERVER_ADDRESS = "CLUSTER1MASTER:31500"
    
    # Connect to the gRPC server
    channel = grpc.insecure_channel(SERVER_ADDRESS)
    stub = service_pb2_grpc.BlockInferenceServiceStub(channel)


    zip_file_path = "resume_compressed.zip"
    with open(zip_file_path, "rb") as f:
        zip_data = f.read()
    
    file_info = service_pb2.FileInfo(
        metadata=json.dumps({"filename": zip_file_path, "size": len(zip_data)}),
        file_data=zip_data
    )

    # Create the BlockInferencePacket request
    request = service_pb2.BlockInferencePacket(
        block_id= block_id,
        session_id=session_id,
        seq_no=seq_no,
        frame_ptr=b"",  # Empty bytes for now
        data=json.dumps({
         "mode": "chat",
         "message": message,
         "gen_params":generation_config
       }),
        query_parameters="",
        ts=1234567890.0,
        files=[file_info],  # Attach the file
        output_ptr=b''
    )

    try:
        st = time.time()
        # Make the gRPC call
        response = stub.infer(request)
        et = time.time()

        print("\n=== Response Received ===")
        print(f"Latency: {et - st}s")
        print(f"Session ID: {response.session_id}")
        print(f"Sequence No: {response.seq_no}")
        print(f"Data: {response.data}")
        print(f"Timestamp: {response.ts}")
        print(f"Output Ptr: {response.output_ptr}")
        print(f"Files Received: {len(response.files)}")

        # Parse JSON response data
        try:
            response_data = json.loads(response.data)
            print(f"Parsed Response: {response_data}")
        except json.JSONDecodeError:
            print("Response data is not a valid JSON string.")

    except grpc.RpcError as e:
        print(f"gRPC Error: {e.code()} - {e.details()}")

In [8]:
generation_config = {
    "temperature": 0.1,
    # "min_p": 0.01,
    # "top_k": 64,
    "top_p": 0.95,
    "max_tokens": 4096 # Set a limit for the response length
}

# Job description and instructions, sent as the user message

recruitment_prompt = """We're hiring for a Senior Computer Vision Engineer. Requirements include:
- 6+ years of hands-on experience in computer vision and deep learning.
- Production-level experience with frameworks like PyTorch or TensorFlow, and libraries like OpenCV.
- Master's degree or Ph.D. in Computer Science or a related field.

Please analyze the provided resumes and create a recruitment plan that:
1. Initiates background checks for qualified candidates
2. Sends appropriate follow-up communications"""

In [10]:
run_inference(
    block_id="llama4-scout-17b-block",
    session_id="session_notebook_7",
    seq_no=1,
    message=recruitment_prompt,
    generation_config=generation_config
)


=== Response Received ===
Latency: 56.35062766075134s
Session ID: session_notebook_7
Sequence No: 1
Data: {
  "message": "Recruitment pipeline processing finished.",
  "background_check_processed": true,
  "processed_tool_calls": 2
}
Timestamp: 1234567890.0
Output Ptr: {"outputs": [{"host": "inference-server.inference-server.svc.cluster.local", "port": 6379, "queue_name": "instance-2__session_notebook_7__1"}]}
Files Received: 1
Parsed Response: {'message': 'Recruitment pipeline processing finished.', 'background_check_processed': True, 'processed_tool_calls': 2}


## 6. Conclusion

With the context-formatting change to `on_data` in place, the pipeline works as expected:
1.  The `pre_processing policy` correctly extracts text from all 5 resumes.
2.  The `generic_llama_cpp` block now correctly formats this text and includes it in the prompt sent to the LLM.
3.  The LLM, now fully aware of the resume contents, generates a valid JSON response containing the required `tool_calls`.
4.  The `post_processing policy` successfully parses this JSON and can proceed with the automated actions.

This modular, policy-based architecture proves to be both powerful and flexible. The generic core remains unchanged, while specific behaviors can be easily defined and swapped out, making the system adaptable to new and varied tasks.

Check out the next tutorial on running these policies as custom swappable policies within a vDAG:
- [Tutorial: Simple vDAG for workflow Automation with Swappable Policies](http://CLUSTER2NODE1:9999/notebooks/07_pre_and_post_processing_metrics_streaming_health/recruitment_automation_vdag.ipynb)