# AIOS Streaming Inference

This notebook sends an inference request to a streaming-enabled AIOS block. Run it only after you have started the WebSocket listener in the `AIOS_Streaming_Tutorial.ipynb` notebook.

In [9]:
import requests
import json
import time

# Configuration - Ensure these match the session in the other notebook
INFERENCE_API = "http://CLUSTER1MASTER:31504/v1/infer"
# BLOCK_ID must match the block used in the other notebook.
# BLOCK_ID = "magistral-small-2506-llama-cpp-block"  # alternative block, currently disabled
BLOCK_ID = "llama4-scout-17b-block"
# IMPORTANT: Copy the SESSION_ID from the output of the other notebook
# and paste it here.
SESSION_ID = "session-730377a0-a94c-48f6-92a2-51b0f9418e51"

print(f"Using Block ID: {BLOCK_ID}")
print(f"Using Session ID: {SESSION_ID}")

Using Block ID: llama4-scout-17b-block
Using Session ID: session-730377a0-a94c-48f6-92a2-51b0f9418e51


### **Action Required**

1.  Run the `AIOS_Streaming_Tutorial.ipynb` notebook up to and including **Step 3**.
2.  Copy the `Generated Session ID` from the output of the configuration cell in that notebook.
3.  Paste the copied Session ID into the `SESSION_ID` variable in the cell above, replacing the existing value (or the `"paste_session_id_here"` placeholder, if present).
4.  Run the cell above to set the configuration.
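
A quick sanity check before sending can catch a forgotten or malformed paste. The sketch below assumes that session IDs follow the `session-<UUID>` shape seen in the output above; that format is inferred from the example value, not from AIOS documentation, and `looks_like_session_id` is a hypothetical helper rather than part of either notebook.

```python
import uuid

def looks_like_session_id(value):
    """Return True if value has the assumed 'session-<UUID>' shape."""
    prefix = "session-"  # assumption: inferred from the example session ID
    if not isinstance(value, str) or not value.startswith(prefix):
        return False
    try:
        # uuid.UUID raises ValueError when the remainder is not a valid UUID string
        uuid.UUID(value[len(prefix):])
    except ValueError:
        return False
    return True

print(looks_like_session_id("session-730377a0-a94c-48f6-92a2-51b0f9418e51"))  # True
print(looks_like_session_id("paste_session_id_here"))  # False
```

If the real format differs, the check should be loosened or dropped; the request function's own placeholder check remains the authoritative guard.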

# Send Inference Request

This step sends the inference request to the model. The streaming response will appear in the output of the `AIOS_Streaming_Tutorial.ipynb` notebook.

## Inference Parameters:
- **model**: The block ID of the target model
- **session_id**: The session ID from the other notebook
- **seq_no**: Sequence number for request ordering
- **data**: Contains the actual request data including message and system prompt
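
The fields listed above can be assembled in a small helper. This is a minimal sketch, not part of the AIOS API: `build_payload` and `_seq_counter` are hypothetical names, and the in-process counter is an assumed alternative to the timestamp-based `seq_no` used in this notebook, since two requests sent within the same second would otherwise share a sequence number. Whether the server requires strictly increasing values is not stated in this notebook.

```python
import itertools
import time

# Hypothetical per-process counter, seeded from the current time so that its
# values start in the same range as the timestamp-based seq_no (an assumption).
_seq_counter = itertools.count(int(time.time()))

def build_payload(block_id, session_id, message,
                  system_message="You are a helpful assistant."):
    """Assemble the request body from the parameters described above."""
    return {
        "model": block_id,
        "session_id": session_id,
        "seq_no": next(_seq_counter),  # increases by one on every call
        "data": {
            "system_message": system_message,
            "mode": "chat",
            "message": message,
            "ts": time.time(),
        },
        "graph": {},
        "selection_query": {},
    }

first = build_payload("llama4-scout-17b-block", "session-example", "Hello")
second = build_payload("llama4-scout-17b-block", "session-example", "Hello again")
print(second["seq_no"] - first["seq_no"])  # 1
```

The counter resets when the kernel restarts, so it only guarantees ordering within a single run.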

In [11]:
def send_inference_request(session_id, message, system_message="You are a helpful assistant."):
    """
    Send an inference request to trigger streaming response.
    
    Args:
        session_id (str): Session identifier
        message (str): User message/prompt
        system_message (str): System prompt for the model
        
    Returns:
        dict: Response from inference API
    """
    if session_id == "paste_session_id_here":
        print("❌ ERROR: Please update the SESSION_ID in the cell above.")
        return None

    payload = {
        "model": BLOCK_ID,
        "session_id": session_id,
        "seq_no": int(time.time()),  # Use timestamp as sequence number
        "data": {
            "system_message": system_message,
            "mode": "chat",
            "message": message,
            "ts": time.time()
        },
        "graph": {},
        "selection_query": {}
    }
    
    try:
        print("🚀 Sending inference request...")
        print(f"📝 Message: {message}")
        
        response = requests.post(
            INFERENCE_API,
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=200
        )
        
        response.raise_for_status()
        result = response.json()
        
        print("✅ Inference request sent successfully!")
        print(f"📄 Response: {json.dumps(result, indent=2)}")
        return result
        
    except requests.exceptions.RequestException as e:
        print(f"❌ Error sending inference request: {e}")
        # e.response is None for connection errors and timeouts
        if e.response is not None:
            print(f"📄 Error details: {e.response.text}")
        return None

# Example inference request
test_message = "Explain the concept of machine learning in simple terms."
inference_result = send_inference_request(SESSION_ID, test_message)

🚀 Sending inference request...
📝 Message: Explain the concept of machine learning in simple terms.
✅ Inference request sent successfully!
📄 Response: {
  "data": {
  },
  "model": "llama4-scout-17b-block",
  "seq_no": 1754477573,
  "session_id": "session-730377a0-a94c-48f6-92a2-51b0f9418e51",
  "ts": 1754477573.651296
}


After running the cell above, return to **Step 3: WebSocket Connection Test (Async)** in `AIOS_Streaming_Tutorial.ipynb` to see the streamer logs. The output in this notebook shows only the final response.