# 08 - Query Deployed Models via API

This notebook demonstrates how to query the deployed models through their REST API endpoints. It shows how to make inference requests to both the stress detection and time-to-failure models running on OpenVINO Model Server.


## Import Required Libraries

Import the libraries needed for making API requests and processing data.


In [1]:
import requests
import json
import subprocess
import base64
import numpy as np
from sklearn.preprocessing import StandardScaler
import joblib
import os

## Get Model Metadata

Query the stress detection model endpoint to retrieve metadata information.


In [2]:
MODEL_NAME = "stress-detection"
BASE_URL = "https://stress-detection-ai-edge-project.apps.cluster-xqdb9.dynamic.redhatworkshops.io"
MODEL_URL = f"{BASE_URL}/v2/models/{MODEL_NAME}"

headers = {
    "Content-Type": "application/json"
}

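response = requests.get(MODEL_URL, headers=headers, verify=True)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
metadata = response.json()
print(json.dumps(metadata, indent=2))  # display the retrieved metadata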
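
To see which tensor name, shape, and datatype the server expects, the metadata response can be condensed into one line per tensor. This is a minimal sketch that assumes the response follows the KServe v2 (Open Inference Protocol) metadata layout with `inputs` and `outputs` lists. `summarize_metadata` and the sample dictionary are hypothetical and are not taken from the deployed model.

```python
def summarize_metadata(metadata):
    """Return one line per input/output tensor described in a v2 metadata dict."""
    lines = []
    for kind in ("inputs", "outputs"):
        # .get() keeps the helper usable if a section is missing from the response
        for tensor in metadata.get(kind, []):
            lines.append(
                f"{kind[:-1]}: name={tensor.get('name')} "
                f"datatype={tensor.get('datatype')} shape={tensor.get('shape')}"
            )
    return lines


# Hypothetical sample shaped like a v2 metadata response (not real server output)
sample_metadata = {
    "name": "stress-detection",
    "inputs": [{"name": "keras_tensor", "datatype": "FP32", "shape": [-1, 9]}],
    "outputs": [{"name": "output_0", "datatype": "FP32", "shape": [-1, 1]}],
}

for line in summarize_metadata(sample_metadata):
    print(line)
```

In the notebook, the same helper could be called on the `metadata` dictionary returned by the request.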


## Load Scaler for Data Normalization

Load the saved scaler to ensure test data is normalized the same way as training data.


In [3]:
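scaler_path = "models/stress_scaler.pkl"
if os.path.exists(scaler_path):
    scaler_real = joblib.load(scaler_path)
else:
    # Make a missing file visible instead of silently skipping the load
    print(f"Scaler file not found at {scaler_path}")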
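
For reference, a fitted `StandardScaler` transforms each feature as `(x - mean) / scale`. The sketch below reproduces that arithmetic in plain Python for the first three features, using the raw values, means, and scales that the stress detection cell sets by hand. `standardize` is a hypothetical helper, not part of scikit-learn.

```python
def standardize(values, means, scales):
    """Apply the z-score formula (x - mean) / scale to each feature."""
    if not (len(values) == len(means) == len(scales)):
        raise ValueError("values, means, and scales must have the same length")
    return [(x - m) / s for x, m, s in zip(values, means, scales)]


# First three features from the stress detection cell in this notebook
raw = [0.15, 97.0, 480.0]
means = [0.8, 99.8, 20.0]
scales = [0.1, 0.1, 5.0]

normalized = standardize(raw, means, scales)
print([round(z, 2) for z in normalized])
```

Each result is the distance from the mean in units of the scale, so large magnitudes indicate inputs far outside the range the parameters describe.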

## Query Stress Detection Model

Prepare test data representing stressed battery conditions, normalize it with a StandardScaler whose mean and scale are set manually in the cell, and send an inference request to the deployed stress detection model. The model returns a binary prediction (0=NORMAL, 1=STRESSED).


In [4]:
url = "https://stress-detection-ai-edge-project.apps.cluster-xqdb9.dynamic.redhatworkshops.io/v2/models/stress-detection/infer"
headers = {
    "Content-Type": "application/json"
}
raw_data = np.array([[0.15, 97.00, 480.00, 355.00, 150.00, 50.00, 55.00, 22.00, 100.00]])
scaler = StandardScaler()
scaler.mean_ = np.array([0.8, 99.8, 20.0, 46.0, 15.0, 10.0, 25.0, 10.0, 100.0])
scaler.scale_ = np.array([0.1, 0.1, 5.0, 2.0, 5.0, 5.0, 0.5, 10.0, 0.1])
normalized_data = scaler.transform(raw_data)
payload = {
    "inputs": [
        {
            "name": "keras_tensor",
            "shape": list(normalized_data.shape),
            "datatype": "FP32",
            "data": normalized_data.flatten().tolist()
        }
    ]
}
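response = requests.post(url, headers=headers, json=payload, verify=True)
response.raise_for_status()  # surface HTTP errors before reading the body
result = response.json()
prediction_value = result["outputs"][0]["data"][0]
# Threshold at 0.5 so the check also holds if the model returns a probability rather than an exact 0 or 1
print("Stress Prediction:", "STRESSED" if prediction_value >= 0.5 else "NORMAL")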

Stress Prediction: STRESSED
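
The request body used above can be factored into a small helper so that both models share the same construction. This is a sketch under the assumption that each model takes a single 2-D FP32 input tensor. `build_infer_payload` is a hypothetical name, and the row values are illustrative only.

```python
def build_infer_payload(input_name, rows):
    """Build a v2 inference request body from a list of equal-length feature rows."""
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be a non-empty list of equal-length lists")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                # Tensor data is sent flattened in row-major order
                "data": [float(v) for row in rows for v in row],
            }
        ]
    }


# Illustrative call with one row of three already-normalized features
payload = build_infer_payload("keras_tensor", [[-6.5, -28.0, 92.0]])
print(payload["inputs"][0]["shape"])
```

With a NumPy array, `normalized_data.tolist()` would supply the `rows` argument.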


## Query Time-to-Failure Model

Prepare test data for time-to-failure (TTF) prediction, normalize it with a StandardScaler whose mean and scale are set manually in the cell, and send an inference request to the deployed TTF model. The model returns the predicted number of hours before battery failure.


In [5]:
MODEL_NAME = "time-to-failure"
BASE_URL = "https://time-to-failure-ai-edge-project.apps.cluster-xqdb9.dynamic.redhatworkshops.io"
url = f"{BASE_URL}/v2/models/{MODEL_NAME}/infer"
headers = {
    "Content-Type": "application/json"
}
raw_data = np.array([[45.0, 500.0, 370.0, 0.4, 98.0]])
scaler_ttf = StandardScaler()
scaler_ttf.mean_ = np.array([25.0, 20.0, 45.0, 0.8, 99.8])
scaler_ttf.scale_ = np.array([0.5, 10.0, 2.0, 0.2, 0.1])
normalized_data = scaler_ttf.transform(raw_data)
payload = {
    "inputs": [
        {
            "name": "keras_tensor",
            "shape": list(normalized_data.shape),
            "datatype": "FP32",
            "data": normalized_data.flatten().tolist()
        }
    ]
}
response = requests.post(url, headers=headers, json=payload, verify=True)
response.raise_for_status()  # surface HTTP errors before reading the body
result = response.json()
output_value = result["outputs"][0]["data"][0]
print(f"Predicted Time Before Failure: {output_value:.2f} hours")

Predicted Time Before Failure: 19.97 hours
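
Both query cells index directly into `result["outputs"]`, which raises a `KeyError` if the server returns an error body instead of a prediction. A more defensive extraction might look like the sketch below, assuming error responses carry an `error` field as described by the v2 inference protocol. `first_output_value` and the sample dictionary are hypothetical.

```python
def first_output_value(result):
    """Return the first scalar of the first output tensor, or raise a readable error."""
    if "error" in result:
        raise RuntimeError(f"Inference failed: {result['error']}")
    outputs = result.get("outputs") or []
    if not outputs or not outputs[0].get("data"):
        raise ValueError("Response contains no output data")
    return outputs[0]["data"][0]


# Hypothetical response body for illustration (not real server output)
ok_result = {
    "outputs": [
        {"name": "output_0", "datatype": "FP32", "shape": [1, 1], "data": [19.97]}
    ]
}
print(f"Predicted Time Before Failure: {first_output_value(ok_result):.2f} hours")
```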
