# Query a Deployed Model on Red Hat OpenShift AI

This notebook demonstrates how to connect to and query a large language model that has already been deployed using the Single-Model Serving feature in a Red Hat OpenShift AI data science project. 

We will use the `langchain-openai` library to create a client that can communicate with any OpenAI-compatible API endpoint, which is what the vLLM runtime provides.

## 1. Install Required Libraries

First, we need to install the necessary Python libraries. `langchain-openai` provides the tools to interact with the model endpoint, and `httpx` is the underlying HTTP client used to make the requests.

In [1]:
%pip install langchain-openai httpx

Collecting langchain-openai
  Downloading langchain_openai-1.0.2-py3-none-any.whl.metadata (1.8 kB)
Collecting langchain-core<2.0.0,>=1.0.2 (from langchain-openai)
  Downloading langchain_core-1.0.5-py3-none-any.whl.metadata (3.6 kB)
Collecting openai<3.0.0,>=1.109.1 (from langchain-openai)
  Downloading openai-2.8.0-py3-none-any.whl.metadata (29 kB)
Collecting tiktoken<1.0.0,>=0.7.0 (from langchain-openai)
  Downloading tiktoken-0.12.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (6.7 kB)
Collecting jsonpatch<2.0.0,>=1.33.0 (from langchain-core<2.0.0,>=1.0.2->langchain-openai)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langsmith<1.0.0,>=0.3.45 (from langchain-core<2.0.0,>=1.0.2->langchain-openai)
  Downloading langsmith-0.4.42-py3-none-any.whl.metadata (14 kB)
Collecting pydantic<3.0.0,>=2.7.4 (from langchain-core<2.0.0,>=1.0.2->langchain-openai)
  Downloading pydantic-2.12.4-py3-none-any.whl.metadata (89 kB)
Collecting tenacity!=8.4.0,<10.0.0,>=8
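
Once the install finishes, you can confirm that both packages are visible to the notebook kernel. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is an illustration and not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a mapping of package name to installed version (None if missing)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


for name, ver in installed_versions(["langchain-openai", "httpx"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

If either package is reported as not installed, restarting the kernel after running `%pip install` is usually worth trying first.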

## 2. Configure Model Connection

Next, we need to configure the connection to our deployed Granite model. You must replace the placeholder values below with the specific details from your model's deployment page in OpenShift AI.

**Action Required:**
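1.  **`BASE_URL`**: Replace the placeholder with the **Inference endpoint** URL from your model's details page. The OpenAI client sends chat requests to `<BASE_URL>/chat/completions`, and vLLM serves its OpenAI-compatible API under `/v1`, so you may need to append `/v1` if your endpoint URL does not already end with it.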
2.  **`API_KEY`**: Replace the placeholder with the **Authentication Token** from the 'Authentication' section of the model's details page.
3.  **`MODEL_NAME`**: This should be the name you gave your deployment (e.g., `granite`).
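
As an alternative to hard-coding these three values in the notebook, you could read them from environment variables so the token never ends up in a saved file. This is a sketch under stated assumptions: the variable names `MODEL_BASE_URL`, `MODEL_API_KEY`, and `MODEL_NAME` are hypothetical, and the `/v1` handling assumes your endpoint exposes the OpenAI-compatible API under that prefix.

```python
import os


def ensure_v1_suffix(url):
    """Append /v1 unless the URL already ends with it."""
    url = url.rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"


def load_settings(env=os.environ):
    """Read connection settings from environment variables (hypothetical names)."""
    missing = [key for key in ("MODEL_BASE_URL", "MODEL_API_KEY") if not env.get(key)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {
        # Fall back to an example deployment name if MODEL_NAME is unset.
        "model": env.get("MODEL_NAME", "granite"),
        "base_url": ensure_v1_suffix(env["MODEL_BASE_URL"]),
        "api_key": env["MODEL_API_KEY"],
    }
```

The returned dictionary uses the same keyword names that `ChatOpenAI` accepts, so `ChatOpenAI(**load_settings())` should work as a drop-in replacement for the hard-coded constants.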

In [None]:
import httpx
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# --- ❗ACTION REQUIRED: REPLACE THESE VALUES ❗---
MODEL_NAME = "granite-40-h-1b" # Or the specific name of your model deployment
BASE_URL = "insert-here" # Replace with your model's Inference endpoint
API_KEY = "insert-here" # Replace with your model's Authentication Token
# -----------------------------------------------------

# Optional: If your cluster uses self-signed certificates, you may need to disable SSL verification.
# Note: This is not recommended for production environments.
# http_client = httpx.Client(verify=False)

try:
    # Initialize the ChatOpenAI client
    llm = ChatOpenAI(
        model=MODEL_NAME,
        api_key=API_KEY,
        base_url=BASE_URL,
        # Uncomment the line below if you need to disable SSL verification
        # http_client=http_client,
    )

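    # Note: creating the client does not contact the endpoint; connection or
    # authentication problems only surface on the first request.
    print("Configuration successful. Client is ready.")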

except Exception as e:
    print(f"An error occurred during client initialization: {e}")

Configuration successful. Client is ready.
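
If your cluster uses a self-signed or private certificate authority, a safer option than disabling verification is to trust that CA explicitly. The sketch below builds a verifying SSL context with the standard library; the helper name `build_ssl_context` is hypothetical, and you would need to supply the path to your own cluster's CA file.

```python
import ssl


def build_ssl_context(ca_bundle_path=None):
    """Create a verifying SSL context.

    With no path, Python loads the system's default trusted CAs.
    With a path, only the certificates in that file are trusted.
    """
    return ssl.create_default_context(cafile=ca_bundle_path)


# Pass the path to your cluster's CA file here instead of using the defaults.
ctx = build_ssl_context()
print("Certificate verification enabled:", ctx.verify_mode == ssl.CERT_REQUIRED)
```

The resulting context can be handed to httpx with `http_client = httpx.Client(verify=ctx)`, and that client can then be passed to `ChatOpenAI` through the `http_client` argument shown in the cell above.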


## 3. Send a Request to the Model

Now that the client is configured, we can send a request to the model. We construct a list of messages, including a `SystemMessage` to set the model's behavior and a `HumanMessage` with our question. Then, we use the `llm.invoke()` method to get a response.

In [36]:
try:
    # Prepare the messages for the model
    messages = [
        SystemMessage(content="You are a helpful assistant who provides concise answers."),
        HumanMessage(content="What is OpenShift AI?"),
    ]

    # Invoke the model and get the response
    print("Sending request to the Granite model...")
    ai_msg = llm.invoke(messages)

    # Print the content of the response
    print("\nResponse from Granite Model:")
    print(ai_msg.content)

except Exception as e:
    print(f"An error occurred: {e}")

Sending request to the Granite model...

Response from Granite Model:
OpenShift AI (originally known as KubeAI) is a platform designed by Red Hat to simplify the deployment of AI workloads on Kubernetes. It allows developers and data scientists to quickly set up and manage a complete AI stack on a Kubernetes cluster, covering everything from model training to deployment.
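
For longer answers, it can be more pleasant to print the reply as it is generated rather than waiting for the full response. The following is a minimal sketch that relies on the `stream()` method of LangChain chat models; the helper name `stream_answer` is hypothetical, and it assumes each chunk's `content` is a plain string, which is the usual case for text-only chat completions.

```python
def stream_answer(llm, messages):
    """Print response chunks as they arrive and return the full text."""
    parts = []
    for chunk in llm.stream(messages):
        # Each chunk carries a piece of the reply in its `content` attribute.
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # Finish the line once the stream ends.
    return "".join(parts)
```

Calling `stream_answer(llm, messages)` with the `llm` and `messages` objects from the cell above should print the same kind of answer incrementally.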


## 4. Experiment!

Now it's your turn. Go back to the previous code cell, change the content of the `HumanMessage` to your own question, and run the cell again to see how the model responds. 

**Example questions:**
* `"Write a Python function that calculates the factorial of a number."`
* `"What are the key benefits of using a GPU for deep learning?"`
* `"Explain the difference between object storage and file storage."`
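
If you would rather try all of the example questions in one go, a small loop can send each one in turn. This is a sketch, not the notebook's own method: the helper name `ask_all` is hypothetical, and it uses the `(role, content)` tuple form that LangChain chat models accept in place of `SystemMessage` and `HumanMessage` objects.

```python
EXAMPLE_QUESTIONS = [
    "Write a Python function that calculates the factorial of a number.",
    "What are the key benefits of using a GPU for deep learning?",
    "Explain the difference between object storage and file storage.",
]


def ask_all(llm, questions,
            system_prompt="You are a helpful assistant who provides concise answers."):
    """Send each question to the model and return a question-to-answer mapping."""
    answers = {}
    for question in questions:
        # The same system prompt is reused for every question.
        messages = [("system", system_prompt), ("human", question)]
        answers[question] = llm.invoke(messages).content
    return answers
```

With the client from section 2, `ask_all(llm, EXAMPLE_QUESTIONS)` returns a dictionary you can iterate over to print each question next to its answer.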