# Lab 1: Production-Ready LLMOps Endpoint
## Build a Secure, Observable, Versioned, Cached LLM API

**Goal**: Create a real `/v1/chat/completions` endpoint with enterprise-grade LLMOps features

Features you'll implement:
- Jinja2 Prompt Templates
- MLflow Prompt Registry (versioned prompts)
- Redis Semantic + Exact Caching
- OpenTelemetry + Arize Phoenix Tracing
- OpenAI-compatible FastAPI endpoint
- **Supports both OpenAI and Azure OpenAI**

Setup Requirements:
```bash
# Set up your LLM provider in the secrets folder
# 1. Copy the example file:
cp secrets/.env.example secrets/.env

# 2. Edit secrets/.env and choose your provider:
#    - For OpenAI: Set LLM_PROVIDER=openai and add OPENAI_API_KEY
#    - For Azure: Set LLM_PROVIDER=azure and add Azure credentials

# 3. See secrets/README.md for detailed configuration instructions
```

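In [None]:
# Create a virtual environment (optional).
# Each `!` command runs in its own subshell, so activating the environment from a
# notebook cell has no effect on the running kernel. Activate it in a terminal
# (`source .venv/Scripts/activate` on Windows Git Bash, `source .venv/bin/activate`
# on Linux/macOS) and start Jupyter from there.
!python -m venv .venv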

In [7]:
# Install dependencies (run once)
# Version specifiers are quoted so the shell does not treat `>` as a redirect.
%pip install fastapi
%pip install python-dotenv
%pip install uvicorn
%pip install openai
%pip install "langchain-core>=1.0.0"
%pip install langchain-community
%pip install mlflow
%pip install redis
%pip install openinference-instrumentation-langchain
%pip install opentelemetry-sdk
%pip install opentelemetry-exporter-otlp
%pip install jinja2
%pip install arize-phoenix
%pip install python-multipart
# langchain-community 0.4.1 requires requests>=2.32.5,<3.0.0
%pip install "requests>=2.32.5,<3.0.0"
%pip install "grpcio>=1.71.2"
%pip install openinference-instrumentation-openai

Note: you may need to restart the kernel to use updated packages.




In [3]:
import subprocess
import time
import redis
# Start Redis server in the background
redis_process = subprocess.Popen(['redis-server', '--port', '6379'], 
                                  stdout=subprocess.PIPE, 
                                  stderr=subprocess.PIPE)
# Wait a moment for Redis to start
time.sleep(2)
# Connect to Redis
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
# Test the connection
r.set('test_key', 'Hello from notebook!')
print(r.get('test_key'))


Hello from notebook!


In [11]:
import os
import mlflow
import redis
import hashlib
import json
from datetime import datetime
from jinja2 import Template
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel
from typing import List, Dict, Any
from openai import OpenAI
from langchain_core.prompts import ChatPromptTemplate
import phoenix as px



If you are running in Google Colab, upload your local `.env` file to the Colab runtime first. Skip this cell when running locally.

In [None]:
# Copy the secrets into the Colab runtime.
# This cell only works in Google Colab; elsewhere the import below raises
# ModuleNotFoundError: No module named 'google.colab'.
from google.colab import files
import os

# Create secrets directory if it doesn't exist
os.makedirs('secrets', exist_ok=True)

# Upload the .env file
print("Please select your .env file to upload:")
uploaded = files.upload()

# Move the uploaded file to the secrets folder
for filename in uploaded.keys():
    if filename == '.env':
        with open('secrets/.env', 'wb') as f:
            f.write(uploaded[filename])
        print(f"✓ {filename} uploaded to secrets/.env")
        break
else:
    print("⚠️ Warning: No .env file was uploaded")

# Verify the file exists
if os.path.exists('secrets/.env'):
    print("✓ secrets/.env is ready!")
else:
    print("❌ secrets/.env not found")

## 1. Configure LLM Provider (OpenAI or Azure OpenAI)

We'll load credentials from `secrets/.env` which supports both OpenAI and Azure OpenAI.

In [13]:
from dotenv import load_dotenv
from openai import OpenAI, AzureOpenAI

# Load environment variables from secrets/.env
dotenv_path = os.path.join(os.getcwd(), 'secrets', '.env')

if not os.path.exists(dotenv_path):
    raise FileNotFoundError(
        f"Environment file not found: {dotenv_path}\n"
        f"Please create it by copying secrets/.env.example to secrets/.env\n"
        f"and adding your API keys."
    )

# Load the .env file
load_dotenv(dotenv_path)

# Get the provider
provider = os.getenv('LLM_PROVIDER', 'openai').lower()

if provider not in ['openai', 'azure']:
    raise ValueError(f"Invalid LLM_PROVIDER: {provider}. Must be 'openai' or 'azure'")

print(f"✓ Environment variables loaded from secrets/.env")
print(f"✓ LLM Provider: {provider.upper()}")

# Initialize the appropriate client based on provider
if provider == 'openai':
    api_key = os.getenv('OPENAI_API_KEY')
    if not api_key:
        raise ValueError(
            "OPENAI_API_KEY not found in secrets/.env\n"
            "Please add it to your secrets/.env file"
        )
    
    org_id = os.getenv('OPENAI_ORG_ID')
    client = OpenAI(api_key=api_key, organization=org_id)
    print(f"✓ OpenAI client initialized (key: ...{api_key[-4:]})")

elif provider == 'azure':
    # Get Azure configuration
    azure_api_key = os.getenv('AZURE_OPENAI_API_KEY')
    azure_endpoint = os.getenv('AZURE_OPENAI_ENDPOINT')
    azure_api_version = os.getenv('AZURE_OPENAI_API_VERSION')
    DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
    
    # Validate all required Azure variables
    missing_vars = []
    if not azure_api_key:
        missing_vars.append('AZURE_OPENAI_API_KEY')
    if not azure_endpoint:
        missing_vars.append('AZURE_OPENAI_ENDPOINT')
    if not azure_api_version:
        missing_vars.append('AZURE_OPENAI_API_VERSION')
    if not DEPLOYMENT_NAME:
        missing_vars.append('AZURE_OPENAI_DEPLOYMENT_NAME')
    
    if missing_vars:
        raise ValueError(
            f"Missing required Azure OpenAI variables: {', '.join(missing_vars)}\n"
            f"Please add them to secrets/.env"
        )
    
    client = AzureOpenAI(
        api_key=azure_api_key,
        api_version=azure_api_version,
        azure_endpoint=azure_endpoint
    )
    print(f"✓ Azure OpenAI client initialized")
    print(f"  Endpoint: {azure_endpoint}")
    print(f"  Deployment: {DEPLOYMENT_NAME}")

print("✓ LLM client initialized successfully!")

✓ Environment variables loaded from secrets/.env
✓ LLM Provider: AZURE
✓ Azure OpenAI client initialized
  Endpoint: https://sivas-m76dvpr6-eastus2.openai.azure.com/
  Deployment: gpt-4.1-mini
✓ LLM client initialized successfully!


In [14]:
# Load environment variables from secrets folder
from load_env import load_env_from_secrets, get_llm_provider, get_openai_client

try:
    # Load all environment variables from secrets/.env
    env_vars = load_env_from_secrets()
    provider = get_llm_provider()
    
    print("✓ Environment variables loaded from secrets/.env")
    print(f"✓ Found {len(env_vars)} variables")
    print(f"✓ LLM Provider: {provider.upper()}")
    
    # Initialize the appropriate client based on provider
    client = get_openai_client()
    
    # Get deployment name for Azure (needed for API calls)
    if provider == 'azure':
        DEPLOYMENT_NAME = os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')
        print(f"✓ Using deployment: {DEPLOYMENT_NAME}")
    
except FileNotFoundError as e:
    print("❌ Error: secrets/.env file not found!")
    print("\nPlease follow these steps:")
    print("1. Copy secrets/.env.example to secrets/.env")
    print("2. Edit secrets/.env and configure your LLM provider")
    print("   - For OpenAI: Set LLM_PROVIDER=openai and add OPENAI_API_KEY")
    print("   - For Azure: Set LLM_PROVIDER=azure and add Azure credentials")
    print("3. See secrets/README.md for detailed instructions")
    raise

except ValueError as e:
    print(f"❌ Error: {e}")
    print("\nPlease check your secrets/.env configuration")
    raise

print("✓ LLM client initialized successfully!")

✓ Environment variables loaded from secrets/.env
✓ Found 6 variables
✓ LLM Provider: AZURE
✓ Azure OpenAI client initialized
  Endpoint: https://sivas-m76dvpr6-eastus2.openai.azure.com/
  Deployment: gpt-4.1-mini
✓ Using deployment: gpt-4.1-mini
✓ LLM client initialized successfully!
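
The `load_env` module imported in the cell above is not shown in this notebook. As a rough idea of what such a helper might do, the sketch below parses `.env`-style text and reports which variables are missing for the selected provider. The function names `parse_env_text` and `missing_provider_vars` are hypothetical, and the required-variable lists simply mirror the checks in section 1; the real module may differ.

```python
from typing import Dict, List

# Required variables per provider, mirroring the checks in section 1.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_VERSION",
        "AZURE_OPENAI_DEPLOYMENT_NAME",
    ],
}

def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (hypothetical helper)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

def missing_provider_vars(env: Dict[str, str]) -> List[str]:
    """Return the required variables that are absent or empty for the chosen provider."""
    provider = env.get("LLM_PROVIDER", "openai").lower()
    if provider not in REQUIRED_VARS:
        raise ValueError(f"Invalid LLM_PROVIDER: {provider}")
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]

# Placeholder values only, not real credentials.
sample = """
# example configuration
LLM_PROVIDER=azure
AZURE_OPENAI_API_KEY="placeholder-key"
AZURE_OPENAI_ENDPOINT=https://example.invalid/
"""
env = parse_env_text(sample)
print(missing_provider_vars(env))
```

Reporting all missing variables at once, as the Azure branch in section 1 does, saves the reader from fixing the file one error at a time.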


## 2. Start Phoenix Observability (OpenTelemetry UI)

In [15]:
# Launch Phoenix (runs on http://localhost:6006)
px.launch_app()

2025/12/10 21:14:44 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2025/12/10 21:14:44 INFO alembic.runtime.migration: Will assume transactional DDL.
2025/12/10 21:14:44 INFO alembic.runtime.migration: Running upgrade  -> cf03bd6bae1d, init
2025/12/10 21:14:44 INFO alembic.runtime.migration: Running upgrade cf03bd6bae1d -> 10460e46d750, datasets
2025/12/10 21:14:44 INFO alembic.runtime.migration: Running upgrade 10460e46d750 -> 3be8647b87d8, add token columns to spans table
2025/12/10 21:14:45 INFO alembic.runtime.migration: Running upgrade 3be8647b87d8 -> cd164e83824f, users and tokens
2025/12/10 21:14:45 INFO alembic.runtime.migration: Running upgrade cd164e83824f -> 4ded9e43755f, create project_session table
2025/12/10 21:14:45 INFO alembic.runtime.migration: Running upgrade 4ded9e43755f -> bc8fea3c2bc8, Add prompt tables
2025/12/10 21:14:45 INFO alembic.runtime.migration: Running upgrade bc8fea3c2bc8 -> 2f9d1a65945f, Annotation config migrations

🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://arize.com/docs/phoenix


<phoenix.session.session.ThreadSession at 0x1e1ad221610>
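
Launching Phoenix only starts the trace collector and UI; spans show up there once the OpenAI client is instrumented. A minimal sketch follows, assuming the `opentelemetry-sdk`, `opentelemetry-exporter-otlp`, and `openinference-instrumentation-openai` packages installed earlier and Phoenix's local OTLP/HTTP endpoint at `http://localhost:6006/v1/traces`. The function name `enable_openai_tracing` is hypothetical, and the imports sit inside the function so that defining it does not fail before the packages are installed.

```python
def enable_openai_tracing(endpoint: str = "http://localhost:6006/v1/traces"):
    """Export OpenAI client spans to a local Phoenix instance (sketch, not the lab's official setup)."""
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import SimpleSpanProcessor
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from openinference.instrumentation.openai import OpenAIInstrumentor

    # Send every finished span to Phoenix over OTLP/HTTP
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(
        SimpleSpanProcessor(OTLPSpanExporter(endpoint=endpoint))
    )
    # Patch the OpenAI client library so each API call produces a span
    OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider

# Usage (run once, before sending requests to the endpoint):
# tracer_provider = enable_openai_tracing()
```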

## 3. Connect to Redis (Caching Layer)

In [None]:
# Connect to the Redis server started earlier (alternatively, run Redis in Docker with the redis:7 image)
r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
r.ping()
print("✓ Redis connection successful!")

## 4. Prompt Templating with Jinja2 + MLflow Registry

In [16]:
# Create prompts directory
os.makedirs("prompts", exist_ok=True)

# Save prompt template v1
prompt_v1 = """
You are a helpful enterprise assistant. Be professional and concise.

Current date: {{ current_date }}
User department: {{ department }}

Question: {{ user_question }}

Answer in under 150 words.
"""

with open("prompts/assistant_v1.jinja2", "w") as f:
    f.write(prompt_v1)

print("✓ Prompt template saved!")

✓ Prompt template saved!


In [17]:
# Initialize MLflow (local tracking)
mlflow.set_tracking_uri("file:./mlflow")
mlflow.set_experiment("LLMOps_Prompt_Registry")

# Register prompt as artifact
with mlflow.start_run(run_name="prompt-v1.0.0"):
    mlflow.log_artifact("prompts/assistant_v1.jinja2", artifact_path="prompts")
    mlflow.set_tag("prompt.version", "1.0.0")
    mlflow.set_tag("author", "student")
    mlflow.log_param("max_tokens", 512)
    mlflow.log_param("temperature", 0.3)
    mlflow.log_param("model", "gpt-4.1-mini")
    
print("✓ Prompt v1.0.0 registered in MLflow!")
print("View at: http://localhost:5000 (run `mlflow ui --backend-store-uri ./mlflow` in a terminal)")

2025/12/10 21:14:58 INFO mlflow.tracking.fluent: Experiment with name 'LLMOps_Prompt_Registry' does not exist. Creating a new experiment.


✓ Prompt v1.0.0 registered in MLflow!
View at: http://localhost:5000 (run `mlflow ui --backend-store-uri ./mlflow` in a terminal)


## 5. Load Latest Prompt from Registry

In [18]:
def load_latest_prompt() -> Template:
    # In a real system, query MLflow for the latest registered version
    with open("prompts/assistant_v1.jinja2") as f:
        template = Template(f.read())
    return template

prompt_template = load_latest_prompt()
print("✓ Latest prompt loaded!")

✓ Latest prompt loaded!
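
`load_latest_prompt` above hardcodes the v1 file. One way to pick the newest template without touching MLflow is to parse the version number out of the filename. This is a sketch that assumes templates keep the `assistant_v<N>.jinja2` naming used above; the helper name `latest_prompt_path` is hypothetical, and it is not a substitute for querying the registry.

```python
import re
from pathlib import Path
from typing import Optional

# Matches filenames such as assistant_v1.jinja2 or assistant_v12.jinja2
PROMPT_PATTERN = re.compile(r"^assistant_v(\d+)\.jinja2$")

def latest_prompt_path(prompt_dir: str = "prompts") -> Optional[Path]:
    """Return the template with the highest version number, or None if none match."""
    best_version, best_path = -1, None
    for path in Path(prompt_dir).glob("assistant_v*.jinja2"):
        match = PROMPT_PATTERN.match(path.name)
        # Compare versions numerically so that v10 sorts after v2
        if match and int(match.group(1)) > best_version:
            best_version, best_path = int(match.group(1)), path
    return best_path
```

With this in place, `load_latest_prompt` could open `latest_prompt_path()` instead of the fixed filename.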


## 6. Semantic + Exact Match Caching

In [30]:
def get_cache_key(messages: List[Dict], department: str) -> str:
    # Exact-match key: SHA-256 over the serialized messages plus the department
    content = json.dumps(messages, sort_keys=True) + department
    return "cache:" + hashlib.sha256(content.encode()).hexdigest()

def get_cached_response(key: str):
    # `r` is the Redis client created in section 3
    cached = r.get(key)
    if cached:
        print("✓ Cache HIT")
        return json.loads(cached)
    print("✗ Cache MISS")
    return None

def set_cache(key: str, response: Dict, ttl: int = 3600):
    # Store the JSON-encoded response with a time-to-live in seconds
    r.setex(key, ttl, json.dumps(response))
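
The functions above implement the exact-match half of the cache: only an identical request produces a hit. A semantic cache instead compares an embedding of the incoming question against embeddings of cached questions and returns the stored answer when the cosine similarity passes a threshold. The sketch below keeps entries in a Python list and uses a toy bag-of-words `embed` function so that it runs without external services. Both are stand-ins (a real setup would call an embedding model and store the vectors in Redis), and the `SemanticCache` class and the 0.8 threshold are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

def embed(text: str) -> Dict[str, float]:
    """Toy embedding: word counts. A stand-in for a real embedding model."""
    return dict(Counter(text.lower().replace("?", " ").split()))

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    """In-memory semantic cache (hypothetical helper, not part of the lab code)."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: List[Tuple[Dict[str, float], dict]] = []

    def get(self, question: str) -> Optional[dict]:
        """Return the response of the most similar cached question, if similar enough."""
        query = embed(question)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def set(self, question: str, response: dict) -> None:
        """Store a response under the embedding of its question."""
        self.entries.append((embed(question), response))

cache = SemanticCache()
cache.set("What is our vacation policy?", {"answer": "cached"})
print(cache.get("what is our vacation policy"))   # near-identical wording: hit
print(cache.get("How do I reset my password?"))   # unrelated question: miss
```

A common arrangement is to try the exact-match key first, since it is a single Redis lookup, and fall back to the similarity search only on a miss.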

## 7. LLM Call with OpenAI API

In [20]:
from typing import Optional

def generate_response(prompt: str, model: Optional[str] = None, temperature: float = 0.3, max_tokens: int = 512):
    """
    Generate a response using the OpenAI or Azure OpenAI API.
    
    Args:
        prompt: The user prompt
        model: Model to use (OpenAI only); ignored for Azure, which uses DEPLOYMENT_NAME
        temperature: Sampling temperature (0-2)
        max_tokens: Maximum tokens in response
    """
    try:
        # Azure addresses models by deployment name; OpenAI uses the model name
        model_name = DEPLOYMENT_NAME if provider == 'azure' else (model or "gpt-4.1-mini")
        response = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=max_tokens
        )
        return response.choices[0].message.content
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"LLM API error: {str(e)}")

## 8. FastAPI OpenAI-Compatible Endpoint

In [23]:
# Define Pydantic models for request/response
class Message(BaseModel):
    role: str
    content: str
class ChatCompletionRequest(BaseModel):
    model: str = "gpt-4.1-mini"
    messages: List[Message]
    temperature: float = 0.3
    max_tokens: int = 512
    metadata: Dict[str, Any] = {}

In [24]:
# Initialize FastAPI app
app = FastAPI(title="LLMOps Production Endpoint", version="1.0.0")

@app.post("/v1/chat/completions")
async def chat_completions(request: ChatCompletionRequest):
    user_question = request.messages[-1].content
    department = request.metadata.get("department", "general")
    
    # Return the cached response if this exact request was seen before
    # (model_dump() replaces the .dict() method deprecated in Pydantic v2)
    cache_key = get_cache_key([m.model_dump() for m in request.messages], department)
    cached = get_cached_response(cache_key)
    if cached:
        return cached
    
    # Render prompt
    rendered_prompt = prompt_template.render(
        current_date=datetime.now().strftime("%Y-%m-%d"),
        department=department,
        user_question=user_question
    )
    
    # Generate using the configured provider. generate_response() ignores
    # `model` for Azure and uses DEPLOYMENT_NAME instead.
    answer = generate_response(
        rendered_prompt,
        model=request.model,
        temperature=request.temperature,
        max_tokens=request.max_tokens
    )
    
    # Build an OpenAI-style response. The id and usage values are placeholders.
    response = {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": int(datetime.now().timestamp()),
        "model": request.model if provider == 'openai' else DEPLOYMENT_NAME,
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": answer
            },
            "finish_reason": "stop"
        }],
        "usage": {"prompt_tokens": 150, "completion_tokens": 80, "total_tokens": 230}
    }
    
    # Cache result
    set_cache(cache_key, response)
    return JSONResponse(content=response)

@app.get("/health")
async def health_check():
    return {"status": "healthy", "provider": provider}

print("✓ API Ready!")
print(f"✓ Using provider: {provider.upper()}")
print("Run the next cell to start server")

✓ API Ready!
✓ Using provider: AZURE
Run the next cell to start server
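
The endpoint above returns a fixed `id` (`chatcmpl-123`) and fixed token counts. A sketch of building the response with a unique id and caller-supplied usage numbers follows. `build_completion_response` is a hypothetical helper, and it assumes the token counts are passed in by the caller rather than hardcoded.

```python
import time
import uuid
from typing import Any, Dict

def build_completion_response(answer: str, model: str,
                              prompt_tokens: int, completion_tokens: int) -> Dict[str, Any]:
    """Assemble a chat.completion-shaped dict with a unique id and supplied token counts."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": answer},
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }

example = build_completion_response("Hello!", "gpt-4.1-mini", prompt_tokens=12, completion_tokens=3)
print(example["usage"]["total_tokens"])
```

With the OpenAI Python client, the counts are available on the completion object as `response.usage.prompt_tokens` and `response.usage.completion_tokens`, so `generate_response` would need to return them alongside the text.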


## 9. Start the Production Server

In [25]:
# In a standalone script, call uvicorn.run(app, port=8000) directly.
# uvicorn.run() blocks, so in Jupyter the server runs in a background thread.

import threading
import time
import uvicorn

def start_server():
    uvicorn.run(app, host="0.0.0.0", port=8000)

thread = threading.Thread(target=start_server, daemon=True)
thread.start()

time.sleep(2)  # Give server time to start

print("✓ Production LLMOps API is LIVE!")
print("📡 Endpoint: http://localhost:8000/v1/chat/completions")
print("📊 OpenTelemetry traces: http://localhost:6006")
print("🔍 Health check: http://localhost:8000/health")

INFO:     Started server process [6008]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)


✓ Production LLMOps API is LIVE!
📡 Endpoint: http://localhost:8000/v1/chat/completions
📊 OpenTelemetry traces: http://localhost:6006
🔍 Health check: http://localhost:8000/health
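
The fixed `time.sleep(2)` above assumes the server is up within two seconds. A sketch of polling the `/health` route until it answers, using only the standard library, is shown below. The helper names `is_healthy` and `wait_until` and the timeout values are illustrative assumptions.

```python
import time
import urllib.request
from typing import Callable

def is_healthy(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        return False

def wait_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Call `check` repeatedly until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Usage, after starting the server thread:
# ready = wait_until(lambda: is_healthy("http://localhost:8000/health"))
```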


## 10. Test Your Production Endpoint

In [31]:
import requests
import time

payload = {
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "What is our vacation policy?"}],
    "metadata": {"department": "HR"}
}

print("Testing API endpoint...\n")

# First call (cache miss)
start = time.time()
resp1 = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
time1 = time.time() - start

# Second call (cache hit)
start = time.time()
resp2 = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
time2 = time.time() - start

print(f"⏱️  First call (cache miss): {time1:.2f}s")
print(f"⚡ Second call (cache hit): {time2:.2f}s → {time1/time2:.1f}x faster!\n")
print(f"📝 Response: {resp1.json()['choices'][0]['message']['content'][:200]}...")

Testing API endpoint...





✗ Cache MISS
INFO:     127.0.0.1:49201 - "POST /v1/chat/completions HTTP/1.1" 200 OK
✓ Cache HIT
INFO:     127.0.0.1:49205 - "POST /v1/chat/completions HTTP/1.1" 200 OK
⏱️  First call (cache miss): 5.26s
⚡ Second call (cache hit): 2.05s → 2.6x faster!

📝 Response: Our vacation policy provides full-time employees with 15 days of paid vacation annually, accruing at 1.25 days per month. Part-time employees accrue vacation on a pro-rated basis. Vacation requests sh...


## 11. Test with Different Models

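You can test with different OpenAI models by changing the `model` field. With `LLM_PROVIDER=azure`, the endpoint ignores `model` and always uses the configured deployment: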

In [None]:
# Test with GPT-4 (more expensive but higher quality)
payload_gpt4 = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}],
    "temperature": 0.7,
    "max_tokens": 200,
    "metadata": {"department": "Engineering"}
}

resp_gpt4 = requests.post("http://localhost:8000/v1/chat/completions", json=payload_gpt4)
print(f"GPT-4 Response:\n{resp_gpt4.json()['choices'][0]['message']['content']}")
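
Note that `get_cache_key` hashes only the messages and the department, so the same question sent with a different `model`, `temperature`, or `max_tokens` would be answered from the cache entry of the first request. A sketch of a key that also covers those fields follows. `get_cache_key_v2` is a hypothetical name, and whether sampling parameters belong in the key is a design choice rather than a requirement.

```python
import hashlib
import json
from typing import Dict, List

def get_cache_key_v2(messages: List[Dict], department: str, model: str,
                     temperature: float, max_tokens: int) -> str:
    """Hash every request field that can change the generated answer."""
    payload = {
        "messages": messages,
        "department": department,
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # sort_keys gives a stable serialization regardless of dict key order
    content = json.dumps(payload, sort_keys=True)
    return "cache:" + hashlib.sha256(content.encode()).hexdigest()

msgs = [{"role": "user", "content": "Explain quantum computing in simple terms"}]
key_a = get_cache_key_v2(msgs, "Engineering", "gpt-4", 0.7, 200)
key_b = get_cache_key_v2(msgs, "Engineering", "gpt-4.1-mini", 0.7, 200)
print(key_a != key_b)  # different models produce different keys
```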

When you are done, stop the Redis server that was started in the background:

In [None]:

redis_process.terminate()
redis_process.wait()

## Final Submission Checklist

Take these screenshots:
1. Phoenix tracing UI showing spans
2. MLflow UI with prompt v1.0.0
3. Cache HIT in logs
4. API response from curl/postman
5. `mlflow ui` and `redis-cli monitor` output

### Key Differences from Ollama Version:
- ✅ Uses OpenAI API (requires API key)
- ✅ Access to GPT-3.5, GPT-4, and other OpenAI models
- ✅ Higher quality responses
- ✅ No local model installation required
- ⚠️ Costs money per API call (check OpenAI pricing)
- ⚠️ Requires internet connection

You just built a production-grade LLMOps endpoint using OpenAI!