# Phase 4.2: Testing Model Serving Endpoint

This notebook tests a locally served MLflow model over its REST API. It covers:
1. **Making REST API Predictions** - Call the model server's `/invocations` endpoint
2. **Different Input Formats** - `inputs`, `dataframe_split`, `dataframe_records`
3. **Batch Predictions** - Send multiple samples in one request
4. **Error Handling** - Inspect the server's response to malformed input

## Prerequisites
**Important:**
1. Run `01_prepare_model.ipynb` first to prepare the model.
2. Start the model server before running this notebook (on MLflow 2.x, `--env-manager local` replaces the older `--no-conda` flag):
   ```bash
   mlflow models serve -m 'models:/iris-serving-model/Production' -p 5001 --env-manager local
   ```

## MLflow Serving Architecture

```
┌──────────────────┐     HTTP/JSON     ┌──────────────────┐
│   Your Client    │─────────────────>│  MLflow Server   │
│   (Python, JS,   │                  │                  │
│    curl, etc.)   │<─────────────────│  Loads model     │
│                  │    Predictions    │  from Registry   │
└──────────────────┘                  └──────────────────┘
```

## Learning Goals
- Make predictions via REST API
- Understand different input formats
- Handle batch predictions
- Debug common errors

## Step 1: Import Libraries

In [None]:
# requests: For making HTTP calls to the model server
import requests

# json: For formatting request/response data
import json

# pandas & numpy: For data manipulation
import pandas as pd
import numpy as np

# sklearn: For test data
from sklearn.datasets import load_iris

# time: For performance testing
import time

# collections: For result analysis
from collections import Counter

print("All libraries imported successfully!")
print("Ready to test model serving!")

## Step 2: Configure Server Connection

In [None]:
# Model server configuration
MODEL_SERVER = "http://localhost:5001"
ENDPOINT = f"{MODEL_SERVER}/invocations"  # Prediction endpoint

# Load Iris metadata for interpreting results
iris = load_iris()
feature_names = list(iris.feature_names)
target_names = list(iris.target_names)

print(f"Model server: {MODEL_SERVER}")
print(f"Prediction endpoint: {ENDPOINT}")
print(f"\nFeatures: {feature_names}")
print(f"Classes: {target_names}")

## Step 3: Define Helper Functions

In [None]:
def check_server():
    """
    Check if the model server is running.
    Returns True if server is up, False otherwise.
    """
    try:
        # Try to hit the health endpoint
        response = requests.get(f"{MODEL_SERVER}/health", timeout=5)
        return response.ok
    except requests.RequestException:
        # Connection refused, timeout, DNS failure, etc.
        return False


def predict_inputs(data):
    """
    Make predictions using the 'inputs' format.
    This is the simplest format - just a list of lists.
    
    Args:
        data: List of lists, e.g., [[5.1, 3.5, 1.4, 0.2]]
    
    Returns:
        requests.Response object
    """
    payload = {"inputs": data}
    # Without a timeout, requests can wait indefinitely on an unresponsive server
    return requests.post(ENDPOINT, json=payload, timeout=30)


def predict_dataframe_split(df):
    """
    Make predictions using the 'dataframe_split' format.
    Includes column names separately from data.
    
    Args:
        df: pandas DataFrame
    
    Returns:
        requests.Response object
    """
    payload = {
        "dataframe_split": {
            "columns": df.columns.tolist(),
            "data": df.values.tolist()
        }
    }
    return requests.post(ENDPOINT, json=payload, timeout=30)


def predict_dataframe_records(df):
    """
    Make predictions using the 'dataframe_records' format.
    Each row is a dictionary with column names as keys.
    
    Args:
        df: pandas DataFrame
    
    Returns:
        requests.Response object
    """
    payload = {"dataframe_records": df.to_dict(orient="records")}
    return requests.post(ENDPOINT, json=payload, timeout=30)


print("Helper functions defined:")
print("  - check_server()")
print("  - predict_inputs(data)")
print("  - predict_dataframe_split(df)")
print("  - predict_dataframe_records(df)")

## Step 4: Check Server Status

In [None]:
print("="*60)
print("MLflow Model Serving Tests")
print("="*60)

print("\n[0] Checking model server...")
print("-" * 40)

if not check_server():
    print("  ERROR: Model server is not running!")
    print("\n  Start it with:")
    print("    mlflow models serve -m 'models:/iris-serving-model/Production' -p 5001 --env-manager local")
    print("\n  Then re-run this notebook.")
else:
    print(f"  Server is running at {MODEL_SERVER}")
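
If the server was started just before running this cell, it may still be loading the model. A minimal polling sketch follows. `wait_for_server` and its parameters are hypothetical names, not part of MLflow, and the function accepts any zero-argument callable, such as the `check_server` helper defined earlier.

```python
import time
from typing import Callable


def wait_for_server(check: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll `check` until it returns True or `attempts` are used up."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Give the server time to finish loading before the next try
            time.sleep(delay)
    return False
```

In this notebook it could be called as `wait_for_server(check_server, attempts=30, delay=2.0)`.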

## Step 5: Prepare Test Data

In [None]:
# Create test samples representing each class
# These are typical values for each iris species
samples = [
    [5.1, 3.5, 1.4, 0.2],  # Typical setosa
    [6.2, 2.9, 4.3, 1.3],  # Typical versicolor
    [7.7, 3.0, 6.1, 2.3],  # Typical virginica
]

# Create DataFrame version
sample_df = pd.DataFrame(samples, columns=feature_names)

print("\n[Test Data]")
print("-" * 40)
print(sample_df.to_string(index=False))
print("\nExpected classes: setosa, versicolor, virginica")

## Test 1: Using 'inputs' Format (Simplest)

The simplest format - just send a list of feature arrays.

In [None]:
print("\n" + "="*60)
print("[Test 1: 'inputs' format]")
print("-" * 40)

print("\nPayload format:")
print('  {"inputs": [[5.1, 3.5, 1.4, 0.2], ...]}')

response = predict_inputs(samples)

print(f"\nStatus: {response.status_code}")

if response.ok:
    result = response.json()
    # MLflow 2.x wraps predictions in {"predictions": [...]}; older releases may return a bare list
    predictions = result["predictions"] if isinstance(result, dict) else result
    
    print(f"Predictions: {predictions}")
    print(f"Classes: {[target_names[p] for p in predictions]}")
else:
    print(f"Error: {response.text}")
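
The response handling above recurs in every test, so it could be factored into small helpers. This is a sketch under the assumption that the server returns either `{"predictions": [...]}` or a bare JSON list of integer class ids. `extract_predictions` and `to_class_names` are hypothetical names, and `TARGET_NAMES` mirrors the order of `load_iris().target_names`.

```python
from typing import Any, List

# Assumed to match the order of load_iris().target_names
TARGET_NAMES = ["setosa", "versicolor", "virginica"]


def extract_predictions(result: Any) -> List[Any]:
    """Return the prediction list from a decoded JSON response body."""
    if isinstance(result, dict):
        return result["predictions"]
    return result  # body was already a bare list


def to_class_names(predictions: List[Any], names: List[str] = TARGET_NAMES) -> List[str]:
    """Map integer class ids to species names."""
    return [names[int(p)] for p in predictions]


print(to_class_names(extract_predictions({"predictions": [0, 1, 2]})))
```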

## Test 2: Using 'dataframe_split' Format

Sends the column names once, followed by the rows as plain lists. Useful when the model expects named columns.

In [None]:
print("\n" + "="*60)
print("[Test 2: 'dataframe_split' format]")
print("-" * 40)

print("\nPayload format:")
print('''  {
    "dataframe_split": {
      "columns": ["sepal length (cm)", ...],
      "data": [[5.1, 3.5, 1.4, 0.2], ...]
    }
  }''')

response = predict_dataframe_split(sample_df)

print(f"\nStatus: {response.status_code}")

if response.ok:
    result = response.json()
    predictions = result["predictions"] if isinstance(result, dict) else result
    print(f"Predictions: {predictions}")
else:
    print(f"Error: {response.text}")

## Test 3: Using 'dataframe_records' Format

Each row is a dictionary - most explicit and readable.

In [None]:
print("\n" + "="*60)
print("[Test 3: 'dataframe_records' format]")
print("-" * 40)

print("\nPayload format:")
print('''  {
    "dataframe_records": [
      {"sepal length (cm)": 5.1, "sepal width (cm)": 3.5, ...},
      ...
    ]
  }''')

response = predict_dataframe_records(sample_df)

print(f"\nStatus: {response.status_code}")

if response.ok:
    result = response.json()
    predictions = result["predictions"] if isinstance(result, dict) else result
    print(f"Predictions: {predictions}")
else:
    print(f"Error: {response.text}")

## Test 4: Single Prediction

In [None]:
print("\n" + "="*60)
print("[Test 4: Single Prediction]")
print("-" * 40)

# Single sample
single_sample = [[5.1, 3.5, 1.4, 0.2]]

response = predict_inputs(single_sample)

if response.ok:
    result = response.json()
    pred = (result["predictions"] if isinstance(result, dict) else result)[0]
    
    print(f"Input: {single_sample[0]}")
    print(f"Prediction: {pred} ({target_names[pred]})")
else:
    print(f"Error: {response.text}")

## Test 5: Batch Prediction Performance

In [None]:
print("\n" + "="*60)
print("[Test 5: Batch Prediction (100 samples)]")
print("-" * 40)

# Generate random samples within typical Iris ranges
np.random.seed(42)
batch = np.random.uniform(
    low=[4.0, 2.0, 1.0, 0.1],    # Min values
    high=[8.0, 4.5, 7.0, 2.5],   # Max values
    size=(100, 4)                 # 100 samples, 4 features
).tolist()

# Time the request (perf_counter is a monotonic clock suited to measuring intervals)
start = time.perf_counter()
response = predict_inputs(batch)
elapsed = time.perf_counter() - start

if response.ok:
    result = response.json()
    predictions = result["predictions"] if isinstance(result, dict) else result
    
    print(f"Samples: {len(batch)}")
    print(f"Time: {elapsed:.3f}s")
    print(f"Throughput: {len(batch)/elapsed:.1f} predictions/sec")
    
    # Show distribution
    dist = Counter(predictions)
    print("\nPrediction Distribution:")
    for class_id, count in sorted(dist.items()):
        print(f"  {target_names[class_id]}: {count}")
else:
    print(f"Error: {response.text}")
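
For batches much larger than 100 rows, a client may prefer to split the rows across several requests rather than send one very large body. The sketch below assumes a hypothetical `send` callable that takes a list of rows and returns a list of predictions. `chunked` and `predict_in_chunks` are illustrative names, not MLflow functions.

```python
from typing import Any, Callable, Iterator, List, Sequence


def chunked(rows: Sequence[List[float]], size: int) -> Iterator[List[List[float]]]:
    """Yield consecutive slices of `rows` with at most `size` rows each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def predict_in_chunks(
    rows: Sequence[List[float]],
    size: int,
    send: Callable[[List[List[float]]], List[Any]],
) -> List[Any]:
    """Call `send` once per chunk and concatenate the results in order."""
    predictions: List[Any] = []
    for chunk in chunked(rows, size):
        predictions.extend(send(chunk))
    return predictions
```

In this notebook, `send` could wrap the existing helper, for example `lambda rows: predict_inputs(rows).json()["predictions"]`, with error handling added as needed.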

## Test 6: Error Handling

In [None]:
print("\n" + "="*60)
print("[Test 6: Error Handling]")
print("-" * 40)

# Case 1: Wrong number of features
print("\nWrong number of features (3 instead of 4):")
response = predict_inputs([[1, 2, 3]])  # Only 3 features!
print(f"  Status: {response.status_code}")
if not response.ok:
    print(f"  Error (expected): {response.text[:100]}...")

# Case 2: Invalid data type
print("\nInvalid data type (strings instead of numbers):")
response = predict_inputs([["a", "b", "c", "d"]])  # Strings!
print(f"  Status: {response.status_code}")
if not response.ok:
    print(f"  Error (expected): {response.text[:100]}...")
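
Both failures above can also be caught on the client before a request is sent. A minimal sketch, assuming the model expects exactly four numeric features per row. `validate_rows` and `N_FEATURES` are hypothetical names introduced for illustration.

```python
from numbers import Real
from typing import Any, List, Sequence

N_FEATURES = 4  # assumed feature count for the Iris model


def validate_rows(rows: Sequence[Sequence[Any]]) -> List[str]:
    """Return a list of problems found; an empty list means the rows look valid."""
    errors = []
    for i, row in enumerate(rows):
        if len(row) != N_FEATURES:
            errors.append(f"row {i}: expected {N_FEATURES} features, got {len(row)}")
        elif not all(isinstance(v, Real) and not isinstance(v, bool) for v in row):
            errors.append(f"row {i}: all features must be numeric")
    return errors


print(validate_rows([[1, 2, 3]]))
print(validate_rows([["a", "b", "c", "d"]]))
```

Client-side checks like this complement the server's own validation rather than replace it, so the status code should still be checked.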

## cURL Examples for Testing

Here are curl commands you can use from the terminal:

In [None]:
print("\n" + "="*60)
print("cURL Examples")
print("="*60)

print(f"""
# Single prediction:
curl -X POST {ENDPOINT} \\
  -H "Content-Type: application/json" \\
  -d '{{"inputs": [[5.1, 3.5, 1.4, 0.2]]}}'

# Multiple predictions:
curl -X POST {ENDPOINT} \\
  -H "Content-Type: application/json" \\
  -d '{{"inputs": [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]}}'

# DataFrame format (with column names):
curl -X POST {ENDPOINT} \\
  -H "Content-Type: application/json" \\
  -d '{{"dataframe_split": {{
    "columns": ["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"],
    "data": [[5.1, 3.5, 1.4, 0.2]]
  }}}}'
""")

## Summary: Input Formats

| Format | When to Use | Example |
|--------|------------|----------|
| `inputs` | Simple arrays, no column names; features must be in the order the model expects | `{"inputs": [[5.1, 3.5, ...]]}` |
| `dataframe_split` | Need column names; names are sent once per request | `{"dataframe_split": {"columns": [...], "data": [...]}}` |
| `dataframe_records` | Human-readable, explicit; names are repeated in every row | `{"dataframe_records": [{"col": val, ...}]}` |

### Best Practices

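1. **Use `inputs` for simplicity** - When the client already sends features in the order the model expects
2. **Use `dataframe_split` for efficiency** - Column names are sent once, which keeps large batches compact
3. **Use `dataframe_records` for clarity** - Each row is self-describing, at the cost of repeating column names
4. **Always handle errors** - Check `response.ok` or the status code before parsing the body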
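
One way to compare how compact the three formats are is to serialize the same rows in each and measure the body sizes. This is an offline sketch: the variable names are illustrative and nothing is sent to the server.

```python
import json

# Column names as returned by sklearn's load_iris().feature_names
columns = ["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]
rows = [[5.1, 3.5, 1.4, 0.2]] * 100  # 100 identical rows, enough to compare sizes

inputs_body = json.dumps({"inputs": rows})
split_body = json.dumps({"dataframe_split": {"columns": columns, "data": rows}})
records_body = json.dumps({"dataframe_records": [dict(zip(columns, r)) for r in rows]})

# Report the size of each serialized request body in characters
for name, body in [
    ("inputs", inputs_body),
    ("dataframe_split", split_body),
    ("dataframe_records", records_body),
]:
    print(f"{name}: {len(body)} characters")
```

Because `dataframe_records` repeats every column name in every row, its body grows fastest as the batch size increases.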

In [None]:
print("="*60)
print("Model Serving Tests Complete!")
print("="*60)
print("\nWhat you learned:")
print("  1. How to make REST API predictions")
print("  2. Three different input formats")
print("  3. How to handle batch predictions")
print("  4. How to debug common errors")
print("  5. cURL commands for terminal testing")