# 05 - API Usage Guide

Learn how to interact with the G-code fingerprinting FastAPI server.

## Table of Contents
1. [Overview](#1.-Overview)
2. [Starting the API Server](#2.-Starting-the-API-Server)
3. [Health Check & Info Endpoints](#3.-Health-Check-&-Info-Endpoints)
4. [Single Prediction Requests](#4.-Single-Prediction-Requests)
5. [Batch Predictions](#5.-Batch-Predictions)
6. [Fingerprint Extraction](#6.-Fingerprint-Extraction)
7. [Dynamic Checkpoint Loading](#7.-Dynamic-Checkpoint-Loading)
8. [Async Requests with aiohttp](#8.-Async-Requests-with-aiohttp)
9. [Error Handling](#9.-Error-Handling)
10. [Performance Benchmarking](#10.-Performance-Benchmarking)

---

## 1. Overview

The G-code Fingerprinting API provides REST endpoints for:

- **Prediction**: Convert sensor data to G-code sequences
- **Fingerprinting**: Extract machine embeddings
- **Batch Processing**: Handle multiple samples efficiently
- **Model Management**: Dynamically load different checkpoints

### API Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    FastAPI Server (port 8000)                │
├─────────────────────────────────────────────────────────────┤
│  GET  /           → API info and available endpoints        │
│  GET  /health     → Server health and model status          │
│  GET  /info       → Model configuration details             │
│  POST /predict    → Single sample prediction                │
│  POST /batch_predict → Batch predictions (up to 32)        │
│  POST /fingerprint   → Extract machine fingerprint         │
│  POST /load_checkpoint → Load different model              │
└─────────────────────────────────────────────────────────────┘
```
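
The endpoint table above can be mirrored in client code as a small lookup, which keeps paths and HTTP methods in one place. This is only a sketch: the methods and paths are copied from the diagram, while `ENDPOINTS` and `endpoint_url` are hypothetical names introduced here for illustration and are not part of the server or the `miracle` package.

```python
# Hypothetical client-side lookup; methods and paths are copied from the diagram above.
ENDPOINTS = {
    "root": ("GET", "/"),
    "health": ("GET", "/health"),
    "info": ("GET", "/info"),
    "predict": ("POST", "/predict"),
    "batch_predict": ("POST", "/batch_predict"),
    "fingerprint": ("POST", "/fingerprint"),
    "load_checkpoint": ("POST", "/load_checkpoint"),
}


def endpoint_url(name, base_url="http://localhost:8000"):
    """Return (method, full URL) for a named endpoint."""
    method, path = ENDPOINTS[name]
    # Strip a trailing slash so the base URL and path join with exactly one "/".
    return method, base_url.rstrip("/") + path
```

For example, `endpoint_url("predict")` yields the method and URL used for single predictions later in this notebook.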

In [None]:
# ============================================================
# Environment Setup
# ============================================================

import sys
from pathlib import Path
import json
import time

import numpy as np
import requests

# Project root
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))

# API Configuration
API_BASE_URL = 'http://localhost:8000'

print("="*60)
print("G-CODE FINGERPRINTING API CLIENT")
print("="*60)
print(f"API Base URL: {API_BASE_URL}")
print(f"Project Root: {project_root}")

## 2. Starting the API Server

Before using the API, start the FastAPI server in a separate terminal:

```bash
# Option 1: Run directly
PYTHONPATH=src .venv/bin/python src/miracle/api/server.py

# Option 2: Run with uvicorn (hot reload)
PYTHONPATH=src .venv/bin/uvicorn miracle.api.server:app --reload --host 0.0.0.0 --port 8000

# Option 3: Production mode
PYTHONPATH=src .venv/bin/uvicorn miracle.api.server:app --host 0.0.0.0 --port 8000 --workers 4
```

The server will:
1. Load the default checkpoint from `outputs/training_50epoch/checkpoint_best.pt`
2. Start listening on port 8000
3. Provide interactive docs at http://localhost:8000/docs
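
Because the checkpoint is loaded at startup, the server may take a moment before it answers requests. One way to handle that is to poll the root endpoint until it responds. The sketch below assumes that a `200` from `GET /` means the server is up; `probe_root` and `wait_for_server` are hypothetical helpers, and the probe uses only the standard library so it runs without the notebook's dependencies.

```python
import time
import urllib.error
import urllib.request


def probe_root(base_url, timeout=2.0):
    """Return True if GET <base_url>/ answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


def wait_for_server(base_url="http://localhost:8000", max_wait_s=30.0,
                    interval_s=0.5, probe=probe_root, sleep=time.sleep):
    """Poll until the probe succeeds or max_wait_s elapses."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a live server or real delays.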

In [None]:
# Helper function to check server status
def check_server_status():
    """Check if the API server is running."""
    try:
        response = requests.get(f'{API_BASE_URL}/', timeout=5)
        if response.status_code == 200:
            return True, response.json()
        return False, None
    except requests.exceptions.ConnectionError:
        return False, None
    except Exception as e:
        return False, str(e)

# Check if server is running
is_running, server_info = check_server_status()

if is_running:
    print("\n✓ API Server is running!")
    print(f"  Name: {server_info.get('name', 'Unknown')}")
    print(f"  Version: {server_info.get('version', 'Unknown')}")
    print(f"  Status: {server_info.get('status', 'Unknown')}")
    print(f"\n  Available endpoints:")
    for name, path in server_info.get('endpoints', {}).items():
        print(f"    • {name}: {path}")
else:
    print("\n✗ API Server is not running!")
    print("  Start the server with:")
    print("  PYTHONPATH=src .venv/bin/python src/miracle/api/server.py")

## 3. Health Check & Info Endpoints

These endpoints provide server status and model information.
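
A health payload is easier to act on when it is reduced to a single readiness flag plus a short summary. The sketch below assumes the field names that this notebook reads from `/health` (`status`, `model_loaded`, `model_version`, `uptime_seconds`); `summarize_health` is a hypothetical helper, and it treats `model_loaded` as the only readiness signal.

```python
def summarize_health(health):
    """Turn a /health payload into (ready, one-line summary).

    Field names follow this notebook's usage; missing fields fall back to
    defaults instead of raising.
    """
    ready = bool(health.get("model_loaded", False))
    uptime = float(health.get("uptime_seconds", 0.0))
    minutes, seconds = divmod(int(uptime), 60)
    summary = (
        f"status={health.get('status', 'unknown')} "
        f"model={health.get('model_version', 'none')} "
        f"uptime={minutes}m{seconds:02d}s"
    )
    return ready, summary
```

A caller could skip prediction requests while `ready` is false instead of waiting for them to fail.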

In [None]:
# Health Check
def get_health():
    """Get API health status."""
    # A timeout keeps the cell from hanging if the server stops responding.
    response = requests.get(f'{API_BASE_URL}/health', timeout=5)
    return response.json()

try:
    health = get_health()
    print("\nHealth Check Response:")
    print("-" * 40)
    print(f"  Status: {health.get('status', 'unknown')}")
    print(f"  Model Loaded: {health.get('model_loaded', False)}")
    print(f"  Model Version: {health.get('model_version', 'none')}")
    print(f"  Uptime: {health.get('uptime_seconds', 0):.1f} seconds")
except Exception as e:
    print(f"\nHealth check failed: {e}")

In [None]:
# Model Information
def get_model_info():
    """Get detailed model information."""
    response = requests.get(f'{API_BASE_URL}/info', timeout=5)
    if response.status_code == 200:
        return response.json()
    else:
        return {'error': response.json()}

try:
    info = get_model_info()
    if 'error' not in info:
        print("\nModel Information:")
        print("-" * 40)
        print(f"  Model Name: {info.get('model_name', 'unknown')}")
        print(f"  Version: {info.get('model_version', 'unknown')}")
        print(f"  Vocabulary Size: {info.get('vocab_size', 0)}")
        print(f"  Hidden Dimension: {info.get('d_model', 0)}")
        print(f"  Parameters: {info.get('num_parameters', 0):,}")
        print(f"\n  Supported Endpoints:")
        for endpoint in info.get('supported_endpoints', []):
            print(f"    • {endpoint}")
        print(f"\n  Generation Methods:")
        for method in info.get('supported_generation_methods', []):
            print(f"    • {method}")
    else:
        print(f"\nCould not get model info: {info.get('error')}")
except Exception as e:
    print(f"\nInfo request failed: {e}")

## 4. Single Prediction Requests

Make predictions on a single sensor data sample.
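
Checking the payload shape on the client gives clearer messages than a rejected request. The sketch below assumes that `sensor_data` holds two row lists, `continuous` and `categorical`, of equal length, with the widths used by this notebook's sample generator (155 and 4). Those widths are assumptions about the deployed model, and `validate_sensor_data` is a hypothetical helper.

```python
def validate_sensor_data(sensor_data, continuous_dim=155, categorical_dim=4):
    """Return a list of problems found in a sensor_data payload (empty if none).

    Rows are expected to be lists; the default widths are assumptions taken
    from the notebook's random sample generator.
    """
    problems = []
    for key, width in (("continuous", continuous_dim), ("categorical", categorical_dim)):
        rows = sensor_data.get(key)
        if not isinstance(rows, list) or not rows:
            problems.append(f"'{key}' must be a non-empty list of rows")
            continue
        bad = [i for i, row in enumerate(rows) if len(row) != width]
        if bad:
            # Report only the first few offending row indices.
            problems.append(f"'{key}' rows {bad[:3]} do not have width {width}")
    cont, cat = sensor_data.get("continuous"), sensor_data.get("categorical")
    if isinstance(cont, list) and isinstance(cat, list) and len(cont) != len(cat):
        problems.append(f"sequence lengths differ: {len(cont)} vs {len(cat)}")
    return problems
```

An empty list means the payload passed these checks; the server may still apply validation of its own.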

In [None]:
# Generate sample sensor data
def generate_sample_sensor_data(seq_len=64, continuous_dim=155, categorical_dim=4):
    """Generate random sensor data for testing."""
    return {
        'continuous': np.random.randn(seq_len, continuous_dim).tolist(),
        'categorical': np.random.randint(0, 10, (seq_len, categorical_dim)).tolist()
    }

# Create sample data
sample_data = generate_sample_sensor_data()

print("Sample Sensor Data:")
print(f"  Continuous shape: [{len(sample_data['continuous'])}, {len(sample_data['continuous'][0])}]")
print(f"  Categorical shape: [{len(sample_data['categorical'])}, {len(sample_data['categorical'][0])}]")

In [None]:
# Make prediction request
def predict_gcode(sensor_data, return_fingerprint=False, inference_config=None):
    """Send prediction request to API."""
    payload = {
        'sensor_data': sensor_data,
        'return_fingerprint': return_fingerprint,
    }
    if inference_config:
        payload['inference_config'] = inference_config
    
    response = requests.post(
        f'{API_BASE_URL}/predict',
        json=payload,
        timeout=30  # fail instead of blocking indefinitely on a stalled server
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        return {'error': response.json(), 'status_code': response.status_code}

# Make prediction
try:
    start_time = time.time()
    result = predict_gcode(sample_data, return_fingerprint=True)
    client_time = (time.time() - start_time) * 1000
    
    if 'error' not in result:
        print("\nPrediction Result:")
        print("="*50)
        print(f"  Model Version: {result.get('model_version', 'unknown')}")
        print(f"  Server Inference Time: {result.get('inference_time_ms', 0):.1f} ms")
        print(f"  Client Round-trip Time: {client_time:.1f} ms")
        
        gcode = result.get('gcode_sequence', [])
        print(f"\n  G-code Sequence ({len(gcode)} tokens):")
        print(f"    {' '.join(gcode[:15])}{'...' if len(gcode) > 15 else ''}")
        
        if result.get('fingerprint'):
            fp = result['fingerprint']
            print(f"\n  Fingerprint:")
            print(f"    Dimension: {len(fp)}")
            print(f"    Norm: {np.linalg.norm(fp):.4f}")
            print(f"    First 5 values: {fp[:5]}")
    else:
        print(f"\nPrediction failed: {result.get('error')}")
except Exception as e:
    print(f"\nRequest failed: {e}")

In [None]:
# Test different generation methods
generation_methods = ['greedy', 'sampling', 'top_k', 'top_p']

print("\nTesting Generation Methods:")
print("="*50)

for method in generation_methods:
    try:
        config = {
            'method': method,
            'temperature': 0.8,
            'max_length': 32
        }
        
        result = predict_gcode(sample_data, inference_config=config)
        
        if 'error' not in result:
            gcode = result.get('gcode_sequence', [])
            print(f"\n  {method.upper()}:")
            print(f"    Time: {result.get('inference_time_ms', 0):.1f} ms")
            print(f"    Tokens: {' '.join(gcode[:10])}...")
        else:
            print(f"\n  {method.upper()}: Failed")
    except Exception as e:
        print(f"\n  {method.upper()}: Error - {e}")

## 5. Batch Predictions

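Process multiple samples in a single request to amortize per-request overhead such as HTTP round-trips and JSON parsing. The server accepts up to 32 samples per batch.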
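
Since `/batch_predict` is documented as handling up to 32 samples, a larger dataset has to be split before sending. The sketch below shows one way to do that; `chunk_samples` is a hypothetical helper, and the default of 32 comes from the endpoint summary in the Overview.

```python
def chunk_samples(samples, max_batch=32):
    """Yield consecutive slices of at most max_batch samples."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(samples), max_batch):
        yield samples[start:start + max_batch]
```

Each yielded chunk can then be sent as one `sensor_data_batch` payload.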

In [None]:
# Batch prediction function
def batch_predict(sensor_data_list, return_fingerprint=False):
    """Send batch prediction request."""
    payload = {
        'sensor_data_batch': sensor_data_list,
        'return_fingerprint': return_fingerprint,
    }
    
    response = requests.post(
        f'{API_BASE_URL}/batch_predict',
        json=payload
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        return {'error': response.json(), 'status_code': response.status_code}

# Generate batch of samples
batch_sizes = [1, 4, 8, 16]

print("\nBatch Prediction Benchmarks:")
print("="*50)
print(f"{'Batch Size':<12} {'Server (ms)':<14} {'Round-trip (ms)':<18} {'Per Sample (ms)':<15}")
print("-" * 62)

for batch_size in batch_sizes:
    try:
        batch_data = [generate_sample_sensor_data() for _ in range(batch_size)]
        
        start_time = time.time()
        result = batch_predict(batch_data)
        total_time = (time.time() - start_time) * 1000
        
        if 'error' not in result:
            server_time = result.get('total_inference_time_ms', 0)
            # Per-sample cost is the client round-trip time divided by batch size.
            per_sample = total_time / batch_size
            print(f"{batch_size:<12} {server_time:<14.1f} {total_time:<18.1f} {per_sample:<15.1f}")
        else:
            print(f"{batch_size:<12} {'Error':<14} {'-':<18} {'-':<15}")
    except Exception as e:
        print(f"{batch_size:<12} {'Failed':<14} {str(e)[:15]}")

## 6. Fingerprint Extraction

Extract machine fingerprints (embeddings) from sensor data.
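
Fingerprints are typically compared by cosine similarity, the dot product of two vectors divided by the product of their norms. The sketch below is a dependency-free version for two vectors; `cosine_similarity` is a hypothetical helper, not an API of the server. Values range from -1 (opposite direction) to 1 (same direction).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Guarding against zero vectors avoids a silent division by zero when an embedding is degenerate.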

In [None]:
# Fingerprint extraction function
def extract_fingerprint(sensor_data):
    """Extract machine fingerprint from sensor data."""
    payload = {'sensor_data': sensor_data}
    
    response = requests.post(
        f'{API_BASE_URL}/fingerprint',
        json=payload
    )
    
    if response.status_code == 200:
        return response.json()
    else:
        return {'error': response.json()}

# Extract fingerprints from multiple samples
print("\nFingerprint Extraction:")
print("="*50)

fingerprints = []
n_samples = 5

for i in range(n_samples):
    try:
        sample = generate_sample_sensor_data()
        result = extract_fingerprint(sample)
        
        if 'error' not in result:
            fp = np.array(result['fingerprint'])
            fingerprints.append(fp)
            print(f"  Sample {i+1}: dim={result['embedding_dim']}, norm={result['norm']:.4f}")
        else:
            print(f"  Sample {i+1}: Error")
    except Exception as e:
        print(f"  Sample {i+1}: Failed - {e}")

In [None]:
# Compute pairwise similarities
if len(fingerprints) >= 2:
    import matplotlib.pyplot as plt
    
    # Normalize fingerprints
    fps_normalized = [fp / np.linalg.norm(fp) for fp in fingerprints]
    
    # Compute cosine similarity matrix
    n = len(fps_normalized)
    similarity_matrix = np.zeros((n, n))
    
    for i in range(n):
        for j in range(n):
            similarity_matrix[i, j] = np.dot(fps_normalized[i], fps_normalized[j])
    
    print("\nCosine Similarity Matrix:")
    print("-" * 40)
    print("       ", end="")
    for i in range(n):
        print(f"  S{i+1}   ", end="")
    print()
    
    for i in range(n):
        print(f"  S{i+1} ", end="")
        for j in range(n):
            print(f" {similarity_matrix[i,j]:5.3f} ", end="")
        print()
    
    # Visualize
    fig, ax = plt.subplots(figsize=(6, 5))
    # Cosine similarity ranges over [-1, 1]; a symmetric scale centers the diverging colormap at 0.
    im = ax.imshow(similarity_matrix, cmap='coolwarm', vmin=-1, vmax=1)
    ax.set_xticks(range(n))
    ax.set_yticks(range(n))
    ax.set_xticklabels([f'S{i+1}' for i in range(n)])
    ax.set_yticklabels([f'S{i+1}' for i in range(n)])
    ax.set_title('Fingerprint Cosine Similarity')
    plt.colorbar(im)
    plt.tight_layout()
    plt.show()

## 7. Dynamic Checkpoint Loading

Load different model checkpoints without restarting the server.
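
A mistyped path is the most likely cause of a failed load, so it can help to resolve and check the path before sending the request. The sketch below is only meaningful under the assumption that client and server share a filesystem (for example, both run from the project directory on `localhost`); `resolve_checkpoint` is a hypothetical helper.

```python
from pathlib import Path


def resolve_checkpoint(path, root="."):
    """Resolve a checkpoint path against root and confirm the file exists."""
    candidate = Path(path)
    if not candidate.is_absolute():
        candidate = Path(root) / candidate
    candidate = candidate.resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"checkpoint not found: {candidate}")
    # Return a string so the value can go straight into a JSON payload.
    return str(candidate)
```

The returned absolute path can be used as the `checkpoint_path` value in the request payload.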

In [None]:
# List available checkpoints
import glob

checkpoint_patterns = [
    'outputs/*/checkpoint_best.pt',
    'outputs/final_model/checkpoint_best.pt',
]

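# The wildcard pattern also matches outputs/final_model, so collect into a set
# to avoid listing the same checkpoint twice.
available_checkpoints = set()
for pattern in checkpoint_patterns:
    available_checkpoints.update(glob.glob(str(project_root / pattern)))
available_checkpoints = sorted(available_checkpoints)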

print("\nAvailable Checkpoints:")
print("-" * 50)
for i, cp in enumerate(available_checkpoints[:5]):
    rel_path = Path(cp).relative_to(project_root)
    size_mb = Path(cp).stat().st_size / (1024 * 1024)
    print(f"  {i+1}. {rel_path} ({size_mb:.1f} MB)")

In [None]:
# Load checkpoint function
def load_checkpoint(checkpoint_path, vocab_path=None, device='cpu'):
    """Load a new checkpoint on the server."""
    payload = {
        'checkpoint_path': checkpoint_path,
        'vocab_path': vocab_path,
        'device': device
    }
    
    response = requests.post(
        f'{API_BASE_URL}/load_checkpoint',
        json=payload
    )
    
    return response.json()

# Example: Load a specific checkpoint
# Uncomment to test
# if available_checkpoints:
#     result = load_checkpoint(
#         checkpoint_path=str(available_checkpoints[0]),
#         vocab_path='data/gcode_vocab_v2.json',
#         device='cpu'
#     )
#     print(f"Load result: {result}")

print("\nTo load a checkpoint, use:")
print("  result = load_checkpoint('outputs/model/checkpoint_best.pt')")

## 8. Async Requests with aiohttp

For high-throughput applications, issue requests concurrently with an async client instead of waiting for each response in turn.
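
Launching every request at once can overload a server when the request count grows. A common remedy is to cap the number of in-flight requests with a semaphore. The sketch below is independent of any HTTP library: `run_bounded` and its `worker` argument are hypothetical names, and `worker` stands in for any coroutine function that sends one request.

```python
import asyncio


async def run_bounded(worker, items, limit=8):
    """Run worker(item) for every item with at most `limit` running at once.

    Results come back in input order; exceptions are returned in place of
    results rather than raised.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        # Only `limit` coroutines can hold the semaphore at the same time.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items),
                                return_exceptions=True)
```

In a Jupyter cell this can be awaited directly (`results = await run_bounded(worker, samples)`); in a plain script, wrap the call in `asyncio.run(...)`.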

In [None]:
# Async client example
import asyncio

try:
    import aiohttp
    AIOHTTP_AVAILABLE = True
except ImportError:
    AIOHTTP_AVAILABLE = False
    print("aiohttp not installed. Install with: pip install aiohttp")

if AIOHTTP_AVAILABLE:
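    async def async_predict(session, sensor_data):
        """Async prediction request."""
        async with session.post(
            f'{API_BASE_URL}/predict',
            json={'sensor_data': sensor_data}
        ) as response:
            data = await response.json()
            # Mirror predict_gcode: wrap non-200 responses so callers can detect them.
            if response.status != 200:
                return {'error': data, 'status_code': response.status}
            return data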
    
    async def run_concurrent_predictions(n_requests=10):
        """Run multiple predictions concurrently."""
        async with aiohttp.ClientSession() as session:
            # Generate sample data
            samples = [generate_sample_sensor_data() for _ in range(n_requests)]
            
            # Create tasks
            start = time.time()
            tasks = [async_predict(session, sample) for sample in samples]
            results = await asyncio.gather(*tasks, return_exceptions=True)
            total_time = time.time() - start
            
            return results, total_time
    
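    # Run concurrent predictions
    # Note: Jupyter already runs an event loop, so run_until_complete() below
    # needs nest_asyncio there. Alternatively, use
    # `await run_concurrent_predictions(n)` directly in a cell.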
    try:
        import nest_asyncio
        nest_asyncio.apply()
    except ImportError:
        pass
    
    print("\nAsync Concurrent Predictions:")
    print("="*50)
    
    for n_requests in [5, 10, 20]:
        try:
            results, total_time = asyncio.get_event_loop().run_until_complete(
                run_concurrent_predictions(n_requests)
            )
            successful = sum(1 for r in results if not isinstance(r, Exception) and 'error' not in r)
            print(f"  {n_requests} requests: {total_time:.2f}s total, {successful}/{n_requests} successful")
        except Exception as e:
            print(f"  {n_requests} requests: Failed - {e}")

## 9. Error Handling

Handle common API errors gracefully.
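
Transient failures such as a refused connection or a timeout are often worth retrying, while a client error (4xx) will fail the same way every time. The sketch below retries with exponential backoff under one assumption: the call returns a dict shaped like `{'success': bool, 'error': ..., 'status_code': ...}`, where `status_code` is present only for HTTP errors. `call_with_retries` is a hypothetical helper.

```python
import time


def call_with_retries(call, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Retry call() while it reports a transient failure.

    A failure counts as transient when it has no 'status_code' (connection
    error or timeout) or a 5xx status; 4xx responses are returned immediately.
    """
    result = {"success": False, "error": "call was never attempted"}
    for attempt in range(max_attempts):
        result = call()
        if result.get("success"):
            return result
        status = result.get("status_code")
        if status is not None and status < 500:
            return result  # client error: retrying will not help
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # delay doubles after each failure
    return result
```

The `sleep` parameter is injectable so the backoff schedule can be checked without waiting.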

In [None]:
# API client with error handling
class GCodeAPIClient:
    """API client with proper error handling."""
    
    def __init__(self, base_url='http://localhost:8000', timeout=30):
        self.base_url = base_url
        self.timeout = timeout
    
    def _request(self, method, endpoint, **kwargs):
        """Make HTTP request with error handling."""
        url = f"{self.base_url}{endpoint}"
        kwargs['timeout'] = self.timeout
        
        try:
            if method == 'GET':
                response = requests.get(url, **kwargs)
            elif method == 'POST':
                response = requests.post(url, **kwargs)
            else:
                raise ValueError(f"Unsupported method: {method}")
            
            # Check for HTTP errors
            response.raise_for_status()
            return {'success': True, 'data': response.json()}
            
        except requests.exceptions.ConnectionError:
            return {'success': False, 'error': 'Connection failed. Is the server running?'}
        except requests.exceptions.Timeout:
            return {'success': False, 'error': f'Request timed out after {self.timeout}s'}
        except requests.exceptions.HTTPError as e:
            return {'success': False, 'error': f'HTTP error: {e}', 'status_code': response.status_code}
        except Exception as e:
            return {'success': False, 'error': str(e)}
    
    def health(self):
        return self._request('GET', '/health')
    
    def predict(self, sensor_data, **kwargs):
        payload = {'sensor_data': sensor_data, **kwargs}
        return self._request('POST', '/predict', json=payload)
    
    def fingerprint(self, sensor_data):
        payload = {'sensor_data': sensor_data}
        return self._request('POST', '/fingerprint', json=payload)

# Test error handling
print("\nError Handling Examples:")
print("="*50)

client = GCodeAPIClient()

# Test health check
result = client.health()
print(f"\n  Health check: {'Success' if result['success'] else result['error']}")

# Test with valid data
result = client.predict(generate_sample_sensor_data())
print(f"  Valid prediction: {'Success' if result['success'] else result['error']}")

# Test with invalid data (wrong shape)
invalid_data = {'continuous': [[1, 2, 3]], 'categorical': [[1, 2]]}  # Wrong dimensions
result = client.predict(invalid_data)
print(f"  Invalid data: {'Success' if result['success'] else 'Handled error'}")

## 10. Performance Benchmarking

Measure client round-trip latency and server-reported inference time over repeated, sequential `/predict` requests.
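
Tail percentiles depend heavily on sample count. The sketch below computes a percentile by linear interpolation between sorted samples, which is the rule `np.percentile` applies by default; `percentile` is a hypothetical standalone helper written with the standard library only. With 20 samples, P99 falls between the two slowest requests, so it says little more than the maximum does.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted samples."""
    if not values:
        raise ValueError("percentile of empty data")
    if not 0 <= q <= 100:
        raise ValueError("q must be in [0, 100]")
    ordered = sorted(values)
    # Fractional index into the sorted data: 0 maps to the min, 100 to the max.
    rank = (len(ordered) - 1) * q / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])
```

Raising the iteration count is the simplest way to make P95 and P99 informative.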

In [None]:
# Comprehensive benchmark
def benchmark_api(n_iterations=20):
    """Run performance benchmark."""
    results = {
        'latencies': [],
        'server_times': [],
        'successful': 0,
        'failed': 0
    }
    
    for i in range(n_iterations):
        sample = generate_sample_sensor_data()
        
        start = time.perf_counter()  # monotonic, high-resolution clock for intervals
        try:
            response = requests.post(
                f'{API_BASE_URL}/predict',
                json={'sensor_data': sample},
                timeout=30
            )
            latency = (time.perf_counter() - start) * 1000
            
            if response.status_code == 200:
                data = response.json()
                results['latencies'].append(latency)
                results['server_times'].append(data.get('inference_time_ms', 0))
                results['successful'] += 1
            else:
                results['failed'] += 1
        except Exception:
            results['failed'] += 1
    
    return results

# Run benchmark
print("\nRunning Performance Benchmark (20 iterations)...")
print("="*50)

try:
    bench_results = benchmark_api(20)
    
    if bench_results['latencies']:
        latencies = np.array(bench_results['latencies'])
        server_times = np.array(bench_results['server_times'])
        
        print(f"\n  Requests: {bench_results['successful']} successful, {bench_results['failed']} failed")
        print(f"\n  Client Latency (ms):")
        print(f"    Mean: {latencies.mean():.1f}")
        print(f"    Std:  {latencies.std():.1f}")
        print(f"    P50:  {np.percentile(latencies, 50):.1f}")
        print(f"    P95:  {np.percentile(latencies, 95):.1f}")
        print(f"    P99:  {np.percentile(latencies, 99):.1f}")
        
        print(f"\n  Server Inference Time (ms):")
        print(f"    Mean: {server_times.mean():.1f}")
        print(f"    Std:  {server_times.std():.1f}")
        
        # Round-trip minus inference covers network, JSON (de)serialization, and HTTP handling.
        network_overhead = latencies.mean() - server_times.mean()
        print(f"\n  Non-inference Overhead: {network_overhead:.1f} ms")
    else:
        print("  No successful requests")
except Exception as e:
    print(f"  Benchmark failed: {e}")

## Summary

In this notebook, you learned:

- **Starting the API**: FastAPI server with uvicorn
- **Health checks**: Monitor server and model status
- **Predictions**: Single and batch prediction requests
- **Fingerprints**: Extract machine embeddings
- **Dynamic loading**: Change models without restart
- **Async requests**: High-throughput with aiohttp
- **Error handling**: Robust API client patterns
- **Benchmarking**: Measure performance metrics

### Quick Reference

```python
# Health check
requests.get('http://localhost:8000/health')

# Single prediction
requests.post('http://localhost:8000/predict', json={'sensor_data': data})

# Batch prediction
requests.post('http://localhost:8000/batch_predict', json={'sensor_data_batch': [data1, data2]})

# Fingerprint
requests.post('http://localhost:8000/fingerprint', json={'sensor_data': data})
```

---

**Navigation:**
← [Previous: 04_inference_prediction](04_inference_prediction.ipynb) |
[Next: 06_dashboard_usage](06_dashboard_usage.ipynb) →

**Related:** [03_training_models](03_training_models.ipynb) | [08_model_evaluation](08_model_evaluation.ipynb)