## Module 10 -- Deployment -- Recap 

#### Before you think about deploying: several critical steps

You got the broad strokes, but a production pipeline needs to be exact. Miss one transformation and you get silent garbage predictions: no errors, just wrong numbers.
> List the inference transformation steps in the correct order that raw input data must go through before hitting `model.predict()`.

Let me give you a hint. A customer places an order right now, and we receive this JSON:

```json
{
  "created_at": "2025-02-20 19:30:00",
  "market_id": 3,
  "store_primary_category": "italian",
  "order_protocol": 1,
  "total_items": 3,
  "subtotal": 2500,
  "num_distinct_items": 2,
  "min_item_price": 600,
  "max_item_price": 1200,
  "total_onshift_partners": 12,
  "total_busy_partners": 8,
  "total_outstanding_orders": 15
}
```

> In other words: what happens to a single raw order JSON at prediction time? Give the sequence of data transformations.

Here's the complete correct order:

```
1. Parse created_at → extract hour, day_of_week, month
2. Engineer is_peak from hour and is_weekend from day_of_week
3. Engineer time_period → encode using TRAINING time_period means
4. Engineer demand_supply_ratio = total_outstanding_orders / (total_onshift_partners + 1)
5. Engineer price_range = max_item_price - min_item_price
6. Engineer avg_item_price = subtotal / total_items
7. Encode store_primary_category → training means (or global mean if unseen)
8. One-hot encode market_id and order_protocol (drop_first=True)
9. Drop store_id, created_at, raw datetime cols
10. Apply log1p to: total_items, subtotal, min_item_price, 
                    max_item_price, total_outstanding_orders
11. scaler.transform() — using saved scaler.pkl
12. model.predict()
13. np.expm1() → real minutes
```
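
Steps 3 and 7 both rely on lookups learned at training time, never recomputed from the incoming order. Here is a minimal sketch of that idea, assuming the encodings are per-category means of the training target with a global-mean fallback for unseen categories. The helper names `fit_mean_encoding` and `encode_category` and the toy numbers are made up for illustration; they are not taken from the training notebook.

```python
from collections import defaultdict
from statistics import mean


def fit_mean_encoding(categories, targets):
    """Learn per-category target means and the global mean from TRAINING data only."""
    grouped = defaultdict(list)
    for category, target in zip(categories, targets):
        grouped[category].append(target)
    category_means = {cat: mean(values) for cat, values in grouped.items()}
    return category_means, mean(targets)


def encode_category(category, category_means, global_mean):
    """Look up the training mean; fall back to the global mean if unseen."""
    return category_means.get(category, global_mean)


# Toy training data (made-up log-minutes, purely illustrative)
train_categories = ["italian", "italian", "thai", "thai", "burger"]
train_targets = [3.8, 4.0, 3.6, 3.4, 3.7]

category_means, global_mean = fit_mean_encoding(train_categories, train_targets)

seen = encode_category("italian", category_means, global_mean)
unseen = encode_category("sushi", category_means, global_mean)
print(f"italian -> {seen:.2f}, sushi (unseen) -> {unseen:.2f}")
```

At inference time only `encode_category` runs, against the dictionaries saved during training. Refitting the means on live traffic would quietly change what the model's inputs mean.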

### Build the Inference Pipeline

```python
import joblib
import numpy as np
import pandas as pd


def preprocess_input(raw_input: dict) -> np.ndarray:
    """
    Transform a raw order through the exact same pipeline as training.
    Input:  raw order dictionary
    Output: scaled numpy array ready for model.predict()
    """
    # Load the artifacts saved at training time
    scaler = joblib.load('porter_model/scaler.pkl')
    category_map = joblib.load('porter_model/category_encoding.pkl')
    time_period_map = joblib.load('porter_model/time_period_encoding.pkl')
    global_mean = joblib.load('porter_model/global_mean.pkl')
    feature_cols = joblib.load('porter_model/feature_columns.pkl')

    # Work on a copy so the caller's dict is not mutated
    data = raw_input.copy()

    # ── Step 1: Parse datetime ─────────────────────────────────────────
    created_at = pd.to_datetime(data['created_at'])
    data['hour'] = created_at.hour
    data['day_of_week'] = created_at.dayofweek
    data['month'] = created_at.month

    # ── Step 2: Time features ──────────────────────────────────────────
    def get_time_period(hour):
        if 6 <= hour <= 9: return 'breakfast'
        elif 11 <= hour <= 14: return 'lunch'
        elif 17 <= hour <= 21: return 'dinner'
        elif hour in [22, 23, 0, 1]: return 'late_night'
        else: return 'off_peak'

    data['is_peak'] = 1 if data['hour'] in [
        11, 12, 13, 14, 15, 19, 20, 21] else 0
    time_period = get_time_period(data['hour'])
    # Unseen time periods fall back to the global training mean
    data['time_period_encoded'] = time_period_map.get(
        time_period, global_mean)
    data['is_weekend'] = 1 if data['day_of_week'] >= 5 else 0

    # ── Step 3: Category encoding ──────────────────────────────────────
    category = data.get('store_primary_category', 'unknown')
    data['category_encoded'] = category_map.get(
        category, global_mean)

    # ── Step 4: Supply demand features ────────────────────────────────
    # Ratios use the raw values, so they must run before the log1p step
    data['demand_supply_ratio'] = (
        data['total_outstanding_orders'] /
        (data['total_onshift_partners'] + 1)
    )
    data['price_range'] = (
        data['max_item_price'] - data['min_item_price'])
    data['avg_item_price'] = (
        data['subtotal'] / data['total_items'])

    # ── Step 5: Log1p transformations ─────────────────────────────────
    for col in ['total_items', 'subtotal', 'min_item_price',
                'max_item_price', 'total_outstanding_orders']:
        data[col] = np.log1p(max(data[col], 0))

    # ── Step 6: One-hot encode market_id and order_protocol ───────────
    # Level 1 has no column: it is the baseline dropped by drop_first=True
    for i in [2, 3, 4, 5, 6]:
        data[f'market_id_{i}.0'] = 1 if data['market_id'] == i else 0
    for i in [2, 3, 4, 5, 6, 7]:
        data[f'order_protocol_{i}.0'] = (
            1 if data['order_protocol'] == i else 0)

    # ── Step 7: Build dataframe in exact training column order ─────────
    # Selecting feature_cols also drops created_at and any other raw columns
    df_input = pd.DataFrame([data])
    df_input = df_input[feature_cols]

    # ── Step 8: Scale ──────────────────────────────────────────────────
    df_scaled = scaler.transform(df_input)

    return df_scaled
```
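
One thing the hand-rolled one-hot loop does not catch: a `market_id` that never appeared in training produces all zeros, which looks exactly like the dropped baseline level. Below is a minimal sketch of a guard. It assumes the training levels for `market_id` were 1 to 6 with 1 as the dropped baseline (inferred from the `drop_first=True` columns 2 to 6, not confirmed by the training notebook), and `one_hot_drop_first` is a hypothetical helper, not part of pandas.

```python
def one_hot_drop_first(value, levels, prefix):
    """One-hot encode `value` against the training levels, skipping the first
    level (the baseline), and fail loudly on a level never seen in training."""
    if value not in levels:
        raise ValueError(f"{prefix}={value!r} was not seen in training")
    # Column names mirror the float-style names used above, e.g. market_id_3.0
    return {f"{prefix}_{level}.0": int(value == level) for level in levels[1:]}


# Assumed training levels; level 1 is the dropped baseline
MARKET_LEVELS = [1, 2, 3, 4, 5, 6]

market_3 = one_hot_drop_first(3, MARKET_LEVELS, "market_id")
baseline = one_hot_drop_first(1, MARKET_LEVELS, "market_id")

try:
    one_hot_drop_first(9, MARKET_LEVELS, "market_id")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True

print(market_3)
print(baseline)
print("unknown market rejected:", unknown_rejected)
```

Whether to raise or to fall back to the baseline is a product decision. The point is to make the choice explicit instead of letting an unknown id silently pass as market 1.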

```python
# Assumption: TensorFlow-bundled Keras; with standalone Keras 3 use `import keras`
from tensorflow import keras


def predict_delivery_time(raw_input: dict) -> dict:
    """
    End-to-end prediction function.
    Returns the prediction in real minutes plus a fixed +/-10 minute range
    (a heuristic band, not a statistical confidence interval).
    """
    # The model is reloaded on every call here; a service should load it once
    model = keras.models.load_model(
        'porter_model/best_model_v2.keras')

    # Preprocess
    X = preprocess_input(raw_input)

    # Predict in log space (the target was log1p-transformed in training)
    y_log = model.predict(X, verbose=0).flatten()[0]

    # Inverse transform to real minutes
    y_minutes = float(np.expm1(y_log))

    # Clamp to realistic range
    y_minutes = max(5.0, min(120.0, y_minutes))

    return {
        'predicted_delivery_minutes': round(y_minutes, 1),
        'predicted_delivery_range': {
            'optimistic': round(max(5.0, y_minutes - 10), 1),
            'pessimistic': round(min(120.0, y_minutes + 10), 1)
        }
    }
```

```python
# ── Test the pipeline ──────────────────────────────────────────────────
test_order = {
    "created_at": "2015-02-15 19:30:00",
    "market_id": 3,
    "store_primary_category": "italian",
    "order_protocol": 1,
    "total_items": 3,
    "subtotal": 2500,
    "num_distinct_items": 2,
    "min_item_price": 600,
    "max_item_price": 1200,
    "total_onshift_partners": 12,
    "total_busy_partners": 8,
    "total_outstanding_orders": 15
}

result = predict_delivery_time(test_order)
print("Test prediction:")
print(f"  Predicted: {result['predicted_delivery_minutes']} minutes")
print(f"  Range: {result['predicted_delivery_range']['optimistic']}"
      f" - {result['predicted_delivery_range']['pessimistic']} minutes")
```

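Running this in a fresh kernel without executing the imports first fails with `NameError: name 'np' is not defined`: the `np.ndarray` return annotation on `preprocess_input` is evaluated when the function is defined, so `import numpy as np` has to run before that cell.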
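
A `NameError` is the friendly kind of failure, because it stops you. The dangerous kind is the one from the top of this recap: a skipped transformation that raises nothing. Below is a minimal sketch of that, assuming the saved scaler standardizes each feature as (x - mean) / std on log-transformed training values. The training subtotals are made up; the real statistics live in `scaler.pkl`.

```python
import math
from statistics import mean, pstdev

# Made-up training subtotals, log1p-transformed before fitting the scaler
train_subtotals = [1200, 2500, 4000, 800]
train_logged = [math.log1p(value) for value in train_subtotals]
mu, sigma = mean(train_logged), pstdev(train_logged)


def standardize(value):
    """Mimic a standard scaler for one feature: (x - mean) / std."""
    return (value - mu) / sigma


raw_subtotal = 2500
z_correct = standardize(math.log1p(raw_subtotal))  # log1p applied, as in training
z_skipped = standardize(raw_subtotal)              # log1p forgotten: still no exception

print(f"with log1p:    {z_correct:.2f}")
print(f"without log1p: {z_skipped:.0f}")
```

Both calls return a float and the model will happily accept either one. Only the first is on the scale the model was trained on.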