# Fall Detection Inference Pipeline - Model Deployment

## Overview
This notebook converts trained ML models into formats optimized for mobile and edge devices. It prepares models for real-world deployment where computational resources are limited.

## Deployment Targets
1. **Mobile Devices** (Android/iOS) - TFLite format
2. **Cross-Platform** - ONNX format
3. **Qualcomm Devices** - Optimized for Snapdragon processors

## Optimization Techniques
- **Quantization**: Reduce model size by up to 75% (32-bit → 8-bit weights)
- **Hardware Acceleration**: Leverage NPU/GPU on mobile devices
- **Model Compression**: Remove unnecessary operations
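
The 75% figure follows directly from the bit widths. A minimal sketch of the arithmetic, assuming a hypothetical parameter count and ignoring file-format overhead (the helper `weight_bytes` is illustrative, not part of the notebook):

```python
def weight_bytes(num_params: int, bits: int) -> int:
    """Approximate storage for num_params weights at the given bit width."""
    return num_params * bits // 8

num_params = 1_000  # hypothetical parameter count, for illustration only
fp32_bytes = weight_bytes(num_params, 32)
int8_bytes = weight_bytes(num_params, 8)
reduction_pct = (1 - int8_bytes / fp32_bytes) * 100

print(f"FP32: {fp32_bytes} B, INT8: {int8_bytes} B, reduction: {reduction_pct:.0f}%")
```

Real files shrink less than this when metadata or unquantized operators make up a large share of the model.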

## Why This Matters
- **Latency**: Falls must be detected quickly, so inference has to complete within milliseconds
- **Battery**: Models run continuously on wearables
- **Privacy**: On-device processing, no cloud dependency
- **Reliability**: Works offline without network connectivity

---
## Section 1: Import Required Libraries

### Purpose
Import specialized libraries for model conversion and deployment.

### Key Libraries
- **tf.lite**: Convert TensorFlow models to mobile-friendly format
- **skl2onnx**: Convert scikit-learn models to ONNX
- **onnxruntime**: Run ONNX models
- **qai_hub**: Deploy to Qualcomm Snapdragon devices
- **tf2onnx**: Convert TensorFlow/Keras to ONNX

### Installation
If missing, install with:
```bash
pip install tensorflow scikit-learn joblib onnx onnxruntime skl2onnx tf2onnx qai-hub
```

In [1]:
import tensorflow as tf
from tensorflow import keras
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import numpy as np
from sklearn.ensemble import IsolationForest
import onnxruntime as ort
import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType
import qai_hub as hub
from keras.saving import register_keras_serializable
from keras.losses import MeanSquaredError
import tf2onnx
import os
import time

print("✅ All libraries imported successfully!")
print(f"TensorFlow Version: {tf.__version__}")
print(f"ONNX Runtime Version: {ort.__version__}")
print(f"ONNX Version: {onnx.__version__}")



ModuleNotFoundError: No module named 'tf2onnx'

---
## Section 2: Load Pre-trained Autoencoder

### Purpose
Load the autoencoder model trained in the previous pipeline.

### Expected File
`fall_detection_model.h5` or `autoencoder.h5` from training pipeline

### Fallback Behavior
If no pre-trained model exists, creates a simple autoencoder for demonstration.

### Architecture Reminder
```
Input (N features)
  ↓
Dense(16) + ReLU    ← Encoder
  ↓
Dense(8) + ReLU     ← Bottleneck (compressed)
  ↓
Dense(16) + ReLU    ← Decoder
  ↓
Dense(N) + Sigmoid  ← Reconstruction
```
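
To make the shapes concrete, here is a minimal NumPy sketch of a forward pass through the layer sizes in the diagram. The weights are random stand-ins rather than trained parameters, `n_features = 10` is a hypothetical feature count, and using the per-sample reconstruction error as an anomaly score is an assumption about how the autoencoder is applied downstream.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 10  # hypothetical feature count
layer_sizes = [n_features, 16, 8, 16, n_features]

# Random weights stand in for trained parameters (illustration only)
weights = [rng.normal(0.0, 0.1, size=(a, b))
           for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(b) for b in layer_sizes[1:]]

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(x):
    """ReLU on hidden layers, sigmoid on the final reconstruction layer."""
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        z = h @ w + b
        h = sigmoid(z) if i == len(weights) - 1 else relu(z)
    return h

x = rng.random((4, n_features))          # batch of 4 samples scaled to [0, 1)
x_hat = reconstruct(x)
reconstruction_error = np.mean((x - x_hat) ** 2, axis=1)  # one MSE value per sample
print(x_hat.shape, reconstruction_error.shape)
```

Because the last layer is a sigmoid, reconstructions fall in (0, 1), so input features are expected to be scaled to that range.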

### Production Note
Always use the actual trained model, not the fallback!

In [None]:
print(f"\n{'='*70}")
print("LOADING AUTOENCODER MODEL")
print(f"{'='*70}\n")

# Try to load pre-trained model
model_paths = ["autoencoder.h5", "fall_detection_model.h5"]
autoencoder = None

for path in model_paths:
    if os.path.exists(path):
        try:
            autoencoder = keras.models.load_model(path)
            print(f"✅ Loaded existing autoencoder from: {path}")
            break
        except Exception as e:
            print(f"⚠️ Failed to load {path}: {e}")

# Fallback: Create simple autoencoder for demonstration
if autoencoder is None:
    print("\n🚨 No pre-trained model found. Creating demonstration autoencoder...")
    print("⚠️ WARNING: Use the actual trained model in production!\n")
    
    input_dim = 10  # Adjust based on your feature size
    input_layer = keras.layers.Input(shape=(input_dim,))
    encoded = keras.layers.Dense(8, activation="relu")(input_layer)
    decoded = keras.layers.Dense(input_dim, activation="sigmoid")(encoded)
    
    autoencoder = keras.models.Model(input_layer, decoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    
    # Save for consistency
    autoencoder.save("fall_detection_model.h5")
    print("✅ Demonstration autoencoder created and saved.")

# Display model information
print(f"\nModel Architecture:")
print(f"  Input shape: {autoencoder.input_shape}")
print(f"  Output shape: {autoencoder.output_shape}")
print(f"  Total parameters: {autoencoder.count_params():,}")

print(f"\nDetailed Architecture:")
autoencoder.summary()

# Calculate model size
if os.path.exists("fall_detection_model.h5"):
    size_mb = os.path.getsize("fall_detection_model.h5") / (1024 * 1024)
    print(f"\nModel file size: {size_mb:.2f} MB")

---
## Section 3: Convert to TensorFlow Lite (TFLite)

### What is TFLite?
TensorFlow Lite is a lightweight ML framework designed for mobile and embedded devices.

### Conversion Process
1. Takes Keras model
2. Optimizes graph (removes training-only ops)
3. Converts to FlatBuffer format
4. Produces `.tflite` file

### Benefits
- **Smaller size**: Typically 50-75% reduction
- **Faster inference**: Optimized for mobile CPUs
- **Hardware acceleration**: Can use GPU, DSP, or NPU
- **Cross-platform**: Works on Android, iOS, Raspberry Pi, etc.

### Trade-offs
- Limited op support (some layers unsupported)
- Slightly lower accuracy when quantization is enabled (usually minor)
- No training capability (inference only)

### Use Cases
- Real-time fall detection on smartphones
- Wearable devices (smartwatches)
- IoT sensors with edge computing

In [None]:
print(f"\n{'='*70}")
print("CONVERTING TO TENSORFLOW LITE")
print(f"{'='*70}\n")

# Initialize TFLite converter
converter = tf.lite.TFLiteConverter.from_keras_model(autoencoder)

# Optional: Enable optimizations (uncomment for further size reduction)
# converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Convert model
print("Converting model...")
tflite_model = converter.convert()

# Save TFLite model
tflite_path = "fall_detection_model.tflite"
with open(tflite_path, "wb") as f:
    f.write(tflite_model)

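print("✅ Autoencoder successfully converted to TFLite")

# Compare file sizes against whichever source .h5 file is present
h5_path = next(p for p in ("fall_detection_model.h5", "autoencoder.h5") if os.path.exists(p))
h5_size = os.path.getsize(h5_path) / 1024
tflite_size = len(tflite_model) / 1024
reduction = (1 - tflite_size / h5_size) * 100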

print(f"\nFile Size Comparison:")
print(f"  Original (.h5): {h5_size:.2f} KB")
print(f"  TFLite (.tflite): {tflite_size:.2f} KB")
print(f"  Size reduction: {reduction:.1f}%")

print(f"\nSaved as: {tflite_path}")

---
## Section 4: Train and Prepare Isolation Forest

### Purpose
Create an Isolation Forest model for ONNX conversion demonstration.

### Production Note
**Replace this synthetic training** with loading your actual pre-trained Isolation Forest:
```python
import joblib
model = joblib.load('isolation_forest.pkl')
```

### Isolation Forest Overview
- **Type**: Unsupervised anomaly detection
- **Method**: Builds random decision trees
- **Logic**: Anomalies are easier to isolate (fewer splits)
- **Speed**: Very fast training and inference
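
The "fewer splits" intuition can be shown with a toy example. This is a simplified one-dimensional sketch, not scikit-learn's implementation: `isolation_depth` and `mean_depth` are hypothetical helpers that count how many random splits it takes before a point sits alone in its partition.

```python
import numpy as np

def isolation_depth(point, data, rng, max_depth=50):
    """Count random splits until `point` is alone in its partition (1-D data)."""
    depth = 0
    subset = data
    while len(subset) > 1 and depth < max_depth:
        lo, hi = subset.min(), subset.max()
        if lo == hi:
            break  # remaining values are identical, cannot split further
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the point
        subset = subset[subset < split] if point < split else subset[subset >= split]
        depth += 1
    return depth

rng = np.random.default_rng(42)
inliers = rng.normal(0.0, 1.0, size=200)  # synthetic "normal" readings
outlier = 10.0                            # synthetic anomaly far from the cluster
data = np.append(inliers, outlier)

def mean_depth(point, trials=200):
    """Average isolation depth over many random trees."""
    return float(np.mean([isolation_depth(point, data, rng) for _ in range(trials)]))

outlier_depth = mean_depth(outlier)
inlier_depth = mean_depth(inliers[0])
print(f"outlier: {outlier_depth:.1f} splits, inlier: {inlier_depth:.1f} splits")
```

The outlier is separated after only a few splits on average, while a point inside the cluster needs many more. That gap is what the anomaly score is based on.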

### Parameters
- `contamination=0.05`: Expects 5% of data to be anomalies
- `max_features=10`: Uses 10 features for each tree
- `random_state=42`: Reproducibility

### ONNX Compatibility Fix
The `_max_features` attribute is required for ONNX conversion but sometimes missing. We add it manually.

In [None]:
print(f"\n{'='*70}")
print("PREPARING ISOLATION FOREST FOR ONNX CONVERSION")
print(f"{'='*70}\n")

# Check if pre-trained model exists
if os.path.exists('isolation_forest.pkl'):
    print("✅ Loading pre-trained Isolation Forest...")
    import joblib
    model = joblib.load('isolation_forest.pkl')
    print("✅ Pre-trained model loaded successfully")
else:
    print("⚠️ No pre-trained model found. Training demonstration model...")
    print("⚠️ WARNING: Use actual trained model in production!\n")
    
    # Generate synthetic training data
    # In production: use actual sensor data
    train_data = np.random.rand(100, 10).astype(np.float32)
    
    # Train Isolation Forest
    model = IsolationForest(
        contamination=0.05,    # 5% anomalies expected
        random_state=42,
        max_features=10,       # Feature count
        n_estimators=100,      # Number of trees
        n_jobs=-1              # Use all cores
    )
    
    print(f"Training on {train_data.shape[0]} samples...")
    model.fit(train_data)
    print("✅ Model trained")

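# Fix for ONNX conversion compatibility
if not hasattr(model, "_max_features"):
    # Fall back to the constructor value if no fitted attribute is available
    model._max_features = getattr(model, "max_features_", model.max_features)
    print("✅ Added _max_features attribute for ONNX compatibility")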

# Display model information
print(f"\nModel Configuration:")
print(f"  Number of estimators: {model.n_estimators}")
print(f"  Contamination: {model.contamination}")
print(f"  Max features: {model.max_features}")

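# Get feature count for later use.
# n_features_in_ is set by fit() on recent scikit-learn versions and reflects the
# real input width; max_features only controls how many features each tree samples.
num_features = int(getattr(model, "n_features_in_", 10))
print(f"  Feature dimensions: {num_features}")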

---
## Section 5: Convert Isolation Forest to ONNX

### What is ONNX?
Open Neural Network Exchange - an open format for representing ML models.

### Why ONNX?
- **Interoperability**: Train in scikit-learn, deploy anywhere
- **Performance**: Optimized runtime for inference
- **Flexibility**: Works across frameworks (PyTorch, TensorFlow, scikit-learn)
- **Hardware support**: CPU, GPU, NPU, TPU

### Conversion Steps
1. Define input schema (shape and type)
2. Convert scikit-learn model to ONNX graph
3. Set target opset version (controls available operations)
4. Serialize to `.onnx` file
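
The input schema in step 1 constrains what callers may pass at inference time: float32 values in a 2-D array with a fixed feature count and any batch size. A small sketch of a helper that enforces this before calling the runtime (`as_model_input` is a hypothetical name, not part of skl2onnx or ONNX Runtime):

```python
import numpy as np

def as_model_input(sample, num_features):
    """Return a float32 array of shape (batch, num_features)."""
    arr = np.asarray(sample, dtype=np.float32)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # single sample becomes a batch of one
    if arr.ndim != 2 or arr.shape[1] != num_features:
        raise ValueError(f"expected shape (batch, {num_features}), got {arr.shape}")
    return arr

single = as_model_input([0.1] * 10, num_features=10)
batch = as_model_input(np.zeros((3, 10)), num_features=10)
print(single.shape, single.dtype, batch.shape)
```

Checking the shape up front gives a clearer error than the one the runtime raises on a mismatched tensor.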

### Opset Versions
- **Standard opset (15)**: Core operations
- **ML opset (3)**: ML-specific operations (trees, SVMs, etc.)

### File Format
ONNX uses Protocol Buffers for efficient serialization.

In [None]:
print(f"\n{'='*70}")
print("CONVERTING ISOLATION FOREST TO ONNX")
print(f"{'='*70}\n")

# Define input type for ONNX conversion
# [None, num_features] means variable batch size, fixed feature count
initial_type = [("input", FloatTensorType([None, num_features]))]

print(f"Input specification: FloatTensor[None, {num_features}]")
print("  - None: Variable batch size (can process 1 or more samples)")
print(f"  - {num_features}: Fixed number of features\n")

# Convert model to ONNX format
print("Converting scikit-learn model to ONNX...")
onnx_model = convert_sklearn(
    model, 
    initial_types=initial_type,
    target_opset={
        "": 15,        # Standard ONNX opset version 15
        "ai.onnx.ml": 3  # ML-specific opset version 3
    }
)

# Save ONNX model
onnx_path = "fall_detection_model.onnx"
with open(onnx_path, "wb") as f:
    f.write(onnx_model.SerializeToString())

print(f"✅ IsolationForest successfully converted to ONNX")

# Display ONNX model information
onnx_size = os.path.getsize(onnx_path) / 1024
print(f"\nONNX Model Information:")
print(f"  File size: {onnx_size:.2f} KB")
print(f"  Saved as: {onnx_path}")
print(f"  IR version: {onnx_model.ir_version}")
print(f"  Producer: {onnx_model.producer_name}")

# Verify model
try:
    onnx.checker.check_model(onnx_model)
    print("  ✅ Model validation passed")
except Exception as e:
    print(f"  ⚠️ Model validation warning: {e}")

---
## Section 6: Test TFLite Model Inference

### Purpose
Validate that the converted TFLite model works correctly.

### TFLite Inference Steps
1. **Load interpreter**: Reads the `.tflite` file
2. **Allocate tensors**: Reserves memory for inputs/outputs
3. **Set input**: Provide sample data
4. **Invoke**: Run inference
5. **Get output**: Retrieve predictions

### Tensor Details
- **Input tensor**: Shape, dtype, index
- **Output tensor**: Shape, dtype, index

### Why Test?
- Verify conversion didn't break the model
- Check output shapes match expectations
- Ensure dtype compatibility
- Measure inference latency
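
A single timed call is noisy, because the first invocation often includes one-time setup costs. One possible approach is to warm up first and report the median and 95th percentile over many runs. In this sketch, `benchmark` is a hypothetical helper and the lambda is a stand-in workload:

```python
import statistics
import time

def benchmark(fn, warmup=5, runs=50):
    """Return (median_ms, p95_ms) for repeated calls to fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    timings_ms.sort()
    p95_index = min(len(timings_ms) - 1, int(0.95 * len(timings_ms)))
    return statistics.median(timings_ms), timings_ms[p95_index]

# Stand-in workload; replace with the real inference call
median_ms, p95_ms = benchmark(lambda: sum(range(1000)))
print(f"median: {median_ms:.4f} ms, p95: {p95_ms:.4f} ms")
```

With the interpreter from this notebook, the call would look like `benchmark(interpreter.invoke)` once the input tensor has been set.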

### Mobile Integration
On Android/iOS, you'll use similar APIs:
- **Android**: TensorFlow Lite Java/Kotlin API
- **iOS**: TensorFlow Lite Swift API

In [None]:
print(f"\n{'='*70}")
print("TESTING TFLITE MODEL INFERENCE")
print(f"{'='*70}\n")

# Load TFLite model
interpreter = tf.lite.Interpreter(model_path="fall_detection_model.tflite")
interpreter.allocate_tensors()
print("✅ TFLite interpreter loaded and tensors allocated")

# Get input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print(f"\nInput Tensor Details:")
print(f"  Name: {input_details[0]['name']}")
print(f"  Shape: {input_details[0]['shape']}")
print(f"  Data type: {input_details[0]['dtype']}")
print(f"  Index: {input_details[0]['index']}")

print(f"\nOutput Tensor Details:")
print(f"  Name: {output_details[0]['name']}")
print(f"  Shape: {output_details[0]['shape']}")
print(f"  Data type: {output_details[0]['dtype']}")
print(f"  Index: {output_details[0]['index']}")

In [None]:
# Create sample input matching the model's expected shape
input_shape = input_details[0]['shape']
sample_data = np.random.rand(*input_shape).astype(np.float32)

print(f"\nRunning inference...")
print(f"  Input shape: {sample_data.shape}")
print(f"  Input sample (first 5 values): {sample_data[0][:5]}")

# Set input tensor
interpreter.set_tensor(input_details[0]['index'], sample_data)

# Run inference with timing (perf_counter is a monotonic, high-resolution clock)
start_time = time.perf_counter()
interpreter.invoke()
inference_time = (time.perf_counter() - start_time) * 1000  # Convert to ms

# Get output tensor
output = interpreter.get_tensor(output_details[0]['index'])

print(f"\n✅ TFLite Inference Successful!")
print(f"  Output shape: {output.shape}")
print(f"  Output sample (first 5 values): {output[0][:5]}")
print(f"  Inference time: {inference_time:.2f} ms")

# Performance analysis
if inference_time < 10:
    print(f"  ✅ Excellent latency for real-time fall detection")
elif inference_time < 50:
    print(f"  ✅ Good latency for most applications")
else:
    print(f"  ⚠️ High latency - consider quantization or pruning")

---
## Section 7: Test ONNX Model Inference

### Purpose
Validate the ONNX model works correctly with ONNX Runtime.

### ONNX Runtime
- High-performance inference engine
- Cross-platform (Windows, Linux, macOS, Android, iOS)
- Hardware accelerators (CPU, CUDA, TensorRT, DirectML)
- Production-grade with Microsoft backing

### Inference Process
1. Create InferenceSession
2. Get input/output specifications
3. Prepare input data
4. Run inference
5. Process outputs

### Isolation Forest Output
- **Labels**: 1 (normal) or -1 (anomaly)
- **Scores**: Anomaly scores (lower = more anomalous)
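
Converted models may return labels with shape `(batch, 1)` rather than `(batch,)`, so it helps to flatten before interpreting them. A minimal sketch with made-up labels (`is_anomaly` is a hypothetical helper):

```python
import numpy as np

def is_anomaly(labels):
    """Map Isolation Forest labels (1 = normal, -1 = anomaly) to booleans."""
    return np.ravel(np.asarray(labels)) == -1

labels = np.array([[1], [-1], [1]])  # made-up labels shaped like a batch of three
flags = is_anomaly(labels)
print(flags.tolist())
```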

### Use Cases
- Web applications (via ONNX Runtime Web, the successor to ONNX.js)
- Server-side inference
- Embedded Linux devices

In [None]:
print(f"\n{'='*70}")
print("TESTING ONNX MODEL INFERENCE")
print(f"{'='*70}\n")

# Load ONNX model
onnx_model_path = "fall_detection_model.onnx"
session = ort.InferenceSession(onnx_model_path)
print("✅ ONNX Runtime session created")

# Get model metadata
print(f"\nModel Metadata:")
print(f"  Producer: {session.get_modelmeta().producer_name}")
print(f"  Version: {session.get_modelmeta().version}")

# Get input specifications
input_info = session.get_inputs()[0]
input_name = input_info.name
input_shape = input_info.shape
input_type = input_info.type

print(f"\nInput Specifications:")
print(f"  Name: {input_name}")
print(f"  Shape: {input_shape}")
print(f"  Type: {input_type}")

# Get output specifications
print(f"\nOutput Specifications:")
for i, output_info in enumerate(session.get_outputs()):
    print(f"  Output {i}:")
    print(f"    Name: {output_info.name}")
    print(f"    Shape: {output_info.shape}")
    print(f"    Type: {output_info.type}")

# Extract feature count from shape
num_features = input_shape[1] if len(input_shape) > 1 and input_shape[1] is not None else 10
print(f"\nDetected feature count: {num_features}")

In [None]:
# Create sample input data
sample_input = np.random.rand(1, num_features).astype(np.float32)

print(f"\nRunning inference...")
print(f"  Input shape: {sample_input.shape}")
print(f"  Input sample (first 5 values): {sample_input[0][:5]}")

# Run inference with timing
start_time = time.time()
output = session.run(None, {input_name: sample_input})
inference_time = (time.time() - start_time) * 1000  # Convert to ms

# Extract and print predictions
# output[0] = labels, output[1] = scores; each may have shape (batch, 1)
pred_label = int(np.ravel(output[0])[0])    # First label in the batch
pred_score = float(np.ravel(output[1])[0])  # First score in the batch

print(f"\n✅ ONNX Inference Successful!")
print(f"  Inference time: {inference_time:.2f} ms")
print(f"  Prediction Label: {pred_label} (1 = Normal, -1 = Anomaly)")
print(f"  Anomaly Score: {pred_score:.4f}")

---
## Section 8: Optimize ONNX with Quantization

### What is Quantization?
Reduces numerical precision of weights and activations from 32-bit floats to 8-bit integers.

### Dynamic Quantization
- **Weights**: Converted to INT8 at conversion time
- **Activations**: Quantized dynamically during inference
- **Benefits**: No calibration data needed

### Impact
- **Model size**: ~75% reduction (4x smaller)
- **Inference speed**: 2-4x faster on mobile CPUs
- **Accuracy**: Typically <1% degradation
- **Memory**: Lower RAM usage

### When to Use
- Deploying to mobile devices
- Limited storage/bandwidth
- Battery-powered devices
- Real-time requirements

### Trade-offs
- Slight accuracy loss (usually negligible)
- Not all operations supported
- May not benefit on GPUs (optimized for FP32)

### Quantization Types
- **QInt8**: 8-bit signed integers (-128 to 127)
- **QUInt8**: 8-bit unsigned integers (0 to 255)
- We use QInt8: both types cover 256 levels, and the signed range suits weights centered around zero
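
To show what mapping floats to 8-bit integers means numerically, here is a generic sketch of symmetric per-tensor INT8 quantization. It illustrates the idea only and is not ONNX Runtime's exact algorithm; `quantize_int8` and `dequantize` are hypothetical helpers and the weights are made up.

```python
import numpy as np

def quantize_int8(values):
    """Symmetric per-tensor quantization of float values to int8."""
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

weights = np.array([-0.5, -0.1, 0.0, 0.2, 0.5], dtype=np.float32)  # made-up weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = float(np.max(np.abs(weights - restored)))

print(q, f"scale={scale:.6f}", f"max_error={max_error:.6f}")
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is why accuracy loss tends to be small when the weights are well distributed within their range.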

In [None]:
print(f"\n{'='*70}")
print("OPTIMIZING ONNX MODEL WITH QUANTIZATION")
print(f"{'='*70}\n")

# Define paths
optimized_model_path = "fall_detection_model_optimized.onnx"

print("Applying dynamic quantization...")
print("  Weight type: QInt8 (8-bit signed integers)")
print("  This will reduce model size by ~75%\n")

# Apply quantization
# Note: only MatMul/Gemm nodes are targeted. A converted Isolation Forest is built
# mostly from tree-ensemble operators, so the size may change little or not at all.
try:
    quantize_dynamic(
        model_input=onnx_model_path,
        model_output=optimized_model_path,
        weight_type=QuantType.QInt8,
        op_types_to_quantize=['MatMul', 'Gemm']  # Common operations to quantize
    )
    print("✅ ONNX model successfully quantized")
except Exception as e:
    print(f"⚠️ Quantization warning: {e}")
    print("  Model may not have quantizable operations")

# Compare file sizes
if os.path.exists(optimized_model_path):
    original_size = os.path.getsize(onnx_model_path) / 1024
    optimized_size = os.path.getsize(optimized_model_path) / 1024
    reduction = (1 - optimized_size / original_size) * 100
    
    print(f"\nSize Comparison:")
    print(f"  Original model: {original_size:.2f} KB")
    print(f"  Quantized model: {optimized_size:.2f} KB")
    print(f"  Size reduction: {reduction:.1f}%")
    print(f"  Space saved: {original_size - optimized_size:.2f} KB")
    
    # Estimate benefits
    print(f"\nEstimated Benefits:")
    print(f"  ⚡ Inference speed: 2-4x faster on mobile CPUs")
    print(f"  💾 Memory usage: {reduction:.0f}% less RAM")
    print(f"  🔋 Battery impact: Reduced due to faster inference")
    print(f"  📱 App size: {original_size - optimized_size:.2f} KB saved per model")
else:
    print("\n⚠️ Optimized model not created, skipping size comparison.")

---
## Section 9: Test Optimized ONNX Model

### Purpose
Verify the quantized model still produces correct results.

### What to Check
1. Model loads successfully
2. Output shapes match original
3. Predictions are reasonable
4. Inference speed improved

### Expected Behavior
- **Accuracy**: Should be very close to original (within 1-2%)
- **Latency**: Should be faster (especially on CPU)
- **Outputs**: May have minor numerical differences due to quantization

### Validation Strategy
Run the same input through both models and compare:
- Output values (should be similar)
- Inference time (quantized should be faster)
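
The output comparison can be wrapped in a small reusable check. In this sketch, `compare_outputs` is a hypothetical helper, the sample values are made up, and the default tolerance of `1e-2` mirrors the 0.01 threshold used in this notebook's comparison.

```python
import numpy as np

def compare_outputs(original, quantized, atol=1e-2):
    """Summarize element-wise differences between two model outputs."""
    orig = np.asarray(original, dtype=np.float64)
    quant = np.asarray(quantized, dtype=np.float64)
    if orig.shape != quant.shape:
        raise ValueError(f"shape mismatch: {orig.shape} vs {quant.shape}")
    abs_diff = np.abs(orig - quant)
    return {
        "mean_abs_diff": float(abs_diff.mean()),
        "max_abs_diff": float(abs_diff.max()),
        "within_tolerance": bool(abs_diff.max() <= atol),
    }

# Made-up scores standing in for original and quantized model outputs
report = compare_outputs([[0.120], [-0.045]], [[0.118], [-0.049]])
print(report)
```

Using the maximum rather than the mean for the pass/fail decision avoids hiding a single large deviation behind many small ones.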

In [None]:
print(f"\n{'='*70}")
print("TESTING OPTIMIZED ONNX MODEL")
print(f"{'='*70}\n")

if os.path.exists(optimized_model_path):
    # Load optimized model
    optimized_session = ort.InferenceSession(optimized_model_path)
    print("✅ Optimized model loaded successfully")
    
    # Create sample input
    sample_input = np.random.rand(1, num_features).astype(np.float32)
    
    print(f"\nRunning comparative inference...")
    
    # Original model inference
    start_time = time.time()
    original_output = session.run(None, {input_name: sample_input})
    original_time = (time.time() - start_time) * 1000
    
    # Optimized model inference
    start_time = time.time()
    optimized_output = optimized_session.run(None, {input_name: sample_input})
    optimized_time = (time.time() - start_time) * 1000
    
    print(f"\nPerformance Comparison:")
    print(f"  Original model: {original_time:.2f} ms")
    print(f"  Optimized model: {optimized_time:.2f} ms")
    
    if optimized_time < original_time:
        speedup = original_time / optimized_time
        print(f"  ⚡ Speedup: {speedup:.2f}x faster")
    else:
        print(f"  ⚠️ No speedup observed (may vary by hardware)")
    
    # Compare outputs
    print(f"\nOutput Comparison:")
    for i, (orig, opt) in enumerate(zip(original_output, optimized_output)):
        if orig.size <= 10:
            print(f"  Output {i} - Original: {orig}")
            print(f"  Output {i} - Optimized: {opt}")
        
        # Calculate difference
        diff = np.abs(orig - opt).mean()
        print(f"  Output {i} - Mean absolute difference: {diff:.6f}")
        
        if diff < 0.01:
            print(f"  ✅ Outputs match closely (quantization impact minimal)")
        else:
            print(f"  ⚠️ Some difference detected (expected with quantization)")
    
    print(f"\n✅ Optimized ONNX inference successful!")
else:
    print("⚠️ Optimized model not found. Skipping comparison.")

---
## Section 10: Prepare Model for Qualcomm AI Hub

### What is Qualcomm AI Hub?
Cloud-based platform for optimizing and deploying ML models on Snapdragon-powered devices.

### Key Features
- **Hardware acceleration**: Leverage Hexagon DSP and Adreno GPU
- **QNN (Qualcomm Neural Network) SDK**: Optimized runtime
- **Device profiling**: Test on real Snapdragon devices
- **Performance benchmarking**: Measure latency, power consumption

### Preparation Steps
1. Register custom loss function
2. Load model with custom objects
3. Recompile model
4. Convert to ONNX format
5. Save for AI Hub submission

### Custom Loss Function
The autoencoder uses MSE loss. We need to register it as serializable for proper model loading.

### ONNX Conversion
Uses `tf2onnx` to convert TensorFlow/Keras models to ONNX format compatible with Qualcomm's tools.

### Requirements
- Active Qualcomm AI Hub account
- API credentials configured
- Internet connection

In [None]:
print(f"\n{'='*70}")
print("PREPARING MODEL FOR QUALCOMM AI HUB")
print(f"{'='*70}\n")

# Step 1: Register custom loss function
@register_keras_serializable()
def mse(y_true, y_pred):
    """
    Custom mean squared error function.
    Must be registered for proper model serialization.
    """
    return tf.keras.losses.mean_squared_error(y_true, y_pred)

print("✅ Custom loss function registered")

# Step 2: Define custom objects for model loading
custom_objects = {
    "mse": mse,
    "MeanSquaredError": MeanSquaredError()
}

print("✅ Custom objects dictionary created")

In [None]:
# Step 3: Load model with custom objects
print(f"\nLoading model for AI Hub conversion...")

qai_hub_model_path = "qai_hub_autoencoder.onnx"

try:
    model = tf.keras.models.load_model(
        "fall_detection_model.h5", 
        custom_objects=custom_objects
    )
    print("✅ Model loaded successfully with custom objects")
    
    # Step 4: Recompile model
    model.compile(loss=mse, optimizer="adam")
    print("✅ Model recompiled with custom loss")
    
    # Display model info
    print(f"\nModel Information:")
    print(f"  Input shape: {model.input_shape}")
    print(f"  Output shape: {model.output_shape}")
    model.summary()
    
    # Step 5: Convert to ONNX using tf2onnx
    print("\nConverting Keras autoencoder to ONNX for AI Hub...")
    
    # Get the input signature
    input_signature = [tf.TensorSpec(shape=model.input.shape, dtype=model.input.dtype, name="input_1")]
    
    # Convert the model
    model_proto, _ = tf2onnx.convert.from_keras(
        model, 
        input_signature=input_signature, 
        opset=13, # Use a widely compatible opset
        output_path=qai_hub_model_path
    )
    
    print(f"✅ Autoencoder converted to ONNX: {qai_hub_model_path}")
    
except Exception as e:
    print(f"⚠️ Failed to load or convert model for AI Hub: {e}")
    print("  Make sure 'fall_detection_model.h5' exists.")

---
## Section 11: Deploy to Qualcomm AI Hub (Demonstration)

### Purpose
Submit the converted ONNX model to the Qualcomm AI Hub for profiling and hardware-specific compilation.

### Why?
The AI Hub will:
1.  **Profile** the model on real Snapdragon devices (e.g., Galaxy S23).
2.  **Measure** exact latency (ms) and power usage.
3.  **Compile** the model using the QNN SDK to run on the Hexagon NPU.
4.  **Provide** an optimized model package for on-device deployment.

### ⚠️ Demonstration Only
This step requires a valid Qualcomm AI Hub account and API key. The code will fail without authentication, but it demonstrates the required process.

### API Key Setup
To run this, you must first set your API key in your environment:
```bash
export QAI_HUB_API_KEY="your_api_key_here"
```

In [None]:
print(f"\n{'='*70}")
print("DEPLOYING TO QUALCOMM AI HUB (DEMONSTRATION)")
print(f"{'='*70}\n")

# Path to the ONNX model we just created
qai_hub_model_path = "qai_hub_autoencoder.onnx"

if os.path.exists(qai_hub_model_path):
    print(f"Found ONNX model for upload: {qai_hub_model_path}")
    
    try:
        # Step 1: Upload the model
        print("\nUploading model to Qualcomm AI Hub...")
        model = hub.upload_model(qai_hub_model_path)
        print("✅ Model uploaded successfully!")

        # Step 2: Define target devices
        # Let's target a high-end Snapdragon-powered phone
        devices = [
            hub.Device("Samsung Galaxy S23 (SM-S911B)")
        ]
        print(f"✅ Target device selected: {devices[0].name}")

        # Step 3: Run profiling job
        # submit_profile_job / download_profile follow the qai_hub client API;
        # verify the names against your installed qai_hub version.
        print("\nSubmitting profiling job (this may take a few minutes)...")
        profile_job = hub.submit_profile_job(model=model, device=devices[0])
        profile_results = profile_job.download_profile()  # Blocks until the job finishes
        print("✅ Profiling complete!")
        
        # Step 4: Display results
        print("\n--- Profiling Results ---")
        summary = profile_results.get("execution_summary", {})
        latency_us = summary.get("estimated_inference_time")  # Assumed to be microseconds
        print(f"  Device: {devices[0].name}")
        if latency_us is not None:
            print(f"  ⚡ Estimated latency: {latency_us / 1000:.2f} ms")
        
        print("\n✅ Qualcomm AI Hub deployment successful!")

    except Exception as e:
        print(f"\n{'!'*20} DEMO MODE {'!'*20}")
        print(f"⚠️ Process failed as expected (no API key or connectivity).")
        print(f"  Error details: {e}")
        print("\nTo run this for real:")
        print("1. Create a Qualcomm AI Hub account")
        print("2. Generate an API key")
        print("3. Set the 'QAI_HUB_API_KEY' environment variable")
        print("4. Re-run this cell")

else:
    print(f"⚠️ Could not find {qai_hub_model_path}. Skipping Qualcomm deployment.")

---
## Section 12: Final Summary and Artifacts

### Pipeline Complete!
We have successfully prepared the fall detection models for multiple deployment targets.

### Generated Artifacts
This pipeline produced the following key files, ready for deployment:

* **Autoencoder (Keras/TF):**
    * `fall_detection_model.h5`: Original trained model.
    * `fall_detection_model.tflite`: ✅ **For Android/iOS**. Lightweight format for mobile inference.
    * `qai_hub_autoencoder.onnx`: ✅ **For Qualcomm NPU**. Submitted to AI Hub for optimization.

* **Isolation Forest (scikit-learn):**
    * `fall_detection_model.onnx`: ✅ **Cross-platform**. For servers, web (ONNX Runtime Web), or C++ apps.
    * `fall_detection_model_optimized.onnx`: ✅ **CPU-Optimized**. Quantized (INT8) version intended for faster CPU inference.

### Next Steps
1.  Integrate `fall_detection_model.tflite` into the Android/iOS application.
2.  Use the `fall_detection_model_optimized.onnx` file in your web backend or desktop application.
3.  Download the optimized package from the Qualcomm AI Hub for the NPU-accelerated version.