# SAP HANA AI Toolkit with NVIDIA Optimization

This notebook provides setup instructions for running the SAP HANA AI Toolkit with NVIDIA GPU optimizations in VM Mode.

## Why VM Mode is Required

This project requires VM Mode for the following reasons:

1. **Private Container Registry**: Uses NVIDIA NGC private registry requiring authentication
2. **Direct GPU Access**: Requires direct GPU access for TensorRT optimization
3. **External Authentication**: Connects to SAP HANA Cloud services requiring credentials
4. **System Access**: Needs full system access for container orchestration

## Prerequisites

- NVIDIA GPU (A100 or H100 recommended)
- Docker with NVIDIA Container Toolkit
- NGC account and API key
- SAP HANA Cloud instance (optional; needed only for full functionality)

## 1. NVIDIA NGC Authentication

### 1.1 Create NGC Account

If you don't already have an NGC account:

1. Go to [NGC website](https://ngc.nvidia.com/) and sign up
2. Verify your email address and complete registration

### 1.2 Generate NGC API Key

1. Log in to your NGC account
2. Navigate to your account settings (click your name → "Setup")
3. Click "Get API Key"
4. Generate a new API key and save it securely

### 1.3 Log in to NGC Registry

In [None]:
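# Log in to the NGC registry. `docker login` cannot prompt inside a notebook cell,
# so the key is read with getpass and passed on stdin.
import getpass
import subprocess

subprocess.run(
    ["docker", "login", "nvcr.io", "--username", "$oauthtoken", "--password-stdin"],
    input=getpass.getpass("NGC API key: "), text=True, check=True,
)  # The username is the literal string $oauthtoken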
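After a successful login, the Docker client normally records the registry under `auths` in its config file. The following is a minimal sketch for confirming that an `nvcr.io` entry exists, assuming the default config location `~/.docker/config.json`. The helper names `has_registry_login` and `load_docker_config` are hypothetical and not part of the toolkit.

```python
import json
from pathlib import Path


def has_registry_login(config: dict, registry: str = "nvcr.io") -> bool:
    """Return True if the Docker client config lists an entry for `registry`."""
    # `docker login` adds the registry under "auths"; with a credential helper
    # the entry may be an empty object, so only the key's presence is checked.
    return registry in config.get("auths", {})


def load_docker_config(path: Path = Path.home() / ".docker" / "config.json") -> dict:
    """Read the Docker client config, returning {} if the file does not exist."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


if __name__ == "__main__":
    print("nvcr.io login found:", has_registry_login(load_docker_config()))
```

This only checks that a login was recorded; it does not prove the stored key is still valid.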

### 1.4 Set NGC API Key as Environment Variable

In [None]:
# Set NGC API key (replace with your key)
import os
os.environ["NGC_API_KEY"] = "<your NGC API key>"

# Verify NGC authentication
!curl -s -o /dev/null -w "%{http_code}" -H "Authorization: ${NGC_API_KEY}" https://api.ngc.nvidia.com/v2/org/nvidia

## 2. Verify GPU Environment

Verify that GPUs are available and properly configured:

In [None]:
# Check NVIDIA driver and GPU availability
!nvidia-smi

In [None]:
# Verify NVIDIA Container Toolkit is installed and working
!docker run --rm --gpus all nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
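To use the GPU inventory later in the notebook (for example, to decide which optimizations to enable), the `nvidia-smi` output can be captured in Python instead of only printed. This is a sketch that assumes `nvidia-smi` is on the PATH; the function names are illustrative.

```python
import subprocess


def parse_gpu_list(csv_text: str) -> list[dict]:
    """Parse `name, memory.total` rows (CSV, no header, no units) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, memory_mib = (field.strip() for field in line.split(","))
        gpus.append({"name": name, "memory_mib": int(memory_mib)})
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi on the host and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_list(out)
```

Calling `query_gpus()` on the VM returns a list such as `[{"name": ..., "memory_mib": ...}]`, one entry per visible GPU.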

## 3. Pull NGC Container

Pull the pre-optimized container from NGC:

In [None]:
# Pull the SAP HANA AI Toolkit container from NGC
!docker pull nvcr.io/ea-sap/hana-ai-toolkit:latest

## 4. SAP HANA Cloud Connection Setup (Optional)

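To connect to SAP HANA Cloud, provide connection credentials as environment variables. Variables set here live only in the notebook process; to make them visible inside the container, pass them to `docker run` (for example with `-e HANA_HOST`) or add them to the `.env` file created in the next section: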

In [None]:
# Set SAP HANA connection environment variables
import os

# Option 1: Direct credentials
os.environ["HANA_HOST"] = "<your HANA host>"
os.environ["HANA_PORT"] = "443"  # Default port
os.environ["HANA_USER"] = "<your HANA username>"
os.environ["HANA_PASSWORD"] = "<your HANA password>"

# Option 2: Using HANA user key (if available)
# os.environ["HANA_USERKEY"] = "<your HANA user key>"
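Before starting the container, it can help to check that one of the two options above is fully configured. The sketch below uses the variable names from the cell above; the helper `missing_hana_settings` is hypothetical, and treating values that start with `<your` as unset is an assumption based on the placeholders shown.

```python
import os

# Option 1 needs all four variables; Option 2 needs only the user key.
DIRECT_VARS = ("HANA_HOST", "HANA_PORT", "HANA_USER", "HANA_PASSWORD")
USERKEY_VAR = "HANA_USERKEY"


def missing_hana_settings(env) -> list[str]:
    """Return the direct-credential variables that are unset or still placeholders.

    An empty list means the configuration looks complete under either option.
    """
    def is_set(name: str) -> bool:
        value = env.get(name, "")
        return bool(value) and not value.startswith("<your")

    if is_set(USERKEY_VAR):
        return []
    return [name for name in DIRECT_VARS if not is_set(name)]
```

For example, `missing_hana_settings(os.environ)` lists whatever still needs a real value.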

## 5. Configure Environment Variables

Create a configuration file with environment variables for container deployment:

In [None]:
%%writefile .env
# Container configuration for the SAP HANA AI Toolkit
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
LOG_LEVEL=INFO
LOG_FORMAT=json
AUTH_REQUIRED=true
# Replace with a secure key in production (keep comments on their own line:
# Docker env files do not strip inline comments from values)
API_KEYS=dev-key-only-for-testing

# NVIDIA GPU Configuration
ENABLE_GPU_ACCELERATION=true
NVIDIA_VISIBLE_DEVICES=all
CUDA_MEMORY_FRACTION=0.8
MULTI_GPU_STRATEGY=auto

# TensorRT Optimization
ENABLE_TENSORRT=true
TENSORRT_PRECISION=fp16
TENSORRT_MAX_BATCH_SIZE=32
TENSORRT_WORKSPACE_SIZE_MB=1024
TENSORRT_CACHE_DIR=/tmp/tensorrt_engines
TENSORRT_BUILDER_OPTIMIZATION_LEVEL=3

# Hopper Optimizations (H100 only; FP8 is not available on A100, so disable these on other GPUs)
HOPPER_ENABLE_FLASH_ATTENTION=true
HOPPER_ENABLE_FP8=true
HOPPER_ENABLE_TRANSFORMER_ENGINE=true
HOPPER_ENABLE_FSDP=true
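
Because the Hopper settings apply only to H100 GPUs, the file contents could also be generated from Python so that those flags follow the detected GPU. This is a sketch under two assumptions: that the toolkit accepts `false` for the Hopper flags (as it does for `ENABLE_TENSORRT`), and that an H100 can be recognized by its name. `hopper_settings` and `render_env` are illustrative helpers.

```python
HOPPER_FLAGS = (
    "HOPPER_ENABLE_FLASH_ATTENTION",
    "HOPPER_ENABLE_FP8",
    "HOPPER_ENABLE_TRANSFORMER_ENGINE",
    "HOPPER_ENABLE_FSDP",
)


def hopper_settings(gpu_name: str) -> dict:
    """Enable the Hopper flags only when the GPU name mentions H100."""
    enabled = "H100" in gpu_name.upper()
    return {flag: enabled for flag in HOPPER_FLAGS}


def render_env(settings: dict) -> str:
    """Render settings as KEY=VALUE lines, writing booleans as true/false."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

The rendered string can be written to `.env` with `open(".env", "w").write(...)`; each line is a plain `KEY=VALUE` pair with no inline comments.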

## 6. Run the Container with GPU Acceleration

Launch the container with GPU support and all required configurations:

In [None]:
# Run the container with GPU support
!docker run -d \
  --name hana-ai-toolkit \
  --gpus all \
  -p 8000:8000 \
  -p 9090:9090 \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --env-file .env \
  nvcr.io/ea-sap/hana-ai-toolkit:latest

## 7. Verify Deployment

Check if the API is running and GPU acceleration is enabled:

In [None]:
# Check container logs
!docker logs hana-ai-toolkit

In [None]:
# Check if API is running
!curl -s http://localhost:8000/
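
The API may not answer immediately after `docker run` returns. A polling loop is more reliable than a single request or a fixed sleep. The sketch below uses only the standard library; `http_ok` and `wait_until_ready` are hypothetical names, and treating any HTTP response (including 401 or 404) as "up" is an assumption about what counts as ready.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str) -> bool:
    """Return True if a GET to `url` yields any HTTP response from the server."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered (e.g. 401/404), so it is up
    except (urllib.error.URLError, OSError):
        return False  # connection refused, reset, or timed out


def wait_until_ready(url: str, timeout_s: float = 120.0, interval_s: float = 2.0,
                     probe=http_ok) -> bool:
    """Poll `probe(url)` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:8000/")` returns `True` once the API responds and `False` if it never does within the timeout.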

In [None]:
# Check metrics for GPU usage
!curl -s http://localhost:9090/metrics | grep -i gpu
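
To work with the GPU metrics in Python rather than reading `grep` output, the response body can be parsed into a dictionary. This sketch assumes the endpoint serves the Prometheus text exposition format without timestamps. The metric names in the usage note are made up for illustration; the actual names depend on the container.

```python
def gpu_metrics(metrics_text: str) -> dict:
    """Map each GPU-related sample (metric name plus labels) to its float value."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines
        if not line or line.startswith("#"):
            continue
        if "gpu" not in line.lower():
            continue
        # A sample line is "<name>{<labels>} <value>"; split on the last space
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # the last field is not numeric, so skip the line
    return samples
```

Given a body containing a line like `gpu_utilization_percent{gpu="0"} 37.5`, the function returns that series as a key with the value `37.5`.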

## 8. Run TensorRT Optimization Benchmark

Test the TensorRT optimization with a simple benchmark:

In [None]:
# Install Python requests if not already installed
%pip install -q requests

In [None]:
import requests
import json
import time

# API endpoint
url = "http://localhost:8000/api/v1/llm"

# API key; must match API_KEYS in the .env file
headers = {
    "Authorization": "Bearer dev-key-only-for-testing",  # Replace with your API key
    "Content-Type": "application/json"
}

# Request payload
payload = {
    "model": "sap-ai-core-llama3",
    "prompt": "Explain the benefits of GPU acceleration for deep learning inference.",
    "max_tokens": 200
}

# Benchmark function
def run_benchmark(num_iterations=5):
    total_time = 0
    results = []
    
    for i in range(num_iterations):
        # perf_counter is a monotonic, high-resolution clock suited to timing
        start_time = time.perf_counter()
        response = requests.post(url, headers=headers, json=payload)
        end_time = time.perf_counter()
        
        if response.status_code == 200:
            latency = end_time - start_time
            total_time += latency
            results.append({
                "iteration": i + 1,
                "latency": latency,
                "status": "success"
            })
            print(f"Iteration {i+1}: {latency:.3f} seconds")
        else:
            print(f"Iteration {i+1}: Failed with status code {response.status_code}")
            print(response.text)
            results.append({
                "iteration": i + 1,
                "status": "failed",
                "error": response.text
            })
    
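    # Average over successful requests only; failed requests add nothing to total_time
    successes = sum(1 for r in results if r["status"] == "success")
    if successes > 0 and total_time > 0:
        avg_latency = total_time / successes
        print(f"\nAverage latency: {avg_latency:.3f} seconds ({successes}/{num_iterations} succeeded)")
        print(f"Throughput: {successes / total_time:.2f} requests/second")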
    
    return results

# Run the benchmark
benchmark_results = run_benchmark()
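
With only a few iterations, a single slow request can dominate the average, so it is useful to look at the median and range as well. The sketch below summarizes the list returned by `run_benchmark`; `summarize_latencies` is an illustrative helper that relies only on the `status` and `latency` keys used above.

```python
import statistics


def summarize_latencies(results: list[dict]) -> dict:
    """Summarize successful request latencies from a run_benchmark() result list."""
    latencies = sorted(r["latency"] for r in results if r["status"] == "success")
    if not latencies:
        return {"successes": 0, "failures": len(results)}
    return {
        "successes": len(latencies),
        "failures": len(results) - len(latencies),
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "min": latencies[0],
        "max": latencies[-1],
    }
```

For example, `summarize_latencies(benchmark_results)` reports how many requests succeeded alongside the latency statistics in seconds.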

## 9. Compare TensorRT vs. Standard Performance

Restart the container with TensorRT disabled and rerun the same benchmark so that the two sets of results can be compared:

In [None]:
# Stop the current container
!docker stop hana-ai-toolkit
!docker rm hana-ai-toolkit

In [None]:
%%writefile .env
# Same configuration as before, with TensorRT disabled
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
LOG_LEVEL=INFO
LOG_FORMAT=json
AUTH_REQUIRED=true
API_KEYS=dev-key-only-for-testing

# NVIDIA GPU Configuration
ENABLE_GPU_ACCELERATION=true
NVIDIA_VISIBLE_DEVICES=all
CUDA_MEMORY_FRACTION=0.8
MULTI_GPU_STRATEGY=auto

# Disable TensorRT Optimization
ENABLE_TENSORRT=false

In [None]:
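# Run the container without TensorRT, keeping the --ipc and --ulimit flags
# from the first run so that only the TensorRT setting differs
!docker run -d \
  --name hana-ai-toolkit \
  --gpus all \
  -p 8000:8000 \
  -p 9090:9090 \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --env-file .env \
  nvcr.io/ea-sap/hana-ai-toolkit:latest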

# Wait for container to start
import time
time.sleep(10)

In [None]:
# Run benchmark without TensorRT
print("Running benchmark without TensorRT optimization:")
benchmark_results_no_tensorrt = run_benchmark()
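
Once both runs have finished, the two result lists can be reduced to a single speedup figure. This is a sketch: `mean_latency` and `speedup` are hypothetical helpers, and the optional `warmup` parameter reflects an assumption that the first request may include one-time startup cost.

```python
from typing import Optional


def mean_latency(results: list[dict], warmup: int = 0) -> Optional[float]:
    """Mean latency of successful runs, ignoring the first `warmup` iterations."""
    latencies = [r["latency"] for r in results[warmup:] if r["status"] == "success"]
    return sum(latencies) / len(latencies) if latencies else None


def speedup(with_trt: list[dict], without_trt: list[dict],
            warmup: int = 0) -> Optional[float]:
    """Baseline mean latency divided by TensorRT mean latency (>1 means faster)."""
    optimized = mean_latency(with_trt, warmup)
    baseline = mean_latency(without_trt, warmup)
    if not optimized or not baseline:
        return None  # one of the runs had no successful requests
    return baseline / optimized
```

For example, `speedup(benchmark_results, benchmark_results_no_tensorrt)` returns the ratio for the two runs above, or `None` if either run had no successful requests.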

## 10. Cleanup

In [None]:
# Stop and remove container
!docker stop hana-ai-toolkit
!docker rm hana-ai-toolkit

## Conclusion

This notebook demonstrated how to:

1. Authenticate with NVIDIA NGC
2. Verify GPU environment and tools
3. Pull and run the SAP HANA AI Toolkit container with GPU acceleration
4. Configure TensorRT optimization
5. Benchmark performance with and without TensorRT

To quantify the effect of TensorRT on your hardware, compare the latencies in `benchmark_results` with those in `benchmark_results_no_tensorrt`; the size of the difference depends on the model, the GPU, and the precision setting. The NGC container provides a pre-optimized environment with the NVIDIA optimizations used in this notebook.

### Additional Resources

- [Full Authentication Guide](https://github.com/finsightsap/generative-ai-toolkit-for-sap-hana-cloud/blob/main/AUTHENTICATION.md)
- [NGC Deployment Guide](https://github.com/finsightsap/generative-ai-toolkit-for-sap-hana-cloud/blob/main/NGC_DEPLOYMENT.md)
- [TensorRT Optimization Documentation](https://github.com/finsightsap/generative-ai-toolkit-for-sap-hana-cloud/blob/main/TENSORRT_OPTIMIZATION.md)
- [NVIDIA GPU Optimization Guide](https://github.com/finsightsap/generative-ai-toolkit-for-sap-hana-cloud/blob/main/NVIDIA.md)