# GPU-Accelerated Energy Tensor Computation

This notebook demonstrates how to use GPU acceleration with WarpFactory to significantly speed up stress-energy tensor calculations. GPU computation is particularly beneficial for:

- Large grid sizes (>100x100x100)
- Batch processing of multiple metrics
- High-resolution simulations
- Real-time visualization and analysis

WarpFactory uses [CuPy](https://cupy.dev/) for GPU acceleration, which provides a NumPy-compatible interface that runs on NVIDIA CUDA GPUs.
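
Because CuPy mirrors much of the NumPy API, a common pattern is to write array code once against a module handle `xp` that is either `numpy` or `cupy`. The sketch below shows that pattern; `select_array_module` and `rms` are hypothetical helpers written for illustration, not part of WarpFactory, which may dispatch between backends differently.

```python
import numpy as np


def select_array_module(prefer_gpu: bool = True):
    """Return cupy if requested and usable, otherwise numpy (hypothetical helper)."""
    if prefer_gpu:
        try:
            import cupy as cp
            # is_available() returns False when no CUDA device or driver is found
            if cp.cuda.is_available():
                return cp
        except ImportError:
            pass  # CuPy not installed: use NumPy
    return np


def rms(a, xp):
    # Same source code for both backends: only the module handle changes
    return xp.sqrt(xp.mean(a * a))


xp = select_array_module(prefer_gpu=True)
values = xp.arange(1.0, 5.0)
print(f"backend: {xp.__name__}, rms = {float(rms(values, xp)):.4f}")
```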

## GPU Availability Check

First, let's check if GPU acceleration is available on this system. The code will gracefully fall back to CPU computation if CuPy is not installed or if no GPU is available.

In [None]:
# Check for GPU availability
try:
    import cupy as cp
    print("CuPy version:", cp.__version__)
    print("CUDA available:", cp.cuda.is_available())
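    if cp.cuda.is_available():
        device = cp.cuda.Device()
        # Query the device name through the CUDA runtime properties (returned as bytes)
        props = cp.cuda.runtime.getDeviceProperties(device.id)
        print("GPU device:", props["name"].decode())
        # mem_info is a (free, total) tuple in bytes
        print(f"GPU memory (total): {device.mem_info[1] / 1e9:.1f} GB")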
    GPU_AVAILABLE = cp.cuda.is_available()
except ImportError:
    print("CuPy is not installed. GPU acceleration will not be available.")
    print("To install CuPy, visit: https://docs.cupy.dev/en/stable/install.html")
    GPU_AVAILABLE = False

In [None]:
# Import WarpFactory modules
import numpy as np
import matplotlib.pyplot as plt
import time
from warpfactory.metrics.alcubierre import get_alcubierre_metric
from warpfactory.solver.energy import get_energy_tensor

## Method 1: Using the try_gpu Parameter

The simplest way to use GPU acceleration is with the `try_gpu=True` parameter in `get_energy_tensor()`. This automatically:
1. Transfers the metric to GPU memory
2. Performs all computations on the GPU
3. Transfers results back to CPU memory

If CuPy is not available, it automatically falls back to CPU computation.
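
The three steps and the fallback can be sketched generically on plain arrays. `run_on_best_device` is a hypothetical helper, not WarpFactory's implementation of `try_gpu`; it only illustrates the same transfer, compute, and transfer-back pattern using the real CuPy functions `cupy.asarray` and `cupy.asnumpy`.

```python
import numpy as np


def run_on_best_device(func, *arrays, try_gpu=True):
    """Apply `func` on the GPU when possible and always return NumPy results.

    Hypothetical helper: `func` takes an array-module handle `xp` followed by
    the arrays, so the same code runs on NumPy or CuPy.
    """
    if try_gpu:
        try:
            import cupy as cp
            if cp.cuda.is_available():
                gpu_arrays = [cp.asarray(a) for a in arrays]  # 1. host -> GPU
                result = func(cp, *gpu_arrays)               # 2. compute on GPU
                return cp.asnumpy(result)                    # 3. GPU -> host
        except ImportError:
            pass  # CuPy not installed: fall through to the CPU path
    return func(np, *arrays)


def row_dot(xp, a, b):
    # Example workload: elementwise product summed over the last axis
    return xp.sum(a * b, axis=-1)


a = np.random.default_rng(0).standard_normal((4, 8))
b = np.ones((4, 8))
result = run_on_best_device(row_dot, a, b, try_gpu=True)
print(type(result).__name__, result.shape)  # ndarray (4,)
```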

In [None]:
# Create a moderate-size Alcubierre metric for testing
grid_size = [1, 50, 50, 50]
world_center = [(grid_size[i] + 1) / 2 for i in range(4)]

print("Creating Alcubierre metric...")
metric = get_alcubierre_metric(
    grid_size=grid_size,
    world_center=world_center,
    velocity=0.9,
    radius=10,
    sigma=0.5
)

print(f"Metric created: {metric.name}")
print(f"Grid shape: {metric.shape}")
print(f"Total grid points: {np.prod(metric.shape):,}")

In [None]:
# CPU computation
print("Computing energy tensor on CPU...")
start_cpu = time.time()
energy_cpu = get_energy_tensor(metric, try_gpu=False)
time_cpu = time.time() - start_cpu
print(f"CPU time: {time_cpu:.3f} seconds")

In [None]:
# GPU computation (if available)
if GPU_AVAILABLE:
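    print("Computing energy tensor on GPU...")
    # Untimed warm-up: the first GPU call pays one-time costs (CUDA context
    # creation, kernel compilation) that would otherwise inflate the timing
    _ = get_energy_tensor(metric, try_gpu=True)
    start_gpu = time.time()
    energy_gpu = get_energy_tensor(metric, try_gpu=True)
    time_gpu = time.time() - start_gpu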
    print(f"GPU time: {time_gpu:.3f} seconds")
    print(f"Speedup: {time_cpu/time_gpu:.2f}x")
    
    # Verify results match
    max_diff = np.max(np.abs(energy_cpu.tensor[(0,0)] - energy_gpu.tensor[(0,0)]))
    print(f"Maximum difference between CPU and GPU results: {max_diff:.2e}")
else:
    print("\nGPU not available - skipping GPU computation")
    print("The try_gpu=True parameter will automatically fall back to CPU")
    energy_safe = get_energy_tensor(metric, try_gpu=True)
    print("Computation completed successfully on CPU (fallback)")
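
The maximum absolute difference printed above is easy to read, but its meaning depends on the magnitude of each component. A scale-aware alternative checks every component with `np.allclose`. This sketch assumes, as the cells in this notebook suggest, that `.tensor` is a dict mapping index pairs such as `(0, 0)` to NumPy arrays; the helper name and the tolerances are illustrative choices.

```python
import numpy as np


def compare_tensor_components(tensor_a, tensor_b, rtol=1e-6, atol=1e-12):
    """Compare two {(i, j): ndarray} dicts component by component.

    Returns a dict mapping each index pair to (allclose?, max relative error).
    """
    report = {}
    for key in sorted(tensor_a):
        a = np.asarray(tensor_a[key])
        b = np.asarray(tensor_b[key])
        # Relative error, guarding against division by zero where |a| is tiny
        scale = np.maximum(np.abs(a), atol)
        max_rel = float(np.max(np.abs(a - b) / scale))
        report[key] = (bool(np.allclose(a, b, rtol=rtol, atol=atol)), max_rel)
    return report


# Synthetic data standing in for CPU and GPU results
cpu = {(0, 0): np.array([1.0, -2.0, 3.0]), (0, 1): np.zeros(3)}
gpu = {(0, 0): np.array([1.0, -2.0, 3.0 + 1e-10]), (0, 1): np.zeros(3)}
for key, (ok, rel) in compare_tensor_components(cpu, gpu).items():
    print(key, ok, f"{rel:.1e}")
```

With real results, `compare_tensor_components(energy_cpu.tensor, energy_gpu.tensor)` would apply the same check to every stored component.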

## Method 2: Manual GPU Transfer with Tensor Methods

For more control, you can manually transfer tensors to/from GPU memory using:
- `.to_gpu()`: Transfer tensor from CPU to GPU
- `.to_cpu()`: Transfer tensor from GPU to CPU
- `.is_gpu()`: Check if tensor is on GPU

This is useful when:
- You want to keep data on GPU for multiple operations
- You need to minimize data transfer overhead
- You're working with custom analysis pipelines
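
To make the semantics of these three methods concrete, here is a minimal container with the same method names. `ToyTensor` is hypothetical and much simpler than WarpFactory's tensor class; it only shows one way such methods could be built on `cupy.asarray`, `cupy.asnumpy`, and a type check, including the "return a copy when already on the CPU" behavior described for `.to_cpu()`.

```python
import copy

import numpy as np


class ToyTensor:
    """Hypothetical container holding {(i, j): array} components."""

    def __init__(self, components):
        self.tensor = dict(components)

    def is_gpu(self):
        # Simplification: any component that is not a NumPy array is
        # assumed to live on the GPU
        return any(not isinstance(v, np.ndarray) for v in self.tensor.values())

    def to_gpu(self):
        import cupy as cp  # raises ImportError when CuPy is missing
        return ToyTensor({k: cp.asarray(v) for k, v in self.tensor.items()})

    def to_cpu(self):
        if not self.is_gpu():
            # Already on the host: return an independent copy
            return copy.deepcopy(self)
        import cupy as cp
        return ToyTensor({k: cp.asnumpy(v) for k, v in self.tensor.items()})


t = ToyTensor({(0, 0): np.full((2, 2), -1.0)})
print(t.is_gpu())      # False
t_copy = t.to_cpu()
print(t_copy is t)     # False: a new object with copied arrays
```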

In [None]:
# Demonstrate tensor GPU methods
print("Metric is on GPU:", metric.is_gpu())
print("Energy tensor is on GPU:", energy_cpu.is_gpu())

if GPU_AVAILABLE:
    # Transfer metric to GPU
    print("\nTransferring metric to GPU...")
    metric_gpu = metric.to_gpu()
    print("Metric is now on GPU:", metric_gpu.is_gpu())
    
    # Check memory usage
    import cupy as cp
    mempool = cp.get_default_memory_pool()
    print(f"GPU memory used: {mempool.used_bytes() / 1e6:.1f} MB")
    
    # Transfer back to CPU
    print("\nTransferring back to CPU...")
    metric_cpu_again = metric_gpu.to_cpu()
    print("Metric is on GPU:", metric_cpu_again.is_gpu())
    
    # Free GPU memory: drop the reference, then release cached pool blocks
    del metric_gpu
    mempool.free_all_blocks()
    print(f"GPU memory in use after cleanup: {mempool.used_bytes() / 1e6:.1f} MB")
else:
    print("\nWithout CuPy, the GPU methods behave as follows:")
    print("- to_gpu() raises ImportError")
    print("- to_cpu() returns a copy if the tensor is not on the GPU")
    print("- is_gpu() returns False")
    
    # This is safe - returns a copy
    metric_copy = metric.to_cpu()
    print("\nto_cpu() created a safe copy on CPU")

## Performance Comparison: CPU vs GPU

Let's compare performance across different grid sizes to see where GPU acceleration provides the most benefit.
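
Single wall-clock measurements of GPU code can mislead: the first call pays for CUDA context creation and kernel compilation, and CuPy kernels launch asynchronously. A more careful timer warms up, optionally synchronizes, and reports the median of several runs. `benchmark` below is a hypothetical helper; with CuPy one could pass `sync=cupy.cuda.Device().synchronize`, whereas `get_energy_tensor(..., try_gpu=True)` copies its results back to the CPU, and that copy should already force synchronization.

```python
import statistics
import time


def benchmark(fn, *args, repeats=5, warmup=1, sync=None, **kwargs):
    """Return the median wall-clock time of fn(*args, **kwargs) in seconds.

    `sync` is an optional callable (e.g. a device synchronize) run after each
    call so that asynchronous GPU work is included in the measurement.
    """
    for _ in range(warmup):
        fn(*args, **kwargs)  # untimed: absorbs one-time setup costs
        if sync is not None:
            sync()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


# CPU-only usage; in this notebook one might instead call
# benchmark(get_energy_tensor, test_metric, try_gpu=True)
t = benchmark(sum, range(100_000), repeats=3)
print(f"median: {t:.4f} s")
```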

In [None]:
# Performance comparison across different grid sizes
if GPU_AVAILABLE:
    grid_sizes = [
        [1, 20, 20, 20],
        [1, 40, 40, 40],
        [1, 60, 60, 60],
    ]
    
    cpu_times = []
    gpu_times = []
    grid_points = []
    
    for grid_size in grid_sizes:
        print(f"\nTesting grid size: {grid_size}")
        world_center = [(grid_size[i] + 1) / 2 for i in range(4)]
        
        # Create metric
        test_metric = get_alcubierre_metric(
            grid_size=grid_size,
            world_center=world_center,
            velocity=0.9,
            radius=5,
            sigma=0.5
        )
        
        points = np.prod(grid_size)
        grid_points.append(points)
        
        # CPU timing
        start = time.time()
        _ = get_energy_tensor(test_metric, try_gpu=False)
        cpu_time = time.time() - start
        cpu_times.append(cpu_time)
        print(f"  CPU: {cpu_time:.3f}s")
        
        # GPU timing
        start = time.time()
        _ = get_energy_tensor(test_metric, try_gpu=True)
        gpu_time = time.time() - start
        gpu_times.append(gpu_time)
        print(f"  GPU: {gpu_time:.3f}s (speedup: {cpu_time/gpu_time:.2f}x)")
        
        # Clean up
        del test_metric
        # cp was imported by the availability check when GPU_AVAILABLE is True
        cp.get_default_memory_pool().free_all_blocks()
    
    # Plot results
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Computation time
    ax1.plot(grid_points, cpu_times, 'o-', label='CPU', linewidth=2)
    ax1.plot(grid_points, gpu_times, 's-', label='GPU', linewidth=2)
    ax1.set_xlabel('Total Grid Points')
    ax1.set_ylabel('Computation Time (seconds)')
    ax1.set_title('CPU vs GPU Performance')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    ax1.set_xscale('log')
    ax1.set_yscale('log')
    
    # Speedup factor
    speedups = [cpu_times[i]/gpu_times[i] for i in range(len(cpu_times))]
    ax2.plot(grid_points, speedups, 'o-', linewidth=2, color='green')
    ax2.axhline(y=1, color='r', linestyle='--', alpha=0.5, label='Break-even')
    ax2.set_xlabel('Total Grid Points')
    ax2.set_ylabel('Speedup Factor (CPU time / GPU time)')
    ax2.set_title('GPU Speedup Factor')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    ax2.set_xscale('log')
    
    plt.tight_layout()
    plt.show()
    
    print(f"\nAverage speedup: {np.mean(speedups):.2f}x")
else:
    print("GPU not available - skipping performance comparison")
    print("\nIndicative speedups on NVIDIA GPUs (hardware-dependent; run on a GPU to measure your own):")
    print("  Small grids (20^3):  1-2x")
    print("  Medium grids (50^3): 3-5x")
    print("  Large grids (100^3): 5-10x")
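
The log-log plot invites a one-number summary per backend: fitting t ≈ c·N^k gives an empirical scaling exponent k. The sketch below is an optional analysis step using `np.polyfit` on log-transformed data; the `grid_points`, `cpu_times`, and `gpu_times` lists from the cell above could be passed in directly, while the usage here runs on synthetic timings.

```python
import numpy as np


def fit_power_law(grid_points, times):
    """Fit times ≈ c * N**k by least squares in log space; return (c, k)."""
    log_n = np.log(np.asarray(grid_points, dtype=float))
    log_t = np.log(np.asarray(times, dtype=float))
    k, log_c = np.polyfit(log_n, log_t, deg=1)  # slope first, then intercept
    return float(np.exp(log_c)), float(k)


# Synthetic timings that scale exactly linearly with the number of points
points = [8_000, 64_000, 216_000]
times = [2e-6 * n for n in points]
c, k = fit_power_law(points, times)
print(f"c = {c:.1e} s, exponent k = {k:.2f}")  # expect k ≈ 1 for this data
```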

## Memory Benefits of GPU Computation

GPU computation can also help with memory management:
- GPU memory is separate from system RAM
- Intermediate arrays live in GPU memory, lowering peak system-RAM usage for large grids
- Multiple metrics can be batched on GPU
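
Whether a grid fits on a given GPU can be estimated before any transfer, since each component stores one value per grid point. This sketch assumes float64 storage and 16 stored components (the same assumption the next cell makes); the stress-energy computation also allocates intermediate arrays, so the true peak is some multiple of this figure that depends on the implementation. The estimate can be compared with the free memory reported by `cp.cuda.Device().mem_info[0]`.

```python
import numpy as np


def estimate_tensor_bytes(grid_size, n_components=16, dtype=np.float64):
    """Bytes needed to store a rank-2 tensor field on a [t, x, y, z] grid."""
    points = int(np.prod(grid_size))
    return points * n_components * np.dtype(dtype).itemsize


for size in ([1, 50, 50, 50], [1, 100, 100, 100], [1, 200, 200, 200]):
    mb = estimate_tensor_bytes(size) / 1e6
    print(f"{size}: {mb:,.0f} MB per 16-component tensor")
```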

In [None]:
if GPU_AVAILABLE:
    import cupy as cp
    
    # Create a moderately large metric
    grid_size = [1, 50, 50, 50]
    world_center = [(grid_size[i] + 1) / 2 for i in range(4)]
    
    print("Creating metric...")
    metric = get_alcubierre_metric(
        grid_size=grid_size,
        world_center=world_center,
        velocity=0.9,
        radius=10,
        sigma=0.5
    )
    
    # Calculate memory usage
    bytes_per_component = metric.tensor[(0,0)].nbytes
    total_metric_memory = bytes_per_component * 16  # 4x4 tensor
    
    print(f"\nCPU Memory Usage:")
    print(f"  Per component: {bytes_per_component / 1e6:.2f} MB")
    print(f"  Full metric (16 components): {total_metric_memory / 1e6:.2f} MB")
    
    # Transfer to GPU and monitor memory
    mempool = cp.get_default_memory_pool()
    mempool.free_all_blocks()
    
    print(f"\nGPU Memory Usage:")
    print(f"  Before transfer: {mempool.used_bytes() / 1e6:.2f} MB")
    
    metric_gpu = metric.to_gpu()
    print(f"  After metric transfer: {mempool.used_bytes() / 1e6:.2f} MB")
    
    # Compute the energy tensor on the GPU; try_gpu=True transfers the CPU
    # metric itself and returns the results in CPU memory
    energy_result = get_energy_tensor(metric, try_gpu=True)
    print(f"  After energy computation: {mempool.used_bytes() / 1e6:.2f} MB")
    
    # Clean up
    del metric_gpu
    mempool.free_all_blocks()
    print(f"  After cleanup: {mempool.used_bytes() / 1e6:.2f} MB")
    
    print("\nNote: GPU computation keeps intermediate results in GPU memory,")
    print("reducing CPU memory pressure for large-scale simulations.")
else:
    print("GPU not available - memory management example")
    print("\nWith GPU acceleration, memory usage is distributed:")
    print("  - Input metric and parameters stay in system RAM")
    print("  - Intermediate calculations happen in GPU memory")
    print("  - Final results are transferred back to system RAM")
    print("\nThis lowers peak system-RAM usage, but the input metric and the")
    print("final results must still fit in system RAM.")

## Visualizing Results

Let's visualize the energy tensor computed on the GPU (or on the CPU if no GPU is available).
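
Matplotlib works on NumPy arrays, so any component still in GPU memory must be copied to the host before plotting. With `try_gpu=True` the results already come back to the CPU, but when data is kept on the GPU via `.to_gpu()`, a small helper keeps transfers out of the plotting code. `to_host` and `xy_slice` are hypothetical helpers; `cupy.asnumpy` is the real CuPy function doing the copy.

```python
import numpy as np


def to_host(array):
    """Return `array` as a NumPy array, copying from GPU memory if needed."""
    if isinstance(array, np.ndarray):
        return array  # already on the host: no copy
    try:
        import cupy as cp
    except ImportError:
        return np.asarray(array)
    # cupy.asnumpy copies device memory to host memory
    return cp.asnumpy(array)


def xy_slice(component, t_index, z_index):
    """Extract the [x, y] plane from a [t, x, y, z] component as a host array."""
    return to_host(component[t_index, :, :, z_index])


field = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
plane = xy_slice(field, 0, 2)
print(plane.shape)  # (3, 4)
```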

In [None]:
# Create a clean metric for visualization
grid_size = [1, 40, 40, 40]
world_center = [(grid_size[i] + 1) / 2 for i in range(4)]

metric = get_alcubierre_metric(
    grid_size=grid_size,
    world_center=world_center,
    velocity=0.9,
    radius=10,
    sigma=0.5
)

# Compute with GPU if available
energy = get_energy_tensor(metric, try_gpu=GPU_AVAILABLE)

computation_type = "GPU" if GPU_AVAILABLE else "CPU"
print(f"Energy tensor computed on {computation_type}")

In [None]:
# Plot key energy tensor components
t_slice = 0
z_slice = int(world_center[3])

fig, axes = plt.subplots(2, 2, figsize=(12, 12))
computation_type = "GPU" if GPU_AVAILABLE else "CPU"
fig.suptitle(f'Energy Tensor Components ({computation_type} Computation)', fontsize=16)

# Energy density T^00
ax = axes[0, 0]
data = energy.tensor[(0, 0)][t_slice, :, :, z_slice]
im = ax.imshow(data.T, origin='lower', cmap='RdBu_r', aspect='auto')
ax.set_title(r'Energy Density $T^{00}$', fontsize=14)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(im, ax=ax)

# Momentum density T^01
ax = axes[0, 1]
data = energy.tensor[(0, 1)][t_slice, :, :, z_slice]
im = ax.imshow(data.T, origin='lower', cmap='RdBu_r', aspect='auto')
ax.set_title(r'x-Momentum Density $T^{01}$', fontsize=14)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(im, ax=ax)

# Stress component T^11
ax = axes[1, 0]
data = energy.tensor[(1, 1)][t_slice, :, :, z_slice]
im = ax.imshow(data.T, origin='lower', cmap='RdBu_r', aspect='auto')
ax.set_title(r'xx-Stress $T^{11}$', fontsize=14)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(im, ax=ax)

# Stress component T^22
ax = axes[1, 1]
data = energy.tensor[(2, 2)][t_slice, :, :, z_slice]
im = ax.imshow(data.T, origin='lower', cmap='RdBu_r', aspect='auto')
ax.set_title(r'yy-Stress $T^{22}$', fontsize=14)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(im, ax=ax)

plt.tight_layout()
plt.show()

In [None]:
# Energy density statistics and cross-section
energy_density = energy.tensor[(0, 0)][t_slice, :, :, z_slice]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# 2D energy density
ax = axes[0]
im = ax.imshow(energy_density.T, origin='lower', cmap='RdBu_r', aspect='auto')
ax.set_title(r'Energy Density $T^{00}$', fontsize=14)
ax.set_xlabel('x index')
ax.set_ylabel('y index')
plt.colorbar(im, ax=ax, label='Energy Density')

# Cross-section through center
ax = axes[1]
y_center = int(world_center[2])
ax.plot(energy_density[:, y_center], linewidth=2)
ax.set_title('Energy Density Cross-Section at y=center', fontsize=14)
ax.set_xlabel('x index')
ax.set_ylabel(r'$T^{00}$')
ax.grid(True, alpha=0.3)
ax.axhline(y=0, color='k', linestyle='-', alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nEnergy Density Statistics:")
print(f"  Minimum: {np.min(energy_density):.6e}")
print(f"  Maximum: {np.max(energy_density):.6e}")
print(f"  Mean:    {np.mean(energy_density):.6e}")
print(f"  Std Dev: {np.std(energy_density):.6e}")
print("\nNote: Negative energy density indicates that exotic matter is required.")
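
Beyond the minimum and maximum, it can help to quantify how much of the slice has $T^{00} < 0$. The sketch below reports the fraction of negative grid points and a Riemann-sum estimate of the negative energy on the slice. The `cell_area` argument is an assumption: this notebook does not state WarpFactory's grid spacing, so it defaults to 1 (index units).

```python
import numpy as np


def negative_energy_summary(energy_density, cell_area=1.0):
    """Summarize where T^00 < 0 on a 2D slice.

    Returns (fraction of negative points, sum of negative values * cell_area).
    `cell_area` is dx * dy in whatever units the grid uses (assumed 1 here).
    """
    rho = np.asarray(energy_density, dtype=float)
    negative = rho < 0
    fraction = float(np.mean(negative))
    negative_integral = float(np.sum(rho[negative]) * cell_area)
    return fraction, negative_integral


sample = np.array([[1.0, -0.5], [-1.5, 2.0]])
print(negative_energy_summary(sample, cell_area=0.25))  # (0.5, -0.5)
```

In this notebook it would be called as `negative_energy_summary(energy_density)`.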

## Best Practices for GPU Computation

### When to Use GPU Acceleration:

1. **Large grids** (roughly 50x50x50 points and up): one-time GPU overhead is amortized; the exact break-even point depends on your hardware
2. **Repeated computations**: Keep data on GPU between operations
3. **Batch processing**: Process multiple metrics sequentially on GPU
4. **Memory constraints**: Offload computation from system RAM to GPU memory

### When to Use CPU:

1. **Small grids** (<20x20x20 points): CPU is often faster because transfer and kernel-launch overhead dominate
2. **One-off computations**: a single call also pays the first-use CUDA initialization and kernel compilation costs
3. **No GPU available**: Code works seamlessly on CPU
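
The grid-size thresholds above are rules of thumb. Given speedups measured with the performance comparison cell, the break-even grid size can be estimated by interpolating where the speedup crosses 1 on a logarithmic grid-point axis. `estimate_break_even` is a hypothetical helper, and the example input is synthetic.

```python
import numpy as np


def estimate_break_even(grid_points, speedups):
    """Return the grid-point count where speedup first crosses 1, or None.

    Interpolates linearly in log10(grid points) between adjacent measurements.
    """
    x = np.log10(np.asarray(grid_points, dtype=float))
    s = np.asarray(speedups, dtype=float)
    for i in range(len(s) - 1):
        if (s[i] - 1.0) * (s[i + 1] - 1.0) <= 0 and s[i] != s[i + 1]:
            frac = (1.0 - s[i]) / (s[i + 1] - s[i])
            return float(10 ** (x[i] + frac * (x[i + 1] - x[i])))
    return None  # speedup never crosses 1 in the measured range


# Synthetic example: GPU slower at 20^3 points, faster at 40^3 points
be = estimate_break_even([8_000, 64_000, 216_000], [0.5, 2.0, 4.0])
print(f"break-even near {be:,.0f} grid points")  # break-even near 16,000 grid points
```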

### Memory Management Tips:

```python
import cupy as cp

# Good: use try_gpu for automatic transfer with CPU fallback
energy = get_energy_tensor(metric, try_gpu=True)

# Good: manual control for batch processing; keep data on the GPU
metric_gpu = metric.to_gpu()
# ... perform multiple operations on metric_gpu ...
metric_back = metric_gpu.to_cpu()

# Good: release GPU memory when done
del metric_gpu
cp.get_default_memory_pool().free_all_blocks()
```
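
The cleanup step can also be wrapped in a context manager so cached GPU memory is released even if a computation raises. `gpu_memory_scope` is a hypothetical helper built on CuPy's real `get_default_memory_pool().free_all_blocks()` and `get_default_pinned_memory_pool().free_all_blocks()`, and it does nothing when CuPy or a GPU is unavailable. Pool blocks are only released once no live array uses them, so GPU references should be deleted inside the block.

```python
import contextlib


@contextlib.contextmanager
def gpu_memory_scope():
    """Free CuPy's cached memory-pool blocks when the block exits."""
    try:
        import cupy as cp
    except ImportError:
        cp = None  # no CuPy: the scope is a no-op
    try:
        yield
    finally:
        if cp is not None and cp.cuda.is_available():
            # Only blocks no longer referenced by live arrays are released
            cp.get_default_memory_pool().free_all_blocks()
            cp.get_default_pinned_memory_pool().free_all_blocks()


# Usage sketch with names from this notebook:
# with gpu_memory_scope():
#     metric_gpu = metric.to_gpu()
#     ...  # GPU work
#     del metric_gpu  # drop the reference so its blocks can be freed
with gpu_memory_scope():
    total = sum(range(10))
print(total)  # 45
```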

## Summary

In this notebook, we demonstrated:

1. **GPU Availability Check**: How to check if CuPy and CUDA are available

2. **Simple GPU Usage**: Using `try_gpu=True` for automatic GPU acceleration with CPU fallback

3. **Manual GPU Control**: Using `.to_gpu()`, `.to_cpu()`, and `.is_gpu()` methods for fine-grained control

4. **Performance Comparison**: Measuring CPU vs GPU computation times across different grid sizes

5. **Memory Management**: Understanding GPU memory usage and benefits for large-scale computations

6. **Best Practices**: Guidelines for when to use GPU acceleration and how to manage memory efficiently

GPU acceleration can provide substantial speedups for large-scale warp drive simulations (roughly 3-10x on large grids, depending on hardware), enabling exploration of higher-resolution grids and more complex metrics that would be impractical on CPU alone.

### Installation Notes:

To use GPU acceleration, install CuPy:

```bash
# For CUDA 11.x
pip install cupy-cuda11x

# For CUDA 12.x
pip install cupy-cuda12x

# See https://docs.cupy.dev/en/stable/install.html for details
```