Integrate trueno-zram for GPU-accelerated model weight loading #145

@noahgift

Summary

Integrate trueno-zram GPU compression for 10x faster model loading and support for 3x larger models.

Problem

Current model loading path:
Disk (SSD) → CPU RAM → GPU VRAM
     500 MB/s    16 GB/s

Llama-70B (140 GB): 280 seconds to load 😢 (the ~500 MB/s SSD read dominates: 140 GB ÷ 0.5 GB/s ≈ 280 s).

Solution

With trueno-zram:
Disk → ZRAM (compressed) → GPU decompress → VRAM
   2 GB/s      60 GB/s (GPU)

Llama-70B: 27 seconds to load 🚀
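
A minimal sketch of the arithmetic behind both figures. The ~2.6:1 compression ratio and the ~2 GB/s compressed read are assumptions chosen to be consistent with the numbers above, not measured results:

// Back-of-the-envelope timings for both paths. Constants are the figures
// quoted in this issue plus the assumptions noted below, not benchmarks.
fn main() {
    let model_gb = 140.0;            // Llama-70B weights
    let ssd_gbps = 0.5;              // current path: ~500 MB/s SSD read
    let nvme_gbps = 2.0;             // compressed path: disk → ZRAM (assumed)
    let gpu_decompress_gbps = 60.0;  // ZRAM → VRAM decompression on the GPU
    let compression_ratio = 2.6;     // assumed ratio for the weight tensors

    let baseline_secs = model_gb / ssd_gbps;                     // ≈ 280 s
    let read_secs = (model_gb / compression_ratio) / nvme_gbps;  // ≈ 27 s
    let decompress_secs = model_gb / gpu_decompress_gbps;        // ≈ 2.3 s
    // With the read and decompress stages overlapped, the slower one wins.
    println!("baseline ≈ {baseline_secs:.0} s, compressed ≈ {:.0} s",
             read_secs.max(decompress_secs));
}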

Benefits

Benefit                  Impact
10x faster loading       Cold start 280 s → 27 s
3x larger models         Fit 405B in 256 GB RAM
Zero-copy GPU            Decompress → VRAM direct
Lower memory pressure    More RAM for KV cache

Implementation

// Model loading with GPU ZRAM
let model = Model::load("llama-70b.safetensors")
    .with_compression(trueno_zram::GpuBackend::auto())
    .to_device(Device::Cuda(0))?;
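
GpuBackend::auto() does the backend selection here. A rough sketch of the order it could probe, assuming the CUDA batch backend (trueno-zram#1), the WGPU backend (trueno-zram#2), and a CPU fallback; the probe helpers below are hypothetical placeholders, not an existing trueno-zram API:

// Hypothetical selection logic for GpuBackend::auto(). Probe helpers are
// illustrative stubs only.
pub enum GpuBackend {
    Cuda, // native CUDA batch decompression (trueno-zram#1)
    Wgpu, // cross-platform WGPU backend (trueno-zram#2)
    Cpu,  // fallback: decompress on the host, copy to VRAM as today
}

impl GpuBackend {
    pub fn auto() -> Self {
        if cuda_device_available() {
            GpuBackend::Cuda
        } else if wgpu_adapter_available() {
            GpuBackend::Wgpu
        } else {
            GpuBackend::Cpu
        }
    }
}

fn cuda_device_available() -> bool { false }  // placeholder probe
fn wgpu_adapter_available() -> bool { false } // placeholder probe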

API Changes

pub struct ModelLoader {
    /// GPU decompression backend; None keeps today's uncompressed path.
    compression: Option<trueno_zram::GpuBackend>,
}

impl ModelLoader {
    /// Choose the GPU backend used to decompress weights during load.
    pub fn with_compression(mut self, backend: GpuBackend) -> Self;
    /// Read weights from disk into a compressed, GPU-ready representation.
    pub fn preload_compressed(&self, path: &Path) -> Result<CompressedModel>;
}
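
A hedged usage sketch of the two methods together, pre-compressing the weights once and reusing the compressed form on later cold starts. CompressedModel::to_device, Model, Device, and the Result alias are assumptions for illustration, not settled API:

// Possible warm-cache flow on top of the proposed ModelLoader API (sketch).
fn cold_start(loader: ModelLoader, weights: &Path) -> Result<Model> {
    // One-time: read the safetensors file and keep a ZRAM-compressed copy,
    // so later starts skip the slow uncompressed read.
    let compressed: CompressedModel = loader
        .with_compression(trueno_zram::GpuBackend::auto())
        .preload_compressed(weights)?;

    // Per start: stream compressed pages to the GPU and decompress them
    // directly into VRAM (zero-copy on supported backends).
    compressed.to_device(Device::Cuda(0))
}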

Related

  • trueno-zram#1: GPU batch compression
  • trueno-zram#2: WGPU cross-platform backend
