# HyperText-Infinite: Interactive Demo

Welcome to the interactive showcase of **HyperText-Infinite**, a high-performance NLP framework built from scratch to demonstrate enterprise-grade optimization techniques.

## 🎯 Capabilities Shown
1. **LLaMA Architecture**: Running `LlamaProto` with RoPE and RMSNorm.
2. **Hyper-Inference**: A timed, greedy autoregressive generation loop.
3. **Systems Engineering**: Custom C++ Kernels (falling back to Python if not compiled).
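
For readers unfamiliar with the two building blocks named above, here is a minimal plain-Python sketch of RMSNorm and RoPE on a single feature vector. It is not the `LlamaProto` implementation: the function names, the `eps` and `base` defaults, and the interleaved pairing of features are assumptions chosen for illustration (some implementations pair the first and second halves of the vector instead).

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root-mean-square, then by a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]) by the angle position * base**(-i/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even feature dimension"
    out = []
    for i in range(0, d, 2):  # i is already 2 * pair_index
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out.extend([a * cos_t - b * sin_t, a * sin_t + b * cos_t])
    return out

# Normalize a toy vector, then encode position 5 into it.
x = [3.0, 4.0, 0.0, 0.0]
normed = rms_norm(x, weight=[1.0] * len(x))
rotated = rope(normed, position=5)
```

Because RoPE is a pure rotation, it leaves the vector's length unchanged, and position 0 leaves the vector itself unchanged.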

In [None]:
import torch
import time
from hypertext.models.llama_proto import LlamaProto
from hypertext.ops import HAS_C_EXT

print(f"HyperText Backend: {'🚀 C++ Optimized' if HAS_C_EXT else '🐍 Pure Python Fallback'}")
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # prefer the GPU when one is present

### 1. Model Initialization
Instantiating a small, randomly initialized LLaMA-style model (6 layers, 8 heads, `d_model = 512`) with a toy vocabulary of 1,000 tokens.

In [None]:
vocab_size = 1000
d_model = 512
n_layers = 6
n_heads = 8

print("Initializing LlamaProto...")
model = LlamaProto(vocab_size, d_model, n_layers, n_heads).to(device)
print(f"Model instantiated with {sum(p.numel() for p in model.parameters())/1e6:.2f}M parameters.")

### 2. High-Throughput Generation
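Running a greedy decoding loop from a random 10-token prompt and timing it. Each step feeds the full context back into the model and appends the highest-scoring next token.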
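
A single wall-clock measurement can be noisy, since the first call often pays one-off costs. The sketch below shows one way to get a steadier tokens/sec figure: a warm-up run followed by the median of several timed runs. `tokens_per_second` and its `generate_fn` argument are hypothetical names, not part of HyperText-Infinite.

```python
import statistics
import time

def tokens_per_second(generate_fn, n_tokens, warmup=1, repeats=5):
    """Return the median throughput of generate_fn over several timed runs.

    generate_fn is a hypothetical zero-argument callable that produces
    n_tokens tokens each time it is called.
    """
    for _ in range(warmup):
        generate_fn()  # untimed: absorbs one-off costs such as lazy initialization
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        generate_fn()
        durations.append(time.perf_counter() - start)
    return n_tokens / statistics.median(durations)
```

To use it here, wrap the generation loop in a function and pass that function in. On a GPU, the wrapped function should call `torch.cuda.synchronize()` before returning, because CUDA kernels run asynchronously and the clock would otherwise stop early.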

In [None]:
prompt = torch.randint(0, vocab_size, (1, 10)).to(device)
max_new_tokens = 50

print("Generating tokens...")
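start = time.perf_counter()  # monotonic, high-resolution clock for benchmarking

# Greedy autoregressive loop: the full context is re-fed to the model each step
ctx = prompt.clone()
model.eval()
with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(ctx)  # expected shape: (batch, seq_len, vocab_size)
        next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
        ctx = torch.cat((ctx, next_token), dim=1)

if device == 'cuda':
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
elapsed = time.perf_counter() - start
print(f"Generated {max_new_tokens} tokens in {elapsed:.4f}s")
print(f"Speed: {max_new_tokens / elapsed:.2f} tokens/sec")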

### 3. Mixture of Experts (MoE) Inspection
Running a standalone `MoELayer` with 8 experts and top-2 routing on a random input batch, then checking the output shape.
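
As background, sparse top-k gating for a single token can be sketched in plain Python as follows. This is one common formulation (select the k largest router logits, then softmax over just those k) and is an assumption for illustration. `MoELayer` may order or normalize these steps differently, and `top_k_gate` and `moe_token` are hypothetical names.

```python
import math

def top_k_gate(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(router_logits)),
                 key=lambda e: router_logits[e], reverse=True)[:k]
    peak = max(router_logits[e] for e in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[e] - peak) for e in top]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(top, exps)]

def moe_token(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for e, w in top_k_gate(router_logits, k):
        y = experts[e](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out

# Four toy experts that scale their input by 0, 1, 2 and 3.
experts = [lambda x, s=s: [s * v for v in x] for s in range(4)]
gate = top_k_gate([1.0, 3.0, 2.0, 0.0], k=2)  # selects experts 1 and 2
mixed = moe_token([1.0, 1.0], experts, [1.0, 3.0, 2.0, 0.0], k=2)
```

Only k of the experts are evaluated per token, which is where the compute saving of a sparse layer comes from.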

In [None]:
from hypertext.models.moe import MoELayer

moe = MoELayer(d_model=512, d_ff=2048, num_experts=8, k=2)
x = torch.randn(4, 10, 512) # (Batch, Seq, Dim)

output = moe(x)
print(f"MoE Output Shape: {output.shape}")
print("Routing config: top-2 of 8 experts per token.")