# Quantum Transformer Fundamentals

**Quantum Transformer Tutorial 01**

This notebook introduces the Quantum Transformer architecture, in which all attention and feed-forward computations are performed by quantum circuits.

## What You'll Learn

1. Quantum attention via SWAP test
2. Quantum positional encoding
3. Quantum feed-forward networks
4. Building a complete Quantum Transformer

In [None]:
import numpy as np
import torch
import pennylane as qml

from quantum_transformers import (
    QuantumTransformer,
    QuantumTransformerConfig,
    get_info,
)

print(get_info())

## 1. Quantum Attention Mechanism

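Unlike classical attention, which scores query-key pairs with a dot product, quantum attention computes similarity using the **SWAP test**. An ancilla qubit controls a swap of the query and key registers, and the probability of measuring the ancilla in state 0 encodes the squared overlap of the two states: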

```
|0⟩ ──H──●──H── Measure  → P(0) = (1 + |⟨ψ|φ⟩|²)/2
         │
|ψ⟩ ─────X───── (Query)
         │
|φ⟩ ─────X───── (Key)
```
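
To make the formula in the diagram concrete, here is a minimal NumPy sketch that simulates the SWAP test for two single-qubit states. It does not use the `quantum_transformers` package; the function name `swap_test_p0` and the qubit ordering (ancilla, query, key) are assumptions made for illustration.

```python
import numpy as np

def swap_test_p0(psi, phi):
    """Simulate a SWAP test on two single-qubit states; return P(ancilla = 0)."""
    psi = np.asarray(psi, dtype=complex)
    phi = np.asarray(phi, dtype=complex)
    # States must be unit vectors before they can be loaded as amplitudes.
    psi = psi / np.linalg.norm(psi)
    phi = phi / np.linalg.norm(phi)

    # Assumed qubit order: ancilla, query, key (ancilla is the most significant bit).
    ancilla = np.array([1, 0], dtype=complex)
    state = np.kron(ancilla, np.kron(psi, phi))

    # Hadamard on the ancilla only.
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    H_anc = np.kron(H, np.eye(4))

    # Controlled-SWAP: with ancilla = 1, exchange |01> and |10> (indices 5 and 6).
    cswap = np.eye(8, dtype=complex)
    cswap[[5, 6]] = cswap[[6, 5]]

    state = H_anc @ cswap @ H_anc @ state
    # Probability that the ancilla reads 0: the first four amplitudes.
    return float(np.sum(np.abs(state[:4]) ** 2))

p0 = swap_test_p0([0.5, 0.5], [0.5, 0.6])
overlap_sq = 2 * p0 - 1  # invert P(0) = (1 + |<psi|phi>|^2) / 2
print(f"P(0) = {p0:.4f}, |<psi|phi>|^2 = {overlap_sq:.4f}")
```

Identical states give P(0) = 1 and orthogonal states give P(0) = 0.5, so the measured probability always lies between those two values. On hardware, P(0) is estimated from repeated measurements rather than read off exactly as it is here.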

In [None]:
from quantum_transformers.circuits import SwapTestCircuit

# Create SWAP test circuit
swap_test = SwapTestCircuit(n_qubits=2)

# Test with similar states
query = torch.tensor([0.5, 0.5])
key = torch.tensor([0.5, 0.6])

similarity = swap_test(query, key)
print(f"Quantum similarity (SWAP test): {similarity.item():.4f}")
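
A single similarity score is only one entry of an attention matrix. The sketch below shows one way pairwise SWAP-test overlaps could be turned into attention weights. It is a classical NumPy illustration, not the package's implementation: the helper `overlap_attention` is hypothetical, the overlaps are computed exactly instead of being estimated from measurements, and the softmax normalization is an assumption carried over from classical attention.

```python
import numpy as np

def overlap_attention(queries, keys):
    """Row-wise softmax over squared overlaps |<q_i|k_j>|^2 (hypothetical helper)."""
    q = np.asarray(queries, dtype=float)
    k = np.asarray(keys, dtype=float)
    # Amplitude encoding requires unit-norm vectors.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)

    # Exact value that repeated SWAP tests would estimate as 2 * P(0) - 1.
    scores = (q @ k.T) ** 2

    # Softmax over the key axis (assumed normalization, as in classical attention).
    exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

queries = np.array([[0.5, 0.5], [1.0, 0.0]])
keys = np.array([[0.5, 0.6], [0.0, 1.0]])
weights = overlap_attention(queries, keys)
print(weights)
```

Because squared overlaps are bounded in [0, 1], the resulting weights are less peaked than those from unbounded dot-product scores; a real model might rescale the scores before normalizing.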

## 2. Building a Quantum Transformer

In [None]:
# Configure Quantum Transformer
config = QuantumTransformerConfig(
    n_qubits=2,
    n_heads=2,
    n_layers=2,
    d_model=16,
    max_seq_len=32,
)

# Create model
model = QuantumTransformer(config)
print(f"Parameters: {model.count_parameters()}")

In [None]:
# Forward pass
x = torch.randn(2, 8, 16)  # batch=2, seq=8, d=16
output = model(x)

print(f"Input: {x.shape}")
print(f"Output: {output.shape}")

## 3. Quantum Feed-Forward Network

This layer replaces the classical FFN with variational quantum circuits.

In [None]:
from quantum_transformers.layers import QuantumFeedForward

qffn = QuantumFeedForward(d_model=16, n_qubits=2, n_layers=2)

x = torch.randn(4, 8, 16)  # batch=4, seq=8, d=16
output = qffn(x)

print(f"Quantum FFN output: {output.shape}")
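
To give a sense of what a variational quantum circuit computes, here is a small NumPy simulation of a two-qubit circuit with angle-encoded inputs and trainable RY rotations. This is a sketch under stated assumptions, not the circuit inside `QuantumFeedForward`: the ansatz (RY encoding, RY layers, one CNOT per layer, Pauli-Z readout) and the name `variational_block` are chosen for illustration only.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with qubit 0 as control and qubit 1 as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
Z = np.diag([1.0, -1.0])
Z0 = np.kron(Z, np.eye(2))  # Pauli-Z on qubit 0
Z1 = np.kron(np.eye(2), Z)  # Pauli-Z on qubit 1

def variational_block(inputs, weights):
    """Encode two inputs as RY angles, apply trainable layers, return <Z0>, <Z1>."""
    state = np.zeros(4)
    state[0] = 1.0  # start in |00>
    # Angle encoding of the classical inputs (assumed encoding).
    state = np.kron(ry(inputs[0]), ry(inputs[1])) @ state
    # Each layer: one trainable RY per qubit, then an entangling CNOT.
    for layer in weights:
        state = np.kron(ry(layer[0]), ry(layer[1])) @ state
        state = CNOT @ state
    return np.array([state @ Z0 @ state, state @ Z1 @ state])

rng = np.random.default_rng(0)
weights = rng.uniform(0, 2 * np.pi, size=(2, 2))  # 2 layers, 2 qubits
out = variational_block(np.array([0.1, 0.7]), weights)
print(out)
```

Each output is an expectation value in [-1, 1], so a layer built this way typically needs classical projections before and after the circuit to map between `d_model` features and the small number of qubits.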

## Summary

- The Quantum Transformer uses quantum circuits for all attention and feed-forward computations
- The SWAP test computes attention similarity
- Variational circuits implement the feed-forward layers
- The model is end-to-end differentiable via the parameter-shift rule
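
The parameter-shift rule mentioned in the last point can be illustrated on the simplest possible circuit: a single RY rotation followed by a Pauli-Z measurement, whose expectation value is cos(θ). The sketch below is plain NumPy with illustrative function names and is independent of the `quantum_transformers` package.

```python
import numpy as np

def expectation_z(theta):
    """<Z> after applying RY(theta) to |0>, which equals cos(theta)."""
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return state[0] ** 2 - state[1] ** 2

def parameter_shift_grad(f, theta):
    """Gradient from two circuit evaluations at theta +/- pi/2."""
    shift = np.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2

theta = 0.3
grad = parameter_shift_grad(expectation_z, theta)
print(f"parameter-shift: {grad:.6f}, analytic: {-np.sin(theta):.6f}")
```

Unlike a finite-difference estimate, the two shifted evaluations give the exact derivative for rotation gates of this kind, which is why the rule is suitable for training circuits on hardware where only expectation values are accessible.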

**Next**: [02_molecular_prediction.ipynb](02_molecular_prediction.ipynb)