# 🔬 The Shrink Ray: Tensor Networks & MPOs

**Objective:** Understand how **Matrix Product Operators (MPOs)** allow us to compress neural networks by 50% or more.

## The Theory
A standard linear layer in an LLM is a dense matrix $W$ of size $N \times N$.
If $N=4096$, that's roughly **16.8 million** parameters ($4096^2 = 16{,}777{,}216$) for just one layer.

**Multiverse Computing** uses a technique from Quantum Physics called the **Matrix Product Operator (MPO)**. 
1. We **reshape** the matrix into a high-dimensional tensor.
2. We **decompose** it into a chain of small tensors.
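3. We **compress** the connections between neighboring tensors by capping their size (the Bond Dimension $\chi$), which discards the weakest components and treats them as noise.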

This notebook uses the helper library `tensor_lib.py` to demonstrate this structure.
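
The three steps listed above can be sketched in plain NumPy on a small $64 \times 64$ matrix. This is a minimal illustration based on successive truncated SVDs, not the implementation inside `tensor_lib.py`; the function names `matrix_to_mpo` and `mpo_to_matrix` and the core layout `(left bond, row leg, column leg, right bond)` are assumptions made for this example.

```python
import numpy as np

def matrix_to_mpo(W, factors, chi):
    """Split a square matrix into MPO cores of shape (left, d, d, right)."""
    n = len(factors)
    # Step 1: reshape into 2n legs, then interleave so site k owns (row_k, col_k).
    T = W.reshape(tuple(factors) * 2)
    T = T.transpose([axis for k in range(n) for axis in (k, k + n)])
    cores, left = [], 1
    rest = T.reshape(left * factors[0] ** 2, -1)
    for k in range(n - 1):
        d = factors[k]
        # Step 2: peel one site off the chain with an SVD.
        U, S, Vh = np.linalg.svd(rest, full_matrices=False)
        # Step 3: keep at most chi singular values (the bond dimension).
        keep = min(chi, len(S))
        cores.append(U[:, :keep].reshape(left, d, d, keep))
        rest = (S[:keep, None] * Vh[:keep]).reshape(keep * factors[k + 1] ** 2, -1)
        left = keep
    cores.append(rest.reshape(left, factors[-1], factors[-1], 1))
    return cores

def mpo_to_matrix(cores):
    """Contract the chain back into a dense matrix (only for checking)."""
    n = len(cores)
    T = cores[0]
    for core in cores[1:]:
        T = np.tensordot(T, core, axes=1)  # join neighboring cores along their bond
    T = T[0, ..., 0]                       # drop the two dummy boundary bonds
    # Legs are (row_0, col_0, row_1, col_1, ...): gather rows first, then columns.
    T = T.transpose(list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2)))
    dim = int(np.prod(T.shape[:n]))
    return T.reshape(dim, dim)

rng = np.random.default_rng(0)
factors = [4, 4, 4]                        # 4 * 4 * 4 == 64
W = rng.standard_normal((64, 64))

def rel_error(cores):
    return np.linalg.norm(W - mpo_to_matrix(cores)) / np.linalg.norm(W)

exact = matrix_to_mpo(W, factors, chi=16)  # 16 is the largest bond this shape can need
lossy = matrix_to_mpo(W, factors, chi=4)

print("Core shapes (chi=4):", [c.shape for c in lossy])
print(f"Relative error, chi=16: {rel_error(exact):.2e}")
print(f"Relative error, chi=4:  {rel_error(lossy):.2e}")
```

A random matrix like this one has no structure to exploit, so the truncated version loses a lot of accuracy. The compression is only useful to the extent that trained weight matrices are more structured than random noise.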

In [None]:
import torch
import numpy as np
from tensor_lib import TensorNetworkLayer # Our custom library

# Define a standard layer size (e.g., 1024 input -> 1024 output)
input_dim = 1024
output_dim = 1024
original_params = input_dim * output_dim

print(f"Standard Linear Layer Parameters: {original_params:,}")

## 1. Reshaping into Tensors

The first step is to stop thinking of `1024` as a single number. We factor it into smaller dimensions.
$$ 1024 = 4 \times 4 \times 4 \times 4 \times 4 $$

This turns our 2D matrix into a 10-dimensional tensor with 5 input legs and 5 output legs, each of size 4.
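
To make the reshape concrete, here is a small NumPy sketch that performs it on a random $1024 \times 1024$ matrix. The variable names are illustrative, and the leg ordering (rows first, then columns, then interleaved per site) is one common convention assumed here, not necessarily the one `tensor_lib.py` uses.

```python
import numpy as np

factors = [4, 4, 4, 4, 4]                  # 4**5 == 1024
n_sites = len(factors)
dim = int(np.prod(factors))

W = np.random.default_rng(0).standard_normal((dim, dim))

# Split the row index into 5 legs and the column index into 5 legs: 10 legs total.
T = W.reshape(tuple(factors + factors))

# Interleave the legs so that site k holds its own (row_k, column_k) pair.
order = [axis for k in range(n_sites) for axis in (k, k + n_sites)]
T_sites = T.transpose(order)

print("Number of legs:", T.ndim)
print("Tensor shape:  ", T.shape)

# Undo the permutation and the reshape to confirm nothing was lost.
W_back = T_sites.transpose(np.argsort(order)).reshape(dim, dim)
print("Lossless round trip:", np.array_equal(W, W_back))
```

The reshape itself compresses nothing: the tensor holds exactly the same 1,048,576 numbers as the matrix. It only exposes the indices that the later decomposition cuts between.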

In [None]:
tensor_shape = [4, 4, 4, 4, 4]
print(f"Reshaping logic: {input_dim} -> {tensor_shape}")

## 2. Compressing with Bond Dimension ($\chi$)

The **Bond Dimension** controls the compression. It represents how much "information" flows from one tensor in the chain to the next.

* $\chi = 10$: High compression (Used in R1 Slim)
* $\chi = 100$: Low compression
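
The effect of $\chi$ on the parameter count can be estimated by hand. The sketch below assumes each core has shape `(left bond, input leg, output leg, right bond)`, that every interior bond has size exactly $\chi$, and that the two boundary bonds have size 1. The helper `mpo_param_count` is hypothetical; the real `count_parameters()` in `tensor_lib.py` may differ, for instance if it includes a bias or caps the bonds differently.

```python
def mpo_param_count(factors_in, factors_out, chi):
    """Count entries in an MPO chain with a uniform interior bond dimension chi."""
    n = len(factors_in)
    total = 0
    for k, (d_in, d_out) in enumerate(zip(factors_in, factors_out)):
        left = 1 if k == 0 else chi        # dummy bond on the first core
        right = 1 if k == n - 1 else chi   # dummy bond on the last core
        total += left * d_in * d_out * right
    return total

factors = [4, 4, 4, 4, 4]
dense = 1024 * 1024

for chi in (10, 100):
    mpo = mpo_param_count(factors, factors, chi)
    print(f"chi={chi}: {mpo:,} parameters, {dense / mpo:.1f}x smaller than dense")
```

Under these assumptions the two end cores cost $4 \cdot 4 \cdot \chi$ entries each and the three middle cores cost $\chi \cdot 4 \cdot 4 \cdot \chi$ each, so the count grows roughly with $\chi^2$. For $\chi = 10$ that gives $160 + 3 \cdot 1600 + 160 = 5{,}120$ parameters.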

Let's create an MPO layer with $\chi=10$ and see the parameter savings.

In [None]:
# Create the Quantum-Inspired Layer
mpo_layer = TensorNetworkLayer(input_dim, output_dim, tensor_shape, bond_dim=10)

compressed_params = mpo_layer.count_parameters()

print(f"Original Params:   {original_params:,}")
print(f"Compressed Params: {compressed_params:,}")
print(f"Reduction Factor:  {original_params / compressed_params:.1f}x smaller")
print(f"Space Savings:     {100 * (1 - compressed_params/original_params):.1f}%")

## 3. The Forward Pass

Even though it is compressed, this object still functions as a Linear Layer. It takes an input vector and produces an output vector. 
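
One way such a forward pass can work without ever rebuilding the dense matrix is to contract the input with one core at a time. The NumPy sketch below illustrates the idea on random cores and checks the result against the dense product. The helpers `random_mpo`, `mpo_forward` and `mpo_to_dense` are hypothetical names for this example, and the core layout `(left bond, input leg, output leg, right bond)` is an assumption, not a description of `TensorNetworkLayer`.

```python
import numpy as np

def random_mpo(factors, chi, rng):
    """Random cores of shape (left, d_in, d_out, right) with boundary bonds of size 1."""
    bonds = [1] + [chi] * (len(factors) - 1) + [1]
    return [rng.standard_normal((bonds[k], d, d, bonds[k + 1]))
            for k, d in enumerate(factors)]

def mpo_forward(x, cores):
    """Apply the MPO to a batch of row vectors, one core at a time."""
    batch = x.shape[0]
    in_dims = [c.shape[1] for c in cores]
    # State axes: (batch and finished output legs, bond, current input leg, remaining inputs)
    state = x.reshape(batch, 1, in_dims[0], -1)
    for k, core in enumerate(cores):
        # Sum over the incoming bond (l) and the current input leg (i).
        state = np.einsum('plir,lioq->poqr', state, core)
        done, d_out, right, _ = state.shape
        if k + 1 < len(cores):
            # Fold the new output leg into the first axis, expose the next input leg.
            state = state.reshape(done * d_out, right, in_dims[k + 1], -1)
    return state.reshape(batch, -1)

def mpo_to_dense(cores):
    """Build the equivalent dense (in, out) matrix, used only for verification."""
    n = len(cores)
    T = cores[0]
    for core in cores[1:]:
        T = np.tensordot(T, core, axes=1)
    T = T[0, ..., 0]
    T = T.transpose(list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2)))
    dim_in = int(np.prod(T.shape[:n]))
    return T.reshape(dim_in, -1)

rng = np.random.default_rng(0)
cores = random_mpo([4, 4, 4, 4, 4], chi=10, rng=rng)
x = rng.standard_normal((3, 1024))

y = mpo_forward(x, cores)
y_dense = x @ mpo_to_dense(cores)
rel_err = np.linalg.norm(y - y_dense) / np.linalg.norm(y_dense)

print("Output shape:", y.shape)
print(f"Relative difference from the dense product: {rel_err:.2e}")
```

The dense matrix is built here only to verify the result. In actual use, only the cores would be stored and contracted.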

In the **DeepSeek** use case, the SVD surgery (from Notebook 1) would be performed *during* the conversion to this format, filtering out the censorship noise while the compressed chain is being built.

In [None]:
# Create a random input
x = torch.randn(1, input_dim)

# Process it through the tensor network
y = mpo_layer.forward(x)

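print(f"Input Shape:  {x.shape}")
print(f"Output Shape: {y.shape}")

# Only report success if the output has the shape a linear layer would produce
assert y.shape == (1, output_dim), f"Unexpected output shape: {y.shape}"
print("Success! The MPO processed the data.")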