
massimolauri/LatentBridge


LatentBridge: Telepathic Multi-Agent Communication


Disclaimer: This repository is an experimental proof-of-concept created exclusively for personal study and research. It is not intended for production use. The results and capabilities discussed here are hypothetical, and the system has not been rigorously evaluated against standard benchmarks. The code is shared "as is" to explore alternative architectural ideas in neural networks.


Introduction: Multi-Agent Systems (MAS)

In modern AI, a Multi-Agent System (MAS) involves multiple LLM instances (agents) working together to solve complex problems. Typically, agents communicate by generating text and reading each other's outputs. For example, an "Analyzer Agent" writes a long chain-of-thought, and a "Speaker Agent" reads that text to produce a final concise answer. While effective, this text-based communication is slow, consumes large amounts of the context window, and forces models to externalize every single thought into tokens.

LatentBridge proposes a radical alternative: what if agents could communicate telepathically?

LatentBridge is a lightweight, standalone PyTorch implementation of Latent Space Communication for Multi-Agent Systems. It allows two instances of an LLM (e.g., Qwen 3.5 4B) to communicate their "thoughts" without generating visible text tokens. Instead, they share their intermediate neural activations directly.


The Architecture: How It Works

Today, getting a complex answer out of an LLM usually means making it "think out loud" (Chain-of-Thought). This can consume thousands of visible tokens (the <think> blocks), increases response latency, and fills up the context window.

LatentBridge eliminates the need to print text to think: it moves reasoning entirely into the latent (vector) space of the neural network. The architecture relies on two instances of the same base model, assuming two different roles via System Prompts:

  • Agent B (The Thinker): Analyzes the problem deeply in the background.
  • Agent A (The Speaker): Provides the final concise answer.

Here is the step-by-step mechanism of the Parallel Injection:

1. Context Capture (Global Intuition)

Agent B reads the user prompt and performs a forward pass. Instead of letting it generate text, we extract its hidden states (internal neural activations) and keep only the final token position (H_B[:, -1:, :]). Under causal self-attention, the last position is the only one that attends to every token in the prompt, so its hidden state acts as a dense, compressed summary of the whole input. It is a "vector of pure intuition."
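A minimal sketch of the capture step, written in plain Python so it runs without a GPU. The helper names (causal_mask, last_token_state) are illustrative and not taken from the repository; the nested list stands in for a [batch, seq, hidden] tensor.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def last_token_state(hidden):
    """Plain-Python equivalent of hidden[:, -1:, :] on a [batch][seq][dim] nested list."""
    return [seq[-1:] for seq in hidden]


mask = causal_mask(4)
# Rows whose entries are all True are positions that can see every token.
full_rows = [i for i, row in enumerate(mask) if all(row)]

# Toy hidden states: batch of 1, sequence of 3 tokens, hidden size 2.
H_B = [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]]
intuition = last_token_state(H_B)

print(full_rows)   # [3]
print(intuition)   # [[[0.5, 0.6]]]
```

Only the final row of the causal mask is fully populated, which is the reason the last position is the one worth keeping.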

2. Deep Layer Injection (Hooks)

Instead of merging minds at the very beginning (word embeddings) or at the very end (output probabilities), LatentBridge hooks into intermediate layers of the model (e.g., Layers 11, 19, and 27). The working assumption is that these layers carry the more abstract representations: logical structure, complex syntax, and problem-solving.
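The hook mechanism can be illustrated with a toy layer stack in plain Python. This is only a model of the idea: ToyLayer, run, and the message vector are hypothetical stand-ins, not the repository's classes. In PyTorch the same pattern is commonly built on torch.nn.Module.register_forward_hook, where a hook that returns a value replaces the layer's output.

```python
class ToyLayer:
    """Stand-in for a transformer block: adds a constant to every value."""

    def __init__(self, shift):
        self.shift = shift
        self.hook = None  # optional callable(output) -> new output

    def __call__(self, x):
        out = [v + self.shift for v in x]
        # Like a forward hook: if one is registered, its return value replaces the output.
        return self.hook(out) if self.hook is not None else out


def run(layers, x):
    """Pass x through every layer in order."""
    for layer in layers:
        x = layer(x)
    return x


layers = [ToyLayer(1.0) for _ in range(4)]
baseline = run(layers, [0.0, 0.0])

# Hypothetical injection at layer index 2: add a fixed "message" vector.
message = [0.5, -0.5]
layers[2].hook = lambda out: [o + m for o, m in zip(out, message)]
injected = run(layers, [0.0, 0.0])

print(baseline)  # [4.0, 4.0]
print(injected)  # [4.5, 3.5]
```

The layers after the hooked one run unchanged, so the injected signal is processed by the rest of the stack instead of being added to the final output.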

3. The Safety Translation ($W_{ab}$)

To prevent destroying the stable mathematical space of Agent A, LatentBridge uses a trainable neural projection (an MLP or linear matrix, $W_{ab}$). This network acts as a simultaneous translator: it takes B's intuition and remaps it so that it is mathematically "digestible" by Agent A's LayerNorm.
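For the linear variant of the bridge, the projection is a matrix-vector product. The sketch below uses plain Python lists and made-up weights; linear_bridge is an illustrative name, and the real $W_{ab}$ is learned and far larger.

```python
def linear_bridge(W_ab, h_b):
    """Project B's vector into A's space: out[i] = sum_j W_ab[i][j] * h_b[j]."""
    return [sum(w * h for w, h in zip(row, h_b)) for row in W_ab]


h_b = [2.0, 4.0]

# Identity weights leave B's vector untouched.
W_identity = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical learned weights that swap and halve the two coordinates.
W_swap_half = [[0.0, 0.5], [0.5, 0.0]]

print(linear_bridge(W_identity, h_b))   # [2.0, 4.0]
print(linear_bridge(W_swap_half, h_b))  # [2.0, 1.0]
```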

4. The Dynamic Gate (The Smart Valve)

The Dynamic Gate is the central component of the system. Agent A generates one token at a time, and at every generation step the gate computes a value between 0 and 1 by passing a learned score through a Sigmoid function. That value acts as a "valve" controlling the injection:

  • Agent A dynamically decides, token by token, if it needs Agent B's help.
  • If Agent A is writing obvious words ("The", "answer", "is"), the Gate closes near zero to avoid distortion.
  • If Agent A reaches a crucial logical crossroad, the Gate opens toward 1, and the solution representation computed by B flows into A's calculations to guide the choice of the next word.
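The README does not spell out the gate equation, so the following is a sketch under an assumed form: the gate is sigmoid(w_gate · h_A + b_gate) and the injection is added residually as h_A + gate * msg. The names and weight values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_inject(h_a, msg, w_gate, b_gate):
    """Return (gate, new_state) where new_state = h_a + gate * msg."""
    score = sum(w * h for w, h in zip(w_gate, h_a)) + b_gate
    gate = sigmoid(score)
    return gate, [h + gate * m for h, m in zip(h_a, msg)]


msg = [1.0, -1.0]                    # B's projected vector (toy values)
w_gate, b_gate = [4.0, 0.0], -2.0    # hypothetical learned gate parameters

# A state that scores -6: the gate stays almost closed.
g_easy, out_easy = gated_inject([-1.0, 0.3], msg, w_gate, b_gate)
# A state that scores +6: the gate opens almost fully.
g_hard, out_hard = gated_inject([2.0, 0.3], msg, w_gate, b_gate)

print(g_easy < 0.01, g_hard > 0.99)  # True True
```

Because the sigmoid never reaches exactly 0 or 1, the gate attenuates or passes the message without ever switching it off or on completely.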

5. Autoregressive Decay

The more Agent A speaks, the more context its own sentence provides, and the less it needs B's initial intuition. A decay_rate gradually lowers the injection strength token after token, allowing Agent A to finish its sentence independently and fluently.
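The exact decay schedule is not given here. A simple assumption is exponential decay, where the injection at generation step t is scaled by decay_rate ** t. The value 0.9 below is illustrative, not the repository's default.

```python
def injection_scale(step, decay_rate=0.9):
    """Assumed schedule: the scale shrinks geometrically with each generated token."""
    return decay_rate ** step


scales = [injection_scale(t) for t in range(5)]

# Under this assumption the effective injection is gate * scale.
gate = 0.8  # toy gate value
effective = [gate * s for s in scales]

print([round(s, 3) for s in scales])
```

With this schedule the first token receives the full gated injection and each later token receives a smaller share.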


Benchmark Results (GSM8K)

To get a first indication of how the LatentBridge approach behaves, we compared its performance on the GSM8K grade-school math dataset against a standard Textual Multi-Agent Baseline.

Evaluation Methodology

  1. Textual Baseline: Agent B receives the question and generates an explicit textual Chain-of-Thought (CoT). Agent A reads the question and Agent B's CoT text, then generates the final numeric answer.
  2. LatentBridge: Agent B reads the question and processes it internally, without generating any text. Its final hidden states at layers 11, 19, and 27 are extracted and injected into Agent A via the bridge. Agent A generates the final answer guided by this latent intuition.

The test was conducted on 44 reasoning problems, checking for strict numeric exact-match accuracy. On this small sample, LatentBridge improved accuracy, latency, token usage, and confidence, at the cost of a small increase in peak VRAM.
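Strict numeric exact match can be sketched as follows. The scoring code is not shown in this README, so the helpers are hypothetical: last_number takes the last number in the model output as the prediction, and exact_match_accuracy compares it to the reference value.

```python
import re


def last_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose final number equals the reference number."""
    hits = sum(last_number(p) == float(r) for p, r in zip(predictions, references))
    return hits / len(references)


preds = ["The answer is 72", "So she pays 1,250.", "no idea"]
refs = ["72", "1250", "5"]
acc = exact_match_accuracy(preds, refs)
print(round(acc, 3))  # 0.667
```

A real evaluation would need more careful parsing (units, fractions, hyphenated text); this version only covers plain integers and decimals.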

Accuracy & Reasoning Quality

[Figure: Accuracy Comparison]

Accuracy rose from 55.8% to 76.7% (+20.9 percentage points). On this sample, the result suggests that condensing the reasoning into latent communication makes hallucinations and logic errors less likely.

Efficiency: Solving the "Overthinking" Problem

[Figure: Efficiency Comparison]

A major flaw of the Textual Baseline is overthinking: generating excessively long CoT blocks (averaging nearly 1900 tokens). In this test, LatentBridge avoided the problem: because the intuition is processed in the latent space, no CoT text is generated at all, and Agent A only has to produce the concise final answer.

  • Latency: 5.1x faster (from ~184s down to ~36s).
  • Token Usage: Reduced by 81% (down to ~353 tokens).
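The two headline ratios follow directly from the rounded figures quoted above (~184 s vs ~36 s, and ~1900 vs ~353 tokens):

```python
baseline_latency_s, bridge_latency_s = 184.0, 36.0
baseline_tokens, bridge_tokens = 1900.0, 353.0

speedup = baseline_latency_s / bridge_latency_s
token_reduction = 1.0 - bridge_tokens / baseline_tokens

print(f"{speedup:.1f}x faster, {token_reduction:.0%} fewer tokens")
# 5.1x faster, 81% fewer tokens
```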

Resource Utilization & Confidence

[Figures: VRAM Comparison, Confidence Comparison]

  • VRAM: The impact is virtually negligible. Adding LatentBridge increases peak VRAM by only ~3% (+289 MB), a very low cost for the performance gains.
  • Confidence: The model's internal confidence in its generations increased from 87.86% to 92.26%.

Quick Start

This repository provides everything you need to test LatentBridge out of the box with zero API servers or complex setups.

Prerequisites

pip install torch transformers accelerate

Note: The scripts are optimized to use Scaled Dot Product Attention (sdpa) and run completely on GPU (.to("cuda")). An 8GB+ GPU is recommended.

Running the Hello World

To verify that everything is working correctly, you can run the minimal hello_world.py script located in the src folder. This script initializes the base model, loads the latent bridge, and makes Agent A speak by using Agent B's latent intuition.

cd src
python hello_world.py

Training the Bridge

If you want to train your own LatentBridge (or fine-tune it), the repository includes the complete training pipeline. The training script uses Teacher Forcing and freezes the base model, optimizing only the bridge's projections ($W_{ab}$ and gates). This makes training extremely lightweight.
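The "frozen base, trainable bridge" idea can be shown with a one-parameter toy problem in plain Python. This is not the repository's training loop and it leaves out Teacher Forcing; it only demonstrates that gradient updates touch the bridge weight while the base weight stays fixed. In PyTorch the same effect is usually achieved by setting requires_grad = False on the base model's parameters and passing only the bridge's parameters to the optimizer.

```python
# Toy model: prediction = w_base * x + w_bridge * msg
w_base = 2.0    # frozen: never updated
w_bridge = 0.0  # trainable: starts at zero

# Targets are generated with a "true" bridge weight of 0.5.
inputs = [(1.0, 2.0), (2.0, -1.0), (0.5, 4.0), (-1.0, 1.0)]
data = [(x, m, 2.0 * x + 0.5 * m) for x, m in inputs]

lr = 0.05
for epoch in range(200):
    for x, m, target in data:
        pred = w_base * x + w_bridge * m
        # Gradient of the squared error with respect to w_bridge only.
        grad_bridge = 2.0 * (pred - target) * m
        w_bridge -= lr * grad_bridge

print(w_base, round(w_bridge, 6))  # 2.0 0.5
```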

To run a training session using the default reasoning_dataset.json:

cd src
python train.py

CLI Parameters

You can customize the training run via command line arguments without modifying config.py:

| Parameter | Description | Default |
| --- | --- | --- |
| --source | Path to the custom JSON dataset. | reasoning_dataset.json |
| --model | Override the base model name/path. | Qwen/Qwen3.5-4B |
| --layers | Space-separated list of target layers for injection. | 11 19 27 |
| --bridge-type | Neural architecture of the bridge (mlp or linear). | mlp |
| --lr | Learning rate for the optimizer. | 5e-4 |
| --epochs | Number of training epochs. | 3 |
| --max-steps | Force stop after N steps (0 = no limit). | 5000 |
| --checkpoint-dir | Directory to save .pt checkpoint weights. | checkpoints |
| --resume | Path to a .pt file to resume training from. | None |

For example, to train for 1000 steps with a specific learning rate:

python train.py --source my_dataset.json --lr 1e-4 --max-steps 1000
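For readers who want to see how such a command line maps to values, here is an argparse sketch that mirrors the documented flags and defaults. It is a reconstruction from the parameter list, not the actual contents of train.py; the argument types and the build_parser name are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented train.py flags."""
    p = argparse.ArgumentParser(description="Sketch of the LatentBridge training CLI")
    p.add_argument("--source", default="reasoning_dataset.json")
    p.add_argument("--model", default="Qwen/Qwen3.5-4B")
    p.add_argument("--layers", type=int, nargs="+", default=[11, 19, 27])
    p.add_argument("--bridge-type", choices=["mlp", "linear"], default="mlp")
    p.add_argument("--lr", type=float, default=5e-4)
    p.add_argument("--epochs", type=int, default=3)
    p.add_argument("--max-steps", type=int, default=5000)
    p.add_argument("--checkpoint-dir", default="checkpoints")
    p.add_argument("--resume", default=None)
    return p


# Same arguments as the example command above.
args = build_parser().parse_args(
    ["--source", "my_dataset.json", "--lr", "1e-4", "--max-steps", "1000"]
)
print(args.source, args.lr, args.max_steps, args.layers, args.bridge_type)
```

Flags that are not passed keep their defaults, so this run would still target layers 11, 19, and 27 with the mlp bridge.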

Technical Details

  • Base Model: Qwen/Qwen3.5-4B
  • Training Methodology (Knowledge Distillation): The neural bridge (bridge_weights.pt) was trained by distilling the explicit Chain-of-Thought (CoT) reasoning from Claude Opus 4.6. The MLP projections and dynamic gates learned to map the complex, multi-step logical deductions generated by Opus 4.6 directly into the latent space of the 4B model.
  • Trainable Parameters: The bridge_weights.pt contains the trained MLP projections and dynamic gates for layers 11, 19, 27.
