LatentBridge: Telepathic Multi-Agent Communication

Disclaimer: This repository is an experimental proof-of-concept created exclusively for personal study and research. It is not intended for production use. The results and capabilities discussed here are hypothetical, and the system has not been rigorously evaluated against standard benchmarks. The code is shared "as is" to explore alternative architectural ideas in neural networks.

Introduction: Multi-Agent Systems (MAS)

In modern AI, a Multi-Agent System (MAS) involves multiple LLM instances (agents) working together to solve complex problems. Typically, agents communicate by generating text and reading each other's outputs. For example, an "Analyzer Agent" writes a long chain-of-thought, and a "Speaker Agent" reads that text to produce a final concise answer. While effective, this text-based communication is slow, consumes large amounts of the context window, and forces models to externalize every single thought into tokens.

LatentBridge proposes a radical alternative: what if agents could communicate telepathically?

LatentBridge is a lightweight, standalone PyTorch implementation of Latent Space Communication for Multi-Agent Systems. It allows two instances of an LLM (e.g., Qwen 3.5 4B) to communicate their "thoughts" without generating visible text tokens. Instead, they share their intermediate neural activations directly.

The Architecture: How It Works

Today, getting an LLM to give a complex answer usually means having it "think out loud" (Chain-of-Thought). This consumes thousands of visible tokens (the <think> blocks), adds latency, and fills up the context window.

LatentBridge eliminates the need to print text to think: it moves reasoning entirely into the latent (vector) space of the neural network. The architecture relies on two instances of the same base model, assuming two different roles via System Prompts:

Agent B (The Thinker): Analyzes the problem deeply in the background.
Agent A (The Speaker): Provides the final concise answer.

Here is the step-by-step mechanism of the Parallel Injection:

1. Context Capture (Global Intuition)

Agent B reads the user prompt and performs a forward pass. Instead of having it generate text, we extract its hidden states (internal neural activations). Specifically, we capture only the final token position (H_B[:, -1:, :]). Under causal self-attention, the final position is the only one that can attend to every token of the prompt, so its hidden state acts as a dense, compressed summary of the whole input. It is a "vector of pure intuition."
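The capture step can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `capture_last_token` and the toy tensor sizes are assumptions, and random tensors stand in for Agent B's real activations. With Hugging Face Transformers, calling a model with `output_hidden_states=True` returns a comparable per-layer tuple of `(batch, seq, hidden)` tensors; whether layer numbers are counted from the embedding output or from the first block is a convention this sketch does not settle.

```python
import torch

def capture_last_token(hidden_states, layers):
    """Return {layer: tensor of shape (batch, 1, hidden)} for the chosen layers.

    `hidden_states` is a sequence of per-layer tensors shaped (batch, seq, hidden).
    The slice [:, -1:, :] keeps the sequence dimension so the result can later be
    broadcast against Agent A's activations.
    """
    return {i: hidden_states[i][:, -1:, :].detach() for i in layers}

# Stand-in for Agent B's per-layer activations (hypothetical sizes):
# 29 tensors of shape (batch=1, seq=6, hidden=8).
torch.manual_seed(0)
fake_hidden_states = tuple(torch.randn(1, 6, 8) for _ in range(29))

h_b = capture_last_token(fake_hidden_states, layers=(11, 19, 27))
shapes = {layer: tuple(t.shape) for layer, t in h_b.items()}
print(shapes)
```

If prompts are batched with right padding, the last position may be a padding token, so a real implementation would need to index the last non-padded position instead.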

2. Deep Layer Injection (Hooks)

Instead of merging minds at the very beginning (word embeddings) or at the very end (probabilities), LatentBridge hooks into the deep intermediate layers of the model (e.g., Layers 11, 19, and 27). The working assumption is that these layers are where the network handles logical abstraction, complex syntax, and problem-solving.
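A forward hook is one way to inject a vector at an intermediate layer. The sketch below uses a toy stack of linear layers rather than the real model, and `make_hook`, the layer index, and the tensor sizes are all illustrative assumptions. It relies only on `register_forward_hook`, whose hook may return a value that replaces the layer's output.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy "model": four blocks standing in for transformer layers (hypothetical sizes).
model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])
x = torch.randn(1, 5, 8)          # (batch, seq, hidden)
injected = torch.randn(1, 1, 8)   # stand-in for B's projected last-token state

def make_hook(vector, scale=1.0):
    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the layer's output.
        # `vector` has seq length 1, so it broadcasts over every position.
        return output + scale * vector
    return hook

baseline = model(x)
handle = model[2].register_forward_hook(make_hook(injected))
steered = model(x)
handle.remove()                   # detach the hook once generation is done
restored = model(x)
```

Decoder layers in Hugging Face models often return a tuple rather than a bare tensor, in which case the hook would need to modify the first element and return the rebuilt tuple.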

3. The Safety Translation ($W_{ab}$)

To avoid disrupting Agent A's stable activation space, LatentBridge uses a trainable neural projection (an MLP or linear matrix, $W_{ab}$). This network acts as a simultaneous translator: it takes B's intuition and remaps it so that it is mathematically "digestible" by Agent A's normalization layers.
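The two projection variants could look roughly like this. The layer sizes, the GELU activation, and the trailing `nn.LayerNorm` are assumptions made for the sketch; only the `mlp` / `linear` choice comes from this README.

```python
import torch
from torch import nn

def build_projection(hidden_size, bridge_type="mlp"):
    """Hypothetical W_ab: maps B's hidden state into A's hidden space."""
    if bridge_type == "linear":
        return nn.Linear(hidden_size, hidden_size)
    if bridge_type == "mlp":
        return nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
            nn.LayerNorm(hidden_size),  # assumption: bound the output scale
        )
    raise ValueError(f"unknown bridge_type: {bridge_type!r}")

torch.manual_seed(0)
w_ab = build_projection(hidden_size=8, bridge_type="mlp")
h_b_last = torch.randn(1, 1, 8) * 50.0   # deliberately large activations
projected = w_ab(h_b_last)
print(projected.shape, projected.abs().max().item())
```

Normalizing the projected vector is one simple way to keep an out-of-scale input from B from swamping A's activations; a trained bridge may learn a suitable scale without it.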

4. The Dynamic Gate (The Smart Valve)

This is the core of the system. During generation, Agent A produces one token at a time. For every generated token, the Dynamic Gate computes a value between 0 and 1 with a Sigmoid function, which acts as a "valve" controlling how much of Agent B's signal is injected:

Agent A dynamically decides, token by token, if it needs Agent B's help.
If Agent A is writing obvious words ("The", "answer", "is"), the Gate closes near zero to avoid distortion.
If Agent A reaches a crucial logical crossroad, the Gate opens toward 1, and the mathematical solution thought by B flows into A's calculations to guide the correct word.
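One plausible form of such a gate is sketched below. The exact gate equation is not given in this README, so the formula here (a sigmoid over a learned linear score of A's state concatenated with B's projected state) and the class name `DynamicGate` are assumptions.

```python
import torch
from torch import nn

class DynamicGate(nn.Module):
    """Hypothetical per-token gate: g = sigmoid(w . [h_a ; p_b] + b)."""

    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(2 * hidden_size, 1)

    def forward(self, h_a, p_b):
        # h_a: (batch, seq, hidden) Agent A's states at the hooked layer.
        # p_b: (batch, 1, hidden) Agent B's projected last-token state.
        p_b = p_b.expand(-1, h_a.size(1), -1)
        gate = torch.sigmoid(self.score(torch.cat([h_a, p_b], dim=-1)))  # (batch, seq, 1)
        return h_a + gate * p_b, gate

torch.manual_seed(0)
gate_layer = DynamicGate(hidden_size=8)
h_a = torch.randn(1, 4, 8)
p_b = torch.randn(1, 1, 8)
mixed, gate = gate_layer(h_a, p_b)
print(gate.squeeze(-1))  # one value in (0, 1) per token position
```

A gate near 0 leaves Agent A's state almost unchanged, while a gate near 1 adds B's projected vector at close to full strength.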

5. Autoregressive Decay

The more Agent A speaks, the more context its own sentence provides, and the less it needs B's initial intuition. A decay_rate gradually lowers the injection strength token after token, allowing Agent A to finish its sentence on its own.
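As an illustration, the decay can be modeled as an exponential schedule. The exponential form, the default value of 0.9, and the `floor` parameter are assumptions; the README only states that a decay_rate lowers the injection over successive tokens.

```python
def injection_scale(step, decay_rate=0.9, floor=0.0):
    """Multiplier applied to the injected vector at generation step `step` (0-based)."""
    if not 0.0 < decay_rate <= 1.0:
        raise ValueError("decay_rate must be in (0, 1]")
    return max(floor, decay_rate ** step)

schedule = [round(injection_scale(t, decay_rate=0.5), 4) for t in range(5)]
print(schedule)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

In a full pipeline this multiplier would scale the gated injection at each generated token, so the first tokens receive most of B's signal.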

Benchmark Results (GSM8K)

As a preliminary check of the LatentBridge approach, we evaluated its performance on the GSM8K mathematical dataset against a standard Textual Multi-Agent Baseline.

Evaluation Methodology

Textual Baseline: Agent B receives the question and generates an explicit textual Chain-of-Thought (CoT). Agent A reads the question and Agent B's CoT text, then generates the final numeric answer.
LatentBridge: Agent B reads the question and processes it internally, without generating any text. Its final hidden states at layers 11, 19, and 27 are extracted and injected into Agent A via the bridge. Agent A generates the final answer guided by this latent intuition.

The test was conducted on 44 reasoning problems, checking for strict numeric exact-match accuracy. The results show improvements in accuracy, latency, token usage, and confidence, at the cost of a small increase in peak VRAM.
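Numeric exact-match scoring can be sketched as below. The extraction rule (take the last number in the text, ignoring thousands separators) is an assumption about how "strict numeric exact-match" might be implemented, and the function names are illustrative. GSM8K reference answers end with `#### <number>`, which this rule handles.

```python
import re

_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def extract_final_number(text):
    """Return the last number in `text` as a float, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def exact_match(prediction, reference):
    """True only when both texts contain a number and the final numbers are equal."""
    pred = extract_final_number(prediction)
    ref = extract_final_number(reference)
    return pred is not None and ref is not None and pred == ref

def accuracy(predictions, references):
    """Fraction of predictions whose final number matches the reference."""
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)

preds = ["The answer is 1,234.", "Roughly 18 dollars", "I am not sure"]
refs = ["#### 1234", "#### 17", "#### 5"]
print(accuracy(preds, refs))
```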

Accuracy & Reasoning Quality

Accuracy rose from 55.8% to 76.7% (an absolute increase of 20.9 percentage points). In this small test, latent communication appears to reduce hallucinations and logic errors by condensing the reasoning.

Efficiency: Solving the "Overthinking" Problem

A major flaw of the Textual Baseline is overthinking: generating excessively long CoT blocks (averaging nearly 1900 tokens). In this test, LatentBridge avoided the problem: because the intuition is processed in the latent space, Agent A only has to produce a short, synthesized answer.

Latency: 5.1x faster (from ~184s down to ~36s).
Token Usage: Reduced by 81% (down to ~353 tokens).
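The two efficiency figures follow directly from the approximate averages quoted in this section, as this short check shows. The variable names are illustrative.

```python
# Approximate averages quoted in this section.
baseline_latency_s, bridge_latency_s = 184.0, 36.0
baseline_tokens, bridge_tokens = 1900, 353

speedup = baseline_latency_s / bridge_latency_s
token_reduction = 1.0 - bridge_tokens / baseline_tokens

print(f"speedup: {speedup:.1f}x")                 # speedup: 5.1x
print(f"token reduction: {token_reduction:.0%}")  # token reduction: 81%
```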

Resource Utilization & Confidence

VRAM: The impact is virtually negligible. Adding LatentBridge increases peak VRAM by only ~3% (+289 MB), a very low cost for the performance gains.
Confidence: The model's internal confidence in its generations increased from 87.86% to 92.26%.

Quick Start

This repository provides everything you need to test LatentBridge out of the box with zero API servers or complex setups.

Prerequisites

pip install torch transformers accelerate

Note: The scripts are optimized to use Scaled Dot Product Attention (sdpa) and run completely on GPU (.to("cuda")). An 8GB+ GPU is recommended.

Running the Hello World

To verify that everything is working correctly, you can run the minimal hello_world.py script located in the src folder. This script initializes the base model, loads the latent bridge, and makes Agent A speak by using Agent B's latent intuition.

cd src
python hello_world.py

Training the Bridge

If you want to train your own LatentBridge (or fine-tune it), the repository includes the complete training pipeline. The training script uses Teacher Forcing and freezes the base model, optimizing only the bridge's projections ($W_{ab}$ and gates). This makes training extremely lightweight.
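The freeze-and-train pattern described above can be sketched with toy modules. This is not the repository's train.py: the two `nn.Linear` stand-ins, the AdamW optimizer, and the random data are assumptions chosen to keep the example small. The point is only that gradients flow through the frozen base into the bridge, and that the optimizer updates the bridge alone.

```python
import torch
from torch import nn

torch.manual_seed(0)
base = nn.Linear(8, 16)    # stand-in for the frozen base model (8-dim state -> 16-token vocab)
bridge = nn.Linear(8, 8)   # stand-in for the trainable bridge

for p in base.parameters():
    p.requires_grad_(False)   # freeze the base model

# Only the bridge's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(bridge.parameters(), lr=5e-4)

x = torch.randn(4, 8)                    # toy hidden states
targets = torch.randint(0, 16, (4,))     # toy teacher-forced next-token ids

base_before = base.weight.clone()
bridge_before = bridge.weight.clone()

# Inject the bridge output into the state, then score against the known targets.
logits = base(x + bridge(x))
loss = nn.functional.cross_entropy(logits, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```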

To run a training session using the default reasoning_dataset.json:

cd src
python train.py

CLI Parameters

You can customize the training run via command line arguments without modifying config.py:

| Parameter | Description | Default |
|---|---|---|
| `--source` | Path to the custom JSON dataset. | `reasoning_dataset.json` |
| `--model` | Override the base model name/path. | `Qwen/Qwen3.5-4B` |
| `--layers` | Space-separated list of target layers for injection. | `11 19 27` |
| `--bridge-type` | Neural architecture of the bridge (`mlp` or `linear`). | `mlp` |
| `--lr` | Learning rate for the optimizer. | `5e-4` |
| `--epochs` | Number of training epochs. | `3` |
| `--max-steps` | Force stop after N steps (0 = no limit). | `5000` |
| `--checkpoint-dir` | Directory to save `.pt` checkpoint weights. | `checkpoints` |
| `--resume` | Path to a `.pt` file to resume training from. | `None` |

For example, to train for 1000 steps with a specific learning rate:

python train.py --source my_dataset.json --lr 1e-4 --max-steps 1000
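For reference, a parser that matches the table above might look like the following. This is a reconstruction from the documented flags and defaults, not a copy of train.py, whose actual argument handling is not shown here.

```python
import argparse

def build_parser():
    """Illustrative parser mirroring the documented flags and defaults."""
    p = argparse.ArgumentParser(description="Train a LatentBridge (illustrative parser)")
    p.add_argument("--source", default="reasoning_dataset.json")
    p.add_argument("--model", default="Qwen/Qwen3.5-4B")
    p.add_argument("--layers", type=int, nargs="+", default=[11, 19, 27])
    p.add_argument("--bridge-type", choices=["mlp", "linear"], default="mlp")
    p.add_argument("--lr", type=float, default=5e-4)
    p.add_argument("--epochs", type=int, default=3)
    p.add_argument("--max-steps", type=int, default=5000)  # 0 = no limit
    p.add_argument("--checkpoint-dir", default="checkpoints")
    p.add_argument("--resume", default=None)
    return p

# Same arguments as the example command above.
args = build_parser().parse_args(
    ["--source", "my_dataset.json", "--lr", "1e-4", "--max-steps", "1000"]
)
print(args.source, args.lr, args.max_steps, args.layers, args.bridge_type)
```

Flags that are not passed keep their defaults, so this run would still target layers 11, 19, and 27 with the `mlp` bridge.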

Technical Details

Base Model: Qwen/Qwen3.5-4B
Training Methodology (Knowledge Distillation): The neural bridge (bridge_weights.pt) was trained by distilling the explicit Chain-of-Thought (CoT) reasoning from Claude Opus 4.6. The MLP projections and dynamic gates learned to map the complex, multi-step logical deductions generated by Opus 4.6 directly into the latent space of the 4B model.
Trainable Parameters: The bridge_weights.pt contains the trained MLP projections and dynamic gates for layers 11, 19, 27.
