# Plasmid LoRA Swarm – Mathematical Foundations

This notebook presents the core mathematical framework used in the Plasmid LoRA Swarm project.

## 1. Linear Algebra for LoRA
- Low-rank decomposition: A ∈ ℝ^{r×n}, B ∈ ℝ^{m×r}, ΔW = B A ∈ ℝ^{m×n}.
- Compositional merging: weighted deltas, SVD-based rank-k re-projection.
- Subspace analysis: principal angles, cos^2 overlap, effective rank.

## 2. Information Theory
- Cross-entropy, KL and JS divergences on logits.
- Mutual information across (Agent, Domain) presence; information gain per round.
- PID-lite proxies: co-occurrence excess, synergy/redundancy aggregates.

## 3. Graph Theory
- Topologies: Erdős–Rényi, Watts–Strogatz, Barabási–Albert.
- Spectral gap (λ₂) and heuristic diffusion predictor t ~ C log n / λ₂.
- Rounds-to-diffuse, coverage/entropy dynamics.

## 4. Optimization and Composition
- Alternating train–merge schedules; trust-region scaling (‖Δ‖_F ≤ τ).
- Fisher-weighted merging (diagonal Fisher proxies).

## 5. Security and Alignment
- Policy gate: ranks/targets/size/SHA/signatures with quorum.
- Behavioral probes: trigger rate, clean accuracy deltas.
- Reputation gating and verifiable audit logs (hash chain).

## 6. Pointers to Code
- See docs/module_links.md for mapping from concepts to modules.

> This notebook is a living document; fill in derivations and proofs as experiments evolve.


## LoRA Weighted Merge and Re-projection Bounds

Let the base weights be W₀ and let adapter i produce the delta ΔWᵢ = BᵢAᵢ (rank at most rᵢ). The weighted composition is:

\[ W' = W_0 + \sum_i w_i \Delta W_i. \]

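Writing ΔW = Σᵢ wᵢ ΔWᵢ for the merged delta, re-projection to rank k chooses ΔW*_k = argmin_{X: rank(X)≤k} ‖ΔW − X‖_F = U_k S_k V_k^T, the truncated SVD of ΔW (Eckart–Young–Mirsky). With σ₁ ≥ σ₂ ≥ … the singular values of ΔW, the re-projection error is exactly:

\[ \|\Delta W - \Delta W_k^*\|_F^2 = \sum_{j>k} \sigma_j^2. \]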

The trust region rescales ΔW by a factor α ≤ τ/‖ΔW‖_F, for example α = min(1, τ/‖ΔW‖_F), which enforces ‖αΔW‖_F ≤ τ.

Assumptions for stability: the layer Jacobians have bounded operator norms under small ΔW, and composition is approximately additive when adapter subspaces overlap only weakly (principal angles close to π/2). See `plora/la_utils.py` for principal angles and effective rank.
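
The merge, rank-k re-projection, and trust-region steps above can be checked numerically. The following is a minimal NumPy sketch on random toy adapters; the sizes, weights, and τ are illustrative assumptions, and it does not reproduce the project's own merge code.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, k = 32, 48, 4, 6  # toy sizes (assumptions)

# Three toy adapters Delta_i = B_i @ A_i with B_i in R^{m x r}, A_i in R^{r x n}.
adapters = [(rng.standard_normal((m, r)), rng.standard_normal((r, n))) for _ in range(3)]
weights = [0.5, 0.3, 0.2]

# Weighted composition: Delta_tot = sum_i w_i B_i A_i (rank at most 3 * r = 12).
delta_tot = sum(w * (B @ A) for w, (B, A) in zip(weights, adapters))

# Rank-k re-projection via truncated SVD.
U, s, Vt = np.linalg.svd(delta_tot, full_matrices=False)
delta_k = (U[:, :k] * s[:k]) @ Vt[:k]

# EYM: the squared Frobenius error equals the sum of the discarded sigma_j^2.
err_sq = np.linalg.norm(delta_tot - delta_k, "fro") ** 2
tail_sq = float(np.sum(s[k:] ** 2))
print(f"reprojection error^2 = {err_sq:.4f}, tail sum = {tail_sq:.4f}")

# Trust region: shrink only when the Frobenius norm exceeds tau.
tau = 1.0
alpha = min(1.0, tau / np.linalg.norm(delta_k, "fro"))
delta_tr = alpha * delta_k
print(f"alpha = {alpha:.4f}, scaled norm = {np.linalg.norm(delta_tr, 'fro'):.4f}")
```

The two printed error values should agree to floating-point precision, which is the EYM identity for the merged delta.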



## Mixing-time and Spectral Gap

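Consider an undirected connected graph with normalized Laplacian L = I − D^{-1/2} A D^{-1/2} and eigenvalues 0=λ₁ ≤ λ₂ ≤ … ≤ λₙ. For the lazy random walk, the mixing time t_mix(ε) is bounded in terms of λ₂ and the conductance Φ:

- Cheeger: (1/2)Φ² ≤ λ₂ ≤ 2Φ
- t_mix(ε) ≤ (2/λ₂) log(1/(ε π_min)), where π_min is the smallest stationary probability; the factor 2 appears because the lazy walk has spectral gap λ₂/2.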

Heuristic for push–pull gossip rounds: \( t_\text{diffuse} \approx C \frac{\log n}{\lambda_2} \). We calibrate C per topology in `scripts/validate_bounds.py`.
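
As a rough illustration of how λ₂ enters the heuristic, the sketch below computes λ₂ of the normalized Laplacian for two small graphs and evaluates C log n / λ₂. It assumes the normalized-Laplacian convention, which may differ from what `swarm.metrics.spectral_gap` uses, and C = 1 is a placeholder rather than a calibrated constant. The helper names are hypothetical.

```python
import numpy as np

def lambda2_normalized(adj):
    """Second-smallest eigenvalue of L = I - D^{-1/2} A D^{-1/2} (connected graph assumed)."""
    adj = np.asarray(adj, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return float(np.linalg.eigvalsh(lap)[1])  # eigvalsh returns ascending eigenvalues

def ring_graph(n):
    """Adjacency matrix of the n-cycle."""
    adj = np.zeros((n, n))
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = 1.0
    adj[(idx + 1) % n, idx] = 1.0
    return adj

def complete_graph(n):
    """Adjacency matrix of the complete graph on n nodes."""
    return np.ones((n, n)) - np.eye(n)

def predicted_rounds(adj, C=1.0):
    """Heuristic t_diffuse ~ C log(n) / lambda_2; C = 1 is a placeholder."""
    return C * np.log(len(adj)) / lambda2_normalized(adj)

n = 16
for name, adj in [("ring", ring_graph(n)), ("complete", complete_graph(n))]:
    print(f"{name:9s} lambda2={lambda2_normalized(adj):.3f} "
          f"predicted rounds={predicted_rounds(adj):.1f}")
```

The ring has a much smaller λ₂ than the complete graph, so the heuristic predicts far more rounds for it at the same n.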



## Information Flow: Inter-model MI and Transfer Entropy

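We proxy I(M;Y), the mutual information between the model identity M and the emitted token Y, by the Jensen–Shannon divergence between the next-token distributions of two models, averaged over token positions. When M is uniform over the two models, the per-position JS divergence equals I(M;Y) exactly, so the proxy measures how informative the choice of model is about the emitted token.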
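
For a uniform choice between two models, the JS divergence of their next-token distributions coincides with I(M;Y). The sketch below checks this on two random toy distributions, computing JS from its KL definition and the mutual information from H(M) + H(Y) − H(M,Y). The vocabulary size and logits are made up for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def kl(p, q):
    """KL(P || Q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """JS(P, Q) = KL(P || M) / 2 + KL(Q || M) / 2 with M = (P + Q) / 2."""
    mix = 0.5 * (p + q)
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

def mi_model_token(p, q):
    """I(M;Y) for M uniform on two models, Y | M=0 ~ p and Y | M=1 ~ q."""
    joint = 0.5 * np.stack([p, q])  # shape (2, vocab)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
p = softmax(rng.standard_normal(50))  # toy next-token distribution, model 0
q = softmax(rng.standard_normal(50))  # toy next-token distribution, model 1

js_val = js_divergence(p, q)
mi_val = mi_model_token(p, q)
print(f"JS = {js_val:.6f} nats, I(M;Y) = {mi_val:.6f} nats")
```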

Transfer entropy across rounds is approximated by the increase in R² when adding lagged source series to a regression predicting the target series. See `plora.metrics.transfer_entropy_proxy`.
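
One way to read the R² proxy is as a Granger-style comparison of two nested least-squares fits. The sketch below is an assumption about the construction, not the code behind `plora.metrics.transfer_entropy_proxy`. In particular, including the target's own lags in the baseline regression is a choice made here, and `te_proxy_r2` is a hypothetical name.

```python
import numpy as np

def lagged(x, k):
    """Columns x[t-1], ..., x[t-k], aligned with the targets x[k:]."""
    return np.column_stack([x[k - j : len(x) - j] for j in range(1, k + 1)])

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

def te_proxy_r2(source, target, k=1):
    """Gain in R^2 from adding k source lags to a regression on k target lags."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    y = target[k:]
    own = lagged(target, k)
    both = np.column_stack([own, lagged(source, k)])
    return r_squared(both, y) - r_squared(own, y)

# Toy driven process: a is white noise and b follows a with one step of delay.
rng = np.random.default_rng(0)
n = 2000
a = rng.standard_normal(n)
b = np.zeros(n)
for t in range(1, n):
    b[t] = 0.8 * a[t - 1] + 0.2 * b[t - 1] + 0.1 * rng.standard_normal()

te_ab = te_proxy_r2(a, b)
te_ba = te_proxy_r2(b, a)
print(f"proxy A->B: {te_ab:.3f}, proxy B->A: {te_ba:.3f}")
```

Because the two regressions are nested, the gain is non-negative up to rounding, and it should be large only in the A→B direction for this toy process.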



## Adapter-Set Diagnostics

For a set of adapters {ΔWᵢ}, compute:
- Principal angles between the row spaces of Aᵢ and Aⱼ (equivalently, the column spans of Aᵢᵀ and Aⱼᵀ in ℝⁿ).
- Mean cos² subspace overlap and the effective rank of each ΔWᵢ.

These reveal interference vs complementarity. See `plora/la_utils.py` and add summary plots via `scripts/plot_figures.py`.
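
A minimal sketch of these diagnostics, independent of `plora/la_utils.py`: principal angles via QR followed by an SVD of the cross-Gram matrix, the mean cos² overlap, and an entropy-based effective rank. The effective-rank definition used here (exponential of the entropy of the normalized singular values) is one common choice and is an assumption about what the project computes.

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles (radians, ascending) between the column spans of X and Y."""
    Qx, _ = np.linalg.qr(X)  # orthonormal basis; assumes full column rank
    Qy, _ = np.linalg.qr(Y)
    cosines = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.arccos(np.clip(cosines, 0.0, 1.0))

def mean_cos2_overlap(X, Y):
    """Mean squared cosine of the principal angles: 1 = same subspace, 0 = orthogonal."""
    return float(np.mean(np.cos(principal_angles(X, Y)) ** 2))

def effective_rank(M):
    """exp(entropy) of the singular values normalized to sum to one."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

# Toy adapters: A_i has shape (r, n), so its row space is the column span of A_i.T.
rng = np.random.default_rng(0)
r, n = 4, 64
A1 = rng.standard_normal((r, n))
A2 = rng.standard_normal((r, n))

angles_deg = np.degrees(principal_angles(A1.T, A2.T))
print("principal angles (deg):", np.round(angles_deg, 1))
print("mean cos^2 overlap:", round(mean_cos2_overlap(A1.T, A2.T), 4))
print("effective rank of A1:", round(effective_rank(A1), 3))
```

For random low-rank adapters in a much larger ambient dimension the angles tend to sit near 90 degrees, which corresponds to the complementary regime.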


## LoRA Derivations (EYM, Composition, Principal Angles)

Let a base parameter matrix be W₀ and an adapter induce a low-rank perturbation ΔW = B A with A ∈ ℝ^{r×n}, B ∈ ℝ^{m×r}.

- Eckart–Young–Mirsky (EYM): For any k < rank(ΔW), the best rank-k approximation in Frobenius norm is ΔW_k = U_k S_k V_k^T where U,S,V are from SVD(ΔW). Error: \(\|\Delta W - \Delta W_k\|_F^2 = \sum_{j>k} \sigma_j^2.\)
- Weighted composition of multiple adapters i with weights w_i:
  \[ W' = W_0 + \sum_i w_i \Delta W_i,\quad \Delta W_\mathrm{tot} = \sum_i w_i \Delta W_i. \]
  The reprojection error obeys EYM on ΔW_tot.
- Stability under small updates: If \(\|\Delta W\|\) is small and layer Jacobians have bounded operator norms, the output drift is \(O(\|\Delta W\|)\). A Frobenius trust region enforces \(\|\Delta W\|_F \le \tau\).
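- Composition interference bound: If the row spaces of the \(A_i\) are nearly orthogonal (principal angles \(\theta_j\) close to \(\pi/2\)), then \(\|\sum_i w_i B_i A_i\|_F^2 \approx \sum_i w_i^2 \|B_i A_i\|_F^2\). The neglected cross terms \(2 \sum_{i<j} w_i w_j \langle B_i A_i, B_j A_j \rangle_F\) are each bounded in magnitude by \(2\,|w_i w_j| \cos\theta_{\min}\, \|B_i A_i\|_F \|B_j A_j\|_F\), where \(\theta_{\min}\) is the smallest principal angle between the two row spaces.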
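
The interference claim can be checked on toy matrices. The squared norm of a weighted sum splits into the additive part plus Frobenius inner-product cross terms, and a cross term vanishes when the two row spaces are orthogonal. The sketch below builds one orthogonal pair and one generic pair; all sizes and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 16, 40, 3
w1, w2 = 0.6, 0.4

def frob_inner(X, Y):
    """Frobenius inner product <X, Y>_F."""
    return float(np.sum(X * Y))

# Orthogonal row spaces: split one orthonormal basis into two disjoint blocks.
Q, _ = np.linalg.qr(rng.standard_normal((n, 2 * r)))
A1, A2 = Q[:, :r].T, Q[:, r:].T
B1, B2 = rng.standard_normal((m, r)), rng.standard_normal((m, r))
D1, D2 = B1 @ A1, B2 @ A2

lhs = np.linalg.norm(w1 * D1 + w2 * D2) ** 2
additive = w1**2 * np.linalg.norm(D1) ** 2 + w2**2 * np.linalg.norm(D2) ** 2
cross = 2 * w1 * w2 * frob_inner(D1, D2)
print(f"orthogonal pair: cross term = {cross:.2e}")

# Generic pair: the cross term is bounded via the smallest principal angle.
A3 = rng.standard_normal((r, n))
D3 = rng.standard_normal((m, r)) @ A3
Q3, _ = np.linalg.qr(A3.T)
cos_max = np.linalg.svd(A1 @ Q3, compute_uv=False)[0]  # rows of A1 are orthonormal
cross13 = 2 * w1 * w2 * frob_inner(D1, D3)
bound13 = 2 * w1 * w2 * cos_max * np.linalg.norm(D1) * np.linalg.norm(D3)
print(f"generic pair: |cross| = {abs(cross13):.4f} <= bound = {bound13:.4f}")
```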



## Information Theory (KL/JS, MI, kNN, MINE, TE)

- KL and JS: JS(P,Q) = 1/2 KL(P||M) + 1/2 KL(Q||M), M = 1/2(P+Q). JS is symmetric and always finite, bounded above by log 2 (in nats).
- Mutual Information: I(X;Y) = H(X) + H(Y) − H(X,Y). Estimators:
  - KSG kNN (continuous) – near-unbiased for moderate N, uses Chebyshev metric.
  - MINE (variational lower bound) – train a small network, with a moving-average (MA) correction for the gradient bias.
- Transfer Entropy: TE(A→B) = H(B_t | B_past) − H(B_t | B_past, A_past). Use discrete histogram or continuous KSG proxy.

In-code: `plora/it_estimators.py`, `plora/mine.py`, `plora/te.py`. See tests under `tests/`.
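
The discrete-histogram route to transfer entropy can be sketched with plug-in entropies over binned symbols. This is an illustrative estimator with one step of history and quantile bins, not the implementation in `plora/te.py`. The function names are hypothetical, and the plug-in estimate carries a small positive bias.

```python
from collections import Counter
import numpy as np

def plugin_entropy(symbols):
    """Plug-in Shannon entropy (nats) of a sequence of hashable symbols."""
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def quantile_bins(x, bins):
    """Map a real series to integer labels 0..bins-1 using equal-mass bins."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, bins + 1)[1:-1])
    return np.digitize(x, edges).tolist()

def te_histogram(source, target, bins=4):
    """Plug-in TE(source -> target) = H(B_t | B_{t-1}) - H(B_t | B_{t-1}, A_{t-1})."""
    a, b = quantile_bins(source, bins), quantile_bins(target, bins)
    b_now, b_past, a_past = b[1:], b[:-1], a[:-1]
    h_given_past = plugin_entropy(list(zip(b_now, b_past))) - plugin_entropy(b_past)
    h_given_both = (plugin_entropy(list(zip(b_now, b_past, a_past)))
                    - plugin_entropy(list(zip(b_past, a_past))))
    return h_given_past - h_given_both

# Toy driven process: a is white noise and b follows a with one step of delay.
rng = np.random.default_rng(0)
n = 5000
a = rng.standard_normal(n)
b = np.zeros(n)
for t in range(1, n):
    b[t] = 0.8 * a[t - 1] + 0.2 * b[t - 1] + 0.1 * rng.standard_normal()

te_ab = te_histogram(a, b)
te_ba = te_histogram(b, a)
print(f"TE A->B: {te_ab:.3f} nats, TE B->A: {te_ba:.3f} nats")
```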


## Graph Diffusion and Mixing-time

- Cheeger’s inequalities: \(\tfrac{1}{2}\Phi^2 \le \lambda_2 \le 2\Phi\), where Φ is the conductance and \(\lambda_2\) is the second-smallest eigenvalue of the normalized Laplacian.
- Mixing time bound: \(t_{\text{mix}}(\varepsilon) \lesssim \lambda_2^{-1} \log\bigl(1/(\varepsilon\,\pi_{\min})\bigr)\).
- Push–pull heuristic: \(t_\text{diffuse} \approx C \log n / \lambda_2\). We calibrate C for ER/WS/BA.

In-code: `swarm/metrics.py` (spectral gap, conductance), `swarm/theory.py`, `scripts/calibrate_c.py`, `scripts/validate_bounds.py`. Figures via `scripts/plot_figures.py`.
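
On a graph small enough to enumerate every cut, Cheeger's inequalities can be verified directly. The sketch below brute-forces the conductance of a 10-node ring and compares it with λ₂ of the normalized Laplacian. It is exponential in n and meant only as a sanity check; the helpers are hypothetical and unrelated to `swarm/metrics.py`.

```python
import itertools
import numpy as np

def lambda2_normalized(adj):
    """Second-smallest eigenvalue of I - D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return float(np.linalg.eigvalsh(lap)[1])

def conductance_bruteforce(adj):
    """min over S with vol(S) <= vol(V)/2 of cut(S, V \\ S) / vol(S)."""
    n = len(adj)
    deg = adj.sum(axis=1)
    half_vol = deg.sum() / 2.0
    best = np.inf
    for size in range(1, n):
        for subset in itertools.combinations(range(n), size):
            mask = np.zeros(n, dtype=bool)
            mask[list(subset)] = True
            vol = deg[mask].sum()
            if vol > half_vol:
                continue
            cut = adj[np.ix_(mask, ~mask)].sum()
            best = min(best, cut / vol)
    return float(best)

# 10-node ring: the best cut splits it into two arcs of 5 nodes each.
n = 10
ring = np.zeros((n, n))
idx = np.arange(n)
ring[idx, (idx + 1) % n] = 1.0
ring[(idx + 1) % n, idx] = 1.0

phi = conductance_bruteforce(ring)
lam2 = lambda2_normalized(ring)
print(f"Phi = {phi:.3f}, lambda2 = {lam2:.3f}")
print(f"Cheeger: {0.5 * phi**2:.3f} <= {lam2:.3f} <= {2 * phi:.3f}")
```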


## Optimization and Stability (Alternating Train–Merge)

- Convergence heuristic: with small ΔW and bounded curvature, alternating local fine-tuning (FT) with small merges behaves like a damped proximal step.
- Trust-region: choose α so that ‖αΔW‖_F ≤ τ, optionally line-search to minimise parameter drift.
- Diagnostics: track Frobenius deltas, effective rank, cosine similarity.

In-code: `scripts/alternating_train_merge.py` (line search, logs), `plora/loader.py`.
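
A toy version of the alternating schedule, under loudly stated assumptions: each agent has a quadratic loss around its own target, the "train" step is a single gradient-like proposal, the "merge" is a uniform average, and the line search picks α on a grid inside the trust region by minimising the joint toy loss. The criterion used by `scripts/alternating_train_merge.py` may differ, and all names here are hypothetical.

```python
import numpy as np

def trust_region_alpha(delta, tau):
    """Largest alpha in (0, 1] with ||alpha * delta||_F <= tau."""
    norm = np.linalg.norm(delta)
    return 1.0 if norm <= tau else tau / norm

def line_search_alpha(loss_fn, weights, delta, tau, num=11):
    """Grid search over alpha in [0, alpha_max]; alpha = 0 keeps the current weights."""
    alphas = np.linspace(0.0, trust_region_alpha(delta, tau), num)
    losses = [loss_fn(weights + a * delta) for a in alphas]
    return float(alphas[int(np.argmin(losses))])

rng = np.random.default_rng(0)
targets = [rng.standard_normal((8, 8)) for _ in range(3)]  # toy per-agent optima

def joint_loss(W):
    """Sum of the agents' quadratic losses."""
    return float(sum(np.sum((W - T) ** 2) for T in targets))

W = np.zeros((8, 8))
tau, lr = 0.5, 0.5
history = [joint_loss(W)]
step_norms = []
for _ in range(20):
    # "Train": each agent proposes a step toward its own optimum.
    deltas = [lr * (T - W) for T in targets]
    # "Merge": average the proposals, then line-search inside the trust region.
    merged = np.mean(deltas, axis=0)
    alpha = line_search_alpha(joint_loss, W, merged, tau)
    W = W + alpha * merged
    step_norms.append(alpha * np.linalg.norm(merged))
    history.append(joint_loss(W))

print(f"loss: {history[0]:.2f} -> {history[-1]:.2f}, max step norm = {max(step_norms):.3f}")
```

Because α = 0 is always a candidate, the joint loss cannot increase from one round to the next, and every applied step respects ‖αΔW‖_F ≤ τ.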



## Security and Consensus

- Signature policy: SHA-256 + RSA-PSS; quorum verification with peer attestation.
- Threshold signatures: aggregate multi-sig (RSA fallback) with quorum verification.
- Consensus: proposal/vote/commit with safety (one commit per slot), liveness under honest majority.
- Audit chain: hash-chained JSONL for tamper detection.
- Probes: trigger and clean probes with calibrated thresholds; safetensors per-tensor anomaly z-scores.

In-code: `plora/gate.py`, `plora/threshold_sigs.py`, `swarm/consensus.py`, `scripts/audit_verify.py`, `scripts/probes_calibrate.py`.
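
The hash-chained audit log idea can be illustrated with the standard library alone. In this sketch each JSONL record stores the previous record's hash, and its own hash covers that value plus a canonical JSON encoding of the payload. The record layout (`prev`, `payload`, `hash`) and the all-zero genesis value are assumptions for illustration, not the format checked by `scripts/audit_verify.py`.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first record

def digest(prev_hash, payload):
    """SHA-256 over the previous hash and a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append_entry(lines, payload):
    """Append one JSONL record that links to the last record in `lines`."""
    prev = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"prev": prev, "payload": payload, "hash": digest(prev, payload)}
    lines.append(json.dumps(record, sort_keys=True))
    return lines

def verify_chain(lines):
    """Return None if the chain is intact, else the index of the first bad record."""
    prev = GENESIS
    for idx, line in enumerate(lines):
        record = json.loads(line)
        if record["prev"] != prev or record["hash"] != digest(prev, record["payload"]):
            return idx
        prev = record["hash"]
    return None

log = []
append_entry(log, {"round": 1, "event": "offer", "adapter": "toy-a"})
append_entry(log, {"round": 2, "event": "accept", "adapter": "toy-a"})
append_entry(log, {"round": 3, "event": "merge", "adapter": "toy-a"})
print("intact chain:", verify_chain(log))

# Tampering with a payload breaks verification at that record.
tampered = list(log)
record = json.loads(tampered[1])
record["payload"]["event"] = "reject"
tampered[1] = json.dumps(record, sort_keys=True)
print("first bad record after tampering:", verify_chain(tampered))
```

Editing, reordering, or dropping a record changes a hash that a later record depends on, so verification fails at the first affected position.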


In [None]:
# Setup and common imports
import math, json, random
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
random.seed(42)

print("Environment ready.")


In [None]:
# Figure 1: SVD reprojection error vs rank (toy)
U = np.random.randn(64, 32)
V = np.random.randn(32, 64)
Delta = U @ V  # rank at most 32
s = np.linalg.svd(Delta, compute_uv=False)
errs = [np.sum(s[k:]**2) for k in range(1, 33)]
plt.plot(range(1,33), errs)
plt.xlabel('rank k'); plt.ylabel('Frobenius error^2 beyond k'); plt.title('EYM error curve (toy)')
plt.show()


In [None]:
# Figure 2: MI vs correlation (KSG estimator)
from plora.it_estimators import mi_knn

rs = np.linspace(0.0, 0.95, 10)
mis = []
for rho in rs:
    X = np.random.randn(1000, 1)
    E = np.random.randn(1000, 1)
    Y = rho * X + np.sqrt(1 - rho**2) * E
    mis.append(mi_knn(X, Y, k=5))
plt.plot(rs, mis, marker='o')
plt.xlabel('rho'); plt.ylabel('MI (nats)'); plt.title('KSG MI vs correlation')
plt.show()


In [None]:
# Figure 3: MINE MI on Gaussian (short run)
import torch
from plora.mine import mine_estimate, MineConfig

X = torch.randn(2000, 2)
rho = 0.8
Y = rho * X + (1 - rho**2)**0.5 * torch.randn_like(X)
mi, _ = mine_estimate(X, Y, cfg=MineConfig(epochs=100, batch_size=256))
print(f"MINE MI ~ {mi:.3f} nats at rho={rho}")


In [None]:
# Figure 4: TE directionality on AR process
from plora.te import transfer_entropy_discrete

def ar_process(n=600, rho=0.8):
    a = np.random.randn(n)
    b = np.zeros(n)
    for t in range(1, n):
        b[t] = rho * a[t-1] + (1-rho) * b[t-1] + 0.1 * np.random.randn()
    return a, b

a, b = ar_process()
te_ab = transfer_entropy_discrete(a, b, k=1, bins=8)
te_ba = transfer_entropy_discrete(b, a, k=1, bins=8)
print(f"TE A->B: {te_ab:.3f}, TE B->A: {te_ba:.3f}")


In [None]:
# Figure 5: Spectral gap vs graph size (toy ER); lambda2 feeds the diffusion-time heuristic
from swarm.graph_v2 import erdos_renyi_graph
from swarm.metrics import spectral_gap

ns = [20, 30, 40, 50]
lam2s = []
for n in ns:
    g = erdos_renyi_graph(n, 0.25, seed=42+n)
    lam2s.append(spectral_gap(g))
plt.plot(ns, lam2s, marker='o')
plt.xlabel('n'); plt.ylabel('lambda2'); plt.title('Spectral gap vs n (ER p=0.25)')
plt.show()
