NeuronBlade

NeuronBlade implements 19 abliteration techniques (including 5 novel approaches) to remove common elements that LLMs generate repetitively, with minimal damage to the underlying model's reasoning, perplexity, and general capabilities. Built for Qwen3.5-4B's hybrid linear/full-attention architecture.

Results are published at https://huggingface.co/g023/NeuronBlade-Qwen3.5-4B/

Winning technique: Embedding Surgery (0.8) + Harmonic Dampening + Orthogonal Projection (top 4 layers)


Key Discoveries

  1. Embedding Surgery is near-lossless: Modifying <0.01% of embedding parameters achieves 89-100% name bias reduction with zero measurable quality impact.

  2. Harmonic Dampening improves perplexity: Novel FFT-based technique actually improved model quality while reducing bias — the first technique to show this property.

  3. The PPL Cliff: There is a sharp phase transition between gentle techniques (PPL ≈ 13.2) and aggressive full-layer orthogonal projection (PPL ≈ 27.7). No technique occupies the middle ground.

  4. Layer 31 dominance: Name generation decisions are encoded primarily in the final transformer layer (signal strength 10.84 vs 7.95 for layer 30).

  5. Last-token activations: Name bias is encoded at the last token position, not distributed across the sequence. Mean-pooled probing produces negligible signal (~1.0001).
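The last-token finding above suggests a simple way to extract a concept direction: take the difference of mean last-token activations between prompts that elicit the concept and prompts that do not. The sketch below illustrates that difference-of-means idea on toy data; the function name, shapes, and setup are illustrative assumptions, not NeuronBlade's actual probe implementation.

```python
import numpy as np

def probe_concept_direction(acts_with, acts_without):
    """Estimate a concept direction as the normalized difference of mean
    last-token activations between concept-eliciting and neutral prompts.
    (Illustrative sketch; the repo's multi-strategy probes may differ.)"""
    direction = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return direction / np.linalg.norm(direction)

# Toy last-token hidden states: (num_prompts, hidden_dim)
rng = np.random.default_rng(0)
hidden_dim = 8
concept = np.zeros(hidden_dim)
concept[0] = 1.0                                   # planted concept axis
acts_with = rng.normal(size=(16, hidden_dim)) + 3.0 * concept
acts_without = rng.normal(size=(16, hidden_dim))

v = probe_concept_direction(acts_with, acts_without)
# The recovered unit direction should align strongly with the planted axis
print(abs(v @ concept))
```

Mean-pooling over the whole sequence would average the concept signal against many unrelated token positions, which is consistent with the near-unity signal strength reported above.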


Techniques

Tier 1 (Required)

  • Norm-Preserving Biprojection: Projects out concept direction while preserving weight matrix norms
  • Single-Pass Deterministic: Rank-constrained perturbation in a single deterministic pass
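The "norm-preserving" idea can be sketched as: project each weight row onto the orthogonal complement of the concept direction, then rescale every row back to its original L2 norm. This is a minimal illustration under assumed conventions (rows as output units); the repo's biprojection likely differs in detail.

```python
import numpy as np

def norm_preserving_projection(W, v):
    """Remove the component of each row of W along unit vector v, then
    rescale rows to their original norms. Scalar rescaling keeps rows
    orthogonal to v, so both properties hold simultaneously."""
    v = v / np.linalg.norm(v)
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_proj = W - np.outer(W @ v, v)            # project out concept direction
    new_norms = np.linalg.norm(W_proj, axis=1, keepdims=True)
    return W_proj * (orig_norms / np.maximum(new_norms, 1e-12))

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))
v = rng.normal(size=6)
W2 = norm_preserving_projection(W, v)
print(np.allclose(np.linalg.norm(W2, axis=1), np.linalg.norm(W, axis=1)))
```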

Tier 2 (Proven)

  • Orthogonal Projection: Projects weight matrices onto the orthogonal complement of the concept direction. Best traditional technique.
  • Embedding Surgery: Directly modifies token embeddings by blending toward generic reference. Best overall — near-zero model damage.
  • Directional Ablation: Subtracts scaled concept component from weights
  • Activation Steering: Modifies weights to steer activations away from concept
  • Rank-1 Perturbation: Minimal rank-1 update to reduce concept alignment
  • Spectral Filtering: SVD-based removal of concept-aligned singular vectors
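Embedding Surgery's "blending toward a generic reference" can be sketched as a convex blend of the targeted token embeddings with the mean of some reference embeddings. The `alpha=0.8` default mirrors the "(0.8)" in the winning recipe, but the exact blend rule, reference choice, and function name here are assumptions.

```python
import numpy as np

def embedding_surgery(emb, target_ids, reference_ids, alpha=0.8):
    """Blend over-generated name-token embeddings toward the mean of
    generic reference-token embeddings. Only targeted rows change, which
    is why the technique touches <0.01% of embedding parameters."""
    out = emb.copy()
    reference = emb[reference_ids].mean(axis=0)
    out[target_ids] = (1 - alpha) * emb[target_ids] + alpha * reference
    return out

rng = np.random.default_rng(2)
emb = rng.normal(size=(100, 16))                 # toy embedding table
patched = embedding_surgery(emb, target_ids=[5, 9], reference_ids=[1, 2, 3])

changed = (patched != emb).any(axis=1)
print(changed.sum())                             # only the targeted rows differ
```

Because the rest of the network is untouched, any quality impact is bounded by how much the patched rows mattered outside the targeted behavior, consistent with the near-lossless result above.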

Novel Techniques (Discovered in this research)

  • Harmonic Resonance Dampening: FFT-based technique that identifies and attenuates dominant frequency components along the concept direction. First technique to improve PPL.
  • Phase Rotation: Rotates weight components in the concept subspace
  • Gradient Echo: Approximates gradient-based unlearning without backpropagation
  • Synaptic Rerouting: SVD-based pathway modification to redirect concept flow
  • Spectral Antibody: Creates "antibody" vectors in spectral space to neutralize concept
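One plausible reading of Harmonic Resonance Dampening is: treat the per-row coefficients of a weight matrix along the concept direction as a 1-D signal, attenuate its dominant frequency components via FFT, and write the damped coefficients back. The sketch below implements that reading; the spectral interpretation, parameter names (`top_k`, `damp`), and reconstruction rule are all assumptions about a technique the repo does not specify in detail here.

```python
import numpy as np

def harmonic_dampening(W, v, top_k=2, damp=0.5):
    """Attenuate the top_k dominant non-DC harmonics of W's alignment
    coefficients along unit concept direction v, leaving all components
    orthogonal to v untouched."""
    v = v / np.linalg.norm(v)
    c = W @ v                                   # signal: alignment per row
    spec = np.fft.rfft(c)
    mags = np.abs(spec).copy()
    mags[0] = 0.0                               # keep the DC component intact
    dominant = np.argsort(mags)[-top_k:]        # strongest harmonics
    spec[dominant] *= damp
    c_damped = np.fft.irfft(spec, n=len(c))
    return W + np.outer(c_damped - c, v)        # write damped coefficients back

rng = np.random.default_rng(3)
W = rng.normal(size=(32, 8))
v = rng.normal(size=8)
vu = v / np.linalg.norm(v)
W2 = harmonic_dampening(W, v)
# Energy along the concept direction decreases; orthogonal parts are unchanged
print(np.linalg.norm(W2 @ vu) < np.linalg.norm(W @ vu))
```

Because only a few spectral components are scaled down rather than zeroed, the perturbation is gentle, which may relate to this technique's unusually benign PPL behavior.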

Architecture

neuronblade.py          # Rich CLI interface
src/
├── model_loader.py     # Model loading, weight access, text generation
├── concept_probe.py    # Basic concept direction probing
├── advanced_probe.py   # Advanced multi-strategy probing (activation diff, logit attribution)
├── architecture.py     # Qwen3.5 hybrid architecture helpers
├── evaluator.py        # Name bias, perplexity, and reasoning evaluation
├── harness.py          # A/B test harness
├── exporter.py         # GGUF export via llama.cpp
└── techniques/         # 19 abliteration techniques
    ├── base.py                         # Abstract base class
    ├── norm_preserving_biprojection.py # Tier 1: Required
    ├── single_pass_deterministic.py    # Tier 1: Required
    ├── directional_ablation.py         # Tier 2: Standard
    ├── orthogonal_projection.py        # Tier 2: Best traditional technique
    ├── embedding_surgery.py            # Tier 2: Best overall (near-lossless)
    ├── activation_steering.py          # Tier 2
    ├── rank1_perturbation.py           # Tier 2
    ├── spectral_filtering.py           # Tier 2
    ├── harmonic_dampening.py           # Novel: FFT-based, PPL-improving!
    ├── phase_rotation.py               # Novel: Rotation in concept space
    ├── gradient_echo.py                # Novel: Gradient approximation
    ├── synaptic_rerouting.py           # Novel: SVD-based pathway modification
    ├── spectral_antibody.py            # Novel: Spectral immune response
    ├── selective_attention_pruning.py  # Tier 3
    ├── anti_lora.py                    # Tier 3
    ├── activation_clamping.py          # Tier 3
    ├── mlp_gate_modulation.py          # Tier 3
    ├── weight_interpolation.py         # Tier 3
    └── combined.py                     # Meta: Multi-technique combiner

License

MIT License — Copyright (c) 2025 g023 (github.com/g023)
