NeuronBlade implements 19 abliteration techniques (including 5 novel approaches) to remove elements that LLMs generate repetitively, such as over-used names, with minimal damage to the underlying model's reasoning, perplexity, and general capabilities. Built for Qwen3.5-4B's hybrid linear/full-attention architecture.
Results: https://huggingface.co/g023/NeuronBlade-Qwen3.5-4B/
Winning technique: Embedding Surgery (0.8) + Harmonic Dampening + Orthogonal Projection (top 4 layers)
- Embedding Surgery is near-lossless: Modifying <0.01% of embedding parameters achieves 89-100% name bias reduction with zero measurable quality impact.
- Harmonic Dampening improves perplexity: This novel FFT-based technique improved model quality while reducing bias; it is the first technique to show this property.
- The PPL Cliff: There is a sharp phase transition between gentle techniques (PPL ≈ 13.2) and aggressive full-layer orthogonal projection (PPL ≈ 27.7). No technique occupies the middle ground.
- Layer 31 dominance: Name generation decisions are encoded primarily in the final transformer layer (signal strength 10.84 vs 7.95 for layer 30).
- Last-token activations: Name bias is encoded at the last token position, not distributed across the sequence. Mean-pooled probing produces negligible signal (~1.0001).
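The last-token finding suggests a simple mean-difference probe. A minimal numpy sketch of that idea (function names and the synthetic setup are illustrative assumptions, not the actual `concept_probe.py` API):

```python
import numpy as np

def concept_direction(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean-difference probe: average last-token activations for
    concept-bearing prompts minus concept-free prompts, normalized
    to a unit vector. Shapes: (n_prompts, hidden_dim)."""
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def separation(pos_acts, neg_acts, direction):
    """How far apart the two prompt sets are along the probe direction;
    near zero means the probe found no usable signal."""
    return (pos_acts @ direction).mean() - (neg_acts @ direction).mean()
```

Using last-token activations rather than mean-pooled ones matters here because, per the finding above, pooling washes out the signal.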
- Norm-Preserving Biprojection: Projects out concept direction while preserving weight matrix norms
- Single-Pass Deterministic: Rank-constrained perturbation in a single deterministic pass
- Orthogonal Projection: Projects weight matrices onto the orthogonal complement of the concept direction. Best traditional technique.
- Embedding Surgery: Directly modifies token embeddings by blending toward generic reference. Best overall — near-zero model damage.
- Directional Ablation: Subtracts scaled concept component from weights
- Activation Steering: Modifies weights to steer activations away from concept
- Rank-1 Perturbation: Minimal rank-1 update to reduce concept alignment
- Spectral Filtering: SVD-based removal of concept-aligned singular vectors
- Harmonic Resonance Dampening: FFT-based technique that identifies and attenuates dominant frequency components along the concept direction. First technique to improve PPL.
- Phase Rotation: Rotates weight components in the concept subspace
- Gradient Echo: Approximates gradient-based unlearning without backpropagation
- Synaptic Rerouting: SVD-based pathway modification to redirect concept flow
- Spectral Antibody: Creates "antibody" vectors in spectral space to neutralize concept
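Two components of the winning recipe can be sketched in a few lines of numpy. This is a hedged illustration only: the function signatures, the row/column orientation of `W`, and the reading of 0.8 as a blend weight are assumptions, not the actual implementations in `orthogonal_projection.py` or `embedding_surgery.py`:

```python
import numpy as np

def orthogonal_projection(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Project W onto the orthogonal complement of unit direction v:
    W' = (I - v v^T) W, so the output of W' has no component along v.
    Assumes W's rows are indexed by the hidden (residual-stream) dim."""
    v = v / np.linalg.norm(v)
    return W - np.outer(v, v @ W)

def embedding_surgery(emb: np.ndarray, target_ids, reference_ids, alpha=0.8):
    """Blend the embeddings of targeted tokens toward the centroid of
    generic reference tokens; alpha (assumed blend weight) = 1.0 would
    replace them outright. All other rows are left untouched."""
    ref = emb[reference_ids].mean(axis=0)
    out = emb.copy()
    out[target_ids] = (1 - alpha) * emb[target_ids] + alpha * ref
    return out
```

Because embedding surgery only rewrites a handful of rows of the embedding matrix, it touches a vanishingly small fraction of parameters, which is consistent with the near-lossless result reported above.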
neuronblade.py # Rich CLI interface
src/
├── model_loader.py # Model loading, weight access, text generation
├── concept_probe.py # Basic concept direction probing
├── advanced_probe.py # Advanced multi-strategy probing (activation diff, logit attribution)
├── architecture.py # Qwen3.5 hybrid architecture helpers
├── evaluator.py # Name bias, perplexity, and reasoning evaluation
├── harness.py # A/B test harness
├── exporter.py # GGUF export via llama.cpp
└── techniques/ # 19 abliteration techniques
├── base.py # Abstract base class
├── norm_preserving_biprojection.py # Tier 1: Required
├── single_pass_deterministic.py # Tier 1: Required
├── directional_ablation.py # Tier 2: Standard
├── orthogonal_projection.py # Tier 2: Best traditional technique
├── embedding_surgery.py # Tier 2: Best overall (near-lossless)
├── activation_steering.py # Tier 2
├── rank1_perturbation.py # Tier 2
├── spectral_filtering.py # Tier 2
├── harmonic_dampening.py # Novel: FFT-based, PPL-improving!
├── phase_rotation.py # Novel: Rotation in concept space
├── gradient_echo.py # Novel: Gradient approximation
├── synaptic_rerouting.py # Novel: SVD-based pathway modification
├── spectral_antibody.py # Novel: Spectral immune response
├── selective_attention_pruning.py # Tier 3
├── anti_lora.py # Tier 3
├── activation_clamping.py # Tier 3
├── mlp_gate_modulation.py # Tier 3
├── weight_interpolation.py # Tier 3
└── combined.py # Meta: Multi-technique combiner
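As a rough illustration of the idea behind `harmonic_dampening.py`, one plausible FFT-based sketch (an assumed reconstruction, not the actual implementation): project each weight row onto the concept direction, damp the dominant frequency bins of that 1-D signal, and write the result back along the same direction:

```python
import numpy as np

def harmonic_dampening(W: np.ndarray, v: np.ndarray, top_k=4, damp=0.5):
    """Attenuate the dominant frequency components of W's projection
    onto unit direction v; directions orthogonal to v are untouched."""
    v = v / np.linalg.norm(v)
    signal = W @ v                       # per-row alignment with the concept
    spec = np.fft.rfft(signal)
    idx = np.argsort(np.abs(spec))[-top_k:]  # largest-magnitude bins
    spec[idx] *= damp                    # dampen, don't zero, the harmonics
    new_signal = np.fft.irfft(spec, n=signal.shape[0])
    return W + np.outer(new_signal - signal, v)
```

Damping (rather than zeroing) the strongest harmonics reduces the energy along the concept direction while leaving the orthogonal subspace bit-identical, which may be why this family of edits can stay gentle on perplexity.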
MIT License — Copyright (c) 2025 g023 (github.com/g023)