NeuronBlade

NeuronBlade implements 19 abliteration techniques (including 5 novel approaches) to remove common elements that LLMs generate repetitively, with minimal damage to the underlying model's reasoning, perplexity, and general capabilities. Built for Qwen3.5-4B's hybrid linear/full-attention architecture.

Results are published at https://huggingface.co/g023/NeuronBlade-Qwen3.5-4B/

Winning technique: Embedding Surgery (0.8) + Harmonic Dampening + Orthogonal Projection (top 4 layers)


Key Discoveries

  1. Embedding Surgery is near-lossless: Modifying <0.01% of embedding parameters achieves 89-100% name bias reduction with zero measurable quality impact.

  2. Harmonic Dampening improves perplexity: Novel FFT-based technique actually improved model quality while reducing bias — the first technique to show this property.

  3. The PPL Cliff: There is a sharp phase transition between gentle techniques (PPL ≈ 13.2) and aggressive full-layer orthogonal projection (PPL ≈ 27.7). No technique occupies the middle ground.

  4. Layer 31 dominance: Name generation decisions are encoded primarily in the final transformer layer (signal strength 10.84 vs 7.95 for layer 30).

  5. Last-token activations: Name bias is encoded at the last token position, not distributed across the sequence. Mean-pooled probing produces negligible signal (~1.0001).
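The last-token finding above suggests a simple way to extract a concept direction: take the difference of mean last-token activations between prompts that elicit the concept and prompts that do not. The sketch below illustrates that difference-of-means idea on toy data; the function name, shapes, and setup are illustrative assumptions, not NeuronBlade's actual probe implementation.

```python
import numpy as np

def probe_concept_direction(acts_with, acts_without):
    """Estimate a concept direction as the normalized difference of mean
    last-token activations between concept-eliciting and neutral prompts.
    (Illustrative sketch; the repo's multi-strategy probes may differ.)"""
    direction = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return direction / np.linalg.norm(direction)

# Toy last-token hidden states: (num_prompts, hidden_dim)
rng = np.random.default_rng(0)
hidden_dim = 8
concept = np.zeros(hidden_dim)
concept[0] = 1.0                                   # planted concept axis
acts_with = rng.normal(size=(16, hidden_dim)) + 3.0 * concept
acts_without = rng.normal(size=(16, hidden_dim))

v = probe_concept_direction(acts_with, acts_without)
# The recovered unit direction should align strongly with the planted axis
print(abs(v @ concept))
```

Mean-pooling over the whole sequence would average the concept signal against many unrelated token positions, which is consistent with the near-unity signal strength reported above.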


Techniques

Tier 1 (Required)

  • Norm-Preserving Biprojection: Projects out concept direction while preserving weight matrix norms
  • Single-Pass Deterministic: Rank-constrained perturbation in a single deterministic pass
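The "norm-preserving" idea can be sketched as: project each weight row onto the orthogonal complement of the concept direction, then rescale every row back to its original L2 norm. This is a minimal illustration under assumed conventions (rows as output units); the repo's biprojection likely differs in detail.

```python
import numpy as np

def norm_preserving_projection(W, v):
    """Remove the component of each row of W along unit vector v, then
    rescale rows to their original norms. Scalar rescaling keeps rows
    orthogonal to v, so both properties hold simultaneously."""
    v = v / np.linalg.norm(v)
    orig_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_proj = W - np.outer(W @ v, v)            # project out concept direction
    new_norms = np.linalg.norm(W_proj, axis=1, keepdims=True)
    return W_proj * (orig_norms / np.maximum(new_norms, 1e-12))

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))
v = rng.normal(size=6)
W2 = norm_preserving_projection(W, v)
print(np.allclose(np.linalg.norm(W2, axis=1), np.linalg.norm(W, axis=1)))
```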

Tier 2 (Proven)

  • Orthogonal Projection: Projects weight matrices onto the orthogonal complement of the concept direction. Best traditional technique.
  • Embedding Surgery: Directly modifies token embeddings by blending toward generic reference. Best overall — near-zero model damage.
  • Directional Ablation: Subtracts scaled concept component from weights
  • Activation Steering: Modifies weights to steer activations away from concept
  • Rank-1 Perturbation: Minimal rank-1 update to reduce concept alignment
  • Spectral Filtering: SVD-based removal of concept-aligned singular vectors
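Embedding Surgery's "blending toward a generic reference" can be sketched as a convex blend of the targeted token embeddings with the mean of some reference embeddings. The `alpha=0.8` default mirrors the "(0.8)" in the winning recipe, but the exact blend rule, reference choice, and function name here are assumptions.

```python
import numpy as np

def embedding_surgery(emb, target_ids, reference_ids, alpha=0.8):
    """Blend over-generated name-token embeddings toward the mean of
    generic reference-token embeddings. Only targeted rows change, which
    is why the technique touches <0.01% of embedding parameters."""
    out = emb.copy()
    reference = emb[reference_ids].mean(axis=0)
    out[target_ids] = (1 - alpha) * emb[target_ids] + alpha * reference
    return out

rng = np.random.default_rng(2)
emb = rng.normal(size=(100, 16))                 # toy embedding table
patched = embedding_surgery(emb, target_ids=[5, 9], reference_ids=[1, 2, 3])

changed = (patched != emb).any(axis=1)
print(changed.sum())                             # only the targeted rows differ
```

Because the rest of the network is untouched, any quality impact is bounded by how much the patched rows mattered outside the targeted behavior, consistent with the near-lossless result above.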

Novel Techniques (Discovered in this research)

  • Harmonic Resonance Dampening: FFT-based technique that identifies and attenuates dominant frequency components along the concept direction. First technique to improve PPL.
  • Phase Rotation: Rotates weight components in the concept subspace
  • Gradient Echo: Approximates gradient-based unlearning without backpropagation
  • Synaptic Rerouting: SVD-based pathway modification to redirect concept flow
  • Spectral Antibody: Creates "antibody" vectors in spectral space to neutralize concept
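One plausible reading of Harmonic Resonance Dampening is: treat the per-row coefficients of a weight matrix along the concept direction as a 1-D signal, attenuate its dominant frequency components via FFT, and write the damped coefficients back. The sketch below implements that reading; the spectral interpretation, parameter names (`top_k`, `damp`), and reconstruction rule are all assumptions about a technique the repo does not specify in detail here.

```python
import numpy as np

def harmonic_dampening(W, v, top_k=2, damp=0.5):
    """Attenuate the top_k dominant non-DC harmonics of W's alignment
    coefficients along unit concept direction v, leaving all components
    orthogonal to v untouched."""
    v = v / np.linalg.norm(v)
    c = W @ v                                   # signal: alignment per row
    spec = np.fft.rfft(c)
    mags = np.abs(spec).copy()
    mags[0] = 0.0                               # keep the DC component intact
    dominant = np.argsort(mags)[-top_k:]        # strongest harmonics
    spec[dominant] *= damp
    c_damped = np.fft.irfft(spec, n=len(c))
    return W + np.outer(c_damped - c, v)        # write damped coefficients back

rng = np.random.default_rng(3)
W = rng.normal(size=(32, 8))
v = rng.normal(size=8)
vu = v / np.linalg.norm(v)
W2 = harmonic_dampening(W, v)
# Energy along the concept direction decreases; orthogonal parts are unchanged
print(np.linalg.norm(W2 @ vu) < np.linalg.norm(W @ vu))
```

Because only a few spectral components are scaled down rather than zeroed, the perturbation is gentle, which may relate to this technique's unusually benign PPL behavior.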

Architecture

neuronblade.py          # Rich CLI interface
src/
├── model_loader.py     # Model loading, weight access, text generation
├── concept_probe.py    # Basic concept direction probing
├── advanced_probe.py   # Advanced multi-strategy probing (activation diff, logit attribution)
├── architecture.py     # Qwen3.5 hybrid architecture helpers
├── evaluator.py        # Name bias, perplexity, and reasoning evaluation
├── harness.py          # A/B test harness
├── exporter.py         # GGUF export via llama.cpp
└── techniques/         # 19 abliteration techniques
    ├── base.py                         # Abstract base class
    ├── norm_preserving_biprojection.py # Tier 1: Required
    ├── single_pass_deterministic.py    # Tier 1: Required
    ├── directional_ablation.py         # Tier 2: Standard
    ├── orthogonal_projection.py        # Tier 2: Best traditional technique
    ├── embedding_surgery.py            # Tier 2: Best overall (near-lossless)
    ├── activation_steering.py          # Tier 2
    ├── rank1_perturbation.py           # Tier 2
    ├── spectral_filtering.py           # Tier 2
    ├── harmonic_dampening.py           # Novel: FFT-based, PPL-improving!
    ├── phase_rotation.py               # Novel: Rotation in concept space
    ├── gradient_echo.py                # Novel: Gradient approximation
    ├── synaptic_rerouting.py           # Novel: SVD-based pathway modification
    ├── spectral_antibody.py            # Novel: Spectral immune response
    ├── selective_attention_pruning.py  # Tier 3
    ├── anti_lora.py                    # Tier 3
    ├── activation_clamping.py          # Tier 3
    ├── mlp_gate_modulation.py          # Tier 3
    ├── weight_interpolation.py         # Tier 3
    └── combined.py                     # Meta: Multi-technique combiner

License

MIT License — Copyright (c) 2025 g023 (github.com/g023)
