Paper: DART-ing Through the Drift: Dynamic Tracing of Knowledge Neurons for Adaptive Inference-Time Pruning [arXiv]
Abhishek Tyagi¹, Yunuo Cen², Shrey Dhorajiya, Bharadwaj Veeravalli, Xuanyao Fong
To cite this paper, use:

```bibtex
@misc{tyagi2026dartingdriftdynamictracing,
  title={DART-ing Through the Drift: Dynamic Tracing of Knowledge Neurons for Adaptive Inference-Time Pruning},
  author={Abhishek Tyagi and Yunuo Cen and Shrey Dhorajiya and Bharadwaj Veeravalli and Xuanyao Fong},
  year={2026},
  eprint={2601.22632},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.22632},
}
```
If you have questions or thoughts regarding the tool or this work, please contact atygai@u.nus.edu (¹) or cenyunuo@u.nus.edu (²).
Demo video: DART.mp4
This project implements dynamic neuron pruning strategies for large language models with adaptive masking and knowledge drift detection.
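To make the mechanism concrete, below is a minimal, self-contained sketch of inference-time neuron masking via a PyTorch forward hook. This is an illustration, not the paper's implementation: `NeuronMasker` is a hypothetical name, and the EMA/top-k scoring is a simplified stand-in for the ranking methods and the `MASKING_STEP`/`EMA_DECAY` knobs described below.

```python
import torch

class NeuronMasker:
    """Toy activation tracker: EMA of |activation| per neuron, then top-k mask."""

    def __init__(self, num_neurons, keep_ratio=0.9, ema_decay=0.9, masking_step=10):
        self.ema = torch.zeros(num_neurons)
        self.keep_ratio = keep_ratio
        self.ema_decay = ema_decay
        self.masking_step = masking_step
        self.step = 0

    def __call__(self, module, inputs, output):
        # Mean |activation| per neuron across batch (and sequence) dimensions
        score = output.detach().abs().mean(dim=tuple(range(output.dim() - 1)))
        self.ema = self.ema_decay * self.ema + (1 - self.ema_decay) * score
        self.step += 1
        if self.step >= self.masking_step:
            # Keep the highest-EMA neurons, zero out the rest
            k = int(self.keep_ratio * self.ema.numel())
            mask = torch.zeros_like(self.ema)
            mask[self.ema.topk(k).indices] = 1.0
            return output * mask
        return output

# Attach to any layer with a known hidden width:
layer = torch.nn.Linear(16, 64)
hook = layer.register_forward_hook(NeuronMasker(num_neurons=64))
for _ in range(12):
    _ = layer(torch.randn(2, 16))
hook.remove()
```

In this repository, the corresponding logic lives in src/hook_setup.py (hook registration) and src/neuronDefuser.py (pruning logic); see the project structure below.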
Dependencies:

```
# Core dependencies
torch==2.2.0
transformers==4.57.3
datasets==4.4.2
numpy==1.26.4
pandas==2.2.0

# Evaluation and visualization
matplotlib==3.8.2
evaluate==0.4.6
cmcrameri==1.8

# Optional but recommended
accelerate==1.12.0
```

You can install all dependencies using:

```bash
pip install torch transformers datasets numpy pandas matplotlib evaluate cmcrameri accelerate
```

The project provides two main experiment scripts:
For running pruning experiments with perplexity and MMLU evaluation:
```bash
# Edit configuration in run_experiment.sh
# Key parameters:
# - MODEL: Choose your model (e.g., "meta-llama/Llama-3.1-8B")
# - LAYER_TOPK: Pruning configuration per layer
# - MASKING_STEP: When to start applying masks
# - EVAL_PERPLEXITY: Enable perplexity evaluation
# - EVAL_MMLU: Enable MMLU evaluation
./run_experiment.sh
```

For analyzing knowledge drift during generation:
```bash
# Edit configuration in run_knowledge_drift.sh
# Additional parameters:
# - KNOWLEDGE_DRIFT: Enable drift detection
# - CUSTOM_PROMPT_TEXT: Specify your prompt
# - GENERATION: Number of tokens to generate
./run_knowledge_drift.sh
```

Both scripts support the following key parameters:
Model Configuration:
- `MODEL`: Model name from HuggingFace (e.g., "meta-llama/Llama-3.1-8B", "gpt2")
- `CACHE_DIR`: Directory for model weights
- `DEVICE`: CUDA device ID
Pruning Configuration:
- `LAYER_TOPK`: Per-layer neuron keep ratios (e.g., "all:auto" or "0:0.9,1:0.8,..."; see the parsing sketch after this list)
- `MASKING_STEP`: Step at which to start masking neurons
- `RELEASE_STEP`: Step at which to release masks (optional)
- `EMA_DECAY`: Exponential moving average decay factor
- `RANKING_METHOD`: Neuron ranking method ("max", "mean", "combined", "product", "magnitude")
- `PRUNE_STRATEGY`: Pruning strategy ("topk", "auto")
- `TOTAL_PRUNE_PERCENT`: Target total pruning percentage
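The `LAYER_TOPK` string is the main pruning knob. For reference, a hypothetical parser for this spec format (not the repo's actual code) might look like:

```python
def parse_layer_topk(spec: str) -> dict:
    """Turn a spec like "all:auto" or "0:0.9,1:0.8" into {layer: keep ratio}."""
    mapping = {}
    for pair in spec.split(","):
        layer, value = pair.split(":")
        key = layer if layer == "all" else int(layer)
        mapping[key] = "auto" if value == "auto" else float(value)
    return mapping

print(parse_layer_topk("0:0.9,1:0.8"))  # {0: 0.9, 1: 0.8}
print(parse_layer_topk("all:auto"))     # {'all': 'auto'}
```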
Evaluation Configuration:
- `EVAL_PERPLEXITY`: Enable perplexity evaluation
- `EVAL_MMLU`: Enable MMLU benchmark evaluation (see the sketch after this list)
- `EVAL_GENERAL_NLP`: Enable general NLP task evaluation
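To illustrate what MMLU-style evaluation measures, the sketch below scores a multiple-choice question by comparing the next-token logits of the option letters. It is a simplified stand-in; the repo's mmlu_utils.py may format prompts and score answers differently.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

question = "Question: What is 2 + 2?\nA. 3\nB. 4\nC. 5\nD. 6\nAnswer:"
enc = tok(question, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits[0, -1]

# Compare the next-token logits of the four option letters
scores = {c: logits[tok.encode(" " + c)[0]].item() for c in "ABCD"}
print(max(scores, key=scores.get), scores)
```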
Prompt Configuration:
- `PROMPT_TYPE`: "custom" or "mmlu"
- `PROMPT_SUBJECT`: Dataset/subject name
- `CUSTOM_PROMPT_TEXT`: Custom prompt text
- `PROMPT_LENGTH`: Maximum prompt length
Results are saved to the results/ directory (see the inspection sketch after this list) and include:
- Output logs
- Timing information
- Configuration JSON
- Perplexity scores
- MMLU accuracy
- Neuron masking statistics
- Knowledge drift metrics (if enabled)
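A post-run inspection could then look like the following (illustrative only; exact file names and directory layout vary per run):

```python
import glob
import json

# Print the top-level keys of every configuration JSON found under results/
for path in sorted(glob.glob("results/**/*.json", recursive=True)):
    with open(path) as f:
        config = json.load(f)
    print(path, "->", sorted(config)[:8])
```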
Example configurations:

```bash
# Quick test with GPT-2 on a small dataset
DEVICE=0
MODEL="gpt2"
LAYER_TOPK="all:auto"
MASKING_STEP=10
EVAL_PERPLEXITY=true
./run_experiment.sh
```
```bash
# Knowledge drift analysis with custom prompt
KNOWLEDGE_DRIFT=true
CUSTOM_PROMPT_TEXT="Write about artificial intelligence"
GENERATION=500
./run_knowledge_drift.sh
```

Project structure:

```
.
├── dynamicPrune.py         # Main execution script
├── run_experiment.sh       # Standard experiment wrapper
├── run_knowledge_drift.sh  # Knowledge drift wrapper
├── src/
│   ├── neuronDefuser.py    # Core pruning logic
│   ├── perplexity_utils.py # Perplexity evaluation
│   ├── mmlu_utils.py       # MMLU benchmark
│   ├── hook_setup.py       # Model hook registration
│   └── resultCompiler.py   # Result analysis and plotting
├── lib/                    # Additional utilities
├── datasets/               # Dataset storage
├── results/                # Experiment outputs
└── llm_weights/            # Model cache
```
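For reference, the perplexity evaluation boils down to exponentiating the mean token negative log-likelihood. Below is a minimal stand-alone sketch (illustrative; src/perplexity_utils.py may handle datasets, chunking, and strides differently):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

enc = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    # Causal LM loss is the mean token NLL; perplexity = exp(loss)
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.2f}")
```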
- Memory Management: For large models, reduce the batch size or use gradient checkpointing.
- Pruning Ratios: Start with conservative values (keep 80-90% of neurons) and adjust from there.
- Adaptive Pruning: Use `LAYER_TOPK="all:auto"` for automatic layer-wise optimization.
- Logging: Check both `output_*.log` and `timing_*.log` for detailed information.
See the LICENSE file for details.