DynaFuser (Dynamic Feature Fusion for Adaptive Inference) is a high-performance Transformer framework supporting Dynamic-Rank Attention and Confidence-Triggered KV Cache mechanisms.
- Low-rank factorization of the Q and K projections: W_Q = A_Q @ B_Q^T, W_K = A_K @ B_K^T
- A lightweight Router network dynamically selects each layer's rank r_t ∈ [r_min, r_max]
- 30-50% fewer FLOPs with accuracy preserved
- Two-tier KV cache:
  - KV_simple: lightweight projection (hidden_size/4)
  - KV_full: full KV (computed only when confidence is low)
- Confidence threshold τ: full computation is triggered when conf < τ
- 40-60% memory savings with a trigger rate below 15%
- A single shared confidence signal drives both dynamic rank selection and KV trigger decisions
- End-to-end joint training minimizing a multi-objective loss
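As a rough illustration of the low-rank idea above (a hedged NumPy sketch, not the project's actual implementation): storing W_Q as A_Q @ B_Q^T means that at runtime only the first r columns of each factor need to be applied, so a rank-r projection costs O(n·d·r) instead of O(n·d²) for a dense W_Q. All names below are illustrative.

```python
import numpy as np

# Illustrative low-rank Q projection: W_Q = A_Q @ B_Q.T with factors of
# shape (d, r_max); truncating to the first r columns realizes a rank-r
# projection without materializing the dense d x d matrix.
rng = np.random.default_rng(0)
d, r_max = 64, 16

A_Q = rng.normal(size=(d, r_max))
B_Q = rng.normal(size=(d, r_max))
x = rng.normal(size=(8, d))  # 8 token embeddings

def project_q(x, r):
    """Rank-r query projection: x @ (A_Q[:, :r] @ B_Q[:, :r].T)."""
    # (x @ A_Q[:, :r]) @ B_Q[:, :r].T costs O(n*d*r) rather than the
    # O(n*d*d) of a dense projection -- the source of the FLOPs saving.
    return (x @ A_Q[:, :r]) @ B_Q[:, :r].T

q_low = project_q(x, r=8)      # cheap rank-8 approximation
q_full = project_q(x, r=r_max) # full-rank-budget projection
```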
```
DynaFuser/
├── config/                        # Configuration files
│   ├── model.yaml                 # Model architecture config
│   ├── dynamic.yaml               # Dynamic rank & confidence config
│   ├── train.yaml                 # Training parameters
│   ├── infer.yaml                 # Inference parameters
│   └── deepspeed_config.json      # DeepSpeed ZeRO config
│
├── src/
│   ├── core/                      # Core modules
│   │   ├── dynamic_rank_attention.py   # Dynamic-rank attention
│   │   ├── confidence_kv_cache.py      # Confidence-triggered KV cache
│   │   └── router.py                   # Router (rank + confidence)
│   │
│   ├── models/                    # Model implementations
│   │   ├── transformer_base.py    # Transformer backbone
│   │   └── dynafuser_model.py     # DynaFuser main model
│   │
│   ├── trainer/                   # Training scripts
│   │   ├── train_joint.py         # Joint training
│   │   └── losses.py              # Loss functions
│   │
│   ├── benchmarks/                # Evaluation scripts
│   │   ├── evaluate_wikitext.py   # WikiText perplexity
│   │   ├── evaluate_mmlu.py       # MMLU accuracy
│   │   └── evaluate_longbench.py  # Long-context evaluation
│   │
│   └── utils/                     # Utilities
│       ├── __init__.py            # Common helpers
│       └── profiler.py            # Performance profiling
│
├── scripts/                       # Shell scripts
│   ├── setup_env.sh               # Environment setup
│   ├── run_train.sh               # Launch training (8x H20)
│   ├── run_infer.sh               # Inference
│   └── run_eval.sh                # Evaluation
│
├── external/                      # External dependencies (auto-cloned)
│   ├── transformers/
│   ├── flash-attention/
│   ├── DeepSpeed/
│   ├── vllm/
│   ├── LongBench/
│   └── test/                      # MMLU data
│
└── experiments/                   # Experiment outputs
    ├── baseline/
    ├── dynamic_rank/
    ├── confidence_kv/
    └── joint/
```
Run on a server with 8 H20 GPUs:

```bash
# Clone the repository
git clone https://github.com/yourusername/DynaFuser.git
cd DynaFuser

# Run the automated setup script
bash scripts/setup_env.sh

# Activate the environment
conda activate dynafuser
```

```bash
# Train on 8 GPUs with DeepSpeed ZeRO-2
bash scripts/run_train.sh

# Or run manually
deepspeed --num_gpus=8 src/trainer/train_joint.py \
    --config config/train.yaml \
    --model_config config/model.yaml \
    --output_dir ./experiments/joint \
    --deepspeed config/deepspeed_config.json
```

```bash
# Run inference
bash scripts/run_infer.sh \
    ./experiments/joint/checkpoint-best \
    ./data/test_prompts.txt \
    ./logs/infer/predictions.jsonl
```

```bash
# Evaluate on WikiText-103 / MMLU / LongBench
bash scripts/run_eval.sh ./experiments/joint/checkpoint-best
```
```yaml
dynamic_rank:
  rank_min: 32                    # minimum rank
  rank_max: 256                   # maximum rank
  rank_target: 96                 # target rank during training

confidence_kv:
  threshold: 0.85                 # trigger threshold τ
  simple_layers: [0, 8, 16, 24]   # layers using the lightweight KV

joint:
  use_shared_confidence: true
  loss:
    ce_weight: 1.0
    distill_weight: 0.3
    rank_sparsity_weight: 0.01
    trigger_sparsity_weight: 0.005
```
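The four loss weights above suggest a weighted-sum objective. A minimal sketch, assuming the combination is a plain weighted sum (the real implementation lives in `src/trainer/losses.py` and may differ; `avg_rank_frac` and the function name are hypothetical):

```python
# Hypothetical multi-objective loss implied by the config weights:
# task cross-entropy + distillation + sparsity penalties that push the
# router toward low ranks and toward few full-KV triggers.

def joint_loss(ce, distill, avg_rank_frac, trigger_rate,
               ce_weight=1.0, distill_weight=0.3,
               rank_sparsity_weight=0.01, trigger_sparsity_weight=0.005):
    """Weighted sum of the four loss terms from the config above."""
    return (ce_weight * ce
            + distill_weight * distill
            + rank_sparsity_weight * avg_rank_frac
            + trigger_sparsity_weight * trigger_rate)

# Example: CE 2.0, distill 1.0, average rank at 37.5% of max, 12% trigger rate
loss = joint_loss(ce=2.0, distill=1.0, avg_rank_frac=0.375, trigger_rate=0.12)
```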
```python
from src.core import DynamicRankAttention

attn = DynamicRankAttention(
    hidden_size=4096,
    num_heads=32,
    max_rank=256,
    min_rank=32,
)

# Attend with a dynamically chosen rank
outputs = attn(hidden_states, rank=96)  # run with rank 96
```
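In DynaFuser the rank itself is chosen by the learned Router (`src/core/router.py`). As a hedged sketch of what such a policy could look like, here is a hypothetical fixed mapping from the shared confidence signal to a rank in [rank_min, rank_max] (`select_rank` is illustrative, not the project's API):

```python
# Hypothetical rank-selection policy: higher confidence -> lower rank,
# i.e. cheaper attention. The real router is a learned network, not a
# fixed formula.

def select_rank(confidence, rank_min=32, rank_max=256, step=32):
    """Map a confidence score in [0, 1] to a rank in [rank_min, rank_max]."""
    span = rank_max - rank_min
    raw = rank_min + (1.0 - confidence) * span
    # snap to a multiple of `step` so kernel shapes stay regular
    snapped = int(round(raw / step)) * step
    return max(rank_min, min(rank_max, snapped))

high = select_rank(0.95)  # confident token -> small rank
low = select_rank(0.30)   # uncertain token -> large rank
```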
```python
from src.core import ConfidenceTriggeredKVCache

kv_manager = ConfidenceTriggeredKVCache(
    num_layers=32,
    hidden_size=4096,
    num_heads=32,
    threshold=0.85,
)

# Manage the KV cache dynamically based on confidence
key, value, stats = kv_manager(
    hidden_states,
    layer_idx=0,
    confidence=conf_scores,
    full_k_proj=k_proj,
    full_v_proj=v_proj,
)
```
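The core decision inside the cache manager can be sketched as follows: per token, use the lightweight KV_simple projection unless confidence drops below τ, in which case the full KV is computed. This is a minimal sketch of the described mechanism; names are illustrative and the real logic is in `src/core/confidence_kv_cache.py`.

```python
# Hypothetical two-tier KV decision: conf < threshold triggers the full
# KV path, otherwise the cheap KV_simple projection is used.

def kv_tier_decisions(confidences, threshold=0.85):
    """Return per-token tiers and the resulting full-KV trigger rate."""
    tiers = ["full" if c < threshold else "simple" for c in confidences]
    trigger_rate = tiers.count("full") / len(tiers)
    return tiers, trigger_rate

tiers, rate = kv_tier_decisions([0.95, 0.60, 0.90, 0.99], threshold=0.85)
# only the 0.60-confidence token falls below τ, so rate == 0.25
```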
```python
from src.models import DynaFuserModel, DynaFuserConfig

config = DynaFuserConfig.from_yaml("config/model.yaml")
model = DynaFuserModel(config)

# Training
outputs = model(input_ids, labels=labels)
loss = outputs["loss"]

# Inference
generated = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
)
```
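For background on the `temperature=0.7` argument: temperature rescales the logits before sampling, so T < 1 sharpens the distribution and T > 1 flattens it. A generic, self-contained sketch (not DynaFuser code):

```python
import math

# Illustrative temperature scaling: divide logits by T before softmax.
def softmax_with_temperature(logits, temperature=0.7):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax_with_temperature([2.0, 1.0, 0.0], temperature=0.7)
```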
```python
from src.utils.profiler import Profiler

profiler = Profiler(enabled=True, warmup_steps=5)
profiler.start()
outputs = model(input_ids)
metrics = profiler.stop(num_tokens=input_ids.numel())

print(f"Latency: {metrics.latency_ms:.2f} ms")
print(f"Memory: {metrics.memory_allocated_mb:.2f} MB")
print(f"Throughput: {metrics.throughput_tokens_per_sec:.2f} tok/s")

# Model-level statistics
stats = model.get_performance_stats()
print(f"Average Rank: {stats['avg_rank']}")
print(f"Trigger Rate: {stats['trigger_rate']}")
```

Issues and Pull Requests are welcome!
This project is licensed under the MIT License.