
DynaFuser: Dynamic-Rank + Confidence-Triggered Transformer

DynaFuser (Dynamic Feature Fusion for Adaptive Inference) is a high-performance Transformer framework supporting Dynamic-Rank Attention and Confidence-Triggered KV Cache mechanisms.

Python 3.10+ PyTorch 2.1+ License: MIT


🎯 Core Features

1. Dynamic-Rank Attention

  • Low-rank factorization of the Q and K projections: W_Q = A_Q @ B_Q^T, W_K = A_K @ B_K^T
  • A lightweight Router network dynamically selects each layer's rank r_t ∈ [r_min, r_max]
  • Cuts FLOPs by 30-50% while preserving accuracy
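The factorization above can be sketched numerically. The sizes below are illustrative (the real A_Q/B_Q factors are learned parameters); the point is the identity x @ (A @ B^T) == (x @ A) @ B^T, which is what yields the FLOP savings:

```python
import numpy as np

# Illustrative sizes only; the repo's real factors are learned parameters.
d, r, n = 1024, 96, 8                 # hidden size, selected rank, tokens
rng = np.random.default_rng(0)

A = rng.standard_normal((d, r)) / np.sqrt(d)   # plays the role of A_Q
B = rng.standard_normal((d, r)) / np.sqrt(r)   # plays the role of B_Q
x = rng.standard_normal((n, d))

# Full path: one d x d matmul against the implied W_Q = A @ B^T.
q_full = x @ (A @ B.T)
# Factored path: two thin matmuls of cost d*r each, never forming W_Q.
q_factored = (x @ A) @ B.T

assert np.allclose(q_full, q_factored)
# Per-token projection cost: 2*d*r vs d*d multiply-adds.
print(f"FLOP ratio vs full projection: {2 * r / d:.4f}")  # 0.1875
```

The smaller the selected rank r relative to d, the larger the savings; the Router trades this off against accuracy per layer.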

2. Confidence-Triggered KV Cache

  • Two-tier KV cache:
    • KV_simple: lightweight projection (hidden_size/4)
    • KV_full: full KV (computed only at low confidence)
  • Confidence threshold τ: full computation is triggered when conf < τ
  • Saves 40-60% of KV memory with a trigger rate < 15%
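A minimal sketch of the trigger rule (variable names are mine, not the repo's API): each token's confidence is compared against τ, and only sub-threshold tokens pay for the full KV computation:

```python
import numpy as np

tau = 0.85                                        # trigger threshold from config
conf = np.array([0.99, 0.72, 0.91, 0.60, 0.88])   # made-up per-token confidences

trigger = conf < tau          # True -> compute KV_full; False -> reuse KV_simple
print(trigger.tolist())       # [False, True, False, True, False]
print(f"trigger rate: {trigger.mean():.0%}")  # 40%
```

In practice the reported trigger rate stays under 15%, so most tokens only touch the lightweight cache.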

3. Joint Routing

  • A single shared confidence signal drives both dynamic rank selection and KV trigger decisions
  • End-to-end joint training minimizing a multi-objective loss
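To illustrate the shared signal (a hypothetical rule of my own, not the repo's router), one confidence score can parameterize both decisions, allocating a larger rank and the full KV path exactly when the model is unsure:

```python
def route(conf: float, tau: float = 0.85, r_min: int = 32, r_max: int = 256):
    """Hypothetical joint rule: lower confidence -> higher rank and full KV."""
    rank = int(r_min + (1.0 - conf) * (r_max - r_min))  # interpolate in [r_min, r_max]
    use_full_kv = conf < tau                            # same signal gates the KV path
    return rank, use_full_kv

print(route(0.95))   # confident token: small rank, lightweight KV
print(route(0.50))   # uncertain token: large rank, full KV
```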

📦 Project Structure

DynaFuser/
├── config/                      # Configuration files
│   ├── model.yaml              # Model architecture
│   ├── dynamic.yaml            # Dynamic-rank & confidence settings
│   ├── train.yaml              # Training hyperparameters
│   ├── infer.yaml              # Inference parameters
│   └── deepspeed_config.json   # DeepSpeed ZeRO configuration
│
├── src/
│   ├── core/                   # Core modules
│   │   ├── dynamic_rank_attention.py   # Dynamic-rank attention
│   │   ├── confidence_kv_cache.py      # Confidence-triggered KV cache
│   │   └── router.py                   # Router (rank + confidence)
│   │
│   ├── models/                 # Model implementations
│   │   ├── transformer_base.py         # Transformer backbone
│   │   └── dynafuser_model.py          # DynaFuser main model
│   │
│   ├── trainer/                # Training scripts
│   │   ├── train_joint.py              # Joint training
│   │   └── losses.py                   # Loss functions
│   │
│   ├── benchmarks/             # Evaluation scripts
│   │   ├── evaluate_wikitext.py        # WikiText perplexity
│   │   ├── evaluate_mmlu.py            # MMLU accuracy
│   │   └── evaluate_longbench.py       # Long-context evaluation
│   │
│   └── utils/                  # Utilities
│       ├── __init__.py                 # Common helpers
│       └── profiler.py                 # Performance profiling
│
├── scripts/                    # Shell scripts
│   ├── setup_env.sh            # Environment setup
│   ├── run_train.sh            # Launch training (8x H20)
│   ├── run_infer.sh            # Inference
│   └── run_eval.sh             # Evaluation
│
├── external/                   # External dependencies (auto-cloned)
│   ├── transformers/
│   ├── flash-attention/
│   ├── DeepSpeed/
│   ├── vllm/
│   ├── LongBench/
│   └── test/                   # MMLU data
│
└── experiments/                # Experiment outputs
    ├── baseline/
    ├── dynamic_rank/
    ├── confidence_kv/
    └── joint/

🚀 Quick Start

1. Environment Setup

Run on an 8x H20 GPU server:

# Clone the repository
git clone https://github.com/yourusername/DynaFuser.git
cd DynaFuser

# Run the automated setup script
bash scripts/setup_env.sh

# Activate the environment
conda activate dynafuser

2. Train DynaFuser

# Train on 8 GPUs with DeepSpeed ZeRO-2
bash scripts/run_train.sh

# Or run manually
deepspeed --num_gpus=8 src/trainer/train_joint.py \
    --config config/train.yaml \
    --model_config config/model.yaml \
    --output_dir ./experiments/joint \
    --deepspeed config/deepspeed_config.json

3. Inference

bash scripts/run_infer.sh \
    ./experiments/joint/checkpoint-best \
    ./data/test_prompts.txt \
    ./logs/infer/predictions.jsonl

4. Evaluation

# Evaluate on WikiText-103 / MMLU / LongBench
bash scripts/run_eval.sh ./experiments/joint/checkpoint-best

📊 Key Configuration

config/dynamic.yaml

dynamic_rank:
  rank_min: 32           # Minimum rank
  rank_max: 256          # Maximum rank
  rank_target: 96        # Target rank during training

confidence_kv:
  threshold: 0.85        # Trigger threshold τ
  simple_layers: [0, 8, 16, 24]  # Layers that use the lightweight KV
  
joint:
  use_shared_confidence: true
  loss:
    ce_weight: 1.0
    distill_weight: 0.3
    rank_sparsity_weight: 0.01
    trigger_sparsity_weight: 0.005
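The loss block above combines four weighted terms. A sketch of how such a weighted sum would be assembled (the weights match the config; the per-term loss values here are invented for illustration):

```python
# Weights as in config/dynamic.yaml; per-term loss values are made up.
weights = {"ce": 1.0, "distill": 0.3, "rank_sparsity": 0.01, "trigger_sparsity": 0.005}
losses  = {"ce": 2.40, "distill": 0.80, "rank_sparsity": 5.00, "trigger_sparsity": 3.00}

# Multi-objective total: cross-entropy dominates, sparsity terms gently
# push the router toward low ranks and a low trigger rate.
total = sum(weights[k] * losses[k] for k in weights)
print(f"total loss: {total:.3f}")  # 1.0*2.40 + 0.3*0.80 + 0.01*5.00 + 0.005*3.00 = 2.705
```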

🔧 Core Components

1. Dynamic-Rank Attention (src/core/dynamic_rank_attention.py)

from src.core import DynamicRankAttention

attn = DynamicRankAttention(
    hidden_size=4096,
    num_heads=32,
    max_rank=256,
    min_rank=32,
)

# Run with a dynamic rank
outputs = attn(hidden_states, rank=96)  # use rank 96

2. Confidence-Triggered KV Cache (src/core/confidence_kv_cache.py)

from src.core import ConfidenceTriggeredKVCache

kv_manager = ConfidenceTriggeredKVCache(
    num_layers=32,
    hidden_size=4096,
    num_heads=32,
    threshold=0.85,
)

# Manage the KV cache dynamically based on confidence
key, value, stats = kv_manager(
    hidden_states,
    layer_idx=0,
    confidence=conf_scores,
    full_k_proj=k_proj,
    full_v_proj=v_proj,
)

3. DynaFuser Model (src/models/dynafuser_model.py)

from src.models import DynaFuserModel, DynaFuserConfig

config = DynaFuserConfig.from_yaml("config/model.yaml")
model = DynaFuserModel(config)

# Training
outputs = model(input_ids, labels=labels)
loss = outputs["loss"]

# Inference
generated = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
)

📈 Performance Monitoring

from src.utils.profiler import Profiler

profiler = Profiler(enabled=True, warmup_steps=5)

profiler.start()
outputs = model(input_ids)
metrics = profiler.stop(num_tokens=input_ids.numel())

print(f"Latency: {metrics.latency_ms:.2f} ms")
print(f"Memory: {metrics.memory_allocated_mb:.2f} MB")
print(f"Throughput: {metrics.throughput_tokens_per_sec:.2f} tok/s")

# Retrieve model statistics
stats = model.get_performance_stats()
print(f"Average Rank: {stats['avg_rank']}")
print(f"Trigger Rate: {stats['trigger_rate']}")

🤝 Contributing

Issues and pull requests are welcome!


📄 License

This project is licensed under the MIT License.


🙏 Acknowledgements
