Skip to content

Conversation

@yiming0416
Copy link
Contributor

@yiming0416 yiming0416 commented Oct 27, 2025

This is needed to run deepseek_v3 with flex attention

test:

NGPU=4 CONFIG_FILE=./torchtitan/models/deepseek_v3/train_configs/debug_model.toml ./run_train.sh --model.name compiler_toolkit.deepseek_v3 --compile.enable --parallelism.data_parallel_shard_degree=2 --parallelism.tensor_parallel_degree=2 --parallelism.expert_parallel_degree=2 --activation_checkpoint.mode none --model.flavor=debugmodel_flex_attn

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 27, 2025
@SherlockNoMad SherlockNoMad merged commit f51ab9f into main Oct 27, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants