I'm unable to train Qwen-Image on a 24GB GPU: every run fails with a CUDA out-of-memory (OOM) error.
Steps I already tried:
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
--enable_fp8_training
--use_gradient_checkpointing
--gradient_accumulation_steps 2
--max_pixels 262144
Even with all of these set, training runs out of memory immediately on a 24GB card.
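For context, here is a rough back-of-envelope sketch of how much memory the weights alone might need. The parameter count (about 20B for the diffusion transformer) is an assumption, not something measured in this run, and `weight_gib` is just a hypothetical helper for the arithmetic. It ignores activations, gradients, optimizer state, and the text encoder and VAE, so the real footprint would be higher.

```python
# Rough estimate of weight memory only. All figures are assumptions for
# illustration, not measurements from the failing run.

GIB = 1024 ** 3

def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB

# Assumed parameter count for the diffusion transformer; adjust as needed.
ASSUMED_PARAMS = 20e9

for name, nbytes in [("bf16", 2), ("fp8", 1)]:
    print(f"{name}: {weight_gib(ASSUMED_PARAMS, nbytes):.1f} GiB")

# --max_pixels 262144 corresponds to, for example, a square 512 x 512 image.
side = int(262144 ** 0.5)
print(f"max_pixels 262144 = {side} x {side}")
```

If that assumption is roughly right, the fp8 weights by themselves would already take most of a 24GB card, which may be why the flags above are not enough.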