How to train Qwen-Image on a 4090? Getting non-stop OOM no matter how I set up the config.

I’m unable to train Qwen-Image on a 24GB GPU (RTX 4090). Every run ends in a CUDA OOM error.

What I have already tried:
- `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`
- `--enable_fp8_training`
- `--use_gradient_checkpointing`
- `--gradient_accumulation_steps 2`
- `--max_pixels 262144`

Even with all of these enabled, training runs out of memory immediately on the 24GB card.
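For context, here is a rough back-of-the-envelope check of why the weights alone might be the problem. It is only a sketch: it assumes the Qwen-Image transformer has about 20 billion parameters (the figure I understand is reported for the model, not something I measured), and it counts only the transformer weights, ignoring the text encoder, VAE, gradients, optimizer state, and activations. The helper name `weight_gib` is made up for illustration.

```python
import math

GIB = 1024 ** 3  # bytes per GiB


def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / GIB


ASSUMED_PARAMS = 20e9  # assumption: ~20B parameters, not measured
CARD_GIB = 24          # nominal capacity of the 24GB card

# Compare weight storage at 2 bytes (bf16) and 1 byte (fp8) per parameter.
for name, nbytes in [("bf16", 2), ("fp8", 1)]:
    need = weight_gib(ASSUMED_PARAMS, nbytes)
    print(f"{name}: {need:.1f} GiB for weights, {CARD_GIB - need:.1f} GiB left")

# --max_pixels 262144 corresponds to a square image of this side length.
side = math.isqrt(262144)
print(f"max_pixels 262144 = {side} x {side}")
```

Under that assumption, fp8 weights alone take roughly 18.6 GiB, which leaves only about 5 GiB for everything else, and `--max_pixels 262144` is already just 512 x 512, so lowering the resolution further seems unlikely to help much.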