In the initialization of FluxAttention, whether to create to_out is determined by the pre_only parameter.
During inference, however, whether to call to_out is decided by whether encoder_hidden_states is provided.
This asymmetry can be somewhat confusing for beginners, even though it never causes a runtime error: only FluxTransformerBlock sets pre_only to False, and likewise only FluxTransformerBlock passes encoder_hidden_states during inference, so the two conditions always agree in practice.
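
To make the asymmetry concrete, here is a minimal toy sketch of the pattern (not the actual diffusers code; the real module also concatenates encoder_hidden_states for joint attention, applies separate projections, and returns a tuple, all omitted here): construction is gated on pre_only, while the forward pass is gated on encoder_hidden_states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyFluxAttention(nn.Module):
    """Toy illustration only; names mirror FluxAttention but the logic is simplified."""

    def __init__(self, dim: int, pre_only: bool = False, context_pre_only: bool = False):
        super().__init__()
        self.pre_only = pre_only
        self.context_pre_only = context_pre_only  # stored but never read again
        self.to_qkv = nn.Linear(dim, dim * 3)
        if not pre_only:
            # to_out exists only when pre_only is False ...
            self.to_out = nn.Linear(dim, dim)

    def forward(self, hidden_states, encoder_hidden_states=None):
        q, k, v = self.to_qkv(hidden_states).chunk(3, dim=-1)
        out = F.scaled_dot_product_attention(q, k, v)
        # ... but whether it is called depends on encoder_hidden_states.
        # If pre_only were True and encoder_hidden_states were passed, this would
        # raise AttributeError; in practice only FluxTransformerBlock does both,
        # so the two conditions never disagree.
        if encoder_hidden_states is not None:
            return self.to_out(out)
        return out


x = torch.randn(1, 4, 16)
attn = ToyFluxAttention(dim=16, pre_only=False)
print(attn(x, encoder_hidden_states=torch.randn(1, 4, 16)).shape)  # torch.Size([1, 4, 16])
```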
Another issue is that context_pre_only appears to be an unused parameter.
I was wondering if it might be better to:
- Remove context_pre_only
- During inference, rely on self.pre_only to decide whether to_out should be called (a rough sketch of this follows below)
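
For illustration, here is what that change could look like on the toy module above (again a hypothetical sketch, not a patch against the real FluxAttention, which would also need to keep returning the encoder_hidden_states tuple):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyFluxAttentionProposed(nn.Module):
    """Toy sketch of the proposal: context_pre_only is gone and forward mirrors __init__."""

    def __init__(self, dim: int, pre_only: bool = False):
        super().__init__()
        self.pre_only = pre_only
        self.to_qkv = nn.Linear(dim, dim * 3)
        if not pre_only:
            self.to_out = nn.Linear(dim, dim)

    def forward(self, hidden_states, encoder_hidden_states=None):
        q, k, v = self.to_qkv(hidden_states).chunk(3, dim=-1)
        out = F.scaled_dot_product_attention(q, k, v)
        # Same predicate as in __init__, so to_out is guaranteed to exist whenever it is called.
        if not self.pre_only:
            out = self.to_out(out)
        return out
```

This would keep the constructor and the call site symmetric, so a reader only has to track a single flag.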