
Conversation


@DavidBert commented on Oct 9, 2025:

This commit adds support for the Photon image generation model:

  • PhotonTransformer2DModel: Core transformer architecture
  • PhotonPipeline: Text-to-image generation pipeline
  • Attention processor updates for Photon-specific attention mechanism
  • Conversion script for loading Photon checkpoints
  • Documentation and tests

Some examples below with the 512 model fine-tuned on the Alchemist dataset and distilled with PAG:

[Sample images: image_10, image_4, image_0, image_1]


print("✓ Created scheduler config")


def download_and_save_vae(vae_type: str, output_path: str):
Author:
I'm not sure about this one: I'm saving the VAE weights even though they are already available on the Hub (Flux VAE and DC-AE).
Is there a way to avoid storing them and instead reference the original ones directly?
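(For reference, loading the already-published VAEs directly from the Hub would look roughly like the sketch below; the repo ids are assumptions based on the public Flux and DC-AE checkpoints, not necessarily the weights Photon was trained with.)

```python
from diffusers import AutoencoderDC, AutoencoderKL


def load_existing_vae(vae_type: str):
    # Hypothetical sketch: reference the already-published VAE weights instead of re-saving them.
    # Repo ids are assumptions based on the public Flux and DC-AE checkpoints.
    if vae_type == "flux":
        return AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae")
    return AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers")
```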

Member:
For now, it's okay to keep this as is. This way, everything is under the same model repo.

print(f"✓ Saved VAE to {vae_path}")


def download_and_save_text_encoder(output_path: str):
Author:
Same here for the Text Encoder.

print("✓ Created scheduler config")


def download_and_save_vae(vae_type: str, output_path: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, it's okay to keep this as is. This way, everything is under the same model repo.

@sayakpaul (Member) left a comment:
Thanks for the clean PR! I left some initial feedback for you. LMK if that makes sense.

Also, it would be great to see some samples of Photon!

@sayakpaul (Member) left a comment:
Thanks! Left a couple more comments. Let's also add the pipeline-level tests.

<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

Photon is a text-to-image diffusion model using simplified MMDIT architecture with flow matching for efficient high-quality image generation. The model uses T5Gemma as the text encoder and supports either Flux VAE (AutoencoderKL) or DC-AE (AutoencoderDC) for latent compression.
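(A minimal usage sketch for the docs; the repo id, step count, and guidance scale are taken from the model table further down in this PR, so treat this as an illustration rather than the official example.)

```python
import torch
from diffusers import PhotonPipeline

# Repo id and sampling parameters follow the model table in this PR.
pipe = PhotonPipeline.from_pretrained("Photoroom/photon-256-t2i", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    "a cozy cabin in a snowy forest at dusk, warm light in the windows",
    num_inference_steps=28,
    guidance_scale=5.0,
).images[0]
image.save("photon_sample.png")
```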
Member:
Cc: @stevhliu for a review on the docs.

return xq_out.reshape(*xq.shape).type_as(xq)


class PhotonAttnProcessor2_0:
Member:
Could we write it in a fashion similar to the existing attention processors (e.g., the Flux one)?

Collaborator:
I second this suggestion; in particular, I think it would be more in line with other diffusers model implementations to reuse the layers defined in Attention, such as to_q/to_k/to_v, instead of defining them in PhotonBlock (e.g. PhotonBlock.img_qkv_proj), and to keep the entire attention implementation in the PhotonAttnProcessor2_0 class.

Attention supports stuff like QK norms and fusing projections, so that could potentially be reused as well. If you need some custom logic not found in Attention, you could potentially add it in there or create a new Attention-style class like Flux does:

class FluxAttention(torch.nn.Module, AttentionModuleMixin):

Author:
I made the change and updated both the conversion script and the checkpoints on the hub.

def __call__(
self,
prompt: Union[str, List[str]] = None,
height: Optional[int] = None,
Member:
We support passing prompt embeddings too in case users want to supply them precomputed:

prompt_embeds: Optional[torch.FloatTensor] = None,
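(The usual diffusers pattern is to skip text encoding whenever embeddings are supplied. A minimal, hypothetical sketch of that branching, with an assumed encode_prompt callable standing in for the pipeline's own method:)

```python
from typing import Callable, List, Optional, Union

import torch


def resolve_prompt_embeds(
    prompt: Optional[Union[str, List[str]]],
    prompt_embeds: Optional[torch.Tensor],
    encode_prompt: Callable[[Union[str, List[str]]], torch.Tensor],
) -> torch.Tensor:
    # Use precomputed embeddings when provided; otherwise encode the prompt.
    if prompt_embeds is not None:
        return prompt_embeds
    if prompt is None:
        raise ValueError("Either `prompt` or `prompt_embeds` must be provided.")
    return encode_prompt(prompt)
```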

Comment on lines 484 to 486
default_sample_size = getattr(self.config, "default_sample_size", DEFAULT_RESOLUTION)
height = height or default_sample_size
width = width or default_sample_size
Member:
Prefer this pattern:

height = height or self.default_sample_size * self.vae_scale_factor

Author:
I did it this way because the model works with two different VAEs that have different scale factors.
Is it okay not to make it depend on self.vae_scale_factor? Otherwise it's hard to define a default value.

@sayakpaul (Member), Oct 15, 2025:
Oh good point! I think we could make a small utility function in the pipeline class that determines the default resolution given the VAE that's loaded into it? WDYT?

Author:
Sure, way cleaner! I did it.
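(A minimal sketch of what such a utility could look like; the 8x and 32x scale factors reflect the Flux VAE and DC-AE compression ratios, while the base latent size and the function name are assumptions for illustration.)

```python
from diffusers import AutoencoderDC, AutoencoderKL


def default_sample_size(vae, base_latent_size: int = 32) -> int:
    # Hypothetical helper: pick the default pixel resolution based on the VAE
    # actually loaded into the pipeline.
    if isinstance(vae, AutoencoderDC):
        vae_scale_factor = 32  # DC-AE spatial compression
    elif isinstance(vae, AutoencoderKL):
        vae_scale_factor = 8  # Flux VAE spatial compression
    else:
        raise ValueError(f"Unsupported VAE type: {type(vae)}")
    return base_latent_size * vae_scale_factor
```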

@DavidBert (Author):

Thanks @dg845 and @stevhliu for your last reviews! I updated the PR and hopefully addressed all your suggestions.

@DavidBert requested review from dg845 and stevhliu on October 16, 2025 at 09:51.
@stevhliu (Member) left a comment:
Thanks, docs LGTM

Comment on lines 308 to 310
parser.add_argument(
"--checkpoint_path", type=str, required=True, help="Path to the original Photon checkpoint (.pth file)"
)
Collaborator:
Would it be possible to set a meaningful default argument for checkpoint_path (for example, if the model checkpoint has been open-sourced and is available on e.g. HF hub, we could set it as a default)?

Author:
We have not open-sourced the original code and model weights yet, but plan to do so soon.
Is it okay to update it later once that's done?
What's the common practice here? Store the original weights and corresponding code in a model repo? I don't see a default path in the other conversion scripts.

Collaborator:
Yeah, that's totally ok. I thought other conversion scripts would have it set but you're right that it's usually not the case.

Comment on lines 167 to 170
# Apply scaled dot-product attention
attn_output = torch.nn.functional.scaled_dot_product_attention(
img_q.contiguous(), k.contiguous(), v.contiguous(), attn_mask=attn_mask_tensor
)
Collaborator:
Just curious, have you tested Photon with any other attention backends (e.g. Flash Attention, Sage Attention, etc.)? Not a blocker, but if so you could consider refactoring to use dispatch_attention_fn to add support for these backends.

You can look at the Flux attention processor for an example:

hidden_states = dispatch_attention_fn(
query,
key,
value,
attn_mask=attention_mask,
backend=self._attention_backend,
parallel_config=self._parallel_config,
)

See PR #11916 and the attention backend docs for more info.

Author:
Thanks for the suggestion, I tried and it works!
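(A minimal sketch of what the refactored processor could look like, combining the Attention-style to_q/to_k/to_v projections suggested earlier with dispatch_attention_fn; the class name, module attributes, and the absence of RoPE/QK-norm handling are simplifications, not the actual Photon implementation.)

```python
from diffusers.models.attention_dispatch import dispatch_attention_fn


class PhotonAttnProcessorSketch:
    """Hypothetical sketch of routing Photon attention through dispatch_attention_fn."""

    _attention_backend = None
    _parallel_config = None

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        # Reuse the projections defined on the Attention-style module (to_q/to_k/to_v)
        # instead of fused projections on the block.
        context = encoder_hidden_states if encoder_hidden_states is not None else hidden_states
        query = attn.to_q(hidden_states)
        key = attn.to_k(context)
        value = attn.to_v(context)

        # Reshape to (batch, seq_len, heads, head_dim), the layout dispatch_attention_fn expects.
        query = query.unflatten(-1, (attn.heads, -1))
        key = key.unflatten(-1, (attn.heads, -1))
        value = value.unflatten(-1, (attn.heads, -1))

        # dispatch_attention_fn selects the configured backend (SDPA, Flash, Sage, ...).
        hidden_states = dispatch_attention_fn(
            query,
            key,
            value,
            attn_mask=attention_mask,
            backend=self._attention_backend,
            parallel_config=self._parallel_config,
        )
        hidden_states = hidden_states.flatten(2, 3)
        return attn.to_out[0](hidden_states)
```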

from ..test_pipelines_common import PipelineTesterMixin


class PhotonPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
Collaborator:
Would it be possible to add a corresponding PhotonPipelineSlowTests class where we test whether inference on a full checkpoint is consistent between diffusers and the original code? You can refer to FluxPipelineSlowTests as a reference:

@nightly
@require_big_accelerator
class FluxPipelineSlowTests(unittest.TestCase):
pipeline_class = FluxPipeline
repo_id = "black-forest-labs/FLUX.1-schnell"

Member:
Okay to skip it for now IMO since we also don't add it for Qwen.

| Model | Resolution | Fine-tuned | Distilled | Description | Suggested prompts | Suggested parameters | Recommended dtype |
|:-----:|:-----------------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
| [`Photoroom/photon-256-t2i`](https://huggingface.co/Photoroom/photon-256-t2i)| 256 | No | No | Base model pre-trained at 256 with Flux VAE|Works best with detailed prompts in natural language|28 steps, cfg=5.0| `torch.bfloat16` |
| [`Photoroom/photon-256-t2i-sft`](https://huggingface.co/Photoroom/photon-256-t2i-sft)| 512 | Yes | No | Fine-tuned on the [Alchemist](https://huggingface.co/datasets/yandex/alchemist) dataset with Flux VAE | Can handle less detailed prompts|28 steps, cfg=5.0| `torch.bfloat16` |
Collaborator:
Are these model links expected to be broken for now? I get a 404 for https://huggingface.co/Photoroom/photon-256-t2i-sft currently and see that only the Photoroom/photon-256-t2i model is currently in the Photon collection.

Author:
They were on a private repo. I made it public.

"MultiControlNetModel",
"OmniGenTransformer2DModel",
"ParallelConfig",
"PhotonTransformer2DModel",
Collaborator:
Could you also add PhotonPipeline to the main __init__? As an example, here is how FluxPipeline is added:

"FluxPipeline",

FluxPipeline,

Also, could you add PhotonTransformer2DModel to the TYPE_CHECKING section of __init__? Here is how FluxTransformer2DModel is added:

FluxTransformer2DModel,

Author:
Done!

Collaborator:
I see that PhotonPipeline has been added in both places, but PhotonTransformer2DModel is still only added to the _import_structure part of the __init__ file. Could you add it to the other (TYPE_CHECKING) section as well? See e.g. FluxTransformer2DModel:

_import_structure:

"FluxTransformer2DModel",

if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:

FluxTransformer2DModel,
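(A minimal sketch of the corresponding Photon entries; this is a fragment of diffusers/__init__.py, where _import_structure, TYPE_CHECKING, and DIFFUSERS_SLOW_IMPORT are already defined, and the keys simply mirror the Flux entries referenced above.)

```python
# Fragment of diffusers/__init__.py (hypothetical sketch mirroring the Flux entries).
_import_structure["models"].append("PhotonTransformer2DModel")
_import_structure["pipelines"].append("PhotonPipeline")

if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
    from .models import PhotonTransformer2DModel
    from .pipelines import PhotonPipeline
```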

@dg845 (Collaborator) left a comment:
Thanks for the changes! The PR is close to merge; I think the most important things left are to fix the imports (e.g. #12456 (comment)) and make the other changes needed to get the CI green :).

Comment on lines 72 to 92
encoder_params = dict(
vocab_size=tokenizer.vocab_size,
hidden_size=8,
intermediate_size=16,
num_hidden_layers=1,
num_attention_heads=2,
num_key_value_heads=1,
head_dim=4,
max_position_embeddings=64,
layer_types=["full_attention"],
attention_bias=False,
attention_dropout=0.0,
dropout_rate=0.0,
hidden_activation="gelu_pytorch_tanh",
rms_norm_eps=1e-06,
attn_logit_softcapping=50.0,
final_logit_softcapping=30.0,
query_pre_attn_scalar=4,
rope_theta=10000.0,
sliding_window=4096,
)
Collaborator:
Suggested change
encoder_params = dict(
vocab_size=tokenizer.vocab_size,
hidden_size=8,
intermediate_size=16,
num_hidden_layers=1,
num_attention_heads=2,
num_key_value_heads=1,
head_dim=4,
max_position_embeddings=64,
layer_types=["full_attention"],
attention_bias=False,
attention_dropout=0.0,
dropout_rate=0.0,
hidden_activation="gelu_pytorch_tanh",
rms_norm_eps=1e-06,
attn_logit_softcapping=50.0,
final_logit_softcapping=30.0,
query_pre_attn_scalar=4,
rope_theta=10000.0,
sliding_window=4096,
)
encoder_params = {
"vocab_size": tokenizer.vocab_size,
"hidden_size": 8,
"intermediate_size": 16,
"num_hidden_layers": 1,
"num_attention_heads": 2,
"num_key_value_heads": 1,
"head_dim": 4,
"max_position_embeddings": 64,
"layer_types": ["full_attention"],
"attention_bias": False,
"attention_dropout": 0.0,
"dropout_rate": 0.0,
"hidden_activation": "gelu_pytorch_tanh",
"rms_norm_eps": 1e-06,
"attn_logit_softcapping": 50.0,
"final_logit_softcapping": 30.0,
"query_pre_attn_scalar": 4,
"rope_theta": 10000.0,
"sliding_window": 4096,
}

make style/make quality complain about the dict(...) call here, and I think it will be happier if a dict literal is used instead.

@DavidBert (Author), Oct 17, 2025:

Hi @dg845! Thanks for your new review.
I addressed all your new comments except for the one about a default path, because we do not currently have an open-source implementation of our original model.
I also prepared a second PR based on this one where we rename Photon (a name already used in the community) to PRX.
I did not include it here to keep this PR easier to review, but I can if you prefer.
Have a nice weekend!

Collaborator:
I think it would be easier to merge this PR first, then do the renaming as a follow-up PR. CC @sayakpaul

timestep = torch.tensor([1.0]).to(torch_device).expand(batch_size)

return {
"image_latent": image_latent,
Collaborator:
Suggested change
"image_latent": image_latent,
"hidden_states": image_latent,

To be consistent with suggested naming change in #12456 (comment)

return {
"image_latent": image_latent,
"timestep": timestep,
"cross_attn_conditioning": cross_attn_conditioning,
Collaborator:
Suggested change
"cross_attn_conditioning": cross_attn_conditioning,
"encoder_hidden_states": cross_attn_conditioning,

To be consistent with suggested naming change in #12456 (comment)

Comment on lines 707 to 708
micro_conditioning (`torch.Tensor`):
Extra conditioning vector (currently unused, reserved for future use).
@dg845 (Collaborator), Oct 17, 2025:
Was removing micro_conditioning here (in bef0845) intentional? I think it would be fine to retain it, and the transformer tests (specifically PhotonTransformerTests.prepare_dummy_input) also use this argument.

Author:
Yes, it was intentional; I removed it from the tests too.


class PhotonTransformerTests(ModelTesterMixin, unittest.TestCase):
model_class = PhotonTransformer2DModel
main_input_name = "image_latent"
Collaborator:
Suggested change
main_input_name = "image_latent"
main_input_name = "hidden_states"

To be consistent with the naming change suggested in #12456 (comment)

@dg845 (Collaborator) left a comment:

Thanks! Can you confirm that the tests are working as expected after the new changes?

@DavidBert (Author):

> Thanks! Can you confirm that the tests are working as expected after the new changes?

Very sorry, I forgot to verify these tests.
Both test_pipeline_photon.py and test_models_transformer_photon.py are working now.
