
Joyai image #1392

Closed
mi804 wants to merge 6 commits into modelscope:main from mi804:joyai-image


Conversation

Collaborator

@mi804 mi804 commented Apr 14, 2026

No description provided.

@mi804 mi804 closed this Apr 14, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the JoyAI-Image model, featuring a new DiT architecture, a Qwen3-VL-based text encoder, and a dedicated inference pipeline. The review identifies several critical bugs, including a missing device parameter in the pipeline that causes a KeyError and logic in the image embedder that ignores input images during denoising. Additionally, feedback points out redundant code in the DiT model, unused utility functions, and inconsistencies in pipeline unit definitions that should be cleaned up to improve maintainability and performance.

Comment on lines +101 to +108
inputs_shared = {
    "cfg_scale": cfg_scale,
    "input_image": input_image, "edit_images": edit_images,
    "denoising_strength": denoising_strength,
    "height": height, "width": width,
    "seed": seed, "max_sequence_length": max_sequence_length,
    "tiled": tiled, "tile_size": tile_size, "tile_stride": tile_stride,
}

critical

The rand_device parameter is missing from the inputs_shared dictionary in the __call__ method. This will lead to a KeyError when JoyAIImageUnit_NoiseInitializer is executed, as it expects rand_device as an input parameter. You should add "rand_device": self.device (or a user-provided device) to the inputs_shared dictionary.
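A minimal sketch of the fix, with placeholder values standing in for the real __call__ arguments and a plain string standing in for self.device:

```python
# Hedged sketch: the inputs_shared dict with the missing key added.
# All values here are placeholders; in the real __call__ they come from
# the method's arguments, and "cpu" stands in for self.device.
device = "cpu"
inputs_shared = {
    "cfg_scale": 7.5,
    "input_image": None, "edit_images": None,
    "denoising_strength": 1.0,
    "height": 1024, "width": 1024,
    "seed": 0, "max_sequence_length": 512,
    "tiled": False, "tile_size": 128, "tile_stride": 64,
    "rand_device": device,  # added so the noise initializer unit finds it
}
```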


def _encode_text_only(self, pipe, prompt, max_sequence_length):
    # TODO: may support for text-only encoding in the future.
    raise NotImplementedError("Text-only encoding is not implemented yet. Please provide edit_images for now.")

high

The _encode_text_only method currently raises a NotImplementedError. This prevents the pipeline from being used for text-to-image generation, which is listed as a supported feature in the model description. Implementing this method would ensure the pipeline is fully functional for all advertised tasks.

Comment on lines +258 to +267
    def process(self, pipe: JoyAIImagePipeline, input_image, noise, tiled, tile_size, tile_stride):
        if input_image is None:
            return {"latents": noise}
        pipe.load_models_to_device(self.onload_model_names)
        if isinstance(input_image, Image.Image):
            input_image = [input_image]
        input_image = [pipe.preprocess_image(img).transpose(0, 1) for img in input_image]
        latents = pipe.vae.encode(input_image, device=pipe.device, tiled=tiled, tile_size=tile_size, tile_stride=tile_stride)
        input_latents = rearrange(latents, "(b n) c 1 h w -> b n c 1 h w", n=(len(input_image)))
        return {"latents": noise, "input_latents": input_latents}

high

The JoyAIImageUnit_InputImageEmbedder does not correctly handle the initial latents for img2img mode. It always returns the pure noise as latents on line 267, effectively ignoring the input_image and the denoising_strength parameter. To support img2img, it should blend the encoded input_latents with noise based on the starting timestep determined by denoising_strength using the scheduler's add_noise method.

    def process(self, pipe: JoyAIImagePipeline, input_image, noise, tiled, tile_size, tile_stride):
        if input_image is None:
            return {"latents": noise}
        pipe.load_models_to_device(self.onload_model_names)
        if isinstance(input_image, Image.Image):
            input_image = [input_image]
        input_image = [pipe.preprocess_image(img).transpose(0, 1) for img in input_image]
        latents = pipe.vae.encode(input_image, device=pipe.device, tiled=tiled, tile_size=tile_size, tile_stride=tile_stride)
        input_latents = rearrange(latents, "(b n) c 1 h w -> b n c 1 h w", n=(len(input_image)))
        if pipe.scheduler.denoising_strength < 1.0:
            latents = pipe.scheduler.add_noise(input_latents, noise, pipe.scheduler.timesteps[0])
        else:
            latents = noise
        return {"latents": latents, "input_latents": input_latents}
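For intuition, here is a toy numeric illustration of the blend. It assumes a rectified-flow style add_noise of the form x_t = (1 - t) * x0 + t * noise; the scheduler's actual formula may differ:

```python
def add_noise(x0, noise, t):
    # Assumed rectified-flow mix: t=1.0 gives pure noise, t=0.0 gives the
    # clean input latents. Plain lists stand in for latent tensors.
    return [(1 - t) * a + t * b for a, b in zip(x0, noise)]

# denoising_strength < 1.0 starts denoising part-way, from a blend that is
# mostly the encoded input image rather than pure noise:
blended = add_noise([1.0, 1.0], [0.0, 0.0], t=0.25)
print(blended)  # [0.75, 0.75]
```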

Comment on lines +127 to +135
        if activation_fn == "gelu-approximate":
            self.proj = nn.Linear(dim, inner_dim, bias=bias)
            self.act = lambda x: F.gelu(x, approximate="tanh")
        elif activation_fn == "gelu":
            self.proj = nn.Linear(dim, inner_dim, bias=bias)
            self.act = F.gelu
        else:
            self.proj = nn.Linear(dim, inner_dim, bias=bias)
            self.act = F.gelu

medium

The nn.Linear layer is initialized identically in all branches of the if-elif-else block. This can be refactored to improve maintainability and reduce code duplication by moving the linear layer initialization outside the conditional block.

        self.proj = nn.Linear(dim, inner_dim, bias=bias)
        if activation_fn == "gelu-approximate":
            self.act = lambda x: F.gelu(x, approximate="tanh")
        else:
            self.act = F.gelu
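For reference, the two activation variants are numerically very close; a self-contained sketch comparing exact GELU with the tanh approximation (the model code itself uses torch.nn.functional.gelu):

```python
import math

def gelu_exact(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # The "gelu-approximate" (approximate="tanh") variant.
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

print(abs(gelu_exact(1.0) - gelu_tanh(1.0)))  # on the order of 1e-4
```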

Comment on lines +154 to +165
def get_cu_seqlens(text_mask, img_len):
    batch_size = text_mask.shape[0]
    text_len = text_mask.sum(dim=1)
    max_len = text_mask.shape[1] + img_len
    cu_seqlens = torch.zeros([2 * batch_size + 1], dtype=torch.int32, device="cuda")
    for i in range(batch_size):
        s = text_len[i] + img_len
        s1 = i * max_len + s
        s2 = (i + 1) * max_len
        cu_seqlens[2 * i + 1] = s1
        cu_seqlens[2 * i + 2] = s2
    return cu_seqlens

medium

The function get_cu_seqlens is defined but not used anywhere in the file. Additionally, it contains a hardcoded device="cuda" on line 158, which would cause runtime errors in CPU or NPU environments. It is recommended to remove this unused function to keep the code clean and avoid potential device-related bugs.
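If the helper were kept instead, the device should at least become a parameter. As a pure-Python reference for what the function computes (packed sequence boundaries for text+image attention; plain lists stand in for the mask tensor):

```python
def cu_seqlens_reference(text_mask, img_len):
    # text_mask: rows of 0/1 ints; img_len: number of image tokens per item.
    # Returns [0, s1_0, s2_0, s1_1, s2_1, ...] where s1 marks the end of the
    # valid (text + image) tokens in each padded slot and s2 the slot end.
    batch_size = len(text_mask)
    max_len = len(text_mask[0]) + img_len
    cu = [0] * (2 * batch_size + 1)
    for i, row in enumerate(text_mask):
        s = sum(row) + img_len
        cu[2 * i + 1] = i * max_len + s
        cu[2 * i + 2] = (i + 1) * max_len
    return cu

print(cu_seqlens_reference([[1, 1, 0], [1, 0, 0]], img_len=4))  # [0, 6, 7, 12, 14]
```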

Comment on lines +624 to +625
x = torch.cat((img, txt), 1)
img = x[:, :img_len, ...]

medium

This concatenation and subsequent slicing operation is redundant. img is concatenated with txt and then immediately sliced back to its original length using img_len. This does not change the content of img and can be removed to improve efficiency.
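A toy illustration of why the pair of operations is a no-op on img (plain lists stand in for tensors):

```python
img, txt = [1, 2, 3], [4, 5]
x = img + txt               # analogue of x = torch.cat((img, txt), 1)
recovered = x[:len(img)]    # analogue of img = x[:, :img_len, ...]
print(recovered == img)     # True: img is unchanged, so both lines can go
```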

Comment on lines +215 to +217
input_params=("edit_images", "tiled", "tile_size", "tile_stride", "edit_image_basesize", "height", "width"),
output_params=("ref_latents", "num_items", "is_multi_item"),
onload_model_names=("wan_video_vae",),

medium

There are several inconsistencies in the unit parameter definitions. In JoyAIImageUnit_EditImageEmbedder, edit_image_basesize is listed as an input but is unused in the process method. Additionally, num_items and is_multi_item are listed as output parameters but are never returned by the unit. Similarly, in JoyAIImageUnit_PromptEmbedder, the positive parameter is defined but unused. These unused parameters should be removed to improve code clarity.
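A sketch of the cleaned-up EditImageEmbedder definition with the unused entries dropped (keyword names follow the snippet quoted above; whether num_items and is_multi_item should instead be computed and returned is a design decision for the author):

```python
# Hypothetical cleaned-up unit configuration; only entries that are actually
# consumed or produced by the process method remain.
unit_config = dict(
    input_params=("edit_images", "tiled", "tile_size", "tile_stride",
                  "height", "width"),    # "edit_image_basesize" removed
    output_params=("ref_latents",),      # "num_items", "is_multi_item" removed
    onload_model_names=("wan_video_vae",),
)
```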
