fix bug for float duration src_audio by mi804 · Pull Request #1421 · modelscope/DiffSynth-Studio

mi804 · 2026-04-29T08:10:31Z

No description provided.

gemini-code-assist

Code Review

This pull request updates the AceStepPipeline to improve the handling of float durations by transitioning from floor division to rounding for frame calculations. It also introduces a logic change in the cover task to concatenate source latents when the source audio is shorter than the generated latents. Additionally, the default shift parameter in the documentation has been updated from 1.0 to 3.0. Feedback was provided to ensure consistency by using round() instead of int() across all frame conversions to prevent off-by-one errors, and to add a null check for source_latents to avoid a potential AttributeError.

gemini-code-assist · 2026-04-29T08:12:19Z

+                if src_latents.shape[1] > source_latents.shape[1]:
+                    source_latents = torch.cat([source_latents, src_latents[:, source_latents.shape[1]:]], dim=1)


If the cover task is initiated using audio_code_string instead of src_audio, source_latents will be None. Accessing .shape[1] on a None object will raise an AttributeError. A check should be added to ensure source_latents is not None before performing the shape comparison and concatenation.

Suggested change

if src_latents.shape[1] > source_latents.shape[1]:

source_latents = torch.cat([source_latents, src_latents[:, source_latents.shape[1]:]], dim=1)

if source_latents is not None and src_latents.shape[1] > source_latents.shape[1]:

source_latents = torch.cat([source_latents, src_latents[:, source_latents.shape[1]:]], dim=1)

gemini-code-assist · 2026-04-29T08:12:19Z

+        padding_frames_left = int(pad_left * pipe.vae.sampling_rate)
+        padding_frames_right = max(int(max_right * pipe.vae.sampling_rate) - src_audio.shape[-1], 0)


Using int() for converting seconds to audio frames can lead to off-by-one errors due to floating-point precision (e.g., 0.1 * 48000 might result in 4799.999999999999 which int() floors to 4799). Since this PR aims to fix float duration issues, using round() is more appropriate and consistent with the changes made elsewhere in this file.

Suggested change

padding_frames_left = int(pad_left * pipe.vae.sampling_rate)

padding_frames_right = max(int(max_right * pipe.vae.sampling_rate) - src_audio.shape[-1], 0)

padding_frames_left = round(pad_left * pipe.vae.sampling_rate)

padding_frames_right = max(round(max_right * pipe.vae.sampling_rate) - src_audio.shape[-1], 0)

gemini-code-assist · 2026-04-29T08:12:19Z

+            start_frame = int(start * pipe.vae.sampling_rate / 1920)
+            end_frame = int(end * pipe.vae.sampling_rate / 1920)


For consistency with the calculation of max_latent_length (line 513), these frame indices should be calculated using round() instead of int(). This ensures that float durations are mapped to the same latent frames across different parts of the pipeline.

Suggested change

start_frame = int(start * pipe.vae.sampling_rate / 1920)

end_frame = int(end * pipe.vae.sampling_rate / 1920)

start_frame = round(start * pipe.vae.sampling_rate / 1920)

end_frame = round(end * pipe.vae.sampling_rate / 1920)

gemini-code-assist · 2026-04-29T08:12:19Z

+        pad_left_frames =  int(padding_frames_left / 1920)
+        pad_right_frames = int(padding_frames_right / 1920)


Using round() here is more consistent with the mapping of audio duration to latent frames used in the rest of the pipeline. This helps avoid potential artifacts at the boundaries when dealing with float durations.

Suggested change

pad_left_frames = int(padding_frames_left / 1920)

pad_right_frames = int(padding_frames_right / 1920)

pad_left_frames = round(padding_frames_left / 1920)

pad_right_frames = round(padding_frames_right / 1920)

fix bug for float duration src_audio

4eada0d

gemini-code-assist Bot reviewed Apr 29, 2026

View reviewed changes

Artiprocher merged commit 534e2e4 into modelscope:main Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix bug for float duration src_audio#1421

fix bug for float duration src_audio#1421
Artiprocher merged 1 commit into
modelscope:mainfrom
mi804:ace-step1

mi804 commented Apr 29, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if src_latents.shape[1] > source_latents.shape[1]:
		source_latents = torch.cat([source_latents, src_latents[:, source_latents.shape[1]:]], dim=1)

		padding_frames_left = int(pad_left * pipe.vae.sampling_rate)
		padding_frames_right = max(int(max_right * pipe.vae.sampling_rate) - src_audio.shape[-1], 0)

		start_frame = int(start * pipe.vae.sampling_rate / 1920)
		end_frame = int(end * pipe.vae.sampling_rate / 1920)

		pad_left_frames = int(padding_frames_left / 1920)
		pad_right_frames = int(padding_frames_right / 1920)

Conversation

mi804 commented Apr 29, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants