resolve dependencies by Artiprocher · Pull Request #1426 · modelscope/DiffSynth-Studio

Artiprocher · 2026-04-30T08:10:04Z

No description provided.

gemini-code-assist

Code Review

This pull request refactors the audio loading operators in diffsynth/core/data/operators.py to use deferred imports, so that their backing libraries can become optional dependencies, and updates pyproject.toml with new dependency groups. The review identifies a resource leak in LoadAudioWithTorchaudio, where the file reader is never closed. It also suggests standardizing the LoadAudio operator to return a torch tensor and sample rate, for consistency with the other operators.
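The review summary does not reproduce the refactor itself. The sketch below is a rough illustration of the deferred-import pattern it describes: the backend module is imported only when the operator is first called, so importing the operators module does not require the optional audio library. The class name `LazyAudioLoader`, the `extra` parameter, and the error message are hypothetical. The actual operators in `diffsynth/core/data/operators.py` may be structured differently.

```python
import importlib
from typing import Any, Callable, Optional


class LazyAudioLoader:
    """Hypothetical helper: import an optional backend on first use.

    Constructing this object never imports the backend, so a missing
    optional dependency only surfaces when audio loading is attempted.
    """

    def __init__(self, module_name: str, attr: str, extra: str) -> None:
        self.module_name = module_name  # e.g. "torchaudio" or "librosa"
        self.attr = attr                # function to fetch, e.g. "load"
        self.extra = extra              # dependency group named in the error (hypothetical)
        self._fn: Optional[Callable[..., Any]] = None

    def _resolve(self) -> Callable[..., Any]:
        # Import once, then cache the resolved function for later calls.
        if self._fn is None:
            try:
                module = importlib.import_module(self.module_name)
            except ImportError as exc:
                raise ImportError(
                    f"{self.module_name} is required for audio loading; "
                    f"install the '{self.extra}' dependency group."
                ) from exc
            self._fn = getattr(module, self.attr)
        return self._fn

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self._resolve()(*args, **kwargs)
```

For example, `LazyAudioLoader("torchaudio", "load", extra="audio")` would defer `import torchaudio` until the first call; the group name `"audio"` is an assumption, not taken from the PR's pyproject.toml.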

gemini-code-assist · 2026-04-30T08:12:18Z

            reader = self.get_reader(data)
            num_frames = self.get_num_frames(reader)
            duration = num_frames / self.frame_rate
-            waveform, sample_rate = torchaudio.load(data)
+            waveform, sample_rate = self.audio_loader(data)


There is a resource leak here: the imageio reader created by self.get_reader(data) is never closed. This can lead to an accumulation of open file handles, especially when processing large datasets. Furthermore, the file is opened twice (once by imageio and once by self.audio_loader). Using a context manager for the reader ensures it is closed immediately after extracting the necessary metadata.

Suggested change

-            reader = self.get_reader(data)
-            num_frames = self.get_num_frames(reader)
-            duration = num_frames / self.frame_rate
-            waveform, sample_rate = self.audio_loader(data)
+            with self.get_reader(data) as reader:
+                num_frames = self.get_num_frames(reader)
+                duration = num_frames / self.frame_rate
+            waveform, sample_rate = self.audio_loader(data)
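The suggested change relies on the imageio reader supporting `with`. The sketch below shows the same close-on-exit guarantee using the standard library's `contextlib.closing`, which only requires a `close()` method. `FakeReader` and `read_duration` are hypothetical stand-ins, not DiffSynth-Studio code. Note that closing the reader early does not remove the second file open performed by the audio loader; it only ensures the first handle is released.

```python
from contextlib import closing


class FakeReader:
    """Hypothetical stand-in for the reader returned by get_reader."""

    def __init__(self, num_frames: int) -> None:
        self.num_frames = num_frames
        self.closed = False

    def count_frames(self) -> int:
        if self.closed:
            raise ValueError("reader is closed")
        return self.num_frames

    def close(self) -> None:
        self.closed = True


def read_duration(reader: FakeReader, frame_rate: float) -> float:
    # closing() calls reader.close() when the block exits, even if
    # count_frames() raises, so the handle is never left open.
    with closing(reader) as r:
        num_frames = r.count_frames()
    return num_frames / frame_rate
```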

gemini-code-assist · 2026-04-30T08:12:19Z

+        input_audio, sample_rate = self.audio_loader(data, sr=self.sr)
        return input_audio


The LoadAudio operator returns a numpy array (from librosa.load), whereas other audio operators in this file (like LoadAudioWithTorchaudio and LoadPureAudioWithTorchaudio) return torch tensors. This inconsistency can cause issues in data pipelines that expect a uniform tensor format. Additionally, LoadAudio only returns the waveform, while the others return a (waveform, sample_rate) tuple. Consider standardizing the output format across all audio operators.

Suggested change

-        input_audio, sample_rate = self.audio_loader(data, sr=self.sr)
-        return input_audio
+        input_audio, sample_rate = self.audio_loader(data, sr=self.sr)
+        return torch.from_numpy(input_audio)
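The suggested change converts the waveform to a tensor but still returns only the waveform. The comment's broader recommendation is that every audio operator return `(waveform, sample_rate)`. The sketch below wraps any loader to enforce that contract. `StandardizedAudioOperator` is hypothetical, and the tensor converter is injected so the contract can be checked without torch installed.

```python
from typing import Any, Callable, Tuple

AudioLoader = Callable[..., Tuple[Any, int]]


class StandardizedAudioOperator:
    """Hypothetical wrapper giving audio operators one output contract.

    Whatever backend is used, __call__ returns (waveform, sample_rate),
    with the waveform passed through `to_tensor`.
    """

    def __init__(
        self,
        loader: AudioLoader,
        to_tensor: Callable[[Any], Any],
        **load_kwargs: Any,
    ) -> None:
        self.loader = loader            # any callable returning (waveform, sr)
        self.to_tensor = to_tensor      # e.g. torch.from_numpy for librosa output
        self.load_kwargs = load_kwargs  # backend options such as sr=...

    def __call__(self, data: Any) -> Tuple[Any, int]:
        waveform, sample_rate = self.loader(data, **self.load_kwargs)
        return self.to_tensor(waveform), int(sample_rate)
```

With librosa and torch installed, `StandardizedAudioOperator(librosa.load, torch.from_numpy, sr=16000)` would return a tensor and the sample rate; the `16000` value is only an illustrative choice.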

resolve dependencies

6f90006

gemini-code-assist Bot reviewed Apr 30, 2026


Artiprocher merged commit 44c52da into main Apr 30, 2026

Artiprocher deleted the resolve-dependencies branch April 30, 2026 08:15


Artiprocher merged 1 commit into main from resolve-dependencies
1 participant