resolve dependencies#1426

Merged
Artiprocher merged 1 commit into main from resolve-dependencies on Apr 30, 2026
Conversation

@Artiprocher
Collaborator

No description provided.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the audio loading operators in diffsynth/core/data/operators.py to use deferred imports, which makes their audio libraries optional dependencies, and adds new dependency groups to pyproject.toml. The review identifies a resource leak in LoadAudioWithTorchaudio, where the file reader is never closed, and suggests standardizing the LoadAudio operator to return a torch tensor and a sample rate, consistent with the other operators.
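
As a rough illustration of the deferred-import pattern mentioned above, the sketch below resolves a backend function only when the operator is first called, so constructing the operator does not require the optional library. The `DeferredLoader` class, its parameters, and the error message are hypothetical and are not taken from the PR; the standard-library `json` module stands in for an audio backend.

```python
import importlib


class DeferredLoader:
    """Hypothetical operator that imports its backend on first use."""

    def __init__(self, module_name, attr, extra):
        self.module_name = module_name  # e.g. an optional audio library
        self.attr = attr                # function to fetch from that module
        self.extra = extra              # name of the optional dependency group
        self._fn = None

    def _resolve(self):
        # Import lazily so that a missing optional package only fails
        # when this operator is actually called.
        if self._fn is None:
            try:
                module = importlib.import_module(self.module_name)
            except ImportError as exc:
                raise ImportError(
                    f"{self.module_name} is required for this operator; "
                    f"install the optional '{self.extra}' dependency group."
                ) from exc
            self._fn = getattr(module, self.attr)
        return self._fn

    def __call__(self, *args, **kwargs):
        return self._resolve()(*args, **kwargs)


# Stand-in usage: "json" plays the role of an optional backend.
loads = DeferredLoader("json", "loads", "audio")
print(loads('{"a": 1}'))
```

With this shape, importing the module that defines the operators stays cheap, and users who never touch audio data never need the audio packages installed.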

Comment on lines 267 to +270
  reader = self.get_reader(data)
  num_frames = self.get_num_frames(reader)
  duration = num_frames / self.frame_rate
- waveform, sample_rate = torchaudio.load(data)
+ waveform, sample_rate = self.audio_loader(data)
Contributor

high

There is a resource leak here: the imageio reader created by self.get_reader(data) is never closed. This can lead to an accumulation of open file handles, especially when processing large datasets. Furthermore, the file is opened twice (once by imageio and once by self.audio_loader). Using a context manager for the reader ensures it is closed immediately after extracting the necessary metadata.

Suggested change
reader = self.get_reader(data)
num_frames = self.get_num_frames(reader)
duration = num_frames / self.frame_rate
waveform, sample_rate = torchaudio.load(data)
waveform, sample_rate = self.audio_loader(data)
with self.get_reader(data) as reader:
    num_frames = self.get_num_frames(reader)
duration = num_frames / self.frame_rate
waveform, sample_rate = self.audio_loader(data)
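
For readers who want to see the close-on-exit behavior in isolation, here is a minimal sketch using `contextlib.closing` from the standard library. `FakeReader` and `probe_duration` are hypothetical stand-ins and do not reflect the imageio API or the repository's code; the point is only that the reader is closed as soon as the metadata has been read, even if reading it raises.

```python
from contextlib import closing


class FakeReader:
    """Hypothetical stand-in for a video reader; not the imageio API."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.closed = False

    def count_frames(self):
        return self.num_frames

    def close(self):
        self.closed = True


def probe_duration(reader, frame_rate):
    # closing() calls reader.close() on exit, even if count_frames() raises.
    with closing(reader) as r:
        num_frames = r.count_frames()
    # The reader is already closed here; only plain numbers remain.
    return num_frames / frame_rate


reader = FakeReader(num_frames=48)
print(probe_duration(reader, frame_rate=24), reader.closed)
```

Keeping only the frame count inside the `with` block keeps the file handle open for the shortest possible time, which matters when iterating over a large dataset.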

Comment on lines +254 to 255
+ input_audio, sample_rate = self.audio_loader(data, sr=self.sr)
  return input_audio
Contributor

medium

The LoadAudio operator returns a numpy array (from librosa.load), whereas other audio operators in this file (like LoadAudioWithTorchaudio and LoadPureAudioWithTorchaudio) return torch tensors. This inconsistency can cause issues in data pipelines that expect a uniform tensor format. Additionally, LoadAudio only returns the waveform, while the others return a (waveform, sample_rate) tuple. Consider standardizing the output format across all audio operators.

Suggested change
input_audio, sample_rate = self.audio_loader(data, sr=self.sr)
return input_audio
input_audio, sample_rate = self.audio_loader(data, sr=self.sr)
return torch.from_numpy(input_audio)
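
One possible way to get a uniform `(waveform, sample_rate)` return shape across loaders is a small wrapper that applies a conversion function to whatever the backend returns. The sketch below is an assumption about how that could look, not the repository's code: `standardize_loader` and `fake_loader` are hypothetical, and `tuple` stands in for a real converter such as `torch.from_numpy` so the example runs with the standard library alone.

```python
def standardize_loader(loader, to_tensor):
    """Wrap a loader so it always returns (converted_waveform, sample_rate)."""

    def wrapped(path, **kwargs):
        waveform, sample_rate = loader(path, **kwargs)
        # Convert the backend-specific container to the common format.
        return to_tensor(waveform), sample_rate

    return wrapped


def fake_loader(path, sr=16000):
    # Hypothetical backend: returns a plain list and the requested rate.
    return [0.0, 0.5, -0.5], sr


# tuple() is a placeholder for a real converter such as torch.from_numpy.
load_audio = standardize_loader(fake_loader, to_tensor=tuple)
waveform, sample_rate = load_audio("clip.wav", sr=22050)
print(type(waveform).__name__, sample_rate)
```

Downstream pipeline steps can then unpack the same two values regardless of which backend loaded the file.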

@Artiprocher Artiprocher merged commit 44c52da into main Apr 30, 2026
@Artiprocher Artiprocher deleted the resolve-dependencies branch April 30, 2026 08:15