resolve dependencies #1426
Code Review
This pull request refactors audio loading operators in diffsynth/core/data/operators.py to use deferred imports, facilitating optional dependencies, and updates pyproject.toml with new dependency groups. Feedback identifies a resource leak in LoadAudioWithTorchaudio where the file reader is not properly closed and suggests standardizing the LoadAudio operator to return a torch tensor and sample rate for consistency with other operators.
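For context, the deferred-import pattern described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `LazyAudioLoader`, `module_name`, and `attr_name` are hypothetical names chosen for illustration, and the error message wording is an assumption.

```python
import importlib


class LazyAudioLoader:
    """Resolve an optional backend only when the loader is first called."""

    def __init__(self, module_name, attr_name):
        # Hypothetical parameters: the module to import and the loader
        # function to look up inside it (e.g. "torchaudio" and "load").
        self.module_name = module_name
        self.attr_name = attr_name
        self._loader = None

    def _resolve(self):
        # Import on first use so that constructing the operator never
        # requires the optional dependency to be installed.
        if self._loader is None:
            try:
                module = importlib.import_module(self.module_name)
            except ImportError as exc:
                raise ImportError(
                    f"{self.module_name} is required for this operator; "
                    "install the matching optional dependency group."
                ) from exc
            self._loader = getattr(module, self.attr_name)
        return self._loader

    def __call__(self, *args, **kwargs):
        return self._resolve()(*args, **kwargs)
```

With this shape, a missing backend only raises when the operator is actually invoked, which keeps the base install usable without the audio extras.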
  reader = self.get_reader(data)
  num_frames = self.get_num_frames(reader)
  duration = num_frames / self.frame_rate
- waveform, sample_rate = torchaudio.load(data)
+ waveform, sample_rate = self.audio_loader(data)
There is a resource leak here: the imageio reader created by self.get_reader(data) is never closed. This can lead to an accumulation of open file handles, especially when processing large datasets. Furthermore, the file is opened twice (once by imageio and once by self.audio_loader). Using a context manager for the reader ensures it is closed immediately after extracting the necessary metadata.
Suggested change:
- reader = self.get_reader(data)
- num_frames = self.get_num_frames(reader)
- duration = num_frames / self.frame_rate
- waveform, sample_rate = self.audio_loader(data)
+ with self.get_reader(data) as reader:
+     num_frames = self.get_num_frames(reader)
+ duration = num_frames / self.frame_rate
+ waveform, sample_rate = self.audio_loader(data)
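To illustrate why the context manager matters, here is a small self-contained sketch. `FakeReader`, `open_reader`, and `clip_duration` are hypothetical stand-ins for the imageio reader and the operator method; they are not part of this repository. The point is only that the reader is closed as soon as the frame count has been read.

```python
class FakeReader:
    """Stand-in for a video reader that records whether it was closed."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the handle, even if the body raised.
        self.close()
        return False

    def close(self):
        self.closed = True


opened = []  # keeps every reader so we can inspect it afterwards


def open_reader(path):
    # Hypothetical replacement for self.get_reader(data).
    reader = FakeReader(num_frames=48)
    opened.append(reader)
    return reader


def clip_duration(path, frame_rate):
    # Only the metadata lookup needs the reader, so the handle is
    # released before any further (possibly slow) audio loading.
    with open_reader(path) as reader:
        num_frames = reader.num_frames
    return num_frames / frame_rate


duration = clip_duration("clip.mp4", frame_rate=24)
```

Without the `with` block, each call would leave one more entry in `opened` with `closed` still `False`, which is the handle accumulation described in the comment.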
  input_audio, sample_rate = self.audio_loader(data, sr=self.sr)
  return input_audio
The LoadAudio operator returns a numpy array (from librosa.load), whereas other audio operators in this file (like LoadAudioWithTorchaudio and LoadPureAudioWithTorchaudio) return torch tensors. This inconsistency can cause issues in data pipelines that expect a uniform tensor format. Additionally, LoadAudio only returns the waveform, while the others return a (waveform, sample_rate) tuple. Consider standardizing the output format across all audio operators.
Suggested change:
  input_audio, sample_rate = self.audio_loader(data, sr=self.sr)
- return input_audio
+ return torch.from_numpy(input_audio)
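If the fuller standardization is wanted (tensor plus sample rate), one possible shape is sketched below. `load_audio_standardized` and its `to_tensor` hook are hypothetical and not part of this PR. Note that returning a tuple would change the operator's return type, so existing callers of `LoadAudio` would need updating.

```python
def load_audio_standardized(audio_loader, path, sr, to_tensor=None):
    """Return (waveform, sample_rate) regardless of the backend's array type.

    audio_loader is assumed to behave like librosa.load, i.e. it returns
    a (numpy_array, sample_rate) pair when called with sr=...
    """
    waveform, sample_rate = audio_loader(path, sr=sr)
    if to_tensor is None:
        # Deferred import: torch is only required when no converter is given.
        import torch

        to_tensor = torch.from_numpy
    return to_tensor(waveform), sample_rate
```

The `to_tensor` hook exists only so the conversion step can be swapped out or tested without torch installed; in the operator itself, calling `torch.from_numpy` directly as suggested above is simpler.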
No description provided.