About audio sampling #3

17Skye17 · 2022-12-13T15:39:16Z

Hi, I noticed that the _get_audio function generates a separate spectrogram for each sampled frame by calling signal.spectrogram(samples, samplerate, nperseg=512, noverlap=353). For efficiency, would it be possible to generate one large spectrogram per video and then sample from that spectrogram instead? This would save some preprocessing cost, provided memory is not a problem.

Thanks for releasing this nice work!

def _get_audio(self, idx, s, e):
		
		audio_mask = torch.zeros((1, self.opt.max_audio_frames), dtype=torch.long)  # torch.zeros needs a torch dtype, not np.long


		audio = torch.zeros((self.opt.max_audio_frames, 1024, 128), dtype=torch.double) 



		audio_folder = self.video_dict[idx].split('.')[:-1][0].replace('frames',self.opt.audio_pt)

		audio_folder_bk = audio_folder.replace('audio_raw','VGGSound_Audio_features_10s_aligned')
		self.save_path = audio_folder_bk
		# audio_folder = audio_folder.replace('playpen-iop','playpen-storage')
		
		total_num_wav = len(glob.glob(audio_folder+'/*.wav'))
		total_num_pt = len(glob.glob(audio_folder+'/*.pt'))

		# print('Read: '+ idx)

		total_fbank = []
		# if self.opt.max_audio_frames < total_num_wav: # for frame-wise fusion
		
		# NOTE: `or True` makes this condition always true, so the else branch below never runs
		if self.my_len == 4816 or True:
			sample_indx = np.linspace(0, total_num_pt-1, num=self.opt.max_audio_frames, dtype=int)
			for tmp_idx in sample_indx:
				fbank = torch.load(audio_folder+'/'+ str("{:04d}".format(tmp_idx))+ '.pt', map_location=torch.device('cpu'))

				total_fbank.append(fbank)

		else:
			for tmp_idx in range(self.opt.max_audio_frames):	#total_num_wav self.opt.max_audio_frames
					### loader for VGGSound
					try:
							
						samples, samplerate = sf.read(audio_folder+'/'+ '0000.wav')

						if samples.shape[0] > 16000*(self.opt.yb_audio_length+0.1):
							sample_indx = np.linspace(0, samples.shape[0] -16000*(self.opt.yb_audio_length+0.1), num=self.opt.max_audio_frames, dtype=int)
							samples = samples[sample_indx[tmp_idx]:sample_indx[tmp_idx]+int(16000*self.opt.yb_audio_length)]

						else:
							# repeat in case audio is too short
							samples = np.tile(samples,int(self.opt.yb_audio_length))[:int(16000*self.opt.yb_audio_length)]

						samples[samples > 1.] = 1.
						samples[samples < -1.] = -1.

						frequencies, times, spectrogram = signal.spectrogram(samples, samplerate, nperseg=512,noverlap=353)
						spectrogram = np.log(spectrogram+ 1e-7)

						mean = np.mean(spectrogram)
						std = np.std(spectrogram)
						spectrogram = np.divide(spectrogram-mean,std+1e-9)

						total_fbank.append(torch.tensor(spectrogram).unsqueeze(0).float())

					except Exception:
						print('Too short: '+ audio_folder+'/'+ str("{:04d}".format(tmp_idx))+ '.wav')
						# print("skip too short")
						continue
					
		
		

		# audio[:total_fbank.size(0)] = total_fbank
		# audio_mask[0, :total_fbank.size(0)] = 1
		# return audio, audio_mask
		total_fbank = torch.vstack(total_fbank)
		return total_fbank, audio_mask
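
For illustration, the proposal in the question could be sketched roughly as below. This is not code from the repository: the helper names (log_spec, normalize, segments_from_full_spectrogram) and the constants are hypothetical, and a 16 kHz mono signal is assumed. The sketch computes one log-spectrogram for the whole clip, slices fixed-width column windows out of it, and normalizes each slice separately so the statistics stay per segment, as in the loop above. One caveat: slices can only start on hop boundaries (512 - 353 = 159 samples), so start positions will not match the sample-level offsets of the original loop exactly.

```python
import numpy as np
from scipy import signal

SAMPLE_RATE = 16000          # assumed sample rate, as hard-coded in the quoted loop
NPERSEG = 512
NOVERLAP = 353
HOP = NPERSEG - NOVERLAP     # 159 samples between consecutive spectrogram columns


def log_spec(samples):
    """Log power spectrogram with the same parameters as the quoted code."""
    samples = np.clip(samples, -1.0, 1.0)
    _, _, spec = signal.spectrogram(
        samples, SAMPLE_RATE, nperseg=NPERSEG, noverlap=NOVERLAP
    )
    return np.log(spec + 1e-7)


def normalize(spec):
    """Zero-mean, unit-variance normalization over one segment."""
    return (spec - spec.mean()) / (spec.std() + 1e-9)


def segments_from_full_spectrogram(samples, seg_seconds, num_segments):
    """Hypothetical helper: compute one spectrogram, then slice it into segments."""
    seg_len = int(SAMPLE_RATE * seg_seconds)
    # Number of columns scipy produces for a signal of seg_len samples.
    cols_per_seg = (seg_len - NOVERLAP) // HOP
    full = log_spec(samples)
    max_start_col = full.shape[1] - cols_per_seg
    start_cols = np.linspace(0, max_start_col, num=num_segments, dtype=int)
    # Normalize after slicing so each segment keeps its own mean and std.
    return [normalize(full[:, c:c + cols_per_seg]) for c in start_cols]
```

Whether this is actually faster depends on the clip length and on how much the sampled segments overlap, so it is worth timing on real data before adopting it.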


GenjiB · 2022-12-18T00:47:34Z

Actually, it takes more time to compute a spectrogram for a long audio signal. That is why my code computes the spectrogram only for a short segment (e.g., 5 s, 10 s, 30 s, or whatever length you want).
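
To check this trade-off on a given machine, a minimal timing sketch along these lines may help. It is not taken from the repository: time_spectrogram is a hypothetical helper, the input is random noise at an assumed 16 kHz, and only the signal.spectrogram call with the parameters quoted above is measured.

```python
import time

import numpy as np
from scipy import signal


def time_spectrogram(num_samples, sample_rate=16000, repeats=3):
    """Return the best wall-clock time (seconds) of one spectrogram call."""
    rng = np.random.default_rng(0)
    samples = rng.uniform(-1.0, 1.0, num_samples)  # synthetic stand-in for audio
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        signal.spectrogram(samples, sample_rate, nperseg=512, noverlap=353)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Compare short segments against a long clip; the durations are arbitrary examples.
    for seconds in (5, 10, 30, 300):
        elapsed = time_spectrogram(16000 * seconds)
        print(f"{seconds:>4d} s of audio: {elapsed:.4f} s")
```

Multiplying the short-segment time by the number of sampled segments and comparing it with the time for the full clip gives a rough idea of which approach is cheaper for a particular dataset.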

GenjiB closed this as completed Dec 18, 2022
