Hi, thank you for open-sourcing this!
I read your paper and tests/integration_test.py, and I would like to know how you turn the audio stream data into embeddings with D = 512.
Actually, it's like the question here: how do you generate the training or test data from an audio stream?
Is it something like librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40)?
Your paper says: "In this system, audio signals are first transformed into frames of width 25ms and step 10ms, and log-mel-filterbank energies of dimension 40 are extracted from each frame as the network input. These frames form overlapping sliding windows of a fixed length, on which we run the LSTM network. The last-frame output of the LSTM is then used as the d-vector representation of this sliding window."
How can I reproduce this part?
I appreciate it, and I look forward to your response!
Thanks,
Bo
The feature extraction system and d-vector system at Google are proprietary code, and cannot be open-sourced. You need to either find a third-party implementation, or use your own implementation. This repo is dedicated to the UIS-RNN system.
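For anyone writing their own implementation, here is a minimal NumPy sketch of the front end described in the quoted passage: 25 ms frames with a 10 ms step, 40 log-mel-filterbank energies per frame, then overlapping sliding windows. It is not Google's code. The 16 kHz sample rate, the Hamming window, the FFT size, the HTK-style mel formula, and the window length and shift (24 and 12 frames) are all assumptions chosen for illustration, since the passage only says "a fixed length". Note that MFCCs are not the same feature: they apply a DCT on top of the log-mel energies, so librosa.feature.mfcc would not match the description. The LSTM that maps each window to a 512-dimensional d-vector is not covered here and has to be trained separately.

```python
import numpy as np


def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_frames(signal, sample_rate=16000, win_ms=25.0, step_ms=10.0, n_mels=40):
    """Return an array of shape (n_frames, n_mels) of log-mel-filterbank energies."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(sample_rate * win_ms / 1000.0))    # 400 samples at 16 kHz
    step = int(round(sample_rate * step_ms / 1000.0))  # 160 samples at 16 kHz
    if len(signal) < win:
        raise ValueError("signal is shorter than one frame")
    n_fft = 1 << (win - 1).bit_length()                # next power of two >= win
    n_frames = 1 + (len(signal) - win) // step
    # Index matrix: row i selects samples [i*step, i*step + win).
    idx = np.arange(win)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-6)                   # small floor avoids log(0)


def sliding_windows(features, window_frames=24, shift_frames=12):
    """Group frames into overlapping fixed-length windows (sizes are assumptions)."""
    if features.shape[0] < window_frames:
        raise ValueError("not enough frames for one window")
    starts = range(0, features.shape[0] - window_frames + 1, shift_frames)
    return np.stack([features[s:s + window_frames] for s in starts])
```

Each row of the sliding_windows output is one fixed-length window that a d-vector network would consume. With librosa, librosa.feature.melspectrogram (with n_mels=40 and matching win_length and hop_length) followed by a log should give comparable features.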