RMS-levels = 0.3 #1066
Comments
Thanks for the report! Just briefly perusing the code, I can't see where we would normalize in the CLI, but I'll look into it asap. A few things that would really help me identify the issue are:
I likely won't have time to look at this today or tomorrow, but hopefully this week!
See attached files in the zip. (1 & 2) The ramped file seems correct at the beginning, with 0 RMS, since the signal starts at 0 and then peaks close to the beginning. However, the highest RMS in the CSV is found at the end (row 18), and there is no sign of a decaying RMS matching the file's signal. It makes no sense, and looks like normalizing per frame to me, which also makes no sense. I also tried removing the hop overlap, but the results are equivalent. I don't know about the last row being 0; I get that in all files with overlap, but that's no big deal for me since it can be removed. (3) The problem for me is producing CSV files offline for model building so that these CSV files correspond with what I get when using Meyda in a realtime WebAudio context.
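To sanity-check the ramped-file observation, here is a minimal frame-wise RMS sketch (not Meyda's internal code; `frameRms` is an illustrative helper): with a block size of 1024, a hop of 512, and no per-frame normalization, the RMS series of a decaying ramp should itself decay, rather than peak at the end of the CSV.

```javascript
// Illustrative sketch, not Meyda's implementation: plain per-frame RMS.
// If no per-frame normalization happens, the RMS values track the
// signal envelope and decay toward zero for a decaying ramp.
function frameRms(signal, frameSize, hopSize) {
  const out = [];
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    let sum = 0;
    for (let i = start; i < start + frameSize; i++) sum += signal[i] * signal[i];
    out.push(Math.sqrt(sum / frameSize));
  }
  return out;
}

// A signal that peaks at the beginning and decays linearly to zero.
const n = 8192;
const ramp = Float32Array.from({ length: n }, (_, i) => 1 - i / n);
const rms = frameRms(ramp, 1024, 512);
// Expect a monotonically decreasing series, not a flat ~0.3 line.
```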
If I were better at JavaScript/TypeScript I would love to help out, but unfortunately that's a bit out of my reach right now.
It's definitely best to compare apples to apples - if you're comparing wavs on disk to audio input via an interface, it might be that your audio interface doesn't have whatever compression was applied to the wav. Is the wav just a plain recording of the audio input? I would expect pink noise to have a consistently high RMS, and audio input from a microphone to be much quieter.
The wavs are from various sources, professional recordings, but now that I switched to integer wavs things seem to be getting consistent. I will have to do more tests, though. My idea is to find a correspondence in level between training and realtime; that's why RMS is useful. Alternatively, normalization is an option, but I'm not sure how to do that on the actual audio stream (time domain), so I might do it on the Meyda output (frequency domain etc., MFCC for now). However, I have to read up on how to normalize MFCC correctly so I don't destroy information.
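One common way to normalize MFCC without destroying information is per-coefficient standardization: fit a mean and standard deviation per coefficient on the offline training frames, then apply those same statistics to the realtime frames so both live in the same feature space. A sketch under that assumption (the helper names `fitNormalizer` and `normalize` are illustrative, not part of Meyda):

```javascript
// Per-coefficient z-score normalization for MFCC frames.
// Fit on the offline (training) frames; apply the SAME statistics
// to realtime frames so both are normalized identically.
function fitNormalizer(frames) { // frames: array of number[] (one MFCC vector per frame)
  const dims = frames[0].length;
  const mean = new Array(dims).fill(0);
  const std = new Array(dims).fill(0);
  for (const f of frames) f.forEach((v, d) => (mean[d] += v / frames.length));
  for (const f of frames) f.forEach((v, d) => (std[d] += (v - mean[d]) ** 2 / frames.length));
  std.forEach((v, d) => (std[d] = Math.sqrt(v) || 1)); // guard against zero variance
  return { mean, std };
}

function normalize(frame, { mean, std }) {
  return frame.map((v, d) => (v - mean[d]) / std[d]);
}
```

Because the transform is affine per coefficient, it rescales the features without discarding the relative information between frames.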
A couple of things:
Do you still think an issue with Meyda is causing your different values between training and real-time?
Now I've done a more thorough preliminary test and I get consistent results. Very consistent, actually. Even the very stupid and simple model I trained seems to be quite robust to level differences. Anyway, this confirms I can train a model in Python with parameters extracted with the Meyda CLI, export the model using TensorFlow.js, and then use JavaScript, WebAudio and Meyda in realtime in a browser to make predictions. It's very nice. So I don't think there's an issue with Meyda.

About levels realtime vs offline, and this might depend on the browser – on my system with Chrome on macOS 10.14 I needed to double the gain from the WebAudio input stream to get RMS readings from Meyda corresponding to the offline analysis. It could be a mono/stereo thing or something else; not a big deal, and as you say I will need to take various measures to augment the training data (including levels) in a more realistic model scenario.
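For reference, the doubled-gain observation is consistent with RMS being linear in gain: scaling the samples by 2 exactly doubles the RMS, a fixed +6.02 dB offset, which could fit the mono/stereo suspicion (summing two identical channels doubles the amplitude). A small sketch with an illustrative `rms` helper:

```javascript
// RMS scales linearly with gain: doubling the input gain doubles
// the measured RMS, i.e. a constant +6.02 dB offset.
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

const sig = Float32Array.from({ length: 1024 }, (_, i) => Math.sin((2 * Math.PI * i) / 64));
const doubled = sig.map((s) => 2 * s);
const dB = 20 * Math.log10(rms(doubled) / rms(sig)); // ≈ 6.02 dB
```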
While trying to match offline analysis and realtime analysis using WebAudio and Meyda I came across a strange thing:
If I analyze a batch of files using something like:
```shell
meyda --bs=1024 --hs=512 --mfcc=13 --o=meydaoutput.csv samples_training/DczN6842.wav mfcc rms
```
If I look in the CSV files, all frames for all files analyzed have an RMS somewhere between 0.29 and 0.31 (apart from the last two frames, where the last is always zero).
This indicates normalization per frame, which makes the RMS value of no use as far as I can tell. Or is there something I'm missing here?
In the realtime case, playing back the same files through an audio card (using a calibration procedure to get levels the same as for the files), I get an RMS of about 0.01–0.02, which is consistent with what I get from measuring WebAudio directly using

```js
input = event.inputBuffer.getChannelData(0);
```

This is nowhere near 0.3; in fact, 0.3 RMS seems to be close to the maximum possible level as far as I can tell. So the realtime version does not normalize, I guess?
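As a cross-check, RMS can be computed directly from the same kind of Float32Array that `getChannelData(0)` returns and compared against Meyda's `rms` feature on that buffer. A sketch using a synthetic buffer in place of the real input (the `bufferRms` helper is illustrative; no normalization is applied):

```javascript
// Compute RMS directly from a Float32Array, the same shape of data
// that event.inputBuffer.getChannelData(0) hands you in WebAudio.
function bufferRms(buf) {
  let sum = 0;
  for (let i = 0; i < buf.length; i++) sum += buf[i] * buf[i];
  return Math.sqrt(sum / buf.length);
}

// A quiet sine, roughly the 0.01–0.02 range reported from the mic path.
const quiet = Float32Array.from(
  { length: 1024 },
  (_, i) => 0.02 * Math.sin((2 * Math.PI * i) / 128)
);
// bufferRms(quiet) ≈ 0.014 — nowhere near 0.3.
```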
How does this affect other parameters like MFCC? I ask because I have trouble getting consistency between realtime and offline feature extraction.