Results of mfcc in main.c are different from those in Python with the same settings. #103

Open
liziru opened this issue Nov 19, 2020 · 13 comments


@liziru

liziru commented Nov 19, 2020

Results of mfcc in main.c are different from those in Python with the same settings.

I gave the same input (512 zero samples) to the 'mfcc' API in main.c and in Python with the same settings, but I got different results. The settings are shown below:
Python: ![image](https://user-images.githubusercontent.com/34911790/99630712-55906a00-2a75-11eb-8c42-337fdbbc9da7.png)
mfcc api in main.c ![image](https://user-images.githubusercontent.com/34911790/99630815-73f66580-2a75-11eb-909c-04feeddedb8f.png) ![image](https://user-images.githubusercontent.com/34911790/99630858-87093580-2a75-11eb-9d3a-712bc2affb94.png)
The input is 512 zero samples. The result is '-84.3408 0.0000 -0.0000 0.0000 -0.0000 0.0000 -0.0000 0.0000 -0.0000 -0.0000 0.0001 -0.0001 0.0000 -0.0000 -0.0001 -0.0000 0.0000 0.0001 -0.0001 0.0002' in main.c and '-36.0437,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000' in Python.

The MFCC settings are the same as in the original rnn-denoise example, which confuses me a lot.
I would appreciate any help.

@liziru
Author

liziru commented Nov 19, 2020

mfcc setting in Python:

```python
# get the mfcc of noisy voice
mfcc_feat = mfcc(sig, sample_rate, winlen=0.032, winstep=0.032 / 2, numcep=20, nfilt=20,
                 nfft=512, lowfreq=20, highfreq=8000, winfunc=np.hanning, ceplifter=0,
                 preemph=0, appendEnergy=True)
```

mfcc setting in main.c:

```c
// 20 features, 0 offset, 20 bands, 512 fft, 0 preemphasis, attach energy to band 0
mfcc_t *mfcc = mfcc_create(NUM_FEATURES, 0, NUM_FEATURES, 512, 0, true);

#define SAMP_FREQ     16000
#define MEL_LOW_FREQ  20
#define MEL_HIGH_FREQ 8000
```

@majianjia
Owner

majianjia commented Nov 19, 2020

Hi @liziru

Please use some real numbers to test both functions. All zeros simply means there is no energy in any band, so the first band gives its minimum, caused by log(0). With a real signal (or just some random noise), you can plot both outputs or use a metric like MSE or cosine similarity to compare the two implementations.
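For example, a quick comparison could look like this (a rough sketch; the two arrays are placeholders to be replaced with one frame of MFCC output from each implementation):

```python
import numpy as np

# placeholder values; paste one frame of MFCC output from each implementation here
mfcc_py = np.array([-8.03, 29.32, 6.79, 7.46])   # from the Python mfcc()
mfcc_c  = np.array([-4.59, 30.09, 7.24, 7.85])   # printed by main.c

mse = np.mean((mfcc_py - mfcc_c) ** 2)
cos = np.dot(mfcc_py, mfcc_c) / (np.linalg.norm(mfcc_py) * np.linalg.norm(mfcc_c))
print("MSE:", mse, "cosine similarity:", cos)
```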

Since we use the option appendEnergy=True in Python and mfcc_create(..., true) in main.c, the first band represents the energy of the FFT. I believe Python uses 64-bit float arithmetic while the C code uses 32-bit float, so this might be the cause of the difference. Anyway, both -84 and -36 are just their respective minimum numbers.
In both the Python and the C code, the features are saturated to the ±2^3 = ±8 range:

quantize_data(nn_features, nn_features_q7, NUM_FEATURES+20, 3);

x_train = normalize(x_train, 3, quantize=False)

They will both be saturated to -8 after these two quantisation/saturation steps, so this energy difference will not affect anything.
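As a small illustration of that saturation step (the two energy values are the first coefficients reported above):

```python
import numpy as np

# first coefficient from each implementation, for the all-zero input
c_energy  = -84.3408
py_energy = -36.0437

# with 3 integer bits, the features are saturated to the +/-8 range,
# so both values end up at -8 and the difference disappears
print(np.clip([c_energy, py_energy], -8, 8))   # [-8. -8.]
```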

@liziru
Author

liziru commented Nov 20, 2020

@majianjia
Thank you for your reply.
I did the test following your advice, and it works.
However, I found two other problems.
First, with the sample input (0–512, 512 samples), the result of the Python code is a little different from that of the C code. As you said, Python using 64-bit float arithmetic while C uses 32-bit float may explain this.
Before being saturated to -8, the Python and C results with the same input are:
-8.0303,29.3225,6.7850,7.4641,3.6157,4.1926,2.4651,2.8310,1.8338,2.0457,1.3851,1.5110,1.0347,1.0887,0.7333,0.7338,0.4813,0.4191,0.2408,0.1338
-4.5886,30.0869,7.2367,7.8549,3.8652,4.3900,2.5817,2.9530,1.8638,2.1015,1.4186,1.5603,1.0761,1.1739,0.7649,0.7836,0.5086,0.5034,0.2978,0.1628

Second, with the same sample input, the result of nnom inference is a little different from the result of the tf model.predict API.
input feats:
-8.0303, 29.3225, 6.7850, 7.4641, 3.6157, 4.1926, 2.4651, 2.8310, 1.8338, 2.0457, 1.3851, 1.5110, 1.0347, 1.0887, 0.7333, 0.7338, 0.4813, 0.4191, 0.2408, 0.1338, -8.0303, 29.3225, 6.7850, 7.4641, 3.6157, 4.1926, 2.4651, 2.8310, 1.8338, 2.0457, -8.0303, 29.3225, 6.7850, 7.4641, 3.6157, 4.1926, 2.4651, 2.8310, 1.8338, 2.0457
results of nnom inference and tf API inference:
0.4724,0.7480,0.8504,0.8583,0.8583,0.8583,0.8346,0.8425,0.8110,0.8346,0.8110,0.8268,0.8268,0.8031,0.8268,0.8268,0.8346,0.8425,0.8031,0.8346
0.9275,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000
The first row is the result of nnom inference; the second row is the result of the tf API inference. Quantisation of the inference and of the features can lead to some loss, but this loss is a little big.
Is the loss acceptable? Do you have any advice for reducing it?
Looking forward to your reply soon!
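For reference, a small sketch of how I quantify the gap between the two rows above (only the first few values are shown here as placeholders; paste the full 20-value rows for a real check):

```python
import numpy as np

# first five values of each row above
nnom_out = np.array([0.4724, 0.7480, 0.8504, 0.8583, 0.8583])
tf_out   = np.array([0.9275, 1.0000, 1.0000, 1.0000, 1.0000])

print("max abs diff:", np.max(np.abs(nnom_out - tf_out)))
print("mean abs diff:", np.mean(np.abs(nnom_out - tf_out)))
```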

@liziru
Author

liziru commented Nov 20, 2020

As a footnote, my NN model is made up of four fully-connected layers, so there is no hidden state as in an RNN. Also, the result distributions of the two inference engines are almost the same.

@majianjia
Owner

The 8-bit resolution might not be good enough for a regression application. Please also try this to see if it is related: #104

I will check in detail later when I am back.

@liziru
Author

liziru commented Nov 20, 2020

> The 8-bit resolution might not be good enough for a regression application. Please also try this to see if it is related: #104
>
> I will check in detail later when I am back.

Thank you very much. I checked my code and 'NNOM_TRUNCATE' was already defined in nnom_port.h as you advised in #104, but I did not do the following step, because I think this operation will round the results: change the line `#define NNOM_ROUND(out_shift) ( (0x1u << out_shift) >> 1 )` to `#define NNOM_ROUND(out_shift) ((q31_t)( (0x1u << out_shift) >> 1 ))` to fix the issue. But what about the ARM version? It is still not working.

Sadly, the loss has not changed.

@majianjia
Owner

Rounding or flooring doesn't really change the result, because it only affects the result by 0.5/128.
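As a rough illustration of how small that effect is (the shift amount here is arbitrary, not taken from the code):

```python
import numpy as np

acc = np.arange(-512, 512, dtype=np.int32)      # example accumulator values
shift = 4                                        # arbitrary output shift

truncated = acc >> shift                         # floor, as with NNOM_TRUNCATE
rounded = (acc + (1 << (shift - 1))) >> shift    # round-to-nearest

# the two results differ by at most one least-significant bit
print(np.max(np.abs(rounded - truncated)))       # prints 1
```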
In the denoise example, the output gains look like this, with columns representing the gain index (1~20) and rows representing the timestamps. You can see that they reach 1 here after the hard_sigmoid() final output layer.

Did you try Conv or RNN? They might behave differently; dense does not work well when the two vectors differ hugely in size (e.g. a 1000-unit input vs a 2-unit output).

[screenshot: table of output gain values, columns = gain index 1~20, rows = timestamps]

@liziru
Author

liziru commented Nov 20, 2020

> Rounding or flooring doesn't really change the result, because it only affects the result by 0.5/128.
> In the denoise example, the output gains look like this, with columns representing the gain index (1~20) and rows representing the timestamps. You can see that they reach 1 here after the hard_sigmoid() final output layer.
>
> Did you try Conv or RNN? They might behave differently; dense does not work well when the two vectors differ hugely in size (e.g. a 1000-unit input vs a 2-unit output).
>
> [screenshot: table of output gain values, columns = gain index 1~20, rows = timestamps]

Thank you for your reply.
I have to use dense (fully-connected) layers in the rnn-denoise project due to some limits. The input size and the output size are both 20, so dense should be OK. I use four dense layers with ReLU activations, except for the last layer, which uses sigmoid rather than hard-sigmoid.
I can understand the gains table provided in the picture, but the loss between the two inference engines truly exists, which makes me sad and confused.
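For reference, a minimal Keras sketch of the model I described (the hidden layer widths are made up; only the 20-in/20-out shape, the four dense layers, the ReLU activations and the final sigmoid match my actual model):

```python
from tensorflow.keras import layers, models

# hypothetical hidden widths; the 20-in/20-out shape, four dense layers,
# ReLU activations and the final sigmoid follow the description above
model = models.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(20, activation='sigmoid'),   # sigmoid rather than hard-sigmoid
])
model.summary()
```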

@majianjia
Owner

The RNN currently runs with 8-bit input/output data and 16-bit memory (state) data, which might keep more information.
I am not sure what caused the loss you are seeing. Would you be able to validate the model with more data?
You may also try a Conv-based network; a TCN (consisting of Conv layers with dilation > 1) works completely fine with NNoM and can outperform RNN-type models.
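For example, a rough Keras sketch of such a dilated-Conv (TCN-style) stack; the filter counts, kernel size and sequence length are illustrative only:

```python
from tensorflow.keras import layers, models

timesteps, n_features = 64, 20      # illustrative sequence length, 20 MFCC features

model = models.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.Conv1D(32, 3, dilation_rate=1, padding='causal', activation='relu'),
    layers.Conv1D(32, 3, dilation_rate=2, padding='causal', activation='relu'),
    layers.Conv1D(32, 3, dilation_rate=4, padding='causal', activation='relu'),
    layers.Conv1D(20, 1, activation='sigmoid'),   # 20 gain outputs per frame
])
model.summary()
```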

@liziru
Author

liziru commented Nov 21, 2020

> The RNN currently runs with 8-bit input/output data and 16-bit memory (state) data, which might keep more information.
> I am not sure what caused the loss you are seeing. Would you be able to validate the model with more data?
> You may also try a Conv-based network; a TCN (consisting of Conv layers with dilation > 1) works completely fine with NNoM and can outperform RNN-type models.

OK, I think I am close to the answer. I found that the weights.h file differs depending on the x_train used to generate it. I have to say that my x_train was generated randomly; it is the same data I feed to the nnom inference in the C code for comparison. As a result, nnom inference gives different results with different weights.h files generated from different x_train.
# now generate the NNoM model
generate_model(model, x_train[:2048 * 4], name='weights.h')
Comparison of weights.h files generated with different x_train:
[screenshot: diff of the two generated weights.h files]

So the loss between the two inference engines may be caused by the settings in weights.h.
However, 'NNOM_TRUNCATE' was already defined in nnom_port.h as you advised in #104, which should mean I am using float computation now.

@liziru
Author

liziru commented Nov 21, 2020

@majianjia After I set the x_train in 'generate_model' to the training x_train, as you did in the main.py example,
the result of nnom inference changes a lot, but it is still hugely different from that of the 'tf predict' API.
The first row below is generated by nnom inference:
0.5827,0.5197,0.4409,0.3937,0.2992,0.1654,0.1102,0.1181,0.1260,0.1417,0.1575,0.1575,0.1654,0.1732,0.1811,0.1969,0.2126,0.2126,0.2047,0.2047
0.9952,0.9999,0.9998,0.9994,0.9898,0.9204,0.6904,0.6838,0.7321,0.8566,0.8668,0.8191,0.7994,0.8683,0.8680,0.9044,0.9288,0.9375,0.9346,0.9124

@majianjia
Owner

Forget about NNOM_TRUNCATE since you don't use RNN layers. Also, this macro is not about using floating point; nnom currently only runs on 8-bit fixed-point data.

For the calibration step generate_model(model, x_train[:2048 * 4], name='weights.h'), you should use real data; it can be training or testing data, but not random numbers. The data should cover as many cases as possible, and you can enlarge the size of x_train[:2048 * 4] to see if it helps.
The numbers in your screenshot are generated by the calibration step and are determined by the output of each layer; the Q format is chosen so that it contains the maximum/minimum of each layer and its weights. With a different calibration dataset these bits/shifts are expected to change. However, calibrating with different real signals changes them very little, while calibrating with fake signals can change them quite a lot.

I suggest you run the example first. Once it is successful, modify the tf model and see if it still works.
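For example, a minimal sketch of that calibration call, reusing generate_model and x_train from the example's main.py and assuming x_train now holds real training features; the enlarged slice size is only an illustration:

```python
# calibrate with real (training or test) features rather than random numbers;
# a larger slice covers more cases and stabilises the generated Q formats
generate_model(model, x_train[:2048 * 16], name='weights.h')
```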

@liziru
Author

liziru commented Nov 23, 2020

> Forget about NNOM_TRUNCATE since you don't use RNN layers. Also, this macro is not about using floating point; nnom currently only runs on 8-bit fixed-point data.
>
> For the calibration step generate_model(model, x_train[:2048 * 4], name='weights.h'), you should use real data; it can be training or testing data, but not random numbers. The data should cover as many cases as possible, and you can enlarge the size of x_train[:2048 * 4] to see if it helps.
> The numbers in your screenshot are generated by the calibration step and are determined by the output of each layer; the Q format is chosen so that it contains the maximum/minimum of each layer and its weights. With a different calibration dataset these bits/shifts are expected to change. However, calibrating with different real signals changes them very little, while calibrating with fake signals can change them quite a lot.
>
> I suggest you run the example first. Once it is successful, modify the tf model and see if it still works.

I am sorry to tell you that enlarging the size of x_train[:2048 * 4], running the example first, and then modifying the tf model did not work. Your nnom inference engine is a good project. Do you have a plan to support floating-point computation? I think developers in other areas would like this project very much.
