
Different prediction results depending on inference batch size #16

Closed

techscientist opened this issue Aug 10, 2017 · 5 comments

@techscientist

techscientist commented Aug 10, 2017

Hi @keunwoochoi !

I'm enjoying your Kapre library, as I'm currently using it in my music-based deep learning project...

In my deep network, I'm using a similar architecture as the ones you have in your music auto tagging repo, but I have replaced manual input audio preprocessing and the batch normalization layers in the network with their Kapre equivalent layers (like Melspectrogram and Normalization2D).

However, I am getting different prediction results when I change the batch size (the number of audio samples I predict at once).

I believe this is because your Normalization2D layer (which I think is meant to mimic Keras's BatchNormalization) recalculates the mean/std even in testing mode (i.e., when K.learning_phase() == 0). That would explain the differences I see when changing the batch size during batch prediction...

Is it possible to fix this?

Update: You can see Keras's implementation of BatchNormalization here: https://github.com/fchollet/keras/blob/master/keras/layers/normalization.py

From what I observe, Keras handles this by checking whether the model is in testing mode, and if so, it performs the normalization using the moving mean and variance accumulated during training (they are stored as weights and updated on every training batch).

How can we add this functionality to Normalization2D, so that it also produces correct, consistent prediction results when predicting single samples and/or batches?

Or, could we rewrite Normalization2D so that it simply extends Keras's BatchNormalization and only passes the axis to it (determined by str_axis)? I think this would resolve the issue and also reduce the amount of code needed for this Kapre layer... A rough sketch of what I mean is below.
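
Something like this is what I have in mind (the str_axis names and the data-format handling here are my assumptions, not Kapre's exact API):

```python
# Rough sketch only: let BatchNormalization do all the work and just translate
# the string axis into the integer axis it expects.
from keras import backend as K
from keras.layers import BatchNormalization


class Normalization2DBN(BatchNormalization):
    """Hypothetical Normalization2D replacement that wraps BatchNormalization."""

    def __init__(self, str_axis='freq', **kwargs):
        if K.image_data_format() == 'channels_last':
            # assumed input shape: (batch, freq, time, channel)
            axis_map = {'batch': 0, 'freq': 1, 'time': 2, 'channel': 3}
        else:
            # assumed input shape: (batch, channel, freq, time)
            axis_map = {'batch': 0, 'channel': 1, 'freq': 2, 'time': 3}
        super(Normalization2DBN, self).__init__(axis=axis_map[str_axis], **kwargs)
```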

Thanks!

@keunwoochoi
Owner

You're correct, and perhaps I should have made it clearer. For axes other than the batch axis it is fine -- no differences across batch sizes. It seems possible to fix Normalization2D with the batch axis to work like the BN layer; all we have to do is mimic the already existing code in Keras. A PR would be appreciated -- personally, I'm not sure I'll have spare time in the near future.
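
A minimal sketch of what that could look like (attribute names such as moving_mean, moving_std, and eps are placeholders, not the actual Kapre or Keras code):

```python
from keras import backend as K

# Inside a Normalization2D-like layer: use batch statistics during training and
# the stored moving averages at inference, so predictions don't depend on batch size.
def call(self, x, training=None):
    def normed_training():
        # statistics of the current batch (axis handling simplified here)
        mean = K.mean(x, axis=self.axis, keepdims=True)
        std = K.std(x, axis=self.axis, keepdims=True)
        # persist running statistics, like keras.layers.BatchNormalization does
        self.add_update([K.moving_average_update(self.moving_mean, mean, self.momentum),
                         K.moving_average_update(self.moving_std, std, self.momentum)])
        return (x - mean) / (std + self.eps)

    def normed_inference():
        # reuse the statistics accumulated during training
        return (x - self.moving_mean) / (self.moving_std + self.eps)

    return K.in_train_phase(normed_training, normed_inference, training=training)
```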

@techscientist
Author

Ok @keunwoochoi, sounds great. Once I get around to finishing my current experiments, I'll clone your repo and fix the layer. Once it's working, I'll send you a PR for review.

@techscientist
Author

techscientist commented Aug 12, 2017

Also @keunwoochoi, just to make sure before I send you a PR: the Normalization2D layer is simply meant to provide the same functionality as Keras's BatchNormalization, right (except that it adds str_axis to specify the normalization axis)?

I am asking because, if that is the case, I can simply rewrite the Normalization2D layer as a subclass of Keras's BatchNormalization, adding a bit of code to connect Normalization2D's str_axis to BatchNormalization's axis argument...

@keunwoochoi
Owner

keunwoochoi commented Aug 13, 2017 via email

@techscientist
Author

techscientist commented Aug 17, 2017

Oh ok, I get it.

Right now, I've worked around this problem by using Keras's BatchNormalization layer instead of Kapre's Normalization2D (passing the axis that corresponds to the one Normalization2D would use, per its code)...

I have included this workaround here in case someone else happens to stumble upon the same problem.
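
For reference, it looks roughly like this (the Melspectrogram arguments and the axis choice are just an example, not my exact model):

```python
from keras import backend as K
from keras.models import Sequential
from keras.layers import BatchNormalization
from kapre.time_frequency import Melspectrogram

model = Sequential()
# 1-channel audio, 1 second at 16 kHz -> a 2D mel-spectrogram "image"
model.add(Melspectrogram(sr=16000, n_mels=96, input_shape=(1, 16000)))

# Instead of kapre's Normalization2D, use BatchNormalization on the matching axis;
# at inference it uses the moving statistics learned during training, so the
# predictions no longer change with the prediction batch size.
freq_axis = 2 if K.image_data_format() == 'channels_first' else 1
model.add(BatchNormalization(axis=freq_axis))
```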
