I am not sure if this is an issue, but it is definitely a possible cause of confusion: we currently implement `pooling.avg_pool` as `avg_pool(x) = lax.reduce_window(lax.add, x) / prod(window_size)`. If we use padding, we always divide by the full window size even if the window contains padding tokens.
TF does not include the padding tokens in the average and does not provide an override option.
Personally I feel that including padding tokens with value 0 is wrong (it seems like an arbitrary constant). At the very least we should be explicit about our choice and document it.
A possible solution for implementing average pooling that only counts non-padding tokens is to do an additional `sum_pool2` over an all-ones input of the same shape, padded with 0s. Then you return `sum_pool / sum_pool2`, which correctly ignores the padding tokens.
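The fix described above can be sketched with raw `lax.reduce_window` (a hypothetical helper, not the current Flax API; the name `avg_pool_ignore_pad` and the example input are mine):

```python
import jax.numpy as jnp
from jax import lax

def avg_pool_ignore_pad(x, window, strides, padding='SAME'):
    # Hypothetical helper, not the current Flax API: average pooling
    # that divides each window by its number of non-padding elements.
    sums = lax.reduce_window(x, 0., lax.add, window, strides, padding)
    # The same reduction over an all-ones array, padded with 0s,
    # counts the non-padding elements in each window.
    counts = lax.reduce_window(jnp.ones_like(x), 0., lax.add,
                               window, strides, padding)
    return sums / counts

x = jnp.array([1., 2.])
# Window of size 2 with SAME padding: the edge window now averages
# over its single real element, giving [1.5, 2.0] instead of [1.5, 1.0].
print(avg_pool_ignore_pad(x, (2,), (1,)))
```

The count reduction is cheap (it can be precomputed once per shape), and it reduces to the current behavior whenever no padding is applied.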
Discussed this offline with @jheek and @cgarciae. We agreed that the current behavior is not desirable: we assume that padding tokens for `avg_pool` are 0s and include them when computing the average, but we do not document this anywhere. TensorFlow has chosen to implement this differently, namely by excluding the padding tokens, and similarly, they do not document this in their APIs. PyTorch seems to have the best of both worlds: it lets the user specify the behavior via a flag. This seems like something we could do as well.
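For reference, the flag in question is `count_include_pad` on PyTorch's pooling layers; a minimal illustration (the input values and window configuration are mine):

```python
import torch

x = torch.tensor([[[1., 2.]]])  # (batch, channels, length)
# count_include_pad=True divides by the full kernel size (like the
# current Flax behavior); False divides by the number of non-padded
# elements (like TF). Padding of 1 adds a zero on each side.
include = torch.nn.AvgPool1d(kernel_size=2, stride=1, padding=1,
                             count_include_pad=True)(x)
exclude = torch.nn.AvgPool1d(kernel_size=2, stride=1, padding=1,
                             count_include_pad=False)(x)
print(include)  # averages: [0.5, 1.5, 1.0]
print(exclude)  # averages: [1.0, 1.5, 2.0]
```

A similar keyword argument on `pooling.avg_pool` would preserve the current default while letting users opt in to the TF-style behavior.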
Example (from the original report): is this what we want? The first result `(1+2)/2 = 1.5` makes sense, but the second result `2/2 = 1` is a bit odd. Shouldn't we do `2/1 = 2`? Other frameworks do it as follows:
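For concreteness, the arithmetic above can be reproduced with raw `lax.reduce_window` (the input `[1., 2.]`, window size 2, stride 1, and `SAME` padding are my assumptions; the original snippet is not preserved in this thread):

```python
import jax.numpy as jnp
from jax import lax

x = jnp.array([1., 2.])  # assumed 1-D input
# Sum over windows of size 2, stride 1, with SAME padding
# (one zero is padded on the right).
sums = lax.reduce_window(x, 0., lax.add, (2,), (1,), 'SAME')
# Current behavior: always divide by the full window size.
avg = sums / 2.
# avg == [1.5, 1.0]: the last window contains a single real value
# plus one padding zero, yet is still divided by 2.
```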