# Introduction
In one of my projects I encountered an unexpected effect of using `mask_zero=True` in Keras Embedding layers.
I was training an RNN to label sequences of variable length, using batches that were post-padded with zeros so that all sequences had equal length.
The accuracy reported by my model's `evaluate` function was much higher than the accuracy I measured when actually making predictions.
In this notebook I create a toy example to explore why this was happening.

# Input Data
There are two words, 1 and 2, and both should receive label 1. Zero is used for padding and receives label 0.

In [68]:
import numpy as np
from keras.layers import Embedding, Dense, Input
from keras.models import Model

X = [
    [1, 1, 1, 0],
    [1, 1, 2, 2],
    [1, 0, 0, 0],
    [2, 2, 0, 0]
]
X = np.asarray(X)
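# One-hot targets: class 1 for every real word, class 0 for padding.
y = np.asarray([[[0, 1, 0] if x else [1, 0, 0] for x in instance] for instance in X])

# Per-timestep weights: 1 for real words, 0 for padding.
sample_weights = (X != 0).astype(int)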
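
As a quick sanity check, the following sketch rebuilds the same arrays with NumPy only and counts real versus padded positions. The names `n_real` and `n_pad` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

X = np.asarray([
    [1, 1, 1, 0],
    [1, 1, 2, 2],
    [1, 0, 0, 0],
    [2, 2, 0, 0],
])
# One-hot targets: class 1 for every real word, class 0 for padding.
y = np.asarray([[[0, 1, 0] if x else [1, 0, 0] for x in row] for row in X])
sample_weights = (X != 0).astype(int)

# (batch, timesteps), (batch, timesteps, classes), (batch, timesteps)
print(X.shape, y.shape, sample_weights.shape)

n_real = int(sample_weights.sum())        # non-padding positions
n_pad = int((sample_weights == 0).sum())  # padding positions
print(n_real, n_pad)
```

The per-timestep shape of `sample_weights` is what `sample_weight_mode='temporal'` expects later on.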

# Setting up the Model
I will set up a toy embedding layer that simply one-hot encodes the words and uses zero for padding.
The model outputs the embedding values unchanged, so it will be wrong on every 2 and correct on every 1.
Padding should be ignored.

Of the 10 non-padding tokens, 6 are ones and 4 are twos, so the model should achieve 60% accuracy.
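
The 60% figure can be checked without Keras. This is a minimal sketch that assumes the embedding behaves exactly like an identity lookup (`np.eye(3)[X]`); the variable names are illustrative only.

```python
import numpy as np

X = np.asarray([
    [1, 1, 1, 0],
    [1, 1, 2, 2],
    [1, 0, 0, 0],
    [2, 2, 0, 0],
])

# Identity "embedding": word i is mapped to the one-hot vector e_i.
y_pred = np.eye(3)[X]
pred_class = y_pred.argmax(axis=-1)

# True class is 1 for any real word and 0 for padding.
true_class = (X != 0).astype(int)

is_real = X != 0
correct = (pred_class == true_class) & is_real
expected_accuracy = correct.sum() / is_real.sum()
print(expected_accuracy)
```

Only the six ones are counted as correct; the four twos are predicted as class 2 and the padding positions are excluded.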

In [123]:
import keras 
import tensorflow

In [124]:
keras.__version__

'2.1.5'

In [125]:
tensorflow.__version__

'1.2.1'

In [87]:
n_words = 2
dims = n_words+1
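# Identity weights: word i is embedded as the one-hot vector e_i.
embedding_matrix = np.eye(dims)

# Frozen embedding; mask_zero=True marks index 0 as padding.
embedding = Embedding(n_words+1, dims, weights=[embedding_matrix], trainable=False, mask_zero=True)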

model_input = Input(shape=(None,))
model_output = embedding(model_input)
model = Model(model_input, model_output)

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    weighted_metrics=['acc'],
    metrics=['acc'],
    sample_weight_mode='temporal',
)

## Evaluate Model

In [118]:
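# Correct only if the whole one-hot vector matches and the token is not padding.
correct = ((model.predict(X) == y).all(2) & (sample_weights == 1)).sum()
# Number of non-padding tokens.
instances = (X != 0).sum()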

In [120]:
correct / instances

0.59999999999999998

In [127]:
print("Loss: {}, Accuracy {}, Weighted Accuracy: {}".format(*model.evaluate(X, y, sample_weight=sample_weights)))

Loss: 10.315580368041992, Accuracy 0.6000000238418579, Weighted Accuracy: 0.9599999785423279


The metric `acc` matches the expected accuracy of 60%, but the weighted accuracy does not.
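
One observation that may help narrow this down: 0.96 equals 0.6 divided by 10/16, the fraction of non-padding positions in the batch. The NumPy sketch below reproduces both numbers under the assumption that the mask and the sample weights are each applied and then normalized by that fraction. This is a hypothesis about the cause, not something verified against the Keras source here, and all names are illustrative.

```python
import numpy as np

X = np.asarray([
    [1, 1, 1, 0],
    [1, 1, 2, 2],
    [1, 0, 0, 0],
    [2, 2, 0, 0],
])

pred_class = X                          # identity embedding followed by argmax
true_class = (X != 0).astype(int)
hits = (pred_class == true_class).astype(float)  # padding positions also "match"

mask = (X != 0).astype(float)           # from mask_zero=True
weights = (X != 0).astype(float)        # the sample weights used above

# Assumed step 1: apply the mask and rescale by the fraction of unmasked positions.
score = hits * mask / mask.mean()
masked_only = score.mean()

# Assumed step 2: apply the weights and rescale by the fraction of nonzero weights.
score = score * weights / (weights != 0).mean()
masked_and_weighted = score.mean()

print(masked_only, masked_and_weighted)
```

If this assumption holds, the padding is effectively normalized away twice when both a mask and zero-valued sample weights are supplied, which would inflate the weighted metric by a factor of 16/10.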