TimeDistributed(Dense) with Masking not masking bias #12495

andersjohanandreassen commented Mar 16, 2019

I have come across a problem with Masking the inputs to a TimeDistributed(Dense) layer.
If the input is masked and the bias in the dense layer is zero, the output of TimeDistributed(Dense) is zero.
However, if the input is masked and the bias is non-zero, TimeDistributed(Dense) returns the bias value.

This seems somewhat similar to #1300.

Inspired by this example, here is some code that illustrates the problem (a more thorough example here).

In this example, I construct a TimeDistributed(Dense) layer that sums the values at each timestep and adds a constant value.

import numpy as np
from keras.layers import Input, Masking, Dense, TimeDistributed
from keras.models import Model

# Construct time series data set
pad_value = 0.1

t1 = [3, 1, 2, 0.1]
t2 = [1, 1, 1, 0.1]
t_pad = [pad_value, pad_value, pad_value, pad_value]  # fully padded timestep
time_series = np.asarray([[t1, t2, t_pad]])
add_constant = 10

# Build model
my_input = Input(shape=(None, 4))
mask = Masking(mask_value=pad_value)(my_input)
out = TimeDistributed(Dense(1, activation='linear'))(mask)
model = Model(inputs=my_input, outputs=out)
# Set all weights multiplying the input to 1, and set the bias to add_constant
model.set_weights([np.ones_like(model.get_weights()[0]),
                   add_constant * np.ones_like(model.get_weights()[1])])
model.predict(time_series)

This outputs

array([[[16.1],
        [13.1],
        [10. ]]], dtype=float32)

As you can see, the first two entries are the expected sums (3 + 1 + 2 + 0.1 + 10 = 16.1 and 1 + 1 + 1 + 0.1 + 10 = 13.1), but the last entry should be zero if the masking were applied; instead, it returns the bias.
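This is consistent with the masked timestep being zero-filled before the Dense kernel is applied (an assumption on my part, not verified against the Keras source): Dense computes kernel @ x + bias, so a zero vector still yields the bias. A minimal NumPy sketch:

import numpy as np

# Hypothetical illustration: if Masking zero-fills t_pad instead of skipping it,
# the Dense layer computes kernel @ 0 + bias = bias for that timestep.
kernel = np.ones((4, 1))       # the all-ones weights set above
bias = np.array([10.0])        # add_constant
masked_step = np.zeros(4)      # what the masked timestep appears to become
print(masked_step @ kernel + bias)  # [10.] -- matches the reported output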

pkomiske commented Mar 17, 2019

I am having the same problem. The documentation for Masking indicates that the timesteps are skipped, which I assumed meant "has no effect", not just mapped to zero and propagated forward. For my use cases, I do not want the network to have information about how much padding there was, which seems to be what is effectively happening.
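Not from this thread, but here is a minimal sketch of one possible workaround, assuming the goal is just to force masked timesteps to output exactly zero: recompute the mask condition from the raw input and multiply it into the TimeDistributed(Dense) output.

import numpy as np
from keras import backend as K
from keras.layers import Input, Masking, Dense, TimeDistributed, Lambda, multiply
from keras.models import Model

pad_value = 0.1

my_input = Input(shape=(None, 4))
masked = Masking(mask_value=pad_value)(my_input)
dense_out = TimeDistributed(Dense(1, activation='linear'))(masked)

# A timestep is masked when every feature equals pad_value; recompute that
# condition from the raw input as a 0./1. indicator per timestep.
keep = Lambda(lambda x: K.cast(
    K.any(K.not_equal(x, pad_value), axis=-1, keepdims=True), 'float32'))(my_input)

# Zero out the masked timesteps, bias included.
out = multiply([dense_out, keep])
model = Model(inputs=my_input, outputs=out)

Note that this only zeroes the visible output; it does not make downstream layers actually skip the padded steps, which would require layers that consume the mask (e.g. an RNN).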
