
Question about the placement of self.bias in _embeddings_layers.py #60

Closed
jmpark-swk opened this issue Nov 19, 2021 · 5 comments

@jmpark-swk
Hello, first of all, thank you for releasing such a wonderful piece of code.
I have a question from looking at the code related to the ft_transformer.

Is there any particular reason why the bias is added after dropout is applied?
Is there a problem that arises if dropout is applied after the bias is added?
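
For reference, the two orderings in question look roughly like this; the names below (embed, bias, dropout) are illustrative stand-ins, not the exact attributes in _embeddings_layers.py:

import torch
from torch import nn

embed = nn.Embedding(100, 16)            # stand-in for the categorical embeddings
bias = nn.Parameter(torch.zeros(16))     # shared bias added to each embedding
dropout = nn.Dropout(0.1)
tokens = torch.randint(0, 100, (2, 10))  # (batch, sequence length)

# current order: dropout first, bias added afterwards
x_current = dropout(embed(tokens)) + bias
# alternative order: add the bias first, then apply dropout
x_alternative = dropout(embed(tokens) + bias)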

@jrzaurin (Owner) commented Nov 19, 2021

Hey @jmpark-swk

Thanks for opening the issue.

No, there is no particular reason, and I think you are right: we should add the bias and THEN apply dropout. We are about to merge a PR from @5uperpalo (check #56); we will fix it there.

@5uperpalo can you take care of this? If not, I will edit it myself, no probs.

Thanks again.

In the meantime, please join us on Slack! 🙂: https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-soss7stf-iXpVuLeKZz8lGTnxxtHtTw

@jmpark-swk (Author)

@jrzaurin
Thanks for your kind reply😀.

@jmpark-swk (Author) commented Nov 22, 2021

@jrzaurin
Looking at the code further, I have one more question.

I understood FullEmbeddingDropout and nn.Dropout to have the same mathematical effect, so I wonder what the purpose of the separate implementation is.
Also, as currently written, if FullEmbeddingDropout is used, dropout appears to be applied in both train and test situations, and I wonder if this is the intended behaviour.

@jrzaurin (Owner) commented Nov 22, 2021

Hey @jmpark-swk

Let's address the question in two parts:

  1. "I understand FullEmbeddingDropout and nn.Dropout have the same mathematical effect": no, FullEmbeddingDropout drops out a full "word" in the sequence, while nn.Dropout sets random elements of the incoming Tensor to 0. For example, say you have a batch size of 1, a sequence length of 10 and an embedding size of 5, leading to an input tensor of size (1, 10, 5). Let's ignore the batch dim and work with a tensor of dim (10, 5). FullEmbeddingDropout will set an entire row of that tensor to 0 (i.e. the representation of one column), while nn.Dropout would set random elements to 0. Let's have a look:
import torch
from torch import nn

X = torch.rand(1, 10, 5)
p = 0.2
# Full Embed Dropout: zeroes out entire rows (whole embeddings) and rescales the rest by 1 / (1 - p)
fedp = (X.new().resize_((X.size(1), 1)).bernoulli_(1 - p).expand_as(X) / (1 - p)) * X
# Regular Dropout: zeroes out individual elements at random
dp = nn.Dropout(p)(X)

fedp
tensor([[[0.1910, 1.0102, 0.9767, 0.2338, 0.2881],
         [0.7028, 0.7615, 1.1196, 0.1449, 0.0463],
         [0.7549, 0.9568, 0.1248, 0.9500, 0.9484],
         [0.3327, 0.5511, 0.4135, 1.0409, 0.1293],
         [1.2022, 1.0507, 0.7874, 0.2441, 1.0943],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.4181, 1.1530, 0.2361, 0.1072, 0.6228],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.5364, 0.1017, 0.2276, 1.1333, 0.5364]]])

dp
tensor([[[0.1910, 1.0102, 0.9767, 0.0000, 0.2881],
         [0.7028, 0.7615, 1.1196, 0.1449, 0.0463],
         [0.7549, 0.0000, 0.1248, 0.9500, 0.9484],
         [0.3327, 0.0000, 0.4135, 1.0409, 0.1293],
         [1.2022, 1.0507, 0.7874, 0.2441, 0.0000],
         [0.0000, 0.0291, 0.7125, 0.3697, 0.7847],
         [1.1591, 0.1397, 1.1471, 0.8107, 0.0000],
         [0.0000, 1.1530, 0.2361, 0.1072, 0.6228],
         [0.4835, 0.2229, 0.9602, 0.0000, 0.4619],
         [0.5364, 0.1017, 0.2276, 1.1333, 0.0000]]])
  2. "In the current situation, if FullEmbeddingDropout is used, dropout is expected to work in both train and test situations, and I wonder if this is the intended action.":
    No, this is a bug! Can you please open a separate issue?

The code for the forward pass should look like:

    def forward(self, X: Tensor) -> Tensor:
        if self.training:
            mask = X.new().resize_((X.size(1), 1)).bernoulli_(
                1 - self.dropout
            ).expand_as(X) / (1 - self.dropout)
            return mask * X
        else:
            return X
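
For completeness, here is a self-contained sketch of how the corrected module could look, with a quick check that it becomes a no-op in eval mode. The class wrapper (a module that just stores the dropout probability) is an assumption for illustration and may not match the exact class in pytorch-widedeep; the forward pass is the one above.

import torch
from torch import nn, Tensor

class FullEmbeddingDropout(nn.Module):
    # drops entire rows (whole embeddings) at train time only
    def __init__(self, dropout: float):
        super().__init__()
        self.dropout = dropout

    def forward(self, X: Tensor) -> Tensor:
        if self.training:
            mask = X.new().resize_((X.size(1), 1)).bernoulli_(
                1 - self.dropout
            ).expand_as(X) / (1 - self.dropout)
            return mask * X
        else:
            return X

fed = FullEmbeddingDropout(0.2)
X = torch.rand(1, 10, 5)
fed.train()
print(fed(X))                   # some full rows zeroed, the rest rescaled by 1 / (1 - p)
fed.eval()
print(torch.equal(fed(X), X))   # True: identity at inference time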

Thanks!

@jrzaurin jrzaurin added the bug label Nov 22, 2021
@jrzaurin jrzaurin self-assigned this Nov 22, 2021
@jmpark-swk (Author)

@jrzaurin
Oh, I hadn't taken a close look at the shape inside the resize_ call.
Thanks for your great example!
I'll open a separate issue for the second bug. 😊
