
Question about the placement of self.bias in _embeddings_layers.py #60

Closed
jmpark-swk opened this issue Nov 19, 2021 · 5 comments

@jmpark-swk
Hello, first of all, thank you for releasing such a wonderful piece of code.
I have a question from looking at the code related to the ft_transformer.

Is there any particular reason why the bias is added after dropout is applied?
Is there a problem that arises if dropout is applied after the bias is added?
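
For reference, the two orderings in question look roughly like this; the names below (embed, bias, dropout) are illustrative stand-ins, not the exact attributes in _embeddings_layers.py:

import torch
from torch import nn

embed = nn.Embedding(100, 16)            # stand-in for the categorical embeddings
bias = nn.Parameter(torch.zeros(16))     # shared bias added to each embedding
dropout = nn.Dropout(0.1)
tokens = torch.randint(0, 100, (2, 10))  # (batch, sequence length)

# current order: dropout first, bias added afterwards
x_current = dropout(embed(tokens)) + bias
# alternative order: add the bias first, then apply dropout
x_alternative = dropout(embed(tokens) + bias)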

@jrzaurin (Owner) commented Nov 19, 2021

Hey @jmpark-swk

Thanks for opening the issue.

No, there is no particular reason, and I think you are right: we should add the bias and THEN apply dropout. We are about to merge a PR from @5uperpalo (check #56); we will fix it there.

@5uperpalo can you take care of this? If not, I will edit it myself, no probs.

Thanks again.

In the meantime, please join us on Slack! 🙂: https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-soss7stf-iXpVuLeKZz8lGTnxxtHtTw

@jmpark-swk (Author)

@jrzaurin
Thanks for your kind reply😀.

@jmpark-swk (Author) commented Nov 22, 2021

@jrzaurin
Looking at the code further, I have one more question.

I understood FullEmbeddingDropout and nn.Dropout to have the same mathematical effect, so I wonder what the purpose of the separate implementation is.
Also, as currently written, if FullEmbeddingDropout is used, dropout appears to be applied in both train and test situations, and I wonder if this is the intended behaviour.

@jrzaurin (Owner) commented Nov 22, 2021

Hey @jmpark-swk

Let's address the question in two parts:

  1. "I understand FullEmbeddingDropout and nn.Dropout have the same mathematical effect": no, FullEmbeddingDropout drops out a full "word" in the sequence, while nn.Dropout sets random elements of the incoming Tensor to 0. For example, say you have a batch size of 1, a sequence length of 10 and an embedding size of 5, leading to an input tensor of size (1, 10, 5). Let's ignore the batch dim and work with a tensor of dim (10, 5). FullEmbeddingDropout will set an entire row of that tensor to 0 (i.e. the representation of one column), while nn.Dropout would set random elements to 0. Let's have a look:
import torch
from torch import nn

X = torch.rand(1, 10, 5)
p = 0.2
# Full Embed Dropout: zeroes out entire rows (whole embeddings) and rescales the rest by 1 / (1 - p)
fedp = (X.new().resize_((X.size(1), 1)).bernoulli_(1 - p).expand_as(X) / (1 - p)) * X
# Regular Dropout: zeroes out individual elements at random
dp = nn.Dropout(p)(X)

fedp
tensor([[[0.1910, 1.0102, 0.9767, 0.2338, 0.2881],
         [0.7028, 0.7615, 1.1196, 0.1449, 0.0463],
         [0.7549, 0.9568, 0.1248, 0.9500, 0.9484],
         [0.3327, 0.5511, 0.4135, 1.0409, 0.1293],
         [1.2022, 1.0507, 0.7874, 0.2441, 1.0943],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.4181, 1.1530, 0.2361, 0.1072, 0.6228],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
         [0.5364, 0.1017, 0.2276, 1.1333, 0.5364]]])

dp
tensor([[[0.1910, 1.0102, 0.9767, 0.0000, 0.2881],
         [0.7028, 0.7615, 1.1196, 0.1449, 0.0463],
         [0.7549, 0.0000, 0.1248, 0.9500, 0.9484],
         [0.3327, 0.0000, 0.4135, 1.0409, 0.1293],
         [1.2022, 1.0507, 0.7874, 0.2441, 0.0000],
         [0.0000, 0.0291, 0.7125, 0.3697, 0.7847],
         [1.1591, 0.1397, 1.1471, 0.8107, 0.0000],
         [0.0000, 1.1530, 0.2361, 0.1072, 0.6228],
         [0.4835, 0.2229, 0.9602, 0.0000, 0.4619],
         [0.5364, 0.1017, 0.2276, 1.1333, 0.0000]]])
  2. "In the current situation, if FullEmbeddingDropout is used, dropout is expected to work in both train and test situations, and I wonder if this is the intended action.":
    No, this is a bug! Can you please open a separate issue?

The code for the forward pass should look like:

    def forward(self, X: Tensor) -> Tensor:
        if self.training:
            mask = X.new().resize_((X.size(1), 1)).bernoulli_(
                1 - self.dropout
            ).expand_as(X) / (1 - self.dropout)
            return mask * X
        else:
            return X
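
For completeness, here is a self-contained sketch of how the corrected module could look, with a quick check that it becomes a no-op in eval mode. The class wrapper (a module that just stores the dropout probability) is an assumption for illustration and may not match the exact class in pytorch-widedeep; the forward pass is the one above.

import torch
from torch import nn, Tensor

class FullEmbeddingDropout(nn.Module):
    # drops entire rows (whole embeddings) at train time only
    def __init__(self, dropout: float):
        super().__init__()
        self.dropout = dropout

    def forward(self, X: Tensor) -> Tensor:
        if self.training:
            mask = X.new().resize_((X.size(1), 1)).bernoulli_(
                1 - self.dropout
            ).expand_as(X) / (1 - self.dropout)
            return mask * X
        else:
            return X

fed = FullEmbeddingDropout(0.2)
X = torch.rand(1, 10, 5)
fed.train()
print(fed(X))                   # some full rows zeroed, the rest rescaled by 1 / (1 - p)
fed.eval()
print(torch.equal(fed(X), X))   # True: identity at inference time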

Thanks!

@jrzaurin jrzaurin added the bug label Nov 22, 2021
@jrzaurin jrzaurin self-assigned this Nov 22, 2021
@jmpark-swk (Author)

@jrzaurin
Oh, I hadn't taken a close look at the shape inside the resize_ call.
Thanks for your great example!
I'll open a separate issue for the second bug. 😊
