[fix] torch.nn.functional.embedding -> padding_idx behavior #46714
Conversation
💊 CI failures summary (Dr. CI, as of commit dc77bba): 💚 Looks good so far! There are no failures yet. 💚
Codecov Report
```diff
@@            Coverage Diff             @@
##           master   #46714      +/-   ##
==========================================
- Coverage   68.49%   68.48%   -0.01%
==========================================
  Files         413      413
  Lines       54478    54478
==========================================
- Hits        37312    37311       -1
- Misses      17166    17167       +1
```
@albanD Gentle ping. With branching on …
I think it would be nice to have an idea of what the perf drop is, yes. We do have a benchmark op for embedding bag, but I guess it does not cover the same code path: https://github.com/pytorch/pytorch/blob/master/benchmarks/operator_benchmark/pt/embeddingbag_test.py ?
With some updates to the reference script:

```python
import operator_benchmark as op_bench
import torch
import numpy

from pt import configs

"""Embedding operator benchmark (adapted from the EmbeddingBag benchmark)."""


class EmbeddingBenchmark(op_bench.TorchBenchmarkBase):
    # `init` (not `__init__`) is the setup hook used by the operator_benchmark framework.
    def init(self, vocab, dim, input_size, padding, device):
        numpy.random.seed((1 << 32) - 1)
        self.weight = torch.randn(vocab, dim, device=device)
        self.input = torch.tensor(
            numpy.random.randint(0, vocab, input_size), device=device
        ).long()
        if padding is not None:
            # Replace roughly half of the indices with the padding index;
            # the mask is created on the same device as the input.
            padding_mask = torch.rand(self.input.shape, device=device) > 0.5
            self.input[padding_mask] = padding
        self.padding = padding
        self.set_module_name('embedding')

    def forward(self):
        return torch.nn.functional.embedding(self.input, self.weight, padding_idx=self.padding)


op_bench.generate_pt_test(configs.embedding_short_configs, EmbeddingBenchmark)

if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```

Relevant config:

```python
embedding_short_configs = op_bench.cross_product_configs(
    vocab=[10000, 20000],
    dim=[64, 128],
    padding=[None, 2],
    input_size=[32, 48, 64],
    device=['cpu', 'cuda'],
    tags=['short'],
)
```

Attached is the output of running the above code. Let me know if the benchmark code looks good.
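As an aside (not part of this PR), a rough standalone comparison of the padding vs. no-padding path can be sketched with `torch.utils.benchmark`, mirroring one entry of the config above; treat it as a sanity check rather than a replacement for the operator_benchmark run:

```python
# Standalone sanity-check timing; assumes PyTorch >= 1.7 for torch.utils.benchmark.
# Shapes mirror one config entry (vocab=10000, dim=64, input_size=64).
import torch
import torch.utils.benchmark as benchmark

weight = torch.randn(10000, 64)
inp = torch.randint(0, 10000, (64,))

for padding in (None, 2):
    timer = benchmark.Timer(
        stmt="torch.nn.functional.embedding(inp, weight, padding_idx=padding)",
        globals={"torch": torch, "inp": inp, "weight": weight, "padding": padding},
    )
    print(f"padding_idx={padding}:", timer.timeit(1000))
```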
That looks good. Do you have the same timings without this change to be able to compare?
Following is the benchmark output without the change.
Thanks for taking the time to report all the timings.
There is a small regression, but not too bad; definitely acceptable in exchange for the correct behavior.
@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Reference: #46585
This fixes the second snippet in the mentioned issue.
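For context, a minimal sketch of the `padding_idx` contract at stake (the exact snippets live in #46585): when `padding_idx` is set, the gradient row for the padding index should remain zero, matching the documented behavior of `nn.Embedding`:

```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 3, requires_grad=True)
inp = torch.tensor([0, 2, 2, 5])  # index 2 is the padding index

out = F.embedding(inp, weight, padding_idx=2)
out.sum().backward()

# The gradient for the padding row stays zero even though
# index 2 appears in the input.
print(weight.grad[2])  # expected: tensor([0., 0., 0.])
```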