Feature request: Weighted average for EmbeddingBag #4068

Closed
kunaldahiya opened this issue Dec 7, 2017 · 9 comments
Assignees
Labels
high priority module: nn Related to torch.nn triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@kunaldahiya

Right now `torch.nn.EmbeddingBag` supports only `sum` and `mean`. What do you think about providing an option for weights to compute a weighted average? This would be more memory efficient than the current alternative.

For instance, something like `sp_weights` in `tf.nn.embedding_lookup_sparse` [1].

References:
[1] https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup_sparse
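The "current alternative" the request alludes to can be sketched in plain Python (a hypothetical `weighted_average_lookup` helper over a plain list-of-lists table, not the real PyTorch kernel). The point is that the per-index rows must be materialized as an intermediate of shape (num_indices, dim) before reducing, which a fused weighted `EmbeddingBag` would avoid:

```python
# Workaround sketch: emulate a weighted average by materializing every
# looked-up row first, then scaling and reducing. A fused EmbeddingBag
# would avoid allocating the (num_indices x dim) intermediate.
def weighted_average_lookup(table, indices, weights):
    gathered = [table[i] for i in indices]                  # intermediate rows
    scaled = [[w * x for x in row] for w, row in zip(weights, gathered)]
    total = sum(weights)
    return [sum(col) / total for col in zip(*scaled)]       # weighted mean

table = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
out = weighted_average_lookup(table, [0, 2], [1.0, 3.0])
# out == [(1*1 + 3*3)/4, (1*0 + 3*3)/4] == [2.5, 2.25]
```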

@glample
Contributor

glample commented Mar 13, 2019

Hi, any updates on this? Being able to provide weights (on top of the indices) would be really useful.

@zou3519 zou3519 self-assigned this Mar 14, 2019
@snakers4

snakers4 commented Apr 2, 2019

This would be very helpful for my work with the Russian language.
I understand that I can use some kind of attention + a small char-level CNN, but I suspect my own PyTorch implementation would be about 10x slower.
Many thanks!

@snakers4

snakers4 commented Apr 2, 2019

As some kind of motivation, I will just link my post, where EmbeddingBags were superior to BPE in many applications for Russian =)

@zou3519
Contributor

zou3519 commented Apr 2, 2019

API Bikeshedding: which of these two APIs would be better?

  1. nn.EmbeddingBag's forward pass accepts a per_input_weights argument. When mode='sum', this does a weighted sum; when mode='mean', a weighted mean; when mode='max', a weighted max, like the TF API.

  2. nn.EmbeddingBag's forward pass accepts a per_input_weights argument and a new mode='weighted_sum'. mode='weighted_sum' scales the output of the embedding according to the weights. No weighted mean / weighted max are implemented.

I'm leaning towards (2) because I haven't been able to find use cases for "weighted mean" (which can be emulated via a weighted sum) or "weighted max".
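The claim that a weighted mean can be emulated via a weighted sum is easy to check: normalize the weights so they sum to 1 and the two reductions coincide. A minimal sketch (hypothetical helper names, plain Python scalars rather than embedding vectors):

```python
# A "weighted mean" is just a weighted sum with the weights normalized
# to sum to 1, which is why a separate mode adds little expressive power.
def weighted_sum(values, weights):
    return sum(v * w for v, w in zip(values, weights))

def weighted_mean(values, weights):
    total = sum(weights)
    return weighted_sum(values, [w / total for w in weights])

vals, w = [1.0, 2.0, 4.0], [1.0, 1.0, 2.0]
# Emulating the mean through the sum gives the identical result.
assert weighted_mean(vals, w) == weighted_sum(vals, [x / sum(w) for x in w])
```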

@ezyang
Contributor

ezyang commented Apr 2, 2019

FWIW, you don't actually have to implement weighted mean and weighted max if you implement (1); you can just make them raise errors. (This is not necessarily in favor of (1), but it's a comment on the reasoning.)

@snakers4

snakers4 commented Apr 2, 2019

> nn.EmbeddingBag's forward pass accepts a per_input_weights argument

Most likely these weights will be calculated using some sort of attention mechanism. The simplest attention mechanism would be something like a linear layer + softmax.

I wonder whether something like this could be implemented inside of this layer.

zou3519 added a commit to zou3519/pytorch that referenced this issue Apr 2, 2019

EmbeddingBag CPU forward with per_sample_weights.

On the way to pytorch#4068.

Adds a new per_sample_weights argument to nn.EmbeddingBag's forward pass
and embedding_bag. This is only supported for mode='sum' and is
interpreted as scaling the output of the embedding before applying the
reduction.

i.e.,
```
indices: 0, 3, 7 ; 1, 2
per_sample_weights: 0.1, 0.2, 0.4 ; 0.7, -0.8
offsets: 0, 3
weights (embeddings): e_0, e_1, e_2, ..., e_7
```

returns 2 vectors:
```
0.1 * e_0 + 0.2 * e_3 + 0.4 * e_7
0.7 * e_1 - 0.8 * e_2
```

Future:
- CPU backward
- CUDA forward
- CUDA backward
- CPU differentiable per_sample_weights
- CUDA differentiable per_sample_weights

Test Plan:
- New tests
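The reduction described in the commit message can be sketched in plain Python (the real implementation is a fused C++/CUDA kernel; `embedding_bag_sum` here is a hypothetical illustrative helper). With one-hot rows standing in for the embeddings e_0..e_7, the two example bags reproduce exactly the vectors above:

```python
# Sketch of the per_sample_weights reduction for mode='sum':
# one weighted-sum output vector per bag, bags delimited by offsets.
def embedding_bag_sum(weights, indices, offsets, per_sample_weights):
    dim = len(weights[0])
    bounds = list(offsets) + [len(indices)]   # bag i spans bounds[i]:bounds[i+1]
    out = []
    for start, end in zip(bounds, bounds[1:]):
        acc = [0.0] * dim
        for pos in range(start, end):
            w = per_sample_weights[pos]       # scale the looked-up row ...
            row = weights[indices[pos]]
            acc = [a + w * x for a, x in zip(acc, row)]   # ... then reduce
        out.append(acc)
    return out

# The commit-message example, with e_i = one-hot rows for readability.
E = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
result = embedding_bag_sum(E, [0, 3, 7, 1, 2], [0, 3], [0.1, 0.2, 0.4, 0.7, -0.8])
# result[0] == 0.1*e_0 + 0.2*e_3 + 0.4*e_7
# result[1] == 0.7*e_1 - 0.8*e_2
```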
@ezyang ezyang added triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: nn Related to torch.nn high priority labels Apr 2, 2019
@zou3519
Contributor

zou3519 commented Apr 10, 2019

Added the feature in #18957.

@snakers4

snakers4 commented Apr 11, 2019

Many thanks!
We will try adding this to our next project!

@zou3519 zou3519 closed this as completed Apr 11, 2019
@drevicko

Is there currently a plan to implement per_sample_weights on CUDA for max aggregation?


6 participants