A small question regarding `softmax_kernel` #36

boredtylin · 2020-12-02T05:59:16Z

First things first, great repo.

I'm trying to understand the renormalization in `softmax_kernel`, though:


if is_query:
    data_dash = ratio * (
        torch.exp(data_dash - diag_data -
                  torch.max(data_dash, dim=-1, keepdim=True).values) + eps)
else:
    data_dash = ratio * (
        torch.exp(data_dash - diag_data - torch.max(data_dash)) + eps)

In this segment of code, the `is_query` argument selects between two slightly different computations.

I reckon that subtracting the max is there to alleviate numerical problems. But why should the computation differ between query features and key features? The max is taken along the last dimension for queries, but over the whole tensor for keys.

I'd really appreciate it if you could shed some light on this so I can understand it.


boredtylin · 2020-12-02T06:11:07Z

I think I get it now. It's probably because the normalization is done w.r.t. the query axis...
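
For anyone landing here later, a minimal sketch of that reasoning. It is not the repo's code: it uses plain Python lists instead of PyTorch, and `linear_attention`, `scale_rows`, `close`, and the toy numbers are all made up for illustration. The assumption is that the attention output for each query is a weighted sum over keys divided by the sum of those weights. Since `exp(x - m) = exp(x) * exp(-m)`, subtracting a max amounts to multiplying the features by a positive constant (ignoring the small `eps` term). A different constant per query cancels in that ratio; a different constant per key would reweight the keys, so the keys can only share a single global constant.

```python
import math

def linear_attention(q_feats, k_feats, values):
    """out[i] = sum_j (q_i . k_j) * v_j / sum_j (q_i . k_j), with scalar values."""
    out = []
    for q in q_feats:
        # Unnormalized attention weight of this query on every key.
        weights = [sum(a * b for a, b in zip(q, k)) for k in k_feats]
        # Normalization runs over the key axis, separately for each query.
        out.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return out

def scale_rows(feats, factors):
    """Multiply each feature row by its own positive factor."""
    return [[f * x for x in row] for row, f in zip(feats, factors)]

def close(a, b):
    return all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(a, b))

# Toy positive features (hypothetical numbers, standing in for data_dash after exp).
q_feats = [[0.5, 1.0], [2.0, 0.1]]
k_feats = [[1.0, 0.2], [0.3, 0.7], [0.9, 0.9]]
values = [1.0, 2.0, 3.0]

base = linear_attention(q_feats, k_feats, values)

# A different factor per query (like a row-wise max): cancels in the ratio.
per_query = linear_attention(
    scale_rows(q_feats, [math.exp(-3.0), math.exp(-0.5)]), k_feats, values)

# One shared factor for all keys (like a global max): also cancels.
global_key = linear_attention(
    q_feats, scale_rows(k_feats, [math.exp(-2.0)] * 3), values)

# A different factor per key: changes the relative weights of the keys.
per_key = linear_attention(
    q_feats, scale_rows(k_feats, [math.exp(-3.0), math.exp(-0.5), 1.0]), values)

print(close(base, per_query))   # True
print(close(base, global_key))  # True
print(close(base, per_key))     # False
```

So, under these assumptions, the row-wise max is safe for queries, while keys have to use one max over the whole tensor to leave the result unchanged.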

boredtylin closed this as completed Dec 2, 2020
