Just curious about this line:
fast-transformers/fast_transformers/attention/linear_attention.py
Line 62 in 02552cc
Does 'n' mean 'batch size', 'h' 'heads', and 'd' 'channels'? What are the 's' and 'm' dimensions?
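If I'm reading the repo right, the line in question is the key/value contraction `KV = torch.einsum("nshd,nshm->nhmd", K, values)`, so: n = batch size, s = key/value sequence length, h = heads, d = per-head query/key feature dim, and m = per-head value feature dim. A minimal shape check in JAX (toy sizes and names are my own):

```python
import jax.numpy as jnp
from jax import random

# Toy sizes; the variable names just label the einsum letters.
n, s, h, d, m = 2, 5, 4, 8, 8
K = random.normal(random.PRNGKey(0), (n, s, h, d))  # keys
V = random.normal(random.PRNGKey(1), (n, s, h, m))  # values

# Sums out the sequence dim s, leaving one (m, d) matrix per batch and head.
KV = jnp.einsum("nshd,nshm->nhmd", K, V)
print(KV.shape)  # (2, 4, 8, 8) == (n, h, m, d)
```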
Just switching this up for JAX, with channels first (for performance) and without a batch dimension (jax.vmap batches automatically; see the sketch after the einsum strings below).
x: The input features of shape (N, L, E) where N is the batch size, L is the sequence length (padded) and E is d_model passed in the constructor.
https://github.com/idiap/fast-transformers/blob/master/fast_transformers/transformers.py
"BLHC,BLHc->BHCc"
then without batch, it's
"LHC,LHc->HCc"
and with channels first it's:
"CHL,cHL->hCc"
```python
# D/d: d_head; H: n_heads; L: sequence length
key_value = np.einsum("DHL,dHL->HDd", key, value)
norm = 1.0 / (np.einsum("DHL,DH->HL", query, key.sum(2)) + 1e-6)
# Output subscript is dHL (the value feature dim), matching the
# PyTorch version's "nlhd,nhmd,nlh->nlhm"; "->DHL" would contract
# the wrong axis (it only runs because D and d are both d_head).
attended = np.einsum("DHL,HDd,HL->dHL", query, key_value, norm)
attended = attended.reshape((d_head * self.n_heads, length))
```
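For completeness, a self-contained sketch of the snippet above (the function name and toy sizes are mine; it assumes the non-negative feature map, e.g. elu(x)+1, has already been applied to query and key):

```python
import jax.numpy as jnp
from jax import random

def linear_attention_cf(query, key, value, eps=1e-6):
    # query/key/value: (d_head, n_heads, length), channels first,
    # with the non-negative feature map already applied to query/key.
    key_value = jnp.einsum("DHL,dHL->HDd", key, value)
    norm = 1.0 / (jnp.einsum("DHL,DH->HL", query, key.sum(2)) + eps)
    return jnp.einsum("DHL,HDd,HL->dHL", query, key_value, norm)

d_head, n_heads, length = 8, 4, 16
shape = (d_head, n_heads, length)
q = random.uniform(random.PRNGKey(0), shape)  # uniform, so non-negative
k = random.uniform(random.PRNGKey(1), shape)
v = random.normal(random.PRNGKey(2), shape)

out = linear_attention_cf(q, k, v)
print(out.shape)  # (8, 4, 16); then reshape to (d_head * n_heads, length)
```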
Works like a charm, and a lot faster too! Thanks for your work.