Add attention functions and tests #181

Merged (3 commits, Aug 30, 2022)

Conversation

@jenkspt (Contributor) commented Aug 22, 2022

Adds dot_product_attention_weights and dot_product_attention functions and tests

Design considerations:

  • dot_product_attention and dot_product_attention_weights don't take multi-head inputs -- instead, attention heads are vmap'd over in MultiheadAttention. This allows for greater flexibility when creating other kinds of attention modules. (A rough sketch follows after this list.)
  • To simplify the dot_product_attention signature, dropout_fn is added as a single-argument callable, which should close over the dropout arguments such as key and inference. The alternative, I think, would be to add a functional version of dropout and expose its arguments on dot_product_attention; however, that would make changing the dropout rate after initializing the module less intuitive, since the dropout rate would then have to be an attribute of MultiheadAttention.
  • The mask shape check is kept inside dot_product_attention_weights. The downside is that errors raised inside vmap'd functions are less obvious -- i.e. if the heads don't match, then the vmap'd function raises the error. The alternative is to pull the shape check out and put it back in MultiheadAttention.
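
As a rough sketch of the first two points (illustrative only, not the PR's exact code -- the shapes and the dropout hook shown here are assumptions), the per-head functions operate on single-head (seq, size) arrays, and multi-head attention is obtained by vmap'ing over a leading head axis:

import jax
import jax.numpy as jnp

def dot_product_attention_weights(query, key_, mask=None):
    # query: (q_seq, qk_size); key_: (kv_seq, qk_size) -- single head, no batch.
    logits = query @ key_.T / jnp.sqrt(query.shape[-1])
    if mask is not None:
        logits = jnp.where(mask, logits, -jnp.inf)
    return jax.nn.softmax(logits, axis=-1)

def dot_product_attention(query, key_, value, mask=None, dropout_fn=None):
    weights = dot_product_attention_weights(query, key_, mask)
    if dropout_fn is not None:
        weights = dropout_fn(weights)  # closes over key/inference etc.
    return weights @ value  # (q_seq, vo_size)

# Hypothetical multi-head use: q, k, v carry a leading head axis, and
# out_axes=1 places the head axis after the query-sequence axis of the output.
q = k = v = jnp.ones((4, 7, 8))  # (num_heads, seq, size)
out = jax.vmap(dot_product_attention, in_axes=0, out_axes=1)(q, k, v)
out.shape    # --> (7, 4, 8)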

@patrick-kidger (Owner) left a comment

Nits aside, this LGTM!

Inline review comments (resolved) on equinox/nn/attention.py and tests/test_nn.py.
@patrick-kidger (Owner)

Have you checked (with the out_axes=1 change) that the results of MultiheadAttention really do remain unchanged after this update? If so I'm happy to merge this.

@jenkspt (Contributor, Author) commented Aug 23, 2022

> Have you checked (with the out_axes=1 change) that the results of MultiheadAttention really do remain unchanged after this update? If so I'm happy to merge this.

I checked the outputs and they match with out_axes=1. The tests should probably be updated to catch this (since using out_axes=0 passes the tests but doesn't match the outputs of the current MultiheadAttention). Writing these tests isn't easy, however -- maybe just copying a small input/output pair from the current MultiheadAttention and adding it to the tests would suffice?
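
Something like this is what I have in mind (a hypothetical sketch -- the configuration and the golden file name are made up, and the saved array would have to be captured from the current MultiheadAttention on exactly these inputs):

import jax
import jax.numpy as jnp
import equinox as eqx

def test_multihead_attention_regression():
    key = jax.random.PRNGKey(0)
    attn = eqx.nn.MultiheadAttention(num_heads=2, query_size=4, key=key)
    x = jnp.linspace(0.0, 1.0, 3 * 4).reshape(3, 4)  # (seq_len, query_size)
    out = attn(x, x, x)
    # Placeholder: an array saved (or hard-coded) from the pre-refactor
    # MultiheadAttention on exactly these inputs.
    expected = jnp.load("tests/multihead_golden.npy")
    assert jnp.allclose(out, expected, atol=1e-6)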

Also, while doing this I noticed that the outputs do not match when using dropout -- I'm not sure yet whether this is because dropout is applied in a different order, or whether there is a problem with vmap'ing when the key is closed over in dropout_fn.

@jenkspt (Contributor, Author) commented Aug 23, 2022

Yeah, the problem is vmap'ing with the key in dropout_fn.

@patrick-kidger (Owner) commented Aug 29, 2022

Just checking that you're not waiting on any input from me at the moment?

@jenkspt (Contributor, Author) commented Aug 30, 2022

The most recent changes fix the problem with closing over the key in dropout_fn by adding explicit key and inference arguments.

However, vmap'ing dropout doesn't match the un-vmap'd version.
For example:

import jax
import jax.numpy as jnp
from equinox import nn

key = jax.random.PRNGKey(41)

x = jnp.arange(4 * 5).reshape(4, 5)
dropout = nn.Dropout(0.5)

y1 = dropout(x, key=key)                      # un-vmap'd: one key for the whole array
y2 = jax.vmap(dropout, in_axes=0, out_axes=0)(x, key=jax.random.split(key, 4))  # one key per row
y3 = jax.vmap(dropout, in_axes=1, out_axes=1)(x, key=jax.random.split(key, 5))  # one key per column

jnp.allclose(y1, y2)    # --> False
jnp.allclose(y1, y3)    # --> False

I may be fundamentally misunderstanding how the PRNG works, but my expectation is that y1 and y2 are equal. Any insight into why these don't match? Otherwise I can create a jax issue.

A simpler MWE:

import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(41)

def test(key):
    return jax.random.normal(key, ())

y1 = jax.random.normal(key, (4,))              # four draws from the parent key
y2 = jax.vmap(test)(jax.random.split(key, 4))  # one draw from each of four child keys
jnp.allclose(y1, y2)    # --> False

@patrick-kidger (Owner) commented Aug 30, 2022

It's definitely expected that these don't match. jax.random.split should return new keys that produce random numbers statistically independent of those from their parent key. (And from each other.)

I could believe that there's no way to reproduce the same behaviour as before when using dropout, since we now have a vmap'd dropout and JAX may be doing something else under-the-hood. I think that's fine -- it's mostly inference mode that I'm concerned about; I'd prefer not to break any models that have already been serialised to disk, but dropout is really training-time-only.
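
For example, a quick check along those lines (a sketch, assuming Dropout's inference flag works as discussed above -- in inference mode dropout is the identity, so the vmap'd and un-vmap'd results should coincide exactly):

import jax
import jax.numpy as jnp
from equinox import nn

x = jnp.arange(4 * 5, dtype=jnp.float32).reshape(4, 5)
dropout = nn.Dropout(0.5)

# No randomness in inference mode, so no key is needed.
y1 = dropout(x, inference=True)
y2 = jax.vmap(lambda row: dropout(row, inference=True))(x)
jnp.allclose(y1, y2)    # --> True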

@jenkspt (Contributor, Author) commented Aug 30, 2022

Ok that sounds reasonable. Pending any additional feedback I think this is ready to merge.

@patrick-kidger merged commit bc3a8d9 into patrick-kidger:main on Aug 30, 2022
@patrick-kidger (Owner)

Excellent. Thanks for contributing!
(I'll be doing a new release shortly.)
