Release v0.3.2 · softmax1/Flash-Attention-Softmax-N

There was a request on LinkedIn from Rami to give XLNet the "one-line treatment."

What that means is someone would be able to convert their XLNet model from softmax_0 to softmax_n by adding just one line of code to their script (or two lines if you count the import statement). For example,

import transformers

from flash_attention_softmax_n.surgery import apply_attention_softmax_n


model = transformers.AutoModel.from_pretrained('xlnet-base-cased')
apply_attention_softmax_n(model=model, softmax_n_param=1.)
...
On the backend, the _xlnet subpackage contains the modified rel_attn_core method code. It also adds the XLNetRelativeAttention class to the policy registry such that apply_attention_softmax_n and AttentionSoftmaxN know to operate on it.

As far as testing, the unit tests in tests/cpu/surgery/test_xlnet.py all pass. The tests in the cpu subpackage will run on a cpu or gpu. (The gpu subpackage tests are gpu-only.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.3.2

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Uh oh!