v0.3.2
There was a request on LinkedIn from Rami to give XLNet the "one-line treatment."
What that means is someone would be able to convert their XLNet model from softmax_0 to softmax_n by adding just one line of code to their script (or two lines if you count the import statement). For example,
import transformers
from flash_attention_softmax_n.surgery import apply_attention_softmax_n
model = transformers.AutoModel.from_pretrained('xlnet-base-cased')
apply_attention_softmax_n(model=model, softmax_n_param=1.)
...
On the backend, the _xlnet subpackage contains the modified rel_attn_core method code. It also adds the XLNetRelativeAttention class to the policy registry such that apply_attention_softmax_n and AttentionSoftmaxN know to operate on it.
As far as testing, the unit tests in tests/cpu/surgery/test_xlnet.py all pass. The tests in the cpu subpackage will run on a cpu or gpu. (The gpu subpackage tests are gpu-only.)