This repo contains code accompanying the paper "Increasing Learning Efficiency of Self-Attention Networks through Direct Position Interactions, Learnable Temperature, and Convoluted Attention".
Requirements are listed in requirements.txt. To run the code on a GPU, CUDA 10.0 is required (the code is already old).
The paper can be cited as:
@inproceedings{dufter2020increasing,
title={Increasing Learning Efficiency of Self-Attention Networks through Direct Position Interactions, Learnable Temperature, and Convoluted Attention},
author={Dufter, Philipp and Schmitt, Martin and Sch{\"u}tze, Hinrich},
booktitle={to appear in COLING2020},
year={2020}
}