# Compressed Attention

It turned out that attention cannot be compressed with the TT (tensor-train) decomposition (an empirical result), but with the Tucker decomposition we achieve the same quality as the full model.

- BLEU: 0.44
- compression rate: 3.175
- compression rate without embeddings: 9.174
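As a minimal sketch of the idea, a Tucker decomposition of an attention weight tensor can be computed via truncated higher-order SVD (HOSVD) in plain NumPy. The tensor shape `(heads, d_model, d_head)` and the ranks below are illustrative assumptions, not the configuration behind the numbers above.

```python
import numpy as np

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: returns a core tensor and one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        # unfold T along `mode` and keep the leading left singular vectors
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(U[:, :r])
    # project T onto the factor subspaces to obtain the core tensor
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode
        )
    return core, factors

# illustrative attention projection tensor: (heads, d_model, d_head)
rng = np.random.default_rng(0)
T = rng.standard_normal((8, 64, 8))
ranks = (4, 16, 4)
core, factors = tucker_hosvd(T, ranks)

# parameter counts: full tensor vs. Tucker core + factors
full = T.size
compressed = core.size + sum(U.size for U in factors)
print(full, compressed, round(full / compressed, 3))
```

The compression rate is simply the ratio of full to compressed parameter counts; in practice the Tucker ranks would be tuned so that BLEU matches the full model, as reported above.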