
Commit 9182adc

jethrokuan committed Sat 07 Nov 2020 04:20:19 PM +08 (1 parent: d45ac82)
Showing 3 changed files with 40 additions and 0 deletions.
22 changes: 22 additions & 0 deletions content/posts/making_transformer_models_efficient.md
@@ -0,0 +1,22 @@
+++
title = "Making Transformer Models Efficient"
author = ["Jethro Kuan"]
draft = false
+++

The traditional [Transformer]({{< relref "transformer" >}}) model has memory and computational
complexity that is quadratic in the input sequence length (\\(O(N^2)\\)). This limits
the utility of Transformer models, since their main benefit is the ability to
learn alignments across long sequences.
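
To make the quadratic term concrete, here is a minimal sketch of scaled dot-product attention, assuming PyTorch (the function name and dimensions are illustrative): the \\(N \times N\\) score matrix is what drives the \\(O(N^2)\\) memory and compute.

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    """Standard scaled dot-product attention; q, k, v: (batch, n, d)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (batch, n, n): quadratic in n
    return F.softmax(scores, dim=-1) @ v         # (batch, n, d)

n = 4096
q = k = v = torch.randn(1, n, 64)
out = attention(q, k, v)  # materialises a 4096 x 4096 score matrix
```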

Efficient Transformer models attempt to alleviate the cost of computing the
attention matrix, either by approximating the matrix or by introducing
sparsity. Tay et al. (2020) provide a good overview of
these efficient Transformer models. The key summary table in the paper is
reproduced below.

{{< figure src="/ox-hugo/screenshot2020-11-07_16-18-25_.png" caption="Figure 1: Summary of Efficient Transformer Models" >}}
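
As one illustration of the approximation route, below is a sketch of kernelised "linear attention" (one of the model families the survey covers): replacing the softmax with a positive feature map \\(\phi\\) lets the matrix product be re-associated, so the \\(N \times N\\) attention matrix is never formed and the sequence-length cost drops to \\(O(N)\\). The feature map and normalisation here are illustrative choices, not a definitive implementation.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelised attention, O(n) in sequence length; q, k, v: (batch, n, d)."""
    phi = lambda x: F.elu(x) + 1         # positive feature map standing in for exp(.)
    q, k = phi(q), phi(k)
    kv = k.transpose(-2, -1) @ v         # (batch, d, d): the n x n matrix is never formed
    z = q @ k.sum(dim=-2).unsqueeze(-1)  # (batch, n, 1) normaliser
    return (q @ kv) / (z + eps)
```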

## Bibliography {#bibliography}

Tay, Y., Dehghani, M., Bahri, D., & Metzler, D. (2020). *Efficient Transformers: A Survey*. arXiv:2009.06732.
18 changes: 18 additions & 0 deletions org/making_transformer_models_efficient.org
@@ -0,0 +1,18 @@
#+title: Making Transformer Models Efficient

The traditional [[file:transformer.org][Transformer]] model has memory and computational
complexity that is quadratic in the input sequence length ($O(N^2)$). This limits
the utility of Transformer models, since their main benefit is the ability to
learn alignments across long sequences.

Efficient Transformer models attempt to alleviate the cost of computing the
attention matrix, either by approximating the matrix or by introducing
sparsity. cite:tayEfficientTransformersSurvey2020 provides a good overview of
these efficient Transformer models. The key summary table in the paper is
reproduced below.

#+DOWNLOADED: screenshot @ 2020-11-07 16:18:25
#+CAPTION: Summary of Efficient Transformer Models
[[file:images/making_transformer_models_efficient/screenshot2020-11-07_16-18-25_.png]]
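
As a sketch of the sparsity route, assuming PyTorch and a sliding-window pattern (the window size below is an illustrative choice), a banded mask restricts each position to a local neighbourhood, so only $O(N \cdot w)$ score entries are meaningful:

#+begin_src python
import torch

def local_attention_mask(n, window=128):
    """True where |i - j| <= window, i.e. a band of width 2 * window + 1."""
    i = torch.arange(n)
    return (i[:, None] - i[None, :]).abs() <= window

mask = local_attention_mask(4096)
scores = torch.randn(4096, 4096)
# Dense demo for clarity; efficient kernels compute only the banded entries.
scores = scores.masked_fill(~mask, float("-inf"))
#+end_src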

bibliography:biblio.bib
Binary file added static/ox-hugo/screenshot2020-11-07_16-18-25_.png
