Commit 9182adc (parent d45ac82)
Showing 3 changed files with 40 additions and 0 deletions.
@@ -0,0 +1,22 @@
+++
title = "Making Transformer Models Efficient"
author = ["Jethro Kuan"]
draft = false
+++

The traditional [Transformer]({{< relref "transformer" >}}) model has memory and computational complexities that
are quadratic in the input sequence length (\\(O(N^2)\\)). This limits the utility
of Transformer models, since their main benefit is the ability to learn
alignments across long sequences.

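To make the quadratic term concrete, here is a minimal NumPy sketch (not part of the original note) of scaled dot-product attention: materialising the full \\(N \times N\\) score matrix is what drives both the memory and compute cost.

```python
# Minimal sketch of full self-attention over a length-N sequence.
# The intermediate score matrix Q @ K.T has shape (N, N), which is the
# source of the O(N^2) memory and compute cost described above.
import numpy as np

def full_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (N, N) score matrix
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (N, d) output

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
print(full_attention(Q, K, V).shape)  # (1024, 64); the scores were (1024, 1024)
```
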
Efficient transformer models attempt to alleviate the cost of computing the
attention matrix, either by approximating the matrix, or by introducing
sparsity. Tay et al. (2020) provide a good overview of
these efficient Transformer models. The key summary table in the paper is
reproduced below.

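As a rough illustration of the sparsity route (a hypothetical sketch, not a reproduction of any specific model from the survey), a sliding-window variant lets each position attend only to its \\(w\\) nearest neighbours, reducing the cost from \\(O(N^2)\\) to roughly \\(O(Nw)\\):

```python
# Hypothetical sliding-window (local) attention sketch: each query attends only
# to keys within +/- w positions, so roughly O(N * w) work instead of O(N^2).
import numpy as np

def local_attention(Q, K, V, w=32):
    N, d = Q.shape
    out = np.zeros_like(V)
    for i in range(N):
        lo, hi = max(0, i - w), min(N, i + w + 1)
        scores = Q[i] @ K[lo:hi].T / np.sqrt(d)   # at most 2w + 1 scores per query
        scores -= scores.max()
        weights = np.exp(scores)
        weights /= weights.sum()
        out[i] = weights @ V[lo:hi]
    return out
```
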
{{< figure src="/ox-hugo/screenshot2020-11-07_16-18-25_.png" caption="Figure 1: Summary of Efficient Transformer Models" >}}

## Bibliography {#bibliography}

Tay, Y., Dehghani, M., Bahri, D., & Metzler, D. (2020). Efficient Transformers: A Survey. arXiv:2009.06732.
@@ -0,0 +1,18 @@
#+title: Making Transformer Models Efficient

The traditional [[file:transformer.org][Transformer]] model has memory and computational complexities that
are quadratic in the input sequence length ($O(N^2)$). This limits the utility
of Transformer models, since their main benefit is the ability to learn
alignments across long sequences.

Efficient transformer models attempt to alleviate the cost of computing the
attention matrix, either by approximating the matrix, or by introducing
sparsity. cite:tayEfficientTransformersSurvey2020 provides a good overview of
these efficient Transformer models. The key summary table in the paper is
reproduced below.

#+DOWNLOADED: screenshot @ 2020-11-07 16:18:25
#+CAPTION: Summary of Efficient Transformer Models
[[file:images/making_transformer_models_efficient/screenshot2020-11-07_16-18-25_.png]]

bibliography:biblio.bib