methods 4
kheyer committed Apr 25, 2019
1 parent 873596c commit 93c71de
Showing 1 changed file with 2 additions and 2 deletions.
Methods/Methods Long Form.ipynb: 4 changes (2 additions, 2 deletions)
@@ -393,7 +393,7 @@
"\n",
"### 5.4 Discriminative Learning Rates\n",
"\n",
"Different layers in the network encode different types of information [25]. In the context of transfer learning, different layers of the pre-trained model need to be fine tuned to different extents. This is done through the use of discriminative learning rates, introduced by [1]. With this technique, higher layers in the model are fine-tuned at higher learning rates compared to the lower layers of the model. Following [1], learning rates follow the function $\\eta^{l-1} = \\eta^{l}/2.6$.\n",
"Different layers in the network encode different types of information [25]. In the context of transfer learning, different layers of the pre-trained model need to be fine tuned to different extents. This is done through the use of discriminative learning rates, introduced by [1]. With this technique, higher layers in the model are fine-tuned at higher learning rates compared to the lower layers of the model. Following [1], learning rates follow the function $\\eta^{l-1} = \\frac{\\eta^{l}}{2.6}$.\n",
"\n",
"Discriminative learning rates are used in fune tuning the language model and training the classification model.\n",
"\n",
@@ -435,7 +435,7 @@
"\n",
"### 5.8 Language Model Fine Tuning\n",
"\n",
"Language Model fine tuning on a classification corpus is done using the One Cycle policy with discriminative learning rates. Discriminative learning rates follow the form $\\eta^{l-1} = \\eta^{l}/2.6$. Learning rates depend on the dataset but typically range from $5e-4$ and $5e-3$. The model is trained using cross entropy loss.\n",
"Language Model fine tuning on a classification corpus is done using the One Cycle policy with discriminative learning rates. Discriminative learning rates follow the form $\\eta^{l-1} = \\frac{\\eta^{l}}{2.6}$. Learning rates depend on the dataset but typically range from $5e-4$ and $5e-3$. The model is trained using cross entropy loss.\n",
"\n",
"### 5.9 Classification Model Training\n",
"\n",
