Update index.html

mobiusml · Nov 13, 2023 · a358480
1 parent de47317
commit a358480
Showing 1 changed file with 4 additions and 4 deletions.
diff --git a/index.html b/index.html
@@ -138,7 +138,7 @@ <h2 id="hqq" class="">Half-Quadratic Quantization</h2>
                         & \beta^{(t+1)}\leftarrow\kappa\beta^{(t)}
                         \end{array}$$
 
-                        <h4>Sub-problem (\( (sp}_{1} \))</h4>
+                        <h4>Sub-problem \text{(sp}_{1})</h4>
 
                         This problem takes the form of a <a href="https://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf">Proximal Operator</a>. When \( \phi() \) is the \( l_{1} \) norm, the solution is the <a href="https://sparse-plex.readthedocs.io/en/latest/book/opt/soft_thresholding.html">soft-thresholding operator</a>. There exists a more general thresholding solution for the \( l_{p}\)-norm with \( 0 \le p \leq 1 \), known as the <a href="https://inria.hal.science/hal-01317151/file/lowrank_ieee_tip.pdf">generalized soft-thresholding operator</a>, which we adopt:
 
@@ -148,7 +148,7 @@ <h4>Sub-problem (\( (sp}_{1} \))</h4>
                         \end{array}$$
 
 
-                        <h4>Sub-problem (\( (sp}_{2} \))</h4>
+                        <h4>Sub-problem \text{(sp}_{2})</h4>
                         The second sub-problem can be rewritten as follows:
                         $$\begin{array}{c}
                         z^{(t+1)}\leftarrow\underset{z}{\text{argmin}}\,\frac{1}{2}||z-\left(W_{q}^{(t+1)}-\frac{(W-W_{e}^{(t+1)})}{s}\right)||_{2}^{2}\\
@@ -165,8 +165,8 @@ <h4>Sub-problem (\( (sp}_{2} \))</h4>
                         <h2 id="processing_time" class="">Processing Time</h2>
                         <p>We report the processing time to quantize the <a href="https://ai.meta.com/llama/">Llama2</a> models. We noticed that the processing time for GPTQ and AWQ drastically changes from one machine to another. GPTQ heavily relies on the CPU which creates issues on virtual machines, so we limit the number of threads to those available in the virtual machine (32) to avoid the process hanging for hours. Our method performs the whole quantization on the GPU with half-precision and only uses the CPU to transfer data to the GPU once the solver is finished. </p>
                         <center><img src="figs/llama2-7b_time.png" /></center>
-                        center><img src="figs/llama2-13b_time.png" /></center>
-                        center><img src="figs/llama2-70b_time.png" /></center>
+                        <center><img src="figs/llama2-13b_time.png" /></center>
+                        <center><img src="figs/llama2-70b_time.png" /></center>
 
                         <h2 id="benchmark" class="">Benchmark</h2>