
FFF-BERT seems to run slower than a vanilla BERT model #1

Closed
p-i- opened this issue Nov 23, 2023 · 1 comment
Labels
question Further information is requested

Comments


p-i- commented Nov 23, 2023

Boyan and I performance-tested FFF-BERT (the HuggingFace release) against a vanilla BERT of similar size, and found that it runs roughly 15% slower on my M2 Mac.

https://gist.github.com/p-i-/355668983aaeee3f282977cdfb93017c

This seems surprising, as the standalone benchmarks do demonstrate a ~50x speedup for a single feed-forward layer:

#!/bin/bash

echo "🔸 Batch size 100"
echo "naive FF (batch matmult)"
python main.py  --model ff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 100  --n-iters 10  --device cpu

echo "FFF (batch matmult)"
python main.py  --model fff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 100  --n-iters 10  --device cpu


echo "🔸 Batch size 10"
echo "naive FF (batch matmult)"
python main.py  --model ff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 10  --n-iters 10  --device cpu

echo "FFF (batch matmult)"
python main.py  --model fff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 10  --n-iters 10  --device cpu


echo "🔸 Batch size 1"

echo "naive FF (batch matmult)"
python main.py  --model ff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 1  --n-iters 10  --device cpu

echo "FFF (batch matmult)"
python main.py  --model fff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 1  --n-iters 10  --device cpu
> . run.sh 
🔸 Batch size 100
naive FF (batch matmult)
eager: 1.3852830000000003
compile: 1.366022000000001
(eval) compiled: 1.3960490000000003 ± 0.03737091447636828
~~~~~~~~~~
FFF (batch matmult)
eager: 0.05451000000000006
compile: 0.018572000000000255
(eval) compiled: 0.01893820000000006 ± 0.0015569136006856079
~~~~~~~~~~
🔸 Batch size 10
naive FF (batch matmult)
eager: 0.141181
compile: 0.1446900000000002
(eval) compiled: 0.1389437 ± 0.0026585709714055
~~~~~~~~~~
FFF (batch matmult)
eager: 0.005520000000000191
compile: 0.001954999999999707
(eval) compiled: 0.002634200000000009 ± 0.0015433838667031883
~~~~~~~~~~
🔸 Batch size 1
naive FF (batch matmult)
eager: 0.01369599999999993
compile: 0.01478299999999999
(eval) compiled: 0.014860099999999932 ± 0.0014923330358871411
~~~~~~~~~~
FFF (batch matmult)
eager: 0.0005589999999999762
compile: 0.0005690000000000417
(eval) compiled: 0.0003634999999999167 ± 7.71248987033366e-05
~~~~~~~~~~

Speedups for batchsize 100 10 1:

In [1]: 1.3471607 / 0.019425799999999917, 0.14026139999999993 / 0.0023557000000000716, 0.014105299999999899 / 0.00033940000000001194
Out[1]: (69.34904611393127, 59.54128284586138, 41.5595167943412)

pbelcak commented Dec 4, 2023

Hi @p-i-,

As noted in the introduction section of the preprint, the model provided through HuggingFace only simulates the conditionality. If you look at the code, you will see that the FFF implementation backing the HuggingFace model merely masks out the neurons that are not used for the particular inference instance — it still computes all of them.

That is why you are not seeing any meaningful improvement :)
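To make the distinction concrete, here is a toy NumPy sketch (hypothetical parameter names and sizes, not the repository's actual code) contrasting truly conditional FFF inference with a mask-based simulation of it. Both return the same output, but the masked variant still performs the full-width matrix multiplications before zeroing the unused leaves, so it saves no compute:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, n_leaves, leaf_width = 16, 8, 4   # toy sizes; the real model is far larger
depth = 3                                # log2(n_leaves) decisions per root-to-leaf path

# Hypothetical parameters: one decision direction per internal tree node,
# and one small weight block per leaf.
node_w = rng.standard_normal((n_leaves - 1, d_in))      # tree decision directions
w1 = rng.standard_normal((n_leaves, d_in, leaf_width))  # per-leaf hidden weights
w2 = rng.standard_normal((n_leaves, leaf_width, d_in))  # per-leaf output weights

def choose_leaf(x):
    """Descend the balanced binary tree: at node i, go right iff x . node_w[i] > 0."""
    node = 0
    for _ in range(depth):
        right = float(x @ node_w[node]) > 0
        node = 2 * node + 1 + int(right)   # level-order children of node i: 2i+1, 2i+2
    return node - (n_leaves - 1)           # convert to a leaf index in [0, n_leaves)

def fff_conditional(x):
    """True FFF inference: compute only the chosen leaf's leaf_width neurons."""
    leaf = choose_leaf(x)
    h = np.maximum(x @ w1[leaf], 0)        # (leaf_width,) — tiny matmul
    return h @ w2[leaf]

def fff_masked(x):
    """Simulated FFF: compute *every* leaf's neurons, then zero out the unused ones."""
    leaf = choose_leaf(x)
    h_all = np.maximum(np.einsum('i,lij->lj', x, w1), 0)   # all n_leaves * leaf_width neurons
    mask = np.zeros(n_leaves)
    mask[leaf] = 1.0
    h_all *= mask[:, None]                                 # discard all but one leaf
    return np.einsum('lj,lji->i', h_all, w2)               # full-width matmul anyway

x = rng.standard_normal(d_in)
assert np.allclose(fff_conditional(x), fff_masked(x))
```

In this sketch `fff_masked` touches all `n_leaves * leaf_width` hidden neurons on every call, which is why a benchmark of the simulated model measures vanilla-FF-like cost, while `fff_conditional` only touches `leaf_width` of them — the source of the per-layer speedups reported above.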

@pbelcak added the question (Further information is requested) label Dec 4, 2023
@pbelcak pbelcak closed this as completed Dec 6, 2023