
FFF-BERT seems to run slower than a vanilla BERT model #1

Closed
p-i- opened this issue Nov 23, 2023 · 1 comment
Labels
question Further information is requested

Comments


p-i- commented Nov 23, 2023

Boyan and I performance-tested FFF-BERT (the HuggingFace release) against a vanilla BERT of similar size, and found that it runs roughly 15% slower on my M2 Mac.

https://gist.github.com/p-i-/355668983aaeee3f282977cdfb93017c

This seems surprising, as the standalone benchmarks do demonstrate a ~50x speedup for a single feed-forward layer:

#!/bin/bash

echo "🔸 Batch size 100"
echo "naive FF (batch matmult)"
python main.py  --model ff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 100  --n-iters 10  --device cpu

echo "FFF (batch matmult)"
python main.py  --model fff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 100  --n-iters 10  --device cpu


echo "🔸 Batch size 10"
echo "naive FF (batch matmult)"
python main.py  --model ff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 10  --n-iters 10  --device cpu

echo "FFF (batch matmult)"
python main.py  --model fff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 10  --n-iters 10  --device cpu


echo "🔸 Batch size 1"

echo "naive FF (batch matmult)"
python main.py  --model ff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 1  --n-iters 10  --device cpu

echo "FFF (batch matmult)"
python main.py  --model fff_bmm  --input-width 8000  --hidden-width 4000  --output-width 8000  --depth 8  --batch-size 1  --n-iters 10  --device cpu
> . run.sh 
🔸 Batch size 100
naive FF (batch matmult)
eager: 1.3852830000000003
compile: 1.366022000000001
(eval) compiled: 1.3960490000000003 ± 0.03737091447636828
~~~~~~~~~~
FFF (batch matmult)
eager: 0.05451000000000006
compile: 0.018572000000000255
(eval) compiled: 0.01893820000000006 ± 0.0015569136006856079
~~~~~~~~~~
🔸 Batch size 10
naive FF (batch matmult)
eager: 0.141181
compile: 0.1446900000000002
(eval) compiled: 0.1389437 ± 0.0026585709714055
~~~~~~~~~~
FFF (batch matmult)
eager: 0.005520000000000191
compile: 0.001954999999999707
(eval) compiled: 0.002634200000000009 ± 0.0015433838667031883
~~~~~~~~~~
🔸 Batch size 1
naive FF (batch matmult)
eager: 0.01369599999999993
compile: 0.01478299999999999
(eval) compiled: 0.014860099999999932 ± 0.0014923330358871411
~~~~~~~~~~
FFF (batch matmult)
eager: 0.0005589999999999762
compile: 0.0005690000000000417
(eval) compiled: 0.0003634999999999167 ± 7.71248987033366e-05
~~~~~~~~~~

Speedups for batchsize 100 10 1:

In [1]: 1.3471607 / 0.019425799999999917, 0.14026139999999993 / 0.0023557000000000716, 0.014105299999999899 / 0.00033940000000001194
Out[1]: (69.34904611393127, 59.54128284586138, 41.5595167943412)

pbelcak commented Dec 4, 2023

Hi @p-i-,

As noted in the introduction section of the preprint, the model provided through HuggingFace only simulates the conditionality. If you look at the code, you will see that the FFF implementation backing the HuggingFace model merely masks out the neurons that are not used for the particular inference instance — it still computes all of them.

That is why you are not seeing any meaningful improvement :)
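To make the distinction concrete, here is a toy NumPy sketch (hypothetical parameter names and sizes, not the repository's actual code) contrasting truly conditional FFF inference with a mask-based simulation of it. Both return the same output, but the masked variant still performs the full-width matrix multiplications before zeroing the unused leaves, so it saves no compute:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, n_leaves, leaf_width = 16, 8, 4   # toy sizes; the real model is far larger
depth = 3                                # log2(n_leaves) decisions per root-to-leaf path

# Hypothetical parameters: one decision direction per internal tree node,
# and one small weight block per leaf.
node_w = rng.standard_normal((n_leaves - 1, d_in))      # tree decision directions
w1 = rng.standard_normal((n_leaves, d_in, leaf_width))  # per-leaf hidden weights
w2 = rng.standard_normal((n_leaves, leaf_width, d_in))  # per-leaf output weights

def choose_leaf(x):
    """Descend the balanced binary tree: at node i, go right iff x . node_w[i] > 0."""
    node = 0
    for _ in range(depth):
        right = float(x @ node_w[node]) > 0
        node = 2 * node + 1 + int(right)   # level-order children of node i: 2i+1, 2i+2
    return node - (n_leaves - 1)           # convert to a leaf index in [0, n_leaves)

def fff_conditional(x):
    """True FFF inference: compute only the chosen leaf's leaf_width neurons."""
    leaf = choose_leaf(x)
    h = np.maximum(x @ w1[leaf], 0)        # (leaf_width,) — tiny matmul
    return h @ w2[leaf]

def fff_masked(x):
    """Simulated FFF: compute *every* leaf's neurons, then zero out the unused ones."""
    leaf = choose_leaf(x)
    h_all = np.maximum(np.einsum('i,lij->lj', x, w1), 0)   # all n_leaves * leaf_width neurons
    mask = np.zeros(n_leaves)
    mask[leaf] = 1.0
    h_all *= mask[:, None]                                 # discard all but one leaf
    return np.einsum('lj,lji->i', h_all, w2)               # full-width matmul anyway

x = rng.standard_normal(d_in)
assert np.allclose(fff_conditional(x), fff_masked(x))
```

In this sketch `fff_masked` touches all `n_leaves * leaf_width` hidden neurons on every call, which is why a benchmark of the simulated model measures vanilla-FF-like cost, while `fff_conditional` only touches `leaf_width` of them — the source of the per-layer speedups reported above.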

@pbelcak added the question (Further information is requested) label Dec 4, 2023
@pbelcak pbelcak closed this as completed Dec 6, 2023