Boyan and I performance-tested the FFF-BERT (on HuggingFace) against a vanilla BERT of similar size, and found that it runs roughly 15% slower on my M2 Mac.
As noted in the introduction of the preprint, the model provided through HuggingFace only simulates the conditionality. If you look at the code, you will see that the FFF implementation backing the HuggingFace model simply masks out all neurons that are not used for the particular inference instance; the full dense computation is still performed.
That is why you are not seeing any meaningful improvement :)
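To make the distinction concrete, here is a minimal sketch (not the repo's actual implementation; names, tree depth, and widths are illustrative assumptions) contrasting true conditional FFF inference, which visits only one neuron per tree level, with the masked simulation, which computes every neuron and then zeroes the unvisited ones. Both return identical outputs, but only the first saves work:

```python
import numpy as np

# Hypothetical FFF layer: a binary tree of decision nodes, one neuron per node.
# Sizes are illustrative, not taken from the repository.
rng = np.random.default_rng(0)
depth = 3
n_nodes = 2**depth - 1          # 7 nodes in a depth-3 tree
d = 4                           # input/output width

W_node = rng.standard_normal((n_nodes, d))    # per-node routing weights
W_neuron = rng.standard_normal((n_nodes, d))  # per-node input weights
V_neuron = rng.standard_normal((n_nodes, d))  # per-node output weights

def fff_conditional(x):
    """True conditional inference: touch only `depth` neurons (log of width)."""
    y = np.zeros(d)
    node = 0
    for _ in range(depth):
        y += max(0.0, W_neuron[node] @ x) * V_neuron[node]  # ReLU neuron
        # Descend: children of node i are 2i+1 (left) and 2i+2 (right).
        node = 2 * node + (1 if W_node[node] @ x > 0 else 2)
    return y

def fff_masked(x):
    """Masked simulation: compute ALL neurons, then zero the unvisited ones.
    Same result, but the full dense matmul still happens -- no speedup."""
    mask = np.zeros(n_nodes)
    node = 0
    for _ in range(depth):
        mask[node] = 1.0
        node = 2 * node + (1 if W_node[node] @ x > 0 else 2)
    acts = np.maximum(W_neuron @ x, 0.0) * mask  # full-width computation
    return V_neuron.T @ acts

x = rng.standard_normal(d)
assert np.allclose(fff_conditional(x), fff_masked(x))
```

The masked variant is what a framework without native conditional-execution kernels can express easily, which is why the HuggingFace model demonstrates correctness but not speed.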
pbelcak added the `question` (further information is requested) and `DIP` (this issue is being discussed) labels and removed the `DIP` label on Dec 4, 2023.
https://gist.github.com/p-i-/355668983aaeee3f282977cdfb93017c
This seems surprising, as the benchmarks do indeed demonstrate a ~50x speedup for a single feed-forward layer:
Speedups for batch sizes 100, 10, and 1:
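As a hedged back-of-envelope check on why a large per-layer speedup is plausible at all: an FFF layer evaluates roughly one neuron per tree level instead of the full hidden width. The sizes below are illustrative assumptions (a standard BERT-base FFN shape), not the repository's measured configuration, and the raw MAC-count ratio is an upper bound that real kernels will not fully reach:

```python
# Illustrative MAC-count comparison: dense FFN vs. conditional FFF path.
# All sizes are assumptions for the sake of the estimate.
d_model = 768        # model width (BERT-base)
dense_width = 3072   # standard dense FFN hidden width
fff_depth = 12       # neurons actually visited: ~log2 of a comparable width

dense_macs = 2 * d_model * dense_width  # input + output projections
fff_macs = 2 * d_model * fff_depth      # one neuron per tree level

speedup = dense_macs / fff_macs         # 3072 / 12 = 256x in raw MACs
print(int(speedup))
```

Memory traffic, routing overhead, and batching all eat into this bound, so a measured ~50x for a single layer is consistent with it, while a masked simulation recovers none of it.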