1.58 bit implementation #1956
Would it be possible to implement 1.58-bit quantization in candle? It was proposed in the following paper:
https://arxiv.org/pdf/2402.17764.pdf
The main inspiration behind a 1.58-bit implementation is that matrix multiplication can be replaced with addition: with weights restricted to {-1, 0, +1}, every product term reduces to an add, a subtract, or a skip. If that is feasible, then with the SIMD instructions in Apple's Accelerate framework we could expect faster training and inference for large language models.
A couple of llama.cpp discussions here:
ggerganov/llama.cpp#5761
ggerganov/llama.cpp#5999
There is also a training library which was released a couple of days ago:
https://github.com/rafacelente/bllama
Any thoughts?
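For context, "1.58 bit" refers to log2(3) ≈ 1.58: each weight carries one of three values, -1, 0, or +1. The paper quantizes with an absmean rule, dividing each weight by the mean absolute value of the tensor and then rounding and clipping to that ternary set. Below is a minimal sketch of that quantizer in plain Rust (no candle dependency; the function name and the epsilon value are my own choices, not from any library):

```rust
/// Absmean quantization as described in the BitNet b1.58 paper: scale the
/// weight tensor by the mean of its absolute values, then round each entry
/// and clip it to the ternary set {-1, 0, +1}.
/// Returns the ternary weights plus the scale needed to dequantize.
fn absmean_quantize(weights: &[f32]) -> (Vec<i8>, f32) {
    // gamma = mean(|w|); the small epsilon guards against division by zero.
    let gamma =
        weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32 + 1e-6;
    let ternary = weights
        .iter()
        .map(|&w| (w / gamma).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (ternary, gamma)
}

fn main() {
    let w = [0.9_f32, -0.04, 0.33, -1.2, 0.02, 0.7];
    let (q, scale) = absmean_quantize(&w);
    // Each original weight is approximated by q[i] as f32 * scale.
    println!("ternary = {:?}, scale = {}", q, scale);
}
```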
Comments
Are there some reference trained models somewhere? I haven't been able to find any so far.
Apparently this one trains a 54M-parameter model from scratch: https://github.com/pranavjad/tinyllama-bitnet. And this one is a pretty good quantization technique which retains model performance; they have also released the model weights: https://mobiusml.github.io/1bit_blog/. What is more interesting to me is the replacement of matrix multiplication with addition, leading to significant performance gains (see the sketch at the end of this thread).
And the official models are here.
Not sure how close to complete this is, but @tomsanbear has put up bitnet-rs, which seems to be a candle implementation of this architecture.
Thanks @LaurentMazare, this is super helpful.
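Picking up the comment above about replacing matrix multiplication with addition: once the weights are ternary, a matrix-vector product needs no weight multiplies at all, only adds and subtracts. Here is a scalar sketch in plain Rust that pairs with the quantizer earlier in the thread (names are mine; this is an illustration of the idea, not candle's or bitnet-rs's actual kernel):

```rust
/// Matrix-vector product with a ternary weight matrix (row-major,
/// entries in {-1, 0, +1}). No multiplications are needed for the
/// weights: +1 adds the activation, -1 subtracts it, 0 skips it.
fn ternary_matvec(weights: &[i8], x: &[f32], rows: usize, cols: usize, scale: f32) -> Vec<f32> {
    assert_eq!(weights.len(), rows * cols);
    assert_eq!(x.len(), cols);
    let mut y = Vec::with_capacity(rows);
    for row in weights.chunks_exact(cols) {
        let mut acc = 0.0_f32;
        for (&w, &xi) in row.iter().zip(x.iter()) {
            match w {
                1 => acc += xi,  // +1 -> addition
                -1 => acc -= xi, // -1 -> subtraction
                _ => {}          //  0 -> the term vanishes entirely
            }
        }
        // One real multiply per output element, to undo the absmean scale.
        y.push(acc * scale);
    }
    y
}
```

A production kernel would additionally bit-pack the ternary weights (2 bits each) and vectorize the inner loop; that packing plus the absence of weight multiplies is where the memory and throughput gains discussed in the paper come from.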