Thanks! I've clarified the installation instructions in the README. The general outline is to clone the repo, install the dependencies, and run the example inference code in the README. Unfortunately, the Llama code requires flash-attention (and there seems to be a performance gap between training the model with flash-attention and running inference without it). The OPT AutoCompressor does not use flash-attention by default.
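For reference, the outline above might look like the following shell sketch. The repository URL, requirements file, and script name are placeholders, not taken from the README, so substitute the actual values from the repo:

```shell
# Hypothetical setup sketch -- the repo URL and file names below are
# assumptions; use the ones given in the README.
git clone https://github.com/example/AutoCompressors.git
cd AutoCompressors
pip install -r requirements.txt

# flash-attention is needed for the Llama models; the OPT
# AutoCompressor runs without it by default.
pip install flash-attn --no-build-isolation

# Run the example inference code from the README (script name assumed).
python run_inference_example.py
```

Note that flash-attn builds CUDA kernels, so installation requires a matching CUDA toolkit and can take a while to compile.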
Hi,
I found the following missing from the install instructions: