
Install instructions are not clear #12

Closed
imbalu007 opened this issue Jan 15, 2024 · 2 comments

imbalu007 commented Jan 15, 2024

Hi,
I find the following missing from the install instructions:

  1. How do I install the autocompressors package?
  2. What should I install to perform inference only (i.e., to obtain soft prompts for a given prompt)?
  3. How can I run (2) without flash-attention?
@CodeCreator (Member)

Thanks! I've clarified the installation instructions in the README. The general outline is to clone the repo, install the dependencies, and run the example inference code from the README. Unfortunately, the Llama code requires flash-attention (and there seems to be a performance gap when a model is trained with flash-attention but inference is run without it). The OPT AutoCompressors do not use flash-attention by default.
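
For (2) without flash-attention, the OPT path is the one to use. Below is a minimal sketch of what that inference looks like; please treat the module path `auto_compressor`, the class `AutoCompressorModel`, and the checkpoint id `princeton-nlp/AutoCompressor-2.7b-6k` as assumptions to verify against the current README:

```python
# Minimal sketch: obtain soft prompts (summary vectors) from an OPT AutoCompressor.
# The module path, class name, and checkpoint id below are assumptions --
# double-check them against the README in the cloned repo.
import torch
from transformers import AutoTokenizer

from auto_compressor import AutoCompressorModel  # lives in the cloned repo

checkpoint = "princeton-nlp/AutoCompressor-2.7b-6k"  # assumed OPT-based checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoCompressorModel.from_pretrained(checkpoint).eval()  # no flash-attention needed

context = "AutoCompressors compress long documents into compact summary vectors."
context_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    # output_softprompt=True returns the summary vectors ("soft prompts")
    # for the given context, which can then be prepended to later prompts.
    softprompt = model(context_ids, output_softprompt=True).softprompt

print(softprompt.shape)  # roughly (batch, num_summary_vectors, hidden_size)
```

Running this on CPU will be slow but avoids flash-attention entirely; only the Llama checkpoints require it.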

@CodeCreator (Member)

Closing this due to inactivity -- feel free to re-open!
