
Flash Attention Implementation & Fuller Config Options #139

Merged 9 commits into main on Apr 9, 2024

Conversation

benjaminye (Contributor)

Addresses #130

@benjaminye added the enhancement (New feature or request) and dependencies (Pull requests that update a dependency file) labels on Apr 8, 2024
@benjaminye linked an issue on Apr 8, 2024 that may be closed by this pull request
Comment on lines +61 to +65
**pip**

```
pip install flash-attn --no-build-isolation
```
Contributor

Is it possible to install it with llm-toolkit, so users do not have to do this?

benjaminye (Contributor, Author)

Not possible as the package doesn't support PEP 517. See python-poetry/poetry#8427, Dao-AILab/flash-attention#493 (comment).

benjaminye (Contributor, Author)

^^ Also that's the first thing I tried 🙈
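For context on the thread above: once flash-attn has been installed separately, the usual way to turn it on with Hugging Face transformers is the `attn_implementation` argument to `from_pretrained`, with a fallback to PyTorch's SDPA kernel when the package is missing. A minimal sketch under those assumptions (placeholder model id; not the toolkit's actual code):

```python
# Sketch only: illustrates the typical optional flash-attn pattern,
# not the llmtune implementation. The model id is a placeholder.
import importlib.util

import torch
from transformers import AutoModelForCausalLM


def flash_attn_available() -> bool:
    """Return True if the optional flash-attn package is importable."""
    return importlib.util.find_spec("flash_attn") is not None


# Fall back to PyTorch's scaled-dot-product attention when flash-attn is absent.
attn_implementation = "flash_attention_2" if flash_attn_available() else "sdpa"

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",           # placeholder model id
    attn_implementation=attn_implementation,
    torch_dtype=torch.float16,             # flash-attn requires fp16/bf16 weights
)
```

Because flash-attn cannot be declared as a Poetry-managed dependency here, a runtime availability check like this keeps the toolkit usable whether or not the user has run the manual `pip install flash-attn --no-build-isolation` step.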

llmtune/inference/lora.py: review thread resolved (outdated)
@benjaminye merged commit fa532c0 into main on Apr 9, 2024
3 checks passed
@benjaminye deleted the flash-attn branch on April 10, 2024
Successfully merging this pull request may close these issues:

Add flash_attention_1/flash_attention_2 support & examples