-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CohereForAI/c4ai-command-r-plus-4bit deployment fails on Inference Endpoint #1799
Comments
Fails the same way for me on a gcp vm with
|
same issue |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
System Info
Information
Tasks
Reproduction
See this output:
It looks like the flash attention mechanism ist not working properly.
Expected behavior
Expected behaviour is to start normally.
Note: I have not been able to get it running with any cohere model on any hardware configuration.
The text was updated successfully, but these errors were encountered: