
Support attention_bias on LLaMA architecture #4283

Merged (3 commits, Dec 1, 2023)

Commits on Dec 1, 2023

  1. Support attention_bias on LLaMA architecture

    Adds QKVO (query/key/value/output projection) bias; should fix InternLM (ggerganov#3133) and works for LLaMAfied Qwen models (ggerganov#3743 (comment)). A sketch of the graph-side change follows the commit list.
    CausalLM committed Dec 1, 2023 (c48679a)
  2. check existence of qkvo bias while loading llama models

    Tested on LLaMA 2 with both the CUDA and CPU backends. A sketch of the load-time check also follows the commit list.
    CausalLM committed Dec 1, 2023 (e192572)
  3. Update llama.cpp

    CausalLM committed Dec 1, 2023 (b1efaed)
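
To make the graph-side change from commit 1 concrete, here is a minimal C++ sketch built on public ggml calls (ggml_mul_mat, ggml_add). The `attn_weights` struct and `proj_with_bias` helper are illustrative names invented for this sketch, not the PR's actual code; the only point is that the bias add is skipped when the tensor is absent.

```cpp
// Sketch of the graph-side idea from commit 1: apply an optional bias after
// each attention projection. Struct and helper names here are illustrative.
#include "ggml.h"

struct attn_weights {
    struct ggml_tensor * wq, * wk, * wv, * wo; // projection weights (always present)
    struct ggml_tensor * bq, * bk, * bv, * bo; // optional biases (NULL if the model has none)
};

// Multiply by the projection weight and add the bias only when it exists;
// ggml_add broadcasts the [n_embd] bias over the token dimension.
static struct ggml_tensor * proj_with_bias(
        struct ggml_context * ctx,
        struct ggml_tensor  * w,
        struct ggml_tensor  * b,
        struct ggml_tensor  * cur) {
    cur = ggml_mul_mat(ctx, w, cur);
    if (b != NULL) {
        cur = ggml_add(ctx, cur, b);
    }
    return cur;
}
```

In the attention block this reads `Qcur = proj_with_bias(ctx0, w.wq, w.bq, cur)`, and likewise for K, V, and the output projection after attention; models without attention_bias take the same path, just without the extra add node.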
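
For commit 2, a sketch of the load-time check: probe for the optional bias tensors by name and leave the pointers NULL when a model ships without them. The `layer_bias` struct, the `load_attn_bias` helper, and the exact tensor names (assumed to follow llama.cpp's GGUF convention of `blk.%d.attn_q` / `.attn_k` / `.attn_v` / `.attn_output` with a `.bias` suffix) are assumptions of this sketch, not the PR's code.

```cpp
// Sketch of the loader-side check from commit 2: look up the bias tensors by
// name and leave the pointers NULL when the model file does not carry them.
#include "ggml.h"
#include <cstdio>

struct layer_bias {
    struct ggml_tensor * bq = NULL;
    struct ggml_tensor * bk = NULL;
    struct ggml_tensor * bv = NULL;
    struct ggml_tensor * bo = NULL;
};

static layer_bias load_attn_bias(struct ggml_context * ctx, int il) {
    layer_bias b;
    char name[128];

    // ggml_get_tensor returns NULL when no tensor with that name exists in
    // the context, so plain LLaMA checkpoints simply leave the biases unset.
    snprintf(name, sizeof(name), "blk.%d.attn_q.bias",      il); b.bq = ggml_get_tensor(ctx, name);
    snprintf(name, sizeof(name), "blk.%d.attn_k.bias",      il); b.bk = ggml_get_tensor(ctx, name);
    snprintf(name, sizeof(name), "blk.%d.attn_v.bias",      il); b.bv = ggml_get_tensor(ctx, name);
    snprintf(name, sizeof(name), "blk.%d.attn_output.bias", il); b.bo = ggml_get_tensor(ctx, name);
    return b;
}
```

The graph builder from the previous sketch can then consume these pointers unchanged, which is why models with and without attention bias share one code path.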