This repository was archived by the owner on Jun 24, 2024. It is now read-only.

Supporting Llama-2 70B param #402

Description

@AmineDiro

Hello,
First of all, I want to thank you for this amazing crate! It is truly a joy to work with LLMs in Rust 😄.

I recently wrote an API that serves Llama-2 models using this crate.

I have an issue serving the Llama2-70B-GGML model. The 70B Llama-2 model uses grouped-query attention (GQA). This is handled in llama.cpp by specifying the n_gqa param in the model hyperparameters, which feels a little bit hacky 🤔
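For reference, Llama-2 70B has `n_head = 64` query heads shared across 8 key/value heads, so `n_gqa = n_head / n_head_kv = 64 / 8 = 8`.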

I would love to work on adding support for n_gqa to this crate. I think it can be added to the Llama model's hyperparameters:

```rust
/// LLaMA [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning))
#[derive(Debug, Default, PartialEq, Eq, Clone, Copy)]
pub struct Hyperparameters {
    // ...
    pub n_head_kv: usize,
}
```

where `n_head_kv = n_head / n_gqa` (with the constraint `n_head % n_gqa == 0`), and the `n_gqa` parameter would be passed in `ModelParameters` as an `Option` 🤔
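For illustration, here is a minimal sketch of that derivation and its validation (the `n_head_kv` helper and its error type are hypothetical, not part of the crate today):

```rust
/// Derive the number of key/value heads from the total head count and
/// the grouped-query-attention factor `n_gqa`.
///
/// Sketch only: both the helper and its `String` error are assumptions
/// for this proposal, not existing crate API.
fn n_head_kv(n_head: usize, n_gqa: usize) -> Result<usize, String> {
    if n_gqa == 0 || n_head % n_gqa != 0 {
        return Err(format!(
            "n_head ({n_head}) must be divisible by n_gqa ({n_gqa})"
        ));
    }
    Ok(n_head / n_gqa)
}

fn main() {
    // Llama-2 70B: 64 query heads grouped over 8 key/value heads.
    assert_eq!(n_head_kv(64, 8), Ok(8));
    // A mismatched n_gqa would be rejected at load time.
    assert!(n_head_kv(64, 7).is_err());
}
```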

Thank you for your help!
