
Implement customizable RoPE #2054

Merged 10 commits on Jul 15, 2023
Commits on Jul 7, 2023

  1. Implement customizable RoPE

    The original RoPE has pre-defined parameters
    
    theta_i = 10000^(−2(i−1)/d), for i in [1, 2, ..., d/2]
    
    Our customizable RoPE, ggml_rope_custom_inplace, uses
    
    theta_i = scale * base^(−2(i−1)/d), for i in [1, 2, ..., d/2]
    
    with defaults matching the original:
    
    scale = 1.0
    base = 10000
    
    The new command line arguments
    --rope-freq-base
    --rope-freq-scale
    set the two new RoPE parameters.
    
    Recent research shows that changing these two parameters extends the usable context length with minimal loss.
    
    1. Extending Context to 8K
       kaiokendev
       https://kaiokendev.github.io/til#extending-context-to-8k
    
    2. Extending Context Window of Large Language Models via Positional Interpolation
       Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian
       https://arxiv.org/abs/2306.15595
    
    3. NTK-Aware Scaled RoPE allows LLaMA models to have extended (8k+) context size without any fine-tuning and minimal perplexity degradation.
       https://www.reddit.com/user/bloc97
       https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
    
    For the bold, try adding the following command line parameters to your favorite model:
    -c 16384 --rope-freq-base 80000 --rope-freq-scale 0.5
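    The scaled frequency schedule above can be sketched as follows. This is an illustrative Python snippet, not the actual ggml_rope_custom_inplace implementation; the function name and signature are made up for clarity. With the defaults it reproduces the original RoPE frequencies.

    ```python
    def rope_frequencies(d, base=10000.0, scale=1.0):
        # theta_i = scale * base^(-2(i-1)/d), for i in 1..d/2
        return [scale * base ** (-2.0 * (i - 1) / d) for i in range(1, d // 2 + 1)]

    # Defaults match the original RoPE: the first frequency is base^0 = 1.0.
    default_freqs = rope_frequencies(128)

    # The "for the bold" settings from above: --rope-freq-base 80000 --rope-freq-scale 0.5
    custom_freqs = rope_frequencies(128, base=80000.0, scale=0.5)
    ```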
    jxy committed Jul 7, 2023 · dc0d0eb
  2. ggml-metal: fix custom rope

    jxy committed Jul 7, 2023 · 1ae4318
  3. 41819b0
  4. llama: increase MEM_REQ_EVAL for MODEL_3B

    This avoids crashes with quantized weights on CPU.
    A better way to calculate the required buffer size is still needed.
    jxy committed Jul 7, 2023 · 5c6eed3
  5. a728a0d
  6. server: use proper Content-Type in curl examples

    Without the header Content-Type: application/json, curl will POST with
    Content-Type: application/x-www-form-urlencoded
    
    Although our simple server doesn't care, the bundled httplib.h limits
    such payloads via CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH (8192 bytes).
    
    With Content-Type: application/json, we can send large JSON payloads.
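    The same fix applies to any HTTP client, not just curl. A minimal Python sketch of building such a request with an explicit JSON Content-Type (the endpoint path and payload fields are illustrative examples, not taken from this PR):

    ```python
    import json
    import urllib.request

    # Serialize the request body as JSON.
    payload = json.dumps({"prompt": "Hello", "n_predict": 128}).encode()

    # Setting Content-Type explicitly avoids the form-urlencoded default,
    # so the 8192-byte form-payload cap does not apply.
    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",  # example endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    # urllib.request.urlopen(req) would send it; omitted here (no server running).
    ```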
    jxy committed Jul 7, 2023 · a3b4d93

Commits on Jul 13, 2023

  1. a6b5695
  2. da730c5

Commits on Jul 15, 2023

  1. d0b6c94
  2. ggml : fix asserts

    ggerganov committed Jul 15, 2023 · 6024bcc