
GPT2 Architecture Integration #4073

Closed
dpleus opened this issue Nov 14, 2023 · 6 comments · Fixed by #4555
Labels
enhancement (New feature or request) · good first issue (Good for newcomers)

Comments

dpleus commented Nov 14, 2023

Feature Description

The idea is to be able to convert models that use the GPT2 architecture to GGUF. convert-hf-to-gguf.py should support GPT2, and llama.cpp should support running the converted model.

Motivation

There are quite a few models for low-resource languages or specific use cases that are fine-tuned on the GPT2 architecture.

Possible Implementation

The structure of these models is quite similar to Starcoder's. From my understanding, you can add support quite easily by:

convert-hf-to-gguf.py

  • Add a new model class
  • Modify the set_gguf_parameters() [kv heads] and write_tensors() [you may need to transpose the qkv, ffn-up and ffn-down layers] methods; the sketch after this list shows why the transpose is needed
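
For context on the transpose note above: Hugging Face's GPT2 implementation uses Conv1D modules, which store weights as (n_in, n_out), the transpose of the (n_out, n_in) layout used by nn.Linear. A quick check (assumes only the transformers package is installed):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
c_attn = model.transformer.h[0].attn.c_attn  # fused qkv projection
print(type(c_attn).__name__)   # Conv1D, not Linear
print(c_attn.weight.shape)     # torch.Size([768, 2304]) == (n_embd, 3 * n_embd)
# An equivalent nn.Linear would store (2304, 768), so these tensors
# need to be transposed during conversion.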

llama.cpp

  • Add a new model class

Status

I tried implementing this myself, but I am not deep enough into the topic and find it quite hard to understand the library's structure (is there any good documentation?). So I am probably not able to pull this off by myself, but I am happy to help!

dpleus added the enhancement label Nov 14, 2023
dpleus (Author) commented Nov 21, 2023

I started, but could not get it to work. The model outputs something, but it is just gibberish. I lack documentation on llama.cpp's internals and the C++ skills to really finish this, but maybe someone has an idea of how to get it over the line. This is how far I got in my own fork.

I based my implementation mainly on the Starcoder class, because the architecture is quite similar. I took inspiration from mmnga's fork, which implemented it in an older version.

From my understanding, you need to modify the following elements in the code.

Serializing the model using convert-hf-to-gguf.py
Adding the architecture by creating a GPT2 class that contains

  • set_gguf_parameters() -> deviations from Starcoder are head_count_kv = n_head and add_rope_scaling_type = None
  • write_tensors() -> skip ".attn.masked_bias" and ".attn.bias", and transpose ".ffn_down.weight", ".ffn_up.weight", ".attn_qkv.weight" and ".attn_output.weight"
  • set_vocab() -> using SentencePiece; this might need to change, because I used a fine-tuned model without the original GPT2 tokenizer (a condensed sketch of the class follows this list)
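
A condensed sketch of what this class could look like, modeled on the Starcoder class in convert-hf-to-gguf.py at the time. The map_tensor_name helper is a made-up stand-in for the script's tensor-name mapping, and other details may differ from the actual fork:

class GPT2Model(Model):
    def set_gguf_parameters(self):
        # GPT2 has no grouped-query attention, so kv heads == heads,
        # and it uses learned absolute positions, so no RoPE scaling type
        self.gguf_writer.add_head_count(self.hparams["n_head"])
        self.gguf_writer.add_head_count_kv(self.hparams["n_head"])
        # ... remaining hyperparameters as in the Starcoder class ...

    def write_tensors(self):
        for name, data_torch in self.get_tensors():
            # causal-mask buffers, not real weights: skip them
            if name.endswith((".attn.masked_bias", ".attn.bias")):
                continue
            new_name = map_tensor_name(name)  # stand-in for the script's mapping
            # HF GPT2 stores these as Conv1D, transposed vs. nn.Linear
            if new_name.endswith((".ffn_down.weight", ".ffn_up.weight",
                                  ".attn_qkv.weight", ".attn_output.weight")):
                data_torch = data_torch.transpose(0, 1)
            self.gguf_writer.add_tensor(new_name, data_torch.squeeze().numpy())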

Adding the mappings in gguf-py/gguf/constants.py and gguf-py/gguf/tensor_mapping.py

  • Add the architecture in constants.py
  • Move "transformer.h.{bid}.ln_2" to ATTN_NORM_2 in tensor_mapping.py (see the sketch after this list)

Adjusting the backend file llama.cpp

  • Creating the initial model architecture for LLM_ARCH_GPT2 (entirely based on Starcoder, except for the Norm_2)
  • Building the graph (entirely based on Starcoder, except for the Norm_2)

Galunid (Collaborator) commented Nov 21, 2023

Which model are you using? I tried https://huggingface.co/gpt2 and https://huggingface.co/gpt2-medium/tree/main, but they fail to convert; once I added the missing properties, they were still missing the output tensor.
The keys I see in my models:

h.{lid}.attn.c_attn
h.{lid}.attn.c_proj
h.{lid}.ln_1
h.{lid}.ln_2
h.{lid}.mlp.c_fc
h.{lid}.mlp.c_proj
ln_f
wpe
wte
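
For reference, one way to reproduce that key listing from the published checkpoint (this reads the safetensors file from the gpt2 repo; requires the huggingface_hub and safetensors packages):

from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download("gpt2", "model.safetensors")
with safe_open(path, framework="pt", device="cpu") as f:
    for key in sorted(f.keys()):
        print(key)
# prints h.{lid}.attn.c_attn.*, ..., ln_f.*, wpe.weight, wte.weight.
# There is no lm_head tensor in the file: HF GPT2 ties lm_head.weight
# to wte.weight, which is why the converter sees no output layer.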

Galunid (Collaborator) commented Nov 21, 2023

As a side note,

    def set_vocab(self):
        self._set_vocab_sentencepiece()

Should most likely be

    def set_vocab(self):
        self._set_vocab_gpt2()

dpleus (Author) commented Nov 22, 2023

@Galunid Thanks for having a look at this 👍 I first started with one of the models from AI Sweden, which is based on GPT2. But I realised it has a few specifics, so I made a new commit with a few changes to make the converter compatible with the original GPT2.

  • Adjusting the tokenizer
  • Setting "add_feed_forward_length" to "n_inner", or to "n_embd*4" when n_inner is not set
  • Adding the "transformer." prefix to the layer names when it is missing (there seem to be some inconsistencies across GPT2 derivatives); see the sketch after this list

The other thing, as you mentioned, is the lack of an output layer. I extracted it from the model and wrote it to the safetensors file (code below), but I wasn't sure how best to fit it into the codebase.

Overall it runs through, but the output is still somewhat gibberish.

Once upon a time, the was
The I on. is the. of the it all the.
can.
and is's, is

Code to add the output layer to safetensors.

from transformers import AutoTokenizer, AutoModelForCausalLM
from safetensors.torch import save_file, safe_open
import os

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def safe_save_tensors(model, file_path):
    tensors = {}
    try:
        # Read all existing tensors from the checkpoint
        with safe_open(file_path, framework="pt", device="cpu") as f:
            for k in f.keys():
                tensors[k] = f.get_tensor(k)
            # GPT2 ties lm_head.weight to wte, so the checkpoint has no
            # separate output tensor; add it explicitly. detach().clone()
            # stores a plain tensor so safetensors does not complain about
            # shared storage or autograd state.
            tensors["lm_head.weight"] = model.lm_head.weight.detach().clone()

        # Save to a temporary file first
        temp_file_path = file_path + ".temp"
        save_file(tensors, temp_file_path)
        del tensors

        # Replace the original file with the temporary one
        if os.path.exists(file_path):
            os.remove(file_path)
        os.rename(temp_file_path, file_path)
    except Exception as e:
        print("An error occurred:", e)
        # Handle or log the error as needed


safe_save_tensors(model, "model/model.safetensors")

ggerganov (Owner) commented:
Would be great to add GPT2 arch to llama.cpp.
A working example is available in the ggml repo.

ggerganov added the good first issue label Nov 22, 2023
manikbhandari (Contributor) commented:
I'd like to help with this
