
[GPT-2] Convert h5 to ggml #35

Merged 5 commits into ggerganov:master on Mar 29, 2023
Conversation

@ocordeiro (Contributor) commented Mar 10, 2023

I adapted the GPT-J example script to convert a Portuguese fine-tuned GPT-2 model in h5 format to ggml.

full conversion log:

Some weights of the model checkpoint at /Volumes/Documentos/Models/gpt2-small-portuguese were not used when initializing GPT2Model: ['lm_head.weight']

  • This IS expected if you are initializing GPT2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing GPT2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Processing variable: wte.weight with shape: (50257, 768)
    Processing variable: wpe.weight with shape: (1024, 768)
    Processing variable: h.0.ln_1.weight with shape: (768,)
    Processing variable: h.0.ln_1.bias with shape: (768,)
    Processing variable: h.0.attn.bias with shape: (1024, 1024)
    Skipping variable: h.0.attn.bias
    Processing variable: h.0.attn.masked_bias with shape: ()
    Skipping variable: h.0.attn.masked_bias
    Processing variable: h.0.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.0.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.0.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.0.attn.c_proj.bias with shape: (768,)
    Processing variable: h.0.ln_2.weight with shape: (768,)
    Processing variable: h.0.ln_2.bias with shape: (768,)
    Processing variable: h.0.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.0.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.0.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.0.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.1.ln_1.weight with shape: (768,)
    Processing variable: h.1.ln_1.bias with shape: (768,)
    Processing variable: h.1.attn.bias with shape: (1024, 1024)
    Skipping variable: h.1.attn.bias
    Processing variable: h.1.attn.masked_bias with shape: ()
    Skipping variable: h.1.attn.masked_bias
    Processing variable: h.1.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.1.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.1.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.1.attn.c_proj.bias with shape: (768,)
    Processing variable: h.1.ln_2.weight with shape: (768,)
    Processing variable: h.1.ln_2.bias with shape: (768,)
    Processing variable: h.1.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.1.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.1.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.1.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.2.ln_1.weight with shape: (768,)
    Processing variable: h.2.ln_1.bias with shape: (768,)
    Processing variable: h.2.attn.bias with shape: (1024, 1024)
    Skipping variable: h.2.attn.bias
    Processing variable: h.2.attn.masked_bias with shape: ()
    Skipping variable: h.2.attn.masked_bias
    Processing variable: h.2.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.2.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.2.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.2.attn.c_proj.bias with shape: (768,)
    Processing variable: h.2.ln_2.weight with shape: (768,)
    Processing variable: h.2.ln_2.bias with shape: (768,)
    Processing variable: h.2.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.2.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.2.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.2.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.3.ln_1.weight with shape: (768,)
    Processing variable: h.3.ln_1.bias with shape: (768,)
    Processing variable: h.3.attn.bias with shape: (1024, 1024)
    Skipping variable: h.3.attn.bias
    Processing variable: h.3.attn.masked_bias with shape: ()
    Skipping variable: h.3.attn.masked_bias
    Processing variable: h.3.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.3.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.3.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.3.attn.c_proj.bias with shape: (768,)
    Processing variable: h.3.ln_2.weight with shape: (768,)
    Processing variable: h.3.ln_2.bias with shape: (768,)
    Processing variable: h.3.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.3.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.3.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.3.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.4.ln_1.weight with shape: (768,)
    Processing variable: h.4.ln_1.bias with shape: (768,)
    Processing variable: h.4.attn.bias with shape: (1024, 1024)
    Skipping variable: h.4.attn.bias
    Processing variable: h.4.attn.masked_bias with shape: ()
    Skipping variable: h.4.attn.masked_bias
    Processing variable: h.4.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.4.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.4.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.4.attn.c_proj.bias with shape: (768,)
    Processing variable: h.4.ln_2.weight with shape: (768,)
    Processing variable: h.4.ln_2.bias with shape: (768,)
    Processing variable: h.4.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.4.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.4.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.4.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.5.ln_1.weight with shape: (768,)
    Processing variable: h.5.ln_1.bias with shape: (768,)
    Processing variable: h.5.attn.bias with shape: (1024, 1024)
    Skipping variable: h.5.attn.bias
    Processing variable: h.5.attn.masked_bias with shape: ()
    Skipping variable: h.5.attn.masked_bias
    Processing variable: h.5.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.5.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.5.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.5.attn.c_proj.bias with shape: (768,)
    Processing variable: h.5.ln_2.weight with shape: (768,)
    Processing variable: h.5.ln_2.bias with shape: (768,)
    Processing variable: h.5.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.5.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.5.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.5.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.6.ln_1.weight with shape: (768,)
    Processing variable: h.6.ln_1.bias with shape: (768,)
    Processing variable: h.6.attn.bias with shape: (1024, 1024)
    Skipping variable: h.6.attn.bias
    Processing variable: h.6.attn.masked_bias with shape: ()
    Skipping variable: h.6.attn.masked_bias
    Processing variable: h.6.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.6.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.6.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.6.attn.c_proj.bias with shape: (768,)
    Processing variable: h.6.ln_2.weight with shape: (768,)
    Processing variable: h.6.ln_2.bias with shape: (768,)
    Processing variable: h.6.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.6.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.6.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.6.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.7.ln_1.weight with shape: (768,)
    Processing variable: h.7.ln_1.bias with shape: (768,)
    Processing variable: h.7.attn.bias with shape: (1024, 1024)
    Skipping variable: h.7.attn.bias
    Processing variable: h.7.attn.masked_bias with shape: ()
    Skipping variable: h.7.attn.masked_bias
    Processing variable: h.7.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.7.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.7.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.7.attn.c_proj.bias with shape: (768,)
    Processing variable: h.7.ln_2.weight with shape: (768,)
    Processing variable: h.7.ln_2.bias with shape: (768,)
    Processing variable: h.7.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.7.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.7.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.7.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.8.ln_1.weight with shape: (768,)
    Processing variable: h.8.ln_1.bias with shape: (768,)
    Processing variable: h.8.attn.bias with shape: (1024, 1024)
    Skipping variable: h.8.attn.bias
    Processing variable: h.8.attn.masked_bias with shape: ()
    Skipping variable: h.8.attn.masked_bias
    Processing variable: h.8.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.8.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.8.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.8.attn.c_proj.bias with shape: (768,)
    Processing variable: h.8.ln_2.weight with shape: (768,)
    Processing variable: h.8.ln_2.bias with shape: (768,)
    Processing variable: h.8.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.8.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.8.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.8.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.9.ln_1.weight with shape: (768,)
    Processing variable: h.9.ln_1.bias with shape: (768,)
    Processing variable: h.9.attn.bias with shape: (1024, 1024)
    Skipping variable: h.9.attn.bias
    Processing variable: h.9.attn.masked_bias with shape: ()
    Skipping variable: h.9.attn.masked_bias
    Processing variable: h.9.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.9.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.9.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.9.attn.c_proj.bias with shape: (768,)
    Processing variable: h.9.ln_2.weight with shape: (768,)
    Processing variable: h.9.ln_2.bias with shape: (768,)
    Processing variable: h.9.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.9.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.9.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.9.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.10.ln_1.weight with shape: (768,)
    Processing variable: h.10.ln_1.bias with shape: (768,)
    Processing variable: h.10.attn.bias with shape: (1024, 1024)
    Skipping variable: h.10.attn.bias
    Processing variable: h.10.attn.masked_bias with shape: ()
    Skipping variable: h.10.attn.masked_bias
    Processing variable: h.10.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.10.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.10.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.10.attn.c_proj.bias with shape: (768,)
    Processing variable: h.10.ln_2.weight with shape: (768,)
    Processing variable: h.10.ln_2.bias with shape: (768,)
    Processing variable: h.10.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.10.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.10.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.10.mlp.c_proj.bias with shape: (768,)
    Processing variable: h.11.ln_1.weight with shape: (768,)
    Processing variable: h.11.ln_1.bias with shape: (768,)
    Processing variable: h.11.attn.bias with shape: (1024, 1024)
    Skipping variable: h.11.attn.bias
    Processing variable: h.11.attn.masked_bias with shape: ()
    Skipping variable: h.11.attn.masked_bias
    Processing variable: h.11.attn.c_attn.weight with shape: (768, 2304)
    Processing variable: h.11.attn.c_attn.bias with shape: (2304,)
    Processing variable: h.11.attn.c_proj.weight with shape: (768, 768)
    Processing variable: h.11.attn.c_proj.bias with shape: (768,)
    Processing variable: h.11.ln_2.weight with shape: (768,)
    Processing variable: h.11.ln_2.bias with shape: (768,)
    Processing variable: h.11.mlp.c_fc.weight with shape: (768, 3072)
    Processing variable: h.11.mlp.c_fc.bias with shape: (3072,)
    Processing variable: h.11.mlp.c_proj.weight with shape: (3072, 768)
    Processing variable: h.11.mlp.c_proj.bias with shape: (768,)
    Processing variable: ln_f.weight with shape: (768,)
    Processing variable: ln_f.bias with shape: (768,)
    Done. Output file: /Volumes/Documentos/Models/gpt2-small-portuguese/ggml-model-f32.bin
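The skipped entries in the log correspond to GPT-2's non-learned attention buffers. A minimal sketch of that filter (the helper name is hypothetical, not taken from the actual script):

```python
def should_skip(name: str) -> bool:
    # h.N.attn.bias is the static causal mask and h.N.attn.masked_bias a
    # constant fill value; neither is a learned parameter, so the converter
    # drops them. The leading dot avoids accidentally matching c_attn.bias.
    return name.endswith(".attn.bias") or name.endswith(".attn.masked_bias")
```

For example, `should_skip("h.0.attn.bias")` is true while `should_skip("h.0.attn.c_attn.bias")` is false, matching the Processing/Skipping pairs in the log above.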

This is not done yet; when I try inference with the converted model I get this error:

gpt2_model_load: unknown tensor 'wte.weight' in model file

@ocordeiro (Contributor, Author)

From what I saw, this model names its variables differently from the original model:

variable: wte.weight  shape: (50257, 768)
variable: wpe.weight  shape: (1024, 768)
variable: h.0.ln_1.weight  shape: (768,)
variable: h.0.ln_1.bias  shape: (768,)
variable: h.0.attn.bias  shape: (1024, 1024)
variable: h.0.attn.masked_bias  shape: ()
variable: h.0.attn.c_attn.weight  shape: (768, 2304)
variable: h.0.attn.c_attn.bias  shape: (2304,)
variable: h.0.attn.c_proj.weight  shape: (768, 768)
variable: h.0.attn.c_proj.bias  shape: (768,)
variable: h.0.ln_2.weight  shape: (768,)
variable: h.0.ln_2.bias  shape: (768,)
variable: h.0.mlp.c_fc.weight  shape: (768, 3072)
variable: h.0.mlp.c_fc.bias  shape: (3072,)
variable: h.0.mlp.c_proj.weight  shape: (3072, 768)
variable: h.0.mlp.c_proj.bias  shape: (768,)
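A renaming shim along these lines would bridge the two conventions. The TF-checkpoint-style target names below are an assumption based on the existing ckpt converter's naming (`model/wte`, `model/h0/ln_1/g`, ...), not taken verbatim from this PR:

```python
import re

def hf_to_ggml_name(name: str) -> str:
    """Map a Hugging Face GPT2Model variable name (e.g. "h.0.ln_1.weight")
    to the TF-checkpoint-style name the ggml gpt-2 loader is assumed to
    expect (e.g. "model/h0/ln_1/g")."""
    if name == "wte.weight":
        return "model/wte"
    if name == "wpe.weight":
        return "model/wpe"
    if name == "ln_f.weight":
        return "model/ln_f/g"
    if name == "ln_f.bias":
        return "model/ln_f/b"
    m = re.match(r"h\.(\d+)\.(.+)", name)
    if m:
        layer, rest = m.group(1), m.group(2).replace(".", "/")
        if rest.startswith("ln_"):
            # layer norms store weight as "g" (gain) and bias as "b"
            rest = rest.replace("/weight", "/g").replace("/bias", "/b")
        else:
            # linear layers store weight as "w" and bias as "b"
            rest = rest.replace("/weight", "/w").replace("/bias", "/b")
        return f"model/h{layer}/{rest}"
    return name
```

For example, `hf_to_ggml_name("h.0.attn.c_attn.weight")` yields `"model/h0/attn/c_attn/w"`.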

@ocordeiro (Contributor, Author)

It worked after mapping the tensor names correctly :D

result:

main: seed = 1678457502
gpt2_model_load: loading model from '/Volumes/Documentos/Models/gpt2-small-portuguese/ggml-model-f32.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx   = 1024
gpt2_model_load: n_embd  = 768
gpt2_model_load: n_head  = 12
gpt2_model_load: n_layer = 12
gpt2_model_load: f16     = 0
gpt2_model_load: ggml ctx size = 546.74 MB
gpt2_model_load: memory size =    72.00 MB, n_mem = 12288
gpt2_model_load: model size  =   474.70 MB
main: number of tokens in prompt = 14

O brasil é o maior país da america latina, que também possui em comum o português e o

main: mem per token =  2004636 bytes
main:     load time =   242.43 ms
main:   sample time =     1.70 ms
main:  predict time =   153.59 ms / 6.68 ms per token
main:    total time =   423.49 ms
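The reported memory size can be reproduced from the hyperparameters in the load log: the gpt-2 example's KV cache holds keys and values for n_layer * n_ctx positions in f32 (the layout is inferred from the logged n_mem figure, not from this PR's code):

```python
# Hyperparameters printed by gpt2_model_load above.
n_layer, n_ctx, n_embd = 12, 1024, 768

n_mem = n_layer * n_ctx            # matches "n_mem = 12288" in the log
kv_bytes = 2 * n_mem * n_embd * 4  # keys + values, 4 bytes per f32 element

print(n_mem, kv_bytes / (1024 * 1024))  # 12288 72.0
```

That recovers the "memory size = 72.00 MB, n_mem = 12288" line exactly.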

@ggerganov (Owner)

Cool!
However, instead of changing the tensor names in the .cpp, you have to change the names in the Python script to match those already used in the .cpp. Otherwise, you break compatibility with the existing .ckpt conversion.

@ocordeiro (Contributor, Author)

Done 👍

@ocordeiro ocordeiro marked this pull request as ready for review March 28, 2023 13:39
@ggerganov ggerganov merged commit 1f6e88f into ggerganov:master Mar 29, 2023