[GPT-2] Convert h5 to ggml #35

ocordeiro · 2023-03-10T12:59:51Z

I adapted the GPT-J example script to convert a GPT-2 model fine-tuned for Portuguese from h5 format to ggml.
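For readers unfamiliar with the format: the ggml example converters write a file header (magic number, hyperparameters, vocabulary) followed by one record per tensor. The sketch below covers only the per-tensor record and assumes the layout used by the GPT-J example script (int32 `n_dims`, name length and `ftype`, then the dimensions in reverse order, the UTF-8 name, and the raw f32 data). `write_tensor` and `read_tensor` are hypothetical helpers for illustration, not functions from the ggml repository.

```python
import array
import io
import struct

# Hypothetical helpers: one ggml tensor record, ASSUMING the layout
# int32 n_dims, int32 name length, int32 ftype (0 = f32), the dims in
# reverse order, the UTF-8 name bytes, then the raw float32 values.


def write_tensor(fout, name, shape, values):
    """Write one f32 tensor; `values` is a flat list in row-major order."""
    encoded = name.encode("utf-8")
    fout.write(struct.pack("iii", len(shape), len(encoded), 0))
    # Reverse the shape so the fastest-varying dimension comes first.
    for dim in reversed(shape):
        fout.write(struct.pack("i", dim))
    fout.write(encoded)
    fout.write(array.array("f", values).tobytes())


def read_tensor(fin):
    """Read back one record written by write_tensor."""
    n_dims, name_len, ftype = struct.unpack("iii", fin.read(12))
    dims = struct.unpack("i" * n_dims, fin.read(4 * n_dims))
    name = fin.read(name_len).decode("utf-8")
    count = 1
    for dim in dims:
        count *= dim
    data = array.array("f")
    data.frombytes(fin.read(data.itemsize * count))
    return name, tuple(reversed(dims)), ftype, data.tolist()


if __name__ == "__main__":
    buf = io.BytesIO()
    write_tensor(buf, "h.0.ln_1.weight", (2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    buf.seek(0)
    print(read_tensor(buf))
```

Round-tripping a record through an in-memory buffer like this is a cheap way to check that writer and loader agree on the layout before converting a full model.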

Full conversion log:


```
Some weights of the model checkpoint at /Volumes/Documentos/Models/gpt2-small-portuguese were not used when initializing GPT2Model: ['lm_head.weight']
This IS expected if you are initializing GPT2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing GPT2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Processing variable: wte.weight with shape:  (50257, 768)
Processing variable: wpe.weight with shape:  (1024, 768)
Processing variable: h.0.ln_1.weight with shape:  (768,)
Processing variable: h.0.ln_1.bias with shape:  (768,)
Processing variable: h.0.attn.bias with shape:  (1024, 1024)
Skipping variable: h.0.attn.bias
Processing variable: h.0.attn.masked_bias with shape:  ()
Skipping variable: h.0.attn.masked_bias
Processing variable: h.0.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.0.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.0.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.0.attn.c_proj.bias with shape:  (768,)
Processing variable: h.0.ln_2.weight with shape:  (768,)
Processing variable: h.0.ln_2.bias with shape:  (768,)
Processing variable: h.0.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.0.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.0.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.0.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.1.ln_1.weight with shape:  (768,)
Processing variable: h.1.ln_1.bias with shape:  (768,)
Processing variable: h.1.attn.bias with shape:  (1024, 1024)
Skipping variable: h.1.attn.bias
Processing variable: h.1.attn.masked_bias with shape:  ()
Skipping variable: h.1.attn.masked_bias
Processing variable: h.1.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.1.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.1.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.1.attn.c_proj.bias with shape:  (768,)
Processing variable: h.1.ln_2.weight with shape:  (768,)
Processing variable: h.1.ln_2.bias with shape:  (768,)
Processing variable: h.1.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.1.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.1.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.1.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.2.ln_1.weight with shape:  (768,)
Processing variable: h.2.ln_1.bias with shape:  (768,)
Processing variable: h.2.attn.bias with shape:  (1024, 1024)
Skipping variable: h.2.attn.bias
Processing variable: h.2.attn.masked_bias with shape:  ()
Skipping variable: h.2.attn.masked_bias
Processing variable: h.2.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.2.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.2.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.2.attn.c_proj.bias with shape:  (768,)
Processing variable: h.2.ln_2.weight with shape:  (768,)
Processing variable: h.2.ln_2.bias with shape:  (768,)
Processing variable: h.2.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.2.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.2.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.2.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.3.ln_1.weight with shape:  (768,)
Processing variable: h.3.ln_1.bias with shape:  (768,)
Processing variable: h.3.attn.bias with shape:  (1024, 1024)
Skipping variable: h.3.attn.bias
Processing variable: h.3.attn.masked_bias with shape:  ()
Skipping variable: h.3.attn.masked_bias
Processing variable: h.3.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.3.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.3.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.3.attn.c_proj.bias with shape:  (768,)
Processing variable: h.3.ln_2.weight with shape:  (768,)
Processing variable: h.3.ln_2.bias with shape:  (768,)
Processing variable: h.3.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.3.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.3.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.3.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.4.ln_1.weight with shape:  (768,)
Processing variable: h.4.ln_1.bias with shape:  (768,)
Processing variable: h.4.attn.bias with shape:  (1024, 1024)
Skipping variable: h.4.attn.bias
Processing variable: h.4.attn.masked_bias with shape:  ()
Skipping variable: h.4.attn.masked_bias
Processing variable: h.4.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.4.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.4.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.4.attn.c_proj.bias with shape:  (768,)
Processing variable: h.4.ln_2.weight with shape:  (768,)
Processing variable: h.4.ln_2.bias with shape:  (768,)
Processing variable: h.4.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.4.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.4.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.4.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.5.ln_1.weight with shape:  (768,)
Processing variable: h.5.ln_1.bias with shape:  (768,)
Processing variable: h.5.attn.bias with shape:  (1024, 1024)
Skipping variable: h.5.attn.bias
Processing variable: h.5.attn.masked_bias with shape:  ()
Skipping variable: h.5.attn.masked_bias
Processing variable: h.5.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.5.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.5.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.5.attn.c_proj.bias with shape:  (768,)
Processing variable: h.5.ln_2.weight with shape:  (768,)
Processing variable: h.5.ln_2.bias with shape:  (768,)
Processing variable: h.5.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.5.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.5.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.5.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.6.ln_1.weight with shape:  (768,)
Processing variable: h.6.ln_1.bias with shape:  (768,)
Processing variable: h.6.attn.bias with shape:  (1024, 1024)
Skipping variable: h.6.attn.bias
Processing variable: h.6.attn.masked_bias with shape:  ()
Skipping variable: h.6.attn.masked_bias
Processing variable: h.6.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.6.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.6.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.6.attn.c_proj.bias with shape:  (768,)
Processing variable: h.6.ln_2.weight with shape:  (768,)
Processing variable: h.6.ln_2.bias with shape:  (768,)
Processing variable: h.6.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.6.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.6.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.6.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.7.ln_1.weight with shape:  (768,)
Processing variable: h.7.ln_1.bias with shape:  (768,)
Processing variable: h.7.attn.bias with shape:  (1024, 1024)
Skipping variable: h.7.attn.bias
Processing variable: h.7.attn.masked_bias with shape:  ()
Skipping variable: h.7.attn.masked_bias
Processing variable: h.7.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.7.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.7.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.7.attn.c_proj.bias with shape:  (768,)
Processing variable: h.7.ln_2.weight with shape:  (768,)
Processing variable: h.7.ln_2.bias with shape:  (768,)
Processing variable: h.7.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.7.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.7.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.7.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.8.ln_1.weight with shape:  (768,)
Processing variable: h.8.ln_1.bias with shape:  (768,)
Processing variable: h.8.attn.bias with shape:  (1024, 1024)
Skipping variable: h.8.attn.bias
Processing variable: h.8.attn.masked_bias with shape:  ()
Skipping variable: h.8.attn.masked_bias
Processing variable: h.8.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.8.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.8.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.8.attn.c_proj.bias with shape:  (768,)
Processing variable: h.8.ln_2.weight with shape:  (768,)
Processing variable: h.8.ln_2.bias with shape:  (768,)
Processing variable: h.8.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.8.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.8.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.8.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.9.ln_1.weight with shape:  (768,)
Processing variable: h.9.ln_1.bias with shape:  (768,)
Processing variable: h.9.attn.bias with shape:  (1024, 1024)
Skipping variable: h.9.attn.bias
Processing variable: h.9.attn.masked_bias with shape:  ()
Skipping variable: h.9.attn.masked_bias
Processing variable: h.9.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.9.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.9.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.9.attn.c_proj.bias with shape:  (768,)
Processing variable: h.9.ln_2.weight with shape:  (768,)
Processing variable: h.9.ln_2.bias with shape:  (768,)
Processing variable: h.9.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.9.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.9.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.9.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.10.ln_1.weight with shape:  (768,)
Processing variable: h.10.ln_1.bias with shape:  (768,)
Processing variable: h.10.attn.bias with shape:  (1024, 1024)
Skipping variable: h.10.attn.bias
Processing variable: h.10.attn.masked_bias with shape:  ()
Skipping variable: h.10.attn.masked_bias
Processing variable: h.10.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.10.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.10.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.10.attn.c_proj.bias with shape:  (768,)
Processing variable: h.10.ln_2.weight with shape:  (768,)
Processing variable: h.10.ln_2.bias with shape:  (768,)
Processing variable: h.10.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.10.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.10.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.10.mlp.c_proj.bias with shape:  (768,)
Processing variable: h.11.ln_1.weight with shape:  (768,)
Processing variable: h.11.ln_1.bias with shape:  (768,)
Processing variable: h.11.attn.bias with shape:  (1024, 1024)
Skipping variable: h.11.attn.bias
Processing variable: h.11.attn.masked_bias with shape:  ()
Skipping variable: h.11.attn.masked_bias
Processing variable: h.11.attn.c_attn.weight with shape:  (768, 2304)
Processing variable: h.11.attn.c_attn.bias with shape:  (2304,)
Processing variable: h.11.attn.c_proj.weight with shape:  (768, 768)
Processing variable: h.11.attn.c_proj.bias with shape:  (768,)
Processing variable: h.11.ln_2.weight with shape:  (768,)
Processing variable: h.11.ln_2.bias with shape:  (768,)
Processing variable: h.11.mlp.c_fc.weight with shape:  (768, 3072)
Processing variable: h.11.mlp.c_fc.bias with shape:  (3072,)
Processing variable: h.11.mlp.c_proj.weight with shape:  (3072, 768)
Processing variable: h.11.mlp.c_proj.bias with shape:  (768,)
Processing variable: ln_f.weight with shape:  (768,)
Processing variable: ln_f.bias with shape:  (768,)
Done. Output file: /Volumes/Documentos/Models/gpt2-small-portuguese/ggml-model-f32.bin
```
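In every layer the script drops `attn.bias` and `attn.masked_bias`. In Hugging Face's GPT-2 these are registered buffers (the 1024 × 1024 causal mask and a scalar fill value) rather than trained weights, so there is nothing to convert. A sketch of that filter, plus a count of what ends up in the file, assuming the 12-layer layout shown in the log; `should_skip` and `all_names` are hypothetical names, not taken from the actual script.

```python
# Sketch of the filter implied by the log: the per-layer attention buffers
# are skipped and every other tensor is converted. Names follow the
# Hugging Face GPT2Model state dict printed in the log.

SKIPPED_SUFFIXES = (".attn.bias", ".attn.masked_bias")

# The 14 variables each transformer block exposes, in log order.
LAYER_SUFFIXES = [
    "ln_1.weight", "ln_1.bias",
    "attn.bias", "attn.masked_bias",
    "attn.c_attn.weight", "attn.c_attn.bias",
    "attn.c_proj.weight", "attn.c_proj.bias",
    "ln_2.weight", "ln_2.bias",
    "mlp.c_fc.weight", "mlp.c_fc.bias",
    "mlp.c_proj.weight", "mlp.c_proj.bias",
]


def should_skip(name):
    """True for the causal-mask buffers, which hold no trained weights."""
    # The leading dot keeps e.g. "attn.c_attn.bias" from matching.
    return name.endswith(SKIPPED_SUFFIXES)


def all_names(n_layer=12):
    """Every variable name in the order the conversion log prints them."""
    names = ["wte.weight", "wpe.weight"]
    for i in range(n_layer):
        names += [f"h.{i}.{suffix}" for suffix in LAYER_SUFFIXES]
    names += ["ln_f.weight", "ln_f.bias"]
    return names


if __name__ == "__main__":
    names = all_names()
    kept = [n for n in names if not should_skip(n)]
    print(len(names), "variables,", len(kept), "written,", len(names) - len(kept), "skipped")
```

With 14 variables per layer and 2 of them skipped, the output file holds 12 × 12 + 4 = 148 tensors.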

It doesn't work yet, though: when I try to run inference with the converted model, I get this error:

```
gpt2_model_load: unknown tensor 'wte.weight' in model file
```

ocordeiro · 2023-03-10T13:24:45Z

From what I can see, this model names its variables differently from the original model:

```
variable: wte.weight shape:  (50257, 768)
variable: wpe.weight shape:  (1024, 768)
variable: h.0.ln_1.weight  shape:  (768,)
variable: h.0.ln_1.bias  shape:  (768,)
variable: h.0.attn.bias  shape:  (1024, 1024)
variable: h.0.attn.masked_bias  shape:  ()
variable: h.0.attn.c_attn.weight  shape:  (768, 2304)
variable: h.0.attn.c_attn.bias  shape:  (2304,)
variable: h.0.attn.c_proj.weight  shape:  (768, 768)
variable: h.0.attn.c_proj.bias  shape:  (768,)
variable: h.0.ln_2.weight shape:  (768,)
variable: h.0.ln_2.bias shape:  (768,)
variable: h.0.mlp.c_fc.weight shape:  (768, 3072)
variable: h.0.mlp.c_fc.bias  shape:  (3072,)
variable: h.0.mlp.c_proj.weight shape:  (3072, 768)
variable: h.0.mlp.c_proj.bias  shape:  (768,)
```
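The shapes in this listing follow from the hyperparameters the loader reports later in the thread (n_vocab = 50257, n_ctx = 1024, n_embd = 768). Hugging Face's GPT-2 implements these projections with `Conv1D`, which stores weights as (in_features, out_features), so `c_attn` maps 768 to 3 × 768 = 2304 (fused query, key and value) and `c_fc` expands to 4 × 768 = 3072. A small sketch that derives them; `expected_shapes` is a hypothetical helper for illustration.

```python
# Hypothetical helper: expected Hugging Face GPT-2 tensor shapes derived from
# the hyperparameters. Conv1D weights are stored as (in_features, out_features).


def expected_shapes(layer, n_vocab=50257, n_ctx=1024, n_embd=768):
    p = f"h.{layer}."
    return {
        "wte.weight": (n_vocab, n_embd),                  # token embeddings
        "wpe.weight": (n_ctx, n_embd),                    # position embeddings
        p + "ln_1.weight": (n_embd,),
        p + "ln_1.bias": (n_embd,),
        p + "attn.bias": (n_ctx, n_ctx),                  # causal-mask buffer, as printed
        p + "attn.masked_bias": (),                       # scalar buffer
        p + "attn.c_attn.weight": (n_embd, 3 * n_embd),   # fused Q, K, V
        p + "attn.c_attn.bias": (3 * n_embd,),
        p + "attn.c_proj.weight": (n_embd, n_embd),
        p + "attn.c_proj.bias": (n_embd,),
        p + "ln_2.weight": (n_embd,),
        p + "ln_2.bias": (n_embd,),
        p + "mlp.c_fc.weight": (n_embd, 4 * n_embd),      # 4x MLP expansion
        p + "mlp.c_fc.bias": (4 * n_embd,),
        p + "mlp.c_proj.weight": (4 * n_embd, n_embd),
        p + "mlp.c_proj.bias": (n_embd,),
    }


if __name__ == "__main__":
    for name, shape in expected_shapes(0).items():
        print(f"variable: {name} shape: {shape}")
```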

ocordeiro · 2023-03-10T14:12:20Z

It worked after mapping the tensors correctly :D

Result:

```
main: seed = 1678457502
gpt2_model_load: loading model from '/Volumes/Documentos/Models/gpt2-small-portuguese/ggml-model-f32.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx   = 1024
gpt2_model_load: n_embd  = 768
gpt2_model_load: n_head  = 12
gpt2_model_load: n_layer = 12
gpt2_model_load: f16     = 0
gpt2_model_load: ggml ctx size = 546.74 MB
gpt2_model_load: memory size =    72.00 MB, n_mem = 12288
gpt2_model_load: model size  =   474.70 MB
main: number of tokens in prompt = 14

O brasil é o maior país da america latina, que também possui em comum o português e o

main: mem per token =  2004636 bytes
main:     load time =   242.43 ms
main:   sample time =     1.70 ms
main:  predict time =   153.59 ms / 6.68 ms per token
main:    total time =   423.49 ms
```

(English translation of the generated text: "Brazil is the largest country in Latin America, which also has in common Portuguese and the")
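The sizes the loader reports can be checked by hand. Summing the parameters of the converted tensors (the shapes in the conversion log, excluding the two skipped buffers per layer and the unused `lm_head.weight`) and multiplying by 4 bytes per f32 value gives the 474.70 MB model size, taking MB as 2^20 bytes. The 72.00 MB memory size is consistent with a key cache and a value cache of n_mem = n_layer × n_ctx = 12 × 1024 = 12288 rows of n_embd floats each. A sketch of the arithmetic; the function names are hypothetical.

```python
MIB = 1024 * 1024
BYTES_F32 = 4


def gpt2_param_count(n_vocab, n_ctx, n_embd, n_layer):
    """Number of f32 values in the converted tensors (no lm_head, no mask buffers)."""
    per_layer = (
        2 * n_embd                           # ln_1 weight + bias
        + n_embd * 3 * n_embd + 3 * n_embd   # attn.c_attn
        + n_embd * n_embd + n_embd           # attn.c_proj
        + 2 * n_embd                         # ln_2 weight + bias
        + n_embd * 4 * n_embd + 4 * n_embd   # mlp.c_fc
        + 4 * n_embd * n_embd + n_embd       # mlp.c_proj
    )
    embeddings = n_vocab * n_embd + n_ctx * n_embd  # wte + wpe
    final_norm = 2 * n_embd                         # ln_f weight + bias
    return embeddings + n_layer * per_layer + final_norm


def kv_cache_bytes(n_ctx, n_embd, n_layer):
    """Key + value memory: n_mem = n_layer * n_ctx rows of n_embd f32 values each."""
    n_mem = n_layer * n_ctx
    return 2 * n_mem * n_embd * BYTES_F32


if __name__ == "__main__":
    params = gpt2_param_count(50257, 1024, 768, 12)
    print(f"params      = {params}")                               # 124439808
    print(f"model size  = {params * BYTES_F32 / MIB:.2f} MB")      # 474.70 MB
    print(f"memory size = {kv_cache_bytes(1024, 768, 12) / MIB:.2f} MB")  # 72.00 MB
```

The reported `ggml ctx size` of 546.74 MB is close to the sum of the two (546.70 MB); the small remainder is presumably per-tensor bookkeeping inside the ggml context.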

ggerganov · 2023-03-17T05:11:17Z

Cool! However, instead of changing the tensor names in the .cpp, you should change the names in the Python script to match those already used in the .cpp. Otherwise, you break compatibility with the existing .ckpt conversion.
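Following this suggestion, the renaming belongs in the conversion script. A minimal sketch of such a mapping, assuming the existing GPT-2 loader expects TensorFlow-checkpoint-style names (`model/wte`, `model/h0/ln_1/g`, `model/h0/attn/c_attn/w`, ...), with `g`/`b` for layer-norm gain and bias and `w`/`b` for weights and biases. These target names are an assumption; check them against `gpt-2/main.cpp` before relying on this. `to_ggml_name` is a hypothetical helper.

```python
import re

# ASSUMED target naming: TensorFlow-checkpoint style, as used by the existing
# .ckpt conversion. Layer norms use g/b, linear layers use w/b.
_TOP_LEVEL = {
    "wte.weight": "model/wte",
    "wpe.weight": "model/wpe",
    "ln_f.weight": "model/ln_f/g",
    "ln_f.bias": "model/ln_f/b",
}

# e.g. "h.3.attn.c_attn.weight" -> ("3", "attn.c_attn", "weight")
_LAYER_RE = re.compile(r"^h\.(\d+)\.(.+)\.(weight|bias)$")


def to_ggml_name(hf_name):
    """Map a Hugging Face GPT2Model tensor name to the assumed ggml name."""
    if hf_name in _TOP_LEVEL:
        return _TOP_LEVEL[hf_name]
    m = _LAYER_RE.match(hf_name)
    if m is None:
        raise KeyError(f"no mapping for {hf_name!r}")
    layer, module, kind = m.groups()
    if module == "attn":
        # h.N.attn.bias is the causal-mask buffer; it should be skipped.
        raise KeyError(f"{hf_name!r} is a mask buffer, not a weight")
    if kind == "weight":
        suffix = "g" if module in ("ln_1", "ln_2") else "w"
    else:
        suffix = "b"
    return f"model/h{layer}/{module.replace('.', '/')}/{suffix}"


if __name__ == "__main__":
    for name in ("wte.weight", "h.0.ln_1.weight", "h.0.attn.c_attn.weight", "ln_f.bias"):
        print(name, "->", to_ggml_name(name))
```

Keeping the mapping on the Python side lets the C++ loader stay unchanged, so models converted from the original .ckpt files and from h5 share one naming scheme.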

ocordeiro · 2023-03-28T13:38:06Z

Done 👍

- Script to convert h5 to ggml adapted from gpt-j example (b7143f0)
- Fix map tensors (3bce0ec)
- optimize (23bc8d7)

ocordeiro added 2 commits March 27, 2023 21:50

- rename headers to keep compatibility (a4d384e)
- revert gpt-2/main.cpp (202db0c)

ocordeiro marked this pull request as ready for review March 28, 2023 13:39

ocordeiro mentioned this pull request Mar 28, 2023 in [Feature Request] rinna's Japanese GPT model support #33 (Open)

ggerganov approved these changes Mar 29, 2023


ggerganov merged commit 1f6e88f into ggerganov:master Mar 29, 2023
