@sroecker
Contributor

The larger Granite Code models (20B and 34B) are based on the Starcoder architecture.
The one difference is that they use tied word embeddings: the output projection shares its weights with the token embedding.
This change should not break existing Starcoder models.
A bit more work is required to support all Granite Code models; see #7116.
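
For readers unfamiliar with tied embeddings, here is a minimal Python sketch of the idea, not the code from this PR. It assumes a loader that treats the output tensor as optional and falls back to the token embedding when it is missing, which is one way to keep untied Starcoder checkpoints working. The tensor names `token_embd.weight` and `output.weight` and the helper `resolve_output_weight` are used for illustration only.

```python
def resolve_output_weight(tensors):
    """Pick the matrix used to project hidden states to vocabulary logits.

    Assumption: a checkpoint with tied embeddings simply omits the
    separate output tensor, so we reuse the token embedding instead.
    """
    out = tensors.get("output.weight")
    if out is None:
        # Tied case: the same matrix serves as input embedding and output head.
        out = tensors["token_embd.weight"]
    return out


def logits(tensors, hidden):
    """Compute one logit per vocabulary row as a dot product with `hidden`."""
    weight = resolve_output_weight(tensors)
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


# Toy checkpoints: 3 vocabulary entries, embedding size 2.
tied = {"token_embd.weight": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]}
untied = {
    "token_embd.weight": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "output.weight": [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]],
}

hidden = [0.5, -1.0]
print(logits(tied, hidden))    # uses the embedding matrix
print(logits(untied, hidden))  # uses the separate output matrix
```

Because the output tensor is looked up first, a checkpoint that ships its own output weights behaves exactly as before; only checkpoints without one take the tied path.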

sroecker added 2 commits May 15, 2024 18:08
Tie the weights for ARCH_STARCODER to support the larger Granite code models.
Partially addresses ggerganov/issues/7116

A few things still remain to be fixed.
Currently requires `--override-kv tokenizer.ggml.add_bos_token=bool:false`
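
The override above has the shape `KEY=TYPE:VALUE`. The Python sketch below illustrates how such a string could be split and what turning off `add_bos_token` would mean for tokenization. It is a hypothetical illustration, not llama.cpp's parser: the helpers `parse_override_kv` and `maybe_add_bos`, the set of supported types, and the BOS id are all assumptions.

```python
def parse_override_kv(spec):
    """Split a 'KEY=TYPE:VALUE' string into (key, typed_value).

    Assumption: only bool, int and float are handled in this sketch.
    """
    key, eq, rest = spec.partition("=")
    type_name, colon, raw = rest.partition(":")
    if not key or not eq or not colon:
        raise ValueError(f"expected KEY=TYPE:VALUE, got {spec!r}")
    if type_name == "bool":
        if raw not in ("true", "false"):
            raise ValueError(f"invalid bool value: {raw!r}")
        return key, raw == "true"
    if type_name == "int":
        return key, int(raw)
    if type_name == "float":
        return key, float(raw)
    raise ValueError(f"unsupported type: {type_name!r}")


def maybe_add_bos(token_ids, metadata, bos_id):
    """Prepend the BOS token only when the metadata asks for it."""
    if metadata.get("tokenizer.ggml.add_bos_token", False):
        return [bos_id] + list(token_ids)
    return list(token_ids)


# Metadata as stored in a (hypothetical) model file, then overridden.
metadata = {"tokenizer.ggml.add_bos_token": True}
key, value = parse_override_kv("tokenizer.ggml.add_bos_token=bool:false")
metadata[key] = value

print(maybe_add_bos([11, 22, 33], metadata, bos_id=0))
```

With the override applied, the prompt tokens are passed through unchanged instead of being prefixed with a BOS token.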
@mofosyne added the "Review Complexity : Medium" and "model" labels May 16, 2024
@ggerganov merged commit 0f98acf into ggml-org:master May 18, 2024
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request May 18, 2024
…rg#7324)
