
mpt : do not duplicate token_embd.weight on disk #5670

Merged
merged 3 commits into master on Feb 22, 2024

Conversation

cebtenzzre
Collaborator

Previous attempt was #3626.

Should be merged after #5650, which will quantize the token_embd tensor with the same type that was previously used for the output tensor.
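
To illustrate what "do not duplicate token_embd.weight on disk" means in practice, here is a minimal sketch (not the actual llama.cpp conversion code; `plan_gguf_tensors` is a hypothetical helper): because MPT ties its output head to the token embeddings, the converter can write `token_embd.weight` once and omit `output.weight` entirely, and the runtime can reuse the embedding tensor for the output head.

```python
import numpy as np

def plan_gguf_tensors(wte: np.ndarray, lm_head: np.ndarray | None) -> dict[str, np.ndarray]:
    """Hypothetical sketch: decide which tensors to write to the GGUF file.

    MPT ties its LM head to the word embeddings, so when there is no
    separate head only token_embd.weight is written; the loader can reuse
    it for the output head instead of reading a second, identical copy.
    """
    tensors = {"token_embd.weight": wte}
    if lm_head is not None and lm_head is not wte:
        # Only models with a genuinely separate output projection get an
        # output.weight tensor on disk.
        tensors["output.weight"] = lm_head
    return tensors

# Toy usage: a tied model writes one tensor, an untied model writes two.
emb = np.zeros((8, 4), dtype=np.float32)
print(sorted(plan_gguf_tensors(emb, None)))        # ['token_embd.weight']
print(sorted(plan_gguf_tensors(emb, emb.copy())))  # ['output.weight', 'token_embd.weight']
```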

cebtenzzre merged commit 15499eb into master on Feb 22, 2024
51 of 58 checks passed
cebtenzzre deleted the ceb/mpt-tied-output branch on February 22, 2024 at 22:05
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024

nviet commented Mar 18, 2024

This change helps reduce the model file size, but it breaks loading some previously converted models. I got a "wrong number of tensors" error message while trying to load this model; up to b2248 it still loaded fine. Do we really need to convert the models again, or is there a better way?
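
One rough way to check whether a particular MPT GGUF file predates this change is to list its tensor names and look for a separate output.weight. A short sketch, assuming the gguf-py package that ships with llama.cpp (the exact GGUFReader usage is an assumption; the file name is hypothetical):

```python
# Sketch: list tensor names in a GGUF file to see whether it still carries a
# duplicated output.weight (assumes llama.cpp's gguf-py package; treat the
# GGUFReader usage as an assumption and check the package for details).
from gguf import GGUFReader

reader = GGUFReader("mpt-7b.Q4_0.gguf")  # hypothetical file name
names = [t.name for t in reader.tensors]

print("token_embd.weight present:", "token_embd.weight" in names)
print("output.weight present:", "output.weight" in names)
# An MPT file converted before this change contains both tensors, while a
# newer conversion carries only token_embd.weight; the extra tensor in old
# files is what trips the loader's tensor-count check described above.
```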
