Support MiniCPM #5346
Conversation
runfuture commented on Feb 5, 2024
- Add a new file, convert-minicpm.py, to convert the model (https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16). It is very similar to convert.py, which only supports Llama.
- Add a new model architecture in llama.cpp, applying several scaling computations as discussed in #5276 (MiniCPM 2b model support?).
convert-minicpm.py (Outdated)
Is it difficult to merge it into convert.py instead of creating a new file?
Merging it into convert.py could be achieved by adding an argument that explicitly indicates the model architecture. Otherwise it is hard to differentiate them automatically, since MiniCPM stores its weights almost identically to Llama. What's your suggestion?
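As a rough illustration only (not the actual convert.py interface; the --arch flag name and its handling are hypothetical), an explicit architecture argument could look like this:

```python
# Hypothetical sketch of an explicit architecture flag for a converter script.
# convert.py does not actually expose "--arch"; this only illustrates letting
# the user disambiguate MiniCPM from Llama, since the weight layout alone is
# not enough to tell them apart.
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="convert an HF model to GGUF")
    parser.add_argument("model_dir", help="path to the HF model directory")
    parser.add_argument(
        "--arch",
        choices=["llama", "minicpm"],
        default="llama",
        help="model architecture (cannot be auto-detected reliably)",
    )
    args = parser.parse_args()

    if args.arch == "minicpm":
        print(f"converting {args.model_dir} with MiniCPM scaling metadata")
    else:
        print(f"converting {args.model_dir} as a plain Llama model")

if __name__ == "__main__":
    main()
```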
How about in convert-hf-to-gguf.py - would it be easier?
> How about in convert-hf-to-gguf.py - would it be easier?
In fact, I've tried to work on that. I copied and pasted a lot of code from other models and it almost works, but the hard part is the tokenizer, which I haven't finished yet. The number of lines added/modified will be 10x ~ 30x more than for convert.py.
I'll try to finish it, and then you can compare which way is better.
> How about in convert-hf-to-gguf.py - would it be easier?
It seems to be working now. I'm not familiar with the vocab, so there may be bugs in _set_vocab_hf (convert-hf-to-gguf.py).
If this way is better, I'll remove convert-minicpm.py.
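For context, here is a very rough sketch of the kind of work a generic HF-tokenizer vocab path has to do. This is not the actual _set_vocab_hf code; the helper name and the token-type labels are illustrative only:

```python
# Illustrative sketch only: enumerate an HF tokenizer's vocab in id order and
# classify tokens roughly the way a GGUF vocab export needs to.
# Not the real _set_vocab_hf implementation.
from transformers import AutoTokenizer

def dump_hf_vocab(model_dir: str):
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    added    = set(tokenizer.get_added_vocab())    # tokens added on top of the base vocab
    specials = set(tokenizer.all_special_tokens)   # e.g. <s>, </s>, <unk>

    entries = []
    # GGUF stores tokens in id order, so sort by id rather than by string
    for text, idx in sorted(tokenizer.get_vocab().items(), key=lambda kv: kv[1]):
        if text in specials:
            kind = "control"
        elif text in added:
            kind = "user_defined"
        else:
            kind = "normal"
        entries.append((idx, text, kind))
    return entries

if __name__ == "__main__":
    for idx, text, kind in dump_hf_vocab("openbmb/MiniCPM-2B-dpo-fp16")[:10]:
        print(idx, repr(text), kind)
```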
I think this looks great. I'm just not sure about the `from convert import HfVocab`.
@cebtenzzre What do you think?
I think it's fine, as long as we're mindful that convert-hf-to-gguf.py now depends on convert.py; Python is prone to circular dependency issues that can only be resolved by moving code to a shared module.
Improvements have been submitted. :)
llama.cpp (Outdated)
cb(cur, "result_norm", -1);

// lm_head scaling
float scale_lmhead = 1.0f/9.0f; // 1/(dim_model/256)
Maybe keep these constants expanded:

- float scale_lmhead = 1.0f/9.0f; // 1/(dim_model/256)
+ const float scale_lmhead = 256.0f/n_embd;
Done, but leaving a todo for the future. Let's wait for model development? :)
llama.cpp (Outdated)
const int scale_emb = 12;
const int dim_model_base = 256;
const float scale_depth = 1.4f;
Minor change of the constant names:

- const int scale_emb = 12;
- const int dim_model_base = 256;
- const float scale_depth = 1.4f;
+ const int64_t n_embd_base = 256;
+ const float scale_embd = 12.0f;
+ const float scale_depth = 1.4f;
> Minor change of the constant names:

Done with corresponding modifications: this commit
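To make the numbers concrete, here is a small sketch of the scaling factors using the renamed constants. The n_embd and n_layer values are assumptions for the 2B model, and the residual-branch formula follows the MiniCPM release rather than anything quoted verbatim in this thread:

```python
# Sketch of the MiniCPM scaling factors with the renamed constants.
# n_embd and n_layer below are assumed values for the 2B model.
import math

n_embd      = 2304       # hidden size (assumed)
n_layer     = 40         # layer count (assumed)
n_embd_base = 256        # was "dim_model_base"
scale_embd  = 12.0       # multiplies the token embeddings
scale_depth = 1.4        # base for residual-branch scaling

# logits scale expressed via n_embd instead of a hard-coded 1/9
scale_lmhead = n_embd_base / n_embd
# residual branches scaled by scale_depth / sqrt(n_layer) (per the MiniCPM release)
scale_resid = scale_depth / math.sqrt(n_layer)

print(f"scale_lmhead = {scale_lmhead:.6f}  # 1/9 when n_embd = 2304")
print(f"scale_resid  = {scale_resid:.6f}")
```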
There seems to be nothing wrong with it; don't expect a 2B model to do things like coding. If you want better output, try enabling Mirostat v2 in "More options".
The version from LLM Farm can work with Python. Check this:
Found the problem: your chat prompt template is wrong. Try configuring it like this:
It doesn't feel good.
Hi, I'm one of the maintainers of the MiniCPM repo. Could you provide detailed instructions on how to reproduce this problem? We would like to figure this out. Thank you!
@sweetcard @huangyuxiang03
The script seems to have another problem: llama.cpp can't run the model because:
The latest commit fixed this issue.
The last line of the prompt should be:
@huangyuxiang03 @sweetcard @calvinweb
Downloaded and converted the model, but not able to start up the server:

llama server listening at http://127.0.0.1:8080
{"timestamp":1707389853,"level":"INFO","function":"main","line":2557,"message":"HTTP server listening","port":"8080","hostname":"127.0.0.1"}

D:\llama.cpp>
There were bugs, which have been fixed by a new PR. Could you please try the latest release, b2101, and test again?
Thank you for your amazing work. It works now. 👍
* support minicpm arch.
* fix tab/space typo.
* convert minicpm model via convert-hf-to-gguf.py
* try to make tokenizer work
* fix bug for quantize minicpm
* fix for flake8 lint
* remove convert-minicpm.py
* fix for editorconfig
* correct minicpm model type (size)
* constants expanded for minicpm
* Minor change of the constant names for minicpm