[SLM] Add support for StableLM architecture #1701
Conversation
0bbff8f to b823952 (compare)
Great effort @rickzx!!! 🙏
For some reason I cannot make it work. It's your branch with an updated mlc-llm.
I still have to specify gpt_neox, and it fails.
mlc_chat convert_weight $MODEL --quantization q0f32 -o $MODEL/MLC --model-type gpt_neox
[2024-02-03 00:35:39] INFO auto_config.py:115: Found model configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/config.json
[2024-02-03 00:35:39] INFO auto_device.py:85: Not found device: cuda:0
[2024-02-03 00:35:40] INFO auto_device.py:85: Not found device: rocm:0
[2024-02-03 00:35:40] INFO auto_device.py:76: Found device: metal:0
[2024-02-03 00:35:40] INFO auto_device.py:85: Not found device: vulkan:0
[2024-02-03 00:35:41] INFO auto_device.py:85: Not found device: opencl:0
[2024-02-03 00:35:41] INFO auto_device.py:33: Using device: metal:0
[2024-02-03 00:35:41] INFO auto_weight.py:70: Finding weights in: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b
[2024-02-03 00:35:41] INFO auto_weight.py:136: Not found Huggingface PyTorch
[2024-02-03 00:35:41] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json
[2024-02-03 00:35:41] INFO auto_weight.py:106: Using source weight configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json. Use `--source` to override.
[2024-02-03 00:35:41] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-02-03 00:35:41] INFO auto_config.py:153: Found model type: gpt_neox. Use `--model-type` to override.
Weight conversion with arguments:
--config /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/config.json
--quantization NoQuantize(name='q0f32', kind='no-quant', model_dtype='float32')
--model-type gpt_neox
--device metal:0
--source /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json
--source-format huggingface-safetensor
--output /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/MLC
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/bin/mlc_chat", line 33, in <module>
sys.exit(load_entry_point('mlc-chat', 'console_scripts', 'mlc_chat')())
File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/__main__.py", line 28, in main
cli.main(sys.argv[2:])
File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/cli/convert_weight.py", line 87, in main
convert_weight(
File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/interface/convert_weight.py", line 156, in convert_weight
_convert_args(args)
File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/interface/convert_weight.py", line 66, in _convert_args
model_config = args.model.config.from_file(args.config)
File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/support/config.py", line 69, in from_file
return cls.from_dict(json.load(in_file))
File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/support/config.py", line 49, in from_dict
return cls(**fields, kwargs=kwargs) # type: ignore[call-arg]
TypeError: __init__() missing 3 required positional arguments: 'use_parallel_residual', 'layer_norm_eps', and 'rotary_pct'
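The error boils down to the auto-detected gpt_neox config class requiring fields that the StableLM-2 config.json does not provide. A minimal illustrative sketch of that failure mode, using a hypothetical dataclass-based loader and made-up field values (not the actual mlc_chat code):

from dataclasses import dataclass

@dataclass
class GPTNeoXLikeConfig:
    # Hypothetical stand-in for the gpt_neox config class; the last three
    # fields are the ones named in the TypeError above.
    hidden_size: int
    num_hidden_layers: int
    use_parallel_residual: bool
    layer_norm_eps: float
    rotary_pct: float

def config_from_dict(cls, source: dict):
    # Like a generic loader: keep only the keys the config class declares,
    # then construct it. Missing required fields raise TypeError.
    fields = {k: v for k, v in source.items() if k in cls.__dataclass_fields__}
    return cls(**fields)

# Illustrative dict lacking the gpt_neox-specific fields.
stablelm2_like = {"hidden_size": 2048, "num_hidden_layers": 24}
try:
    config_from_dict(GPTNeoXLikeConfig, stablelm2_like)
except TypeError as err:
    print(err)  # ... missing 3 required positional arguments ...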
@@ -680,6 +681,25 @@ Conversation Phi2() {
  return conv;
}

Conversation StableLM2() {
  Conversation conv;
  conv.name = "stablelm-2";
Isn't this the zephyr format?
@CharlieFRuan any thoughts here?
Hi @DavidGOrtega! I was able to run
Besides, we uploaded all the prebuilts to huggingface; try it out with JIT compilation (you do not need to run
Thank you for the great work! Everything looks good to me; uploaded the prebuilt weights to HF and verified with JIT on CUDA.
After adding "stablelm-2" and "stablelm-3b" to https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_chat/interface/gen_config.py#L189 (else gen_config would fail), we can merge it in!
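If I read the request correctly, the change amounts to adding the two template names to the CONV_TEMPLATES set in gen_config.py; a rough sketch of the edit (most existing entries omitted, exact placement per the file):

# python/mlc_chat/interface/gen_config.py (sketch; most entries omitted)
CONV_TEMPLATES = {
    # ... existing templates ...
    "stablelm",
    "stablelm-3b",  # requested addition
    "stablelm-2",   # requested addition
    # ...
}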
@DavidGOrtega If running my command above still fails, one thing to try is to use
Ahh looking at your log, line 49 of
I have tried on a fresh machine and I was able to compile it! 🥳 (my Mac still references the old code?!)
However, it is still missing the new stablelm-2 entry in /python/mlc_chat/interface/gen_config.py:
CONV_TEMPLATES = {
"chatml",
"open_hermes_mistral",
"neural_hermes_mistral",
"llama_default",
"llama-2",
"mistral_default",
"gpt2",
"codellama_completion",
"codellama_instruct",
"vicuna_v1.1",
"conv_one_shot",
"redpajama_chat",
"rwkv_world",
"rwkv",
"gorilla",
"guanaco",
"dolly",
"oasst",
"stablelm",
"stablecode_completion",
"stablecode_instruct",
"minigpt",
"moss",
"LM",
"stablelm-3b",
"gpt_bigcode",
"wizardlm_7b",
"wizard_coder_or_math",
"glm",
"custom", # for web-llm only
"phi-2",
"stablelm-2"
}
Otherwise, config generation for that chat template fails:
mlc_chat gen_config $MODEL --quantization $QUANTIZATION --conv-template stablelm-2 -o $MODEL/MLC
Which config are you using to test it? @rickzx @CharlieFRuan
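Presumably gen_config validates the --conv-template value against CONV_TEMPLATES and rejects unknown names, which would explain why the command fails until the entry is added. A rough sketch of that kind of check (hypothetical code, not the actual mlc_chat implementation):

def validate_conv_template(name: str, known_templates: set) -> str:
    # Reject conversation template names that are not registered.
    if name not in known_templates:
        raise ValueError(f"Unknown conversation template: {name}")
    return name

# Before "stablelm-2" is added to CONV_TEMPLATES, the check fails:
known = {"stablelm", "stablelm-3b", "phi-2"}
try:
    validate_conv_template("stablelm-2", known)
except ValueError as err:
    print(err)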
Testing it with web-llm
this model does not have
Thanks for testing all of these @DavidGOrtega!
Yep, you are right; I mentioned it in my review. After adding it, I think we can merge it.
This was fixed in #1705, so I needed to cherry-pick that to test this PR, so that
Sounds good!
tiktoken works! However, it does not work on the web yet, due to the need to create the conversation config. I have created a PR. TODO:
@CharlieFRuan I think we still missed some work here? See above.
Just reviewed your web-llm PR. I'll take care of the other two; thanks!
Ahh
old flow, new flow, octoml/relax, mlc-ai/relax, tvm... 🥴
Yes, I understand; we are getting closer and closer to a stabilized point where we will clear things up.
I truly appreciate what's going on here! You rock! I still think it is a good idea because I can get the most out of any GPU without having NVIDIA. Hopefully @tqchen reviews wgpu in TVM and we won't even need the browser; the browser has the WASM memory limit.
This PR adds support for the StableLM architecture in the SLM pipeline.
The code is compatible with both the old StableLM-3B model and the newly released StableLM-2-1_6B model.
Example output:
For the 3B model:
For the new 1.6B model: