
[SLM] Add support for StableLM architecture #1701

Merged · 2 commits · Feb 3, 2024

Conversation

@rickzx (Contributor) commented Feb 2, 2024

This PR adds support for the StableLM architecture in the SLM pipeline.

The code is compatible with both the old StableLM-3B model and the newly released StableLM-2-1_6B model.

Example output:
For the 3B model:

<|user|>: List 3 synonyms for the word "tiny"
<|assistant|>: 
1. Dwarf
2. Miniature
3. Petite

For the new 1.6B model:

<|user|>: List 3 synonyms for the word "tiny"
<|assistant|>: 
"Tiny" can be synonymous with several words depending on the context in which it is used. Here are three synonyms for "tiny": 

1. Small - This word is often used to describe something that is small in size or amount. "Small" can be used as a synonym for "tiny" in situations where the size is being compared to another object or thing.

2. Little - This word is often used to describe something that is less significant or important than something else. "Little" can be used as a synonym for "tiny" in situations where the significance is being compared.

3. Minuscule - This word is often used to describe something that is very small or insignificant. "Minuscule" can be used as a synonym for "tiny" in situations where the importance or size of the object is being emphasized.

@DavidGOrtega left a comment

Great effort @rickzx!!! 🙏

For some reason I cannot make it work. It's your branch with an updated mlc-llm.
I still have to specify gpt_neox, and it fails.

mlc_chat convert_weight $MODEL --quantization q0f32  -o $MODEL/MLC --model-type gpt_neox
[2024-02-03 00:35:39] INFO auto_config.py:115: Found model configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/config.json
[2024-02-03 00:35:39] INFO auto_device.py:85: Not found device: cuda:0
[2024-02-03 00:35:40] INFO auto_device.py:85: Not found device: rocm:0
[2024-02-03 00:35:40] INFO auto_device.py:76: Found device: metal:0
[2024-02-03 00:35:40] INFO auto_device.py:85: Not found device: vulkan:0
[2024-02-03 00:35:41] INFO auto_device.py:85: Not found device: opencl:0
[2024-02-03 00:35:41] INFO auto_device.py:33: Using device: metal:0
[2024-02-03 00:35:41] INFO auto_weight.py:70: Finding weights in: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b
[2024-02-03 00:35:41] INFO auto_weight.py:136: Not found Huggingface PyTorch
[2024-02-03 00:35:41] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json
[2024-02-03 00:35:41] INFO auto_weight.py:106: Using source weight configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json. Use `--source` to override.
[2024-02-03 00:35:41] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-02-03 00:35:41] INFO auto_config.py:153: Found model type: gpt_neox. Use `--model-type` to override.
Weight conversion with arguments:
  --config          /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/config.json
  --quantization    NoQuantize(name='q0f32', kind='no-quant', model_dtype='float32')
  --model-type      gpt_neox
  --device          metal:0
  --source          /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json
  --source-format   huggingface-safetensor
  --output          /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/MLC
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/bin/mlc_chat", line 33, in <module>
    sys.exit(load_entry_point('mlc-chat', 'console_scripts', 'mlc_chat')())
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/__main__.py", line 28, in main
    cli.main(sys.argv[2:])
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/cli/convert_weight.py", line 87, in main
    convert_weight(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/interface/convert_weight.py", line 156, in convert_weight
    _convert_args(args)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/interface/convert_weight.py", line 66, in _convert_args
    model_config = args.model.config.from_file(args.config)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/support/config.py", line 69, in from_file
    return cls.from_dict(json.load(in_file))
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/support/config.py", line 49, in from_dict
    return cls(**fields, kwargs=kwargs)  # type: ignore[call-arg]
TypeError: __init__() missing 3 required positional arguments: 'use_parallel_residual', 'layer_norm_eps', and 'rotary_pct'
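
For context, here is a minimal sketch (illustrative, not mlc_chat's actual code) of the config-loading pattern the traceback goes through: known dataclass fields are read from config.json and unknown keys are collected into kwargs. When the model type is auto-detected as gpt_neox, fields that StableLM's config.json does not provide (use_parallel_residual, layer_norm_eps, rotary_pct) are missing, and the constructor raises exactly this TypeError.

import dataclasses
from typing import Any, Dict

@dataclasses.dataclass
class GPTNeoXConfigSketch:
    # Required field names taken from the error message; everything else is illustrative.
    use_parallel_residual: bool
    layer_norm_eps: float
    rotary_pct: float
    kwargs: Dict[str, Any] = dataclasses.field(default_factory=dict)

    @classmethod
    def from_dict(cls, source: Dict[str, Any]) -> "GPTNeoXConfigSketch":
        names = {f.name for f in dataclasses.fields(cls) if f.name != "kwargs"}
        fields = {k: v for k, v in source.items() if k in names}
        extras = {k: v for k, v in source.items() if k not in names}
        # Raises TypeError when required fields are absent from config.json,
        # which is what happens when a StableLM config is parsed as gpt_neox.
        return cls(**fields, kwargs=extras)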

@@ -680,6 +681,25 @@ Conversation Phi2() {
return conv;
}

Conversation StableLM2() {
Conversation conv;
conv.name = "stablelm-2";

Isn't this the zephyr format?


@CharlieFRuan any thought here?

@CharlieFRuan (Contributor)

Hi @DavidGOrtega! I was able to run convert_weight with:

python -m mlc_chat convert_weight dist/stablelm-2-zephyr-1_6b/ --quantization q0f16 --output dist/stablelm-2-zephyr-1_6b-q0f16-MLC

Also, we uploaded all the prebuilts to Hugging Face; try it out with JIT compilation (you do not need to run mlc_chat compile):

python -m mlc_chat chat HF://mlc-ai/stablelm-2-zephyr-1_6b-q0f16-MLC
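
For reference, the same JIT flow can also be driven from the Python API; a minimal sketch, assuming ChatModule resolves the HF:// model reference the same way the CLI does:

from mlc_chat import ChatModule

# Assumption: the Python API accepts the same HF:// form as the CLI above;
# the prebuilt weights are fetched and JIT-compiled on first use.
cm = ChatModule(model="HF://mlc-ai/stablelm-2-zephyr-1_6b-q0f16-MLC")
print(cm.generate(prompt='List 3 synonyms for the word "tiny"'))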

@CharlieFRuan (Contributor) left a comment

Thank you for the great work! Everything looks good to me; uploaded the prebuilt weights to HF and verified with JIT on CUDA.

After adding "stablelm-2" and "stablelm-3b" to https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_chat/interface/gen_config.py#L189 (otherwise gen_config would fail), we can merge it in!

@CharlieFRuan (Contributor)

@DavidGOrtega If running my command above still fails, one thing to try is Python 3.11; I'm not exactly sure whether that is the cause here.

(@DavidGOrtega's comment was marked as off-topic.)

@CharlieFRuan (Contributor)

Ahh, looking at your log, line 49 of support/config.py doesn't match @rickzx's branch: https://github.com/rickzx/mlc-llm/blob/stable_lm/python/mlc_chat/support/config.py#L49. I'm guessing you're not using his branch, hence the error when trying to convert StableLM.

(@DavidGOrtega's comment was marked as off-topic.)

@DavidGOrtega left a comment

I tried on a fresh machine and was able to compile it! 🥳 (my Mac still references the old code?!)

However, the new stablelm-2 template still needs to be added to CONV_TEMPLATES in /python/mlc_chat/interface/gen_config.py:

CONV_TEMPLATES = {
    "chatml",
    "open_hermes_mistral",
    "neural_hermes_mistral",
    "llama_default",
    "llama-2",
    "mistral_default",
    "gpt2",
    "codellama_completion",
    "codellama_instruct",
    "vicuna_v1.1",
    "conv_one_shot",
    "redpajama_chat",
    "rwkv_world",
    "rwkv",
    "gorilla",
    "guanaco",
    "dolly",
    "oasst",
    "stablelm",
    "stablecode_completion",
    "stablecode_instruct",
    "minigpt",
    "moss",
    "LM",
    "stablelm-3b",
    "gpt_bigcode",
    "wizardlm_7b",
    "wizard_coder_or_math",
    "glm",
    "custom",  # for web-llm only
    "phi-2",
    "stablelm-2"
}

Without it, config generation for that chat template fails:

mlc_chat gen_config $MODEL --quantization $QUANTIZATION  --conv-template stablelm-2 -o $MODEL/MLC
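
For context, the failure is just a template-name check; a minimal illustration of that kind of validation (illustrative only, not mlc_chat's actual gen_config code):

def check_conv_template(name: str, known_templates: set) -> None:
    # gen_config refuses conversation templates it does not know about,
    # which is why "stablelm-2" has to be in CONV_TEMPLATES.
    if name not in known_templates:
        raise ValueError(f"Unknown conversation template: {name}")

check_conv_template("stablelm-2", CONV_TEMPLATES)  # passes once the entry above is added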

Which config are you using to test it, @rickzx @CharlieFRuan?

@DavidGOrtega

Testing it with web-llm:

Uncaught (in promise) Error: Cannot handle tokenizer files tokenizer_config.json
    at a.<anonymous> (008Q.js:1:5446904)
    at Generator.next (<anonymous>)
    at 008Q.js:1:931764
    at new Promise (<anonymous>)
    at I (008Q.js:1:931509)
    at a.asyncLoadTokenizer (008Q.js:1:5446047)
    at a.<anonymous> (008Q.js:1:5443895)
    at Generator.next (<anonymous>)
    at E (008Q.js:1:931566)

This model has neither tokenizer.json nor tokenizer.model.
Which chat template did you use to test it, @rickzx?

@CharlieFRuan (Contributor)

Thanks for testing all of these @DavidGOrtega!

> Without it, config generation for that chat template fails

Yep, you are right; I mentioned it in my review. After adding it, I think we can merge it in.

> This model has neither tokenizer.json nor tokenizer.model.

This was fixed in #1705, so I needed to cherry-pick it to test this PR, so that gen_config can convert the tokenizer.
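
For anyone hitting the missing tokenizer.json locally, one manual workaround is to export it from the Hugging Face tokenizer; a sketch, assuming transformers can load the model's files as a fast tokenizer, and not necessarily what #1705 implements:

from transformers import AutoTokenizer

# Assumption: the repo's tokenizer files load as a "fast" tokenizer;
# backend_tokenizer.save writes the tokenizer.json that gen_config/web-llm expect.
tok = AutoTokenizer.from_pretrained("stabilityai/stablelm-2-zephyr-1_6b", use_fast=True)
tok.backend_tokenizer.save("tokenizer.json")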

@rickzx (Contributor, Author) commented Feb 3, 2024

> Thank you for the great work! Everything looks good to me; uploaded the prebuilt weights to HF and verified with JIT on CUDA.
>
> After adding "stablelm-2" and "stablelm-3b" to https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_chat/interface/gen_config.py#L189 (otherwise gen_config would fail), we can merge it in!

Sounds good!

@rickzx merged commit e0f2221 into mlc-ai:main on Feb 3, 2024 (1 check was pending).
@DavidGOrtega commented Feb 3, 2024

@rickzx @CharlieFRuan

tiktoken works! However, it does not work on the web yet because the conversation config still needs to be created. I have created a PR.

TODO:

@DavidGOrtega

@CharlieFRuan I think we're still missing some work here? See above.

@CharlieFRuan (Contributor)

Just reviewed your web-llm PR. I'll take care of the other two; thanks!

@CharlieFRuan (Contributor)

Ah, utils.py is no longer used (it mainly belongs to the old flow).

@DavidGOrtega

old flow, new flow, octoml/relax, mlc-ai/relax, tvm... 🥴

@CharlieFRuan (Contributor)

Yes, I understand; we are getting closer and closer to a stable point where we will clear things up.

@DavidGOrtega commented Feb 3, 2024

> Yes, I understand; we are getting closer and closer to a stable point where we will clear things up.

I truly appreciate what's going on here! You rock!
With the first version of web-llm I built an Ollama-like tool; the main difference is that I used Electron's ability to run Node inside a webpage served by the window, running the models on WebGPU and serving them through an OpenAI-like API. I'm thinking of reviving it with a proper UI.


I still think it's a good idea because I can get the most out of any GPU without needing NVIDIA. Hopefully @tqchen reviews wgpu in TVM so we don't even need the browser; the browser has the WASM memory limit.
