
[SLM] Add support for StableLM architecture #1701

Merged · 2 commits · Feb 3, 2024

Conversation

@rickzx (Contributor) commented Feb 2, 2024

This PR adds support for the StableLM architecture in the SLM pipeline.

The code is compatible with both the old StableLM-3B model and the newly released StableLM-2-1_6B model.

Example output:
For the 3B model:

<|user|>: List 3 synonyms for the word "tiny"
<|assistant|>: 
1. Dwarf
2. Miniature
3. Petite

For the new 1.6B model:

<|user|>: List 3 synonyms for the word "tiny"
<|assistant|>: 
"Tiny" can be synonymous with several words depending on the context in which it is used. Here are three synonyms for "tiny": 

1. Small - This word is often used to describe something that is small in size or amount. "Small" can be used as a synonym for "tiny" in situations where the size is being compared to another object or thing.

2. Little - This word is often used to describe something that is less significant or important than something else. "Little" can be used as a synonym for "tiny" in situations where the significance is being compared.

3. Minuscule - This word is often used to describe something that is very small or insignificant. "Minuscule" can be used as a synonym for "tiny" in situations where the importance or size of the object is being emphasized.

@DavidGOrtega left a comment

Great effort @rickzx!!! 🙏

For some reason I cannot make it work. It's your branch with an updated mlc-llm.
I still have to specify gpt_neox, and it fails.

mlc_chat convert_weight $MODEL --quantization q0f32  -o $MODEL/MLC --model-type gpt_neox
[2024-02-03 00:35:39] INFO auto_config.py:115: Found model configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/config.json
[2024-02-03 00:35:39] INFO auto_device.py:85: Not found device: cuda:0
[2024-02-03 00:35:40] INFO auto_device.py:85: Not found device: rocm:0
[2024-02-03 00:35:40] INFO auto_device.py:76: Found device: metal:0
[2024-02-03 00:35:40] INFO auto_device.py:85: Not found device: vulkan:0
[2024-02-03 00:35:41] INFO auto_device.py:85: Not found device: opencl:0
[2024-02-03 00:35:41] INFO auto_device.py:33: Using device: metal:0
[2024-02-03 00:35:41] INFO auto_weight.py:70: Finding weights in: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b
[2024-02-03 00:35:41] INFO auto_weight.py:136: Not found Huggingface PyTorch
[2024-02-03 00:35:41] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json
[2024-02-03 00:35:41] INFO auto_weight.py:106: Using source weight configuration: /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json. Use `--source` to override.
[2024-02-03 00:35:41] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-02-03 00:35:41] INFO auto_config.py:153: Found model type: gpt_neox. Use `--model-type` to override.
Weight conversion with arguments:
  --config          /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/config.json
  --quantization    NoQuantize(name='q0f32', kind='no-quant', model_dtype='float32')
  --model-type      gpt_neox
  --device          metal:0
  --source          /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/model.safetensors.index.json
  --source-format   huggingface-safetensor
  --output          /Users/davidgortega/Documents/projects/kunzite/models/stablelm-2-zephyr-1_6b/MLC
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/bin/mlc_chat", line 33, in <module>
    sys.exit(load_entry_point('mlc-chat', 'console_scripts', 'mlc_chat')())
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/__main__.py", line 28, in main
    cli.main(sys.argv[2:])
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/cli/convert_weight.py", line 87, in main
    convert_weight(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/interface/convert_weight.py", line 156, in convert_weight
    _convert_args(args)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/interface/convert_weight.py", line 66, in _convert_args
    model_config = args.model.config.from_file(args.config)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/support/config.py", line 69, in from_file
    return cls.from_dict(json.load(in_file))
  File "/opt/homebrew/Caskroom/miniforge/base/envs/hf/lib/python3.9/site-packages/mlc_chat/support/config.py", line 49, in from_dict
    return cls(**fields, kwargs=kwargs)  # type: ignore[call-arg]
TypeError: __init__() missing 3 required positional arguments: 'use_parallel_residual', 'layer_norm_eps', and 'rotary_pct'
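
For context, here is a minimal sketch (illustrative, not mlc_chat's actual code) of the config-loading pattern the traceback goes through: known dataclass fields are read from config.json and unknown keys are collected into kwargs. When the model type is auto-detected as gpt_neox, fields that StableLM's config.json does not provide (use_parallel_residual, layer_norm_eps, rotary_pct) are missing, and the constructor raises exactly this TypeError.

import dataclasses
from typing import Any, Dict

@dataclasses.dataclass
class GPTNeoXConfigSketch:
    # Required field names taken from the error message; everything else is illustrative.
    use_parallel_residual: bool
    layer_norm_eps: float
    rotary_pct: float
    kwargs: Dict[str, Any] = dataclasses.field(default_factory=dict)

    @classmethod
    def from_dict(cls, source: Dict[str, Any]) -> "GPTNeoXConfigSketch":
        names = {f.name for f in dataclasses.fields(cls) if f.name != "kwargs"}
        fields = {k: v for k, v in source.items() if k in names}
        extras = {k: v for k, v in source.items() if k not in names}
        # Raises TypeError when required fields are absent from config.json,
        # which is what happens when a StableLM config is parsed as gpt_neox.
        return cls(**fields, kwargs=extras)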

@@ -680,6 +681,25 @@ Conversation Phi2() {
return conv;
}

Conversation StableLM2() {
Conversation conv;
conv.name = "stablelm-2";

Isn't this the zephyr format?


@CharlieFRuan any thought here?

@CharlieFRuan (Contributor)

Hi @DavidGOrtega! I was able to run convert_weight with:

python -m mlc_chat convert_weight dist/stablelm-2-zephyr-1_6b/ --quantization q0f16 --output dist/stablelm-2-zephyr-1_6b-q0f16-MLC

Also, we uploaded all the prebuilts to Hugging Face; try it out with JIT compilation (you do not need to run mlc_chat compile):

python -m mlc_chat chat HF://mlc-ai/stablelm-2-zephyr-1_6b-q0f16-MLC
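
For reference, the same JIT flow can also be driven from the Python API; a minimal sketch, assuming ChatModule resolves the HF:// model reference the same way the CLI does:

from mlc_chat import ChatModule

# Assumption: the Python API accepts the same HF:// form as the CLI above;
# the prebuilt weights are fetched and JIT-compiled on first use.
cm = ChatModule(model="HF://mlc-ai/stablelm-2-zephyr-1_6b-q0f16-MLC")
print(cm.generate(prompt='List 3 synonyms for the word "tiny"'))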

@CharlieFRuan (Contributor) left a comment

Thank you for the great work! Everything looks good to me; uploaded the prebuilt weights to HF and verified with JIT on CUDA.

After adding "stablelm-2" and "stablelm-3b" to https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_chat/interface/gen_config.py#L189 (otherwise gen_config would fail), we can merge it in!

@CharlieFRuan (Contributor)

@DavidGOrtega If running my command above still fails, one thing to try is Python 3.11; I'm not exactly sure whether that is the cause here.

(@DavidGOrtega's comment was marked as off-topic.)

@CharlieFRuan (Contributor)

Ahh, looking at your log, line 49 of support/config.py doesn't match @rickzx's branch: https://github.com/rickzx/mlc-llm/blob/stable_lm/python/mlc_chat/support/config.py#L49. I'm guessing you're not using his branch, hence the error when trying to convert StableLM.

(@DavidGOrtega's comment was marked as off-topic.)

@DavidGOrtega left a comment

I tried on a fresh machine and was able to compile it! 🥳 (my Mac still references the old code?!)

However, the new stablelm-2 template still needs to be added to CONV_TEMPLATES in /python/mlc_chat/interface/gen_config.py:

CONV_TEMPLATES = {
    "chatml",
    "open_hermes_mistral",
    "neural_hermes_mistral",
    "llama_default",
    "llama-2",
    "mistral_default",
    "gpt2",
    "codellama_completion",
    "codellama_instruct",
    "vicuna_v1.1",
    "conv_one_shot",
    "redpajama_chat",
    "rwkv_world",
    "rwkv",
    "gorilla",
    "guanaco",
    "dolly",
    "oasst",
    "stablelm",
    "stablecode_completion",
    "stablecode_instruct",
    "minigpt",
    "moss",
    "LM",
    "stablelm-3b",
    "gpt_bigcode",
    "wizardlm_7b",
    "wizard_coder_or_math",
    "glm",
    "custom",  # for web-llm only
    "phi-2",
    "stablelm-2"
}

Without it, config generation for that chat template fails:

mlc_chat gen_config $MODEL --quantization $QUANTIZATION  --conv-template stablelm-2 -o $MODEL/MLC
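
For context, the failure is just a template-name check; a minimal illustration of that kind of validation (illustrative only, not mlc_chat's actual gen_config code):

def check_conv_template(name: str, known_templates: set) -> None:
    # gen_config refuses conversation templates it does not know about,
    # which is why "stablelm-2" has to be in CONV_TEMPLATES.
    if name not in known_templates:
        raise ValueError(f"Unknown conversation template: {name}")

check_conv_template("stablelm-2", CONV_TEMPLATES)  # passes once the entry above is added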

Which config are you using to test it, @rickzx @CharlieFRuan?

@DavidGOrtega

Testing it with web-llm:

Uncaught (in promise) Error: Cannot handle tokenizer files tokenizer_config.json
    at a.<anonymous> (008Q.js:1:5446904)
    at Generator.next (<anonymous>)
    at 008Q.js:1:931764
    at new Promise (<anonymous>)
    at I (008Q.js:1:931509)
    at a.asyncLoadTokenizer (008Q.js:1:5446047)
    at a.<anonymous> (008Q.js:1:5443895)
    at Generator.next (<anonymous>)
    at E (008Q.js:1:931566)

This model has neither tokenizer.json nor tokenizer.model.
Which chat template did you use to test it, @rickzx?

@CharlieFRuan (Contributor)

Thanks for testing all of these @DavidGOrtega!

> Without it, config generation for that chat template fails

Yep, you are right; I mentioned it in my review. After adding it, I think we can merge it in.

> This model has neither tokenizer.json nor tokenizer.model.

This was fixed in #1705, so I needed to cherry-pick it to test this PR, so that gen_config can convert the tokenizer.
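
For anyone hitting the missing tokenizer.json locally, one manual workaround is to export it from the Hugging Face tokenizer; a sketch, assuming transformers can load the model's files as a fast tokenizer, and not necessarily what #1705 implements:

from transformers import AutoTokenizer

# Assumption: the repo's tokenizer files load as a "fast" tokenizer;
# backend_tokenizer.save writes the tokenizer.json that gen_config/web-llm expect.
tok = AutoTokenizer.from_pretrained("stabilityai/stablelm-2-zephyr-1_6b", use_fast=True)
tok.backend_tokenizer.save("tokenizer.json")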

@rickzx (Contributor, Author) commented Feb 3, 2024

> Thank you for the great work! Everything looks good to me; uploaded the prebuilt weights to HF and verified with JIT on CUDA.
>
> After adding "stablelm-2" and "stablelm-3b" to https://github.com/mlc-ai/mlc-llm/blob/main/python/mlc_chat/interface/gen_config.py#L189 (otherwise gen_config would fail), we can merge it in!

Sounds good!

@rickzx merged commit e0f2221 into mlc-ai:main on Feb 3, 2024 (1 check was pending).
@DavidGOrtega commented Feb 3, 2024

@rickzx @CharlieFRuan

tiktoken works! However, it does not work on the web yet because the conversation config still needs to be created. I have created a PR.

TODO:

@DavidGOrtega

@CharlieFRuan I think we're still missing some work here? See above.

@CharlieFRuan (Contributor)

Just reviewed your web-llm PR. I'll take care of the other two; thanks!

@CharlieFRuan (Contributor)

Ah, utils.py is no longer used (it mainly belongs to the old flow).

@DavidGOrtega

old flow, new flow, octoml/relax, mlc-ai/relax, tvm... 🥴

@CharlieFRuan (Contributor)

Yes, I understand; we are getting closer and closer to a stable point where we will clear things up.

@DavidGOrtega commented Feb 3, 2024

> Yes, I understand; we are getting closer and closer to a stable point where we will clear things up.

I truly appreciate what's going on here! You rock!
With the first version of web-llm I built an Ollama-like tool; the main difference is that I used Electron's ability to run Node inside a webpage served by the window, running the models on WebGPU and serving them through an OpenAI-like API. I'm thinking of reviving it with a proper UI.


I still think it's a good idea because I can get the most out of any GPU without needing NVIDIA. Hopefully @tqchen reviews wgpu in TVM so we don't even need the browser; the browser has the WASM memory limit.
