
add chatglm3 conv template support in conversation.py #2622

Merged
merged 7 commits into lm-sys:main on Nov 10, 2023

Conversation

ZeyuTeng96
Contributor

Hi there,

By checking the following tokenizer script, I think the true chatglm3 conversation template looks like this:

"<|system|>\nYou are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|>\nHello!<|assistant|>\nHi!<|user|>\nHow are you?<|assistant|>"

Please see the following code:
https://huggingface.co/THUDM/chatglm3-6b/blob/fc3235f807ef5527af598c05f04f2ffd17f48bab/tokenization_chatglm.py#L179

https://huggingface.co/THUDM/chatglm3-6b/blob/fc3235f807ef5527af598c05f04f2ffd17f48bab/tokenization_chatglm.py#L194
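
For reference, a rough sketch of the kind of entry this adds to fastchat/conversation.py. The values below are illustrative assumptions (they rely on the new SeparatorStyle.CHATGLM3 introduced by this PR, with stop token ids taken from the tokenizer script above); the merged code may differ in details:

from fastchat.conversation import Conversation, SeparatorStyle, register_conv_template

# Sketch only: role tags follow the tokenizer script linked above.
register_conv_template(
    Conversation(
        name="chatglm3",
        system_template="<|system|>\n{system_message}",
        roles=("<|user|>", "<|assistant|>"),
        sep_style=SeparatorStyle.CHATGLM3,
        stop_token_ids=[64795, 64797, 2],  # assumed: <|user|>, <|observation|>, </s>
    )
)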

Why are these changes needed?

Related issue number (if applicable)

Checks

  • I've run format.sh to lint the changes in this PR.
  • I've included any doc changes needed.
  • I've made sure the relevant tests are passing (if applicable).

@ZeyuTeng96
Contributor Author

If we put the following history and query (给我讲个笑话, "tell me a joke") into the chat function (https://huggingface.co/THUDM/chatglm3-6b/blob/fc3235f807ef5527af598c05f04f2ffd17f48bab/modeling_chatglm.py#L1021):

history = [{'role': 'system', 'content': '''You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.'''},{'role': 'user', 'content': '你好'},{'role': 'assistant','metadata': '','content': '你好👋!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。'}]

we get the following input_ids before line 193 (https://huggingface.co/THUDM/chatglm3-6b/blob/fc3235f807ef5527af598c05f04f2ffd17f48bab/tokenization_chatglm.py#L192):

[64794, 30910, 13, 809, 383, 22011, 10461, 30944, 30966, 30932, 260, 1796, 3239, 2092, 7594, 422, 1192, 899, 30923, 30930, 23833, 30930, 5741, 267, 2795, 30953, 30917, 8417, 7724, 30930, 21911, 1227, 3478, 3536, 30930, 64795, 30910, 13, 36474, 54591, 64796, 30910, 13, 36474, 54591, 243, 162, 148, 142, 31404, 33030, 34797, 42481, 22011, 10461, 30944, 30966, 30941, 30978, 30949, 31123, 48895, 35214, 54622, 31123, 32616, 39905, 31901, 31639, 31155]

Decoding it gives:
<|system|> \n You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|> \n 你好<|assistant|> \n 你好👋!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。

@ZeyuTeng96
Copy link
Contributor Author

After line 194 (https://huggingface.co/THUDM/chatglm3-6b/blob/fc3235f807ef5527af598c05f04f2ffd17f48bab/tokenization_chatglm.py#L194), the input_ids are:

[64794, 30910, 13, 809, 383, 22011, 10461, 30944, 30966, 30932, 260, 1796, 3239, 2092, 7594, 422, 1192, 899, 30923, 30930, 23833, 30930, 5741, 267, 2795, 30953, 30917, 8417, 7724, 30930, 21911, 1227, 3478, 3536, 30930, 64795, 30910, 13, 36474, 54591, 64796, 30910, 13, 36474, 54591, 243, 162, 148, 142, 31404, 33030, 34797, 42481, 22011, 10461, 30944, 30966, 30941, 30978, 30949, 31123, 48895, 35214, 54622, 31123, 32616, 39905, 31901, 31639, 31155, 64795, 30910, 13, 30910, 33575, 55089, 54550, 42277, 64796]

which decodes to:
"<|system|> \n You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|> \n 你好<|assistant|> \n 你好👋!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。<|user|> \n 给我讲个笑话<|assistant|>"

@merrymercy merrymercy mentioned this pull request Oct 31, 2023
@merrymercy
Member

Hi @lucasjinreal @yanyang1024 @silk55 @ZeyuTeng96. You all added ChatGLM-3 support (#2618, #2620, #2622).
Could you review these PRs and suggest which one we should follow and accept?

@ZeyuTeng96
Contributor Author

ZeyuTeng96 commented Nov 2, 2023

Hi, I just realized that the official OpenAI API and web UI provided by the ChatGLM3 repo use the build_chat_input function to convert text to token_ids.

However, the main problem is that this function encodes text and special tokens separately (the \n separator and the conversation content are also encoded individually). As a result, if we simply treat those special tokens as text, we get a different result. So it seems we have to find another way to build the prompt. @merrymercy @Trangle

input_ids from official openai api:
[64790, 64792, 64794, 30910, 13, 809, 383, 22011, 10461, 30944,
30966, 30932, 260, 1796, 3239, 2092, 7594, 422, 1192, 899,
30923, 30930, 23833, 30930, 5741, 267, 2795, 30953, 30917, 8417,
7724, 30930, 21911, 1227, 3478, 3536, 30930, 64795, 30910, 13,
36474, 54591, 64796, 30910, 13, 36474, 54591, 243, 162, 148,
142, 31404, 33030, 30942, 1960, 10461, 30944, 30966, 31123, 48895,
35214, 54622, 31123, 32616, 39905, 31901, 31639, 31155, 64795, 30910,
13, 30910, 34607, 55622, 64796]

which decodes to:
[gMASK]sop<|system|> \n You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|> \n 你好<|assistant|> \n 你好👋!我是ChatGLM3,很高兴见到你,欢迎问我任何问题。<|user|> \n 你是谁<|assistant|>

Encoding result for the manually built prompt:
content = '''<|system|> \n You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|> \n 你好<|assistant|> \n 你好👋!我是ChatGLM3,很高兴见到你,欢迎问我任何问题。<|user|> \n 你是谁<|assistant|>'''

tokenizer([content], return_tensors="pt")

'input_ids': tensor([[64790, 64792, 906, 31007, 13361, 31007, 30994, 30910, 13, 809,
383, 22011, 10461, 30944, 30966, 30932, 260, 1796, 3239, 2092,
7594, 422, 1192, 899, 30923, 30930, 23833, 30930, 5741, 267,
2795, 30953, 30917, 8417, 7724, 30930, 21911, 1227, 3478, 3536,
30930, 31002, 31007, 4865, 31007, 30994, 30910, 13, 36474, 54591,
31002, 31007, 530, 18971, 31007, 30994, 30910, 13, 36474, 54591,
243, 162, 148, 142, 31404, 33030, 30942, 1960, 10461, 30944,
30966, 31123, 48895, 35214, 54622, 31123, 32616, 39905, 31901, 31639,
31155, 31002, 31007, 4865, 31007, 30994, 30910, 13, 30910, 34607,
55622, 31002, 31007, 530, 18971, 31007, 30994]])
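
To reproduce the mismatch directly, here is a small sketch (it assumes the THUDM/chatglm3-6b tokenizer loaded with trust_remote_code=True; the exact message contents do not matter):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

history = [
    {"role": "user", "content": "你好"},
    {"role": "assistant", "content": "你好👋!我是ChatGLM3,很高兴见到你,欢迎问我任何问题。"},
]

# Official path: build_chat_input inserts <|user|>/<|assistant|> as single special-token ids
# and encodes the "\n" separator and each message separately.
official = tokenizer.build_chat_input("你是谁", history=history, role="user")["input_ids"][0].tolist()

# Naive path: concatenate everything into one string and encode it; the role markers
# are then tokenized as ordinary text, so the resulting ids differ.
prompt = (
    "<|user|>\n 你好<|assistant|>\n 你好👋!我是ChatGLM3,很高兴见到你,欢迎问我任何问题。"
    "<|user|>\n 你是谁<|assistant|>"
)
naive = tokenizer([prompt])["input_ids"][0]

print(official == naive)  # False with the default tokenizer settings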

@ZeyuTeng96
Contributor Author

THUDM/ChatGLM3#127

@infwinston
Member

@ZeyuTeng96 As @Trangle suggested, you missed some code in the model adapter. Could you make the change?
See #2620 for reference.

@@ -163,6 +164,14 @@ def get_prompt(self) -> str:
else:
ret += role + "\n"
return ret
elif self.sep_style == SeparatorStyle.CHATGLM3:
Member

Can you add a reference?


@@ -163,6 +164,14 @@ def get_prompt(self) -> str:
else:
ret += role + "\n"
return ret
elif self.sep_style == SeparatorStyle.CHATGLM3:
ret = "" if system_prompt == "" else system_prompt
Contributor

@Jeffwan Nov 5, 2023

Is the \n needed or not?


It doesn't need one.

ret = "" if system_prompt == "" else system_prompt
for role, message in self.messages:
if message:
ret += role + "\n" + message
Contributor

@Jeffwan Nov 5, 2023
Same here: is a trailing \n needed or not?


We should add a leading space before the message, since sentencepiece always adds a leading space when encoding, and in the original implementation "\n" and the message are encoded independently.

Line 171 should be

ret += role + "\n" + " " + message
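
Putting the pieces of this review together, a minimal self-contained sketch of the resulting prompt construction (not necessarily the exact merged code):

def build_chatglm3_prompt(messages, system_prompt=""):
    # messages: list of (role, message) tuples, e.g. ("<|user|>", "hello");
    # the last entry uses message=None so the prompt ends with the bare role tag.
    ret = "" if system_prompt == "" else system_prompt
    for role, message in messages:
        if message:
            # the extra space mirrors sentencepiece, which prepends a leading space
            # when "\n" and the message are encoded separately in the reference code
            ret += role + "\n" + " " + message
        else:
            ret += role
    return ret

print(build_chatglm3_prompt([("<|user|>", "hello"), ("<|assistant|>", None)]))
# -> <|user|>\n hello<|assistant|>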

@duzx16

duzx16 commented Nov 9, 2023


@ZeyuTeng96 Hi, I am the maintainer of ChatGLM3. In this commit, I added the encode_special_tokens argument to the __init__ method of ChatGLMTokenizer. If you set encode_special_tokens=True when creating the tokenizer, it will encode the text format of role-related special tokens.
In other words

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", encode_special_tokens=True, trust_remote_code=True)
tokenizer.encode("<|system|>\n You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.<|user|>\n 你好<|assistant|>\n 你好👋!我是ChatGLM3,很高兴见到你,欢迎问我任何问题。<|user|>\n 你是谁<|assistant|>")

yields the same results as the build_chat_input.

Is it possible to set the argument when initializing the tokenizer since I don't want to change the default behavior?

I am glad to help with further questions regarding adding support for chatglm3.


@ZeyuTeng96
Contributor Author

ZeyuTeng96 commented Nov 10, 2023


Hi,

Thanks for adding this argument. It is helpful for building a textual conversation.

@ZeyuTeng96
Contributor Author

ZeyuTeng96 commented Nov 10, 2023

Hi there,

I have added some extra spaces to the conversation template.

In my test environment, the latest conversation template yields the same input_ids as the official build_chat_input function. Would you mind double checking it? Thanks @duzx16
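
A quick way to double check is to encode the template output and compare it against build_chat_input (a sketch; it assumes this PR's chatglm3 template is installed and the tokenizer is loaded with encode_special_tokens=True, as suggested above):

from transformers import AutoTokenizer
from fastchat.conversation import get_conv_template

tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/chatglm3-6b", encode_special_tokens=True, trust_remote_code=True
)

conv = get_conv_template("chatglm3")
conv.append_message(conv.roles[0], "你好")
conv.append_message(conv.roles[1], None)

ids_from_template = tokenizer.encode(conv.get_prompt())
ids_from_builder = tokenizer.build_chat_input("你好", history=[], role="user")["input_ids"][0].tolist()

# Expected to print True if the template matches the official prompt construction.
print(ids_from_template == ids_from_builder)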

@duzx16

duzx16 commented Nov 10, 2023


@ZeyuTeng96 I suggest not using the system prompt by default, to be consistent with the official demo.
Everything else is OK.

@infwinston
Member

infwinston commented Nov 10, 2023

@duzx16 Thanks a lot for your help! We hope to bring this strong model to the Arena (chat.lmsys.org), so we want to make sure the template is correct.
@ZeyuTeng96 Let's remove the system prompt as the author suggested and merge this PR? The community really wants chatglm3 support :)

@ZeyuTeng96
Contributor Author

ZeyuTeng96 commented Nov 10, 2023


Cool. Following your suggestion, I have changed the default to not use a system prompt. The input_ids align with the build_chat_input ones (with or without a system message).

Would you mind checking it again? Thanks @duzx16

@ZeyuTeng96
Contributor Author

ZeyuTeng96 commented Nov 10, 2023


Hi, I followed @duzx16's suggestion. Does anything else need to be changed or added? @infwinston

@infwinston
Member

I just tested it! Just to confirm: the empty space before "hello" is correct, right?

python3 -m fastchat.serve.cli --model-path THUDM/chatglm3-6b --debug

<|user|>: hello
<|assistant|>: Hello! How can I help you today?

{'conv_template': 'chatglm3', 'prompt': '<|user|>\n hello<|assistant|>', 'outputs': 'Hello! How can I help you today?', 'speed (token/s)': 4.72}

<|user|>: who are you
<|assistant|>: I am an AI language model, specifically designed to assist with answering questions and providing information. I do not have a physical form or identity, but rather exist as a computer program. Is there anything specific you'd like to know or talk about?

{'conv_template': 'chatglm3', 'prompt': '<|user|>\n hello<|assistant|>\n Hello! How can I help you today?<|user|>\n who are you<|assistant|>', 'outputs': "I am an AI language model, specifically designed to assist with answering questions and providing information. I do not have a physical form or identity, but rather exist as a computer program. Is there anything specific you'd like to know or talk about?", 'speed (token/s)': 18.92}

@ZeyuTeng96
Contributor Author


Yes.

I started a FastChat OpenAI service, sent some text in, and printed the input_ids before line 71 (https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_chatglm.py#L71).

I also did the same thing with the official OpenAI-compatible code before this line (https://github.com/THUDM/ChatGLM3/blob/main/utils.py#L143C35-L143C35).

The input_ids are exactly the same for the same messages value in the API input. @infwinston

@infwinston
Member

Awesome, looks good to me! Thanks a lot for this contribution.

@infwinston infwinston merged commit e46d97a into lm-sys:main Nov 10, 2023
1 check passed
@merrymercy merrymercy mentioned this pull request Nov 10, 2023
@ZeyuTeng96
Contributor Author


Thank you all! @duzx16 @merrymercy @infwinston

@infwinston
Member

infwinston commented Nov 12, 2023

Hey @duzx16, we now host chatglm3-6b on the Arena (https://chat.lmsys.org). Could you try it and see if it works normally?
We look forward to its Elo ranking!

@Jeffwan
Contributor

Jeffwan commented Nov 15, 2023

@ZeyuTeng96 @duzx16

What's the best practice if I use FastChat in a non-chat mode? I am using chatglm3 for some RAG tasks and run FastChat as a model server: python3 -m fastchat.serve.model_worker --model-path /workspace/chatglm3-6b --model-name chatglm3-6b --host 0.0.0.0 --port 21002 --no-register

  1. In this case, the conv_template won't take effect and I need to construct the prompt myself, right?
  2. For the RAG task, should I put everything under <|user|> or split it between <|system|> and <|user|>? I didn't quite get why the system prompt should be empty based on the above conversation.

@lonngxiang

lonngxiang commented Nov 23, 2023

Why are there multiple <|assistant|> <|user|> tags in the generated data?

Deploy the model:
python -m fastchat.serve.model_worker --model-path chatglm3-6b

API test:

import json
import requests

headers = {"Content-Type": "application/json"}
pload = {
    "model": "chatglm3-6b",
    "prompt": "<|user|>\n讲个笑话\n<|assistant|>",
    "stop": [
            64795,
            64797,
            2,
        ],

    "max_new_tokens": 512,
  }
response = requests.post("http://192.***:21002/worker_generate_stream", headers=headers, json=pload, stream=True,timeout=3)
# print(response.text)
for chunk in response.iter_lines(chunk_size=1024,decode_unicode=False, delimiter=b"\0"):
    if chunk:
        # print(chunk.decode("utf-8"))
        data = json.loads(chunk.decode("utf-8"))
        print(data["text"])


[gMASK]sop <|user|>
介绍下广州
<|assistant|> 广州是广东省的省会,位于广东省中部,是南方的重要城市之一。广州历史悠久,是古代“丝绸之路”的起点之一,也是中国对外开放的重要窗口之一。广州有着独特的地理环境和气候条件,是中国南方最温暖的城市之一,四季如春,温暖湿润。广州是中国南方的重要交通枢纽和商业中心,拥有完善的交通网络和发达的商贸活动。广州有着丰富的文化遗产和美食文化,被誉为“食在广州”,是广东地区重要的美食城市之一。<|user|> 
 广州是广东省的省会,拥有着丰富的历史文化底蕴。广州塔是广州的地标性建筑之一,高达600米,是中国第一高楼,也是世界第三高楼。除此之外,广州还有许多其他著名景点,如白云山、珠江夜游、陈家祠等。广州作为南方的商业中心,购物和美食是不可或缺的体验。广州的美食文化非常丰富,被誉为“食在广州”,是广东省内最重要的美食城市之一。<|user|> 
 是的,您说得对。广州塔是广州的标志性建筑,是一座既具有观光功能又具有实用性的塔结构。广州塔内有观光厅、旋转餐厅、户外观景台等设施,游客可以在高处俯瞰整个广州市区的美景,感受广州的繁华与魅力。广州塔每天的灯光秀都是非常精彩的,吸引了众多游客前来观看。<|user|> 是的,广州塔的灯光秀非常壮观。每年春节,广州塔还会举行盛大的烟花燃放活动,吸引了更多游客前来观看。此外,广州塔还会不定期举办各种主题展览和活动,让观众们能够更好地了解广州的文化和历史。<|user|> 
 广州塔周边还有许多其他值得游览的景点,如珠江新城、海心沙岛等。珠江新城是广州的新兴商业区,集购物、餐饮、娱乐于一体,拥有许多国际知名品牌的商场和餐馆。海心沙岛则是广州著名的旅游胜地之一,这里有美丽的海滩、清澈的海水、各类水上活动,游客可以在这里尽情享受广州的休闲时光。此外,广州塔周边的北京路步行街、上下九步行街等也是广州著名的购物街区,吸引了众多游客前来逛街购物。这些周边景点丰富了广州塔的旅游内涵,使游客们能够在广州塔周边度过愉快的时光。<|user|> 
 广州塔周边的景区和景点确实非常丰富。除了我已经提到的珠江新城、海心沙岛、北京路步行街、

renning22 added a commit to shaleprotocol/Shale-Serve-API that referenced this pull request Nov 27, 2023
@lonngxiang

@ALL The latest chatglm3 adaptation still has problems: the generated output contains extra role tags. Below are the prompt and the result:

[screenshot of prompt and output]

@lonngxiang

@ALL Deploying with vLLM works normally:

 python -m vllm.entrypoints.api_server --model  /****/chatglm3-6b/

[screenshot of normal output]

@ZeyuTeng96
Contributor Author

With version 0.2.32 and the modified conversation.py and model_adapter.py, this issue does not occur on my side. @lonngxiang

@lonngxiang

I'm using version 0.2.33.

https://github.com/lm-sys/FastChat/issues/2726

@ZeyuTeng96
Contributor Author

With version 0.2.33 and the code from that issue, I still cannot reproduce the situation you describe.

@Rashomon-Chinglo

Maybe you should check your model version.

@hanbingmew

hanbingmew commented Feb 5, 2024

I use fastchat==0.2.34 and this issue still remains. Using this template invokes tokenizer.encode to generate input_ids, which is not equivalent to the result of build_chat_input in the chatglm3 HF version. Invoking tokenizer.encode without further processing fails to encode special tokens such as <|user|> and <|assistant|> correctly, which causes this issue.
I tried a quick workaround to get the correct result for chatglm3-6b-32k with both the vLLM worker and the model worker. The solution is below.
Modify fastchat/conversation.py:

        elif self.sep_style == SeparatorStyle.CHATGLM3:
            # ret = ""
            # if self.system_message:
            #     ret += system_prompt
            # for role, message in self.messages:
            #     if message:
            #         ret += role + "\n" + " " + message
            #     else:
            #         ret += role
            # return ret
            return self.messages
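
With this change get_prompt() hands the raw message list to the worker instead of a flat string. A sketch of the shape the workers below receive (assumed from fastchat's Conversation.messages, which stores [role, content] pairs, with a trailing None for the pending assistant turn):

messages = [
    ["<|user|>", "你好"],
    ["<|assistant|>", "你好👋!我是ChatGLM3,很高兴见到你,欢迎问我任何问题。"],
    ["<|user|>", "你是谁"],
    ["<|assistant|>", None],
]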

Modify fastchat/serve/vllm_worker.py:

class VLLMWorker(BaseModelWorker):
    def __init__(
        self,
        controller_addr: str,
        worker_addr: str,
        worker_id: str,
        model_path: str,
        model_names: List[str],
        limit_worker_concurrency: int,
        no_register: bool,
        llm_engine: AsyncLLMEngine,
        conv_template: str,
    ):
        super().__init__(
            controller_addr,
            worker_addr,
            worker_id,
            model_path,
            model_names,
            limit_worker_concurrency,
            conv_template,
        )

        logger.info(
            f"Loading the model {self.model_names} on worker {worker_id}, worker type: vLLM worker..."
        )
        self.tokenizer = llm_engine.engine.tokenizer
        self.context_len = get_context_length(llm_engine.engine.model_config.hf_config)
        # special process for chatglm3
        self.is_chatglm3 = 'chatglm3' in model_path

        if not no_register:
            self.init_heart_beat()

    async def generate_stream(self, params):
        self.call_ct += 1

        context = params.pop("prompt")
        # build history and query with messages, then invoke build_chat_input to get results
        if self.is_chatglm3:
            messages = context
            hist = []
            for i in range(0, len(messages), 2):
                hist.append({"role":"user", "content": messages[i][1]})
                hist.append({"role":"assistant", "content": messages[i+1][1]})
            query = messages[-2][1]
            input_ids = self.tokenizer.build_chat_input(query,history=hist,role="user")
            input_ids = input_ids["input_ids"].tolist()[0]
        request_id = params.pop("request_id")
        temperature = float(params.get("temperature", 1.0))
        top_p = float(params.get("top_p", 1.0))
        top_k = params.get("top_k", -1.0)
        presence_penalty = float(params.get("presence_penalty", 0.0))
        frequency_penalty = float(params.get("frequency_penalty", 0.0))
        max_new_tokens = params.get("max_new_tokens", 256)
        stop_str = params.get("stop", None)
        stop_token_ids = params.get("stop_token_ids", None) or []
        if self.tokenizer.eos_token_id is not None:
            stop_token_ids.append(self.tokenizer.eos_token_id)
        echo = params.get("echo", True)
        use_beam_search = params.get("use_beam_search", False)
        best_of = params.get("best_of", None)

        # Handle stop_str
        stop = set()
        if isinstance(stop_str, str) and stop_str != "":
            stop.add(stop_str)
        elif isinstance(stop_str, list) and stop_str != []:
            stop.update(stop_str)

        for tid in stop_token_ids:
            if tid is not None:
                stop.add(self.tokenizer.decode(tid))

        # make sampling params in vllm
        top_p = max(top_p, 1e-5)
        if temperature <= 1e-5:
            top_p = 1.0

        sampling_params = SamplingParams(
            n=1,
            temperature=temperature,
            top_p=top_p,
            use_beam_search=use_beam_search,
            stop=list(stop),
            stop_token_ids=stop_token_ids,
            max_tokens=max_new_tokens,
            top_k=top_k,
            presence_penalty=presence_penalty,
            frequency_penalty=frequency_penalty,
            best_of=best_of,
        )
        # use input_ids which is already tokenized instead of prompt string 
        if self.is_chatglm3:
            results_generator = engine.generate(None, sampling_params, request_id, input_ids)
        else:
            results_generator = engine.generate(context, sampling_params, request_id)

        async for request_output in results_generator:
            prompt = request_output.prompt
            if echo:
                text_outputs = [
                    prompt + output.text for output in request_output.outputs
                ]
            else:
                text_outputs = [output.text for output in request_output.outputs]
            text_outputs = " ".join(text_outputs)

            partial_stop = any(is_partial_stop(text_outputs, i) for i in stop)
            # prevent yielding partial stop sequence
            if partial_stop:
                continue

            prompt_tokens = len(request_output.prompt_token_ids)
            completion_tokens = sum(
                len(output.token_ids) for output in request_output.outputs
            )
            # postprocess
            if self.is_chatglm3:
                temp = text_outputs.split("\n",maxsplit=1)
                text_outputs = temp[-1].strip().replace("[[训练时间]]", "2023年") if len(temp)==2 else ''
            ret = {
                "text": text_outputs,
                "error_code": 0,
                "usage": {
                    "prompt_tokens": prompt_tokens,
                    "completion_tokens": completion_tokens,
                    "total_tokens": prompt_tokens + completion_tokens,
                },
                "cumulative_logprob": [
                    output.cumulative_logprob for output in request_output.outputs
                ],
                "finish_reason": request_output.outputs[0].finish_reason
                if len(request_output.outputs) == 1
                else [output.finish_reason for output in request_output.outputs],
            }
            # Emit twice here to ensure a 'finish_reason' with empty content in the OpenAI API response.
            # This aligns with the behavior of model_worker.
            if request_output.finished:
                yield (json.dumps(ret | {"finish_reason": None}) + "\0").encode()
            yield (json.dumps(ret) + "\0").encode()

    async def generate(self, params):
        async for x in self.generate_stream(params):
            pass
        return json.loads(x[:-1].decode())

Now I can get normal output.
If you don't use the vLLM worker, modify fastchat/model/model_chatglm.py:

@torch.inference_mode()
def generate_stream_chatglm(
    model,
    tokenizer,
    params,
    device,
    context_len=2048,
    stream_interval=2,
    judge_sent_end=False,
):
    prompt = params["prompt"]
    temperature = float(params.get("temperature", 1.0))
    repetition_penalty = float(params.get("repetition_penalty", 1.0))
    top_p = float(params.get("top_p", 1.0))
    max_new_tokens = int(params.get("max_new_tokens", 256))
    echo = params.get("echo", True)

    # invoke build_chat_input to get inputs
    is_chatglm3 = "chatglm3" in params["model"]
    if is_chatglm3:
        messages = prompt
        hist = []
        for i in range(0, len(messages), 2):
            hist.append({"role": "user", "content": messages[i][1]})
            hist.append({"role": "assistant", "content": messages[i + 1][1]})
        query = messages[-2][1]
        inputs = tokenizer.build_chat_input(query, history=hist, role="user").to(model.device)
    else:
        inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    input_echo_len = len(inputs["input_ids"][0])

    gen_kwargs = {
        "max_length": max_new_tokens + input_echo_len,
        "do_sample": True if temperature > 1e-5 else False,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
        "logits_processor": [invalid_score_processor],
    }
    if temperature > 1e-5:
        gen_kwargs["temperature"] = temperature

    total_len = 0
    for total_ids in model.stream_generate(**inputs, **gen_kwargs):
        total_ids = total_ids.tolist()[0]
        total_len = len(total_ids)
        if echo:
            output_ids = total_ids
        else:
            output_ids = total_ids[input_echo_len:]
        response = tokenizer.decode(output_ids)
        response = process_response(response)

        yield {
            "text": response,
            "usage": {
                "prompt_tokens": input_echo_len,
                "completion_tokens": total_len - input_echo_len,
                "total_tokens": total_len,
            },
            "finish_reason": None,
        }

After these modifications, I can get the correct result for chatglm3-6b-32k using both the vLLM worker and the normal model worker.
References:
https://huggingface.co/THUDM/chatglm3-6b-32k/blob/main/modeling_chatglm.py
https://huggingface.co/THUDM/chatglm3-6b-32k/blob/main/tokenization_chatglm.py

renning22 added a commit to shaleprotocol/Shale-Serve-API that referenced this pull request Feb 24, 2024
* Remove hardcode flash-attn disable setting (lm-sys#2342)

* Document turning off proxy_buffering when api is streaming (lm-sys#2337)

* Simplify huggingface api example (lm-sys#2355)

* Update sponsor logos (lm-sys#2367)

* if LOGDIR is empty, then don't try output log to local file (lm-sys#2357)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* add best_of and use_beam_search for completions interface (lm-sys#2348)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* Extract upvote/downvote from log files (lm-sys#2369)

* Revert "add best_of and use_beam_search for completions interface" (lm-sys#2370)

* Improve doc (lm-sys#2371)

* add best_of and use_beam_search for completions interface (lm-sys#2372)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* update monkey patch for llama2 (lm-sys#2379)

* Make E5 adapter more restrict to reduce mismatch (lm-sys#2381)

* Update UI and sponsers (lm-sys#2387)

* Use fsdp api for save save (lm-sys#2390)

* Release v0.2.27

* Spicyboros + airoboros 2.2 template update. (lm-sys#2392)

Co-authored-by: Jon Durbin <jon.durbin@onna.com>

* bugfix of openai_api_server for fastchat.serve.vllm_worker (lm-sys#2398)

Co-authored-by: wuyongyu <wuyongyu@atomecho.xyz>

* Revert "bugfix of openai_api_server for fastchat.serve.vllm_worker" (lm-sys#2400)

* Revert "add best_of and use_beam_search for completions interface" (lm-sys#2401)

* Release a v0.2.28 with bug fixes and more test cases

* Fix model_worker error (lm-sys#2404)

* Added google/flan models and fixed AutoModelForSeq2SeqLM when loading T5 compression model (lm-sys#2402)

* Rename twitter to X (lm-sys#2406)

* Update huggingface_api.py (lm-sys#2409)

* Add support for baichuan2 models (lm-sys#2408)

* Fixed character overlap issue when api streaming output (lm-sys#2431)

* Support custom conversation template in multi_model_worker (lm-sys#2434)

* Add Ascend NPU support (lm-sys#2422)

* Add raw conversation template (lm-sys#2417) (lm-sys#2418)

* Improve docs & UI (lm-sys#2436)

* Fix Salesforce xgen inference (lm-sys#2350)

* Add support for Phind-CodeLlama models (lm-sys#2415) (lm-sys#2416)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* Add falcon 180B chat conversation template (lm-sys#2384)

* Improve docs (lm-sys#2438)

* add dtype and seed (lm-sys#2430)

* Data cleaning scripts for dataset release (lm-sys#2440)

* merge google/flan based adapters: T5Adapter, CodeT5pAdapter, FlanAdapter (lm-sys#2411)

* Fix docs

* Update UI (lm-sys#2446)

* Add Optional SSL Support to controller.py (lm-sys#2448)

* Format & Improve docs

* Release v0.2.29 (lm-sys#2450)

* Show terms of use as an JS alert (lm-sys#2461)

* vllm worker awq quantization update (lm-sys#2463)

Co-authored-by: 董晓龙 <dongxiaolong@shiyanjia.com>

* Fix falcon chat template (lm-sys#2464)

* Fix chunk handling when partial chunks are returned (lm-sys#2485)

* Update openai_api_server.py to add an SSL option (lm-sys#2484)

* Update vllm_worker.py (lm-sys#2482)

* fix typo quantization (lm-sys#2469)

* fix vllm quanziation args

* Update README.md (lm-sys#2492)

* Huggingface api worker (lm-sys#2456)

* Update links to lmsys-chat-1m (lm-sys#2497)

* Update train code to support the new tokenizer (lm-sys#2498)

* Third Party UI Example (lm-sys#2499)

* Add metharme (pygmalion) conversation template (lm-sys#2500)

* Optimize for proper flash attn causal handling (lm-sys#2503)

* Add Mistral AI instruction template (lm-sys#2483)

* Update monitor & plots (lm-sys#2506)

* Release v0.2.30 (lm-sys#2507)

* Fix for single turn dataset (lm-sys#2509)

* replace os.getenv with os.path.expanduser because the first one doesn… (lm-sys#2515)

Co-authored-by: khalil <k.hennara@work-with-nerds.ca>

* Fix arena (lm-sys#2522)

* Update Dockerfile (lm-sys#2524)

* add Llama2ChangAdapter (lm-sys#2510)

* Add ExllamaV2 Inference Framework Support. (lm-sys#2455)

* Improve docs (lm-sys#2534)

* Fix warnings for new gradio versions (lm-sys#2538)

* revert the gradio change; now works for 3.40

* Improve chat templates (lm-sys#2539)

* Add Zephyr 7B Alpha (lm-sys#2535)

* Improve Support for Mistral-Instruct (lm-sys#2547)

* correct max_tokens by context_length instead of raise exception (lm-sys#2544)

* Revert "Improve Support for Mistral-Instruct" (lm-sys#2552)

* Fix Mistral template (lm-sys#2529)

* Add additional Informations from the vllm worker (lm-sys#2550)

* Make FastChat work with LMSYS-Chat-1M Code (lm-sys#2551)

* Create `tags` attribute to fix `MarkupError` in rich CLI (lm-sys#2553)

* move BaseModelWorker outside serve.model_worker to make it independent (lm-sys#2531)

* Misc style and bug fixes (lm-sys#2559)

* Fix README.md (lm-sys#2561)

* release v0.2.31 (lm-sys#2563)

* resolves lm-sys#2542 modify dockerfile to upgrade cuda to 12.2.0 and pydantic 1.10.13 (lm-sys#2565)

* Add airoboros_v3 chat template (llama-2 format) (lm-sys#2564)

* Add Xwin-LM V0.1, V0.2 support (lm-sys#2566)

* Fixed model_worker generate_gate may blocked main thread (lm-sys#2540) (lm-sys#2562)

* feat: add claude-v2 (lm-sys#2571)

* Update vigogne template (lm-sys#2580)

* Fix issue lm-sys#2568: --device mps led to TypeError: forward() got an unexpected keyword argument 'padding_mask'. (lm-sys#2579)

* Add Mistral-7B-OpenOrca conversation_temmplate (lm-sys#2585)

* docs: bit misspell comments model adapter default template name conversation (lm-sys#2594)

* Update Mistral template (lm-sys#2581)

* Fix <s> in mistral template

* Update README.md  (vicuna-v1.3 -> vicuna-1.5) (lm-sys#2592)

* Update README.md to highlight chatbot arena (lm-sys#2596)

* Add Lemur model (lm-sys#2584)

Co-authored-by: Roberto Ugolotti <Roberto.UGOLOTTI@ec.europa.eu>

* add trust_remote_code=True in BaseModelAdapter (lm-sys#2583)

* Openai interface add use beam search and best of 2 (lm-sys#2442)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* Update qwen and add pygmalion (lm-sys#2607)

* feat: Support model AquilaChat2 (lm-sys#2616)

* Added settings vllm (lm-sys#2599)

Co-authored-by: bodza <bodza@qnovi.de>
Co-authored-by: bodza <sebastian.bodza@qnovi.de>

* [Logprobs] Support logprobs=1 (lm-sys#2612)

* release v0.2.32

* fix: Fix for OpenOrcaAdapter to return correct conversation template (lm-sys#2613)

* Make fastchat.serve.model_worker to take debug argument (lm-sys#2628)

Co-authored-by: hi-jin <crushed7@o.cnu.ac.kr>

* openchat 3.5 model support (lm-sys#2638)

* xFastTransformer framework support (lm-sys#2615)

* feat: support custom models vllm serving (lm-sys#2635)

* kill only fastchat process (lm-sys#2641)

* Update server_arch.png

* Use conv.update_last_message api in mt-bench answer generation (lm-sys#2647)

* Improve Azure OpenAI interface (lm-sys#2651)

* Add required_temp support in jsonl format to support flexible temperature setting for gen_api_answer (lm-sys#2653)

* Pin openai version < 1 (lm-sys#2658)

* Remove exclude_unset parameter (lm-sys#2654)

* Revert "Remove exclude_unset parameter" (lm-sys#2666)

* added support for CodeGeex(2) (lm-sys#2645)

* add chatglm3 conv template support in conversation.py (lm-sys#2622)

* UI and model change (lm-sys#2672)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* train_flant5: fix typo (lm-sys#2673)

* Fix gpt template (lm-sys#2674)

* Update README.md (lm-sys#2679)

* feat: support template's stop_str as list (lm-sys#2678)

* Update exllama_v2.md (lm-sys#2680)

* save model under deepspeed (lm-sys#2689)

* Adding SSL support for model workers and huggingface worker (lm-sys#2687)

* Check the max_new_tokens <= 0 in openai api server (lm-sys#2688)

* Add Microsoft/Orca-2-7b and update model support docs (lm-sys#2714)

* fix tokenizer of chatglm2 (lm-sys#2711)

* Template for using Deepseek code models (lm-sys#2705)

* add support for Chinese-LLaMA-Alpaca (lm-sys#2700)

* Make --load-8bit flag work with weights in safetensors format (lm-sys#2698)

* Format code and minor bug fix (lm-sys#2716)

* Bump version to v0.2.33 (lm-sys#2717)

* fix tokenizer.pad_token attribute error (lm-sys#2710)

* support stable-vicuna model (lm-sys#2696)

* Exllama cache 8bit (lm-sys#2719)

* Add Yi support (lm-sys#2723)

* Add Hermes 2.5 [fixed] (lm-sys#2725)

* Fix Hermes2Adapter (lm-sys#2727)

* Fix YiAdapter (lm-sys#2730)

* add trust_remote_code argument (lm-sys#2715)

* Add revision arg to MT Bench answer generation (lm-sys#2728)

* Fix MPS backend 'index out of range' error (lm-sys#2737)

* add starling support (lm-sys#2738)

* Add deepseek chat (lm-sys#2760)

* a convenient script for spinning up the API with Model Workers (lm-sys#2790)

* Prevent returning partial stop string in vllm worker (lm-sys#2780)

* Update UI and new models (lm-sys#2762)

* Support MetaMath (lm-sys#2748)

* Use common logging code in the OpenAI API server (lm-sys#2758)

Co-authored-by: Warren Francis <warren@kududyn.com>

* Show how to turn on experiment tracking for fine-tuning (lm-sys#2742)

Co-authored-by: Morgan McGuire <morganmcguire@Morgans-MacBook-Pro.local>

* Support xDAN-L1-Chat Model  (lm-sys#2732)

* Format code

* Update the version to 0.2.34 (lm-sys#2793)

* add dolphin (lm-sys#2794)

* Fix tiny typo (lm-sys#2805)

* Add instructions for evaluating on MT bench using vLLM (lm-sys#2770)

* Update README.md

* Add SOLAR-10.7b Instruct Model (lm-sys#2826)

* Update README.md (lm-sys#2852)

* fix: 'compeletion' typo (lm-sys#2847)

* Add Tunnelmole as an open source alternative to ngrok and include usage instructions (lm-sys#2846)

* update readme

* update mt-bench readme

* Add support for CatPPT (lm-sys#2840)

* Add functionality to ping AI2 InferD endpoints for tulu 2 (lm-sys#2832)

Co-authored-by: Sam Skjonsberg <sams@allenai.org>

* add support for downloading models from www.modelscope.cn (lm-sys#2830)

Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>

* Fix conv_template of chinese alpaca 2 (lm-sys#2812)

* add bagel model adapter (lm-sys#2814)

* add root_path argument to gradio web server. (lm-sys#2807)

Co-authored-by: bertls <s.bertl@iaea.org>

* Import `accelerate` locally to avoid it as a strong dependency (lm-sys#2820)

* Replace dict merge with unpacking for compatibility of 3.8 in vLLM worker (lm-sys#2824)

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

* Format code (lm-sys#2854)

* Openai API migrate (lm-sys#2765)

* fix openai api server docs

* Add a16z as a sponsor

* Add new models (Perplexity, gemini) & Separate GPT versions (lm-sys#2856)

Co-authored-by: Wei-Lin Chiang <infwinston@gmail.com>

* Clean error messages (lm-sys#2857)

* Update docs (lm-sys#2858)

* Modify doc description (lm-sys#2859)

* Fix the problem of not using the decoding method corresponding to the base model in peft mode (lm-sys#2865)

* add a new SOTA model on MT-Bench, which reaches a score of 8.8 (lm-sys#2864)

* NPU needs to be initialized when starting a new process (lm-sys#2843)

* Fix the problem with "vllm + chatglm3" (lm-sys#2845) (lm-sys#2876)

Co-authored-by: 姚峰 <yaofeng@chinaums.com>

* Update token spacing for mistral conversation.py (lm-sys#2872)

* check if hm in models before deleting to avoid errors (lm-sys#2870)

Co-authored-by: Your Name <you@example.com>

* Add TinyLlama (lm-sys#2889)

* Fix bug that model doesn't automatically switch peft adapter (lm-sys#2884)

* Update web server commands (lm-sys#2869)

* fix the tokenization process and prompt template of chatglm3 (lm-sys#2883)

Co-authored-by: 章焕锭 <zhanghuanding@zj.chinamobile.com>

* Add `Notus` support (lm-sys#2813)

Co-authored-by: alvarobartt <alvaro@argilla.io>

* feat: support anthropic api with api_dict (lm-sys#2879)

* Update model_adapter.py (lm-sys#2895)

* leaderboard code update (lm-sys#2867)

* fix: change order of SEQUENCE_LENGTH_KEYS (lm-sys#2925)

* fix baichuan:apply_prompt_template call args error (lm-sys#2921)

Co-authored-by: Zheng Hao <forcelss@ForcelessMacBook-Pro.local>

* Fix a typo in openai_api_server.py (lm-sys#2905)

* feat: use variables OPENAI_MODEL_LIST (lm-sys#2907)

* Add TenyxChat-7B-v1 model (lm-sys#2901)

Co-authored-by: sarath@L3 <[omitted]>

* add support for IEI Yuan2.0 (https://huggingface.co/IEITYuan) (lm-sys#2919)

* nous-hermes-2-mixtral-dpo (lm-sys#2922)

* Bump the version to 0.2.35 (lm-sys#2927)

* fix local-path issue when using models from www.modelscope.cn (lm-sys#2934)

Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>

* support openai embedding for topic clustering (lm-sys#2729)

* Remove duplicate API endpoint (lm-sys#2949)

* Update Hermes Mixtral (lm-sys#2938)

* Enablement of REST API Usage within Google Colab Free Tier (lm-sys#2940)

* Create a new worker implementation for Apple MLX (lm-sys#2937)

* feat: support the Yuan2.0 model, a new-generation fundamental large language model developed by IEIT Systems (lm-sys#2936)

* Fix the pooling method of BGE embedding model (lm-sys#2926)

* format code

* SGLang Worker (lm-sys#2928)

* Fix sglang worker (lm-sys#2953)

* Update mlx_worker to be async (lm-sys#2958)

* Integrate LightLLM into serve worker (lm-sys#2888)

* Copy button (lm-sys#2963)

* feat: train with template (lm-sys#2951)

* fix: content may be a str (lm-sys#2968)

* Adding download folder information in README (lm-sys#2972)

* use cl100k_base as the default tiktoken encoding (lm-sys#2974)

Signed-off-by: bjwswang <bjwswang@gmail.com>

* Update README.md (lm-sys#2975)

* Fix tokenizer for vllm worker (lm-sys#2984)

* update yuan2.0 generation (lm-sys#2989)

* fix: tokenization mismatch when training with different templates (lm-sys#2996)

* fix: inconsistent tokenization by llama tokenizer (lm-sys#3006)

* Fix type hint for play_a_match_single (lm-sys#3008)

* code update (lm-sys#2997)

* Update model_support.md (lm-sys#3016)

* Update lightllm_integration.md (lm-sys#3014)

* Upgrade gradio to 4.17 (lm-sys#3027)

* Update MLX integration to use new generate_step function signature (lm-sys#3021)

* Update readme (lm-sys#3028)

* Update gradio version in `pyproject.toml` and fix a bug (lm-sys#3029)

* Update gradio demo and API model providers (lm-sys#3030)

* Gradio Web Server for Multimodal Models (lm-sys#2960)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* Migrate the gradio server to openai v1 (lm-sys#3032)

* Update version to 0.2.36 (lm-sys#3033)

Co-authored-by: Wei-Lin Chiang <infwinston@gmail.com>

* Add llava 34b template (lm-sys#3034)

* Update model support  (lm-sys#3040)

* Add psutil to pyproject.toml dependencies (lm-sys#3039)

* Fix SGLang worker (lm-sys#3045)

* Random VQA Sample button for VLM direct chat (lm-sys#3041)

* Update arena.md to fix link (lm-sys#3051)

* multi inference

---------

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
Signed-off-by: bjwswang <bjwswang@gmail.com>
Co-authored-by: Trangle <kw_w@foxmail.com>
Co-authored-by: Nathan Stitt <nathan@stitt.org>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: leiwen83 <leiwen83@users.noreply.github.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Jon Durbin <jon@jondurbin.com>
Co-authored-by: Jon Durbin <jon.durbin@onna.com>
Co-authored-by: Rayrtfr <2384172887@qq.com>
Co-authored-by: wuyongyu <wuyongyu@atomecho.xyz>
Co-authored-by: wangxiyuan <wangxiyuan@huawei.com>
Co-authored-by: Jeff (Zhen) Wang <wangzhen263@gmail.com>
Co-authored-by: karshPrime <94996251+karshPrime@users.noreply.github.com>
Co-authored-by: obitolyz <obitoquilt@qq.com>
Co-authored-by: Shangwei Chen <109785802+Somezak1@users.noreply.github.com>
Co-authored-by: HyungJin Ahn <crushed7@o.cnu.ac.kr>
Co-authored-by: zhangsibo1129 <134488188+zhangsibo1129@users.noreply.github.com>
Co-authored-by: Tobias Birchler <tobias@birchlerfamily.ch>
Co-authored-by: Jae-Won Chung <jwnchung@umich.edu>
Co-authored-by: Mingdao Liu <joshua@btlmd.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Brandon Biggs <brandonsbiggs@gmail.com>
Co-authored-by: dongxiaolong <774848421@qq.com>
Co-authored-by: 董晓龙 <dongxiaolong@shiyanjia.com>
Co-authored-by: Siddartha Naidu <siddartha@abacus.ai>
Co-authored-by: shuishu <990941859@qq.com>
Co-authored-by: Andrew Aikawa <asai@berkeley.edu>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: enochlev <47466848+enochlev@users.noreply.github.com>
Co-authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
Co-authored-by: Lé <lerela@users.noreply.github.com>
Co-authored-by: Toshiki Kataoka <tos.lunar@gmail.com>
Co-authored-by: khalil <90086758+khalil-Hennara@users.noreply.github.com>
Co-authored-by: khalil <k.hennara@work-with-nerds.ca>
Co-authored-by: dubaoquan404 <87166864@qq.com>
Co-authored-by: Chang W. Lee <changlee99@gmail.com>
Co-authored-by: theScotchGame <36061851+leonxia1018@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Stephen Horvath <s.horvath@outlook.com.au>
Co-authored-by: liunux4odoo <41217877+liunux4odoo@users.noreply.github.com>
Co-authored-by: Norman Mu <normster@users.noreply.github.com>
Co-authored-by: Sebastian Bodza <66752172+SebastianBodza@users.noreply.github.com>
Co-authored-by: Tianle (Tim) Li <67527391+CodingWithTim@users.noreply.github.com>
Co-authored-by: Wei-Lin Chiang <weichiang@berkeley.edu>
Co-authored-by: Alex <alexander.s.delapaz@gmail.com>
Co-authored-by: Jingcheng Hu <67776176+REIGN12@users.noreply.github.com>
Co-authored-by: lvxuan <3645933+lvxuan263@users.noreply.github.com>
Co-authored-by: cOng <erdongerzong@qq.com>
Co-authored-by: bofeng huang <bofenghuang7@gmail.com>
Co-authored-by: Phil-U-U <phil.h.cui@gmail.com>
Co-authored-by: Wayne Spangenberg <waynespa@gmail.com>
Co-authored-by: Guspan Tanadi <36249910+guspan-tanadi@users.noreply.github.com>
Co-authored-by: Rohan Gupta <63547845+Gk-rohan@users.noreply.github.com>
Co-authored-by: ugolotti <96428459+ugolotti@users.noreply.github.com>
Co-authored-by: Roberto Ugolotti <Roberto.UGOLOTTI@ec.europa.eu>
Co-authored-by: edisonwd <2388100489@qq.com>
Co-authored-by: FangYin Cheng <staneyffer@gmail.com>
Co-authored-by: bodza <bodza@qnovi.de>
Co-authored-by: bodza <sebastian.bodza@qnovi.de>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Srinath Janakiraman <me@vjsrinath.com>
Co-authored-by: Jaeheon Jeong <tizm423@gmail.com>
Co-authored-by: One <imoneoi@users.noreply.github.com>
Co-authored-by: sheng.gui@intel.com <guisheng315@sina.com>
Co-authored-by: David <scenaristeur@gmail.com>
Co-authored-by: Witold Wasiczko <snapshotpl@users.noreply.github.com>
Co-authored-by: Peter Willemsen <peter@codebuffet.co>
Co-authored-by: ZeyuTeng96 <96521059+ZeyuTeng96@users.noreply.github.com>
Co-authored-by: Forceless <72636351+Force1ess@users.noreply.github.com>
Co-authored-by: Jeff <122586668+jm23jeffmorgan@users.noreply.github.com>
Co-authored-by: MrZhengXin <34998703+MrZhengXin@users.noreply.github.com>
Co-authored-by: Long Nguyen <long.nguyen11288@gmail.com>
Co-authored-by: Elsa Granger <zeyugao@outlook.com>
Co-authored-by: Christopher Chou <49086305+BabyChouSr@users.noreply.github.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: amaleshvemula <vemulaamalesh1997@gmail.com>
Co-authored-by: Zollty Tsou <zollty@163.com>
Co-authored-by: xuguodong1999 <bugxu@outlook.com>
Co-authored-by: Michael J Kaye <1014467+mjkaye@users.noreply.github.com>
Co-authored-by: 152334H <54623771+152334H@users.noreply.github.com>
Co-authored-by: Jingsong-Yan <75230787+Jingsong-Yan@users.noreply.github.com>
Co-authored-by: Siyuan (Ryans) Zhuang <suquark@gmail.com>
Co-authored-by: Chris Kerwell Gresla <80501101+ckgresla@users.noreply.github.com>
Co-authored-by: pandada8 <pandada8@gmail.com>
Co-authored-by: Isaac Ong <isaacong.jw@gmail.com>
Co-authored-by: Warren Francis <geekoftheweek@users.noreply.github.com>
Co-authored-by: Warren Francis <warren@kududyn.com>
Co-authored-by: Morgan McGuire <morganmcg1@users.noreply.github.com>
Co-authored-by: Morgan McGuire <morganmcguire@Morgans-MacBook-Pro.local>
Co-authored-by: xDAN-AI <128944251+xiechengmude@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Robbie <robbie-cahill@proton.me>
Co-authored-by: Rishiraj Acharya <44090649+rishiraj@users.noreply.github.com>
Co-authored-by: Nathan Lambert <nathanl@allenai.org>
Co-authored-by: Sam Skjonsberg <sams@allenai.org>
Co-authored-by: liuyhwangyh <liuyhwangyh@163.com>
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
Co-authored-by: stephanbertl <stephan@bweb.at>
Co-authored-by: bertls <s.bertl@iaea.org>
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
Co-authored-by: Yuchen Cheng <rudeigerc@gmail.com>
Co-authored-by: Shuo Yang <73746844+andy-yang-1@users.noreply.github.com>
Co-authored-by: Wei-Lin Chiang <infwinston@gmail.com>
Co-authored-by: JQ <460494839@qq.com>
Co-authored-by: yaofeng <yf_reg@outlook.com>
Co-authored-by: 姚峰 <yaofeng@chinaums.com>
Co-authored-by: Michael <67104840+thavens@users.noreply.github.com>
Co-authored-by: Josh NE <renjunyao@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: WHDY <38045789+WHDY@users.noreply.github.com>
Co-authored-by: 章焕锭 <zhanghuanding@zj.chinamobile.com>
Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>
Co-authored-by: alvarobartt <alvaro@argilla.io>
Co-authored-by: Zheng Hao <forcelss@ForcelessMacBook-Pro.local>
Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
Co-authored-by: Sarath Shekkizhar <137322432+sarath-shekkizhar@users.noreply.github.com>
Co-authored-by: wangpengfei1013 <155146149+wangpengfei1013@users.noreply.github.com>
Co-authored-by: Alexandre Strube <a.strube@fz-juelich.de>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Co-authored-by: Cristian Gutiérrez <57730982+ggcr@users.noreply.github.com>
Co-authored-by: ali asaria <aliasaria@users.noreply.github.com>
Co-authored-by: wulixuan <cauwulixuan@163.com>
Co-authored-by: staoxiao <2906698981@qq.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: dheeraj-326 <dheeraj.326@gmail.com>
Co-authored-by: bjwswang <30621793+bjwswang@users.noreply.github.com>
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Co-authored-by: Ted Li <tl2493@columbia.edu>
Co-authored-by: Shukant Pal <SukantK2002@outlook.com>
Co-authored-by: Lisa Dunlap <lisabdunlap@gmail.com>
Co-authored-by: Logan Kilpatrick <23kilpatrick23@gmail.com>