Does it support inference acceleration for Qwen-14B? #25

dashi6174 · 2023-12-04T08:15:18Z

Qwen-14B: https://github.com/QwenLM/Qwen

Chillee · 2023-12-04T21:15:05Z

It's similar to the llama architecture, so it should be easy to modify model.py to support it.
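As a rough sketch of what such a modification might involve: much of the work in porting a llama-like model is usually renaming checkpoint weights to the names the target model definition expects. The snippet below is only an illustration under stated assumptions. `remap_keys` is a hypothetical helper, and the key names on both sides of `KEY_PATTERNS` are assumptions that have not been verified against either repository, so check them against the real `state_dict` keys before relying on them.

```python
import re

# Hypothetical mapping from Qwen-style checkpoint keys to llama-style keys.
# Both sides are assumptions for illustration; inspect the actual checkpoints.
KEY_PATTERNS = [
    (r"^transformer\.wte\.weight$", "tok_embeddings.weight"),
    (r"^transformer\.h\.(\d+)\.ln_1\.weight$", r"layers.\1.attention_norm.weight"),
    (r"^transformer\.h\.(\d+)\.ln_2\.weight$", r"layers.\1.ffn_norm.weight"),
    (r"^transformer\.ln_f\.weight$", "norm.weight"),
    (r"^lm_head\.weight$", "output.weight"),
]


def remap_keys(state_dict):
    """Rename keys that match a known pattern; report the ones that do not."""
    remapped, unmatched = {}, []
    for key, value in state_dict.items():
        for pattern, replacement in KEY_PATTERNS:
            new_key, count = re.subn(pattern, replacement, key)
            if count:
                remapped[new_key] = value
                break
        else:
            # No pattern matched: this weight needs its own handling.
            unmatched.append(key)
    return remapped, unmatched
```

Anything left in `unmatched` (attention and MLP weights, any bias terms) would need its own mapping, and possibly a small change to the model definition itself.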

DongqiShen · 2023-12-07T03:04:20Z

I have tested it with Qwen-1.8B on an RTX 2080, and inference is about twice as fast as the original (50 tok/s originally vs. ~100 tok/s), which is fascinating. Since the Qwen series shares the same architecture, I think it should work for Qwen-14B as well.
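For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how the throughput numbers could be measured and compared. `tokens_per_second` and `speedup` are hypothetical helpers, and `generate` stands in for whatever callable produces `num_tokens` tokens in your setup; neither is part of any library mentioned in this thread.

```python
import time


def tokens_per_second(generate, num_tokens):
    """Time one call to `generate` and return throughput in tokens/second."""
    start = time.perf_counter()
    generate(num_tokens)  # assumed to produce exactly num_tokens tokens
    elapsed = time.perf_counter() - start
    return num_tokens / elapsed


def speedup(baseline_tps, optimized_tps):
    """Ratio of optimized throughput to baseline throughput."""
    return optimized_tps / baseline_tps


# The figures quoted above: 50 tok/s before, roughly 100 tok/s after.
print(speedup(50.0, 100.0))  # 2.0
```

For a fair comparison, run a warm-up generation first so that one-time costs such as compilation are not counted in the timed run.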

dashi6174 · 2023-12-08T06:30:08Z

Thanks, I will give it a try.

DongqiShen · 2023-12-08T15:18:20Z

@dashi6174 https://github.com/DongqiShen/qwen-fast
