
Deepseek v2 support #693

Merged (6 commits, Jul 27, 2024)
Conversation

@hnyls2002 (Collaborator) commented Jul 21, 2024

To use DeepSeek-V2, please specify the --context-length or --max-num-reqs to avoid OOM. The context length for DeepSeek is quite large; with the current static req_to_token layout, we cannot support a large number of requests and a large context length at the same time.
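A minimal launch sketch following the advice above. The model path and the context-length value are illustrative assumptions, not taken from the PR; the point is simply to cap --context-length (or alternatively --max-num-reqs, as named in the comment) so the static req_to_token table fits in memory:

```shell
# Illustrative only: cap the context length so the static
# req_to_token layout does not OOM at DeepSeek's native length.
python -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-V2 \
    --context-length 4096 \
    --trust-remote-code
```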

@hnyls2002 hnyls2002 marked this pull request as draft July 21, 2024 22:44
@m0g1cian

Looking forward to seeing DeepSeek-V2 get supported! I was trying to do the same thing two weeks ago but ran into the exact same issue:

RuntimeError: shape mismatch: value tensor of shape [7, 16, 256] cannot be broadcast to indexing result of shape [7, 16, 40]
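The broadcast failure above can be reproduced in miniature. NumPy stands in for PyTorch here, and the interpretation of the shapes is my assumption: a KV cache allocated with a per-head dimension of 40 receives values whose last dimension is 256, the size DeepSeek-V2's MLA projections actually produce.

```python
import numpy as np

# Cache buffer laid out for head_dim 40 (assumption: last dim is per-head size)
kv_cache = np.zeros((8, 16, 40))

# Incoming values with last dim 256, mirroring the reported error
value = np.zeros((7, 16, 256))

try:
    kv_cache[:7] = value  # last dims 40 vs 256 cannot broadcast
except ValueError as e:
    print(e)  # could not broadcast input array from shape (7,16,256) into (8... slice) shape (7,16,40)
```

The fix in the PR is to size the cache for the model's actual head dimension rather than assuming a fixed layout.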

@hnyls2002 hnyls2002 marked this pull request as ready for review July 26, 2024 23:31
@hnyls2002 hnyls2002 merged commit 679ebcb into main Jul 27, 2024
2 checks passed
@hnyls2002 hnyls2002 deleted the deepseek branch July 27, 2024 00:10
@Xu-Chen (Contributor) commented Jul 27, 2024

Thank you for your excellent work. Will you support MLA in the future? Reducing the KV cache would allow larger context lengths.

@Ying1123 Ying1123 mentioned this pull request Aug 2, 2024