DeepSpeed-FastGen #4604

cmikeh2 · 2023-11-03T19:22:01Z

DeepSpeed-FastGen uses continuous batching and non-contiguous KV caches to increase occupancy and improve responsiveness when serving LLMs in the data center, similar to existing frameworks such as TRT-LLM, TGI, and vLLM. To push performance further, DeepSpeed-FastGen introduces SplitFuse, which dynamically decomposes and unifies prompt and generation work to improve continuous batching and system throughput.
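To make the decomposition-and-unification idea concrete, here is a minimal scheduling sketch. It is not the code in this PR: `Request`, `schedule_step`, and `token_budget` are hypothetical names, and the policy shown (generation tokens first, then prompt chunks split to fit a fixed per-step token budget) is an assumption made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; not a DeepSpeed type."""
    rid: str
    prompt_left: int        # prompt tokens not yet processed
    decoding: bool = False  # True once the whole prompt has been consumed


def schedule_step(requests, token_budget):
    """Build one forward pass as a list of (rid, n_tokens) within token_budget."""
    batch = []
    remaining = token_budget
    # Generation first: each decoding request contributes exactly one token.
    for r in requests:
        if r.decoding and remaining > 0:
            batch.append((r.rid, 1))
            remaining -= 1
    # Fill the rest of the budget with prompt chunks, splitting long prompts.
    for r in requests:
        if not r.decoding and r.prompt_left > 0 and remaining > 0:
            chunk = min(r.prompt_left, remaining)
            batch.append((r.rid, chunk))
            r.prompt_left -= chunk
            remaining -= chunk
            if r.prompt_left == 0:
                r.decoding = True  # starts generating on the next step
    return batch
```

Calling `schedule_step` repeatedly on the same request list, with one request already decoding and one long prompt pending, spreads the long prompt across several steps while the decoding request still gets its token in every step.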

Corresponding blog: #4607

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

jeffra

🚀🚀🎉🎉

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

cmikeh2 requested review from RezaYazdaniAminabadi, jeffra, mrwyattii, awan-10, arashb, tjruwase and loadams as code owners November 3, 2023 19:22

Merge branch 'master' into staging-inference-v2-5

74b6f76

jeffra approved these changes Nov 3, 2023

jeffra and others added 2 commits November 3, 2023 15:06

Merge branch 'master' into staging-inference-v2-5

7e9d841

Merge branch 'master' into staging-inference-v2-5

debca5f

jeffra merged commit 38b41df into master Nov 3, 2023
16 checks passed

weiji14 mentioned this pull request Nov 4, 2023

deepspeed v0.12.0 conda-forge/deepspeed-feedstock#34

Merged

3 tasks
