[Common] Modify resize() in DecoderContext to support #367

pujiang2018 · 2024-05-06T03:01:27Z

Refactor the resize() function in DecoderContext to support Continuous Batching (CB), and remove the qkScores member: it is rarely used, and the attention implementation will most likely prefer to manage that buffer itself.
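The description implies a change in how scratch buffers are sized. With continuous batching, the sequences in a batch no longer share one seqLen, so buffers have to follow the total token count rather than batchSize x seqLen. The block below is a minimal sketch of that idea under stated assumptions. `SketchDecoderContext`, `resize(const std::vector<int> &)`, `normBuf` and `tmpBuf` are hypothetical names for illustration only. They are not the actual xFasterTransformer DecoderContext API, and the sizing formulas are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for DecoderContext (illustrative names only).
struct SketchDecoderContext {
    int hiddenSize;
    int intermediateSize;

    // Shared activation buffers, sized by the total token count of the batch.
    std::vector<float> normBuf;
    std::vector<float> tmpBuf;
    // No qkScores member: the attention code is assumed to own its scratch buffer.

    SketchDecoderContext(int hidden, int inter)
        : hiddenSize(hidden), intermediateSize(inter) {}

    // Resize for a continuous batch in which each sequence may have a different
    // input length, instead of assuming a uniform batchSize x seqLen shape.
    void resize(const std::vector<int> &inputLens) {
        std::size_t totalTokens = 0;
        for (int len : inputLens) {
            assert(len >= 0 && "sequence length must be non-negative");
            totalTokens += static_cast<std::size_t>(len);
        }

        const std::size_t normSize = totalTokens * static_cast<std::size_t>(hiddenSize);
        const std::size_t tmpSize =
            totalTokens * static_cast<std::size_t>(std::max(hiddenSize, intermediateSize));

        // Grow only; a smaller batch reuses the existing allocation.
        if (normBuf.size() < normSize) normBuf.resize(normSize);
        if (tmpBuf.size() < tmpSize) tmpBuf.resize(tmpSize);
    }
};
```

For example, with hiddenSize 8, three sequences of 3, 5 and 1 tokens (9 in total) give a 72-float normBuf. A later batch holding one 2-token sequence keeps that allocation. Growing only avoids a reallocation at every step as the batch composition changes.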

* [Common] Add sequenceMeta, sequenceGroup and sequencePool. (#343)
* merge batchSize and seqLen into one in TokenEmbedding
* merge batchSize and seqLen into one in TokenEmbedding (#350)
* [Common] Move Matrix into xft namespace. (#351)
* remove unused function in DecoderLayer
* [Layer] Remove unused functions in Decoder layer (#353)
* fix compile error of embeddingForward
* [Model] Fix compile error of embeddingForward in YaRNLlama (#358)
* [Common] Add sampling params into group seq. (#356)
* remove DecoderContext in computeSoftmax
* [Util] Remove DecoderContext in computeSoftmax (#362)
* [Common] Refactor sequence.h. (#363)
* [kernels] refactor flash attention for continuous batching (#361)
* [models] Add attnMeta for continuous batching (#364)
* [Layers] fix build error (#365)
* [Model] add interface for seq meta. (#366)
* refactor resize function in DecoderContext to support CB, and qkScores member removed
* [Common] Modify resize() in DecoderContext to support (#367)
* add some code to CommonDecoder::forward()
* SequenceMeta refactor
* [Model] New CommonDecoder::forward impl. skeleton (#369)
* new KVCacheMgr supporting CB
* fix typo & set default prefixId to -1 in addSequence()
* [Common] New KVCacheMgr to support CB (#371)
* [Sampling] Add repetition penalty for new seq type. (#373)
* New forward to support CB (CommonDecoder->DecoderBlock->DecoderLayer->Attention/MLP)
* add todo
* [Sampling] Add greedy search for cb path. (#376)
* logic issue fix
* code fix to make new forward work
* add maxSeqLen limitation
* cross attention impl. for CB
* DecoderContext::resize fix
* correct the output of the new forward
* add cb_check
* fix incorrect buffer size calculation
* 2 sequences -> 3 sequences
* better method to prepare KV cache

---------

Co-authored-by: Changqing Li <changqing.li@intel.com>
Co-authored-by: Duyi-Wang <duyi.wang@intel.com>
Co-authored-by: Meng,Chen <chen.meng@intel.com>

pujiang2018 added 9 commits April 25, 2024 23:48

merge batchSize and seqLen into one in TokenEmbedding

dbcb267

Merge commit '9a53fb2ea6b9141ba7c045bc0d135c1809e8f22c' into pujiang/feature/cb_dev

25ee312

remove unused function in DecoderLayer

376b2bc

Merge commit '4ff47074fc85a27e13251c3fb618f36e338c456f' into pujiang/feature/cb_dev

d281a54

fix compile error of embeddingForward

b5b225a

remove DecoderContext in computeSoftmax

d5c9407

Merge commit 'f8f85714331c0df2ce4a8344e06972316770ec11' into pujiang/feature/cb_dev

be615b2

Merge commit '2499f602c22184ca5afaa2f013ae0ff4e3bd4263' into pujiang/feature/cb_dev

5833d41

refactor resize function in DecoderContext to support CB, and qkScores member removed

0514833

pujiang2018 requested a review from abenmao May 6, 2024 03:52

abenmao approved these changes May 6, 2024

pujiang2018 merged commit c792aff into cb_dev May 6, 2024

Duyi-Wang pushed a commit that referenced this pull request May 9, 2024

[Common] Modify resize() in DecoderContext to support (#367)

b1e9da2

Duyi-Wang pushed a commit that referenced this pull request May 15, 2024

[Common] Modify resize() in DecoderContext to support (#367)

a4442f0

pujiang2018 commented May 6, 2024
