[Common] New KVCacheMgr to support CB #371
@Duyi-Wang @abenmao @changqi1 Pls take time to review this.
src/common/kvcache_mgr.h
Outdated
public:
    virtual ~KVCacheMgrImplBase() = default;
    virtual bool delSequence(int seqID) = 0;
    virtual bool addSequence(int seqID, int prefixId = 0) = 0;
Would -1 be a better default value for prefixId, since the IDs start from 0?
revised.
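The point of the suggestion above is that 0 is itself a valid ID, so it cannot also mean "no prefix". A minimal sketch of the sentinel pattern follows; `ToyKVCacheMgr`, `kNoPrefix`, `addPrefix` and `prefixOf` are hypothetical names for illustration and are not taken from the PR, which only shows the `addSequence`/`delSequence` signatures.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

// Hypothetical sentinel: valid IDs start at 0, so -1 can safely mean "no prefix".
constexpr int kNoPrefix = -1;

// Toy stand-in for a KV cache manager (assumption: not the PR's real class).
class ToyKVCacheMgr {
public:
    // Registers a prefix so that sequences may refer to it.
    void addPrefix(int prefixId) { prefixes.insert(prefixId); }

    // Returns false if seqID already exists or the requested prefix is unknown.
    bool addSequence(int seqID, int prefixId = kNoPrefix) {
        if (seqs.count(seqID) != 0) return false;
        if (prefixId != kNoPrefix && prefixes.count(prefixId) == 0) return false;
        seqs[seqID] = prefixId;
        return true;
    }

    // Returns true only if the sequence existed and was removed.
    bool delSequence(int seqID) { return seqs.erase(seqID) > 0; }

    // Prefix recorded for a sequence; throws std::out_of_range if seqID is unknown.
    int prefixOf(int seqID) const { return seqs.at(seqID); }

private:
    std::unordered_map<int, int> seqs;   // seqID -> prefixId (or kNoPrefix)
    std::unordered_set<int> prefixes;    // known prefix IDs
};
```

With a default of 0, a call such as `addSequence(1)` would be indistinguishable from an explicit request to share prefix 0; with -1 the two cases stay separate.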
src/common/kvcache_mgr.h
Outdated
        readyList.push_back(it->second);
    }

    readyCaches == std::move(readyList);
=?
It was a typo; fixed.
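For readers skimming the thread, the difference between the two operators can be sketched as below. The element type (`int`) and the helper names `publishBuggy` and `publishFixed` are assumptions for illustration; the PR does not show the actual types of `readyCaches` and `readyList`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Buggy variant: `==` only compares the two vectors and the result is
// discarded, so readyCaches is left unchanged.
inline void publishBuggy(std::vector<int> &readyCaches, std::vector<int> readyList) {
    (void)(readyCaches == std::move(readyList));
}

// Fixed variant: `=` move-assigns, so readyCaches takes over readyList's contents.
inline void publishFixed(std::vector<int> &readyCaches, std::vector<int> readyList) {
    readyCaches = std::move(readyList);
}
```

The buggy form compiles without error because comparing two vectors is a valid expression statement, which is why a typo like this is easy to miss without a warning such as GCC's or Clang's `-Wunused-value`/`-Wunused-comparison`.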
* [Common] Add sequenceMeta, sequenceGroup and sequenecePool. (#343)
* merge batchSize and seqLen into one in TokenEembedding
* merge batchSize and seqLen into one in TokenEembedding (#350)
* [Common] Move Martix into xft namespace. (#351)
* remove unsed function in DecoderLayer
* [Layer] Remove unused functions in Decoder layer (#353)
* fix compile error of embeddingForward
* [Model] Fix compile error of embeddingForward in YaRNLlama (#358)
* [Common] Add sampling params into group seq. (#356)
* remove DecoderContext in computeSoftmax
* [Util] Remove DecoderContext in computeSoftmax (#362)
* [Common] Refactor sequence.h. (#363)
* [kernels] refactor flash attention for continuous batching (#361)
* [models] Add attnMeta for continuous batching (#364)
* [Layers] fix build error (#365)
* [Model] add interface for seq meta. (#366)
* refactor resize function in DecoderContext to support CB, and qkScores member removed
* [Common] Modify resize() in DecoderContext to support (#367)
* add some code to CommonDecoder::forward()
* SequenceMeta refactor
* [Model] New CommonDecoder::forward impl. skeleton (#369)
* new KVCacheMgr supporting CB
* fix typo & set default prefixId to -1 in addSequence()
* [Common] New KVCacheMgr to support CB (#371)
* [Sampling] Add repetition penalty for new seq type. (#373)
* New foward to support CB (CommonDecoder->DecoderBlock->DecoderLayer->Attention/MLP)
* add todo
* [Sampling] Add greedy search for cb path. (#376)
* logic issue fix
* code fix to make new forward work
* add maxSeqLen limitation
* cross attention impl. for CB
* DecoderContext::resize fix
* correct the output of the new forward
* add cb_check
* fix incorrect buffer size calculation
* 2 sequences -> 3 sequences
* better method to prepare KV cache

---------

Co-authored-by: Changqing Li <changqing.li@intel.com>
Co-authored-by: Duyi-Wang <duyi.wang@intel.com>
Co-authored-by: Meng,Chen <chen.meng@intel.com>
No description provided.