[Common] Modify resize() in DecoderContext to support #367
Merged
abenmao approved these changes on May 6, 2024
Duyi-Wang pushed a commit that referenced this pull request on May 9, 2024
changqi1 added a commit that referenced this pull request on May 13, 2024

* [Common] Add sequenceMeta, sequenceGroup and sequenecePool. (#343)
* merge batchSize and seqLen into one in TokenEembedding
* merge batchSize and seqLen into one in TokenEembedding (#350)
* [Common] Move Martix into xft namespace. (#351)
* remove unsed function in DecoderLayer
* [Layer] Remove unused functions in Decoder layer (#353)
* fix compile error of embeddingForward
* [Model] Fix compile error of embeddingForward in YaRNLlama (#358)
* [Common] Add sampling params into group seq. (#356)
* remove DecoderContext in computeSoftmax
* [Util] Remove DecoderContext in computeSoftmax (#362)
* [Common] Refactor sequence.h. (#363)
* [kernels] refactor flash attention for continuous batching (#361)
* [models] Add attnMeta for continuous batching (#364)
* [Layers] fix build error (#365)
* [Model] add interface for seq meta. (#366)
* refactor resize function in DecoderContext to support CB, and qkScores member removed
* [Common] Modify resize() in DecoderContext to support (#367)
* add some code to CommonDecoder::forward()
* SequenceMeta refactor
* [Model] New CommonDecoder::forward impl. skeleton (#369)
* new KVCacheMgr supporting CB
* fix typo & set default prefixId to -1 in addSequence()
* [Common] New KVCacheMgr to support CB (#371)
* [Sampling] Add repetition penalty for new seq type. (#373)
* New foward to support CB (CommonDecoder->DecoderBlock->DecoderLayer->Attention/MLP)
* add todo
* [Sampling] Add greedy search for cb path. (#376)
* logic issue fix
* code fix to make new forward work
* add maxSeqLen limitation
* cross attention impl. for CB
* DecoderContext::resize fix
* correct the output of the new forward
* add cb_check
* fix incorrect buffer size calculation
* 2 sequences -> 3 sequences
* better method to prepare KV cache

---------

Co-authored-by: Changqing Li <changqing.li@intel.com>
Co-authored-by: Duyi-Wang <duyi.wang@intel.com>
Co-authored-by: Meng,Chen <chen.meng@intel.com>
Duyi-Wang pushed a commit that referenced this pull request on May 15, 2024
Duyi-Wang added a commit that referenced this pull request on May 15, 2024
Refactor the resize function in DecoderContext to support continuous batching (CB), and remove the qkScores member: it is rarely used, and the attention implementation will most likely want to manage that buffer itself.
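
For readers unfamiliar with the idea, here is a minimal sketch of what such a refactor could look like. It is not the code from this PR: `SketchContext`, `SketchAttention`, `normBuf`, `scoreBuffer` and `totalTokenCount` are hypothetical names invented for illustration. The assumption is that, with continuous batching, sequences of different lengths are packed together, so shared buffers are sized by the total token count rather than by batchSize * seqLen, and that the attention code allocates its own score buffer instead of borrowing one from the context.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a decoder context; NOT the project's actual class.
struct SketchContext {
    int hiddenSize;
    std::size_t totalTokens = 0;
    std::vector<float> normBuf; // shared activation buffer, [totalTokens, hiddenSize]

    explicit SketchContext(int hidden) : hiddenSize(hidden) {}

    // Size the buffer by the total number of tokens across all sequences
    // (assumed CB-style packing) instead of batchSize * seqLen.
    // The buffer only grows; returns true if it was enlarged.
    bool resize(std::size_t tokens) {
        totalTokens = tokens;
        const std::size_t needed = tokens * static_cast<std::size_t>(hiddenSize);
        if (needed <= normBuf.size()) return false;
        normBuf.resize(needed);
        return true;
    }
};

// Hypothetical attention helper that owns its score buffer, so the
// context does not need a qkScores-like member.
struct SketchAttention {
    std::vector<float> scores;

    // Returns a buffer with room for at least queryLen * keyLen scores.
    float *scoreBuffer(std::size_t queryLen, std::size_t keyLen) {
        const std::size_t needed = queryLen * keyLen;
        if (scores.size() < needed) scores.resize(needed);
        return scores.data();
    }
};

// Sum of per-sequence lengths (assumed non-negative): the token count
// that a packed batch would hand to resize().
inline std::size_t totalTokenCount(const std::vector<int> &seqLens) {
    return std::accumulate(seqLens.begin(), seqLens.end(), std::size_t{0});
}
```

In this sketch, three sequences of lengths 3, 1 and 5 would call `resize(9)` once, and a later step with fewer tokens would reuse the existing allocation.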