enable cpu paged cache #42869
Conversation
Hi @jiqing-feng, thanks for the contribution! Just letting you know that CPU-compatible continuous batching is not a priority right now, so even though this PR is small, it will not be reviewed right away. I am cautious about two things:
Will get to review this as soon as I have the bandwidth, thank you!
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Force-pushed from 4ed8d51 to 2a5e941
Hi @remi-or. I have updated the tests and examples for CPU; the example and tests now pass on CPU. Please review this PR and let me know your opinion. Thanks!
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Hi @remi-or. Do you have bandwidth to review this PR?
Hi @SunMarc. We have enabled flash varlen attention for CPU: https://huggingface.co/kernels-community/flash-attn2/tree/main/build.
For the failed tests, I can pass
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Yes, that test is a bit flaky; I will look into it soon. I just merged a big PR which caused a conflict, my bad. Could you update your PR so I can review? Thanks!
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @jiqing-feng, I just ran the tests on my end and a lot do not pass. Here are the results: I am surprised, because I thought there was some version of flash attention that worked on CPU, but I might be wrong here. If not, please add back the torch_accelerator decorator for those tests, or a skip clause.
Hi @remi-or. Most of the failed tests you listed pass on my side, and some of the others are fixed in my latest changes.
Here are my key packages:
Hi @remi-or. It seems that you didn't correctly load the latest kernels from https://huggingface.co/kernels-community/flash-attn2/tree/main/build. I'd like to log in to your node to check the env, if possible. My email is jiqing.feng@intel.com.
src/transformers/generation/continuous_batching/continuous_api.py
Outdated
Hi @remi-or. I have addressed your comment and now check for CUDA before using a CUDA stream. Please review the new change. Thanks!
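The guard described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `maybe_copy_on_stream` is hypothetical, and it only shows the pattern of falling back to a plain copy when CUDA (or a CUDA tensor) is not available, which is the CPU case this PR targets.

```python
import torch

def maybe_copy_on_stream(src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    """Copy src into dst, using a side CUDA stream only when CUDA is usable."""
    if torch.cuda.is_available() and src.is_cuda:
        # Overlap the copy with other work on a dedicated stream.
        stream = torch.cuda.Stream()
        with torch.cuda.stream(stream):
            dst.copy_(src, non_blocking=True)
        # Make the default stream wait for the copy to finish.
        torch.cuda.current_stream().wait_stream(stream)
    else:
        # CPU path: no streams exist, so just do a synchronous copy.
        dst.copy_(src)
    return dst
```

On CPU the function degrades to a plain `copy_`, so the same call site works on both device types.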
The failed CI jobs are not related to my changes; the main branch also fails.
src/transformers/generation/continuous_batching/continuous_api.py
Outdated
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Hi @remi-or. I've addressed your comment. Please review the new change. Thanks!
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Hi @remi-or, since this PR has been open for a while, I'm hoping we can wrap it up today. I'll be online for the next few hours to address your feedback immediately. The failed CI is not related to my changes.
Refactor the initialization of _graphs to simplify the condition for using CUDA graphs.
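The refactor described in that commit message can be sketched like this. Class and attribute names other than `_graphs` are hypothetical; the point is that deciding once at init time whether CUDA graphs apply lets the rest of the code test a single condition instead of re-checking the device everywhere.

```python
import torch

class BatchRunner:
    """Hypothetical sketch of initializing _graphs based on the device."""

    def __init__(self, device: str):
        self.device = torch.device(device)
        # Decide once: CUDA graphs are only meaningful on CUDA devices.
        # On CPU, _graphs stays None and the graph path is never taken.
        self._graphs: dict | None = {} if self.device.type == "cuda" else None

    @property
    def use_cuda_graphs(self) -> bool:
        # Callers check one simple condition instead of device logic.
        return self._graphs is not None
```

With this shape, the CPU path never touches graph capture code, which is what lets the paged cache work without CUDA.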
Hi, I just modified something related to the
src/transformers/generation/continuous_batching/continuous_api.py
Outdated
remi-or
left a comment
LGTM! Thanks for all the work you put into this. Please commit the 2 suggestions before merging: one is needed, the other will be very useful.
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Hi @remi-or. I have committed your suggestions. Thanks!

CPU can also use the paged cache with eager or sdpa attention:

```
python continuous_batching_simple.py --attn sdpa
```

Without this change, the previous command would error like:
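For reference, the `--attn` flag in a script like the one above can be parsed as follows. This is a minimal sketch, not the actual `continuous_batching_simple.py`: the function name and the exact set of choices are assumptions, illustrating only how the attention implementation would be selected from the command line.

```python
import argparse

def pick_attn_impl(argv: list[str]) -> str:
    """Parse the --attn flag and return the chosen attention implementation."""
    parser = argparse.ArgumentParser()
    # Hypothetical choices; eager and sdpa are the CPU-compatible ones
    # this PR exercises.
    parser.add_argument("--attn", choices=["eager", "sdpa"], default="sdpa")
    args = parser.parse_args(argv)
    return args.attn
```

The selected string would then be passed to the model load (e.g. as `attn_implementation`), so the same script runs on CPU with either backend.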