Temporarily disables Split-KV feature #46
Conversation
Forces num_splits to 1 and resets the accumulator tensors to avoid memory overhead while bugs in the Split-KV implementation are being fixed. This ensures stability by bypassing the problematic feature until the issues are resolved.
Pull Request Overview
Temporarily disables the Split-KV feature by forcing num_splits to 1 and clearing accumulators to avoid memory overhead while underlying bugs are fixed.
- Overrides `params.num_splits` to 1 unconditionally
- Resets `softmax_lse_accum` and `out_accum` before returning (see the sketch below)
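For context, a minimal sketch of what such a change might look like, assuming a flash-attention-style parameter struct and the accumulator names from the overview. The struct and function names here are placeholders for illustration, not the actual diff in this PR.

```cpp
#include <torch/extension.h>

// Hypothetical parameter struct; the real one in the repository has many
// more fields. Only num_splits matters for this sketch.
struct ParamsSketch {
    int num_splits = 0;
};

// Force Split-KV off and release the per-split accumulators so they stop
// holding memory. Tensor names follow the PR description.
void disable_split_kv(ParamsSketch &params,
                      at::Tensor &softmax_lse_accum,
                      at::Tensor &out_accum) {
    // Regardless of how it was set externally, always fall back to a single split.
    params.num_splits = 1;

    // Replace the accumulators with empty (undefined) tensors, dropping their storage.
    softmax_lse_accum = at::Tensor();
    out_accum = at::Tensor();
}
```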
```cpp
// Temporarily disable Split-KV, because some bugs are still being fixed.
// Regardless of how it is set externally, always set num_splits back to 1.
// This is to avoid the extra memory overhead of Split-KV.
params.num_splits = 1;
```
Copilot AI · Jul 1, 2025
Overriding params.num_splits after the accumulators are likely allocated does not prevent the extra memory overhead. Move this override before any allocation logic and skip creating large accumulators entirely when Split-KV is disabled.
Suggested change (remove these lines here):
```diff
-// Temporarily disable Split-KV, because some bugs are still being fixed.
-// Regardless of how it is set externally, always set num_splits back to 1.
-// This is to avoid the extra memory overhead of Split-KV.
-params.num_splits = 1;
```
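A rough sketch of the refactor the comment is asking for, under the assumption that the accumulators are allocated with `at::empty` and sized per split. The struct, dimension names, and shapes below are illustrative, not the repository's actual code.

```cpp
#include <torch/extension.h>
#include <utility>

// Hypothetical parameter struct; only num_splits is relevant here.
struct ParamsSketch {
    int num_splits = 0;
};

// Disable Split-KV *before* any allocation logic, so the large per-split
// buffers are never created while the feature is off.
std::pair<at::Tensor, at::Tensor> maybe_allocate_accumulators(
        ParamsSketch &params, int64_t batch_size, int64_t num_heads,
        int64_t seqlen_q, int64_t head_size, const at::TensorOptions &opts) {
    // Override first, ahead of the allocation logic.
    params.num_splits = 1;

    at::Tensor softmax_lse_accum, out_accum;
    if (params.num_splits > 1) {
        // Only pay for the per-split accumulators when Split-KV is enabled.
        softmax_lse_accum = at::empty(
            {params.num_splits, batch_size, num_heads, seqlen_q},
            opts.dtype(at::kFloat));
        out_accum = at::empty(
            {params.num_splits, batch_size, num_heads, seqlen_q, head_size},
            opts.dtype(at::kFloat));
    }
    return {softmax_lse_accum, out_accum};
}
```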
```cpp
    TORCH_CHECK(params.num_splits <= 128, "num_splits > 128 not supported");
}

// Temporarily disable Split-KV, because some bugs are still being fixed.
```
Copilot AI · Jul 1, 2025
[nitpick] Add a TODO with the relevant issue or ticket reference and an expected removal timeline to ensure this temporary workaround is tracked and cleaned up once bugs are resolved.
Suggested change:
```diff
-// Temporarily disable Split-KV, because some bugs are still being fixed.
+// Temporarily disable Split-KV, because some bugs are still being fixed.
+// TODO: Track resolution of Split-KV bugs in issue #12345. Expected removal: Q4 2025.
```
```cpp
// Temporarily disable Split-KV, because some bugs are still being fixed.
// Regardless of how it is set externally, always set num_splits back to 1.
// This is to avoid the extra memory overhead of Split-KV.
```
Copilot AI · Jul 1, 2025
[nitpick] Consider emitting a runtime warning or log entry to inform users that Split-KV has been disabled, preventing silent changes in behavior.
Suggested change:
```diff
-// This is to avoid the extra memory overhead of Split-KV.
+// This is to avoid the extra memory overhead of Split-KV.
+TORCH_WARN("Split-KV has been temporarily disabled due to unresolved bugs. num_splits is set to 1.");
```
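If this code can run on every forward call, PyTorch's `TORCH_WARN_ONCE` macro (same arguments as `TORCH_WARN`) may be the better fit, since it emits the message only the first time rather than on each call; whether that trade-off is appropriate depends on how often this path executes.

```cpp
TORCH_WARN_ONCE("Split-KV has been temporarily disabled due to unresolved bugs. num_splits is set to 1.");
```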
Documents the GitHub issue tracking the Split-KV bugs that led to temporarily disabling the feature. This provides better context for future developers about why the functionality is disabled and where to find related discussion.
#47