kernel utilization metrics on EC2 AL2023 w/ 6.1 kernel #515
Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Pull Request Overview
This PR adds compatibility safeguards for CPU performance monitoring on platforms with limited hardware support. It prevents invalid event groupings that can cause issues on specific EC2 instances and kernel versions.
- Adds validation to prevent grouping multiple `CPU_CLK_UNHALTED.REF_TSC` events when the fixed reference cycles counter is unsupported
- Includes detailed comments explaining the platform-specific limitation discovered on AWS m7i.8xlarge with Amazon Linux 2023 and kernel 6.1
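The grouping validation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the `CoreGroup` fields, the `fixedRefCyclesSupported` parameter, the `isRefTSC` helper, and the error value are all hypothetical, and only the event names and the `AddEvent` method name come from the PR text.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CoreGroup is a hypothetical stand-in for a group of core PMU events
// that are scheduled together.
type CoreGroup struct {
	events []string
}

// errRefTSCConflict is a hypothetical error for the rejected combination.
var errRefTSCConflict = errors.New("group already holds a REF_TSC event and fixed ref-cycles is unsupported")

// isRefTSC reports whether name is a CPU_CLK_UNHALTED.REF_TSC variant,
// for example with a _P suffix or a :SUP modifier (assumed matching rule).
func isRefTSC(name string) bool {
	return strings.HasPrefix(name, "CPU_CLK_UNHALTED.REF_TSC")
}

// AddEvent appends name to the group. When the fixed reference cycles
// counter is unsupported, it refuses a second REF_TSC variant.
func (g *CoreGroup) AddEvent(name string, fixedRefCyclesSupported bool) error {
	if !fixedRefCyclesSupported && isRefTSC(name) {
		for _, e := range g.events {
			if isRefTSC(e) {
				return errRefTSCConflict
			}
		}
	}
	g.events = append(g.events, name)
	return nil
}

func main() {
	g := &CoreGroup{}
	fmt.Println(g.AddEvent("CPU_CLK_UNHALTED.REF_TSC", false))
	fmt.Println(g.AddEvent("CPU_CLK_UNHALTED.REF_TSC_P:SUP", false))
}
```

In this sketch a caller that receives the error would place the second event in a different group; how the real code reacts to the rejection is not stated in the PR text.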
…names with corresponding perf event names Signed-off-by: Harper, Jason M <jason.m.harper@intel.com>
Pull Request Overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
This pull request addresses improvements and bug fixes related to handling fixed reference cycles counters in perfmon metrics, particularly for Intel CPUs. The main changes ensure correct event name translation, add support for a missing event alias, and prevent invalid event groupings on platforms that do not support fixed reference cycles.
Event Name Handling and Translation:
- Updates `getExpression` to only replace event names that are found within square brackets and present in the `fixedCounterEventNameTranslation` map, preventing accidental substring replacements and ensuring accurate metric formula translation.
- Adds a mapping from `"CPU_CLK_UNHALTED.REF_TSC:SUP"` to `"ref-cycles:k"` in the `fixedCounterEventNameTranslation` map, enabling proper translation for this event.
Validation and Error Handling:
- Updates `getExpression` to return an error if a metric does not have a formula defined, improving robustness.
- Adds a check to `CoreGroup.AddEvent` to prevent grouping both `"CPU_CLK_UNHALTED.REF_TSC"` and `"CPU_CLK_UNHALTED.REF_TSC_P:SUP"` events when fixed reference cycles are not supported, avoiding invalid event combinations on certain platforms.
Resource File Correction:
- Changes the event name in `icx_perfspect_metrics.json` from `"CPU_CLK_UNHALTED.REF_TSC_P:SUP"` to `"CPU_CLK_UNHALTED.REF_TSC:SUP"` to match the new mapping and ensure consistency.
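The bracket-scoped translation and the missing-formula check described above might look roughly like this. It is a sketch under assumptions rather than the actual `getExpression`: the function name `translateBracketedEvents`, the regular expression, the error message, and the choice to keep the brackets around the translated name are all hypothetical. Only the map name and the `CPU_CLK_UNHALTED.REF_TSC:SUP` entry are taken from the PR description.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// fixedCounterEventNameTranslation maps perfmon event names to perf event
// names. Only this one entry is confirmed by the PR description.
var fixedCounterEventNameTranslation = map[string]string{
	"CPU_CLK_UNHALTED.REF_TSC:SUP": "ref-cycles:k",
}

// bracketed matches one [EVENT_NAME] token in a metric formula.
var bracketed = regexp.MustCompile(`\[([^\[\]]+)\]`)

// translateBracketedEvents (hypothetical name) rewrites only whole
// bracketed names that have a map entry, so a name that merely contains
// another name as a substring is left untouched.
func translateBracketedEvents(formula string) (string, error) {
	if formula == "" {
		return "", errors.New("metric has no formula defined")
	}
	out := bracketed.ReplaceAllStringFunc(formula, func(m string) string {
		name := m[1 : len(m)-1] // strip the surrounding brackets
		if perfName, ok := fixedCounterEventNameTranslation[name]; ok {
			return "[" + perfName + "]"
		}
		return m
	})
	return out, nil
}

func main() {
	out, err := translateBracketedEvents(
		"[CPU_CLK_UNHALTED.REF_TSC:SUP] / [CPU_CLK_UNHALTED.REF_TSC_P:SUP]")
	fmt.Println(out, err)
}
```

Matching the full bracketed token rather than calling `strings.ReplaceAll` on the raw formula is what avoids the accidental substring replacements mentioned in the description: the `_P:SUP` variant has no map entry here, so it passes through unchanged.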