-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Metrics Logging for Kruize Recommendations #1206
Add Metrics Logging for Kruize Recommendations #1206
Conversation
if (counterNotifications == null) { | ||
counterNotifications = MetricsConfig.timerBKruizeNotifications.tags(additionalTags).register(MetricsConfig.meterRegistry); | ||
} | ||
counterNotifications.increment(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if generateRecommendations is called twice for the same time interval and experiment ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counter gets incremented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to include the overhead of the notifications in updateRecommendations API with a scalability run.
@@ -58,6 +61,8 @@ private MetricsConfig() { | |||
timerBListDS = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "listDataSources").tag("method", "GET"); | |||
timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "importDataSourceMetadata").tag("method", "POST"); | |||
timerBImportDSMetadata = Timer.builder("kruizeAPI").description(API_METRIC_DESC).tag("api", "importDataSourceMetadata").tag("method", "GET"); | |||
|
|||
timerBKruizeNotifications = Counter.builder("KruizeNotifications").description("Kruize notifications").tag("api","updaterecommendations"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change 'updaterecommendations' to 'updateRecommendations' to follow camel case convention like all methods. Would be good to make the description value as a constant to maintain consistency across.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I agree , @chandrams we might need short scalability run for this... But please ensure each experiment creating at least one error or critical notifications. |
9787f24
to
6eb046c
Compare
@chandrams this sample results json create some error notifications https://privatebin.corp.redhat.com/?3b171fbe1bbb3244#8ZnjimR1QbfKUksj9qGZegAGCPMJUkhSFdidVvrmV3gv |
@@ -28,6 +28,7 @@ | |||
import com.autotune.common.k8sObjects.K8sObject; | |||
import com.autotune.common.utils.CommonUtils; | |||
import com.autotune.database.service.ExperimentDBService; | |||
import com.autotune.metrics.KruizeNotificationCollectionRegistry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msvinaykumar - Please update kruize documentation with the metrics being captured
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msvinaykumar Can you update the documentation with the details, as discussed will need this if to update the kruize metrics script before running the scale test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@msvinaykumar - Updated the kruize metrics script to capture the notifications and I have triggered a short scalability run with the new image that you provided - quay.io/vinakuma/autotune_operator:metrics2 and the results json that you shared. |
@msvinaykumar - The scalability 5k / 15 days run took 3 hrs 16 mins which is lesser than the scale test run on the same cluster with 0.0.22_mvp which was 3 hrs 50 mins. Does your build contain all the latest changes along with this PR?
The logs have these errors
Summary of the test run with 0.0.23_mvp on the same cluster
|
@chandrams can you please confirm kruizeRecommendation_total counts |
This error occurs when the PostgreSQL server has reached the maximum number of allowed client connections. we can increase the max_connections setting in your PostgreSQL configuration or optimize your application to use fewer connections. But however we can ignore this error |
Build is having this PR change and please confirm we have generated enough KruizeRecommendations metrics |
@msvinaykumar - You can check the last column in this spreadsheet, was expecting values to be present for all entries but it stopped after a while. |
…mmendations. Signed-off-by: msvinaykumar <vinakuma@redhat.com>
Signed-off-by: msvinaykumar <vinakuma@redhat.com>
Signed-off-by: msvinaykumar <vinakuma@redhat.com>
Signed-off-by: msvinaykumar <vinakuma@redhat.com>
Signed-off-by: msvinaykumar <vinakuma@redhat.com>
Signed-off-by: msvinaykumar <vinakuma@redhat.com>
0c799db
to
344e3c8
Compare
This looks good. The count is over 500k, so the idea is to generate more notifications without impacting execution time. Based on the results, performance is unaffected, so we're good to proceed. We also have a flag to disable it just in case any issues arise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Pull Request Description: Add Metrics Logging for Kruize Recommendations
Summary
This pull request introduces the
KruizeNotificationCollectionRegistry
class, which is responsible for logging and creating metrics for notifications related to Kruize recommendations. The class processes recommendation notifications at various levels (container, timestamp, term, and model) and creates appropriate counters using Micrometer.Key Features
KruizeNotificationCollectionRegistry
: This new class handles the collection and logging of recommendation notifications.logNotification
: Logs notifications fromContainerData
by iterating through its recommendation structure and creating counters.createCounterTag
: Creates a counter with tags for the given level, term, model, and list of recommendation notifications.Detailed Description
Class
KruizeNotificationCollectionRegistry
:Constructor:
experiment_name
,interval_end_time
, andcontainer_name
.Method
logNotification
:ContainerData
object as input.ContainerData
.createCounterTag
.Method
createCounterTag
:RecommendationNotification
objects.KruizeDeploymentInfo.log_recommendation_metrics_level
.KruizeConstants.KRUIZE_RECOMMENDATION_METRICS
.Benefits
Notes
Testing
Related Issues
Please review the changes and provide feedback. Your input is valuable to ensure that this feature integrates seamlessly and functions as expected.
test Image : quay.io/vinakuma/autotune_operator:metrics