Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MTSRE-259 | integrate status reporting the hard way #5

Conversation

yashvardhan-kukreja
Copy link
Member

@yashvardhan-kukreja yashvardhan-kukreja commented Nov 29, 2021

Implementation details

main.go runs a central heartbeat reporter loop whose purpose is to periodically report the heartbeat which the addon intends to report. This central heartbeat reporter loop is setup via the following way

// the following section hooks up a heartbeat reporter with the current addon/operator
r := controllers.ReferenceAddonReconciler{
Client: mgr.GetClient(),
Log: ctrl.Log.WithName("controllers").WithName("ReferenceAddon"),
Scheme: mgr.GetScheme(),
}
// setup the heartbeat reporter
if err := addoninstancesdk.SetupHeartbeatReporter(&r, mgr); err != nil {
setupLog.Error(err, "unable to setup heartbeat reporter")
os.Exit(1)

func SetupHeartbeatReporter(r ReconcilerWithHeartbeat, mgr manager.Manager) error {

ReconcilerWithHeartbeat is just an interface which can be implemented by a Reconciler struct which happens to have some additional methods required for heartbeat reporting to happen. More on ReconcilerWithHeartbeat later :)

With every iteration of the heartbeat reporter loop, latest heartbeat is reported concurrently

// Heartbeat reporter section: report a heartbeat at an interval ('currentAddonInstanceConfiguration.HeartbeatUpdatePeriod' seconds)
// 1. Each iteration spins up a go-routine, where that go-routine (concurrently) sends the heartbeat and syncs the 'currentAddonInstanceConfiguration' with the latest one.
// 2. Spinning up go-routines here because that would end up ensuring that the for-loop runs almost exactly at a periodic rate of 'current heartbeatUpdatePeriod' seconds,
// instead of running at a rate of (current heartbeatUpdatePeriod + time to send latest heartbeat + time to sync `currentAddonInstanceConfiguration` with latest one) seconds.
for {
go func(stop chan error) {
if err := SendLatestHeartbeat(ctx, r); err != nil {
mgr.GetLogger().Error(err, "error occurred while sending the latest heartbeat")
}
addonInstanceConfigurationMutex.Lock()
defer addonInstanceConfigurationMutex.Unlock()
latestAddonInstanceConfiguration, err := getAddonInstanceConfiguration(ctx, mgr.GetClient(), addonName, r.GetAddonTargetNamespace())
if err != nil {
// TODO(ykukreja): report error instead and stop the heartbeat reporter loop? (instead of `return`, write `stop <- err`)
mgr.GetLogger().Error(err, fmt.Sprintf("failed to get the AddonInstance configuration corresponding to the Addon '%s'", addonName))
return
}
currentAddonInstanceConfiguration = latestAddonInstanceConfiguration
}(stop)
select {
// stop this heartbeat reporter loop if any of the above go-routines endup raising an error/exit
case err := <-stop:
mgr.GetLogger().Error(err, "failed to report certain heartbeats")
return err
default:
// waiting for heartbeat update period for executing the next iteration
<-time.After(currentAddonInstanceConfiguration.HeartbeatUpdatePeriod.Duration)
}
}
}

btw, a heartbeat is a metav1.Condition object only.

ReconcilerWithHeartbeat is essentially any Reconciler struct which implements some additional methods which are required for by the heartbeat reporter.

type ReconcilerWithHeartbeat interface {
reconcile.Reconciler
GetAddonName() string // so as to give the addon developer the freedom to return addon's name the way they want: through a direct hardcoded string or through env var populated by downwards API
GetAddonTargetNamespace() string // so as to give the addon developer the freedom to return targetNamespace the way they want: through a direct hardcoded string or through env var populated by downwards API
HandleAddonInstanceConfigurationChanges(newAddonInstanceSpec addonsv1alpha1.AddonInstanceSpec) error
GetClient() client.Client
}

From the perspective of ReferenceAddonReconciler

The ReferenceAddonReconciler implements the ReconcilerWithHeartbeat by having the following methods:

func (r ReferenceAddonReconciler) GetAddonName() string {
return "reference-addon"
}
func (r ReferenceAddonReconciler) GetAddonTargetNamespace() string {
// assuming ADDON_TARGET_NAMESPACE would be populated via downwards API in the deployment spec of the reference addon
return os.Getenv("ADDON_TARGET_NAMESPACE")
}
func (r ReferenceAddonReconciler) GetClient() client.Client {
return r.Client
}
// the following 'HandleAddonInstanceConfigurationChanges' method can be absolutely anything depending how reference-addon would want to deal with AddonInstance's configuration change
func (r *ReferenceAddonReconciler) HandleAddonInstanceConfigurationChanges(newAddonInstanceSpec addonsv1alpha1.AddonInstanceSpec) error {
fmt.Println("Handling AddonInstance's configuration changes, whooossh!!!")
return nil
}

And various sections of Reference Addon's codebase can use the addoninstancesdk.SetAndSendHeartbeat(..., heartbeat) function to report the heartbeat which is supposed to be reported. For example, if there's a section in the codebase that spots an error which correlates to the addon being unhealthy, then, that section of codebase can use addoninstancesdk.SetAndSendHeartbeat(r, someUnhealthyHeartbeat) to do so.

For example:

// if the ReferenceAddon object getting reconciled has the name "reference-addon", only then report a successful heartbeat
if strings.HasPrefix(req.NamespacedName.Name, "reference-addon") || strings.HasPrefix(req.NamespacedName.Name, "redhat-") {
if err := addoninstancesdk.SetAndSendHeartbeat(r, successfulCondition); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to set and sent heartbeat: %w", err)
}
} else {
if err := addoninstancesdk.SetAndSendHeartbeat(r, failureCondition); err != nil {
return ctrl.Result{}, fmt.Errorf("failed to set and sent heartbeat: %w", err)
}
}

Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
@yashvardhan-kukreja yashvardhan-kukreja force-pushed the yashvardhan/mtsre-259/integrate-status-reporting-from-scratch branch from 23b9fbe to e1c2fa4 Compare December 1, 2021 07:11
cmd/reference-addon-manager/main.go Outdated Show resolved Hide resolved
cmd/reference-addon-manager/main.go Outdated Show resolved Hide resolved
config/deploy/rbac.yaml Outdated Show resolved Hide resolved
internal/controllers/reference_addon_controller.go Outdated Show resolved Hide resolved
internal/controllers/reference_addon_controller.go Outdated Show resolved Hide resolved
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 6, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yashvardhan-kukreja
To complete the pull request process, please ask for approval from thetechnick after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yashvardhan-kukreja
Copy link
Member Author

/retest

Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
@yashvardhan-kukreja yashvardhan-kukreja force-pushed the yashvardhan/mtsre-259/integrate-status-reporting-from-scratch branch from 09ec1fd to 0600ffe Compare December 6, 2021 13:38
…s API

Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
@yashvardhan-kukreja
Copy link
Member Author

Todo: if the current approach of heartbeat reporting (r.latestHeartbeat, r.GetLatestHeartbeat(), r.SetLatestHeartbeat()) gets approved, then we can proceed with adding concurrency safety to ReferenceAddonReconciler.latestHeartbeat via mutex

config/deploy/rbac.yaml Outdated Show resolved Hide resolved
internal/controllers/reference_addon_controller.go Outdated Show resolved Hide resolved
internal/utils/temporary_utils.go Outdated Show resolved Hide resolved
@@ -0,0 +1,118 @@
// TO BE REMOVED AFTER INTEGRATING ADDON INSTANCE CLIENT SDK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is in dire need of some unittests :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I planned to write the unit tests after the implementation was accepted for this status reporting. Otherwise, after every review, the tests would have required to get refactored as well :)

internal/utils/temporary_utils.go Outdated Show resolved Hide resolved
@yashvardhan-kukreja yashvardhan-kukreja force-pushed the yashvardhan/mtsre-259/integrate-status-reporting-from-scratch branch from 5d02eac to 208a35c Compare December 20, 2021 09:08
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
…donInstance heartbeatUpdatePeriod

Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
@yashvardhan-kukreja
Copy link
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 22, 2021
@yashvardhan-kukreja yashvardhan-kukreja force-pushed the yashvardhan/mtsre-259/integrate-status-reporting-from-scratch branch from d45b349 to 3843a42 Compare January 5, 2022 08:55
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
@yashvardhan-kukreja yashvardhan-kukreja force-pushed the yashvardhan/mtsre-259/integrate-status-reporting-from-scratch branch from 3843a42 to b623348 Compare January 5, 2022 09:09
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
…rtbeat reporting depends

Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
…without depending on controller-runtime

Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
Signed-off-by: Yashvardhan Kukreja <ykukreja@redhat.com>
@yashvardhan-kukreja yashvardhan-kukreja force-pushed the yashvardhan/mtsre-259/integrate-status-reporting-from-scratch branch from 05df09d to 03e33c1 Compare February 9, 2022 13:33
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 9, 2022

@yashvardhan-kukreja: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/lint 03e33c1 link true /test lint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@yashvardhan-kukreja
Copy link
Member Author

closing in favor of #8 and #9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants