Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Events #315

Closed
wants to merge 4 commits into from
Closed

Events #315

wants to merge 4 commits into from

Conversation

jortel
Copy link
Contributor

@jortel jortel commented Sep 11, 2019

Add support for events in the controller(s).

Based on documentation and examples (pods), an Event seems to be intended to provide additional information about what the happening with a resource. This may be different than conditions in that although events are garbage collected, conditions can be transient and only represent the current status of the resource. Events also include a count and first/last occurrence times. So, an event may provide the user with more insight even though there may be a similar condition. Most of the events I have in this prototype feel a bit redundant with conditions but seem helpful? I'm really not sure. We have a very rich set of conditions so perhaps a completely different set of events then prototyped here would be more helpful?

This PR is mainly to promote discussion. Thoughts?

Also considering events for:

  • ValidationFailed
  • Ready
  • NotReady

Examples:

$ oc describe migcluster host

...
Status:
  Conditions:
    Category:              Required
    Last Transition Time:  2019-09-09T13:06:07Z
    Message:               The cluster is ready.
    Status:                True
    Type:                  Ready
Events:
  Type    Reason                   Age               From        Message
  ----    ------                   ----              ----         ------
  Normal  RemoteWatchCreated       14s (x3 over 1m)  controller  Remote watch created.
  Normal  StorageClassListUpdated  14s (x3 over 1m)  controller  Storage Class list updated.

$ oc describe migplan test

...
Status:
  Conditions:
    Category:              Required
    Last Transition Time:  2019-09-09T17:47:03Z
    Message:               The storage resources have been created.
    Status:                True
    Type:                  StorageEnsured
    Category:              Required
    Last Transition Time:  2019-09-09T17:47:05Z
    Message:               The migration registry resources have been created.
    Status:                True
    Type:                  RegistriesEnsured
    Category:              Required
    Last Transition Time:  2019-09-11T15:25:07Z
    Message:               The `persistentVolumes` list has been updated with discovered PVs.
    Reason:                Done
    Status:                True
    Type:                  PvsDiscovered
    Category:              Required
    Last Transition Time:  2019-09-11T15:25:09Z
    Message:               The migration plan is ready.
    Status:                True
    Type:                  Ready
Events:
  Type    Reason           Age              From        Message
  ----    ------           ----             ----        -------
  Normal  StorageCreated   2m (x8 over 2m)  controller  Storage CRs created.
  Normal  RegistryCreated  2m (x8 over 2m)  controller  Registry created.
  Normal  UpdatePVs        1m (x9 over 2m)  controller  PV list updated.

$ oc describe migmigration test

...
Status:
  Conditions:
    Category:              Advisory
    Durable:               true
    Last Transition Time:  2019-09-11T17:20:13Z
    Message:               The migration has completed successfully.
    Reason:                Completed
    Status:                True
    Type:                  Succeeded
  Phase:                   Completed
  Start Timestamp:         2019-09-11T17:18:30Z
Events:
  Type    Reason                       Age              From        Message
  ----    ------                       ----             ----        -------
  Normal  Prepare                      3m               controller  Step completed.
  Normal  Started                      3m (x2 over 3m)  controller  Step completed.
  Normal  AnnotateResources            3m               controller  Step completed.
  Normal  EnsureStagePods              3m (x2 over 3m)  controller  Step completed.
  Normal  StagePodsCreated             3m               controller  Step completed.
  Normal  RestartRestic                3m               controller  Step completed.
  Normal  ResticRestarted              3m               controller  Step completed.
  Normal  EnsureStageBackup            3m (x3 over 3m)  controller  Step completed.
  Normal  StageBackupCreated           3m (x2 over 3m)  controller  Step completed.
  Normal  EnsureStageBackupReplicated  3m (x2 over 3m)  controller  Step completed.
  Normal  EnsureStageRestore           2m               controller  Step completed.
  Normal  StageRestoreCreated          2m (x2 over 2m)  controller  Step completed.
  Normal  EnsureStagePodsDeleted       2m (x2 over 2m)  controller  Step completed.
  Normal  EnsureAnnotationsDeleted     2m (x2 over 2m)  controller  Step completed.
  Normal  Completed                    1m (x2 over 1m)  controller  Step completed.

$ oc describe migmigration test

...
Status:
  Conditions:
    Category:              Advisory
    Durable:               true
    Last Transition Time:  2019-09-11T17:20:13Z
    Message:               The migration has completed successfully.
    Reason:                Completed
    Status:                True
    Type:                  Succeeded
  Phase:                   Completed
  Start Timestamp:         2019-09-11T17:18:30Z
Events:
  Type    Reason                       Age              From        Message
  ----    ------                       ----             ----        -------
  Normal  Prepare                      3m               controller  Step completed.
  Normal  Started                      3m (x2 over 3m)  controller  Step completed.
  Normal  AnnotateResources            3m               controller  Step completed.
  Normal  EnsureStagePods              3m (x2 over 3m)  controller  Step completed.
  Normal  StagePodsCreated             3m               controller  Step completed.
  Normal  RestartRestic                3m               controller  Step completed.
  Normal  ResticRestarted              3m               controller  Step completed.
  Normal  EnsureStageBackup            3m (x3 over 3m)  controller  Step completed.
  Normal  StageBackupCreated           3m (x2 over 3m)  controller  Step completed.
  Normal  EnsureStageBackupReplicated  3m (x2 over 3m)  controller  Step completed.
  Normal  EnsureStageRestore           2m               controller  Step completed.
  Normal  StageRestoreCreated          2m (x2 over 2m)  controller  Step completed.
  Normal  EnsureStagePodsDeleted       2m (x2 over 2m)  controller  Step completed.
  Normal  EnsureAnnotationsDeleted     2m (x2 over 2m)  controller  Step completed.
  Warning ReconcileFailed              16s              controller  Reconcile failed: [no kind is registered for the type v1.DeploymentConfigList in scheme "k8s.io/client-go/kubernetes/scheme/register.go:61"].

migration,
v1.EventTypeNormal,
migration.Status.Phase,
"Step completed.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it might be better to use a message more specific to the step in question (i.e. "Step 2 of 22 completed") You could use task.getStep() to pull the number information out, although you'd have to be careful to get the numbers for the last-completed step rather than the step that's just beginning.

@Danil-Grigorev
Copy link
Contributor

@jortel @jwmatthews It is a common practice to use events for testing controllers. If improvements on a test coverage for mig-controller are still relevant, this PR may be of additional help for some reasons:

  1. It will simplify search for a failure, and supplement logs output from discovery/mig-controller
  2. Will allow to selectively test parts of controller, relying on a restricted set of occurring events, with a known time expectancy.
  3. Informational events like the those https://github.com/fusor/mig-controller/pull/315/files#diff-37894d58ac1ff6a94c8f96a3b61aba83R89-R93 could have been helpful during migration stress testing, as well as accurate measuring of time for migration of application state.

@djwhatle djwhatle added update-operator update-operator Requires updates to mig-operator CRDs / RBAC and removed update-operator-crds labels Apr 16, 2020
@djwhatle
Copy link
Contributor

Added update-operator label in case we need to add RBAC for dealing with events

@djwhatle
Copy link
Contributor

djwhatle commented Apr 17, 2020

Having event info could be quite helpful from a debugging perspective, I'd could see us adding events to our must-gather and extracting a good amount of information if our controller is emitting these at key points in the migration.

Edit:
To add detail to my above statement, the reason I like events is that we could use them to provide information to the user about when key points in a migration are executed along with accompanying timestamp info. This might help users skip having to dig deep into (potentially several) logs as their first course of action.

The way that events are laid out in the example provided above makes it look like it would give the user an effective overview of which stages of the migration were consuming a lot of time to complete, and might make it plain where an anomaly is occuring.

@djwhatle
Copy link
Contributor

Closing due to inactivity, please re-open if desired.

@djwhatle djwhatle closed this Aug 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
update-operator Requires updates to mig-operator CRDs / RBAC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants