Feat: ResourceTracker new architecture #2849

Merged
merged 2 commits into from
Dec 10, 2021

@Somefive Somefive commented Dec 1, 2021

This PR reworks ResourceTracker and upgrades the resource management architecture of the KubeVela controller.

Major Changes

  1. Remove OwnerReference from managed resources. This simplifies resource creation and updates.
  2. Record rendered resources in ResourceTracker.Spec.ManagedResource. This allows StateKeep to run after the workflow, so that configuration drift in applied resources can be prevented.
  3. ResourceTracker deletion is reconciled by the application controller.
  4. Resources in managed clusters are recorded directly in ResourceTrackers in the hub cluster, so ResourceTrackers are no longer needed in managed clusters.
  5. Add a GarbageCollect policy and an ApplyOnce policy.
    a. The former lets different resources have different life cycles. Usually, resources are deleted as soon as the latest application no longer uses them. Some resources need to be kept until the application is gone. Some may need to be kept even after the application is gone.
    b. The latter allows us to prevent configuration drift by running StateKeep after the workflow is suspended, terminated, or succeeded.
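
The three life cycles described in 5a can be sketched as a small decision function. This is only an illustration: the `GCStrategy` type, its constant names, and `ShouldDelete` are hypothetical names chosen for this example, not the identifiers used in pkg/resourcekeeper/.

```go
package main

import "fmt"

// GCStrategy is a hypothetical name for the life cycles listed in the description.
type GCStrategy string

const (
	// OnAppUpdate: delete once the latest application version stops using the resource.
	OnAppUpdate GCStrategy = "onAppUpdate"
	// OnAppDelete: keep until the application itself is deleted.
	OnAppDelete GCStrategy = "onAppDelete"
	// Never: keep even after the application is gone.
	Never GCStrategy = "never"
)

// ShouldDelete reports whether a recorded resource may be garbage collected.
// inUse means the latest application version still renders the resource;
// appDeleted means the owning application is being deleted.
func ShouldDelete(s GCStrategy, inUse, appDeleted bool) bool {
	switch s {
	case Never:
		return false
	case OnAppDelete:
		return appDeleted
	default: // OnAppUpdate, the usual behavior
		return appDeleted || !inUse
	}
}

func main() {
	fmt.Println(ShouldDelete(OnAppUpdate, false, false)) // true
	fmt.Println(ShouldDelete(OnAppDelete, false, false)) // false
	fmt.Println(ShouldDelete(Never, false, true))        // false
}
```

Keeping the decision in one pure function makes each strategy easy to test in isolation, independent of how resources are actually recorded.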

Minor Changes

  1. ControllerRevisions are recorded directly in ResourceTrackers.
  2. Addon Applications use ApplyOnce by default.
  3. MultiCluster does not need extra GC logic.

Deprecation

  1. AppRollout is no longer supported. A separate PR will remove it permanently.
  2. ResourceTracker.Status is deprecated.

Implementation

  1. The main changes are implemented in pkg/resourcekeeper/, which supports resource recording, garbage collection, and state keeping. Code reviewers are encouraged to start here, since the other changes all exist to make this package work properly.
  2. Many tests are fixed and added.

I have:

  • Read and followed KubeVela's contribution process.
  • Related Docs updated properly. In a new feature or configuration option, an update to the documentation is necessary.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.
  • Add more e2e-test.

codecov bot commented Dec 1, 2021

Codecov Report

Merging #2849 (e548027) into master (b483840) will increase coverage by 3.01%.
The diff coverage is 76.22%.

@@            Coverage Diff             @@
##           master    #2849      +/-   ##
==========================================
+ Coverage   56.83%   59.85%   +3.01%     
==========================================
  Files         229      238       +9     
  Lines       23739    24204     +465     
==========================================
+ Hits        13492    14487     +995     
+ Misses       8546     7979     -567     
- Partials     1701     1738      +37     
Flag Coverage Δ
apiserver-unittests 26.83% <0.00%> (-0.95%) ⬇️
core-unittests 47.20% <68.52%> (-6.03%) ⬇️
e2e-multicluster-test 24.92% <49.13%> (+0.43%) ⬆️
e2e-rollout-tests 28.97% <51.26%> (-2.53%) ⬇️
e2etests 37.67% <60.42%> (?)

Impacted Files Coverage Δ
apis/core.oam.dev/v1alpha1/envbinding_types.go 0.00% <ø> (ø)
...pis/core.oam.dev/v1alpha1/zz_generated.deepcopy.go 6.01% <0.00%> (-1.63%) ⬇️
cmd/core/main.go 10.59% <0.00%> (ø)
...dev/v1alpha2/applicationrollout/rollout_handler.go 11.72% <0.00%> (-38.28%) ⬇️
pkg/multicluster/cluster_management.go 47.72% <ø> (-0.60%) ⬇️
pkg/policy/envbinding/placement.go 90.00% <ø> (+4.28%) ⬆️
pkg/utils/errors/resourcetracker.go 0.00% <0.00%> (ø)
...kg/workflow/providers/multicluster/multicluster.go 89.61% <ø> (+2.26%) ⬆️
pkg/appfile/appfile.go 74.19% <42.85%> (+2.84%) ⬆️
apis/core.oam.dev/v1beta1/zz_generated.deepcopy.go 44.59% <63.33%> (+5.37%) ⬆️
... and 83 more

@Somefive Somefive force-pushed the new_rt branch 21 times, most recently from 88766e3 to 5cfaf6d Compare December 7, 2021 06:41
@Somefive Somefive marked this pull request as ready for review December 8, 2021 14:10
@Somefive Somefive changed the title Feat: new rt Feat: ResourceTracker new architecture Dec 8, 2021
Signed-off-by: Yin Da <yd219913@alibaba-inc.com>
// DeletedManifestInResourceTracker marks resources as deleted in resourcetracker, if remove is true, resources will be removed from resourcetracker
func DeletedManifestInResourceTracker(ctx context.Context, cli client.Client, rt *v1beta1.ResourceTracker, manifest *unstructured.Unstructured, remove bool) error {
	rt.DeleteManagedResource(manifest, remove)
	return cli.Update(ctx, rt)
Collaborator

Where do you delete the ownerRef in ResourceTracker for compatibility?

Collaborator Author

The compatibility code has not been added yet. It will be added together with the upgrade tests.
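
The difference between marking a resource as deleted and removing it from the tracker, as the comment on DeletedManifestInResourceTracker describes, can be sketched with an in-memory model. The `Tracker` and `ManagedResource` types below are simplified, hypothetical stand-ins; the real method takes an unstructured manifest rather than a name.

```go
package main

import "fmt"

// ManagedResource is a simplified, hypothetical record of one tracked resource.
type ManagedResource struct {
	Name    string
	Deleted bool
}

// Tracker is a hypothetical stand-in for a ResourceTracker's managed-resource list.
type Tracker struct {
	ManagedResources []ManagedResource
}

// DeleteManagedResource marks the named resource as deleted, or drops the
// record entirely when remove is true. Unknown names are ignored.
func (t *Tracker) DeleteManagedResource(name string, remove bool) {
	for i, r := range t.ManagedResources {
		if r.Name != name {
			continue
		}
		if remove {
			t.ManagedResources = append(t.ManagedResources[:i], t.ManagedResources[i+1:]...)
		} else {
			t.ManagedResources[i].Deleted = true
		}
		return
	}
}

func main() {
	t := &Tracker{ManagedResources: []ManagedResource{{Name: "deploy/web"}, {Name: "svc/web"}}}
	t.DeleteManagedResource("deploy/web", false)
	fmt.Println(t.ManagedResources) // [{deploy/web true} {svc/web false}]
	t.DeleteManagedResource("deploy/web", true)
	fmt.Println(t.ManagedResources) // [{svc/web false}]
}
```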

)

// StateKeep run this function to keep resources up-to-date
func (h *resourceKeeper) StateKeep(ctx context.Context) error {
Collaborator

You should write a user-facing doc to explain this policy and recommend best practices.

Collaborator Author

Ok.
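
As a rough illustration of what StateKeep is meant to do, the sketch below compares recorded manifests against live state and re-applies any that are missing or have drifted. It is a toy in-memory model under stated assumptions: `Manifest`, `Cluster`, and this free-standing `StateKeep` function are hypothetical and do not reflect the actual resourceKeeper implementation.

```go
package main

import "fmt"

// Manifest is a simplified stand-in for a rendered resource recorded in a ResourceTracker.
type Manifest struct {
	Key  string // for example "deploy/web"
	Spec string // desired state, flattened to a string for brevity
}

// Cluster is a hypothetical in-memory stand-in for live cluster state.
type Cluster map[string]string

// StateKeep re-applies every recorded manifest whose live spec is missing or
// differs from the recorded one, and returns the keys it restored.
func StateKeep(recorded []Manifest, live Cluster) []string {
	var restored []string
	for _, m := range recorded {
		if cur, ok := live[m.Key]; !ok || cur != m.Spec {
			live[m.Key] = m.Spec
			restored = append(restored, m.Key)
		}
	}
	return restored
}

func main() {
	recorded := []Manifest{
		{Key: "deploy/web", Spec: "replicas=2"},
		{Key: "svc/web", Spec: "port=80"},
	}
	// Someone scaled the deployment by hand: this is configuration drift.
	live := Cluster{"deploy/web": "replicas=5", "svc/web": "port=80"}
	fmt.Println(StateKeep(recorded, live)) // [deploy/web]
	fmt.Println(live["deploy/web"])        // replicas=2
}
```

Running the function a second time restores nothing, which is the property that makes it safe to call on every reconcile.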

	GenericFunc: func(genericEvent ctrlEvent.GenericEvent, limitingInterface workqueue.RateLimitingInterface) {
		handleResourceTracker(genericEvent.Object, limitingInterface)
	},
}).
Collaborator

What are these handlers for?

Collaborator Author

These handlers are added so that the application controller can respond to ResourceTracker change events. Designed by @leejanee.

Collaborator

What's the real effect? Who will change RT besides the app controller?

Collaborator Author

In KeepLegacyResource mode, historical versions of ResourceTrackers are not automatically deleted by the application controller. Instead, users control which historical versions to keep or discard. In this case, the user deletes the RT.

Collaborator

I got it. That means the app controller is also the controller of RT.

if err := handler.resourceKeeper.StateKeep(ctx); err != nil {
	logCtx.Error(err, "Failed to run prevent-configuration-drift")
	r.Recorder.Event(app, event.Warning(velatypes.ReasonFailedStateKeep, err))
	return r.endWithNegativeCondition(logCtx, app, condition.ReconcileError(err), phase)
Collaborator

What will happen in this case? I think the application is already running well here, and if it is not, the health check should report the error first.

So, is it necessary to report an error? An event, or even a condition change, is necessary though.

Collaborator Author

If StateKeep fails (that is, getting or updating the target resource fails for a reason other than not found or conflict, such as a cluster disconnect), the returned error triggers the next reconcile, which runs StateKeep again.

Collaborator Author

I think at least a warning should be reported to users, since a failure occurred while ensuring the application's managed resources and bringing their state back.

Collaborator

Yes, it deserves a warning. But returning an error just causes a new reconcile, which doesn't help in this case.

Collaborator Author

In some cases, it is worth a retry. For example, if the child cluster's apiserver is temporarily busy, the resource check might time out, which causes a StateKeep error here. However, the child cluster's apiserver might recover by itself later, and a retry would then let the controller complete StateKeep. Generally, it would be better to have more detail in the returned error and handle it more elegantly. I agree that we do not necessarily need to return an error here for now. Will fix it.

Signed-off-by: Yin Da <yd219913@alibaba-inc.com>
Collaborator

@wonderflow wonderflow left a comment

LGTM, great work

Member

@leejanee leejanee left a comment

/lgtm

@@ -476,6 +481,48 @@ const (
	WorkflowResourceCreator ResourceCreatorRole = "workflow"
)

// OAMObjectReference defines the object reference for an oam resource
type OAMObjectReference struct {
	Component string `json:"component,omitempty"`
Collaborator

Is it based on a one application, one component, one trait model?

Collaborator Author

Not exactly. This reference is for one target resource, for example, a deployment or a service. Each resource always belongs to 1 app - 1 comp (- 1 trait, optional).


// Equal check if two references are equal
func (in OAMObjectReference) Equal(r OAMObjectReference) bool {
	return in.Component == r.Component && in.Trait == r.Trait && in.Env == r.Env
Collaborator

@zzxwill zzxwill Dec 10, 2021

Can reflect.DeepEqual do the comparison?

Collaborator Author

Yes, it can. The equality check could also be done with reflect.DeepEqual.

Collaborator

zzxwill commented Dec 10, 2021

I noticed there are some changes to apis/core.oam.dev/v1alpha1. Will they lead to any compatibility problems?

@Somefive
Collaborator Author

I noticed there are some changes to apis/core.oam.dev/v1alpha1. Will they lead to any compatibility problems?

Yes, there will be compatibility problems. ResourceTrackers in existing KubeVela systems will need to be upgraded. Some compatibility code and tests will be added.

@zzxwill zzxwill merged commit b622cbd into kubevela:master Dec 10, 2021
@Somefive Somefive deleted the new_rt branch June 20, 2023 13:29