Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: change defaultModelSorting as local variable #4176

Merged
merged 1 commit into from
Oct 26, 2023

Conversation

jwcesign
Copy link
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:
Please check: #3854

Which issue(s) this PR fixes:
Part of #3854

Special notes for your reviewer:
none

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Fix panic caused by concurrent global variable reads and writes.

Signed-off-by: jwcesign <jwcesign@gmail.com>
@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 25, 2023
@jwcesign
Copy link
Member Author

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 25, 2023
@karmada-bot
Copy link
Collaborator

@jwcesign: GitHub didn't allow me to request PR reviews from the following users: halfrost.

Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @chaosi-zju @XiShanYongYe-Chang @halfrost

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jwcesign
Copy link
Member Author

@halfrost

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2023
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2023
@chaosi-zju
Copy link
Member

/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2023
Copy link
Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chaunceyjiang
Copy link
Member

Have you found the root cause?

@jwcesign
Copy link
Member Author

@chaunceyjiang The root case is the same as this(#3854 (comment)).

@@ -14,15 +13,11 @@ import (
clusterapis "github.com/karmada-io/karmada/pkg/apis/cluster/v1alpha1"
)

var (
mu sync.Mutex
defaultModelSorting []clusterapis.ResourceName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resource models might be different from each other, for example, cluster 1's model could be:

resourceModels:
  - grade: 0
    ranges:
    - max: "1"
      min: "0"
      name: cpu        # Note that only declared cpu

and cluster 2's model could be

resourceModels:
  - grade: 0
    ranges:
    - max: 4Gi
      min: "0"
      name: memory   # Note that only declared memory

The global variable defaultModelSorting actually shared by different clusters and might be overwritten during init the model at https://github.com/karmada-io/karmada/blob/3267155766f69083d211c4bc804e8bda4b18fd4d/pkg/controllers/status/cluster_status_controller.go#L595C35-L595C46.

@halfrost Please correct me if this is not the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the resource model should be different for each cluster. But the original requirement seems to be that this modeling is to build a large resource model. Karmada manages the resource models of all clusters in a unified way. For this reason, it is taken out as a global variable.

Comment on lines 168 to -184
tmpCrn := element.Value.(ClusterResourceNode)
safeChangeNum(&tmpCrn.quantity, crn.quantity)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@halfrost Can you remind me why we need the lock when adding the resource quantity to the element?

Copy link
Member Author

@jwcesign jwcesign Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 confused too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function AddToResourceSummary will be called concurrently. If no locking is performed, the eventual consistency of num will be incorrect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwcesign I think the point where you're confused comes from whether it should be made a local variable. It is a global variable at first, so it needs to be locked here.

Copy link
Member Author

@jwcesign jwcesign Oct 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @halfrost Thanks for your clarification!

So, this PR is reasonable and correct? Or there are some other things I may miss?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jwcesign ,

Your changes are correct; there are no issues. With your modifications, each cluster now has its own modeling information.

There are no global variables left. In my previous PR, I didn't turn defaultModelSorting into a member variable because I was under the impression that we needed a broader global modeling context.

So this PR is prefect now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @halfrost for the clarification.
I believe it reasonable for each cluster to have its resource models.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this pr makes sense. Thanks for @jwcesign work. 👍🏻

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

I think this PR resolved a functionality issue and a potential panic.
@halfrost your comments and clarification are still welcome!

@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2023
@karmada-bot karmada-bot merged commit e03f5d5 into karmada-io:master Oct 26, 2023
12 checks passed
@halfrost
Copy link
Contributor

@RainbowMango These changes are both correct and reasonable. I've reviewed it carefully. 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants