
Support Connection to ResourceInterpretWebhook without DNS Service #2999

Merged · 1 commit into karmada-io:master · Apr 20, 2023

Conversation

@lxtywypc (Contributor) commented Dec 28, 2022

What type of PR is this?
/kind feature

What this PR does / why we need it:
If we deploy Karmada with the Default dnsPolicy, or deploy Karmada directly on a physical or virtual machine, and there is no CoreDNS or other DNS service, connections to resource-interpreter webhooks fail with the DefaultServiceResolver, because it resolves webhook Services to their cluster DNS names.
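
In essence, the change swaps the DNS-based resolver for one that looks up the webhook Service's ClusterIP. A minimal sketch of the idea, assuming the ServiceResolver interface from k8s.io/apiserver/pkg/util/webhook (the type name and field are illustrative, not this PR's exact code):

```go
package resolver

import (
	"fmt"
	"net/url"

	corev1listers "k8s.io/client-go/listers/core/v1"
)

// clusterIPServiceResolver resolves a webhook's Service reference to the
// Service's ClusterIP instead of its cluster DNS name, so webhook calls
// succeed even when no DNS service (e.g. CoreDNS) is available.
type clusterIPServiceResolver struct {
	serviceLister corev1listers.ServiceLister
}

// ResolveEndpoint satisfies the ServiceResolver interface from
// k8s.io/apiserver/pkg/util/webhook.
func (r *clusterIPServiceResolver) ResolveEndpoint(namespace, name string, port int32) (*url.URL, error) {
	svc, err := r.serviceLister.Services(namespace).Get(name)
	if err != nil {
		return nil, err
	}
	if svc.Spec.ClusterIP == "" {
		return nil, fmt.Errorf("service %s/%s has no ClusterIP", namespace, name)
	}
	// Dial the ClusterIP directly; no DNS lookup is required.
	return &url.URL{
		Scheme: "https",
		Host:   fmt.Sprintf("%s:%d", svc.Spec.ClusterIP, port),
	}, nil
}
```

Note that TLS verification still has to match the webhook's serving certificate when dialing by IP; how that is handled (e.g. via the client's TLS ServerName) is an implementation detail not shown in this sketch.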

Which issue(s) this PR fixes:
Fixes #2998

Special notes for your reviewer:
NONE

Does this PR introduce a user-facing change?:

`karmada-controller-manager`/`karmada-agent`: support connection to resourceInterpretWebhook without DNS Service.

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 28, 2022
@karmada-bot (Collaborator):

Welcome @lxtywypc! It looks like this is your first PR to karmada-io/karmada 🎉

@karmada-bot karmada-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 28, 2022
@codecov-commenter commented Dec 29, 2022

Codecov Report

Merging #2999 (71ffbf4) into master (7f2b660) will decrease coverage by 0.03%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #2999      +/-   ##
==========================================
- Coverage   51.63%   51.61%   -0.03%     
==========================================
  Files         210      210              
  Lines       18926    18926              
==========================================
- Hits         9773     9769       -4     
- Misses       8622     8625       +3     
- Partials      531      532       +1     
Flag        Coverage Δ
unittests   51.61% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

See 2 files with indirect coverage changes.


@XiShanYongYe-Chang (Member):

Hi @lxtywypc, the E2E tests have failed in CI; have you tested this on your side?

@XiShanYongYe-Chang (Member) left a review comment:

Kindly ping @lxtywypc

@@ -45,8 +49,23 @@ func NewCustomizedInterpreter(informer genericmanager.SingleClusterInformerManag
if err != nil {
return nil, err
}

serviceInformer := informers.NewSharedInformerFactory(kubernetes.NewForConfigOrDie(config), 0).Core().V1().Services()
Member:

Why not use the input parameter informer directly?

@lxtywypc (Contributor Author):

Because it is a dynamic informer, but we need a versioned one.
Maybe I could use the indexer to build a service lister directly, but that would require adding a new method to the SingleClusterInformerManager interface to expose the indexer; I'm not sure whether that would be better.

Member:

OK, I get it.

Member:

I think we can replace the dynamic informer with the versioned informer; if that works, we can do it in a follow-up PR.

@lxtywypc (Contributor Author) commented Jan 9, 2023

> Hi @lxtywypc, the E2E tests have failed in CI; have you tested this on your side?

Hmm...
I'm not very familiar with how the E2E tests work, but they show a different error after each retry, and they seem to pass after some retries.

@XiShanYongYe-Chang (Member) left a review comment:

LGTM
/cc @RainbowMango

@RainbowMango (Member):

The E2E is still failing after retrying. @XiShanYongYe-Chang

@XiShanYongYe-Chang (Member):

Hi @lxtywypc, I tested this PR on my side. After deploying Karmada, I found that the status condition of cluster member3 (Pull mode) was Unknown, and when I deleted the old karmada-controller-manager pod, the logs of the new pod reported a panic:

(screenshot of the panic log)

@karmada-bot karmada-bot added the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Jan 12, 2023
@lxtywypc (Contributor Author):

> Hi @lxtywypc, I tested this PR on my side. After deploying Karmada, I found that the status condition of cluster member3 (Pull mode) was Unknown, and when I deleted the old karmada-controller-manager pod, the logs of the new pod reported a panic:

OK, I will check again later.

@lxtywypc (Contributor Author):

@XiShanYongYe-Chang @RainbowMango
It should be OK this time.
I used a for-select loop to make sure the informer had synced, and in the earlier version I made it sleep 1s on each iteration until the loop finished:
4bbdf74#diff-a111978a679a9f03e49c67a43ecb913299d153e59ce48d2d30f5029f27ab601aR532
It seems that some calls to the resource interpreter arrived early, during that period, which caused a nasty panic on a nil interpreter pointer.
This time I build the service informer and wait for it to sync before constructing the resource interpreter. It works now.
But since a panic could occur during that startup window, I think there is still a potential panic risk if interpreter startup takes too long. What do you think?
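
For illustration, here is a minimal sketch of the wait-for-sync approach described above, using client-go's standard cache-sync helpers (the function name is illustrative, not this PR's exact code):

```go
package interpreter

import (
	"fmt"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	corev1listers "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/tools/cache"
)

// newSyncedServiceLister starts a versioned service informer and blocks
// until its cache has synced, so the webhook resolver is never handed an
// empty lister (the situation that previously led to a nil-pointer panic
// during startup).
func newSyncedServiceLister(client kubernetes.Interface, stopCh <-chan struct{}) (corev1listers.ServiceLister, error) {
	factory := informers.NewSharedInformerFactory(client, 0)
	serviceInformer := factory.Core().V1().Services()
	lister := serviceInformer.Lister() // registers the informer with the factory

	factory.Start(stopCh)
	if !cache.WaitForCacheSync(stopCh, serviceInformer.Informer().HasSynced) {
		return nil, fmt.Errorf("timed out waiting for the service informer cache to sync")
	}
	// Only after this point is it safe to construct the resource
	// interpreter that resolves webhook endpoints through the lister.
	return lister, nil
}
```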

@XiShanYongYe-Chang (Member):

/assign

@XiShanYongYe-Chang (Member):

Thanks, @lxtywypc, can you help squash your commits into one?

Asking @ikaven1024 and @RainbowMango to take another review.
/assign @ikaven1024 @RainbowMango

@lxtywypc (Contributor Author) commented Mar 20, 2023

> Thanks, @lxtywypc, can you help squash your commits into one?

OK, but what about the vendor changes? @XiShanYongYe-Chang

@XiShanYongYe-Chang (Member):

/cc @RainbowMango
Asking @RainbowMango to help take a look.

@RainbowMango (Member) left a review comment:

Do we have a test report?

cmd/agent/app/agent.go (review thread: outdated, resolved)
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 22, 2023
@RainbowMango (Member) left a review comment:

/lgtm
/approve

/hold
for the test report, since we don't have a test for this scenario.

@karmada-bot karmada-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Mar 22, 2023
@karmada-bot (Collaborator):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 22, 2023
@XiShanYongYe-Chang (Member):

Hi @lxtywypc, can you help post your test results here? A simple report of the test process is enough.

@lxtywypc (Contributor Author) commented Apr 17, 2023

Hi @XiShanYongYe-Chang, I'm sorry for the late reply.
I'd be glad to put together a test report, but what kind of report would you like? Could you offer me a template?

…minate dependency of DNS

Signed-off-by: lxtywypc <lxtywypc@gmail.com>
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2023
@XiShanYongYe-Chang (Member):

> I'd be glad to put together a test report, but what kind of report would you like? Could you offer me a template?

Thanks~

I'm sorry, I don't have a specific template here either. My idea is that you can simply explain your local test process and post your test results here; that will be OK.

@lxtywypc (Contributor Author) commented Apr 17, 2023

> My idea is that you can simply explain your local test process and post your test results here; that will be OK.

OK, I'll do it later.

@lxtywypc (Contributor Author):

Here is the test report:

Note: Some company-specific information in the examples has been redacted.

  1. Preparation: We have a workload CR whose kind is Ro, and a resource-interpreter webhook in the cluster named ro-webhook that implements the InterpretReplica operation:
    $ kubectl get svc -n karmada-system ro-webhook
    NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
    ro-webhook   ClusterIP   10.7.93.168   <none>        443/TCP   196d
    $ kubectl get resourceinterpreterwebhookconfiguration ro-webhook -o jsonpath='{"service: "}{.webhooks[0].clientConfig.service}{"\nrules: "}{.webhooks[0].rules}{"\n"}'
    service: {"name":"ro-webhook","namespace":"karmada-system","path":"/interpret-ro","port":443}
    rules: [{"apiGroups":["xxx.xxx"],"apiVersions":["v1"],"kinds":["Ro"],"operations":["InterpretReplica"]}]
    
  2. First, we apply a workload named test with 0 replicas in namespace test, while karmada-controller-manager runs an image built from the master branch:
    $ kubectl get ro -n test test -o jsonpath='{.spec.replicas}{"\n"}'
    0
    $ kubectl get deploy -n karmada-system karmada-controller-manager -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
    xxx.xxx/karmada/karmada-controller-manager:master
    
  3. Then we scale the workload up to 3 replicas and read its replicas from the ResourceBinding:
    $ kubectl scale ro -n test test --replicas=3
    xxx.xxx/test scaled
    $ kubectl get rb -n test test-ro -o jsonpath='{.spec.replicas}{"\n"}'
    
    $
    
    Then we look at the logs of karmada-controller-manager:
    $ kubectl exec -it karmada-controller-manager-6db886b97f-4fb5q -n karmada-system -- /bin/sh
    / # cd /var/log/karmada-controller-manager/
    /var/log/karmada-controller-manager # cat runtime.log | grep test
    I0420 07:08:18.189526       1 customized.go:74] Get replicas for object: xxx.xxx/v1, Kind=Ro test/test with webhook interpreter.
    E0420 07:08:18.195101       1 detector.go:649] Failed to customize replicas for xxx.xxx/v1, Kind=Ro(test), Internal error occurred: failed calling webhook "ro-webhook/xxx.xxx": failed to call webhook: Post "https://ro-webhook.karmada-system.svc:443/interpret-ro?timeout=3s": dial tcp: lookup ro-webhook.karmada-system.svc on 10.5.168.10:53: no such host
    E0420 07:08:18.195121       1 detector.go:472] Failed to build resourceBinding for object: xxx.xxx/v1, kind=Ro, test/test. error: Internal error occurred: failed calling webhook "ro-webhook/xxx.xxx": failed to call webhook: Post "https://ro-webhook.karmada-system.svc:443/interpret-ro?timeout=3s": dial tcp: lookup ro-webhook.karmada-system.svc on 10.5.168.10:53: no such host
    I0420 07:08:18.195198       1 recorder.go:103] "events: Apply cluster policy(ro) failed: Internal error occurred: failed calling webhook \"ro-webhook/xxx.xxx\": failed to call webhook: Post \"https://ro-webhook.karmada-system.svc:443/interpret-ro?timeout=3s\": dial tcp: lookup ro-webhook.karmada-system.svc on 10.5.168.10:53: no such host" type="Warning" object={Kind:Ro Namespace:test Name:test UID:0f9fc62b-26ea-45da-8b18-085f1b167864 APIVersion:xxx.xxx/v1 ResourceVersion:643725715 FieldPath:} reason="ApplyPolicyFailed"
    
  4. Now we change the karmada-controller-manager image to one built from this pull request, and read the replicas from the ResourceBinding again:
    $ kubectl get deploy -n karmada-system karmada-controller-manager -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
    xxx.xxx/karmada/karmada-controller-manager:master-with-resolver
    $ kubectl get rb -n test test-ro -o jsonpath='{.spec.replicas}{"\n"}'
    3
    
    Let's look at the logs of karmada-controller-manager again:
    $ kubectl exec -it karmada-controller-manager-57dcd97579-k9shq -n karmada-system -- /bin/sh
    / # cd /var/log/karmada-controller-manager/
    /var/log/karmada-controller-manager # cat runtime.log | grep test
    I0420 07:32:17.723681       1 customized.go:77] Get replicas for object: xxx.xxx/v1, Kind=Ro test/test with webhook interpreter.
    I0420 07:32:17.735193       1 detector.go:510] Update ResourceBinding(test-ro) successfully.
    I0420 07:32:17.735240       1 recorder.go:103] "events: Apply cluster policy(ro) succeed" type="Normal" object={Kind:Ro Namespace:test Name:test UID:0f9fc62b-26ea-45da-8b18-085f1b167864 APIVersion:xxx.xxx/v1 ResourceVersion:643725715 FieldPath:} reason="ApplyPolicySucceed"
    

The webhook is now reachable, and we haven't hit any other problems so far.


@XiShanYongYe-Chang (Member):

@lxtywypc, a clear test report, thanks a lot.

/lgtm
/hold cancel

Can you help update the release-note, such as:

karmada-controller-manager/karmada-agent: support connection to resourceInterpretWebhook without DNS Service

@karmada-bot karmada-bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 20, 2023
@karmada-bot karmada-bot merged commit ed2b101 into karmada-io:master Apr 20, 2023
@lxtywypc (Contributor Author):

@XiShanYongYe-Chang I'd be glad to, but I'm not sure exactly what I should do.
Should I cherry-pick this PR to other release branches and make tags?

@XiShanYongYe-Chang (Member):

> I'd be glad to, but I'm not sure exactly what I should do.

Modifying this place will be OK:
(screenshot of the release-note section in the PR description)

> Should I cherry-pick this PR to other release branches and make tags?

We usually only backport bug fixes to previous release branches, and this change is more like a feature to us. Do you need it in previous versions?

@RainbowMango (Member):

I don't think we need to cherry-pick this to release branches.

@lxtywypc (Contributor Author):

@XiShanYongYe-Chang @RainbowMango I see. Thanks a lot.

@lxtywypc lxtywypc deleted the update-resolver branch August 1, 2023 08:02