Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix VC Config init in Syncer #967

Merged
merged 1 commit into from Jun 14, 2021
Merged

Fix VC Config init in Syncer #967

merged 1 commit into from Jun 14, 2021

Conversation

chethanv28
Copy link
Collaborator

@chethanv28 chethanv28 commented Jun 10, 2021

What this PR does / why we need it:

While debugging the incorrect login attempts from CSI syncer, I observed that syncer VC config was actually never initialized correctly, as a result of which ReloadConfig() gets invoked even when password is not changed. i.e., the below logic was getting executed successfully

metadataSyncer.configInfo.Cfg.Global.User != newVCConfig.Username ||
				metadataSyncer.configInfo.Cfg.Global.Password != newVCConfig.Password

When I printed the metadataSyncer.configInfo.Cfg.Global.User & metadataSyncer.configInfo.Cfg.Global.Password was actually empty. See here:

2021-06-11T19:57:50.149Z	INFO	syncer/metadatasyncer.go:502	Existing username: "", password: ""	{"TraceId": "aefdc21e-5135-498f-a0a3-c84ef49d5c92"}

I have fixed the InitMetadataSyncer to address the issue

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Testing done:
Before this change:

./vpxd/vpxd-7.log:2021-06-10T06:36:17.811Z error vpxd[16023] [Originator@6876 sub=User opID=42224782] Failed to authenticate user <workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c@vsphere.local>
./vpxd/vpxd-7.log:2021-06-10T06:36:27.880Z error vpxd[15997] [Originator@6876 sub=User opID=754dd10d] Failed to authenticate user <workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c@vsphere.local>
./vpxd/vpxd-7.log:2021-06-10T06:36:35.938Z error vpxd[16110] [Originator@6876 sub=User opID=5e035829] Failed to authenticate user <workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c@vsphere.local>
./vpxd/vpxd-7.log:2021-06-10T06:36:45.024Z error vpxd[16009] [Originator@6876 sub=User opID=1750be1a] Failed to authenticate user <workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c@vsphere.local>
./vpxd/vpxd-7.log:2021-06-10T06:36:54.117Z error vpxd[16056] [Originator@6876 sub=User opID=6fbb2cfb] Failed to authenticate user <workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c@vsphere.local>

Account remains locked for next 5 minutes:

./vpxd/vpxd-7.log:2021-06-10T06:36:54.113Z error vpxd[16056] [Originator@6876 sub=UserDirectorySso opID=6fbb2cfb] AcquireToken exception: N9SsoClient27InvalidCredentialsExceptionE(Authentication failed: The account of the user trying to authenticate is locked. :: The account of the user trying to authenticate is locked. :: User account locked: {Name: workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c, Domain: vsphere.local})
./vpxd/vpxd-7.log:2021-06-10T06:37:03.168Z error vpxd[15999] [Originator@6876 sub=UserDirectorySso opID=1498796f] AcquireToken exception: N9SsoClient27InvalidCredentialsExceptionE(Authentication failed: The account of the user trying to authenticate is locked. :: The account of the user trying to authenticate is locked. :: User account locked: {Name: workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c, Domain: vsphere.local})
...
...
./vpxd/vpxd-7.log:2021-06-10T06:41:56.431Z error vpxd[16029] [Originator@6876 sub=UserDirectorySso opID=410f9ed4] AcquireToken exception: N9SsoClient27InvalidCredentialsExceptionE(Authentication failed: The account of the user trying to authenticate is locked. :: The account of the user trying to authenticate is locked. :: User account locked: {Name: workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c, Domain: vsphere.local})

After this change:

workload_storage_management-69c4b155-03ae-42b1-8611-86d53da85c2c user account locking was not observed

Special notes for your reviewer:

Release note:

Fix VC Config init in Syncer

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 10, 2021
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 10, 2021
@svcbot-qecnsdp
Copy link

Build ID: 44
WCP build status: FAILURE 
Stage before exit: null 

@svcbot-qecnsdp
Copy link

Build ID: 45
WCP build status: FAILURE 
Stage before exit: null 

@svcbot-qecnsdp
Copy link

Build ID: 46
WCP build status: FAILURE 
Stage before exit: null 

@svcbot-qecnsdp
Copy link

Started WCP block pre-checkin pipeline... Build Number: 47

@svcbot-qecnsdp
Copy link

Build ID: 47
WCP build status: FAILURE 
Stage before exit: null 
Jenkins E2E Test Results: 
Ran 14 of 190 Specs in 93.480 seconds
FAIL! -- 0 Passed | 14 Failed | 0 Pending | 176 Skipped
--- FAIL: TestE2E (93.61s)
FAIL

Ginkgo ran 1 suite in 1m41.073509523s
Test Suite Failed
make: Leaving directory `/home/worker/workspace/csi-wcp-pre-check-in/Results/47/vsphere-csi-driver`

@chethanv28 chethanv28 changed the title Increasing retry time interval to 60 seconds during reload configuration Increase retry time interval to 60 seconds during reload configuration Jun 10, 2021
@svcbot-qecnsdp
Copy link

Build ID: 49
WCP build status: FAILURE 
Stage before exit: null 

@svcbot-qecnsdp
Copy link

Started WCP block pre-checkin pipeline... Build Number: 52

@svcbot-qecnsdp
Copy link

Build ID: 52
WCP build status: FAILURE 
Stage before exit: null 
Jenkins E2E Test Results: 
Ran 14 of 190 Specs in 3574.837 seconds
FAIL! -- 9 Passed | 5 Failed | 0 Pending | 176 Skipped
--- FAIL: TestE2E (3574.96s)
FAIL

Ginkgo ran 1 suite in 1h0m7.403543636s
Test Suite Failed
make: Leaving directory `/home/worker/workspace/csi-wcp-pre-check-in/Results/52/vsphere-csi-driver`

@svcbot-qecnsdp
Copy link

Started WCP block pre-checkin pipeline... Build Number: 53

@svcbot-qecnsdp
Copy link

Build ID: 53
WCP build status: SUCCESS 
Stage before exit: finally 
Jenkins E2E Test Results: 

Ran 14 of 190 Specs in 3744.047 seconds
SUCCESS! -- 14 Passed | 0 Failed | 0 Pending | 176 Skipped
PASS

Ginkgo ran 1 suite in 1h2m54.559964909s
Test Suite Passed
make: Leaving directory `/home/worker/workspace/csi-wcp-pre-check-in/Results/53/vsphere-csi-driver`

@chethanv28
Copy link
Collaborator Author

cc: @divyenpatel @RaunakShah @BaluDontu Can you help with reviews & approvals

@RaunakShah
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jun 11, 2021
@RaunakShah
Copy link
Contributor

/approve

1 similar comment
@gohilankit
Copy link
Contributor

/approve

@divyenpatel
Copy link
Member

This can happen because it usually takes 40~60e seconds for new password to get notified using fsnotify event handling.

Shouldn't the VC password be changed first and then we get notified, so we always get the latest password. Are there any server-side fixes we will be making to ensure we get notified after the password is successfully applied on the vCenter?

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 11, 2021
@chethanv28
Copy link
Collaborator Author

This can happen because it usually takes 40~60e seconds for new password to get notified using fsnotify event handling.

Shouldn't the VC password be changed first and then we get notified, so we always get the latest password. Are there any server-side fixes we will be making to ensure we get notified after the password is successfully applied on the vCenter?

While debugging the incorrect login attempts from CSI syncer, I observed that syncer VC config was actually never initialized correctly, as a result of which ReloadConfig() gets invoked even when password is not changed. i.e., the below logic was getting executed successfully

metadataSyncer.configInfo.Cfg.Global.User != newVCConfig.Username ||
				metadataSyncer.configInfo.Cfg.Global.Password != newVCConfig.Password

When I printed the metadataSyncer.configInfo.Cfg.Global.User & metadataSyncer.configInfo.Cfg.Global.Password was actually empty. See here:

2021-06-11T19:57:50.149Z	INFO	syncer/metadatasyncer.go:502	Existing username: "", password: ""	{"TraceId": "aefdc21e-5135-498f-a0a3-c84ef49d5c92"}

I have fixed the InitMetadataSyncer to address this. But I think CSI should not aggressively retry in 5 seconds, now that we have /etc/vmware/wcp/tls in the same parent directory as /etc/vmware/wcp/vsphere-cloud-provider.conf. So if there is a REMOVE event for any file on this path, we will get a ReloadConfig() call.

@svcbot-qecnsdp
Copy link

Started WCP block pre-checkin pipeline... Build Number: 54

@svcbot-qecnsdp
Copy link

Build ID: 54
WCP build status: SUCCESS 
Stage before exit: finally 
Jenkins E2E Test Results: 

Ran 14 of 190 Specs in 3723.006 seconds
SUCCESS! -- 14 Passed | 0 Failed | 0 Pending | 176 Skipped
PASS

Ginkgo ran 1 suite in 1h2m32.471703668s
Test Suite Passed
make: Leaving directory `/home/worker/workspace/csi-wcp-pre-check-in/Results/54/vsphere-csi-driver`

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 11, 2021
@chethanv28 chethanv28 changed the title Increase retry time interval to 60 seconds during reload configuration Fix VC Config init in Syncer Jun 11, 2021
pkg/syncer/metadatasyncer.go Outdated Show resolved Hide resolved
pkg/syncer/metadatasyncer.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 12, 2021
@svcbot-qecnsdp
Copy link

Started Vanilla block pre-checkin pipeline... Build Number: 62

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 12, 2021
@svcbot-qecnsdp
Copy link

Started Vanilla block pre-checkin pipeline... Build Number: 64

@svcbot-qecnsdp
Copy link

Started Vanilla block pre-checkin pipeline... Build Number: 65

@svcbot-qecnsdp
Copy link

Started Vanilla block pre-checkin pipeline... Build Number: 67

@svcbot-qecnsdp
Copy link

Build ID: 67
Block vanilla build status: SUCCESS 
Stage before exit: e2e-tests 
Jenkins E2E Test Results: 

Ran 42 of 192 Specs in 9743.128 seconds
SUCCESS! -- 42 Passed | 0 Failed | 0 Pending | 150 Skipped
PASS

Ginkgo ran 1 suite in 2h42m48.166905585s
Test Suite Passed
make: Leaving directory `/home/worker/workspace/csi-block-vanilla-pre-check-in@2/67/vsphere-csi-driver`

Copy link
Member

@divyenpatel divyenpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chethanv28, divyenpatel, gohilankit, RaunakShah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [RaunakShah,chethanv28,divyenpatel]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@divyenpatel
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 14, 2021
@k8s-ci-robot k8s-ci-robot merged commit 8a70053 into kubernetes-sigs:master Jun 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants