Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.5 and older series has locking related issues #2155

Closed
gnufied opened this issue Jan 10, 2023 · 10 comments · Fixed by #2252
Closed

2.5 and older series has locking related issues #2155

gnufied opened this issue Jan 10, 2023 · 10 comments · Fixed by #2252
Assignees

Comments

@gnufied
Copy link
Contributor

gnufied commented Jan 10, 2023

I have been reading the driver code and it appears that, older version of driver at least has locking related issues.

Older version of driver uses rexray - https://github.com/rexray/gocsi which allows per-volumeid or per-volumename based locks, but some operations inside control-plane of driver are global and hence should be protected by global lock.

For example following bit of code:

func (vc *VirtualCenter) Connect(ctx context.Context) error {
}

IMO the entire function needs to be locked, rather than just the private connect function, because if connect fails for some reason by 1 goroutine and another goroutine it succeeds, then logout should use the right session. Without locks I am not sure what would happen.

Then functions like:

func (vc *VirtualCenter) ListDatacenters(ctx context.Context) (

Should also have locks IMO because connect function can overrwrite the vc.Client variable at any time while ListDataCenters is executing and hence another goroutine won't see latest value.

I see that latest version has removed rexray, but we should still double check if all the locking logic is sound.

@divyenpatel
Copy link
Member

cc: @gohilankit

@chethanv28
Copy link
Collaborator

/assign @gohilankit

@gnufied
Copy link
Contributor Author

gnufied commented Jan 18, 2023

The other thing that seems to be problematic is following bit of code:

// ConnectCns creates a CNS client for the virtual center.
func (vc *VirtualCenter) ConnectCns(ctx context.Context) error {
	log := logger.GetLogger(ctx)
	var err = vc.Connect(ctx)
	if err != nil {
		log.Errorf("failed to connect to Virtual Center host %q with err: %v", vc.Config.Host, err)
		return err
	}
	if vc.CnsClient == nil {
		if vc.CnsClient, err = NewCnsClient(ctx, vc.Client.Client); err != nil {
			log.Errorf("failed to create CNS client on vCenter host %q with err: %v", vc.Config.Host, err)
			return err
		}
	}
	return nil
}

See the ConnectCns returns a client which uses a pointer (vc.Client.Client), but the thing is - any concurrent calls can change the value of vc.Client at any point in time (if session times out or becomes invalid).

So what happens to the functions which are still using "older" pointer. Because they are no locks, there are no guarantees about what they will see.

@divyenpatel
Copy link
Member

/assign @skogta

@k8s-ci-robot
Copy link
Contributor

@divyenpatel: GitHub didn't allow me to assign the following users: skogta.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @skogta

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@divyenpatel
Copy link
Member

@skogta has the PR out to fix this issue #2155
@gnufied do you want to take a look and help review?

@gnufied
Copy link
Contributor Author

gnufied commented Feb 26, 2023

@divyenpatel @skogta yep sure. thank you.

@skogta
Copy link
Contributor

skogta commented Mar 1, 2023

@gnufied please review the PR: #2252

Please note that this PR solves only part of the problem - that is, for the public connect method.

For other places, we will need a more elaborate solution as we are using vc.Client.Client after connecting to the VC at several other places, all over the CSI code base. I will discuss the same with Divyen.

Most likely it should not be a problem as per an experiment that i conducted. Before using vc.Client.Client, I did a logout on the VC. With this vc.Client.Client, I was able to create a new client successfully using method NewCnsClient

@divyenpatel
Copy link
Member

@gnufied if we move clientMutex lock to public VC Connect method from private VC connect method then we don't need any change in the ConnectCns as in the ConnectCns we are calling vc.Connect(ctx)

@gnufied
Copy link
Contributor Author

gnufied commented Jun 1, 2023

@divyenpatel yeah I took a look. It seems much more reasonable now. I think for now we can close this issue. thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants