Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: close func refactor #5926

Merged
merged 4 commits into from
Jan 18, 2023
Merged

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Jan 12, 2023

The xDS client functionality is implemented by the clientImpl type:

type clientImpl struct {

But the object returned to callers of xdsclient.NewXxx() functions has two levels of wrappers around the actual implementation:

  • a reference counting xdsClient which wraps the underlying implementation and maintains reference counts
  • another wrapper which ensures that Close() calls are idempotent

This PR addresses the following issue with the existing implementation:

  • the reference counting implementation sets it pointer to the underlying implementation to nil when the reference count reaches zero. See here. This means that if callers invoke an API after all references have been released, this could lead to a panic. This currently leads to a data race in Flaky test: 12/10k: UnmarshalListener_WithUpdateValidatorFunc #5231.

A detailed description of the issues facing the current implementation around Close() can be found at #5844.

Summary of changes:

  • Refactor xdsclient.NewXxx() functions to return a close function:
    • The returned close function is implemented as a grpcsync.OnceFunc, thereby eliminating the need for the onceClosingClient type.
    • Remove Close() method from the XDSClient interface, and unexport the existing close method on the clientImpl type.
  • Guard access to the singletonClient with a sync.Mutex, and switch the reference count to use atomics.
  • Treat the bootstrap config passed to xdsclient.NewXxx() functions as a fallback config, and use it only if the bootstrap environment variables are not set.
    • This is currently the behavior in other languages.
  • Add support for client_default_listener_resource_name_template field in testutils bootstrap package.
  • Fix all callsites and tests that create an xDS client.

Fixes: #5895

#resource-agnostic-xdsclient-api

RELEASE NOTES: none

@easwars easwars requested a review from dfawley January 12, 2023 01:53
@easwars easwars added this to the 1.53 Release milestone Jan 12, 2023
@easwars
Copy link
Contributor Author

easwars commented Jan 12, 2023

@arvindbr8 : FYI

xds/csds/csds.go Outdated
Comment on lines 139 to 143
s.mu.Lock()
if s.xdsClient != nil {
s.xdsClient.Close()
if s.xdsClientClose != nil {
s.xdsClientClose()
}
s.mu.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this done in a lock?

Is the xdsClient itself not okay with using it while calling close? Otherwise this close doesn't do anything else or indicate that the xdsClient no longer exists (e.g. by setting to nil), so I don't see why this lock is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Removed it.

configCh <- config
return tXDSClient, nil
return tXDSClient, func() { tXDSClient.Close() }, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not want to follow the same close pattern with the test xds client? It seems that would be best, but maybe it's too much work or doesn't make sense for other reasons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, but looks like a bit of work here. There is just way too much which is overridden in this test, and it would be nicer to not do that. But also, all this test wants to verify is that the client with built with the appropriate config and that is cleaned up at the end. It would definitely be nicer to have an e2e test though, but I guess the c2p resolver needs a lot of stuff (like running on GCE, DP being enabled etc).

Comment on lines 124 to 125
closeFunc := func(c *clientRefCounted, key string) func() {
return grpcsync.OnceFunc(func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be more complicated than it needs to be? Can you instead refactor the code so the two non-error returns are just a single return instead, and do this there, but without the closure that returns a closure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... Not sure if this is what you meant. But it looks like more repetitive code than before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I meant to combine the two happy paths so the closure only needs to be written once as well. Something like this:

	clientsMu.Lock()
	defer clientsMu.Unlock()

	c := clients[string(contents)]
	if c != nil {
		c.incrRef()
	} else {

		// Maybe pull this into its own function
		bcfg, err := bootstrap.NewConfigFromContentsForTesting(contents)
		if err != nil {
			return nil, nil, fmt.Errorf("xds: error with bootstrap config: %v", err)
		}

		cImpl, err := newWithConfig(bcfg, defaultWatchExpiryTimeout, defaultIdleAuthorityDeleteTimeout)
		if err != nil {
			return nil, nil, err
		}
		c = &clientRefCounted{clientImpl: cImpl, refCount: 1}
		// (to here)

		clients[string(contents)] = c
	}

	return c, grpcsync.OnceFunc(func() {
		clientsMu.Lock()
		defer clientsMu.Unlock()
		if c.decrRef() == 0 {
			c.close()
			delete(clients, string(contents))
		}
	}), nil

Or like this perhaps:

func getOrMakeClient(...) (*clientRefCounted, error) {
	clientsMu.Lock()
	defer clientsMu.Unlock()

	if c := clients[string(contents)]; c != nil {
		return c, nil
	}
	bcfg, err := bootstrap.NewConfigFromContentsForTesting(contents)
	if err != nil {
		return nil, fmt.Errorf("xds: error with bootstrap config: %v", err)
	}

	cImpl, err := newWithConfig(bcfg, defaultWatchExpiryTimeout, defaultIdleAuthorityDeleteTimeout)
	if err != nil {
		return nil, err
	}
	c := &clientRefCounted{clientImpl: cImpl, refCount: 0}
	clients[string(contents)] = c
	return c, nil
}


....
	c, err := getOrMakeClient(...)
	// forward err
	c.incrRef()
	return c, grpcsync.OnceFunc(func() {
		clientsMu.Lock()
		defer clientsMu.Unlock()
		if c.decrRef() == 0 {
			c.close()
			delete(clients, string(contents))
		}
	}), nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

@dfawley dfawley assigned easwars and unassigned dfawley Jan 17, 2023
@easwars easwars assigned dfawley and unassigned easwars Jan 18, 2023
@@ -155,6 +133,28 @@ func NewWithBootstrapContentsForTesting(contents []byte) (XDSClient, func(), err
}), nil
}

func getOrMakeClient(config []byte) (*clientRefCounted, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that this inc's the ref of the client returned but doesn't dec it or provide a function for doing so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

@dfawley dfawley assigned easwars and unassigned dfawley Jan 18, 2023
@easwars easwars merged commit ace8082 into grpc:master Jan 18, 2023
1 check passed
@easwars easwars deleted the xdsclient_close_func branch January 18, 2023 19:32
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[5x one week] Data race in Test/WatchCallAnotherWatch
2 participants