Skip to content

Add xds-server implementation#176

Merged
danehans merged 11 commits intoenvoyproxy:mainfrom
youngnick:add-xds-server
Jul 29, 2022
Merged

Add xds-server implementation#176
danehans merged 11 commits intoenvoyproxy:mainfrom
youngnick:add-xds-server

Conversation

@youngnick
Copy link
Contributor

@youngnick youngnick commented Jul 22, 2022

Fixes #40.

I've made a few related updates here:

  • Added a shim (LogrWrapper) to wrap the logr.Logger in some extra methods so that we can pass it to some go-control-plane constructors.
  • Added internal/xds/cache to handle the snapshot cache that's required to use go-control-plane.
  • Moved the Context struct out of internal/xds/translator into internal/types because I needed it in internal/cache.
  • The Context struct was renamed to ResourceVersionTable, which isn't a great name, but was the best I could come up with. Alternatives considered: CacheVersion, ResourceVersion, other similiar ideas. I think that calling it a Context risks confusion with context.Context.
  • Added a gateway xdstest command that will run an xDS server for you to connect to to validate this works. This runs through three different IRs and updates the xDS caches with a new translated version every 10 seconds, so that you can see changes run through the system. It's deliberately left way too verbose, since we want to get rid of it anyway.

I've got the Delta/incremental xDS stuff to work by having the callbacks manage Envoy nodeIDs and ensure that there's a snapshot per nodeID. This is a bit hacky and should be discussed with go-control-plane to see if there's a better way.

Still todo (I'll log issues for stuff we don't have covered already):

  • Reopen Select Logging Framework #76 and we'll relitigate, keeping in mind that go-control-plane at least expects levelled logs, not V-style ones. logr is probably not a good fit for this project, sadly.
  • Follow up with go-control-plane about the callback hacks for managing per-nodeID caches.
  • Once we have everything wired up correctly, remove gateway xdstest as a command.
  • I'll see if I can figure out a way to test this properly - I think that we could make a Go xDS client, and do an almost-e2e test that IRs produce xDS outputs, but I'll write something up about it first.
  • Longer-term, we'll need to figure out how to handle NACKs, internal/xds/cache/snapshotcache.go is the place where that chain of events will begin.

@youngnick youngnick requested a review from a team as a code owner July 22, 2022 05:59
@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2022

Codecov Report

Merging #176 (fbfb23b) into main (7246e3a) will decrease coverage by 5.82%.
The diff coverage is 6.97%.

@@            Coverage Diff             @@
##             main     #176      +/-   ##
==========================================
- Coverage   67.57%   61.74%   -5.83%     
==========================================
  Files          20       20              
  Lines        1659     1814     +155     
==========================================
- Hits         1121     1120       -1     
- Misses        477      632     +155     
- Partials       61       62       +1     
Impacted Files Coverage Δ
internal/cmd/xdstest.go 3.65% <3.65%> (ø)
internal/xds/translator/translator.go 64.93% <71.42%> (ø)
internal/cmd/root.go 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the input is fixed, this function would be could be a good candidate for a test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but in order to test it, we'll need an xDS client, which I'll need to write. That will be a reasonable amount of work, so I wanted to do it in a followup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's roll with the test cmd and create the xds client as part of #177

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why SoTW callbacks are implemented, we should just log and return imho

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not implementing the full protocol when we can pretty easily makes me uncomfortable - if we do happen to make an error with the Delta callbacks or implementation, the SOTW is a good fallback to have available for operational purposes.

I don't think that not meeting expectations about what variants we support is worth the small saving in development effort.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this tracks rogue streams ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It ensures that the assignment to s.streamIDNodeID[streamID] in the StreamRequest callback will always succeed, since every stream has StreamOpen called first, then StreamRequest per request received. This could be more defensive though, I'll update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if s.lastSnapshot == nil {
if s.lastSnapshot != nil {
err := s.SetSnapshot(context.TODO(), nodeID, s.lastSnapshot); err != nil {
...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I'm using a Go idiom that basically says "The leftmost side of the code should be the happy path, exceptions should be in if blocks." That way we're not indending most of the rest of the function.

Note that this is a different case to the next one - this one is that no snapshot has been generated by the rest of EG yet, the s.GetSnapshot(nodeID) call below checks if there's a snapshot with the current Node.

@youngnick
Copy link
Contributor Author

When reviewing the code after @arkodg's comments, I realized that these callbacks need to be concurrent-safe, so I sprinkled some sync.Mutex on it.

I also realized that requests don't always have to have node ID set, so tidied up the handling there too.

arkodg
arkodg previously approved these changes Jul 26, 2022
Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for implementing delta xDS
hoping the xdstest can be moved to a go test function in a followup PR

@danehans
Copy link
Contributor

hoping the xdstest can be moved to a go test function in a followup PR

@youngnick when you have a moment, please create an issue and xref.

@youngnick
Copy link
Contributor Author

I created #177 to cover setting up automated testing for the xDS server.

LukeShu
LukeShu previously approved these changes Jul 28, 2022
Copy link
Contributor

@LukeShu LukeShu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit, and 2 minor suggestions. But IMO mergeable.

Comment on lines 1 to 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't have this in a separate package or be exported; that only serves to encourage using it instead of the preferred logging abstraction (logr). IMO this should be a private implementation detail of internal/xds (or internal/xds/cache?).

danehans
danehans previously approved these changes Jul 28, 2022
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's roll with the test cmd and create the xds client as part of #177

@youngnick youngnick dismissed stale reviews from danehans, LukeShu, and arkodg via 883be6c July 29, 2022 06:08
youngnick added 11 commits July 29, 2022 06:33
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
Signed-off-by: Nick Young <ynick@vmware.com>
@youngnick youngnick force-pushed the add-xds-server branch 2 times, most recently from a8a8549 to fbfb23b Compare July 29, 2022 06:36
@danehans danehans merged commit e20d7ad into envoyproxy:main Jul 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Initial xDS Server Design and Implementation

5 participants