vtgateproxy v15 retarget #385

Closed
wants to merge 44 commits into from


demmer
Collaborator

@demmer demmer commented May 30, 2024

Description

Merge vtgateproxy into the v15 branch. WIP

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@demmer demmer requested a review from a team as a code owner May 30, 2024 20:22
@github-actions github-actions bot added this to the v15.0.5 milestone May 30, 2024
Review thread on go/vt/servenv/servenv.go (outdated, resolved)
demmer and others added 26 commits July 30, 2024 11:02
This doesn't actually do anything yet except spark up the mysql server and
start listening for connections.
* First draft of discovery

* Fix address list collection

* Fix nebula discovery
* Stash the connection attributes on the conn struct

* Clean up code style around upstream server changes
* Cleanup discovery. Prep for load balancing

* first draft of az affinity

* Restore git deps

* Honor num connections

* fix bugs

* Don't try to register channelz (this should be done elsewhere)

	// Removed: this started a dedicated gRPC server on localhost:8153
	// whose only purpose was to expose the channelz service.
	lis, err := net.Listen("tcp", "localhost:8153")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	service.RegisterChannelzServiceToServer(s)
	go s.Serve(lis)
* Draft: very messy and doesn't compile

* Simplify

* less logging

* simplify more

* Simplify by a lot - much simpler

now pick fewer addresses

* fix

* Account for infinite

* copyright nonsense

* clean up debug logging

* round_robin works!

* use rw mutex to serialize creation

* rework the filtering to make everything parameterized and more explicit

Change all the config so that instead of hard coded constants we set the
various connection attributes, json field names, etc using command line flags.

Then make the pool type and affinity arguments more explicit and less generic.

* no longer needed

* update comments

* only pass through the URL params we need

* affinity is actually optional

---------

Co-authored-by: Michael Demmer <mdemmer@slack-corp.com>
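The squash above moves target filtering from hard-coded constants to command-line configuration, with an optional affinity and a connection count that can be "infinite". A minimal sketch of that shape, assuming a host list decoded into `[]map[string]string`, hypothetical flag names (not the real vtgateproxy flags), and that a non-positive count means "no limit":

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical flags: the JSON field names are configuration rather than
// hard-coded constants. These names are illustrative, not the real flags.
var (
	addressField  = flag.String("address_field", "address", "JSON field holding host:port")
	poolTypeField = flag.String("pool_type_field", "type", "JSON field holding the pool type")
	affinityField = flag.String("affinity_field", "az_id", "JSON field holding the affinity value")
)

// selectTargets keeps hosts whose pool type matches, orders hosts whose
// affinity matches ahead of the rest (an empty affinity disables this), and
// returns at most n addresses; n <= 0 means "no limit" (an assumption).
func selectTargets(hosts []map[string]string, poolType, affinity string, n int) []string {
	var local, remote []string
	for _, h := range hosts {
		if h[*poolTypeField] != poolType {
			continue
		}
		if affinity != "" && h[*affinityField] == affinity {
			local = append(local, h[*addressField])
		} else {
			remote = append(remote, h[*addressField])
		}
	}
	targets := append(local, remote...)
	if n > 0 && len(targets) > n {
		targets = targets[:n]
	}
	return targets
}

func main() {
	flag.Parse()
	hosts := []map[string]string{
		{"address": "10.0.0.1:15999", "type": "primary", "az_id": "az1"},
		{"address": "10.0.0.2:15999", "type": "replica", "az_id": "az1"},
		{"address": "10.0.0.3:15999", "type": "primary", "az_id": "az2"},
	}
	fmt.Println(selectTargets(hosts, "primary", "az2", 0))
	// [10.0.0.3:15999 10.0.0.1:15999]
}
```

Because every name comes from a flag, the same code can filter differently shaped host files without recompiling.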
* Fix locking in getConnection

Signed-off-by: Henry Robinson <hrobinson@slack-corp.com>

* Comment

Signed-off-by: Henry Robinson <hrobinson@slack-corp.com>

* Undo bash change

Signed-off-by: Henry Robinson <hrobinson@slack-corp.com>

---------

Signed-off-by: Henry Robinson <hrobinson@slack-corp.com>
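"Fix locking in getConnection" and the earlier "use rw mutex to serialize creation" point at a common pattern: look up under a read lock, then take the write lock and check again before creating. A minimal sketch, assuming a hypothetical cache keyed by target string; it is not the actual vtgateproxy code:

```go
package main

import (
	"fmt"
	"sync"
)

// conn stands in for the real connection type (hypothetical).
type conn struct{ target string }

// connCache is a hypothetical cache of connections keyed by target.
type connCache struct {
	mu    sync.RWMutex
	conns map[string]*conn
	dials int // how many connections were created; guarded by mu
}

// getConnection returns the cached connection for target, creating it at
// most once even when called concurrently.
func (c *connCache) getConnection(target string) *conn {
	// Fast path: a shared read lock is enough for a lookup.
	c.mu.RLock()
	cc, ok := c.conns[target]
	c.mu.RUnlock()
	if ok {
		return cc
	}

	// Slow path: take the write lock and check again, because another
	// goroutine may have created the connection between RUnlock and Lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if existing, ok := c.conns[target]; ok {
		return existing
	}
	cc = &conn{target: target}
	c.dials++
	c.conns[target] = cc
	return cc
}

func main() {
	c := &connCache{conns: map[string]*conn{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.getConnection("vtgate-1:15999")
		}()
	}
	wg.Wait()
	fmt.Println(c.dials) // 1
}
```

Without the second check under the write lock, two goroutines that both miss on the read path would each create a connection.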
* Make -grpc_prometheus work when not using a grpc server

Signed-off-by: Henry Robinson <hrobinson@slack-corp.com>
Signed-off-by: Esme Lamb <dlamb@slack-corp.com>
* refactor a bit more to consolidate the command line flags

* rework again to have a single parser and move around all the logic

This way we only have a single entity watching the file and dispatching out to
the resolvers when things change rather than a bunch of tickers watching the
file. Also cleaned up the code a bunch.

* redo the shuffle to be a single pass

* actually use the builder's rand, not the global one

* add some metrics

* only stat once per second

* split out counter for unknown affinity
* reinitialise targets when parsing host list

* remove metrics and logging changes
* move organization of target hosts to parse time

* rework metrics and logging of parse errors

* add discovery bits to debug status page

* reset parseErr in the right place

* add sync and change debug page to do the shuffle

* unrelated but just move some code around

---------

Co-authored-by: Michael Demmer <mdemmer@slack-corp.com>
henryr and others added 18 commits July 30, 2024 11:05
Signed-off-by: Henry Robinson <hrobinson@slack-corp.com>
Signed-off-by: Esme Lamb <dlamb@slack-corp.com>
* shuffle targets during parse phase

* don't use global rand
* add firstready balancer

* fail if an invalid balancer is picked

* maintain the currentConn even if another one becomes ready

* remove unnecessary globals

* normalize the names
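The firstready balancer's stated behavior is to keep the current connection even if another one becomes ready. A framework-free model of that rule, assuming simplified connection states and hypothetical names; it does not use the real gRPC balancer API:

```go
package main

import "fmt"

// connState is a simplified stand-in for gRPC connectivity states.
type connState int

const (
	idle connState = iota
	connecting
	ready
	transientFailure
)

// firstReadyPicker is a hypothetical model of a "first ready" policy: it
// adopts the first connection that reports ready and keeps using it while it
// stays ready, even if other connections become ready later.
type firstReadyPicker struct {
	current string
	states  map[string]connState
}

func newFirstReadyPicker() *firstReadyPicker {
	return &firstReadyPicker{states: map[string]connState{}}
}

// updateState records a connection's new state and updates the current pick.
func (p *firstReadyPicker) updateState(addr string, s connState) {
	p.states[addr] = s
	if p.current != "" && p.states[p.current] == ready {
		return // sticky: keep the current connection while it is ready
	}
	p.current = ""
	if s == ready {
		p.current = addr
		return
	}
	// The current connection is gone; fall back to any other ready one.
	for a, st := range p.states {
		if st == ready {
			p.current = a
			return
		}
	}
}

// pick returns the current connection, or false if none is ready.
func (p *firstReadyPicker) pick() (string, bool) {
	return p.current, p.current != ""
}

func main() {
	p := newFirstReadyPicker()
	p.updateState("a:15999", connecting)
	p.updateState("b:15999", ready)
	p.updateState("a:15999", ready)
	fmt.Println(p.pick()) // b:15999 true
}
```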
* return a custom error to fail fast

* fatal if the resolver cannot initialize

Signed-off-by: Esme Lamb <dlamb@slack-corp.com>
Signed-off-by: Esme Lamb <dlamb@slack-corp.com>
Signed-off-by: Esme Lamb <dlamb@slack-corp.com>
* Add end to end tests for vtgateproxy

* Fix linter warnings and unhandled errors

---------

Signed-off-by: Riley Laine <rlaine@slack-corp.com>
Signed-off-by: Esme Lamb <dlamb@slack-corp.com>
Change the discovery layer to pass in the pool type as part of the Address
Attributes, and use that to maintain a map of current connections for each pool
type.
Signed-off-by: Riley Laine <rlaine@slack-corp.com>
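Carrying the pool type on each address makes it possible to keep one current connection per pool type. A minimal sketch, assuming a simplified `address` struct in place of the real resolver address attributes and hypothetical method names:

```go
package main

import "fmt"

// address is a simplified stand-in for a resolver address whose attributes
// carry the pool type (hypothetical field names).
type address struct {
	Addr     string
	PoolType string
}

// perPoolPicker keeps one current connection per pool type, keyed by the
// pool type carried on each address.
type perPoolPicker struct {
	current map[string]string // pool type -> address
}

// markReady records a ready address unless its pool type already has one.
func (p *perPoolPicker) markReady(a address) {
	if p.current == nil {
		p.current = map[string]string{}
	}
	if _, ok := p.current[a.PoolType]; !ok {
		p.current[a.PoolType] = a.Addr
	}
}

// pick returns the current connection for poolType, if any.
func (p *perPoolPicker) pick(poolType string) (string, bool) {
	addr, ok := p.current[poolType]
	return addr, ok
}

func main() {
	var p perPoolPicker
	p.markReady(address{"10.0.0.1:15999", "primary"})
	p.markReady(address{"10.0.0.2:15999", "replica"})
	fmt.Println(p.pick("replica")) // 10.0.0.2:15999 true
}
```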
Signed-off-by: Riley Laine <rlaine@slack-corp.com>
Signed-off-by: Riley Laine <rlaine@slack-corp.com>
When a connection is closed or reset before it is properly established, the
existing ::getSession will log a spurious error since the connection has no
attributes to extract for the pool type.

Fix this by refactoring so that when closing a connection we only act on the
session object if one already existed, i.e. the conn was used before.
Signed-off-by: Riley Laine <rlaine@slack-corp.com>
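The fix above makes connection close act only on a session that already exists. A minimal sketch of that guard, assuming a hypothetical `proxy` type with sessions keyed by connection ID and created lazily on first use:

```go
package main

import (
	"fmt"
	"sync"
)

// session is a stand-in for per-connection proxy state (hypothetical).
type session struct{ poolType string }

type proxy struct {
	mu       sync.Mutex
	sessions map[uint32]*session // keyed by connection ID
}

// getSession lazily creates a session the first time a connection is used,
// at which point the pool type is known.
func (p *proxy) getSession(connID uint32, poolType string) *session {
	p.mu.Lock()
	defer p.mu.Unlock()
	s, ok := p.sessions[connID]
	if !ok {
		s = &session{poolType: poolType}
		p.sessions[connID] = s
	}
	return s
}

// connectionClosed acts only on a session that already exists. A connection
// closed or reset before it was ever used has no session, so there is
// nothing to clean up and nothing to log. It reports whether it cleaned up.
func (p *proxy) connectionClosed(connID uint32) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.sessions[connID]; !ok {
		return false
	}
	delete(p.sessions, connID)
	return true
}

func main() {
	p := &proxy{sessions: map[uint32]*session{}}
	fmt.Println(p.connectionClosed(1)) // false: never used
	p.getSession(2, "primary")
	fmt.Println(p.connectionClosed(2)) // true
}
```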
Signed-off-by: Esme Lamb <dlamb@slack-corp.com>
* Add vtgateproxy tests to CI
* Refactor tests to make them less flappy
---------
Signed-off-by: Riley Laine <rlaine@slack-corp.com>
Signed-off-by: Esme Lamb <dlamb@slack-corp.com>
Signed-off-by: Henry Robinson <hrobinson@slack-corp.com>
When there are many more targets than connections in the pool and the list of
hosts changes, the builder may pick an entirely different set of hosts from the
previously selected ones, none of which will have established subconns.

Instead of giving this new list to the picker immediately, first combine it
with the list of previously selected hosts, so that the new subconns have time
to warm up while the previously selected hosts are still in the list.
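Combining the new selection with the previous one is essentially an order-preserving union. A minimal sketch, assuming a hypothetical `mergeWithPrevious` helper that places the new targets first (the ordering is an assumption) and leaves dropping the old hosts on a later update to the caller:

```go
package main

import "fmt"

// mergeWithPrevious returns the newly selected targets followed by any
// previously selected targets that are not in the new set, without
// duplicates. Keeping the old targets for one more round gives the new
// subconns time to warm up.
func mergeWithPrevious(prev, next []string) []string {
	seen := make(map[string]bool, len(prev)+len(next))
	out := make([]string, 0, len(prev)+len(next))
	for _, list := range [][]string{next, prev} {
		for _, t := range list {
			if !seen[t] {
				seen[t] = true
				out = append(out, t)
			}
		}
	}
	return out
}

func main() {
	prev := []string{"a:1", "b:1"}
	next := []string{"c:1", "b:1"}
	fmt.Println(mergeWithPrevious(prev, next)) // [c:1 b:1 a:1]
}
```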

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale label Aug 29, 2024

github-actions bot commented Sep 5, 2024

This PR was closed because it has been stale for 7 days with no activity.

@github-actions github-actions bot closed this Sep 5, 2024
5 participants