
balancer: create many tcp connections use the same endpoint #11371

Closed
cfc4n opened this issue Nov 19, 2019 · 8 comments

Comments

@cfc4n
Contributor

cfc4n commented Nov 19, 2019

What version of etcd are you using?

3.3.17

What version of Go are you using (go version)?

go version go1.13.1 linux/amd64

What operating system (Linux, Windows, …) and version?

CentOS release 6.5 (on both machines)

What did you do?

Client: 192.168.1.199
Etcd Node A: 192.168.1.101

What did you see instead?

Step 1: run the code below.

package main

import (
	"context"
	"fmt"
	"github.com/coreos/etcd/clientv3"
	"google.golang.org/grpc/grpclog"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

const ETCD_CONNECT_TIMEOUT = 5 * time.Second

func main() {
	log.Println("try to connect etcd cluster :%s", time.Now())

	// note: the same endpoint URL is listed three times
	Etcd_dsn := []string{
		"http://192.168.1.101:2379",
		"http://192.168.1.101:2379",
		"http://192.168.1.101:2379",
	}

	logger := grpclog.NewLoggerV2WithVerbosity(os.Stderr, os.Stderr, os.Stderr, 1)
	clientv3.SetLogger(logger)


	cfg := clientv3.Config{
		Endpoints:   Etcd_dsn,
		DialTimeout: ETCD_CONNECT_TIMEOUT,
	}

	client, err := clientv3.New(cfg)
	if err != nil {
		panic(err)
	}

	log.Println("connected etcd cluster")

	log.Println("get etcd key /sec/hids/")
	ctx, ctxCancelFun := context.WithTimeout(context.Background(), time.Second*5)
	defer ctxCancelFun()
	_, err = client.Get(ctx, "/sec/hids/", clientv3.WithCountOnly(), clientv3.WithPrefix())
	if err != nil {
		panic(err)
	}
	log.Println("foreach result")
	log.Println("start goroutine")
	// set iptables on etcd node A...
	// iptables -A INPUT -p tcp -s 192.168.1.199 -j DROP
	go func() {
		<- time.After(time.Second * 10)
		log.Println("start to get key ...")
		ctx, ctxCancelFun := context.WithTimeout(context.Background(), time.Second*5)
		defer ctxCancelFun()
		_, err := client.Get(ctx, "/cc/etcd-dns")

		if err != nil {
			log.Println(err)
		}
		log.Println("start to get key ... end")
	}()

	signaler()
	fmt.Println("exit")
}


func signaler() {
	ch := make(chan os.Signal, 2)
	signal.Notify(ch, syscall.SIGHUP, syscall.SIGTERM, syscall.SIGINT, syscall.SIGPROF)
	for {
		switch <-ch {
		case syscall.SIGHUP:
			os.Exit(0)
		case syscall.SIGTERM:
			os.Exit(0)
		case syscall.SIGINT:
			os.Exit(0)
		}
	}

}

Step 2: set the iptables rule

Run the iptables command below on etcd node A after the "start goroutine" log line appears, to simulate a CPU-busy node that cannot accept new TCP connections.

iptables -A INPUT -p tcp -s 192.168.1.199 -j DROP

Step 3: check the number of TCP connections on the client

netstat -antp|grep 2379
tcp        0      0 192.168.1.199:54323       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54324       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54325       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54326       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54327       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54328       192.168.1.101:2379      SYN_SENT
tcp        0      0 192.168.1.199:54329       192.168.1.101:2379      SYN_SENT

There are 6 to 7 TCP connections stuck in SYN_SENT state.
In fact, the client creates a TCP connection to every configured endpoint.

The etcd client creates many TCP connections to the same endpoint when the client balancer is at work.
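
(A minimal side note: if an application wants at most one connection per address, it can keep the Endpoints list free of duplicates before calling clientv3.New. The dedupEndpoints helper below is a hypothetical sketch, not part of clientv3, and it does not change how the balancer handles distinct endpoints.)

// hypothetical helper: drop duplicate endpoint URLs so the balancer is not
// asked to dial the same address more than once on the application's behalf
func dedupEndpoints(endpoints []string) []string {
	seen := make(map[string]bool, len(endpoints))
	out := make([]string, 0, len(endpoints))
	for _, ep := range endpoints {
		if !seen[ep] {
			seen[ep] = true
			out = append(out, ep)
		}
	}
	return out
}

// usage with the reproduction code above:
// cfg := clientv3.Config{Endpoints: dedupEndpoints(Etcd_dsn), DialTimeout: ETCD_CONNECT_TIMEOUT}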

What did you expect to see?

It should create only one TCP connection, or create another TCP socket only when the existing connection fails.

More detail

Case study

In my case, this amounts to a SYN-flood attack.
etcd cluster: 7 nodes
etcd clients: 200K+
DB size: 6-8 GB

  1. Each client creates a new TCP connection to the next endpoint when the network fluctuates.
  2. The client connection has been disconnected, triggering .
  3. The etcd process crashes when it sends packets to a disconnected client, triggering 3.3.7 panic: send on closed channel #9956 .
  4. The supervisor pulls the etcd process up again.
  5. The etcd node rejoins the cluster, the cluster leader sends a snapshot to the new node, and this blocks heartbeats from all nodes (fixed by the learner feature).
  6. The other etcd nodes elect a new leader.
  7. The next endpoint receives a burst of TCP requests, its CPU load spikes, and it becomes unable to accept TCP SYNs.
  8. The clients' connections sit in SYN_SENT state.
  9. The gRPC balancer mechanism calls the tryAllAddrs function, retrying all endpoints with a backoff mechanism.
  10. The etcd client sends a signal to the gRPC balancer every hb.healthCheckTimeout (minimum 3s, maximum set by the client's connect timeout) in the healthBalancer.updateUnhealthy function, triggering tryAllAddrs again and reconnecting to all endpoints (a rough sketch of this loop follows the list).
  11. Together, all of the clients effectively mount a DDoS attack on the etcd cluster.
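
A rough sketch of the feedback loop in steps 9-10 (hypothetical code, not etcd or gRPC source; only the tryAllAddrs name and the 3s minimum interval are taken from the steps above). On every health-check tick, each client re-dials every endpoint, so the SYN rate hitting the cluster grows with clients x endpoints:

package main

import (
	"log"
	"net"
	"time"
)

// minimum interval mentioned in step 10
const healthCheckTimeout = 3 * time.Second

// tryAllAddrs mimics step 9: re-dial every endpoint on each tick. While the
// server cannot accept, each attempt lingers in SYN_SENT, as in the netstat
// output above.
func tryAllAddrs(endpoints []string) {
	for _, ep := range endpoints {
		conn, err := net.DialTimeout("tcp", ep, time.Second)
		if err != nil {
			log.Printf("dial %s failed: %v", ep, err)
			continue
		}
		conn.Close()
	}
}

func main() {
	endpoints := []string{"192.168.1.101:2379"} // address from the report
	for range time.Tick(healthCheckTimeout) {
		tryAllAddrs(endpoints)
	}
}

With the numbers above, roughly 200K clients / 3s is about 66K dial attempts per second per endpoint, even though each individual client behaves "reasonably".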

bugs

@cfc4n
Contributor Author

cfc4n commented Nov 26, 2019

I received a reply from the grpc-go community:

It's a feature for the users to create multiple TCP connections to same endpoint, and the users have the full control. I don't think this is a problem in gRPC.

Closing. Please reply if you have more updates.

And I agree that the etcd client needs full control over how it creates TCP connections.

What do you think?
/cc @xiang90 @gyuho

@xiang90
Contributor

xiang90 commented Nov 28, 2019

It should create only one TCP connection, or create another TCP socket only when the existing connection fails.

This is the behavior of the old etcd gRPC client. I think the team made the decision to pre-create TCP connections to every given endpoint, and we feel it is fine since the number of etcd servers should be small.

What is your concern with creating a TCP connection to every given etcd server endpoint?

/cc @gyuho @jpbetz

@stale

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this as completed Apr 27, 2020
@cfc4n
Contributor Author

cfc4n commented Jun 12, 2020

Please reopen and assign to @cfc4n.

ref. #9949

@xiang90 xiang90 reopened this Jun 12, 2020
@stale stale bot removed the stale label Jun 12, 2020
@gyuho
Contributor

gyuho commented Jun 12, 2020

This is the behavior of the old etcd gRPC client. I think the team made the decision to pre-create TCP connections to every given endpoint, and we feel it is fine since the number of etcd servers should be small.

@xiang90 is right. The main motivation was to "simplify" the previous implementation. The old balancer used to keep only one connection, but the code became too complicated and error-prone.

@cfc4n
Contributor Author

cfc4n commented Aug 21, 2020

ping...ping...

@cfc4n
Contributor Author

cfc4n commented Sep 24, 2020

I found a similar bug, grpc/grpc-go#3667, fixed by grpc/grpc-go#2985.
I'll continue to follow up on this issue.

@stale

stale bot commented Dec 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 23, 2020
@stale stale bot closed this as completed Jan 13, 2021