Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DialContext will be blocked if no address exists when use a name target #976

Closed
ruinanchen opened this issue Nov 8, 2016 · 5 comments
Closed
Assignees

Comments

@ruinanchen
Copy link

ruinanchen commented Nov 8, 2016

hi, my question as below:

  1. I use grpc balance, so I init a grpcclient use function grpc.Dial with parameter service name:
    conn, err = grpc.Dial(servicename,...,grpc.WithBalancer(balancer))

  2. the function grpc.Dial use DialContext, and my master routine will block here:
    ch := cc.dopts.balancer.Notify()
    if ch == nil {
    // There is no name resolver installed.
    addrs = append(addrs, Address{Addr: target})
    } else {
    addrs, ok = <-ch
    if !ok || len(addrs) == 0 {
    waitC <- errNoAddr
    return
    }
    }

Because of no address available, there is not element([]Address) input into the channel.So, the dial will return "grpc: timed out when dialing".

  1. I can change it, input a empty []Address when actually no address in initialization phase, but it satisfys:
    if !ok || len(addrs) == 0 {

so dial will also return an error,and it is "grpc: there is no address available to dial" .

In our system, we have much processes implement different serivce. Unfortunately, these processes is a situation of mutual dependence. So I can't init any first process because i make it exit when dial fail, saddly this will lead to i can't start our system.

Actually I just hope dial return a grpc.ClientConn even if there is no address available.I think this can solve my problem.

@iamqizhao
Copy link
Contributor

This behavior is intentional. As the function name advises, "DialContext" should start the dialing procedure to a target. If it fails to do that (as in your case), it should return an error.

I think blocking is the right behavior. If you do not want it to block your other work you can put Dial into a separate goroutine -- you can start all the processes with a blocking goroutine doing dialing. I am not clear what the architecture design is in your project. But if this is a hard requirement for your design, I would question the design ...

@ruinanchen
Copy link
Author

ruinanchen commented Nov 10, 2016

Very thank you for your suggestion. we solve the problem, but i feel a little strange.As below:
1.we define an unary rpc client interface variable in package rpclient:

var BizClient pb.MessageServiceClient

the example we call the client:

_, err := rpcclient.BizClient.OfflineDeliver(ctx, reqPacket)

2.We don't want to judge whether BizClient is nil or not before called, So we guarantee that BizClient is not a nil after function init:

func Init(etcdOpts *conf.EtcdOpts, rpcOpts *conf.RpcClientOpts) {
    resolver, err := loadbalancer.NewEtcdResolver(etcdOpts.RadarAddrs, "", "")
    if err != nil {
        logger.Fatalf(nil, "init etcd service resolver failed: %v", err)
        return
    }

    timeout = time.Second * time.Duration(rpcOpts.Timeout)
    newClientConn := func(service string, balancer grpc.Balancer) (conn *grpc.ClientConn, err error) {
        conn, err = grpc.Dial(
            service,
            grpc.WithInsecure(),
            grpc.WithBlock(),
            grpc.WithTimeout(timeout),
            grpc.WithUserAgent(userAgent),
            grpc.WithBalancer(balancer),
            grpc.WithUnaryInterceptor(interceptor.NewUnaryClientInterceptor(timeout, true)),
        )

        if err != nil && conn != nil {
            conn.Close()
            conn = nil
        }
        return
    }
    bizConn, err := newClientConn(rpcOpts.BizServ,
        loadbalancer.NewTraceAwareBalancer(grpc.RoundRobin(resolver)))
    if err != nil {
        logger.Errorf(nil, "failed to try connect bizServ, service name=%v", rpcOpts.BizServ)
        bizConnFake, err := grpc.Dial("unavailable", grpc.WithInsecure())//no block mode, so return a clientConn even if it unavailable
        if err != nil {
            logger.Fatalf(nil, "bizConn init, dial unavailable failed: %v", err)
        }
        BizClient = pb.NewMessageServiceClient(bizConnFake)//guarantee it not nil
        logger.Infof(nil, "create an unavailable bizClient")
        go func() {
            for {
                bizConn, err = newClientConn(rpcOpts.BizServ,
                    loadbalancer.NewTraceAwareBalancer(grpc.RoundRobin(resolver)))
                if err != nil {
                    continue
                }
                bizConnFake.Close()//close older
                BizClient = pb.NewMessageServiceClient(bizConn)
                logger.Infof(nil, "create bizClient, service name=%v", rpcOpts.BizServ)
                return
            }
        }()
    } else {
        BizClient = pb.NewMessageServiceClient(bizConn)
        logger.Infof(nil, "create bizClient, service name=%v", rpcOpts.BizServ)
    }

3.I think dial can return a clientConn when balance strategy even if not address available, as an unblock ordinay(don't use balance) dial do. Actually, clientConn with no address allowed during execute, for example: Firstly an address avaible which make dial success, soonly the address unregistered from balancer's watcher. Why not allowd in init stage.
So i think dial with balancer also should has unblock mode, the dial's exception could be found until the rpc client called. If it did, we can simplify our code(^-^).

@ruinanchen ruinanchen changed the title DialContext will blocked if no address exists when use a name target DialContext will be blocked if no address exists when use a name target Nov 10, 2016
@tomwilkie
Copy link
Contributor

I too am a bit surprised by this - would you at least consider an DialOption such that we can have a way of creating a client that is guaranteed to be non-blocking?

@alecthomas
Copy link
Contributor

alecthomas commented Mar 5, 2017

I don't think this decision is consistent with other dialing behaviour of grpc-go:

  1. When using a custom resolver, the WithBlock() option does not do what it describes.
  2. It is the opposite of the default DNS "resolver" behaviour (see below).
  3. It makes the default DNS resolver behaviour unique and unreplicable.

The default "resolver" behaves in such a way that If you pass a host:port that is not resolvable, it will return successfully, but continually fail until the name starts to resolve, except if WithBlock() is provided. This seems ideal and is not uncommon in environments with dynamic service DNS or, as others in this issue have mentioned, dynamic resolvers..

For a custom resolver, if I pass grpc.Dial("FooService", grpc.WithResolver(resolver)) and the hosts backing FooService are not ready yet, this will error. This is a) different to how the default works b) makes it impossible to replicate the builtin behaviour via a custom resolver.

Default builtin "resolver":

package main

import (
	"log"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/grpclog"
)

func main() {
	l := log.New(os.Stdout, "grpc: ", 0)
	grpclog.SetLogger(l)
	_, err := grpc.Dial("foobar.asdfasdf:1234", grpc.WithInsecure())
	if err != nil {
		panic(err)
	}
	l.Println("dialled")
	select {}
}

Custom resolver:

package main

import (
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/naming"
)

type resolver struct{}

func (r *resolver) Resolve(target string) (naming.Watcher, error) {
	return &watcher{target}, nil
}

type watcher struct {
	address string
}

func (w *watcher) Next() ([]*naming.Update, error) {
	host, err := net.ResolveTCPAddr("tcp", w.address)
	if err != nil {
		return nil, nil
	}
	return []*naming.Update{{Op: naming.Add, Addr: host.String()}}, nil
}

func (w *watcher) Close() {
}

func main() {
	r := &resolver{}
	_, err := grpc.Dial("foobar.asdfasdf:1234", grpc.WithBalancer(grpc.RoundRobin(r)), grpc.WithInsecure())
	if err != nil {
		panic(err)
	}
}

It seems like it should be consistent one way or the other.

@alecthomas
Copy link
Contributor

alecthomas commented Mar 5, 2017

In fact, I would suggest that this behaviour is contrary to the connectivity semantics described for the CONNECTING state (emphasis mine):

CONNECTING: The channel is trying to establish a connection and is waiting to make progress on one of the steps involved in name resolution, TCP connection establishment or TLS handshake. This may be used as the initial state for channels upon creation.

And wait-for-ready:

RPCs SHOULD NOT fail as a result of the channel being in other states (CONNECTING, READY, or IDLE).

dfawley added a commit to dfawley/grpc-go that referenced this issue Mar 8, 2017
…rvers

This modifies the WithBlock behavior somewhat to block until there is at least
one valid connection.  Previously, each connection would be made serially until
all had completed successfully, with any errors returned to the caller.  Errors
are now only returned due to connecting to a backend if a balancer is not used,
or if there is an error starting the balancer itself.

Fixes grpc#976
@dfawley dfawley self-assigned this Mar 14, 2017
dfawley added a commit that referenced this issue Mar 21, 2017
…rvers (#1112)

This modifies the WithBlock behavior somewhat to block until there is at least
one valid connection.  Previously, each connection would be made serially until
all had completed successfully, with any errors returned to the caller.  Errors
are now only returned due to connecting to a backend if a balancer is not used,
or if there is an error starting the balancer itself.

Fixes #976
@lock lock bot locked as resolved and limited conversation to collaborators Sep 26, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants