
Datastore: Calls to Put hang when run inside Kubernetes cluster, fine out of cluster. #928

Closed
jeffd opened this issue Mar 9, 2018 · 10 comments
Labels
api: datastore Issues related to the Datastore API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.


@jeffd

jeffd commented Mar 9, 2018

I've been having an issue that I cannot figure out or even properly debug. When developing locally with "cloud.google.com/go/datastore" on Kubernetes using an in-cluster configuration, I can write to Cloud Datastore just fine. However, when I deploy it on my cluster, my program hangs and never returns once .Put(...) is called on my datastore client. I don't get any output whatsoever. I've been able to get rudimentary gdb access to a running process on my cluster, but I have not been able to figure out what is going wrong or where the code is getting stuck.

I have followed the directions here.

I have tried creating the client in these two ways: loading my service account key file explicitly, and relying on default credentials.

// Option 1: load the service account key file explicitly.
client, err := datastore.NewClient(ctx, projectID, option.WithServiceAccountFile("/var/secrets/google/key.json"))
if err != nil {
	log.Fatalf("Failed to create client: %v", err)
}

// Option 2 (alternative to option 1): rely on default credentials.
client, err := datastore.NewClient(ctx, projectID)
if err != nil {
	log.Fatalf("Failed to create client: %v", err)
}

Both successfully create a client.

I also tried moving to a new node pool with broader OAuth scopes:

gcloud --project MY_PROJECT container node-pools create main-pool \
   --cluster my-cluster-us-cntrl1a \
   --zone us-central1-a \
   --enable-autoupgrade \
   --num-nodes 1 --machine-type n1-standard-2 \
   --enable-autoscaling --min-nodes=1 --max-nodes=6 \
   --scopes cloud-platform,datastore

The permissions on my cluster look like this:
[screenshot of cluster permissions, taken 2018-03-08 at 6:54:28 PM]

My service account has the Cloud Datastore User role, plus Owner for good measure.

What else should I check when running inside a Kubernetes cluster? Is there a good way to get logs showing what's happening?

@jba jba self-assigned this Mar 9, 2018
@jba jba added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. api: datastore Issues related to the Datastore API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Mar 9, 2018
@jba
Contributor

jba commented Mar 9, 2018

I tried to replicate this. I wrote the following program:

package main

import (
    "context"
    "flag"
    "fmt"
    "log"

    "cloud.google.com/go/datastore"
)

var projectID = flag.String("project", "", "project ID")

type Task struct {
    Description string
}

func main() {
    flag.Parse()
    ctx := context.Background()
    client, err := datastore.NewClient(ctx, *projectID)
    if err != nil {
        log.Fatal(err)
    }
    log.Print("created client")
    key := datastore.NameKey("Task", "milk", nil)
    task := &Task{Description: "Buy milk"}
    if _, err := client.Put(ctx, key, task); err != nil {
        log.Fatalf("Put: %v", err)
    }
    log.Print("Put succeeded")
    var gtask Task
    if err := client.Get(ctx, key, &gtask); err != nil {
        log.Fatalf("Get: %v", err)
    }
    fmt.Println(gtask)
}

I built a Docker image for it:

FROM gcr.io/distroless/base
ENV GRPC_GO_LOG_SEVERITY_LEVEL INFO
ADD put-get .
ENTRYPOINT ["./put-get"]

(Note the environment variable enabling gRPC logging.)

I tagged and pushed it:

docker build -t datastore-put-get .
docker tag datastore-put-get gcr.io/MY_PROJECT/datastore-put-get
gcloud docker -- push gcr.io/MY_PROJECT/datastore-put-get

I wrote a pod yaml:

apiVersion: v1
kind: Pod
metadata:
  name: datastore-put-get
spec:
  containers:
  - name: datastore-put-get
    image: gcr.io/MY_PROJECT/datastore-put-get
    args: [-project, MY_PROJECT]

Then I ran it on my GKE cluster and grabbed the output:

$ kubectl create -f put-get.yaml 
pod "datastore-put-get" created
$ kubectl logs datastore-put-get
INFO: 2018/03/09 18:57:18 dialing to target with scheme: ""
2018/03/09 18:57:18 created client
INFO: 2018/03/09 18:57:18 ccResolverWrapper: sending new addresses to cc: [{datastore.googleapis.com:443 0  <nil>}]
INFO: 2018/03/09 18:57:18 ClientConn switching balancer to "pick_first"
INFO: 2018/03/09 18:57:18 pickfirstBalancer: HandleSubConnStateChange: 0xc420180630, CONNECTING
INFO: 2018/03/09 18:57:18 pickfirstBalancer: HandleSubConnStateChange: 0xc420180630, READY
2018/03/09 18:57:19 Put succeeded
{Buy milk}

Could you duplicate that and see if it works? If it does, how do your real code and commands differ from these?

@jeffd
Author

jeffd commented Mar 9, 2018

Thanks for looking into this!

I ran that code in my setup with the GRPC_GO_LOG_SEVERITY_LEVEL environment variable set.

Here's what I got in the logs (newest entries first):

INFO: 2018/03/09 21:38:33 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:33 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:33 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:26 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:26 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:26 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:22 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:22 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:22 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:19 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:18 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:18 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:18 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:17 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, TRANSIENT_FAILURE
WARNING: 2018/03/09 21:38:17 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: failed to load system roots and no roots provided". Reconnecting...
INFO: 2018/03/09 21:38:17 pickfirstBalancer: HandleSubConnStateChange: 0xc4201d4630, CONNECTING
INFO: 2018/03/09 21:38:17 ClientConn switching balancer to "pick_first"
INFO: 2018/03/09 21:38:17 ccResolverWrapper: sending new addresses to cc: [{datastore.googleapis.com:443 0 <nil>}]
2018/03/09 21:38:17 created client
INFO: 2018/03/09 21:38:17 dialing to target with scheme: ""

@jba
Contributor

jba commented Mar 9, 2018

Could you be using Alpine? See #791.

@jeffd
Author

jeffd commented Mar 9, 2018

Ah! Yes, I am running it on Alpine.

Adding RUN apk --no-cache --update add ca-certificates to my Dockerfile did the trick!

@jba jba closed this as completed Mar 9, 2018
@kaizenlabs

kaizenlabs commented Nov 14, 2018

> Ah! Yes I am running it on alpine.
>
> Adding RUN apk --no-cache --update add ca-certificates to my dockerfile did the trick!

Jeffd, if we have a multi-stage build, would you run that 'apk add ca-certificates' in the builder stage, the final stage, or both? We use golang:alpine for the builder stage and alpine:latest for the final stage, which copies the binary from the builder.

We also build the Go binary with 'RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo', and the gRPC calls still hang no matter what we try.

Below are the gRPC logs:

INFO: 2018/11/14 17:09:29 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, CONNECTING
WARNING: 2018/11/14 17:09:29 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
INFO: 2018/11/14 17:09:29 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, TRANSIENT_FAILURE
INFO: 2018/11/14 17:09:30 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, CONNECTING
WARNING: 2018/11/14 17:09:30 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
INFO: 2018/11/14 17:09:30 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, TRANSIENT_FAILURE
INFO: 2018/11/14 17:09:32 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, CONNECTING
WARNING: 2018/11/14 17:09:32 grpc: addrConn.createTransport failed to connect to {datastore.googleapis.com:443 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate signed by unknown authority". Reconnecting...
INFO: 2018/11/14 17:09:32 pickfirstBalancer: HandleSubConnStateChange: 0xc0002244b0, TRANSIENT_FAILURE

@twiggg

twiggg commented Dec 24, 2018

On Google Cloud Platform, I start an instance (a VM with Container-Optimized OS) from a Docker image built from scratch that contains only my compiled Go binary. Setting GRPC_GO_LOG_SEVERITY_LEVEL to INFO shows that here, too, the underlying gRPC call for datastoreClient.Put() fails silently, due to an x509 unknown certificate authority error.

My Docker image is based on scratch; it contains only the binary and exposes ports 80/443. Since it is based on scratch rather than Alpine, I cannot use the magic
"RUN apk --no-cache --update add ca-certificates"
unless I switch to a multi-stage build.

Is there any other way to include ca-certificates?

...

I'm migrating from App Engine, where I didn't use client.Put() but an older package in which I just called datastore.Put(ctx, key, entity), so I never had to care about TLS, gRPC, or certificates.

Does anybody have an idea?
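If you can get a PEM-encoded CA bundle into the image without a package manager (for example, a ca-certificates.crt file copied in from the build context), two approaches should work: point the SSL_CERT_FILE environment variable at it, which recent Go versions' crypto/x509 honors on Linux, or load it explicitly into an x509.CertPool. This is a standard-library-only sketch of the second approach. The function name rootsFromPEM and the bundle path are hypothetical, and wiring the resulting *tls.Config into the Datastore client is left out because this thread does not confirm how to do that.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// rootsFromPEM builds a certificate pool from PEM-encoded CA certificates.
// It fails if the data contains no parseable certificate, so a missing or
// empty bundle is caught at startup instead of at the first TLS handshake.
func rootsFromPEM(pemData []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemData) {
		return nil, errors.New("no PEM certificates found")
	}
	return pool, nil
}

func main() {
	// Hypothetical path: wherever the bundle was copied into the image.
	const bundlePath = "/etc/ssl/certs/ca-certificates.crt"
	data, err := os.ReadFile(bundlePath)
	if err != nil {
		fmt.Println("read bundle:", err)
		return
	}
	pool, err := rootsFromPEM(data)
	if err != nil {
		fmt.Println("parse bundle:", err)
		return
	}
	// A TLS config that trusts only the roots from the bundle.
	cfg := &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}
	fmt.Println("loaded roots; config ready:", cfg.RootCAs != nil)
}
```

Failing fast when the bundle is missing or empty is the main design choice here: it replaces the silent reconnect loop seen in the logs with a clear startup error.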

@twiggg

twiggg commented Dec 24, 2018

@JohnAntonusMaximus If I understand it correctly, the first stage of the Docker build should build your Go binary (fetching dependencies and running tests), and the second stage should start from a scratch image and copy in only the artifacts the app needs.

@jeanbza
Member

jeanbza commented Dec 27, 2018

@twiggg The scratch distro appears to have a package manager https://github.com/emmett1/scratchpkg. I would imagine you could add it using this.

@twiggg

twiggg commented Dec 27, 2018 via email

@mattwelke

Thanks for posting this. I was using debian-slim and had this problem. I had to add the line RUN apt-get update && apt-get install -y ca-certificates to get my app working on platforms other than Google Cloud Run.


6 participants