pachd fails: panic: failed to initialize pach client: context deadline exceeded #4432

Closed
benwbooth opened this issue Dec 16, 2019 · 27 comments

@benwbooth

What happened?:

Ran pachctl deploy to create an on-premises pachyderm cluster:

pachctl deploy custom --object-store s3 any-string 10 <bucket> <accesskey> <secretkey> rook-ceph-rgw-my-store.rook-ceph:80 --etcd-storage-class nfs-client --image-pull-secret boss-6000 --namespace pachyderm --dynamic-etcd-nodes 1

pachd is failing to start up and is reporting the following in the logs:

2019-12-16T18:46:13Z INFO no Jaeger collector found (JAEGER_COLLECTOR_SERVICE_HOST not set) 
2019-12-16T18:46:19Z WARNING TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory 
2019-12-16T18:46:19Z WARNING s3gateway TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory 
2019-12-16T18:46:20Z INFO validating kubernetes access returned no errors 
2019-12-16T18:46:49Z INFO error starting githook server context deadline exceeded 
 
panic: failed to initialize pach client: context deadline exceeded 
 
goroutine 492 [running]: 
github.com/pachyderm/pachyderm/src/server/pkg/serviceenv.(*ServiceEnv).GetPachClient(0xc00021f450, 0x2ad81a0, 0xc00053a2c0, 0xc00053a2c0) 
	src/github.com/pachyderm/pachyderm/src/server/pkg/serviceenv/service_env.go:171 +0x11a 
github.com/pachyderm/pachyderm/src/server/pps/server.(*apiServer).master.func1(0x0, 0x0) 
	src/github.com/pachyderm/pachyderm/src/server/pps/server/master.go:58 +0xe5 
github.com/pachyderm/pachyderm/src/server/pkg/backoff.RetryNotify(0xc00088c220, 0x2a99520, 0xc00061d6e0, 0xc0009fbfb8, 0x2a, 0xc00113f4c0) 
	src/github.com/pachyderm/pachyderm/src/server/pkg/backoff/retry.go:35 +0x4a 
github.com/pachyderm/pachyderm/src/server/pps/server.(*apiServer).master(0xc0002bcfc0) 
	src/github.com/pachyderm/pachyderm/src/server/pps/server/master.go:52 +0x20a 
created by github.com/pachyderm/pachyderm/src/server/pps/server.NewAPIServer 
	src/github.com/pachyderm/pachyderm/src/server/pps/server/server.go:67 +0x3d4 
panic: failed to initialize pach client: context deadline exceeded 
 
goroutine 513 [running]: 
github.com/pachyderm/pachyderm/src/server/pkg/serviceenv.(*ServiceEnv).GetPachClient(0xc00021f450, 0x2ad81e0, 0xc0000560d0, 0x7f01cf21a008) 
	src/github.com/pachyderm/pachyderm/src/server/pkg/serviceenv/service_env.go:171 +0x11a 
github.com/pachyderm/pachyderm/src/server/transaction/server.newAPIServer.func1(0xc001107260) 
	src/github.com/pachyderm/pachyderm/src/server/transaction/server/api_server.go:43 +0x48 
created by github.com/pachyderm/pachyderm/src/server/transaction/server.newAPIServer 
	src/github.com/pachyderm/pachyderm/src/server/transaction/server/api_server.go:43 +0x103 

What you expected to happen?:

pachd should load successfully

How to reproduce it (as minimally and precisely as possible)?:

Anything else we need to know?:

Environment?:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:23:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-13T11:13:49Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
  • Pachyderm CLI and pachd server version (use pachctl version):
COMPONENT           VERSION
pachctl             1.9.9
pachd               1.9.9
  • Cloud provider (e.g. aws, azure, gke) or local deployment (e.g. minikube vs dockerized k8s): on-premises Rancher 2.3.3 with 7 nodes
  • OS (e.g. from /etc/os-release): Ubuntu 18.04
  • Others:
@JoeyZwicker
Member

@benwbooth sorry you're hitting this. Can you double-check that etcd is healthy and running, and that the permissions on your object store bucket are correct? Failing to connect to one of those is the most common cause of this error, as in #4424.
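
As a first pass at that check, a small Go program can confirm that the etcd and object store endpoints accept TCP connections from inside the cluster. This is only a sketch: `probe` is a hypothetical helper, the etcd address `etcd.pachyderm:2379` is an assumed service name, and a successful dial says nothing about credentials or bucket permissions.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// probe tries to open a TCP connection to addr before the timeout expires.
// It reports network reachability only, not credentials or permissions.
func probe(addr string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", addr)
	if err != nil {
		return fmt.Errorf("dial %s: %w", addr, err)
	}
	return conn.Close()
}

func main() {
	// Assumed in-cluster addresses; substitute your own service names.
	targets := []string{
		"etcd.pachyderm:2379",
		"rook-ceph-rgw-my-store.rook-ceph:80",
	}
	for _, addr := range targets {
		if err := probe(addr, 3*time.Second); err != nil {
			fmt.Println("unreachable:", err)
			continue
		}
		fmt.Println("reachable:", addr)
	}
}
```

Running this from a pod in the same namespace as pachd would exercise the same DNS names and network policies that pachd sees.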

@benwbooth
Author

The etcd server looks good as far as I can tell. Here are the etcd logs:

2019-12-16 19:15:50.164655 I | pkg/flags: recognized and used environment variable ETCD_NAME=etcd-0 
2019-12-16 19:15:50.164760 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT_CLIENT_PORT=2379 
2019-12-16 19:15:50.164769 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_HOST=10.43.52.136 
2019-12-16 19:15:50.164773 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_ADDR=10.43.52.136 
2019-12-16 19:15:50.164776 W | pkg/flags: unrecognized environment variable ETCD_SERVICE_PORT=2379 
2019-12-16 19:15:50.164790 W | pkg/flags: unrecognized environment variable ETCD_PORT=tcp://10.43.52.136:2379 
2019-12-16 19:15:50.164794 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_PORT=2379 
2019-12-16 19:15:50.164797 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP_PROTO=tcp 
2019-12-16 19:15:50.164803 W | pkg/flags: unrecognized environment variable ETCD_PORT_2379_TCP=tcp://10.43.52.136:2379 
2019-12-16 19:15:50.164832 I | etcdmain: etcd Version: 3.3.5 
2019-12-16 19:15:50.164839 I | etcdmain: Git SHA: 70c872620 
2019-12-16 19:15:50.164843 I | etcdmain: Go Version: go1.9.6 
2019-12-16 19:15:50.164846 I | etcdmain: Go OS/Arch: linux/amd64 
2019-12-16 19:15:50.164851 I | etcdmain: setting maximum number of CPUs to 72, total number of available CPUs is 72 
2019-12-16 19:15:50.165631 N | etcdmain: the server is already initialized as member before, starting as etcd member... 
2019-12-16 19:15:50.165862 I | embed: listening for peers on http://0.0.0.0:2380 
2019-12-16 19:15:50.165919 I | embed: listening for client requests on 0.0.0.0:2379 
2019-12-16 19:15:50.166933 W | etcdserver: MaxRequestBytes 52428800 exceeds maximum recommended size 10485760 
2019-12-16 19:15:50.184839 I | etcdserver: name = etcd-0 
2019-12-16 19:15:50.184871 I | etcdserver: data dir = /var/data/etcd 
2019-12-16 19:15:50.184881 I | etcdserver: member dir = /var/data/etcd/member 
2019-12-16 19:15:50.184890 I | etcdserver: heartbeat = 100ms 
2019-12-16 19:15:50.184904 I | etcdserver: election = 1000ms 
2019-12-16 19:15:50.184911 I | etcdserver: snapshot count = 100000 
2019-12-16 19:15:50.184933 I | etcdserver: advertise client URLs = http://0.0.0.0:2379 
2019-12-16 19:15:50.224045 I | etcdserver: restarting member e2618eea65c36b3f in cluster e5434e97deb3373f at commit index 1854 
2019-12-16 19:15:50.225043 I | raft: e2618eea65c36b3f became follower at term 3 
2019-12-16 19:15:50.225072 I | raft: newRaft e2618eea65c36b3f [peers: [], term: 3, commit: 1854, applied: 0, lastindex: 1854, lastterm: 3] 
2019-12-16 19:15:50.227230 W | auth: simple token is not cryptographically signed 
2019-12-16 19:15:50.227767 I | etcdserver: starting server... [version: 3.3.5, cluster version: to_be_decided] 
2019-12-16 19:15:50.229802 I | etcdserver/membership: added member e2618eea65c36b3f [http://etcd-0.etcd-headless.pachyderm.svc.cluster.local:2380] to cluster e5434e97deb3373f 
2019-12-16 19:15:50.229991 N | etcdserver/membership: set the initial cluster version to 3.3 
2019-12-16 19:15:50.230059 I | etcdserver/api: enabled capabilities for version 3.3 
2019-12-16 19:15:51.426564 I | raft: e2618eea65c36b3f is starting a new election at term 3 
2019-12-16 19:15:51.426646 I | raft: e2618eea65c36b3f became candidate at term 4 
2019-12-16 19:15:51.426714 I | raft: e2618eea65c36b3f received MsgVoteResp from e2618eea65c36b3f at term 4 
2019-12-16 19:15:51.426745 I | raft: e2618eea65c36b3f became leader at term 4 
2019-12-16 19:15:51.426760 I | raft: raft.node: e2618eea65c36b3f elected leader e2618eea65c36b3f at term 4 
2019-12-16 19:15:51.427493 I | embed: ready to serve client requests 
2019-12-16 19:15:51.427694 I | etcdserver: published {Name:etcd-0 ClientURLs:[http://0.0.0.0:2379]} to cluster e5434e97deb3373f 
2019-12-16 19:15:51.428759 N | embed: serving insecure client requests on [::]:2379, this is strongly discouraged! 

@benwbooth
Author

I am using an on-premises Rook/Ceph cluster as the object store. I have the access key, secret key, endpoint, bucket name all set correctly. I'm not sure what else to try.

@benwbooth
Author

I still haven't been able to get pachctl deploy to work by itself; however, I was able to get the helm chart to work. I had to download the chart YAML, set the Deployment apiVersion to apps/v1, and update the image tag to 1.9.9.

The helm chart's RBAC permissions did not allow pachd to query the Kubernetes API for the node list. To work around this, I first installed with pachctl deploy, then deleted pachd, etcd, and their related services and secrets, and finally installed the helm chart on top of the pachctl deploy installation, using the same parameters I had passed to pachctl deploy. This seems to be working so far.

@suman724

suman724 commented Dec 16, 2019

+1

I'm running into the same problem with the same version of pachyderm. However, I am trying to use minio as the object store instead of Rook/Ceph on-prem.

I posted the logs and other details on the Slack channel. I used the following command to generate the deployment manifest.

pachctl deploy custom --persistent-disk aws --object-store s3 any-string 10 pachyderm 'someuser' 'somepassword' '192.168.92.254:9000' --static-etcd-volume nfs-pv --dry-run > pachyderm.json

I noticed that pachctl generates the deployment manifest with STORAGE_BACKEND set to AMAZON, so I changed it to MINIO. I also updated the secret generated by pachctl to use minio-* keys instead of amazon-* keys. I'm not sure what is missing at this point.

Can you please point us to documentation on how to generate a deployment manifest with pachctl for on-prem use cases where the object store is minio or Ceph? It appears that pachctl's --object-store parameter does not accept minio as a valid value.

@benwbooth
Author

benwbooth commented Dec 16, 2019

Scratch that-- the helm chart still has RBAC issues when attempting to run a pipeline:

{"pipelineName":"iarpa-unicycler-assembly","workerId":"pipeline-iarpa-unicycler-assembly-v1-cmvbl","datumId":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","ts":"2019-12-16T21:37:55.885538802Z","message":"error getting number of workers, default to 1 worker: unable to retrieve node list from k8s to determine parallelism: nodes is forbidden: User \"system:serviceaccount:pachyderm:default\" cannot list resource \"nodes\" in API group \"\" at the cluster scope"} 
{"pipelineName":"iarpa-unicycler-assembly","workerId":"pipeline-iarpa-unicycler-assembly-v1-cmvbl","ts":"2019-12-16T21:37:55.982554521Z","message":"processing job eb2499a6d4214c4c957fd0e1449d2987"} 

@benwbooth
Author

I've tested the object store using s3cmd with the same accesskey/secretkey/endpoint that pachyderm is configured to use. I was able to store and retrieve an object without any issues.

@benwbooth
Author

@suman724 I think you are supposed to use s3 instead of aws when deploying to a custom s3-like object store. Is your pachyderm installation working now? Maybe I should try putting aws instead of s3 then change the manifest and see if it works.

@suman724

suman724 commented Dec 16, 2019

@benwbooth I fixed the typo in my previous command. Yes, I am using object-store as s3 and then modifying the generated yaml file. No, it is still not working. I see the same stack trace that you have posted on this thread.

@benwbooth
Author

The problem seems to be here in src/server/pkg/serviceenv/service_env.go:

func (env *ServiceEnv) GetPachClient(ctx context.Context) *client.APIClient {
	if err := env.pachEg.Wait(); err != nil {
		panic(err) // If env can't connect, there's no sensible way to recover
	}
	return env.pachClient.WithCtx(ctx)
}

It's timing out waiting for the error group.
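
To illustrate why a timeout like this says so little, here is a minimal sketch of a deadline-bound wait. It is not Pachyderm's code: `runStep` and the step name are hypothetical. It shows how wrapping the context error with the name of the step being waited on keeps the cause visible, where a bare timeout would report only "context deadline exceeded".

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runStep simulates a startup step that needs `work` to complete but gives
// up after `timeout`. On timeout it wraps the context error with the step
// name, so the log says what was being waited on.
func runStep(step string, timeout, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return fmt.Errorf("%s: %w", step, ctx.Err())
	}
}

func main() {
	// The step needs one second but only gets 50 milliseconds.
	err := runStep("connecting to object store", 50*time.Millisecond, time.Second)
	fmt.Println(err) // connecting to object store: context deadline exceeded

	// The wrapped error still matches the sentinel for callers that check it.
	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // true
}
```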

@benwbooth
Author

Better error messages would really help diagnose the issue. #4424 has the same error message but is no help in diagnosing the problem.

@nitinjainsj
Contributor

@benwbooth @suman724 We are taking a closer look at this issue.

nitinjainsj added this to Needs triage in Bugs via automation Dec 18, 2019
@suman724

Thank you @nitinjainsj . I created the issue #4437 with detailed logs and deployment manifest I used.

@nitinjainsj
Contributor

@suman724 @benwbooth We are attempting to recreate this. Does this work in any older pachyderm version like 1.9.8?

@ysimonson
Contributor

@benwbooth in the original issue, is that the full set of panic tracebacks? Did you see a traceback with src/server/cmd/pachd/main.go?

@benwbooth
Author

@ysimonson I ran it again on pachd v1.9.9 and got this traceback. I don't see main.go in it:

2019-12-20T00:07:22Z INFO no Jaeger collector found (JAEGER_COLLECTOR_SERVICE_HOST not set)
2019-12-20T00:07:49Z WARNING TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory
2019-12-20T00:07:50Z WARNING s3gateway TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory
2019-12-20T00:07:50Z INFO validating kubernetes access returned no errors
2019-12-20T00:08:19Z INFO error starting githook server context deadline exceeded
panic: failed to initialize pach client: context deadline exceeded
goroutine 530 [running]:
github.com/pachyderm/pachyderm/src/server/pkg/serviceenv.(*ServiceEnv).GetPachClient(0xc000194b60, 0x2ad81a0, 0xc000275d80, 0xc000275d80)
src/github.com/pachyderm/pachyderm/src/server/pkg/serviceenv/service_env.go:171 +0x11a
github.com/pachyderm/pachyderm/src/server/pps/server.(*apiServer).master.func1(0x0, 0x0)
src/github.com/pachyderm/pachyderm/src/server/pps/server/master.go:58 +0xe5
github.com/pachyderm/pachyderm/src/server/pkg/backoff.RetryNotify(0xc0005928c0, 0x2a99520, 0xc00058ede0, 0xc000647fb8, 0x2a, 0xc0011842b0)
src/github.com/pachyderm/pachyderm/src/server/pkg/backoff/retry.go:35 +0x4a
github.com/pachyderm/pachyderm/src/server/pps/server.(*apiServer).master(0xc000210480)
src/github.com/pachyderm/pachyderm/src/server/pps/server/master.go:52 +0x20a
created by github.com/pachyderm/pachyderm/src/server/pps/server.NewAPIServer
src/github.com/pachyderm/pachyderm/src/server/pps/server/server.go:67 +0x3d4

If I change the pachd image to v1.9.8, I see:

2019-12-20T00:09:52Z WARNING TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory
2019-12-20T00:09:52Z WARNING s3gateway TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory
2019-12-20T00:09:52Z INFO validating kubernetes access returned no errors
2019-12-20T00:10:22Z INFO error starting githook server context deadline exceeded
2019-12-20T00:10:31Z INFO error starting grpc server pfs.NewBlockAPIServer: unable to write to object storage: RequestError: send request failed
caused by: Put https://rook-ceph-rgw-my-store.rook-ceph:443/ceph-bkt-a34d91ff-6ee7-44af-965a-e676736fc5f6/82d5faf8f757de38dad45398f1b43c05: x509: certificate is not valid for any names, but wanted to match rook-ceph-rgw-my-store.rook-ceph
panic: failed to initialize pach client: context deadline exceeded
goroutine 424 [running]:
github.com/pachyderm/pachyderm/src/server/pkg/serviceenv.(*ServiceEnv).GetPachClient(0xc00046eea0, 0x29d9a00, 0xc0000560d0, 0xc000504360)
src/github.com/pachyderm/pachyderm/src/server/pkg/serviceenv/service_env.go:171 +0x11a
github.com/pachyderm/pachyderm/src/server/transaction/server.newAPIServer.func1(0xc00116b080)
src/github.com/pachyderm/pachyderm/src/server/transaction/server/api_server.go:43 +0x48
created by github.com/pachyderm/pachyderm/src/server/transaction/server.newAPIServer
src/github.com/pachyderm/pachyderm/src/server/transaction/server/api_server.go:43 +0x103
panic: failed to initialize pach client: context deadline exceeded
goroutine 369 [running]:
github.com/pachyderm/pachyderm/src/server/pkg/serviceenv.(*ServiceEnv).GetPachClient(0xc00046eea0, 0x29d9a00, 0xc0000560d0, 0x0)
src/github.com/pachyderm/pachyderm/src/server/pkg/serviceenv/service_env.go:171 +0x11a
github.com/pachyderm/pachyderm/src/server/pfs/server.newAPIServer.func1(0xc0001da2c0)
src/github.com/pachyderm/pachyderm/src/server/pfs/server/api_server.go:66 +0x48
created by github.com/pachyderm/pachyderm/src/server/pfs/server.newAPIServer
src/github.com/pachyderm/pachyderm/src/server/pfs/server/api_server.go:66 +0x152

I'm using a self-signed certificate on the object store. The common name is set to the URL that I'm using to access the object store. Does pachyderm not allow self-signed certs?

@benwbooth
Author

I've tried this with SSL disabled on the object store, and got a different error with 1.9.8 saying that pachd was trying to speak HTTPS but the object store was trying to speak HTTP. It looks like pachd does not allow unencrypted object stores, is that correct? On v1.9.9 I didn't get any useful error messages, just the stack trace.

@benwbooth
Author

What do I need to do to get pachyderm to connect to my object store using a self-signed cert? Or is there a way to allow pachyderm to connect to an unencrypted object store?
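
For context on what trusting a self-signed certificate involves in Go generally, here is a minimal, self-contained sketch. It is not Pachyderm configuration, and `clientTrusting` and `demo` are hypothetical names. It starts a local TLS test server whose certificate is not in the system trust store, then compares a default client with one whose root pool contains that certificate.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// clientTrusting returns an HTTP client whose only trusted root is cert.
func clientTrusting(cert *x509.Certificate) *http.Client {
	pool := x509.NewCertPool()
	pool.AddCert(cert)
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}
}

// demo reports whether a default client and a client that trusts the
// server's certificate can each complete a request to a local TLS server.
func demo() (defaultOK, trustedOK bool) {
	srv := httptest.NewTLSServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "ok")
		}))
	defer srv.Close()

	// The default client verifies against the system roots and rejects
	// the test server's certificate.
	if resp, err := http.Get(srv.URL); err == nil {
		resp.Body.Close()
		defaultOK = true
	}
	// The custom client has the server's certificate in its root pool.
	if resp, err := clientTrusting(srv.Certificate()).Get(srv.URL); err == nil {
		resp.Body.Close()
		trustedOK = true
	}
	return defaultOK, trustedOK
}

func main() {
	defaultOK, trustedOK := demo()
	fmt.Println("default client succeeded:", defaultOK)
	fmt.Println("client trusting the cert succeeded:", trustedOK)
}
```

Adding the certificate to the root pool keeps hostname verification in place, so the certificate still has to list the name the client connects to.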

@suman724

I tried version 1.9.8 as well. I disabled HTTPS on Minio. I therefore ran pachctl as below to generate the deployment manifest and make pachd work with non-secure minio.

pachctl deploy custom --persistent-disk aws --object-store s3 any-string 10 pachyderm 'suman' 'hadoop123' '192.168.92.254:9000' --static-etcd-volume nfs-pv --isS3V2 --dry-run > pachyderm1.json

I am now passing --isS3V2 to make a non-secure connection to minio. If I do not pass --isS3V2, I run into the same error @benwbooth mentioned here.

With that extra parameter, pachd started successfully.

# kubectl get pods -l suite=pachyderm
NAME                     READY   STATUS    RESTARTS   AGE
dash-6b9cd76b4f-qdn6k    2/2     Running   0          13m
etcd-674484698f-mzd62    1/1     Running   0          13m
pachd-7fbdc9466f-br97g   1/1     Running   0          6m

# ./pachctl version
COMPONENT           VERSION
pachctl             1.9.8
pachd               1.9.8


So, the original problem (pachd service startup failure) does appear to be only in 1.9.9 in my environment.

@nitinjainsj
Contributor

@suman724 @benwbooth 1.9.10 has additional logging to help debug this further.

@suman724 To clarify: when you pass the --isS3V2 flag, does it work in 1.9.8 and fail in 1.9.9?

@suman724

> @suman724 @benwbooth 1.9.10 has additional logging to help debug this further.
>
> @suman724 To clarify: when you pass the --isS3V2 flag, does it work in 1.9.8 and fail in 1.9.9?

@nitinjainsj In version 1.9.9, pachd was failing with the following stack trace.

goroutine 367 [running]:
github.com/pachyderm/pachyderm/src/server/pkg/serviceenv.(*ServiceEnv).GetPachClient(0xc000175450, 0x2ad81a0, 0xc0007c4400, 0xc0007c4400)
        src/github.com/pachyderm/pachyderm/src/server/pkg/serviceenv/service_env.go:171 +0x11a
github.com/pachyderm/pachyderm/src/server/pps/server.(*apiServer).master.func1(0x0, 0x0)
        src/github.com/pachyderm/pachyderm/src/server/pps/server/master.go:58 +0xe5
github.com/pachyderm/pachyderm/src/server/pkg/backoff.RetryNotify(0xc000974040, 0x2a99520, 0xc000776000, 0xc00064dfb8, 0x2a, 0x0)
        src/github.com/pachyderm/pachyderm/src/server/pkg/backoff/retry.go:35 +0x4a
github.com/pachyderm/pachyderm/src/server/pps/server.(*apiServer).master(0xc0002aeb40)
        src/github.com/pachyderm/pachyderm/src/server/pps/server/master.go:52 +0x20a
created by github.com/pachyderm/pachyderm/src/server/pps/server.NewAPIServer
        src/github.com/pachyderm/pachyderm/src/server/pps/server/server.go:67 +0x3d4

With version 1.9.8, I think pachd got past the failure that occurred in 1.9.9 and actually tried to connect to minio, which exposes only HTTP in my test environment. However, pachd was attempting an HTTPS connection. 1.9.8 printed an error message about this in the logs; 1.9.9 printed no such error. To resolve this handshake problem, I used the --isS3V2 flag with version 1.9.8. I read in the pachctl documentation that this flag makes an HTTP connection and also uses an older version of the S3 API.

@benwbooth
Author

I was able to get pachyderm to deploy correctly and connect to my object storage using pachyderm 1.9.10:

pachctl deploy custom --object-store s3 any-string 10 <bucket> <accesskey> <secretkey> rook-ceph-rgw-my-store.rook-ceph:80 --etcd-storage-class nfs-client --image-pull-secret boss-6000 --namespace pachyderm --dynamic-etcd-nodes 1 --isS3V2

However, I had to use an unencrypted object store on port 80. If I tried to use port 443 instead, then pachd would never become ready. Is this because of the self-signed cert? It would be nice if there was a way for pachyderm to support object stores that use self-signed certificates.

@brycemcanally
Contributor

I think it might make sense to address the set of issues you two (@benwbooth and @suman724) are hitting in this one issue instead of splitting it between this issue and #4437 and #4466.

Just to give a little bit of background on the object storage client stuff, we are trying, in general, to move away from using the minio client libraries for custom deployments and instead use the aws client libraries across the board. The only time the minio client libraries are used is when V2 signatures are needed (the isS3V2 flag) and there are still some unresolved issues that can pop up, as you guys are seeing.

So, do you guys need V2 signing or not? If not, then let's just focus on getting the aws client libraries working (I am going to assume V2 signing is not needed for the following, but we can try to figure something out if it is).

If your object storage provider has SSL disabled, then you should be able to prefix the endpoint with http:// to force the amazon client libraries to disable SSL. As for self-signed certificates, we don't support them right now, but I am working on adding a way to skip certificate verification, which should allow the use of self-signed certificates.
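
As a rough illustration of the http:// prefix idea, the sketch below splits an optional scheme off an endpoint and derives a "disable SSL" setting from it. `parseEndpoint` is a hypothetical helper, not the function pachd uses, and treating an endpoint with no scheme as TLS-enabled is an assumption based on the HTTPS attempts reported earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEndpoint strips an optional scheme from an object store endpoint.
// It returns the host:port part and whether TLS should be disabled.
// Assumption: an endpoint without a scheme keeps TLS enabled.
func parseEndpoint(endpoint string) (hostPort string, disableSSL bool) {
	switch {
	case strings.HasPrefix(endpoint, "http://"):
		return strings.TrimPrefix(endpoint, "http://"), true
	case strings.HasPrefix(endpoint, "https://"):
		return strings.TrimPrefix(endpoint, "https://"), false
	default:
		return endpoint, false
	}
}

func main() {
	endpoints := []string{
		"http://rook-ceph-rgw-my-store.rook-ceph:80",
		"rook-ceph-rgw-my-store.rook-ceph:443",
	}
	for _, e := range endpoints {
		hostPort, disableSSL := parseEndpoint(e)
		fmt.Printf("%s -> host=%s disableSSL=%t\n", e, hostPort, disableSSL)
	}
}
```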

@benwbooth
Author

I tried deploying pachyderm again, this time using http:// and omitting the --isS3V2 flag. This is the command line I used:

pachctl deploy custom --object-store s3 any-string 10 <bucket> <accesskey> <secretkey> http://rook-ceph-rgw-my-store.rook-ceph:80 --etcd-storage-class nfs-client --image-pull-secret boss-6000 --namespace pachyderm --dynamic-etcd-nodes 1

Here are the logs from pachd:

2020-01-07T20:18:19Z INFO no Jaeger collector found (JAEGER_COLLECTOR_SERVICE_HOST not set) 
2020-01-07T20:19:02Z WARNING TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory 
2020-01-07T20:19:02Z INFO started setting up Internal Block API GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External PFS API GRPC Server 
2020-01-07T20:19:02Z INFO finished setting up External PFS API GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External PPS API GRPC Server 
2020-01-07T20:19:02Z INFO errored setting up Internal Block API GRPC Server 
2020-01-07T20:19:02Z WARNING s3gateway TLS disabled: could not stat public cert at /pachd-tls-cert/tls.crt: stat /pachd-tls-cert/tls.crt: no such file or directory 
2020-01-07T20:19:02Z INFO validating kubernetes access returned no errors 
2020-01-07T20:19:02Z INFO finished setting up External PPS API GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External Auth API GRPC Server 
2020-01-07T20:19:02Z INFO finished setting up External Auth API GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External Transaction API GRPC Server 
2020-01-07T20:19:02Z INFO finished setting up External Transaction API GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External Enterprise API GRPC Server 
2020-01-07T20:19:02Z INFO finished setting up External Enterprise API GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External Admin API GRPC Server 
2020-01-07T20:19:02Z INFO finished setting up External Admin API GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External Health GRPC Server 
2020-01-07T20:19:02Z INFO finished setting up External Health GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External Version API GRPC Server 
2020-01-07T20:19:02Z INFO finished setting up External Version API GRPC Server 
2020-01-07T20:19:02Z INFO started setting up External Debug GRPC Server 
2020-01-07T20:19:02Z INFO finished setting up External Debug GRPC Server 
2020-01-07T20:19:31Z INFO error starting githook server context deadline exceeded 
 
goroutine 260 [running]: 
github.com/pachyderm/pachyderm/src/server/pkg/serviceenv.(*ServiceEnv).GetPachClient(0xc000254c30, 0x2a81560, 0xc000124018, 0xc00046ec00) 
	/pachyderm/src/server/pkg/serviceenv/service_env.go:171 +0x11a 
github.com/pachyderm/pachyderm/src/server/pfs/server.newAPIServer.func1(0xc0005c6630) 
	/pachyderm/src/server/pfs/server/api_server.go:54 +0x48 
created by github.com/pachyderm/pachyderm/src/server/pfs/server.newAPIServer 
	/pachyderm/src/server/pfs/server/api_server.go:54 +0x152 

@SolitaryThinker

I'm having the same issue. I also tried using http:// and omitting the --isS3V2 flag, and still received:

INFO error starting githook server context deadline exceeded

@brycemcanally
Contributor

brycemcanally commented Jan 16, 2020

Hey @benwbooth @suman724 @SolitaryThinker , we have a new release (1.9.11) that contains improved logging and adds more configuration that may help address your issues (such as disabling ssl or skipping certificate verification). It would be great to get an update on whether you are/were able to resolve the issue(s) or whether this helps identify the root issue(s).

nitinjainsj moved this from Needs triage to High priority in Bugs Feb 5, 2020
@brycemcanally
Contributor

Closing this since there has not been a response for a while. Feel free to reopen with logs from a release >=1.9.12.
