
start-kube-docker not working in Vagrant image #161

Closed
brinman2002 opened this issue Nov 27, 2015 · 23 comments

Comments

@brinman2002

Trying to run Pachyderm in Vagrant using the Vagrantfile/init.sh from QUICKSTART.md in the GitHub documentation. The gcr.io/google_containers/hyperkube:v1.1.2 container does not start.

Steps to reproduce:

vagrant destroy # or download per README.md
vagrant up
vagrant ssh

go get github.com/pachyderm/pachyderm/...
cd ~/go/src/github.com/pachyderm/pachyderm
etc/kube/start-kube-docker.sh
~/pachyderm_vagrant$ vagrant version
Installed Version: 1.7.4
Latest Version: 1.7.4

You're running an up-to-date version of Vagrant!

Console log: kubeNotStarting.txt

@brinman2002
Author

A couple of things I noted: I don't see anything in the script starting the rethinkdb container, so I've started it manually (docker run -d rethinkdb:2.0.4). The other is that the kubelet container just doesn't start hyperkube. If I commit the container and run it with bash, then execute the same command as in the script, it seems to start but can't connect to rethinkdb on 8080. Still learning my way around Docker, though; I'll keep playing with it.

edit: sorry for the comment spam, but I did realize that 8080 is one of the kube ports and I was misreading the quickstart "to check if it worked" bit.
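Roughly what I ran, as a sketch. The rethink/debug names here are just ones I picked for illustration, not names from the Pachyderm scripts:

```shell
# Start RethinkDB by hand, since start-kube-docker.sh does not appear to.
start_rethink() {
    docker run -d --name rethink rethinkdb:2.0.4
}

# Snapshot a stuck container and reopen it with a shell, so the
# hyperkube command from the script can be retried interactively.
debug_container() {
    # $1: name or ID of the stuck container (e.g. the kubelet one)
    docker commit "$1" "$1-debug" &&
        docker run -it --rm "$1-debug" /bin/bash
}
```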

@brinman2002
Author

Here is the output from running hyperkube manually.

hyperkube_fromcmdline.txt

One problem with the container is that the volumes don't appear to be linking to the host properly.

root@65f570570abc:/# ls /var/run/   
kubernetes  lock  utmp
vagrant@vagrant-ubuntu-vivid-64:~/go/src/github.com/pachyderm/pachyderm$ ls /var/run
acpid.socket  cloud-init    dhclient.eth0.pid  initctl    motd.dynamic  plymouth    rpcbind       rsyslogd.pid     sshd       thermald    utmp
atd.pid       crond.pid     docker             initramfs  mount         pppconfig   rpcbind.lock  screen           sshd.pid   tmpfiles.d  uuidd
blkid         crond.reboot  docker.pid         lock       network       puppet      rpcbind.sock  sendsigs.omit.d  sysconfig  udev
chef          dbus          docker.sock        log        pcscd         resolvconf  rpc_pipefs    shm              systemd    user
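That container-vs-host /var/run mismatch is what a missing bind-mount looks like: without one, the container gets its own private /var/run. A sketch of the kind of mounts the kubelet run line presumably needs (the actual flags in start-kube-docker.sh may differ; this only illustrates the mechanism):

```shell
# Build a docker run command with host bind-mounts. Without
# -v /var/run:/var/run the container sees an empty private /var/run
# and cannot reach the host's docker.sock.
kubelet_cmd() {
    printf '%s\n' 'docker run -d \
  -v /sys:/sys:ro \
  -v /var/run:/var/run:rw \
  -v /var/lib/docker/:/var/lib/docker:rw \
  gcr.io/google_containers/hyperkube:v1.1.2'
}
```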

@brinman2002
Author

Unfortunately I didn't keep good track of everything I tried, but Docker does seem to be the problem, or at least a symptom of another problem. I've gotten Kubernetes to start and pachctl to at least connect to it by dropping back to earlier versions of everything, but then the enhancements Pachyderm seems to need aren't there. I've gotten the "kubelet" container to start by committing the container created by the script, running /bin/bash, and then running hyperkube manually, but that doesn't start up the pod.

Not directly related, but the experimental Docker build also can't stop containers; it complains about permissions. Based on the Kubernetes docs, I added cgroup_enable=memory swapaccount=1 to the kernel parameters for the VM, but I wonder if some other kernel setup is needed?
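For the record, a sketch of how those parameters can be made persistent, assuming a stock Ubuntu GRUB layout (needs sudo, update-grub, and a reboot to take effect):

```shell
GRUB_PARAMS="cgroup_enable=memory swapaccount=1"

append_grub_params() {
    # $1: path to the GRUB defaults file (normally /etc/default/grub).
    # In practice you would edit the existing GRUB_CMDLINE_LINUX_DEFAULT
    # line rather than append a new one; this just shows the value.
    printf 'GRUB_CMDLINE_LINUX_DEFAULT="%s"\n' "$GRUB_PARAMS" >> "$1"
    # then: sudo update-grub && sudo reboot
}
```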

@jdoliner
Member

So I'm a little confused on the status of this issue. You have Kubernetes up and running but it's the wrong version? What happens when you try to start the correct version?

Also could you try this without Vagrant? All you need is Docker and Golang which I think makes Vagrant sort of unneeded. I think we should consider just removing the Vagrant file.

@brinman2002
Author

So I'm a little confused on the status of this issue. You have Kubernetes up and running but it's the wrong version? What happens when you try to start the correct version?

Sorry for the confusion. At one point Kubernetes would start when I referenced an earlier version. I probably wasn't using the Pachyderm master.json and Pachyderm didn't work correctly. I imagine that this would be expected. The correct version as referenced by the start kube script does not start correctly.

Also could you try this without Vagrant? All you need is Docker and Golang which I think makes Vagrant sort of unneeded. I think we should consider just removing the Vagrant file.

Vagrant makes it possible to blow away bad experiments and start over. Is Docker or any of the other technologies used known to not play well in a VM? I do understand if you don't want to maintain an official Vagrantfile.

Thanks

@brinman2002
Author

Ok, I do have to correct myself. I am able to get a Kubernetes cluster working by following the instructions on their site, substituting version 1.1.2. But that leaves me with this error-

vagrant@ubuntu-14:~/go/src/github.com/pachyderm/pachyderm$ $HOME/go/bin/pachctl create-cluster
ReplicationController "pfsd-rc" is invalid: spec.template.spec.containers[0].securityContext.privileged: forbidden '<*>(0xc2092af1f8)true'
vagrant@ubuntu-14:~/go/src/github.com/pachyderm/pachyderm$ $HOME/go/bin/kubectl get svc
NAME         LABELS                                    SELECTOR      IP(S)        PORT(S)
etcd         app=etcd,suite=pachyderm                  app=etcd      10.0.0.173   2379/TCP
                                                                                  2380/TCP
kubernetes   component=apiserver,provider=kubernetes   <none>        10.0.0.1     443/TCP
rethink      app=rethink,suite=pachyderm               app=rethink   10.0.0.194   8080/TCP
                                                                                  28015/TCP
                                                                                  29015/TCP

In this setup the custom master.json doesn't get copied to the Kubernetes image, so presumably the configuration in it is a part of the problem?
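For what it's worth, the "securityContext.privileged: forbidden" error above is usually a cluster-configuration issue rather than a master.json one: on Kubernetes 1.1, both the apiserver and the kubelet must be started with privileged containers enabled. A sketch; the surrounding hyperkube arguments are placeholders, not the real start commands:

```shell
# Both daemons need this flag for privileged pods to be admitted.
ALLOW_PRIV="--allow-privileged=true"

# e.g. appended to the hyperkube invocations:
#   hyperkube apiserver $ALLOW_PRIV ...
#   hyperkube kubelet   $ALLOW_PRIV ...
```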

Also, I forgot to mention that I switched to a Phusion-based image (phusion/ubuntu-14.04-amd64) instead of the default Ubuntu, as they are supposed to be more Docker-friendly. I think Pachyderm is a great idea and I'm hoping I can get this working.

@jaybennett89

@brinman2002, Docker is similar to Vagrant in its function in that it also automates the deployment of isolated environments (containers rather than full virtual machines). So why run a virtualization Inception? Try Docker on its own!

@brinman2002
Author

Much to my surprise, Docker doesn't work correctly in Vagrant (with VirtualBox as the backend, anyway), but it works (better) on bare metal. I can only assume it relies on lower-level virtualization features that VirtualBox doesn't support.

Things still aren't completely working, but I'm past all of the issues documented here.

@jdoliner
Member

jdoliner commented Dec 4, 2015

@brinman2002 what sorts of issues are you hitting with bare metal? make launch works for me and I think all I did to get the environment working is install Docker and Go, but there's probably some little things that I've forgotten at this point.

@brinman2002
Author

I just tried to update, and now make install doesn't work. The launch goal triggers install, so it fails as well. When I made the previous comment, I was using the etc/kube script, as suggested on issue #160.

~/go/src/github.com/pachyderm/pachyderm$ make install
GO15VENDOREXPERIMENT=1 go install ./src/cmd/pachctl
# go.pedge.io/pkg/sync
../../../go.pedge.io/pkg/sync/lazy_loader.go:21: undefined: atomic.Value
../../../go.pedge.io/pkg/sync/lazy_loader.go:22: undefined: atomic.Value
# golang.org/x/crypto/ssh
../../../golang.org/x/crypto/ssh/keys.go:492: undefined: crypto.Signer
# github.com/pachyderm/pachyderm/src/pkg/shard
src/pkg/shard/shard.pb.log.go:10: undefined: protolog.Register
src/pkg/shard/shard.pb.log.go:10: undefined: protolog.MessageType_MESSAGE_TYPE_EVENT
src/pkg/shard/shard.pb.log.go:10: undefined: protolog.Message
src/pkg/shard/shard.pb.log.go:11: undefined: protolog.Register
src/pkg/shard/shard.pb.log.go:11: undefined: protolog.MessageType_MESSAGE_TYPE_EVENT
src/pkg/shard/shard.pb.log.go:11: undefined: protolog.Message
src/pkg/shard/shard.pb.log.go:12: undefined: protolog.Register
src/pkg/shard/shard.pb.log.go:12: undefined: protolog.MessageType_MESSAGE_TYPE_EVENT
src/pkg/shard/shard.pb.log.go:12: undefined: protolog.Message
src/pkg/shard/shard.pb.log.go:13: undefined: protolog.Register
src/pkg/shard/shard.pb.log.go:13: too many errors
Makefile:55: recipe for target 'install' failed
make: *** [install] Error 2

@brinman2002
Author

The previous error was from not having my machine set up correctly. I think I have it set up right now, but go get is still having issues:

~$ go get github.com/pachyderm/pachyderm/...
package github.com/pachyderm/pachyderm/vendor/github.com/emicklei/go-restful/examples/google_app_engine
    imports google.golang.com/appengine: unrecognized import path "google.golang.com/appengine"
package github.com/pachyderm/pachyderm/vendor/github.com/emicklei/go-restful/examples/google_app_engine
    imports google.golang.com/appengine/memcache: unrecognized import path "google.golang.com/appengine/memcache"
package github.com/pachyderm/pachyderm/vendor/github.com/emicklei/go-restful/examples/google_app_engine/datastore
    imports google.golang.com/appengine/datastore: unrecognized import path "google.golang.com/appengine/datastore"
package github.com/pachyderm/pachyderm/vendor/github.com/emicklei/go-restful/examples/google_app_engine/datastore
    imports google.golang.com/appengine/user: unrecognized import path "google.golang.com/appengine/user"
package github.com/pachyderm/pachyderm/vendor/go.pedge.io/protolog/cmd/protoc-gen-protolog
    imports go.pedge.io/proto/plugin: cannot find package "go.pedge.io/proto/plugin" in any of:
    /opt/go/src/go.pedge.io/proto/plugin (from $GOROOT)
    /home/brandon/go/src/go.pedge.io/proto/plugin (from $GOPATH)

@jdoliner
Member

jdoliner commented Dec 6, 2015

Hmm, I don't fully understand go get, but go get github.com/pachyderm/pachyderm seems to work for me.
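A likely explanation, assuming the Go 1.5 vendor experiment is in play: the /... wildcard makes go get descend into vendored example packages (like the go-restful App Engine examples in the errors above) that aren't meant to build. Fetching just the root package sidesteps that:

```shell
# Fetch only the root package, not every vendored subpackage.
GET_CMD="go get github.com/pachyderm/pachyderm"

# The failing form, for contrast, was:
#   go get github.com/pachyderm/pachyderm/...
```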

@brinman2002
Author

Yeah I'm not sure what changed but make install is working now.

Do you recommend make launch or etc/kube/start-kube-docker.sh to run Pachyderm? make launch still uses Docker Compose, which you said wasn't "a viable way to deploy".

@brinman2002
Author

Running etc/kube/start-kube-docker.sh seems to be working now. Thanks!

@jdoliner
Member

jdoliner commented Dec 6, 2015

If you're on master, make launch should launch a Kubernetes cluster and then get Pachyderm running on that cluster. You will still see some docker-compose output because our containers are still built through Docker Compose.

I'm trying to get docker-compose ripped out soon, but our unit tests still use it.

@brinman2002
Author

Not completely working after all. I can create repos with pachctl but it doesn't create commits.

~/go/src/github.com/pachyderm/pachyderm$ sudo  $(which pachctl) start-commit foo
rpc error: code = 2 desc = "btrfs subvolume create /pfs/btrfs/repo/foo/120d16b28a28451c918fe4762cd8ac24: exit status 1\n\tERROR: can't access to '/pfs/btrfs/repo/foo'\n"
~/go/src/github.com/pachyderm/pachyderm$ ls /pfs/foo
ls: cannot access /pfs/foo: Input/output error
~/go/src/github.com/pachyderm/pachyderm$ ls /pfs
data  output

Also, if you do the echo in the quickstart to demonstrate that you can't write to the data directory, it puts things into a bad state. The echo command never exits, and the pachctl mount command continuously throws errors like these-

2015/12/05 21:36:32 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40089->127.0.0.1:650: read: connection reset by peer.
2015/12/05 21:36:32 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40091->127.0.0.1:650: read: connection reset by peer.
2015/12/05 21:36:32 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40093->127.0.0.1:650: read: connection reset by peer.
2015/12/05 21:36:32 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40095->127.0.0.1:650: read: connection reset by peer.
2015/12/05 21:36:32 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40097->127.0.0.1:650: read: connection reset by peer.
2015/12/05 21:36:32 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40099->127.0.0.1:650: read: connection reset by peer.
2015/12/05 21:36:32 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40101->127.0.0.1:650: read: connection reset by peer.

I've had to reboot to make things work again.
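A possible way to recover without a full reboot (my assumption, not something verified on this setup): lazily unmount the wedged FUSE mount so blocked readers get errors instead of hanging forever.

```shell
unwedge_pfs() {
    # $1: the mountpoint, e.g. /pfs
    # -u unmounts; -z detaches lazily even while processes are blocked.
    # Fall back to a lazy umount if fusermount isn't available.
    fusermount -uz "$1" 2>/dev/null || sudo umount -l "$1"
}
```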

@brinman2002
Author

On the quickstart, is using "foo" just a documentation error?

brandon@tamami:~$ pachctl create-repo data
brandon@tamami:~$ pachctl create-repo output
brandon@tamami:~$ ls /pfs
data  output
brandon@tamami:~$ pachctl start-commit data
7ea12a9dac744eec817c634e4d486bd0
brandon@tamami:~$ echo "Hello world" > /pfs/data/7ea12a9dac744eec817c634e4d486bd0/hello.txt
brandon@tamami:~$ pachctl finish-commit data 7ea12a9dac744eec817c634e4d486bd0
brandon@tamami:~$ cat /pfs/data/7ea12a9dac744eec817c634e4d486bd0/hello.txt 
Hello world

@brinman2002
Author

According to the quickstart, this shouldn't work either, since the commit hasn't been finished yet.

brandon@tamami:~$ pachctl start-commit data
559e851fc6f64086be60740d0a890540
brandon@tamami:~$ echo "Hello world" > /pfs/data/559e851fc6f64086be60740d0a890540/hello2.txt
brandon@tamami:~$ cat /pfs/data/559e851fc6f64086be60740d0a890540/hello2.txt 
Hello world

I know you mentioned the doc is a little stale (and as a developer, I know how that happens :D ), but I thought I'd point this out because it seems like it could be an issue in the code as well.

@brinman2002
Author

Another hang

brandon@tamami:~$ pachctl list-commit data
ID                                 PARENT              STATUS              STARTED             FINISHED            TOTAL_SIZE          DIFF_SIZE           
559e851fc6f64086be60740d0a890540   <none>              writeable           45 years ago                            28 B                12 B                
7ea12a9dac744eec817c634e4d486bd0   <none>              read-only           45 years ago        15 minutes ago      26 B                12 B                
brandon@tamami:~$ pachctl finish-commit 559e851fc6f64086be60740d0a890540
Expected 2 args, got 1
brandon@tamami:~$ pachctl finish-commit 559e851fc6f64086be60740d0a890540 data
rpc error: code = 2 desc = "commit 559e851fc6f64086be60740d0a890540/data not found"
brandon@tamami:~$ pachctl finish-commit data 559e851fc6f64086be60740d0a890540 
# hung on this command

@jdoliner
Member

jdoliner commented Dec 6, 2015

Thanks so much for reporting, just updated the quickstart to not reference foo anymore.

Regarding issue with the hang. I've created #162 to track that.

The issue with files being returned from unfinished commits is tracked in #159.

I think these are both fairly simple issues so I'll try to get them fixed soon.

@brinman2002
Author

Great, thanks. Is there an IRC/Slack chat/mailing list for more informal questions?

@teodor-pripoae
Contributor

Hi,

I'm running a default vagrant cluster with 3 k8s nodes to simulate a production cluster. pachyderm/pfsd is crashing every time. All other containers seem to work fine.

Cluster started with: KUBERNETES_MINION_MEMORY=2048 NUM_MINIONS=3 KUBERNETES_PROVIDER=vagrant ./kube-up.sh

[vagrant@kubernetes-minion-2 ~]$ sudo docker logs -f 4ff34f7e27da
Turning ON incompat feature 'extref': increased hardlink limit per file to 65536

WARNING! - Btrfs v3.12 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /pfs-img/btrfs.img
    nodesize 16384 leafsize 16384 sectorsize 4096 size 10.00GiB
Btrfs v3.12
mount: could not find any free loop device

@jdoliner
Member

Hi @teodor-pripoae, sorry you ran into this.
What's going on is that pfs needs a loop device to mount the btrfs volume on.
Sometimes there's a bug where, when the container finishes, the loop device gets leaked and there's no way to unmount it.

What do you get when you do losetup -a?
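For anyone hitting the same thing, listing and detaching a leaked device looks like this (device names vary; detaching needs root):

```shell
# Show all loop devices currently attached to backing files.
list_loops() {
    losetup -a
}

# Detach one leaked device so pfsd can grab a free one on restart.
detach_loop() {
    # $1: the device, e.g. /dev/loop0
    losetup -d "$1"
}
```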
