
Add support for minio deploy #1277

Merged: 2 commits merged into pachyderm:master from minio-support on Jan 27, 2017
Conversation

@harshavardhana (Contributor)

This adds support for Minio and all other S3-compatible servers. This patch also uses minio-go, which has an added benefit: it can be used with S3 as well, transparently.
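For illustration, a minimal sketch of that transparency, written against the 2017-era minio-go constructor (the endpoints and credentials below are placeholders, not values from this PR):

package main

import (
	"log"

	minio "github.com/minio/minio-go"
)

func main() {
	// One client type covers both cases; only the endpoint, credentials,
	// and TLS flag change between a local Minio server and AWS S3.
	local, err := minio.New("172.17.0.3:9000", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY", false)
	if err != nil {
		log.Fatal(err)
	}
	s3, err := minio.New("s3.amazonaws.com", "AWS_ACCESS_KEY", "AWS_SECRET_KEY", true)
	if err != nil {
		log.Fatal(err)
	}
	_, _ = local, s3 // both satisfy the same API, so the backend code is shared
}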

@jdoliner (Member)

@harshavardhana thanks so much for the PR, very excited to have support for minio and other s3 compatible stores. A few things I'd like to see added / changed before we merge this in:

  • I'd prefer to just call this the Minio backend rather than s3-compatible. A few reasons for this: 1) it's a bit weird to have an s3-compatible backend and an s3 backend, and users might find that confusing; eventually this backend might subsume the s3 backend, in which case we'll reconsider. 2) Lots of users have asked about a way to run this on prem, and when this PR lands we're going to start recommending Minio as the best way to do that, so it'll be more discoverable if we make the names match. 3) You guys wrote the code, so I feel like you should get the name recognition that comes with it :)
  • In addition to the backend code for Minio, we'll also need some code that creates a k8s secret with the Minio credentials for deployment. For an example, check out how we create AmazonSecrets; a rough sketch of a Minio analogue follows this list. @dwhitena you've already done a bit of legwork on this, yes?
  • It'd also be nice to have an example of how to deploy Minio on k8s that we could put in our docs, so people have a complete guide for how to deploy an on-prem Pachyderm cluster.
  • Lastly, we'll need you to sign our CLA.
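For the secrets point above, a rough, hypothetical sketch of what a Minio analogue of AmazonSecrets might look like, written with modern client-go types (the helper, its name, and its fields are assumptions, not Pachyderm's actual code):

package assets

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MinioSecret packs Minio credentials into a Kubernetes Secret that pachd
// could mount as files. Kubernetes base64-encodes the Data values when the
// manifest is serialized.
func MinioSecret(bucket, id, secret, endpoint string, secure bool) *v1.Secret {
	secureVal := "0"
	if secure {
		secureVal = "1"
	}
	return &v1.Secret{
		TypeMeta: metav1.TypeMeta{Kind: "Secret", APIVersion: "v1"},
		ObjectMeta: metav1.ObjectMeta{
			Name:   "minio-secret",
			Labels: map[string]string{"app": "minio-secret", "suite": "pachyderm"},
		},
		Data: map[string][]byte{
			"bucket":   []byte(bucket),
			"id":       []byte(id),
			"secret":   []byte(secret),
			"endpoint": []byte(endpoint),
			"secure":   []byte(secureVal),
		},
	}
}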

@harshavardhana (Contributor Author)

Answers inline:

I'd prefer to just call this the Minio backend rather than s3-compatible. A few reasons for this: 1) it's a bit weird to have an s3-compatible backend and an s3 backend, and users might find that confusing; eventually this backend might subsume the s3 backend, in which case we'll reconsider. 2) Lots of users have asked about a way to run this on prem, and when this PR lands we're going to start recommending Minio as the best way to do that, so it'll be more discoverable if we make the names match. 3) You guys wrote the code, so I feel like you should get the name recognition that comes with it :)

Certainly; I thought to keep it more generic rather than saying Minio. Let me change it to minio.

In addition to the backend code for Minio, we'll also need some code that creates a k8s secret with the Minio credentials for deployment. For an example, check out how we create AmazonSecrets. @dwhitena you've already done a bit of legwork on this, yes?

I think I already did that with this patch: https://github.com/kubernetes/charts/tree/master/stable/minio

It'd also be nice to have an example of how to deploy Minio on k8s that we could put in our docs, so people have a complete guide for how to deploy an on-prem Pachyderm cluster.

Sure, would a Helm chart be sufficient for you guys?

Lastly, we'll need you to sign our CLA.

Already signed the CLA as well.

@dwhitena (Contributor)

Ok, I did the following based on this PR:

  1. built pachctl with make install
  2. built the docker images with make docker-build
  3. re-tagged the docker images with a custom tag and pushed them to my docker hub
  4. created a local minio instance via Docker.
  5. ran ~/go/bin/pachctl deploy s3compatible docker <id> <secret> <end_point> --dry-run > test_minio.json to create a manifest for the deploy with minio.
  6. modified the test_minio.json manifest to pull my custom docker images for pachd and job-shim
  7. minikube start
  8. kubectl create -f test_minio.json

After this, it appears that the cluster is healthy:

NAME               READY     STATUS    RESTARTS   AGE
po/etcd-41dk2      1/1       Running   0          12m
po/pachd-j38gj     1/1       Running   2          12m
po/rethink-shidc   1/1       Running   0          12m

NAME         DESIRED   CURRENT   READY     AGE
rc/etcd      1         1         1         12m
rc/pachd     1         1         1         12m
rc/rethink   1         1         1         12m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)                                          AGE
svc/etcd         10.0.0.49    <none>        2379/TCP,2380/TCP                                12m
svc/kubernetes   10.0.0.1     <none>        443/TCP                                          14m
svc/pachd        10.0.0.122   <nodes>       650:30650/TCP,651:30651/TCP                      12m
svc/rethink      10.0.0.18    <nodes>       8080:32080/TCP,28015:32081/TCP,29015:30838/TCP   12m

NAME              DESIRED   SUCCESSFUL   AGE
jobs/pachd-init   1         1            12m

So, then I tried creating a repo and committing some data:

$ ~/go/bin/pachctl create-repo testminio
$ ~/go/bin/pachctl list-repo
NAME                CREATED             SIZE                
testminio           4 seconds ago       0 B 
$ ~/go/bin/pachctl put-file testminio master blah -c -f README.md 
$ pachctl list-repo
NAME                CREATED              SIZE                
testminio           About a minute ago   9.922 KiB

That all looks good. However, nothing shows up in Minio (even after a refresh):

[screenshot: the Minio browser shows no objects]

And, looking at the pachd logs with kubectl logs <pachd pod name> reveals the following:

2017-01-20T21:04:21Z INFO  protorpclog.Call {"service":"pfs.BlockAPIServer.Local","method":"PutBlock","duration":"0.000s"}
2017-01-20T21:04:21Z INFO  protorpclog.Call {"service":"pfs.BlockAPIServer.Local","method":"PutBlock","response":"block_ref:\u003cblock:\u003chash:\"jWbSXl4TnwvJCf3hto9Ou0rECexi987etKKtIieYYQZy1yRyy9jT-iZHjZhY2KlSWpdRaVw8yYml6CgHrOHIDw==\" \u003e range:\u003cupper:5080 \u003e \u003e ","duration":"0.000496264s"}

Pachyderm is still using the local filesystem, not Minio. So checking the manifest again (which I should have done to start) reveals that, indeed, the deploy command didn't properly set the backend variable:

              {
                "name": "STORAGE_BACKEND"
              },

So, I then manually set this as:

              {
                "name": "STORAGE_BACKEND",
                "value": "S3COMPATIBLE"
              },
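For context, pachd picks its block store from this variable; the following is a hypothetical sketch of that kind of switch, not pachd's actual code (only the S3COMPATIBLE value appears in this thread; the empty-value fall-through matches the behavior observed above):

package main

import (
	"fmt"
	"os"
)

func chooseBackend() (string, error) {
	switch backend := os.Getenv("STORAGE_BACKEND"); backend {
	case "S3COMPATIBLE": // the value this manifest needed
		return "minio", nil
	case "": // the empty value the broken manifest produced
		return "local filesystem", nil // the fallback observed above
	default:
		return "", fmt.Errorf("unrecognized STORAGE_BACKEND %q", backend)
	}
}

func main() {
	b, err := chooseBackend()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using backend:", b)
}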

Then, I restarted my k8s cluster, and repeated the above steps to deploy pachyderm with the new manually modified manifest. This results in:

NAME               READY     STATUS             RESTARTS   AGE
po/etcd-tdn6h      1/1       Running            0          4m
po/pachd-zc3jg     0/1       CrashLoopBackOff   5          4m
po/rethink-bq2w5   1/1       Running            0          4m

NAME         DESIRED   CURRENT   READY     AGE
rc/etcd      1         1         1         4m
rc/pachd     1         1         0         4m
rc/rethink   1         1         1         4m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)                                          AGE
svc/etcd         10.0.0.132   <none>        2379/TCP,2380/TCP                                4m
svc/kubernetes   10.0.0.1     <none>        443/TCP                                          5m
svc/pachd        10.0.0.112   <nodes>       650:30650/TCP,651:30651/TCP                      4m
svc/rethink      10.0.0.192   <nodes>       8080:32080/TCP,28015:32081/TCP,29015:32604/TCP   4m

NAME              DESIRED   SUCCESSFUL   AGE
jobs/pachd-init   1         1            4m

Checking the logs, we get:

$ kubectl logs pachd-zc3jg
2017-01-20T21:25:24Z INFO  shard.StartAssignRoles {}
open /amazon-secret/bucket: no such file or directory

It appears that it's not finding the bucket file. However, the secret is definitely in the manifest:

{
  "kind": "Secret",
  "apiVersion": "v1",
  "metadata": {
    "name": "amazon-secret",
    "creationTimestamp": null,
    "labels": {
      "app": "amazon-secret",
      "suite": "pachyderm"
    }
  },
  "data": {
    "secret": "cEFMTlFtZzh4N0JHbk14MkRFSWh2eVlZMjZDcVVwam9RV2F6NFVwYg==",
    "endpoint": "aHR0cDovLzE3Mi4xNy4wLjM6OTAwMA==",
    "secure": "MA==",
    "bucket": "ZG9ja2Vy",
    "id": "SDZVMUNSWFVKNUFYU0ZLREdPMlA="
  }
}
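Each key under data is mounted as a file whose content is the base64-decoded value (bucket decodes to docker here), so pachd's config lookup is roughly of this shape; a hedged sketch, not the actual code:

package main

import (
	"fmt"
	"io/ioutil"
	"strings"
)

// readSecretKey reads one key of a Kubernetes secret mounted as a volume.
func readSecretKey(key string) (string, error) {
	// The path comes from the error above: "open /amazon-secret/bucket: ..."
	b, err := ioutil.ReadFile("/amazon-secret/" + key)
	if err != nil {
		return "", err // fails if the secret volume was never mounted
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	bucket, err := readSecretKey("bucket")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("bucket:", bucket)
}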

All in all, there seem to be two minor issues here (which I expect will be easy fixes):

  1. Enable pachctl deploy... to populate the correct backend variable in the manifest.
  2. Fix the passing of the minio secret to pachd.

@harshavardhana (Contributor Author)

I tried this, but I'm not sure what ErrImagePull means.

$ kubectl get pods
NAME               READY     STATUS             RESTARTS   AGE
etcd-5xhsh         1/1       Running            0          28s
pachd-init-sfbst   0/1       ImagePullBackOff   0          28s
pachd-zl31g        0/1       ErrImagePull       0          28s
rethink-1k4g9      1/1       Running            0          28s

@harshavardhana (Contributor Author)

Is this because I need to publish these?

@dwhitena (Contributor)

Yes, normally the deploy pulls the images corresponding to your release version. However, we are trying to do something custom, so you will need to put the images in some registry that Pachyderm can pull from. I pushed mine up to my Docker Hub. Then you can replace pachyderm/pachd and pachyderm/job-shim in the manifest (my test_minio.json) with your images, in my case dwhitena/minio-pachd etc. Is there an easy way to build and deploy a custom pachd, @jdoliner?

@jdoliner (Member)

@dwhitena @harshavardhana you can get deploy to just print out the manifest with --dry-run and then manually edit the images referenced to be what you want.

@harshavardhana (Contributor Author)

$ kubectl logs pachd-init-bwcdk
time="2017-01-21T00:35:57Z" level=warning msg="Error creating connection: gorethink: dial tcp 10.0.0.137:28015: getsockopt: connection refused" 
gorethink: dial tcp 10.0.0.137:28015: getsockopt: connection refused
$ kubectl logs pachd-kd0gv
gorethink: Database `pachyderm_pps` does not exist. in: 
r.DB("pachyderm_pps").Table("JobInfos").Wait()
$ kubectl logs pachd-kd0gv
gorethink: Database `pachyderm_pps` does not exist. in: 
r.DB("pachyderm_pps").Table("JobInfos").Wait()

This is after changing the images to use the y4m4/ prefix.

@harshavardhana (Contributor Author)

$ kubectl get pods
NAME               READY     STATUS             RESTARTS   AGE
etcd-wpt8m         1/1       Running            0          3m
pachd-init-bwcdk   0/1       CrashLoopBackOff   4          3m
pachd-kd0gv        0/1       CrashLoopBackOff   4          3m
rethink-2160n      1/1       Running            0          3m

@harshavardhana (Contributor Author)

Looks like the pods are using the wrong IPs: they don't have 10.x.x addresses assigned and are wrongly using a separate IP range. Are you guys aware of this, @dwhitena @jdoliner?

@jdoliner (Member)

@harshavardhana I haven't seen any problems like that with wrong IPs being assigned. Looking at the logs it seems that pachd-init is failing to get in contact with rethinkdb while pachd is succeeding but not finding the tables it needs because pachd-init isn't creating them. I'm a little confused about what exactly is being assigned a wrong IP address here, is it the rethink pod?

@harshavardhana (Contributor Author)

@harshavardhana I haven't seen any problems like that with wrong IPs being assigned. Looking at the logs it seems that pachd-init is failing to get in contact with rethinkdb while pachd is succeeding but not finding the tables it needs because pachd-init isn't creating them. I'm a little confused about what exactly is being assigned a wrong IP address here, is it the rethink pod?

Actually it worked fine, but I had to make more changes, which I'm finishing now. There were a few checks to disallow any backend other than localBackend from serving rethinkdb; I just chose HostPath for minioBackend as well, unlike EBS for amazonBackend.

Does this sound right to you @jdoliner @dwhitena ?
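For readers following along, a hedged sketch of the check being described; the identifiers and values are assumptions, not the merged code:

package main

import "fmt"

const (
	localBackend  = "LOCAL"
	amazonBackend = "AMAZON"
	minioBackend  = "MINIO"
)

// rethinkVolumeType shows minioBackend taking the HostPath branch that was
// previously reserved for localBackend, while amazonBackend keeps EBS.
func rethinkVolumeType(backend string) (string, error) {
	switch backend {
	case localBackend, minioBackend:
		return "HostPath", nil
	case amazonBackend:
		return "EBS (awsElasticBlockStore)", nil
	default:
		return "", fmt.Errorf("unsupported backend: %s", backend)
	}
}

func main() {
	v, _ := rethinkVolumeType(minioBackend)
	fmt.Println(v) // HostPath
}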

@jdoliner (Member)

Yeah that sounds reasonable, we're likely going to need to make things a little bit more sophisticated now since people will probably want to be able to do minio with an ebs volume for amazon deploys or a PD for GCE deploys or what have you. But that's not necessary for the purposes of this PR.

@harshavardhana (Contributor Author)

Yeah that sounds reasonable, we're likely going to need to make things a little bit more sophisticated now since people will probably want to be able to do minio with an ebs volume for amazon deploys or a PD for GCE deploys or what have you. But that's not necessary for the purposes of this PR.

Understood.

@harshavardhana force-pushed the minio-support branch 2 times, most recently from d41e643 to 1d6babf on January 23, 2017 at 23:53
@dwhitena (Contributor)

Ok @harshavardhana, we have Pachyderm running backed by Minio:

[screenshot: the Minio browser showing Pachyderm's data]

After updating pachctl and the docker images to your latest and redeploying, I was able to successfully deploy the cluster locally with minikube and minio. However, upon trying to commit data into Pachyderm, I was getting:

$ ~/go/bin/pachctl put-file testminio master blah -c -f README.md 
Get http://127.0.0.1:9000/docker/?location=: dial tcp 127.0.0.1:9000: getsockopt: connection refused

The problem here was that minikube is running in a VM locally, and thus it was seeing 127.0.0.1 as the VM's own address. To fix this, I did our minio deploy with an endpoint of 10.0.2.2:9000, where 10.0.2.2 is the IP that VirtualBox uses to reach the actual localhost.
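A plain TCP dial is a quick way to confirm which endpoint is reachable from a given network namespace; a small diagnostic sketch using the addresses from this thread:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// From inside the minikube VM, 127.0.0.1:9000 is the VM itself (connection
	// refused), while 10.0.2.2:9000 is VirtualBox's alias for the host.
	for _, addr := range []string{"127.0.0.1:9000", "10.0.2.2:9000"} {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err != nil {
			fmt.Println(addr, "unreachable:", err)
			continue
		}
		conn.Close()
		fmt.Println(addr, "reachable")
	}
}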

When I redeployed with that endpoint, boom! Everything works great. I can commit data in and it goes right into Minio. Great work!

We need to resolve the following points before or after merging this:

  • This minio deploy works for a locally deployed pachyderm. However, we really want to deploy our clusters to the cloud. This brings up an interesting question, because we could have a minio deploy to AWS, Google, or Azure. The deploy should be very similar, but we need to decide if we want 4 different commands (minio-local, minio-aws, etc.) or if we make subcommands; a possible subcommand layout is sketched after this list.
  • We will need to add some quick docs to this, which I can tackle.
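As a sketch of the subcommand option mentioned above (a hypothetical layout, not merged code; the cloud flavor names are assumptions):

package main

import "github.com/spf13/cobra"

func deployMinioCmd() *cobra.Command {
	minioCmd := &cobra.Command{
		Use:   "minio",
		Short: "Deploy Pachyderm backed by Minio.",
	}
	for _, flavor := range []string{"local", "amazon", "google", "microsoft"} {
		flavor := flavor // capture for the closure
		minioCmd.AddCommand(&cobra.Command{
			Use:   flavor,
			Short: "Deploy a Minio-backed cluster on " + flavor + ".",
			RunE: func(cmd *cobra.Command, args []string) error {
				// Each flavor would wire up its own volume type here
				// (HostPath locally, EBS on amazon, PD on google, ...).
				return nil
			},
		})
	}
	return minioCmd
}

func main() {
	deploy := &cobra.Command{Use: "deploy"}
	deploy.AddCommand(deployMinioCmd())
	_ = deploy.Execute()
}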

@harshavardhana (Contributor Author)

This minio deploy works for a locally deployed pachyderm. However, we really want to deploy our clusters to the cloud. This brings up an interesting question, because we could have a minio deploy to AWS, Google, or Azure. The deploy should be very similar, but we need to decide if we want 4 different commands (minio-local, minio-aws, etc.) or if we make subcommands.

The problem I see with choosing different sets of credentials is that the CLI should stay simple, and it can get really hard to specify that many options on the command line.

I am not sure if there is work in this area already to make it simpler. Ideally it would be better to have a single command so that documentation becomes easier. Subcommands for minio choosing different cloud flavors are also an option.

Any other thoughts? Once we finalize, I can work this out and send another PR after this one.

@harshavardhana changed the title from "[PROPOSAL] Add S3 compatible client support." to "Add minio support." on Jan 27, 2017
@dwhitena (Contributor)

After talking with @jdoliner, the plan for next steps is:
(1) merge this local version (once fully reviewed and tests pass)
(2) work on a PR to re-organize the commands to support all the combinations we have now (as mentioned by @harshavardhana above)
(3) possibly replace the existing Amazon SDK code with the minio client (as it duplicates logic)

func NewBlockAPIServer(dir string, cacheBytes int64, backend string) (pfsclient.BlockAPIServer, error) {
log.Info("Initializing new blockAPIServer", dir, cacheBytes, backend)
Member

Did you mean to leave this in? It doesn't seem like a particularly important thing to inform the user of, you'll basically just see this once on startup. Also log messages should use lion right now to match the rest of our code.

Contributor Author

Oh no, this is not necessary.

@@ -40,6 +41,7 @@ func DeployCmd(noMetrics *bool) *cobra.Command {
var hostPath string
var dev bool
var dryRun bool
var isSecure bool
Member

Small nit, I'd call this secure, no need for the "is"

Contributor Author

Sure

}

func (c *minioClient) Walk(name string, fn func(name string) error) error {
isRecursive := true
Member

Another very small nit, I'd call this recursive not isRecursive

Contributor Author

Sure.
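For context on this Walk, a hedged sketch of how it can sit on top of minio-go's 2017-era ListObjects channel API; the client and bucket fields are assumptions about the type's shape, not the merged code:

package obj

import minio "github.com/minio/minio-go"

type minioClient struct {
	client *minio.Client
	bucket string
}

// Walk lists every object under name and calls fn on each key.
func (c *minioClient) Walk(name string, fn func(name string) error) error {
	recursive := true // per the review, renamed from isRecursive
	doneCh := make(chan struct{})
	defer close(doneCh) // stops the listing goroutine on early return
	for obj := range c.client.ListObjects(c.bucket, name, recursive, doneCh) {
		if obj.Err != nil {
			return obj.Err
		}
		if err := fn(obj.Key); err != nil {
			return err
		}
	}
	return nil
}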

@jdoliner (Member)

LGTM after those comments

@harshavardhana (Contributor Author)

Oops, looks like I force-pushed and lost @dwhitena's commit. Can you send it again? Sorry.

@harshavardhana (Contributor Author)

LGTM after those comments

Addressed all the comments.

@harshavardhana changed the title from "Add minio support." to "Add support for minio deploy" on Jan 27, 2017
@dwhitena merged commit 7d12823 into pachyderm:master on Jan 27, 2017
@harshavardhana deleted the minio-support branch on January 27, 2017 at 22:04