
Add support for minio deploy #1277

Merged: 2 commits merged into pachyderm:master from minio-support on Jan 27, 2017
Conversation

@harshavardhana (Contributor)

This adds support for Minio and all other S3-compatible servers. This patch also uses minio-go, which has an added benefit: it can be used with S3 as well, transparently.
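For illustration, a minimal sketch of that transparency, written against the 2017-era minio-go constructor (the endpoints and credentials below are placeholders, not values from this PR):

package main

import (
	"log"

	minio "github.com/minio/minio-go"
)

func main() {
	// One client type covers both cases; only the endpoint, credentials,
	// and TLS flag change between a local Minio server and AWS S3.
	local, err := minio.New("172.17.0.3:9000", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY", false)
	if err != nil {
		log.Fatal(err)
	}
	s3, err := minio.New("s3.amazonaws.com", "AWS_ACCESS_KEY", "AWS_SECRET_KEY", true)
	if err != nil {
		log.Fatal(err)
	}
	_, _ = local, s3 // both satisfy the same API, so the backend code is shared
}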

@jdoliner (Member)

@harshavardhana thanks so much for the PR, very excited to have support for minio and other s3 compatible stores. A few things I'd like to see added / changed before we merge this in:

  • I'd prefer to just call this the Minio backend rather than s3-compatible. A few reasons for this: 1) it's a bit weird to have an s3-compatible backend and an s3 backend, and users might find that confusing; eventually this backend might subsume the s3 backend, in which case we'll reconsider. 2) Lots of users have asked about a way to run this on prem, and when this PR lands we're going to start recommending Minio as the best way to do that, so it'll be more discoverable if we make the names match. 3) You guys wrote the code, so I feel like you should get the name recognition that comes with it :)
  • In addition to the backend code for Minio, we'll also need some code that creates a k8s secret with the Minio credentials for deployment. For an example, check out how we create AmazonSecrets; a rough sketch of a Minio analogue follows this list. @dwhitena you've already done a bit of legwork on this, yes?
  • It'd also be nice to have an example of how to deploy Minio on k8s that we could put in our docs, so people have a complete guide for how to deploy an on-prem Pachyderm cluster.
  • Lastly, we'll need you to sign our CLA.
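For the secrets point above, a rough, hypothetical sketch of what a Minio analogue of AmazonSecrets might look like, written with modern client-go types (the helper, its name, and its fields are assumptions, not Pachyderm's actual code):

package assets

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MinioSecret packs Minio credentials into a Kubernetes Secret that pachd
// could mount as files. Kubernetes base64-encodes the Data values when the
// manifest is serialized.
func MinioSecret(bucket, id, secret, endpoint string, secure bool) *v1.Secret {
	secureVal := "0"
	if secure {
		secureVal = "1"
	}
	return &v1.Secret{
		TypeMeta: metav1.TypeMeta{Kind: "Secret", APIVersion: "v1"},
		ObjectMeta: metav1.ObjectMeta{
			Name:   "minio-secret",
			Labels: map[string]string{"app": "minio-secret", "suite": "pachyderm"},
		},
		Data: map[string][]byte{
			"bucket":   []byte(bucket),
			"id":       []byte(id),
			"secret":   []byte(secret),
			"endpoint": []byte(endpoint),
			"secure":   []byte(secureVal),
		},
	}
}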

@harshavardhana (Contributor Author)

Answers inline:

I'd prefer to just call this the Minio backend rather than s3-compatible. A few reasons for this: 1) it's a bit weird to have an s3-compatible backend and an s3 backend, and users might find that confusing; eventually this backend might subsume the s3 backend, in which case we'll reconsider. 2) Lots of users have asked about a way to run this on prem, and when this PR lands we're going to start recommending Minio as the best way to do that, so it'll be more discoverable if we make the names match. 3) You guys wrote the code, so I feel like you should get the name recognition that comes with it :)

Certainly; I thought to keep it more generic rather than saying Minio. Let me change it to minio.

In addition to the backend code for Minio, we'll also need some code that creates a k8s secret with the Minio credentials for deployment. For an example, check out how we create AmazonSecrets. @dwhitena you've already done a bit of legwork on this, yes?

I think I already did that with this patch: https://github.com/kubernetes/charts/tree/master/stable/minio

It'd also be nice to have an example of how to deploy Minio on k8s that we could put in our docs, so people have a complete guide for how to deploy an on-prem Pachyderm cluster.

Sure, would a Helm chart be sufficient for you guys?

Lastly, we'll need you to sign our CLA.

Already signed the CLA as well.

@dwhitena (Contributor)

Ok, I did the following based on this PR:

  1. built pachctl with make install
  2. built the docker images with make docker-build
  3. re-tagged the docker images with a custom tag and pushed them to my docker hub
  4. created a local minio instance via Docker.
  5. ran ~/go/bin/pachctl deploy s3compatible docker <id> <secret> <end_point> --dry-run > test_minio.json to create a manifest for the deploy with minio.
  6. modified the test_minio.json manifest to pull my custom docker images for pachd and job-shim
  7. minikube start
  8. kubectl create -f test_minio.json

After this, it appears that the cluster is healthy:

NAME               READY     STATUS    RESTARTS   AGE
po/etcd-41dk2      1/1       Running   0          12m
po/pachd-j38gj     1/1       Running   2          12m
po/rethink-shidc   1/1       Running   0          12m

NAME         DESIRED   CURRENT   READY     AGE
rc/etcd      1         1         1         12m
rc/pachd     1         1         1         12m
rc/rethink   1         1         1         12m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)                                          AGE
svc/etcd         10.0.0.49    <none>        2379/TCP,2380/TCP                                12m
svc/kubernetes   10.0.0.1     <none>        443/TCP                                          14m
svc/pachd        10.0.0.122   <nodes>       650:30650/TCP,651:30651/TCP                      12m
svc/rethink      10.0.0.18    <nodes>       8080:32080/TCP,28015:32081/TCP,29015:30838/TCP   12m

NAME              DESIRED   SUCCESSFUL   AGE
jobs/pachd-init   1         1            12m

So, then I tried creating a repo and committing some data:

$ ~/go/bin/pachctl create-repo testminio
$ ~/go/bin/pachctl list-repo
NAME                CREATED             SIZE                
testminio           4 seconds ago       0 B 
$ ~/go/bin/pachctl put-file testminio master blah -c -f README.md 
$ pachctl list-repo
NAME                CREATED              SIZE                
testminio           About a minute ago   9.922 KiB

That all looks good. However, nothing shows up in Minio (even after a refresh):

[screenshot: the Minio browser shows no objects]

And, looking at the pachd logs with kubectl logs <pachd pod name> reveals the following:

2017-01-20T21:04:21Z INFO  protorpclog.Call {"service":"pfs.BlockAPIServer.Local","method":"PutBlock","duration":"0.000s"}
2017-01-20T21:04:21Z INFO  protorpclog.Call {"service":"pfs.BlockAPIServer.Local","method":"PutBlock","response":"block_ref:\u003cblock:\u003chash:\"jWbSXl4TnwvJCf3hto9Ou0rECexi987etKKtIieYYQZy1yRyy9jT-iZHjZhY2KlSWpdRaVw8yYml6CgHrOHIDw==\" \u003e range:\u003cupper:5080 \u003e \u003e ","duration":"0.000496264s"}

Pachyderm is still using the local filesystem, not Minio. So checking the manifest again (which I should have done to start) reveals that, indeed, the deploy command didn't properly set the backend variable:

              {
                "name": "STORAGE_BACKEND"
              },

So, I then manually set this as:

              {
                "name": "STORAGE_BACKEND",
                "value": "S3COMPATIBLE"
              },
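For context, pachd picks its block store from this variable; the following is a hypothetical sketch of that kind of switch, not pachd's actual code (only the S3COMPATIBLE value appears in this thread; the empty-value fall-through matches the behavior observed above):

package main

import (
	"fmt"
	"os"
)

func chooseBackend() (string, error) {
	switch backend := os.Getenv("STORAGE_BACKEND"); backend {
	case "S3COMPATIBLE": // the value this manifest needed
		return "minio", nil
	case "": // the empty value the broken manifest produced
		return "local filesystem", nil // the fallback observed above
	default:
		return "", fmt.Errorf("unrecognized STORAGE_BACKEND %q", backend)
	}
}

func main() {
	b, err := chooseBackend()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using backend:", b)
}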

Then, I restarted my k8s cluster, and repeated the above steps to deploy pachyderm with the new manually modified manifest. This results in:

NAME               READY     STATUS             RESTARTS   AGE
po/etcd-tdn6h      1/1       Running            0          4m
po/pachd-zc3jg     0/1       CrashLoopBackOff   5          4m
po/rethink-bq2w5   1/1       Running            0          4m

NAME         DESIRED   CURRENT   READY     AGE
rc/etcd      1         1         1         4m
rc/pachd     1         1         0         4m
rc/rethink   1         1         1         4m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)                                          AGE
svc/etcd         10.0.0.132   <none>        2379/TCP,2380/TCP                                4m
svc/kubernetes   10.0.0.1     <none>        443/TCP                                          5m
svc/pachd        10.0.0.112   <nodes>       650:30650/TCP,651:30651/TCP                      4m
svc/rethink      10.0.0.192   <nodes>       8080:32080/TCP,28015:32081/TCP,29015:32604/TCP   4m

NAME              DESIRED   SUCCESSFUL   AGE
jobs/pachd-init   1         1            4m

Checking the logs, we get:

$ kubectl logs pachd-zc3jg
2017-01-20T21:25:24Z INFO  shard.StartAssignRoles {}
open /amazon-secret/bucket: no such file or directory

It appears that it's not finding the bucket file. However, the secret is definitely in the manifest:

{
  "kind": "Secret",
  "apiVersion": "v1",
  "metadata": {
    "name": "amazon-secret",
    "creationTimestamp": null,
    "labels": {
      "app": "amazon-secret",
      "suite": "pachyderm"
    }
  },
  "data": {
    "secret": "cEFMTlFtZzh4N0JHbk14MkRFSWh2eVlZMjZDcVVwam9RV2F6NFVwYg==",
    "endpoint": "aHR0cDovLzE3Mi4xNy4wLjM6OTAwMA==",
    "secure": "MA==",
    "bucket": "ZG9ja2Vy",
    "id": "SDZVMUNSWFVKNUFYU0ZLREdPMlA="
  }
}
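Each key under data is mounted as a file whose content is the base64-decoded value (bucket decodes to docker here), so pachd's config lookup is roughly of this shape; a hedged sketch, not the actual code:

package main

import (
	"fmt"
	"io/ioutil"
	"strings"
)

// readSecretKey reads one key of a Kubernetes secret mounted as a volume.
func readSecretKey(key string) (string, error) {
	// The path comes from the error above: "open /amazon-secret/bucket: ..."
	b, err := ioutil.ReadFile("/amazon-secret/" + key)
	if err != nil {
		return "", err // fails if the secret volume was never mounted
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	bucket, err := readSecretKey("bucket")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("bucket:", bucket)
}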

All in all, there seem to be two minor issues here (which I expect will be easy fixes):

  1. Enable pachctl deploy... to populate the correct backend variable in the manifest.
  2. Fix the passing of the minio secret to pachd.

@harshavardhana (Contributor Author)

I tried this, but I'm not sure what ErrImagePull means.

$ kubectl get pods
NAME               READY     STATUS             RESTARTS   AGE
etcd-5xhsh         1/1       Running            0          28s
pachd-init-sfbst   0/1       ImagePullBackOff   0          28s
pachd-zl31g        0/1       ErrImagePull       0          28s
rethink-1k4g9      1/1       Running            0          28s

@harshavardhana (Contributor Author)

Is this because I need to publish these?

@dwhitena (Contributor)

Yes, normally the deploy pulls the images corresponding to your release version. However, we are trying to do something custom, so you will need to put the images in some registry that Pachyderm can pull from. I pushed mine up to my Docker Hub. Then you can replace pachyderm/pachd and pachyderm/job-shim in the manifest (my test_minio.json) with your images, in my case dwhitena/minio-pachd etc. Is there an easy way to build and deploy a custom pachd, @jdoliner?

@jdoliner (Member)

@dwhitena @harshavardhana you can get deploy to just print out the manifest with --dry-run and then manually edit the images referenced to be what you want.

@harshavardhana (Contributor Author)

$ kubectl logs pachd-init-bwcdk
time="2017-01-21T00:35:57Z" level=warning msg="Error creating connection: gorethink: dial tcp 10.0.0.137:28015: getsockopt: connection refused" 
gorethink: dial tcp 10.0.0.137:28015: getsockopt: connection refused
$ kubectl logs pachd-kd0gv
gorethink: Database `pachyderm_pps` does not exist. in: 
r.DB("pachyderm_pps").Table("JobInfos").Wait()
$ kubectl logs pachd-kd0gv
gorethink: Database `pachyderm_pps` does not exist. in: 
r.DB("pachyderm_pps").Table("JobInfos").Wait()

This is after changing the images to use the y4m4/ prefix.

@harshavardhana (Contributor Author)

$ kubectl get pods
NAME               READY     STATUS             RESTARTS   AGE
etcd-wpt8m         1/1       Running            0          3m
pachd-init-bwcdk   0/1       CrashLoopBackOff   4          3m
pachd-kd0gv        0/1       CrashLoopBackOff   4          3m
rethink-2160n      1/1       Running            0          3m

@harshavardhana (Contributor Author)

Looks like the pods are using the wrong IPs: they don't have 10.x.x addresses assigned and are wrongly using a separate IP range. Are you guys aware of this, @dwhitena @jdoliner?

@jdoliner (Member)

@harshavardhana I haven't seen any problems like that with wrong IPs being assigned. Looking at the logs it seems that pachd-init is failing to get in contact with rethinkdb while pachd is succeeding but not finding the tables it needs because pachd-init isn't creating them. I'm a little confused about what exactly is being assigned a wrong IP address here, is it the rethink pod?

@harshavardhana (Contributor Author)

@harshavardhana I haven't seen any problems like that with wrong IPs being assigned. Looking at the logs it seems that pachd-init is failing to get in contact with rethinkdb while pachd is succeeding but not finding the tables it needs because pachd-init isn't creating them. I'm a little confused about what exactly is being assigned a wrong IP address here, is it the rethink pod?

Actually it worked fine, but I had to make more changes, which I'm finishing now. There were a few checks to disallow any backend other than localBackend from serving rethinkdb; I just chose HostPath for minioBackend as well, unlike EBS for amazonBackend.

Does this sound right to you @jdoliner @dwhitena ?
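For readers following along, a hedged sketch of the check being described; the identifiers and values are assumptions, not the merged code:

package main

import "fmt"

const (
	localBackend  = "LOCAL"
	amazonBackend = "AMAZON"
	minioBackend  = "MINIO"
)

// rethinkVolumeType shows minioBackend taking the HostPath branch that was
// previously reserved for localBackend, while amazonBackend keeps EBS.
func rethinkVolumeType(backend string) (string, error) {
	switch backend {
	case localBackend, minioBackend:
		return "HostPath", nil
	case amazonBackend:
		return "EBS (awsElasticBlockStore)", nil
	default:
		return "", fmt.Errorf("unsupported backend: %s", backend)
	}
}

func main() {
	v, _ := rethinkVolumeType(minioBackend)
	fmt.Println(v) // HostPath
}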

@jdoliner (Member)

Yeah that sounds reasonable, we're likely going to need to make things a little bit more sophisticated now since people will probably want to be able to do minio with an ebs volume for amazon deploys or a PD for GCE deploys or what have you. But that's not necessary for the purposes of this PR.

@harshavardhana (Contributor Author)

Yeah that sounds reasonable, we're likely going to need to make things a little bit more sophisticated now since people will probably want to be able to do minio with an ebs volume for amazon deploys or a PD for GCE deploys or what have you. But that's not necessary for the purposes of this PR.

Understood.

@harshavardhana force-pushed the minio-support branch 2 times, most recently from d41e643 to 1d6babf on January 23, 2017 at 23:53
@dwhitena (Contributor)

Ok @harshavardhana, we have Pachyderm running backed by Minio:

[screenshot: the Minio browser showing Pachyderm's data]

After updating pachctl and the docker images to your latest and redeploying, I was able to successfully deploy the cluster locally with minikube and minio. However, upon trying to commit data into Pachyderm, I was getting:

$ ~/go/bin/pachctl put-file testminio master blah -c -f README.md 
Get http://127.0.0.1:9000/docker/?location=: dial tcp 127.0.0.1:9000: getsockopt: connection refused

The problem here was that minikube is running in a VM locally, and thus it was seeing 127.0.0.1 as the VM's own address. To fix this, I did our minio deploy with an endpoint of 10.0.2.2:9000, where 10.0.2.2 is the IP that VirtualBox uses to reach the actual localhost.
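A plain TCP dial is a quick way to confirm which endpoint is reachable from a given network namespace; a small diagnostic sketch using the addresses from this thread:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// From inside the minikube VM, 127.0.0.1:9000 is the VM itself (connection
	// refused), while 10.0.2.2:9000 is VirtualBox's alias for the host.
	for _, addr := range []string{"127.0.0.1:9000", "10.0.2.2:9000"} {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err != nil {
			fmt.Println(addr, "unreachable:", err)
			continue
		}
		conn.Close()
		fmt.Println(addr, "reachable")
	}
}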

When I redeployed with that endpoint, boom! Everything works great. I can commit data in and it goes right into Minio. Great work!

We need to resolve the following points before or after merging this:

  • This minio deploy works for a locally deployed pachyderm. However, we really want to deploy our clusters to the cloud. This brings up an interesting question, because we could have a minio deploy to AWS, Google, or Azure. The deploy should be very similar, but we need to decide if we want 4 different commands (minio-local, minio-aws, etc.) or if we make subcommands; a possible subcommand layout is sketched after this list.
  • We will need to add some quick docs to this, which I can tackle.
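As a sketch of the subcommand option mentioned above (a hypothetical layout, not merged code; the cloud flavor names are assumptions):

package main

import "github.com/spf13/cobra"

func deployMinioCmd() *cobra.Command {
	minioCmd := &cobra.Command{
		Use:   "minio",
		Short: "Deploy Pachyderm backed by Minio.",
	}
	for _, flavor := range []string{"local", "amazon", "google", "microsoft"} {
		flavor := flavor // capture for the closure
		minioCmd.AddCommand(&cobra.Command{
			Use:   flavor,
			Short: "Deploy a Minio-backed cluster on " + flavor + ".",
			RunE: func(cmd *cobra.Command, args []string) error {
				// Each flavor would wire up its own volume type here
				// (HostPath locally, EBS on amazon, PD on google, ...).
				return nil
			},
		})
	}
	return minioCmd
}

func main() {
	deploy := &cobra.Command{Use: "deploy"}
	deploy.AddCommand(deployMinioCmd())
	_ = deploy.Execute()
}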

@harshavardhana (Contributor Author)

This minio deploy works for a locally deployed pachyderm. However, we really want to deploy our clusters to the cloud. This brings up an interesting question, because we could have a minio deploy to AWS, Google, or Azure. The deploy should be very similar, but we need to decide if we want 4 different commands (minio-local, minio-aws, etc.) or if we make subcommands.

The problem I see with choosing different sets of credentials is that the CLI should stay simple, and it can get really hard to specify that many options on the command line.

I am not sure if there is work in this area already to make it simpler. Ideally it would be better to have a single command so that documentation becomes easier. Subcommands for minio choosing different cloud flavors are also an option.

Any other thoughts? Once we finalize, I can work this out and send another PR after this one.

@harshavardhana changed the title from "[PROPOSAL] Add S3 compatible client support." to "Add minio support." on Jan 27, 2017
@dwhitena (Contributor)

After talking with @jdoliner, the plan for next steps is:
(1) merge this local version (once fully reviewed and tests pass)
(2) work on a PR to re-organize the commands to support all the combinations we have now (as mentioned by @harshavardhana above)
(3) possibly replace the existing Amazon SDK code with the minio client (as it duplicates logic)

func NewBlockAPIServer(dir string, cacheBytes int64, backend string) (pfsclient.BlockAPIServer, error) {
log.Info("Initializing new blockAPIServer", dir, cacheBytes, backend)
Member

Did you mean to leave this in? It doesn't seem like a particularly important thing to inform the user of, you'll basically just see this once on startup. Also log messages should use lion right now to match the rest of our code.

Contributor Author

Oh no, this is not necessary.

@@ -40,6 +41,7 @@ func DeployCmd(noMetrics *bool) *cobra.Command {
var hostPath string
var dev bool
var dryRun bool
var isSecure bool
Member

Small nit, I'd call this secure, no need for the "is"

Contributor Author

Sure

}

func (c *minioClient) Walk(name string, fn func(name string) error) error {
isRecursive := true
Member

Another very small nit, I'd call this recursive not isRecursive

Contributor Author

Sure.
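For context on this Walk, a hedged sketch of how it can sit on top of minio-go's 2017-era ListObjects channel API; the client and bucket fields are assumptions about the type's shape, not the merged code:

package obj

import minio "github.com/minio/minio-go"

type minioClient struct {
	client *minio.Client
	bucket string
}

// Walk lists every object under name and calls fn on each key.
func (c *minioClient) Walk(name string, fn func(name string) error) error {
	recursive := true // per the review, renamed from isRecursive
	doneCh := make(chan struct{})
	defer close(doneCh) // stops the listing goroutine on early return
	for obj := range c.client.ListObjects(c.bucket, name, recursive, doneCh) {
		if obj.Err != nil {
			return obj.Err
		}
		if err := fn(obj.Key); err != nil {
			return err
		}
	}
	return nil
}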

@jdoliner (Member)

LGTM after those comments

@harshavardhana (Contributor Author)

Oops, looks like I force-pushed and lost @dwhitena's commit. Can you send it again? Sorry.

@harshavardhana (Contributor Author)

LGTM after those comments

Addressed all the comments.

@harshavardhana changed the title from "Add minio support." to "Add support for minio deploy" on Jan 27, 2017
@dwhitena merged commit 7d12823 into pachyderm:master on Jan 27, 2017
@harshavardhana deleted the minio-support branch on January 27, 2017 at 22:04