
[Question] Using garden to deploy ML models with seldon #814

Closed
timonbimon opened this issue Jun 4, 2019 · 17 comments


timonbimon commented Jun 4, 2019

Disclaimer

Garden looks awesome, thanks for building it! :)

I am very new to the whole Kubernetes kerfuffle. Because of that inexperience, I am not sure which of the following questions come from me misunderstanding or missing something (and therefore misusing or misconfiguring something), and which are good questions in the sense that they point to missing support on the Garden (or possibly seldon-core) side.

Also I know this is a pretty long issue/question, but hopefully this can

  • help me get a grip on garden + k8s
  • help you update the docs so that other newbies like me can find their way when using Garden (if anything is missing; it's still possible that I just did not read carefully enough)
  • (potentially help garden get some exposure in the ML community: if I get this to work I'd like to write a short blog post about the setup and add it to the tutorials in the seldon docs)

Context for my question

I want to deploy a machine learning model using seldon-core. If you look at an example workflow here, you can see that it involves many consecutive steps of messing with kubectl, so I thought Garden would be a great way to bring some structure to the insanity.

What I currently have are three services (i.e. Helm charts: seldon-core-operator, seldon-single-model and ambassador; the first two can be found here) and one custom container module (which I called ml-service-image, and which should be deployed via seldon-single-model) that defines the machine learning model I want to deploy.

Here are the garden.yml files I currently have.

Project-level

project:
  name: seldon-core
  environments:
    - name: local
      providers:
        - name: local-kubernetes
          setupIngressController: false

Seldon Core Operator

module:
  description: Seldon Core operator running Seldon Core CRD and Controller
  name: seldon-core-operator
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-core-operator
  values:
    usageMetrics:
      enabled: true

Seldon Single Model

module:
  description: Seldon service for deploying a single model
  name: seldon-ml-service
  type: helm
  serviceResource:
    containerModule: ml-service-image
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-single-model
  build:
    dependencies: [ml-service-image]
 # metadata:
 #   ports:
 #     - name: http
 #       containerPort: 8080
 #       # Maps service:80 -> container:8080
 #       servicePort: 80
 #   annotations:
 #     getambassador.io/config: |
 #       ---
 #       apiVersion: ambassador/v1
 #       kind:  Mapping
 #       name:  ml-service_mapping
 #       prefix: /ml-service/
 #       service: ml-service:80
  values:
    model:
      image:
        name: ${modules.ml-service-image.version}

ML Service Image

module:
  description: ML service container with Seldon-conformant API
  name: ml-service-image
  type: container
  tests:
    - name: integration
      args: ["seldon-core-microservice-tester", "contract.json", "0.0.0.0", "5000", "-p"]

Ambassador

module:
  description: Ambassador API Gateway
  type: helm
  name: ambassador
  chart: stable/ambassador
  values:
    service:
      annotations:
        getambassador.io/config: |
          ---
          apiVersion: ambassador/v1
          kind: Module
          name: ambassador
          config:
            service_port: 8080 # Set port since the default ingress already occupies the default port
      http:
        port: 8080 # Set port since the default ingress already occupies the default port

You can have a look here for the full setup.

Questions

  1. garden build works fine. When running garden deploy I get the following error:
Failed deploying service seldon-ml-service (from module seldon-ml-service). Here is the output:
————————————————————————————————————————————————————————————————————————————————
Could not find resource type machinelearning.seldon.io/v1alpha2/SeldonDeployment
————————————————————————————————————————————————————————————————————————————————


Failed deploying service seldon-core-operator (from module seldon-core-operator). Here is the output:
————————————————————————————————————————————————————————————————————————————————
Invalid apiVersion v1
————————————————————————————————————————————————————————————————————————————————

How can I fix this? It seems related to the fact that seldon uses custom resource definitions that garden cannot find.

  2. I am not sure how to add the Ambassador ingress to the seldon-single-model service, which is a Helm chart (that's why there is some commented-out config above). I could not find anything like this in the other examples. How would I do this the right way?

  3. I could not find good documentation on the serviceResource and containerModule keywords for helm modules. I added them to the garden.yml for seldon-single-model because I saw them in some of the other examples, but I am not sure they are needed. Are they?

Thanks a lot for your help! :)

edvald (Collaborator) commented Jun 4, 2019

Hey @timonbimon! Thanks for the kind words :)

Happy to help with that.

First thing I notice is that seldon-ml-service should depend on seldon-core-operator, like so:

module:
  description: Seldon service for deploying a single model
  name: seldon-ml-service
  type: helm
  dependencies: [seldon-core-operator]   # <---
...

This should make sure the CRDs and operator are configured before trying to deploy something that needs the CRD.

What I'm less sure about is the error when deploying the operator. It's possible that their Helm chart references a native Kubernetes resource that isn't yet supported in your version of K8s. Which Kubernetes version are you running?

I'm kinda browsing the Seldon docs to figure out how best to approach that part, but it looks to me like you might need to take the seldon-single-model example chart and modify it directly to your needs. You can achieve that by copying the files from https://github.com/SeldonIO/seldon-core/tree/master/helm-charts/seldon-single-model, putting those next to the garden.yml for the service, and removing the repo and chart keys from the module config.

That is, if the service needs to be exposed through an ingress. The nice thing about container tests is that they actually run inside the cluster, so the test container should be able to reach the service directly via the seldon-single-model hostname, provided Seldon exposes the appropriate port. I'm guessing a little because Seldon is new to me, but I hope that's at least helpful.
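For the vendored-chart approach suggested above, a minimal sketch of the module config could look like the following. It assumes the contents of the seldon-single-model chart directory (the standard Helm layout: `Chart.yaml`, `values.yaml`, `templates/`) have been copied next to this garden.yml; the image value uses the template string given further down in this thread. Any Ambassador annotations would then go into the copied chart's templates rather than into garden.yml.

```yaml
module:
  description: Seldon service for deploying a single model (local copy of the chart)
  name: seldon-ml-service
  type: helm
  # No `repo` or `chart` keys: Garden then uses the chart files
  # that live in this module's directory
  dependencies: [seldon-core-operator]
  build:
    dependencies: [ml-service-image]
  values:
    model:
      image:
        name: ${modules.ml-service-image.outputs.deployment-image-name}:${modules.ml-service-image.version}
```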

You probably don't need those keywords, since you're not using hot reloading or tasks for those services. The reference is here if you need it though.

I hope that's useful. We'll think on how we could improve our docs as well. Meanwhile, keep the questions coming, we're happy to assist. Also you're welcome to join our Slack if you'd like!

timonbimon (Author) commented

Cool, thanks for the detailed reply! :)

  1. I updated the garden.yml of seldon-single-model to
module:
  description: Seldon service for deploying a single model
  name: seldon-ml-service
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-single-model
  build:
    dependencies: [seldon-core-operator, ml-service-image]
  values:
    model:
      image:
        name: ${modules.ml-service-image.version}

It also depends on the ml-service-image as far as I understand.
As to the k8s version: I was running 1.10.11, and I have just tried updating to the Edge release of Docker for Mac, which comes with Kubernetes 1.14.1.

=> Unfortunately I am still getting the same exact two errors as noted above. Do you have any other ideas?

  2. Thanks for the hint! I'll try to figure that out once 1) is resolved.
  3. Sounds good! I completely missed the reference!

timonbimon (Author) commented Jun 5, 2019

When I look at the Helm guide for developing charts (https://helm.sh/docs/developing_charts/), it says "apiVersion: The chart API version, always "v1" (required)". So v1 seems to be pretty standard for Helm charts, and I'm not sure why it would throw an error...

timonbimon (Author) commented Jun 5, 2019

Hmm I am starting to feel this might be a problem on Garden's side. I can install the chart via helm.

Here is the minimal setup, maybe you can reproduce the error on your side?

Project level

project:
  name: seldon
  environments:
    - name: local
      providers:
        - name: local-kubernetes
          setupIngressController: false

in a folder called seldon-core-operator

module:
  description: Seldon Core operator running Seldon Core CRD and Controller
  name: seldon-core-operator
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-core-operator
  version: 0.3.0
  values:
    usageMetrics:
      enabled: true

edvald (Collaborator) commented Jun 6, 2019

Ah, maybe it is a Garden issue. If it is, we'll fix it ahead of our next release. I'll test it and see what I find.

timonbimon (Author) commented Jun 6, 2019

I just saw I copy-pasted the same garden.yml twice 🙈, I've just updated the comment above.
And thanks for having a look at this :)

edvald (Collaborator) commented Jun 7, 2019

I think I've fixed this in #826, which will be in our next release (0.10), hopefully out next week. Turns out this belonged to a class of issues relating to our API wrapper for Kubernetes, and how we were handling certain resource types. Sorry about that!

I do see one more thing in your config. You should put seldon-core-operator as a runtime dependency and not a build dependency. Meaning, it should be under dependencies and not build.dependencies, because one needs to be deployed before the other, as opposed to built before. I hope that makes sense. That's for sure one thing we could explain in more detail in our docs, I'll make a note of that.
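To make the distinction concrete, here is a sketch of the seldon-ml-service module with both kinds of dependency side by side (module names taken from the configs above; `values` omitted since it is unchanged):

```yaml
module:
  description: Seldon service for deploying a single model
  name: seldon-ml-service
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-single-model
  # Runtime dependency: seldon-core-operator must be *deployed*
  # before this service is deployed
  dependencies: [seldon-core-operator]
  build:
    # Build dependency: ml-service-image must be *built*
    # before this module is built
    dependencies: [ml-service-image]
```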

timonbimon (Author) commented

do you already have an ETA for the next release? :)

edvald (Collaborator) commented Jun 12, 2019

We hope to have an RC today or tomorrow, release shortly after :)

edvald (Collaborator) commented Jun 14, 2019

@timonbimon our first RC is ready: https://github.com/garden-io/garden/releases/tag/v0.10.0-0

If you can try it and see if the problem is fixed, we can make 100% sure it's all set before the final 0.10 release. We'll probably do one more RC and then final release.

Your Docker for Mac version should be fine btw, anything fairly recent should do nicely.

Also, our 0.10 release adds much improved support for remote clusters, so if you have a dev cluster somewhere you can use that instead of running K8s locally, whichever works best.

timonbimon (Author) commented Jun 14, 2019

Ok, great, we seem to be getting a little further, but unfortunately there is still an error left. :/

Current status of my garden.ymls

Project-level

project:
  name: seldon
  environments:
    - name: local
      providers:
        - name: local-kubernetes
          setupIngressController: false

seldon-single-model

module:
  description: Seldon service for deploying a single model
  name: seldon-ml-service
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-single-model
  dependencies: [seldon-core-operator]
  build:
    dependencies: [ml-service-image]
 # metadata:
 #   ports:
 #     - name: http
 #       containerPort: 8080
 #       # Maps service:80 -> container:8080
 #       servicePort: 80
 #   annotations:
 #     getambassador.io/config: |
 #       ---
 #       apiVersion: ambassador/v1
 #       kind:  Mapping
 #       name:  ml-service_mapping
 #       prefix: /ml-service/
 #       service: ml-service:80
  values:
    model:
      image:
        name: ${modules.ml-service-image.version}

seldon-core-operator

module:
  description: Seldon Core operator running Seldon Core CRD and Controller
  name: seldon-core-operator
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-core-operator
  version: 0.3.0
  values:
    usageMetrics:
      enabled: true

ml-service-image

module:
  description: ML service container with Seldon-conformant API
  name: ml-service-image
  type: container
  tests:
    - name: integration
      args: ["seldon-core-microservice-tester", "contract.json", "0.0.0.0", "5000", "-p"]

ambassador

module:
  description: Ambassador API Gateway
  type: helm
  name: ambassador
  chart: stable/ambassador
  values:
    service:
      annotations:
        getambassador.io/config: |
          ---
          apiVersion: ambassador/v1
          kind: Module
          name: ambassador
          config:
            service_port: 8080 # Set port since the default ingress already occupies the default port
      http:
        port: 8080 # Set port since the default ingress already occupies the default port

garden build works like a charm

garden deploy fails with

Failed deploying service seldon-core-operator (from module seldon-core-operator). Here is the output:
————————————————————————————————————————————————————————————————————————————————
Unrecognized resource type v1/List
————————————————————————————————————————————————————————————————————————————————


1 deploy task(s) failed!

And here is the detailed error log, if that helps:


[2019-06-14T13:27:21.024Z] Error: Unrecognized resource type v1/List
    at KubeApi.<anonymous> (/snapshot/dist/build/src/plugins/kubernetes/api.js:160:23)
    at Generator.next (<anonymous>)
    at fulfilled (/snapshot/dist/build/src/plugins/kubernetes/api.js:11:58)
Error Details:
manifest:
  apiVersion: v1
  items:
    - apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: seldon-spartakus-volunteer
        namespace: kube-system
    - apiVersion: rbac.authorization.k8s.io/v1beta1
      kind: ClusterRole
      metadata:
        name: seldon-spartakus-volunteer
      rules:
        - apiGroups:
            - ''
          resources:
            - nodes
          verbs:
            - list
    - apiVersion: rbac.authorization.k8s.io/v1beta1
      kind: ClusterRoleBinding
      metadata:
        name: seldon-spartakus-volunteer
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: seldon-spartakus-volunteer
      subjects:
        - kind: ServiceAccount
          name: seldon-spartakus-volunteer
          namespace: kube-system
  kind: List
  metadata:
    annotations: {}

Any idea what to do with this?

timonbimon (Author) commented

Can you reproduce the same error?

edvald (Collaborator) commented Jun 17, 2019

Yep, I figured this out. It's a case that I hadn't seen anywhere before: Their Helm chart includes a List resource, which is a valid but unusual way of submitting multiple resources to a cluster. We need to explicitly handle that on our side, so I'm working on that for our next RC.
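For comparison, here is the content of the List from the error log above, rewritten as separate YAML documents separated by `---`, which is the more common way of submitting several resources at once (for example with `kubectl apply -f`). This only illustrates the two shapes; the resources themselves are unchanged.

```yaml
# Each former `items` entry of the `kind: List` becomes its own document
apiVersion: v1
kind: ServiceAccount
metadata:
  name: seldon-spartakus-volunteer
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: seldon-spartakus-volunteer
rules:
  - apiGroups: ['']
    resources: [nodes]
    verbs: [list]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: seldon-spartakus-volunteer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: seldon-spartakus-volunteer
subjects:
  - kind: ServiceAccount
    name: seldon-spartakus-volunteer
    namespace: kube-system
```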

edvald (Collaborator) commented Jun 17, 2019

You can actually work around this in the meantime by setting usageMetrics.enabled to false, since the List resource is only rendered when that value is set to true.
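Applied to the seldon-core-operator module from above, the workaround is a one-value change:

```yaml
module:
  description: Seldon Core operator running Seldon Core CRD and Controller
  name: seldon-core-operator
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-core-operator
  version: 0.3.0
  values:
    usageMetrics:
      # Workaround: with usage metrics disabled, the chart no longer renders
      # the spartakus `kind: List` resource that Garden failed on
      enabled: false
```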

timonbimon (Author) commented

Ok, great, that works!

One more question that now just concerns my usage of garden:

I need to pass the name of the docker image built by ml-service-image to the seldon-single-model chart.

module:
  description: Seldon service for deploying a single model
  name: seldon-ml-service
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-single-model
  dependencies: [seldon-core-operator]
  values:
    model:
      image:
        name: ${modules.ml-service-image.version}

The above is wrong since version just gives the tag of the docker image if I understand correctly. How do I get the full name of the image including the tag? I tried having a look around the docs (and found this: https://docs.garden.io/reference/template-strings) but couldn't find how to do it.

Thanks a lot for your patience and sticking through this with me :)

edvald (Collaborator) commented Jun 18, 2019

Of course, we're most happy to help :)

You've actually caught a weak spot in our documentation there, we should explain much better what's available in template strings.

What you want is something like this:

module:
  description: Seldon service for deploying a single model
  name: seldon-ml-service
  type: helm
  repo: https://storage.googleapis.com/seldon-charts
  chart: seldon-single-model
  dependencies: [seldon-core-operator]
  values:
    model:
      image:
        name: ${modules.ml-service-image.outputs.deployment-image-name}:${modules.ml-service-image.version}

That's a bit of a handful, but basically this compiles a string with both the deployment image name and the module version as the tag. This is shown in a couple of Helm project examples but not properly documented yet, so no wonder you missed that. Hope that does the trick!

timonbimon (Author) commented

Great, that does the trick!
I'll close this issue for now and reopen if I run into any other issues. :)

As an aside: would you be interested in a PR adding the Seldon setup to your examples, once I figure it out?
