
NVIDIA inference server docs need to be updated to not use ksonnet #959

Closed
jlewi opened this issue Jul 22, 2019 · 21 comments

Comments

@jlewi jlewi (Contributor) commented Jul 22, 2019

https://www.kubeflow.org/docs/components/serving/trtinferenceserver/

Docs still use ksonnet.

For serving, I think it's simpler if, rather than providing kustomize manifests, our docs just contain example YAML specs that people can copy and paste in order to create a deployment.

@pdmack Any interest in picking this up?

@issue-label-bot issue-label-bot bot commented Jul 22, 2019

Issue Label Bot is not confident enough to auto-label this issue. See dashboard for more details.


@sarahmaddox sarahmaddox (Collaborator) commented Aug 15, 2019

We've added a comment in the doc, saying that it still uses ksonnet and needs updating:
https://www.kubeflow.org/docs/components/serving/trtinferenceserver/#kubernetes-generation-and-deploy

It'd be good to get this updated before we archive the v0.6 branch of the docs.

@pdmack is this something you could take on?


@asispatra asispatra commented Aug 29, 2019

Is this document available for kustomize?


@svnoesis svnoesis commented Aug 29, 2019

We are trying to set up the NVIDIA TensorRT Inference Server image on Kubernetes/Kubeflow. The README instructions on GitHub talk about ksonnet:
https://github.com/kubeflow/kubeflow/tree/master/kubeflow/nvidia-inference-server#kubernetes-generation-and-deploy

Are there any updated steps for deploying nvidia-inference-server using kustomize?
Could anyone please help?


@jlewi jlewi (Contributor, Author) commented Sep 8, 2019

@svnoesis Here's a sample Kubernetes spec (I generated it using the ksonnet commands) for deploying the NVIDIA inference server.

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: inference-server
    ksonnet.io/component: iscomp
  name: inference-server
  namespace: default
spec:
  ports:
  - name: http-inference-server
    port: 8000
    targetPort: 8000
  - name: grpc-inference-server
    port: 8001
    targetPort: 8001
  - name: metrics-inference-server
    port: 8002
    targetPort: 8002
  selector:
    app: inference-server
  type: LoadBalancer
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: inference-server
    ksonnet.io/component: iscomp
  name: inference-server-v1
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: inference-server
        version: v1
    spec:
      containers:
      - args:
        - trtserver
        - --model-store=gs://inference-server-model-store/tf_model_store
        image: nvcr.io/nvidia/inferenceserver:18.08.1-py2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /api/health/live
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
        name: inference-server
        ports:
        - containerPort: 8000
        - containerPort: 8001
        - containerPort: 8002
        readinessProbe:
          httpGet:
            path: /api/health/ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          limits:
            nvidia.com/gpu: 1
      imagePullSecrets:
      - name: ngc
      securityContext:
        fsGroup: 1000
        runAsUser: 1000

You should just modify that spec as needed to make it work for your use case; e.g. change --model-store to point to your model.
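
For example, assuming you save the spec above as inference-server.yaml (a hypothetical file name), deploying and checking it would look roughly like this:

# Save the spec above as inference-server.yaml and edit --model-store first.
kubectl apply -f inference-server.yaml

# Check that the pods come up and the Service gets an external IP.
kubectl get pods -l app=inference-server -n default
kubectl get svc inference-server -n default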

It would be great if someone would open up a PR updating the docs.
https://www.kubeflow.org/docs/components/serving/trtinferenceserver/


@jtfogarty jtfogarty commented Jan 5, 2020

/area docs
/kind feature


@jtfogarty jtfogarty moved this from To Do to Assigned to Area Owner For Triage in Needs Triage Jan 6, 2020
@jlewi jlewi added this to To do in KF1.0 via automation Jan 6, 2020
@jlewi jlewi (Contributor, Author) commented Jan 6, 2020

Bumping to P0; we should get the doc updated for KF 1.0.


@kubeflow-bot kubeflow-bot removed this from Assigned to Area Owner For Triage in Needs Triage Jan 13, 2020
@sarahmaddox sarahmaddox (Collaborator) commented Jan 29, 2020

/assign @pdmack

@pdmack I'm assigning this to you, based on an earlier Slack conversation. (The issue includes an example of a YAML spec, but that may be out of date by now.) This issue is a P0 for Kubeflow v1.0. Let me know if you can't address it. In that case, I'll remove the content of the page (leaving the page in place) and say that NVIDIA Inference Server is not supported beyond Kubeflow v0.6.


@mapsacosta mapsacosta commented Jan 31, 2020

Any progress on this? Can someone provide hints on how to deploy this through a non-deprecated tool, namely, kustomize? This is a critical step for us.


@jlewi jlewi (Contributor, Author) commented Feb 3, 2020

@mapsacosta see my comment above
#959 (comment)

You can just create K8s YAML specs to deploy the NVIDIA inference server.


@sarahmaddox sarahmaddox (Collaborator) commented Feb 4, 2020

/unassign @pdmack

Unassigning this issue in case someone else can pick it up.


@hefedev hefedev (Contributor) commented Feb 5, 2020

@jlewi jlewi (Contributor, Author) commented Feb 7, 2020

@sarahmaddox I think we can close this after #1608 is merged. I don't think there is any more immediate work.


@sarahmaddox sarahmaddox (Collaborator) commented Feb 7, 2020

Thanks @jlewi. Agreed that there's no immediate work required. But I think it would be good to add more info to this page if we can. I've labeled this issue for the doc sprint, in case anyone wants to pick it up then.

For people interested in picking up this issue: There's useful info in the comments on the issue. In addition, you may find useful content (though out of date) in the Kubeflow v0.6 docs.


@hefedev hefedev (Contributor) commented Feb 7, 2020

This is what you need to get this working with the YAML spec posted above. I'd wait for the kustomize build, since it offers better integration with Kubeflow:

export NVIDIA_API_KEY="your_nvidia_api_key"
export NVIDIA_API_EMAIL="your_nvidia_api_email"
export NVIDIA_IMAGE_TAG="18.08.1-py2"
export NVIDIA_EXAMPLE_MODEL_URL="github.com/NVIDIA/tensorrt-inference-server"

kubectl create secret \
    docker-registry ngc \
    --docker-server=nvcr.io \
    --docker-username="\$oauthtoken" \
    --docker-password=$NVIDIA_API_KEY \
    --docker-email=$NVIDIA_API_EMAIL
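
Note that the escaped \$oauthtoken is intentional: NGC expects the literal string $oauthtoken as the registry username, with your NGC API key as the password. The secret name ngc matches the imagePullSecrets entry in the spec above.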


@sarahmaddox sarahmaddox (Collaborator) commented Mar 30, 2020

@deadeyegoodwin Do you have bandwidth to update the Triton Inference Server docs to include the info provided in the comments on this issue?

If you don't have bandwidth, I'll close this issue now, since the docs no longer reference ksonnet. (They just link to the NVIDIA docs without providing additional info.)


@mengdong mengdong commented Mar 30, 2020

@sarahmaddox would you be more specific about what you're requesting for the Triton Inference Server docs? Thanks. Would the Helm chart be sufficient for you, or would you prefer a different example?


@sarahmaddox sarahmaddox (Collaborator) commented Mar 31, 2020

@mengdong I'd be happy for someone from NVIDIA to confirm that the Triton Inference Server deployment using Helm is the best one for integration with Kubeflow. If that's the case, then no further work is needed and I'll close this issue.

Otherwise, if someone is able to develop and document a kustomize deployment, I'll leave this issue open. (I'd actually thought that the comments above were referring to a kustomize YAML file, but I see the YAML uses ksonnet, which Kubeflow no longer uses.)


@mengdong mengdong commented Mar 31, 2020

Okay, thanks for clarifying. I am from NVIDIA. We can document a basic kustomize deployment for Triton Inference Server. I will keep you posted.
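
Something along the lines of the following minimal sketch might work (deployment.yaml and service.yaml are hypothetical file names holding the Deployment and Service posted earlier in this thread):

# kustomization.yaml (minimal sketch, untested)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
resources:
- deployment.yaml
- service.yaml

You could then deploy it with kubectl apply -k . or kustomize build . | kubectl apply -f -.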


@sarahmaddox sarahmaddox (Collaborator) commented Mar 31, 2020

Thanks, that's excellent news.


@stale stale bot commented Jun 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


@stale stale bot closed this Jul 7, 2020
ksonnet-turndown automation moved this from To do to Done Jul 7, 2020
doc-sprint automation moved this from Sprint backlog to Done Jul 7, 2020
KF1.0 automation moved this from Docs to Done Jul 7, 2020