
triton bert predictor - can not find config.pbtxt #1075

Closed
mokpolar opened this issue Sep 3, 2020 · 6 comments


@mokpolar

mokpolar commented Sep 3, 2020

/kind bug

What steps did you take and what happened:
I'm trying to deploy the Triton BERT model from the sample. The transformer deployed successfully, but I found that the predictor had fallen into a CrashLoopBackOff state.

What did you expect to happen:

Anything else you would like to add:

kubectl -n kfserving-test logs bert-large-predictor-default-rtjz2-deployment-f4cc44d4c-zspww -c kfserving-container
...
2020-09-03 02:41:47.565925: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
I0903 02:41:47.612648 1 metrics.cc:164] found 1 GPUs supporting NVML metrics
I0903 02:41:47.618143 1 metrics.cc:173]   GPU 0: Tesla V100-SXM2-16GB
I0903 02:41:47.618395 1 server.cc:120] Initializing Triton Inference Server
E0903 02:41:47.836537 1 model_repository_manager.cc:1519] failed to open text file for read /mnt/models/1/config.pbtxt: No such file or directory
error: creating server: INTERNAL - failed to load all models
kubectl -n kfserving-test logs bert-large-predictor-default-rtjz2-deployment-f4cc44d4c-zspww -c storage-initializer
[I 200903 02:35:12 initializer-entrypoint:13] Initializing, args: src_uri [gs://kfserving-samples/models/triton/bert] dest_path[ [/mnt/models]
[I 200903 02:35:12 storage:35] Copying contents of gs://kfserving-samples/models/triton/bert to local
[I 200903 02:35:12 storage:111] Downloading: /mnt/models/1/model.savedmodel/saved_model.pb
[I 200903 02:35:12 storage:111] Downloading: /mnt/models/1/model.savedmodel/variables/variables.data-00000-of-00001
[I 200903 02:35:42 storage:111] Downloading: /mnt/models/1/model.savedmodel/variables/variables.index
[I 200903 02:35:42 storage:111] Downloading: /mnt/models/config.pbtxt
[I 200903 02:35:42 storage:60] Successfully copied gs://kfserving-samples/models/triton/bert to /mnt/models

The predictor looks for config.pbtxt in /mnt/models/1/.
Is it correct to download to /mnt/models/?
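For context, Triton expects a model repository root containing one directory per model, with config.pbtxt inside each model directory next to the numbered version folders. Taking /mnt/models as the repository root, the layout Triton looks for would be (file names below taken from the download logs; the model directory name is illustrative):

```
/mnt/models/                      <- model repository root
└── <model_name>/                 <- one subdirectory per model
    ├── config.pbtxt              <- per-model config lives here
    └── 1/                        <- numbered version directory
        └── model.savedmodel/
            ├── saved_model.pb
            └── variables/
```

Because the download placed config.pbtxt directly in /mnt/models/ and the version folder at /mnt/models/1/, Triton treats "1" as a model name and then fails looking for /mnt/models/1/config.pbtxt, which matches the error above.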

Environment:

  • KFServing Version: 0.4.0
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.6-beta.0", GitCommit:"e7f962ba86f4ce7033828210ca3556393c377bcc", GitTreeState:"clean", BuildDate:"2020-01-15T08:26:26Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/inference 0.78


@yuzisun
Member

yuzisun commented Sep 3, 2020

@mokpolar sorry, I uploaded the bert model at the wrong level!

gs://kfserving-samples/models/triton/bert
|_ config.pbtxt
|_ 1/

it should be

gs://kfserving-samples/models/triton/bert
|_ bert_tf_v2_large_fp16_128_v2
   |_ config.pbtxt
   |_ 1/

@mokpolar
Author

mokpolar commented Sep 3, 2020

@mokpolar sorry uploaded the bert model on the wrong level !
gs://kfserving-samples/models/triton/bert
|_ config.pbtxt
|_ 1/

it should be
gs://kfserving-samples/models/triton/bert
|_ bert_tf_v2_large_fp16_128_v2
--|_ config.pbtxt
--|_ 1/

I uploaded the model to a PVC and changed the directories to match the path you described.
The pod has been deployed, thank you!

kubectl get pod -n kfserving-test
NAME                                                              READY   STATUS    RESTARTS   AGE
bert-large-predictor-default-5rjvx-deployment-76d5f587-5xd8v      2/2     Running   0          84s
bert-large-transformer-default-mw8mm-deployment-6c4f66c9c7ml9qs   2/2     Running   0          83s
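The PVC approach described above might be set up roughly like this (a sketch only; the PVC name, pod name, image, and storage size are all hypothetical, not taken from this thread): create a PVC, mount it in a throwaway pod, and copy the model tree in with kubectl cp.

```yaml
# Hypothetical PVC plus a helper pod for uploading the model.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
  namespace: kfserving-test
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: model-upload
  namespace: kfserving-test
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sh", "-c", "sleep 3600"]   # keep the pod alive for the copy
    volumeMounts:
    - name: models
      mountPath: /mnt/models
  volumes:
  - name: models
    persistentVolumeClaim:
      claimName: model-pvc
```

Then something like `kubectl -n kfserving-test cp ./bert_tf_v2_large_fp16_128_v2 model-upload:/mnt/models/` copies the model directory in, and the InferenceService can reference it with a `pvc://model-pvc/...` storageUri so the PVC contents become the Triton model repository root.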

@yuzisun
Member

yuzisun commented Oct 9, 2020

We lost permission to the kfserving-samples bucket, so I have uploaded the model to the new kfserving-examples bucket and updated the storage URI in the docs.

@yuzisun yuzisun closed this as completed Oct 9, 2020
@gandharv-kapoor

@mokpolar sorry uploaded the bert model on the wrong level !
gs://kfserving-samples/models/triton/bert
|_ config.pbtxt
|_ 1/
it should be
gs://kfserving-samples/models/triton/bert
|_ bert_tf_v2_large_fp16_128_v2
--|_ config.pbtxt
--|_ 1/

I uploaded the model to pvc and changed the directories according to the path you told me. The pod has been deployed, thank you!

kubectl get pod -n kfserving-test
NAME                                                              READY   STATUS    RESTARTS   AGE
bert-large-predictor-default-5rjvx-deployment-76d5f587-5xd8v      2/2     Running   0          84s
bert-large-transformer-default-mw8mm-deployment-6c4f66c9c7ml9qs   2/2     Running   0          83s

@mokpolar
can you show your PVC setup and how to upload the model to the PVC? Sorry, I am new to this.

@yuzisun I have uploaded the model in a similar directory hierarchy in our Artifactory https://artifactory.docusignhq.com/artifactory/docusign-public/agreementintelligence/docker/test/models/

gs://kfserving-samples/models/triton/bert
|_ sequence_transformer
   |_ config.pbtxt
   |_ 1/
      |_ model.onnx

but I am seeing the following error:

DSA027907:transformer gandharv.kapoor$ kubectl logs sequence-transformer-transformer-default-00001-deployment-wj9r5 -n kserve-test -c storage-initializer
/usr/local/lib/python3.7/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
  "update your install command.", FutureWarning)
[I 211023 00:43:21 initializer-entrypoint:13] Initializing, args: src_uri [https://artifactory.docusignhq.com/artifactory/docusign-public/agreementintelligence/docker/test/models] dest_path[ [/mnt/models]
[I 211023 00:43:21 storage:52] Copying contents of https://artifactory.docusignhq.com/artifactory/docusign-public/agreementintelligence/docker/test/models to local
Traceback (most recent call last):
  File "/storage-initializer/scripts/initializer-entrypoint", line 14, in <module>
    kserve.Storage.download(src_uri, dest_path)
  File "/usr/local/lib/python3.7/site-packages/kserve/storage.py", line 75, in download
    return Storage._download_from_uri(uri, out_dir)
  File "/usr/local/lib/python3.7/site-packages/kserve/storage.py", line 308, in _download_from_uri
    % uri)
RuntimeError: URI: https://artifactory.docusignhq.com/artifactory/docusign-public/agreementintelligence/docker/test/models did not respond with 'Content-Type': 'application/octet-stream'
DSA027907:transformer gandharv.kapoor$ 
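The traceback shows the storage initializer rejecting an http(s) storageUri whose response does not declare `Content-Type: application/octet-stream`. A minimal sketch of that check (behavior inferred from the error message above; the real logic lives in kserve/storage.py and may differ in detail):

```python
def check_content_type(uri: str, headers: dict) -> None:
    """Reject a download URI unless it advertises a binary payload.

    Mirrors the check implied by the RuntimeError in the log above:
    an Artifactory directory listing typically returns text/html, so
    the URI must instead point at a binary artifact (e.g. a .tar.gz)
    served with Content-Type: application/octet-stream.
    """
    content_type = headers.get("Content-Type", "")
    if "application/octet-stream" not in content_type:
        raise RuntimeError(
            "URI: %s did not respond with 'Content-Type': "
            "'application/octet-stream'" % uri
        )

# A directory listing (text/html) fails the check...
try:
    check_content_type("https://example.com/models",
                       {"Content-Type": "text/html"})
except RuntimeError as e:
    print("rejected:", e)

# ...while a binary archive download passes.
check_content_type("https://example.com/model.tar.gz",
                   {"Content-Type": "application/octet-stream"})
print("accepted")
```

In other words, pointing the storageUri at the bare `models/` directory URL will not work over http(s); the model needs to be packaged as an archive that the server delivers as a binary download.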
