First mlflow-sklearn-e2e workflow run stuck for several minutes in the build step #134
More context: this is a section of the Tekton operator log from around the time the workflow is stuck. It indicates that the TaskRun is created properly, but nothing happens after that (i.e. no reconciliation to implement the TaskRun):
Much later, the Tekton operator "gets unstuck" and continues, but it prints a strange error about AWS credentials just before it resumes:
I have taken a look at this, and it seems to be an issue with GCR, the registry from which the kaniko image is pulled. According to the Tekton container contract when there is no |
This change fixes: fuseml#134 by calling kaniko through `command` instead of its entrypoint. Also updates the kaniko task image to v1.6.0
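For illustration, here is a rough sketch of what a kaniko step with an explicit `command` could look like in a Tekton task. This is not the actual FuseML task definition: the parameter names (`DOCKERFILE`, `CONTEXT`, `IMAGE`) are hypothetical, and the step name is only borrowed from the pod list mentioned in this issue. The idea is that when a step has no `command`, Tekton has to look up the image's entrypoint in the remote registry before the step can start, so setting `command` explicitly should avoid that lookup.

```yaml
# Illustrative Tekton step only, not the actual FuseML task definition.
steps:
  - name: builder
    image: gcr.io/kaniko-project/executor:v1.6.0
    # An explicit command means Tekton does not have to resolve the
    # image entrypoint from the remote registry.
    command:
      - /kaniko/executor
    args:
      # Parameter names below are hypothetical placeholders.
      - --dockerfile=$(params.DOCKERFILE)
      - --context=$(params.CONTEXT)
      - --destination=$(params.IMAGE)
```

`/kaniko/executor` is the binary that the kaniko executor image uses as its entrypoint, so the step should behave the same as before, minus the registry lookup.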
I'm running the MLflow sklearn e2e workflow example, as documented, and the first workflow run always gets stuck in the build step and remains in a pending state for several minutes.
Here's some relevant output from the k8s cluster:
You can see from the pod list that the builder-prep step has completed, but the builder (kaniko) step hasn't started yet.