[B] Failed to build image in azure cloud #523

Open
YevheniiSemendiak opened this issue Jun 5, 2022 · 0 comments
Labels
bug Something isn't working

Comments

@YevheniiSemendiak
Collaborator

Summary

We are not able to build images on GPU presets. This used to work, but the clusters where it worked have been deprovisioned, so I cannot provide a working example.

STRs:

  1. Create a Dockerfile:
    FROM ubuntu
    COPY whatever .
  2. On an Azure-backed cluster, run:
    neuro-extras image build -s gpu-small . image:test
  3. Get an error in Kaniko (job-b08864fb-8f6f-43d4-af8b-b0820c1495ae):
    error building image: error building stage: failed to get filesystem from image: error removing lib to make way for new symlink: unlinkat //lib/firmware/nvidia/470.57.02/gsp.bin: device or resource busy

Expected result

Image builds.

Environment

Mandatory:
  • neuro-extras version: 22.2.2
  • neuro CLI version: Neuro Platform Client 22.1.3
  • platform-registry-22.5.0
Auxiliary notes regarding the environment setup are welcome

Observations:

  1. Kaniko tries to extract the base image filesystem and fails on the GPU driver files. According to the debug logs these paths should be ignored, but for some reason they are not.
  2. This cannot be reproduced on CPU-only jobs.
  3. Using gcr.io/kaniko-project/executor:v1.8.1-debug with the Kaniko argument --ignore-path=/lib/firmware does not resolve the problem (job-8cc643fe-7e5f-404f-9d85-b1c3b93c80cf). Setting --ignore-path=/lib does help (job-c08eef7e-5705-4a03-b397-228608bba8e7), but then another problem occurs if the Dockerfile uses a RUN command: error building image: error building stage: failed to execute command: starting command: fork/exec /bin/sh: no such file or directory (job-d15aab86-fdd7-4f58-b69f-8b4177724eb2).
  4. Might be related to GoogleContainerTools/kaniko#1745 ("failed to get filesystem from image: error removing lib to make way for new symlink: unlinkat").
  5. This is not reproducible on AWS-backed (default) or on-prem clusters (onprem-poc).
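
The first two observations can be checked from inside a job. On Linux, unlinkat returns "device or resource busy" when the target is a mount point, so one plausible cause (an assumption, not confirmed in this issue) is that the GPU driver files are bind-mounted into the job container on the Azure GPU nodes. Below is a minimal sketch that lists mount points under a given path. The name mounts_under is a hypothetical helper, and the optional second argument exists only so the function can be tried against a saved mounts table.

```shell
#!/bin/sh
# Hypothetical helper: print every mount point at or below a path prefix.
# Reads a table in /proc/self/mounts format, where field 2 is the mount point.
mounts_under() {
    prefix="$1"
    table="${2:-/proc/self/mounts}"
    # Match the prefix itself, or any path that starts with "<prefix>/".
    awk -v p="$prefix" '$2 == p || index($2, p "/") == 1 { print $2 }' "$table"
}

# Usage inside the job (reads the live mounts table):
# mounts_under /lib
```

If this prints paths such as the gsp.bin file in a gpu-small job and nothing in a CPU-only job, that would be consistent with the difference between GPU and CPU presets described above. It would not by itself explain why Kaniko fails to skip those paths.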
@YevheniiSemendiak added the bug label Jun 5, 2022