fix: use ENTRYPOINT instead of CMD for /start.sh in base image #112
Conversation
NVIDIA CUDA base images set ENTRYPOINT to /opt/nvidia/nvidia_entrypoint.sh, which enforces a strict driver version check before running anything. When the host driver doesn't support the container's CUDA version (e.g. driver 550 only supports CUDA 12.4, but the image has CUDA 12.8.1), the entrypoint exits immediately and the container never reaches /start.sh. Changing CMD to ENTRYPOINT overrides the inherited NVIDIA entrypoint, same as the ComfyUI image already does. This lets the container boot on any machine — if CUDA operations fail at runtime due to driver mismatch, the user gets a clear error instead of an undebuggable boot loop. Made-with: Cursor
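The change can be sketched as a minimal Dockerfile; this is illustrative only (the actual runpod/base Dockerfile has many more layers, and the base tag shown is an assumption):

```dockerfile
# Illustrative sketch, not the real runpod/base Dockerfile.
# The base tag below is assumed for illustration.
FROM nvidia/cuda:12.8.1-devel-ubuntu22.04

COPY start.sh /start.sh
RUN chmod +x /start.sh

# Before: CMD ["/start.sh"]
#   The ENTRYPOINT inherited from nvidia/cuda
#   (/opt/nvidia/nvidia_entrypoint.sh) still runs first and exits on a
#   driver/CUDA mismatch, so /start.sh is never reached.
# After: ENTRYPOINT replaces the inherited NVIDIA entrypoint outright,
#   so the container always boots into /start.sh.
ENTRYPOINT ["/start.sh"]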
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6934ddd74
```diff
 RUN echo 'echo -e "\nFor detailed documentation and guides, please visit:\n\033[1;34mhttps://docs.runpod.io/\033[0m and \033[1;34mhttps://blog.runpod.io/\033[0m\n\n"' >> /root/.bashrc

-CMD ["/start.sh"]
+ENTRYPOINT ["/start.sh"]
```
Keep CMD overrides working in derived images
Changing the base image to ENTRYPOINT ["/start.sh"] hard-codes /start.sh as the executable for all child images. In Docker, a child CMD is then passed as arguments to that entrypoint, so a downstream image like FROM runpod/base ... CMD ["python","app.py"] will execute /start.sh python app.py instead of starting python directly. Since container-template/start.sh does not forward "$@" and always ends in sleep infinity, those child-image commands will never run, which is a backward-incompatible regression for users extending this base image.
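If the base image did move to ENTRYPOINT, the regression the reviewer describes could be avoided by making start.sh forward its arguments. A minimal sketch, assuming nothing about the repo's actual start.sh: the `start` wrapper function and the echo fallback exist only so the behavior is observable here (a real script would `exec "$@"` and `sleep infinity` instead).

```shell
# Hypothetical sketch, not the repo's start.sh: forward any CMD
# arguments ("$@") instead of unconditionally ending in sleep infinity.
start() {
  # ... launch SSH, Jupyter, etc. here (omitted) ...
  if [ "$#" -gt 0 ]; then
    "$@"    # a child image's CMD runs as intended
            # (a real entrypoint script would use: exec "$@")
  else
    echo "no CMD given: would sleep infinity"
  fi
}

start echo "child CMD forwarded"   # prints: child CMD forwarded
```

With this shape, `FROM runpod/base ... CMD ["python","app.py"]` in a child image would run python after the base services start, instead of being swallowed by an unconditional `sleep infinity`.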
Closing — the reviewer is right, this breaks child images that override CMD. Moving the fix to the autoresearch Dockerfile only.
Summary
- Changed CMD ["/start.sh"] to ENTRYPOINT ["/start.sh"] in the base Dockerfile
- NVIDIA CUDA base images set an entrypoint (/opt/nvidia/nvidia_entrypoint.sh) which enforces a strict driver version check
- Images built on nvidia/cuda:12.8.1+ crash instantly on RunPod machines with driver 550 (supports only CUDA 12.4) — the NVIDIA entrypoint exits before /start.sh ever runs
- Matches the ComfyUI image, which already sets ENTRYPOINT ["/start.sh"]
Context
The autoresearch template (x7o8gn1p4f) has been broken since launch — pods enter an undebuggable boot loop with 0 uptime. Root cause is the NVIDIA entrypoint failing the CUDA driver compatibility check on machines with older drivers. The container dies before SSH, Jupyter, or any logging starts.
Affects all templates using runpod/base with CUDA 12.8+ on machines with driver <570.
Test plan
- Built runpod/base and runpod/autoresearch images from this branch
- /workspace/autoresearch is populated
- torch.cuda.is_available() returns True
Made with Cursor
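The compatibility gate described above can be illustrated with a toy version of the check. The `driver_supports_cuda` helper and its driver-to-CUDA mapping are assumptions for illustration only (driver 550 capped at CUDA 12.4, driver 570+ at 12.8); NVIDIA's real support matrix is more detailed.

```shell
# Toy sketch of the kind of gate the NVIDIA entrypoint applies.
# The mapping is an illustrative assumption, not NVIDIA's actual table:
#   driver 55x/56x -> CUDA <= 12.4, driver 570+ -> CUDA <= 12.8.
driver_supports_cuda() {
  driver_major="$1"   # e.g. 550
  cuda_version="$2"   # e.g. 12.8
  case "$driver_major" in
    5[7-9]*) max="12.8" ;;
    55*|56*) max="12.4" ;;
    *)       max="12.0" ;;
  esac
  # numeric compare of major.minor versions (adequate for minors 0-9)
  awk -v v="$cuda_version" -v m="$max" 'BEGIN { exit !(v <= m) }'
}

driver_supports_cuda 550 12.8 || echo "driver 550 cannot run CUDA 12.8"
```

This is exactly the mismatch in the bug report: on a driver-550 host the check fails for a CUDA 12.8.1 image, and with the NVIDIA entrypoint in place the container exits before any user code runs.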