
fix: use ENTRYPOINT instead of CMD for /start.sh in base image #112

Closed
max4c wants to merge 1 commit into main from fix/base-entrypoint


max4c (Contributor) commented Mar 18, 2026

Summary

  • Changes CMD ["/start.sh"] to ENTRYPOINT ["/start.sh"] in the base Dockerfile
  • This overrides the inherited NVIDIA CUDA entrypoint (/opt/nvidia/nvidia_entrypoint.sh) which enforces a strict driver version check
  • Without this, containers built on nvidia/cuda:12.8.1+ crash instantly on RunPod machines with driver 550 (which supports only up to CUDA 12.4): the NVIDIA entrypoint exits before /start.sh ever runs
  • This is the same approach ComfyUI already uses (its image has ENTRYPOINT ["/start.sh"])

Context

The autoresearch template (x7o8gn1p4f) has been broken since launch: pods enter an undebuggable boot loop with 0 uptime. The root cause is the NVIDIA entrypoint failing its CUDA driver compatibility check on machines with older drivers, so the container dies before SSH, Jupyter, or any logging starts.

Affects all templates using runpod/base with CUDA 12.8+ on machines with driver <570.

Test plan

  • Rebuild runpod/base and runpod/autoresearch images from this branch
  • Deploy autoresearch template on RTX 4090 (community cloud, which has mixed driver versions)
  • Verify container boots, SSH connects, /workspace/autoresearch is populated
  • Verify torch.cuda.is_available() returns True

Made with Cursor

NVIDIA CUDA base images set ENTRYPOINT to /opt/nvidia/nvidia_entrypoint.sh,
which enforces a strict driver version check before running anything. When
the host driver doesn't support the container's CUDA version (e.g. driver
550 only supports CUDA 12.4, but the image has CUDA 12.8.1), the entrypoint
exits immediately and the container never reaches /start.sh.

Changing CMD to ENTRYPOINT overrides the inherited NVIDIA entrypoint, the
same approach the ComfyUI image already uses. This lets the container boot
on any machine; if CUDA operations fail at runtime due to a driver
mismatch, the user gets a clear error instead of an undebuggable boot loop.

Made-with: Cursor
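A minimal Dockerfile sketch of the inheritance described above, assuming a hypothetical nvidia/cuda:12.8.1-runtime-ubuntu22.04 base tag and a start.sh sitting in the build context. The comments spell out the command Docker assembles in each case, following Docker's documented rule that an exec-form ENTRYPOINT receives CMD as its arguments.

```dockerfile
# Hypothetical base tag; substitute the CUDA 12.8.1+ tag the real image uses.
# Per this PR, the CUDA image already sets:
#   ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
FROM nvidia/cuda:12.8.1-runtime-ubuntu22.04

# Assumes start.sh is next to this Dockerfile in the build context.
COPY start.sh /start.sh
RUN chmod +x /start.sh

# Before this PR the image only set CMD, so Docker passed it to the
# inherited entrypoint and the container ran:
#   /opt/nvidia/nvidia_entrypoint.sh /start.sh
# CMD ["/start.sh"]

# After this PR, ENTRYPOINT replaces the NVIDIA entrypoint. Setting
# ENTRYPOINT also clears any CMD inherited from the base image, so the
# container runs just:
#   /start.sh
ENTRYPOINT ["/start.sh"]
```

For any built image, docker inspect reports the resolved pair under .Config.Entrypoint and .Config.Cmd, which is a quick way to confirm which of the two cases applies.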

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6934ddd74


RUN echo 'echo -e "\nFor detailed documentation and guides, please visit:\n\033[1;34mhttps://docs.runpod.io/\033[0m and \033[1;34mhttps://blog.runpod.io/\033[0m\n\n"' >> /root/.bashrc

-CMD ["/start.sh"]
+ENTRYPOINT ["/start.sh"]


P2: Keep CMD overrides working in derived images

Changing the base image to ENTRYPOINT ["/start.sh"] hard-codes /start.sh as the executable for all child images. In Docker, a child CMD is then passed as arguments to that entrypoint, so a downstream image like FROM runpod/base ... CMD ["python","app.py"] will execute /start.sh python app.py instead of starting python directly. Since container-template/start.sh does not forward "$@" and always ends in sleep infinity, those child-image commands will never run, which is a backward-incompatible regression for users extending this base image.

Useful? React with 👍 / 👎.
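A sketch of the downstream case the review describes, assuming a hypothetical child image that copies an app.py and sets its own CMD; the runpod/base tag is a placeholder. The comments show the command Docker assembles under each version of the base image.

```dockerfile
# Hypothetical child image; the tag and app.py are placeholders.
FROM runpod/base:latest
COPY app.py /app.py
CMD ["python", "/app.py"]

# Base image with CMD ["/start.sh"] (before this PR): the child CMD
# replaces it and is passed to the inherited NVIDIA entrypoint:
#   /opt/nvidia/nvidia_entrypoint.sh python /app.py
#
# Base image with ENTRYPOINT ["/start.sh"] (this PR): the child CMD
# becomes arguments to start.sh:
#   /start.sh python /app.py
# Because start.sh ignores "$@" and ends in sleep infinity,
# python /app.py never runs.
```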

max4c (Contributor, Author) commented Mar 18, 2026

Closing: the reviewer is right; this breaks child images that override CMD. Moving the fix to the autoresearch Dockerfile only.
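A minimal sketch of that narrower fix, assuming the autoresearch Dockerfile builds FROM runpod/base (the tag here is a placeholder) and relies on the /start.sh the base image already ships. Scoping the ENTRYPOINT override to this one image leaves CMD overrides in other runpod/base children untouched.

```dockerfile
# Placeholder tag; use whichever runpod/base CUDA 12.8+ tag autoresearch builds on.
FROM runpod/base:latest

# ... autoresearch-specific setup elided ...

# Override the inherited NVIDIA entrypoint in this image only. Setting
# ENTRYPOINT here also clears the CMD inherited from runpod/base, so the
# container runs plain /start.sh with no arguments.
ENTRYPOINT ["/start.sh"]
```

Images that in turn extend autoresearch would still have their CMD passed to /start.sh, so the reviewer's caveat applies one level down.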

