
ECR image_id pull fails with command-substituted SKYPILOT_DOCKER_PASSWORD due to dependency timing (Blocks PR #4871 use case) #5677

@andylizf

Description


The functionality introduced in PR #4871, which supports command substitution in SKYPILOT_DOCKER_PASSWORD (e.g., $(aws ecr get-login-password ...)), cannot be effectively used when resources.image_id points to an ECR image. The initial docker pull of this ECR image on the provisioned VM fails due to a dependency timing conflict.

Problem Details:

  1. SkyPilot provisions a base VM.
  2. When the feature from PR #4871 ("Support cli var substitution in docker login command env") is used, SkyPilot attempts docker login by executing the command in SKYPILOT_DOCKER_PASSWORD, followed by docker pull for the specified ECR image_id. Both steps run on the newly provisioned VM.
  3. The critical issue is that SkyPilot's standard procedure for copying necessary user credentials, files, or tools (e.g., ~/.aws/credentials, the AWS CLI itself if not pre-installed, or custom scripts which the password command might be) typically happens in a later provisioning stage.
  4. Consequently, the command in SKYPILOT_DOCKER_PASSWORD executes before its required dependencies are available on the VM. This leads to command failure, docker login failure, and ultimately, the ECR image_id pull fails.
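
To make the ordering problem concrete, here is a toy model of the sequence described above. It is only a sketch: the stage names and dependency labels (`provision_vm`, `docker_login_and_pull`, `sync_credentials_and_files`, `aws_cli`, `aws_credentials`) are hypothetical and do not correspond to actual SkyPilot internals. It just walks the stages in order and reports the first one whose requirements are not yet available.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One provisioning step: what it needs on the VM and what it makes available."""
    name: str
    requires: set = field(default_factory=set)
    provides: set = field(default_factory=set)


def first_unmet(stages):
    """Return (stage name, sorted missing deps) for the first stage that
    runs before its requirements are available, or None if the order is valid."""
    available = set()
    for stage in stages:
        missing = stage.requires - available
        if missing:
            return stage.name, sorted(missing)
        available |= stage.provides
    return None


# Hypothetical stage names; they mirror steps 1-3 above, not SkyPilot's code.
current_order = [
    Stage("provision_vm", provides={"vm"}),
    Stage("docker_login_and_pull", requires={"vm", "aws_cli", "aws_credentials"}),
    Stage("sync_credentials_and_files", requires={"vm"},
          provides={"aws_cli", "aws_credentials"}),
]

# Same stages, with credential/tool sync moved ahead of docker login.
reordered = [current_order[0], current_order[2], current_order[1]]

print(first_unmet(current_order))  # ('docker_login_and_pull', ['aws_cli', 'aws_credentials'])
print(first_unmet(reordered))      # None
```

In this model the current order fails at the login step, while moving the sync step earlier satisfies every requirement.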

Impact on PR #4871:
This timing conflict is a significant blocker for using PR #4871's command substitution feature for a primary use case: dynamically authenticating to ECR (or similar registries) when the ECR image itself is the runtime environment specified via image_id. While the substitution mechanism itself might be in place, it fails in practice for image_id scenarios due to this provisioning order.

Expected Behavior/Resolution Path:
To enable PR #4871 for ECR image_id use cases, the dependencies (tools, configs, scripts) required by the SKYPILOT_DOCKER_PASSWORD command must be present and configured on the VM at the moment docker login is attempted for the initial image_id pull.

Potential solutions could involve:

  • Modifying SkyPilot's provisioning sequence to ensure these specific dependencies are deployed to the VM before the docker login for image_id is attempted.
  • Introducing a dedicated pre-flight mechanism for staging just these critical Docker authentication dependencies.
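
As a rough illustration of what a pre-flight check might look like, the sketch below extracts the executables named inside `$(...)` substitutions in the password value and reports any that are not on `PATH`. The function name `missing_tools` and the example region are assumptions made for illustration, not part of SkyPilot. It only inspects the first word of each simple, non-nested substitution, so pipelines, nested substitutions, and credential files are not covered.

```python
import re
import shlex
import shutil

# Matches simple, non-nested $(...) command substitutions.
SUBSTITUTION = re.compile(r"\$\(([^()]*)\)")


def missing_tools(password_value, which=shutil.which):
    """Return executables used in $(...) substitutions that `which` cannot find.

    `which` is injectable so the check can be exercised without touching
    the real PATH; by default it is shutil.which.
    """
    missing = []
    for command in SUBSTITUTION.findall(password_value):
        argv = shlex.split(command)
        if argv and which(argv[0]) is None and argv[0] not in missing:
            missing.append(argv[0])
    return missing


if __name__ == "__main__":
    # The region is a placeholder chosen for this example.
    value = "$(aws ecr get-login-password --region us-east-1)"
    tools = missing_tools(value)
    if tools:
        print("Missing before docker login:", ", ".join(tools))
    else:
        print("All tools referenced by the password command are on PATH.")
```

A check like this would have to run on the VM itself, after whatever staging step deploys the tools, and before docker login is attempted for the initial image_id pull.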

Addressing this is essential for PR #4871 to successfully support dynamic ECR authentication for containerized runtime environments.
