
fix: route Docker API calls through socket proxy instead of raw socket mount #131

@rorybyrne


Problem

When OSA is deployed using the published ghcr.io/opensciencearchive/osa image (which ends in USER appuser), spawning ingester/validator containers fails:

[i.o.ingester_runner] Pulling ingester image: osa-hooks-ingesters/cultivarium:latest
[infra.event.worker]  Cannot connect to Docker Engine via unix:///run/docker.sock ssl:default [Permission denied]

The host's /var/run/docker.sock is owned by root:docker with mode 0660. Inside the container, appuser is neither root nor a member of a group whose GID matches the socket's group, so the connection fails with EACCES.
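
The failure can be reproduced from inside the container with a small diagnostic. The following is a minimal sketch, not part of OSA: it applies the classic owner/group/other permission check to the socket and ignores ACLs and capabilities. The function names and the UID/GID values in the usage note are illustrative assumptions.

```python
from __future__ import annotations

import os
import stat


def can_use_socket(mode: int, owner_uid: int, owner_gid: int,
                   uid: int, gids: set[int]) -> bool:
    """Return True if a process with `uid`/`gids` may read and write a file
    with the given permission bits (simplified owner/group/other check)."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IWGRP
    else:
        needed = stat.S_IROTH | stat.S_IWOTH
    return mode & needed == needed


def check_socket(path: str = "/var/run/docker.sock") -> bool:
    """Apply the check to the current process and a real socket path."""
    st = os.stat(path)
    gids = set(os.getgroups()) | {os.getegid()}
    return can_use_socket(stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid,
                          os.geteuid(), gids)
```

With a root-owned 0660 socket whose group is, say, GID 999, a user with UID 1000 and only GID 1000 is denied, while root (the target: builder case) is allowed.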

This is invisible in up-dev and the pockets pilot because both build with target: builder, an earlier stage of the multi-stage build that ends before the USER appuser directive. The container therefore runs as root and socket access works.

Why not just fix the user/group

A GID-detection entrypoint script would work (detect the socket's group, alias a group to that GID, add appuser to it, drop privileges). But it papers over a real problem: OSA's container has direct access to the host's Docker daemon, which is functionally equivalent to host root. The "fix" would be making a non-root user inside the container hold a root-equivalent capability.

Proposed solution: socket proxy

Add a docker-socket-proxy service (e.g. tecnativa/docker-socket-proxy) to the default deploy/docker-compose.yml. OSA talks to the proxy over TCP (DOCKER_HOST=tcp://docker-socket-proxy:2375) instead of mounting the host socket.

Benefits:

  • Fixes the permission bug — OSA never touches the socket file, so GID issues disappear.
  • Closes the dev/prod divergence — dev uses the proxy too, so "only works because dev is root" stops being a class of bug.
  • Documents the Docker API surface OSA actually needs (auditable, mirrors the K8s runner permission model).
  • Self-host-friendly — still a single docker-compose up, just one extra small service. No K8s required.
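
To make the "auditable API surface" point concrete, here is a simplified model of how the proxy's environment variables gate requests. It is a sketch under assumptions, not the proxy's actual rule set: the section-to-path table lists only a few sections, and the matching logic is an approximation of the behaviour where non-GET methods require POST=1 and each API section must be enabled explicitly.

```python
from __future__ import annotations

import re

# Assumed mapping from proxy env var to Docker API path prefix (partial).
SECTIONS = {
    "CONTAINERS": "/containers",
    "IMAGES": "/images",
    "VOLUMES": "/volumes",
    "PING": "/_ping",
    "VERSION": "/version",
}

# Docker clients usually prefix paths with an API version, e.g. /v1.43/...
_VERSION_PREFIX = re.compile(r"^/v[\d.]+")


def is_allowed(method: str, path: str, env: dict[str, str]) -> bool:
    """Approximate the allowlist: deny by default, allow enabled sections."""
    # Anything other than a read-only method needs POST=1.
    if method.upper() not in ("GET", "HEAD") and env.get("POST") != "1":
        return False
    path = path.split("?", 1)[0]                # drop the query string
    path = _VERSION_PREFIX.sub("", path, count=1)  # drop the version prefix
    for section, prefix in SECTIONS.items():
        if path == prefix or path.startswith(prefix + "/"):
            return env.get(section) == "1"
    return False                                # unknown sections are denied
```

With the environment proposed in this issue (CONTAINERS, IMAGES, POST and VOLUMES set to 1), this model allows pulling images and creating containers but denies sections such as networks or exec that were never enabled.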

Implementation sketch

  1. Add to deploy/docker-compose.yml:
    docker-socket-proxy:
      image: tecnativa/docker-socket-proxy:latest
      restart: unless-stopped
      environment:
        CONTAINERS: 1
        IMAGES: 1
        POST: 1
        VOLUMES: 1   # if ingesters mount volumes
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:ro
      networks: [osa-internal]
  2. Set DOCKER_HOST=tcp://docker-socket-proxy:2375 in OSA's service env.
  3. Remove the host socket mount from the OSA service.
  4. Verify that aiodocker in the Docker runner picks up DOCKER_HOST (aiodocker.Docker() reads it from the environment when no url is passed).
  5. Apply the same change to deploy/docker-compose.dev.yml to close the dev/prod gap.
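
Step 4 could be checked with a small smoke test along these lines. This is a sketch, not OSA code: docker_endpoint and uses_proxy are hypothetical helpers, the unix-socket fallback is an assumption made for illustration (aiodocker has its own search order), and the version call assumes the proxy leaves the version endpoint enabled.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional


def docker_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DOCKER_HOST if set, else an assumed local socket path."""
    env = os.environ if env is None else env
    return env.get("DOCKER_HOST") or "unix:///var/run/docker.sock"


def uses_proxy(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the client would talk TCP (the proxy) rather than a socket."""
    return docker_endpoint(env).startswith("tcp://")


async def smoke_test() -> Optional[str]:
    """Ask the daemon for its API version through whatever DOCKER_HOST says."""
    import aiodocker  # third-party; imported lazily so the helpers work without it

    docker = aiodocker.Docker()  # no url passed, so DOCKER_HOST is used
    try:
        info = await docker.version()
        return info.get("ApiVersion")
    finally:
        await docker.close()
```

Running asyncio.run(smoke_test()) inside the OSA container with DOCKER_HOST=tcp://docker-socket-proxy:2375 and no socket mount should return an API version string if the proxy path works.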

Out of scope

  • The Kubernetes runner — already a separate path, no changes needed.
  • Rootless Docker / Podman support — interesting future work, not blocking.
  • The target: builder shortcut in dev — should be revisited separately; ideally dev also ends on USER appuser so prod-only bugs stop existing.

Acceptance

  • OSA image runs as appuser and successfully spawns ingester containers via the proxy.
  • Default docker-compose up works end-to-end with no extra host setup.
  • Dev mode (docker-compose.dev.yml) uses the same proxy path.
  • Proxy env vars document the minimum Docker API surface OSA uses.
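
Part of the acceptance criteria could be enforced in CI with a check over the parsed compose files. The sketch below works on an already-parsed dict (for example, loaded with a YAML parser) and assumes the OSA service is named osa, which is a placeholder rather than a name taken from the repository.

```python
from __future__ import annotations


def compose_violations(compose: dict, service: str = "osa") -> list[str]:
    """Return a list of problems with how `service` reaches Docker."""
    svc = compose.get("services", {}).get(service, {})
    problems: list[str] = []

    # The service must not mount the host Docker socket.
    for vol in svc.get("volumes", []):
        # Short syntax is "src:dst[:mode]"; long syntax is a mapping.
        if isinstance(vol, dict):
            source = str(vol.get("source", ""))
        else:
            source = str(vol).split(":")[0]
        if source.endswith("docker.sock"):
            problems.append(f"{service} mounts the host socket: {source}")

    # The service must point DOCKER_HOST at a TCP endpoint (the proxy).
    env = svc.get("environment", {})
    if isinstance(env, list):  # list form: ["KEY=value", ...]
        env = dict(item.split("=", 1) for item in env if "=" in item)
    if not str(env.get("DOCKER_HOST", "")).startswith("tcp://"):
        problems.append(f"{service} does not set DOCKER_HOST to a tcp:// endpoint")

    return problems
```

An empty result for both deploy/docker-compose.yml and deploy/docker-compose.dev.yml would indicate that neither file still relies on the raw socket mount.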

    Labels

    bug (Something isn't working), high-priority (Significant bug or blocks feature work), infrastructure (CI, Docker, deployment, migrations), security (Security-related issues)
