fix: route Docker API calls through socket proxy instead of raw socket mount

## Problem

When OSA is deployed using the published `ghcr.io/opensciencearchive/osa` image (whose Dockerfile ends in `USER appuser`), spawning ingester/validator containers fails:

```
[i.o.ingester_runner] Pulling ingester image: osa-hooks-ingesters/cultivarium:latest
[infra.event.worker]  Cannot connect to Docker Engine via unix:///run/docker.sock ssl:default [Permission denied]
```

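The host's `/var/run/docker.sock` is owned by `root:docker` with mode `0660`. Inside the container, `appuser` is neither the owner nor a member of a group whose GID matches the socket's group, so the connection fails with `EACCES`.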
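
The failure follows from the ordinary Unix permission check on the socket file. Below is a minimal sketch of that check. The numeric UID/GID values are hypothetical (this issue does not state the real IDs), and the function only approximates the kernel's logic:

```python
import stat


def can_connect(mode: int, owner_uid: int, owner_gid: int,
                uid: int, gids: set[int]) -> bool:
    """Approximate the permission check for connecting to a Unix socket.

    connect() needs write permission on the socket file; root bypasses it.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Hypothetical IDs, for illustration only.
DOCKER_GID = 998
APPUSER_UID, APPUSER_GIDS = 1000, {1000}
SOCKET_MODE = 0o660  # root:docker, as on the host

# appuser: not owner, not in the socket's group, no "other" write bit.
print(can_connect(SOCKET_MODE, 0, DOCKER_GID, APPUSER_UID, APPUSER_GIDS))
# root (the `target: builder` case) passes regardless of mode.
print(can_connect(SOCKET_MODE, 0, DOCKER_GID, 0, {0}))
```

The first call models the published image and is denied; the second models the dev build that still runs as root.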

This is invisible in `up-dev` and the pockets pilot because both build with `target: builder`, an earlier stage of the multi-stage build that comes before the `USER appuser` directive. The container therefore runs as root and socket access works.

## Why not just fix the user/group

A GID-detection entrypoint script would work (detect the socket's group, create a group with that GID, add `appuser` to it, drop privileges). But it papers over a real problem: OSA's container has direct access to the host's Docker daemon, which is functionally equivalent to root on the host. The "fix" would hand a non-root user inside the container a root-equivalent capability.

## Proposed solution: socket proxy

Add a `docker-socket-proxy` service (e.g. `tecnativa/docker-socket-proxy`) to the default `deploy/docker-compose.yml`. OSA talks to the proxy over TCP (`DOCKER_HOST=tcp://docker-socket-proxy:2375`) instead of mounting the host socket.

Benefits:
- Fixes the permission bug — OSA never touches the socket file, so GID issues disappear.
- Closes the dev/prod divergence — dev uses the proxy too, so "only works because dev is root" stops being a class of bug.
- Documents the Docker API surface OSA actually needs (auditable, mirrors the K8s runner permission model).
- Self-host-friendly — still a single `docker-compose up`, just one extra small service. No K8s required.
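
To illustrate the "auditable API surface" point, here is a simplified model of a flag-gated allow-list. The flag names are the env vars proposed in this issue. The matching logic is an illustrative assumption, not the proxy's actual rules, which may differ in detail:

```python
import re

# Matches an optional API version prefix, then captures the first path segment,
# e.g. "/v1.43/containers/create" -> "containers".
SECTION_RE = re.compile(r"^(?:/v[\d.]+)?/([a-z_]+)")


def is_allowed(method: str, path: str, flags: dict[str, int]) -> bool:
    """Return True if this simplified model would forward the request."""
    m = SECTION_RE.match(path)
    if m is None:
        return False
    section = m.group(1).upper()  # "containers" -> "CONTAINERS"
    if not flags.get(section, 0):
        return False              # API section not enabled at all
    if method in ("GET", "HEAD"):
        return True               # reads need only the section flag
    return bool(flags.get("POST", 0))  # writes also need POST=1


# The flags from the compose sketch in this issue.
OSA_FLAGS = {"CONTAINERS": 1, "IMAGES": 1, "POST": 1, "VOLUMES": 1}

print(is_allowed("POST", "/v1.43/containers/create", OSA_FLAGS))
print(is_allowed("GET", "/v1.43/secrets", OSA_FLAGS))
```

Under this model, container creation is forwarded, while any section that is not explicitly enabled (for example `/secrets`) is refused, which is what makes the env block a readable record of what OSA needs.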

## Implementation sketch

1. Add to `deploy/docker-compose.yml`:
    ```yaml
    docker-socket-proxy:
      image: tecnativa/docker-socket-proxy:latest
      restart: unless-stopped
      environment:
        CONTAINERS: 1
        IMAGES: 1
        POST: 1
        VOLUMES: 1   # if ingesters mount volumes
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:ro   # :ro protects the socket file only; API access is limited by the flags above
      networks: [osa-internal]
    ```
2. Set `DOCKER_HOST=tcp://docker-socket-proxy:2375` in OSA's service env.
3. Remove the host socket mount from the OSA service.
4. Verify that `aiodocker` in the Docker runner picks up `DOCKER_HOST` (it reads the variable when `Docker()` is constructed without an explicit `url`).
5. Apply the same change to `deploy/docker-compose.dev.yml` to close the dev/prod gap.
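
Steps 2 to 4 could be checked with a small preflight script along these lines. This is a sketch only: `resolve_docker_endpoint`, `uses_proxy`, and `smoke_test` are hypothetical helper names, not part of OSA. It also assumes the Docker runner constructs `aiodocker.Docker()` without an explicit `url`, and that the proxy permits the `/version` endpoint:

```python
import asyncio


def resolve_docker_endpoint(env: dict[str, str]) -> str:
    """Return the endpoint a DOCKER_HOST-aware client would use.

    Falls back to the conventional socket path when DOCKER_HOST is unset.
    """
    return env.get("DOCKER_HOST") or "unix:///var/run/docker.sock"


def uses_proxy(env: dict[str, str]) -> bool:
    """True when the environment points at a TCP endpoint, not a raw socket."""
    return resolve_docker_endpoint(env).startswith("tcp://")


async def smoke_test() -> dict:
    """Ask the daemon for its version through whatever DOCKER_HOST points at."""
    import aiodocker  # imported lazily so the helpers above work without it

    docker = aiodocker.Docker()  # no url given: DOCKER_HOST is consulted
    try:
        return await docker.version()
    finally:
        await docker.close()


print(uses_proxy({"DOCKER_HOST": "tcp://docker-socket-proxy:2375"}))
print(uses_proxy({}))
```

Inside the OSA container, `uses_proxy(dict(os.environ))` followed by `asyncio.run(smoke_test())` would confirm both that the env var is set and that the proxy answers.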

## Out of scope

- The Kubernetes runner — already a separate path, no changes needed.
- Rootless Docker / Podman support — interesting future work, not blocking.
- The `target: builder` shortcut in dev — should be revisited separately; ideally dev also ends on `USER appuser` so prod-only bugs stop existing.

## Acceptance

- [ ] OSA image runs as `appuser` and successfully spawns ingester containers via the proxy.
- [ ] Default `docker-compose up` works end-to-end with no extra host setup.
- [ ] Dev mode (`docker-compose.dev.yml`) uses the same proxy path.
- [ ] Proxy env vars document the minimum Docker API surface OSA uses.
