clone(2) silently ignores CLONE_NEW* namespace flags while clone3(2) rejects them with EINVAL #44

### Summary

The legacy `clone(2)` path silently ignores `CLONE_NEW*` namespace flags, while `clone3(2)` rejects the same flags with `EINVAL`. The two entry points therefore disagree on identical inputs.

Found while reviewing the OCI image support work (#31, PR #34): container runtimes and tools request namespaces through whichever `clone` variant their libc picks, so the behavior a guest sees depends on the syscall number rather than on a single, intentional policy.

### Details

`clone3(2)` explicitly rejects unsupported isolation features:

```c
// src/runtime/forkipc.c  (sys_clone3)
if (ca.flags & LINUX_CLONE_INTO_CGROUP)
    return -LINUX_EINVAL; /* cgroups not implemented */   // forkipc.c:1572
if (ca.flags & LINUX_CLONE3_NS_FLAGS)
    return -LINUX_EINVAL; /* namespaces not implemented */ // forkipc.c:1574
if (ca.set_tid_size != 0)
    return -LINUX_EINVAL; /* set_tid not implemented */    // forkipc.c:1576
```

The legacy path does not. `sc_clone` forwards the raw flags unchanged:

```c
// src/syscall/syscall.c:1543
return sys_clone(current_thread->vcpu, g, x0, x1, 0, 0, x2, x3, x4, ...);
```

and `sys_clone` (`src/runtime/forkipc.c:1061`) contains no `CLONE_NEW*` / cgroup / `set_tid` validation anywhere. The `CLONE_NEW*` bits simply fall through to the normal `posix_spawn`-based fork path.

### Impact

- `clone(CLONE_NEWPID | CLONE_NEWNS | ...)` returns a child PID and **appears to succeed**, but no isolation is set up. A caller that assumes the namespace was created proceeds on a false premise, when it should have received a clean error it could handle or fall back from.
- `clone3()` with the same flags returns `EINVAL`, so a runtime that probes `clone3` first and falls back to `clone` will behave differently from one that calls `clone` directly.

### Expected

Both entry points should apply the same policy. Given the project's current "namespaces not implemented" stance, the legacy path should also reject these flags with `EINVAL`.

### Proposed fix

Mirror the `clone3` checks in `sys_clone` before the spawn path, reusing the existing mask:

```c
/* LINUX_CLONE3_NS_FLAGS is already defined for the clone3 path */
if (flags & LINUX_CLONE3_NS_FLAGS)
    return -LINUX_EINVAL; /* namespaces not implemented */
if (flags & LINUX_CLONE_INTO_CGROUP)
    return -LINUX_EINVAL; /* cgroups not implemented */
```
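To keep the two paths from drifting apart again, the checks could live in one helper that both entry points call. The following is a minimal, self-contained sketch of that idea, not the project's code: every `sk_`/`SK_` name is hypothetical, the flag values are the ones from Linux's `<linux/sched.h>`, and `SK_NS_FLAGS` is only an assumed stand-in for `LINUX_CLONE3_NS_FLAGS`, whose actual contents are not shown in this issue. One detail worth checking before reusing the mask on the legacy path: in `clone(2)` the low byte of `flags` carries the exit signal, and `CLONE_NEWTIME` (0x80) overlaps that byte, so the sketch strips the signal bits first.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>

/* Flag values as defined in Linux's <linux/sched.h>. */
#define SK_CSIGNAL           0x000000ffULL  /* legacy clone: exit signal bits */
#define SK_CLONE_NEWTIME     0x00000080ULL  /* overlaps SK_CSIGNAL */
#define SK_CLONE_NEWNS       0x00020000ULL
#define SK_CLONE_NEWCGROUP   0x02000000ULL
#define SK_CLONE_NEWUTS      0x04000000ULL
#define SK_CLONE_NEWIPC      0x08000000ULL
#define SK_CLONE_NEWUSER     0x10000000ULL
#define SK_CLONE_NEWPID      0x20000000ULL
#define SK_CLONE_NEWNET      0x40000000ULL
#define SK_CLONE_INTO_CGROUP 0x200000000ULL

/* ASSUMPTION: stand-in for LINUX_CLONE3_NS_FLAGS (real mask not shown here). */
#define SK_NS_FLAGS (SK_CLONE_NEWTIME | SK_CLONE_NEWNS | SK_CLONE_NEWCGROUP | \
                     SK_CLONE_NEWUTS | SK_CLONE_NEWIPC | SK_CLONE_NEWUSER |   \
                     SK_CLONE_NEWPID | SK_CLONE_NEWNET)

/* Hypothetical shared policy: 0 if the flags are acceptable, -EINVAL if not. */
int sk_clone_check_flags(uint64_t flags)
{
    if (flags & SK_CLONE_INTO_CGROUP)
        return -EINVAL; /* cgroups not implemented */
    if (flags & SK_NS_FLAGS)
        return -EINVAL; /* namespaces not implemented */
    return 0;
}

/* Legacy clone(2): drop the exit-signal byte before applying the policy. */
int sk_legacy_clone_check(uint64_t raw_flags)
{
    return sk_clone_check_flags(raw_flags & ~SK_CSIGNAL);
}

/* clone3(2): the exit signal is a separate field, so flags are checked as-is. */
int sk_clone3_check(uint64_t flags)
{
    return sk_clone_check_flags(flags);
}

/* Usage: both entry points now agree on namespace flags. */
void sk_policy_examples(void)
{
    assert(sk_legacy_clone_check(SK_CLONE_NEWPID | SIGCHLD) == -EINVAL);
    assert(sk_clone3_check(SK_CLONE_NEWPID) == -EINVAL);
    assert(sk_legacy_clone_check((uint64_t)SIGCHLD) == 0); /* plain fork-style clone */
}
```

If the project's mask does not include `CLONE_NEWTIME`, the signal masking is unnecessary but harmless.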

A regression test asserting that `clone()` and `clone3()` return the same result for each `CLONE_NEW*` flag would lock the two paths together.
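One possible shape for that test is a table-driven comparison over the namespace flags. The sketch below is self-contained and uses stand-in models in place of the real entry points: `clone_entry_fn`, `count_clone_mismatches`, and the `model_*` functions are all hypothetical names, and the fake PID 1234 merely represents "the call appeared to succeed". A real test would invoke the guest's `clone` and `clone3` and would need to normalize successful PIDs before comparing. `CLONE_NEWTIME` is left out of the table because its bit overlaps the exit-signal byte of legacy `clone`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical adapter type: call one clone variant, return its raw result. */
typedef long (*clone_entry_fn)(uint64_t flags);

/* Namespace flags expressible through legacy clone(2); values from
 * Linux's <linux/sched.h>. */
static const uint64_t ns_flags_under_test[] = {
    0x00020000ULL, /* CLONE_NEWNS */
    0x02000000ULL, /* CLONE_NEWCGROUP */
    0x04000000ULL, /* CLONE_NEWUTS */
    0x08000000ULL, /* CLONE_NEWIPC */
    0x10000000ULL, /* CLONE_NEWUSER */
    0x20000000ULL, /* CLONE_NEWPID */
    0x40000000ULL, /* CLONE_NEWNET */
};

#define NS_FLAG_COUNT (sizeof ns_flags_under_test / sizeof ns_flags_under_test[0])

/* Count the flags for which the two entry points return different results. */
size_t count_clone_mismatches(clone_entry_fn legacy, clone_entry_fn modern)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < NS_FLAG_COUNT; i++) {
        if (legacy(ns_flags_under_test[i]) != modern(ns_flags_under_test[i]))
            mismatches++;
    }
    return mismatches;
}

/* Stand-in for clone3 as described above: namespace flags give -EINVAL. */
long model_clone3(uint64_t flags)
{
    for (size_t i = 0; i < NS_FLAG_COUNT; i++) {
        if (flags & ns_flags_under_test[i])
            return -EINVAL;
    }
    return 1234; /* fake child PID */
}

/* Stand-in for the reported legacy behavior: flags ignored, PID returned. */
long model_clone_current(uint64_t flags)
{
    (void)flags;
    return 1234; /* fake child PID */
}

/* Stand-in for the legacy path after the proposed fix. */
long model_clone_fixed(uint64_t flags)
{
    return model_clone3(flags);
}
```

With these models, `count_clone_mismatches(model_clone_current, model_clone3)` reports a mismatch for every flag in the table, while `count_clone_mismatches(model_clone_fixed, model_clone3)` reports none, which is the condition the regression test would assert.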

Refs #31, PR #34.

