LTP rwtest04: mmap-write + msync shows partial data visibility (page tear) on immediate verify-read

、

### Environment
- Workload: LTP `fs/rwtest04`
- Reproducer:
  - Command: `/opt/ltp/runltp -Q -b /root/lin/mnt002 -f fs -s rwtest04 -d /root/lin/mnt002`
  - File: `/root/lin/mnt002/ltp-*/mm-sync-<pid>`
- Client: FUSE-based file I/O (verify uses O_RDONLY)
- Observation time: 2025-11-12 17:12:45.xxx

### Expected behavior
After `memcpy` to a `MAP_SHARED` mapping and `msync(..., MS_SYNC)` returns 0, a verification read of the same range should observe fully updated data.

### Actual behavior
Immediate verify-read returns a mix of old and new data for the same range (data comparison error). Logs show:
- Old 12KB content was persisted just before the mmap `memcpy`
- The subsequent `msync` appears to have flushed only the first 4KB page
- The verify-read hits cached data prepared earlier by a window-aligned backend read, so its syscall time is very short and “doesn’t align” with the backend request timing

### User-space timeline (doio)
- mmap-write memcpy (offset 4442856, len 9927):

```2983:2990:doio.c
doio_debug_log("mmap-%s memcpy done: fd=%d offset=%d copied=%d memaddr=0x%lx duration_ns=%lld\n",
               (sysc->sy_flags & SY_WRITE) ? "write" : "read",
               fd, req->r_data.io.r_offset, req->r_data.io.r_nbytes,
               (unsigned long)memaddr, ...);
```

```
[2025-11-12 17:12:45.977696142] mmap-write memcpy done: fd=3 offset=4442856 copied=9927 memaddr=0x7f6663c3cae8 duration_ns=9430449
```

- msync start and completion:

```2996:3009:doio.c
doio_debug_log("msync: fd=%d offset=%d base=0x%lx len=%d flags=MS_SYNC\n",
               fd, req->r_data.io.r_offset, (unsigned long)fdc->c_memaddr,
               (int)sbuf.st_size);
msret = msync(fdc->c_memaddr, (int)sbuf.st_size, MS_SYNC);
doio_debug_log("msync: fd=%d offset=%d ret=%d errno=%d duration_ns=%lld\n", ...);
```

```
[2025-11-12 17:12:45.977715758] msync: fd=3 offset=4442856 base=0x7f6663800000 len=12800000 flags=MS_SYNC
[2025-11-12 17:12:45.982667206] msync: fd=3 offset=4442856 ret=0 errno=0 duration_ns=4937949
```

- verify-read (independent O_RDONLY fd) right after msync:

```3845:3886:doio.c
doio_debug_log("verify: fd=%d file=%s offset=%d len=%d\n", fd, file, offset, length);
doio_debug_log("verify-lseek: fd=%d offset=%d ret=%d errno=%d duration_ns=%lld\n", ...);
doio_debug_log("verify-read: fd=%d offset=%d buf=0x%lx len=%d ret=%d errno=%d duration_ns=%lld\n", ...);
```

```
[2025-11-12 17:12:45.982712405] verify: fd=4 file=/root/lin/mnt002/ltp-np5plAMCXo/mm-sync-107502 offset=4442856 len=9927
[2025-11-12 17:12:45.982726786] verify-lseek: fd=4 offset=4442856 ret=4442856 errno=0 duration_ns=1226
[2025-11-12 17:12:45.982805728] verify-read: fd=4 offset=4442856 buf=0x1a85c70 len=9927 ret=9927 errno=0 duration_ns=65796
```

- Data comparison error (mixed old/new):

```
*** DATA COMPARISON ERROR ***
check_file(..., 4442856, 9927, I:107540:ip-10-52-148-199:doio*, 31, 0) failed

Corrupt regions follow - unprintable chars are represented as '.'
-----------------------------------------------------------------
corrupt bytes starting at file offset 4442856
    1st 32 expected bytes:  I:107540:ip-10-52-148-199:doio*I
    1st 32 actual bytes:    ........................9:doio*I
```

### Server/FUSE-side timeline (backend I/O)
- Prior to `memcpy`, a 12KB write of old data persisted:

```
2025/11/12 17:12:45.968617 [DEBUG] file.go:425: TRACE Write enter: ino(8508829) ... offset(4440064) len(12288) flags(WriteCache) ...
2025/11/12 17:12:45.977674 [DEBUG] file.go:532: TRACE Write: ino(8508829) offset(4440064) len(12288) ... (8999822)ns
```

- Read path uses window-aligned requests (128KB), may start earlier than user offset; hole handling logs:

```
2025/11/12 17:12:45.968765 [DEBUG] file.go:357: TRACE Read enter: ... offset(4395008) reqsize(131072) ... ffl=OpenReadOnly+OpenSync)
2025/11/12 17:12:45.982158 [DEBUG] stream_reader.go:189: ... req FileOffset(4395008) Size(45056) ExtentKey(<nil>)
2025/11/12 17:12:45.982179 [DEBUG] stream_reader.go:206: Stream read hole: ino(8508829) req(FileOffset(4395008) Size(45056) ExtentKey(<nil>)) total(45056)
```

- The 12KB region [4440064, 4452352) is delivered before verify-read:

```
2025/11/12 17:12:45.982426 [DEBUG] extent_reader.go:114: ExtentReader Read exit: req(FileOffset(4440064) Size(12288) ...) ... readBytes(12288) err(<nil>)
```

- Only a single 4KB overwrite (new data) is observed right after:

```
2025/11/12 17:12:45.982545 [DEBUG] stream_conn.go:228: sendToConn exit: ... Op(OpRandomWrite) ... FileOffset(4440064) Size(4096) ...
2025/11/12 17:12:45.982569 [DEBUG] stream_writer.go:436: Streamer write exit: ... offset(4440064) size(4096) done total(4096) err(<nil>)
```

- A separate example of verify-read at higher offset still triggering an earlier aligned window:

```
[2025-11-12 17:12:45.968535747] verify-read: fd=4 offset=5072258 ... len=124935 ret=124935 ... duration_ns=1880012
2025/11/12 17:12:45.968765 [DEBUG] file.go:357: TRACE Read enter: ... offset(4395008) reqsize(131072) ...
```

### Why this looks like a “page tear”
- 4KB page size. User-space mmap write: S=4442856, L=9927, end inclusive E=S+L-1=4452782.
- Spans 4 pages:
  - P0 [4440064, 4444160) → tail 1304B
  - P1 [4444160, 4448256) → full 4096B
  - P2 [4448256, 4452352) → full 4096B
  - P3 [4452352, 4456448) → head 431B
- Before `memcpy`, old data for [4440064, 4452352) (P0–P2) was persisted.
- The `msync` window that followed only resulted in a single 4KB overwrite (P0), while P1/P2/P3 were not updated in that window.
- The immediate verify-read then observed “new P0 tail + old P1/P2 + old P3 head”, matching the compare output where a long run of '.' (old) is followed by the new tail “9:doio*I”.

### Why verify-read time “doesn’t align” with backend time
- `check_file()` uses a separate O_RDONLY fd and does a single lseek+read (not reading the writer’s mmap). Its short syscall time reflects copying data from cache.
- The backend read had already populated the needed 12KB into cache before verify-read was issued, due to window-aligned prefetch/extent reads.

### Impact
- Immediate data verification after `msync(MS_SYNC)` can read partially updated content (mixed old/new) for the same range, causing false data corruption failures in mmap workloads.

### What might be happening
- Race between:
  - Background/streamed writeback of old content (already completed for P0–P2 before `memcpy`)
  - The subsequent `msync` only resulting in one 4KB flush (observed)
- Page-granular flushing and request coalescing lead to only a subset of dirty pages being persisted in that specific window, making an immediate read see a torn range.

### Requests for maintainers
- Confirm semantics: After `msync(..., MS_SYNC)` on MAP_SHARED, should CubeFS guarantee the full dirty range is persisted atomically to the backend before returning?
- Investigate:
  - Ensuring `msync`-triggered flush covers the entire dirty range intersected, not just a single page
  - Coordination between FUSE writeback and mmap `msync` to avoid persisting stale data that races with a newer mmap write
  - Read path’s window alignment is expected, but verify that it cannot cache stale data over freshly dirtied pages post-msync
- If full atomicity is not guaranteed, document recommended barriers/flags for mmap writers to ensure strong persistence before verification.

### Possible workarounds (for the test)
- Delay verification slightly after `msync` to allow remaining pages to flush
- Verify on page-aligned, page-sized boundaries or use `fsync` on the fd after `msync`
- Avoid immediate verification of sub-page tails/heads that straddle pages



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LTP rwtest04: mmap-write + msync shows partial data visibility (page tear) on immediate verify-read #1268

Environment

Expected behavior

Actual behavior

User-space timeline (doio)

Server/FUSE-side timeline (backend I/O)

Why this looks like a “page tear”

Why verify-read time “doesn’t align” with backend time

Impact

What might be happening

Requests for maintainers

Possible workarounds (for the test)

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

LTP rwtest04: mmap-write + msync shows partial data visibility (page tear) on immediate verify-read #1268

Description

Environment

Expected behavior

Actual behavior

User-space timeline (doio)

Server/FUSE-side timeline (backend I/O)

Why this looks like a “page tear”

Why verify-read time “doesn’t align” with backend time

Impact

What might be happening

Requests for maintainers

Possible workarounds (for the test)

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions