Data corruption on external volumes on Linux containers hosted on Windows #37284

Open
ayende opened this Issue Jun 14, 2018 · 3 comments

ayende commented Jun 14, 2018

Description

When running a Linux Docker container using an external volume hosted on Windows, certain I/O patterns can result in losing your writes.

Full repro and code below.

In particular:

  • Pre-allocate a file of 256MB
  • Start writing to that file using O_DIRECT | O_DSYNC in 4KB intervals
  • Using another file descriptor in the same process, read from this file.

If the sequence of events goes like this:

  • pread(rfd, readBuffer, 4096, 257 * 4096);

Read zeros from the file, as expected, since we didn't write to it yet.

  • pwrite(wfd, data, 4096, 257 * 4096);

Write to the actual location in the file, using O_DIRECT | O_DSYNC

  • pread(rfd, readBuffer, 4096, 257 * 4096);

Try to read again, but you'll get cached data (all zeroes) instead of the just written data.
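The sequence above can be sketched roughly as follows. This is not the attached bad_io.c; it is a minimal illustration under a few assumptions: the function name check_read_after_write and the constants are hypothetical, ftruncate stands in for the pre-allocation step, and the '$' fill byte is borrowed from the error message reported below. O_DIRECT needs block-aligned buffers and offsets, hence posix_memalign.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 4096
#define FILE_SIZE (256L * 1024 * 1024)

/* Hypothetical helper, not taken from bad_io.c.
 * Returns 0 if the second read sees the freshly written data,
 * 1 if it sees stale data, and -1 on any setup or I/O error
 * (for example, a filesystem that rejects O_DIRECT). */
int check_read_after_write(const char *path)
{
    int result = -1;
    void *data = NULL;
    void *readBuffer = NULL;
    off_t offset = (off_t)257 * BLOCK;

    /* O_DIRECT requires the file offset to be block-aligned. */
    assert(offset % BLOCK == 0);

    /* Writer uses O_DIRECT | O_DSYNC; reader is a second, plain descriptor. */
    int wfd = open(path, O_CREAT | O_WRONLY | O_DIRECT | O_DSYNC, 0644);
    int rfd = open(path, O_RDONLY);
    if (wfd < 0 || rfd < 0)
        goto done;

    /* Assumption: ftruncate (sparse file) stands in for pre-allocation. */
    if (ftruncate(wfd, FILE_SIZE) != 0)
        goto done;

    /* O_DIRECT also requires block-aligned buffers. */
    if (posix_memalign(&data, BLOCK, BLOCK) != 0) {
        data = NULL;
        goto done;
    }
    if (posix_memalign(&readBuffer, BLOCK, BLOCK) != 0) {
        readBuffer = NULL;
        goto done;
    }
    memset(data, '$', BLOCK);

    /* 1. Read before anything was written: zeros are expected here. */
    if (pread(rfd, readBuffer, BLOCK, offset) != BLOCK)
        goto done;
    /* 2. Write the block through the O_DIRECT | O_DSYNC descriptor. */
    if (pwrite(wfd, data, BLOCK, offset) != BLOCK)
        goto done;
    /* 3. Read again through the other descriptor and compare. */
    if (pread(rfd, readBuffer, BLOCK, offset) != BLOCK)
        goto done;

    result = memcmp(readBuffer, data, BLOCK) == 0 ? 0 : 1;

done:
    free(data);
    free(readBuffer);
    if (wfd >= 0)
        close(wfd);
    if (rfd >= 0)
        close(rfd);
    return result;
}
```

Calling check_read_after_write("/wrk/test.file") from main inside the container would be expected to return 1 when the problem described here occurs, and 0 on a filesystem where the second read sees the written block.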

I have a reproduction that triggers this issue consistently, without any threading.

Steps to reproduce the issue:

  1. Take the C code bad_io.c.txt and save it to your local machine as bad_io.c.
  2. Save the following script as setup.sh in the same directory:
#!/bin/sh
gcc /wrk/bad_io.c -o /wrk/bad_io.exec
/wrk/bad_io.exec /wrk/test.file
  3. Execute: docker run --rm -v PWD:/wrk gcc /wrk/setup.sh

Describe the results you received:

I'm getting: Line 130 Read corrupted data from disk: 0 but exepcted '$'

This is because we are reading the cached data.

Describe the results you expected:

I should be able to read the data I just wrote.

Additional information you deem important (e.g. issue happens only occasionally):

Reproduced this on multiple machines. I tried mounting the external volume with cache=strict and with cache=none; neither helped.

This is in the context of hosting a database server on the container instance and being able to read the transaction journal for ACID purposes.

Output of docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:12:48 2018
 OS/Arch:      windows/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:22:38 2018
  OS/Arch:      linux/amd64
  Experimental: true

Output of docker info:

Containers: 5
 Running: 1
 Paused: 0
 Stopped: 4
Images: 10
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Windows
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.934GiB
Name: linuxkit-00155d004b14
ID: V6TR:RBJ3:2GUN:PH7A:FWKX:QMNR:N6XW:LQGS:EG4P:J77I:JSGK:SJHL
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 21
 Goroutines: 44
 System Time: 2018-06-14T14:10:19.4028802Z
 EventsListeners: 1
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

ayende commented Jun 14, 2018

Note that this behavior seems to violate POSIX:
http://pubs.opengroup.org/onlinepubs/9699919799/

After a write() to a regular file has successfully returned:

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

This holds at least for the same file descriptor; I'm not sure about different file descriptors.

ayende commented Jun 14, 2018

Just double-checked, and it seems that this works on a single fd. The problem is that in the real scenario, we have two file descriptors working in tandem.

Member

thaJeztah commented Jun 14, 2018

This looks to be an issue specific to Docker for Windows, where (if I'm not mistaken) host directories are shared with the Linux VM through CIFS/SMB.
