Corrupted files on Linux with many concurrent write calls #544

bobrik opened this Issue Sep 4, 2012 · 17 comments


@bobrik
bobrik commented Sep 4, 2012

This gist contains a test case plus sample strace output: https://gist.github.com/3617558. The "WTF" output does not appear on every run. Changing reader() to setTimeout(reader, 1000) does not change anything.

All pread syscalls come after the last pwrite syscall. The corruption only happens when many writes are fired at once; writing from time to time is OK.

OS X is not affected (I have an SSD, so it may just be too fast there); Linux is affected (spinning disks, at least).
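The pattern described above (many positional writes in flight at once, with every read issued only after the last write) can be approximated outside Node with a small harness. This is a hypothetical Python sketch, not the gist's code: it assumes a thread pool is a fair stand-in for the libuv thread pool, and it calls os.pwrite and os.pread, which are thin wrappers over the same syscalls that appear in the strace output. The name concurrent_pwrite_roundtrip and its parameters are illustrative.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def concurrent_pwrite_roundtrip(n_chunks: int = 100, chunk_size: int = 4096) -> bool:
    """Fire n_chunks positional writes at once, each at its own non-overlapping
    offset, then read the whole file back after the last write has returned.

    Returns True if every chunk reads back exactly where it was written.
    """
    # Give each chunk its own fill byte so misplaced data is detectable.
    chunks = [bytes([i % 256]) * chunk_size for i in range(n_chunks)]
    # mkstemp opens the file O_RDWR | O_CREAT | O_EXCL, i.e. without O_APPEND.
    fd, path = tempfile.mkstemp()
    try:
        def write_chunk(i: int) -> int:
            # Chunk i targets bytes [i * chunk_size, (i + 1) * chunk_size).
            return os.pwrite(fd, chunks[i], i * chunk_size)

        # Many writes in flight at once, like a burst of fs.write() calls.
        with ThreadPoolExecutor(max_workers=16) as pool:
            written = list(pool.map(write_chunk, range(n_chunks)))
        if any(n != chunk_size for n in written):
            return False

        # Every pread happens after the last pwrite, as in the strace output.
        data = os.pread(fd, n_chunks * chunk_size, 0)
        return data == b"".join(chunks)
    finally:
        os.close(fd)
        os.unlink(path)
```

Passing a chunk_size below 4096 mirrors the "100 writes < 4096 bytes" variant mentioned later in the thread.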

cc @bnoordhuis @indutny

@indutny
Contributor
indutny commented Sep 4, 2012

Also, I'm not sure why, but if you initialize those buffers, the error disappears.

@bobrik
bobrik commented Sep 4, 2012

From IRC with @indutny: https://gist.github.com/3618478. 100 writes of fewer than 4096 bytes each, with all preads after the last pwrite: same results.

@piscisaureus
Member

So this is all on BSDs?

@bobrik
bobrik commented Sep 4, 2012

I see this on Linux. I may ask for a FreeBSD machine to check whether it matters.

@bnoordhuis
Contributor

Maybe I'm missing something. You're doing lots of concurrent overlapping writes; of course you're going to get corrupted results.

@indutny
Contributor
indutny commented Sep 4, 2012

Writes are not overlapping.

@bobrik
bobrik commented Sep 4, 2012

@bnoordhuis they do not overlap

@bnoordhuis
Contributor

You're doing concurrent writes at the end of the file, right? Say thread A writes 100 bytes at position 100 and thread B writes 100 bytes at position 200, and they do it concurrently. On most operating systems, what actually happens is undefined. If A completes before B or vice versa, you'll probably get the right results. If A and B overlap in time, the outcome is nondeterministic.

@bnoordhuis
Contributor

By the way, the reason you're not seeing this on OS X is that all writes get serialized through a big mutex. That was considered necessary because pwrite() comes with even fewer guarantees on that platform.

@bobrik
bobrik commented Sep 4, 2012

Yep, platforms without pwrite work because libeio emulates it with a mutex :)

@bnoordhuis
Contributor

That's not quite what I mean. We use pwrite() on OS X, but all calls are guarded by a single Big Lock.
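To make the "single Big Lock" idea concrete, here is a minimal Python sketch of the approach as the comment describes it: one process-wide mutex held around every pwrite() call, so no two positional writes ever run at the same time. It is an illustration under that assumption, not libuv's actual OS X code; BigLockWriter and locked_roundtrip are hypothetical names.

```python
import os
import tempfile
import threading


class BigLockWriter:
    """Hypothetical wrapper: every positional write made through any instance
    is serialized by one process-wide lock, so no two pwrite() calls overlap
    in time."""

    _lock = threading.Lock()  # one lock shared by all instances and all fds

    def __init__(self, fd: int) -> None:
        self.fd = fd

    def pwrite(self, data: bytes, offset: int) -> int:
        # Hold the global lock for the duration of the syscall.
        with BigLockWriter._lock:
            return os.pwrite(self.fd, data, offset)


def locked_roundtrip(n_threads: int = 20, chunk_size: int = 10) -> bool:
    """Start n_threads threads that each write one chunk through the lock,
    then check that the file reads back as expected."""
    fd, path = tempfile.mkstemp()
    try:
        writer = BigLockWriter(fd)
        chunks = [bytes([i % 256]) * chunk_size for i in range(n_threads)]
        threads = [
            threading.Thread(target=writer.pwrite, args=(chunks[i], i * chunk_size))
            for i in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return os.pread(fd, n_threads * chunk_size, 0) == b"".join(chunks)
    finally:
        os.close(fd)
        os.unlink(path)
```

The trade-off of a single lock is throughput: writes to unrelated files also wait on each other.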

@bnoordhuis bnoordhuis closed this Sep 4, 2012
@bobrik
bobrik commented Sep 4, 2012

Here we go again: https://gist.github.com/3620592. Sequential writes (no overlap), but the file is still corrupted from time to time.

@piscisaureus piscisaureus reopened this Sep 4, 2012
@piscisaureus
Member

@bnoordhuis, looks valid; I can reproduce this on Ubuntu.

@bobrik
bobrik commented Sep 4, 2012

@piscisaureus, there's a bug in this code; closing.

@bobrik bobrik closed this Sep 4, 2012
@bobrik
bobrik commented Sep 4, 2012

Final update: this was the behaviour of "a+" (O_APPEND) on Linux, where pwrite() ignores the offset and appends to the end of the file. Gist to prove it: https://gist.github.com/3621892
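The Linux behaviour can be reproduced without Node. Node's "a+" flag string opens the file with O_APPEND | O_CREAT | O_RDWR, and the Linux pwrite(2) man page notes that when a file is opened with O_APPEND, pwrite() appends data to the end of the file regardless of the offset, even though POSIX says O_APPEND should not affect where pwrite() writes. The hypothetical Python sketch below writes "hello" and then asks pwrite() to put "HE" at offset 0, once with and once without O_APPEND; the function name pwrite_at_zero is illustrative.

```python
import os
import tempfile


def pwrite_at_zero(append: bool) -> bytes:
    """Write b"hello" at offset 0, then pwrite() b"HE" at offset 0 again.

    Returns the final file contents so the two open modes can be compared.
    """
    fd_tmp, path = tempfile.mkstemp()
    os.close(fd_tmp)
    # With append=True these are the same flags Node uses for "a+".
    flags = os.O_RDWR | os.O_CREAT
    if append:
        flags |= os.O_APPEND
    fd = os.open(path, flags)
    try:
        os.pwrite(fd, b"hello", 0)
        # Intended to overwrite the first two bytes in place.
        os.pwrite(fd, b"HE", 0)
        return os.pread(fd, 64, 0)
    finally:
        os.close(fd)
        os.unlink(path)


print(pwrite_at_zero(append=False))  # b'HEllo'
print(pwrite_at_zero(append=True))   # b'helloHE' on Linux: the offset is ignored
```

Opening the file with a flag string that does not include O_APPEND, such as "r+" (O_RDWR) or "w+" (O_TRUNC | O_CREAT | O_RDWR), keeps pwrite() positional.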

Maybe it should be documented somehow, because it's a rather unexpected thing to see.

Waiting for final @bnoordhuis words to close this mess.

@bobrik bobrik reopened this Sep 4, 2012
@bnoordhuis
Contributor

> Waiting for final @bnoordhuis words to close this mess.

Hmm... go in peace, my son?

I welcome documentation patches, by the way. :-)

@bnoordhuis bnoordhuis closed this Sep 5, 2012
@Mithgol Mithgol referenced this issue in nodejs/node-v0.x-archive Sep 7, 2012
Added append open mode note #3972 (closed)
